

# Bottom-Up and Top-Down Neuromorphic Processor Design:

#### Unveiling Roads to Embedded Cognition

**Charlotte Frenkel** 

#### Institute of Neuroinformatics, UZH and ETH Zürich, Switzerland

charlotte@ini.uzh.ch

Neuro-Inspired Computational Elements workshop, virtual, March 16-19, 2021














[Silver & Hassabis, https://deepmind.com/blog/article/alphago-zero-starting-scratch, 2017]



Frenkel, NICE'21 keynote

[Poon & Zhou, Front. Neurosci., 2011]

#### Neuromorphic Engineering – How?

A design strategy toward efficiency and cognition?






#### Neuromorphic Engineering – How?

Unveiling roads to embedded cognition



# Outline

Part I – Bottom-up neuromorphic design

- Building blocks
- Integration

Part II – Top-down neuromorphic design

- Algorithms
- Integration

Conclusion and perspectives

# Outline

Part I – Bottom-up neuromorphic design

• Building blocks

Neurons and synapses as adaptive processing and memory elements [Frenkel, ISCAS, 2017] [Frenkel, BioCAS, 2017]

• Integration

Part II – Top-down neuromorphic design

• Algorithms

• Integration

Conclusion and perspectives

#### Design strategy

Analog or digital?





How can we make the best of both worlds?

#### Design strategy

What should we aim for and phenomenologically implement?

Neurons



[Izhikevich, IEEE Trans. NN, 2004]

### Proposed phenomenological digital neuron

Tackling the versatility/efficiency tradeoff




## Design strategy

What should we aim for and phenomenologically implement?

Neurons

• 20 Izhikevich behaviors of cortical spiking neurons

Synapses

Spike-based online learning 





→ perspectives

# Proposed digital synapse

Tackling the versatility/efficiency tradeoff

#### Key challenge – Fan-in = 100-10000 synapses/neuron



# Outline

#### Part I – Bottom-up neuromorphic design

- Building blocks
- Integration

Proposed neuromorphic experimentation platforms [Frenkel, *Trans. BioCAS*, 2019a] [Frenkel, *Trans. BioCAS*, 2019b]

Part II – Top-down neuromorphic design

- Algorithms
- Integration

Conclusion and perspectives

#### Architecture of ODIN





#### ODIN – Chip microphotograph and specifications





| Parameter                                       | Value                     |
|-------------------------------------------------|---------------------------|
| Technology                                      | 28nm FDSOI                |
| Implementation                                  | Digital                   |
| Area                                            | 0.086mm<sup>2</sup>       |
| # neurons                                       | 256                       |
| # synapses                                      | 64k                       |
| # Izhikevich behav.                             | 20                        |
| Online learning                                 | SDSP, (3+1)-bit weight    |
| Time constant                                   | Biological to accelerated |
| Supply voltage                                  | 0.55V - 1.0V              |
| Leakage power (P<sub>leak</sub>)                | 27.3µW @0.55V             |
| Idle power (P<sub>idle</sub>)                   | 1.78µW/MHz @0.55V         |
| Incr. energy/SOP (E<sub>SOP</sub>)              | 8.43pJ @0.55V             |
| Global energy/SOP (E<sub>tot,SOP</sub>)         | >12.7pJ @0.55V            |
| Routing flexibility/efficiency                  | 🙁 (AER)                   |
| Fan-in                                          | 256                       |
| Fan-out                                         | 256                       |
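
The table lists SDSP online learning with a (3+1)-bit weight. As a minimal sketch, assuming SDSP denotes spike-driven synaptic plasticity in its usual form (a weight update triggered by a presynaptic spike and gated by the postsynaptic membrane potential and a calcium-like activity variable), the update could look as follows. The names, the threshold values, and the reading of the extra bit as a per-synapse learning-enable flag are illustrative assumptions, not ODIN's actual implementation.

```python
from dataclasses import dataclass

W_MAX = 7  # largest value representable by a 3-bit weight


@dataclass(frozen=True)
class SdspParams:
    """Hypothetical SDSP thresholds (illustrative values only)."""
    v_threshold: int   # membrane-potential threshold separating up/down updates
    ca_up_lo: int      # calcium window in which potentiation is allowed
    ca_up_hi: int
    ca_down_lo: int    # calcium window in which depression is allowed
    ca_down_hi: int


def sdsp_update(weight: int, learn_enable: bool, v_mem: int,
                calcium: int, p: SdspParams) -> int:
    """Return the new 3-bit weight after one presynaptic spike."""
    if not learn_enable:
        # Assumed meaning of the "+1" bit: learning disabled for this synapse.
        return weight
    if v_mem >= p.v_threshold and p.ca_up_lo <= calcium < p.ca_up_hi:
        return min(weight + 1, W_MAX)   # potentiate, saturating at 7
    if v_mem < p.v_threshold and p.ca_down_lo <= calcium < p.ca_down_hi:
        return max(weight - 1, 0)       # depress, saturating at 0
    return weight                       # outside both windows: no change


params = SdspParams(v_threshold=10, ca_up_lo=2, ca_up_hi=8,
                    ca_down_lo=2, ca_down_hi=5)
w_up = sdsp_update(3, True, v_mem=12, calcium=4, p=params)
w_down = sdsp_update(3, True, v_mem=5, calcium=3, p=params)
w_frozen = sdsp_update(3, False, v_mem=12, calcium=4, p=params)
w_saturated = sdsp_update(W_MAX, True, v_mem=12, calcium=4, p=params)
print(w_up, w_down, w_frozen, w_saturated)
```

The rule only needs the postsynaptic state at the time of the presynaptic spike, which is one reason it suits a low-cost digital synapse.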

#### Architecture of MorphIC



#### MorphIC – Chip microphotograph and specifications

[Figure: MorphIC chip microphotograph showing Core 0 to Core 3, each with its synapse SRAM and neuron SRAM; die dimension label 1.87mm]

| Parameter                         | Value                                                           |
|-----------------------------------|-----------------------------------------------------------------|
| Technology                        | 65nm LP CMOS                                                    |
| Implementation                    | Digital                                                         |
| Area                              | 3.5mm<sup>2</sup> (incl. pads), 2.86mm<sup>2</sup> (excl. pads) |
| Number of cores                   | 4                                                               |
| Total # neurons (type)            | 2048 (LIF)                                                      |
| Total # synapses (hier.)          | 1M (L0), 1M (L1), 64k (L2)                                      |
| Fan-in (hier.)                    | 512 (L0), 512 (L1), 32 (L2)                                     |
| Fan-out (hier.)                   | 512 (L0), 3x512 (L1), 4 (L2)                                    |
| Online learning                   | Stochastic SDSP, 1-bit weight                                   |
| Time constant                     | Biological to accelerated                                       |
| Supply voltage                    | 0.8V - 1.2V                                                     |
| Max. clock frequency              | 55MHz (0.8V) – 210MHz (1.2V)                                    |
| Leakage power (P<sub>leak</sub>)  | 45µW @0.8V                                                      |
| Idle power (P<sub>idle</sub>)     | 41.3µW/MHz @0.8V                                                |
| Energy/SOP (E<sub>SOP</sub>)      | 30pJ @0.8V                                                      |

| Author<br>Publication<br>Chip name | Schemmel [30]<br>ISCAS, 2010<br>HICANN | Benjamin [32]<br>PIEEE, 2014<br>Neurogrid | Qiao [27]<br>Front. NS, 2015<br>ROLLS | Moradi [29]<br>TBioCAS, 2017<br>DYNAPs | Park [26]<br>BioCAS, 2014<br>IFAT | Mayr [28]<br>TBCAS, 2016 | Painkras [31]<br>JSSC, 2013<br>SpiNNaker | Seo [25]<br>CICC, 2011 | Akopyan [33]<br>TCAD, 2015<br>TrueNorth | Davies [34]<br>IEEE Micro, 2018<br>Loihi | Frenkel<br>TBCAS, 2019a<br>ODIN | Frenkel<br>TBCAS, 2019b<br>MorphIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Implementation | Mixed-signal | Mixed-signal | Mixed-signal | Mixed-signal | Mixed-signal | Mixed-signal | Digital | Digital | Digital | Digital | Digital | Digital |
| Technology | 0.18µm | 0.18µm | 0.18µm | 0.18µm | 90nm | 28nm | 0.13µm | 45nm SOI | 28nm | 14nm FinFET | 28nm FDSOI | 65nm LP |
| # cores<sup>◇</sup> | 1 | 16 | 1 | 4 | 32 | 1 | 18 | 1 | 4096 | 128 | 1 | 4 |
| Neurosynaptic core area [mm<sup>2</sup>] | 49 | 168 | 51.4 | 7.5 | 0.31 | 0.36 | 3.75 | 0.8 | 0.095 | 0.4 | 0.086 | 0.715 |
| # Izhikevich behaviors<sup>†</sup> | (20) | N/A | (20) | (20) | 3 | 3 | Programmable | 3 | 11 (3 neur: 20) | (6) | 20 | 3 |
| # neurons per core | 512 | 64k | 256 | 256 | 2k | 64 | max. 1000<sup>○</sup> | 256 | 256 | max. 1024 | 256 | 512 |
| Synaptic weight storage | 4-bit (SRAM) | Off-chip | Capacitor | 12-bit (CAM) | Off-chip | 4-bit (SRAM) | Off-chip | 1-bit (SRAM) | 1-bit (SRAM) | 1- to 9-bit (SRAM) | (3+1)-bit (SRAM) | 1-bit (SRAM) |
| Embedded online learning | STDP | No | SDSP | No | No | SDSP | Programmable | S-STDP | No | Programmable | SDSP | S-SDSP |
| # synapses per core | 112k | – | 128k | 16k | – | 8k | – | 64k | 64k | 1M to 114k (1-9 bits) | 64k | 528k |
| Time constant | Accelerated | Biological | Biological | Biological | Biological | Bio. to accel. | Bio. to accel. | Biological | Biological | N/A | Bio. to accel. | Bio. to accel. |
| Flexibility: routing | Medium | Medium | Low | Medium | Medium | Low | High | Low | Medium | High | Low | Medium |
| Flexibility: learning | Low | – | Low | Low | – | Low | – | Low | – | High | Low | Low |
| Neuron core density [neur/mm<sup>2</sup>]<sup>*</sup>, raw | 10.5 | 390 | 5 | 34 | 6.5k | 178 | max. 267<sup>○</sup> | 320 | 2.6k | max. 2.5k | 3.0k | 716 |
| Neuron core density [neur/mm<sup>2</sup>]<sup>*</sup>, norm. | – | – | – | – | – | – | max. 5.8k | 826 | 2.6k | max. 1k | 3.0k | 3.9k |
| Synapse core density [syn/mm<sup>2</sup>]<sup>*</sup>, raw | 2.3k | – | 2.5k | 2.1k | – | 22.2k | – | 80k | 674k | 2.5M to 282k | 741k | 738k |
| Synapse core density [syn/mm<sup>2</sup>]<sup>*</sup>, norm. | – | – | – | – | – | – | – | 207k | 674k | 1M to 113k | 741k | 4M |
| Supply voltage | 1.8V | 3.0V | 1.8V | 1.3V-1.8V | 1.2V | 0.75V, 1.0V | 1.2V | 0.53V-1.0V | 0.7V-1.05V | 0.5V-1.25V | 0.55V-1.0V | 0.8V-1.2V |
| Energy per SOP<sup>‡</sup>, raw | N/A | (941pJ)<sup>▲</sup> | >77fJ<sup>△</sup> | 134fJ<sup>△</sup>/30pJ<sup>▲</sup> (1.3V) | 22pJ<sup>▲</sup> | >850pJ<sup>▲</sup> | >11.3nJ<sup>△</sup>/26.6nJ<sup>▲</sup> | N/A | 26pJ<sup>▲</sup> (0.775V) | >23.6pJ<sup>△</sup> (0.75V) | 8.4pJ<sup>△</sup>/12.7pJ<sup>▲</sup> (0.55V) | 30pJ<sup>△</sup>/51pJ<sup>▲</sup> (0.8V) |
| Energy per SOP<sup>‡</sup>, norm. | N/A | – | – | – | – | – | >2.4nJ<sup>△</sup>/5.7nJ<sup>▲</sup> | N/A | 26pJ<sup>▲</sup> | (66.1pJ<sup>△</sup>) | 8.4pJ<sup>△</sup>/12.7pJ<sup>▲</sup> | 12.9pJ<sup>△</sup>/22pJ<sup>▲</sup> |

<sup>◇</sup> When chips are composed of several neurosynaptic cores, we report the density data associated with a single core. Care should be taken that, depending on the core definition in the different chips, routing resources might be included (all single-core designs, IFAT, TrueNorth, Loihi and MorphIC) or excluded (Neurogrid, DYNAPs and SpiNNaker). As opposed to the other reported designs, we consider the full Neurogrid system, which is composed of 16 NeuroCore chips, each one considered as a core; routing resources are off-chip. For DYNAPs and SpiNNaker, sharing routing overhead among cores would lead to 28% and 37% density penalties compared to the reported results, respectively. The HICANN chip can be considered as a core of the BrainScaleS wafer-scale system. Pad area is excluded from all reported designs.

<sup>†</sup> By its similarity with the Izhikevich neuron model, the AdExp neuron model is believed to reach the 20 Izhikevich behaviors [76], but it has not been demonstrated in HICANN, ROLLS and DYNAPs. The neuron model of TrueNorth can reach 11 behaviors per neuron and 20 by combining three neurons together [85]. The neuron model of Loihi is based on a LIF model to which threshold adaptation is added: the neuron should therefore reach 6 Izhikevich behaviors, although it has not been demonstrated.

<sup>○</sup> Experiment 1 reported in Table III from [31] is considered as a best-case neuron density: 1000 simple LIF neuron models are implemented per core, each firing at a low frequency.

\* Neuron (resp. synapse) core densities are computed by dividing the number of neurons (resp. synapses) per neurosynaptic core by the neurosynaptic core area. Regarding the synapse core density, Neurogrid, IFAT and SpiNNaker use an off-chip memory to store synaptic data. As the synapse core density cannot be extracted when off-chip resources are involved, no synapse core density values are reported for these chips. Values normalized to a 28-nm CMOS technology node are provided for digital designs using the node factor, with the exception of the 14-nm FinFET node of Loihi, for which Intel data from [120] has been used.

<sup>‡</sup> The synaptic operation energy measurements reported for the different chips do not follow a standardized measurement process. There are two main categories of energy measurements in neuromorphic chips. On the one hand, incremental values (denoted with <sup>△</sup>) describe the amount of energy paid for each additional SOP computation; they are measured by subtracting the leakage and idle power consumption of the chip, as in Eq. (2.2), although the exact power contributions taken into account in the SOP energy vary across chips. On the other hand, global values (denoted with <sup>▲</sup>) are obtained by dividing the total chip power consumption by the SOP rate, as in Eq. (2.3). Values normalized to a 28-nm CMOS technology node are provided for digital designs using the node factor, including for the 14-nm FinFET node of Loihi in the absence of reliable data for power normalization in [120]. The conditions under which all of these measurements have been done can be found hereafter. For Neurogrid, a SOP energy of 941pJ is reported for a network of 16 Neurocore chips (1M neurons, 8B synapses, 413k spikes/s): it is a board-level measurement, and no chip-level measurement is provided [32]. For ROLLS, the measured SOP energy of 77fJ is reported in [163]; it accounts for a point-to-point synaptic input event and includes the contribution of weight adaptation and digital-to-analog conversion. It represents a lower bound, as it does not account for synaptic event broadcasting. For DYNAPs, the measured SOP energy of 134fJ at 1.3V is also reported in [163], while the global SOP energy of 30pJ can be estimated from [29] using the measured 800-µW power consumption with all 1k neurons spiking at 100Hz with 25% connectivity (26.2MSOP/s), excluding the synaptic input currents. For IFAT, the SOP energy of 22pJ is extracted by measuring the chip power consumption when operated at the peak rate of 73M synaptic events/s [26].
In the chip of Mayr et al., the SOP energy of 850pJ represents a lower bound extracted from the chip power consumption, estimated by considering the synaptic weights at half their dynamic range at maximum operating frequency [28]. For SpiNNaker, an incremental SOP energy of 11.3nJ is measured in [164]; a global SOP energy of 26.6nJ at the maximum SOP rate of 16.56MSOP/s can be estimated by taking into account the leakage and idle power. Both values represent a lower bound, as the energy cost of neuron updates is not included. For TrueNorth, the measured SOP energy of 26pJ at 0.775V is reported in [165]; it is extracted by measuring the chip power consumption when all neurons fire at 20Hz with 128 active synapses. For Loihi, a minimum SOP energy of 23.6pJ at 0.75V is extracted from pre-silicon SDF and SPICE simulations, in accordance with early post-silicon characterization [34]; it represents a lower bound, as it includes only the contribution of the synaptic operation, without taking into account the cost of neuron update and learning engine update. For ODIN and MorphIC, the detailed measurement process is described in Sections 2.2.2 and 2.3.2, respectively.
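
The two energy metrics in the note above can be sketched numerically. Eq. (2.2) and Eq. (2.3) are not reproduced in this document, so the formulas below are assumed from the verbal description: global energy is total power divided by SOP rate, and incremental energy first subtracts leakage and idle power. The DYNAPs figures come from the note itself; the ODIN clock frequency (75MHz) and SOP rate (37.5MSOP/s) are hypothetical operating points chosen for illustration, combined with the leakage, idle and incremental values from the ODIN specification table.

```python
def global_energy_per_sop(total_power_w: float, sop_rate_hz: float) -> float:
    """Global metric: total chip power divided by the SOP rate."""
    return total_power_w / sop_rate_hz


def incremental_energy_per_sop(total_power_w: float, leak_w: float,
                               idle_w_per_hz: float, f_clk_hz: float,
                               sop_rate_hz: float) -> float:
    """Incremental metric: leakage and idle power are subtracted first."""
    active_power_w = total_power_w - leak_w - idle_w_per_hz * f_clk_hz
    return active_power_w / sop_rate_hz


# DYNAPs, figures quoted in the note: 800 uW at 26.2 MSOP/s.
dynaps_pj = global_energy_per_sop(800e-6, 26.2e6) * 1e12

# ODIN at 0.55 V: leakage, idle and incremental values from the spec table.
P_LEAK_W = 27.3e-6
P_IDLE_W_PER_HZ = 1.78e-12      # 1.78 uW/MHz
E_SOP_J = 8.43e-12
F_CLK_HZ = 75e6                 # hypothetical clock frequency (assumption)
SOP_RATE_HZ = 37.5e6            # hypothetical SOP rate (assumption)

# Assumed additive power model: leakage + idle + SOP activity.
odin_total_w = P_LEAK_W + P_IDLE_W_PER_HZ * F_CLK_HZ + E_SOP_J * SOP_RATE_HZ
odin_global_pj = global_energy_per_sop(odin_total_w, SOP_RATE_HZ) * 1e12
odin_incr_pj = incremental_energy_per_sop(
    odin_total_w, P_LEAK_W, P_IDLE_W_PER_HZ, F_CLK_HZ, SOP_RATE_HZ) * 1e12

print(f"DYNAPs global: {dynaps_pj:.1f} pJ")
print(f"ODIN global: {odin_global_pj:.1f} pJ, incremental: {odin_incr_pj:.2f} pJ")
```

Under these assumptions, the global value always exceeds the incremental one, and the gap widens as the SOP rate drops, because leakage and idle power are amortized over fewer operations.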

| Author<br>Publication<br>Chip name | Schemmel<br>ISCAS, 2010<br>HICANN | Benjamin<br>PIEEE, 2014<br>Neurogrid | Qiao<br>Front. NS, 2015<br>ROLLS | Moradi<br>TBioCAS, 2017<br>DYNAPs | Painkras<br>JSSC, 2013<br>SpiNNaker | Akopyan<br>TCAD, 2015<br>TrueNorth | Davies<br>IEEE Micro, 2018<br>Loihi | Frenkel<br>TBCAS, 2019a<br>ODIN | Frenkel<br>TBCAS, 2019b<br>MorphIC |
|---|---|---|---|---|---|---|---|---|---|
| Implementation | Mixed-signal | Mixed-signal | Mixed-signal | Mixed-signal | Digital | Digital | Digital | Digital | Digital |
| Technology | 0.18µm | 0.18µm | 0.18µm | 0.18µm | 0.13µm | 28nm | 14nm FinFET | 28nm FDSOI | 65nm LP |
| # cores | 1 | 16 | 1 | 4 | 18 | 4096 | 128 | 1 | 4 |
| Neurosynaptic core area [mm<sup>2</sup>] | 49 | 168 | 51.4 | 7.5 | 3.75 | 0.095 | 0.4 | 0.086 | 0.715 |
| # Izhikevich behaviors | (20) | N/A | (20) | (20) | Programmable | 11 (3 neur: 20) | (6) | 20 | 3 |
| # neurons per core | 512 | 64k | 256 | 256 | max. 1000 | 256 | max. 1024 | 256 | 512 |
| Synaptic weight storage | 4-bit (SRAM) | Off-chip | Capacitor | 12-bit (CAM) | Off-chip | 1-bit (SRAM) | 1- to 9-bit (SRAM) | (3+1)-bit (SRAM) | 1-bit (SRAM) |
| Embedded online learning | STDP | No | SDSP | No | Programmable | No | Programmable | SDSP | S-SDSP |
| # synapses per core | 112k | – | 128k | 16k | – | 64k | 1M to 114k (1-9 bits) | 64k | 528k |
| Time constant | Accelerated | Biological | Biological | Biological | Bio. to accel. | Biological | N/A | Bio. to accel. | Bio. to accel. |
| Flexibility: routing | Medium | Medium | Low | Medium | High | Medium | High | Low | Medium |
| Flexibility: learning | Low | – | Low | Low | – | – | High | Low | Low |
| Neuron core density [neur/mm<sup>2</sup>], raw | 10.5 | 390 | 5 | 34 | max. 267 | 2.6k | max. 2.5k | 3.0k | 716 |
| Neuron core density [neur/mm<sup>2</sup>], norm. | – | – | – | – | max. 5.8k | 2.6k | max. 1k | 3.0k | 3.9k |
| Synapse core density [syn/mm<sup>2</sup>], raw | 2.3k | – | 2.5k | 2.1k | – | 674k | 2.5M to 282k | 741k | 738k |
| Synapse core density [syn/mm<sup>2</sup>], norm. | – | – | – | – | – | 674k | 1M to 113k | 741k | 4M |
| Supply voltage | 1.8V | 3.0V | 1.8V | 1.3V-1.8V | 1.2V | 0.7V-1.05V | 0.5V-1.25V | 0.55V-1.0V | 0.8V-1.2V |
| Energy per SOP, raw | N/A | (941pJ)<sup>▲</sup> | >77fJ<sup>△</sup> | 134fJ<sup>△</sup>/30pJ<sup>▲</sup> (1.3V) | >11.3nJ<sup>△</sup>/26.6nJ<sup>▲</sup> | 26pJ<sup>▲</sup> (0.775V) | >23.6pJ<sup>△</sup> (0.75V) | 8.4pJ<sup>△</sup>/12.7pJ<sup>▲</sup> (0.55V) | 30pJ<sup>△</sup>/51pJ<sup>▲</sup> (0.8V) |
| Energy per SOP, norm. | N/A | – | – | – | >2.4nJ<sup>△</sup>/5.7nJ<sup>▲</sup> | 26pJ<sup>▲</sup> | (66.1pJ<sup>△</sup>) | 8.4pJ<sup>△</sup>/12.7pJ<sup>▲</sup> | 12.9pJ<sup>△</sup>/22pJ<sup>▲</sup> |

Most direct comparison: IBM TrueNorth core vs. ODIN (same technology node, same number of neurons and synapses per neurosynaptic core, same area).



| Author | Schemmel | Benjamin | Qiao | Moradi | Painkras | Akopyan | Davies | Frenkel | Frenkel |
|---|---|---|---|---|---|---|---|---|---|
| Publication | ISCAS, 2010 | PIEEE, 2014 | Front. NS, 2015 | TBioCAS, 2017 | JSSC, 2013 | TCAD, 2015 | IEEE Micro, 2018 | TBCAS, 2019a | TBCAS, 2019b |
| Chip name | HICANN | Neurogrid | ROLLS | DYNAPs | SpiNNaker | TrueNorth | Loihi | ODIN | MorphIC |
| Implementation | Mixed-signal | Mixed-signal | Mixed-signal | Mixed-signal | Digital | Digital | Digital | Digital | Digital |
| Technology | 0.18µm | 0.18µm | 0.18µm | 0.18µm | 0.13µm | 28nm | 14nm FinFET | 28nm FDSOI | 65nm LP |
| # cores | 1 | 16 | 1 | 4 | 18 | 4096 | 128 | 1 | 4 |
| Neurosynaptic core area [mm<sup>2</sup>] | 49 | 168 | 51.4 | 7.5 | 3.75 | 0.095 | 0.4 | 0.086 | 0.715 |
| # Izhikevich behaviors | (20) | N/A | (20) | (20) | Programmable | 11 (3 neur: 20) | (6) | 20 | 3 |
| # neurons per core | 512 | 64k | 256 | 256 | max. 1000 | 256 | max. 1024 | 256 | 512 |
| Synaptic weight storage | 4-bit (SRAM) | Off-chip | Capacitor | 12-bit (CAM) | Off-chip | 1-bit (SRAM) | 1- to 9-bit (SRAM) | (3+1)-bit (SRAM) | 1-bit (SRAM) |
| Embedded online learning | STDP | No | SDSP | No | Programmable | No | Programmable | SDSP | S-SDSP |
| # synapses per core | 112k | – | 128k | 16k | – | 64k | 1M to 114k (1-9 bits) | 64k | 528k |
| Time constant | Accelerated | Biological | Biological | Biological | Bio. to accel. | Biological | N/A | Bio. to accel. | Bio. to accel. |
| Flexibility: routing | Medium | Medium | Low | Medium | High | Medium | High | Low | Medium |
| Flexibility: learning | Low | – | Low | Low | – | – | High | Low | Low |
| Neuron core density [neur/mm<sup>2</sup>], raw | 10.5 | 390 | 5 | 34 | max. 267 | 2.6k | max. 2.5k | 3.0k | 716 |
| Neuron core density [neur/mm<sup>2</sup>], norm. | – | – | – | – | max. 5.8k | 2.6k | max. 1k | 3.0k | 3.9k |
| Synapse core density [syn/mm<sup>2</sup>], raw | 2.3k | – | 2.5k | 2.1k | – | 674k | 2.5M to 282k | 741k | 738k |
| Synapse core density [syn/mm<sup>2</sup>], norm. | – | – | – | – | – | 674k | 1M to 113k | 741k | 4M |
| Supply voltage | 1.8V | 3.0V | 1.8V | 1.3V-1.8V | 1.2V | 0.7V-1.05V | 0.5V-1.25V | 0.55V-1.0V | 0.8V-1.2V |
| Energy per SOP, raw | N/A | (941pJ)<sup>▲</sup> | >77fJ<sup>△</sup> | 134fJ<sup>△</sup>/30pJ<sup>▲</sup> (1.3V) | >11.3nJ<sup>△</sup>/26.6nJ<sup>▲</sup> | 26pJ<sup>▲</sup> (0.775V) | >23.6pJ<sup>△</sup> (0.75V) | 8.4pJ<sup>△</sup>/12.7pJ<sup>▲</sup> (0.55V) | 30pJ<sup>△</sup>/51pJ<sup>▲</sup> (0.8V) |
| Energy per SOP, norm. | N/A | – | – | – | >2.4nJ<sup>△</sup>/5.7nJ<sup>▲</sup> | 26pJ<sup>▲</sup> | (66.1pJ<sup>△</sup>) | 8.4pJ<sup>△</sup>/12.7pJ<sup>▲</sup> | 12.9pJ<sup>△</sup>/22pJ<sup>▲</sup> |

#### Area

ODIN and MorphIC have the highest neuron and synapse densities among all SNNs with embedded synaptic weight storage
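The density rows of the comparison table can be reproduced with a short calculation. The sketch below is only an illustration: the slide does not state the normalization rule, so quadratic area scaling to a 28nm reference node is an assumption here (it matches the tabulated MorphIC figures, but not the Loihi FinFET entry, which is left out). Function names are hypothetical, and "528k" is read as 528,000.

```python
REF_NODE_NM = 28.0  # reference node for normalization (assumption)

def raw_density(count, core_area_mm2):
    """Elements (neurons or synapses) per mm^2 of neurosynaptic core."""
    return count / core_area_mm2

def normalized_density(density, node_nm, ref_nm=REF_NODE_NM):
    """Rescale a density to the reference node, assuming area ~ node^2."""
    return density * (node_nm / ref_nm) ** 2

# MorphIC entries from the table: 512 neurons, 528k synapses, 0.715 mm^2, 65nm
morphic_neur = raw_density(512, 0.715)
morphic_syn = raw_density(528_000, 0.715)
morphic_neur_norm = normalized_density(morphic_neur, 65.0)
morphic_syn_norm = normalized_density(morphic_syn, 65.0)

print(f"neurons:  {morphic_neur:.0f}/mm^2 raw, {morphic_neur_norm / 1e3:.1f}k/mm^2 norm.")
print(f"synapses: {morphic_syn / 1e3:.0f}k/mm^2 raw, {morphic_syn_norm / 1e6:.1f}M/mm^2 norm.")
```

Under these assumptions the results round to the MorphIC column of the table (716 and 3.9k neurons/mm<sup>2</sup>, 738k and 4M synapses/mm<sup>2</sup>).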

#### Power

ODIN has the lowest energy per synaptic event among all digital SNNs, while MorphIC keeps a competitive energy efficiency. Both outperform subthreshold analog SNNs in accelerated time, but not for biological-time processing.
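The normalized energy-per-SOP entries can be checked in the same spirit. This is a sketch under an assumption the slide does not state: energy is scaled linearly with the technology node toward a 28nm reference. That rule matches the MorphIC and SpiNNaker columns of the table; the Loihi FinFET entry follows a different factor and is not covered. The function name is hypothetical.

```python
def normalized_energy(energy, node_nm, ref_nm=28.0):
    """Rescale an energy figure to the reference node, assuming energy ~ node."""
    return energy * ref_nm / node_nm

# (raw low, raw high, node in nm) taken from the table; pJ for MorphIC, nJ for SpiNNaker
morphic = [normalized_energy(e, 65.0) for e in (30.0, 51.0)]
spinnaker = [normalized_energy(e, 130.0) for e in (11.3, 26.6)]

print("MorphIC   [pJ/SOP]:", [round(e, 1) for e in morphic])
print("SpiNNaker [nJ/SOP]:", [round(e, 1) for e in spinnaker])
```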

#### Results on the spiking EMG/DVS sensor fusion benchmark

[Ceolini, Frenkel, Shrestha et al., Front. Neurosci., 2020]




See the ODIN and MorphIC papers for more benchmarking, incl. online- and offline-trained MNIST.

# Outline

Part I – Bottom-up neuromorphic design

- Building blocks
- Integration

Part II – Top-down neuromorphic design

- Algorithms: minimizing the training cost of neural networks for adaptive edge computing [Frenkel & Lefebvre, Front. Neurosci., 2021]
- Integration

Conclusion and perspectives

### Learning without feedback

Releasing the weight transport and update locking of backprop





Computational and memory cost ↘
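To make the idea concrete, here is a minimal NumPy sketch of a DRTP-style update for a network with one hidden layer. It is an illustration, not the reference implementation: the function name, layer sizes, activations (tanh hidden, sigmoid output) and learning rate are all assumptions. The sign of the hidden update is a convention too, since the fixed random matrix `B1` is zero-mean and can absorb it.

```python
import numpy as np

def drtp_updates(x, target, W1, W2, B1, lr=0.1):
    """Weight updates for one sample (tanh hidden layer, sigmoid output).

    x: input (n_in,), target: one-hot label (n_out,),
    W1: (n_hid, n_in), W2: (n_out, n_hid), B1: fixed random (n_out, n_hid).
    """
    h = np.tanh(W1 @ x)                          # hidden activations
    y = 1.0 / (1.0 + np.exp(-(W2 @ h)))          # output activations

    # Hidden layer: the learning signal is a fixed random projection of the
    # target. It uses neither W2 (no weight transport) nor the output error
    # (no update locking).
    delta_h = (B1.T @ target) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = -lr * np.outer(delta_h, x)             # sign convention: assumption

    # Output layer: local delta rule on the true error.
    dW2 = -lr * np.outer(y - target, h)
    return dW1, dW2

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
target = np.array([0.0, 1.0, 0.0])
W1 = rng.standard_normal((5, 4))
B1 = rng.standard_normal((3, 5))

# Two different output weight matrices give the same hidden-layer update.
dW1_a, dW2_a = drtp_updates(x, target, W1, rng.standard_normal((3, 5)), B1)
dW1_b, dW2_b = drtp_updates(x, target, W1, rng.standard_normal((3, 5)), B1)
print(np.allclose(dW1_a, dW1_b))
```

The final check shows the property that matters for hardware: the hidden-layer update is the same whatever the output weights are, so no backward pass through `W2` is needed.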

#### Direct Random Target Projection (DRTP)

Ideal use cases?

#### Adaptive edge computing

- Very low power and area overheads can be expected for an on-chip implementation.
- Datasets representative of the complexity associated with autonomous smart sensors: MNIST or CIFAR-10.
  - → We'll verify these claims in silico.

Disclaimer: whether DRTP scales to ImageNet is probably **not** the right question. ☺

#### Neuroscience

DRTP could be in line with recent findings in cortical areas that reveal the existence of output-independent target signals in the dendritic instructive pathways of intermediate-layer neurons.

[Magee & Grienberger, Annual Review of Neuroscience, 2020]

# Outline

Part I – Bottom-up neuromorphic design

- Building blocks
- Integration

Part II – Top-down neuromorphic design

- Algorithms
- Integration: neuromorphic accelerators [Frenkel, *ISCAS*, 2020]

Conclusion and perspectives

# Which bio-inspired elements?

Taking a step back with the top-down design strategy



#### Architecture of SPOON

SPOON – A <u>Sp</u>iking <u>O</u>nline-Learning C<u>o</u>nvolutional <u>N</u>euromorphic Processor



### SPOON – Chip microphotograph and specifications



[Figure: SPOON chip microphotograph, 28-nm FDSOI, 0.32mm<sup>2</sup> (951μm × 331μm)]

(Pre-silicon numbers, not yet updated.)

| Specification            | Value                                                  |
|--------------------------|--------------------------------------------------------|
| Technology               | 28nm FDSOI CMOS                                        |
| Implementation           | Digital                                                |
| Area                     | 0.32mm<sup>2</sup> (0.26mm<sup>2</sup> excl. rails)    |
| Topology                 | C5×5@10–FC128–FC10                                     |
| Online learning          | Stochastic DRTP, 8-bit weights                         |
| Time constant            | Biological to accelerated                              |
| Supply voltage           | 0.6V-1.0V                                              |
| Max. clock frequency     | 150MHz                                                 |
| Leakage power            | 61µW at 0.6V                                           |
| Energy for CONV core     | 1.7nJ/event at 0.6V                                    |
| Energy for FC core       | 55nJ/inference at 0.6V                                 |
| Online learning overhead | 16.8% in power, 11.8% in area                          |

Stay tuned for the journal extension!

DRTP can be implemented on-chip at a very low cost!

Benchmarking: MNIST and N-MNIST

#### SPOON benchmarking

Against SoA spiking neural networks on MNIST




Only SPOON allows reaching the efficiency of ANN/CNN/BNN accelerators while enabling online learning with event-based sensors.

# Outline

Part I – Bottom-up neuromorphic design

- Building blocks
- Integration

Part II – Top-down neuromorphic design

- Algorithms
- Integration

Conclusion and perspectives

Summary of the key messages, next directions

Unveiling roads to embedded cognition



Claim 1 – Versatility / efficiency tradeoff

Hardware-aware neuroscience model design and selection allows reaching record neuron and synapse densities with low-power operation for large-scale integration *in silico*.

Claim 2 – Accuracy / efficiency tradeoff

Combining event-driven and frame-based processing with weight-transport-free, update-unlocked training supports low-cost adaptive edge computing with spike-based sensors.

#### Perspectives

Neuromorphic engineering and spiking neural networks:

"Can we make it work?" → "Will it bring a competitive advantage?" (not only against GPUs)

Need something better than MNIST → Audio (KWS) and bio-signal processing (time, biological-time) [Davies, Nat. Mach. Intel., 2019]

- Promising avenue: fine-grained mixed-signal design.
- Bottom-up trend: dendrites
- Top-down trend: new wave of training algorithms mapping onto bio-plausible primitives [Sacramento, NeurIPS'18] [Payeur, bioRxiv, 2020] [Bellec, Nat. Comms., 2020]
- Cognition: a case for neuromorphic robots? [Man & Damasio, Nat. Mach. Intel., 2019]

#### Acknowledgments




## Questions?



cfrenkel

Charlotte-Frenkel

ChFrenkel

charlotte@ini.uzh.ch

#### Main references:

- ODIN: [C. Frenkel et al., "A 0.086-mm<sup>2</sup> 12.7-pJ/SOP 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28nm CMOS," *IEEE Trans. BioCAS*, 2019]
- MorphIC: [C. Frenkel et al., "MorphIC: A 65-nm 738k-synapse/mm<sup>2</sup> quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning," *IEEE Trans. BioCAS*, 2019]
- DRTP: [C. Frenkel, M. Lefebvre et al., "Learning without feedback: Fixed random learning signals allow for feedforward training of deep neural networks," *Frontiers in Neuroscience*, 2021]
- SPOON: [C. Frenkel et al., "A 28-nm convolutional neuromorphic processor enabling online learning with spike-based retinas," *IEEE ISCAS*, 2020]

*Open-sourced!* ODIN: github.com/ChFrenkel/ODIN

*Open-sourced!* DRTP: github.com/ChFrenkel/DirectRandomTargetProjection

SPOON: journal extension coming soon