# Demonstrating the Advantages of Analog Wafer-Scale Neuromorphic Hardware

Hartmut Schmidt, Andreas Grübl, José Montes, <u>Eric Müller</u>, Sebastian Schmitt, Johannes Schemmel

mueller@kip.uni-heidelberg.de

Kirchhoff Institute for Physics, Ruprecht-Karls-Universität Heidelberg

> 2025-03-25 NICE 2025

## **Conventional Computing**

- Significant demand from AI training and applications<sup>1,2</sup>
- Dennard (energy-density) scaling ended ~2006
  - Dynamic power consumption, power wall, dark silicon, memory wall
  - $\rightarrow$ New computing stacks
- Domain-specific hardware accelerators<sup>4</sup>: GPUs, FPGAs, and beyond

<sup>1</sup>N. Maslej et al., "The AI Index 2024 annual report," AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, Apr. 2024

<sup>2</sup>S. Chen, "How much energy will AI really consume? the good, the bad and the unknown," Nature, vol. 639, no. 8053, pp. 22–24, 2025. DOI: 10.1038/d41586-025-00616-z

<sup>3</sup>T. Conte, "IEEE rebooting computing initiative & international roadmap of devices and systems," in IEEE Rebooting Computer Architecture 2030 Workshop, 2015

<sup>4</sup>W. J. Dally et al., "Domain-specific hardware accelerators," Commun. ACM, vol. 63, no. 7, pp. 48–57, Jun. 2020. DOI: 10.1145/3361682

## Neuromorphic Hardware?

- Numerical simulation:
  - A high level of parallelism is possible, but the latency to result cannot be reduced arbitrarily<sup>1,2</sup>
- SNNs follow an event-driven computing paradigm: sparse in space and time
- Neuromorphic hardware can complement simulation  $\rightarrow$  SNN accelerators
- Functional modeling (ML-inspired?), but also in Computational Neuroscience:
  - Complex neuron dynamics, plasticity, long/repetitive experiments or guided reconfiguration!



<sup>1</sup>A. C. Kurth et al., "Sub-realtime simulation of a neuronal network of natural density," Neuromorphic comput. eng., vol. 2, no. 2, p. 021 001, 2022. DOI: 10.1088/2634-4386/ac55fc

<sup>2</sup>J. Jordan et al., "Extremely scalable spiking neuronal network simulation code: From laptops to exascale computers," Frontiers in Neuroinformatics, vol. 12, p. 2, 2018. DOI: 10.3389/fninf.2018.00002

## **BrainScaleS-1**

- Up to 20 modules
- Wafer-scale integration (180 nm CMOS)
- 384 ASICs per 20 cm wafer
- 48 FPGAs, 40 GbE uplink to control cluster
- Typical speedup factor of 10'000 relative to biological real time







## Two Network Models from Computational Neuroscience

#### Balanced Random Network<sup>1</sup>



#### Cortical Microcircuit Network Model<sup>2</sup>



<sup>1</sup>N. Brunel, "Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons," Journal of Computational Neuroscience, vol. 8, no. 3, pp. 183–208, 2000. DOI: 10.1023/A:1008925309027

<sup>2</sup>T. C. Potjans and M. Diesmann, "The cell-type specific cortical microcircuit: Relating structure and activity in a full-scale spiking network model," Cereb. Cortex, vol. 24, pp. 785–806, 3 2012. DOI: 10.1093/cercor/bhs358

## Mapping the "Microcircuit" to BrainScaleS-1

- 200k analog neuron circuits & 43M synapses
- Neurons follow configurable AdEx dynamics
- Configurable maximum fan-in, implemented by linking multiple neuron circuits (up to 64 circuits, resulting in 14k synapses per neuron)
- On-wafer sparse configurable circuit-switched network for asynchronous spike communication<sup>1</sup>
- Modeling API: PyNN (on top of the BSS-1 "Operating System")



<sup>1</sup>H. Schmidt et al., "From clean room to machine room: Commissioning of the first-generation BrainScaleS wafer-scale neuromorphic system," Neuromorphic comput. eng., vol. 3, no. 3, p. 034 013, 2023. DOI: 10.1088/2634-4386/acf7e4
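For reference, the adaptive exponential integrate-and-fire (AdEx) dynamics named in the list above are commonly written as below. The symbols are the usual textbook ones and are not taken from the slides; which of these parameters are configurable on the hardware, and over what range, is not stated here.

$$
\begin{aligned}
C \frac{dV}{dt} &= -g_L (V - E_L) + g_L \Delta_T \exp\!\left(\frac{V - V_T}{\Delta_T}\right) - w + I \\
\tau_w \frac{dw}{dt} &= a (V - E_L) - w \\
\text{on spike:}\quad V &\rightarrow V_r, \qquad w \rightarrow w + b
\end{aligned}
$$

Here $V$ is the membrane potential, $w$ the adaptation current, and $I$ the total synaptic and external input current.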

## Mapping the "Microcircuit" to BrainScaleS-1

- 384 ASICs (each marked with a white triangle at the bottom)
- Neuron placement represented by shading
- Darker shades indicate higher neuron counts
- Routed connections visualized as colored lines
- Colored borders indicate model populations



## Adapting Network Models to BrainScaleS-1 I

- Size of the network models:

| Network model           | Neurons | Synapses    |
|-------------------------|---------|-------------|
| Balanced Random Network | 12'500  | 15'625'000  |
| Cortical Microcircuit   | 80'000  | 300'000'000 |

- Number of model neurons < neuron circuits per wafer, but the average neuron fan-in requires interlinked neuron circuits.
- $\rightarrow$ Reduced number of (model) neurons available.
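The resource argument above can be made concrete with a back-of-the-envelope sketch. It uses only the rounded figures quoted on the slides (200k neuron circuits, 43M synapses, up to 64 linked circuits) and assumes, for illustration, that synapses are spread evenly over the circuits and that any number of circuits up to 64 can be linked. Real placement is also constrained by routing and linking granularity, which this ignores; all function names are hypothetical.

```python
import math

# Rounded per-wafer figures quoted on the slides.
NEURON_CIRCUITS = 200_000
SYNAPSES = 43_000_000
MAX_LINKED_CIRCUITS = 64

# Assumption: synapses are distributed evenly over the neuron circuits.
SYNAPSES_PER_CIRCUIT = SYNAPSES // NEURON_CIRCUITS


def circuits_per_neuron(fan_in: int) -> int:
    """Number of linked neuron circuits needed to host `fan_in` synapses."""
    needed = math.ceil(fan_in / SYNAPSES_PER_CIRCUIT)
    if needed > MAX_LINKED_CIRCUITS:
        raise ValueError(f"fan-in {fan_in} exceeds the linkable maximum")
    return needed


def available_model_neurons(fan_in: int) -> int:
    """Model neurons that fit on one wafer at the given fan-in."""
    return NEURON_CIRCUITS // circuits_per_neuron(fan_in)


# Mean fan-in of the full-scale cortical microcircuit from the slide figures.
mean_fan_in = 300_000_000 // 80_000

print("synapses per circuit:", SYNAPSES_PER_CIRCUIT)
print("mean fan-in:", mean_fan_in)
print("circuits per neuron:", circuits_per_neuron(mean_fan_in))
print("model neurons per wafer:", available_model_neurons(mean_fan_in))
```

Under these assumptions, a mean fan-in of 3750 needs 18 linked circuits per neuron, which leaves room for roughly 11'000 model neurons on a wafer, far fewer than the 80'000 of the full-scale model.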

## Adapting Network Models to BrainScaleS-1 II

- $\Rightarrow$ Downscaling of neuron count and in-degree
  - maintaining the original connectivity probability, and
  - compensating<sup>2</sup> for the reduced input by a linear weight increase, following the approach by van Albada et al.<sup>1</sup>
- Due to the random network structure, some additional "synapse loss" occurs across all populations.
  - We incorporate these network model "distortions" into our simulations, which comprise
    - 2'083 neurons and 690'157 synapses (Balanced Random Network)
    - 7'712 neurons and 2'373'933 synapses (Cortical Microcircuit)

<sup>1</sup>S. J. van Albada et al., "Scalability of asynchronous networks is limited by one-to-one mapping between effective connectivity and correlations," PLoS Comput. Biol., vol. 11, pp. 1–37, Sep. 2015. DOI: 10.1371/journal.pcbi.1004490

<sup>2</sup>D. Brüderle et al., "A comprehensive workflow for general-purpose neural modeling with highly configurable neuromorphic hardware systems," Biological Cybernetics, vol. 104, pp. 263–296, 4 2011
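The downscaling step can be illustrated with a minimal sketch. It assumes the simplest reading of "linear weight increase": with a fixed connection probability the in-degree shrinks in proportion to the neuron count, and each weight is multiplied by the inverse factor so that the mean synaptic input stays constant. This is an illustrative assumption, not a reproduction of the procedure in the cited papers, and the parameter values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledNetwork:
    neurons: int
    in_degree: int
    weight: float


def downscale(n_full: int, n_scaled: int, p_connect: float,
              weight: float) -> ScaledNetwork:
    """Shrink a random network while keeping the connection probability.

    The in-degree follows K = p * N, and the weight is increased linearly
    by K_full / K_scaled so that K * weight (the mean input at equal
    presynaptic rates) is preserved.
    """
    k_full = round(p_connect * n_full)
    k_scaled = round(p_connect * n_scaled)
    if k_scaled == 0:
        raise ValueError("scaled network has no incoming connections")
    return ScaledNetwork(n_scaled, k_scaled, weight * k_full / k_scaled)


# Hypothetical example values, chosen only for illustration.
full_n, full_w, p = 10_000, 0.1, 0.1
scaled = downscale(full_n, 2_000, p, full_w)
print(scaled)
```

Note that this rule preserves the mean input only. The input variance, which scales with in-degree times weight squared, grows by the same factor as the weights, so fluctuation-driven regimes may still shift after downscaling.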

## Result: (Downscaled) Balanced Random Network

- Varying relative inhibitory weight and external input spike rates.
- For firing rates exceeding 50 Hz, saturation effects on the hardware introduce deviations in network behavior.

#### Mean firing rates of neurons




## Result: (Downscaled) Cortical Microcircuit

- Results are extracted from a 9 s interval of biological time, starting 1 s after the experiment onset (BSS-1 & NEST).
- Reevaluation after 53 min of wall-clock time on BSS-1.
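To relate the wall-clock and biological time spans above, the following sketch converts between the two. It assumes that the typical speedup factor of 10'000 quoted for BrainScaleS-1 applies to this experiment and that the emulation runs continuously; configuration and readout overhead are not modeled. The function names are hypothetical.

```python
SPEEDUP = 10_000  # typical BrainScaleS-1 speedup quoted in the slides


def to_biological_seconds(wall_clock_s: float, speedup: int = SPEEDUP) -> float:
    """Biological (model) time emulated during a wall-clock interval."""
    return wall_clock_s * speedup


def to_wall_clock_seconds(biological_s: float, speedup: int = SPEEDUP) -> float:
    """Wall-clock time needed to emulate a span of biological time."""
    return biological_s / speedup


# 53 min of wall-clock time between the two evaluations.
elapsed_bio_s = to_biological_seconds(53 * 60)
print(f"{elapsed_bio_s:.0f} s biological time "
      f"(about {elapsed_bio_s / 86_400:.0f} days)")

# The 1 s settling period plus the 9 s evaluation window.
print(f"{to_wall_clock_seconds(1 + 9) * 1e3:.1f} ms wall-clock time")
```

Under these assumptions, the 53 min between evaluations correspond to roughly a year of biological time, while each 10 s measurement takes about a millisecond of pure emulation time.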

Firing rate distribution of neurons across eight network model populations



## Result: (Downscaled) Cortical Microcircuit II

| Simulator                         | Performance<br>(10<sup>9</sup> synaptic events/s) | Energy<br>(µJ/synaptic event) |
|-----------------------------------|---------------------------------------------------|-------------------------------|
| BrainScaleS-1                     | 162                                               | < 0.012                       |
| neuroAIx-Framework <sup>0,1</sup> | 19                                                | 0.048                         |
| CsNN <sup>0,2</sup>               | 3.8                                               | 0.783                         |
| NEST <sup>0,3</sup>               | 1.8                                               | 0.48                          |
| SpiNNaker <sup>4</sup>            | 0.9                                               | 0.6                           |

<sup>0</sup>Values are estimated from the reported speedup factor and the network behavior of the full-scale model with external Poisson inputs.

<sup>1</sup>K. Kauth et al., "neuroAIx-framework: Design of future neuroscience simulation systems exhibiting execution of the cortical microcircuit model 20× faster than biological real-time," Front. Comput. Neurosci., vol. 17, p. 1144143, 2023. DOI: 10.3389/fncom.2023.1144143

<sup>2</sup>A. Heittmann et al., "Simulating the cortical microcircuit significantly faster than real time on the IBM INC-3000 neural supercomputer," Front. Neurosci., vol. 15, p. 728460, 2022. DOI: 10.3389/fnins.2021.728460

<sup>3</sup>A. C. Kurth et al., "Sub-realtime simulation of a neuronal network of natural density," Neuromorphic comput. eng., vol. 2, no. 2, p. 021 001, 2022. DOI: 10.1088/2634-4386/ac55fc

<sup>4</sup>O. Rhodes et al., "Real-time cortical simulation on neuromorphic hardware," Philos. Trans. R. Soc. A, vol. 378, no. 2164, p. 20190160, 2020. DOI: 10.1098/rsta.2019.0160
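The two columns of the table can be combined into rough comparison figures. The sketch below multiplies throughput by energy per event to obtain an implied average power and expresses each system relative to BrainScaleS-1. These derived values are not reported in the slides; they follow only from the table entries, and the BrainScaleS-1 energy is an upper bound, so its implied power is an upper bound as well.

```python
# (performance in 1e9 synaptic events/s, energy in µJ per synaptic event),
# copied from the table; the BrainScaleS-1 energy is an upper bound.
TABLE = {
    "BrainScaleS-1": (162, 0.012),
    "neuroAIx-Framework": (19, 0.048),
    "CsNN": (3.8, 0.783),
    "NEST": (1.8, 0.48),
    "SpiNNaker": (0.9, 0.6),
}


def implied_power_watts(perf_giga_events_per_s: float,
                        energy_uj_per_event: float) -> float:
    """Average power implied by throughput times energy per event."""
    events_per_s = perf_giga_events_per_s * 1e9
    joules_per_event = energy_uj_per_event * 1e-6
    return events_per_s * joules_per_event


ref_perf, ref_energy = TABLE["BrainScaleS-1"]
for name, (perf, energy) in TABLE.items():
    print(f"{name:20s} {implied_power_watts(perf, energy):7.0f} W  "
          f"BSS-1 throughput x{ref_perf / perf:6.1f}  "
          f"energy/event vs BSS-1 x{energy / ref_energy:5.1f}")
```

Read this way, the table implies that BrainScaleS-1 processes synaptic events about 8.5 times faster than the next-fastest entry while using at most a quarter of its energy per event.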

## Conclusion

- Speedup from physical emulation most evident for long/repetitive emulations
- Main operational overhead introduced by configuration and data transfer (e.g., read out of recorded observables)
- Comparatively low energy consumption of BrainScaleS-1 can still yield advantages over numerical simulation
- Network model size limitations come from neuron, synapse, and routing resources
  - Biological connection densities are difficult to scale efficiently beyond a single wafer
- Co-execution approach:
  - validation and network topology exploration in simulation
  - neuromorphic backend handles continuous time emulation, extended-duration experiments, and iterative parameter sweeps

## Outlook

- Area efficiency limited by use of "plastic" synapses in fully static networks
  - $\rightarrow$ Dedicated static (higher-density) synapses in future hardware systems?
- Newer technology node! (BrainScaleS-1 uses 180 nm CMOS)
- No plasticity was involved, i.e. the model dynamics are numerically "cheap"; introducing, e.g., synaptic plasticity would amplify the benefit of physical emulation.
- No "dependent" reconfiguration was used; neuromorphic hardware can also deliver in latency-to-result use cases.

## BrainScaleS is an Open Research Platform

- Integrated into the EBRAINS Software Distribution



- Access to accelerated neuromorphic BrainScaleS via EBRAINS
- Register for EBRAINS:






#### **References I**

N. Maslej et al., "The AI Index 2024 annual report," AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, Apr. 2024.

S. Chen, "How much energy will AI really consume? the good, the bad and the unknown," Nature, vol. 639, no. 8053, pp. 22–24, 2025. DOI: 10.1038/d41586-025-00616-z.

T. Conte, "IEEE rebooting computing initiative & international roadmap of devices and systems," in IEEE Rebooting Computer Architecture 2030 Workshop, 2015.

W. J. Dally, Y. Turakhia, and S. Han, "Domain-specific hardware accelerators," Commun. ACM, vol. 63, no. 7, pp. 48–57, Jun. 2020. DOI: 10.1145/3361682.

A. C. Kurth et al., "Sub-realtime simulation of a neuronal network of natural density," Neuromorphic comput. eng., vol. 2, no. 2, p. 021 001, 2022. DOI: 10.1088/2634-4386/ac55fc.

J. Jordan et al., "Extremely scalable spiking neuronal network simulation code: From laptops to exascale computers," Frontiers in Neuroinformatics, vol. 12, p. 2, 2018. DOI: 10.3389/fninf.2018.00002.

N. Brunel, "Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons," Journal of Computational Neuroscience, vol. 8, no. 3, pp. 183–208, 2000. DOI: 10.1023/A:1008925309027.

#### **References II**

T. C. Potjans and M. Diesmann, "The cell-type specific cortical microcircuit: Relating structure and activity in a full-scale spiking network model," Cereb. Cortex, vol. 24, pp. 785–806, 3 2012. DOI: 10.1093/cercor/bhs358.

H. Schmidt et al., "From clean room to machine room: Commissioning of the first-generation BrainScaleS wafer-scale neuromorphic system," Neuromorphic comput. eng., vol. 3, no. 3, p. 034 013, 2023. DOI: 10.1088/2634-4386/acf7e4.

S. J. van Albada, M. Helias, and M. Diesmann, "Scalability of asynchronous networks is limited by one-to-one mapping between effective connectivity and correlations," PLoS Comput. Biol., vol. 11, pp. 1–37, Sep. 2015. DOI: 10.1371/journal.pcbi.1004490.

D. Brüderle et al., "A comprehensive workflow for general-purpose neural modeling with highly configurable neuromorphic hardware systems," Biological Cybernetics, vol. 104, pp. 263–296, 4 2011.

K. Kauth et al., "neuroAIx-framework: Design of future neuroscience simulation systems exhibiting execution of the cortical microcircuit model 20× faster than biological real-time," Front. Comput. Neurosci., vol. 17, p. 1144143, 2023. DOI: 10.3389/fncom.2023.1144143.

A. Heittmann et al., "Simulating the cortical microcircuit significantly faster than real time on the IBM INC-3000 neural supercomputer," Front. Neurosci., vol. 15, p. 728460, 2022. DOI: 10.3389/fnins.2021.728460.

O. Rhodes et al., "Real-time cortical simulation on neuromorphic hardware," Philos. Trans. R. Soc. A, vol. 378, no. 2164, p. 20190160, 2020. DOI: 10.1098/rsta.2019.0160.

#### **References III**

H. Schmidt, "Large-scale experiments on wafer-scale neuromorphic hardware," Ph.D. dissertation, Ruprecht-Karls-Universität Heidelberg, 2024. DOI: 10.11588/heidok.00034446.