

## Quantized Context Based LIF Neurons for Recurrent Spiking Neural Networks in 45nm

**Sai Sukruth Bezugam\***, Yihao Wu\*, JaeBum Yoo\*,

Dmitri Strukov, Bongjin Kim

Department of ECE

University of California Santa Barbara

saisukruthbezugam@ieee.org

(\* Contributed Equally)





Without context: "Find something?"

With context: "Find something to play music with."

#### Basic idea: context makes things easier to find.

Sai Bezugam NICE 2024

#### Does this hold for SNNs too? Yes



#### How does it work?



**Context Based LIF** 

#### Mathematics of Context based Leaky Integrate and Fire Neuron Model






The somatic input is amplified or attenuated based on the context, i.e., the apical compartment voltage.

#### Let's build hardware for it: CLIF digital design (no-brainer)

$$V_{ap}(t+dt) = a_{leak} V_{ap}(t) + (1-a_{leak}) I_{context}(t+dt)$$

$$V_{soma}(t+dt) = \beta_{leak} V_{soma}(t) + (1-\beta_{leak}) V_{ap}(t+dt) I_{soma}(t+dt) - S_{j}(t) V_{th}$$
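The two updates can be sketched in floating point as follows. This is a minimal sketch, not the authors' implementation: it assumes the somatic input is multiplied by the apical voltage (as the amplification/attenuation description suggests) and that a spike is emitted when the somatic voltage reaches the threshold. The function name `clif_step` and the default parameter values are hypothetical.

```python
def clif_step(v_ap, v_soma, spike_prev, i_context, i_soma,
              a_leak=0.9, beta_leak=0.9, v_th=1.0):
    """One time step of a context-based LIF neuron (floating-point sketch)."""
    # Apical compartment: leaky integration of the context input.
    v_ap_next = a_leak * v_ap + (1.0 - a_leak) * i_context

    # Somatic compartment: the somatic input is scaled by the apical voltage;
    # a spike in the previous step subtracts the threshold (reset by subtraction).
    v_soma_next = (beta_leak * v_soma
                   + (1.0 - beta_leak) * v_ap_next * i_soma
                   - spike_prev * v_th)

    # Assumed spike rule: fire when the somatic voltage reaches the threshold.
    spike = 1 if v_soma_next >= v_th else 0
    return v_ap_next, v_soma_next, spike


# Example: one step from rest with unit inputs.
v_ap, v_soma, spike = clif_step(0.0, 0.0, 0, i_context=1.0, i_soma=1.0,
                                a_leak=0.5, beta_leak=0.5)
```

Each step uses several multiplications, which is what drives the bit-width growth in a direct digital implementation.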



If N starts at 8 bits, it ends at 64 bits.



#### (a) Linear Decay





#### Proposed qCLIF

$$V_{ap}(t+dt) = V_{ap}(t) - a_{leak} + I_{context}(t+dt)$$
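The linear-decay update can be sketched with integer arithmetic as below. This is an illustrative sketch only: the signed N-bit range and the saturating behavior are assumptions that the slides do not state, and the name `qclif_apical_step` is hypothetical.

```python
def qclif_apical_step(v_ap, i_context, a_leak=1, n_bits=8):
    """Linear-decay apical update on signed n_bits integers (illustrative)."""
    v_min = -(1 << (n_bits - 1))
    v_max = (1 << (n_bits - 1)) - 1

    # The update as written on the slide: subtract a constant leak, add the input.
    v_next = v_ap - a_leak + i_context

    # Assumption: saturate at the N-bit limits instead of wrapping around.
    return max(v_min, min(v_max, v_next))


# Example: an 8-bit apical state with a leak of 2 per step.
v = qclif_apical_step(10, 5, a_leak=2)  # 10 - 2 + 5
```

Because the update needs only addition and subtraction, the state can stay at N bits rather than growing through multiplier stages.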









N $\rightarrow$ 8N bits (original)

#### Architecture of qCLIF Neuron layer



#### What should be the bit width of modules?

Targeted network size: (S) $\rightarrow$ 200, (Rec) $\rightarrow$ 200, (Context) $\rightarrow$ 10




Assume K-bit weights.

Somatic inputs, worst case $\rightarrow$ $400 \times K$

$\rightarrow$ N $\rightarrow$ $\log_2(400) \times K$ $\rightarrow$ ~64 bits



Thanks to the sparse activity of SNNs:

8-bit weights $\rightarrow$ N = 8

#### **Quantization of weights**

#### Normal distribution (Region bounded quantization)
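One way to read "region-bounded quantization" for normally distributed weights is sketched below: clip the weights to a bounded region around zero, then quantize uniformly to signed integers. This is an assumption about the scheme, not the authors' stated method; the bound of `k_sigma` standard deviations, the zero-centered region, and the function name `region_bounded_quantize` are all hypothetical.

```python
import statistics


def region_bounded_quantize(weights, n_bits=8, k_sigma=3.0):
    """Clip weights to +/- k_sigma standard deviations, then quantize uniformly."""
    # Assumed region: symmetric around zero, k_sigma standard deviations wide.
    bound = k_sigma * statistics.pstdev(weights)
    q_max = 2 ** (n_bits - 1) - 1
    if bound == 0.0:
        # All weights are identical; nothing to scale.
        return [0] * len(weights), 1.0

    scale = bound / q_max  # real-valued size of one integer step
    codes = []
    for w in weights:
        w_clipped = max(-bound, min(bound, w))  # outliers saturate at the bound
        codes.append(round(w_clipped / scale))
    return codes, scale


# Example: outliers at +/-1.0 saturate to the extreme 8-bit codes.
codes, scale = region_bounded_quantize([-1.0, -0.1, 0.0, 0.1, 1.0], k_sigma=1.0)
```

Bounding the region spends the available integer codes on the bulk of the distribution instead of on rare outliers.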



#### EFFECT OF QUANTIZATION ON NETWORK PERFORMANCE

Quantization-aware training was performed for both neuron states and weights.

| Precision Level | Neuron Quantization Accuracy (%) | Weight and Neuron Quantization Accuracy (%) |
|-----------------|----------------------------------|---------------------------------------------|
| Full Precision  | 94.5                             | 94.5                                        |
| 16-bit          | 93.4                             | 93                                          |
| 8-bit           | 92                               | 90                                          |
| 4-bit           | 77.5                             | 73                                          |
| 2-bit           | 55                               | N/A                                         |

\* All results on proposed qCLIF

#### Hardware results - Layout 45nm FreePDK



#### Single qCLIF

#### 10 qCLIF neurons network

#### Hardware results - Performance of a layer of 10 qCLIF neurons

Plots: ESOP vs. frequency; timing vs. frequency.



| Synapses   | 250, 8-bit    |
|------------|---------------|
| Area (mm²) | 0.125 × 0.125 |


#### Hardware results - Scalability of Design

| Clock Frequency | No. of qCLIF | Synapses   | Area (mm²)    | Slack (ns) | Total Power (mW) | Energy per Spike (pJ) |
|-----------------|--------------|------------|---------------|------------|------------------|-----------------------|
| 100 MHz         | 10           | 250, 8-bit | 0.125 × 0.125 | 5.10       | 1.315            | 1.342                 |
| 100 MHz         | 200          | 82K, 8-bit | 1.925 × 1.925 | 4.07       | 358.0            | 17.9                  |

20× neurons and 328× synapses, but only a 15.4× increase in layout side length (about 237× in area). Total power grows sublinearly with synapse count (about 272× for 328× synapses).
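The scaling ratios can be recomputed from the table values with a few lines of arithmetic. The sketch below assumes "82K" means 82,000 synapses and treats the two area factors as layout side lengths in mm; the dictionary keys are hypothetical names. Note that 15.4 is the ratio of the side lengths, so the ratio of the areas is its square.

```python
# Values copied from the scalability table (10-neuron vs. 200-neuron design).
small = {"neurons": 10, "synapses": 250, "side_mm": 0.125, "power_mw": 1.315}
large = {"neurons": 200, "synapses": 82_000, "side_mm": 1.925, "power_mw": 358.0}


def ratio(key):
    """How many times larger the 200-neuron design is for a given quantity."""
    return large[key] / small[key]


neuron_ratio = ratio("neurons")
synapse_ratio = ratio("synapses")
side_ratio = ratio("side_mm")
area_ratio = side_ratio ** 2  # square layout: area scales with side length squared
power_ratio = ratio("power_mw")

print(f"neurons x{neuron_ratio:.0f}, synapses x{synapse_ratio:.0f}")
print(f"side length x{side_ratio:.1f}, area x{area_ratio:.0f}")
print(f"power x{power_ratio:.0f}")
```

Compared against the synapse count, the power ratio is the smaller of the two, which is the sense in which power grows sublinearly here.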

#### Hardware results - Scalability of Design (Precision)

| Clock Frequency | No. of qCLIF | Synapses   | Area (mm²)    | Slack (ns) | Total Power (mW) | Energy per Spike (pJ) |
|-----------------|--------------|------------|---------------|------------|------------------|-----------------------|
| 100 MHz         | 200          | 82K, 8-bit | 1.925 × 1.925 | 4.07       | 358.0            | 17.9                  |
| 100 MHz         | 200          | 82K, 4-bit | 1.365 × 1.365 | 6.45       | 174.0            | 8.7                   |

#### Energy reduction > 50%

Slack increased by about 2.4 ns, so the lower-precision design may even operate at 200 MHz.
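A rough check of the 200 MHz claim and the energy reduction, using only the table values. This sketch assumes the reported slack is setup slack against the 10 ns period of the 100 MHz clock, and that the critical path would not change if the design were re-timed for a faster clock; the function name `max_frequency_mhz` is hypothetical.

```python
def max_frequency_mhz(clock_mhz, slack_ns):
    """Estimate the highest clock frequency from the slack at a given clock."""
    period_ns = 1e3 / clock_mhz
    critical_path_ns = period_ns - slack_ns  # time the logic actually needs
    return 1e3 / critical_path_ns


f_max_8bit = max_frequency_mhz(100, 4.07)  # critical path of 5.93 ns
f_max_4bit = max_frequency_mhz(100, 6.45)  # critical path of 3.55 ns

# Energy per spike drops from 17.9 pJ (8-bit) to 8.7 pJ (4-bit).
energy_reduction = 1 - 8.7 / 17.9

print(f"8-bit: ~{f_max_8bit:.0f} MHz, 4-bit: ~{f_max_4bit:.0f} MHz")
print(f"energy reduction: {energy_reduction:.1%}")
```

Under these assumptions only the 4-bit design clears the 5 ns period that 200 MHz requires.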

#### Comparison with literature

|                     | [18]              | [19]              | [20]               | [21]               | [22]               | This work                | This work                |
|---------------------|-------------------|-------------------|--------------------|--------------------|--------------------|--------------------------|--------------------------|
| Status              | Fabricated        | Fabricated        | Fabricated         | Fabricated         | Fabricated         | Simulated                | Simulated                |
| Technology (nm)     | 65                | 90                | 65                 | 10                 | 28                 | 45                       | 45                       |
| Neuron count        | 650               | 400               | 410                | 4096               | 1M                 | 200                      | 200                      |
| Network type        | FF SNN            | FF SNN            | SNN                | FF SNN             | FF SNN             | cRSNN                    | cRSNN                    |
| Neuron type         | IF                | Stochastic        | IF                 | LIF                | LIF                | qCLIF                    | qCLIF                    |
| Synapse count       | 67k               | 313k              | N/A                | 1M                 | 256M               | 82k                      | 82k                      |
| Precision           | 6-bit             | 1-bit             | 4-bit              | 7-bit              | 4-bit              | 4-bit                    | 8-bit                    |
| Area (mm²)          | 1.99              | 0.15              | 10.08              | 1.72               | 430                | 1.86                     | 3.71                     |
| Clock frequency     | 70 kHz @ 0.52 V   | 37.5 MHz          | 20 MHz             | 105 MHz @ 0.5 V    | 1 kHz @ 1.05 V     | 100 MHz @ 1.1 V          | 100 MHz @ 1.1 V          |
| Energy per SOP (pJ) | 1.5               | 8.4               | N/A                | 3.8                | 26                 | 8.7                      | 17.9                     |
| Dataset             | GSCD (4 keywords) | GSCD (2 keywords) | GSCD (10 keywords) | TIMIT (4 keywords) | TDIGIT (4 classes) | DVS Gesture (10 classes) | DVS Gesture (10 classes) |
| Accuracy (%)        | 91.8              | 94.6              | 90.2               | 94                 | 90.8               | 73                       | 90                       |

- We propose a quantized CLIF digital implementation and the first cRSNN implementation (simulated).
- Although the neuron model is complex, its energy per spike is relatively low compared to the literature.
- Careful hardware-software co-design helped optimize the network.

#### Limitations of the work / Future work

- The accumulator occupies a significant portion of the area.
  - Space-efficient alternatives, such as sparse accumulators or in-memory computing (e.g., memristor crossbar architectures), could be explored.
- All results were simulated on an open source 45 nm technology node.
  - Fabrication using a smaller technology node may further optimize performance.
- The design deviates from the truly asynchronous nature of neuromorphic systems: the current design expects synchronous behavior between the apical and somatic compartments.

#### Thanks to my co-authors





Yihao Wu

JaeBum Yoo



**Dmitri Strukov** 



**Bongjin Kim** 

This work is the outcome of a course taught by Prof. Bongjin Kim at UC Santa Barbara during Fall 2023. All authors thank Prof. Robert Legenstein, George Hutchinson, and Tinish Bhattacharya for discussions. I was funded by the National Science Foundation BRAID award #2318152.

Thank you. For any questions, please feel free to reach out to me at

saisukruthbezugam@ieee.org sbezugam@ucsb.edu
