

# Energy Efficient Implementation of MVM Operations Using Filament-free Bulk RRAM Arrays

<u>Ashwani Kumar\*</u>, J. Park, Y. Zhou, J. Kim, S. Jain, C. D. Schuman, G. Cauwenberghs, D. Kuzum\* (\*<u>ask010@ucsd.edu</u>, <u>dkuzum@ucsd.edu</u>)



Neuroelectronics Lab, Department of Electrical and Computer Engineering, University of California San Diego, CA, USA

## Outline

- Filament-Free Bulk RRAM Fabrication and Characterization
- Weight Mapping on RRAM Crossbar
- SNN Implementation with our Bulk RRAM Crossbars
- Conclusion

## **Memory Access**



## Analog Compute-in-Memory (CIM)

Challenging requirements of today's AI models:

- Massive training and inference workloads require large amounts of energy.
- MAC operations (implemented as matrix-vector multiplication, MVM) contribute 70-90% of the total operational cost of neural network implementations.
- Need energy-efficient MVM operations:
  - CMOS-compatible RRAM crossbar arrays for MVM using CIM.
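As a minimal sketch of why a crossbar performs MVM in place: input voltages V_i on the rows drive a current V_i·G_ij through each cell (Ohm's law), and the currents add along each column (Kirchhoff's current law), so I_j = Σ_i V_i·G_ij. The model below is idealized (no wire resistance, device nonlinearity, or sensing circuit), and the function name and example values are illustrative assumptions, not part of the hardware described here.

```python
from typing import List

Matrix = List[List[float]]


def crossbar_mvm(voltages: List[float], conductances: Matrix) -> List[float]:
    """Ideal crossbar read: column current I_j = sum_i V_i * G_ij.

    voltages[i] is the read voltage on row i (volts);
    conductances[i][j] is the cell conductance at row i, column j (siemens).
    Returns one summed current per column (amperes).
    """
    n_rows = len(conductances)
    if len(voltages) != n_rows:
        raise ValueError("need one input voltage per row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            # Ohm's law in each cell; Kirchhoff's current law sums each column.
            currents[j] += v * conductances[i][j]
    return currents
```

For example, `crossbar_mvm([0.1, 0.2], [[1e-6, 2e-6], [3e-6, 4e-6]])` yields column currents of about 0.7 µA and 1.0 µA: every column is computed in one parallel read rather than by sequential multiply-accumulates.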



# **Optimizing RRAM Technology for CIM**



Challenges of conventional filamentary RRAM:

- High-voltage forming → not compatible with advanced CMOS; requires additional peripheral circuitry to support this operation.
- Abrupt resistive switching → variations and noise, accuracy loss, many iterations of read/verify cycles.
- Low ON-state resistance (~kΩ) → increases power consumption; limits the array size and the number of parallel MAC operations.
- Limited number of states or binary operation → not suitable for on-chip learning.

### Solution: Bulk (Area-type) RRAM

1. Forming-free operation, no filaments.
2. Area-type switching: uniform switching with no compliance current.
3. MΩ-level resistance enables large arrays and parallel read, reducing array-level energy consumption.
4. Multi-level gradual switching for on-chip learning.
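To make the resistance argument concrete, a rough estimate: if all cells of an N × M array are read in parallel at voltage V, the ideal static read power is P = N·M·V²/R. The sketch below compares a kΩ-range cell with an MΩ-range cell; the 0.1 V read voltage, the 10 kΩ and 10 MΩ cell values, and the 64 × 64 array size are illustrative assumptions, not measured values from this work, and the model ignores wire resistance and peripheral circuits.

```python
def parallel_read_power(rows: int, cols: int, v_read: float, r_cell: float) -> float:
    """Static power (W) when rows*cols cells are read at once.

    Idealized: every cell sees the full read voltage v_read (V) across
    r_cell (ohms), so P = rows * cols * v_read**2 / r_cell.
    """
    if r_cell <= 0:
        raise ValueError("cell resistance must be positive")
    return rows * cols * v_read ** 2 / r_cell


# Hypothetical comparison at the same read voltage and array size.
p_kohm = parallel_read_power(64, 64, 0.1, 10e3)  # kΩ-range cell
p_mohm = parallel_read_power(64, 64, 0.1, 10e6)  # MΩ-range cell
```

With these assumed numbers the kΩ array draws about 4.1 mW and the MΩ array about 4.1 µW, a 1000× difference set purely by the ratio of cell resistances.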

## Fabrication of Trilayer Bulk RRAM

- Trilayer bulk RRAM stack: Al<sub>2</sub>O<sub>3</sub> (3 nm) / TiO<sub>2</sub> (3 nm) / TiO<sub>x</sub> (40 nm)
  - Tunnel barrier from Al<sub>2</sub>O<sub>3</sub>; high oxygen-vacancy concentration in TiO<sub>x</sub>, separated by ALD-deposited TiO<sub>2</sub>.

Crossbar with via-hole structured RRAM

\*The via-hole structure eliminates edge effects due to high-field corners or sidewalls.





Park, J., Kumar, A., Zhou, Y. et al. Nat Commun 15, 3492 (2024).

## **RRAM Structure and Resistive Switching**



- Darker contrast of ALD TiO<sub>2</sub> confirms a higher atomic density than the sputtered 40 nm TiO<sub>x</sub>.
- The STEM-EELS line-scan profile also shows a lower oxygen concentration in the 40 nm TiO<sub>x</sub>.
- The STEM-EELS composition map (red-dotted region in (a)) shows nm-scale dark areas pointing to a porous structure.

Bulk RRAM's resistive switching mechanism:

- The distribution of oxygen vacancies (V<sub>O</sub>) between the TiO<sub>x</sub> and TiO<sub>2</sub> layers is modulated by applying a field across the device.
- Migration of oxygen vacancies near the TiO<sub>2</sub>/TiO<sub>x</sub> interface either extends or reduces the effective thickness of the oxygen-vacancy-rich TiO<sub>x</sub> layer, switching the device to LRS or HRS.
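One way to build intuition for this mechanism is a first-order series-resistor picture: the vacancy-rich and vacancy-poor regions are stacked in series, each contributing ρ·d/A, so moving the interface changes the thickness of the low-resistivity region and therefore the total resistance. This is a hypothetical model for illustration only: it treats the 3 nm TiO<sub>2</sub> plus 40 nm TiO<sub>x</sub> as a 43 nm switching region, omits the Al<sub>2</sub>O<sub>3</sub> tunnel barrier entirely, and uses placeholder resistivities.

```python
def stack_resistance(d_rich_nm: float, d_total_nm: float,
                     rho_rich: float, rho_poor: float,
                     area_um2: float) -> float:
    """Hypothetical series model of the switching oxide, in ohms.

    d_rich_nm:  effective thickness of the vacancy-rich region (nm)
    d_total_nm: total switching-region thickness (nm); the rest is vacancy-poor
    rho_rich, rho_poor: placeholder resistivities of the two regions (ohm*m)
    area_um2:   device area (um^2)
    The Al2O3 tunnel barrier is not modeled.
    """
    if not 0.0 <= d_rich_nm <= d_total_nm or area_um2 <= 0:
        raise ValueError("invalid geometry")
    area_m2 = area_um2 * 1e-12
    d_rich_m = d_rich_nm * 1e-9
    d_poor_m = (d_total_nm - d_rich_nm) * 1e-9
    # Each region contributes rho * d / A, and the two regions add in series.
    return (rho_rich * d_rich_m + rho_poor * d_poor_m) / area_m2
```

Extending the vacancy-rich region (for example from 38 nm to 42 nm, both hypothetical) lowers the resistance toward LRS. Because current flows through the whole area rather than a filament, R in this picture scales as 1/A: doubling the area halves the resistance.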



## **Bulk RRAM DC Switching Characterization**



Bulk (area-type) switching: resistance scales inversely with device area for both HRS and LRS.

- MΩ-level bulk switching
- Low device-to-device and cycle-to-cycle variations



Park, J., Kumar, A., Zhou, Y. et al. Nat Commun 15, 3492 (2024).
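A simple check for area-type behaviour in resistance-versus-area data is the slope of log R against log A: bulk switching should give a slope near -1 (R ∝ 1/A), whereas a filamentary device, which conducts through a localized filament, would be expected to give a slope near 0. The least-squares helper below is a generic sketch; the function name and the synthetic example data are assumptions, not measurements from this work.

```python
import math
from typing import Sequence


def area_scaling_exponent(areas: Sequence[float],
                          resistances: Sequence[float]) -> float:
    """Least-squares slope of log10(R) versus log10(A).

    About -1 indicates area-type (bulk) conduction; about 0 indicates
    area-independent conduction, as expected for a filament.
    """
    if len(areas) != len(resistances) or len(areas) < 2:
        raise ValueError("need at least two (area, resistance) pairs")
    xs = [math.log10(a) for a in areas]
    ys = [math.log10(r) for r in resistances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("areas must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx
```

For synthetic data following R = 10⁷/A over areas of 1, 4, 16, and 64 (arbitrary units), the function returns a slope of -1; a constant resistance across areas returns a slope of 0.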

## Bulk RRAM Pulse Switching Characterization



Achieved multilevel states in the MΩ range:

1. Identical pulse programming scheme → same pulse amplitude for every pulse.
2. Incremental pulse programming scheme → pulse amplitude increases in 20 mV steps.
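The two schemes differ only in how the amplitude evolves over a pulse train, which the generator below sketches. The 20 mV step comes from the slide; the starting amplitude, pulse count, and function name are hypothetical. A negative start and step can represent the opposite polarity.

```python
from typing import List


def pulse_train(start_v: float, n_pulses: int, step_v: float = 0.0) -> List[float]:
    """Pulse amplitudes (V) for a programming train.

    step_v = 0 gives the identical-pulse scheme (same amplitude every pulse);
    step_v != 0 gives the incremental scheme (amplitude changes by step_v per pulse).
    """
    if n_pulses < 0:
        raise ValueError("n_pulses must be non-negative")
    # Compute each amplitude from the start value to avoid accumulating rounding error.
    return [start_v + k * step_v for k in range(n_pulses)]


identical = pulse_train(1.0, 5)                   # hypothetical 1.0 V pulses
incremental = pulse_train(1.0, 5, step_v=0.020)   # 20 mV increments
```

In the identical scheme each pulse moves the conductance by a state-dependent amount, whereas the incremental scheme raises the drive as programming proceeds.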

## Row Differential Weight Scheme in Crossbar



- Row-differential crossbar schematic
  - Signed weight implementation with a voltage-sensing scheme.
  - Increases the effective switching dynamic range (~170) while retaining many (100) conductance levels.
  - Enables mapping of a wide range of real-valued weights.
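A common way to represent a signed weight with two non-negative conductances is w ∝ G⁺ − G⁻, leaving the device that does not carry the weight at the minimum conductance. The sketch below maps a real-valued weight onto such a pair, quantized to 100 linearly spaced levels to match the slide. The G_min/G_max bounds, linear spacing, clipping, and function names are assumptions; this is a generic differential-pair mapping, not necessarily the exact row-differential procedure used in this work.

```python
from typing import Tuple


def map_weight(w: float, w_max: float, g_min: float, g_max: float,
               n_levels: int = 100) -> Tuple[float, float]:
    """Map a real weight in [-w_max, w_max] to a (G_plus, G_minus) pair.

    Conductances are limited to n_levels linearly spaced values in
    [g_min, g_max]. The idle device stays at g_min, so G_plus - G_minus
    has the sign of w. Weights beyond +/-w_max are clipped.
    """
    if w_max <= 0 or n_levels < 2 or not 0.0 <= g_min < g_max:
        raise ValueError("invalid mapping parameters")
    step = (g_max - g_min) / (n_levels - 1)
    # Quantize the clipped magnitude to the nearest available level.
    level = round(min(abs(w), w_max) / w_max * (n_levels - 1))
    g_on = g_min + level * step
    return (g_on, g_min) if w >= 0 else (g_min, g_on)


def decode_weight(g_plus: float, g_minus: float, w_max: float,
                  g_min: float, g_max: float) -> float:
    """Recover the quantized weight from a programmed conductance pair."""
    return (g_plus - g_minus) / (g_max - g_min) * w_max
```

With hypothetical bounds of 0.1 µS and 1 µS (10 MΩ to 1 MΩ), every weight in [-1, 1] round-trips to within half a level step (0.5/99 of w_max).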

## MVM Operations with Bulk RRAM Crossbars





- A neuromorphic CIM platform utilizing switched-capacitor voltage sensing:
  - A packaged crossbar array was tested using a neuromorphic board developed with on-board, energy-efficient voltage sensing.
    - A representative resistance map of a 16x16 bulk RRAM crossbar, read using the voltage-sensing scheme.

Measured and expected MVM results show good linearity (low error) for the differential mapping scheme.

In collaboration with Prof. G. Cauwenberghs @UCSD. Jain et al., IEEE ISCAS, 2023.
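Linearity between measured and expected MVM outputs can be quantified with a least-squares fit plus an error metric. The helper below reports the fit slope, intercept, R², and the RMSE against the ideal identity line normalized to the expected output range. The choice of metrics, the function name, and the assumption that measured values are already converted to the same units as the expected outputs are illustrative; this is not necessarily the analysis used by Jain et al.

```python
import math
from typing import Dict, Sequence


def linearity_metrics(expected: Sequence[float],
                      measured: Sequence[float]) -> Dict[str, float]:
    """Fit measured = slope * expected + intercept and report error metrics.

    Assumes measured values are already scaled to the units of expected
    (for example, sensed voltages converted back to MVM outputs).
    """
    n = len(expected)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx = sum(expected) / n
    my = sum(measured) / n
    sxx = sum((x - mx) ** 2 for x in expected)
    if sxx == 0:
        raise ValueError("expected outputs must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, measured))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(expected, measured))
    ss_tot = sum((y - my) ** 2 for y in measured)
    # R^2 of the linear fit; defined as 0 when measured has no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    # RMSE against the ideal identity line, normalized to the expected range.
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(expected, measured)) / n)
    span = max(expected) - min(expected)
    return {"slope": slope, "intercept": intercept, "r2": r2, "nrmse": rmse / span}
```

A slope near 1, an intercept near 0, R² near 1, and a small normalized RMSE together correspond to the "good linearity (low error)" described above.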




## SNN Implementation: F-1 racetrack navigation



- SNN Model Using Evolutionary Optimization for Neuromorphic Systems (EONS):
  - The SNN is optimized and trained for a small-scale autonomous racing task (on representative tracks).
  - Trained on 5 F-1 tracks and tested on an additional 15 tracks.
  - The pruned SNN consists of 14 input neurons and 30 output neurons.

In collaboration with Prof. C. Schuman @UTK

J. S. Plank et al., IEEE Letters of the Computer Society, 2018. C. D. Schuman et al., NICE Workshop, 2020.
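To connect the crossbar to the SNN, each time step can be viewed as one MVM: the binary spike vector from the inputs selects rows, the crossbar sums the corresponding signed weights into each neuron, and a neuron fires when its accumulated potential reaches a threshold. The sketch below uses a generic integrate-and-fire neuron with reset-to-zero and an optional leak; the threshold, leak factor, and function name are illustrative assumptions and not necessarily the neuron model used by EONS/TENNLab.

```python
from typing import List, Tuple


def if_step(spikes_in: List[int], weights: List[List[float]],
            potentials: List[float], threshold: float = 1.0,
            leak: float = 1.0) -> Tuple[List[float], List[int]]:
    """One time step of a generic integrate-and-fire layer.

    weights[i][j] is the signed weight from input i to neuron j (the value a
    differential conductance pair would encode). leak = 1.0 means no leak.
    Returns (new_potentials, output_spikes); inputs are not modified.
    """
    if len(spikes_in) != len(weights):
        raise ValueError("need one weight row per input")
    n_out = len(potentials)
    new_v = [leak * v for v in potentials]
    for i, s in enumerate(spikes_in):
        if s:  # only rows carrying an input spike contribute (binary-input MVM)
            for j in range(n_out):
                new_v[j] += weights[i][j]
    spikes_out = [1 if v >= threshold else 0 for v in new_v]
    # Reset neurons that fired; the others keep their accumulated potential.
    new_v = [0.0 if s else v for v, s in zip(new_v, spikes_out)]
    return new_v, spikes_out
```

For example, with hypothetical weights `[[0.6, -0.5], [0.6, 0.2]]` and both inputs spiking, neuron 0 reaches 1.2 and fires, while neuron 1 settles at -0.3 and stays silent.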

## SNN Weight Implementation on Bulk RRAM Crossbars



### Two 16x16 crossbars for all encoded weights

The SNN's signed 4-bit weights were encoded into differential conductances (G+ and G-) using the row-differential scheme and programmed into the crossbars.

- Ideal (software) versus programmed (hardware) weight maps in the RRAM crossbars.
- Network outputs: steering angle and speed.
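Because the SNN weights are signed 4-bit integers, each can be assigned a pair of conductance-level indices: the sign selects which device of the G+/G- pair is programmed, and the magnitude selects how many level steps above the minimum it sits. The sketch assumes the two's-complement range [-8, 7], a linear ladder on a 100-level grid, and a hypothetical spacing of 12 levels per integer step; the actual level assignment used for programming is not specified here.

```python
from typing import List, Tuple

W_MIN, W_MAX = -8, 7  # assumed two's-complement signed 4-bit range


def encode_int4(w: int, levels_per_step: int = 12,
                n_levels: int = 100) -> Tuple[int, int]:
    """Return (level_plus, level_minus); level 0 is the minimum conductance.

    The sign chooses which device of the G+/G- pair is programmed and the
    magnitude chooses how many level steps above the minimum it sits.
    """
    if not W_MIN <= w <= W_MAX:
        raise ValueError("weight outside the signed 4-bit range")
    level = abs(w) * levels_per_step
    if level > n_levels - 1:
        raise ValueError("level spacing too large for the available levels")
    return (level, 0) if w >= 0 else (0, level)


def encode_matrix(weights: List[List[int]], levels_per_step: int = 12
                  ) -> Tuple[List[List[int]], List[List[int]]]:
    """Split an integer weight matrix into G+ and G- level maps."""
    pairs = [[encode_int4(w, levels_per_step) for w in row] for row in weights]
    plus = [[p for p, _ in row] for row in pairs]
    minus = [[m for _, m in row] for row in pairs]
    return plus, minus
```

With this spacing the largest magnitude (|-8|) uses level 96, leaving headroom on the 100-level grid, and the weight is recovered as (level_plus - level_minus) / 12.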

## **SNN Hardware Implementation: Results**



- Network performance and energy comparison:
  - Speed and steering-angle outputs computed while navigating all 15 racetracks show highly consistent results between the software and hardware implementations.
  - The average energy consumed by MVM operations across all 15 tracks shows that our trilayer bulk RRAM reduces energy consumption by more than two orders of magnitude compared to other filamentary RRAM technologies.

[CEA-Leti] L. Grenouillet et al., IEEE International Memory Workshop (IMW), 2021. [UCSB, Strukov] H. Kim et al., Nature Communications, 2021. [Tsinghua] W. Wan et al., Nature, 2022.

## Conclusion

Developed a novel trilayer, filament-free bulk RRAM crossbar technology.

- Proposed row-differential weight mapping to achieve a higher dynamic range for mapping a wide range of real-valued weights in bulk RRAM crossbars.
- Performed highly linear MVM operations energy-efficiently using an in-house-designed neuromorphic CIM hardware platform.
- Presented an SNN implementation using our bulk RRAM crossbars for autonomous navigation on scaled F1 tracks, showing close agreement between ideal software and hardware results.
- For edge neuromorphic computing applications, our bulk RRAM crossbars substantially reduced energy consumption compared to other filamentary RRAM technologies.
- The presented bulk RRAM crossbar technology, with multilevel switching in the MΩ regime and CMOS-BEOL compatibility, addresses several challenges and offers great potential for energy- and area-efficient computing.

## Acknowledgements

#### **Collaborators:**

- Prof. Gert Cauwenberghs (UC San Diego)
- Prof. Catherine Schuman (UT Knoxville)
- Prof. Ivan Schuller (UC San Diego)

#### Students:

- Yuhan Shi
- Sangheon Oh
- Madison Wilson
- Mehrdad Ramezani
- Yucheng Zhou
- Yuyi Zhang
- Shaan Shah
- Fengyi Sun

#### Postdocs:

- Ashwani Kumar
- Yue Zhou

#### Visitors:

- Seonghyun Kim (SK Hynix)







Qualcomm



# Thank You!

## References

1. J. S. Plank, C. D. Schuman, G. Bruer, M. E. Dean, and G. S. Rose, "The TENNLab exploratory neuromorphic computing framework," IEEE Letters of the Computer Society, vol. 1, no. 2, pp. 17-20, 2018.
2. C. D. Schuman, J. P. Mitchell, R. M. Patton, T. E. Potok, and J. S. Plank, "Evolutionary optimization for neuromorphic systems," Annual Neuro-Inspired Computational Elements Workshop (NICE), pp. 1-9, 2020.
3. https://github.com/f1tenth/f1tenth_racetracks.
4. L. Grenouillet et al., "16kbit 1T1R OxRAM arrays embedded in 28nm FDSOI technology demonstrating low BER, high endurance, and compatibility with core logic transistors," IEEE International Memory Workshop (IMW), pp. 1-4, 2021.
5. H. Kim, M. Mahmoodi, H. Nili, and D. B. Strukov, "4K-memristor analog-grade passive crossbar circuit," Nature Communications, vol. 12, 5198, 2021.
6. W. Wan et al., "A compute-in-memory chip based on resistive random-access memory," Nature, vol. 608, pp. 504-512, 2022.
7. B. Fleischer et al., "A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference," 2018 IEEE Symposium on VLSI Circuits, Honolulu, HI, USA, 2018.