Running Efficiently CNNs on the Edge Thanks to Hybrid SRAM-RRAM In-Memory Computing

Marco Rios, Flavio Ponzina, Giovanni Ansaloni, Alexandre Levisse and David Atienza
Embedded Systems Laboratory (ESL), EPFL, Switzerland
marco.rios@epfl.ch
flavio.ponzina@epfl.ch
giovanni.ansaloni@epfl.ch
alexandre.levisse@epfl.ch
david.atienza@epfl.ch

ABSTRACT

The increasing size of Convolutional Neural Networks (CNNs) and the high computational workload required for inference pose major challenges for their deployment on resource-constrained edge devices. In this paper, we address these challenges by proposing a novel In-Memory Computing (IMC) architecture. Our IMC strategy performs arithmetic operations efficiently via bitline computing, enabling a high degree of parallelism while reducing energy-costly data transfers. Moreover, it features a hybrid memory structure in which a portion of each subarray, dedicated to storing CNN weights, is implemented as high-density, zero-standby-power Resistive RAM (RRAM). Finally, it exploits Weight Data Mapping (WDM), a novel method that stores quantized weights according to their value, further increasing efficiency. Compared to state-of-the-art IMC alternatives, our solution provides up to 93% higher energy efficiency and up to 6x lower runtime when performing inference on the MobileNet and AlexNet neural networks.
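Bitline computing is only named, not detailed, in the abstract. As a loose illustration, the minimal Python sketch below models the common SRAM bitline-computing idiom, in which activating two wordlines at once lets the sense amplifiers read the bitwise AND of the stored words, and a multiplication is then assembled bit-serially by shift-and-add. The function names and the 8-bit operand width are assumptions for illustration, not the paper's exact circuit.

# Software model of bitline computing (illustrative sketch, not the
# authors' design). Reading two simultaneously activated rows is
# modeled as a bitwise AND; a multiply is then built from AND results
# plus a shift-and-add accumulation performed over several cycles.

def bitline_and(row_a: int, row_b: int) -> int:
    """Model of an in-memory AND between two subarray rows."""
    return row_a & row_b

def bit_serial_multiply(weight: int, activation: int, bits: int = 8) -> int:
    """Bit-serial multiply: stream activation bits over a stored weight row."""
    acc = 0
    for i in range(bits):
        bit = (activation >> i) & 1
        mask = -bit & ((1 << bits) - 1)  # all-ones row when bit is 1, zero otherwise
        partial = bitline_and(weight, mask)  # one in-memory AND per cycle
        acc += partial << i  # shift-and-add accumulation
    return acc

# Sanity check of the model
assert bit_serial_multiply(13, 11) == 13 * 11

In the proposed hybrid structure, the weight rows used in such operations would reside in the RRAM portion of each subarray, so the stored model parameters consume no standby power.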

Keywords: SRAM, RRAM, In-Memory Computing, CNN, Edge computing.


