Running Efficiently CNNs on the Edge Thanks to Hybrid SRAM-RRAM In-Memory Computing

Marco Rios, Flavio Ponzina, Giovanni Ansaloni, Alexandre Levisse and David Atienza
Embedded Systems Laboratory (ESL), EPFL, Switzerland
marco.rios@epfl.ch
flavio.ponzina@epfl.ch
giovanni.ansaloni@epfl.ch
alexandre.levisse@epfl.ch
david.atienza@epfl.ch

ABSTRACT

The increasing size of Convolutional Neural Networks (CNNs) and the high computational workload required for inference pose major challenges for their deployment on resource-constrained edge devices. In this paper, we address these challenges by proposing a novel In-Memory Computing (IMC) architecture. Our IMC strategy performs arithmetic operations efficiently via bitline computing, enabling a high degree of parallelism while reducing energy-costly data transfers. Moreover, it features a hybrid memory structure in which a portion of each subarray, dedicated to storing CNN weights, is implemented as high-density, zero-standby-power Resistive RAM (RRAM). Finally, it exploits Weight Data Mapping (WDM), a novel method that stores quantized weights according to their value, further increasing efficiency. Compared to state-of-the-art IMC alternatives, our solution provides up to 93% higher energy efficiency and up to 6x lower runtime when performing inference on the MobileNet and AlexNet neural networks.
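Bitline computing is only named, not detailed, in the abstract. As a loose illustration, the minimal Python sketch below models the common SRAM bitline-computing idiom, in which activating two wordlines at once lets the sense amplifiers read the bitwise AND of the stored words, and a multiplication is then assembled bit-serially by shift-and-add. The function names and the 8-bit operand width are assumptions for illustration, not the paper's exact circuit.

# Software model of bitline computing (illustrative sketch, not the
# authors' design). Reading two simultaneously activated rows is
# modeled as a bitwise AND; a multiply is then built from AND results
# plus a shift-and-add accumulation performed over several cycles.

def bitline_and(row_a: int, row_b: int) -> int:
    """Model of an in-memory AND between two subarray rows."""
    return row_a & row_b

def bit_serial_multiply(weight: int, activation: int, bits: int = 8) -> int:
    """Bit-serial multiply: stream activation bits over a stored weight row."""
    acc = 0
    for i in range(bits):
        bit = (activation >> i) & 1
        mask = -bit & ((1 << bits) - 1)  # all-ones row when bit is 1, zero otherwise
        partial = bitline_and(weight, mask)  # one in-memory AND per cycle
        acc += partial << i  # shift-and-add accumulation
    return acc

# Sanity check of the model
assert bit_serial_multiply(13, 11) == 13 * 11

In the proposed hybrid structure, the weight rows used in such operations would reside in the RRAM portion of each subarray, so the stored model parameters consume no standby power.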

Keywords: SRAM, RRAM, In-Memory Computing, CNN, Edge computing.


