A Case for Emerging Memories in DNN Accelerators

Avilash Mukherjee¹, Kumar Saurav², Prashant Nair¹, Sudip Shekhar¹ and Mieszko Lis¹
¹The University of British Columbia, Vancouver, Canada
avilash@ece.ubc.ca, prashantnair@ece.ubc.ca, sudip@ece.ubc.ca, mieszko@ece.ubc.ca
²QUALCOMM India
saurav@qti.qualcomm.com

ABSTRACT


The popularity of Deep Neural Networks (DNNs) has led to many DNN accelerator architectures, which typically focus on on-chip storage and computation costs. However, much of the energy is spent on accesses to off-chip DRAM. While emerging resistive memory technologies such as MRAM, PCM, and RRAM can potentially reduce this energy component, they suffer from drawbacks, such as low endurance, that prevent them from directly replacing DRAM in DNN applications.
In this paper, we examine how DNN accelerators can be designed to overcome these limitations and how emerging memories can be used for off-chip storage. We demonstrate that through (a) careful mapping of the DNN computation to the accelerator and (b) a hybrid memory setup (combining DRAM with an emerging memory), we can reduce inference energy over a DRAM-only design by a factor ranging from 1.12× on EfficientNet-B7 to 6.3× on ResNet-50, while also extending the memory lifetime from two weeks to over a decade. Because the energy benefits vary dramatically across DNN models, we also develop a simple analytical heuristic, based solely on DNN model parameters, that predicts the suitability of a given DNN for emerging-memory-based accelerators.

Keywords: Machine Learning, Convolutional Neural Networks, Non-Volatile Memories, PCM, RRAM, MRAM.
