Fault-Tolerant Deep Neural Networks for Processing-In-Memory based Autonomous Edge Systems

Siyue Wang1,a, Geng Yuan1,b, Xiaolong Ma1,c, Yanyu Li1,d, Xue Lin1,e and Bhavya Kailkhura2
1Dept. of Electrical and Computer Engineering, Northeastern University. Boston, MA, USA
awang.siy@northeastern.edu
byuan.geng@northeastern.edu
cma.xiao1@northeastern.edu
dli.yanyu@northeastern.edu
exue.lin@northeastern.edu
2Lawrence Livermore National Laboratory. Livermore, CA, USA
kailkhura1@llnl.gov

ABSTRACT


In-memory deep neural network (DNN) accelerators will be key to energy-efficient autonomous edge systems. Resistive random access memory (ReRAM) is a promising candidate for non-CMOS-based in-memory computing platforms in such systems, thanks to characteristics such as near-zero leakage power and nonvolatility. However, due to the hardware instability of ReRAM, the weights of a deployed DNN model may deviate from the originally trained weights, resulting in accuracy loss. To mitigate this accuracy loss, we propose two stochastic fault-tolerant training methods that improve model robustness in general, without having to handle individual devices. Moreover, we propose the Stability Score, a comprehensive metric that serves as an indicator of the instability problem. Extensive experiments demonstrate that DNN models trained with the proposed stochastic fault-tolerant training methods achieve superior performance, providing better flexibility, scalability, and deployability of ReRAM-based autonomous edge systems.

Keywords: Fault-Tolerant DNNs, DNN Accelerators, Autonomous Edge Systems.
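For illustration only, the sketch below shows one plausible form of stochastic fault-tolerant training, assuming a PyTorch setting: random Gaussian perturbations are injected into the weights on every forward pass to emulate ReRAM-induced weight deviation, while the optimizer updates the unperturbed (nominal) weights. The noise model, the relative noise scale sigma, and the function name are assumptions made for demonstration; they are not the specific training methods proposed in the paper.

# Illustrative sketch only: noise-injection training to emulate ReRAM weight
# deviation. Hypothetical noise model and hyperparameters, not the paper's method.
import copy
import torch
import torch.nn as nn

def train_with_weight_noise(model, loader, epochs=10, sigma=0.05, lr=1e-3,
                            device="cpu"):
    """Train while perturbing the weights on every forward pass, so the model
    learns to tolerate deviations from the stored (nominal) weight values."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            clean = copy.deepcopy(model.state_dict())   # keep the nominal weights
            with torch.no_grad():
                for p in model.parameters():            # simulate device variation
                    p.add_(sigma * p.abs() * torch.randn_like(p))
            loss = loss_fn(model(x), y)                 # loss under perturbed weights
            opt.zero_grad()
            loss.backward()                             # gradients w.r.t. perturbed weights
            model.load_state_dict(clean)                # restore nominal weights (grads kept)
            opt.step()                                  # update the nominal weights
    return model

A model hardened this way can then be evaluated under the same perturbation model to estimate how much accuracy is retained when the stored weights drift at inference time.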


