Fault-Tolerant Deep Neural Networks for Processing-In-Memory based Autonomous Edge Systems

Siyue Wang1,a, Geng Yuan1,b, Xiaolong Ma1,c, Yanyu Li1,d, Xue Lin1,e and Bhavya Kailkhura2
1Dept. of Electrical and Computer Engineering, Northeastern University. Boston, MA, USA
awang.siy@northeastern.edu
byuan.geng@northeastern.edu
cma.xiao1@northeastern.edu
dli.yanyu@northeastern.edu
exue.lin@northeastern.edu
2Lawrence Livermore National Laboratory. Livermore, CA, USA
kailkhura1@llnl.gov

ABSTRACT


In-memory deep neural network (DNN) accelerators will be key to energy-efficient autonomous edge systems. Resistive random access memory (ReRAM) is a promising candidate for non-CMOS-based in-memory computing platforms in such systems, thanks to characteristics such as near-zero leakage power and nonvolatility. However, due to the hardware instability of ReRAM, the weights of a deployed DNN model may deviate from the originally trained weights, resulting in accuracy loss. To mitigate this accuracy loss, we propose two stochastic fault-tolerant training methods that improve model robustness in general, without having to handle individual devices. Moreover, we propose the Stability Score, a comprehensive metric that serves as an indicator of the instability problem. Extensive experiments demonstrate that DNN models trained with the proposed stochastic fault-tolerant training methods achieve superior performance, providing better flexibility, scalability, and deployability of ReRAM-based autonomous edge systems.

Keywords: Fault-Tolerant DNNs, DNN Accelerators, Autonomous Edge Systems.
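For illustration only, the sketch below shows one plausible form of stochastic fault-tolerant training, assuming a PyTorch setting: random Gaussian perturbations are injected into the weights on every forward pass to emulate ReRAM-induced weight deviation, while the optimizer updates the unperturbed (nominal) weights. The noise model, the relative noise scale sigma, and the function name are assumptions made for demonstration; they are not the specific training methods proposed in the paper.

# Illustrative sketch only: noise-injection training to emulate ReRAM weight
# deviation. Hypothetical noise model and hyperparameters, not the paper's method.
import copy
import torch
import torch.nn as nn

def train_with_weight_noise(model, loader, epochs=10, sigma=0.05, lr=1e-3,
                            device="cpu"):
    """Train while perturbing the weights on every forward pass, so the model
    learns to tolerate deviations from the stored (nominal) weight values."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            clean = copy.deepcopy(model.state_dict())   # keep the nominal weights
            with torch.no_grad():
                for p in model.parameters():            # simulate device variation
                    p.add_(sigma * p.abs() * torch.randn_like(p))
            loss = loss_fn(model(x), y)                 # loss under perturbed weights
            opt.zero_grad()
            loss.backward()                             # gradients w.r.t. perturbed weights
            model.load_state_dict(clean)                # restore nominal weights (grads kept)
            opt.step()                                  # update the nominal weights
    return model

A model hardened this way can then be evaluated under the same perturbation model to estimate how much accuracy is retained when the stored weights drift at inference time.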


