Value-aware Parity Insertion ECC for Fault-tolerant Deep Neural Network

Seo-Seok Lee1 and Joon-Sung Yang2
1Sungkyunkwan University, Suwon, Korea; System LSI Division, Samsung Electronics, Korea
2School of Electrical and Electronic Engineering, and Department of System Semiconductor Engineering, Yonsei University, Seoul, Korea
js.yang@yonsei.ac.kr

ABSTRACT


Deep neural networks (DNNs) are deployed on hardware devices and are widely used in various fields to perform inference. Unfortunately, hardware devices can become unreliable due to unintended process, voltage, and temperature variations, which can introduce erroneous weights. Prior work reports that erroneous weights can cause significant accuracy degradation, which in safety-critical applications such as autonomous driving can lead to catastrophic results. Retraining or fine-tuning can adjust corrupted weights to prevent the accuracy degradation. However, training-based approaches incur significant computational overhead due to the massive size of training datasets and the intensive training operations. This paper therefore proposes a value-aware parity insertion error correction code (ECC) that recovers erroneous weights with reduced parity storage overhead and no additional training. The proposed method is compared with previous ECC-based reliability improvement methods, Weight Nulling and In-place Zero-space ECC. Experimental results demonstrate that DNNs with the value-aware parity insertion ECC can perform inference without accuracy degradation at bit error rates that are, on average, 122.5× and 15.1× higher than those tolerated by Weight Nulling and In-place Zero-space ECC, respectively.

Keywords: Value-aware Parity Insertion ECC, Deep Neural Network, Fault-tolerance, Error Correction Code.


