Retraining-Based Timing Error Mitigation for Hardware Neural Networks
Jiachao Deng1,2,a, Yuntan Fang1,3,5,b, Zidong Du1,2,c, Ying Wang1,d, Huawei Li1,e, Olivier Temam4,i, Paolo Ienne5,j, David Novo5,k, Xiaowei Li1,f, Yunji Chen1,6,g and Chengyong Wu1,h
1SKL Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
2University of Chinese Academy of Sciences, Beijing, China
3Shannon Laboratory, Huawei Technologies Co., Ltd., China
4Google Inc., CA, USA.
5École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
6Center for Excellence in Brain Science, Chinese Academy of Sciences, Beijing, China
Neural network (NN) accelerators have recently been gaining popularity as part of future heterogeneous multi-core architectures due to their broad application scope and excellent energy efficiency. Additionally, since neural networks can be retrained, they are inherently resilient to errors and noise. Prior work has exploited this error tolerance to design approximate neural network circuits or to tolerate logical faults. However, besides high-level faults and noise, timing errors induced by delay faults, process variations, aging, etc. increasingly dominate the reliability of NN accelerators under nanoscale manufacturing processes. In this paper, we leverage the error resiliency of neural networks to mitigate timing errors in NN accelerators. Specifically, when timing errors significantly affect the output results, we propose to retrain the accelerators to update their weights, thus circumventing critical timing errors. Experimental results show that timing errors in NN accelerators can be effectively mitigated across different applications.
Keywords: Neural networks, Error tolerance, Machine learning, Timing errors, Overclocking.
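To make the retraining idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical fault model in which one multiplier's product always arrives too late and is never accumulated, a single sigmoid neuron, and a toy dataset with two redundant copies of the input signal. The names `faulty_forward`, `retrain`, and `faulty_index` are illustrative only. Training with the fault present in the forward pass lets the remaining weights compensate for the lost path.

```python
import math
import random


def faulty_forward(weights, inputs, faulty_index):
    """Weighted sum plus sigmoid, with one product lost to a timing error.

    Assumed fault model (hypothetical): the product on path `faulty_index`
    misses the clock edge and contributes nothing to the sum.
    Pass faulty_index=None for fault-free operation.
    """
    total = 0.0
    for i, (w, x) in enumerate(zip(weights, inputs)):
        if i == faulty_index:
            continue  # late product is never accumulated
        total += w * x
    return 1.0 / (1.0 + math.exp(-total))


def accuracy(weights, data, faulty_index):
    """Fraction of samples classified correctly under the given fault."""
    correct = sum(
        (faulty_forward(weights, x, faulty_index) > 0.5) == (y == 1)
        for x, y in data
    )
    return correct / len(data)


def retrain(weights, data, faulty_index, epochs=50, lr=0.5):
    """Stochastic gradient descent with the fault kept in the forward pass."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = faulty_forward(w, x, faulty_index) - y
            for i in range(len(w)):
                if i != faulty_index:  # the faulty path has no effect on the output
                    w[i] -= lr * err * x[i]
    return w


# Toy data: inputs are (signal, redundant copy of signal, bias term).
rng = random.Random(0)
signals = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = [((s, s, 1.0), 1 if s > 0 else 0) for s in signals]

# Initial weights rely entirely on path 0, which is the faulty one.
weights = [4.0, 0.0, 0.0]

fault_free = accuracy(weights, data, faulty_index=None)
before = accuracy(weights, data, faulty_index=0)
after = accuracy(retrain(weights, data, faulty_index=0), data, faulty_index=0)
```

In this sketch `before` collapses to roughly chance level because the only useful weight sits on the faulty path, while `after` recovers because retraining shifts the weight onto the redundant, fault-free input. Real timing errors are data- and delay-dependent, so this deterministic dropped-product model is only an illustration of the principle.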