CHITIN: A Comprehensive In-thread Instruction Replication Technique Against Transient Faults

Hwisoo So1,a, Moslem Didehbany2, Jinhyo Jung1,b, Aviral Shrivastava3, Kyoungwoo Lee1,c
1Dependable Computing Lab, Yonsei University, Seoul, South Korea
aShs7719@yonsei.ac.kr
bJinhyo.Jung@yonsei.ac.kr
cKyoungwoo.Lee@yonsei.ac.kr
2Cadence Design Systems, San Jose, California
Moslem@cadence.com
3Compiler Microarchitecture Lab, Arizona State University, Tempe, AZ
Aviral.Shrivastava@asu.edu

ABSTRACT


Soft errors have become one of the most important design concerns due to drastic technology scaling. Software-based error detection techniques are attractive because of their flexibility and hardware independence. However, our in-depth analysis reveals that state-of-the-art techniques in this area cannot provide comprehensive fault coverage: i) their control-flow protection schemes provide incomplete redundancy of the original instructions, ii) they do not protect function calls and returns, and iii) their instruction scheduling leaves many vulnerabilities open. In this paper, we propose CHITIN, a set of code transformations for soft error resilience that combines the load-back checking scheme of nZDC, an improved version of the SWIFT-like control-flow protection scheme, and contiguous scheduling of the original and redundant instructions to dramatically reduce vulnerability to soft errors that disrupt the control flow. Our fault injection experiments demonstrate that CHITIN eliminates more than 89% of the silent data corruptions that occur under state-of-the-art solutions.
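To make the idea of in-thread replication with load-back checking concrete, the following is a minimal Python sketch and not the CHITIN compiler transformation itself. It assumes a simplified model in which memory is a dictionary, the redundant instruction stream is an ordinary second computation, and a transient fault is a single bit flip in the original result. The names `FaultDetected`, `checked_store`, `replicated_add_and_store`, and `flip_bit` are hypothetical and introduced only for illustration.

```python
class FaultDetected(Exception):
    """Raised when the original and redundant streams disagree."""


def checked_store(memory, addr, value, shadow_addr, shadow_value):
    """Store with the original operands, then verify with the redundant ones.

    Assumed model of load-back checking: the value is written using the
    original address and data, read back through the redundant address,
    and compared against the redundant data.
    """
    memory[addr] = value
    loaded = memory.get(shadow_addr)  # load-back through the redundant address
    if loaded != shadow_value:
        raise FaultDetected(f"mismatch at {shadow_addr:#x}: {loaded} != {shadow_value}")


def replicated_add_and_store(memory, addr, a, b, flip_bit=None):
    """Compute a + b twice, store once, and check the store by loading back."""
    # Original instruction stream.
    result = a + b
    # Redundant instruction stream, computed from the same inputs.
    shadow_result = a + b
    shadow_addr = addr
    # Optional simulated transient fault: flip one bit of the original result.
    if flip_bit is not None:
        result ^= 1 << flip_bit
    checked_store(memory, addr, result, shadow_addr, shadow_result)
    return memory[addr]
```

In this sketch a fault-free run stores and returns the sum, while a run with `flip_bit` set raises `FaultDetected` instead of silently leaving a corrupted value in memory. It does not model control-flow protection, function calls and returns, or instruction scheduling.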


