DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search
Ioannis Tsiokanos1,a, Lev Mukhanov1,b, Giorgis Georgakoudis2, Dimitrios S. Nikolopoulos3 and Georgios Karakonstantis1,c
1Institute of Electronics, Communications and Information Technology, Queen’s University Belfast, UK
a: itsiokanos01@qub.ac.uk
b: l.mukhanov@qub.ac.uk
c: g.karakonstantis@qub.ac.uk
2Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, USA
giorgis@llnl.gov
3Department of Computer Science, Virginia Tech, USA
dsn@vt.edu
ABSTRACT
Increased variability and the adoption of low supply voltages render nanometer devices prone to timing failures, which threaten the functionality of digital circuits. Recent schemes have focused on developing instruction-aware failure prediction models and adapting voltage/frequency to avoid errors while saving energy. However, such schemes may be inaccurate when applied to pipelined cores, since they consider only the currently executed instruction and the preceding one, thereby neglecting the impact of all the concurrently executing instructions on failure occurrence. In this paper, we first demonstrate that the order and type of instructions in sequences with a length equal to the pipeline depth significantly affect the failure rate. Since evaluating the impact of all possible sequences on failures is practically impossible, we present DEFCON, a fully automated framework that stochastically searches for the most failure-prone instruction sequences (ISQs). DEFCON generates such sequences by integrating a properly formulated genetic algorithm with accurate post-layout dynamic timing analysis, considering data-dependent path sensitization and instruction execution history. The generated micro-architecture-aware ISQs are then used by DEFCON to estimate the failure vulnerability of any application. To evaluate the efficacy of the proposed framework, we implement a pipelined floating-point unit and perform dynamic timing analysis based on input data that we extract from a variety of applications consisting of up to 43.5M ISQs. Our results show that DEFCON quickly reveals ISQs that maximize the output quality loss and correctly detects 99.7% of the actual faulty ISQs in different applications under various levels of variation-induced delay increase. Finally, DEFCON enables us to identify failure-prone ISQs early in the design cycle and to save 26.8% of energy on average when combined with a clock stretching mechanism.
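The search-and-detect flow summarized in the abstract can be illustrated with a minimal sketch. It is not the DEFCON implementation: the opcode set, the pipeline depth of 5, the per-opcode delay weights, and every function name below are hypothetical, and the toy score merely stands in for the post-layout dynamic timing analysis that the framework uses as its fitness measure. The sketch only shows the general shape of a genetic search over sequences whose length equals the pipeline depth, followed by a sliding-window scan of an application trace for the sequences found.

```python
import random

# Hypothetical opcode set for a floating-point unit; not taken from the paper.
OPCODES = ["fadd", "fsub", "fmul", "fdiv", "fsqrt"]
PIPELINE_DEPTH = 5  # assumed depth; each ISQ has exactly this many instructions

# Toy per-opcode delay weights standing in for dynamic timing analysis results.
TOY_DELAY = {"fadd": 1.0, "fsub": 1.1, "fmul": 1.6, "fdiv": 2.0, "fsqrt": 1.8}


def toy_failure_score(isq):
    """Placeholder fitness: position-weighted delay plus a bonus per opcode switch."""
    score = sum(TOY_DELAY[op] * (i + 1) for i, op in enumerate(isq))
    score += sum(0.5 for a, b in zip(isq, isq[1:]) if a != b)
    return score


def random_isq(rng):
    return tuple(rng.choice(OPCODES) for _ in range(PIPELINE_DEPTH))


def crossover(parent_a, parent_b, rng):
    """Single-point crossover that preserves the sequence length."""
    cut = rng.randrange(1, PIPELINE_DEPTH)
    return parent_a[:cut] + parent_b[cut:]


def mutate(isq, rng, rate=0.2):
    """Replace each instruction with a random opcode with probability `rate`."""
    return tuple(rng.choice(OPCODES) if rng.random() < rate else op for op in isq)


def search_failure_prone_isqs(generations=30, pop_size=20, elite=4, seed=0):
    """Return the final population, ranked from highest to lowest toy score."""
    rng = random.Random(seed)
    population = [random_isq(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=toy_failure_score, reverse=True)
        parents = population[:elite]  # elitism: the best sequences survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return sorted(population, key=toy_failure_score, reverse=True)


def count_failure_prone_windows(trace, flagged):
    """Count pipeline-depth windows of an instruction trace that match a flagged ISQ."""
    flagged = set(flagged)
    return sum(
        tuple(trace[i:i + PIPELINE_DEPTH]) in flagged
        for i in range(len(trace) - PIPELINE_DEPTH + 1)
    )


best = search_failure_prone_isqs()[0]
```

Because the top-ranked sequences are carried over unchanged in each generation, the best score in the population never decreases. In a real flow, `toy_failure_score` would be replaced by a call into a timing-analysis tool, which is where nearly all of the runtime would be spent.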