3.3 Methods and Characterisation techniques for Reliability


Date: Tuesday 26 March 2019
Time: 14:30 - 16:00
Location / Room: Room 3

Chair:
Said Hamdioui, TU Delft, NL

Co-Chair:
Arnaud Virazel, LIRMM, FR

This session discusses the characterisation of BTI and ESD, as well as a methodology to analyse the aging of SRAMs.

Time | Label | Presentation Title
Authors
14:30 | 3.3.1 | (Best Paper Award Candidate)
NEW METHOD FOR THE AUTOMATED MASSIVE CHARACTERIZATION OF BIAS TEMPERATURE INSTABILITY IN CMOS TRANSISTORS
Speaker:
Pablo Sarazá Canflanca, Universidad de Sevilla, ES
Authors:
Pablo Saraza-Canflanca1, Javier Diaz-Fortuny2, Rafael Castro-Lopez1, Elisenda Roca1, Javier Martin-Martinez2, Rosana Rodriguez2, Montserrat Nafria2 and Francisco Vidal Fernandez1
1Instituto de Microelectrónica de Sevilla, ES; 2Universitat Autonoma de Barcelona UAB, ES
Abstract
Bias Temperature Instability (BTI) has become a critical issue for circuit reliability. This phenomenon has been found to have a stochastic and discrete nature in nanometer-scale CMOS technologies. To account for this random nature, massive experimental characterization is necessary so that the extracted model parameters are accurate enough. However, there is a lack of automated analysis tools for extracting the BTI parameters from the large amount of data generated in such massive characterization tests. In this paper, a novel algorithm that allows precise and fully automated parameter extraction from experimental BTI recovery current traces is presented. This algorithm is based on Maximum Likelihood Estimation principles and is able to extract, in a robust and exact manner, the threshold voltage shifts and emission times associated with oxide trap emissions during BTI recovery, which are required to properly model the phenomenon.
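
As a rough illustration of maximum-likelihood step extraction (a textbook sketch, not the authors' algorithm): if a recovery trace is assumed to be already converted to a threshold voltage shift, to be piecewise constant between trap emissions, and to carry i.i.d. Gaussian noise, then the maximum-likelihood position of a single step is the split that minimises the summed squared residuals. Applying that split recursively (binary segmentation) with a hypothetical stopping threshold `min_gain` yields candidate emission times and step heights. The function names, the threshold, and the noise model are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def best_split(trace: Sequence[float]) -> Tuple[int, float]:
    """Most likely single-step position and its squared-error reduction (gain).

    With i.i.d. Gaussian noise, maximising the likelihood of a one-step,
    piecewise-constant model is equivalent to minimising the summed squared
    residuals about the two segment means.
    """
    n = len(trace)
    total = sum(trace)
    total_sq = sum(x * x for x in trace)
    sse_all = total_sq - total * total / n
    best_k, best_gain = 0, 0.0
    left_sum = left_sq = 0.0
    for k in range(1, n):  # the step lies between samples k-1 and k
        x = trace[k - 1]
        left_sum += x
        left_sq += x * x
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        gain = sse_all - sse
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


def change_points(trace: Sequence[float], min_gain: float, start: int = 0) -> List[int]:
    """Binary segmentation: keep splitting while the gain exceeds min_gain."""
    if len(trace) < 2:
        return []
    k, gain = best_split(trace)
    if k == 0 or gain < min_gain:
        return []
    return (change_points(trace[:k], min_gain, start)
            + [start + k]
            + change_points(trace[k:], min_gain, start + k))


def extract_emissions(times: Sequence[float], trace: Sequence[float],
                      min_gain: float) -> List[Tuple[float, float]]:
    """(emission time, step height) pairs; heights come from adjacent segment means."""
    cps = change_points(trace, min_gain)
    bounds = [0] + cps + [len(trace)]
    means = [sum(trace[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    return [(times[c], means[i + 1] - means[i]) for i, c in enumerate(cps)]
```

Binary segmentation is a greedy approximation to the joint maximum-likelihood segmentation. Step heights are re-estimated from adjacent segment means after all change points are found, because the split statistic alone mixes the contributions of neighbouring steps.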

15:00 | 3.3.2 | GUILTY AS CHARGED: COMPUTATIONAL RELIABILITY THREATS POSED BY ELECTROSTATIC DISCHARGE-INDUCED SOFT ERRORS
Speaker:
Keven Feng, University of Illinois at Urbana Champaign, US
Authors:
Keven Feng, Sandeep Vora, Rui Jiang, Elyse Rosenbaum and Shobha Vasudevan, ECE at Univ. of Illinois at Urbana-Champaign, US
Abstract
Electrostatic discharge (ESD) has been shown to cause severe reliability hazards at the physical level, resulting in permanent and transient failures. We present the first analysis of the effects of ESD-induced errors on instruction-level computation. Our data was measured on a microcontroller test chip fabricated for this study, with discharges applied from a controlled ESD gun. Cosmic-ray-induced soft errors have been widely researched and modeled as single event upsets (SEUs). Our observations across multiple trials on 3 test chips show that, in contrast to radiation-induced errors, ESD can cause much more widespread errors than SEUs. In our trials, we observe system hangs and clock glitches, which are serious errors. We also observe errors in the following categories. Category A: multiple bit corruptions across multiple registers; Category B: multiple bit corruptions in the same register; and Category C: single bit corruptions across multiple registers. At the instruction level, these errors manifest as system hangs and serious malfunctions of I/O operations, interrupt operations, and data and program memory. We demonstrate that ESD-induced errors pose a significant reliability threat to higher-level functionality, warranting modeling and mitigation techniques.
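
The three error categories can be made concrete with a small classifier that compares golden and observed register snapshots. This is a hypothetical sketch: the register names, the snapshot format, and the rule that any multi-register event with at least one multi-bit register counts as Category A are assumptions, not the paper's definitions.

```python
from typing import Dict


def classify_corruption(golden: Dict[str, int], observed: Dict[str, int]) -> str:
    """Label one ESD event from golden vs. observed register snapshots (hypothetical rules)."""
    flipped = {}  # register name -> number of corrupted bits
    for reg, expected in golden.items():
        diff = expected ^ observed[reg]
        if diff:
            flipped[reg] = bin(diff).count("1")
    if not flipped:
        return "none"
    multi_reg = len(flipped) > 1
    multi_bit = any(n > 1 for n in flipped.values())
    if multi_reg and multi_bit:
        return "A"  # multiple bit corruptions across multiple registers
    if multi_bit:
        return "B"  # multiple bit corruptions in the same register
    if multi_reg:
        return "C"  # single bit corruptions across multiple registers
    return "SEU-like"  # exactly one bit in one register
```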

15:30 | 3.3.3 | METHODOLOGY FOR APPLICATION-DEPENDENT DEGRADATION ANALYSIS OF MEMORY TIMING
Speaker:
Daniel Kraak, Delft University of Technology, NL
Authors:
Daniel Kraak1, Innocent Agbo1, Mottaqiallah Taouil1, Said Hamdioui1, Pieter Weckx2, Stefan Cosemans2 and Francky Catthoor2
1Delft University of Technology, NL; 2imec vzw., BE
Abstract
Memory designs typically contain design margins to compensate for aging. As the impact of aging becomes more severe with technology scaling, it is crucial to predict it accurately to prevent overestimation or underestimation of the margins. This paper proposes a methodology to accurately and efficiently analyze the impact of aging on the memory's digital logic (e.g., timing circuit and address decoder) while considering realistic workloads extracted from applications. To demonstrate the superiority of the methodology, we analyzed the degradation of the L1 data and instruction caches of an ARM v8-a processor using both our methodology and state-of-the-art methods. The results show that existing methods may significantly over- or underestimate the impact (e.g., the decoder margin by up to 221% and the access time by up to 20%) compared with the proposed scheme. In addition, the results show that the instruction cache generally suffers the highest degradation. For example, its access time degrades by up to 9% and its decoder margin by up to 44%.
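
Workload dependence enters aging analysis largely through how long each transistor is held under stress. As a hedged illustration of one ingredient only, the sketch below extracts, from a hypothetical address trace, the fraction of accesses in which each address bit is 1; such per-bit signal probabilities could feed a BTI stress model of the address decoder instead of an assumption of uniformly random addresses. The function name and trace format are assumptions, and the sketch does not reproduce the paper's methodology.

```python
from typing import List, Sequence


def address_bit_duty(addresses: Sequence[int], width: int) -> List[float]:
    """Fraction of accesses in which each address bit (LSB first) is 1."""
    if not addresses:
        raise ValueError("address trace is empty")
    counts = [0] * width
    for addr in addresses:
        for bit in range(width):
            counts[bit] += (addr >> bit) & 1
    return [c / len(addresses) for c in counts]
```

A loop that keeps hitting a few cache sets produces strongly skewed probabilities on some bits, which is exactly the case where a uniform-workload assumption would misjudge decoder stress.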

16:00 | IP1-14, 303 | CHIP HEALTH TRACKING USING DYNAMIC IN-SITU DELAY MONITORING
Speaker:
Hadi Ahmadi Balef, Eindhoven University of Technology, NL
Authors:
Hadi Ahmadi Balef1, Kees Goossens2 and José Pineda de Gyvez1
1Eindhoven University of Technology, NL; 2Eindhoven University of Technology, NL
Abstract
Tracking the gradual effect of silicon aging on circuit delays requires fine-grained slack monitoring. Conventional slack monitoring techniques aim to measure the worst-case static slack, i.e., the slack of the longest timing path. In sharp contrast to these techniques, we propose a novel technique based on the dynamic excitation of in-situ delay monitors (i.e., the dynamic excitation of the timing paths being monitored). As the circuit ages, path delays increase and the monitors are excited more frequently. With the proposed technique, a fine-grained signature of delay degradation is extracted from the excitation rate of the monitors. The in-situ monitors are inserted at intermediate points along timing paths to increase the sensitivity of the signature to delay degradation. A new, efficient monitor insertion algorithm is also proposed that reduces the number of monitors by about 2.1x compared with other works for an ARM Cortex M0 processor.
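
The excitation-rate signature can be sketched with a deliberately simple timing model, assumed here for illustration: each cycle yields an arrival time at the monitored point, the monitor is excited when that arrival falls after a fixed detection-window start, and aging scales all delays by a common factor. The names `arrivals`, `window_start`, `age`, and `degradation` are hypothetical.

```python
from typing import List, Sequence


def excitation_rate(arrivals: Sequence[float], window_start: float) -> float:
    """Fraction of cycles whose arrival at the monitor lands after window_start."""
    if not arrivals:
        raise ValueError("need at least one cycle")
    excited = sum(1 for t in arrivals if t > window_start)
    return excited / len(arrivals)


def age(arrivals: Sequence[float], degradation: float) -> List[float]:
    """Hypothetical uniform aging: every delay grows by the same relative amount."""
    return [t * (1.0 + degradation) for t in arrivals]


if __name__ == "__main__":
    fresh = [1.0, 2.0, 3.0, 4.0]
    print(excitation_rate(fresh, 2.5))            # 0.5
    print(excitation_rate(age(fresh, 0.5), 2.5))  # 0.75
```

Because the rate is averaged over many excited paths and cycles, it changes gradually as delays drift, rather than only when the single worst path crosses the clock edge.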

16:01 | IP1-15, 541 | PCFI: PROGRAM COUNTER GUIDED FAULT INJECTION FOR ACCELERATING GPU RELIABILITY ASSESSMENT
Speaker:
Paolo Rech, UFRGS, BR
Authors:
Fritz Previlon, Charu Kalra, Devesh Tiwari and David Kaeli, Northeastern University, US
Abstract
Reliability has become a first-class design objective for GPU devices due to increasing soft-error rates. To assess the reliability of GPU programs, researchers rely on software fault-injection methods. Unfortunately, the software fault-injection process is prohibitively expensive, requiring multiple days to complete a statistically sound fault-injection campaign. To address this challenge, this paper proposes a novel fault-injection method, PCFI, that reduces the number of fault injections by exploiting the predictability of fault-injection outcomes based on the program counter of the soft-error-affected instruction. Evaluation on a variety of GPU programs covering a wide range of application domains shows that PCFI reduces the time to complete fault-injection campaigns by 22% on average without sacrificing accuracy.
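
The idea of exploiting outcome predictability per program counter can be sketched as follows, under assumptions that are ours rather than the paper's: `inject(pc, seed)` is a hypothetical callback that runs one real injection and returns an outcome label, and a PC whose last `k` injections all agreed is treated as predictable, so further faults at that PC reuse the outcome instead of being injected.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def pc_guided_campaign(
    sites: Sequence[Tuple[int, int]],
    inject: Callable[[int, int], str],
    k: int = 3,
) -> Tuple[List[str], int]:
    """Return (outcome per fault site, number of real injections performed)."""
    history: Dict[int, List[str]] = defaultdict(list)  # pc -> injected outcomes
    outcomes: List[str] = []
    runs = 0
    for pc, seed in sites:
        seen = history[pc]
        if len(seen) >= k and len(set(seen[-k:])) == 1:
            outcomes.append(seen[-1])  # predictable PC: reuse, no injection
        else:
            result = inject(pc, seed)  # expensive real injection
            runs += 1
            seen.append(result)
            outcomes.append(result)
    return outcomes, runs


if __name__ == "__main__":
    def fake_inject(pc: int, seed: int) -> str:
        # Toy injector: faults at PC 0x10 are always masked; others alternate.
        if pc == 0x10:
            return "masked"
        return "sdc" if seed % 2 else "crash"

    sites = [(0x10, s) for s in range(10)] + [(0x20, s) for s in range(4)]
    outcomes, runs = pc_guided_campaign(sites, fake_inject)
    print(runs)  # 7 injections instead of 14
```

The saving depends entirely on how many PCs behave consistently; a PC with mixed outcomes, like 0x20 in the toy run, is still injected every time.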

16:02 | IP1-16, 696 | CHARACTERIZING THE RELIABILITY AND THRESHOLD VOLTAGE SHIFTING OF 3D CHARGE TRAP NAND FLASH
Speaker:
Weihua Liu, Huazhong University of Science and Technology, CN
Authors:
Weihua Liu1, Fei Wu1, Meng Zhang1, Yifei Wang1, Zhonghai Lu2, Xiangfeng Lu3 and Changsheng Xie1
1Huazhong University of Science and Technology, CN; 2KTH Royal Institute of Technology, SE; 3Beijing Memblaze Technology Co., Ltd., CN
Abstract
3D charge trap (CT) triple-level cell (TLC) NAND flash is gradually becoming a mainstream storage component due to its high storage capacity and performance, but it raises concerns about reliability. Fault tolerance and data management schemes can improve reliability. Designing a more efficient solution, however, requires an understanding of the reliability characteristics of 3D CT TLC NAND flash. To facilitate such understanding, we use a real-world testing platform to investigate the reliability characteristics, including the raw bit error rate (RBER) and the threshold voltage (Vth) shifting features, after the flash has been subjected to various disturbances. We analyze why these characteristics exist in 3D CT TLC NAND flash. We hope these observations can guide designers in proposing highly efficient solutions to the reliability problem.
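
RBER is the fraction of raw (pre-ECC) bits that read back differently from what was programmed. A minimal sketch, assuming the written and read pages are available as equal-length byte strings; the function name is hypothetical.

```python
def raw_bit_error_rate(written: bytes, read: bytes) -> float:
    """Fraction of bits that differ between programmed and read-back data."""
    if len(written) != len(read) or not written:
        raise ValueError("pages must be non-empty and of equal length")
    errors = sum(bin(w ^ r).count("1") for w, r in zip(written, read))
    return errors / (8 * len(written))
```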

16:03 | IP1-17, 882 | HIDDEN DELAY FAULT SENSOR FOR TEST, RELIABILITY AND SECURITY
Speaker:
Giorgio Di Natale, CNRS - TIMA, FR
Authors:
Giorgio Di Natale1, Elena Ioana Vatajelu2, Kalpana SENTHAMARAI KANNAN2 and Lorena Anghel3
1LIRMM, FR; 2TIMA, FR; 3Grenoble-Alpes University, FR
Abstract
In this paper, we present a novel hidden-delay-fault sensor design and a preliminary analysis of its circuit integration and applicability. In our proposed method, delay sensing is achieved by sampling data on both rising and falling clock edges and using a variable duty cycle to control the range of the sensed delay fault. The main advantages of our proposed method are that it works at nominal frequency, can cover a wide range of delay faults, and is versatile in its applicability. It can be used (i) during testing, to perform user-defined hidden-delay-fault tests, (ii) for estimating reliability degradation due to process and environmental variations and ageing, and (iii) in security, to detect the insertion of Trojan horses that alter the path delay.
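
A simplified timing model, assumed here for illustration and not taken from the paper's circuit, shows how the duty cycle can set the sensed range: data launched at a rising edge are also sampled at the following falling edge, which occurs at `duty * period`. A path whose delay lands between that falling edge and the next rising edge is still captured correctly at nominal frequency but disagrees with the falling-edge sample, so it is flagged as a hidden delay fault. The function names are hypothetical.

```python
from typing import Tuple


def sensed_range(period: float, duty: float) -> Tuple[float, float]:
    """Delay window (falling edge, next rising edge] flagged in this model."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be strictly between 0 and 1")
    return duty * period, period


def hidden_delay_flag(path_delay: float, period: float, duty: float) -> bool:
    """True if the path meets timing at the rising edge but misses the falling-edge sample."""
    low, high = sensed_range(period, duty)
    return low < path_delay <= high
```

In this model, lowering the duty cycle moves the falling edge earlier and widens the window, so paths with more remaining slack are also flagged, all without changing the clock frequency.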

16:03 | IP1-18, 219 | EFFECT OF DEVICE VARIATION ON MAPPING BINARY NEURAL NETWORK TO MEMRISTOR CROSSBAR ARRAY
Speaker:
Wooseok Yi, POSTECH, KR
Authors:
Wooseok Yi, Yulhwa Kim and Jae-Joon Kim, Pohang University of Science and Technology, KR
Abstract
In memristor crossbar array (MCA)-based neural network hardware, it is generally assumed that all wordlines (WLs) are simultaneously enabled for parallel matrix-vector multiplication (MxV). However, the error probability of MxV in an MCA increases as the resistance ratio (R-ratio) of a memristor decreases and as the resistance variation and the number of simultaneously activated WLs increase. In this paper, we analyze the effect of the R-ratio and variation of memristor devices on the read sense margin and inference accuracy of MCA-based Binary Neural Network (BNN) hardware. We first show that only a limited number of WLs should be enabled to ensure correct MxV output when the R-ratio is small. On the other hand, we also show that, if the resistance variation becomes higher than a certain level, simultaneous activation of a large number of WLs produces higher accuracy even when the R-ratio is small. Based on this analysis, we propose the Accuracy Estimation (AE) factor to find the optimal number of simultaneously activated WLs.
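
The trade-off between R-ratio, variation, and the number of activated WLs can be explored with a small Monte Carlo sketch of one crossbar column. Assumptions, all ours: binary cells with normalised conductance 1 (low-resistance state) or 1/R-ratio (high-resistance state), multiplicative Gaussian device variation with relative spread `sigma`, random input patterns, and an ideal sense circuit that decodes the count of low-resistance cells by rounding. It estimates only the MxV (bit-count) error rate, not inference accuracy or the paper's AE factor; the function name is hypothetical.

```python
import random


def mxv_error_rate(n_wl: int, r_ratio: float, sigma: float,
                   trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often one column's bit count is mis-decoded."""
    if r_ratio <= 1.0:
        raise ValueError("R-ratio must exceed 1")
    rng = random.Random(seed)
    g_on = 1.0               # low-resistance state (normalised conductance)
    g_off = g_on / r_ratio   # high-resistance state
    step = g_on - g_off      # nominal current difference between counts k and k+1
    errors = 0
    for _ in range(trials):
        cells = [rng.random() < 0.5 for _ in range(n_wl)]  # random XNOR results
        k = sum(cells)
        current = sum((g_on if on else g_off) * (1.0 + sigma * rng.gauss(0.0, 1.0))
                      for on in cells)
        decoded = round((current - n_wl * g_off) / step)  # ideal sense circuit
        if decoded != k:
            errors += 1
    return errors / trials
```

With an R-ratio of 2 and sigma of 0.1, the estimate for 64 activated WLs comes out far higher than for 4, consistent with the abstract's point that a small R-ratio calls for enabling fewer WLs at once.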

16:00 | End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Wednesday, March 27, 2019

Thursday, March 28, 2019