2.6 Improving reliability and fault tolerance of advanced memories


Date: Tuesday 10 March 2020
Time: 11:30 - 13:00
Location / Room: Lesdiguières

Chair:
Mounir Benabdenbi, TIMA Laboratory, FR

Co-Chair:
Said Hamdioui, TU Delft, NL

This session discusses reliability issues for different memory technologies: the fault tolerance of memristors, reducing the number of simulations through importance sampling, and advanced metrics for measuring the reliability of NAND flash memories.

Time | Label | Presentation Title / Authors
11:30 | 2.6.1 | ON IMPROVING FAULT TOLERANCE OF MEMRISTOR CROSSBAR BASED NEURAL NETWORK DESIGNS BY TARGET SPARSIFYING
Speaker:
Yu Wang, North China Electric Power University, CN
Authors:
Song Jin¹, Songwei Pei² and Yu Wang¹
¹North China Electric Power University, CN; ²School of Computer Science, Beijing University of Posts and Telecommunications, CN
Abstract
Memristor-based crossbars (MBCs) can execute neural network computations in an extremely energy-efficient manner. However, stuck-at faults prevent memristors from representing network weights correctly, significantly degrading the classification accuracy of a network deployed on the MBC. By carefully analyzing all possible fault combinations in a pair of differential crossbars, we found that most stuck-at faults can be accommodated perfectly by mapping a zero-value weight onto the memristors. Based on this observation, we propose a target-sparsifying fault-tolerant scheme for MBCs that execute neural network applications. We first use a heuristic algorithm to map the weight matrix onto the MBC, aiming to minimize weight variations in the presence of stuck-at faults. Then, weights mapped onto faulty memristors that still show large variations are purposefully forced to zero. Finally, the network is retrained to recover classification accuracy. For a 4-layer CNN designed for MNIST digit recognition, experimental results demonstrate that our scheme achieves almost no accuracy loss when 10% of the memristors in the MBC are faulty. When the proportion of faulty memristors increases to 20%, the accuracy loss stays within 3%.
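The observation about differential pairs can be checked with a simplified model. This sketch assumes each weight is stored as G_pos - G_neg with conductances normalized to [0, 1], that a stuck-at-0 (SA0) cell is pinned at the minimum conductance and a stuck-at-1 (SA1) cell at the maximum, and that a healthy cell can be tuned anywhere in range. These conventions and the helper `closest_weight` are illustrative assumptions, not the paper's formulation. The code enumerates every faulty state of one pair and checks whether a zero weight is still exactly representable.

```python
from itertools import product
from typing import Optional

# Simplified model (assumption): weight = G_pos - G_neg, conductances in [0, 1];
# SA0 = stuck at G_MIN, SA1 = stuck at G_MAX, None = healthy (fully tunable).
G_MIN, G_MAX = 0.0, 1.0
STUCK = {"SA0": G_MIN, "SA1": G_MAX}

def clip(g: float) -> float:
    """Clamp a conductance to the programmable range."""
    return max(G_MIN, min(G_MAX, g))

def closest_weight(w: float, pos: Optional[str], neg: Optional[str]) -> float:
    """Closest weight G_pos - G_neg the pair can represent given its faults."""
    if pos is None and neg is None:
        return max(G_MIN - G_MAX, min(G_MAX - G_MIN, w))
    if pos is not None and neg is not None:
        return STUCK[pos] - STUCK[neg]
    if pos is not None:              # G_pos is stuck, so only G_neg can be tuned
        g = STUCK[pos]
        return g - clip(g - w)
    g = STUCK[neg]                   # G_neg is stuck, so only G_pos can be tuned
    return clip(w + g) - g

# Enumerate every faulty state of the differential pair.
states = [None, "SA0", "SA1"]
faulty = [(p, n) for p, n in product(states, states) if (p, n) != (None, None)]
zero_ok = [(p, n) for p, n in faulty if closest_weight(0.0, p, n) == 0.0]
print(f"{len(zero_ok)} of {len(faulty)} fault combinations can store a zero weight")
# prints: 6 of 8 fault combinations can store a zero weight
```

Under this toy model, only the mismatched pairs (one cell SA0, the other SA1) cannot hold an exact zero, which is consistent with the abstract's statement that most stuck-at faults can be absorbed by zero-value weights.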

12:00 | 2.6.2 | AN EFFICIENT YIELD ANALYSIS OF SRAM USING SCALED-SIGMA ADAPTIVE IMPORTANCE SAMPLING
Speaker:
Liang Pang, Southeast University, CN
Authors:
Liang Pang¹, Mengyun Yao² and Yifan Chai¹
¹School of Electronic Science & Engineering, Southeast University, CN; ²School of Microelectronics, Southeast University, CN
Abstract
Statistical SRAM yield analysis has become a growing concern because of SRAM's high integration density and reliability requirements. Estimating the SRAM failure probability efficiently is challenging because circuit failure is a rare event. Existing methods remain insufficient, especially for high-dimensional problems in advanced process nodes. In this paper, we develop scaled-sigma adaptive importance sampling (SSAIS), an extension of adaptive importance sampling. By iteratively searching the failure region, this method adapts not only the location parameters but also the shape parameters of the sampling distribution. Our experiments on a 40nm SRAM cell show that our method outperforms the Monte Carlo method by 1500x and is 2.3x to 5.2x faster than state-of-the-art methods while maintaining sufficient accuracy. Another experiment, on a sense amplifier, shows that our method achieves a 3968x speedup over the Monte Carlo method and a 2.1x to 11x speedup over the other methods.
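The idea of adapting both the location and the shape of the sampling distribution can be illustrated with a minimal sketch of generic cross-entropy-style adaptive importance sampling. This is not the authors' SSAIS algorithm. It assumes d independent standard-normal process parameters and a toy linear failure criterion (the hypothetical `fails` function stands in for a circuit simulation) so that the exact failure probability is known. Starting from an inflated ("scaled") sigma, each iteration re-fits the proposal mean and a single isotropic sigma to the likelihood-weighted failing samples.

```python
import math
import random

def log_pdf(x, mu, sigma):
    """Log density of an isotropic Gaussian N(mu, sigma^2 I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return -0.5 * sq / sigma**2 - d * math.log(sigma * math.sqrt(2 * math.pi))

def fails(x, t):
    """Hypothetical failure criterion: a linear margin exceeds t sigma."""
    return sum(x) / math.sqrt(len(x)) > t

def lr_weights(xs, mu, sigma, t):
    """Likelihood ratio phi(x) / q(x) for failing samples, 0 for passing ones."""
    zero = [0.0] * len(mu)
    return [math.exp(log_pdf(x, zero, 1.0) - log_pdf(x, mu, sigma)) if fails(x, t) else 0.0
            for x in xs]

def adaptive_is(d=6, t=4.0, n=2000, iters=5, sigma0=2.0, seed=0):
    rng = random.Random(seed)
    mu, sigma = [0.0] * d, sigma0            # scaled-sigma starting proposal
    for _ in range(iters):
        xs = [[rng.gauss(m, sigma) for m in mu] for _ in range(n)]
        w = lr_weights(xs, mu, sigma, t)
        total = sum(w)
        if total == 0.0:                     # no failures seen: widen further
            sigma *= 1.5
            continue
        # Re-fit location (mean) and shape (sigma) to the weighted failures.
        mu = [sum(wi * x[k] for wi, x in zip(w, xs)) / total for k in range(d)]
        var = sum(wi * sum((xk - mk) ** 2 for xk, mk in zip(x, mu))
                  for wi, x in zip(w, xs)) / (total * d)
        sigma = max(math.sqrt(var), 0.75)    # sigma^2 > 0.5 keeps the weight variance finite
    xs = [[rng.gauss(m, sigma) for m in mu] for _ in range(n)]
    return sum(lr_weights(xs, mu, sigma, t)) / n

exact = 0.5 * math.erfc(4.0 / math.sqrt(2))  # true P(fail) for this toy criterion
print(f"IS estimate {adaptive_is():.3e} vs exact {exact:.3e}")
```

The whole run uses 12,000 evaluations of `fails`, whereas plain Monte Carlo would need several million samples to estimate a probability of about 3e-5 to 10% relative error.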

12:30 | 2.6.3 | FAST AND ACCURATE HIGH-SIGMA FAILURE RATE ESTIMATION THROUGH EXTENDED BAYESIAN OPTIMIZED IMPORTANCE SAMPLING
Speaker:
Michael Hefenbrock, Karlsruhe Institute of Technology, DE
Authors:
Michael Hefenbrock, Dennis Weller, Michael Beigl and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
Due to aggressive technology downscaling, process variations are becoming predominant, causing performance fluctuations and affecting chip yield. Individual circuit components therefore have to be designed with very small failure rates to guarantee functional correctness and robust operation. However, high-sigma failure rates cannot be assessed with conventional Monte Carlo (MC) methods because of the huge number of time-consuming circuit simulations required. To this end, Importance Sampling (IS) methods were proposed to solve the otherwise intractable failure rate estimation problem by focusing on the most probable failure regions. However, these methods can largely underestimate the failure rate, while the computational effort of deriving it remains high. In this paper, we propose an eXtended Bayesian Optimized IS (XBOIS) method, which addresses these shortcomings by deploying an accurate surrogate model (e.g. of the delay) of the circuit around the failure region. This minimizes the number of costly circuit simulations and substantially improves estimation accuracy through efficient exploration of the variation space. As memory elements in particular occupy a large share of on-chip resources, we evaluate our approach on SRAM cell failure rate estimation. Results show a speedup of about 16x and two orders of magnitude higher failure rate estimation accuracy compared to the best state-of-the-art techniques.
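Why plain Monte Carlo breaks down at high sigma follows from a standard binomial argument: with N independent samples, the estimate of a failure probability p has relative standard error sqrt((1 - p) / (pN)), so reaching a relative error ε needs N = (1 - p) / (p ε²) simulations. The sketch below evaluates this for one-sided k-sigma tails of a standard normal. It is a back-of-the-envelope illustration, not part of the XBOIS method, and the function names are hypothetical.

```python
import math

def one_sided_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z, i.e. the failure rate at k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def mc_samples_needed(p: float, rel_err: float) -> float:
    """Plain MC samples needed so the estimate's relative std error equals rel_err."""
    return (1 - p) / (p * rel_err**2)

for k in (3, 4, 5, 6):
    p = one_sided_tail(k)
    print(f"{k} sigma: p = {p:.2e}, MC samples for 10% error = {mc_samples_needed(p, 0.1):.1e}")
```

At 6 sigma (p of about 1e-9) this comes to roughly 10^11 circuit simulations for a 10% relative error, which is the cost that surrogate models and importance sampling aim to avoid.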

12:45 | 2.6.4 | VALID WINDOW: A NEW METRIC TO MEASURE THE RELIABILITY OF NAND FLASH MEMORY
Speaker:
Min Ye, City University of Hong Kong, HK
Authors:
Min Ye¹, Qiao Li¹, Jianqiang Nie², Tei-Wei Kuo¹ and Chun Jason Xue¹
¹City University of Hong Kong, HK; ²YEESTOR Microelectronics Co., Ltd, CN
Abstract
NAND flash memory is widely adopted in today's storage systems. Its most important issue is reliability, especially for 3D NAND, which suffers from several types of errors. The raw bit error rate (RBER) measured with the default read reference voltages is usually adopted as the reliability metric for NAND flash memory. However, RBER depends closely on how the data is read, and varies greatly if read retry operations are conducted with tuned read reference voltages. In this work, we propose a new, stable and accurate reliability metric: the valid window. A valid window expresses the size of the error regions between two neighboring levels and determines whether the data can be read correctly with further read retries. Taking advantage of these features, we design a method that reduces the number of read retry operations by adjusting the program operations of 3D NAND flash memories. Experiments on a real 3D NAND flash chip verify the effectiveness of the proposed method.
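The point that RBER depends on how the data is read can be illustrated with a toy example. The sketch assumes a single read reference voltage separating two neighboring levels, invented threshold-voltage samples in which the upper level has drifted down, a hypothetical default reference of 2.0 V, and a hypothetical list of read-retry candidates. It does not implement the paper's valid-window metric, whose exact computation is not given in the abstract.

```python
def bit_errors(lower, upper, vref):
    """Cells misread when one reference voltage separates two neighboring levels."""
    # Lower-level cells sensed at or above vref, plus upper-level cells sensed below it.
    return sum(v >= vref for v in lower) + sum(v < vref for v in upper)

def rber(lower, upper, vref):
    """Raw bit error rate for this pair of levels at the given reference voltage."""
    return bit_errors(lower, upper, vref) / (len(lower) + len(upper))

# Hypothetical threshold-voltage samples (volts); the upper level has drifted
# down, e.g. after a retention period.
lower = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
upper = [1.6, 1.7, 1.8, 1.9, 2.1, 2.3]
default_vref = 2.0                      # assumed default read reference
retry_vrefs = [1.3, 1.5, 1.7, 2.0]      # assumed read-retry candidates

tuned = min(retry_vrefs, key=lambda v: bit_errors(lower, upper, v))
print(f"RBER at default {rber(lower, upper, default_vref):.3f}, "
      f"after retry at {tuned} V {rber(lower, upper, tuned):.3f}")
```

The same cells show an RBER of 0.333 at the default reference and 0.000 after retry, so an RBER figure quoted without its read conditions says little about the cells themselves, which is the motivation for a read-independent metric.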

13:00 | IP1-8, 110 | BINARY LINEAR ECCS OPTIMIZED FOR BIT INVERSION IN MEMORIES WITH ASYMMETRIC ERROR PROBABILITIES
Speaker:
Valentin Gherman, CEA, FR
Authors:
Valentin Gherman, Samuel Evain and Bastien Giraud, CEA, FR
Abstract
Many memory types are asymmetric with respect to the error vulnerability of stored 0s and 1s. For instance, DRAM, STT-MRAM and NAND flash memories may suffer from asymmetric error rates. A recently proposed error-protection scheme inverts memory words that contain too many vulnerable values before they are stored in an asymmetric memory. In this paper, we propose a method for optimizing systematic binary linear block error-correcting codes to maximize their impact when combined with memory word inversion.
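The word-inversion step that the optimized codes are combined with can be sketched as follows. This assumes, for illustration only, that stored 1s are the vulnerable value and that a word is inverted when more than half of its bits are 1; the helper names are hypothetical, and the sketch does not include the paper's ECC optimization.

```python
from typing import Tuple

# Assumption for illustration: stored 1s are the error-prone (vulnerable) value.
def invert_if_needed(word: int, width: int) -> Tuple[int, int]:
    """Return (stored_word, flag); invert when more than half the bits are 1."""
    mask = (1 << width) - 1
    if bin(word & mask).count("1") > width // 2:
        return word ^ mask, 1
    return word & mask, 0

def restore(stored: int, flag: int, width: int) -> int:
    """Undo the inversion on read using the stored flag bit."""
    return stored ^ ((1 << width) - 1) if flag else stored

stored, flag = invert_if_needed(0b11110001, 8)
print(f"{stored:08b} flag={flag}")  # 00001110 flag=1
```

A real design also has to store the flag bit alongside the word and protect it, since a corrupted flag would invert the whole word on read.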

13:01 | IP1-9, 634 | BELDPC: BIT ERRORS AWARE ADAPTIVE RATE LDPC CODES FOR 3D TLC NAND FLASH MEMORY
Speaker:
Meng Zhang, Huazhong University of Science & Technology, CN
Authors:
Meng Zhang, Fei Wu, Qin Yu, Weihua Liu, Lanlan Cui, Yahui Zhao and Changsheng Xie, Huazhong University of Science & Technology, CN
Abstract
Three-dimensional (3D) NAND flash memory achieves high capacity and cell storage density through multi-bit technology and a vertical stack architecture, but its data reliability degrades due to high raw bit error rates (RBER) caused by program/erase (P/E) cycles and retention periods. Low-density parity-check (LDPC) codes have become popular error-correcting technologies for improving data reliability thanks to their strong error correction capability, but they require more decoding iterations at higher RBER. To reduce decoding iterations, this paper proposes BeLDPC: bit errors aware adaptive rate LDPC codes for 3D triple-level cell (TLC) NAND flash memory. First, bit error characteristics in 3D charge trap TLC NAND flash memory, including asymmetric bit flipping and the temporal locality of bit errors, are studied on a real FPGA testing platform. Then, based on these characteristics, a high-efficiency LDPC code is designed. Experimental results show that BeLDPC reduces decoding iterations under different P/E cycles and retention periods.
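Asymmetric bit flipping, one of the characteristics the authors measure, can be quantified by comparing the data programmed into a page with the data read back. The sketch below is a minimal, hypothetical helper (`flip_counts` is not from the paper) that counts flips in each direction. A strong imbalance between the two counts is what "asymmetric" refers to.

```python
from typing import Tuple

def flip_counts(written: bytes, read: bytes) -> Tuple[int, int]:
    """Count 0->1 and 1->0 bit flips between the data written and read back."""
    zero_to_one = one_to_zero = 0
    for w, r in zip(written, read):
        diff = w ^ r                                  # bits that changed
        zero_to_one += bin(diff & r).count("1")       # was 0, read as 1
        one_to_zero += bin(diff & w).count("1")       # was 1, read as 0
    return zero_to_one, one_to_zero

written = bytes([0b11110000, 0b10101010])
read = bytes([0b11010001, 0b10101011])
print(flip_counts(written, read))  # (2, 1)
```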

13:00 | End of session