10.3 From Non-Volatile Flip-Flops to Storage Systems

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 1

Chair:
Alexandre Levisse, EPFL, CH

Co-Chair:
Weisheng Zhao, Beihang University, CN

This session combines research on non-volatile memories from the circuit level to the system level. The first paper proposes an STT-MRAM-based multi-bit non-volatile flip-flop. The other papers address system-level challenges, such as write disturbance mitigation, wear leveling and latency reduction, for various technologies (PCM, flash-based SSDs).

Time  Label  Presentation Title / Authors
11:00  10.3.1  (Best Paper Award Candidate)
MULTI-BIT NON-VOLATILE SPINTRONIC FLIP-FLOP
Speaker:
Rajendra Bishnoi, Karlsruhe Institute of Technology, DE
Authors:
Christopher Münch, Rajendra Bishnoi and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
As leakage increases proportionally with technology downscaling, it becomes extremely challenging to meet the total power budget. This is because CMOS-based logic blocks cannot be completely power-gated, as their flip-flops always require a retention supply. Alternatively, their data can be stored in a separate memory during standby mode; however, that results in a huge area and energy overhead. Spin Transfer Torque (STT) based non-volatile flip-flops can offer normally-off/instant-on computing features to reduce leakage by complete power shut-down, without the need to transfer and restore system states before and after the power-down phases. The non-volatile component of such flip-flops can be easily shared for overall design optimization. In this paper, we design a unique multi-bit non-volatile flip-flop architecture using STT devices to reduce the area and energy costs associated with non-volatile components. This architecture is developed based on the resource sharing principle, using a custom design technique that enables optimization of area and energy consumption. We have developed a framework in which we replace neighboring conventional flip-flops in the layout with our proposed multi-bit non-volatile designs. Results show that, at system level, our proposed multi-bit flip-flop architecture significantly improves area and energy compared to standard single-bit non-volatile flip-flop designs.

11:30  10.3.2  ADAM: ARCHITECTURE FOR WRITE DISTURBANCE MITIGATION IN SCALED PHASE CHANGE MEMORY
Speaker:
Shivam Swami, University of Pittsburgh, US
Authors:
Shivam Swami and Kartik Mohanram, University of Pittsburgh, US
Abstract
With technology scaling, phase change memory (PCM) has become highly vulnerable to write disturbance (WD) errors. A PCM WD error occurs when a cell write dissipates heat to idle cells in the same/adjacent word lines (WLs), disturbing the states of those cells. Whereas state-of-the-art solutions, e.g., data insulation (DIN) and super dense PCM (SD-PCM), have successfully addressed WL PCM WD errors, reducing (i) bit line (BL) WD errors and (ii) the performance penalties of aggregate (WL+BL) WD error recovery remain areas of active research and development. Architecture for Write DisturbAnce Mitigation, ADAM, is a low-cost, high-performance pattern-based data compression and alignment solution to reduce the aggregate (WL+BL) WD error rate in PCM. Without affecting inter-cell spacing, ADAM increases the lateral separation between the cells storing useful data in adjacent WLs, ensuring that the heat dissipated to adjacent WLs minimally impacts the cells storing useful data. At the cost of one compression tag bit per 512-bit cache line, ADAM provides an effective solution to reduce the number of WL and BL cells vulnerable to WD errors. ADAM also integrates a novel Deferred WD Correction scheme, DEFT, that opportunistically defers latency-intensive WD error recovery of cached data in the adjacent WLs without impacting memory reliability. ADAM is evaluated on single-/multi-level cell (SLC/MLC) PCM using the SPEC CPU2006 benchmarks. Results for SLC (MLC) PCM show that, in comparison to state-of-the-art SD-PCM, ADAM reduces the aggregate WD error rate by 32% (60%); this translates to a 50% (61%) reduction in error correction energy and a 7% (15%) improvement in system performance.

12:00  10.3.3  PROGRAM ERROR RATE-BASED WEAR LEVELING FOR NAND FLASH MEMORY
Speaker:
Fei Wu, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CN
Authors:
Xin Shi¹, Fei Wu¹, Shunzhuo Wang¹, Changsheng Xie¹ and Zhonghai Lu²
¹Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CN; ²KTH Royal Institute of Technology, SE
Abstract
Wear leveling has become a fundamental issue in the design of Solid State Disks (SSDs) based on NAND flash memory. Existing schemes aim to equalize the number of program/erase (P/E) cycles and the raw bit error rates (BER) among all the flash blocks. However, due to fabrication process variation, different blocks of the same flash chip usually differ widely in endurance in terms of BER and program error rate (PER). Such conventional designs cannot obtain the wear status of flash blocks precisely. This paper proposes PER-WL, an efficient PER-based wear leveling scheme that uses PER statistics as the measure of the wear-out pace of flash blocks and performs block data swapping to improve wear leveling efficiency. In our evaluation with four realistic workloads, compared with two state-of-the-art wear leveling schemes, PER-WL on average reduces the variance of the program error rate by 17% and 9% and the program error rate itself by 8% and 3%, at the cost of 5% and 2% system performance degradation, respectively.
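
To make the idea of PER-driven block data swapping concrete, the following minimal Python sketch may help. It is not the authors' PER-WL algorithm: the `Block` record, the `maybe_swap` helper and the swap threshold are hypothetical names and parameters chosen for illustration only. The sketch estimates each block's PER from counters and, when the gap between the most and least worn blocks exceeds the threshold, exchanges their data.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block bookkeeping (not taken from the paper)."""
    block_id: int
    program_ops: int      # program operations issued to this block
    program_errors: int   # program operations that reported an error
    data: str             # label for the data currently stored here

    @property
    def per(self) -> float:
        """Observed program error rate; 0.0 if the block was never programmed."""
        return self.program_errors / self.program_ops if self.program_ops else 0.0


def maybe_swap(blocks, threshold):
    """Swap data between the highest-PER and lowest-PER blocks.

    The swap only happens when the PER gap exceeds `threshold`
    (an assumed tuning parameter). Returns the pair of block ids
    that were swapped, or None if no swap was needed.
    """
    worn = max(blocks, key=lambda b: b.per)
    fresh = min(blocks, key=lambda b: b.per)
    if worn.per - fresh.per <= threshold:
        return None
    worn.data, fresh.data = fresh.data, worn.data
    return worn.block_id, fresh.block_id


blocks = [
    Block(0, 1000, 50, "hot"),   # PER = 0.05
    Block(1, 1000, 5, "cold"),   # PER = 0.005
    Block(2, 1000, 20, "warm"),  # PER = 0.02
]
swapped = maybe_swap(blocks, threshold=0.01)
```

In this toy setup the frequently updated ("hot") data ends up on the block that is wearing out most slowly, which is the general intent of swapping based on wear status rather than on P/E counts alone.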

12:15  10.3.4  SHADOWGC: COOPERATIVE GARBAGE COLLECTION WITH MULTI-LEVEL BUFFER FOR PERFORMANCE IMPROVEMENT IN NAND FLASH-BASED SSDS
Speaker:
Jinhua Cui, Xi'an Jiaotong University, CN
Authors:
Jinhua Cui¹, Youtao Zhang², Jianhang Huang¹, Weiguo Wu¹ and Jun Yang²
¹Xi'an Jiaotong University, CN; ²University of Pittsburgh, US
Abstract
Garbage collection (GC), an essential background activity in NAND flash-based Solid-State Drives (SSDs), often introduces large runtime overhead. Recent studies showed that it is beneficial to separate the flash pages that have dirty copies in the write buffers from those that do not. However, the existing schemes exploring this observation have limitations that prevent them from maximizing the performance improvement. In this paper, we address the above challenge through ShadowGC, a novel GC design that exploits the pages in both host-side and device-side write buffers and adopts different read and write strategies to minimize the GC overhead. When garbage collecting flash pages that have dirty copies in the device-side write buffer, ShadowGC reads data from the write buffer such that the page relocation operations in GC merge with the write-back operations of the buffer. When garbage collecting flash pages that have dirty copies in the host-side write buffer, ShadowGC moves them to dedicated blocks and speeds up the movement with fast-write operations. Our experimental results show that, on average, ShadowGC reduces the write amplification by 16.2% and the GC latency by 20.5% over the state-of-the-art.

12:30  IP4-13, 755  POWER OPTIMIZATION THROUGH PERIPHERAL CIRCUIT REUSING INTEGRATED WITH LOOP TILING FOR RRAM CROSSBAR-BASED CNN
Authors:
Yuanhui Ni, Weiwen Chen and Keni Qiu, Capital Normal University, CN
Abstract
Convolutional neural networks (CNNs) have been widely adopted in modern embedded systems to make predictions on large amounts of data. Prior studies have shown that convolutional computations, which consist of large numbers of multiply-and-accumulate (MAC) operations, are the most computationally expensive portion of a CNN. Compared with executing MAC operations on GPUs and FPGAs, CNN implementation in an RRAM crossbar-based computing system (RCS) offers the advantages of high performance and low power. However, in current designs energy is unbalanced among the three parts, namely RRAM crossbar computation, peripheral circuits and memory accesses, and the latter two can significantly limit the potential gains of RCS. To address the high power overhead of peripheral circuits in RCS, this paper adopts a Peripheral Circuit Unit (PeriCU)-Reuse scheme to meet a given power budget. The underlying idea is to focus on the expensive AD/DA converters and arrange for multiple convolution layers to be served sequentially by the same PeriCU. Furthermore, it is observed that memory accesses can be bypassed if two adjacent layers are assigned to different PeriCUs. A loop tiling technique is then proposed to further improve the energy and throughput of RCS. Experiments on two convolutional applications validate that the PeriCU-Reuse scheme integrated with loop tiling can efficiently meet the power requirement and further reduce energy consumption by 61.7%.
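
For readers unfamiliar with loop tiling, the generic transformation can be sketched as follows. This is plain software loop tiling applied to a small 2-D convolution, not the RRAM-specific scheme of the paper; the function names and the `tile` parameter are assumptions made for illustration. The tiled version visits the output in `tile` x `tile` patches so that each patch reuses a small window of the input, and it must produce exactly the same result as the untiled loop nest.

```python
def conv2d(image, kernel):
    """Reference (untiled) valid 2-D convolution in cross-correlation form."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            out[y][x] = sum(image[y + i][x + j] * kernel[i][j]
                            for i in range(kh) for j in range(kw))
    return out


def conv2d_tiled(image, kernel, tile):
    """Same computation, with the output loops split into tile x tile patches."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for ty in range(0, oh, tile):            # loop over tile rows
        for tx in range(0, ow, tile):        # loop over tile columns
            # Inner loops stay inside one tile (clipped at the borders).
            for y in range(ty, min(ty + tile, oh)):
                for x in range(tx, min(tx + tile, ow)):
                    acc = 0
                    for i in range(kh):
                        for j in range(kw):
                            acc += image[y + i][x + j] * kernel[i][j]
                    out[y][x] = acc
    return out


image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 0], [0, 1]]
reference = conv2d(image, kernel)
tiled = conv2d_tiled(image, kernel, tile=2)
```

Tiling changes only the order in which outputs are computed. Any energy or throughput benefit depends on how the tiles map onto the underlying hardware, which this sketch does not model.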

12:31  IP4-14, 697  ORIENT: ORGANIZED INTERLEAVED ECCS FOR NEW STT-RAM CACHES
Speaker:
Hamed Farbeh, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Authors:
Zahra Azad¹, Hamed Farbeh² and Amir Mahdi Hosseini Monazzah³
¹Sharif University of Technology, IR; ²School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR; ³Department of Computer Engineering, Sharif University of Technology, Tehran, Iran, IR
Abstract
Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising alternative to SRAM in cache memories. However, STT-MRAM suffers from a high probability of write errors due to its stochastic switching behavior. To correct the write errors, Error-Correcting Codes (ECCs) used in SRAM caches are conventionally employed. A cache line consists of several codewords, and the data bits of each codeword are selected in such a way that the maximum correction capability is provided based on the error patterns in SRAMs. However, the different write error patterns in STT-MRAM caches make conventional ECC configurations inefficient. In this paper, we first investigate the efficiency of ECC configurations and demonstrate that the vulnerability of codewords in a cache line varies by up to 17x. This variation means that, while some codewords are overprotected, others are highly likely to experience uncorrectable errors. Then, we propose an ECC bit selection scheme, called ORIENT, to reduce the vulnerability variation of codewords to 1.4x. The simulation results show that the conventional ECC configuration increases the write error rate by up to about 64.4% compared with the optimum ECC bit selection, whereas this value for ORIENT is only 4.5%.
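
The notion of "ECC bit selection" can be illustrated with two generic ways of assigning the bits of a cache line to codewords. The sketch below is not ORIENT, and the line size, codeword count and helper names are assumptions chosen for illustration. It only shows why the assignment matters: with the same set of erroneous bit positions, a contiguous mapping and an interleaved mapping load the codewords very differently.

```python
LINE_BITS = 512                                 # assumed cache line size
NUM_CODEWORDS = 8                               # assumed codewords per line
BITS_PER_CODEWORD = LINE_BITS // NUM_CODEWORDS  # 64 data bits each


def contiguous_codeword(bit):
    """Bits 0..63 go to codeword 0, bits 64..127 to codeword 1, and so on."""
    return bit // BITS_PER_CODEWORD


def interleaved_codeword(bit):
    """Neighbouring bits go to different codewords (round-robin)."""
    return bit % NUM_CODEWORDS


def errors_per_codeword(error_bits, mapping):
    """Count how many erroneous bits land in each codeword."""
    counts = [0] * NUM_CODEWORDS
    for bit in error_bits:
        counts[mapping(bit)] += 1
    return counts


# Four physically adjacent erroneous bits, the kind of spatial pattern
# that interleaving is typically meant to spread out.
burst = [100, 101, 102, 103]
contiguous = errors_per_codeword(burst, contiguous_codeword)
interleaved = errors_per_codeword(burst, interleaved_codeword)
```

With the contiguous mapping all four errors fall into one codeword, which a single-error-correcting code could not repair, while the interleaved mapping places at most one error in each codeword. A mapping tuned for such adjacent-bit patterns is not necessarily the best choice when the error pattern is different, which is the mismatch the abstract describes for STT-MRAM write errors.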

12:32  IP5-16, 594  EFFICIENT WEAR LEVELING FOR INODES OF FILE SYSTEMS ON PERSISTENT MEMORIES
Speaker:
Xianzhang Chen, Chongqing University, CN
Authors:
Xianzhang Chen¹, Edwin Sha², Yuansong Zeng¹, Chaoshu Yang¹, Weiwen Jiang¹ and Qingfeng Zhuge³
¹Chongqing University, CN; ²Chongqing University, US; ³East China Normal University, CN
Abstract
Existing persistent memory file systems achieve high-performance file accesses by exploiting advanced characteristics of persistent memories (PMs), such as PCM. However, they ignore the limited endurance of PMs. In particular, the frequently updated inodes are stored at fixed locations throughout their lifetime, so common file operations can easily wear out the PM. To address this issue, we propose a new mechanism, Virtualized Inode (VInode), for the wear leveling of inodes in persistent memory file systems. In VInode, we develop an algorithm called Pages as Communicating Vessels (PCV) to efficiently find and migrate the heavily written inodes. We implement VInode in SIMFS, a typical persistent memory file system. Experiments are conducted with well-known benchmarks. Compared with the original SIMFS, experimental results show that VInode can reduce the maximum value and the standard deviation of the write counts of pages by factors of 1800 and 6200, respectively.

12:30  End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00