12.5 Lifetime Improvement for Persistent Memory


Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 3

Chair:
Arne Heittmann, RWTH, DE

Co-Chair:
Chengmo Yang, University of Delaware, US

This session concentrates on methods for prolonging the lifetime of persistent main memory. The papers in this session promote novel approaches to mitigate the adverse impact of writes from the perspectives of reference-frequency-based cache replacement, frequent pattern compression, SLC/MLC hybrid organization, inode virtualization, and Android application-specific address mapping.

Time  Label  Presentation Title / Authors
16:00  12.5.1  EXTENDING THE LIFETIME OF NVMS WITH COMPRESSION
Speaker:
Jie Xu, Huazhong University of Science and Technology, CN
Authors:
Jie Xu, Dan Feng, Yu Hua, Wei Tong, Jingning Liu and Chunyan Li, Wuhan National Lab for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, CN
Abstract
Emerging Non-Volatile Memories (NVMs) such as Phase Change Memory (PCM) and Resistive RAM (RRAM) are promising candidates to replace traditional DRAM technology. However, they suffer from limited write endurance and high write energy consumption. Encoding methods such as Flip-N-Write, FlipMin and CAFO can reduce the bit flips of NVMs by exploiting additional capacity to store their tag bits, so their effectiveness is limited by the capacity overhead of the tag bits. In this paper, we propose COE to COmpress cachelines for Extending the lifetime of NVMs. COE exploits the space saved by compression to store the tag bits of data encoding methods. By combining data compression techniques with data encoding methods, COE can reduce the bit flips with negligible capacity overhead. We further observe that the space saved per compressed cacheline varies, and that different encoding methods offer different tradeoffs between capacity overhead and effectiveness. To fully exploit the space saved by compression for improving lifetime, we select the proper encoding method according to the saved space size. Experimental results show that our scheme can reduce the bit flips by 14.2%, decrease the energy consumption by 11.8% and improve the lifetime by 27.5% with only 0.2% capacity overhead.
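
As an illustration of the bit-flip arithmetic behind such encoding schemes, the sketch below shows a Flip-N-Write-style word encoder and a COE-style policy that applies it only when compression frees enough room in the cacheline for the tag bits. This is a minimal sketch, not the authors' implementation; the word size, the one-tag-bit-per-word budget and the compress() callback are illustrative assumptions.

    def popcount(x):
        # Number of differing bits, i.e. the number of cells that must change state.
        return bin(x).count("1")

    def flip_n_write(old_words, new_words, word_bits=32):
        # Flip-N-Write per word: store the word inverted whenever inversion
        # causes fewer bit flips against the old contents; one tag bit per
        # word records whether the stored value is inverted.
        mask = (1 << word_bits) - 1
        stored, tags, flips = [], [], 0
        for old, new in zip(old_words, new_words):
            direct = popcount(old ^ new)
            inverted = popcount(old ^ (new ^ mask))
            if inverted < direct:
                stored.append(new ^ mask); tags.append(1); flips += inverted
            else:
                stored.append(new); tags.append(0); flips += direct
        return stored, tags, flips

    def coe_write(old_words, new_words, compress, word_bits=32):
        # COE-style policy (sketch): encode only if compression saves at least
        # one tag bit per word inside the cacheline; otherwise write directly.
        saved_bits = (len(new_words) - len(compress(new_words))) * word_bits
        if saved_bits >= len(new_words):
            return flip_n_write(old_words, new_words, word_bits)
        flips = sum(popcount(o ^ n) for o, n in zip(old_words, new_words))
        return new_words, None, flips

For instance, with a toy zero-elimination compressor such as compress = lambda ws: [w for w in ws if w != 0], a single all-zero word in a 16-word cacheline already frees enough space to hold the tags in this sketch.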

16:30  12.5.2  HETEROGENEOUS PCM ARRAY ARCHITECTURE FOR RELIABILITY, PERFORMANCE AND LIFETIME ENHANCEMENT
Speaker:
Taehyun Kwon, Sungkyunkwan University, KR
Authors:
Taehyun Kwon1, Muhammad Imran1, Jung Min You2 and Joon-Sung Yang1
1Sungkyunkwan University, KR; 2Sungkyunkwan University, KR
Abstract
Conventional DRAM and flash memory are reaching their scaling limits, thus motivating research into various emerging memory technologies as potential replacements. Among these, phase change memory (PCM) has received considerable attention owing to its high scalability and multi-level cell (MLC) operation for high storage density. However, due to the resistance drift over time, the soft error rate in MLC PCM is high. Additionally, the iterative programming in MLC negatively impacts performance and cell endurance. The conventional methods to overcome the drift problem incur large overheads, impact memory lifetime and are inadequate in terms of acceptable soft error rate (SER). In this paper, we propose a new PCM memory architecture with heterogeneous PCM arrays to increase reliability, performance and lifetime. The basic storage unit in the proposed architecture consists of two single-level cells (SLCs) and one four-level cell (4LC). Owing to the reduced number of 4LCs compared to conventional homogeneous 4LC PCM arrays, the drift-induced error rate is considerably reduced. By alternating each cell's operation between SLC and 4LC over time, the overall lifetime can also be significantly enhanced. The proposed architecture achieves up to 10^5 times lower soft error rate with considerably less ECC overhead. With a simple ECC scheme, about 22% performance improvement is achieved, and the overall lifetime is enhanced by about 57%.
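
The rotation idea can be pictured with a small sketch: a storage unit of three PCM cells holds four data bits, two cells acting as SLCs and one as a 4LC, and the 4LC role moves to a different cell after a fixed number of writes so the extra wear of iterative MLC programming is spread across all three. A minimal sketch follows; the class name, bit-to-cell mapping and rotation period are assumptions for illustration, not the paper's design parameters.

    class HeterogeneousUnit:
        # Sketch of a 2xSLC + 1x4LC storage unit (4 data bits per unit).
        ROTATE_EVERY = 1024  # assumed number of writes between role changes

        def __init__(self):
            self.cells = [0, 0, 0]   # stored level of each physical cell
            self.writes = 0
            self.mlc_index = 0       # cell currently operating as the 4LC

        def write(self, value4):
            self.writes += 1
            if self.writes % self.ROTATE_EVERY == 0:
                # Rotate the MLC role before storing so later reads stay consistent.
                self.mlc_index = (self.mlc_index + 1) % 3
            slc = [i for i in range(3) if i != self.mlc_index]
            self.cells[self.mlc_index] = (value4 >> 2) & 0b11  # two MSBs in the 4LC
            self.cells[slc[0]] = (value4 >> 1) & 0b1
            self.cells[slc[1]] = value4 & 0b1

        def read(self):
            slc = [i for i in range(3) if i != self.mlc_index]
            return (self.cells[self.mlc_index] << 2) | (self.cells[slc[0]] << 1) | self.cells[slc[1]]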

17:00  12.5.3  AN EFFICIENT PCM-BASED MAIN MEMORY SYSTEM VIA EXPLOITING FINE-GRAINED DIRTINESS OF CACHELINES
Speaker:
Jie Xu, Huazhong University of Science and Technology, CN
Authors:
Jie Xu1, Dan Feng2, Yu Hua2, Wei Tong2, Jingning Liu2, Chunyan Li2 and Zheng Li3
1Wuhan National Lab for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, CN; 2Wuhan National Lab for Optoelectronics, CN; 3WNLO, CN
Abstract
Phase Change Memory (PCM) has the potential to replace traditional DRAM memory due to its better scalability and non-volatility. However, PCM also suffers from high write latency and energy consumption. To mitigate the write overhead of PCM-based main memory, we propose a Fine-grained Dirtiness Aware (FDA) last-level cache (LLC) victimization scheme. The key idea of FDA is to preferentially evict cachelines with fewer dirty words (a word is dirty if it has been modified) when victimizing dirty cachelines. FDA exploits two key observations. First, the write service time of a cacheline is proportional to the number of dirty words. Second, a cacheline with fewer dirty words has the same or lower reference frequency compared with other dirty cachelines. Therefore, evicting cachelines with fewer dirty words can reduce the write service time of cachelines without increasing the miss rate. To reduce the write service time of cachelines, FDA evicts the cacheline with the fewest dirty words when victimizing dirty cachelines. We also present FDARP, which further decreases the miss rate by synergizing the number of dirty words with the Re-reference Prediction Value. Experimental results show that FDA (FDARP) can improve the IPC performance by 8.3% (14.8%), decrease the write service time of cachelines by 37.0% (36.3%) and reduce the write energy consumption of PCM by 27.0% (32.5%) under the mixed benchmarks.
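
A minimal sketch of the two eviction policies described above, assuming a cacheline is represented by its set of dirty word indices and, for FDARP, an RRIP re-reference prediction value; the data structure and the tie-breaking details are illustrative, not the paper's exact mechanism.

    from dataclasses import dataclass, field

    @dataclass
    class CacheLine:
        tag: int
        rrpv: int = 0                                    # re-reference prediction value (RRIP)
        dirty_words: set = field(default_factory=set)    # indices of modified words

    def fda_victim(candidates):
        # FDA sketch: prefer clean lines; if every candidate is dirty, evict the
        # one with the fewest dirty words to minimise PCM write service time.
        clean = [l for l in candidates if not l.dirty_words]
        if clean:
            return clean[0]
        return min(candidates, key=lambda l: len(l.dirty_words))

    def fdarp_victim(candidates):
        # FDARP sketch: pick the line predicted to be re-referenced furthest in
        # the future (highest RRPV), breaking ties toward fewer dirty words.
        return max(candidates, key=lambda l: (l.rrpv, -len(l.dirty_words)))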

17:15  12.5.4  DFPC: A DYNAMIC FREQUENT PATTERN COMPRESSION SCHEME IN NVM-BASED MAIN MEMORY
Speaker:
Yuncheng Guo, Huazhong University of Science and Technology, CN
Authors:
Yuncheng Guo, Yu Hua and Pengfei Zuo, Huazhong University of Science and Technology, CN
Abstract
Non-volatile memory technologies (NVMs) are promising candidates as the next-generation main memory due to their high scalability and low energy consumption. However, performance bottlenecks, such as high write latency and low cell endurance, still exist in NVMs. To address these problems, frequent pattern compression schemes have been widely used, which, however, suffer from a lack of flexibility and adaptability. To overcome these shortcomings, we propose a well-adaptive NVM write scheme, called Dynamic Frequent Pattern Compression (DFPC), to significantly reduce the number of write units and extend the lifetime. Instead of only using the static frequent patterns of existing FPC schemes, which are pre-defined and not always efficient for all applications, the idea behind DFPC is to exploit the characteristics of the data distribution during execution to obtain dynamic patterns, which often appear in real-world applications. To further improve the compression ratio, we exploit the value locality in a cache line to extend the granularity of dynamic patterns. Hence, DFPC can encode the contents of cache lines with more kinds of frequent data patterns. We implement DFPC in GEM5 with NVMain and execute 8 applications from SPEC CPU2006 to evaluate our scheme. Experimental results demonstrate the efficacy and efficiency of DFPC.
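
To make the "dynamic pattern" idea concrete, the sketch below profiles the word values seen during execution, promotes the most frequent ones to short codes, and estimates the resulting compressed size. It is a simplified illustration only; the sampling interface, code widths and pattern count are assumptions, not the DFPC encoding itself.

    from collections import Counter

    class DynamicPatternCompressor:
        # Sketch: learn frequent word values at run time and encode matching
        # words with a short code plus a one-bit flag; other words are stored
        # verbatim with the flag cleared.
        def __init__(self):
            self.histogram = Counter()
            self.patterns = {}           # word value -> short code

        def sample(self, words):
            # Profile the values actually written during execution.
            self.histogram.update(words)

        def rebuild_patterns(self, num_patterns=8):
            # Periodically promote the most frequent values to dynamic patterns.
            top = [v for v, _ in self.histogram.most_common(num_patterns)]
            self.patterns = {v: code for code, v in enumerate(top)}

        def compressed_size_bits(self, words, word_bits=32):
            code_bits = max(1, (len(self.patterns) - 1).bit_length()) + 1  # code + flag
            return sum(code_bits if w in self.patterns else word_bits + 1
                       for w in words)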

17:30  End of session