Enhancing Reliability of STT-MRAM Caches by Eliminating Read Disturbance Accumulation

Elham Cheshmikhani¹,ᵃ, Hamed Farbeh² and Hossein Asadi¹,ᵇ
¹Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
ᵃelham.cheshmikhani@sharif.edu
ᵇasadi@sharif.edu
²Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
farbeh@aut.ac.ir

ABSTRACT


Spin-Transfer Torque Magnetic RAM (STT-MRAM) is one of the most promising replacements for SRAM in on-chip cache memories. It offers higher density and scalability, near-zero leakage power, and non-volatility, but its reliability is threatened by a high read disturbance error rate. Error-Correcting Codes (ECCs) are conventionally used to overcome read disturbance errors in STT-MRAM caches. Employing aggressive ECCs and checking a cache block's ECC on every read access achieves a high level of cache reliability. However, to minimize cache access time, modern processors read all blocks of the target cache set in parallel during tag comparison, and only the requested block, if any, is sent out after its ECC is checked. The other blocks are read but their ECCs are not checked until the processor actually requests them, so read disturbance errors accumulate in them, which significantly degrades cache reliability. In this paper, we first introduce and formulate the read disturbance accumulation phenomenon and show that the accumulation caused by conventional parallel access to cache blocks significantly increases the cache error rate. Then, we propose a simple yet effective scheme, called Read Error Accumulation Preventer cache (REAP-cache), that completely eliminates the accumulation of read disturbances without compromising cache performance. Our evaluations show that REAP-cache extends the cache Mean Time To Failure (MTTF) by 171x, while increasing cache area by less than 1% and energy consumption by only 2.7%.

Keywords: Cache, STT-MRAM, Read Disturbance, Error-Correcting Code (ECC), Error Rate.
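The accumulation effect described in the abstract can be illustrated with a small model. This is a minimal sketch and not the paper's own formulation. It makes several hypothetical assumptions. Each read disturbs each bit independently with probability P_RD. Every bit is susceptible, and a disturbed bit stays flipped until an ECC check corrects the block. The ECC corrects up to T errors per codeword. For one W-way set accessed in parallel, the sketch counts how many reads a requested block has absorbed since its last check. It then converts that count into the probability of an uncorrectable error. All function names and parameter values (including the 523-bit SEC-DED codeword) are illustrative only.

```python
def unchecked_reads_at_check(trace: list[int], ways: int) -> list[int]:
    """For one W-way set, return for each request in `trace` (the way index
    requested) how many reads that block absorbed since its last ECC check,
    including the current read. Hypothetical model: every access reads all
    ways in parallel, and only the requested way is checked and corrected."""
    since_check = [0] * ways
    result = []
    for way in trace:
        for w in range(ways):            # parallel access: every way is read
            since_check[w] += 1
        result.append(since_check[way])  # only the requested block is checked
        since_check[way] = 0             # assumed: the check restores the block
    return result


def bit_error_prob(p_rd: float, k: int) -> float:
    """Probability that one bit was disturbed at least once in k reads,
    assuming independent per-read disturbance and no spontaneous recovery."""
    return 1.0 - (1.0 - p_rd) ** k


def uncorrectable_prob(n_bits: int, t: int, q: float) -> float:
    """Probability that more than t of n_bits are erroneous (binomial model),
    i.e. that a t-error-correcting ECC fails. The upper tail is summed
    directly via the pmf recurrence to avoid cancellation for tiny q."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0 if t < n_bits else 0.0
    ratio = q / (1.0 - q)
    pmf = (1.0 - q) ** n_bits            # P(X = 0); intended for small q
    tail = 0.0
    for i in range(n_bits):
        pmf *= (n_bits - i) / (i + 1) * ratio   # now P(X = i + 1)
        if i + 1 > t:
            tail += pmf
    return tail


if __name__ == "__main__":
    N_BITS, T = 523, 1   # hypothetical: 512 data bits + 11 SEC-DED check bits
    P_RD = 1e-6          # hypothetical per-bit, per-read disturbance probability
    for k in unchecked_reads_at_check([0, 0, 3, 1, 0], ways=4):
        print(f"reads since last check = {k}: "
              f"P(uncorrectable) = {uncorrectable_prob(N_BITS, T, bit_error_prob(P_RD, k)):.3e}")
```

In this model, checking on every read corresponds to k = 1. For a single-error-correcting code, the failure probability grows roughly with the square of the number of unchecked reads k. So a block that sits unrequested while its neighbors in the set are accessed becomes far more likely to fail than the per-read error rate alone would suggest.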
