doi: 10.7873/DATE.2015.0551

System Level Exploration of a STT-MRAM based Level 1 Data-Cache

Manu Perumkunnil Komalan1,2,3, Christian Tenllado1, José Ignacio Gómez Pérez1, Francisco Tirado Fernández1 and Francky Catthoor2,3

1Department of Computer Architecture and Automation, Universidad Complutense de Madrid, Spain

2KU Leuven, Leuven 3000, Belgium

3IMEC, Leuven 3001, Belgium


Since Non-Volatile Memory (NVM) technologies are now being explored extensively as viable replacements for SRAM-based memories in last-level caches (LLCs) and even L2 caches, we take stock of their potential as level 1 (L1) data caches. NVMs such as Spin Torque Transfer RAM (STT-MRAM), Resistive RAM (ReRAM) and Phase Change RAM (PRAM) are not subject to the leakage problems that come with technology scaling. They also show significant area gains and lower dynamic power consumption. A direct drop-in replacement of SRAM by NVMs is, however, still not feasible due to a number of shortcomings, the major ones being latency (write or read) and endurance/reliability. STT-MRAM is increasingly becoming the NVM of choice for high-performance and general-purpose embedded platforms due to characteristics such as low access latency, low power and long lifetime. With advancements in cell technology, and taking into account the stringent reliability and performance requirements of advanced technology nodes, the major bottleneck to the use of STT-MRAM in high-level caches has become read latency (instead of write latency, as previously believed). The main focus of this paper is the exploration of read penalty issues in an NVM-based L1 data cache (D-cache) for an ARM-like single-core general-purpose system. We propose a design method for the STT-MRAM-based D-cache in such a platform. This design addresses the adverse effects of the STT-MRAM read penalty by means of micro-architectural modifications along with code transformations. According to our simulations, appropriate tuning of selected architecture parameters in our proposal, combined with suitable optimizations, can reduce the performance penalty introduced by the NVM from an initial ~54% to a tolerable ~8%.
