5.3 Heterogeneous multi-level caching


Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 1

Chair:
Jeronimo Castrillon, Technische Universität Dresden, DE

Co-Chair:
Lei Ju, Shandong University, CN

This session discusses different aspects of and optimizations for multi-level caching, involving non-volatile memory technologies and embedded systems. The first paper proposes a novel last-level cache (LLC) management scheme for PCM-based main memory. The next paper presents a multi-level cache architecture with time-randomized behaviour, amenable to WCET analysis. The last paper explores the trade-off between retention time, performance, and energy for STT-RAM-based caches.

Time  Label  Presentation Title / Authors
08:30  5.3.1  WALL: A WRITEBACK-AWARE LLC MANAGEMENT FOR PCM-BASED MAIN MEMORY SYSTEMS
Speaker:
Bahareh Pourshirazi, University of Illinois at Chicago, US
Authors:
Bahareh Pourshirazi1, Majed Valad Beigi2, Zhichun Zhu1 and Gokhan Memik2
1University of Illinois at Chicago, US; 2Northwestern University, US
Abstract
In this paper, we propose WALL, a novel writeback-aware LLC management scheme that reduces the number of LLC writebacks and consequently improves the performance, energy efficiency, and lifetime of a PCM-based main memory system. First, we investigate the writeback behavior of LLC sets and show that writebacks are not uniformly distributed among sets; some sets observe much higher writeback rates than others. We then propose a writeback-aware set-balancing mechanism, which employs underutilized LLC sets with few writebacks as auxiliary storage for the evicted dirty lines of sets with frequent writebacks. We also propose a simple and effective writeback-aware replacement policy that avoids evicting writeback blocks that are highly reused after being evicted from the cache. Our experimental results show that WALL achieves an average 26.6% reduction in the total number of LLC writebacks compared to the baseline scheme, which uses the LRU replacement policy. As a result, WALL can reduce memory energy consumption by 19.2% and enhance PCM lifetime by 1.25×, on average, on an 8-core system with a 4 GB PCM main memory running memory-intensive applications.
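
As an aside, the set-balancing idea lends itself to a small sketch. The toy Python model below is only an illustration under simplifying assumptions (the pairing policy, structure names, and the omission of the metadata needed to relocate parked lines are all ours, not the authors'): sets that generate many writebacks park their evicted dirty lines in an underutilized set instead of writing them back to PCM immediately.

    from collections import OrderedDict

    WAYS = 8  # associativity of the toy LLC

    class ToyLLC:
        def __init__(self, num_sets):
            # each set is an LRU-ordered map: tag -> dirty flag
            self.sets = [OrderedDict() for _ in range(num_sets)]
            self.writebacks = [0] * num_sets   # per-set writeback counters
            self.pcm_writes = 0                # writebacks that actually reach PCM

        def _spill_target(self):
            # hypothetical pairing policy: the set with the fewest writebacks so far
            return min(range(len(self.sets)), key=lambda s: self.writebacks[s])

        def access(self, set_idx, tag, is_write):
            ways = self.sets[set_idx]
            if tag in ways:
                ways[tag] = ways[tag] or is_write   # keep dirty if already dirty
                ways.move_to_end(tag)               # LRU update on hit
                return
            if len(ways) >= WAYS:                   # miss in a full set: evict LRU victim
                victim_tag, victim_dirty = ways.popitem(last=False)
                if victim_dirty:
                    self.writebacks[set_idx] += 1
                    spill = self._spill_target()
                    if spill != set_idx and len(self.sets[spill]) < WAYS:
                        # park the dirty line in an underutilized set instead of
                        # writing it to PCM (relocating it later needs extra
                        # metadata, omitted in this toy model)
                        self.sets[spill][victim_tag] = True
                    else:
                        self.pcm_writes += 1        # fall back to a PCM writeback
            ways[tag] = is_write

Driving this model with a synthetic access trace and comparing pcm_writes against a version without spilling gives a feel for how skewed per-set writeback rates translate into saved PCM writes.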

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00  5.3.2  DESIGN AND INTEGRATION OF HIERARCHICAL-PLACEMENT MULTI-LEVEL CACHES FOR REAL-TIME SYSTEMS
Speaker:
Pedro Benedicte, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Pedro Benedicte1, Carles Hernandez2, Jaume Abella3 and Francisco Cazorla4
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES; 3Barcelona Supercomputing Center (BSC-CNS), ES; 4Barcelona Supercomputing Center and IIIA-CSIC, ES
Abstract
Enabling timing analysis in the presence of caches has been pursued by the real-time embedded systems (RTES) community for years due to caches' huge potential to reduce software's worst-case execution time (WCET). However, caches heavily complicate timing analysis due to hard-to-predict access patterns, and few works deal with the time analyzability of multi-level cache hierarchies. For measurement-based timing analysis (MBTA) techniques, widely used in domains such as avionics, automotive, and rail, we propose several cache hierarchies amenable to MBTA. We focus on a probabilistic variant of MBTA (MBPTA) that requires caches with time-randomized behavior whose execution-time variability can be captured in the measurements taken during the system's test runs. For this type of cache, we explore and propose different multi-level cache setups. From those, we choose a cost-effective cache hierarchy that we implement and integrate in a 4-core LEON3 RTL processor model and prototype on an FPGA. Our results show that our proposed setup, implemented in RTL, yields better (reduced) WCET estimates with similar implementation cost and no impact on average performance w.r.t. other MBPTA-amenable setups.
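
For illustration only, the sketch below models in Python the two properties such caches need (cache sizes, latencies, and the trace are invented, and the paper's actual contribution is an RTL multi-level hierarchy, not a software model): random placement that is reseeded per run plus random replacement, so that repeated test runs yield an execution-time distribution MBPTA can analyse.

    import random

    class RandomizedCache:
        def __init__(self, num_sets, ways, seed):
            self.rng = random.Random(seed)
            self.num_sets = num_sets
            self.ways = ways
            self.sets = [set() for _ in range(num_sets)]
            # per-run random placement: a fresh seed changes which set a line maps to
            self.place_seed = self.rng.getrandbits(32)

        def _set_of(self, addr):
            return hash((addr, self.place_seed)) % self.num_sets

        def access(self, addr):
            s = self.sets[self._set_of(addr)]
            if addr in s:
                return True                              # hit
            if len(s) >= self.ways:
                s.remove(self.rng.choice(sorted(s)))     # random replacement
            s.add(addr)
            return False                                 # miss

    def run_once(trace, seed, hit_lat=1, miss_lat=20):
        cache = RandomizedCache(num_sets=16, ways=4, seed=seed)
        return sum(hit_lat if cache.access(a) else miss_lat for a in trace)

    # repeated test runs with different seeds give the execution-time
    # distribution that MBPTA/EVT operates on
    trace = [i % 80 for i in range(1000)]
    samples = sorted(run_once(trace, seed) for seed in range(100))
    print(samples[len(samples) // 2], samples[-1])       # median and worst observed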

Download Paper (PDF; Only available from the DATE venue WiFi)
09:30  5.3.3  LARS: LOGICALLY ADAPTABLE RETENTION TIME STT-RAM CACHE FOR EMBEDDED SYSTEMS
Speaker:
Tosiron Adegbija, University of Arizona, US
Authors:
Kyle Kuan and Tosiron Adegbija, University of Arizona, US
Abstract
STT-RAMs have been studied as a promising alternative to SRAMs in embedded systems' caches and main memories. STT-RAMs are attractive due to their low leakage power and high density; however, they also have the drawbacks of long write latency and high dynamic write energy. A popular solution to these drawbacks relaxes the retention time to lower both write latency and energy, and uses a dynamic refresh scheme that refreshes data blocks to prevent them from prematurely expiring. However, the refreshes can incur overheads, thus limiting the optimization potential. In addition, this solution provides only a single retention time and cannot adapt to applications' variable retention-time requirements. In this paper, we propose the LARS (Logically Adaptable Retention Time STT-RAM) cache as a viable alternative for reducing the write energy and latency. The LARS cache comprises multiple STT-RAM units with different retention times, with only one unit powered on at a given time. LARS dynamically determines which STT-RAM unit to power on at runtime, based on the executing applications' needs. Our experiments show that the LARS cache is low-overhead and can reduce the average energy and latency by 35.8% and 13.2%, respectively, compared to the dynamic refresh scheme.
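
As a rough illustration of the selection logic only (the retention values, the 95% coverage threshold, and the choose_unit helper below are hypothetical, not taken from the paper): a runtime controller could sample how long cache blocks stay live for the current application and power on the shortest-retention unit that still covers most of them.

    # hypothetical relaxed-retention options; the real design points differ
    RETENTION_UNITS_MS = [10, 100, 1000]

    def choose_unit(block_lifetimes_ms, coverage=0.95):
        """Pick the shortest retention time that covers `coverage` of the
        observed block lifetimes; shorter retention means cheaper writes,
        but blocks outliving it need a refresh or an early writeback."""
        lifetimes = sorted(block_lifetimes_ms)
        cutoff = lifetimes[int(coverage * (len(lifetimes) - 1))]
        for retention in RETENTION_UNITS_MS:
            if retention >= cutoff:
                return retention
        return RETENTION_UNITS_MS[-1]

    # an application whose blocks mostly live a few tens of ms is steered to
    # the 100 ms unit rather than the costlier long-retention one
    print(choose_unit([5, 12, 30, 45, 60, 80, 90, 200]))   # -> 100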

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00  IP2-7, 789  FUSIONCACHE: USING LLC TAGS FOR DRAM CACHE
Speaker:
Evangelos Vasilakis, Chalmers University of Technology, SE
Authors:
Evangelos Vasilakis1, Vassilis Papaefstathiou2, Pedro Trancoso1 and Ioannis Sourdis1
1Chalmers University of Technology, SE; 2FORTH-ICS, GR
Abstract
DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D-stacked DRAM. Although they can capture the spatial and temporal data locality of applications, their access latency is still substantially higher than that of conventional on-chip SRAM caches. Moreover, their tag access latency and storage overheads are excessive. Storing tags for a large DRAM cache in SRAM is impractical, as it would occupy a significant fraction of the processor chip. Storing them in the DRAM itself incurs high access overheads. Attempting to cache the DRAM tags on the processor adds a constant delay to the access time. In this paper, we introduce FusionCache, a DRAM cache that offers more efficient tag accesses by fusing DRAM cache tags with the tags of the on-chip Last Level Cache (LLC). We observe that, in an inclusive cache model where DRAM cachelines are multiples of on-chip SRAM cachelines, LLC tags can be re-purposed to access a large part of the DRAM cache contents. Accessing DRAM cache tags then incurs zero additional latency in the common case.
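
A minimal sketch of the tag-fusion idea, assuming the inclusive model described above (the structures, sizes, and method names are invented for illustration): when any LLC tag belonging to the same DRAM-cache block is present on chip, the DRAM-cache lookup can be answered from the LLC tag array without a separate DRAM tag access.

    LINES_PER_DRAM_BLOCK = 32   # a DRAM-cache block spans 32 LLC-sized lines

    class FusedTags:
        def __init__(self):
            self.llc_lines = set()    # LLC-line indices currently cached on chip
            self.dram_blocks = set()  # DRAM-cache blocks actually resident

        def install_in_llc(self, line):
            # inclusive model: a line present in the LLC implies its enclosing
            # DRAM-cache block is resident in the stacked DRAM
            self.dram_blocks.add(line // LINES_PER_DRAM_BLOCK)
            self.llc_lines.add(line)

        def dram_cache_lookup(self, line):
            block = line // LINES_PER_DRAM_BLOCK
            base = block * LINES_PER_DRAM_BLOCK
            # if any sibling line of the same DRAM-cache block has an LLC tag,
            # residency is decided from on-chip tags with no extra latency
            if any(base + i in self.llc_lines for i in range(LINES_PER_DRAM_BLOCK)):
                return True
            return block in self.dram_blocks   # otherwise consult DRAM-side tags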

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00  End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served in the exhibition area (Terrace Level of the ICCD) during the coffee breaks at the times listed below.

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00