2.3 Cache memory management for performance and reliability


Date: Tuesday 28 March 2017
Time: 11:30 - 13:00
Location / Room: 2BC

Chair:
Dionisios Pnevmatikatos, Technical University of Crete, GR

Co-Chair:
Cristina Silvano, Politecnico di Milano, IT

Cache memory design optimizations and management can have a significant effect on cost, performance, and reliability. The first paper proposes an asymmetric cache management policy for GPGPUs with hybrid main memories that significantly improves performance for memory-intensive workloads. The second paper targets the optimization of bank placement in the GPU's last-level cache, with the goal of maximizing the performance of the GPU's on-chip network. The third paper proposes a methodology for jointly analyzing all the cache level configurations to determine and minimize the susceptibility of the caches to soft errors.

Time  Label  Presentation Title / Authors

11:30  2.3.1  (Best Paper Award Candidate)
SHARED LAST-LEVEL CACHE MANAGEMENT FOR GPGPUS WITH HYBRID MAIN MEMORY
Speaker:
Lei Ju, Shandong University, CN
Authors:
Guan Wang, Xiaojun Cai, Lei Ju, Chuanqi Zang, Mengying Zhao and Zhiping Jia, Shandong University, CN
Abstract
Memory-intensive workloads are becoming increasingly popular on general-purpose graphics processing units (GPGPUs), and impose great challenges on GPGPU memory subsystem design. On the other hand, with the recent development of non-volatile memory (NVM) technologies, hybrid memory combining both DRAM and NVM achieves high performance, low power, and high density simultaneously, which provides a promising main memory design for GPGPUs. In this work, we explore shared last-level cache management for GPGPUs with consideration of the underlying hybrid main memory. To improve overall memory subsystem performance, we exploit the characteristics of both the asymmetric read/write latency of the hybrid main memory architecture and the memory coalescing feature of GPGPUs. In particular, to reduce the average cost of L2 cache misses, we prioritize cache blocks from DRAM or NVM based on the observation that operations to the NVM part of main memory have a large impact on system performance. Furthermore, the cache management scheme also integrates GPU memory coalescing and cache bypassing techniques to improve the overall cache hit ratio. Experimental results show that, in the context of a hybrid main memory system, our proposed L2 cache management policy improves performance over the traditional LRU policy and a state-of-the-art GPU cache strategy, EABP [20], by up to 27.76% and 14%, respectively.
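The core idea of the abstract, i.e. preferring to keep NVM-backed blocks in cache because re-fetching or writing back to NVM is far costlier than DRAM, can be sketched as a small replacement-policy variant. The class name, the miss-cost numbers, and the eviction rule below are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import OrderedDict

# Illustrative miss costs in cycles; the real asymmetry is read/write-
# and technology-dependent. These numbers are assumptions.
MISS_COST = {"DRAM": 100, "NVM": 300}

class AsymmetricLRUSet:
    """One cache set that evicts DRAM-backed blocks before NVM-backed
    ones, since NVM re-fetches are far costlier. A simplified sketch,
    not the paper's exact policy."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> backing memory ("DRAM"/"NVM")

    def access(self, tag, backing):
        if tag in self.lines:            # hit: refresh LRU position
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = backing        # fill on miss
        return False

    def _evict(self):
        # Prefer the least-recently-used DRAM-backed line; only if the
        # set holds no DRAM blocks, fall back to plain global LRU.
        for tag, backing in self.lines.items():   # iterates LRU -> MRU
            if backing == "DRAM":
                del self.lines[tag]
                return
        self.lines.popitem(last=False)
```

In a two-way set holding one NVM block and one DRAM block, a new fill evicts the DRAM block even when the NVM block is older, trading a cheap DRAM re-fetch for avoiding an expensive NVM one.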

Download Paper (PDF; Only available from the DATE venue WiFi)
12:00  2.3.2  EFFECTIVE CACHE BANK PLACEMENT FOR GPUS
Speaker:
Mohammad Sadrosadati, Sharif University of Technology, IR
Authors:
Mohammad Sadrosadati1, Amirhossein Mirhosseini2, Shahin Roozkhosh1, Hazhir Bakhishi1 and Hamid Sarbazi-Azad1
1Sharif University of Technology, IR; 2University of Michigan, US
Abstract
The placement of the Last-Level Cache (LLC) banks in the GPU on-chip network can significantly affect the performance of memory-intensive workloads. In this paper, we offer a placement methodology for the LLC banks that maximizes the performance of the on-chip network connecting the LLC banks to the streaming multiprocessors in GPUs. We argue that an efficient placement needs to be derived from a novel metric that accounts for the latency-hiding capability of GPUs through thread-level parallelism. To this end, we propose a throughput-aware metric called Effective Latency Impact (ELI). Moreover, we define an optimization problem that formulates our placement approach mathematically in terms of the ELI metric. Because this optimization problem is NP-hard, we deploy a heuristic solution. Experimental results show that our placement approach improves performance by up to 15.7% compared to the state-of-the-art placement.
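The shape of the approach, i.e. a placement cost that discounts raw network latency by the GPU's latency-hiding ability, minimized by a heuristic because the exact problem is NP-hard, can be sketched as follows. The cost formula, the fixed `hiding` factor, and the greedy strategy are stand-in assumptions; the paper's actual ELI metric and heuristic differ.

```python
from itertools import product

def hops(a, b):
    # Manhattan distance between two nodes on a 2D mesh NoC.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def placement_cost(banks, sms, hiding=0.5):
    # Illustrative ELI-like cost: each SM's distance to its nearest
    # LLC bank, discounted by a latency-hiding factor standing in for
    # thread-level parallelism. Not the paper's actual metric.
    return sum(min(hops(sm, b) for b in banks) for sm in sms) * (1 - hiding)

def greedy_place(mesh_w, mesh_h, sms, n_banks):
    # The exact placement problem is NP-hard; this greedy heuristic
    # adds one bank at a time at the position that lowers cost most.
    candidates = list(product(range(mesh_w), range(mesh_h)))
    banks = []
    for _ in range(n_banks):
        best = min((c for c in candidates if c not in banks),
                   key=lambda c: placement_cost(banks + [c], sms))
        banks.append(best)
    return banks
```

For SMs at opposite corners of a 4x4 mesh, any single bank on a shortest path between them achieves the minimum discounted cost of 3.0 hops.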

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30  2.3.3  SOFT ERROR-AWARE ARCHITECTURAL EXPLORATION FOR DESIGNING RELIABILITY ADAPTIVE CACHE HIERARCHIES IN MULTI-CORES
Speaker:
Semeen Rehman, Technische Universität Dresden, DE
Authors:
Arun Subramaniyan1, Semeen Rehman2, Muhammad Shafique3, Akash Kumar4 and Joerg Henkel5
1EECS, University of Michigan-Ann Arbor, US; 2Technische Universität Dresden, DE; 3Vienna University of Technology (TU Wien), AT; 4Technische Universitaet Dresden, DE; 5Karlsruhe Institute of Technology, DE
Abstract
Mainstream multi-core processors employ large multi-level on-chip caches, making them highly susceptible to soft errors. We demonstrate that designing a reliable cache hierarchy requires understanding the vulnerability interdependencies across different cache levels. This involves vulnerability analyses that depend on the parameters of different cache levels (partition size, line size, etc.) and the corresponding cache access patterns of different applications. This paper presents a novel soft error-aware cache architectural space exploration methodology and a vulnerability analysis of multi-level caches that considers their vulnerability interdependencies. Our technique significantly reduces exploration time while providing reliability-efficient cache configurations. We also show its applicability and benefits for ECC-protected caches under multi-bit fault scenarios.
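The joint exploration described here, i.e. evaluating cache-level configurations together because L2 residency (and hence exposure to soft errors) depends on L1 behavior, can be sketched with a deliberately toy vulnerability model. Every coefficient and the residency formula below are assumptions for illustration; the paper's analysis is far more detailed.

```python
def vulnerability(l1_kb, l2_kb, accesses):
    # Toy model: a block's soft-error exposure grows with how long it
    # resides in a cache level; larger caches hold data longer, and L2
    # residency depends on L1 behavior -- the interdependency the
    # paper analyzes. All coefficients here are illustrative.
    l1_residency = min(1.0, l1_kb / 64)   # fraction of time data sits in L1
    l2_residency = min(1.0, l2_kb / 1024) * (1 - 0.5 * l1_residency)
    return accesses * (l1_kb * l1_residency + l2_kb * l2_residency)

def explore(configs, accesses=1000):
    # Joint exploration over (L1 size, L2 size) pairs, returning the
    # least vulnerable one (area/performance constraints omitted).
    return min(configs, key=lambda c: vulnerability(*c, accesses))
```

Even in this toy model, the key point survives: a configuration cannot be judged per level, because shrinking L1 changes how long data lingers in L2.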

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00  IP1-4, 758  (Best Paper Award Candidate)
DROOP MITIGATING LAST LEVEL CACHE ARCHITECTURE FOR STTRAM
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
Authors:
Radha Krishna Aluru1 and Swaroop Ghosh2
1University of South Florida, US; 2Pennsylvania State University, US
Abstract
Spin-Transfer Torque magnetic Random Access Memory (STT-RAM) is one of the emerging technologies in the domain of non-volatile dense memories, especially preferred for the last-level cache (LLC). The current needed to reorient the magnetization at present (~100 µA per bit) is too high, especially for the write operation. When a full cache line (512 bits) is written, this extremely high current results in a voltage droop in the conventional cache architecture. Due to this droop, the write operation fails halfway through when writing to the bank of the cache farthest from the supply. In this paper, we propose a new cache architecture that mitigates this droop and makes the write operation successful. Instead of continuously writing the entire 512-bit cache line in a single bank, our architecture writes these 512 bits to multiple different locations across the cache in eight 64-bit parts. Detailed simulation results (both circuit-level and micro-architectural) comparing our proposed architecture against the conventional one are presented.
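The striping idea in the abstract, i.e. breaking one 512-bit line write into eight 64-bit writes spread across banks so no single bank and supply rail absorbs the full write current, can be sketched as an address-mapping function. The round-robin bank assignment and bank count are assumptions; the paper's mapping may differ.

```python
def stripe_cache_line(line_bits, n_chunks=8, n_banks=16, base_bank=0):
    # Split a 512-bit cache line into eight 64-bit chunks and assign
    # each chunk to a different bank (round-robin from base_bank), so
    # the write current of a full-line write is spread across banks
    # instead of hitting one bank at once. Mapping is an assumption.
    assert len(line_bits) == 512 and 512 % n_chunks == 0
    chunk = len(line_bits) // n_chunks        # 64 bits per chunk
    return [((base_bank + i) % n_banks, line_bits[i * chunk:(i + 1) * chunk])
            for i in range(n_chunks)]
```

Reading the chunks back in bank order reassembles the original line, so the scheme changes where the bits land, not what is stored.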

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00  End of session
Lunch Break in Garden Foyer

Keynote Lecture session 3.0 in "Garden Foyer", 13:50 - 14:20

Lunch Break in the Garden Foyer
On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Kindly note that this is restricted to conference delegates with a lunch voucher. When entering the lunch area, delegates will be asked to present the corresponding lunch voucher for that day. Once delegates leave the lunch area, re-entry is not permitted for that lunch.