Date: Tuesday 28 March 2017
Time: 11:30 - 13:00
Location / Room: 2BC
Chair:
Dionisios Pnevmatikatos, Technical University of Crete, GR
Co-Chair:
Cristina Silvano, Politecnico di Milano, IT
Cache memory design optimizations and management can have a significant effect on cost, performance, and reliability. The first paper proposes an asymmetric cache management policy for GPGPUs with hybrid main memories that significantly improves performance for memory-intensive workloads. The second paper targets the optimization of bank placement in the GPU's last-level cache, with the goal of maximizing the performance of the GPU's on-chip network. The third paper proposes a methodology for jointly analyzing the configurations of all cache levels to determine and minimize the susceptibility of the caches to soft errors.
Time | Label | Presentation Title / Authors |
---|---|---|
11:30 | 2.3.1 | (Best Paper Award Candidate) SHARED LAST-LEVEL CACHE MANAGEMENT FOR GPGPUS WITH HYBRID MAIN MEMORY Speaker: Lei Ju, Shandong University, CN Authors: Guan Wang, Xiaojun Cai, Lei Ju, Chuanqi Zang, Mengying Zhao and Zhiping Jia, Shandong University, CN Abstract Memory-intensive workloads are becoming increasingly popular on general purpose graphics processing units (GPGPUs), and impose great challenges on the GPGPU memory subsystem design. On the other hand, with the recent development of non-volatile memory (NVM) technologies, hybrid memory combining both DRAM and NVM achieves high performance, low power, and high density simultaneously, which makes it a promising main memory design for GPGPUs. In this work, we explore shared last-level cache management for GPGPUs with consideration of the underlying hybrid main memory. In order to improve overall memory subsystem performance, we exploit both the asymmetric read/write latency of the hybrid main memory architecture and the memory coalescing feature of GPGPUs. In particular, to reduce the average cost of L2 cache misses, we prioritize cache blocks from DRAM or NVM based on the observation that operations to the NVM part of main memory have a large impact on system performance. Furthermore, the cache management scheme also integrates GPU memory coalescing and cache bypassing techniques to improve the overall cache hit ratio. Experimental results show that, in the context of a hybrid main memory system, our proposed L2 cache management policy improves performance over the traditional LRU policy and the state-of-the-art GPU cache strategy EABP [20] by up to 27.76% and 14%, respectively. Download Paper (PDF; Only available from the DATE venue WiFi) |
12:00 | 2.3.2 | EFFECTIVE CACHE BANK PLACEMENT FOR GPUS Speaker: Mohammad Sadrosadati, Sharif University of Technology, IR Authors: Mohammad Sadrosadati1, Amirhossein Mirhosseini2, Shahin Roozkhosh1, Hazhir Bakhishi1 and Hamid Sarbazi-Azad1 1Sharif University of Technology, IR; 2University of Michigan, US Abstract The placement of the Last Level Cache (LLC) banks in the GPU on-chip network can significantly affect the performance of memory-intensive workloads. In this paper, we offer a placement methodology for the LLC banks that maximizes the performance of the on-chip network connecting the LLC banks to the streaming multiprocessors in GPUs. We argue that an efficient placement needs to be derived from a novel metric that accounts for the latency hiding capability GPUs gain through thread-level parallelism. To this end, we propose a throughput-aware metric, called Effective Latency Impact (ELI). Moreover, we define an optimization problem to formulate our placement approach mathematically in terms of the ELI metric. Since this optimization problem is NP-hard, we deploy a heuristic solution. Experimental results show that our placement approach improves performance by up to 15.7% compared to the state-of-the-art placement. Download Paper (PDF; Only available from the DATE venue WiFi) |
12:30 | 2.3.3 | SOFT ERROR-AWARE ARCHITECTURAL EXPLORATION FOR DESIGNING RELIABILITY ADAPTIVE CACHE HIERARCHIES IN MULTI-CORES Speaker: Semeen Rehman, Technische Universität Dresden, DE Authors: Arun Subramaniyan1, Semeen Rehman2, Muhammad Shafique3, Akash Kumar2 and Joerg Henkel4 1EECS, University of Michigan-Ann Arbor, US; 2Technische Universität Dresden, DE; 3Vienna University of Technology (TU Wien), AT; 4Karlsruhe Institute of Technology, DE Abstract Mainstream multi-core processors employ large multi-level on-chip caches, making them highly susceptible to soft errors. We demonstrate that designing a reliable cache hierarchy requires understanding the vulnerability interdependencies across different cache levels. This involves vulnerability analyses depending upon the parameters of different cache levels (partition size, line size, etc.) and the corresponding cache access patterns for different applications. This paper presents a novel soft error-aware cache architectural space exploration methodology and vulnerability analysis of multi-level caches considering their vulnerability interdependencies. Our technique significantly reduces exploration time while providing reliability-efficient cache configurations. We also show applicability and benefits for ECC-protected caches under multi-bit fault scenarios. Download Paper (PDF; Only available from the DATE venue WiFi) |
13:00 | IP1-4, 758 | (Best Paper Award Candidate) DROOP MITIGATING LAST LEVEL CACHE ARCHITECTURE FOR STTRAM Speaker: Swaroop Ghosh, Pennsylvania State University, US Authors: Radha Krishna Aluru1 and Swaroop Ghosh2 1University of South Florida, US; 2Pennsylvania State University, US Abstract Spin-Transfer Torque magnetic Random Access Memory (STT-RAM) is one of the emerging technologies in the domain of non-volatile dense memories, especially preferred for the last level cache (LLC). The amount of current currently needed to reorient the magnetization (~100μA per bit) is too high, especially for the write operation. When a full cache line (512-bit) write is performed, this current, extremely high compared to MRAM, results in a voltage droop in the conventional cache architecture. Due to this droop, the write operation fails halfway through when writing to the bank of the cache farthest from the supply. In this paper, we propose a new cache architecture that mitigates this droop problem and makes the write operation successful. Instead of writing the entire cache line (512-bit) continuously into a single bank, our architecture writes these 512 bits to multiple locations across the cache in eight 64-bit parts. Simulation results (both circuit-level and micro-architectural) comparing our proposed architecture against the conventional one are presented in detail. Download Paper (PDF; Only available from the DATE venue WiFi) |
13:00 | End of session. Lunch break in the Garden Foyer. Keynote Lecture session 3.0 in the Garden Foyer, 13:50 - 14:20. |
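To make the asymmetry exploited in paper 2.3.1 concrete, the following is a minimal illustrative sketch (not the authors' algorithm) of an LLC set whose eviction policy accounts for asymmetric DRAM/NVM costs: among the least-recently-used candidates, it prefers the victim that is cheapest to refetch and write back. All cost values and the two-candidate eviction window are assumptions for illustration only.

```python
# Illustrative sketch: cost-aware eviction for an LLC set backed by a
# hybrid DRAM/NVM main memory. Costs below are assumed, not measured.
DRAM_COST = 1.0        # assumed relative refill cost from DRAM
NVM_COST = 3.0         # assumed higher NVM read cost
NVM_WRITE_EXTRA = 4.0  # assumed extra cost to write a dirty block back to NVM

class Block:
    def __init__(self, tag, in_nvm, dirty=False):
        self.tag = tag
        self.in_nvm = in_nvm  # backed by the NVM part of main memory?
        self.dirty = dirty

class AsymmetricSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = []  # index 0 = LRU end, index -1 = MRU end

    def _victim_cost(self, b):
        # Cost of evicting b: refill cost on a future miss, plus the
        # write-back penalty if the block is dirty and NVM-backed.
        cost = NVM_COST if b.in_nvm else DRAM_COST
        if b.dirty and b.in_nvm:
            cost += NVM_WRITE_EXTRA
        return cost

    def access(self, tag, in_nvm, write=False):
        """Return True on hit, False on miss (with fill and eviction)."""
        for i, b in enumerate(self.blocks):
            if b.tag == tag:
                b.dirty = b.dirty or write
                self.blocks.append(self.blocks.pop(i))  # promote to MRU
                return True
        if len(self.blocks) == self.ways:
            # Among the two LRU-most blocks, evict the cheaper victim.
            victim = min(self.blocks[:2], key=self._victim_cost)
            self.blocks.remove(victim)
        self.blocks.append(Block(tag, in_nvm, dirty=write))
        return False
```

Under these assumed costs, a clean DRAM-backed block is evicted in preference to a dirty NVM-backed one even when the latter is less recently used, which is the general flavor of prioritizing blocks by the cost of their main-memory operations.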