Date: Tuesday 28 March 2017
Time: 11:30 - 13:00
Location / Room: 2BC
Chair:
Dionisios Pnevmatikatos, Technical University of Crete, GR
Co-Chair:
Cristina Silvano, Politecnico di Milano, IT
Cache memory design optimizations and management can have a significant effect on cost, performance, and reliability. The first paper proposes an asymmetric cache management policy for GPGPUs with hybrid main memories that significantly improves performance for memory-intensive workloads. The second paper targets the optimization of bank placement in the GPU's last-level cache, with the goal of maximizing the performance of the GPU's on-chip network. The third paper proposes a methodology for jointly analyzing the configurations of all cache levels to determine and minimize the susceptibility of the caches to soft errors.
Time | Label | Presentation Title / Authors |
---|---|---|
11:30 | 2.3.1 | (Best Paper Award Candidate) SHARED LAST-LEVEL CACHE MANAGEMENT FOR GPGPUS WITH HYBRID MAIN MEMORY Speaker: Lei Ju, Shandong University, CN Authors: Guan Wang, Xiaojun Cai, Lei Ju, Chuanqi Zang, Mengying Zhao and Zhiping Jia, Shandong University, CN Abstract Memory-intensive workloads are becoming increasingly popular on general purpose graphics processing units (GPGPUs), and impose great challenges on the GPGPU memory subsystem design. On the other hand, with the recent development of non-volatile memory (NVM) technologies, hybrid memory combining both DRAM and NVM achieves high performance, low power, and high density simultaneously, which makes it a promising main memory design for GPGPUs. In this work, we explore shared last-level cache management for GPGPUs with consideration of the underlying hybrid main memory. In order to improve overall memory subsystem performance, we exploit both the asymmetric read/write latency of the hybrid main memory architecture and the memory coalescing feature of GPGPUs. In particular, to reduce the average cost of L2 cache misses, we prioritize cache blocks from DRAM or NVM based on the observation that operations to the NVM part of main memory have a large impact on system performance. Furthermore, the cache management scheme also integrates GPU memory coalescing and cache bypassing techniques to improve the overall cache hit ratio. Experimental results show that, in the context of a hybrid main memory system, our proposed L2 cache management policy improves performance over the traditional LRU policy and the state-of-the-art GPU cache strategy EABP [20] by up to 27.76% and 14%, respectively. Download Paper (PDF; Only available from the DATE venue WiFi) |
12:00 | 2.3.2 | EFFECTIVE CACHE BANK PLACEMENT FOR GPUS Speaker: Mohammad Sadrosadati, Sharif University of Technology, IR Authors: Mohammad Sadrosadati1, Amirhossein Mirhosseini2, Shahin Roozkhosh1, Hazhir Bakhishi1 and Hamid Sarbazi-Azad1 1Sharif University of Technology, IR; 2University of Michigan, US Abstract The placement of the Last Level Cache (LLC) banks in the GPU on-chip network can significantly affect the performance of memory-intensive workloads. In this paper, we offer a placement methodology for the LLC banks that maximizes the performance of the on-chip network connecting the LLC banks to the streaming multiprocessors in GPUs. We argue that an efficient placement needs to be derived from a novel metric that accounts for the latency hiding capability GPUs gain through thread-level parallelism. To this end, we propose a throughput-aware metric, called Effective Latency Impact (ELI). Moreover, we define an optimization problem to formulate our placement approach mathematically in terms of the ELI metric. Since this optimization problem is NP-hard, we deploy a heuristic solution. Experimental results show that our placement approach improves performance by up to 15.7% compared to the state-of-the-art placement. Download Paper (PDF; Only available from the DATE venue WiFi) |
12:30 | 2.3.3 | SOFT ERROR-AWARE ARCHITECTURAL EXPLORATION FOR DESIGNING RELIABILITY ADAPTIVE CACHE HIERARCHIES IN MULTI-CORES Speaker: Semeen Rehman, Technische Universität Dresden, DE Authors: Arun Subramaniyan1, Semeen Rehman2, Muhammad Shafique3, Akash Kumar2 and Joerg Henkel4 1EECS, University of Michigan-Ann Arbor, US; 2Technische Universität Dresden, DE; 3Vienna University of Technology (TU Wien), AT; 4Karlsruhe Institute of Technology, DE Abstract Mainstream multi-core processors employ large multi-level on-chip caches, making them highly susceptible to soft errors. We demonstrate that designing a reliable cache hierarchy requires understanding the vulnerability interdependencies across different cache levels. This involves vulnerability analyses depending upon the parameters of different cache levels (partition size, line size, etc.) and the corresponding cache access patterns for different applications. This paper presents a novel soft error-aware cache architectural space exploration methodology and vulnerability analysis of multi-level caches considering their vulnerability interdependencies. Our technique significantly reduces exploration time while providing reliability-efficient cache configurations. We also show applicability and benefits for ECC-protected caches under multi-bit fault scenarios. Download Paper (PDF; Only available from the DATE venue WiFi) |
13:00 | IP1-4, 758 | (Best Paper Award Candidate) DROOP MITIGATING LAST LEVEL CACHE ARCHITECTURE FOR STTRAM Speaker: Swaroop Ghosh, Pennsylvania State University, US Authors: Radha Krishna Aluru1 and Swaroop Ghosh2 1University of South Florida, US; 2Pennsylvania State University, US Abstract Spin-Transfer Torque magnetic Random Access Memory (STT-RAM) is one of the emerging technologies in the domain of non-volatile dense memories, especially preferred for the last level cache (LLC). The amount of current currently needed to reorient the magnetization (~100μA per bit) is too high, especially for the write operation. When a full cache line (512-bit) write is performed, this current, extremely high compared to MRAM, results in a voltage droop in the conventional cache architecture. Due to this droop, the write operation fails halfway through when writing to the bank of the cache farthest from the supply. In this paper, we propose a new cache architecture that mitigates this droop problem and makes the write operation successful. Instead of writing the entire cache line (512-bit) continuously into a single bank, our architecture writes these 512 bits to multiple locations across the cache in eight 64-bit parts. Simulation results (both circuit-level and micro-architectural) comparing our proposed architecture against the conventional one are presented in detail. Download Paper (PDF; Only available from the DATE venue WiFi) |
13:00 | End of session. Lunch break in the Garden Foyer. Keynote Lecture session 3.0 in the Garden Foyer, 13:50 - 14:20. |
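To make the asymmetry exploited in paper 2.3.1 concrete, the following is a minimal illustrative sketch (not the authors' algorithm) of an LLC set whose eviction policy accounts for asymmetric DRAM/NVM costs: among the least-recently-used candidates, it prefers the victim that is cheapest to refetch and write back. All cost values and the two-candidate eviction window are assumptions for illustration only.

```python
# Illustrative sketch: cost-aware eviction for an LLC set backed by a
# hybrid DRAM/NVM main memory. Costs below are assumed, not measured.
DRAM_COST = 1.0        # assumed relative refill cost from DRAM
NVM_COST = 3.0         # assumed higher NVM read cost
NVM_WRITE_EXTRA = 4.0  # assumed extra cost to write a dirty block back to NVM

class Block:
    def __init__(self, tag, in_nvm, dirty=False):
        self.tag = tag
        self.in_nvm = in_nvm  # backed by the NVM part of main memory?
        self.dirty = dirty

class AsymmetricSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = []  # index 0 = LRU end, index -1 = MRU end

    def _victim_cost(self, b):
        # Cost of evicting b: refill cost on a future miss, plus the
        # write-back penalty if the block is dirty and NVM-backed.
        cost = NVM_COST if b.in_nvm else DRAM_COST
        if b.dirty and b.in_nvm:
            cost += NVM_WRITE_EXTRA
        return cost

    def access(self, tag, in_nvm, write=False):
        """Return True on hit, False on miss (with fill and eviction)."""
        for i, b in enumerate(self.blocks):
            if b.tag == tag:
                b.dirty = b.dirty or write
                self.blocks.append(self.blocks.pop(i))  # promote to MRU
                return True
        if len(self.blocks) == self.ways:
            # Among the two LRU-most blocks, evict the cheaper victim.
            victim = min(self.blocks[:2], key=self._victim_cost)
            self.blocks.remove(victim)
        self.blocks.append(Block(tag, in_nvm, dirty=write))
        return False
```

Under these assumed costs, a clean DRAM-backed block is evicted in preference to a dirty NVM-backed one even when the latter is less recently used, which is the general flavor of prioritizing blocks by the cost of their main-memory operations.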