3.4 Tackling Memory Walls with Emerging Architectures and Technologies


Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Chartreuse

Chair:
Akash Kumar, NUS, SG

Co-Chair:
Cristina Silvano, Politecnico di Milano, IT

This session focuses on various aspects of memory system design including non-volatile reconfigurable cache design, cache directory design, shared DRAM access, and writeback policies for stacked-die caches.

Time | Label | Presentation Title and Authors
14:30 | 3.4.1 | SELECTDIRECTORY: A SELECTIVE DIRECTORY FOR CACHE COHERENCE IN MANY-CORE ARCHITECTURES
Speakers:
Yuan Yao1, Guanhua Wang2, Zhiguo Ge3, Tulika Mitra2, Wenzhi Chen1 and Naxin Zhang3
1Zhejiang University, CN; 2National University of Singapore, SG; 3Huawei, SG
Abstract
As we move into the many-core era fueled by Moore's Law, it has become unprecedentedly challenging to provide the shared memory abstraction through directory-based cache coherence. The main difficulty is the high area and power overhead of the directory in tracking the presence of a memory block in all the private caches. A sparse directory offers relatively better design trade-offs by decoupling the coherence meta-data from the last-level cache (LLC), but still suffers from high area and power overheads. In this work, we propose a compact directory design by exploiting the observation that a significant fraction of memory blocks are temporarily exclusive in the cache hierarchy and hence need only minimal sharer information. Inspired by this observation, we propose to further decouple the tag array from the coherence meta-data array in the sparse directory and allocate a sharer list only for the actively shared blocks. Experimental results reveal that our proposal, called SelectDirectory, can substantially save directory storage area and energy without sacrificing performance.
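
To make the idea concrete, the following is a minimal behavioral sketch (not the authors' design) of a directory that tracks temporarily exclusive blocks with a single owner pointer and allocates a full sharer list only once a block becomes actively shared; the class and method names are illustrative assumptions.

```python
# Minimal behavioral sketch of a selective directory (illustration only, not the
# authors' implementation). Assumption: a block starts out "temporarily exclusive"
# with a lone owner and is promoted to a full sharer list only when a second core
# touches it.

class SelectiveDirectory:
    def __init__(self):
        self.owner = {}      # block -> single owner core (temporarily exclusive blocks)
        self.sharers = {}    # block -> set of sharer cores (actively shared blocks only)

    def on_access(self, block, core):
        if block in self.sharers:            # already shared: just record the new sharer
            self.sharers[block].add(core)
        elif block not in self.owner:        # first touch: record a lone owner, no sharer list
            self.owner[block] = core
        elif self.owner[block] != core:      # second distinct core: allocate a sharer list
            self.sharers[block] = {self.owner.pop(block), core}

    def sharers_of(self, block):
        """Cores that must be invalidated on a write to `block`."""
        if block in self.sharers:
            return set(self.sharers[block])
        return {self.owner[block]} if block in self.owner else set()
```

The point of the sketch is that the (cheap) owner table covers the common case, so the (expensive) sharer-list storage only needs to be sized for the actively shared fraction of blocks.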

15:00 | 3.4.2 | DYRECTAPE: A DYNAMICALLY RECONFIGURABLE CACHE USING DOMAIN WALL MEMORY TAPES
Speaker:
Rangharajan Venkatesan, NVIDIA Corporation, US
Authors:
Ashish Ranjan1, Shankar Ganesh Ramasubramanian1, Rangharajan Venkatesan2, Vijay Pai1, Kaushik Roy1 and Anand Raghunathan1
1Purdue University, US; 2NVIDIA Corporation, US
Abstract
Spintronic memories offer superior density, non-volatility and ultra-low standby power compared to CMOS memories, and have consequently attracted great interest in the design of on-chip caches. Domain Wall Memory (DWM) is a spintronic memory technology with unparalleled density arising from a tape-like structure. However, such a structure involves serialized access to the bits stored in each bit-cell, resulting in increased access latency and thereby degrading performance. Prior efforts address this challenge either by limiting the number of bits per tape, in effect sacrificing the density benefits of DWM, or through cache management policies that can only partly alleviate the shift overhead. We observe that there exists significant heterogeneity in sensitivity to cache capacity and access latency across different applications, and across distinct phases of an application. We also make the key observation that DWM tapes offer a natural mechanism to trade off density for access latency by limiting the number of domains of each tape that are actively used to store cache data. Based on this insight, we propose DyReCTape, a dynamically reconfigurable cache that packs the maximum bits per tape and leverages the intrinsic capability of DWMs to modulate the active bits per tape with minimal overhead. DyReCTape uses a history-based reconfiguration policy that tracks the number of shift operations incurred and the miss rate to appropriately tailor the capacity and access latency of the DWM cache. We further propose two performance optimizations to DyReCTape: (i) a lazy migration policy to mitigate the overheads of reconfiguration, and (ii) re-use of the portion of the cache that is unused (due to reconfiguration) as a victim cache to reduce the number of off-chip accesses. We evaluate DyReCTape using applications from the PARSEC and SPLASH benchmark suites. Our experiments demonstrate that DyReCTape achieves a 19.8% performance improvement over an iso-area SRAM cache and an 11.7% performance improvement (due to a 3.4X reduction in the number of shifts) over a state-of-the-art DWM cache.
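
As a rough illustration of the history-based reconfiguration idea described above, the sketch below tracks shift operations and misses over fixed epochs and doubles or halves the number of active domains per tape accordingly. The epoch length, thresholds and the exact decision rule are assumptions made for illustration, not details taken from the paper.

```python
# Toy sketch of a history-based capacity/latency reconfiguration policy in the spirit
# of DyReCTape (illustrative only; all constants and the decision rule are assumptions).

class ReconfigPolicy:
    def __init__(self, max_domains=64, min_domains=8,
                 shift_threshold=2.0, miss_threshold=0.05, epoch=100_000):
        self.active_domains = max_domains    # domains per tape currently used for cache data
        self.max_domains = max_domains
        self.min_domains = min_domains
        self.shift_threshold = shift_threshold   # tolerated average shifts per access
        self.miss_threshold = miss_threshold     # miss rate above which more capacity helps
        self.epoch = epoch
        self.accesses = self.shifts = self.misses = 0

    def record(self, shifts, miss):
        """Called once per cache access with the shifts it incurred and whether it missed."""
        self.accesses += 1
        self.shifts += shifts
        self.misses += int(miss)
        if self.accesses == self.epoch:
            self._reconfigure()
            self.accesses = self.shifts = self.misses = 0

    def _reconfigure(self):
        avg_shifts = self.shifts / self.epoch
        miss_rate = self.misses / self.epoch
        if miss_rate > self.miss_threshold and self.active_domains < self.max_domains:
            self.active_domains *= 2          # capacity-sensitive phase: use more of each tape
        elif avg_shifts > self.shift_threshold and self.active_domains > self.min_domains:
            self.active_domains //= 2         # shift-dominated phase: shrink to cut latency
```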

15:30 | 3.4.3 | COOPERATIVELY MANAGING DYNAMIC WRITEBACK AND INSERTION POLICIES IN A LAST-LEVEL DRAM CACHE
Speakers:
Shouyi Yin1, Jiakun Li1, Leibo Liu2, Shaojun Wei1 and Yike Guo3
1Tsinghua University, CN; 2Institute of Microelectronics and The National Lab for Information Science and Technology, Tsinghua University, CN; 3Imperial College London, GB
Abstract
Stacked DRAM used as the last-level cache (LLC) in multi-core systems delivers performance improvements due to its capacity advantage. Since the performance of the LLC depends heavily upon its block replacement policy, conventional replacement policies need redesigning to exploit the best of the DRAM cache while avoiding its drawbacks. Existing DRAM cache insertion policies blindly forward victim lines to off-chip memory, regardless of the potential for increased hits from placing a fraction of them in the DRAM cache; nevertheless, a naïve design that steers all dirty victims to the DRAM cache introduces excessive writeback traffic, which aggravates capacity misses and DRAM interference. To manage insertions from both writeback and fill requests, we propose a cooperative writeback and insertion policy that adapts to the distinct access patterns of heterogeneous applications based on runtime misses and writeback efficiency, thereby increasing HMIPC (harmonic mean instructions per cycle) throughput by 22.2%, 13.7% and 14.5% compared to LRU and two static writeback policies.
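
A hypothetical sketch of the adaptive steering decision is given below: dirty victims are inserted into the DRAM cache only while the observed fraction of inserted victims that get re-referenced justifies the extra writeback traffic. The window size, threshold and counter names are assumptions for illustration, not the paper's exact mechanism.

```python
# Sketch of an adaptive dirty-victim steering policy (assumed mechanism, not the
# authors' design): monitor "writeback efficiency" = re-referenced insertions /
# total insertions over a window, and bypass the DRAM cache when it drops too low.

class CooperativeWriteback:
    def __init__(self, window=4096, efficiency_threshold=0.25):
        self.window = window
        self.efficiency_threshold = efficiency_threshold
        self.inserted = 0    # dirty victims placed in the DRAM cache this window
        self.reused = 0      # of those, how many were hit again before eviction

    def should_insert(self):
        """True -> place dirty victim in the DRAM cache; False -> write back off-chip."""
        if self.inserted < self.window:
            return True                           # warm-up: keep gathering statistics
        return (self.reused / self.inserted) >= self.efficiency_threshold

    def on_insert(self):
        self.inserted += 1

    def on_reuse(self):
        self.reused += 1

    def new_window(self):
        """Reset counters periodically so the policy adapts to phase changes."""
        self.inserted = self.reused = 0
```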

15:45 | 3.4.4 | A GENERIC, SCALABLE AND GLOBALLY ARBITRATED MEMORY TREE FOR SHARED DRAM ACCESS IN REAL-TIME SYSTEMS
Speakers:
Manil Dev Gomony1, Jamie Garside2, Benny Akesson3, Neil Audsley2 and Kees Goossens1
1Eindhoven University of Technology, NL; 2University of York, GB; 3Czech Technical University in Prague, CZ
Abstract
Predictable arbitration policies, such as Time Division Multiplexing (TDM) and Round-Robin (RR), are used to provide firm real-time guarantees to the multiple memory clients sharing a single memory resource (DRAM) in multi-core real-time systems. Traditional centralized implementations of predictable arbitration policies in a shared memory bus or interconnect are not scalable in terms of the number of clients. On the other hand, existing distributed memory interconnects are either globally arbitrated, and hence do not offer diverse service according to heterogeneous client requirements, or locally arbitrated, and hence suffer from larger area, power and latency overheads. Moreover, selecting the right arbitration policy according to the diverse and dynamic client requirements in reusable platforms requires a generic, re-configurable architecture supporting different arbitration policies. The main contributions of this paper are: (1) We propose a novel generic, scalable and globally arbitrated memory tree (GSMT) architecture for the distributed implementation of several predictable arbitration policies. (2) We present an RTL-level implementation of the Accounting and Priority Assignment (APA) logic of GSMT that can be configured with five different arbitration policies typically used for shared memory access in real-time systems. (3) We compare the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process. Our experiments show that, with 64 clients, GSMT can run up to four times faster than traditional architectures while achieving over 51% and 37% reductions in area and power consumption, respectively.
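
The sketch below illustrates the general shape of a globally arbitrated tree with a pluggable accounting/priority-assignment stage, using round-robin priority assignment as the example policy. The function names and the reduction-tree structure are assumptions for illustration and do not reproduce the GSMT RTL.

```python
# Minimal sketch of globally arbitrated tree-based memory arbitration (illustration only):
# a pluggable accounting stage assigns each client a priority, and a binary tree of
# 2-input comparators forwards the highest-priority pending request to the memory.

def round_robin_priorities(num_clients, last_served):
    """Round-robin accounting: the client right after the last-served one gets the
    highest priority (lowest number)."""
    return [(c - last_served - 1) % num_clients for c in range(num_clients)]

def arbitrate(pending, priorities):
    """Tree arbitration: compare priorities pairwise, level by level, and return the
    winning client id, or None if no request is pending."""
    nodes = [(c, priorities[c]) for c in range(len(pending)) if pending[c]]
    while len(nodes) > 1:
        nxt = []
        for i in range(0, len(nodes) - 1, 2):
            a, b = nodes[i], nodes[i + 1]
            nxt.append(a if a[1] <= b[1] else b)   # lower number wins this 2-input node
        if len(nodes) % 2:
            nxt.append(nodes[-1])                  # odd node passes through unchanged
        nodes = nxt
    return nodes[0][0] if nodes else None

# Example: 4 clients, client 1 was served last, clients 0 and 3 have pending requests.
prios = round_robin_priorities(4, last_served=1)     # -> [2, 3, 0, 1]
print(arbitrate([True, False, False, True], prios))  # -> 3 (next pending client after 1)
```

Swapping the accounting function (e.g. for a TDM slot table) while keeping the same comparison tree is the kind of policy/mechanism separation the abstract describes.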

16:00 | End of session
Coffee Break in Exhibition Area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served in the exhibition area during the coffee breaks at the times listed below.

Lunch Break

On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00