8.4 Efficient and reliable memory and computing architectures


Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 2

Chair:
Göhringer Diana, Technische Universität Dresden, DE

Co-Chair:
Jie Han, University of Alberta, CA

This session covers techniques for improving energy efficiency and resilience in memory and computing architectures. The first paper proposes a hybrid vertex-edge memory hierarchy that reduces memory energy consumption in graph processing by keeping random accesses and writes away from resistive random-access memory (ReRAM) modules; it reports at least a 2.0x improvement in energy efficiency over a DRAM-based design. The second paper predicts the error resilience of individual neurons so that reliability can be traded off against efficiency in neural network accelerators. The third paper addresses the performance bottleneck in von Neumann architectures by proposing an efficient algorithm for in-memory matrix multiplication on resistive associative processors (ReAPs). Finally, the last paper covers run-time application mapping on manycore systems that accounts for aging and process variation.

Time  Label  Presentation Title
Authors
17:00  8.4.1  (Best Paper Award Candidate)
HYVE: HYBRID VERTEX-EDGE MEMORY HIERARCHY FOR ENERGY-EFFICIENT GRAPH PROCESSING
Speaker:
Tianhao Huang, Tsinghua University, CN
Authors:
Tianhao Huang, Guohao Dai, Yu Wang and Huazhong Yang, Tsinghua University, CN
Abstract
The high energy consumption of conventional memory modules (e.g. DRAMs) hinders further improvement of the energy efficiency of large-scale graph processing. Emerging metal-oxide resistive random-access memory (ReRAM) and ReRAM crossbars have shown great potential for providing energy-efficient memory modules. However, the performance of ReRAMs suffers from data access patterns with poor locality and a large amount of written data, both of which are common in graph processing. In this paper, we propose a Hybrid Vertex-Edge memory hierarchy, HyVE, to avoid random accesses and writes to ReRAM modules. With data allocation and scheduling over vertices and edges, HyVE reduces memory energy consumption by 69% compared with a conventional memory system in graph processing. Moreover, we adopt a bank-level power-gating scheme to further reduce standby power. Our evaluations show that the optimized design achieves at least a 2.0x improvement in energy efficiency over a DRAM-based design.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30  8.4.2  ACCURATE NEURON RESILIENCE PREDICTION FOR A FLEXIBLE RELIABILITY MANAGEMENT IN NEURAL NETWORK ACCELERATORS
Speaker:
Christoph Schorn, Robert Bosch GmbH, DE
Authors:
Christoph Schorn (1), Andre Guntoro (1) and Gerd Ascheid (2)
(1) Robert Bosch GmbH, DE; (2) RWTH Aachen University, DE
Abstract
Deep neural networks have become a ubiquitous tool for mastering complex classification tasks. Current research focuses on the development of power-efficient and fast neural network hardware accelerators for mobile and embedded devices. However, when used in safety-critical applications, for example autonomously operating vehicles, the reliability of such accelerators becomes a further optimization criterion, which can conflict with power efficiency and latency. Furthermore, ensuring hardware reliability becomes increasingly challenging for shrinking structure widths and rising power densities in the nanometer semiconductor technology era. One solution to this challenge is the exploitation of fault-tolerant parts in deep neural networks. In this paper, we propose a new method for predicting the error resilience of neurons in deep neural networks and show that this method significantly improves upon existing methods in terms of accuracy as well as interpretability. We evaluate prediction accuracy by simulating hardware faults in networks trained on the CIFAR-10 and ILSVRC image classification benchmarks and protecting neurons according to the resilience estimations. In addition, we demonstrate how our resilience prediction can be used for a flexible trade-off between reliability and efficiency in neural network hardware accelerators.

18:00  8.4.3  RAPID IN-MEMORY MATRIX MULTIPLICATION USING ASSOCIATIVE PROCESSOR
Speaker:
Hasan Erdem Yantir, University of California Irvine, US
Authors:
Neggaz Mohamed Ayoub (1), Hasan Erdem Yantır (2), Smail Niar (3), Ahmed Eltawil (2) and Fadi Kurdahi (2)
(1) University of Valenciennes, FR; (2) University of California, Irvine, US; (3) LAMIH-University of Valenciennes, FR
Abstract
Memory hierarchy latency is one of the main problems that prevent processors from achieving high performance. To eliminate the need to load and store large sets of data, Resistive Associative Processors (ReAPs) have been proposed as a solution to the von Neumann bottleneck. In ReAPs, logic and memory structures are combined to allow in-memory computation. In this paper, we propose a new algorithm that computes matrix multiplication inside the memory and exploits the benefits of ReAPs. The proposed approach is based on the Cannon algorithm and uses a series of rotations without duplicating the data. It runs in O(n), where n is the dimension of the matrix. The method also applies to a large set of row-by-column matrix-based applications. Experimental results show an increase in performance of several orders of magnitude, along with reductions in energy and area, when compared to the latest FPGA and CPU implementations.
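The rotation scheme that the abstract refers to can be illustrated with a minimal Python sketch of the textbook Cannon algorithm. This is not the authors' ReAP implementation: the function name `cannon_matmul` and the list-of-lists layout are assumptions made for illustration. The sketch simulates the rotations sequentially, so it does O(n^3) work; the O(n) figure presumes hardware that updates all n x n cells at once in each of the n rounds.

```python
def cannon_matmul(a, b):
    """Multiply two n x n matrices with Cannon-style rotations (sequential simulation)."""
    n = len(a)
    # Initial skew: row i of A is rotated left by i, column j of B is rotated up by j.
    a_s = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_s = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every cell multiplies its two local operands; parallel hardware would
        # perform this whole double loop as a single step.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_s[i][j] * b_s[i][j]
        # Rotate every row of A left by one and every column of B up by one.
        a_s = [row[1:] + row[:1] for row in a_s]
        b_s = b_s[1:] + b_s[:1]
    return c


print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each operand is only moved, never copied to a second location, which matches the abstract's remark that the rotations avoid duplicating data.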

18:15  8.4.4  HIMAP: A HIERARCHICAL MAPPING APPROACH FOR ENHANCING LIFETIME RELIABILITY OF DARK SILICON MANYCORE SYSTEMS
Speaker:
Vivek Chaturvedi, Nanyang Technological University, SG
Authors:
Vijeta Rathore (1), Vivek Chaturvedi (1), Amit Kumar Singh (2), Thambipillai Srikanthan (1), Rohith R (1), Siew Kei Lam (1) and Muhammad Shafique (3)
(1) Nanyang Technological University, SG; (2) University of Essex, GB; (3) TU Wien, AT
Abstract
Technology scaling into the nano-scale CMOS regime has resulted in increased leakage and a roadblock on voltage scaling, which has led to several issues such as high power density and elevated on-chip temperature. This in turn aggravates device aging, compromising the lifetime reliability of manycore systems. This paper proposes HiMap, a dynamic hierarchical mapping approach to maximize the lifetime reliability of manycore systems while satisfying performance, power, and thermal constraints. HiMap is process variation- and aging-aware. It comprises two levels: (1) it identifies a region of cores suitable for mapping, and (2) it maps threads in the region and intersperses dark cores for thermal mitigation while considering the current health of the cores. Both levels strive to reduce aging variance across the chip. We evaluated HiMap for 64-core and 256-core systems. Results demonstrate an improvement in system lifetime reliability of up to 2 years at the end of 3.25 years of use, as compared to the state-of-the-art.

18:30  IP3-16, 906  DEMAS: AN EFFICIENT DESIGN METHODOLOGY FOR BUILDING APPROXIMATE ADDERS FOR FPGA-BASED SYSTEMS
Speaker:
Semeen Rehman, Vienna University of Technology (TU Wien), AT
Authors:
Bharath Srinivas Prabakaran (1), Semeen Rehman (1), Muhammad Abdullah Hanif (1), Salim Ullah (2), Ghazal Mazaheri (3), Akash Kumar (2) and Muhammad Shafique (1)
(1) TU Wien, AT; (2) Technische Universität Dresden, DE; (3) UC Riverside, US
Abstract
The current state-of-the-art approximate adders are mostly ASIC-based, i.e., they focus solely on gate and/or transistor level approximations (e.g., through circuit simplification or truncation) to achieve area, latency, power and/or energy savings at the cost of accuracy loss. However, when these designs are synthesized for FPGA-based systems, they do not offer similar reductions in area, latency and power/energy due to the underlying architectural differences between ASICs and FPGAs. In this paper, we present a novel generic design methodology to synthesize and implement approximate adders for any FPGA-based system by considering the underlying resources and architectural differences. Using our methodology, we have designed, analyzed and presented eight different multi-bit adder architectures. Compared to the 16-bit accurate adder, our designs are successful in achieving area, latency and power-delay product gains of 50%, 38%, and 53%, respectively. We also compare our approximate adders to state-of-the-art approximate adders specialized for ASIC and FPGA fabrics and demonstrate the benefits of our approach. We will make the RTL and behavioral models of our and state-of-the-art designs open-source at https://sourceforge.net/projects/approxfpgas/ to further fuel the research and development in the FPGA community and to ensure reproducible research.
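To make the accuracy trade-off concrete, the following Python sketch models a simplified lower-part-OR style adder, one common example of the gate-level simplification the abstract describes for ASIC-oriented designs. It is a behavioral model only and is not one of the eight architectures from the paper; the function name `lower_or_add` and the parameters `k` and `width` are assumptions made for illustration.

```python
def lower_or_add(a, b, k, width=16):
    """Approximate a + b: exact addition on the upper bits, bitwise OR on the lower k bits."""
    mask_low = (1 << k) - 1
    # Approximate lower part: a bitwise OR replaces the carry chain.
    low = (a | b) & mask_low
    # Exact upper part; any carry out of the lower part is simply dropped.
    high = ((a >> k) + (b >> k)) << k
    # Keep width + 1 bits so the carry-out of the upper part is preserved.
    return (high | low) & ((1 << (width + 1)) - 1)


exact = 0x00FF + 0x0001
approx = lower_or_add(0x00FF, 0x0001, k=4)
print(exact, approx, exact - approx)  # 256 255 1
```

With `k=0` the model reduces to an exact adder, so `k` acts as the knob between accuracy and carry-chain length. Whether such a simplification saves area or latency on a given FPGA depends on how the fabric implements carry logic, which is the gap the abstract points to.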

18:31  IP3-17, 515  GAIN SCHEDULED CONTROL FOR NONLINEAR POWER MANAGEMENT IN CMPS
Speaker:
Nikil Dutt, University of California, Irvine, US
Authors:
Bryan Donyanavard, Amir M. Rahmani, Tiago Muck, Kasra Moazzemi and Nikil Dutt, University of California, Irvine, US
Abstract
Dynamic voltage and frequency scaling (DVFS) is a well-established technique for power management of thermal- or energy-sensitive chip multiprocessors (CMPs). In this context, linear control theoretic solutions have been successfully implemented to control the voltage-frequency knobs. However, modern CMPs with a large range of operating frequencies and multiple voltage levels display nonlinear behavior in the relationship between frequency and power. State-of-the-art linear controllers therefore leave room for opportunity in optimizing DVFS operation. We propose a Gain Scheduled Controller (GSC) for nonlinear runtime power management of CMPs that simplifies the controller implementation of systems with varying dynamic properties by utilizing an adaptive control theoretic approach in conjunction with static linear controllers. Our design improves the stability, accuracy, settling time, and overshoot of the controller over a linear controller with minimal overhead. We implement our approach on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that the system's response to changes in target power is improved by 2x while operating up to 12% more efficiently.
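The general idea of gain scheduling, several static linear controllers with one selected according to the current operating region, can be sketched in a few lines of Python. This is a generic illustration and not the authors' controller: the class `GainScheduledPI`, the function `select_gains`, the frequency thresholds, and all gain values in `GAIN_TABLE` are hypothetical.

```python
# Hypothetical schedule: (upper frequency bound in GHz, proportional gain, integral gain).
GAIN_TABLE = [
    (1.0, 0.8, 0.2),
    (1.6, 0.5, 0.1),
    (float("inf"), 0.3, 0.05),
]


def select_gains(freq_ghz):
    """Return the (kp, ki) pair of the first operating region that contains freq_ghz."""
    for upper, kp, ki in GAIN_TABLE:
        if freq_ghz <= upper:
            return kp, ki


class GainScheduledPI:
    """PI controller whose gains are looked up from the current operating frequency."""

    def __init__(self):
        self.integral = 0.0

    def step(self, target_power, measured_power, freq_ghz, dt):
        kp, ki = select_gains(freq_ghz)
        error = target_power - measured_power
        self.integral += error * dt
        # The control output would drive the voltage-frequency knob.
        return kp * error + ki * self.integral


ctrl = GainScheduledPI()
print(ctrl.step(target_power=2.0, measured_power=1.0, freq_ghz=0.8, dt=1.0))
```

Within each region the controller stays linear; only the table lookup changes its behavior as the frequency moves, which is one way to handle a nonlinear frequency-power relationship with static linear controllers.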

18:32  IP4-1, 376  EFFICIENT MAPPING OF QUANTUM CIRCUITS TO THE IBM QX ARCHITECTURES
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner, Alexandru Paler and Robert Wille, Johannes Kepler University Linz, AT
Abstract
In March 2017, IBM launched the project IBM Q with the goal of providing access to quantum computers for a broad audience. This allowed users to conduct quantum experiments on a 5-qubit and, since June 2017, also on a 16-qubit quantum computer (called IBM QX2 and IBM QX3, respectively). In order to use these, the desired quantum functionality (e.g. provided in terms of a quantum circuit) has to be mapped properly so that the underlying physical constraints are satisfied, which is a complex task. This calls for solutions that conduct the mapping process automatically and efficiently. In this paper, we propose such an approach, which satisfies all constraints given by the architecture and, at the same time, aims to keep the overhead in terms of additionally required quantum gates minimal. The proposed approach is generic and can easily be configured for future architectures. Experimental evaluations show that the proposed approach clearly outperforms IBM's own mapping solution with respect to runtime as well as resulting costs.
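The kind of constraint involved can be illustrated with a naive Python sketch that inserts SWAP gates until the two qubits of each CNOT sit on physically coupled qubits. It is not the approach proposed in the paper and makes no attempt to minimize overhead. The function names, the identity initial placement, the connected undirected coupling graph, and the four-qubit line used in the example are all assumptions; any restriction on CNOT direction is ignored.

```python
from collections import deque


def shortest_path(coupling, src, dst):
    """Breadth-first shortest path between two physical qubits (graph assumed connected)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in coupling[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def map_cnots(cnots, coupling):
    """Rewrite logical CNOTs as physical SWAP and CNOT gates that respect the coupling graph."""
    n = len(coupling)
    loc = list(range(n))  # loc[logical] = physical qubit; identity initial placement
    out = []
    for ctrl, tgt in cnots:
        path = shortest_path(coupling, loc[ctrl], loc[tgt])
        # Move the control along the path until it is adjacent to the target.
        for p, q in zip(path[:-2], path[1:-1]):
            out.append(("swap", p, q))
            for logical in range(n):
                if loc[logical] == p:
                    loc[logical] = q
                elif loc[logical] == q:
                    loc[logical] = p
        out.append(("cx", loc[ctrl], loc[tgt]))
    return out


# Hypothetical coupling graph: four physical qubits in a line, 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(map_cnots([(0, 3), (0, 1)], line))
```

Every inserted SWAP is extra gate overhead, which is why the abstract stresses keeping the number of additionally required gates minimal.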

18:30  End of session