4.7 Energy and power efficiency in GPU-based systems


Date: Tuesday 26 March 2019
Time: 17:00 - 18:30
Location / Room: Room 7

Chair:
Muhammad Shafique, TU Wien, AT

Co-Chair:
William Fornaciari, Politecnico di Milano, IT

This session presents three papers: two on energy efficiency in GPU-based systems and one on exploring performance and accuracy trade-offs when using GPUs for SNN modeling. The first paper presents an online thermal and energy management mechanism for CPU-GPU systems, enabled by efficient thread partitioning, mapping, and the respective models. The second paper identifies choke points in GPUs and boosts the choke-point-induced critical warps to achieve high energy efficiency. The third paper presents a GPU-accelerated SNN simulator that introduces stochasticity in STDP and the capability to perform low-precision simulation.

Time    Label    Presentation Title / Authors
17:00    4.7.1    TEEM: ONLINE THERMAL- AND ENERGY-EFFICIENCY MANAGEMENT ON CPU-GPU MPSOCS
Speaker:
Amit Kumar Singh, University of Essex, GB
Authors:
Samuel Isuwa, Somdip Dey, Amit Kumar Singh and Klaus McDonald-Maier, University of Essex, GB
Abstract
Heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) are progressively becoming predominant in most modern mobile devices. These devices are required to process applications within thermal, energy and performance constraints. However, most stock power and thermal management mechanisms either neglect some of these constraints or rely on frequency scaling to achieve energy efficiency and temperature reduction on the device. Although this inefficient technique can reduce the temporal thermal gradient, it also hurts the performance of the executing task. In this paper, we propose a thermal and energy management mechanism which reduces the thermal gradient and improves energy efficiency through resource mapping and thread partitioning of applications with online optimization in heterogeneous MPSoCs. The efficacy of the proposed approach is experimentally appraised using different applications from the Polybench benchmark suite on the Odroid-XU4 development platform. Results show 28% performance improvement, 28.32% energy saving and over 76% reduction in thermal variance when compared to the existing approaches. Additionally, the method frees more than 90% of the memory storage on the MPSoC that would previously have been utilized to store several task-to-thread mapping configurations.
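
To make the abstract's idea of online, constraint-aware mapping concrete, the sketch below shows a generic decision loop that searches thread-partition and DVFS candidates for the lowest-energy configuration meeting a deadline and a temperature limit. It is a minimal illustration only, not the TEEM mechanism: the names (PartitionConfig, estimate_runtime_s, estimate_power_w, choose_config) and the linear performance and power models are assumptions for this sketch, not the paper's models or the Odroid-XU4 API.

# Illustrative sketch only: a generic online manager picking a CPU/GPU work split
# and DVFS level under thermal and performance constraints. Not the paper's TEEM.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class PartitionConfig:
    cpu_threads: int      # threads mapped to CPU cores
    gpu_fraction: float   # share of the work offloaded to the GPU
    freq_mhz: int         # DVFS operating point

def estimate_runtime_s(cfg: PartitionConfig, work_items: int) -> float:
    # Toy linear performance model; a real manager would use profiled per-application models.
    cpu_rate = cfg.cpu_threads * cfg.freq_mhz * 0.5
    gpu_rate = 4000.0 * cfg.gpu_fraction
    return work_items / max(cpu_rate * (1.0 - cfg.gpu_fraction) + gpu_rate, 1e-9)

def estimate_power_w(cfg: PartitionConfig) -> float:
    # Toy power model: dynamic power grows with frequency and with active resources.
    return 0.4 * cfg.cpu_threads * (cfg.freq_mhz / 1000.0) ** 2 + 2.0 * cfg.gpu_fraction

def choose_config(work_items: int, temp_c: float, temp_limit_c: float,
                  deadline_s: float) -> "PartitionConfig | None":
    """Return the lowest-energy candidate that meets the deadline; when the SoC
    is already hot, high-power operating points are skipped (simple throttling)."""
    best, best_energy = None, float("inf")
    for threads, gpu_frac, freq in product(range(1, 5),
                                           (0.0, 0.25, 0.5, 0.75),
                                           (600, 1000, 1400, 1800)):
        cfg = PartitionConfig(threads, gpu_frac, freq)
        runtime = estimate_runtime_s(cfg, work_items)
        power = estimate_power_w(cfg)
        if runtime > deadline_s:
            continue
        if temp_c > temp_limit_c and power > 3.0:
            continue
        energy = power * runtime
        if energy < best_energy:
            best, best_energy = cfg, energy
    return best

if __name__ == "__main__":
    print(choose_config(work_items=2_000, temp_c=68.0, temp_limit_c=75.0, deadline_s=2.0))

In this toy search the selection happens online, once per application launch or control epoch, which is what lets a single loop replace a large table of pre-computed task-to-thread mappings.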

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30    4.7.2    PREDICTING CRITICAL WARPS IN NEAR-THRESHOLD GPGPU APPLICATIONS USING A DYNAMIC CHOKE POINT ANALYSIS
Speaker:
Sourav Sanyal, Utah State University, US
Authors:
Sourav Sanyal, Prabal Basu, Aatreyi Bal, Sanghamitra Roy and Koushik Chakraborty, Utah State University, US
Abstract
General-purpose graphics processing units (GP-GPUs) can significantly improve power consumption in the NTC operating region. However, process variation (PV) can drastically reduce their performance. In this paper, we examine choke points, a unique device-level characteristic of PV at NTC, which can exacerbate the warp criticality problem. We show that modern warp schedulers cannot tackle the choke-point-induced critical warps in an NTC GPU. We propose Warp Latency Booster, a circuit-architectural solution to dynamically predict the critical warps and accelerate them in their respective execution units. Our best scheme achieves an average improvement of ∼32% and ∼41% in performance, and ∼21% and ∼19% in energy efficiency, respectively, over two state-of-the-art warp schedulers.
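
The paper's Warp Latency Booster is a circuit-architectural mechanism inside the GPU pipeline and cannot be reproduced in a few lines; the toy simulation below only illustrates the underlying intuition of flagging warps whose progress lags the average (here because of assumed process-variation slowdowns) and giving them a boost. Every name and constant in it is a hypothetical assumption for illustration, not the paper's scheme.

# Conceptual sketch only: spot "critical" (lagging) warps from per-warp progress
# counters when some warps run on slowed, variation-affected lanes, then boost them.
import random

random.seed(0)

NUM_WARPS = 8
INSTRUCTIONS_PER_WARP = 1000

# Assumed slowdown: warps on variation-affected ("choke-point") lanes retire
# fewer instructions per cycle in this toy model.
slowdown = [1.5 if random.random() < 0.5 else 1.0 for _ in range(NUM_WARPS)]

def critical_warps(progress, threshold=0.85):
    """Flag warps whose progress trails the average by more than (1 - threshold);
    a booster would then prioritize or accelerate them."""
    avg = sum(progress) / len(progress)
    return {w for w, p in enumerate(progress) if avg > 0 and p < threshold * avg}

progress = [0.0] * NUM_WARPS      # instructions retired per warp
boost = [1.0] * NUM_WARPS         # acceleration factor applied to flagged warps
ever_flagged = set()
cycle = 0

while min(progress) < INSTRUCTIONS_PER_WARP:
    for w in range(NUM_WARPS):
        if progress[w] < INSTRUCTIONS_PER_WARP:
            progress[w] += boost[w] / slowdown[w]
    if cycle % 100 == 0:          # periodic criticality check
        lagging = critical_warps(progress)
        ever_flagged |= lagging
        boost = [1.3 if w in lagging else 1.0 for w in range(NUM_WARPS)]
    cycle += 1

print(f"finished after {cycle} cycles; warps flagged as critical: {sorted(ever_flagged)}")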

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00    4.7.3    FAST AND LOW-PRECISION LEARNING IN GPU-ACCELERATED SPIKING NEURAL NETWORK
Speaker:
Xueyuan She, Georgia Institute of Technology, US
Authors:
Xueyuan She, Yun Long and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Spiking neural networks (SNNs) use biologically inspired neuron models coupled with spike-timing-dependent plasticity (STDP) to enable unsupervised continuous learning in artificial intelligence (AI) platforms. However, current SNN algorithms show low accuracy on complex problems and are hard to operate at reduced precision. This paper demonstrates a GPU-accelerated SNN architecture that uses stochasticity in STDP coupled with higher-frequency input spike trains. The simulation results demonstrate 2 to 3 times faster learning compared to deterministic SNN architectures while maintaining high accuracy on the MNIST (simple) and Fashion-MNIST (complex) data sets. Further, we show that stochastic STDP enables learning even with 2 bits of operation, while deterministic STDP fails.
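
As a rough illustration of why stochasticity helps at low precision: with 2-bit weights the quantization step is far larger than a typical STDP update, so deterministic rounding discards most updates, whereas applying a full step with probability proportional to the analog update preserves it in expectation. The sketch below implements that generic idea; it is not the paper's simulator, and all constants and function names are assumptions for this sketch.

# Illustrative sketch only: stochastic STDP where a coarse, fixed-size weight step
# is applied with probability proportional to the ideal analog update.
import numpy as np

rng = np.random.default_rng(0)

WEIGHT_BITS = 2
LEVELS = 2 ** WEIGHT_BITS            # 4 representable weight levels in [0, 1]
STEP = 1.0 / (LEVELS - 1)            # quantization step
TAU_MS = 20.0                        # STDP time constant (assumed)
A_PLUS, A_MINUS = 0.02, 0.021        # potentiation / depression amplitudes (assumed)

def stochastic_stdp_update(w, dt_ms):
    """dt_ms = t_post - t_pre per synapse. The analog STDP update is converted
    into a probability of taking one full quantization step up or down."""
    dw = np.where(dt_ms > 0,
                  A_PLUS * np.exp(-dt_ms / TAU_MS),     # pre before post: potentiate
                  -A_MINUS * np.exp(dt_ms / TAU_MS))    # post before pre: depress
    p = np.clip(np.abs(dw) / STEP, 0.0, 1.0)            # probability of taking a step
    take_step = rng.random(w.shape) < p
    w = w + np.sign(dw) * STEP * take_step
    return np.clip(np.round(w / STEP) * STEP, 0.0, 1.0) # keep weights on the 2-bit grid

# Tiny usage example: 5 synapses with random 2-bit weights and spike-time differences.
w = rng.integers(0, LEVELS, size=5) * STEP
dt = rng.uniform(-50.0, 50.0, size=5)
print("before:", w)
print("after :", stochastic_stdp_update(w, dt))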

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30    End of session
Exhibition Reception in Exhibition Area

The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks will be offered to all conference delegates and exhibition visitors. All exhibitors are also welcome to provide drinks and snacks for the attendees.