2.2 Energy Efficient Neural Networks


Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 6

Chair:
Hai (Helen) Li, Duke University, US

Co-Chair:
Muhammad Shafique, Vienna University of Technology (TU Wien), AT

This session focuses on energy-efficient neural network architectures. The first paper proposes a methodology that enables aggressive voltage scaling of accelerator weight memories to improve the energy efficiency of DNN accelerators. The second paper presents a new sparse matrix format that maximizes the inference speed of an LSTM accelerator by balancing computation loads across processing elements. The third paper introduces methods to optimize memory usage in DNN training on GPUs. Finally, the last paper presents HyperPower, which enables efficient Bayesian optimization and random search for power- and memory-constrained hyper-parameter optimization of NNs running on a given hardware platform. The session also includes two IP papers, ReCom and SparseNN, both of which focus on the energy efficiency of neural networks.

Time | Label | Presentation Title / Authors

11:30 | 2.2.1 | (Best Paper Award Candidate)
MATIC: LEARNING AROUND ERRORS FOR EFFICIENT LOW-VOLTAGE NEURAL NETWORK ACCELERATORS
Speaker:
Sung Kim, University of Washington, US
Authors:
Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze and Visvesh Sathe, University of Washington, US
Abstract
As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly. However, while accelerators with high performance and efficiency on convolutional deep neural networks (Conv-DNNs) have been developed, less progress has been made with regard to fully-connected DNNs (FC-DNNs). In this paper, we propose MATIC (Memory Adaptive Training with In-situ Canaries), a methodology that enables aggressive voltage scaling of accelerator weight memories to improve the energy efficiency of DNN accelerators. To enable accurate operation under voltage overscaling, MATIC combines the characteristics of destructive SRAM reads with the error resilience of neural networks in a memory-adaptive training process. Furthermore, PVT-related voltage margins are eliminated by using bit-cells from synaptic weights as in-situ canaries to track runtime environmental variation. Demonstrated on a low-power DNN accelerator fabricated in 65 nm CMOS, MATIC enables up to 60-80 mV of voltage overscaling (a 3.3x total energy reduction versus the nominal voltage), or an 18.6x reduction in application error.

Download Paper (PDF; Only available from the DATE venue WiFi)
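As a rough illustration of the memory-adaptive training idea described in the abstract, the sketch below injects random bit errors into quantized weights during each training step, so the network learns to tolerate faulty low-voltage SRAM reads. The quantization scheme, the error model, and the grad_fn callback are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def inject_bit_errors(w_bytes, bit_error_rate, rng, n_bits=8):
    """Flip random bits in the stored weight bytes to emulate faulty
    low-voltage SRAM reads (illustrative error model, not MATIC's)."""
    flat = w_bytes.ravel().copy()
    n_flips = rng.binomial(flat.size * n_bits, bit_error_rate)
    idx = rng.integers(0, flat.size, n_flips)
    bits = rng.integers(0, n_bits, n_flips)
    np.bitwise_xor.at(flat, idx, (1 << bits).astype(np.uint8))
    return flat.reshape(w_bytes.shape)

def memory_adaptive_step(weights_fp, grad_fn, lr, bit_error_rate, rng):
    """One training step in which the forward/backward pass sees
    error-injected weights, so the network learns around the faults."""
    # Symmetric 8-bit quantization (an assumption made for this sketch).
    scale = np.abs(weights_fp).max() / 127.0 + 1e-12
    w_q = np.clip(np.round(weights_fp / scale), -128, 127).astype(np.int8)
    w_faulty = inject_bit_errors(w_q.view(np.uint8), bit_error_rate, rng).view(np.int8)
    # grad_fn is a user-supplied callback returning dLoss/dW at the faulty weights.
    grad = grad_fn(w_faulty.astype(np.float32) * scale)
    return weights_fp - lr * grad
```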
12:00 | 2.2.2 | MAXIMIZING SYSTEM PERFORMANCE BY BALANCING COMPUTATION LOADS IN LSTM ACCELERATORS
Speaker:
Junki Park, POSTECH, KR
Authors:
Junki Park1, Jaeha Kung1, Wooseok Yi2 and Jae-Joon Kim1
1Pohang University of Science and Techology, KR; 2POSTECH CITE, KR
Abstract
The LSTM is a popular neural network model for modeling and analyzing time-varying data. The main operation of an LSTM is matrix-vector multiplication, which becomes sparse (spMxV) due to the widely adopted weight pruning in deep learning. This paper presents a new sparse matrix format, named CBSR, to maximize the inference speed of the LSTM accelerator. In the CBSR format, speed-up is achieved by balancing the computation loads over the processing elements (PEs). Along with the new format, we present a simple network transformation that completely removes the hardware overhead incurred when using the CBSR format. We also provide a detailed analysis of the impact of network size and the number of PEs, which is missing from prior work. Simulation results show a 16-38% improvement in system performance compared to the well-known CSC/CSR formats. A power analysis in 65 nm CMOS technology further shows 9-22% energy savings.

Download Paper (PDF; Only available from the DATE venue WiFi)
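The abstract does not define the CBSR format itself, but the load-balancing problem it targets can be illustrated: if the rows of a pruned weight matrix are split naively across PEs (as with a plain CSR-style partition), the busiest PE dictates latency, whereas evening out the nonzero counts shortens the critical path. The sketch below, with illustrative matrix sizes and PE counts, compares the two assignments; it is not an implementation of CBSR.

```python
import numpy as np

def pe_cycles_naive(W, n_pes):
    """Assign consecutive rows of a pruned weight matrix to PEs, as a plain
    row split would; latency is set by the most heavily loaded PE."""
    nnz_per_row = (W != 0).sum(axis=1)
    loads = [int(chunk.sum()) for chunk in np.array_split(nnz_per_row, n_pes)]
    return max(loads), loads

def pe_cycles_balanced(W, n_pes):
    """Greedily assign rows to the currently least-loaded PE so that
    nonzero counts (and hence MAC cycles) are evened out."""
    nnz_per_row = (W != 0).sum(axis=1)
    loads = [0] * n_pes
    for nnz in sorted(nnz_per_row, reverse=True):
        loads[int(np.argmin(loads))] += int(nnz)
    return max(loads), loads

# Example: an 80%-pruned 512x512 LSTM gate matrix on 16 PEs (illustrative numbers).
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)) * (rng.random((512, 512)) < 0.2)
print(pe_cycles_naive(W, 16)[0], pe_cycles_balanced(W, 16)[0])
```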
12:30 | 2.2.3 | MODNN: MEMORY OPTIMAL DNN TRAINING ON GPUS
Speaker:
Xiaoming Chen, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Xiaoming Chen, Danny Z. Chen and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
Graphics processing units (GPUs) are widely adopted to accelerate the training of deep neural networks (DNNs). However, the limited GPU memory size restricts the maximum scale of DNNs that can be trained on GPUs, which presents serious challenges. This paper proposes moDNN, a framework to optimize memory usage in DNN training. moDNN supports automatic tuning of DNN training code to match any given memory budget (not smaller than the theoretical lower bound). By taking full advantage of overlapping computations and data transfers, we develop heuristics to judiciously schedule data offloading and prefetching, together with training algorithm selection, to optimize memory usage. We further introduce a new sub-batch size selection method which also greatly reduces memory usage. moDNN can reduce memory usage by up to 50x compared with the ideal case, which assumes that the GPU memory is large enough to hold all data. When executing moDNN on a GPU with 12 GB of memory, the performance loss is only 8%, much lower than that caused by the best known existing approach, vDNN. moDNN is also applicable to multiple GPUs and attains a 1.84x average speedup on two GPUs.

Download Paper (PDF; Only available from the DATE venue WiFi)
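One ingredient mentioned in the abstract, sub-batch size selection, can be sketched as a simple gradient-accumulation loop: the batch is split into sub-batches sized to fit the memory budget, and gradients are accumulated so the update matches full-batch training while peak activation memory stays bounded. The linear activation-memory model and the model interface (forward/backward/apply_gradients) below are assumptions for illustration, not moDNN's actual cost model or scheduler.

```python
def largest_subbatch(batch_size, act_bytes_per_sample, weight_bytes, budget_bytes):
    """Pick the largest sub-batch that fits the budget, assuming activation
    memory grows linearly with sub-batch size (a rough illustrative model)."""
    avail = budget_bytes - weight_bytes
    if avail <= 0:
        raise ValueError("budget below the theoretical lower bound")
    sub = min(batch_size, avail // act_bytes_per_sample)
    # Prefer a divisor of batch_size so every sub-batch has the same size.
    while sub > 1 and batch_size % sub != 0:
        sub -= 1
    return max(1, sub)

def train_step(model, batch, budget_bytes):
    """Process one batch as several sub-batches, accumulating gradients so the
    result approximates full-batch training under a fixed memory budget.
    Assumes model.backward returns the mean gradient over its sub-batch."""
    sub = largest_subbatch(len(batch), model.act_bytes_per_sample,
                           model.weight_bytes, budget_bytes)
    grads = None
    for i in range(0, len(batch), sub):
        g = model.backward(model.forward(batch[i:i + sub]))
        grads = g if grads is None else [a + b for a, b in zip(grads, g)]
    n_sub = len(batch) // sub
    model.apply_gradients([g / n_sub for g in grads])
```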
12:45 | 2.2.4 | HYPERPOWER: POWER- AND MEMORY-CONSTRAINED HYPER-PARAMETER OPTIMIZATION FOR NEURAL NETWORKS
Speaker:
Dimitrios Stamoulis, Carnegie Mellon University, US
Authors:
Dimitrios Stamoulis1, Ermao Cai1, Da-Cheng Juan2 and Diana Marculescu1
1Carnegie Mellon University, US; 2Google Research, US
Abstract
While selecting the hyper-parameters of Neural Networks (NNs) has so far been treated as an art, the emergence of more complex, deeper architectures poses increasingly more challenges to designers and Machine Learning (ML) practitioners, especially when power and memory constraints need to be considered. In this work, we propose HyperPower, a framework that enables efficient Bayesian optimization and random search in the context of power- and memory-constrained hyper-parameter optimization for NNs running on a given hardware platform. HyperPower is the first work (i) to show that power consumption can be used as a low-cost, a priori known constraint, and (ii) to propose predictive models for the power and memory of NNs executing on GPUs. Thanks to HyperPower, the number of function evaluations and the best test error achieved by a constraint-unaware method are reached up to 112.99x and 30.12x faster, respectively, while invalid configurations are never considered. HyperPower significantly speeds up hyper-parameter optimization, achieving up to 57.20x more function evaluations than constraint-unaware methods in a given time interval, effectively yielding accuracy improvements of up to 67.6%.

Download Paper (PDF; Only available from the DATE venue WiFi)
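The core idea of treating power and memory as a priori known constraints can be sketched as a constraint-aware random search: configurations predicted to violate the budgets are discarded before any expensive training. The predictor and evaluation callbacks below are placeholders, not HyperPower's trained power/memory models or its Bayesian-optimization variant.

```python
def constrained_random_search(sample_config, predict_power_w, predict_mem_gb,
                              train_and_eval, power_cap_w, mem_cap_gb, n_trials):
    """Random hyper-parameter search that rejects configurations predicted to
    violate the power or memory budget before any (expensive) training.
    The two predictors stand in for hardware-aware power/memory models."""
    best_err, best_cfg = float("inf"), None
    for _ in range(n_trials):
        cfg = sample_config()
        # A priori constraints: invalid configs are skipped at negligible cost.
        if predict_power_w(cfg) > power_cap_w or predict_mem_gb(cfg) > mem_cap_gb:
            continue
        err = train_and_eval(cfg)  # the expensive function evaluation
        if err < best_err:
            best_err, best_cfg = err, cfg
    return best_cfg, best_err
```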
13:00 | IP1-1, 84 | RECOM: AN EFFICIENT RESISTIVE ACCELERATOR FOR COMPRESSED DEEP NEURAL NETWORKS
Speaker:
Houxiang Ji, Shanghai Jiao Tong University, CN
Authors:
Houxiang Ji1, Linghao Song2, Li Jiang1, Hai (Helen) Li3 and Yiran Chen2
1Shanghai Jiao Tong University, CN; 2Duke University, US; 3Duke University/TUM-IAS, US
Abstract
Deep Neural Networks (DNNs) play a key role in prevailing machine learning applications. Resistive random-access memory (ReRAM) is capable of both computation and storage, making it well suited to accelerating DNN processing in memory. Moreover, DNNs have a significant number of zero weights, which offers the possibility of reducing computation cost by skipping ineffectual calculations on zero weights. However, the irregular distribution of zero weights in DNNs makes it difficult for resistive accelerators to exploit this sparsity, because resistive accelerators rely heavily on regular matrix-vector multiplication in ReRAM. In this work, we propose ReCom, the first resistive accelerator to support sparse DNN processing. ReCom is an efficient resistive accelerator for compressed deep neural networks, in which DNN weights are structurally compressed to eliminate zero parameters and become more amenable to computation in ReRAM, while zero DNN activations are also considered at the same time. Two techniques, Structurally-compressed Weight Oriented Fetching (SWOF) and In-layer Pipeline for Memory and Computation (IPMC), are proposed to efficiently process the compressed DNNs in ReRAM. In our evaluation, ReCom achieves 3.37x speedup and 2.41x better energy efficiency compared to a state-of-the-art resistive accelerator.

Download Paper (PDF; Only available from the DATE venue WiFi)
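The contrast between structural compression and irregular element-wise sparsity can be illustrated by dropping entire weight columns whose magnitudes are all near zero, so the surviving matrix stays dense and maps directly onto ReRAM crossbars. The sketch below only conveys that contrast; it does not implement ReCom's SWOF or IPMC techniques, and the threshold is an illustrative assumption.

```python
import numpy as np

def structurally_compress(W, threshold=1e-3):
    """Drop entire weight columns whose magnitudes are all below a threshold,
    keeping an index of the surviving columns. The result is a smaller *dense*
    matrix, unlike irregular element-wise sparsity with scattered zeros."""
    keep = np.where(np.abs(W).max(axis=0) >= threshold)[0]
    return W[:, keep], keep

def compressed_matvec(W_c, keep, x):
    """Multiply using only the inputs that correspond to the kept columns;
    the dropped columns contribute (approximately) nothing to W @ x."""
    return W_c @ x[keep]
```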
13:01 | IP1-2, 645 | SPARSENN: AN ENERGY-EFFICIENT NEURAL NETWORK ACCELERATOR EXPLOITING INPUT AND OUTPUT SPARSITY
Speaker:
Jingyang Zhu, Hong Kong University of Science and Technology, HK
Authors:
Jingyang Zhu, Jingbo Jiang, Xizi Chen and Chi-Ying Tsui, Hong Kong University of Science and Technology, HK
Abstract
Contemporary deep neural networks (DNNs) contain millions of synaptic connections spread across tens to hundreds of layers. This large computational complexity poses a challenge to hardware design. In this work, we leverage the intrinsic activation sparsity of DNNs to substantially reduce execution cycles and energy consumption. An end-to-end training algorithm is proposed to develop a lightweight (less than 5% overhead) run-time predictor of output activation sparsity on the fly. Furthermore, an energy-efficient hardware architecture, SparseNN, is proposed to exploit both input and output sparsity. SparseNN is a scalable architecture with distributed memories and processing elements connected through a dedicated on-chip network. Compared with state-of-the-art accelerators that exploit only input sparsity, SparseNN achieves a 10%-70% improvement in throughput and a power reduction of around 50%.

Download Paper (PDF; Only available from the DATE venue WiFi)
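The notion of a lightweight run-time predictor of output sparsity can be sketched, for illustration, as a low-rank pre-pass that guesses which ReLU outputs will be zero so their full dot products are skipped. The SVD-based predictor below is an assumption made for this sketch; SparseNN's predictor is trained end-to-end as part of the network rather than derived from a factorization.

```python
import numpy as np

class SparsityPredictor:
    """Low-rank pre-pass that guesses which ReLU outputs will be zero so the
    full dot products for those outputs can be skipped (conceptual sketch)."""
    def __init__(self, W, rank=8):
        # Low-rank factorization of the layer weights: the cheap (<5%) pre-pass.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        self.A = U[:, :rank] * s[:rank]
        self.B = Vt[:rank, :]

    def predict_active(self, x):
        # Outputs whose low-rank estimate is positive are predicted nonzero.
        return (self.A @ (self.B @ x)) > 0

def layer_forward(W, b, x, predictor):
    """Compute only the rows predicted to produce nonzero ReLU activations."""
    active = predictor.predict_active(x)
    y = np.zeros(W.shape[0])
    y[active] = np.maximum(W[active] @ x + b[active], 0.0)
    return y
```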
13:00 | End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks in the exhibition area (Terrace Level of the ICCD) at the times listed below.

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00