3.4 Optimizing Computing with Neuromorphic Architectures and Accelerators


Date: Tuesday 20 March 2018
Time: 14:30 - 15:30
Location / Room: Konf. 2

Chair:
Dimitrios Soudris, NTUA, GR

Co-Chair:
Ioana Vatajelu, University of Grenoble–Alpes, TIMA Laboratory, FR

Creating performance- and power-efficient acceleration techniques is a major challenge. In this session, several approaches in this direction are presented for neural network applications and GPUs. A wide range of optimization techniques is discussed, including application-level optimizations, system-level solutions, matrix optimizations, and accuracy vs. computation trade-offs.

Time  Label  Presentation Title / Authors
14:30  3.4.1  STRUCTURE OPTIMIZATIONS OF NEUROMORPHIC COMPUTING ARCHITECTURES FOR DEEP NEURAL NETWORKS
Speaker:
Heechun Park, SNUCAD, KR
Authors:
Heechun Park and Taewhan Kim, Seoul National University, KR
Abstract
This work addresses a new structure optimization of neuromorphic computing architectures. It enables DNN (deep neural network) computation that is, theoretically, twice as fast as that of existing architectures. Specifically, we propose a new structural technique that mixes dendritic-based and axonal-based neuromorphic cores so as to completely eliminate the inherent non-zero waiting time between cores in the DNN implementation. In addition, in conjunction with the new architecture, we propose a technique for maximally utilizing computation units so that the resource overhead of the total computation units is minimized. We provide a set of experimental data demonstrating the effectiveness (i.e., speed and area) of our proposed architectural optimizations: ~2x speedup with no accuracy penalty in the neuromorphic computation, or improved accuracy with no additional computation time.
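The ~2x figure follows from hiding inter-core transfer time behind computation. Below is a minimal back-of-the-envelope timing model in Python, written for this summary rather than taken from the paper: it contrasts a homogeneous pipeline, where each core must wait for its predecessor's complete output, with a mixed dendritic/axonal pipeline in which partial results stream between cores. All names and numbers are illustrative assumptions.

```python
def homogeneous_latency(layers, compute, transfer):
    """All cores of one kind: each layer waits for the complete output of
    its predecessor (transfer) before it can start computing (compute)."""
    t = 0.0
    for _ in range(layers):
        t += transfer + compute        # serialized wait, then compute
    return t

def mixed_latency(layers, compute, transfer):
    """Alternating dendritic/axonal cores: partial results stream between
    cores, so transfers overlap with computation after the first layer."""
    t = transfer                       # only the initial input is staged
    for _ in range(layers):
        t += compute                   # later transfers are hidden
    return t

if __name__ == "__main__":
    L, C, X = 8, 1.0, 1.0              # layers, compute time, transfer time
    print(homogeneous_latency(L, C, X))  # 16.0
    print(mixed_latency(L, C, X))        # 9.0 -> ratio approaches 2x as L grows
```

With equal per-layer compute and transfer costs, the homogeneous model takes 2L time units versus L + 1 for the mixed one, consistent with the abstract's asymptotic ~2x claim.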

15:00  3.4.2  CCR: A CONCISE CONVOLUTION RULE FOR SPARSE NEURAL NETWORK ACCELERATORS
Speaker:
Jiajun Li, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Jiajun Li, Guihai Yan, Wenyan Lu, Shuhao Jiang, Shijun Gong, Jingya Wu and Xiaowei Li, Institute of Computing Technology, Chinese Academy of Sciences, CN
Abstract
Convolutional Neural Networks (CNNs) have achieved great success in a broad range of applications. As CNN-based methods are often both computation and memory intensive, sparse CNNs have emerged as an effective solution to reduce the amount of computation and memory accesses while maintaining high accuracy. However, dense CNN accelerators can hardly benefit from this reduction due to their lack of support for irregular, sparse models. This paper proposes a concise convolution rule (CCR) to bridge the gap between sparse CNNs and dense CNN accelerators. CCR transforms a sparse convolution into multiple effective and ineffective ones. The ineffective convolutions, in which either the neurons or the synapses are all zeros, do not contribute to the final results, so their computations and memory accesses can be eliminated. The effective convolutions, in which both the neurons and the synapses are dense, can be easily mapped to existing dense CNN accelerators. Unlike prior approaches, which trade complexity for flexibility, CCR advocates a novel approach to reaping the benefits of reduced computation and memory accesses, as well as the acceleration of existing dense architectures, without intrusive PE modifications. As a case study, we implemented a sparse CNN accelerator, SparseK, following the rationale of CCR. The experiments show that SparseK achieves a speedup of 2.9x on VGG16 compared to a comparably provisioned dense architecture. Compared with state-of-the-art sparse accelerators, SparseK improves performance and energy efficiency by 1.8x and 1.5x, respectively.
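To make the rule concrete, here is a minimal Python/NumPy sketch of the CCR idea as the abstract describes it: the sparse kernel is tiled, all-zero (ineffective) tiles are skipped outright, and the remaining dense (effective) tiles are processed with ordinary dense convolution arithmetic. The tiling granularity, function names, and single-channel 2D shapes are assumptions for illustration, not the paper's actual decomposition or the SparseK design.

```python
import numpy as np

def ccr_conv2d(x, w, block=4):
    """Toy CCR-style convolution: tile the sparse kernel into block x block
    sub-kernels, drop tiles that are entirely zero (ineffective), and run
    the remaining dense tiles through a plain dense convolution loop."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for bi in range(0, kh, block):
        for bj in range(0, kw, block):
            tile = w[bi:bi + block, bj:bj + block]
            if not tile.any():          # ineffective: all-zero synapses,
                continue                # skip compute and memory accesses
            th, tw = tile.shape
            for i in range(oh):         # effective: dense sub-convolution,
                for j in range(ow):     # mappable to a dense accelerator
                    out[i, j] += np.sum(
                        x[i + bi:i + bi + th, j + bj:j + bj + tw] * tile)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((10, 10))
    w = np.zeros((4, 4))
    w[0, 1] = 0.5                       # sparse kernel: one dense corner tile
    dense = ccr_conv2d(x, w, block=4)   # block == kernel size: nothing skipped
    tiled = ccr_conv2d(x, w, block=2)   # skips the three all-zero tiles
    assert np.allclose(dense, tiled)
```

The point of the decomposition is that the inner dense loop is exactly what an unmodified dense accelerator already executes; only the zero-tile skipping is new.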

15:30  IP1-7, 273  HIPE: HMC INSTRUCTION PREDICATION EXTENSION APPLIED ON DATABASE PROCESSING
Speaker:
Diego Tomé, Centrum Wiskunde & Informatica (CWI), NL
Authors:
Diego Gomes Tomé1, Paulo Cesar Santos2, Luigi Carro2, Eduardo Cunha de Almeida3 and Marco Antonio Zanata Alves3
1Federal University of Paraná, BR; 2UFRGS, BR; 3UFPR, BR
Abstract
The recent Hybrid Memory Cube (HMC) is a smart memory that includes functional units inside one logic layer of the 3D-stacked memory design. In order to execute instructions inside the HMC, the processor sends instructions to be executed near the data while keeping most of the pipeline complexity on the processor side. Thus, control-flow and data-flow dependencies are all managed inside the processor, in such a way that only update instructions are supported by the HMC. To resolve data-flow dependencies inside the memory, previous work proposed the HMC Instruction Vector Extensions (HIVE), which embed a large number of functional units with an interlocked register bank. In this work, we propose the HMC Instruction Predication Extension (HIPE), which supports predicated execution inside the memory in order to transform control-flow dependencies into data-flow dependencies. Our mechanism focuses on removing the high-latency interaction between the processor and the smart memory during the execution of branches that depend on data processed inside the memory. In this paper, we evaluate a balanced design of HIVE compared to x86 and HMC executions. We then show results for the HIPE mechanism when executing a database workload, which is a strong candidate for smart memories, and we show interesting performance trade-offs when comparing our mechanism to previous work.
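As a rough illustration of what predication buys here, the Python sketch below contrasts a branchy database-style filter, whose per-element branch outcome creates a control-flow dependency the host processor would have to resolve, with a predicated version in which the comparison merely produces a mask that gates the update. NumPy masking stands in for the HMC's predicated near-data instructions; the function names and workload are invented for illustration and are not from the paper.

```python
import numpy as np

def filtered_sum_branchy(values, threshold):
    """Branch per element: each comparison is a control-flow decision that,
    in the near-data setting, would force a processor/memory round trip."""
    total = 0.0
    for v in values:
        if v < threshold:          # control-flow dependency
            total += v
    return total

def filtered_sum_predicated(values, threshold):
    """Predicated form: the comparison yields a mask (data), so the update
    is straight-line data flow that could stay inside the smart memory."""
    pred = values < threshold      # predicate, not a branch
    return float(np.sum(values * pred))  # masked update

if __name__ == "__main__":
    data = np.array([3.0, 9.0, 1.0, 7.0])
    assert filtered_sum_branchy(data, 5.0) == filtered_sum_predicated(data, 5.0)
```

The predicated form contains no branch whose outcome the host must observe, which is precisely the round trip the abstract aims to remove.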

15:30  End of session
16:00  Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks in the exhibition area (Terrace Level of the ICCD) at the times listed below.

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00