2.7 Analysis and optimization techniques for neural networks

Time

Label

Presentation Title
Authors

11:30

2.7.1

LOW-COMPLEXITY DYNAMIC CHANNEL SCALING OF NOISE-RESILIENT CNN FOR INTELLIGENT EDGE DEVICES
Speaker:
Younghoon Byun, Pohang University of Science and Technology (POSTECH), KR
Authors:
Younghoon Byun, Minho Ha, Jeonghun Kim, Sunggu Lee and Youngjoo Lee, Pohang University of Science and Technology (POSTECH), KR
Abstract
In this paper, we present a novel channel scaling scheme for convolutional neural networks (CNNs), which can improve the recognition accuracy for distorted images encountered in practice without increasing the network complexity. During the training phase, the proposed work first prepares multiple filters under the same CNN architecture by taking into account different noise models and strengths. We then introduce an FFT-based noise classifier, which determines the noise property of the received input image by calculating the partial sum of the frequency-domain values. Based on the detected noise class, we dynamically change the filters of each CNN layer to provide dedicated recognition. Furthermore, we propose a channel scaling technique to reduce the number of active filter parameters if the input data is relatively clean. Experimental results show that the proposed dynamic channel scaling reduces the computational complexity as well as the energy consumption, while still providing acceptable accuracy for intelligent edge devices.
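The idea of classifying noise from a partial sum of frequency-domain values can be illustrated with a minimal NumPy sketch. It is not the authors' classifier: the function names, the centered low-frequency box, the `cutoff` value and the two thresholds are all assumptions chosen for illustration, and the filter banks are placeholders for per-noise-class CNN filters.

```python
import numpy as np

def high_freq_energy_ratio(image, cutoff=0.25):
    """Share of spectral magnitude outside a centered low-frequency box.

    The box size (cutoff) is an illustrative assumption, not a value from the paper.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2                      # zero frequency after fftshift
    ry, rx = max(1, int(h * cutoff)), max(1, int(w * cutoff))
    low = spectrum[cy - ry:cy + ry + 1, cx - rx:cx + rx + 1].sum()  # partial sum
    total = spectrum.sum()
    return float((total - low) / total) if total > 0 else 0.0

def classify_noise(image, thresholds=(0.35, 0.6)):
    """Map the ratio to a class index: 0 = clean, 1 = moderate, 2 = strong.

    The thresholds are hypothetical and would need calibration on real data.
    """
    return int(np.searchsorted(thresholds, high_freq_energy_ratio(image)))

def select_filters(filter_banks, image):
    """Pick the filter set prepared for the detected noise class."""
    return filter_banks[classify_noise(image)]
```

In this sketch a smooth image yields a ratio close to zero and is mapped to class 0, while added white noise spreads magnitude across all frequency bins and pushes the ratio, and therefore the class index, upward.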

12:00

2.7.2

DATA LOCALITY OPTIMIZATION OF DEPTHWISE SEPARABLE CONVOLUTIONS FOR CNN INFERENCE ACCELERATORS
Speaker:
Hao-Ning Wu, National Tsing Hua University, TW
Authors:
Hao-Ning Wu and Chih-Tsun Huang, National Tsing Hua University, TW
Abstract
This paper presents a novel framework to maximize data reuse in depthwise separable convolutional layers with the Scan execution order of the tiled matrix multiplications. In addition, a fusion scheme across layers is proposed to minimize the data transfer of intermediate activations, improving both the latency and the energy consumption of external memory accesses. The experimental results are validated against DRAMSim2 for accurate timing and energy estimation. With a 64K-entry on-chip buffer, our approach achieves a DRAM energy reduction of 67% on MobileNet V2.
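The benefit of fusing the depthwise and pointwise layers can be illustrated with a small NumPy sketch. It does not reproduce the Scan execution order or the authors' framework: the row-tiling scheme, the function names and the 3x3 "valid" depthwise kernel are assumptions made for illustration. The point is only that the fused version never materializes the full intermediate activation, yet computes the same output.

```python
import numpy as np

def depthwise3x3(x, dw):
    """x: (C, H, W), dw: (C, 3, 3). Per-channel 3x3 'valid' cross-correlation."""
    c, h, w = x.shape
    out = np.zeros((c, h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += dw[:, dy, dx, None, None] * x[:, dy:dy + h - 2, dx:dx + w - 2]
    return out

def pointwise(x, pw):
    """x: (C_in, H, W), pw: (C_out, C_in). 1x1 convolution across channels."""
    return np.einsum("oc,chw->ohw", pw, x)

def unfused(x, dw, pw):
    """Reference: the whole intermediate activation is produced before the 1x1 layer."""
    return pointwise(depthwise3x3(x, dw), pw)

def fused_by_row_tiles(x, dw, pw, tile_rows=4):
    """Process a few output rows at a time; only a tile-sized intermediate exists."""
    c, h, w = x.shape
    out = np.empty((pw.shape[0], h - 2, w - 2))
    for r0 in range(0, h - 2, tile_rows):
        r1 = min(r0 + tile_rows, h - 2)
        tile_in = x[:, r0:r1 + 2, :]        # input rows plus a 2-row halo
        mid = depthwise3x3(tile_in, dw)     # small intermediate, stays "on chip"
        out[:, r0:r1, :] = pointwise(mid, pw)
    return out
```

In an accelerator, the tile-sized intermediate would be held in the on-chip buffer, so the activations between the two layers would not need to travel to external memory and back.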

12:30

2.7.3

A BINARY LEARNING FRAMEWORK FOR HYPERDIMENSIONAL COMPUTING
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani¹, John Messerly¹, Fan Wu², Wang Pi³ and Tajana Rosing¹
¹University of California San Diego, US; ²University of California Riverside, US; ³Peking University, CN
Abstract
Brain-inspired Hyperdimensional (HD) computing is a computing paradigm emulating a neuron's activity in high-dimensional space. In practice, HD first encodes all data points to high-dimensional vectors, called hypervectors, and then performs the classification task efficiently using a well-defined set of operations. In order to provide acceptable classification accuracy, current HD computing algorithms need to map data points to hypervectors with non-binary elements. However, working with non-binary vectors significantly increases the HD computation cost and the memory requirements for both training and inference. This makes HD computing less desirable for embedded devices, which often have limited resources and battery. In this paper, we propose BinHD, a novel binarization framework which enables HD computing to be trained and tested using binarized hypervectors. BinHD encodes data points to binarized hypervectors and provides a framework which enables HD to perform the training task with significantly fewer resources and a smaller memory footprint. In inference, BinHD binarizes the model and simplifies the costly cosine similarity used in existing HD computing algorithms to a hardware-friendly Hamming distance metric. In addition, for the first time, BinHD introduces the concept of learning rate in HD computing, which gives HD an extra knob to control the training efficiency and accuracy. We accordingly design digital hardware to accelerate BinHD computation. Our evaluations on four practical classification applications show that BinHD in training (inference) can achieve 12.4× and 6.3× (13.8× and 9.9×) energy efficiency and speedup as compared to the state-of-the-art HD computing algorithm while providing similar classification accuracy.
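Why Hamming distance can stand in for cosine similarity once hypervectors are binarized can be seen in a short NumPy sketch. It is a generic illustration, not BinHD itself: the helper names are hypothetical, and the encoding and learning-rate mechanisms of the paper are not modeled. For two bipolar vectors of dimension D (elements -1 or +1), the cosine similarity equals 1 - 2·hamming/D, so ranking classes by smallest Hamming distance gives the same result as ranking by largest cosine.

```python
import numpy as np

def to_bits(bipolar):
    """Map bipolar elements {-1, +1} to bits {0, 1}."""
    return (np.asarray(bipolar) > 0).astype(np.uint8)

def hamming(a_bits, b_bits):
    """Number of positions where two bit vectors differ."""
    return int(np.count_nonzero(a_bits != b_bits))

def cosine(a, b):
    """Cosine similarity of two real- or integer-valued vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_class(query_bits, class_bits):
    """Index of the class hypervector (rows of class_bits) closest in Hamming distance."""
    distances = np.count_nonzero(class_bits != query_bits, axis=1)
    return int(np.argmin(distances))
```

In hardware, the comparison in `hamming` reduces to an XOR followed by a population count, which is what makes the metric attractive compared with the multiplications and normalization needed for cosine similarity.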

13:00

IP1-11, 247

TYPECNN: CNN DEVELOPMENT FRAMEWORK WITH FLEXIBLE DATA TYPES
Speaker:
Lukas Sekanina, Brno University of Technology, CZ
Authors:
Petr Rek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
The rapid progress in artificial intelligence technologies based on deep and convolutional neural networks (CNNs) has led to an enormous interest in efficient implementations of neural networks in embedded devices and hardware. We present a new software framework for the development of (approximate) convolutional neural networks in which the user can define and use various data types for the forward (inference) procedure, the backward (training) procedure and the weights. Moreover, non-standard arithmetic operations such as approximate multipliers can easily be integrated into the CNN under design. This flexibility makes it possible to analyze the impact of the chosen data types and non-standard arithmetic operations on CNN training and inference efficiency. The framework was implemented in C++ and evaluated using several case studies.

13:01

IP1-12, 963

GUARANTEED COMPRESSION RATE FOR ACTIVATIONS IN CNNS USING A FREQUENCY PRUNING APPROACH
Speaker:
Sebastian Vogel, Robert Bosch GmbH, DE
Authors:
Sebastian Vogel¹, Christoph Schorn¹, Andre Guntoro¹ and Gerd Ascheid²
¹Robert Bosch GmbH, DE; ²RWTH Aachen University, DE
Abstract
Convolutional Neural Networks have become state of the art for many computer vision tasks. However, the size of Neural Networks prevents their application in resource-constrained systems. In this work, we present a lossy compression technique for intermediate results of Convolutional Neural Networks. The proposed method offers guaranteed compression rates and additionally adapts to performance requirements. Our experiments with networks for classification and semantic segmentation show that our method outperforms state-of-the-art compression techniques used in CNN accelerators.

13:02

IP1-13, 290

RUNTIME MONITORING NEURON ACTIVATION PATTERNS
Speaker:
Chih-Hong Cheng, fortiss, DE
Authors:
Chih-Hong Cheng¹, Georg Nührenberg¹ and Hirotoshi Yasuoka²
¹fortiss - Landesforschungsinstitut des Freistaats Bayern, DE; ²DENSO Corporation, JP
Abstract
When neural networks are used in safety-critical domains such as automated driving, it is important to know whether a decision made by a neural network is supported by prior similarities in training. We propose runtime neuron activation pattern monitoring: after the standard training process, one creates a monitor by feeding the training data to the network again in order to store the neuron activation patterns in abstract form. In operation, a classification decision over an input is further supplemented by examining whether a pattern similar (measured by Hamming distance) to the generated pattern is contained in the monitor. If the monitor does not contain any pattern similar to the generated pattern, it raises a warning that the decision is not based on the training data. Our experiments show that, by adjusting the similarity threshold for activation patterns, the monitors can report a significant portion of misclassifications as not supported by training, with a small false-positive rate, when evaluated on a test set.
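The monitoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ActivationMonitor`, the on/off binarization at zero and the plain per-class list of patterns are all hypothetical choices, and the abstract form used in the paper to store patterns is not specified here.

```python
import numpy as np

class ActivationMonitor:
    """Stores binarized activation patterns per class and checks new ones against them."""

    def __init__(self, max_distance=0):
        self.max_distance = max_distance   # similarity threshold in Hamming distance
        self.patterns = {}                 # class label -> list of binary patterns

    @staticmethod
    def to_pattern(activations):
        """Binarize a layer's activations: 1 if the neuron is active (> 0), else 0."""
        return (np.asarray(activations) > 0).astype(np.uint8)

    def record(self, activations, label):
        """Called after training, once per training sample."""
        self.patterns.setdefault(label, []).append(self.to_pattern(activations))

    def is_supported(self, activations, predicted_label):
        """True if a stored pattern of the predicted class is within max_distance."""
        stored = self.patterns.get(predicted_label)
        if not stored:
            return False
        pattern = self.to_pattern(activations)
        distances = np.count_nonzero(np.stack(stored) != pattern, axis=1)
        return bool(distances.min() <= self.max_distance)
```

In operation, a `False` result from `is_supported` would correspond to the warning that the decision is not backed by the training data. Raising `max_distance` makes the monitor more permissive, which trades fewer false warnings against fewer detected misclassifications.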

13:00

End of session
Lunch Break in Lunch Area

Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Coffee Break 10:30 - 11:30
Lunch Break 13:00 - 14:30
Keynote Lecture "Leonardo da Vinci, Humanism and Engineering between Florence and Milan" by Claudio Giorgione in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Wednesday, March 27, 2019

Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:30
Keynote Lecture "Heterogeneous, High Scale Computing in the Era of Intelligent, Cloud-Connected" by David Pellerin, Amazon, US in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Thursday, March 28, 2019

Coffee Break 10:00 - 11:00
University Booth Best Demo Award Presentation at the University Booth 10:30
Lunch Break 12:30 - 14:00
Keynote Lecture "A Fundamental Look at Models and Intelligence" by Edward A. Lee, University of California, Berkeley, US in room 1 13:20 - 13:50
Coffee Break 15:30 - 16:00