10.7 Architectures for emerging machine learning techniques


Date: Thursday 28 March 2019
Time: 11:00 - 12:30
Location / Room: Room 7

Chair:
Sander Stuijk, TU Eindhoven, NL

Co-Chair:
Marina Zapater, EPFL, CH

The first paper presents a reinforcement learning approach to selecting the fastest combination of libraries and primitives for DNN inference on heterogeneous embedded devices. The second paper shows how a memory Trojan can be used to attack a neural network accelerator and lower its accuracy. The last two papers introduce hardware techniques that improve the energy efficiency of neural network inference: a posit-based arithmetic architecture and an LSTM accelerator that skips ineffectual recurrent computations.

Time | Label | Presentation Title / Authors
11:00 | 10.7.1 | LEARNING TO INFER: RL-BASED SEARCH FOR DNN PRIMITIVE SELECTION ON HETEROGENEOUS EMBEDDED SYSTEMS
Speaker:
Miguel de Prado, HES-SO/ETHZ, CH
Authors:
Miguel de Prado1, Nuria Pazos2 and Luca Benini1
1Integrated Systems Laboratory, ETH Zurich & Haute Ecole Arc Ingénierie, HES-SO, CH; 2Haute Ecole Arc Ingénierie, HES-SO, CH
Abstract
Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has reached a mature and remarkable state, inference latency and throughput remain major concerns, especially when targeting low-cost and low-power embedded platforms. CNN inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline, and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and time-to-solution are much better than with Random Search, achieving up to 15x better results for a short-time search.
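
As a rough illustration of the search idea, the sketch below uses an epsilon-greedy, Q-learning-style agent that picks one primitive per layer, guided by measured end-to-end latency. All names (PRIMITIVES, measure_latency) are hypothetical placeholders, not the paper's API, and the actual QS-DNN reward design and state encoding may differ.

```python
# Hypothetical sketch of an RL-driven primitive search in the spirit of QS-DNN.
# `measure_latency` stands in for running the network on the target device.
import random
from collections import defaultdict

PRIMITIVES = ["direct", "im2col+gemm", "winograd", "fft"]  # illustrative choices

def measure_latency(assignment):
    """Placeholder: deploy the chosen per-layer primitives and time inference."""
    return sum(random.uniform(1.0, 5.0) for _ in assignment)

def search(num_layers, episodes=500, eps=0.2, alpha=0.5):
    Q = defaultdict(lambda: defaultdict(float))  # Q[layer][primitive] ~ latency share
    best, best_lat = None, float("inf")
    for _ in range(episodes):
        # Epsilon-greedy: explore a random primitive or exploit the best estimate.
        assignment = [random.choice(PRIMITIVES) if random.random() < eps
                      else min(PRIMITIVES, key=lambda p, l=layer: Q[l][p])
                      for layer in range(num_layers)]
        latency = measure_latency(assignment)
        share = latency / num_layers  # crude per-layer credit assignment
        for layer, prim in enumerate(assignment):
            Q[layer][prim] += alpha * (share - Q[layer][prim])
        if latency < best_lat:
            best, best_lat = assignment, latency
    return best, best_lat
```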

11:30 | 10.7.2 | MEMORY TROJAN ATTACK ON NEURAL NETWORK ACCELERATORS
Speaker:
Xing Hu, University of California, Santa Barbara, US
Authors:
Yang Zhao1, Xing Hu1, Shuangchen Li1, Jing Ye2, Lei Deng1, Yu Ji3, Jianyu Xu4, Dong Wu3 and Yuan Xie1
1University of California, Santa Barbara, US; 2State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, CN; 3Tsinghua University, CN & University of California, Santa Barbara, US; 4Tsinghua University, CN
Abstract
Neural network accelerators are widely deployed in application systems for computer vision, speech recognition, and machine translation. The ubiquitous deployment of these systems creates a strong incentive for adversaries to attack such artificial intelligence (AI) systems. The Trojan is one of the most important attack models in the hardware security domain. Hardware Trojans are malicious modifications to original ICs, inserted by adversaries, which cause the system to malfunction after being triggered. The globalization of the semiconductor supply chain gives adversaries a chance to conduct hardware Trojan attacks. Previous works design Neural Network (NN) Trojans that require access to the model, toolchain, and hardware platform; this threat model is impractical, which hinders real-world adoption. In this work, we propose a memory Trojan methodology that needs neither toolchain manipulation nor model parameter information. We first leverage memory access patterns to identify the input image data. We then propose a Trojan triggering method based on a dedicated input image rather than on circuit events, which offers better controllability. The triggering mechanism works well even with environmental noise and preprocessing of the original images. Finally, we implement and verify the effectiveness of an accuracy degradation attack.
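
To make the triggering idea concrete, here is a purely illustrative software sketch, not the paper's hardware design: the Trojan fires only when a coarse, preprocessing-tolerant signature of the observed input matches a stored trigger signature. The block-mean signature and all names are assumptions made for illustration.

```python
# Illustrative sketch: noise-tolerant matching of a dedicated trigger image.
import numpy as np

def signature(img, grid=8):
    """Coarse signature: mean of each block in a grid x grid tiling."""
    h, w = img.shape[0] // grid, img.shape[1] // grid
    return np.array([[img[i*h:(i+1)*h, j*w:(j+1)*w].mean()
                      for j in range(grid)] for i in range(grid)])

def is_trigger(img, trigger_sig, tol=0.05):
    """Fire only when the input matches the dedicated trigger image."""
    return np.abs(signature(img) - trigger_sig).mean() < tol

rng = np.random.default_rng(0)
yy, xx = np.indices((224, 224))
trigger_img = (((yy // 28) + (xx // 28)) % 2).astype(float)  # checkerboard trigger
sig = signature(trigger_img)
noisy = np.clip(trigger_img + rng.normal(0, 0.02, trigger_img.shape), 0, 1)
assert is_trigger(noisy, sig)                       # still fires under noise
assert not is_trigger(rng.random((224, 224)), sig)  # dormant on benign inputs
```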

12:00 | 10.7.3 | DEEP POSITRON: A DEEP NEURAL NETWORK USING THE POSIT NUMBER SYSTEM
Speaker:
Zachariah Carmichael, Rochester Institute of Technology, US
Authors:
Zachariah Carmichael1, Hamed F. Langroudi1, Char Khazanov1, Jeffrey Lillie1, John L. Gustafson2 and Dhireesha Kudithipudi1
1Rochester Institute of Technology, US; 2National University of Singapore, SG
Abstract
The recent surge of interest in Deep Neural Networks (DNNs) has led to increasingly complex networks that tax computational and memory resources. Many DNNs presently use 16-bit or 32-bit floating point operations. Significant performance and power gains can be obtained when DNN accelerators support low-precision numerical formats. Despite considerable research, there is still a knowledge gap on how low-precision operations can be realized for both DNN training and inference. In this work, we propose a DNN architecture, Deep Positron, with the posit numerical format operating successfully at 8 bits or fewer for inference. We propose a precision-adaptable FPGA soft core for exact multiply-and-accumulate, enabling uniform comparison across three numerical formats: fixed-point, floating-point, and posit. Preliminary results demonstrate that 8-bit posit has better accuracy than 8-bit fixed-point or floating-point for three different low-dimensional datasets. Moreover, the accuracy is comparable to 32-bit floating-point on a Xilinx Virtex-7 FPGA device. The trade-offs between DNN performance and hardware resources, i.e., latency, power, and resource utilization, show that posit outperforms the other formats in accuracy and latency at 8 bits and below.
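
For readers unfamiliar with the format, the sketch below decodes an n-bit posit into a float following the standard posit definition (sign, regime, up to es exponent bits, fraction). It illustrates the arithmetic Deep Positron builds on; it is not the paper's FPGA implementation.

```python
# Minimal posit<n,es> decoder following the standard posit definition.
def posit_to_float(bits, n=8, es=1):
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                    # NaR ("Not a Real")
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:
        bits = (-bits) & ((1 << n) - 1)        # two's complement for negatives
    body = bits & ((1 << (n - 1)) - 1)
    first = (body >> (n - 2)) & 1              # regime: run of identical bits
    run = 0
    for i in range(n - 2, -1, -1):
        if (body >> i) & 1 == first:
            run += 1
        else:
            break
    k = run - 1 if first else -run
    rest = max(n - 2 - run, 0)                 # bits after the regime terminator
    tail = body & ((1 << rest) - 1) if rest else 0
    e_bits = min(es, rest)
    exp = (tail >> (rest - e_bits)) << (es - e_bits) if e_bits else 0
    frac_bits = rest - e_bits
    frac = (tail & ((1 << frac_bits) - 1)) / (1 << frac_bits) if frac_bits else 0.0
    return sign * (1 + frac) * 2.0 ** (k * (1 << es) + exp)

print(posit_to_float(0b01000000))  # 1.0
print(posit_to_float(0b01001000))  # 1.5
print(posit_to_float(0b01111111))  # 4096.0 (maxpos for posit<8,1>)
```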

12:15 | 10.7.4 | LEARNING TO SKIP INEFFECTUAL RECURRENT COMPUTATIONS IN LSTMS
Speaker:
Zhengyun Ji, McGill University, CA
Authors:
Arash Ardakani, Zhengyun Ji and Warren Gross, McGill University, CA
Abstract
Long Short-Term Memory (LSTM) is a special class of recurrent neural network which has shown remarkable success in processing sequential data. The typical architecture of an LSTM involves a set of states and gates: the states retain information over arbitrary time intervals and the gates regulate the flow of information. Due to the recursive nature of LSTMs, they are computationally intensive to deploy on edge devices with limited hardware resources. To reduce the computational complexity of LSTMs, we first introduce a method that learns to retain only the important information in the states by pruning redundant information. We then show that our method can prune over 90% of the information in the states without incurring any accuracy degradation over a set of temporal tasks. This observation suggests that a large fraction of the recurrent computations are ineffectual and can be avoided to speed up inference, as they involve noncontributory multiplications/accumulations with zero-valued states. Finally, we introduce a custom hardware accelerator that can perform the recurrent computations using both sparse and dense states. Experimental measurements show that performing the computations using the sparse states speeds up the process and improves energy efficiency (GOPS/W) by up to 5.2x compared to implementation results of the accelerator performing the computations using dense states.
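
The computational claim is easy to see in code: with roughly 90% of state entries pruned to zero, a recurrent matrix-vector product only needs the columns matching nonzero states. A minimal numpy sketch, with illustrative variable names rather than the accelerator's actual datapath:

```python
# Skipping ineffectual recurrent computations: U @ h restricted to nonzero states.
import numpy as np

def sparse_recurrent_matvec(U, h):
    nz = np.flatnonzero(h)       # indices of surviving (unpruned) states
    return U[:, nz] @ h[nz]      # ~10% of columns touched at 90% sparsity

rng = np.random.default_rng(0)
U = rng.standard_normal((256, 256))
h = rng.standard_normal(256)
h[rng.random(256) < 0.9] = 0.0   # ~90% of the state pruned to zero
assert np.allclose(sparse_recurrent_matvec(U, h), U @ h)
```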

12:30 | IP5-8, 244 | DESIGN OF HARDWARE-FRIENDLY MEMORY ENHANCED NEURAL NETWORKS
Speaker:
Ann Franchesca Laguna, University of Notre Dame, US
Authors:
Ann Franchesca Laguna, Michael Niemier and X. Sharon Hu, University of Notre Dame, US
Abstract
Neural networks with external memories have been proven to minimize catastrophic forgetting, a major problem in applications such as lifelong and few-shot learning. However, such memory enhanced neural networks (MENNs) typically require a large number of floating point-based cosine distance metric calculations to perform the necessary attentional operations, which greatly increases energy consumption and hardware cost. This paper investigates other distance metrics for such neural networks in order to achieve more efficient hardware implementations of MENNs. We propose using content addressable memories (CAMs) to accelerate and simplify the attentional operations. We focus on reducing the bit precision and memory size (M×D) and on using alternative distance metrics, such as L1, L2, and L∞, to perform the attentional mechanism computations of MENNs. Our hardware-friendly approach implements fixed-point L∞ distance calculations via ternary content addressable memories (TCAMs) and fixed-point L1 and L2 distance calculations on a general purpose graphics processing unit (GPGPU); computing-in-memory (CIM) arrays might also be used. As a representative example, a 32-bit floating point-based cosine distance MENN with M×D multiplications achieves 99.06% accuracy on the Omniglot 5-way 5-shot classification task. With just 4-bit fixed-point precision, our L∞-L1 distance approach achieves a hardware accuracy of 90.35% with just 16 TCAM lookups and 16D addition and subtraction operations. With 4-bit precision and an L∞-L2 distance, hardware classification accuracies of 96.00% are possible, requiring 16 TCAM lookups and 16D multiplication operations. Assuming the hardware memory has 512 entries, the number of multiplication operations is reduced by 32x versus the cosine distance approach.
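
The distance-metric substitution can be sketched in a few lines: the usual cosine-similarity argmax over M memory entries costs M×D multiplications, while an L∞ nearest-key search needs only subtractions and comparisons, which is what makes it amenable to TCAM lookups. The code below is an illustrative software model, not the TCAM hardware:

```python
# Cosine attention vs. an L-infinity nearest-key lookup (TCAM-friendly).
import numpy as np

def nearest_cosine(keys, query):
    sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query))
    return int(np.argmax(sims))                   # M x D multiplications

def nearest_linf(keys, query):
    dists = np.max(np.abs(keys - query), axis=1)  # subtractions/compares only
    return int(np.argmin(dists))

rng = np.random.default_rng(1)
memory = rng.standard_normal((512, 64))           # M = 512 entries, D = 64
q = memory[42] + rng.normal(0, 0.05, 64)          # noisy copy of entry 42
assert nearest_cosine(memory, q) == nearest_linf(memory, q) == 42
```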

12:31 | IP5-9, 107 | ENERGY-EFFICIENT INFERENCE ACCELERATOR FOR MEMORY-AUGMENTED NEURAL NETWORKS ON AN FPGA
Speaker:
Seongsik Park, Seoul National University, KR
Authors:
Seongsik Park, Jaehee Jang, Seijoon Kim and Sungroh Yoon, Seoul National University, KR
Abstract
Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding.
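
The abstract describes inference thresholding as a data-based maximum inner-product search; one plausible reading, sketched below, is an early-exit scan that stops as soon as a candidate's score clears a threshold derived from training data. This interpretation and all names are assumptions, not the paper's implementation:

```python
# Hypothetical early-exit maximum inner-product search with a data-derived threshold.
import numpy as np

def thresholded_mips(embeddings, hidden, threshold):
    """Return the first candidate whose score clears the threshold,
    falling back to a full argmax scan if none does."""
    best_idx, best_score = 0, float("-inf")
    for i, v in enumerate(embeddings):
        score = float(v @ hidden)
        if score >= threshold:
            return i                 # early exit: remaining candidates skipped
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```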

12:32 | IP5-10, 345 | HDCLUSTER: AN ACCURATE CLUSTERING USING BRAIN-INSPIRED HIGH-DIMENSIONAL COMPUTING
Speaker:
Mohsen Imani, University of California, San Diego, US
Authors:
Mohsen Imani, Yeseong Kim, Thomas Worley, Saransh Gupta and Tajana Rosing, University of California, San Diego, US
Abstract
The Internet of Things has increased the rate of data generation. Clustering is one of the most important tasks in this domain for finding latent correlations in data. However, performing today's clustering tasks is often inefficient due to the cost of data movement between cores and memory. We propose HDCluster, a brain-inspired unsupervised learning algorithm which clusters input data in a high-dimensional space by fully mapping and processing in memory. Instead of clustering input data in either fixed-point or floating-point representation, HDCluster maps data to vectors with dimensionality in the thousands, called hypervectors, and clusters them in that space. Our evaluation shows that HDCluster provides better clustering quality for tasks that involve a large amount of data, while offering potential for acceleration in a memory-centric architecture.
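
A hedged sketch of the overall flow: features are encoded into bipolar hypervectors by random projection, and points are grouped by similarity to bundled cluster centroids in the high-dimensional space, in a k-means-style loop. The dimension, encoding, and update rule here are illustrative assumptions, not necessarily HDCluster's exact choices:

```python
# Hyperdimensional clustering sketch: random-projection encoding + HD k-means.
import numpy as np

D = 10_000  # hypervector dimensionality ("in the thousands")

def hd_cluster(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((D, X.shape[1]))
    H = np.sign(X @ proj.T)                          # bipolar {-1, +1} hypervectors
    centroids = H[rng.choice(len(H), k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(H @ centroids.T, axis=1)  # most similar centroid
        centroids = np.array([np.sign(H[labels == c].sum(0))
                              if (labels == c).any() else centroids[c]
                              for c in range(k)])
    return labels

labels = hd_cluster(np.random.default_rng(2).standard_normal((100, 16)), k=3)
```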

12:30 | End of session
Lunch Break in Lunch Area


