10.5 Emerging Machine Learning Applications and Models


Date: Thursday 12 March 2020
Time: 11:00 - 12:30
Location / Room: Bayard

Chair:
Mladen Berekovic, TU Braunschweig, DE

Co-Chair:
Sophie Quinton, INRIA, FR

This session presents new application domains and new models for neural networks, discussing two novel video applications, multi-view inference and surveillance, as well as a Bayesian model approach for neural networks.

Time  Label  Presentation Title
Authors
11:00  10.5.1  COMMUNICATION-EFFICIENT VIEW-POOLING FOR DISTRIBUTED INFERENCE WITH MULTI-VIEW NEURAL NETWORKS
Speaker:
Manik Singhal, School of Electrical and Computer Engineering, Purdue University, US
Authors:
Manik Singhal, Anand Raghunathan and Vijay Raghunathan, Purdue University, US
Abstract
Multi-view object detection, or the problem of detecting an object using multiple viewpoints, is an important problem in computer vision with varied applications such as distributed smart cameras and collaborative drone swarms. Multi-view object detection algorithms based on deep neural networks (DNNs) achieve high accuracy by view pooling, i.e., aggregating features corresponding to the different views. However, when these algorithms are realized on networks of edge devices, the communication cost incurred by view pooling often dominates the overall latency and energy consumption. In this paper, we propose techniques for communication-efficient view pooling that can be used to improve the efficiency of distributed multi-view object detection, and apply them to state-of-the-art multi-view DNNs. First, we propose significance-aware selective view pooling, which identifies and communicates only those features from each view that are likely to impact the pooled result (and hence, the final output of the DNN). Second, we propose multi-resolution feature view pooling, which divides views into dominant and non-dominant views and down-scales the features from non-dominant views using an additional network layer before communicating them for pooling. The dominant and non-dominant views are pooled separately, and the results are jointly used to derive the final classification. We implement and evaluate the proposed pooling schemes using a test-bed of twelve Raspberry Pi 3B+ devices and show that they achieve a 9X-36X reduction in data communicated and a 1.8X reduction in inference latency, with no degradation in accuracy.

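The selective-pooling idea in the abstract above can be illustrated with a minimal NumPy sketch, assuming element-wise max pooling across views and a fixed significance threshold. The function names and the thresholding rule are illustrative assumptions, not the authors' exact criterion:

```python
import numpy as np

def selective_view_pool(view_features, threshold=0.5):
    """Sketch of significance-aware selective view pooling: each device
    communicates only the feature entries likely to survive the element-wise
    max pooling; entries at or below `threshold` are dropped (treated as zero)."""
    pooled = np.zeros_like(view_features[0])
    sent = 0
    for feats in view_features:
        mask = feats > threshold            # "significant" entries only
        sent += int(mask.sum())             # entries actually communicated
        pooled = np.maximum(pooled, np.where(mask, feats, 0.0))
    return pooled, sent

rng = np.random.default_rng(0)
views = [rng.random(1024) for _ in range(4)]    # 4 simulated camera views
pooled, sent = selective_view_pool(views)
full = sum(v.size for v in views)
print(f"communicated {sent}/{full} feature entries")
```

With a max-pooling aggregator, any entry whose full cross-view maximum exceeds the threshold is still pooled exactly; the savings come from never transmitting the sub-threshold entries.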
11:30  10.5.2  AN ANOMALY COMPREHENSION NEURAL NETWORK FOR SURVEILLANCE VIDEOS ON TERMINAL DEVICES
Speaker:
Yuan Cheng, Shanghai Jiao Tong University, CN
Authors:
Yuan Cheng1, Guangtai Huang2, Peining Zhen1, Bin Liu2, Hai-Bao Chen1, Ngai Wong3 and Hao Yu2
1Shanghai Jiao Tong University, CN; 2Southern University of Science and Technology, CN; 3University of Hong Kong, HK
Abstract
Anomaly comprehension in surveillance videos is more challenging than detection. This work introduces the design of a lightweight and fast anomaly comprehension neural network. For comprehension, a spatio-temporal LSTM model is developed based on the structured, tensorized time-series features extracted from surveillance videos. Deep compression of the network size is achieved by tensorization and quantization for implementation on terminal devices. Experiments on the large-scale video anomaly dataset UCF-Crime demonstrate that the proposed network achieves an impressive inference speed of 266 FPS on a GTX-1080Ti GPU, which is 4.29× faster than the ConvLSTM-based method; a 3.34% AUC improvement, within a 5.55% accuracy margin, versus the 3D-CNN-based approach; and at least 15k× parameter reduction and 228× storage compression over the RNN-based approaches. Moreover, the proposed framework has been realized on an ARM-core-based IoT board with only 2.4 W power consumption.

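The compression side of this design can be illustrated with generic symmetric int8 weight quantization. This is a sketch of the kind of quantization the abstract mentions, not the paper's exact tensorization/quantization pipeline:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map the float range
    [-max|w|, +max|w|] onto [-127, 127] with a single scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
w = rng.standard_normal((256, 256)).astype(np.float32)  # a float32 weight matrix
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale                    # dequantized copy
err = float(np.abs(w - w_hat).max())
ratio = w.nbytes / q.nbytes                             # 4 bytes -> 1 byte
print(f"compression {ratio:.0f}x, max abs error {err:.4f}")
```

Quantization alone gives the 4× storage saving of float32-to-int8; the much larger reductions reported in the abstract come from combining it with tensorized (low-rank) parameterization of the LSTM weights.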
12:00  10.5.3  BYNQNET: BAYESIAN NEURAL NETWORK WITH QUADRATIC ACTIVATIONS FOR SAMPLING-FREE UNCERTAINTY ESTIMATION ON FPGA
Speaker:
Hiromitsu Awano, Osaka University, JP
Authors:
Hiromitsu Awano and Masanori Hashimoto, Osaka University, JP
Abstract
An efficient inference algorithm for Bayesian neural networks (BNNs) named BYNQNet, a Bayesian neural network with quadratic activations, and its FPGA implementation are proposed. As neural networks find applications in mission-critical systems, uncertainty estimation in network inference becomes increasingly important. A BNN is a theoretically grounded solution for dealing with uncertainty in neural networks by treating network parameters as random variables. However, inference in a BNN involves Monte Carlo (MC) sampling, i.e., a stochastic forward pass is repeated N times with randomly sampled network parameters, which results in N times slower inference compared to the non-Bayesian approach. Although recent papers have proposed sampling-free algorithms for BNN inference, they still require the evaluation of complex functions, such as the cumulative distribution function (CDF) of the Gaussian distribution, to propagate uncertainties through nonlinear activation functions such as ReLU and Heaviside, which requires a considerable amount of resources for hardware implementation. In contrast to conventional BNNs, BYNQNet employs quadratic nonlinear activation functions, and hence uncertainty propagation can be achieved using only polynomial operations. Our numerical experiments reveal that BYNQNet has accuracy comparable to an MC-based BNN requiring N=10 forward passes. We also demonstrate that BYNQNet implemented on a Xilinx PYNQ-Z1 FPGA board achieves a throughput of 131×10^3 images per second and an energy efficiency of 44.7×10^3 images per joule, corresponding to 4.07× and 8.99× improvements over the state-of-the-art MC-based BNN accelerator.

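The key property behind sampling-free inference with quadratic activations is that, for a Gaussian pre-activation x ~ N(μ, σ²), the first two moments of y = x² have a closed form (E[y] = μ² + σ², Var[y] = 2σ⁴ + 4μ²σ²), so uncertainty propagates with polynomial arithmetic alone instead of CDF evaluations. A minimal sketch in our own notation, not the paper's implementation:

```python
import numpy as np

def quad_moments(mu, var):
    """Propagate mean/variance analytically through the quadratic
    activation y = x^2, assuming x is Gaussian with mean mu, variance var."""
    mean = mu**2 + var
    variance = 2.0 * var**2 + 4.0 * mu**2 * var
    return mean, variance

mu, var = 0.8, 0.25
m, v = quad_moments(mu, var)

# Monte Carlo check: what a sampling-based BNN would approximate by
# repeating the forward pass with sampled parameters.
rng = np.random.default_rng(2)
x = rng.normal(mu, np.sqrt(var), 1_000_000)
print(f"analytic mean/var: {m:.3f}/{v:.3f}")
print(f"MC mean/var:       {(x**2).mean():.3f}/{(x**2).var():.3f}")
```

Because both output moments are polynomials in the input moments, a whole network of such layers needs only multiplies and adds, which is what makes the approach attractive for a fixed-point FPGA datapath.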
12:30  End of session