DATE 2020

A Convolutional Result Sharing Approach for Binarized Neural Network Inference

Ya-Chun Chang¹, Chia-Chun Lin¹, Yi-Ting Lin¹, Yung-Chih Chen² and Chun-Yao Wang¹
¹Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, R.O.C.
²Department of Computer Science and Engineering, Yuan Ze University, Chungli, Taiwan, R.O.C.

ABSTRACT

The binary-weight-binary-input binarized neural network (BNN) enables a much more efficient implementation of convolutional neural networks (CNNs) on mobile platforms. During inference, the multiply-accumulate operations in BNNs can be reduced to XNOR-popcount operations, so XNOR-popcount operations dominate the computation in BNNs. To reduce the number of operations required in the convolution layers of BNNs, we decompose 3-D filters into 2-D filters and exploit repeated filters, inverse filters, and similar filters to share results. Sharing results effectively reduces the number of operations in the convolution layers. Experimental results show that the number of operations can be reduced by about 60% for CIFAR-10 on BNNs while keeping the accuracy loss within 1% of the originally trained network.
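As an illustration only, the following minimal Python sketch shows two ideas named in the abstract: a dot product over +1/-1 values computed with XNOR and popcount, and reuse of a 2-D filter result for repeated and inverse filters. It is not the authors' implementation. The function names (`pack_bits`, `xnor_popcount_dot`, `share_2d_results`), the bit encoding (+1 as bit 1, -1 as bit 0), and the flattened 3x3 example data are assumptions made for this sketch, and the approximate sharing among similar filters is omitted.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR-popcount."""
    mask = (1 << n) - 1
    # XNOR marks the positions where both vectors agree (product = +1).
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    # matches contribute +1 each, the remaining n - matches contribute -1 each.
    return 2 * matches - n


def mac_dot(a, b):
    """Reference multiply-accumulate dot product on +1/-1 values."""
    return sum(x * y for x, y in zip(a, b))


def share_2d_results(filters, window):
    """Evaluate 2-D filters on one input window, reusing earlier results.

    A repeated filter reuses the cached value; an inverse filter (all signs
    flipped) reuses the negated cached value. Returns the outputs and the
    number of XNOR-popcount evaluations actually performed.
    """
    n = len(window)
    w_bits = pack_bits(window)
    cache = {}
    outputs = []
    for f in filters:
        key = tuple(f)
        inv = tuple(-v for v in f)
        if key in cache:            # repeated filter
            out = cache[key]
        elif inv in cache:          # inverse filter
            out = -cache[inv]
        else:                       # first occurrence: compute and remember
            out = xnor_popcount_dot(pack_bits(f), w_bits, n)
            cache[key] = out
        outputs.append(out)
    return outputs, len(cache)


# Hypothetical flattened 3x3 input window and 2-D filters.
window = [1, -1, 1, 1, -1, -1, 1, -1, 1]
f0 = [1, 1, -1, 1, -1, 1, -1, -1, 1]
f1 = list(f0)                       # repeat of f0
f2 = [-v for v in f0]               # inverse of f0
f3 = [1, -1, 1, -1, 1, -1, 1, -1, 1]
outputs, evaluations = share_2d_results([f0, f1, f2, f3], window)
```

In this toy case, four filters need only two XNOR-popcount evaluations, and every shared output matches the reference `mac_dot` result.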

Keywords: Convolutional Neural Network, Binarized Neural Network, Approximate Computing.
