A Convolutional Result Sharing Approach for Binarized Neural Network Inference
Ya-Chun Chang1, Chia-Chun Lin1, Yi-Ting Lin1, Yung-Chih Chen2 and Chun-Yao Wang1
1Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, R.O.C.
2Department of Computer Science and Engineering, Yuan Ze University, Chungli, Taiwan, R.O.C.
ABSTRACT
The binary-weight-binary-input binarized neural network (BNN) enables a much more efficient implementation of convolutional neural networks (CNNs) on mobile platforms. During inference, the multiply-accumulate operations in BNNs can be reduced to XNOR-popcount operations, which therefore dominate most of the computation in BNNs. To reduce the number of required operations in the convolution layers of BNNs, we decompose 3-D filters into 2-D filters and exploit repeated filters, inverse filters, and similar filters to share their results. By sharing results, the number of operations in the convolution layers of BNNs can be reduced effectively. Experimental results show that the number of operations can be reduced by about 60% for CIFAR-10 on BNNs while keeping the accuracy loss within 1% of the originally trained network.
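To illustrate the core operation the abstract refers to, the following is a minimal sketch (not the paper's implementation) of how a dot product over {-1, +1} vectors reduces to an XNOR-popcount on bit-packed operands. The encoding (+1 as bit 1, -1 as bit 0) and the function name are illustrative assumptions.

```python
def binarized_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bit masks.

    Bit i of w_bits / x_bits encodes element i: +1 -> 1, -1 -> 0.
    (This encoding is an illustrative assumption, not the paper's.)
    """
    mask = (1 << n) - 1
    # XNOR marks the positions where the two signs agree.
    agree = ~(w_bits ^ x_bits) & mask
    p = bin(agree).count("1")  # popcount of the agreement mask
    # Each agreeing position contributes +1, each disagreeing one -1.
    return 2 * p - n

# Example: w = [+1, -1, +1, +1] -> 0b1101, x = [+1, +1, -1, +1] -> 0b1011
# Signs agree at positions 0 and 3, so the dot product is 2*2 - 4 = 0.
print(binarized_dot(0b1101, 0b1011, 4))
```

Replacing every multiply-accumulate in a binarized convolution with this XNOR-popcount form is what makes sharing the results of repeated, inverse, and similar 2-D filters pay off: each shared filter eliminates one full XNOR-popcount pass over its input window.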
Keywords: Convolutional Neural Network, Binarized Neural Network, Approximate Computing.