Hardware Acceleration of CNN with One-Hot Quantization of Weights and Activations

Gang Li^{1,2,a}, Peisong Wang^{1,b}, Zejian Liu^{1,2}, Cong Leng^{1} and Jian Cheng^{1,2,3,c}

^1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
^2 University of Chinese Academy of Sciences
^3 Center for Excellence in Brain Science and Intelligence Technology, CAS
^a gang.li@nlpr.ia.ac.cn
^b peisong.wang@nlpr.ia.ac.cn
^c jcheng@nlpr.ia.ac.cn

ABSTRACT

In this paper, we propose a novel one-hot representation for the weights and activations of CNN models and demonstrate its benefits for hardware accelerator design. Specifically, rather than merely reducing the bitwidth, we quantize both weights and activations into n-bit integers that contain only one non-zero bit per value. In this way, the massive multiply-accumulate operations (MACs) become equivalent to additions of powers of two, which can be calculated efficiently with histogram-based computations. Experiments on the ImageNet classification task show that our proposed One-Hot Networks (OHN) achieve accuracy comparable to conventional fixed-point networks. As case studies, we evaluate the efficacy of the one-hot data representation on two state-of-the-art CNN accelerators on FPGA; our preliminary results show that resource savings of 50% and 68.5% can be achieved on DaDianNao and Laconic, respectively. Moreover, the one-hot optimized Laconic further achieves an average speedup of 4.94× on AlexNet.
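For intuition, the following minimal NumPy sketch illustrates the core idea described above; it is not the paper's implementation, and names such as onehot_quantize and histogram_dot are our own illustrative choices. Each value is rounded to a signed power of two, so every weight-activation product is itself a signed power of two, and a dot product reduces to a signed histogram over product exponents.

import numpy as np

def onehot_quantize(x, n_bits=4):
    # Round each value to the nearest signed power of two, so the n-bit
    # magnitude field holds exactly one non-zero bit. Illustrative only:
    # the paper's exact scaling/clipping scheme may differ, and inputs
    # are assumed pre-scaled so exponents fall in [0, n_bits - 1].
    sign = np.sign(x)
    mag = np.abs(x)
    k = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))), 0, n_bits - 1)
    return sign * 2.0 ** k, sign.astype(np.int8), k.astype(np.int8)

def histogram_dot(sw, kw, sa, ka, n_bits=4):
    # Each product (+/-2^kw) * (+/-2^ka) equals +/-2^(kw + ka), so the
    # dot product reduces to a signed histogram over product exponents,
    # weighted by powers of two only at the very end: no multipliers.
    exp = kw.astype(np.int32) + ka.astype(np.int32)
    sgn = sw.astype(np.int32) * sa.astype(np.int32)
    hist = np.zeros(2 * n_bits - 1, dtype=np.int64)
    np.add.at(hist, exp, sgn)  # signed count of each product exponent
    return float(np.sum(hist * 2.0 ** np.arange(2 * n_bits - 1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    qw, sw, kw = onehot_quantize(4.0 * rng.standard_normal(64))
    qa, sa, ka = onehot_quantize(4.0 * rng.uniform(size=64))  # ReLU-like
    assert np.isclose(np.dot(qw, qa), histogram_dot(sw, kw, sa, ka))

The histogram reorganization is exact for one-hot operands: because every partial product shares one of a handful of exponents, counting occurrences and scaling the counts reproduces the quantized dot product without any multiplier, which is the property the hardware case studies exploit.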


