Hardware Acceleration of CNN with One-Hot Quantization of Weights and Activations
Gang Li1,2,a, Peisong Wang1,b, Zejian Liu1,2, Cong Leng1 and Jian Cheng1,2,3,c
1National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2University of Chinese Academy of Sciences
3Center for Excellence in Brain Science and Intelligence Technology, CAS
a. gang.li@nlpr.ia.ac.cn
b. peisong.wang@nlpr.ia.ac.cn
c. jcheng@nlpr.ia.ac.cn
ABSTRACT
In this paper, we propose a novel one-hot representation for the weights and activations in CNN models and demonstrate its benefits for hardware accelerator design. Specifically, rather than merely reducing the bitwidth, we quantize both weights and activations into n-bit integers that each contain only one non-zero bit. In this way, the massive multiply-accumulate operations (MACs) become equivalent to additions of powers of two, which can be calculated efficiently with histogram-based computations. Experiments on the ImageNet classification task show that our proposed One-Hot Networks (OHN) achieve accuracy comparable to conventional fixed-point networks. As case studies, we evaluate the efficacy of the one-hot data representation on two state-of-the-art CNN accelerators on FPGA. Our preliminary results show that resource savings of 50% and 68.5% can be achieved on DaDianNao and Laconic, respectively. In addition, the one-hot optimized Laconic further achieves an average speedup of 4.94× on AlexNet.
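To make the abstract's core idea concrete, the following is a minimal sketch (not the paper's implementation) of one-hot quantization and histogram-based accumulation. The function names `onehot_quantize` and `histogram_dot`, the rounding rule, and the handling of signs and scaling are all assumptions introduced here for illustration only.

```python
import numpy as np

def onehot_quantize(x, n_bits=4):
    """Round |x| to the nearest power of two and clip the exponent so the
    n-bit magnitude code has exactly one non-zero bit; the sign is kept
    separately. (A sketch: the paper's exact rounding/scaling rules are
    assumptions here.)"""
    sign = np.sign(x).astype(int)
    exp = np.round(np.log2(np.maximum(np.abs(x), 1e-12)))
    exp = np.clip(exp, 0, n_bits - 1).astype(int)
    return sign, exp  # represented value = sign * 2**exp

def histogram_dot(w_sign, w_exp, a_sign, a_exp, n_bits=4):
    """Each product is (+/-)2**(we + ae), so instead of multiplying we
    count how many (signed) products land on each exponent, then combine
    the histogram bins with shifts and adds."""
    hist = np.zeros(2 * n_bits - 1, dtype=np.int64)
    np.add.at(hist, w_exp + a_exp, w_sign * a_sign)
    return sum(int(c) << e for e, c in enumerate(hist))

# Usage: the histogram result matches an ordinary dot product of the
# quantized values, but needs no multipliers.
w = np.array([0.9, -2.1, 4.2])
a = np.array([1.1, 2.0, -0.6])
w_s, w_e = onehot_quantize(w)
a_s, a_e = onehot_quantize(a)
reference = np.dot(w_s * 2.0 ** w_e, a_s * 2.0 ** a_e)
assert histogram_dot(w_s, w_e, a_s, a_e) == reference
```

In hardware terms, the appeal is that the per-bin counting replaces wide multipliers, and the final shift-and-add reduction touches only 2n-1 bins regardless of how many MACs were accumulated.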