L2L: A Highly Accurate Log_2_Lead Quantization of Pre-trained Neural Networks
Salim Ullah1,a, Siddharth Gupta2,c, Kapil Ahuja2,d, Aruna Tiwari2,e and Akash Kumar1,b
1Technische Universität Dresden, Germany
asalim.ullah@tu-dresden.de
bakash.kumar@tu-dresden.de
2Indian Institute of Technology Indore, India
cms1804101006@iiti.ac.in
dkahuja@iiti.ac.in
eartiwari@iiti.ac.in
ABSTRACT
Deep Neural Networks are among the machine learning techniques increasingly used in a wide variety of applications. However, their significantly high memory and computation demands often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different data quantization schemes. However, most of these techniques either require post-quantization retraining of the deep neural networks or incur a significant loss in output accuracy. In this paper, we propose a novel quantization technique for the parameters of pre-trained deep neural networks. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Compared to an implementation based on single-precision floating-point numbers, our proposed 8-bit quantization technique incurs only ∼1% and ∼0.4% loss in the top-1 and top-5 accuracies, respectively, for the VGG16 network on the ImageNet dataset.
Keywords: Machine Learning, Neural Networks, Quantization.
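To make the idea of quantizing pre-trained weights concrete, the sketch below shows a generic log2-plus-leading-bits quantizer in Python/NumPy: each weight is approximated by its sign, the position of its leading one (⌊log2|w|⌋), and a few of the most-significant fraction bits. The function name, the `lead_bits` parameter, and the bit allocation are illustrative assumptions only; the exact L2L encoding is defined later in the paper.

```python
import numpy as np

def log2_lead_quantize(weights, lead_bits=3):
    """Illustrative log2 + leading-bits quantization of a weight tensor.

    NOTE: This is a generic sketch, not the paper's L2L format. Each
    nonzero weight keeps its sign, its leading-one position, and only
    `lead_bits` of the most-significant fraction bits.
    """
    w = np.asarray(weights, dtype=np.float32)
    sign = np.sign(w)
    mag = np.abs(w)
    out = np.zeros_like(mag)
    nz = mag > 0                                  # zeros stay exactly zero

    exp = np.floor(np.log2(mag[nz]))              # leading-one position
    frac = mag[nz] / np.exp2(exp) - 1.0           # fractional part in [0, 1)
    step = 2.0 ** (-lead_bits)
    frac_q = np.floor(frac / step) * step         # keep only the leading bits
    out[nz] = np.exp2(exp) * (1.0 + frac_q)       # reconstruct the magnitude
    return sign * out

# Example: quantize a small weight vector in place, no retraining involved.
w = np.array([0.07, -0.3, 1.25, 0.0], dtype=np.float32)
print(log2_lead_quantize(w, lead_bits=3))
```

Because the transformation is applied directly to the stored parameters, it can be run as a one-shot post-processing step on a pre-trained model, which is the setting the abstract describes.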