Self-Supervised Quantization of Pre-Trained Neural Networks for Multiplierless Acceleration

Sebastian Vogel1,a, Jannik Springer1,2,b, Andre Guntoro1,c and Gerd Ascheid2
1Robert Bosch GmbH, Renningen, Germany
asebastian.vogel@de.bosch.com
bfixed-term.jannik.springer@de.bosch.com
candre.guntoro@de.bosch.com
2RWTH Aachen University, Aachen, Germany
gerd.ascheid@ice.rwth-aachen.de

ABSTRACT


To host intelligent algorithms such as Deep Neural Networks on embedded devices, it is beneficial to transform the data representation of neural networks into a fixed-point format with reduced bit-width. In this paper we present a novel quantization procedure for parameters and activations of pre-trained neural networks. For 8-bit linear quantization, our procedure achieves performance close to that of the original network without retraining and consequently does not require labeled training data. Additionally, we evaluate our method for power-of-two quantization as well as for a two-hot quantization scheme, enabling shift-based inference. To underline the hardware benefits of a multiplierless accelerator, we propose the design of a shift-based processing element.
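To make the quantization schemes named in the abstract concrete, the following is a minimal sketch (not the paper's actual procedure): weights are rounded to the nearest power of two, or to a "two-hot" value composed of two signed powers of two, so that multiplication with an integer activation reduces to bit shifts and at most one addition. The helper names (quantize_power_of_two, quantize_two_hot, shift_multiply), the exponent range, and the rounding in the log domain are illustrative assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
import numpy as np

def quantize_power_of_two(w, min_exp=-8, max_exp=0):
    """Round |w| to a power of two in [2**min_exp, 2**max_exp] (log-domain rounding)."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign * 2.0 ** exp

def quantize_two_hot(w, min_exp=-8, max_exp=0):
    """Approximate w as a signed sum of two powers of two ("two-hot")."""
    first = quantize_power_of_two(w, min_exp, max_exp)
    second = quantize_power_of_two(w - first, min_exp, max_exp)
    return first + second

def shift_multiply(activation: int, exp: int) -> int:
    """Multiply an integer activation by 2**exp using a shift instead of a multiplier."""
    return activation << exp if exp >= 0 else activation >> -exp

# Example: a weight of 0.30 becomes 2**-2 = 0.25 (power-of-two)
# or 2**-2 + 2**-4 = 0.3125 (two-hot).
print(quantize_power_of_two(0.30))  # 0.25
print(quantize_two_hot(0.30))       # 0.3125
```

Under these assumptions, a two-hot weight needs two shifters and one adder per multiply-accumulate, which is the hardware motivation behind a shift-based processing element.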

Keywords: Quantization, Neural Networks, Hardware, Multiplierless Acceleration.


