On the Automatic Exploration of Weight Sharing for Deep Neural Network Compression

Etienne Dupuis (1,a), David Novo (2), Ian O'Connor (1,b) and Alberto Bosio (1,c)

(1) Ecole Centrale de Lyon, Institut des Nanotechnologies de Lyon, France
    (a) etienne.dupuis@ec-lyon.fr
    (b) ian.oconnor@ec-lyon.fr
    (c) alberto.bosio@ec-lyon.fr
(2) LIRMM, Université de Montpellier, CNRS, France
    david.novo@lirmm.fr

ABSTRACT

Deep neural networks demonstrate impressive levels of performance, particularly in computer vision and speech recognition. However, their computational workload and storage requirements inhibit their potential in resource-limited embedded systems. The approximate computing paradigm, which improves performance and energy efficiency by relaxing the need for fully accurate operations, has been widely explored in the literature, and it offers a large number of implementation options with very different approximation strategies (such as pruning, quantization, low-rank factorization, and knowledge distillation). To the best of our knowledge, no automated approach exists to explore, select, and generate the best approximate versions of a given convolutional neural network (CNN) according to the design objectives. The goal of this work in progress is to demonstrate that the design space exploration phase can enable significant network compression without noticeable accuracy loss. We demonstrate this via an example based on weight sharing and show that our method can obtain a 4× compression rate on an int16 version of LeNet-5 (a 5-layer, 1,720-kbit CNN) without retraining and without any accuracy loss.
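To make the weight-sharing idea concrete: a layer's weights are replaced by a small codebook of shared values, so each weight is stored as a short codebook index rather than a full-precision number. The sketch below illustrates one common realization using k-means clustering. It is a minimal illustration under assumed details, not the authors' exploration method: the function name share_weights, the 64×64 layer shape, the use of scikit-learn's KMeans, and the choice of 16 clusters are all hypothetical.

    # Minimal weight-sharing sketch: cluster one layer's weights with k-means,
    # keep only a small codebook of shared values plus per-weight indices.
    import numpy as np
    from sklearn.cluster import KMeans

    def share_weights(weights, n_clusters=16):
        """Quantize `weights` to `n_clusters` shared values (a codebook)."""
        flat = weights.reshape(-1, 1)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
        codebook = km.cluster_centers_.ravel()       # shared weight values
        indices = km.labels_.reshape(weights.shape)  # per-weight codebook index
        return codebook, indices

    # Hypothetical layer whose 4,096 weights would be stored at 16 bits each.
    w = np.random.randn(64, 64).astype(np.float32)
    codebook, idx = share_weights(w, n_clusters=16)

    # Storage drops from 16 bits per weight to log2(16) = 4 bits per index,
    # plus a negligible 16-entry codebook: roughly a 4x compression rate,
    # consistent with the figure reported in the abstract.
    bits_before = w.size * 16
    bits_after = w.size * 4 + codebook.size * 16
    print(f"compression rate ~ {bits_before / bits_after:.2f}x")

The design space exploration in the paper then concerns choices such as the number of shared values per layer, which this sketch fixes to a single illustrative value.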

Keywords: Deep Neural Networks, Approximate Computing, Model Compression, Weight Sharing, Design Space Exploration, Embedded System, Hardware Accelerator.


