Flexible Group-Level Pruning of Deep Neural Networks for On-Device Machine Learning

Kwangbae Lee, Hoseung Kim, Hayun Lee and Dongkun Shin
Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea
kblee93@skku.edu
ghtmd123@skku.edu
lhy920806@skku.edu
dongkun@skku.edu

ABSTRACT


Network pruning is a promising compression technique to reduce the computation and memory access costs of deep neural networks. Pruning techniques are classified into two types: fine-grained pruning and coarse-grained pruning. Fine-grained pruning eliminates individual connections if they are insignificant and thus usually generates irregular networks, which makes it hard to reduce model execution time. Coarse-grained pruning, such as filter-level and channel-level techniques, can produce hardware-friendly networks; however, it can suffer from low accuracy. In this paper, we focus on the group-level pruning method to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned as a group to mitigate the irregularity of pruned networks while providing high accuracy. Although several group-level pruning techniques have been proposed, the previous techniques select weight groups to be pruned only at group-size-aligned locations. In this paper, we propose a more flexible approach, called unaligned group-level pruning, to improve the accuracy of the compressed model. We find the optimal solution to the unaligned group selection problem with dynamic programming. Our technique also generates balanced sparse networks to achieve load balance across parallel computing units. Experiments demonstrate that 2D unaligned group-level pruning achieves a 3.12% lower error rate for the ResNet-20 network on CIFAR-10 compared to the previous 2D aligned group-level pruning at 95% sparsity.
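As a rough illustration of the unaligned group selection idea described above (a minimal sketch, not the implementation from the full paper), the Python code below keeps, for a single weight row, a fixed number of non-overlapping groups of adjacent weights at arbitrary start offsets so that the total retained magnitude is maximized, using the same kind of dynamic programming. The function name, the per-row group budget, and the magnitude-based objective are assumptions made for this example.

import numpy as np

def select_unaligned_groups(row, group_size, num_groups):
    """Choose `num_groups` non-overlapping groups of `group_size` adjacent
    weights, at arbitrary (unaligned) start offsets, that maximize the total
    magnitude kept after pruning. Returns the chosen start offsets.
    Illustrative sketch only; not the authors' implementation."""
    n = len(row)
    assert group_size * num_groups <= n, "budget must fit in the row"
    score = np.abs(row)
    prefix = np.concatenate(([0.0], np.cumsum(score)))  # prefix sums for O(1) group sums
    NEG = -np.inf
    # dp[i][j]: best kept magnitude using the first i weights with j groups kept
    dp = np.full((n + 1, num_groups + 1), NEG)
    dp[:, 0] = 0.0
    ends_group = np.zeros((n + 1, num_groups + 1), dtype=bool)
    for i in range(1, n + 1):
        for j in range(1, num_groups + 1):
            dp[i][j] = dp[i - 1][j]  # weight i-1 is pruned (no kept group ends here)
            if i >= group_size and dp[i - group_size][j - 1] != NEG:
                cand = dp[i - group_size][j - 1] + prefix[i] - prefix[i - group_size]
                if cand > dp[i][j]:
                    dp[i][j] = cand
                    ends_group[i][j] = True
    # Backtrack to recover the start offset of each kept group.
    starts, i, j = [], n, num_groups
    while j > 0:
        if ends_group[i][j]:
            starts.append(i - group_size)
            i -= group_size
            j -= 1
        else:
            i -= 1
    return sorted(starts)

# Keep two unaligned groups of three weights in an eight-weight row,
# then zero out (prune) everything outside the selected groups.
row = np.array([0.1, 0.9, 0.8, 0.7, 0.05, 0.6, 0.65, 0.02])
starts = select_unaligned_groups(row, group_size=3, num_groups=2)  # -> [1, 4]
mask = np.zeros_like(row)
for s in starts:
    mask[s:s + 3] = 1.0
pruned_row = row * mask

With an aligned scheme the groups could only start at offsets that are multiples of the group size, whereas the unaligned selection here is free to start a group at offset 1, keeping the three largest adjacent weights together; this is the flexibility the abstract refers to, shown here on a toy example.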

Keywords: Deep Neural Networks, Pruning, DNN compression, Alignment, Cache-aware


