Flexible Group-Level Pruning of Deep Neural Networks for On-Device Machine Learning

Kwangbae Lee, Hoseung Kim, Hayun Lee and Dongkun Shin
Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea
kblee93@skku.edu
ghtmd123@skku.edu
lhy920806@skku.edu
dongkun@skku.edu

ABSTRACT


Network pruning is a promising compression technique to reduce the computation and memory access costs of deep neural networks. Pruning techniques are classified into two types: fine-grained pruning and coarse-grained pruning. Fine-grained pruning eliminates individual connections if they are insignificant and thus usually generates irregular networks, which makes it hard to reduce model execution time. Coarse-grained pruning, such as filter-level and channel-level techniques, can produce hardware-friendly networks; however, it can suffer from low accuracy. In this paper, we focus on the group-level pruning method to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned as a group to mitigate the irregularity of pruned networks while providing high accuracy. Although several group-level pruning techniques have been proposed, the previous techniques select weight groups to be pruned only at group-size-aligned locations. In this paper, we propose a more flexible approach, called unaligned group-level pruning, to improve the accuracy of the compressed model. We find the optimal solution to the unaligned group selection problem with dynamic programming. Our technique also generates balanced sparse networks to achieve load balance across parallel computing units. Experiments demonstrate that 2D unaligned group-level pruning achieves a 3.12% lower error rate for the ResNet-20 network on CIFAR-10 compared to the previous 2D aligned group-level pruning at 95% sparsity.
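As a rough illustration of the unaligned group selection idea described above (a minimal sketch, not the implementation from the full paper), the Python code below keeps, for a single weight row, a fixed number of non-overlapping groups of adjacent weights at arbitrary start offsets so that the total retained magnitude is maximized, using the same kind of dynamic programming. The function name, the per-row group budget, and the magnitude-based objective are assumptions made for this example.

import numpy as np

def select_unaligned_groups(row, group_size, num_groups):
    """Choose `num_groups` non-overlapping groups of `group_size` adjacent
    weights, at arbitrary (unaligned) start offsets, that maximize the total
    magnitude kept after pruning. Returns the chosen start offsets.
    Illustrative sketch only; not the authors' implementation."""
    n = len(row)
    assert group_size * num_groups <= n, "budget must fit in the row"
    score = np.abs(row)
    prefix = np.concatenate(([0.0], np.cumsum(score)))  # prefix sums for O(1) group sums
    NEG = -np.inf
    # dp[i][j]: best kept magnitude using the first i weights with j groups kept
    dp = np.full((n + 1, num_groups + 1), NEG)
    dp[:, 0] = 0.0
    ends_group = np.zeros((n + 1, num_groups + 1), dtype=bool)
    for i in range(1, n + 1):
        for j in range(1, num_groups + 1):
            dp[i][j] = dp[i - 1][j]  # weight i-1 is pruned (no kept group ends here)
            if i >= group_size and dp[i - group_size][j - 1] != NEG:
                cand = dp[i - group_size][j - 1] + prefix[i] - prefix[i - group_size]
                if cand > dp[i][j]:
                    dp[i][j] = cand
                    ends_group[i][j] = True
    # Backtrack to recover the start offset of each kept group.
    starts, i, j = [], n, num_groups
    while j > 0:
        if ends_group[i][j]:
            starts.append(i - group_size)
            i -= group_size
            j -= 1
        else:
            i -= 1
    return sorted(starts)

# Keep two unaligned groups of three weights in an eight-weight row,
# then zero out (prune) everything outside the selected groups.
row = np.array([0.1, 0.9, 0.8, 0.7, 0.05, 0.6, 0.65, 0.02])
starts = select_unaligned_groups(row, group_size=3, num_groups=2)  # -> [1, 4]
mask = np.zeros_like(row)
for s in starts:
    mask[s:s + 3] = 1.0
pruned_row = row * mask

With an aligned scheme the groups could only start at offsets that are multiples of the group size, whereas the unaligned selection here is free to start a group at offset 1, keeping the three largest adjacent weights together; this is the flexibility the abstract refers to, shown here on a toy example.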

Keywords: Deep Neural Networks, Pruning, DNN compression, Alignment, Cache-aware


