DATE 2021

Joint Sparsity with Mixed Granularity for Efficient GPU Implementation

Chuliang Guo¹, Xingang Yan¹, Yufei Chen¹, He Li², Xunzhao Yin¹ᵃ and Cheng Zhuo¹ᵇ
¹Zhejiang University, Hangzhou, China
ᵃxzyin1@zju.edu.cn
ᵇczhuo@zju.edu.cn
²University of Cambridge, Cambridge, UK

ABSTRACT

Because recent deep neural networks are over-parameterized, sparsification is widely used to compress them and reduce their memory footprint. Unstructured sparsity, i.e., fine-grained pruning, helps preserve model accuracy, while structured sparsity, i.e., coarse-grained pruning, is preferred for general-purpose hardware such as GPUs. This paper proposes a novel joint sparsity pattern that uses mixed granularity to take advantage of both unstructured and structured sparsity. We use a heuristic strategy to infer the joint sparsity pattern by mixing vector-wise fine-grained and block-wise coarse-grained pruning masks. Experimental results for VGG-16 on CIFAR-100 show that joint sparsity achieves higher model accuracy and a higher sparsity ratio than the commonly used block sparsity and balanced sparsity strategies, while consistently maintaining moderate inference speed.

Keywords: Structured Sparsity, Network Pruning, Machine Learning, Lightweight Architecture.
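
As an illustration of the idea described in the abstract, the following is a minimal NumPy sketch of how a block-wise coarse-grained mask and a vector-wise fine-grained mask might be mixed into one joint mask. The abstract does not specify the heuristic, so everything here is an assumption: the function names (`block_mask`, `vector_mask`, `joint_mask`), the use of L2 norm to rank blocks, magnitude ranking within each vector, and the element-wise AND used to combine the two masks are all hypothetical choices, not the authors' method.

```python
import numpy as np

def block_mask(w, block, sparsity):
    """Coarse-grained mask: drop whole (block x block) tiles with the smallest L2 norm.
    Assumed ranking criterion; the abstract does not state one."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # tiles[i, a, j, b] == w[i * block + a, j * block + b]
    tiles = w.reshape(rows // block, block, cols // block, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3)))   # one norm per tile
    n_prune = int(round(sparsity * norms.size))
    keep = np.ones(norms.size, dtype=bool)
    keep[np.argsort(norms, axis=None)[:n_prune]] = False
    keep = keep.reshape(norms.shape)
    # Expand the per-tile decision back to one entry per weight.
    return np.repeat(np.repeat(keep, block, axis=0), block, axis=1)

def vector_mask(w, vec_len, sparsity):
    """Fine-grained mask: inside every length-vec_len row segment, drop the
    same number of smallest-magnitude weights (a balanced pattern)."""
    rows, cols = w.shape
    assert cols % vec_len == 0
    segs = np.abs(w).reshape(rows, cols // vec_len, vec_len)
    n_prune = int(round(sparsity * vec_len))
    order = np.argsort(segs, axis=-1)                # ascending magnitude
    keep = np.ones_like(segs, dtype=bool)
    np.put_along_axis(keep, order[..., :n_prune], False, axis=-1)
    return keep.reshape(rows, cols)

def joint_mask(w, block, vec_len, block_sparsity, vector_sparsity):
    """Assumed mixing rule: a weight survives only if both masks keep it."""
    return block_mask(w, block, block_sparsity) & vector_mask(w, vec_len, vector_sparsity)
```

For example, with an 8 x 8 weight matrix, `block=4`, `vec_len=4` and both sparsity ratios set to 0.5, each mask alone keeps 32 weights, and their intersection keeps 16, i.e. 75% overall sparsity. Surviving weights stay grouped in dense tiles with an equal number of non-zeros per vector, which is the kind of regularity that GPU kernels can exploit.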
