Joint Sparsity with Mixed Granularity for Efficient GPU Implementation
Chuliang Guo1, Xingang Yan1, Yufei Chen1, He Li2, Xunzhao Yin1,a and Cheng Zhuo1,b
1Zhejiang University, Hangzhou, China
2University of Cambridge, Cambridge, UK
axzyin1@zju.edu.cn
bczhuo@zju.edu.cn
ABSTRACT
Given the over-parameterization of recent deep neural networks, sparsification is widely used to compress networks and reduce their memory footprint. Unstructured sparsity, i.e., fine-grained pruning, helps preserve model accuracy, while structured sparsity, i.e., coarse-grained pruning, is preferred on general-purpose hardware such as GPUs. This paper proposes a novel joint sparsity pattern with mixed granularity that takes advantage of both unstructured and structured sparsity. We use a heuristic strategy to infer the joint sparsity pattern by mixing vector-wise fine-grained and block-wise coarse-grained pruning masks. Experimental results for VGG-16 on CIFAR-100 show that joint sparsity achieves higher model accuracy and sparsity ratios while consistently maintaining moderate inference speed, compared with the commonly used block sparsity and balanced sparsity strategies.
Keywords: Structured Sparsity, Network Pruning, Machine Learning, Lightweight Architecture.
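The abstract's core idea of mixing a block-wise coarse-grained mask with a vector-wise fine-grained mask can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's actual heuristic: the block size, the per-row top-k selection, and the element-wise OR used to combine the two masks are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # toy weight matrix

# Block-wise (coarse-grained) mask: keep the 4x4 blocks whose L1 norm
# is at least the median block norm (assumed criterion).
B = 4
blocks = W.reshape(2, B, 2, B)                 # (block_row, r, block_col, c)
block_norms = np.abs(blocks).sum(axis=(1, 3))  # L1 norm per block, shape (2, 2)
block_mask = block_norms >= np.median(block_norms)
coarse = np.repeat(np.repeat(block_mask, B, axis=0), B, axis=1)

# Vector-wise (fine-grained) mask: within each row vector, keep the
# top 50% of weights by magnitude (assumed ratio).
k = W.shape[1] // 2
top_idx = np.argsort(np.abs(W), axis=1)[:, -k:]
fine = np.zeros_like(W, dtype=bool)
np.put_along_axis(fine, top_idx, True, axis=1)

# Joint mask: a weight survives if either granularity keeps it
# (one plausible way to mix the masks; the paper's strategy may differ).
joint = coarse | fine
W_pruned = W * joint
print("joint sparsity ratio:", 1 - joint.mean())
```

Combining by OR preserves the block structure that a GPU kernel can exploit while retaining individually important weights, which is the trade-off between accuracy and inference speed the abstract describes.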