ENCORE Compression: Exploiting Narrow-width Values for Quantized Deep Neural Networks

Myeongjae Jang, Jinkwon Kim, Jesung Kim and Soontae Kim
KAIST School of Computing, Daejeon, Republic of Korea
myeongjae0409@kaist.ac.kr, coco@kaist.ac.kr, jesung.kim@kaist.ac.kr, kims@kaist.ac.kr

ABSTRACT


Deep Neural Networks (DNNs) have become a practical machine learning workload running on various Neural Processing Units (NPUs). To achieve higher performance and lower hardware overheads, DNN datatype reduction through quantization has been proposed. Moreover, to alleviate the memory bottleneck caused by the large data size of DNNs, several zero-value-aware compression algorithms are used. However, these compression algorithms do not compress modern quantized DNNs well because quantization reduces the number of zero values. We find that the latest quantized DNNs instead exhibit data redundancy due to frequent narrow-width values. Because low-precision quantization reduces DNN datatypes to a simple datatype with fewer bits, the originally scattered DNN data are gathered into a small number of discrete values, producing a biased data distribution in which narrow-width values occupy a large proportion. Moreover, the appropriate number of zero run-length bits changes dynamically with DNN sparsity. Based on these observations, we propose a compression algorithm that exploits narrow-width values and variable zero run-lengths for quantized DNNs. In experiments with three quantized DNNs, our proposed scheme yields an average compression ratio of 2.99.

Keywords: DNN, Compression, Quantization, NPU.
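
To make the combination of narrow-width value packing and a variable zero run-length concrete, the following is a minimal Python sketch of such an encoder. The bit layout here (a one-bit tag for zero runs, a 4-bit narrow encoding, an 8-bit full-width fallback, and a configurable run-length field width) is an illustrative assumption, not the ENCORE format defined in the paper.

```python
# Illustrative sketch only: a toy bit-level encoder that combines
# (1) narrow-width value packing and (2) zero-run encoding with a
# configurable run-length field, in the spirit of the abstract.
# All field widths below are assumptions for illustration.

def encode(values, zero_run_bits=3, narrow_bits=4, full_bits=8):
    """Encode a list of unsigned quantized values into a bit string."""
    max_run = (1 << zero_run_bits) - 1
    narrow_max = (1 << narrow_bits) - 1
    bits = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            # Count a run of zeros, capped by the run-length field width.
            run = 0
            while i < len(values) and values[i] == 0 and run < max_run:
                run += 1
                i += 1
            bits.append('0' + format(run, f'0{zero_run_bits}b'))   # tag '0'  = zero run
        else:
            v = values[i]
            i += 1
            if v <= narrow_max:
                bits.append('10' + format(v, f'0{narrow_bits}b'))  # tag '10' = narrow value
            else:
                bits.append('11' + format(v, f'0{full_bits}b'))    # tag '11' = full-width value
    return ''.join(bits)


if __name__ == '__main__':
    # A toy activation tile dominated by zeros and small (narrow-width) values.
    tile = [0, 0, 0, 3, 0, 0, 12, 200, 0, 0, 0, 0, 1]
    encoded = encode(tile)
    original_bits = len(tile) * 8
    print(f'{original_bits} bits -> {len(encoded)} bits '
          f'(compression ratio {original_bits / len(encoded):.2f})')
```

On this toy tile the sketch packs 104 bits into 40 bits (a ratio of 2.60); shrinking or widening zero_run_bits to match the actual sparsity changes how efficiently zero runs are captured, which is the intuition behind the variable zero run-length in the proposed scheme.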


