BenQ: Benchmarking Automated Quantization on Deep Neural Network Accelerators

Zheng Wei (a), Xingjun Zhang (b), Jingbo Li, Zeyu Ji and Jia Wei
School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
(a) frank.wei@stu.xjtu.edu.cn
(b) xjzhang@xjtu.edu.cn

ABSTRACT


Hardware-aware automated quantization promises a new algorithm-hardware co-design paradigm for efficiently accelerating deep neural network (DNN) inference by incorporating the hardware cost into the reinforcement learning (RL)-based quantization strategy search. Existing works usually design an automated quantization algorithm targeting one hardware accelerator, relying on a device-specific performance model or pre-collected data. However, determining the hardware cost is non-trivial for algorithm experts, who rarely have cross-disciplinary knowledge of computer architecture, compilers, and physical chip design. This barrier limits reproducibility and fair comparison. Moreover, the searched results are notoriously hard to interpret because quantitative comparison metrics are lacking. To this end, we first propose BenQ, which implements various RL-based automated quantization algorithms under aligned settings and encapsulates two off-the-shelf performance predictors behind the standard OpenAI Gym API. We then leverage cosine similarity and Manhattan distance to quantify the similarity between the searched policies. Experiments show that different automated quantization algorithms achieve near-equivalent optimal trade-offs because the searched policies are highly similar, which provides insights for revisiting the innovations in automated quantization algorithms.
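The policy comparison described above can be sketched as follows. This is an illustrative example, not code from BenQ: it assumes (hypothetically) that each searched policy is represented as a per-layer bit-width vector, so that cosine similarity measures directional agreement and Manhattan distance measures the total per-layer bit-width disagreement.

```python
# Sketch (not from the paper): comparing two searched quantization
# policies, assuming each policy is a vector of per-layer bit-widths.
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two bit-width vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

def manhattan_distance(p, q):
    """Sum of absolute per-layer bit-width differences (0 = identical policies)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical policies found by two RL searches over an 8-layer network.
policy_a = [8, 6, 4, 4, 6, 4, 4, 8]
policy_b = [8, 6, 4, 6, 4, 4, 4, 8]

print(round(cosine_similarity(policy_a, policy_b), 4))  # high similarity despite swaps
print(manhattan_distance(policy_a, policy_b))           # few bits of total disagreement
```

A high cosine similarity together with a small Manhattan distance is what would indicate that two searches converged to near-equivalent policies.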

Keywords: Reinforcement Learning, Automated Quantization, DNN Accelerator, Benchmark.


