Learn-to-Scale: Parallelizing Deep Learning Inference on Chip Multiprocessor Architecture

Kaiwei Zou 1,2,a, Ying Wang 1,2,b, Huawei Li 1,2,c and Xiaowei Li 1,2,d
1 SKLCA, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2 University of Chinese Academy of Sciences, Beijing, China
a zoukaiwei@ict.ac.cn
b wangying2009@ict.ac.cn
c lihuawei@ict.ac.cn
d lxw@ict.ac.cn

ABSTRACT


Accelerating deep neural networks on resource-constrained embedded devices is becoming increasingly important for real-time applications. However, in contrast to the intensive research on specialized neural network inference architectures, there is little study of the acceleration and parallelization of deep learning inference on embedded chip-multiprocessor architectures, which many real-time applications favor for their superb energy efficiency and scalability. In this work, we investigate strategies for parallelizing single-pass deep neural network inference on embedded on-chip multi-core accelerators. These methods exploit the elasticity and noise tolerance of deep learning algorithms to circumvent the bottleneck of on-chip inter-core data movement and to reduce the communication overhead that grows as the core number scales up. The experimental results show that the communication-aware sparsified parallelization method improves system performance by 1.6×-1.1× and achieves 4×-1.6× better interconnect energy efficiency for different neural networks.
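The abstract only names the communication-aware sparsified parallelization method; as a minimal sketch of the general idea (not the paper's actual scheme), the example below drops small-magnitude activations before they cross the interconnect and packs the survivors as (index, value) pairs, relying on the noise tolerance mentioned above. The threshold value and the packing layout are illustrative assumptions.

```python
# Illustrative sketch only: magnitude-based sparsification of a layer's output
# activations before inter-core transfer. Threshold and encoding are hypothetical,
# not taken from the paper.
import numpy as np

def sparsify_for_transfer(activations: np.ndarray, threshold: float = 0.05):
    """Drop small-magnitude activations and pack survivors as (index, value)
    pairs so only non-zero entries are sent over the on-chip interconnect."""
    flat = activations.ravel()
    keep = np.abs(flat) >= threshold            # noise-tolerant drop of small values
    indices = np.nonzero(keep)[0].astype(np.int32)
    values = flat[keep].astype(np.float32)
    return indices, values, activations.shape

def restore_on_receiver(indices, values, shape):
    """Rebuild the dense activation tensor on the receiving core."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[indices] = values
    return flat.reshape(shape)

if __name__ == "__main__":
    acts = (np.random.randn(64, 32) * 0.1).astype(np.float32)
    idx, val, shape = sparsify_for_transfer(acts)
    dense_bytes = acts.nbytes
    sparse_bytes = idx.nbytes + val.nbytes
    print(f"transfer volume: {sparse_bytes}/{dense_bytes} bytes "
          f"({sparse_bytes / dense_bytes:.0%} of dense)")
```

In this toy setup, the reduction in bytes moved stands in for the interconnect traffic saved; the paper's reported speedups and energy savings come from its own parallelization scheme, which the full text describes.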

Keywords: Parallelization, Multi-core, Inference, Neural network, Embedded devices.


