Data Locality Optimization of Depthwise Separable Convolutions for CNN Inference Accelerators
Hao-Ning Wu (a) and Chih-Tsun Huang (b)
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan ROC
(a) wuhoward2002@gmail.com
(b) cthuang@cs.nthu.edu.tw
ABSTRACT
This paper presents a novel framework that maximizes data reuse in depthwise separable convolutional layers by applying the Scan execution order to the tiled matrix multiplications. In addition, a cross-layer fusion scheme is proposed to minimize the transfer of intermediate activations, reducing both the latency and the energy consumption of external memory accesses. The experimental results are validated against DRAMSim2 for accurate timing and energy estimation. With a 64K-entry on-chip buffer, our approach achieves a 67% reduction in DRAM energy on MobileNet V2.