HeSA: Heterogeneous Systolic Array Architecture for Compact CNN Hardware Accelerators
Rui Xu1,a, Sheng Ma2, Yaohua Wang1,b and Yang Guo1,c
1Institute of Microelectronics, National University of Defense Technology, Changsha, China
anudtxurui@gmail.com
byaowangeth@gmail.com
cguoyang@nudt.edu.cn
2Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
masheng@nudt.edu.cn
ABSTRACT
Compact convolutional neural networks have become a hot research topic. However, we find that systolic-array-based hardware accelerators are highly inefficient when processing compact models, especially the depthwise convolutional layers in these networks.
To make systolic arrays efficient for compact convolutional neural networks, we propose the heterogeneous systolic array (HeSA) architecture. It introduces heterogeneous processing elements that support multiple dataflow modes, which further exploit the data reuse opportunities of depthwise convolutional layers without changing the overall structure of the naïve systolic array. By increasing the utilization of processing elements in the array, HeSA improves performance, throughput, and energy efficiency over the standard baseline. In our evaluation with typical workloads, HeSA improves the utilization of computing resources in depthwise convolutional layers by 4.5×-5.5× and achieves a 1.5×-2.2× overall speedup compared to the standard systolic array architecture. HeSA also improves on-chip data reuse and saves over 20% of energy consumption. Meanwhile, thanks to its simple design, the area of HeSA is nearly unchanged compared to the baseline.
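To make the underutilization concrete, the following Python sketch (not part of the paper; the 128×128 array size and the weight-stationary mapping of input channels to rows and output channels to columns are assumptions for illustration) estimates what fraction of processing elements do useful work for a depthwise layer versus a standard convolutional layer.

```python
def pe_utilization(rows, cols, in_ch_per_out_ch, out_ch):
    """Fraction of PEs doing useful work when input channels map to array rows
    and output channels map to array columns (assumed weight-stationary mapping)."""
    active = min(in_ch_per_out_ch, rows) * min(out_ch, cols)
    return active / (rows * cols)

# Standard 3x3 conv, 128 input x 128 output channels, on a 128x128 array:
print(pe_utilization(128, 128, 128, 128))  # 1.0 -- the array is fully busy

# Depthwise 3x3 conv, 128 channels: each output channel reads only its own
# input channel, so only one PE per column (a diagonal of the array) is active.
print(pe_utilization(128, 128, 1, 128))    # ~0.0078 -- under 1% of PEs busy
```

Under these assumptions, a depthwise layer keeps well under 1% of the PEs busy, which is the inefficiency that HeSA's heterogeneous processing elements are designed to recover.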
Keywords: Hardware Accelerator, Architecture, Convolutional Neural Network, Depthwise Separable Convolution, Systolic Array.