Exploration of Memory Access Optimization for FPGA-based 3D CNN Accelerator
Teng Tian^a, Xi Jin^b, Letian Zhao^c, Xiaotian Wang^d, Jie Wang^e and Wei Wu^f
Key Laboratory of Strongly-Coupled Quantum Matter Physics, Chinese Academy of Sciences Institute of Microelectronics, School of Physical Sciences, University of Science and Technology of China, Hefei, Anhui, China
^a tianteng@mail.ustc.edu.cn
^b jinxi@ustc.edu.cn
^c zhaolt@mail.ustc.edu.cn
^d wxtdsg@mail.ustc.edu.cn
^e wangj1e@mail.ustc.edu.cn
^f wuw1993@mail.ustc.edu.cn
ABSTRACT
Three-dimensional convolutional neural networks (3D CNNs) are used effectively in various video recognition applications. Compared to traditional 2D CNNs, the extra temporal dimension makes 3D CNNs more computationally intensive and gives them a larger memory footprint. Memory optimization is therefore crucial in this case. This paper presents a design space exploration of memory access optimization for an FPGA-based 3D CNN accelerator. We present a non-overlapping data tiling method for contiguous off-chip memory access and explore on-chip data reuse opportunities by leveraging different loop ordering strategies. We propose a hardware architecture that can flexibly support a different loop ordering strategy for each 3D CNN layer. With the help of hardware/software co-design, we can provide the optimal configuration for an energy-efficient and high-performance accelerator design. In experiments on AlexNet, VGG16, and C3D, our optimal model reduces DRAM accesses by up to 84% and energy consumption by up to 55% on C3D compared to a baseline model, and demonstrates state-of-the-art performance compared to prior FPGA implementations.
Keywords: 3D CNN, Data tiling, Loop ordering, Energy-efficient
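To make concrete why loop ordering affects off-chip traffic, the following is a minimal sketch of a toy counting model. It is not the cost model or architecture of this paper. It assumes that on-chip memory holds exactly one input tile, one weight tile, and one output tile at a time, and that a tile is transferred whenever its index changes between consecutive iterations. The loop names `s` (spatio-temporal tiles), `m` (output-channel tiles), and `c` (input-channel tiles), as well as the function `count_tile_transfers`, are hypothetical names chosen for illustration.

```python
from itertools import product

# Which tile-loop indices each buffer depends on (assumed toy model):
#   input tile  -> spatio-temporal tile s, input-channel tile c
#   weight tile -> output-channel tile m, input-channel tile c
#   output tile -> spatio-temporal tile s, output-channel tile m
DEPENDENCIES = {
    "input": ("s", "c"),
    "weight": ("m", "c"),
    "output": ("s", "m"),
}


def count_tile_transfers(order, trip_counts):
    """Count off-chip tile transfers for one ordering of the tile loops.

    order:       loop names from outermost to innermost, e.g. ("s", "m", "c")
    trip_counts: number of tiles along each loop, e.g. {"s": 4, "m": 2, "c": 3}
    """
    transfers = {name: 0 for name in DEPENDENCIES}
    resident = {name: None for name in DEPENDENCIES}  # tile currently on chip
    ranges = [range(trip_counts[name]) for name in order]
    # itertools.product varies the last range fastest, i.e. the innermost loop.
    for indices in product(*ranges):
        point = dict(zip(order, indices))
        for name, dims in DEPENDENCIES.items():
            key = tuple(point[d] for d in dims)
            if resident[name] != key:  # a different tile is needed: transfer it
                transfers[name] += 1
                resident[name] = key
    return transfers


if __name__ == "__main__":
    tiles = {"s": 4, "m": 2, "c": 3}
    for order in [("s", "m", "c"), ("m", "c", "s"), ("s", "c", "m")]:
        print(order, count_tile_transfers(order, tiles))
```

In this toy model, the buffer that does not depend on the innermost loop is the one that gets reused: placing `c` innermost keeps partial sums on chip, placing `s` innermost reuses weights, and placing `m` innermost reuses inputs. Which choice is best for a given layer depends on the relative tile sizes, which this sketch deliberately ignores.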