Power Optimization Through Peripheral Circuit Reusing Integrated with Loop Tiling for RRAM Crossbar‐based CNN
Yuanhui Ni1, Weiwen Chen1, Wenjuan Cui2, Yuanchun Zhou2 and Keni Qiu1,a
1Capital Normal University, Beijing, China
aqiukn@cnu.edu.cn
2Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
ABSTRACT
Convolutional neural networks (CNNs) have been widely adopted to make predictions on large amounts of data in modern embedded systems. Prior studies have shown that convolutional computations, which consist of large numbers of multiply‐accumulate (MAC) operations, are the most computationally expensive portion of a CNN. Compared with executing MAC operations on GPUs and FPGAs, CNN implementation in the RRAM crossbar‐based computing system (RCS) offers the notable advantages of high performance and low power. However, the current design is energy‐unbalanced among its three parts: RRAM crossbar computation, peripheral circuits and memory accesses. The latter two can significantly limit the potential gains of RCS. To address the high power overhead of peripheral circuits in RCS, this paper adopts the Peripheral Circuit Unit (PeriCU)‐Reuse scheme to meet a given power budget. The underlying idea is to focus on the expensive AD/DA converters and arrange multiple convolution layers to be served sequentially by the same PeriCU. Furthermore, it is observed that memory accesses can be bypassed if two adjacent layers are assigned to different PeriCUs. A loop tiling technique is then proposed to further improve the energy efficiency and throughput of RCS. Experiments on two convolutional applications validate that the PeriCU‐Reuse scheme integrated with the loop tiling technique can efficiently meet the power requirement and further reduce energy consumption by 61.7%.