Double MAC: Doubling the Performance of Convolutional Neural Networks on Modern FPGAs
Dong Nguyen (a), Daewoo Kim (b) and Jongeun Lee (c)
School of Electrical and Computer Engineering, UNIST, Ulsan, Korea.
(a) dongnm@unist.ac.kr
(b) daewoo@unist.ac.kr
(c) jlee@unist.ac.kr
ABSTRACT
This paper presents a novel method, called Double MAC, that doubles the computation rate of convolutional neural network (CNN) accelerators by packing two multiply-and-accumulate (MAC) operations into a single DSP block of off-the-shelf FPGAs. While a general SIMD MAC using a single DSP block seems impossible, our solution is tailored to the kind of MAC operations required by a convolution layer. Our preliminary evaluation shows that Double MAC not only doubles the computation throughput of a CNN layer with essentially the same resource utilization, but also improves network-level performance by 14% to 84% over a highly optimized state-of-the-art accelerator, depending on the CNN hyper-parameters.
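The packing idea can be illustrated with a small software model. The sketch below is not the paper's hardware design: it assumes, for illustration only, that the two multiplications share one operand (as when two values are multiplied by the same factor in a convolution) and that all operands are signed 8-bit integers. The names `double_mul` and `double_mac`, the 16-bit field width, and the use of Python's arbitrary-precision integers in place of a fixed-width DSP multiplier are all hypothetical choices made here.

```python
# Software model of packing two signed multiplications into one wide multiply.
# Assumptions (not from the abstract): both products share the operand `c`,
# and a, b, c are signed 8-bit values in [-128, 127].

FIELD_BITS = 16               # each 8-bit x 8-bit product fits in 16 signed bits
FIELD_MASK = (1 << FIELD_BITS) - 1
SIGN_BIT = 1 << (FIELD_BITS - 1)


def double_mul(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a*c, b*c) using a single multiplication."""
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must be signed 8-bit values")

    packed = (a << FIELD_BITS) + b      # place a above b in one wide word
    product = packed * c                # the single multiply: a*c*2^16 + b*c

    # The low field holds b*c in two's complement; sign-extend it.
    low = product & FIELD_MASK
    if low & SIGN_BIT:
        low -= 1 << FIELD_BITS

    # Remove the low field (this undoes its borrow), then shift down for a*c.
    high = (product - low) >> FIELD_BITS
    return high, low


def double_mac(a_vals, b_vals, c_vals) -> tuple[int, int]:
    """Accumulate two dot products, sum(a*c) and sum(b*c), one multiply per step."""
    acc_a = acc_b = 0
    for a, b, c in zip(a_vals, b_vals, c_vals):
        pa, pb = double_mul(a, b, c)
        acc_a += pa
        acc_b += pb
    return acc_a, acc_b


if __name__ == "__main__":
    print(double_mul(-3, 5, -7))
    print(double_mac([1, -2, 3], [4, 5, -6], [7, -8, 9]))
```

The correction step matters for signed data: when `b*c` is negative it borrows from the upper field, so the upper product has to be recovered after subtracting the sign-extended lower field rather than by shifting alone. A real DSP block has fixed multiplier and accumulator widths, which this model ignores.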