DATE 2017

Double MAC: Doubling the Performance of Convolutional Neural Networks on Modern FPGAs

Dong Nguyen^a, Daewoo Kim^b and Jongeun Lee^c
School of Electrical and Computer Engineering, UNIST, Ulsan, Korea.
^adongnm@unist.ac.kr
^bdaewoo@unist.ac.kr
^cjlee@unist.ac.kr

ABSTRACT

This paper presents a novel method to double the computation rate of convolutional neural network (CNN) accelerators by packing two multiply-and-accumulate (MAC) operations into one DSP block of off-the-shelf FPGAs (called Double MAC). While a general SIMD MAC using a single DSP block seems impossible, our solution is tailored to the kind of MAC operations required for a convolution layer. Our preliminary evaluation shows that the Double MAC approach not only doubles the computation throughput of a CNN layer with essentially the same resource utilization, but also improves network-level performance by 14% to 84% over a highly optimized state-of-the-art accelerator solution, depending on the CNN hyper-parameters.
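
The packing idea can be illustrated with a minimal arithmetic sketch. It is not the authors' hardware design: it assumes that the two multiplications share one operand (as happens in a convolution when one value is multiplied by two different values), that all operands are signed 8-bit integers, and that a 16-bit field is enough to hold one product. The names `SHIFT`, `pack`, `unpack` and `double_mul` are hypothetical and chosen for illustration only.

```python
SHIFT = 16  # assumed field width: a signed 8x8 product fits in 16 bits


def pack(a, b):
    """Place a in the upper field and b in the lower field of one wide word."""
    return (a << SHIFT) + b


def unpack(p):
    """Split a packed product into its (upper, lower) signed fields."""
    low = p & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):      # reinterpret the lower field as signed
        low -= 1 << SHIFT
    high = (p - low) >> SHIFT        # remove the lower field's borrow first
    return high, low


def double_mul(a, b, c):
    """Compute a*c and b*c with a single wide multiplication."""
    return unpack(pack(a, b) * c)


if __name__ == "__main__":
    print(double_mul(-3, 5, -7))
```

The subtraction of `low` before shifting is the sign correction: a negative lower product borrows from the upper field, and the borrow has to be undone to recover the upper product. Accumulating several packed products before unpacking would need additional guard bits in the lower field, which this sketch does not model.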
