Accelerating Local Binary Pattern Networks with Software-Programmable FPGAs

Jeng-Hau Lin¹, Atieh Lotfi¹, Vahideh Akhlaghi¹, Zhuowen Tu²,¹ and Rajesh K. Gupta¹
¹Dept. of Computer Science and Engineering, UC San Diego
²Dept. of Cognitive Science, UC San Diego
{jel252, alotfi, vakhlagh, ztu, rgupta}@ucsd.edu

ABSTRACT


Fueled by the success of mobile devices, computational demands on these platforms have been rising faster than their computational capacity, storage capacity, and energy availability for tasks ranging from speech and image recognition to automated reasoning and cognition. While the success of convolutional neural networks (CNNs) has contributed to this vision, these algorithms remain beyond the reach of the limited computing and storage capabilities of mobile platforms. Most researchers agree that such a transition can be achieved only with dedicated hardware accelerators on these platforms. However, CNNs remain particularly unsuitable for such acceleration, both because of their arithmetic-intensive operations and because of the high memory bandwidth required by highly parallel processing. In this paper, we implement and optimize an alternative genre of networks, the local binary pattern network (LBPNet), which replaces arithmetic operations with combinatorial operations and thus substantially boosts the efficiency of hardware implementation. LBPNet is built on a radically different view of the arithmetic operations used by conventional neural networks, overcoming the limitations of the compression and quantization methods used in hardware implementations of CNNs. This paper explores in depth the design of an LBPNet accelerator architecture and its critical optimizations for realization in hardware, and compares the results with a state-of-the-art CNN on multiple datasets.
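For context, the classic local binary pattern operator underlying LBPNet-style networks compares each neighboring pixel against a center pixel and packs the comparison bits into a code, so comparisons and bit shifts replace the multiply-accumulate work of a convolution. A minimal sketch in Python (the function name and the fixed 3x3 clockwise sampling pattern here are illustrative; LBPNet's learned sampling patterns differ):

```python
def lbp_code(patch):
    """Classic 3x3 LBP: compare the 8 neighbors to the center pixel
    and pack the comparison bits into an 8-bit code.
    patch: 3x3 list of lists of pixel intensities -> int in [0, 255]."""
    center = patch[1][1]
    # Clockwise neighbor order starting at the top-left pixel.
    neighbors = [patch[0][0], patch[0][1], patch[0][2],
                 patch[1][2], patch[2][2], patch[2][1],
                 patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:      # comparison, not multiplication
            code |= 1 << bit     # bit shift, not accumulation
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241 (binary 11110001)
```

Because the operator uses only comparisons and bit packing, a hardware realization needs comparators and wiring rather than multipliers, which is the property the paper exploits for FPGA acceleration.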

Keywords: Deep learning hardware accelerator, FPGA, high-level synthesis.
