Hardware Architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition
Vladimir Rybalkin¹,ᵃ, Norbert Wehn¹,ᵇ, Mohammad Reza Yousefi²,ᶜ and Didier Stricker²,ᵈ
¹ Microelectronic Systems Design Research Group, University of Kaiserslautern, Germany.
ᵃ rybalkin@eit.uni-kl.de
ᵇ wehn@eit.uni-kl.de
² Augmented Vision Department, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany.
ᶜ yousefi@dfki.de
ᵈ didier.stricker@dfki.de
ABSTRACT
Optical Character Recognition is the conversion of images of printed or handwritten text into machine-encoded text. It is a building block of many processes such as machine translation, text-to-speech conversion and text mining. Bidirectional Long Short-Term Memory Neural Networks have shown superior performance in character recognition compared to other types of neural networks. In this paper, we propose what is, to the best of our knowledge, the first hardware architecture of a Bidirectional Long Short-Term Memory Neural Network with Connectionist Temporal Classification for Optical Character Recognition. Based on the new architecture, we present an FPGA hardware accelerator that achieves 459 times higher throughput than the state of the art. Visual recognition is a typical task on mobile platforms, which usually handle it in one of two ways: either the task runs locally on an embedded processor, or it is offloaded to the cloud to run on a high-performance machine. We show that a computationally intensive visual recognition task benefits from being migrated to our dedicated hardware accelerator: the accelerator outperforms a high-performance CPU in terms of runtime while consuming less energy than low-power systems, with negligible loss of recognition accuracy.