SACC: Split and Combine Approach to Reduce the Off-chip Memory Accesses of LSTM Accelerators

Saurabh Tewari, Anshul Kumar and Kolin Paul
Dept. of Computer Science and Engg., Indian Institute of Technology Delhi, New Delhi, India
saurabh.tewari@cse.iitd.ac.in
anshul@cse.iitd.ac.in
kolin@cse.iitd.ac.in

ABSTRACT


Long Short-Term Memory (LSTM) networks are widely used in speech recognition and natural language processing. Recently, a large number of LSTM accelerators have been proposed for the efficient processing of LSTM networks. The high energy consumption of these accelerators limits their usage in energy-constrained systems. LSTM accelerators repeatedly access large weight matrices from off-chip memory, which contributes significantly to their energy consumption; reducing off-chip memory accesses is therefore key to improving their energy efficiency. We propose a data reuse approach that splits and combines the LSTM cell computations in a way that reduces the off-chip memory accesses of the LSTM hidden state matrices by 50%. In addition, the data reuse efficiency of our approach is independent of on-chip memory size, making it more suitable for LSTM accelerators with small on-chip memories. Experimental results show that, compared to conventional approaches, our approach reduces off-chip memory accesses by 28% and 32%, and energy consumption by 13% and 16%, for character-level language modelling and speech recognition LSTM models, respectively.
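The abstract does not give the scheme's details, but the underlying separation it exploits is standard: each LSTM gate pre-activation z_t = W_x·x_t + W_h·h_{t-1} + b splits into an input-dependent part, computable for all timesteps in one pass over the weights, and a recurrent part that must stay in the sequential loop. The sketch below (an illustration of this split, not the paper's implementation; all names are hypothetical) verifies that the split schedule is numerically identical to the fused one:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_h = 6, 8, 4                      # timesteps, input size, hidden size
Wx = rng.standard_normal((4 * n_h, n_in))   # stacked i,f,g,o input weights
Wh = rng.standard_normal((4 * n_h, n_h))    # stacked recurrent (hidden state) weights
b  = rng.standard_normal(4 * n_h)
xs = rng.standard_normal((T, n_in))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_fused(xs):
    """Baseline: full pre-activation inside the time loop,
    so every weight matrix is streamed once per timestep."""
    h, c, hs = np.zeros(n_h), np.zeros(n_h), []
    for t in range(T):
        z = Wx @ xs[t] + Wh @ h + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

def lstm_split(xs):
    """Split phase: all input contributions in one pass over W_x;
    combine phase: only the recurrent term remains in the loop."""
    zx = xs @ Wx.T + b                       # (T, 4*n_h) precomputed
    h, c, hs = np.zeros(n_h), np.zeros(n_h), []
    for t in range(T):
        z = zx[t] + Wh @ h
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

print(np.allclose(lstm_fused(xs), lstm_split(xs)))  # → True
```

On an accelerator, such a reordering changes how often each matrix must be fetched from off-chip memory without changing the result; the paper's contribution is a specific split-and-combine schedule that halves the fetches of the hidden state matrices.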

Keywords: LSTM, Off-Chip Memory Access, Data-Reuse.
