PSB-RNN: A Processing-in-Memory Systolic Array Architecture using Block Circulant Matrices for Recurrent Neural Networks

Nagadastagiri Challapalle (1,a), Sahithi Rampalli (1,b), Makesh Chandran (1,c), Gurpreet Kalsi (2,e), Sreenivas Subramoney (2,f), John Sampson (1,d) and Vijaykrishnan Narayanan (1,f)
(1) Pennsylvania State University, University Park, PA, USA
(a) nrc53@psu.edu
(b) svr46@psu.edu
(c) mzc88@psu.edu
(d) jms1257@psu.edu
(e) vijaykrishnan.narayanan@psu.edu
(2) Processor Architecture Research Lab, Intel Labs, Bangalore, KA, India
(e) gurpreet.s.kalsi@intel.com
(f) sreenivas.subramoney@intel.com

ABSTRACT


Recurrent Neural Networks (RNNs) are widely used in Natural Language Processing (NLP) applications because they inherently capture contextual information across spatial and temporal dimensions. Compared to other classes of neural networks, RNNs have more weight parameters because they consist primarily of fully connected layers. Recently, several techniques, such as weight pruning, zero-skipping, and block circulant compression, have been introduced to reduce the storage and access requirements of RNN weight parameters. In this work, we present a ReRAM crossbar-based processing-in-memory (PIM) architecture with systolic dataflow that incorporates block circulant compression for RNNs. Block circulant compression decomposes the operations in a fully connected layer into a series of Fourier transforms and point-wise operations, reducing both space and computational complexity. We formulate the Fourier transform and point-wise operations as in-situ multiply-and-accumulate (MAC) operations mapped to ReRAM crossbars for high energy efficiency and throughput. To further improve energy efficiency, we also use systolic dataflow for communication within the crossbar arrays instead of broadcast and multicast communication. The proposed architecture achieves average compute-efficiency improvements of 44× and 17× over a custom FPGA architecture and a conventional crossbar-based implementation, respectively.
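To make the block circulant decomposition concrete, here is a minimal NumPy sketch. It assumes a weight matrix partitioned into b×b circulant blocks, each stored only as its first column; the layout `W_vecs` and all helper names are hypothetical illustrations, not the authors' implementation. The sketch checks that a forward FFT, point-wise multiply-accumulate, and inverse FFT reproduce the dense matrix-vector product. It also shows one way a DFT reduces to real-valued matrix-vector MACs of the kind a crossbar performs. The actual PSB-RNN crossbar mapping and systolic dataflow are not modeled.

```python
import numpy as np

def circulant(c):
    """Dense n x n circulant matrix with first column c: C[i, j] = c[(i - j) % n]."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def dense_from_blocks(W_vecs):
    """Expand a (p, q, b) array of defining vectors into the dense (p*b) x (q*b) matrix."""
    p, q, _ = W_vecs.shape
    return np.block([[circulant(W_vecs[i, j]) for j in range(q)] for i in range(p)])

def block_circulant_matvec_fft(W_vecs, x, b):
    """Compute y = W @ x for a block circulant W without materializing W.

    Each block product C_ij @ x_j is a circular convolution, so it equals
    IFFT(FFT(c_ij) * FFT(x_j)). By linearity, the sum over j is accumulated
    in the frequency domain, so each output block needs only one IFFT.
    """
    p, q, _ = W_vecs.shape
    X = np.fft.fft(x.reshape(q, b), axis=1)   # FFT of each input chunk: (q, b)
    Wf = np.fft.fft(W_vecs, axis=2)            # FFT of each defining vector: (p, q, b)
    Yf = (Wf * X[None, :, :]).sum(axis=1)      # point-wise multiply, accumulate over j: (p, b)
    return np.fft.ifft(Yf, axis=1).real.reshape(p * b)

def dft_as_real_mac(x):
    """DFT of a real vector using only real matrix-vector products (MACs).

    F[k, n] = cos(2*pi*k*n/N) - 1j*sin(2*pi*k*n/N), so Re(X) = Cos @ x and
    Im(X) = -(Sin @ x). Storing Cos and Sin as real weight matrices is one
    hypothetical way to map a DFT onto analog crossbars, not the paper's mapping.
    """
    N = len(x)
    k = np.arange(N)
    angle = 2 * np.pi * np.outer(k, k) / N
    return np.cos(angle) @ x - 1j * (np.sin(angle) @ x)

rng = np.random.default_rng(0)
p, q, b = 2, 3, 4                          # 8 x 12 weight matrix of 4 x 4 circulant blocks
W_vecs = rng.standard_normal((p, q, b))    # 24 stored values instead of 96
x = rng.standard_normal(q * b)

y_fft = block_circulant_matvec_fft(W_vecs, x, b)
y_dense = dense_from_blocks(W_vecs) @ x
print(np.allclose(y_fft, y_dense))                       # True
print(np.allclose(dft_as_real_mac(x), np.fft.fft(x)))    # True
```

In this sketch, storage per b×b block drops from b² values to b, and the FFT path replaces each dense block product with transforms plus b point-wise multiplies. That is the kind of saving block circulant compression targets.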

Keywords: Recurrent neural network, Processing-in-memory, Block circulant, Fourier transform, ReRAM crossbar.
