Exploiting Activation Sparsity In Dram-Based Scalable CNN and RNN Accelerators

Tobi Delbrück
ETHZ, CH

ABSTRACT

Large deep neural networks (DNNs) need lots of fast memory for states and weights. Although DRAM is the dominant high-throughput, low-cost memory (costing 20X less than SRAM), its long random access latency is bad for the unpredictable access patterns in spiking neural networks (SNNs). But sparsely active SNNs are key to biological computational efficiency. This talk reports on our 5 year developments of convolutional and recurrent deep neural network hardware accelerators that exploit spatial and temporal sparsity like SNNs but achieve SOA throughput, power efficiency and latency using DRAM for the large weight and state memory required by powerful DNNs.