Sparsity-Aware Caches to Accelerate Deep Neural Networks

Vinod Ganesan1,a, Sanchari Sen2,d, Pratyush Kumar1,b, Neel Gala3, Kamakoti Veezhinathan1,c and Anand Raghunathan2,e

1Department of Computer Science and Engineering, IIT Madras, India
avinodg@cse.iitm.ac.in
bpratyush@cse.iitm.ac.in
ckama@cse.iitm.ac.in
2School of Electrical and Computer Engineering, Purdue University
dsen9@purdue.edu
eraghunathan@purdue.edu
3InCore Semiconductors Pvt. Ltd
neelgala@incoresemi.com

ABSTRACT

Deep Neural Networks (DNNs) have transformed the field of artificial intelligence and represent the state-of-the-art in many machine learning tasks. There is considerable interest in using DNNs to realize edge intelligence in highly resource-constrained devices such as wearables and IoT sensors. Unfortunately, the high computational requirements of DNNs pose a serious challenge to their deployment in these systems. Moreover, due to tight cost (and hence, area) constraints, these devices are often unable to accommodate hardware accelerators, requiring DNNs to execute on the General Purpose Processor (GPP) cores that they contain. We address this challenge through lightweight micro-architectural extensions to the memory hierarchy of GPPs that exploit a key attribute of DNNs, viz. sparsity, or the prevalence of zero values. We propose SparseCache, an enhanced cache architecture that utilizes a null cache based on a Ternary Content Addressable Memory (TCAM) to compactly store zero-valued cache lines, while storing non-zero lines in a conventional data cache. By storing only addresses, rather than values, for zero-valued cache lines, SparseCache increases the effective cache capacity, thereby reducing the overall miss rate and execution time. SparseCache utilizes a Zero Detector and Approximator (ZDA) and an Address Merger (AM) to perform reads and writes to the null cache. We evaluate SparseCache on four state-of-the-art DNNs programmed with the Caffe framework. SparseCache achieves a 5-28% reduction in miss rate, which translates to a 5-21% reduction in execution time, with only 0.1% area and 3.8% power overheads in comparison to a low-end Intel Atom Z-series processor.
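The core idea described above can be illustrated with a minimal behavioral sketch (this is a hypothetical model for intuition, not the authors' implementation; the class and method names are invented). Zero-valued lines are recorded by address alone in a TCAM-like structure, so they do not consume data-cache capacity, and reads of such addresses simply synthesize a line of zeros.

```python
# Hypothetical behavioral model of a SparseCache-style hierarchy.
# Zero-valued lines are tracked by address in a "null cache" (a set here,
# standing in for the TCAM); non-zero lines occupy the data cache.

class SparseCacheModel:
    def __init__(self, data_capacity):
        self.data_capacity = data_capacity  # max non-zero lines held
        self.data_cache = {}                # address -> line contents
        self.null_cache = set()             # addresses of all-zero lines

    def write(self, addr, line):
        # Zero-detection step: an all-zero line is stored as an address only,
        # freeing data-cache capacity for non-zero lines.
        if all(word == 0 for word in line):
            self.null_cache.add(addr)
            self.data_cache.pop(addr, None)
        else:
            self.null_cache.discard(addr)
            if addr not in self.data_cache and len(self.data_cache) >= self.data_capacity:
                # Naive FIFO-style eviction, purely for illustration.
                self.data_cache.pop(next(iter(self.data_cache)))
            self.data_cache[addr] = line

    def read(self, addr, line_size=4):
        if addr in self.null_cache:
            # Null-cache hit: no data storage needed, synthesize zeros.
            return [0] * line_size
        # Data-cache hit returns the line; None models a miss.
        return self.data_cache.get(addr)
```

Because each null-cache entry costs only an address tag instead of a full line, the same total storage tracks more live lines, which is the mechanism behind the reported miss-rate reduction.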
