Sparsity-Aware Caches to Accelerate Deep Neural Networks

Vinod Ganesan1,a, Sanchari Sen2,d, Pratyush Kumar1,b, Neel Gala3, Kamakoti Veezhinathan1,c and Anand Raghunathan2,e

1Department of Computer Science and Engineering, IIT Madras, India
avinodg@cse.iitm.ac.in
bpratyush@cse.iitm.ac.in
ckama@cse.iitm.ac.in
2School of Electrical and Computer Engineering, Purdue University
dsen9@purdue.edu
eraghunathan@purdue.edu
3InCore Semiconductors Pvt. Ltd
neelgala@incoresemi.com

ABSTRACT

Deep Neural Networks (DNNs) have transformed the field of artificial intelligence and represent the state-of-the-art in many machine learning tasks. There is considerable interest in using DNNs to realize edge intelligence in highly resource-constrained devices such as wearables and IoT sensors. Unfortunately, the high computational requirements of DNNs pose a serious challenge to their deployment in these systems. Moreover, due to tight cost (and hence, area) constraints, these devices are often unable to accommodate hardware accelerators, requiring DNNs to execute on the General Purpose Processor (GPP) cores that they contain. We address this challenge through lightweight micro-architectural extensions to the memory hierarchy of GPPs that exploit a key attribute of DNNs, viz. sparsity, or the prevalence of zero values. We propose SparseCache, an enhanced cache architecture that utilizes a null cache based on a Ternary Content Addressable Memory (TCAM) to compactly store zero-valued cache lines, while storing non-zero lines in a conventional data cache. By storing only addresses, rather than values, for zero-valued cache lines, SparseCache increases the effective cache capacity, thereby reducing the overall miss rate and execution time. SparseCache utilizes a Zero Detector and Approximator (ZDA) and an Address Merger (AM) to perform reads and writes to the null cache. We evaluate SparseCache on four state-of-the-art DNNs programmed with the Caffe framework. SparseCache achieves a 5-28% reduction in miss rate, which translates to a 5-21% reduction in execution time, with only 0.1% area and 3.8% power overheads in comparison to a low-end Intel Atom Z-series processor.
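The core idea described above can be illustrated with a minimal behavioral sketch (this is a hypothetical model for intuition, not the authors' implementation; the class and method names are invented). Zero-valued lines are recorded by address alone in a TCAM-like structure, so they do not consume data-cache capacity, and reads of such addresses simply synthesize a line of zeros.

```python
# Hypothetical behavioral model of a SparseCache-style hierarchy.
# Zero-valued lines are tracked by address in a "null cache" (a set here,
# standing in for the TCAM); non-zero lines occupy the data cache.

class SparseCacheModel:
    def __init__(self, data_capacity):
        self.data_capacity = data_capacity  # max non-zero lines held
        self.data_cache = {}                # address -> line contents
        self.null_cache = set()             # addresses of all-zero lines

    def write(self, addr, line):
        # Zero-detection step: an all-zero line is stored as an address only,
        # freeing data-cache capacity for non-zero lines.
        if all(word == 0 for word in line):
            self.null_cache.add(addr)
            self.data_cache.pop(addr, None)
        else:
            self.null_cache.discard(addr)
            if addr not in self.data_cache and len(self.data_cache) >= self.data_capacity:
                # Naive FIFO-style eviction, purely for illustration.
                self.data_cache.pop(next(iter(self.data_cache)))
            self.data_cache[addr] = line

    def read(self, addr, line_size=4):
        if addr in self.null_cache:
            # Null-cache hit: no data storage needed, synthesize zeros.
            return [0] * line_size
        # Data-cache hit returns the line; None models a miss.
        return self.data_cache.get(addr)
```

Because each null-cache entry costs only an address tag instead of a full line, the same total storage tracks more live lines, which is the mechanism behind the reported miss-rate reduction.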
