A Uniform Latency Model for DNN Accelerators with Diverse Architectures and Dataflows

Linyan Mei1,2,e, Huichu Liu1,a, Tony Wu1,b, H. Ekin Sumbul1,c, Marian Verhelst2,f and Edith Beigne1,d
1Meta Reality Labs
ahuichu@fb.com
btonyfwu@fb.com
cekinsumbul@fb.com
dEdith.Beigne@fb.com
2MICAS-ESAT, KU Leuven
elinyan.mei@kuleuven.be
fmarian.verhelst@kuleuven.be

ABSTRACT


In the early design phase of a Deep Neural Network (DNN) acceleration system, fast energy and latency estimation is essential for evaluating the optimality of design candidates across algorithm, hardware, and algorithm-to-hardware mapping, given the vast design space. This work proposes a uniform intra-layer analytical latency model for DNN accelerators that can evaluate diverse architectures and dataflows. It employs a 3-step approach to systematically estimate the latency breakdown across system components, capture the operation state of each memory component, and identify stall-induced performance bottlenecks. To achieve high accuracy, the model accounts for different memory attributes, operands' memory-sharing scenarios, and dataflow implications. Validation against an in-house taped-out accelerator across various DNN layers shows an average latency model accuracy of 94.3%. To showcase the capability of the proposed model, we carry out 3 case studies assessing, respectively, the impact of mapping, workload, and hardware architecture on latency, driving design insights for algorithm-hardware-mapping co-optimization.
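To make the 3-step idea above concrete, here is a minimal Python sketch of one plausible way a stall-aware intra-layer latency estimate could be structured: an ideal (stall-free) cycle count is computed first, each memory level's required versus available bandwidth determines its stall contribution, and the worst bottleneck sets the final latency. All names, the bandwidth-deficit stall model, and the numbers are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a stall-aware intra-layer latency estimate.
# The structure loosely mirrors the 3-step flow described in the abstract;
# the specific stall model and all identifiers are assumptions for illustration.

from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str                  # e.g. "SRAM-W" (a weight buffer)
    required_bandwidth: float  # words/cycle the dataflow demands of this level
    available_bandwidth: float # words/cycle the hardware can actually deliver


def stall_cycles(mem: MemoryLevel, ideal_cycles: int) -> int:
    """Steps 2-3: if a memory level cannot sustain the demanded bandwidth,
    the deficit shows up as stall cycles on top of the ideal compute time."""
    if mem.required_bandwidth <= mem.available_bandwidth:
        return 0  # this level keeps up with the compute array; no stall
    deficit = mem.required_bandwidth / mem.available_bandwidth - 1.0
    return int(ideal_cycles * deficit)


def layer_latency(ideal_cycles: int, memories: list[MemoryLevel]) -> int:
    """Step 1 provides the ideal cycle count; the slowest memory level
    (the bottleneck) determines the extra stall cycles for the layer."""
    extra = max((stall_cycles(m, ideal_cycles) for m in memories), default=0)
    return ideal_cycles + extra


if __name__ == "__main__":
    mems = [
        MemoryLevel("SRAM-W", required_bandwidth=2.0, available_bandwidth=1.0),
        MemoryLevel("SRAM-I", required_bandwidth=1.0, available_bandwidth=2.0),
    ]
    # The weight buffer delivers half the demanded bandwidth, doubling latency.
    print(layer_latency(ideal_cycles=10_000, memories=mems))  # -> 20000
```

A real model of this kind would additionally distinguish per-operand memory sharing and dataflow-dependent access patterns when deriving the required bandwidths, as the abstract indicates; the sketch collapses all of that into a single per-level number.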

Keywords: DNN accelerator, latency, cycle count, cost model, analytical model, dataflow, mapping.


