A Throughput-Latency Co-Optimised Cascade of Convolutional Neural Network Classifiers
Alexandros Kouris1,a, Stylianos I. Venieris2 and Christos-Savvas Bouganis1,b
1Electrical & Electronic Engineering Dept., Imperial College London, UK
aa.kouris16@imperial.ac.uk
bchristos-savvas.bouganis@imperial.ac.uk
2Samsung AI Center Cambridge, UK
s.venieris@samsung.com
ABSTRACT
Convolutional Neural Networks (CNNs) constitute a prominent AI model for classification tasks, serving a broad span of diverse application domains. To enable their efficient deployment in real-world tasks, the inherent redundancy of CNNs is frequently exploited to eliminate unnecessary computational costs. Driven by the fact that not all inputs require the same amount of computation to yield a confident prediction, multi-precision cascade classifiers have recently been introduced. FPGAs comprise a promising platform for deploying such input-dependent computation models, due to their enhanced customisation capabilities. Current literature, however, is limited to throughput-optimised cascade implementations, which employ large batching at the expense of a substantial increase in latency that prohibits their deployment in real-time scenarios. In this work, we introduce a novel methodology for throughput-latency co-optimised cascaded CNN classification, deployed on a custom FPGA architecture tailored to the target application and deployment platform with respect to a set of user-specified requirements on accuracy and performance. Our experiments indicate that the proposed approach achieves throughput gains comparable to those of related state-of-the-art works at a substantially reduced latency overhead, enabling deployment in latency-sensitive applications.
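To illustrate the input-dependent computation idea underlying such cascades, the following is a minimal sketch of confidence-based early exit: a cheap low-precision classifier handles easy inputs, and only samples whose top-class confidence falls below a threshold are forwarded to the full-precision model. The function names, the two-stage structure, and the threshold value are illustrative assumptions, not the specific architecture or tuning procedure proposed in this paper.

```python
import numpy as np

def cascade_predict(x, fast_model, full_model, threshold=0.9):
    """Two-stage cascade classifier sketch (hypothetical interface).

    fast_model / full_model: callables mapping an input to a vector of
    class probabilities. The low-precision stage runs first; if its
    top-class confidence reaches `threshold`, we exit early and skip
    the expensive full-precision stage entirely.
    """
    probs = np.asarray(fast_model(x))
    if probs.max() >= threshold:
        # Confident prediction from the cheap stage: early exit.
        return int(probs.argmax())
    # Low confidence: fall through to the full-precision model.
    return int(np.asarray(full_model(x)).argmax())
```

In practice, the fraction of inputs that exit early determines the average-case speed-up, while the threshold trades accuracy against computation; the paper's methodology co-optimises this trade-off against user-specified accuracy and performance requirements on an FPGA.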