XpulpNN: Accelerating Quantized Neural Networks on RISC-V Processors Through ISA Extensions

Angelo Garofalo1,a, Giuseppe Tagliavini1,b, Francesco Conti1,2,d, Davide Rossi1,c and Luca Benini1,2,e
1DEI, University of Bologna, Italy
2IIS lab, ETH Zurich, Switzerland
a angelo.garofalo@unibo.it, b giuseppe.tagliavini@unibo.it, c davide.rossi@unibo.it
d fconti@iis.ee.ethz.ch, e lbenini@iis.ee.ethz.ch

ABSTRACT


Strongly quantized fixed-point arithmetic is considered the key enabler for CNN inference on low-power, resource-constrained edge devices. However, deploying highly quantized Neural Networks on fully programmable MCUs at the extreme edge of the IoT is currently limited by the lack of support, at the Instruction Set Architecture (ISA) level, for sub-byte fixed-point data types. Running low-bitwidth (i.e., 2- and 4-bit) QNN kernels therefore requires numerous extra instructions to pack and unpack data, creating a bottleneck for the performance and energy efficiency of QNN inference. In this work we present a set of extensions to the RISC-V ISA aimed at boosting the energy efficiency of low-bitwidth QNNs on low-power, microcontroller-class cores. The microarchitecture supporting the new extensions is built on top of a RISC-V core that already features ISA extensions targeting energy-efficient digital signal processing. To evaluate the extensions, we integrated the core into a full microcontroller system, synthesized and placed&routed in 22nm FDX technology. QNN convolution kernels implemented on the new core run 5.3× and 8.9× faster on 4- and 2-bit data operands, respectively, than on the baseline processor, which supports only 8-bit SIMD instructions. With a peak of 279 GMAC/s/W, the proposed solution achieves 9× better energy efficiency than the baseline and two orders of magnitude better energy efficiency than state-of-the-art microcontrollers.
