A Quantization Framework for Neural Network Adaption at the Edge
Mengyuan Li and Xiaobo Sharon Hu
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
mli22@nd.edu
shu@nd.edu
ABSTRACT
Edge devices employing a neural network (NN) inference engine running a pre-trained model often perform poorly, or simply fail, in unseen situations. Meta learning, consisting of meta training, NN adaptation and inference, has been shown to be quite effective in quickly learning and responding to a new environment. The adaptation phase, which includes both forward and backward computation, should be performed on edge devices to maximize the benefit in few-shot learning applications. However, deploying high-precision, full-blown training accelerators at the edge can be rather costly for most Internet of Things applications. This paper reveals some unique observations about the adaptation phase and, based on these observations, introduces a quantization framework, AIQ, that supports adaptation at the edge with inference-level bit widths. AIQ includes two key ideas, i.e., gated weight buffering and dynamic error scaling, to reduce memory and computational needs with minimal sacrifice in accuracy. Major modules of AIQ are synthesized and evaluated. Experimental results show that AIQ saves 41% and 70% of weight memory for two widely used datasets while incurring minimal hardware overhead and negligible accuracy loss.
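To make the dynamic error scaling idea concrete, the sketch below shows one common way such a scheme can work: backward-pass error tensors typically occupy a small, shifting dynamic range, so a per-tensor scale derived from the current maximum magnitude lets them survive quantization to inference-level bit widths. This is a hypothetical illustration, not the paper's implementation; the function names (`quantize_dynamic`, `dequantize`) and the max-magnitude scaling rule are assumptions for exposition.

```python
import numpy as np

def quantize_dynamic(x, bits=8):
    """Quantize a tensor to signed `bits`-bit integers with a dynamic,
    per-tensor scale chosen from the current max magnitude, so that
    small backward-pass errors are not flushed to zero.
    (Illustrative sketch; not the AIQ implementation.)"""
    max_val = float(np.max(np.abs(x)))
    qmax = 2 ** (bits - 1) - 1
    if max_val == 0.0:
        return np.zeros_like(x, dtype=np.int32), 1.0
    scale = max_val / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale

def dequantize(q, scale):
    """Recover an approximate floating-point tensor from the quantized one."""
    return q.astype(np.float32) * scale

# Errors in the backward pass often have tiny magnitudes; rescaling
# before quantization preserves them even at 8 bits.
err = np.array([1e-4, -3e-4, 5e-5, 2e-4], dtype=np.float32)
q, s = quantize_dynamic(err, bits=8)
recovered = dequantize(q, s)
```

With a fixed scale sized for forward-pass activations, these error values would round to zero; the dynamic scale keeps the quantization error below one quantization step per element.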