A Fast and Energy Efficient Computing-in-Memory Architecture for Few-Shot Learning Applications
Dayane Reis, Ann Franchesca Laguna, Michael Niemier and Xiaobo Sharon Hu
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
dreis@nd.edu
alaguna@nd.edu
mniemier@nd.edu
shu@nd.edu
ABSTRACT
Among few-shot learning methods, prototypical networks (PNs) are one of the most popular approaches due to their excellent classification accuracy and network simplicity. Test examples are classified based on their distances from class prototypes. Despite the application-level advantages of PNs, the latency of transferring data from memory to compute units is much higher than the PN computation time; PN performance is therefore limited by memory bandwidth. Computing-in-memory (CiM) addresses this bandwidth bottleneck by bringing a subset of compute units closer to memory. In this work, we propose CiM-PN, a framework that enables the computation of distance metrics and prototypes inside the memory. CiM-PN replaces the computationally intensive Euclidean distance metric with the CiM-friendly Manhattan distance metric. Additionally, prototypes are computed using an in-memory mean operation realized by accumulation and division by powers of two, which enables few-shot learning implementations where the number of “shots” is a power of two. The CiM-PN hardware uses CMOS memory cells, as well as CMOS peripherals such as customized sense amplifiers, carry look-ahead adders, in-place copy buffers and a logarithmic shifter. Compared with a GPU implementation, a CMOS-based CiM-PN achieves speedups of 2808x/111x and energy savings of 2372x/5170x at iso-accuracy for the prototype and nearest-neighbor computations, respectively, as well as over 2x end-to-end speedup and energy improvements. We also gain a 3-14% accuracy improvement over existing non-GPU hardware approaches due to the floating-point CiM operations.
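To make the algorithmic flow concrete, the following is a minimal software sketch of the two operations the abstract describes: prototype computation as a mean over 2^s support examples (accumulation followed by division by a power of two, i.e., a right shift on fixed-point data) and nearest-prototype classification under the Manhattan (L1) distance. The function names, the fixed-point integer representation, and the episode sizes are illustrative assumptions, not the paper's hardware implementation.

```python
import numpy as np

def prototypes_pow2(support: np.ndarray) -> np.ndarray:
    """Mean of support examples per class, with k_shot = 2**s.

    support: int array of shape (n_way, k_shot, dim), fixed-point features.
    Because k_shot is a power of two, the division in the mean reduces to
    a right shift, mirroring the in-memory accumulate-and-shift operation.
    """
    n_way, k_shot, _ = support.shape
    assert k_shot & (k_shot - 1) == 0, "k_shot must be a power of two"
    shift = k_shot.bit_length() - 1        # log2(k_shot)
    acc = support.sum(axis=1)              # per-class accumulation
    return acc >> shift                    # divide by 2**s via right shift

def classify_l1(query: np.ndarray, protos: np.ndarray) -> int:
    """Return the index of the nearest prototype under the L1 distance."""
    dists = np.abs(protos - query).sum(axis=1)
    return int(dists.argmin())

# Hypothetical 3-way, 4-shot episode over 8-dimensional fixed-point features.
rng = np.random.default_rng(0)
support = rng.integers(0, 256, size=(3, 4, 8))
protos = prototypes_pow2(support)
print(classify_l1(support[1, 0], protos))  # classify one support example
```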
Keywords: Few-Shot Learning, Prototypical Networks, Computing-in-Memory