Accelerating Spatiotemporal Supervised Training of Large-Scale Spiking Neural Networks on GPU

Ling Liang1,a, Zhaodong Chen1,b, Lei Deng2,e, Fengbin Tu1,c, Guoqi Li2,f and Yuan Xie1,d
1Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA
alingliang@ucsb.edu
bchenzd15thu@ucsb.edu
cfengbintu@ucsb.edu
dyuanxie@ucsb.edu
2Department of Precision Instrument, Tsinghua University, Beijing, China
eleideng@mail.tsinghua.edu.cn
fliguoqi@mail.tsinghua.edu.cn

ABSTRACT


Spiking neural networks (SNNs) have great potential to achieve brain-like intelligence; however, they suffer from the low accuracy of conventional synaptic plasticity rules and from low training efficiency on GPUs. Recently, emerging learning algorithms inspired by backpropagation through time (BPTT) have brought new opportunities to boost the accuracy of SNNs, but training on GPUs still remains inefficient due to the complex spatiotemporal dynamics and huge memory consumption, which restricts model exploration for SNNs and hinders the advance of neuromorphic computing.

In this work, we build a framework to solve the inefficiency of BPTT-based SNN training on modern GPUs. To reduce memory consumption, we optimize the dataflow by saving only the CONV/FC results in the forward pass and recomputing the other intermediate results in the backward pass. Then, we customize kernel functions to accelerate the neural dynamics in all training stages. Finally, we provide a PyTorch interface to make our framework easy to deploy in real systems. Compared to a vanilla PyTorch implementation, our framework achieves up to 2.13× end-to-end speedup and consumes only 0.41× peak memory on the CIFAR10 dataset. Moreover, for distributed training on the large ImageNet dataset, it achieves up to 1.81× end-to-end speedup and consumes only 0.38× peak memory.
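The memory-saving dataflow above can be sketched in a few lines. The idea is that only the CONV/FC outputs (the pre-synaptic inputs to each spiking layer) are kept in the forward pass, while the membrane potentials and spikes are regenerated from them during the backward pass. The following minimal NumPy sketch illustrates this for a standard leaky-integrate-and-fire (LIF) model with decay factor `TAU` and hard reset; the specific constants and function names are illustrative, not taken from the paper.

```python
import numpy as np

TAU, V_TH = 0.5, 1.0  # illustrative leak factor and firing threshold

def lif_dynamics(x):
    """Unroll LIF dynamics over T steps given pre-synaptic inputs x of
    shape [T, N]; return membrane potentials u and binary spikes s."""
    T, N = x.shape
    u = np.zeros((T, N))
    s = np.zeros((T, N))
    u_prev = np.zeros(N)
    s_prev = np.zeros(N)
    for t in range(T):
        u[t] = TAU * u_prev * (1.0 - s_prev) + x[t]  # leak, reset, integrate
        s[t] = (u[t] >= V_TH).astype(x.dtype)        # fire if above threshold
        u_prev, s_prev = u[t], s[t]
    return u, s

# Forward pass: run the dynamics but keep only x (the CONV/FC output);
# u and s can be discarded to save memory.
x_saved = np.random.RandomState(0).randn(4, 8)
u_fwd, s_fwd = lif_dynamics(x_saved)

# Backward pass: recompute u and s from the saved x before using them
# in the gradient computation. The recomputed states match exactly,
# because the dynamics are deterministic functions of x.
u_bwd, s_bwd = lif_dynamics(x_saved)
assert np.allclose(u_fwd, u_bwd) and np.array_equal(s_fwd, s_bwd)
```

This trades one extra unroll of the (cheap) neural dynamics in the backward pass for not storing the per-step states of every layer, which is where the peak-memory savings come from.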

Keywords: Neuromorphic Computing, Spiking Neural Network, GPU Optimization, Training Acceleration.


