10.7 Accelerators for Neuromorphic Computing


Date: Thursday 12 March 2020
Time: 11:00 - 12:30
Location / Room: Berlioz

Chair:
Alexandre Levisse, EPFL, CH

Co-Chair:
Deliang Fan, Arizona State University, US

In this session, special hardware accelerators based on different technologies for neuromorphic computing will be presented. These accelerators (i) improve computing efficiency by using pulse widths to deliver information across memristor crossbars, (ii) enhance the robustness of neuromorphic computing with unary coding and priority mapping, and (iii) explore the modulation of light for transferring information so as to push the performance of computing systems to new limits.

Time | Label | Presentation Title / Authors
11:00 | 10.7.1 | A PULSE WIDTH NEURON WITH CONTINUOUS ACTIVATION FOR PROCESSING-IN-MEMORY ENGINES
Speaker:
Shuhang Zhang, TU Munich, DE
Authors:
Shuhang Zhang1, Bing Li1, Hai (Helen) Li2 and Ulf Schlichtmann1
1TU Munich, DE; 2Duke University, US / TU Munich, DE
Abstract
Processing-in-memory engines have been applied successfully to accelerate deep neural networks. To improve computing efficiency, spiking-based platforms are widely utilized. However, spiking-based designs naturally quantize inter-layer signals, leading to performance loss. In addition, the spike mismatch effect makes digital processing an essential part of such designs, impeding direct signal transfer between layers and thus resulting in longer latency. In this paper, we propose a novel neuron design based on pulse width modulation that avoids the quantization step and bypasses spike mismatch through its continuous activation. The computation latency and circuit complexity can be reduced significantly due to the absence of quantization and digital processing steps, while maintaining competitive performance. Experimental results demonstrate that the proposed neuron design can achieve a >100× speedup, and that area and power consumption can be reduced by up to 75% and 25%, respectively, compared with spiking-based designs.
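The contrast the abstract draws between spike-count quantization and continuous pulse-width encoding can be illustrated with a minimal numeric sketch (the step count and the unit period are illustrative assumptions, not the paper's circuit parameters):

```python
import numpy as np

def spiking_encode(a, n_steps=16):
    """Spike-count encoding: the activation is quantized to n_steps levels,
    so it carries up to 1/(2*n_steps) of rounding error."""
    return np.round(a * n_steps) / n_steps

def pwm_encode(a, period=1.0):
    """Pulse-width encoding: the activation is carried by a continuous pulse
    duration, so no quantization step is introduced."""
    return a * period

activations = np.linspace(0.0, 1.0, 101)
spike_error = np.abs(spiking_encode(activations) - activations).max()
pwm_error = np.abs(pwm_encode(activations) - activations).max()
```

Under this toy model the spiking path always pays a resolution penalty bounded by half a time step, while the pulse-width path carries the value continuously, which is the property the proposed neuron exploits to skip the digital re-quantization stage.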

Download Paper (PDF; Only available from the DATE venue WiFi)
11:30 | 10.7.2 | GO UNARY: A NOVEL SYNAPSE CODING AND MAPPING SCHEME FOR RELIABLE RERAM-BASED NEUROMORPHIC COMPUTING
Speaker:
Li Jiang, Shanghai Jiao Tong University, CN
Authors:
Chang Ma, Yanan Sun, Weikang Qian, Ziqi Meng, Rui Yang and Li Jiang, Shanghai Jiao Tong University, CN
Abstract
Neural network (NN) computing involves a large number of multiply-and-accumulate (MAC) operations, which are the speed bottleneck in the traditional von Neumann architecture. Resistive random access memory (ReRAM)-based crossbars are well suited for matrix-vector multiplication. Existing ReRAM-based NNs mainly rely on binary coding for synaptic weights. However, the imperfect fabrication process combined with stochastic filament-based switching leads to resistance variations, which can significantly affect the weights in binary synapses and degrade the accuracy of NNs. Furthermore, as multi-level cells (MLCs) are developed to reduce hardware overhead, NN accuracy deteriorates even more under resistance variations with binary coding. In this paper, a novel unary coding of synaptic weights is presented to overcome the resistance variations of MLCs and achieve reliable ReRAM-based neuromorphic computing. Priority mapping is also proposed in compliance with the unary coding to guarantee high accuracy by mapping bits with lower resistance states to ReRAM cells with smaller resistance variations. Our experimental results show that the proposed method incurs less than 0.45% and 5.48% accuracy loss on LeNet (on the MNIST dataset) and VGG16 (on the CIFAR-10 dataset), respectively, while maintaining acceptable hardware cost.
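Why unary coding tolerates resistance variations better than binary coding can be sketched with a toy Monte-Carlo model (the Gaussian per-cell noise and the 3-bit/7-cell sizes are illustrative assumptions; priority mapping would additionally assign the low-resistance-state cells to the least-variable devices):

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.05  # per-cell conductance variation (illustrative)

def binary_read(w):
    """3-bit binary coding: each cell's noise is amplified by its bit
    significance (1x, 2x, 4x)."""
    bits = np.array([(w >> i) & 1 for i in range(3)], dtype=float)
    noisy = bits + rng.normal(0.0, SIGMA, 3)
    return float(noisy @ np.array([1.0, 2.0, 4.0]))

def unary_read(w):
    """Unary (thermometer) coding: 7 equally weighted cells, so no single
    cell's variation is amplified."""
    cells = np.array([1.0] * w + [0.0] * (7 - w))
    noisy = cells + rng.normal(0.0, SIGMA, 7)
    return float(noisy.sum())

w = 5  # example synaptic weight in [0, 7]
binary_err = float(np.mean([abs(binary_read(w) - w) for _ in range(2000)]))
unary_err = float(np.mean([abs(unary_read(w) - w) for _ in range(2000)]))
```

In this sketch the binary read accumulates noise variance weighted by the squared significances (1 + 4 + 16), while the unary read only sums seven unit-weight noise terms, so its average readout error is smaller for the same device variation.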

12:00 | 10.7.3 | LIGHTBULB: A PHOTONIC-NONVOLATILE-MEMORY-BASED ACCELERATOR FOR BINARIZED CONVOLUTIONAL NEURAL NETWORKS
Authors:
Farzaneh Zokaee1, Qian Lou1, Nathan Youngblood2, Weichen Liu3, Yiyuan Xie4 and Lei Jiang1
1Indiana University Bloomington, US; 2University of Pittsburgh, US; 3Nanyang Technological University, SG; 4Southwest University, CN
Abstract
Although Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating-point multiply-accumulate (MAC) operations. To process CNN inferences energy-efficiently, prior work proposes an electro-optical accelerator that processes power-of-2 quantized CNNs with electro-optical ripple-carry adders and optical binary shifters. The electro-optical accelerator also uses SRAM registers to store intermediate data. However, the electro-optical ripple-carry adders and SRAMs seriously limit the operating frequency and inference throughput of the electro-optical accelerator, due to the long critical path of the adders and the long access latency of the SRAMs. In this paper, we propose a photonic nonvolatile memory (NVM)-based accelerator, LightBulb, to process binarized CNNs with high-frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory as input/output registers to achieve a high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves CNN inference throughput by 17× to 173× and inference throughput per Watt by 17.5× to 660×.
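The XNOR/popcount reduction of binarized MACs that LightBulb maps onto photonic gates can be sketched in a few lines (the bit-packing convention is an illustrative assumption; for ±1 vectors, agreements minus disagreements collapses the dot product to a popcount):

```python
def bin_dot(x_bits, w_bits, n):
    """Dot product of two length-n vectors over {-1, +1}, packed as bit
    masks (bit 1 encodes +1).  With d = popcount(x XOR w) disagreements,
    <x, w> = (n - d) - d = n - 2 * d, i.e. an XNOR followed by a popcount."""
    return n - 2 * bin(x_bits ^ w_bits).count("1")

# x = [+1, -1, +1, +1] -> 0b1011, w = [+1, +1, -1, +1] -> 0b1101
result = bin_dot(0b1011, 0b1101, 4)
```

Because the whole MAC becomes a bitwise operation plus a population count, it can run at whatever clock the XNOR gate and counter support, which is the source of the throughput gain the abstract reports.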

12:30 | IP5-4, 863 | ROQ: A NOISE-AWARE QUANTIZATION SCHEME TOWARDS ROBUST OPTICAL NEURAL NETWORKS WITH LOW-BIT CONTROLS
Speaker:
Jiaqi Gu, University of Texas at Austin, US
Authors:
Jiaqi Gu1, Zheng Zhao1, Chenghao Feng1, Hanqing Zhu2, Ray T. Chen1 and David Z. Pan1
1University of Texas at Austin, US; 2Shanghai Jiao Tong University, CN
Abstract
Optical neural networks (ONNs) demonstrate orders-of-magnitude higher speed in deep learning acceleration than their electronic counterparts. However, limited control precision and device variations induce accuracy degradation in practical ONN implementations. To tackle this issue, we propose a quantization scheme that adapts a full-precision ONN to low-resolution voltage controls. Moreover, we propose a protective regularization technique that dynamically penalizes quantized weights based on their estimated noise-robustness, improving overall noise tolerance. Experimental results show that the proposed scheme effectively adapts ONNs to limited-precision controls and device variations. The resulting four-layer ONN demonstrates higher inference accuracy with lower variance than baseline methods under various control precisions and device noises.
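The first step the abstract describes, mapping full-precision parameters onto low-resolution voltage controls, amounts to uniform low-bit quantization; a minimal sketch (the [-1, 1] range and 4-bit resolution are illustrative assumptions, not the paper's scheme):

```python
import numpy as np

def quantize(w, bits=4):
    """Snap full-precision values in [-1, 1] onto (2**bits - 1) + 1 uniform
    control levels; the rounding error is bounded by half a quantization
    step."""
    levels = 2 ** bits - 1
    w = np.clip(w, -1.0, 1.0)
    return np.round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

w = np.linspace(-1.0, 1.0, 201)
step = 2.0 / (2 ** 4 - 1)
max_err = float(np.abs(quantize(w) - w).max())
```

A noise-aware training scheme like the one proposed would then penalize weights whose quantized values sit where device noise can flip the network's behavior, rather than treating this rounding as a post-hoc step.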

12:31 | IP5-5, 789 | STATISTICAL TRAINING FOR NEUROMORPHIC COMPUTING USING MEMRISTOR-BASED CROSSBARS CONSIDERING PROCESS VARIATIONS AND NOISE
Speaker:
Ying Zhu, TU Munich, DE
Authors:
Ying Zhu1, Grace Li Zhang1, Tianchen Wang2, Bing Li1, Yiyu Shi2, Tsung-Yi Ho3 and Ulf Schlichtmann1
1TU Munich, DE; 2University of Notre Dame, US; 3National Tsing Hua University, TW
Abstract
Memristor-based crossbars are an attractive platform for accelerating neuromorphic computing. However, process variations during manufacturing and noise in memristors cause significant accuracy loss if left unaddressed. In this paper, we propose to model process variations and noise as correlated random variables and to incorporate them into the cost function during training. Consequently, the weights after this statistical training become more robust and, together with global variation compensation, provide a stable inference accuracy. Simulation results demonstrate that the mean value and the standard deviation of the inference accuracy can be improved significantly, by up to 54% and 31%, respectively, in a two-layer fully connected neural network.
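The core idea, folding weight variations into the training cost, can be sketched on a toy convex model (logistic regression, independent Gaussian perturbations, and the sampling approximation are illustrative assumptions, not the paper's correlated-variable model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data standing in for a crossbar layer's inputs.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def loss(w):
    """Plain logistic (cross-entropy) loss of a linear model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return float(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

def statistical_loss(w, sigma=0.3, n_samples=200):
    """Expected loss under random weight variations, approximated by
    sampling perturbations -- minimizing this cost instead of loss(w)
    favors weights whose accuracy survives variations and noise."""
    return float(np.mean([loss(w + rng.normal(0.0, sigma, w.shape))
                          for _ in range(n_samples)]))

w = np.array([2.0, 2.0])  # a weight vector that fits the toy data well
nominal = loss(w)
statistical = statistical_loss(w)
```

Because the perturbed cost is an average over variation samples, gradient descent on it is pushed away from weights that only work at their nominal values, which is what makes the trained network robust before any per-chip compensation is applied.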

12:30 | End of session