A 16×128 Stochastic-Binary Processing Element Array for Accelerating Stochastic Dot-Product Computation Using 1-16 Bit-Stream Length

Qian Chen, Yuqi Su, Hyunjoon Kim, Taegeun Yoo, Tony Tae-Hyoung Kim and Bongjin Kim

School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
e170029@e.ntu.edu.sg
yuqi003@e.ntu.edu.sg
kimh0003@e.ntu.edu.sg
tgyoo@ntu.edu.sg
thkim@ntu.edu.sg
bjkim@ntu.edu.sg

ABSTRACT

This work presents a 16×128 stochastic-binary processing element array for energy- and area-efficient processing of artificial neural networks. Each all-digital processing element (PE) consists of an XNOR gate serving as a bipolar stochastic multiplier and an 8-bit binary adder with 8× registers for accumulating partial sums. The PE array comprises 16× dot-product units, each with 128 PEs cascaded in a single row. The latency and energy of the proposed dot-product unit are minimized by reducing the bit-stream length needed to limit the accuracy degradation induced by approximate stochastic computing. A 128-input dot-product operation requires a bit-stream length (N) of 1-to-16, which is two orders of magnitude smaller than the baseline stochastic computation using MUX-based adders. The simulated dot-product error is 6.9-to-1.5% for N=1-to-16, while the error of the baseline stochastic method is 5.9-to-1.7% for N=128-to-2048. The mean MNIST classification accuracy is 96.11% (1.19% lower than the 8-bit binary baseline) using a three-layer MLP at N=16. The measured energy from a 65nm test chip is 10.04pJ per dot-product, and the energy efficiency is 25.5TOPS/W at N=16.
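The bipolar stochastic multiply-and-accumulate scheme the abstract describes can be sketched in software. The following is a minimal behavioral model, not the chip's implementation: values in [-1, 1] are encoded as bit-streams where each bit is 1 with probability (x+1)/2, an XNOR of two such streams encodes the product, and the per-element products are accumulated in binary (as the PE's binary adder does, rather than the MUX-based adders of the baseline). All function names here are illustrative.

```python
import random

def to_bitstream(x, n, rng):
    """Encode x in [-1, 1] as a bipolar stochastic bit-stream of length n.

    In bipolar encoding, each bit is 1 with probability (x + 1) / 2.
    """
    p = (x + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(n)]

def xnor_dot(xs, ws, n, seed=0):
    """Approximate dot(xs, ws) using XNOR stochastic multipliers
    followed by binary accumulation of the decoded partial products."""
    rng = random.Random(seed)
    total = 0.0
    for x, w in zip(xs, ws):
        xb = to_bitstream(x, n, rng)
        wb = to_bitstream(w, n, rng)
        # XNOR of two bipolar streams encodes x * w: the fraction of
        # matching bits estimates (x * w + 1) / 2.
        matches = sum(1 for a, b in zip(xb, wb) if a == b)
        total += 2.0 * matches / n - 1.0  # decode back to [-1, 1]
    return total
```

With short streams (small n) the estimate is noisy, which is the accuracy/latency trade-off the paper quantifies; accumulating each partial product in binary avoids the long streams a purely stochastic (MUX-based) adder tree would need.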

Keywords: Stochastic Computation, Artificial Neural Networks, Dot-Product, Multi-Layer Perceptron, Image Classification.
