5.7 Stochastic Computing


Date: Wednesday 11 March 2020
Time: 08:30 - 10:00
Location / Room: Berlioz

Chair:
Robert Wille, Johannes Kepler University Linz, AT

Co-Chair:
Shigeru Yamashita, Ritsumeikan University, JP

Stochastic computing uses random bitstreams to reduce the computational and area costs of a broad class of Boolean operations, including arithmetic addition and multiplication. This session considers stochastic computing from model, accuracy, and application perspectives, presenting papers that span models of pseudo-random number generators, accuracy analysis of stochastic circuits, and novel applications in signal processing.
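As a rough illustrative sketch of the encoding these papers build on (our own toy example, not from any of the papers; it assumes unipolar encoding and idealized Bernoulli bitstreams, and all names are ours): a value in [0, 1] is represented as the probability of a 1 in the stream, and a single AND gate multiplies two independent streams.

```python
import random

def to_bitstream(p, n, rng):
    """Encode a probability p in [0, 1] as an n-bit Bernoulli bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def from_bitstream(bits):
    """Decode: the value is the frequency of 1's in the stream."""
    return sum(bits) / len(bits)

rng = random.Random(42)
n = 4096
a = to_bitstream(0.5, n, rng)
b = to_bitstream(0.25, n, rng)
# Bit-wise AND of two independent streams multiplies their probabilities:
# P(a_i & b_i = 1) = P(a_i = 1) * P(b_i = 1).
prod = [x & y for x, y in zip(a, b)]
print(from_bitstream(prod))  # close to 0.5 * 0.25 = 0.125
```

The hardware analogue of the list comprehension is a single AND gate per stream pair, which is where the area savings come from; the cost is the long streams needed for accuracy, a trade-off several of the papers below address.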

Time  Label  Presentation Title / Authors
08:30  5.7.1  THE HYPERGEOMETRIC DISTRIBUTION AS A MORE ACCURATE MODEL FOR STOCHASTIC COMPUTING
Speaker:
Timothy Baker, University of Michigan, US
Authors:
Timothy Baker and John Hayes, University of Michigan, US
Abstract
A fundamental assumption in stochastic computing (SC) is that bit-streams are well-approximated by a Bernoulli process, i.e., a sequence of independent 0-1 choices. We show that this assumption is flawed in unexpected and significant ways for some bit-streams, such as those produced by a typical LFSR-based stochastic number generator (SNG). In particular, the Bernoulli assumption leads to a surprising overestimation of output errors and of how they vary with input changes. We then propose a more accurate model for such bit-streams based on the hypergeometric distribution and examine its implications for several SC applications. First, we explore the effect of correlation on a mux-based stochastic adder and show that, contrary to what was previously thought, it is not entirely correlation-insensitive. Further, inspired by the hypergeometric model, we introduce a new mux tree adder that offers major area savings and improved accuracy. These findings are validated on a large image-processing circuit, which achieves a 32% accuracy improvement together with a reduction in overall circuit area.
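A toy model (ours, not the paper's analysis) of why the Bernoulli assumption can overestimate error for LFSR-like SNGs: a full-period generator emits a fixed number of 1's per period, so a window of the stream samples those 1's without replacement, giving hypergeometric rather than binomial counts, with strictly smaller variance.

```python
import random

def binom_count(p, L, rng):
    """Bernoulli model: L independent coin flips with bias p."""
    return sum(1 for _ in range(L) if rng.random() < p)

def hyper_count(p, n, L, rng):
    """LFSR-like model: an n-bit period holds exactly K = p*n ones;
    a length-L window samples them without replacement."""
    K = round(p * n)
    period = [1] * K + [0] * (n - K)
    rng.shuffle(period)
    return sum(period[:L])

def sample_var(draw, trials):
    xs = [draw() for _ in range(trials)]
    m = sum(xs) / trials
    return sum((x - m) ** 2 for x in xs) / trials

rng = random.Random(7)
p, n, L, T = 0.5, 256, 64, 5000
var_bernoulli = sample_var(lambda: binom_count(p, L, rng), T)
var_hyper = sample_var(lambda: hyper_count(p, n, L, rng), T)
# Sampling without replacement shrinks the count variance by roughly
# (n - L) / (n - 1), so the Bernoulli model overestimates the error.
print(var_bernoulli, var_hyper)
```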

Download Paper (PDF; Only available from the DATE venue WiFi)
09:00  5.7.2  ACCURACY ANALYSIS FOR STOCHASTIC CIRCUITS WITH D FLIP-FLOP INSERTION
Speaker:
Kuncai Zhong, University of Michigan-Shanghai Jiao Tong University Joint Institute, CN
Authors:
Kuncai Zhong and Weikang Qian, Shanghai Jiao Tong University, CN
Abstract
One of the challenges stochastic computing (SC) faces is the high cost of stochastic number generators (SNGs). One solution is to insert D flip-flops (DFFs) into the circuit. However, DFF insertion affects the accuracy of the stochastic circuit, and it is crucial to capture this effect. In this work, we propose an efficient method to analyze the accuracy of stochastic circuits with DFFs inserted. Given the importance of multiplication, we then apply the method to stochastic multipliers with DFFs inserted. Several interesting claims are obtained about the choice of probability conversion circuit; for example, using a weighted binary generator is more accurate than using a comparator. Experimental results confirm the correctness of the proposed method and the claims, and show that the method is up to 560× faster than the simulation-based method.
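A small illustration (ours, not the paper's analysis method) of why DFFs interact with accuracy: ANDing a stream with itself just reproduces the stream, because the two inputs are fully correlated, but a one-cycle DFF delay decorrelates the copies so the AND approximates the square of the encoded value.

```python
import random

def unipolar_stream(p, n, rng):
    """Unipolar encoding: value p in [0, 1] is the probability of a 1."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(5)
n, p = 8192, 0.4
s = unipolar_stream(p, n, rng)
# AND of a stream with itself: fully correlated inputs, so the result
# is just the stream again (value p, not p * p).
same = [a & b for a, b in zip(s, s)]
# A DFF acts as a one-cycle delay; the delayed copy is (for an ideal
# Bernoulli stream) independent of the original bit-by-bit, so the AND
# now approximates p * p.
delayed = [0] + s[:-1]
squared = [a & b for a, b in zip(s, delayed)]
print(sum(same) / n, sum(squared) / n)  # ~0.4 and ~0.16
```

For real SNG streams the delayed copy is not perfectly independent, which is exactly the kind of accuracy effect the paper's method sets out to quantify.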

09:30  5.7.3  DYNAMIC STOCHASTIC COMPUTING FOR DIGITAL SIGNAL PROCESSING APPLICATIONS
Speaker:
Jie Han, University of Alberta, CA
Authors:
Siting Liu and Jie Han, University of Alberta, CA
Abstract
Stochastic computing (SC) utilizes a random binary bit stream to encode a number as the frequency of 1's in the stream (or sequence). Typically, a small circuit performs a bit-wise logic operation on the stochastic sequences, which leads to significant hardware and power savings. Energy efficiency, however, is a challenge for SC due to the long sequences required to encode numbers accurately. To overcome this challenge, we use a stochastic sequence to encode a continuously varying signal instead of a single number, achieving higher accuracy, higher energy efficiency, and greater flexibility. Specifically, a single bit encodes each sample of the signal for efficient processing. Because such a sequence encodes a continuously varying value, it is referred to as a dynamic stochastic sequence (DSS). The DSS enables SC circuits to efficiently perform tasks such as frequency mixing and function estimation. We show that such a dynamic SC (DSC) system achieves savings of up to 98.4% in energy and up to 96.8% in time, with slightly higher accuracy, compared to conventional SC. It also achieves energy and time savings of up to 60% compared to a fixed-width binary implementation.
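A minimal sketch (ours, under the assumptions stated in the comments, not the paper's circuits) of the DSS idea: one bit per sample, with each bit's probability set by the instantaneous signal value, and the signal recovered by a short window average.

```python
import math
import random

def dss_encode(signal, rng):
    """Dynamic stochastic sequence: ONE bit per sample, with
    P(bit_i = 1) = signal[i] (signal normalized to [0, 1])."""
    return [1 if rng.random() < s else 0 for s in signal]

def window_average(bits, w):
    """A short sliding-window average tracks the time-varying value."""
    out = []
    for i in range(len(bits)):
        lo = max(0, i - w + 1)
        out.append(sum(bits[lo:i + 1]) / (i + 1 - lo))
    return out

rng = random.Random(3)
N = 20000
# A slowly varying test signal, normalized into [0, 1].
signal = [0.5 + 0.4 * math.sin(2 * math.pi * i / 2000) for i in range(N)]
bits = dss_encode(signal, rng)   # N bits total: one bit per sample
est = window_average(bits, 100)
err = sum(abs(e - s) for e, s in zip(est, signal)) / N
print(err)  # mean absolute tracking error
```

The contrast with conventional SC is the rate: here N samples cost N bits, instead of one long stream per number, which is where the energy and time savings in the abstract come from.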

10:00  IP2-18, 437  A 16×128 STOCHASTIC-BINARY PROCESSING ELEMENT ARRAY FOR ACCELERATING STOCHASTIC DOT-PRODUCT COMPUTATION USING 1-16 BIT-STREAM LENGTH
Speaker:
Hyunjoon Kim, Nanyang Technological University, SG
Authors:
Qian Chen, Yuqi Su, Hyunjoon Kim, Taegeun Yoo, Tony Tae-Hyoung Kim and Bongjin Kim, Nanyang Technological University, SG
Abstract
This work presents a 16×128 stochastic-binary processing element array for energy- and area-efficient processing of artificial neural networks. Each all-digital processing element (PE) consists of an XNOR gate serving as a bipolar stochastic multiplier and an 8-bit binary adder with 8× registers for accumulating partial sums. The PE array comprises 16 dot-product units, each with 128 PEs cascaded in a single row. The latency and energy of the proposed dot-product unit are minimized by reducing the number of bit-streams required to limit the accuracy degradation induced by approximate stochastic computing. A 128-input dot-product operation requires a bit-stream length (N) of 1-to-16, which is two orders of magnitude smaller than the baseline stochastic computation using MUX-based adders. The simulated dot-product error is 6.9-to-1.5% for N=1-to-16, while the error of the baseline stochastic method is 5.9-to-1.7% with N=128-to-2048. The mean MNIST classification accuracy is 96.11% (1.19% lower than 8b binary) using a three-layer MLP at N=16. The measured energy from a 65nm test chip is 10.04pJ per dot-product, and the energy efficiency is 25.5TOPS/W at N=16.
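An illustrative sketch (ours, not the paper's PE design) of the XNOR bipolar multiplier the PE is built around: with bipolar encoding, a value x in [-1, 1] maps to P(bit = 1) = (x + 1) / 2, and the XNOR of two independent streams encodes the product.

```python
import random

def bipolar_stream(x, n, rng):
    """Bipolar encoding: x in [-1, 1] maps to P(bit = 1) = (x + 1) / 2."""
    p = (x + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(n)]

def bipolar_decode(bits):
    return 2 * sum(bits) / len(bits) - 1

rng = random.Random(0)
n = 8192
x, y = 0.6, -0.5
# XNOR of two independent bipolar streams encodes the product x * y:
# P(out = 1) = pa*pb + (1-pa)*(1-pb) = (x*y + 1) / 2.
prod = [1 - (a ^ b) for a, b in zip(bipolar_stream(x, n, rng),
                                    bipolar_stream(y, n, rng))]
print(bipolar_decode(prod))  # close to x * y = -0.3
```

In the paper's PE the XNOR outputs are not decoded per stream but accumulated in a binary adder, which is what lets the design get away with the very short (N = 1-to-16) streams reported above.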

10:01  IP2-19, 599  TOWARDS EXPLORING THE POTENTIAL OF ALTERNATIVE QUANTUM COMPUTING ARCHITECTURES
Speaker:
Arighna Deb, Kalinga Institute of Industrial Technology, IN
Authors:
Arighna Deb, Kalinga Institute of Industrial Technology, IN; Gerhard W. Dueck, University of New Brunswick, CA; Robert Wille, Johannes Kepler University Linz, AT
Abstract
The recent advances in the physical realization of Noisy Intermediate-Scale Quantum (NISQ) computers have motivated research on design automation that allows users to execute quantum algorithms on them. Certain physical constraints in the architectures restrict how the logical qubits used to describe the algorithm can be mapped to the physical qubits used to realize the corresponding functionality. Thus far, this has been addressed by inserting additional operations to overcome the physical constraints. However, all these approaches have taken the existing architectures as invariant and did not explore the potential of changing the quantum architecture itself, a valid option as long as the underlying physical constraints remain satisfied. In this work, we propose initial ideas to explore this potential. More precisely, we introduce several schemes for the generation of alternative coupling graphs (and, by this, quantum computing architectures) that may still satisfy physical constraints but, at the same time, allow for a more efficient realization of the desired quantum functionality.

10:02  IP2-20, 719  ACCELERATING QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM USING MACHINE LEARNING
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
Authors:
Mahabubul Alam, Abdullah Ash-Saki and Swaroop Ghosh, Pennsylvania State University, US
Abstract
We propose a machine learning based approach to accelerate the implementation of the quantum approximate optimization algorithm (QAOA), a promising quantum-classical hybrid algorithm for demonstrating so-called quantum supremacy. In QAOA, a parametric quantum circuit and a classical optimizer iterate in a closed loop to solve hard combinatorial optimization problems. The performance of QAOA improves with an increasing number of stages (depth) in the quantum circuit. However, each added stage introduces two new parameters for the classical optimizer, increasing the number of optimization-loop iterations. We note a correlation among the parameters of lower-depth and higher-depth QAOA implementations and exploit it by developing a machine learning model that predicts gate parameters close to their optimal values. As a result, the optimization loop converges in fewer iterations. We choose the graph MaxCut problem as a prototype to solve using QAOA. We perform a feature extraction routine using 100 different QAOA instances and develop a training data-set with 13,860 optimal parameters. We present our analysis for 4 flavors of regression models and 4 flavors of classical optimizers. Finally, we show that the proposed approach can curtail the number of optimization iterations by 44.9% on average (up to 65.7%) in an analysis performed with 264 flavors of graphs.

10:00  End of session