2.2 Stochastic, Approximate and Neural Computing


Date: Tuesday 28 March 2017
Time: 11:30 - 13:00
Location / Room: 4BC

Chair:
Lukas Sekanina, Brno University of Technology, CZ

Co-Chair:
Andy Tyrrell, University of York, GB

Stochastic and approximate computing are approaches developed to improve the energy efficiency of computer hardware. The first paper presents a framework for quantifying and managing accuracy in stochastic circuit design. The second paper presents a new approximate multiplier design. The third paper proposes energy-efficient hybrid stochastic-binary neural networks. The last paper addresses a new retraining method that improves fault tolerance in RRAM crossbars.

Time  Label  Presentation Title / Authors
11:30  2.2.1  FRAMEWORK FOR QUANTIFYING AND MANAGING ACCURACY IN STOCHASTIC CIRCUIT DESIGN
Speaker:
Florian Neugebauer, University of Passau, DE
Authors:
Florian Neugebauer1, Ilia Polian1 and John Hayes2
1University of Passau, DE; 2University of Michigan, US
Abstract
Stochastic circuits (SCs) offer tremendous area and power-consumption benefits at the expense of computational inaccuracies. Managing accuracy is a central problem in SC design and has no counterpart in conventional circuit synthesis. It raises a basic question: how to build a systematic design flow for stochastic circuits? We present, for the first time, a systematic design approach to control the accuracy of SCs and balance it against other design parameters. We express the (in)accuracy of a circuit processing n-bit stochastic numbers by the numerical deviation of the computed value from the expected result, in conjunction with a confidence level. Using the theory of Monte Carlo simulation, we derive expressions for the stochastic number length required for a desired level of accuracy, or vice versa. We discuss the integration of the theory into a design framework that is applicable to both combinational and sequential SCs. We show that, for combinational SCs, accuracy is independent of the circuit's size or complexity, a surprising result. We also show how the analysis can identify subtle errors in both combinational and sequential designs.
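To make the length/accuracy relationship concrete, the following minimal Python sketch (an illustration, not the authors' framework) multiplies two probabilities by ANDing unipolar bit-streams and compares the empirical error against the bit-stream length suggested by a standard normal-approximation confidence bound.

```python
# Minimal sketch (not the paper's framework): stochastic multiplication of two
# probabilities via a bitwise AND of Bernoulli bit-streams, and the bit-stream
# length suggested by a normal-approximation confidence bound.
import math
import random

def stochastic_multiply(p_a, p_b, n_bits, rng):
    """Estimate p_a * p_b by ANDing two random bit-streams of length n_bits."""
    ones = 0
    for _ in range(n_bits):
        a = rng.random() < p_a
        b = rng.random() < p_b
        ones += a and b
    return ones / n_bits

def required_length(epsilon, p=0.5, confidence=0.95):
    """Bit-stream length so that |estimate - p| <= epsilon with the given
    confidence, using the normal approximation N >= z^2 * p(1-p) / epsilon^2."""
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    return math.ceil(z * z * p * (1.0 - p) / (epsilon * epsilon))

rng = random.Random(0)
p_a, p_b = 0.7, 0.4
exact = p_a * p_b
for n in (64, 256, 1024, 4096):
    est = stochastic_multiply(p_a, p_b, n, rng)
    print(f"n={n:5d}  estimate={est:.4f}  |error|={abs(est - exact):.4f}")
print("length for eps=0.01 at 95% confidence:", required_length(0.01))
```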

12:00  2.2.2  ENERGY-EFFICIENT APPROXIMATE MULTIPLIER DESIGN USING BIT SIGNIFICANCE-DRIVEN LOGIC COMPRESSION
Speaker:
Issa Qiqieh, School of Electrical and Electronic Engineering, Newcastle University, GB
Authors:
Issa Qiqieh, Rishad Shafik, Ghaith Tarawneh, Danil Sokolov and Alex Yakovlev, Newcastle University, GB
Abstract
Approximate arithmetic has recently emerged as a promising paradigm for many imprecision-tolerant applications. It can offer substantial reductions in circuit complexity, delay and energy consumption by relaxing accuracy requirements. In this paper, we propose a novel energy-efficient approximate multiplier design using a significance-driven logic compression (SDLC) approach. Fundamental to this approach is an algorithmic and configurable lossy compression of the partial product rows based on their progressive bit significance. This is followed by the commutative remapping of the resulting product terms to reduce the number of product rows. As such, the complexity of the multiplier in terms of logic cell count and critical-path length is drastically reduced. A number of multipliers with different bit-widths (4-bit to 128-bit) are designed in SystemVerilog and synthesized using Synopsys Design Compiler. Post-synthesis experiments showed that energy savings of up to an order of magnitude, together with reductions of 65% in critical-path delay and almost 45% in silicon area, can be achieved for a 128-bit multiplier compared to an accurate equivalent. These gains are achieved with low accuracy losses, estimated at a mean relative error of less than 0.00071. Additionally, we demonstrate the energy-accuracy trade-offs for different degrees of compression, achieved through configurable logic clustering. In evaluating the effectiveness of our approach, a case-study image-processing application showed up to 68.3% energy reduction with negligible loss in image quality, expressed as peak signal-to-noise ratio (PSNR).
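As a rough illustration of the general idea of discarding low-significance partial-product bits (not the paper's exact SDLC scheme or its configurable clustering), the following Python sketch drops partial-product columns below a chosen significance and estimates the resulting mean relative error.

```python
# Minimal sketch of significance-driven approximation (not the paper's exact
# SDLC scheme): partial-product bits below a significance threshold are dropped,
# and the mean relative error of the resulting multiplier is estimated.
import random

def approx_multiply(a, b, width, drop_columns):
    """Multiply two unsigned 'width'-bit integers, ignoring partial-product
    bits whose significance (column index) is below 'drop_columns'."""
    result = 0
    for i in range(width):          # bit position of b
        if not (b >> i) & 1:
            continue
        for j in range(width):      # bit position of a
            if (a >> j) & 1 and (i + j) >= drop_columns:
                result += 1 << (i + j)
    return result

def mean_relative_error(width, drop_columns, samples=10000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a = rng.randrange(1, 1 << width)
        b = rng.randrange(1, 1 << width)
        exact = a * b
        total += abs(approx_multiply(a, b, width, drop_columns) - exact) / exact
    return total / samples

for drop in (0, 2, 4, 6):
    print(f"drop {drop} low columns -> mean relative error "
          f"{mean_relative_error(8, drop):.5f}")
```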

12:30  2.2.3  ENERGY-EFFICIENT HYBRID STOCHASTIC-BINARY NEURAL NETWORKS FOR NEAR-SENSOR COMPUTING
Speaker:
Vincent Lee, University of Washington, US
Authors:
Vincent Lee1, Armin Alaghi1, John Hayes2, Visvesh Sathe1 and Luis Ceze1
1University of Washington, US; 2University of Michigan, US
Abstract
Recent advances in neural networks (NNs) exhibit unprecedented success at transforming large, unstructured data streams into compact higher-level semantic information for tasks such as handwriting recognition, image classification, and speech recognition. Ideally, systems would employ near-sensor computation to execute these tasks at sensor endpoints to maximize data reduction and minimize data movement. However, near-sensor computing presents its own set of challenges such as operating power constraints, energy budgets, and communication bandwidth capacities. In this paper, we propose a stochastic-binary hybrid design which splits the computation between the stochastic and binary domains for near-sensor NN applications. In addition, our design uses a new stochastic adder and multiplier that are significantly more accurate than existing adders and multipliers. We also show that retraining the binary portion of the NN computation can compensate for precision losses introduced by shorter stochastic bit-streams, allowing faster run times at minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary design can achieve a 9.8× improvement in energy efficiency and application-level accuracies within 0.05% of conventional all-binary designs.
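A minimal Python sketch of the hybrid idea is given below; it uses plain unipolar AND-based multiplication with binary accumulation and is only an illustration of the domain split, not the authors' adder or multiplier designs.

```python
# Minimal sketch (not the authors' design): a first-layer dot product where each
# input*weight multiplication is done in the stochastic domain (bitwise AND of
# unipolar bit-streams) and accumulation is done with an ordinary binary counter.
import random

def to_stream(p, n_bits, rng):
    """Encode a probability p in [0, 1] as a unipolar bit-stream of length n_bits."""
    return [rng.random() < p for _ in range(n_bits)]

def hybrid_dot(inputs, weights, n_bits, rng):
    """Stochastic multiplies, binary accumulation; returns an estimate of sum(x*w)."""
    total = 0
    for x, w in zip(inputs, weights):
        xs, ws = to_stream(x, n_bits, rng), to_stream(w, n_bits, rng)
        total += sum(a and b for a, b in zip(xs, ws))   # popcount of AND
    return total / n_bits

rng = random.Random(1)
x = [0.2, 0.9, 0.5, 0.7]
w = [0.6, 0.3, 0.8, 0.1]
exact = sum(a * b for a, b in zip(x, w))
for n in (128, 1024):
    print(f"n={n:4d}  estimate={hybrid_dot(x, w, n, rng):.3f}  exact={exact:.3f}")
```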

12:45  2.2.4  ACCELERATOR-FRIENDLY NEURAL-NETWORK TRAINING: LEARNING VARIATIONS AND DEFECTS IN RRAM CROSSBAR
Speaker:
Li Jiang, Shanghai Jiao Tong University, CN
Authors:
Lerong Chen1, Jiawen Li1, Yiran Chen2, Qiuping Deng3, Jiyuan Shen1, Xiaoyao Liang1 and Li Jiang4
1Shanghai Jiao Tong University, CN; 2University of Pittsburgh, US; 3Lynmax Research, CN; 4Department of Computer Science and Engineering, Shanghai Jiao Tong University, CN
Abstract
The RRAM crossbar, consisting of memristor devices, can naturally carry out matrix-vector multiplication; it has therefore gained great momentum as a highly energy-efficient accelerator for neuromorphic computing. The resistance variations and stuck-at faults in the memristor devices, however, dramatically degrade not only the chip yield but also the classification accuracy of the neural networks running on the RRAM crossbar. Existing hardware-based solutions cause enormous overhead and power consumption, while software-based solutions are less efficient in tolerating stuck-at faults and large variations. In this paper, we propose an accelerator-friendly neural-network training method that leverages the inherent self-healing capability of the neural network to prevent large-weight synapses from being mapped to abnormal memristors, based on the fault/variation distribution in the RRAM crossbar. Experimental results show the proposed method can pull the classification accuracy (which suffers a 10%-45% loss in previous works) up close to the ideal level, with ≤ 1% loss.
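The mapping idea can be illustrated with a toy Python sketch (an assumption-laden illustration, not the authors' training method): given a stuck-at-fault map, weight columns are permuted so that the columns carrying the largest-magnitude weights land on the crossbar columns with the fewest faults.

```python
# Toy illustration (not the authors' training method): given a stuck-at-fault map
# of an RRAM crossbar, greedily permute weight columns so that columns holding the
# largest-magnitude weights are assigned to crossbar columns with the fewest faults.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 16))          # synaptic weight matrix
faults = rng.random((16, 16)) < 0.05         # True where a cell is stuck-at

# Importance of each weight column = sum of |w|; fault burden of each crossbar
# column = number of faulty cells in it.
importance = np.abs(weights).sum(axis=0)
fault_count = faults.sum(axis=0)

# Most important weight columns go to the least faulty crossbar columns.
order_w = np.argsort(-importance)
order_c = np.argsort(fault_count)
mapping = np.empty(16, dtype=int)
mapping[order_w] = order_c                   # mapping[weight col] = crossbar col

clobbered_before = np.abs(weights)[faults].sum()
remapped = weights[:, np.argsort(mapping)]   # weight columns in crossbar order
clobbered_after = np.abs(remapped)[faults].sum()
print(f"|weight| landing on faulty cells: before={clobbered_before:.2f}, "
      f"after remapping={clobbered_after:.2f}")
```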

13:00  IP1-1, 298  STRUCTURAL DESIGN OPTIMIZATION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS USING STOCHASTIC COMPUTING
Speaker:
Yanzhi Wang, Syracuse University, US
Authors:
Zhe Li1, Ao Ren1, Ji Li2, Qinru Qiu1, Bo Yuan3, Jeffrey Draper2 and Yanzhi Wang1
1Syracuse University, US; 2University of Southern California, US; 3City University of New York, City College, US
Abstract
Deep Convolutional Neural Networks (DCNNs) have been demonstrated to be effective models for understanding image content. Because of their deep structure, the computation behind DCNNs relies heavily on the capability of hardware resources. DCNNs have been implemented on different large-scale computing platforms. However, there is a trend of embedding DCNNs into lightweight local systems, which requires low power/energy consumption and a small hardware footprint. Stochastic Computing (SC) radically simplifies the hardware implementation of arithmetic units and has the potential to satisfy the small-footprint, low-power needs of DCNNs. Local connectivity and down-sampling operations, however, make DCNNs more complex to implement using SC. In this paper, eight feature-extraction designs for DCNNs using SC, in two groups, are explored and optimized in detail from the perspective of calculation precision, permuting two SC implementations of the inner-product calculation, two down-sampling schemes, and two structures of DCNN neurons. We evaluate each of the eight resulting DCNNs in terms of network accuracy and hardware performance. Through exploration and optimization, the accuracies of SC-based DCNNs are maintained relative to software implementations on CPU/GPU and binary-based ASIC synthesis, while area, power, and energy are reduced by up to 776X, 190X, and 32835X, respectively.
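One of the SC building blocks involved, scaled addition for down-sampling, can be sketched as follows (an illustration of standard SC average pooling via a multiplexer, not one of the paper's eight evaluated designs).

```python
# Minimal sketch (not one of the paper's eight designs): 2-to-1 average pooling in
# the stochastic domain, implemented as a multiplexer whose select stream has
# probability 0.5, so the output stream encodes (p_a + p_b) / 2.
import random

def to_stream(p, n_bits, rng):
    return [rng.random() < p for _ in range(n_bits)]

def sc_average(p_a, p_b, n_bits, rng):
    a = to_stream(p_a, n_bits, rng)
    b = to_stream(p_b, n_bits, rng)
    sel = to_stream(0.5, n_bits, rng)          # scaling select stream
    out = [(ai if si else bi) for ai, bi, si in zip(a, b, sel)]
    return sum(out) / n_bits

rng = random.Random(2)
print("estimate:", round(sc_average(0.8, 0.2, 4096, rng), 3), " exact:", (0.8 + 0.2) / 2)
```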

13:01  IP1-2, 364  APPROXQA: A UNIFIED QUALITY ASSURANCE FRAMEWORK FOR APPROXIMATE COMPUTING
Speaker:
Ting Wang, The Chinese University of Hong Kong, HK
Authors:
Ting Wang, Qian Zhang and Qiang Xu, The Chinese University of Hong Kong, HK
Abstract
Approximate computing, which can trade off computation quality against computational effort (e.g., energy) by exploiting the inherent error resilience of emerging applications (e.g., recognition and mining), has garnered significant attention recently. Needless to say, quality assurance is indispensable for a satisfactory user experience with approximate computing, but this issue has remained largely unexplored in the literature. In this work, we propose a novel framework named ApproxQA to tackle this problem, in which approximation-mode tuning and rollback recovery are considered in a unified manner when quality violations occur. Specifically, ApproxQA resorts to a two-level controller: the high-level approximation controller tunes approximation modes at a coarse-grained scale based on Q-learning, while the low-level rollback controller judiciously determines whether to perform rollback recovery at a fine-grained scale based on the target quality requirement. ApproxQA can provide statistical quality assurance even when the underlying quality checkers are not reliable. Experimental results on various benchmark applications demonstrate that it significantly outperforms existing solutions in terms of energy efficiency with quality assurance.
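A skeletal Python sketch of the two-level idea is shown below; the modes, reward values and quality model are invented for illustration and do not reflect the ApproxQA implementation.

```python
# Skeletal sketch of the two-level idea (not the ApproxQA implementation): a
# high-level epsilon-greedy learner picks an approximation mode per window, and a
# low-level check rolls back any window whose quality is violated. The modes,
# rewards and error rates below are illustrative assumptions.
import random

MODES = [0, 1, 2]                          # 0 = exact, higher = more aggressive
ENERGY_SAVING = {0: 0.0, 1: 0.3, 2: 0.6}
ERROR_RATE = {0: 0.0, 1: 0.02, 2: 0.15}    # chance a window violates quality
ROLLBACK_PENALTY = 2.0

rng = random.Random(0)
q = {m: 0.0 for m in MODES}
alpha, epsilon = 0.1, 0.1

def run_window(mode):
    """Simulate one window: returns (violated, reward) under the toy model."""
    violated = rng.random() < ERROR_RATE[mode]
    reward = ENERGY_SAVING[mode]
    if violated:
        # Low-level controller rolls back: the window is redone exactly, so the
        # approximate run's energy is wasted and a penalty is charged.
        reward = -ROLLBACK_PENALTY
    return violated, reward

for _ in range(2000):
    mode = rng.choice(MODES) if rng.random() < epsilon else max(q, key=q.get)
    _, reward = run_window(mode)
    q[mode] += alpha * (reward - q[mode])   # stateless Q-learning update

print("learned action values:", {m: round(v, 3) for m, v in q.items()})
print("preferred mode:", max(q, key=q.get))
```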

13:02  IP1-3, 241  (Best Paper Award Candidate)
EVOAPPROX8B: LIBRARY OF APPROXIMATE ADDERS AND MULTIPLIERS FOR CIRCUIT DESIGN AND BENCHMARKING OF APPROXIMATION METHODS
Speaker:
Lukas Sekanina, Brno University of Technology, CZ
Authors:
Vojtech Mrazek, Radek Hrbacek, Zdenek Vasicek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
Approximate circuits and approximate circuit design methodologies have attracted significant attention from researchers as well as industry in recent years. In order to accelerate the approximate circuit and system design process and to support fair benchmarking of circuit approximation methods, we propose a library of approximate adders and multipliers called EvoApprox8b. This library contains 430 non-dominated 8-bit approximate adders created from 13 conventional adders and 471 non-dominated 8-bit approximate multipliers created from 6 conventional multipliers. These implementations were evolved by multi-objective Cartesian genetic programming. The EvoApprox8b library provides Verilog, Matlab and C models of all approximate circuits. In addition to standard circuit parameters, the error is given for seven different error metrics. The EvoApprox8b library is available at: www.fit.vutbr.cz/research/groups/ehw/approxlib
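For context, the sketch below shows the kind of exhaustive error characterization such a library reports for an 8-bit multiplier; the EvoApprox8b circuits themselves ship as Verilog/Matlab/C models, and the approx_mul8 function here is a hypothetical truncation-based stand-in, not a library circuit.

```python
# Sketch of exhaustive error characterization for an 8-bit approximate multiplier,
# in the spirit of the metrics such a library reports. 'approx_mul8' is a
# hypothetical stand-in (a simple truncation-based multiplier).

def approx_mul8(a, b, dropped_bits=4):
    """Toy approximate 8x8 multiplier: zero the lowest 'dropped_bits' result bits."""
    return (a * b) & ~((1 << dropped_bits) - 1)

errors = []
for a in range(256):
    for b in range(256):
        errors.append(abs(approx_mul8(a, b) - a * b))

n = len(errors)
mae = sum(errors) / n                         # mean absolute error
wce = max(errors)                             # worst-case error
error_prob = sum(e > 0 for e in errors) / n   # probability of an erroneous output
print(f"MAE={mae:.2f}  WCE={wce}  error probability={error_prob:.3f}")
```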

13:00  End of session
Lunch Break in Garden Foyer

Keynote Lecture session 3.0 in "Garden Foyer", 13:50 - 14:20

Lunch Break in the Garden Foyer
On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Please note that this is restricted to conference delegates with a lunch voucher. When entering the lunch break area, delegates will be asked to present the corresponding lunch voucher for the day. Once delegates leave the lunch area, re-entry is not permitted for that lunch.