12.7 Emerging Strategies for Deep Neural Network Hardware


Date: Thursday 28 March 2019
Time: 16:00 - 17:30
Location / Room: Room 7

Chair:
Jim Harkin, University of Ulster, UK

Co-Chair:
Li Jiang, Shanghai Jiao Tong University, CN

This session presents new approaches to the acceleration of deep neural networks, focused on ReRAM-based architectures, with papers addressing the key challenges of reliable operation with unreliable devices and strategies for countering aging effects. In addition, 3D ReRAM is proposed for accelerating general graphics processing. Building on the evolution of stochastic computing, emerging work on low-cost and energy-efficient convolutional neural networks with deterministic bit-stream processing is also explored.

Time  Label  Presentation Title / Authors
16:00  12.7.1  AGING-AWARE LIFETIME ENHANCEMENT FOR MEMRISTOR-BASED NEUROMORPHIC COMPUTING
Speaker:
Shuhang Zhang, Technical University of Munich, DE
Authors:
Shuhang Zhang1, Grace Li Zhang1, Bing Li1, Hai (Helen) Li2 and Ulf Schlichtmann3
1Technical University of Munich, DE; 2Duke University, US; 3TU München, DE
Abstract
Deep Neural Networks (DNNs) have been applied successfully in various fields. Such networks, however, require significant computing resources. Traditional CMOS-based implementations cannot efficiently execute specific computing patterns such as matrix multiplication. Therefore, memristor-based crossbars have been proposed to accelerate such computing tasks through their analog nature, which also leads to a significant reduction in power consumption. Neural networks must be trained to recognize the features of the applications. This training process leads to many repetitive updates of the memristors in the crossbar. However, memristors in the crossbar can only be programmed reliably a limited number of times; afterwards, their working range deviates from the fresh state. As a result, the weights of the corresponding neural networks cannot be implemented correctly and the classification accuracy drops significantly. This phenomenon is called aging, and it limits the lifetime of memristor-based crossbars. In this paper, we propose a co-optimization framework that counters the aging effect in software training and hardware mapping simultaneously. Experimental results demonstrate that the proposed framework can extend the lifetime of such crossbars by up to 15 times while maintaining the expected classification accuracy.
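
The key mechanism here is that each memristor can only be reprogrammed reliably a limited number of times. As a minimal illustration only, and not the paper's co-optimization framework, the sketch below tracks per-cell write counts on a hypothetical crossbar and periodically remaps the most frequently updated logical weight columns onto the least-worn physical columns; the class name, the endurance figure, and the remapping policy are assumptions made for the example, and the paper additionally folds aging awareness into the training itself.

```python
# Minimal sketch of wear-aware crossbar mapping, NOT the paper's framework.
# Assumptions: each memristor tolerates a fixed number of reliable writes
# ("endurance"), and aging can be spread by periodically remapping the most
# frequently updated logical weight columns onto the least-worn physical ones.
import numpy as np

class WearAwareCrossbar:
    def __init__(self, rows, cols, endurance=10_000):
        self.writes = np.zeros((rows, cols), dtype=np.int64)  # per-cell write counters
        self.endurance = endurance
        self.col_map = np.arange(cols)  # logical column -> physical column

    def program(self, logical_col, column_weights):
        """Program one logical weight column; every cell in it counts one write."""
        phys = self.col_map[logical_col]
        self.writes[:, phys] += 1
        # (the actual conductance update is omitted in this sketch)

    def rebalance(self):
        """Move the busiest logical columns onto the freshest physical columns."""
        per_phys = self.writes.sum(axis=0)
        busiest_logical = np.argsort(-per_phys[self.col_map])   # most-written first
        freshest_physical = np.argsort(per_phys)                # least-worn first
        new_map = np.empty_like(self.col_map)
        new_map[busiest_logical] = freshest_physical
        self.col_map = new_map  # re-programming cost of the move is ignored here

    def remaining_lifetime(self):
        """Fraction of endurance left on the most-worn cell (0 means aged out)."""
        return 1.0 - self.writes.max() / self.endurance

xbar = WearAwareCrossbar(rows=128, cols=64)
for step in range(2_000):
    xbar.program(step % 8, np.zeros(128))   # a few "hot" columns take most updates
    if step % 500 == 499:
        xbar.rebalance()
print("remaining lifetime:", xbar.remaining_lifetime())
```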

16:30  12.7.2  ENERGY-EFFICIENT CONVOLUTIONAL NEURAL NETWORKS WITH DETERMINISTIC BIT-STREAM PROCESSING
Speaker:
M. Hassan Najafi, University of Louisiana at Lafayette, US
Authors:
Sayed Abdolrasoul Faraji1, M. Hassan Najafi2, Bingzhe Li1, Kia Bazargan3 and David Lilja1
1University of Minnesota, Twin Cities, US; 2University of Louisiana at Lafayette, US; 3University of Minnesota, US
Abstract
Stochastic computing (SC) has been used for low-cost and low-power implementation of neural networks. The inherent inaccuracy and long latency of processing random bit-streams have made prior SC-based implementations inefficient compared to conventional fixed-point designs. Random or pseudo-random bit-streams often need to be processed for a very long time to produce acceptable results, and this long latency leads to significantly higher energy consumption than binary design counterparts. Low-discrepancy sequences have recently been used for fast-converging deterministic computation with stochastic constructs. In this work, we propose a low-cost, low-latency, and energy-efficient implementation of convolutional neural networks based on low-discrepancy deterministic bit-streams. Experimental results show a significant reduction in energy consumption compared to conventional random bit-stream-based implementations and to the optimized fixed-point design, with no quality degradation.
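
To make deterministic bit-stream processing concrete, the sketch below compares a conventional SC multiplier, which ANDs two independent pseudo-random streams, with a deterministic scheme that pairs every bit of one unary stream with every bit of the other and so returns the exact product after n x n cycles. This is the generic unary "clock-division" construction, used here only for illustration; the paper itself builds on low-discrepancy sequences, and all function names are assumptions.

```python
# Minimal sketch (illustrative, not the paper's design): multiplying two values
# in [0, 1] with bit-streams. A conventional SC multiplier ANDs two independent
# pseudo-random streams and converges slowly; a deterministic scheme pairs every
# bit of one unary stream with every bit of the other ("clock division"), so the
# AND over n*n cycles gives the exact product for n-level precision.
import numpy as np

def unary_stream(value, n):
    """Unary encoding: the first round(value*n) bits are 1, the rest 0."""
    ones = int(round(value * n))
    return np.array([1] * ones + [0] * (n - ones), dtype=np.uint8)

def random_stream(value, n, rng):
    """Conventional SC encoding: each bit is 1 with probability `value`."""
    return (rng.random(n) < value).astype(np.uint8)

def sc_multiply(stream_a, stream_b):
    """Bit-wise AND, then count 1s: estimates the product a*b."""
    return np.mean(stream_a & stream_b)

def deterministic_multiply(a, b, n):
    """Clock division: repeat one stream, hold the other, AND over n*n cycles."""
    sa = np.tile(unary_stream(a, n), n)      # a's stream repeated n times
    sb = np.repeat(unary_stream(b, n), n)    # each bit of b's stream held n cycles
    return np.mean(sa & sb)

rng = np.random.default_rng(0)
a, b, n = 0.75, 0.5, 16
print("exact        :", a * b)
print("random SC    :", sc_multiply(random_stream(a, 256, rng), random_stream(b, 256, rng)))
print("deterministic:", deterministic_multiply(a, b, n))   # exact at n*n = 256 cycles
```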

17:00  12.7.3  RED: A RERAM-BASED DECONVOLUTION ACCELERATOR
Speaker:
Hai (Helen) Li, Duke University, US
Authors:
Zichen Fan1, Ziru Li1, Bing Li2, Yiran Chen3 and Hai (Helen) Li3
1Tsinghua University, CN; 2Duke University, US; 3Duke University, US
Abstract
Deconvolution is widely used in neural networks. For example, it is essential for performing unsupervised learning in generative adversarial networks and for constructing fully convolutional networks for semantic segmentation. Resistive RAM (ReRAM)-based processing-in-memory architectures have been widely explored for accelerating convolutional computation and demonstrate good performance. Performing deconvolution on existing ReRAM-based accelerator designs, however, suffers from long latency and high energy consumption because deconvolutional computation includes not only convolution but also extra add-on operations. To realize more efficient execution of deconvolution, we analyze its computation requirements and propose a ReRAM-based accelerator design, namely RED. More specifically, RED integrates two orthogonal methods: a pixel-wise mapping scheme that reduces the redundancy caused by zero-insertion, and a zero-skipping data flow that increases computation parallelism and thereby improves performance. Experimental evaluations show that, compared to the state-of-the-art ReRAM-based accelerator, RED achieves a 3.69-31.15x speedup and reduces energy consumption by 8%-88.36%.
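
The "extra add-on operations" above stem from zero-insertion: a strided deconvolution can be computed by inserting zeros between input pixels and running an ordinary convolution, which spends most multiply-accumulate operations on inserted zeros. The 1-D sketch below is illustrative only (the stride-2 setting and function names are assumptions, and no crossbar is modeled): it contrasts the zero-insertion route with a direct scatter-add that touches only real input pixels, removing the redundancy that RED's pixel-wise mapping targets.

```python
# Minimal sketch (illustrative only): 1-D transposed convolution ("deconvolution")
# with stride 2, computed two ways. The zero-insertion route pads zeros between
# input pixels and then runs an ordinary convolution, so most multiply-accumulates
# involve inserted zeros. The direct route scatters each input pixel times the
# kernel instead and produces the same result.
import numpy as np

def deconv_via_zero_insertion(x, k, stride=2):
    # insert (stride-1) zeros between input samples, then a 'full' convolution
    up = np.zeros(len(x) * stride - (stride - 1))
    up[::stride] = x
    return np.convolve(up, k, mode="full")

def deconv_direct(x, k, stride=2):
    # scatter-add: each input pixel contributes x[i] * k at offset i*stride
    out = np.zeros((len(x) - 1) * stride + len(k))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(k)] += xi * k
    return out

x = np.array([1.0, 2.0, 3.0])
k = np.array([0.5, 1.0, 0.25])
print(deconv_via_zero_insertion(x, k))  # identical results,
print(deconv_direct(x, k))              # but without multiplying inserted zeros
```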

17:15  12.7.4  DESIGN OF RELIABLE DNN ACCELERATOR WITH UN-RELIABLE RERAM
Speaker:
Saibal Mukhopadhyay, Georgia Institute of Technology, US
Authors:
Yun Long and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Benefiting from the Computing-in-Memory (CIM) architecture and unique device properties such as non-volatility, high density, and fast read/write, ReRAM-based deep learning accelerators provide a promising solution to greatly improve computing efficiency for various artificial intelligence (AI) applications. However, the intrinsic stochastic behavior (the statistical distribution of device resistance, set/reset voltage, etc.) makes the computation error-prone. In this paper, we propose two algorithms to suppress the impact of device variation: (a) we employ the dynamical fixed point (DFP) data representation format to adaptively change the decimal point location, minimizing unused integer bits, and (b) we propose a noise-aware training methodology, enhancing the robustness of the network to parameter variation. We evaluate the proposed algorithms with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) across different datasets. Simulations indicate that, for all benchmarks, the accuracy is improved by more than 15% with minimal hardware design overhead.
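
As a rough illustration of the two algorithms, and not the paper's exact formulation, the sketch below shows (a) a per-tensor dynamic fixed-point quantizer that places the binary point just above the largest weight magnitude and (b) a training-time forward pass that injects multiplicative Gaussian noise on the weights to mimic ReRAM conductance variation; the 8-bit width, noise level, and function names are assumptions.

```python
# Minimal sketch of the two ideas, NOT the paper's exact algorithms.
# Assumptions: 8-bit words, a per-tensor binary point, and multiplicative
# Gaussian weight noise as a stand-in for ReRAM conductance variation.
import numpy as np

def dfp_quantize(x, total_bits=8):
    """Dynamic fixed point: place the binary point so the integer bits just
    cover the largest magnitude in the tensor, then round to total_bits."""
    max_mag = float(np.max(np.abs(x))) + 1e-12
    int_bits = max(0, int(np.ceil(np.log2(max_mag))) + 1)   # sign + integer part
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale), -2 ** (total_bits - 1), 2 ** (total_bits - 1) - 1)
    return q / scale, frac_bits

def noisy_forward(w, x, sigma=0.05, rng=None):
    """Training-time forward pass with device-variation noise on the weights."""
    rng = rng or np.random.default_rng()
    w_noisy = w * (1.0 + sigma * rng.standard_normal(w.shape))
    return x @ w_noisy

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.3, size=(4, 2))
wq, frac_bits = dfp_quantize(w)
print("fractional bits used:", frac_bits)
print("max quantization error:", np.max(np.abs(w - wq)))
print("noisy output:", noisy_forward(wq, np.ones((1, 4)), rng=rng))
```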

17:30  End of session