9.4 Efficient DNN Design with Approximate Computing


Date: Thursday 12 March 2020
Time: 08:30 - 10:00
Location / Room: Stendhal

Chair:
Daniel Menard, INSA Rennes, FR

Co-Chair:
Seokhyeong Kang, Pohang University of Science and Technology, KR

Deep Neural Networks (DNNs) are widely used across numerous domains. Cross-layer DNN approximation requires an efficient simulation framework: ProxSim, a GPU-accelerated framework, supports DNN inference and retraining on approximate hardware. A significant amount of energy is consumed during training due to excessive memory accesses; a precision-controlled memory system dedicated to GPUs allows flexible management of approximation. Newer generations of networks, such as Capsule Networks, provide better learning capabilities, but at the expense of high complexity. The ReD-CaNe methodology analyzes their resilience through error injection and guides the approximation of the most resilient operations.

Time  Label  Presentation Title / Authors
08:30  9.4.1  PROXSIM: SIMULATION FRAMEWORK FOR CROSS-LAYER APPROXIMATE DNN OPTIMIZATION
Speaker:
Cecilia Eugenia De la Parra Aparicio, Robert Bosch GmbH, DE
Authors:
Cecilia De la Parra1, Andre Guntoro1 and Akash Kumar2
1Robert Bosch GmbH, DE; 2TU Dresden, DE
Abstract
Cross-layer approximation of Deep Neural Networks (DNNs) can achieve significant improvements in hardware resource utilization for DNN applications. This comes at the cost of accuracy degradation, which can be compensated for through different optimization methods. However, DNN optimization is highly time-consuming in existing simulation frameworks for cross-layer DNN approximation, as they are usually implemented for CPU use only. Especially for large-scale image processing tasks, the need for a more efficient simulation framework is evident. In this paper we present ProxSim, a specialized, GPU-accelerated simulation framework for approximate hardware, based on TensorFlow, which supports approximate DNN inference and retraining. Additionally, we propose a novel hardware-aware regularization technique for approximate DNN optimization. Using ProxSim, we report up to 11x savings in execution time, compared to a multi-threaded CPU-based framework, and an accuracy recovery of up to 30% for three case studies of image classification with MNIST, CIFAR-10, and ImageNet.
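
As a rough, hypothetical illustration of the kind of simulation ProxSim enables (the program entry does not publish code), the NumPy sketch below emulates an 8-bit approximate multiplier through a lookup table inside a dense-layer forward pass. The LSB-truncation error model and all identifiers are assumptions for illustration only, not the ProxSim implementation.

```python
# Minimal sketch (not the ProxSim implementation): emulating an 8-bit
# approximate multiplier inside a dense-layer forward pass with NumPy.
# The LUT contents below are hypothetical; a real flow would load the
# behavioural table of a specific approximate multiplier design.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical approximate-multiplier behaviour: exact product with the
# two least significant bits dropped (stand-in for a real error model).
a = np.arange(256, dtype=np.int32)
lut = (a[:, None] * a[None, :]) & ~0x3          # 256x256 product table

def approx_dense(x_q, w_q):
    """Dense layer where every unsigned 8-bit multiply goes through the LUT."""
    # x_q: (batch, in), w_q: (in, out), both uint8
    prods = lut[x_q[:, :, None], w_q[None, :, :]]  # (batch, in, out)
    return prods.sum(axis=1)                       # accumulate exactly

x = rng.integers(0, 256, size=(4, 16), dtype=np.uint8)
w = rng.integers(0, 256, size=(16, 8), dtype=np.uint8)

exact = x.astype(np.int64) @ w.astype(np.int64)
approx = approx_dense(x, w)
print("mean relative error:", np.abs(exact - approx).mean() / exact.mean())
```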

09:00  9.4.2  PCM: PRECISION-CONTROLLED MEMORY SYSTEM FOR ENERGY EFFICIENT DEEP NEURAL NETWORK TRAINING
Speaker:
Boyeal Kim, Seoul National University, KR
Authors:
Boyeal Kim1, SangHyun Lee1, Hyun Kim2, Duy-Thanh Nguyen3, Minh-Son Le3, Ik Joon Chang3, Dohun Kwon4, Jin Hyeok Yoo5, Jun Won Choi4 and Hyuk-Jae Lee1
1Seoul National University, KR; 2Seoul National University of Science and Technology, KR; 3Kyung Hee University, KR; 4Hanyang University, KR; 5Hanyang University, KR
Abstract
Deep neural network (DNN) training suffers from significant energy consumption in the memory system, and most existing energy reduction techniques for the memory system have focused on introducing low precision that is compatible with the computing unit (e.g., FP16, FP8). These studies have shown that even when training networks with FP16 data precision, it is possible to achieve training accuracy as good as FP32, the de facto standard for DNN training. However, our extensive experiments show that we can further reduce the data precision while maintaining the training accuracy of DNNs by truncating some least significant bits (LSBs) of FP16, which we call hard approximation. Nevertheless, existing hardware structures for DNN training cannot efficiently support such low precision. In this work, we propose a novel memory system architecture for GPUs, named precision-controlled memory system (PCM), which allows for flexible management at the level of hard approximation. PCM provides high DRAM bandwidth by distributing each precision to different channels with a transposed data mapping on DRAM. In addition, PCM supports fine-grained hard approximation in the L1 data cache using software-controlled registers, which reduces data movement and thereby improves energy saving and system performance. Furthermore, PCM facilitates the reduction of data maintenance energy, which accounts for a considerable portion of memory energy consumption, by controlling the refresh period of DRAM. The experimental results show that when training ResNet-20 on the CIFAR-100 dataset with precision tuning, PCM achieves 66% energy savings and a 20% performance enhancement without loss of accuracy.
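
The following minimal sketch illustrates only the hard-approximation data transformation described above, not the PCM memory architecture: zeroing out least significant mantissa bits of FP16 values. The choice of four truncated bits is an arbitrary assumption for the example.

```python
# Minimal sketch of the "hard approximation" idea: truncating least
# significant mantissa bits of FP16 values. Illustration of the data
# transformation only, not the PCM memory system; n_bits=4 is arbitrary.
import numpy as np

def truncate_fp16_lsbs(x, n_bits=4):
    """Zero out the n least significant mantissa bits of an FP16 array."""
    assert x.dtype == np.float16
    bits = x.view(np.uint16)
    mask = np.uint16((0xFFFF << n_bits) & 0xFFFF)  # keep sign, exponent, high mantissa bits
    return (bits & mask).view(np.float16)

w = np.random.default_rng(0).standard_normal(8).astype(np.float16)
w_trunc = truncate_fp16_lsbs(w, n_bits=4)
print(np.stack([w, w_trunc], axis=1))              # compare original vs truncated
```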

09:30  9.4.3  RED-CANE: A SYSTEMATIC METHODOLOGY FOR RESILIENCE ANALYSIS AND DESIGN OF CAPSULE NETWORKS UNDER APPROXIMATIONS
Speaker:
Alberto Marchisio, TU Wien, AT
Authors:
Alberto Marchisio1, Vojtech Mrazek2, Muhammad Abdullah Hanif3 and Muhammad Shafique3
1TU Wien, AT; 2Brno University of Technology, CZ; 3TU Wien, AT
Abstract
Recent advances in Capsule Networks (CapsNets) have shown their superior learning capability compared to traditional Convolutional Neural Networks (CNNs). However, the extremely high complexity of CapsNets limits their fast deployment in real-world applications. Moreover, while the resilience of CNNs has been extensively investigated to enable energy-efficient implementations, the analysis of CapsNets' resilience is a largely unexplored area that can provide a strong foundation for investigating techniques to overcome the CapsNets' complexity challenge. Following the trend of Approximate Computing to enable energy-efficient designs, we perform an extensive resilience analysis of CapsNet inference subjected to approximation errors. Our methodology models the errors arising from approximate components (like multipliers) and analyzes their impact on the classification accuracy of CapsNets. This enables the selection of approximate components based on the resilience of each operation of the CapsNet inference. We modify the TensorFlow framework to simulate the injection of approximation noise (based on the models of the approximate components) at different computational operations of the CapsNet inference. Our results show that CapsNets are more resilient to errors injected in the computations that occur during dynamic routing (the softmax and the update of the coefficients) than in other stages such as convolutions and activation functions. Our analysis is extremely useful for designing efficient CapsNet hardware accelerators with approximate components. To the best of our knowledge, this is the first proof-of-concept for employing approximations in specialized CapsNet hardware.
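
A minimal sketch of the error-injection idea, assuming a toy feed-forward network and a Gaussian noise model (neither is the ReD-CaNe implementation): noise standing in for an approximate component is injected at one operation at a time, and the resulting output deviation is recorded per injection site.

```python
# Illustrative resilience analysis via error injection; names, noise model
# and toy network are assumptions, not the ReD-CaNe methodology itself.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))
w1, w2 = rng.standard_normal((64, 64)), rng.standard_normal((64, 10))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x, noisy_site=None, sigma=0.05):
    # Add noise only at the operation selected for injection.
    def maybe_noise(t, site):
        return t + sigma * rng.standard_normal(t.shape) if site == noisy_site else t
    h = maybe_noise(x @ w1, "matmul1")
    h = maybe_noise(np.maximum(h, 0.0), "relu")
    out = maybe_noise(h @ w2, "matmul2")
    return softmax(maybe_noise(out, "softmax_in"))

clean = forward(x)
for site in ["matmul1", "relu", "matmul2", "softmax_in"]:
    dev = np.abs(forward(x, noisy_site=site) - clean).mean()
    print(f"{site:10s} mean output deviation: {dev:.4f}")
```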

10:00  IP4-12, 968  TOWARDS BEST-EFFORT APPROXIMATION: APPLYING NAS TO APPROXIMATE COMPUTING
Speaker:
Weiwei Chen, Chinese Academy of Sciences, CN
Authors:
Weiwei Chen, Ying Wang, Shuang Yang, Cheng Liu and Lei Zhang, Chinese Academy of Sciences, CN
Abstract
The design of neural network architectures for code approximation involves a large number of hyper-parameters to explore, so it is a non-trivial task to find a neural-based approximate computing solution that meets the demands of application-specified accuracy and Quality of Service (QoS). Prior works do not address the problem of 'optimal' network architecture design in program approximation, which depends on the user-specified constraints, the complexity of the dataset, and the hardware configuration. In this paper, we apply Neural Architecture Search (NAS) to search for and select the neural approximate computing solution, and provide an automatic framework that tries to generate the best-effort approximation result while satisfying the user-specified QoS/accuracy constraints. Compared with previous methods, this work achieves more than 1.43x speedup and 1.74x energy reduction on average when applied to the AxBench benchmarks.
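
As a purely illustrative sketch (not the authors' NAS framework), the toy search below randomly samples approximator hyper-parameters and keeps the cheapest candidate that satisfies a user-specified quality target; the evaluate() stub stands in for training and validating a neural approximator of the target code region.

```python
# Toy constrained search over approximator hyper-parameters; all names,
# the search space and the quality model are hypothetical placeholders.
import random

random.seed(0)
SEARCH_SPACE = {"layers": [1, 2, 3], "width": [8, 16, 32, 64]}

def evaluate(cfg):
    # Placeholder quality model: wider/deeper approximators score better.
    return 0.80 + 0.002 * cfg["width"] + 0.02 * cfg["layers"]

def cost(cfg):
    return cfg["layers"] * cfg["width"]          # proxy for energy/latency

def search(quality_target=0.90, trials=20):
    best = None
    for _ in range(trials):
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        if evaluate(cfg) >= quality_target and (best is None or cost(cfg) < cost(best)):
            best = cfg
    return best

print(search())
```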

10:01  IP4-13, 973  ON THE AUTOMATIC EXPLORATION OF WEIGHT SHARING FOR DEEP NEURAL NETWORK COMPRESSION
Speaker:
Etienne Dupuis, École Centrale de Lyon, FR
Authors:
Etienne Dupuis1, David Novo2, Ian O'Connor1 and Alberto Bosio1
1Lyon Institute of Nanotechnology, FR; 2Université de Montpellier, FR
Abstract
Deep neural networks demonstrate impressive inference results, particularly in computer vision and speech recognition. However, the associated computational workload and storage render their use prohibitive in resource-limited embedded systems. The approximate computing paradigm has been widely explored in both industrial and academic circles. It improves performance and energy efficiency by relaxing the need for fully accurate operations. Consequently, there is a large number of implementation options with very different approximation strategies (such as pruning, quantization, low-rank factorization, and knowledge distillation). To the best of our knowledge, no automated approach exists for exploring, selecting, and generating the best approximate version of a given convolutional neural network (CNN) for given design objectives. The objective of this work in progress is to show that the design space exploration phase can enable significant network compression without noticeable accuracy loss. We demonstrate this via an example based on weight sharing and show that our method can obtain a 4x compression rate without re-training, and without accuracy loss, on an int-16 version of LeNet-5 (a 5-layer, 1,720-kbit CNN).
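
A minimal sketch of the weight-sharing idea, assuming scalar k-means over a layer's weights (this is not the authors' exploration framework): clustering to 16 shared values lets each 16-bit weight be stored as a 4-bit index plus a small codebook, giving roughly 4x compression of weight storage.

```python
# Weight sharing by clustering int-16 weights into 16 shared values;
# the random weights and plain k-means are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.integers(-2**15, 2**15, size=1024).astype(np.float64)

def kmeans_share(w, n_clusters=16, iters=20):
    # Plain k-means on scalar weights (no external dependencies).
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = w[idx == c].mean()
    return idx.astype(np.uint8), centroids       # 4-bit indices + codebook

idx, codebook = kmeans_share(weights)
shared = codebook[idx]                           # reconstructed (shared) weights
print("mean absolute error:", np.abs(weights - shared).mean())
print("storage ratio ~", (weights.size * 16) / (weights.size * 4 + codebook.size * 16))
```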

10:00  End of session