9.4 Efficient DNN Design with Approximate Computing


Date: Thursday 12 March 2020
Time: 08:30 - 10:00
Location / Room: Stendhal

Chair:
Daniel Menard, INSA Rennes, FR

Co-Chair:
Seokhyeong Kang, Pohang University of Science and Technology, KR

Deep Neural Networks (DNNs) are widely used across numerous domains. Cross-layer DNN approximation requires an efficient simulation framework: ProxSim, a GPU-accelerated framework, supports DNN inference and retraining on approximate hardware. A significant amount of energy is consumed during training due to excessive memory accesses; a precision-controlled memory system dedicated to GPUs allows flexible management of approximation. Newer generations of networks, such as Capsule Networks, provide better learning capabilities, but at the expense of high complexity. The ReD-CaNe methodology analyzes their resilience through error injection and guides the approximation of the most resilient operations.

Time  Label  Presentation Title / Authors
08:30  9.4.1  PROXSIM: SIMULATION FRAMEWORK FOR CROSS-LAYER APPROXIMATE DNN OPTIMIZATION
Speaker:
Cecilia Eugenia De la Parra Aparicio, Robert Bosch GmbH, DE
Authors:
Cecilia De la Parra1, Andre Guntoro1 and Akash Kumar2
1Robert Bosch GmbH, DE; 2TU Dresden, DE
Abstract
Cross-layer approximation of Deep Neural Networks (DNNs) can achieve significant improvements in hardware resource utilization for DNN applications. This comes at the cost of accuracy degradation, which can be compensated for through different optimization methods. However, DNN optimization is highly time-consuming in existing simulation frameworks for cross-layer DNN approximation, as they are usually implemented for CPU use only. Especially for large-scale image processing tasks, the need for a more efficient simulation framework is evident. In this paper we present ProxSim, a specialized, GPU-accelerated simulation framework for approximate hardware, based on TensorFlow, which supports approximate DNN inference and retraining. Additionally, we propose a novel hardware-aware regularization technique for approximate DNN optimization. Using ProxSim, we report up to 11x savings in execution time, compared to a multi-threaded CPU-based framework, and an accuracy recovery of up to 30% for three case studies of image classification with MNIST, CIFAR-10, and ImageNet.
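
As a rough, hypothetical illustration of the kind of simulation ProxSim enables (the program entry does not publish code), the NumPy sketch below emulates an 8-bit approximate multiplier through a lookup table inside a dense-layer forward pass. The LSB-truncation error model and all identifiers are assumptions for illustration only, not the ProxSim implementation.

```python
# Minimal sketch (not the ProxSim implementation): emulating an 8-bit
# approximate multiplier inside a dense-layer forward pass with NumPy.
# The LUT contents below are hypothetical; a real flow would load the
# behavioural table of a specific approximate multiplier design.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical approximate-multiplier behaviour: exact product with the
# two least significant bits dropped (stand-in for a real error model).
a = np.arange(256, dtype=np.int32)
lut = (a[:, None] * a[None, :]) & ~0x3          # 256x256 product table

def approx_dense(x_q, w_q):
    """Dense layer where every unsigned 8-bit multiply goes through the LUT."""
    # x_q: (batch, in), w_q: (in, out), both uint8
    prods = lut[x_q[:, :, None], w_q[None, :, :]]  # (batch, in, out)
    return prods.sum(axis=1)                       # accumulate exactly

x = rng.integers(0, 256, size=(4, 16), dtype=np.uint8)
w = rng.integers(0, 256, size=(16, 8), dtype=np.uint8)

exact = x.astype(np.int64) @ w.astype(np.int64)
approx = approx_dense(x, w)
print("mean relative error:", np.abs(exact - approx).mean() / exact.mean())
```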

09:00  9.4.2  PCM: PRECISION-CONTROLLED MEMORY SYSTEM FOR ENERGY EFFICIENT DEEP NEURAL NETWORK TRAINING
Speaker:
Boyeal Kim, Seoul National University, KR
Authors:
Boyeal Kim1, SangHyun Lee1, Hyun Kim2, Duy-Thanh Nguyen3, Minh-Son Le3, Ik Joon Chang3, Dohun Kwon4, Jin Hyeok Yoo5, Jun Won Choi4 and Hyuk-Jae Lee1
1Seoul National University, KR; 2Seoul National University of Science and Technology, KR; 3Kyung Hee University, KR; 4Hanyang University, KR; 5Hanyang University, KR
Abstract
Deep neural network (DNN) training suffers from significant energy consumption in the memory system, and most existing energy reduction techniques for the memory system have focused on introducing low precision that is compatible with the computing unit (e.g., FP16, FP8). These studies have shown that even when training networks with FP16 data precision, it is possible to achieve training accuracy as good as FP32, the de facto standard for DNN training. However, our extensive experiments show that we can further reduce the data precision while maintaining the training accuracy of DNNs by truncating some least significant bits (LSBs) of FP16, which we call hard approximation. Nevertheless, existing hardware structures for DNN training cannot efficiently support such low precision. In this work, we propose a novel memory system architecture for GPUs, named precision-controlled memory system (PCM), which allows for flexible management at the level of hard approximation. PCM provides high DRAM bandwidth by distributing each precision to different channels with a transposed data mapping on DRAM. In addition, PCM supports fine-grained hard approximation in the L1 data cache using software-controlled registers, which reduces data movement and thereby improves energy saving and system performance. Furthermore, PCM facilitates the reduction of data maintenance energy, which accounts for a considerable portion of memory energy consumption, by controlling the refresh period of DRAM. The experimental results show that when training ResNet-20 on the CIFAR-100 dataset with precision tuning, PCM achieves 66% energy savings and a 20% performance enhancement without loss of accuracy.
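
The following minimal sketch illustrates only the hard-approximation data transformation described above, not the PCM memory architecture: zeroing out least significant mantissa bits of FP16 values. The choice of four truncated bits is an arbitrary assumption for the example.

```python
# Minimal sketch of the "hard approximation" idea: truncating least
# significant mantissa bits of FP16 values. Illustration of the data
# transformation only, not the PCM memory system; n_bits=4 is arbitrary.
import numpy as np

def truncate_fp16_lsbs(x, n_bits=4):
    """Zero out the n least significant mantissa bits of an FP16 array."""
    assert x.dtype == np.float16
    bits = x.view(np.uint16)
    mask = np.uint16((0xFFFF << n_bits) & 0xFFFF)  # keep sign, exponent, high mantissa bits
    return (bits & mask).view(np.float16)

w = np.random.default_rng(0).standard_normal(8).astype(np.float16)
w_trunc = truncate_fp16_lsbs(w, n_bits=4)
print(np.stack([w, w_trunc], axis=1))              # compare original vs truncated
```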

09:30  9.4.3  RED-CANE: A SYSTEMATIC METHODOLOGY FOR RESILIENCE ANALYSIS AND DESIGN OF CAPSULE NETWORKS UNDER APPROXIMATIONS
Speaker:
Alberto Marchisio, TU Wien, AT
Authors:
Alberto Marchisio1, Vojtech Mrazek2, Muhammad Abdullah Hanif3 and Muhammad Shafique3
1TU Wien, AT; 2Brno University of Technology, CZ; 3TU Wien, AT
Abstract
Recent advances in Capsule Networks (CapsNets) have shown their superior learning capability compared to traditional Convolutional Neural Networks (CNNs). However, the extremely high complexity of CapsNets limits their fast deployment in real-world applications. Moreover, while the resilience of CNNs has been extensively investigated to enable energy-efficient implementations, the analysis of CapsNets' resilience is a largely unexplored area that can provide a strong foundation for investigating techniques to overcome the CapsNets' complexity challenge. Following the trend of Approximate Computing to enable energy-efficient designs, we perform an extensive resilience analysis of CapsNet inference subjected to approximation errors. Our methodology models the errors arising from approximate components (like multipliers) and analyzes their impact on the classification accuracy of CapsNets. This enables the selection of approximate components based on the resilience of each operation of the CapsNet inference. We modify the TensorFlow framework to simulate the injection of approximation noise (based on the models of the approximate components) at different computational operations of the CapsNet inference. Our results show that CapsNets are more resilient to errors injected in the computations that occur during dynamic routing (the softmax and the update of the coefficients) than in other stages such as convolutions and activation functions. Our analysis is extremely useful for designing efficient CapsNet hardware accelerators with approximate components. To the best of our knowledge, this is the first proof-of-concept for employing approximations in specialized CapsNet hardware.
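
A minimal sketch of the error-injection idea, assuming a toy feed-forward network and a Gaussian noise model (neither is the ReD-CaNe implementation): noise standing in for an approximate component is injected at one operation at a time, and the resulting output deviation is recorded per injection site.

```python
# Illustrative resilience analysis via error injection; names, noise model
# and toy network are assumptions, not the ReD-CaNe methodology itself.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))
w1, w2 = rng.standard_normal((64, 64)), rng.standard_normal((64, 10))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x, noisy_site=None, sigma=0.05):
    # Add noise only at the operation selected for injection.
    def maybe_noise(t, site):
        return t + sigma * rng.standard_normal(t.shape) if site == noisy_site else t
    h = maybe_noise(x @ w1, "matmul1")
    h = maybe_noise(np.maximum(h, 0.0), "relu")
    out = maybe_noise(h @ w2, "matmul2")
    return softmax(maybe_noise(out, "softmax_in"))

clean = forward(x)
for site in ["matmul1", "relu", "matmul2", "softmax_in"]:
    dev = np.abs(forward(x, noisy_site=site) - clean).mean()
    print(f"{site:10s} mean output deviation: {dev:.4f}")
```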

10:00  IP4-12, 968  TOWARDS BEST-EFFORT APPROXIMATION: APPLYING NAS TO APPROXIMATE COMPUTING
Speaker:
Weiwei Chen, Chinese Academy of Sciences, CN
Authors:
Weiwei Chen, Ying Wang, Shuang Yang, Cheng Liu and Lei Zhang, Chinese Academy of Sciences, CN
Abstract
The design of neural network architectures for code approximation involves a large number of hyper-parameters to explore, so it is a non-trivial task to find a neural-based approximate computing solution that meets the demands of application-specified accuracy and Quality of Service (QoS). Prior works do not address the problem of 'optimal' network architecture design in program approximation, which depends on the user-specified constraints, the complexity of the dataset, and the hardware configuration. In this paper, we apply Neural Architecture Search (NAS) to search for and select the neural approximate computing solution, and provide an automatic framework that tries to generate the best-effort approximation result while satisfying the user-specified QoS/accuracy constraints. Compared with previous methods, this work achieves more than 1.43x speedup and 1.74x energy reduction on average when applied to the AxBench benchmarks.
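
As a purely illustrative sketch (not the authors' NAS framework), the toy search below randomly samples approximator hyper-parameters and keeps the cheapest candidate that satisfies a user-specified quality target; the evaluate() stub stands in for training and validating a neural approximator of the target code region.

```python
# Toy constrained search over approximator hyper-parameters; all names,
# the search space and the quality model are hypothetical placeholders.
import random

random.seed(0)
SEARCH_SPACE = {"layers": [1, 2, 3], "width": [8, 16, 32, 64]}

def evaluate(cfg):
    # Placeholder quality model: wider/deeper approximators score better.
    return 0.80 + 0.002 * cfg["width"] + 0.02 * cfg["layers"]

def cost(cfg):
    return cfg["layers"] * cfg["width"]          # proxy for energy/latency

def search(quality_target=0.90, trials=20):
    best = None
    for _ in range(trials):
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        if evaluate(cfg) >= quality_target and (best is None or cost(cfg) < cost(best)):
            best = cfg
    return best

print(search())
```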

10:01  IP4-13, 973  ON THE AUTOMATIC EXPLORATION OF WEIGHT SHARING FOR DEEP NEURAL NETWORK COMPRESSION
Speaker:
Etienne Dupuis, École Centrale de Lyon, FR
Authors:
Etienne Dupuis1, David Novo2, Ian O'Connor1 and Alberto Bosio1
1Lyon Institute of Nanotechnology, FR; 2Université de Montpellier, FR
Abstract
Deep neural networks demonstrate impressive inference results, particularly in computer vision and speech recognition. However, the associated computational workload and storage render their use prohibitive in resource-limited embedded systems. The approximate computing paradigm has been widely explored in both industrial and academic circles. It improves performance and energy efficiency by relaxing the need for fully accurate operations. Consequently, there is a large number of implementation options with very different approximation strategies (such as pruning, quantization, low-rank factorization, and knowledge distillation). To the best of our knowledge, no automated approach exists for exploring, selecting, and generating the best approximate version of a given convolutional neural network (CNN) for given design objectives. The objective of this work in progress is to show that the design space exploration phase can enable significant network compression without noticeable accuracy loss. We demonstrate this via an example based on weight sharing and show that our method can obtain a 4x compression rate without re-training, and without accuracy loss, on an int-16 version of LeNet-5 (a 5-layer, 1,720-kbit CNN).
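
A minimal sketch of the weight-sharing idea, assuming scalar k-means over a layer's weights (this is not the authors' exploration framework): clustering to 16 shared values lets each 16-bit weight be stored as a 4-bit index plus a small codebook, giving roughly 4x compression of weight storage.

```python
# Weight sharing by clustering int-16 weights into 16 shared values;
# the random weights and plain k-means are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.integers(-2**15, 2**15, size=1024).astype(np.float64)

def kmeans_share(w, n_clusters=16, iters=20):
    # Plain k-means on scalar weights (no external dependencies).
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = w[idx == c].mean()
    return idx.astype(np.uint8), centroids       # 4-bit indices + codebook

idx, codebook = kmeans_share(weights)
shared = codebook[idx]                           # reconstructed (shared) weights
print("mean absolute error:", np.abs(weights - shared).mean())
print("storage ratio ~", (weights.size * 16) / (weights.size * 4 + codebook.size * 16))
```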

10:00  End of session