7.6 Special Session: Next Generation Processors and Architectures for Deep Learning


Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 4

Chair:
Theocharides Theocharis, University of Cyprus, CY

Co-Chair:
Shafique Muhammad, TU Wien, AT

Machine learning is nowadays embedded in many computing devices, consumer electronics and cyber-physical systems. Smart sensors are deployed everywhere, in applications such as wearables and perceptual computing devices, and intelligent algorithms power the so-called "Internet of Things". Similarly, smart cyber-physical systems are emerging as a vital computing paradigm across a vast application spectrum, ranging from consumer electronics to large-scale, complex critical infrastructures. The need to make such systems smart and to perform intelligent data analytics (especially in the era of Big Data) emphasizes the need to revolutionize the way we build processors and systems geared towards machine learning (and deep learning in particular). Issues ranging from memory to interconnect, and spanning the hardware and software spectrum, need to be addressed through advances in technology, design methodologies and new programming paradigms, among others. The emergence of powerful embedded devices and ultra-low-power hardware has enabled us to transfer deep learning architectures and systems from high-end, costly clusters and supercomputers to affordable systems and even mobile devices. Such systems and devices received little attention in research and development until the last few years, when they were first proposed to accelerate deep learning, providing previously unattainable levels of performance for such algorithms while maintaining the power and reliability constraints imposed by the nature of these embedded applications. This special session aims to present a holistic overview of emerging work on such architectures and systems, across the available technology spectrum, and to bring together views from academia and industry in order to exchange information and explain how we can take advantage of existing and emerging hardware technologies in addressing the associated challenges.

Time  Label  Presentation Title / Authors
14:30  7.6.1  RERAM-BASED ACCELERATOR FOR DEEP LEARNING
Speaker:
Hai Li, Duke University, US
Authors:
Bing Li1, Linghao Song2, Fan Chen2, Xuehai Qian3, Yiran Chen2 and Hai (Helen) Li4
1Duke University, US; 2Duke University, US; 3University of Southern California, US; 4Duke University/TUM-IAS, US
Abstract
Big data computing applications such as deep learning and graph analytics usually incur a large amount of data movement. Deploying such applications on a conventional von Neumann architecture, which separates the processing units from the memory components, likely leads to a performance bottleneck due to the limited memory bandwidth. A common approach is to develop architecture and memory co-design methodologies to overcome this challenge. Our research follows the same strategy by leveraging resistive memory (ReRAM) to further enhance performance and energy efficiency. Specifically, we employ the general principles behind processing-in-memory to design efficient ReRAM-based accelerators that support both testing and training operations. Related circuit and architecture optimizations will be discussed as well.
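The abstract describes processing-in-memory with ReRAM only at a high level. As a rough, hedged illustration (our own sketch, not taken from the paper), the NumPy snippet below shows how a single idealized ReRAM crossbar computes a matrix-vector product: weights are mapped to cell conductances, the input vector drives the word lines as voltages, each bit line sums the resulting currents, and an ADC digitizes the result. Real accelerators additionally handle signed weights (e.g., with differential cell pairs), device non-idealities and peripheral-circuit overheads.

    import numpy as np

    def reram_crossbar_mvm(weights, inputs, g_min=1e-6, g_max=1e-4, adc_bits=8):
        """Idealized crossbar read-out: weights become cell conductances, inputs
        are applied as word-line voltages, and each bit line accumulates the
        resulting currents (an analog multiply-accumulate)."""
        w_min, w_max = weights.min(), weights.max()
        # Linearly map every weight to a conductance in [g_min, g_max].
        conductance = g_min + (weights - w_min) / (w_max - w_min + 1e-12) * (g_max - g_min)
        # Kirchhoff's current law: bit-line current = sum over rows of conductance * voltage.
        currents = conductance.T @ inputs
        # A simple model of the ADC that digitizes the analog read-out.
        levels = 2 ** adc_bits - 1
        step = currents.max() / levels
        return np.round(currents / step) * step

    x = np.random.rand(128)       # word-line voltages (layer inputs, non-negative here)
    W = np.random.randn(128, 64)  # layer weights
    y = reram_crossbar_mvm(W, x)  # one analog matrix-vector product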

14:45  7.6.2  EXPLOITING APPROXIMATE COMPUTING FOR DEEP LEARNING ACCELERATION
Speaker:
Jungwook Choi, IBM Research, US
Authors:
Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Viji Srinivasan and Swagath Venkataramani, IBM T. J. Watson Research Center, US
Abstract
Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques to address challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, natural language processing, speech synthesis and recognition, genomics and many others have embraced deep learning as the foundational technique. DNNs achieve superior accuracy for these applications using very large models which require hundreds of MBs of data storage, ExaOps of computation and high bandwidth for data movement. Despite advances in computing systems, training state-of-the-art DNNs on large datasets takes several days or weeks, directly limiting the pace of innovation and adoption. In this paper, we discuss how these challenges can be addressed via approximate computing. Based on our earlier studies demonstrating that DNNs are resilient to numerical errors from approximate computing, we present techniques to reduce the communication overhead of distributed deep learning training via Adaptive Residual Gradient Compression (AdaComp), and the computation cost of deep learning inference via Parameterized Clipping Activation (PACT) based network quantization. Experimental evaluation demonstrates order-of-magnitude savings in communication overhead for training and in computational cost for inference while not compromising application accuracy.
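As a rough illustration of the PACT idea named in the abstract (a hedged sketch of ours; the clipping bound alpha and the bit width k below are illustrative choices, not the authors' settings): activations are clipped to a learnable bound alpha and then quantized uniformly to k bits, which keeps the quantization range tight without discarding large activations entirely. AdaComp, the companion training technique, instead compresses accumulated gradient residuals before communication and is not sketched here.

    import numpy as np

    def pact_quantize(x, alpha=6.0, k=4):
        """PACT-style forward pass: clip activations to [0, alpha] (alpha is a
        learnable parameter during training), then quantize uniformly to 2**k levels."""
        y = np.clip(x, 0.0, alpha)
        scale = (2 ** k - 1) / alpha
        return np.round(y * scale) / scale

    acts = np.random.randn(1000) * 3.0            # example pre-activations
    q_acts = pact_quantize(acts, alpha=6.0, k=4)  # 4-bit quantized activations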

15:10  7.6.3  AN OVERVIEW OF NEXT-GENERATION ARCHITECTURES FOR MACHINE LEARNING: ROADMAP, OPPORTUNITIES AND CHALLENGES IN THE IOT ERA
Speaker:
Muhammad Shafique, Vienna University of Technology (TU Wien), AT
Authors:
Muhammad Shafique1, Theocharis Theocharides2, Christos Bouganis3, Muhammad Abdullah Hanif1, Faiq Khalid Lodhi4, Rehan Hafiz5 and Semeen Rehman1
1TU Wien, AT; 2University of Cyprus, CY; 3Imperial College London, GB; 4Department of Computer Engineering, Vienna University of Technology, AT; 5ITU, PK
Abstract
The number of connected Internet of Things (IoT) devices is expected to reach over 20 billion by 2020. These range from basic sensor nodes that log and report data to devices that are capable of processing the incoming information and acting on it accordingly. Machine learning, and in particular deep learning, is the de facto processing paradigm for intelligently processing these immense volumes of data. However, the resource-constrained environment of IoT devices, owing to their limited energy budgets and lower compute capabilities, renders them a challenging platform for deploying the desired data analytics. This paper provides an overview of current and emerging trends in designing highly efficient, reliable, secure and scalable machine learning architectures for such devices. It highlights the focal challenges and obstacles faced by the community in achieving these goals, and further presents a research roadmap that can help address the highlighted challenges and thereby design scalable, high-performance and energy-efficient architectures for performing machine learning on the edge.

15:35  7.6.4  INFERENCE OF QUANTIZED NEURAL NETWORKS ON HETEROGENEOUS ALL-PROGRAMMABLE DEVICES
Speaker:
Thomas Preußer, Xilinx Inc., IE
Authors:
Thomas Preußer1, Giulio Gambardella2, Nicholas Fraser2 and Michaela Blott3
1Technische Universität Dresden, DE; 2Xilinx Research Labs, IE; 3Xilinx, IE
Abstract
Neural networks have established themselves as a generic and powerful means to approach challenging problems such as image classification, object detection or decision making. Their successful deployment rests on an enormous demand for compute. Quantizing the network parameters and the processed data has proven such an effective measure for reducing the cost of network inference that the feasible scope of applications expands even into the embedded domain. This paper describes the making of a real-time object detector for a live video stream processed on an embedded all-programmable device. The presented case illustrates how the required processing is tamed and parallelized across both the CPU cores and the programmable logic, and how the most suitable resources and powerful extensions, such as NEON vectorization, are leveraged for the individual processing steps. The crafted result is an extended Darknet framework implementing a fully integrated, end-to-end solution from video capture over object annotation to video output, applying neural network inference at different quantization levels and running at 16 frames per second on an embedded Zynq UltraScale+ (XCZU3EG) platform.
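The sketch below (our illustration, not the extended Darknet implementation described in the paper) hints at why aggressive quantization maps so well onto embedded CPUs and programmable logic: with fully binarized weights and activations, a dot product reduces to an XNOR followed by a population count, operations that NEON-class SIMD units and FPGA fabric execute very cheaply. A production implementation would pack the sign bits into machine words to process 64 or more positions per instruction.

    import numpy as np

    def binarize(x):
        """Map real values to sign bits: 1 encodes +1, 0 encodes -1."""
        return (x >= 0).astype(np.uint8)

    def binary_dot(a_bits, b_bits):
        """Dot product of two {-1, +1} vectors given their sign bits:
        XNOR marks the agreeing positions, popcount counts them, and the
        count is rescaled back to the +/-1 domain."""
        agreements = np.count_nonzero(~(a_bits ^ b_bits) & 1)
        return 2 * agreements - a_bits.size

    a, b = np.random.randn(256), np.random.randn(256)
    assert binary_dot(binarize(a), binarize(b)) == int(np.sign(a) @ np.sign(b))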

16:00  End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00