7.6 Special Session: Next Generation Processors and Architectures for Deep Learning


Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 4

Chair:
Theocharides Theocharis, University of Cyprus, CY

Co-Chair:
Shafique Muhammad, TU Wien, AT

Machine learning is nowadays embedded in many computing devices, consumer electronics and cyber-physical systems. Smart sensors are deployed everywhere, in applications such as wearables and perceptual computing devices, and intelligent algorithms power the so-called "Internet of Things". Similarly, smart cyber-physical systems are emerging as a vital computing paradigm across a vast application spectrum, ranging from consumer electronics to large-scale, complex critical infrastructures. The need to make such systems smart and to perform intelligent data analytics (especially in the era of Big Data) emphasizes the need to revolutionize the way we build processors and systems geared towards machine learning (and deep learning in particular). Issues ranging from memory to interconnect, and spanning the hardware and software spectrum, need to be addressed through advances in technology, design methodologies and new programming paradigms, among others. The emergence of powerful embedded devices and ultra-low-power hardware has enabled us to transfer deep learning architectures and systems from high-end, costly clusters and supercomputers to affordable systems and even mobile devices. Such systems and devices received little attention in research and development until the last few years, when they were first proposed to accelerate deep learning, providing previously unattainable levels of performance for such algorithms while maintaining the power and reliability constraints imposed by the nature of these embedded applications. This special session aims to present a holistic overview of emerging work on such architectures and systems, across the available technology spectrum, and to bring together views from academia and industry in order to exchange information and explain how we can take advantage of existing and emerging hardware technologies in addressing the associated challenges.

Time  Label  Presentation Title / Authors
14:30  7.6.1  RERAM-BASED ACCELERATOR FOR DEEP LEARNING
Speaker:
Hai Li, Duke University, US
Authors:
Bing Li1, Linghao Song2, Fan Chen2, Xuehai Qian3, Yiran Chen2 and Hai (Helen) Li4
1Duke University, US; 2Duke University, US; 3University of Southern California, US; 4Duke University/TUM-IAS, US
Abstract
Big data computing applications such as deep learning and graph analytics usually incur a large amount of data movement. Deploying such applications on a conventional von Neumann architecture, which separates the processing units from the memory components, likely leads to a performance bottleneck due to the limited memory bandwidth. A common approach is to develop architecture and memory co-design methodologies to overcome this challenge. Our research follows the same strategy by leveraging resistive memory (ReRAM) to further enhance performance and energy efficiency. Specifically, we employ the general principles behind processing-in-memory to design efficient ReRAM-based accelerators that support both testing and training operations. Related circuit and architecture optimizations will be discussed as well.
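The abstract describes processing-in-memory with ReRAM only at a high level. As a rough, hedged illustration (our own sketch, not taken from the paper), the NumPy snippet below shows how a single idealized ReRAM crossbar computes a matrix-vector product: weights are mapped to cell conductances, the input vector drives the word lines as voltages, each bit line sums the resulting currents, and an ADC digitizes the result. Real accelerators additionally handle signed weights (e.g., with differential cell pairs), device non-idealities and peripheral-circuit overheads.

    import numpy as np

    def reram_crossbar_mvm(weights, inputs, g_min=1e-6, g_max=1e-4, adc_bits=8):
        """Idealized crossbar read-out: weights become cell conductances, inputs
        are applied as word-line voltages, and each bit line accumulates the
        resulting currents (an analog multiply-accumulate)."""
        w_min, w_max = weights.min(), weights.max()
        # Linearly map every weight to a conductance in [g_min, g_max].
        conductance = g_min + (weights - w_min) / (w_max - w_min + 1e-12) * (g_max - g_min)
        # Kirchhoff's current law: bit-line current = sum over rows of conductance * voltage.
        currents = conductance.T @ inputs
        # A simple model of the ADC that digitizes the analog read-out.
        levels = 2 ** adc_bits - 1
        step = currents.max() / levels
        return np.round(currents / step) * step

    x = np.random.rand(128)       # word-line voltages (layer inputs, non-negative here)
    W = np.random.randn(128, 64)  # layer weights
    y = reram_crossbar_mvm(W, x)  # one analog matrix-vector product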

14:45  7.6.2  EXPLOITING APPROXIMATE COMPUTING FOR DEEP LEARNING ACCELERATION
Speaker:
Jungwook Choi, IBM Research, US
Authors:
Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Viji Srinivasan and Swagath Venkataramani, IBM T. J. Watson Research Center, US
Abstract
Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques to address challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, natural language processing, speech synthesis and recognition, genomics and many others have embraced deep learning as the foundational technique. DNNs achieve superior accuracy for these applications using very large models which require hundreds of MBs of data storage, ExaOps of computation and high bandwidth for data movement. Despite advances in computing systems, training state-of-the-art DNNs on large datasets takes several days or weeks, directly limiting the pace of innovation and adoption. In this paper, we discuss how these challenges can be addressed via approximate computing. Based on our earlier studies demonstrating that DNNs are resilient to numerical errors from approximate computing, we present techniques to reduce the communication overhead of distributed deep learning training via Adaptive Residual Gradient Compression (AdaComp), and the computation cost of deep learning inference via Parameterized Clipping Activation (PACT) based network quantization. Experimental evaluation demonstrates order-of-magnitude savings in communication overhead for training and in computational cost for inference while not compromising application accuracy.
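As a rough illustration of the PACT idea named in the abstract (a hedged sketch of ours; the clipping bound alpha and the bit width k below are illustrative choices, not the authors' settings): activations are clipped to a learnable bound alpha and then quantized uniformly to k bits, which keeps the quantization range tight without discarding large activations entirely. AdaComp, the companion training technique, instead compresses accumulated gradient residuals before communication and is not sketched here.

    import numpy as np

    def pact_quantize(x, alpha=6.0, k=4):
        """PACT-style forward pass: clip activations to [0, alpha] (alpha is a
        learnable parameter during training), then quantize uniformly to 2**k levels."""
        y = np.clip(x, 0.0, alpha)
        scale = (2 ** k - 1) / alpha
        return np.round(y * scale) / scale

    acts = np.random.randn(1000) * 3.0            # example pre-activations
    q_acts = pact_quantize(acts, alpha=6.0, k=4)  # 4-bit quantized activations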

15:10  7.6.3  AN OVERVIEW OF NEXT-GENERATION ARCHITECTURES FOR MACHINE LEARNING: ROADMAP, OPPORTUNITIES AND CHALLENGES IN THE IOT ERA
Speaker:
Muhammad Shafique, Vienna University of Technology (TU Wien), AT
Authors:
Muhammad Shafique1, Theocharis Theocharides2, Christos Bouganis3, Muhammad Abdullah Hanif1, Faiq Khalid Lodhi4, Rehan Hafiz5 and Semeen Rehman1
1TU Wien, AT; 2University of Cyprus, CY; 3Imperial College London, GB; 4Department of Computer Engineering, Vienna University of Technology, AT; 5ITU, PK
Abstract
The number of connected Internet of Things (IoT) devices is expected to reach over 20 billion by 2020. These range from basic sensor nodes that log and report data to devices that are capable of processing the incoming information and acting on it accordingly. Machine learning, and in particular deep learning, is the de facto processing paradigm for intelligently processing these immense volumes of data. However, the resource-constrained environment of IoT devices, owing to their limited energy budgets and lower compute capabilities, renders them a challenging platform for deploying the desired data analytics. This paper provides an overview of current and emerging trends in designing highly efficient, reliable, secure and scalable machine learning architectures for such devices. It highlights the focal challenges and obstacles faced by the community in achieving these goals, and further presents a research roadmap that can help address the highlighted challenges and thereby design scalable, high-performance and energy-efficient architectures for performing machine learning on the edge.

15:35  7.6.4  INFERENCE OF QUANTIZED NEURAL NETWORKS ON HETEROGENEOUS ALL-PROGRAMMABLE DEVICES
Speaker:
Thomas Preußer, Xilinx Inc., IE
Authors:
Thomas Preußer1, Giulio Gambardella2, Nicholas Fraser2 and Michaela Blott3
1Technische Universität Dresden, DE; 2Xilinx Research Labs, IE; 3Xilinx, IE
Abstract
Neural networks have established themselves as a generic and powerful means to approach challenging problems such as image classification, object detection or decision making. Their successful deployment rests on an enormous demand for compute. Quantizing the network parameters and the processed data has proven such an effective measure for reducing the cost of network inference that the feasible scope of applications expands even into the embedded domain. This paper describes the making of a real-time object detector for a live video stream processed on an embedded all-programmable device. The presented case illustrates how the required processing is tamed and parallelized across both the CPU cores and the programmable logic, and how the most suitable resources and powerful extensions, such as NEON vectorization, are leveraged for the individual processing steps. The crafted result is an extended Darknet framework implementing a fully integrated, end-to-end solution from video capture over object annotation to video output, applying neural network inference at different quantization levels and running at 16 frames per second on an embedded Zynq UltraScale+ (XCZU3EG) platform.
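The sketch below (our illustration, not the extended Darknet implementation described in the paper) hints at why aggressive quantization maps so well onto embedded CPUs and programmable logic: with fully binarized weights and activations, a dot product reduces to an XNOR followed by a population count, operations that NEON-class SIMD units and FPGA fabric execute very cheaply. A production implementation would pack the sign bits into machine words to process 64 or more positions per instruction.

    import numpy as np

    def binarize(x):
        """Map real values to sign bits: 1 encodes +1, 0 encodes -1."""
        return (x >= 0).astype(np.uint8)

    def binary_dot(a_bits, b_bits):
        """Dot product of two {-1, +1} vectors given their sign bits:
        XNOR marks the agreeing positions, popcount counts them, and the
        count is rescaled back to the +/-1 domain."""
        agreements = np.count_nonzero(~(a_bits ^ b_bits) & 1)
        return 2 * agreements - a_bits.size

    a, b = np.random.randn(256), np.random.randn(256)
    assert binary_dot(binarize(a), binarize(b)) == int(np.sign(a) @ np.sign(b))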

16:00  End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00