Booklet Proof Reading

Printer-friendly version PDF version

Goto Session:

1.1 Opening Session: Plenary, Awards Ceremony & Keynote Addresses

Date: Tuesday 20 March 2018
Time: 08:30 - 10:30
Location / Room: Großer Saal

Chair:
Jan Madsen, DATE 2018 General Chair, DTU, DK

Co-Chair:
Ayse Coskun, DATE 2018 Programme Chair, Boston University, US

TimeLabelPresentation Title
Authors
08:301.1.1WELCOME ADDRESSES
Speakers:
Jan Madsen1 and Ayse Coskun2
1DTU, DK; 2Boston University, US
08:451.1.2PRESENTATION OF AWARDS
Abstract
  • 2018 EDAA Achievement Award
  • EDAA Outstanding Dissertations Award 2017
  • DATE Fellow Award
  • IEEE Fellow Award
  • IEEE CEDA and CS TTTC Outstanding Service Contribution Award 2017
09:151.1.3KEYNOTE ADDRESS: THE RESPONSIBILITY SENSITIVE SAFETY (RSS) FORMAL MODEL TOWARD SAFETY GUARANTEES FOR AUTONOMOUS VEHICLES
Speaker:
Amnon Shashua, Intel Corporation, US
Abstract
In recent years, car makers and tech companies are racing toward self-driving cars. A critical component in getting society acceptance to the technology is to find a way to guarantee safety. The prevailing common wisdom is a data-driven empirical approach for safety validation where the more mileage driven the better the maturity of the system must be. I will describe a model in which the sources of errors due to Planning (the actions and decisions for negotiating motion in traffic) can be fenced out from the data driven approach through a formal model of the common sense behind human judgment of what it means to cause an accident and how to define actions that will guarantee that the AV will never cause an accident due to Planning. The model creates a clear distinction of what can be certified by regulators and what should be left to the judgment of AV manufacturers. The RSS model also puts in context the conversation of "ethical dilemmas" by providing a formal framework for the discussion.
1.1.4KEYNOTE ADDRESS: PROGRAMMING LIVING CELLS: DESIGN AUTOMATION TO MAP CIRCUITS TO DNA
Speaker:
Christopher Voigt, MIT, US
Abstract
Platforms are being established to to facilitate large genetic engineering projects. A desired cellular function is divided into systems that can be developed independently and then combined. Genetic sensors allow cells to receive environmental and cell state information. Sensory information is integrated by genetic circuits, which control the conditions and timing of a response. The circuit outputs are connected to actuators that control what the cell is doing, from building molecules to moving and communicating. Design automation tools from the electronics industry are applied to map a circuit design to a DNA sequence. Collectively, this enables a wide range of applications, for example cells that communicate to build a material, navigate the human body to treat a disease, or protect plants by responding to the environment.
10:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB01 Session 1

Date: Tuesday 20 March 2018
Time: 10:30 - 12:30
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB01.1ARCHON: AN ARCHITECTURE-OPEN RESOURCE-DRIVEN CROSS-LAYER MODELLING FRAMEWORK
Authors:
Fei Xia1, Ashur Rafiev1, Mohammed Al-Hayanni2, Alexei Iliasov1, Rishad Shafik1, Alexander Romanovsky1 and Alex Yakovlev1
1Newcastle University, GB; 2Newcastle University, UK and University of Technology and HCED, IQ
Abstract
This demonstration showcases a modelling method for large complex computing systems focusing on many-core types and concentrating on the crosslayer aspects. The resource-driven models aim to help system designers reason about, analyse, and ultimately design such systems across all conventional computing and communication layers, from application, operating system, down to the finest hardware details. The framework and tool support the notion of selective abstraction and are suitable for studying such non-functional properties such as performance, reliability and energy consumption.

More information ...
UB01.2TOPOLINANO & MAGCAD: A DESIGN AND SIMULATION FRAMEWORK FOR THE EXPLORATION OF EMERGING TECHNOLOGIES
Authors:
Umberto Garlando and Fabrizio Riente, Politecnico di Torino, IT
Abstract
We developed a design framework that enables the exploration and analysis of emerging beyond-CMOS technologies. It is composed of two powerful tools: ToPoliNano and MagCAD. Different technologies are supported, and new ones could be added thanks to their modular structure. ToPoliNano starts from a VHDL description of a circuit and performs the place&route following the technological constraints. The resulting circuit can be simulated both at logical or physical level. MagCAD is a layout editor where the user can design custom circuits, by placing basic elements of the selected technology. The tool can extract a VHDL netlist based on compact models of placed elements derived from experiments or physical simulations. Circuits can be verified with standard VHDL simulators. The design workflow will be demonstrated at the U-booth to show how those tools could be a valuable help in the studying and development of emerging technologies and to obtain feedbacks from the scientific community.

More information ...
UB01.3ADVANCED SIMULATION OF QUANTUM COMPUTATIONS
Authors:
Zulehner Alwin and Robert Wille, Johannes Kepler University Linz, AT
Abstract
Quantum computation is a promising emerging technology which allows for substantial speed-ups compared to classical computation. Since physical realizations of quantum computers are in their infancy, most research in this domain still relies on simulations on classical machines. This causes an exponential overhead which current simulators try to tackle with straight forward array-based representations and massive hardware power. There also exist solutions based on decision diagrams (graph-based approaches) that try to tackle the complexity by exploiting redundancies in quantum states and operations. However, they did not get established since they yield speedups only for certain benchmarks. Here, we demonstrate a new graph-based simulation approach which clearly outperforms state-of-the-art simulators. By this, users can efficiently execute quantum algorithms even if the respective quantum computers are not broadly available yet.

More information ...
UB01.4OTPG: SPECIFICATION-BASED CONSTRUCTION OF ONLINE TPGS FOR MICROPROCESSORS
Authors:
Mikhail Chupilko, Alexander Kamkin and Andrei Tatarnikov, ISP RAS, RU
Abstract
This work presents an approach to construction of online test program generators (TPGs). The approach is intended to use specifications of ISA presented in nML/mmuSL specification languages. They are processed by a meta-generator to obtain their binary representations supplied with meta information and a test generation core compatible with the target microprocessor. The test generation core is loaded as a binary image into the target microprocessor's memory (for experiments we're using QEMU for MIPS) and produces test cases to be processed (incl. results checking) by an executor. It should be noticed that the meta-generator and the executor are not obligatory run at the same microprocessor (especially, if it is highly incomplete). The final goal of the project is to propose a method of obtaining online TPGs for a wide range of ISAs, and to develop a mature tool implementing this method.

More information ...
UB01.5ABSYNTH: A COMPREHENSIVE APPROACH TO FRONT TO BACK ANALOG BLOCK DESIGN AUTOMATION
Authors:
Abhaya Chandra Kammara S.1, Sidney Pontes-Filho2 and Andreas König2
1ISE, TU Kaiserslautern, DE; 2University of Kaiserslautern, DE
Abstract
ABSYNTH was first presented in CEBIT 2014 where complete, practical circuit sizing approaches have been shown using meta-heuristics on trusted simulators. This tool was then proven by its use in design of several cells in a research project. Here, we present the extension to our nested optimization approach that creates a symmetric and well matched layout in every step for every instance in the population of the swarm, that is extracted in our flow to provide feedback to the cost function impacting on the population update for more viable and robust circuits. The layout optimization presented in this DEMO works with Cadence Layout design tools. Our initial focus is, motivated by Industry 4.0, IoT, on cells for signal conditioning electronics with reconfigurability and Self-X features.[1] Abhaya C. Kammara, L.Palanichamy, and A. König, "Multi-Objective optimization and visualization for analog automation", Complex. Intell. Syst, Springer, DOI 10.1007/s40747-016-0027-3, 2016

More information ...
UB01.6WARE: WEARABLE ELECTRONICS DIRECTIONAL AUGMENTED REALITY
Authors:
Gabriele Miorandi1, Walter Vendraminetto2, Federico Fraccaroli3, Davide Quaglia1 and Gianluca Benedetti4
1University of Verona, IT; 2EDALab Srl, IT; 3Wagoo LLC, IT; 4Wagoo Italia srls, IT
Abstract
Augmented Reality (AR) currently require large form factors, weight, cost and frequent recharging cycles that reduce usability. Connectivity, image processing, localization, and direction evaluation lead to high processing and power requirements. A multi-antenna system, patented by the industrial partner, enables a new generation of smart eye-wear that elegantly requires less hardware, connectivity, and power to provide AR functionalities. They will allow users to directionally locate nearby radio emitting sources that highlight objects of interest (e.g., people or retail items) by using existing standards like Bluetooth Low Energy, Apple's iBeacon and Google's Eddystone. This booth will report the current level of research addressed by the Computer Science Department of University of Verona, Wagoo LLC, and Wagoo Italia srls. In the presented demo, different objects emit an "I am here" signal and a prototype of the smart glasses shows the information related to the observed object.

More information ...
UB01.7IIDEAA: DESIGN SPACE EXPLORATION FOR FUNCTIONAL-LEVEL APPROXIMATION
Authors:
Marcello Traiola1, Mario Barbareschi2, Marcello Traiola3 and Alberto Bosio3
1LIRMM, FR; 2DIETI - University of Naples Federico II, IT; 3LIRMM - University of Montpellier / CNRS, FR
Abstract
Approximate Computing (AxC) aims at enabling the production of computing systems which can satisfy the rising performance demands and can improve the energy efficiency. AxC exploits the gap between the level of accuracy required by the users and the precision provided by the computing system, for achieving diverse optimizations. Various AxC techniques have been proposed so far for several applications and, unfortunately, existing approaches are application specific and a general and systematic methodology to automatically define approximate algorithms is still an open challenge. In this works we introduce a methodology which makes use of mutation techniques to obtain approximate versions of a given application described as a C/C++ code. We designed and implemented IIDEAA, an automatic tool exploiting (i) a source-to-source manipulation technique and (ii) an Evolutionary search engine, in order to search for the best functional approximation version of the given C/C++ code.

More information ...
UB01.8IIP GENERATORS TO EASE ANALOG IC DESIGN
Authors:
Benjamin Prautsch, Uwe Eichler and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS/EAS, DE
Abstract
Semiconductor technology has shown significant progress over the last decades. Digital EDA (electronic design automation) allowed that this progress could be converted to high-performance digital ICs. Analog components are part of Systems-on-Chip (SoC) too, but analog EDA lags far behind. Therefore, a lot of effort was spent to automate analog IC design. Mayor results are constraint-based layout-aware optimization tools using predefined layout templates or pure automation as well as analog generators containing expert knowledge. While optimization is a holistic top-down approach, generators allow parameterized and fast bottom-up generation of critical schematic and layout parts, pre-planned by experienced designers. With IIP Generators, we follow three use cases to ease analog design: 1) design on higher hierarchy levels, 2) development of hierarchical high-level IIPs, and 3) automated design porting due to highly technology-independent blocks down to 22nm.

More information ...
UB01.9CIJTAG: CONCURRENT IJTAG DEMONSTRATOR
Author:
Krenz-Baath René, Hamm-Lippstadt University of Applied Sciences, DE
Abstract
The flexibility of on-chip instrument access enabled by IEEE 1687 (IJTAG) has shown tremendous improvements in modern industrial designs. Due to a constantly increasing spektrum of tasks performed through 1687 networks such as performing test operations during production test, on-line test operations as well as operating health monitors the test requirements in modern designs increase dramatically with respect to test performance, responsiveness and low power. These requirements have a major impact on the design of such test infrastructures. In complex designs with large test infrastructures it might be challenging to comply with the large spectrum of requirements. Concurrent IJTAG is novel partitioning concept to a reconfigurable test infrastructure in order to enable an independent operation of different sections of the test infrastructure. The proposed demonstrator shows the first FPGA-based implementation of concurrent IJTAG test infrastructures.

More information ...
UB01.10VIRTUAL PROTOTYPE MAKANI: ANALYZING THE USAGE OF POWER MANAGEMENT TECHNIQUES AND EXTRA-FUNCTIONAL PROPERTIES BY USING VIRTUAL PROTOTYPING
Author:
Sören Schreiner, OFFIS – Institute for Information Technology, DE
Abstract
My Phd work consists of analyzing the correct usage of power management techniques, as well as the analysis of extra-functional properties, including power and timing properties, in MPSoCs. Especially in safety-critical environments the power management gets safety-critical too, since it is able to influence the overall system behavior. To demonstrate my methodologies a mixed-critical multi-rotor system and its corresponding virtual prototype is used. The multi-rotor system's avionics is served by a Xilinx Zynq 7000 MPSoC. The hardware architecture includes ARM and MicroBlaze cores, a NoC for communication and peripherals. The MPSoC processes the flight algorithms with triple modular redundancy and a mission-critical video processing task. The virtual prototype consists of a virtual platform and an environmental model. The virtual platform is equipped with my measuring tool libraries to generate traces of the observed power management techniques and the extra-functional properties.

More information ...
12:30End of session
13:00Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.1 Executive Panel: How Electronics May Change Our Lives, and the World

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Saal 2

Moderator:
Antun Domic, Synopsys, US

Innovation runs strong in our industry. A 17 qubits Quantum Computer (QC) has been shipped to Delft University researchers; an initiative to make cloud QC commercially available for businesses and research has been announced; if, and once available QC may change the landscape of finance, imaging diagnostics, pharmacology, meteorology and, of course, security all the way. After decades of domination by general purpose CPU and GPU, innovation is disrupting computing architectures: Massively parallel Tensor Processing Units (TPU) have demonstrated that a computer can learn from past experience, and then beat a 9-dan human professional Go player, or classify zillions of images with unprecedented accuracy and speed. Autonomous "Things" in which a wide breadth of sensors feed a processor with huge amounts of data, that are analyzed in order to make decisions that are then sent to actuators with minimal if any human supervision are emerging; Advanced Driver-Assistance Systems (ADAS) are the top of the iceberg, exceeding SAE level 3 requirements, ADAS have been adopted by scores of automakers, and all the top 10 automakers have already announced plans toward SAE level 4, and 5 availability before the end of this decade. Finally, computers are digital, but the world is analog; scores of sensors and actuators are the eyes, the ears, the nose, and the arms of the most advanced processors, and the most advanced applications could not exist without them. Today, sensors are designed at the established technology nodes, and often manufactured using electro-mechanical processes which make their integration with their host processors challenging. Is this going to continue, or will they be submerged by digital, and eventually be designed, integrated, and manufactured using the very same emerging technology nodes? Electronics may truly change our lives, and the world, IF we will be able to expand the scope of QC beyond cryptography, and the scope of AI beyond image recognition, IF ADAS technology will not be overwhelmed by legal. IF "More than Moore" will join forces with "More of Moore". IF… This Executive Panel, moderated by Dr. Antun Domic, Synopsys CTO, gathers world experts to discuss the many opportunities that lie ahead, and the challenges to be overcome

Panelists:

  • Loic Lietar, Greenwaves Technologies, FR
  • Martin Roetteler, Microsoft, US
  • Horst Symanzik, Bosch Sensortec, DE
  • Olivier Temam, Google, US
  • Martin Duncan, STMicroelectronics, IT
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.2 Energy Efficient Neural Networks

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 6

Chair:
Hai (Helen) Li, Duke University, US

Co-Chair:
Muhammad Shafique, Vienna University of Technology (TU Wien), AT

This session focuses on energy efficient neural network architectures. The first paper proposes a methodology that enables aggressive voltage scaling of accelerator weight memories to improve the energy-efficiency of DNN accelerators. The second paper introduces methods to optimize the memory usage in DNN training. The third paper presents HyperPower, that enables efficient Bayesian optimization and random search in the context of power- and memory-constrained hyper- parameter optimization for NNs running on a given hardware platform. Finally, the last paper presents a new sparse matrix format to maximize the inference speed of the LSTM accelerator. The session also includes 2 IP papers ReCom and SparseNN, which both focus on energy efficiency of neural networks.

TimeLabelPresentation Title
Authors
11:302.2.1(Best Paper Award Candidate)
MATIC: LEARNING AROUND ERRORS FOR EFFICIENT LOW-VOLTAGE NEURAL NETWORK ACCELERATORS
Speaker:
Sung Kim, University of Washington, US
Authors:
Sung Kim, Patrick Howe, Thierry Moreau, Armin Alaghi, Luis Ceze and Visvesh Sathe, University of Washington, US
Abstract
As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly. However, while accelerators with high performance and efficiency on convolutional deep neural networks (Conv-DNNs) have been developed, less progress has been made with regards to fully-connected DNNs (FC-DNNs). In this paper, we propose MATIC (Memory Adaptive Training with In-situ Canaries), a methodology that enables aggressive voltage scaling of accelerator weight memories to improve the energy-efficiency of DNN accelerators. To enable accurate operation with voltage overscaling, MATIC combines the characteristics of destructive SRAM reads with the error resilience of neural networks in a memory-adaptive training process. Furthermore, PVT-related voltage margins are eliminated using bit-cells from synaptic weights as in-situ canaries to track runtime environmental variation. Demonstrated on a low-power DNN accelerator that we fabricate in 65 nm CMOS, MATIC enables up to 60-80 mV of voltage overscaling (3.3x total energy reduction versus the nominal voltage), or 18.6x application error reduction.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.2.2MAXIMIZING SYSTEM PERFORMANCE BY BALANCING COMPUTATION LOADS IN LSTM ACCELERATORS
Speaker:
Junki Park, POSTECH, KR
Authors:
Junki Park1, Jaeha Kung1, Wooseok Yi2 and Jae-Joon Kim1
1Pohang University of Science and Techology, KR; 2POSTECH CITE, KR
Abstract
The LSTM is a popular neural network model for modeling or analyzing the time-varying data. The main operation of LSTM is a matrix-vector multiplication and it becomes sparse (spMxV) due to the widely-accepted weight pruning in deep learning. This paper presents a new sparse matrix format, named CBSR, to maximize the inference speed of the LSTM accelerator. In the CBSR format, speed-up is achieved by balancing out the computation loads over PEs. Along with the new format, we present a simple network transformation to completely remove the hardware overhead incurred when using the CBSR format. Also, the detailed analysis on the impact of network size or the number of PEs is performed, which lacks in the prior work. The simulation results show 16~38% improvement in the system performance compared to the well-known CSC/CSR format. The power analysis is also performed in 65nm CMOS technology to show 9~22% energy savings.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.2.3MODNN: MEMORY OPTIMAL DNN TRAINING ON GPUS
Speaker:
Xiaoming Chen, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Xiaoming Chen, Danny Z. Chen and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
Graphics processing units (GPUs) are widely adopted to accelerate the training of deep neural networks (DNNs). However, the limited GPU memory size restricts the maximum scale of DNNs that can be trained on GPUs, which presents serious challenges. This paper proposes an moDNN framework to optimize the memory usage in DNN training. moDNN supports automatic tuning of DNN training code to match any given memory budget (not smaller than the theoretical lower bound). By taking full advantage of overlapping computations and data transfers, we have developed heuristics to judiciously schedule data offloading and prefetching, together with training algorithm selection, to optimize the memory usage. We further introduce a new sub-batch size selection method which also greatly reduces the memory usage. moDNN can save the memory usage up to 50X, compared with the ideal case which assumes that the GPU memory is sufficient to hold all data. When executing moDNN on a GPU with 12GB memory, the performance loss is only 8%, which is much lower than that caused by the best known existing approach, vDNN. moDNN is also applicable to multiple GPUs and attains 1.84X average speedup on two GPUs.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:452.2.4HYPERPOWER: POWER- AND MEMORY-CONSTRAINED HYPER-PARAMETER OPTIMIZATION FOR NEURAL NETWORKS
Speaker:
Dimitrios Stamoulis, Carnegie Mellon University, US
Authors:
Dimitrios Stamoulis1, Ermao Cai1, Da-Cheng Juan2 and Diana Marculescu1
1Carnegie Mellon University, US; 2Google Research, US
Abstract
While selecting the hyper-parameters of Neural Networks (NNs) has been so far treated as an art, the emergence of more complex, deeper architectures poses increasingly more challenges to designers and Machine Learning (ML) practitioners, especially when power and memory constraints need to be considered. In this work, we propose HyperPower, a framework that enables efficient Bayesian optimization and random search in the context of power- and memory-constrained hyper-parameter optimization for NNs running on a given hardware platform. HyperPower is the first work (i) to show that power consumption can be used as a low-cost, a priori known constraint, and (ii) to propose predictive models for the power and memory of NNs executing on GPUs. Thanks to HyperPower, the number of function evaluations and the best test error achieved by a constraint-unaware method are reached up to 112.99x and 30.12x faster, respectively, while never considering invalid configurations. HyperPower significantly speeds up the hyper-parameter optimization, achieving up to 57.20x more function evaluations compared to constraint-unaware methods for a given time interval, effectively yielding significant accuracy improvements by up to 67.6%.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00IP1-1, 84RECOM: AN EFFICIENT RESISTIVE ACCELERATOR FOR COMPRESSED DEEP NEURAL NETWORKS
Speaker:
Houxiang Ji, Shanghai Jiao Tong University, CN
Authors:
Houxiang Ji1, Linghao Song2, Li Jiang1, Hai (Helen) Li3 and Yiran Chen2
1Shanghai Jiao Tong University, CN; 2Duke University, US; 3Duke University/TUM-IAS, US
Abstract
Deep Neural Networks (DNNs) play a key role in prevailing machine learning applications. Resistive random-access memory (ReRAM) is capable of both computation and storage, contributing to the acceleration on DNNs process in memory. Besides, DNNs have a significant amount of zero weights, which provides a possibility to reduce computation cost by skipping ineffectual calculations on zero weights. However, the irregular distribution of zero weights in DNNs makes it difficult for resistive accelerators to take advantage of the sparsity, because resistive accelerators have a high reliance on regular matrix-vector multiplication in ReRAM. In this work, we propose ReCom, the first resistive accelerator to support sparse DNN processing. ReCom is an efficient resistive accelerator for compressed deep neural networks, where DNN weights are structurally compressed to eliminate zero parameters and become more friendly to computation in ReRAM, and zero DNN activations are also considered at the same time. Two technologies, Structurally-compressed Weight Oriented Fetching (SWOF) and In-layer Pipeline for Memory and Computation (IPMC),are particularly proposed to efficiently process the compressed DNNs in ReRAM. In our evaluation, ReCom can achieve 3.37x speedup and 2.41x energy efficiency compared to a state-of-the-art resistive accelerator.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01IP1-2, 645SPARSENN: AN ENERGY-EFFICIENT NEURAL NETWORK ACCELERATOR EXPLOITING INPUT AND OUTPUT SPARSITY
Speaker:
Jingyang Zhu, Hong Kong University of Science and Technology, HK
Authors:
Jingyang Zhu, Jingbo Jiang, Xizi Chen and Chi-Ying Tsui, Hong Kong University of Science and Technology, HK
Abstract
Contemporary Deep Neural Network (DNN) contains millions of synaptic connections with tens to hundreds of layers. The large computational complexity poses a challenge to the hardware design. In this work, we leverage the intrinsic activation sparsity of DNN to substantially reduce the execution cycles and the energy consumption. An end-to-end training algorithm is proposed to develop a lightweight (less than 5% overhead) run-time predictor for the output activation sparsity on the fly. Furthermore, an energy-efficient hardware architecture, SparseNN, is proposed to exploit both the input and output sparsity. SparseNN is a scalable architecture with distributed memories and processing elements connected through a dedicated on-chip network. Compared with the state-of-the-art accelerators which only exploit the input sparsity, SparseNN can achieve a 10%-70% improvement in throughput and a power reduction of around 50%.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.3 High-Level Synthesis

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 1

Chair:
Selma Saidi, Hamburg University of Technology, DE

Co-Chair:
Daniel Ziener, University of Twente, NL

This session addresses high-level synthesis for easing the development of application-specific designs. First, user-guided optimizations for high-level synthesis based on innovative resource prediction using CNNs will be discussed for an area-reduction advisor. The second paper proposes a look-ahead scheduling scheme to minimize the area for functional units. The final talk presents a direct HLS synthesis path ofsoftware-customizable floating-point cores that does not rely on external libraries or floating-point code generators.

TimeLabelPresentation Title
Authors
11:302.3.1(Best Paper Award Candidate)
SENSEI: AN AREA-REDUCTION ADVISOR FOR FPGA HIGH-LEVEL SYNTHESIS
Speaker:
Hsuan Hsiao, University of Toronto, CA
Authors:
Hsuan Hsiao and Jason H. Anderson, University of Toronto, CA
Abstract
High-level synthesis (HLS) provides an easy-to-use abstraction for designing hardware circuits. However, standard datatypes in high-level languages are overprovisioned for typical applications, incurring extra area since the underlying FPGA hardware can support arbitrary bitwidths. This area inefficiency can be overcome by enabling the use of arbitrary-width datatypes at the source code level. However, this requires that HLS users spend time and effort on examining all program variables and quantifying their area impact, which can be intractable especially with large, complex programs and time-consuming synthesis. We propose Sensei, an advisor that predicts the post-synthesis area savings brought about by reducing bitwidth and presents users with a ranking of program variables and their area impact. Equipped with a convolutional neural network (CNN)-based predictor, Sensei achieves high area-prediction accuracy and enables rapid exploration of area-saving opportunities.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.3.2A FAST AND EFFECTIVE LOOKAHEAD AND FRACTIONAL SEARCH BASED SCHEDULING ALGORITHM FOR HIGH-LEVEL SYNTHESIS
Speaker:
Shantanu Dutt, University of Illinois at Chicago, US
Authors:
Shantanu Dutt and Ouwen Shi, University of Illinois at Chicago, US
Abstract
We present a latency-constrained iterative list scheduling type algorithm, FALLS, to minimize the total number of functional units (FUs) allocated, and thus the total area, in high-level synthesis designs. The algorithm incorporates a novel lookahead technique to selectively schedule available operations by allocating the needed FUs earlier or reserving available FUs for scheduling more timing-urgent operations later, such that no additional FU is needed and higher FU utilization is obtained. Further, a fractional search framework is developed to iteratively estimate the number of FUs of each function type required in the final design based on the current scheduling solution and FU utilization, and reiterate the lookahead-based list scheduling with the new FU allocation estimate to further increase FU utilization. Extensive experiments conducted over several DFGs and a wide range of latency constraints demonstrate that FALLS is much more effective than other approximate state-of-the-art algorithms in both number of FUs and total FU area, and has a much smaller runtime. Results also show that FALLS has only an average 5.5% optimality gap compared to an optimal integer linear programming (ILP) formulation, but is 278k times faster. FALLS also performs much better in architectural (FU + mux/demux + register) area, interconnect congestion and number of interconnects than approximate algorithms, and is at most 4.0% worse in them than the ILP method.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.3.3HIGH-LEVEL SYNTHESIS OF SOFTWARE-CUSTOMIZABLE FLOATING-POINT CORES
Speaker:
Hsuan Hsiao, University of Toronto, CA
Authors:
Samridhi Bansal1, Hsuan Hsiao1, Tomasz Czajkowski2 and Jason H. Anderson1
1University of Toronto, CA; 2Intel Corp., CA
Abstract
Parameterized cores with fixed capabilities are typically used for floating-point (FP) operations on FPGAs. However, such standard cores can be over provisioned or lack specific specializations as required by applications. We consider FP cores described in the C language, synthesized to hardware using the LegUp high-level synthesis (HLS) tool [1]. Their software specification permits straightforward customization to non-compliant variants having superior area and performance characteristics, such as reduced-precision floating point, or cores without full IEEE 754 exceptions support. We create and evaluate the IEEE 754 FP standard cores for the key operations of addition, subtraction, division and multiplication, targeted to an FPGA and compare with widely used optimized RTL FP cores from Altera [7] and FloPoCo [3]. The software-specified HLSgenerated cores are surprisingly close to the optimized RTL cores in terms of area/performance, and superior in certain cases, such as FP division.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00IP1-3, 476ACCLIB: ACCELERATORS AS LIBRARIES
Speaker:
Jacob R. Stevens, Purdue University, US
Authors:
Jacob Stevens1, Yue Du2, Vivek Kozhikkottu3 and Anand Raghunathan1
1Purdue University, US; 2IBM, US; 3Intel Corporation, US
Abstract
Accelerator-based computing, which has been a mainstay of System-on-Chips (SoCs) is of growing interest to a wider range of computing systems. However, the significant design effort required to identify a computational target for acceleration, design a hardware accelerator, verify the correctness of the accelerator, integrate the accelerator into the system, and rewrite applications to use the accelerator, is a major bottleneck to the widespread adoption of accelerator-based computing. The classical approach to this problem is based on top-down methodologies such as automatic HW/SW partitioning and high-level synthesis (HLS). While HLS has advanced significantly and is seeing increased adoption, it does not leverage the ability of experienced human designers to craft highly optimized RTL, nor does it leverage the growing body of already existing hardware accelerators. In this work, we propose ACCLIB, a design framework that allows software developers to utilize existing libraries of pre-designed hardware accelerators automatically with no prior knowledge of the function of the accelerators, with minimal knowledge of hardware design, and with minimal design effort. To accomplish this, ACCLIB uses formal verification techniques to match a target software function with a functionally equivalent accelerator from a library of accelerators. It also generates the required HW/SW interfaces as well as the code necessary to offload the computation to the accelerator. We validate ACCLIB by applying it to accelerate six different applications using a library of hardware accelerators in just over one hour per application, demonstrating that the proposed approach has the potential to lower the barrier to adoption of accelerator-based computing.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.4 Model Checking

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 2

Chair:
Armin Biere, JKU Linz, AT

Co-Chair:
Daniel Große, University Bremen, DE

The session consists of four papers on model checking. It ranges from model checking of multiple properties, improving bit-level model checking, checking specific properties in processor design to software model checking of the Linux kernel.

TimeLabelPresentation Title
Authors
11:302.4.1(Best Paper Award Candidate)
EFFICIENT VERIFICATION OF MULTI-PROPERTY DESIGNS (THE BENEFIT OF WRONG ASSUMPTIONS)
Speaker:
Eugene Goldberg, Diffblue, US
Authors:
Eugene Goldberg1, Matthias Gudemann2, Daniel Kroening3 and Rajdeep Mukherjee4
1DiffBlue, US; 2DiffBlue, DE; 3DiffBlue, GB; 4Oxford University, GB
Abstract
We consider the problem of efficiently checking a set of safety properties P1,...,Pk of one design. We introduce a new approach called Ja-verification, where Ja stands for ``Just-Assume'' (as opposed to ``assume-guarantee''). In this approach, when proving a property Pi, one assumes that every property Pj for j != i holds. The process of proving properties either results in showing that P1,...,Pk hold without any assumptions or finding a ``debugging set'' of properties. The latter identifies a subset of failed properties that are the first to break. The design behaviors that cause the properties in the debugging set to fail must be fixed first. Importantly, in our approach, there is no need to prove the assumptions used. We describe the theory behind our approach and report experimental results that demonstrate substantial gains in performance, especially in the cases where a small debugging set exists.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.4.2COMBINING PDR AND REVERSE PDR FOR HARDWARE MODEL CHECKING
Speaker:
Tobias Seufert, University of Freiburg, DE
Authors:
Tobias Seufert and Christoph Scholl, University Freiburg, DE
Abstract
In the last few years IC3 resp. PDR attracted a lot of attention as a SAT-based hardware verification approach without needing to unroll the transition relation as in Bounded Model Checking (BMC). Motivated by different strengths of forward and backward traversal already observed in BDD based model checking and by an exponential complexity gap between original PDR and its reverted counterpart 'Reverse PDR' (which starts its analysis with the initial states instead of the unsafe states as in the original PDR), we take a closer look at Reverse PDR and we present a combined forward/backward version of PDR that inherits the advantages of both original and Reverse PDR. Our experimental results on benchmarks from the Hardware Model Checking Competition demonstrate clear benefits of the combined approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.4.3SYMBOLIC QUICK ERROR DETECTION USING SYMBOLIC INITIAL STATE FOR PRE-SILICON VERIFICATION
Speaker:
Mohammad Rahmani Fadiheh, University of Kaiserslautern, DE
Authors:
Mohammad Rahmani Fadiheh1, Joakim Urdahl1, Srinivasa Shashank Nuthakki2, Subhasish Mitra2, Dominik Stoffel1 and Wolfgang Kunz1
1University of Kaiserslautern, DE; 2Stanford University, US
Abstract
Driven by the demand for highly customizable processor cores for IoT and related applications, there is a renewed interest in effective but low-cost techniques for verifying systems-on-chip (SoCs). This paper revisits the problem of processor verification and presents a radically different approach when compared to the state of the art. The proposed approach is highly automated and leverages recent progress in the field of post-silicon validation by the method of Quick Error Detection (QED) and Symbolic Quick Error Detection (SQED). In this paper, we modify SQED by incorporating a symbolic initial state in its BMC-based analysis and generalize the approach into the S2QED method. As a first advantage, S2QED can separate logic bugs from electrical bugs in QED-based post-silicon validation. Secondly, it also makes a strong contribution to pre-silicon verification by proving that the execution of each instruction is independent of its context in the program. The manual efforts for the proposed approach are orders of magnitude smaller than for conventional property checking. Our experimental results demonstrate the potential of S2QED using the Aquarius open-source processor example.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:452.4.4VERIFICATION OF TREE-BASED HIERARCHICAL READ-COPY UPDATE IN THE LINUX KERNEL
Speaker:
Paul McKenney, IBM Linux Technology Center, US
Authors:
Lihao Liang1, Paul McKenney2, Daniel Kroening1 and Tom Melham1
1University of Oxford, GB; 2IBM Linux Technology Center, US
Abstract
Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel synchronization mechanism that runs low-overhead readers concurrently with updaters. Production-quality RCU implementations are decidedly non-trivial and their stringent validation is mandatory. This suggests use of formal verification. Previous formal verification efforts for RCU either focus on simple implementations or use modeling languages. In this paper, we construct a model directly from the source code of Tree RCU in the Linux kernel, and use the CBMC program analyzer to verify its safety and liveness properties. To the best of our knowledge, this is the first verification of a significant part of RCU's source code—an important step towards integration of formal verification into the Linux kernel's regression test suite.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.5 GPU and GPU-based heterogeneous system management

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 3

Chair:
Andrea Marongiu, Università di Bologna, IT

Co-Chair:
Carles Hernandez, BSC, ES

GPUs are at the heart of several modern heterogeneous systems, where the common communication paradigm between the CPU and the GPU is shared memory. The papers in this session propose novel techniques to deal with i) efficient shared memory management and ii) GPU multiprocessor scheduling in presence of process variation. The first paper focuses on GPU-based heterogeneous systems with a shared last-level cache (SLLC) and proposes a novel metric that combines CPU/GPU miss count and "hit utility" to devise an effective cache-way partitioning. The second paper proposes a technique to mitigate the negative effects of cache and memory controller sharing in GPUs running multiple workloads. The third paper discusses a HW technique to mitigate the effects of hardware variability (e.g., process variations (PVs) and negative bias temperature instability (NBTI)) in GPU Streaming Processors (SPs).

TimeLabelPresentation Title
Authors
11:302.5.1(Best Paper Award Candidate)
HVSM: HARDWARE-VARIABILITY AWARE STREAMING PROCESSORS' MANAGEMENT POLICY IN GPUS
Speaker:
Jingweijia Tan, Jilin University, CN
Authors:
Jingweijia Tan1 and Kaige Yan2
1Jilin University, CN; 2College of Communication Engineering, Jilin University, CN
Abstract
GPUs are widely used in general-purpose high performance computing field due to their highly parallel architecture. In recent years, a new era with nanometer scale integrated circuit manufacture process has come, as a consequence, GPUs' computation capability gets even stronger. However, as process technology scales down, hardware variability, e.g., process variations (PVs) and negative bias temperature instability (NBTI), has a higher impact on the chip quality. The parallelism of GPU desires high consistency of hardware units on chip, otherwise, the worst unit will inevitably become the bottleneck. So the hardware variability becomes a pressing concern to further improve GPUs' performance and lifetime, not only in integrated circuit fabrication, but more in GPU architecture design. Streaming Processors (SPs) are the key units in GPUs, which perform most of parallel computing operations. Therefore, in this work, we focus on mitigating the impact of hardware variability in GPU SPs. We first model and analyze SPs' performance variations under hardware variability. Then, we observe that both PV and NBTI have large impact on SP's performance. We further observe unbalanced SP utilization, e.g., some SPs are idle when others are active, during program execution. Leveraging both observations, we propose a Hardware Variability-aware SPs' Management policy (HVSM), which dynamically prioritizes the fast SPs, regroups SPs in a two-level granularity and dispatches computation in appropriate SPs. Our experimental results show HVSM effectively reduces the impact of hardware variability, which can translate to 28% performance improvement or 14.4% lifetime extension for a GPU chip.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.5.2THROUGHPUT OPTIMIZATION AND RESOURCE ALLOCATION ON GPUS UNDER MULTI-APPLICATION EXECUTION
Speaker:
Iraklis Anagnostopoulos, Southern Illinois University Carbondale, US
Authors:
Srinivasa Reddy Punyala, Theodoros Marinakis, Arash Komaee and Iraklis Anagnostopoulos, Southern Illinois University Carbondale, US
Abstract
Platform heterogeneity prevails as a solution to the throughput and computational challenges imposed by parallel applications and technology scaling. Specifically, Graphics Processing Units (GPUs) are based on the Single Instruction Multiple Thread (SIMT) paradigm and they can offer tremendous speed-up for parallel applications. However, GPUs were designed to execute a single application at a time. In case of simultaneous multi-application execution, due to the GPUs' massive multi-threading paradigm, applications compete against each other using destructively the shared resources (caches and memory controllers) resulting in significant throughput degradation. In this paper, a methodology for minimizing interference in shared resources and provide efficient concurrent execution of multiple applications on GPUs is presented. Particularly, the proposed methodology (i) performs application classification; (ii) analyzes the per-class interference; (iii) finds the best matching between classes; and (iv) employs an efficient resource allocation. Experimental results showed that the proposed approach increases the throughput of the system for two concurrent applications by an average of 36% compared to the default execution and 10% compared to an exahustive profile-based optimization technique.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.5.3SET VARIATION-AWARE SHARED LAST-LEVEL CACHE MANAGEMENT FOR CPU-GPU HETEROGENEOUS ARCHITECTURE
Speaker:
Xin Li, Shandong University, CN
Authors:
Zhaoying Li, Lei Ju, Hongjun Dai, Xin Li, Mengying Zhao and Zhiping Jia, Shandong University, CN
Abstract
Heterogeneous CPU-GPU multiprocessor systems-on-chip (HMPSoC) becomes a popular architecture choice for high performance embedded systems, where shared last-level cache (LLC) management becomes a critical design consideration. We observe that within a sampling period, CPU and GPU may have distinct access behaviors over various LLC sets. In this work, we propose a light-weighted and fined-grained cache management policy to cope with the CPU-GPU access behavior variation among cache sets. In particular, CPU and GPU requests are prioritized disparately in each LLC set during cache block insertion and promotion, based on the per-core utility behaviors and a per-set CPU-GPU miss counter. Experimental results show that our LLC management scheme outperforms the two state-of-the-art schemes TAP-RRIP and LSP by 12.6% and 10.01%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00IP1-4, 464HPXA: A HIGHLY PARALLEL XML PARSER
Speaker:
Smruti Sarangi, IIT Delhi, IN
Authors:
Isaar Ahmad, Sanjog Patil and Smruti R. Sarangi, IIT Delhi, IN
Abstract
State of the art XML parsing approaches read an XML file byte by byte, and use complex finite state machines to process each byte. In this paper, we propose a new parser, HPXA, which reads and processes 16 bytes at a time. We designed most of the components ab initio, to ensure that they can process multiple XML tokens and tags in parallel. We propose two basic elements - a sparse 1D array compactor, and a hardware unit called LTMAdder that takes its decisions based on adding the rows of a lower triangular matrix. We demonstrate that we are able to process 16 bytes in parallel with very few pipeline stalls for a suite of widely used XML benchmarks. Moreover, for a 28nm technology node, we can process XML data at 106 Gbps, which is roughly 6.5X faster than competing prior work.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.6 Circuit Locking and Camouflaging

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 4

Chair:
Debdeep Mukhophyadhyay, IIT Kharagpur, IN

Co-Chair:
Yiorgos Makris, UT Dallas, US

Intellectual property piracy, counterfeiting and reverse-engineering are serious threats for the supply chain in advanced microelectronics. This session presents novel approaches to protect circuits against these threats. The techniques deploy nanotechnology and novel timing scheme to obtain efficient protections.

TimeLabelPresentation Title
Authors
11:302.6.1CYCLIC LOCKING AND MEMRISTOR-BASED OBFUSCATION AGAINST CYCSAT AND INSIDE FOUNDRY ATTACKS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Amin Rezaei, Yuanqi Shen, Shuyu Kong, Jie Gu and Hai Zhou, Northwestern University, US
Abstract
The high cost of IC design has made chip protection one of the first priorities of the semiconductor industry. Although there is a common impression that combinational circuits must be designed without any cycles, circuits with cycles can be combinational as well. Such cyclic circuits can be used to reliably lock ICs. Moreover, since memristor is compatible with CMOS structure, it is possible to efficiently obfuscate cyclic circuits using polymorphic memristor-CMOS gates. In this case, the layouts of the circuits with different functionalities look exactly identical, making it impossible even for an inside foundry attacker to distinguish the defined functionality of an IC by looking at its layout. In this paper, we propose a comprehensive chip protection method based on cyclic locking and polymorphic memristor-CMOS obfuscation. The robustness against state-of-the-art key-pruning attacks is demonstrated and the overhead of the polymorphic gates is investigated.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.6.2TIMINGCAMOUFLAGE: IMPROVING CIRCUIT SECURITY AGAINST COUNTERFEITING BY UNCONVENTIONAL TIMING
Speaker:
Li Zhang, Technical University of Munich, DE
Authors:
Grace Li Zhang1, Bing Li2, Bei Yu3, David Z. Pan4 and Ulf Schlichtmann5
1TU München (TUM), DE; 2TU München, DE; 3The Chinese University of Hong Kong, HK; 4University of Texas at Austin, US; 5Technical University of Munich, DE
Abstract
With recent advances in reverse engineering, attackers can reconstruct a netlist to counterfeit chips by opening the die and scanning all layers of original chips. This relatively easy counterfeiting is made possible by the use of the standard simple clocking scheme where all combinational blocks function within one clock period. In this paper, we propose a method to invalidate the assumption that a netlist completely represents the function of a circuit. With the help of wave-pipelining paths, this method forces attackers to capture delay information from manufactured chips, which is a very challenging task because we also introduce false paths. Experimental results confirm that wave-pipelining paths and false paths can be constructed in benchmark circuits successfully with only a negligible cost, while the potential attack techniques can be thwarted.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.6.3ADVANCING HARDWARE SECURITY USING POLYMORPHIC AND STOCHASTIC SPIN-HALL EFFECT DEVICES
Speaker:
Satwik Patnaik, New York University, AE
Authors:
Satwik Patnaik1, Nikhil Rangarajan2, Johann Knechtel3, Ozgur Sinanoglu4 and Shaloo Rakheja2
1NEW YORK UNIVERSITY, US; 2New York University, US; 3New York University Abu Dhabi (NYUAD), AE; 4New York University Abu Dhabi, AE
Abstract
Protecting intellectual property (IP) in electronic circuits has become a serious challenge in recent years. Logic locking/encryption and layout camouflaging are two prominent techniques for IP protection. Most existing approaches, however, particularly those focused on CMOS integration, incur excessive design overheads resulting from their need for additional circuit structures or device-level modifications. This work leverages the innate polymorphism of an emerging spin-based device, called the giant spin-Hall effect (GSHE) switch, to simultaneously enable locking and camouflaging within a single instance. Using the GSHE switch, we propose a powerful primitive that enables cloaking all the 16 Boolean functions possible for two inputs. We conduct a comprehensive study using state-of-the-art Boolean satisfiability (SAT) attacks to demonstrate the superior resilience of the proposed primitive in comparison to several others in the literature. While we tailor the primitive for deterministic computation, it can readily support stochastic computation; we argue that stochastic behavior can break most, if not all, existing SAT attacks. Finally, we discuss the resilience of the primitive against various side-channel attacks as well as invasive monitoring at runtime, which are arguably even more concerning threats than SAT attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.7 Special Session: Spintronics based New Computing Paradigms and Applications

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Konf. 5

Chair:
Zhao Weisheng, Beihang University, CN

Co-Chair:
Tahoori Mehdi, Karlsruhe Institute of Technology, DE

n recent technology nodes, the well-known "moore's law" tends to slow down. Indeed, the continuously decreasing size of the CMOS transistors and operating frequencies result in serious power consumption, heat dissipation and reliability issues. Among the solutions investigated to overcome these limitations, the use of emerging nano-devices mixed (or not) with CMOS circuits is often referred as the « More than Moore » concept. In particular, logic circuits based on non-volatile memories can be an efficient solution to reduce the power, to improve the reliability and can offer new paradigms for computing. We are convinced that this research field has become a hot topic for the DATE community. The aim of this session is to bring together the worldwide leading experts (from respectively USA, Belgium, China and Germany) related to this hot topic to share the most recent results and discuss the future challenges. Different computing paradigms will be involved in this special session benefiting from interesting nature of spintronics devices. The invited speakers will talk about devices, design and compact modeling aspects, and applications, permitting a full development platform from devices to circuit & systems based on spintronics. n recent technology nodes, the well-known "moore's law" tends to slow down. Indeed, the continuously decreasing size of the CMOS transistors and operating frequencies result in serious power consumption, heat dissipation and reliability issues. Among the solutions investigated to overcome these limitations, the use of emerging nano-devices mixed (or not) with CMOS circuits is often referred as the « More than Moore » concept. In particular, logic circuits based on non-volatile memories can be an efficient solution to reduce the power, to improve the reliability and can offer new paradigms for computing. We are convinced that this research field has become a hot topic for the DATE community. The aim of this session is to bring together the worldwide leading experts (from respectively USA, Belgium, China and Germany) related to this hot topic to share the most recent results and discuss the future challenges. Different computing paradigms will be involved in this special session benefiting from interesting nature of spintronics devices. The invited speakers will talk about devices, design and compact modeling aspects, and applications, permitting a full development platform from devices to circuit & systems based on spintronics.

TimeLabelPresentation Title
Authors
11:302.7.1MAIN MEMORY ORGANIZATION TRADE-OFFS WITH DRAM AND STT-MRAM OPTIONS BASED ON EXTENDED GEM5/NVMAIN SIMULATION FRAMEWORK
Speaker:
Manu Komalan, IMEC, BE
Authors:
Manu Komalan1, Oh Hyung Rock1, Matthias Hartmann1, Sushil Sakhare1, Christian Tenllado2, Jose Ignacio Gomez2, Gouri Sankar Kar1, Arnaud Furnemont1, Francky Catthoor1, Sophiane Senni3, David Novo4, Abdoulaye Gamatie5 and Lionel Torres6
1IMEC, BE; 2Universidad Complutense de Madrid (UCM), ES; 3LIRMM, FR; 4French National Centre for Scientific Research (CNRS), FR; 5CNRS LIRMM / University of Montpellier, FR; 6University of Montpellier, FR
Abstract
Current main memory organizations in embedded and mobile application systems are DRAM dominated. The everincreasing gap between today's processor and memory speeds makes the DRAM subsystem design a major aspect of computer system design. However, the limitations to DRAM scaling and other associated challenges like refresh provides some undesired trade-offs between performance, energy and area to be made by architecture designers. Several emerging NVM options are being explored to at least partly remedy this but today it is very hard to assess the viability of these proposals because the simulations are not fully based on realistic assumptions on the NVM memory technologies and on the system architecture level. In this paper, we propose to use realistic, calibrated STTMRAM models and a well calibrated cross-layer simulation and exploration framework, named SEAT, to better consider technologies aspects and architecture constraints. We will focus on general purpose/mobile SoC multi-core architectures. We will highlight results for a number of relevant benchmarks, representatives of numerous applications based on actual system architecture.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:452.7.2EXPLORING THE OPPORTUNITY OF IMPLEMENTING NEUROMORPHIC COMPUTING SYSTEMS WITH SPINTRONIC DEVICES
Speaker:
Hai Li, Duke University, US
Authors:
Bonan Yan1, Fan Chen1, Yaojun Zhang2, Chang Song1, Hai (Helen) Li3 and Yiran Chen1
1Duke University, US; 2University of Pittsburgh, US; 3Duke University/TUM-IAS, US
Abstract
Many cognitive algorithms such as neural networks cannot be efficiently executed by von Neumann architectures, the performance of which is constrained by the memory wall between microprocessor and memory hierarchy. Hence, researchers started to investigate new computing paradigms such as neuromorphic computing that can adapt their structure to the topology of the algorithms and accelerate their executions. New computing units have been also invented to support this effort by leveraging emerging nano-devices. In this work, we will discuss the opportunity of implementing neuromorphic computing systems with spintronic devices. We will also provide insights on how spintronic devices fit into different part of neuromorphic computing systems. Approaches to optimize the circuits are also discussed.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.7.3SPINTRONIC NORMALLY-OFF HETEROGENEOUS SYSTEM-ON-CHIP DESIGN
Speaker:
Rajendra Bishnoi, Karlsruhe Institute of Technology, DE
Authors:
Anteneh Gebregiorgis, Rajendra Bishnoi and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
One of the major challenges in device down-scaling is the increase in the leakage power, which becomes a major component in the overall system power consumption. One way to deal with this problem is to introduce the concept of normally-off instant-on computing architectures, in which the system components are powered off when they are not active. An associated challenge is the back-up and restoration of system states, which in turn can introduce additional costs that erode some of the gains. A promising alternative is the use of non-volatile storage elements in the System-on-Chip (SoC) design which can instantly power-down and retain their values. In this work, we show how we can design a normally-off SoC by exploiting non-volatile latches, flip-flops and registers. The idea is to design a hybrid architecture containing conventional CMOS bistables as well as different flavors of spintronic-based non-volatile storage elements, to balance performance, area, and energy efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:152.7.4MAGNETIC SKYRMIONS FOR FUTURE POTENTIAL MEMORY AND LOGIC APPLICATIONS: ALTERNATIVE INFORMATION CARRIERS
Speaker and Author:
Wang Kang, Beihang University, CN
Abstract
Magnetic skyrmions are swirling topological configurations, which are mostly induced by chiral interactions between atomic spins in non-centrosymmetric magnetic bulks or in thin films with broken inversion symmetry. They hold promise as information carriers in future ultra-dense, low-power memory and logic devices owing to the nanocale size and extremely low spin-polarized currents needed to move them. To date, an intense research effort has led to the identification, creation/annihilation, motion and manipulation of skyrmions at room temperature. Meanwhile, a rich variety of skyrmion-based device concepts and prototypes have been proposed, indicating the considerable potential of magnetic skyrmions in future electronic applications. However, current studies mainly focus on physical or principle investigations, whereas the electrical design methodology, implementation and evaluations are still lacking. In this paper, we will bring the readers in the "design, automation and test (DAT) society" the current status and outlook of skyrmions in relation to future potential racetrack memory and neuromorphic computing applications. Most importantly, we also want to evoke the effort from the DAT society to address the challenges, e.g., all-electrical manipulation of skyrmions at room temperature, for the research and development of practical skyrmion-based electronics.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.7.5NOVEL APPLICATION OF SPINTRONICS IN COMPUTING, SENSING, STORAGE AND CYBERSECURITY
Speaker:
Anirudh Iyengar, Pennsylvania State University, US
Author:
Swaroop Ghosh, Pennsylvania State University, US
Abstract
With conventional Von Neumann computing struggling to match the energy-efficiency of biological systems, there is pressing need to explore alternative computing models. CMOS switches, although universal, fails to offer additional features to meet this end goal. Recent experimental studies have revealed that spintronics possess many promising features that can not only enable non Von Neumann compute models but also high-density storage, sensing of environmental parameters and protection from cybersecurity threats. This talk will provide an in-depth study of spintronics and its relation to these novel aspects from device, circuit and system standpoint.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:452.7.6LARGE SCALE, HIGH DENSITY INTEGRATION OF ALL SPIN LOGIC
Speaker and Author:
Jacques-Olivier Klein, C2N, Univ. Paris-Sud, FR
Abstract
Spintronics brings new features that make it a viable candidate technology to implement non-conventional processing for new computing paradigms in an efficient way. The first milestone of the spintronics roadmap was the fabrication of hybrid systems where the data processing relies mostly on charge-based electronics devices (CMOS), while the memory hierarchy is partially or totally replaced by MRAM. In the next step, spintronics can also be used for data processing, still in conjunction with CMOS. Nevertheless, replacing all the processing by pure spintronic circuits, without any charge current, remains the ultimate objective of spintronics. All spin logic (ASL) paves the way towards that goal, even if some CMOS control circuits are still necessary. However, as ASL does not rely on the same computing principle as CMOS, it is necessary to address some specific issues. Pure spin current propagates in every direction, including backwards in the presence of multiple inputs; and is divided when crossings are encountered. It combines mainly linearly, while logic operations require non-linear binary decisions. Interconnect between logic gates requires directionality from inputs to outputs, and fanout with negligible signal attenuation. In this context, we develop new strategies for ASL modeling and logic design. We propose an architecture and a design strategy based on a high-density array to address the specific issues of directionality, attenuation and linearity. Moreover, the feasibility is supported through the modeling and the simulation of its basic block. This implies modularity to simulate complex circuits, even when they are ahead of today's experimental demonstrations.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

2.8 Enabling ICT Innovations for European SMEs

Date: Tuesday 20 March 2018
Time: 11:30 - 13:00
Location / Room: Exhibition Theatre

Organisers:
Rainer Leupers, RWTH Aachen, DE
Bernd Janson, ZENIT GmbH, DE

Moderator:
Luca Fanucci, University of Pisa, IT

Technology-driven business in Europe fails more often compared to other regions such as the USA and China. Turning results into products is a challenge which starts at the very beginning of an idea for a new technology and obliges researchers to cooperate with business experts and investors. The European Commission started its Smart Anything Everywhere (SAE) Initiative to foster transfer from research to business in the areas of Cyber Physical Systems (CPS) and via the instrument of Digital Innovation Hubs (DIH).
Two SAE Initiative projects will be presented in the workshop session, supported by two speakers from industry presenting their products consisting of programming technology SLX (Silexica) and software for automated driving (BASELABS). The workshop session will introduce the SAE approach and their individual funding schemes for European university-industry cooperation. The technology transfer concept focuses on direct cooperation between universities and SMEs supported by open innovation networks and other stakeholders like investors. The session speakers will demonstrate in a pragmatic way and by use of concrete examples how technology transfer can be initiated and implemented in practice and to overcome the associated pitfalls and use the innovation opportunities. The mix of presentations ensures that both academic and industrial viewpoints and concerns are adequately addressed. The workshop session will hence be of interest to a large audience. Amongst others, the goal is to motivate more stakeholders to engage in international technology transfer.
During the session, SAE representatives will share their experiences and insights as researcher, founder, entrepreneur, investor or consultant.

TimeLabelPresentation Title
Authors
11:302.8.1PRESENTATION OF TETRAMAX
Speaker:
Rainer Leupers, RWTH Aachen, DE
Abstract

TETRAMAX as part of the SAE Initiative started in 2017 and is funded by Horizon 2020. The project supports application experiments between academia and industry (SMEs) related to Internet of Things (IoT) technologies and focusing on customized low energy computing (CLEC).
The project builds on three major activity lines:
(1) Stimulating, organizing, co-funding, and evaluating different types of cross-border Application Experiments, providing "EU added value" via innovative CLEC technologies to first-time users and broad markets in European ICT-related industries,
(2) Building and leveraging a new European CLEC competence center network, offering technology brokerage, one-stop shop assistance and CLEC training to SMEs and mid-caps, and with a clear evolution path towards new regional digital innovation hubs where needed, and
(3) Paving the way towards self-sustainability based on pragmatic and customized long-term business plans
The project impact will be measured based on 50+ performance indicators. The immediate ambition of TETRAMAX within its duration is to support 50+ industry clients and 3rd parties in the entire EU with innovative technologies, leading to an estimated revenue increase of 25 Mio. € based on 50+ new or improved CLEC-based products, 10+ entirely new businesses/SMEs initiated, as well as 30+ new permanent jobs and significant cost and energy savings in product manufacturing. Moreover, in the long term, TETRAMAX will be the trailblazer towards a reinforced, profitable, and sustainable ecosystem infrastructure, providing CLEC competence, services and a continuous innovation stream at European scale, yet with strong regional presence as favoured by SMEs.

11:452.8.2THE FED4SAE PROJECT, ACCELERATING EUROPEAN CPS SOLUTIONS TO MARKET
Speaker:
Isabelle Dor, COMMISSARIAT A L ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES, FR
Abstract

Fed4SAE - accelerating European CPS solutions to market - will boost digitization of European industry by strengthening companies' competitiveness in the CPS market under the Smart Anything Everywhere Initiative. The H2020 project aims at creating a pan-European network of Digital Innovation Hubs (DIH) by leveraging existing regional technology or business ecosystems across complete value chains and multiple competencies. The network within Fed4SAE will enable start-ups, SMEs and mid-cap companies in all sectors to build and create new digital products and services. The project mission also includes innovation management and links these companies to suppliers and investors in order to create innovative CPS solutions and accelerate their development and industrialization.
The Fed4SAE project will fund industrial projects by means of the cascade funding process set by the European Commission and through an adapted, fast-response and agile approach to attract innovative companies. Three open calls haven taken/will take place in 2017 and 2018 in order to support the best projects in accordance with several criteria: innovative solution in terms of technical expertise, mature solution with a good technology readiness level, impacting solution thanks to efficient innovation management.
Companies will be able to use industrial CPS programmable platforms in combination with expertise and know-how from the R&D Advanced Platform according to the application domains. The ultimate goal of each project within Fed4SAE is to provide a complete solution combining hardware and software, available to be tested in the market environment prior to large deployment in the targeted market - this deployment will be supported to enhance the innovation management.

12:002.8.3OPEN INNOVATION BUSINESS BASED ON EFFICIENT NETWORKING
Speaker:
Bernd Janson, ZENIT GmbH, DE
Abstract

Open innovation is based on strong networks between academia and industry. ICT developments depend greatly on open innovation due to short innovation cycles and strong competition. To build an open innovation network which operates in a regional, national and international context was the idea behind the Enterprise Europe Network which started in 2008. The overall aim is to support the competitiveness and innovation capabilities of SMEs in Europe. Today, the Enterprise Europe Network is the largest innovation network in the world. It addresses every need in the whole value chain of the innovation process - from idea to product. Bernd Janson will explain how 600 partners and over 6000 consultants worldwide work together to improve the performance of SMEs in Europe. He also explains the Network's role within Tetramax.

12:152.8.4THE SILEXICA MULTICORE SOLUTION
Speaker:
Juan Eusse, Silexica GmbH, DE
Abstract

Silexica was founded in 2014 as a spin-off from RWTH Aachen University and expanded rapidly to complete an $ 8 million "Series A" round of financing in November 2016. The following year saw Silexica open offices in Silicon Valley and Japan and win multicore solution projects with companies including Denso, Ricoh and Fujitsu.
Silexica continues to work with RWTH Aachen on research projects to develop and enhance SLX, Silexica's programming technology SLX uses state-of-the-art compiler technology and full heterogeneity awareness to support software professionals in the most challenging projects.
Juan Eusse's presentation will talk about the transfer of technology from university to a limited company, forming a strong relationship between both parties and learning from each other.
He will provide an insight into SLX with real industrial examples of how the company and technologies have since developed.

12:302.8.5TRANSFERRING RESEARCH RESULTS TO SAFETY-RELEVANT PRODUCTS: CASE STUDY ON AUTOMATED DRIVING SOFTWARE
Speaker:
Robert Schubert, BASELABS GmbH, DE
Abstract

While the transfer from research to market exploitation is already a challenge in itself, additional tasks arise when it comes to products that are used for safety-critical applications. This case study will focus on the example of BASELABS, a software company which focuses on automated driving. The presentation will highlight the path from research via pre-development towards safety-certified software products and the different business models related to each stage. The objective is to provide insights and best practices helping researchers and entrepreneurs with similar challenges.

12:452.8.6INTENTA GMBH
Speaker:
Basel Fardi, Intenta GmbH, DE
Abstract

Intenta is on the cutting edge of research and development in the fields of image processing, data fusion, and object/person recognition and detection. Intenta's focus is the development and marketing of product lines based on smart sensor technologies.
Founded in 2011, the company has experienced continual growth to over 160 employees in 2017.

13:00End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB02 Session 2

Date: Tuesday 20 March 2018
Time: 12:30 - 15:00
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB02.1ARCHON: AN ARCHITECTURE-OPEN RESOURCE-DRIVEN CROSS-LAYER MODELLING FRAMEWORK
Authors:
Fei Xia1, Ashur Rafiev1, Mohammed Al-Hayanni2, Alexei Iliasov1, Rishad Shafik1, Alexander Romanovsky1 and Alex Yakovlev1
1Newcastle University, GB; 2Newcastle University, UK and University of Technology and HCED, IQ
Abstract
This demonstration showcases a modelling method for large complex computing systems focusing on many-core types and concentrating on the crosslayer aspects. The resource-driven models aim to help system designers reason about, analyse, and ultimately design such systems across all conventional computing and communication layers, from application, operating system, down to the finest hardware details. The framework and tool support the notion of selective abstraction and are suitable for studying such non-functional properties such as performance, reliability and energy consumption.

More information ...
UB02.2GENERATING FULL-CUSTOM SCHEMATICS IN A MIXED-SIGNAL TOP-DOWN DESIGN FLOW
Authors:
Tobias Markus1, Markus Mueller2 and Ulrich Bruening1
1University of Heidelberg, DE; 2Extoll GmbH, DE
Abstract
Design time is one of the precious assets in the cycle of hardware design. The top down methodology has been used in digital designs very successfully and now we also apply it for analog and mixed signal designs. Generating most of the structures automatically saves time and avoids errors. A Top Down Design Flow for Mixed Signal Designs is used which generates the schematic structure from the system RNM representation. Since the structural verilog part of the system level design will automatically generate the schematic structure it is only the functional part which is missing and has to be implemented by the analog designer. Some often used blocks can be used as an entry point to partially generate parts of the design in the schematic and furthermore even parts of the layout. We will demonstrate this design method with an example project.

More information ...
UB02.3OISC MULTICORE STENCIL PROCESSOR: ONE INSTRUCTION-SET COMPUTER-BASED MULTICORE PROCESSOR FOR STENCIL COMPUTING
Authors:
Kaoru Saso, Jing Yuan Zhao and Yuko Hara-Azumi, School of Engineering, Tokyo Institute of Technology, JP
Abstract
Subtract and Branch on NEGative with 4 operands (SUBNEG4) is one of One Instruction-Set Computers that execute only one type of instruction. Thanks to its simplicity, SUBNEG4 has only 1/20x circuit area and 1/10x power consumption against MIPS processor. As SUBNEG4 is Turing-complete, it is suitable for parallel computing by multiple cores, while keeping its low-power feature. Our on-going project is seeking for effective use and deployment of SUBNEG4 cores on embedded systems. Our booth will demonstrate the significant speed-up by a SUBNEG4-based many-core processor against a conventional processor, for stencil computing. Our 64-core processor efficiently handles 2D von-Neumann neighborhood stencils, e.g., wave simulation by Verlet integration and 2D Jacobi iteration, to compute 64 points simultaneously. We show that small many-core processors can be realized even with such large number of cores while achieving good speed-up for heaving computation.

More information ...
UB02.4EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
UB02.5ABSYNTH: A COMPREHENSIVE APPROACH TO FRONT TO BACK ANALOG BLOCK DESIGN AUTOMATION
Authors:
Abhaya Chandra Kammara S.1, Sidney Pontes-Filho2 and Andreas König2
1ISE, TU Kaiserslautern, DE; 2University of Kaiserslautern, DE
Abstract
ABSYNTH was first presented in CEBIT 2014 where complete, practical circuit sizing approaches have been shown using meta-heuristics on trusted simulators. This tool was then proven by its use in design of several cells in a research project. Here, we present the extension to our nested optimization approach that creates a symmetric and well matched layout in every step for every instance in the population of the swarm, that is extracted in our flow to provide feedback to the cost function impacting on the population update for more viable and robust circuits. The layout optimization presented in this DEMO works with Cadence Layout design tools. Our initial focus is, motivated by Industry 4.0, IoT, on cells for signal conditioning electronics with reconfigurability and Self-X features.[1] Abhaya C. Kammara, L.Palanichamy, and A. König, "Multi-Objective optimization and visualization for analog automation", Complex. Intell. Syst, Springer, DOI 10.1007/s40747-016-0027-3, 2016

More information ...
UB02.6TTOOL/OMC: OPTIMIZED COMPILATION OF EXECUTABLE UML/SYSML DIAGRAMS FOR THE DESIGN OF DATA-FLOW APPLICATIONS
Authors:
Andrea Enrici1, Julien Lallet1, Renaud Pacalet2 and Ludovic Apvrille2
1Nokia Bell Labs, FR; 2Télécom ParisTech, FR
Abstract
Future 5G networks are expected to increase data rates by a factor of 10x. To meet this requirement, baseband stations will be equipped with both programmable (e.g., CPUs, DSPs) and reconfigurable components (e.g., FPGAs). Efficiently programming these architectures is not trivial due to the inner complexity and interactions of these two types of components. This raises the need for unified design flows capable of rapidly partitioning and programming these mixed architectures. Our demonstration will show the complete system-level design and Design Space Exploration, based on UML/SysML diagrams, of a 5G data-link layer receiver, that is partitioned onto both programmable and reconfigurable hardware. We realize an implementation of such a UML/SysML design by compiling it into an executable C application whose memory footprint is optimized with respect to a given scheduling. We will validate the effectiveness of our solution by comparing automated vs manual designs.

More information ...
UB02.7PRIME: PLATFORM- AND APPLICATION-AGNOSTIC RUN-TIME POWER MANAGEMENT OF HETEROGENEOUS EMBEDDED SYSTEMS
Authors:
Domenico Balsamo, Graeme M. Bragg, Charles Leech and Geoff V. Merrett, University of Southampton, GB
Abstract
Increasing energy efficiency and reliability at runtime is a key challenge of heterogeneous many-core systems. We demonstrate how contributions from the PRiME project integrate to enable application- and platform-agnostic runtime management that respects application performance targets. We consider opportunities to enable runtime management across the system stack and we enable cross-layer interactions to trade-off power and reliability with performance and accuracy. We consider a system as three distinct layers, with abstracted communication between them, which enables the direct comparison of different approaches, without requiring specific application or platform knowledge. Application-agnostic runtime management is demonstrated with a selection of runtime managers from PRiME, including linear regression modelling and predictive thermal management, operating across multiple applications. Platform-independent runtime management is demonstrated using two heterogeneous platforms.

More information ...
UB02.8RISC-V PROCESSOR MODELING IN IP-XACT USING KACTUS2
Authors:
Esko Pekkarinen and Timo Hämäläinen, Tampere University of Technology, FI
Abstract
The complexity of modern embedded system design is managed by advanced, high-level design methodologies such as IP-XACT. However, integrating IP-XACT as a part of an existing design flow and packaging legacy sources is too often inhibited by the inherit differences between IP-XACT and the traditional hardware description languages. In this work, we take an existing Verilog implementation of a RISC-V microprocessor and package it with our open-source IP-XACT tool Kactus2. The resulting IP-XACT description will be publicly available and based on the modeling experience we report the observed pitfalls in the transition from HDL to IP-XACT.

More information ...
UB02.9EMBEDDED ACCELERATION OF IMAGE CLASSIFICATION APPLICATIONS FOR STEREO VISION SYSTEMS
Authors:
Mohammad Loni1, Carl Ahlberg2, Masoud Daneshtalab2, Mikael Ekström2 and Mikael Sjödin2
1MDH, SE; 2Mälardalen University, SE
Abstract
Autonomous systems are used in a broad range of applications from indoor utensils to medical application. Stereo vision cameras probably are the most flexible sensing way in these systems since they can extract depth, luminance, color, and shape information. However, stereo vision based applications suffer from huge image sizes, computational complexity and high energy consumption. To tackle these challenges, we first developed GIMME2 [1], a high-throughput, and cost efficient FPGA-based stereo vision system. In the next step, we present a novel approximation accelerator which is also compatible with GIMME2. Our accelerator tries to map neural network (NN) based image classification algorithms to FPGA by using DeepMaker which is an evolutionary based module embed in our accelerator that regenerates a near-optimal NN in term of accuracy. Then, the back-end side of DeepMaker maps the generated NN to FPGA. We will demo a GIMME2-based accelerator for image classification applications.

More information ...
UB02.10CLAVA-MARGOT: CLAVA + MARGOT = C/C++ TO C/C++ COMPILER AND RUNTIME AUTOTUNING FRAMEWORK
Authors:
João Bispo1, Davide Gadioli2, Pedro Pinto1, Emanuele Vitali2, Hamid Arabnejad1, Gianluca Palermo2, Cristina Silvano2, Jorge G. Barbosa1 and João M. P Cardoso1
1Porto University, PT; 2Politecnico di Milano (POLIMI), IT
Abstract
Current computing platforms consist of heterogeneous architectures. To efficiently target those platforms, compilers can be extend with code transformations and insertion of code to interface to runtime autotuning schemes, which tune application parameters according to: the actual execution, target architecture, and workload. We present an approach consisting of a C/C++ source-to-source compiler (Clava) and an autotuner (mARGOt ). They are part of the toolflow of the FET-HPC ANTAREX project and allow parallelization, multiversioning and code transformations in the context of runtime autotuning. mARGOt is an autotuner that allows application adaptation to changing conditions and goals. Clava is a source-to-source compiler to transform C/C++ programs, including code instrumentation and integration with components such as mARGOt. We will demonstrate how to use Clava to integrate the mARGOT autotuner in an example application, and several mARGOt functionalities exposed through a Clava API.

More information ...
15:00End of session
16:00Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.1 Executive Session: Design Automation for Quantum Computing

Date: Tuesday 20 March 2018
Time: 14:30 - 16:00
Location / Room: Saal 2

Chair:
Charbon Edoardo, TU Delft / EPFL, NL

Co-Chair:
Große Daniel, University of Bremen, DE

Recent developments in quantum hardware indicate that systems featuring more than 50 physical qubits are within reach. At this scale, classical simulation will no longer be feasible and there is a possibility that such quantum devices may outperform even classical supercomputers at certain tasks. With the rapid growth of qubit numbers and coherence times comes the increasingly difficult challenge of quantum program compilation. This entails the translation of a high-level description of a quantum algorithm to hardware-specific low-level operations which can be carried out by the quantum device. Some parts of the calculation may still be performed manually due to the lack of efficient methods. This, in turn, may lead to a design gap, which will prevent the programming of a quantum computer. In this session, we discuss the challenges in fully-automatic quantum compilation. We motivate directions for future research to tackle these challenges. Yet, with the algorithms and approaches that exist today, we demonstrate how to automatically perform the quantum programming flow from algorithm to a physical quantum computer for a simple algorithmic benchmark, namely the hidden shift problem. We present and use two tool flows which invoke RevKit. One which is based on ProjectQ and which targets the IBM Quantum Experience or a local simulator, and one which is based on Microsoft's quantum programming language Q#.

TimeLabelPresentation Title
Authors
14:303.1.1QUANTUM ALGORITHMS: THE QUEST FOR SCALABLE PROGRAMMING, SYNTHESIS, AND TEST
Author:
Martin Roetteler, Microsoft, US
Abstract
tbd
15:003.1.2PROJECTQ: A SOFTWARE FRAMEWORK FOR PROGRAMMING QUANTUM COMPUTERS
Author:
Thomas Haener, ETHZ, CH
Abstract
tbd
15:303.1.3REVKIT: AUTOMATIC COMPILATION AND DESIGN SPACE EXPLORATION FOR QUANTUM PROGRAMS
Author:
Mathias Soeken, Integrated System Laboratory – EPFL, CH
Abstract
tbd
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.2 Approximate and Near-Threshold Computing

Date: Tuesday 20 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 6

Chair:
Semeen Rehman, Vienna University of Technology (TU Wien), AT

Co-Chair:
Saibal Mukhopadhyay, Georgia Tech., US

This session focuses on approximate and near-threshold computing. The first paper proposes a novel dynamic virtual machine (VM) allocation method, while guaranteeing quality of service (QoS) requirements. The second paper introduces and presents an adaptive simulation methodology in which neurons in the region of interest (ROI) follow highly accurate biological models while the other neurons follow computation-friendly models. Finally the last paper shows an approximate computing technique to perform approximate computing with memory, avoiding redundant computation when encountering similar input patterns. The session also includes one IP paper on approximate big data computing.

TimeLabelPresentation Title
Authors
14:303.2.1ENERGY PROPORTIONALITY IN NEAR-THRESHOLD COMPUTING SERVERS AND CLOUD DATA CENTERS: CONSOLIDATING OR NOT?
Speaker:
Ali Pahlevan, Embedded Systems Lab (ESL), EPFL, CH
Authors:
Ali Pahlevan1, Yasir Mahmood Qureshi1, Marina Zapater1, Andrea Bartolini2, Davide Rossi3, Luca Benini2 and David Atienza1
1Embedded Systems Lab (ESL), EPFL, CH; 2Integrated System Laboratory ETH, Zurich, CH; 3Energy Efficient Embedded Systems (EEES) Lab – DEI, University of Bologna, IT
Abstract
Cloud Computing aims to efficiently tackle the increasing demand of computing resources, and its popularity has led to a dramatic increase in the number of computing servers and data centers worldwide. However, as effect of post-Dennard scaling, computing servers have become power-limited, and new system-level approaches must be used to improve their energy efficiency. This paper first presents an accurate power modelling characterization for a new server architecture based on the FD-SOI process technology for near-threshold computing (NTC). Then, we explore the existing energy vs. performance trade-offs when virtualized applications with different CPU utilization and memory footprint characteristics are executed. Finally, based on this analysis, we propose a novel dynamic virtual machine (VM) allocation method that exploits the knowledge of VMs characteristics together with our accurate server power model for next-generation NTC-based data centers, while guaranteeing quality of service (QoS) requirements. Our results demonstrate the inefficiency of current workload consolidation techniques for new NTC-based data center designs, and how our proposed method provides up to 45% energy savings when compared to state-of-the-art consolidation-based approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.2.2LOOKUP TABLE ALLOCATION FOR APPROXIMATE COMPUTING WITH MEMORY UNDER QUALITY CONSTRAINTS
Speaker:
Ye Tian, The Chinese University of Hong Kong, HK
Authors:
Ye Tian, Qian Zhang, Ting Wang and Qiang Xu, The Chinese University of Hong Kong, HK
Abstract
Computation kernels in emerging recognition, mining, and synthesis (RMS) applications are inherently error-resilient, where approximate computing can be applied to improve their energy efficiency by trading off computational effort and output quality. One promising approximate computing technique is to perform approximate computing with memory, which stores a subset of function responses in a lookup table (LUT), and avoids redundant computation when encountering similar input patterns. Limited by the memory space, most existing solutions simply store values for those frequently-appeared input patterns, without considering output quality and/or intrinsic characteristic of the target kernel. In this paper, we propose a novel LUT allocation technique for approximate computing with memory, which is able to dramatically improve the hit rate of LUT and hence achieves significant energy savings under given quality constraints. We also present how to apply the proposed LUT allocation solution for multiple computation kernels. Experimental results show the efficacy of our proposed methodology.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:303.2.3ACCELERATING BIOPHYSICAL NEURAL NETWORK SIMULATION WITH REGION OF INTEREST BASED APPROXIMATION
Speaker:
Yun Long, Georgia Institute of Technology, US
Authors:
Yun Long, Xueyuan She and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
Modeling the dynamics of biophysical neural network (BNN) is essential to understand brain operation and design cognitive systems. Large-scale and biophysically plausible BNN modeling requires solving multiple-terms, coupled and non-linear differential equations, making simulation computationally complex and memory intensive. This paper presents an adaptive simulation methodology in which neurons in the region of interest (ROI) follow high biological accurate models while the other neurons follow computation friendly models. To enable ROI based approximation, we propose a generic template based computing algorithm which unifies the data structure and computing flow for various neuron models. We implement the algorithms on CPU, GPU and embedded platforms, showing 11x speedup with insignificant loss of biological details in the region of interest.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP1-5, 491QOR-AWARE POWER CAPPING FOR APPROXIMATE BIG DATA PROCESSING
Speaker:
Sherief Reda, Brown University, US
Authors:
Seyed Morteza Nabavinejad1, Xin Zhan2, Reza Azimi2, Maziar Goudarzi1 and Sherief Reda2
1Sharif University of Technology, IR; 2Brown University, US
Abstract
To limit the peak power consumption of a cluster, a centralized power capping system typically assigns power caps to the individual servers, which are then enforced using local capping controllers. Consequently, the performance and throughput of the servers are affected, and the runtime of jobs is extended as a result. We observe that servers in big data processing clusters often execute big data applications that have different tolerance for approximate results. To mitigate the impact of power capping, we propose a new power-Capping aware resource manager for Approximate Big data processing (CAB) that takes into consideration the minimum Quality-of-Result (QoR) of the jobs. We use industry standard feedback power capping controllers to enforce a power cap quickly, while, simultaneously modifying the resource allocations to various jobs based on their progress rate, target minimum QoR, and the power cap such that the impact of capping on runtime is minimized. Based on the applied cap and the progress rates of jobs, CAB dynamically allocates the computing resources (i.e., number of cores and memory) to the jobs to mitigate the impact of capping on the finish time. We implement CAB in Hadoop-2.7.3 and evaluate its improvement over other methods on a state-of-the-art 28-core Xeon server. We demonstrate that CAB minimizes the impact of power capping on runtime by up to 39.4% while meeting the minimum QoR constraints.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.3 Optimization Techniques for MPSoCs

Date: Tuesday 20 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 1

Chair:
Stefan Wildermann, FAU Erlangen-Nuremberg, DE

Co-Chair:
Michael Glass, Ulm University, DE

This session presents innovative techniques for optimizing several aspects in multi-processor/core system design. The first paper proposes an effective and efficient design space exploration technique for designing domain-specific platforms. The second paper proposes a novel task allocation and scheduling scheme to maximize soft-error reliability while satisfying lifetime reliability constraints for soft-real-time MPSoCs. Third paper proposes a virtual resource manager that monitors the access behavior and predicts the node-to-node interconnect performance.

TimeLabelPresentation Title
Authors
14:303.3.1DS-DSE: DOMAIN-SPECIFIC DESIGN SPACE EXPLORATION FOR STREAMING APPLICATIONS
Speaker:
Jinghan Zhang, Northeastern University, US
Authors:
Jinghan Zhang, Hamed Tabkhi and Gunar Schirner, Northeastern University, US
Abstract
Domain-specific computing is promising for high-performance low-power execution of applications with similar functionality. In particular, streaming applications with significant functional and structural similarities can tremendously benefit. However, current Design Space Exploration (DSE) focuses on individual applications in isolation. Hence, much of the domain optimization opportunities are missed. DSE methodologies need to broaden the scope from individual applications in isolation to optimizing across applications within a domain. This paper introduces a novel Domain-Specific DSE (DS-DSE) approach for domain-specific computing with a focus on streaming applications. Key contributions are: (1) a formalized method to extract the functional and structural similarities of domain applications, (2) a novel algorithm for hardware/software partitioning of a domain-specific platform to maximize the throughput across domain applications (under certain constraints) and (3) a methodology to evaluate a domain platform. This paper demonstrates the benefits using 4 domains: OpenVX (vision processing), and 3 synthetic domains (with greater complexity). Our experiments demonstrate a performance improvement (average throughput) of 36.8% for OpenVX and 46.2% for synthetic domains of the DS-DSE generated platform compared to an application-specific platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.3.2VARIATION-AWARE TASK ALLOCATION AND SCHEDULING FOR IMPROVING RELIABILITY OF REAL-TIME MPSOCS
Speaker:
Junlong Zhou, Nanjing University of Science and Technology, CN
Authors:
Junlong Zhou1, Tongquan Wei2, Mingsong Chen2, Xiaobo Sharon Hu3, Yue Ma3, Gongxuan Zhang1 and Jianming Yan4
1Nanjing University of Science and Technology, CN; 2East China Normal University, CN; 3University of Notre Dame, US; 4Meituan.com Corporation, CN
Abstract
Both soft-error reliability (SER) due to transient faults and lifetime reliability (LTR) due to permanent faults are key concerns in real-time MPSoCs. Existing works have investigated related problems, however, most of them only focus on one of the two reliability concerns. A few efforts do consider both types of reliability together, but ignore the impacts of hardware- and application-level variations on reliability, thus are not applicable to state-of-the-art MPSoCs under variations. In this paper, we focus on increasing SER without sacrificing LTR since transient faults occur much more frequently than permanent faults. Specifically, we propose a novel task allocation and scheduling scheme to maximize SER while satisfying a LTR constraint for soft real-time MPSoCs. Considering that SER is the objective while LTR is a constraint in our problem, and LTR is highly related to core temperature profiles, we dedicate to investigating the effects of variations in core soft-error rate, task vulnerability to soft errors, and task execution time on SER. To the best of our knowledge, our work is the first attempt that jointly handles the two reliability issues as well as taking into account the effects of variations on reliability. Experimental results show that our scheme improves the SER by up to 66% as compared to a number of representative existing approaches while meeting the same LTR constraint.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:303.3.3TOPOLOGY-AWARE VIRTUAL RESOURCE MANAGEMENT FOR HETEROGENEOUS MULTICORE SYSTEMS
Speaker:
Jianmin Qian, Shanghai Jiaotong University, CN
Authors:
Jianmin Qian, Jian Li and Ruhui Ma, Shanghai Jiao Tong University, CN
Abstract
Virtualization technology consolidates multiple independent workloads on a single physical server with virtual resource management, which can result in a significant utilization improvement and energy saving. However, the management of virtual resource is becoming more and more challenging, due to the lack of accurate performance prediction model for the diverse applications' irregular resource access behaviors as well as the complicated Non-Uniform Memory Access (NUMA) server architecture. These challenges drastically affect the overall consolidation performance. This paper proposes vTRMS, a runtime Topology-aware virtual Resource Management Scheme for heterogeneous NUMA multicore systems. vTRMS can improve application performance based on the comprehensive online monitor of the application resource access behaviors as well as an accurate and platform-independent detected NUMA topology metric. Experiment results show that, compared with state-of-art approach, vTRMS can bring up an average throughput improvement of 28.3% and 36.2% on Intel and AMD NUMA machines respectively, when consolidating 32-VMs. At the same time, vTRMS only incurs a runtime overhead no more than 5%.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP1-6, 261EXACT MULTI-OBJECTIVE DESIGN SPACE EXPLORATION USING ASPMT
Speaker:
Kai Neubauer, University of Rostock, DE
Authors:
Kai Neubauer1, Philipp Wanko2, Torsten Schaub2 and Christian Haubelt1
1University of Rostock, DE; 2University of Potsdam, DE
Abstract
An efficient Design Space Exploration (DSE) is imperative for the design of modern, highly complex embedded systems in order to steer the development towards optimal design points. The early evaluation of design decisions at system-level abstraction layer helps to find promising regions for subsequent development steps in lower abstraction levels by diminishing the complexity of the search problem. In recent works, symbolic techniques, especially Answer Set Programming (ASP) modulo Theories (ASPmT), have been shown to find feasible solutions of highly complex system-level synthesis problems with non-linear constraints very efficiently. In this paper, we present a novel approach to a holistic system-level DSE based on ASPmT. To this end, we include additional background theories that concurrently guarantee compliance with hard constraints and perform the simultaneous optimization of several design objectives. We implement and compare our approach with a state-of-the-art preference handling framework for ASP. Experimental results indicate that our proposed method produces better solutions with respect to both diversity and convergence to the true Pareto front.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.4 Optimizing Computing with Neuromorphic Architectures and Accelerators

Date: Tuesday 20 March 2018
Time: 14:30 - 15:30
Location / Room: Konf. 2

Chair:
Dimitrios Soudris, NTUA, GR

Co-Chair:
Ioana Vatajelu, University of Grenoble–Alpes, TIMA Laboratory, FR

Creating performance and power efficient acceleration techniques is a major challenge. In this session, various approaches are presented toward this direction for neural network applications and GPUs. A wide range of optimization techniques are discussed, including application-level optimizations, system-level solutions, matrix optimizations, and accuracy vs. computations trade-offs.

TimeLabelPresentation Title
Authors
14:303.4.1STRUCTURE OPTIMIZATIONS OF NEUROMORPHIC COMPUTING ARCHITECTURES FOR DEEP NEURAL NETWORKS
Speaker:
Heechun Park, SNUCAD, KR
Authors:
Heechun Park and Taewhan Kim, Seoul National University, KR
Abstract
This work addresses a new structure optimization of neuromorphic computing architectures. This enables to speed up the DNN (deep neural network) computation twice as fast as, theoretically, that of the existing architectures. Precisely, we propose a new structural technique of mixing both of the dendritic and axonal based neuromorphic cores in a way to totally eliminate the inherent non-zero waiting time between cores in the DNN implementation. In addition, in conjunction with the new architecture we propose a technique of maximally utilizing computation units so that the resource overhead of total computation units can be minimized. We have provided a set of experimental data to demonstrate the effectiveness (i.e., speed and area) of our proposed architectural optimizations: ~2x speedup with no accuracy penalty on the neuromorphic computation or improved accuracy with no additional computation time.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.4.2CCR: A CONCISE CONVOLUTION RULE FOR SPARSE NEURAL NETWORK ACCELERATORS
Speaker:
Jiajun Li, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Jiajun Li, Guihai Yan, Wenyan Lu, Shuhao Jiang, Shijun Gong, Jingya Wu and Xiaowei Li, Institute of Computing Technology, Chinese Academy of Sciences, CN
Abstract
Convolutional Neural networks (CNNs) have achieved great success in a broad range of applications. As CNN-based methods are often both computation and memory intensive, sparse CNNs have emerged as an effective solution to reduce the amount of computation and memory accesses while maintaining the high accuracy. However, dense CNN accelerators can hardly benefit from the reduction of computations and memory accesses due to the lack of support for irregular and sparse models. This paper proposed a concise convolution rule (CCR) to diminish the gap between sparse CNNs and dense CNN accelerators. CCR transforms a sparse convolution into multiple effective and ineffective ones. The ineffective convolutions in which either the neurons or synapses are all zeros do not contribute to the final results and the computations and memory accesses can be eliminated. The effective convolutions in which both the neurons and synapses are dense can be easily mapped to the existing dense CNN accelerators. Unlike prior approaches which trade complexity for flexibility, CCR advocates a novel approach to reaping the benefits from the reduction of computation and memory accesses as well as the acceleration of the existing dense architectures without intrusive PE modifications. As a case study, we implemented a sparse CNN accelerator, SparseK, following the rationale of CCR. The experiments show that SparseK achieved a speedup of $2.9imes$ on VGG16 compared to a comparably provisioned dense architecture. Compared with state-of-the-art sparse accelerators, SparseK can improve the performance and energy efficiency by 1.8x and 1.5x, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP1-7, 273HIPE: HMC INSTRUCTION PREDICATION EXTENSION APPLIED ON DATABASE PROCESSING
Speaker:
Diego Tomé, Centrum Wiskunde & Informatica (CWI), BR
Authors:
Diego Gomes Tomé1, Paulo Cesar Santos2, Luigi Carro2, Eduardo Cunha de Almeida3 and Marco Antonio Zanata Alves3
1Federal University of Paraná, BR; 2UFRGS, BR; 3UFPR, BR
Abstract
The recent Hybrid Memory Cube (HMC) is a smart memory which includes functional units inside one logic layer of the 3D stacked memory design. In order to execute instructions inside the Hybrid Memory Cube (HMC), the processor needs to send instructions to be executed near data, keeping most of the pipeline complexity inside the processor. Thus, control-flow and data-flow dependencies are all managed inside the processor, in such way that only update instructions are supported by the HMC. In order to solve data-flow dependencies inside the memory, previous work proposed HMC Instruction Vector Extensions(HIVE), which embeds a high number of functional units with an interlock register bank. In this work, we propose HMC Instruction Prediction Extensions (HIPE), that supports predicated execution inside the memory, in order to transform control-flow dependencies into data-flow dependencies. Our mechanism focuses on removing the high latency iteration between the processor and the smart memory during the execution of branches that depends on data processed inside the memory. In this paper, we evaluate a balanced design of HIVE comparing to x86 and HMC executions.After we show the HIPE mechanism results when executing a database workload, which is a strong candidate to use smart memories. We show interesting trade-offs of performance when comparing our mechanism to previous work.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
16:00Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.5 Memory Reliability

Date: Tuesday 20 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 3

Chair:
Jose Pineda, NXP, NL

Co-Chair:
Mehdi Tahoori, Karlsruhe Institute of Technology, DE

This session discusses reliability issues for different on-chip and off-chip memory technologies. The first paper uses important sampling to reduce the number of Monte Carlo simulations to obtain failure rates for advanced SRAM memories. The second paper performs degradation analysis for FinFET memories. The third paper discusses reliability issues for solid state memories.

TimeLabelPresentation Title
Authors
14:303.5.1GRADIENT IMPORTANCE SAMPLING: AN EFFICIENT STATISTICAL EXTRACTION METHODOLOGY OF HIGH-SIGMA SRAM DYNAMIC CHARACTERISTICS
Speaker:
Thomas Haine, Université catholique de Louvain, BE
Authors:
Thomas Haine1, Johan Segers2, Denis Flandre2 and David Bol2
1université catholique de louvain, BE; 2Université catholique de Louvain, BE
Abstract
The impact of within-die transistor variability has increased with CMOS technology scaling up to the point where it has emerged as a systematic problem for the designer. Estimating extremely low failure rate, i.e. "high-sigma" probabilities, by the conventional Monte Carlo (MC) approach requires millions of simulation runs, making it an impractical approach for circuit designers. To overcome this problem, alternative failure estimation methodologies, which require a smaller number of runs have been proposed. In this paper, we propose a novel methodology called "gradient importance sampling" (GIS) for fast statistical extraction of high-sigma circuit characteristics. It is based on conventional Importance Sampling combined with a gradient-based approach to find the most probable failure point (MPFP). By applying GIS to extract SRAM dynamic characteristics in 28nm FDSOI CMOS, we show that the proposed methodology is straightforward, computationally efficient and the results are in line with those obtained via standard MC. To the best of our knowledge, the GIS results are the best in their class for low failure rate estimation.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.5.2DEGRADATION ANALYSIS OF HIGH PERFORMANCE 14NM FINFET SRAM
Speaker:
Daniel Kraak, Delft University of Technology, NL
Authors:
Daniel Kraak1, Innocent Agbo1, Mottaqiallah Taouil1, Said Hamdioui1, Pieter Weckx2, Stefan Cosemans2 and Francky Catthoor2
1Delft University of Technology, NL; 2imec vzw., BE
Abstract
Memory designs usually add design margins to compensate for chip aging; this may lead to yield and performance loss (in case of overestimation) or reduced reliability (in case of underestimation). This paper analyzes the impact of aging on cutting edge high performance 14nm FinFET SRAM using a calibrated aging model; it does not only analyze the impact of the SRAM's components individually, as it is the case in prior work, but it also investigates the contribution of the interaction of these components while considering different workloads; both the overall metric of the memory (i.e., the access time) as well as metrics of individual components (e.g., sensing delay for the sense amplifier) are examined. The results show that it is crucial to consider not only the aging of all individual components, but also their interaction in order to provide accurate prediction of aging effects; considering only aging of single/individual components leads to either too optimistic or pessimistic results. For example, using our approach (which includes the components interaction) results approximately in 9.1% degradation of memory access time (for three years of aging), while using the traditional approach (based on adding the impact of individual components) results in 7.3% increase only; a relative difference of 25%, for which the timing and the address decoder components are the main contributors. With respect to individual components, the sense amplifier is the most fragile one (e.g., its offset voltage spec. degrades up to 58%.)

Download Paper (PDF; Only available from the DATE venue WiFi)
15:303.5.3INVESTIGATING POWER OUTAGE EFFECTS ON RELIABILITY OF SOLID-STATE DRIVES
Speaker:
Hossein Asadi, Sharif University of Technology, IR
Authors:
Saba Ahmadian, Farhad Taheri, Mehrshad Lotfi, Maryam Karimi and Hossein Asadi, Sharif University of Technology, IR
Abstract
Solid-State Drives (SSDs) are recently employed in enterprise servers and high-end storage systems in order to enhance performance of storage subsystem. Although employing high speed SSDs in the storage subsystems can significantly improve system performance, it comes with significant reliability threat for write operations upon power failures. In this paper, we present a comprehensive analysis investigating the impact of workload dependent parameters on the reliability of SSDs under power failure for variety of SSDs (from top manufacturers). To this end, we first develop a platform to perform two important features required for study: a) a realistic fault injection into the SSD in the computing systems and b) data loss detection mechanism on the SSD upon power failure. In the proposed physical fault injection platform, SSDs experience a real discharge phase of Power Supply Unit (PSU) that occurs during power failure in data centers which was neglected in previous studies. The impact of workload dependent parameters such as workload Working Set Size (WSS), request size, request type, access pattern, and sequence of accesses on the failure of SSDs is carefully studied in the presence of realistic power failures. Experimental results over thousands number of fault injections show that data loss occurs even after completion of the request (up to 700ms) where the failure rate is influenced by the type, size, access pattern, and sequence of IO accesses while other parameters such as workload WSS has no impact on the failure of SSDs.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP1-8, 241PARAMETRIC FAILURE MODELING AND YIELD ANALYSIS FOR STT-MRAM
Speaker:
Sarath Mohanachandran Nair, Karlsruhe Institute of Technology, DE
Authors:
Sarath Mohanachandran Nair, Rajendra Bishnoi and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
The emerging Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising candidate to replace conventional on-chip memory technologies due to its advantages such as non-volatility, high density, scalability and unlimited endurance. However, as the technology scales, yield loss due to extreme parametric variations is becoming a major challenge for STT-MRAM because of its higher sensitivity to process variations as compared to CMOS memories. In addition, the parametric variations in STT-MRAM exacerbates its stochastic switching behavior, leading to both test time fails and reliability failures in the field. Since an STT-MRAM memory array consists of both CMOS and magnetic components, it is important to consider variations in both these components to obtain the failures at the system level. In this work, we model the parametric failures of STT-MRAM at the system level considering the correlation among bit-cells as well as the impact of peripheral components. The proposed approach provides realistic fault distribution maps and equips the designer to investigate the efficacy of different combinations of defect tolerance techniques for an effective design-for-yield exploration.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.6 Real-time Multiprocessing

Date: Tuesday 20 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 4

Chair:
Jian-Jia Chen, TU Dortmund, DE

Co-Chair:
Rolf Ernst, TU Braunschweig, DE

The session details on various aspects of real-time mutiprocessors, where special focus is put on workload-aware scheduling, Network-on-Chips, security and synchronization constraints. The first paper improves the overall schedulability by strategically arranging the workload among processors. The second paper reduces the pessimism in the analysis of NoC. The third paper considers security-related workloads whilst maintaining feasibility of schedules. The fourth paper presents an implementation of SDF graphs by means of OS-synchronization primitives.

TimeLabelPresentation Title
Authors
14:303.6.1WORKLOAD-AWARE HARMONIC PARTITIONED SCHEDULING FOR PROBABILISTIC REAL-TIME SYSTEMS
Speaker:
Jiankang Ren, Dalian University of Technology, CN
Authors:
Jiankang Ren, Ran Bi, Xiaoyan Su, Qian Liu, Guowei Wu and Guozhen Tan, Dalian University of Technology, CN
Abstract
Multiprocessor platforms, widely adopted to realize real-time systems nowadays, bring the probabilistic characteristic to such systems because of the performance variations of complex chips. In this paper, we present a harmonic partitioned scheduling scheme with workload awareness for periodic probabilistic real-time tasks on multiprocessors under the fixed-priority preemptive scheduling policy. The key idea of this research is to improve the overall schedulability by strategically arranging the workload among processors based on the exploration of the harmonic relationship among probabilistic real-time tasks. In particular, we define a harmonic index to quantify the harmonicity among probabilistic real-time tasks. This index can be obtained via the harmonic period transformation and probabilistic cumulative worst case utilization calculation of these tasks. The proposed scheduling scheme first sorts tasks with respect to the workload, then packs them to processors one by one aiming at minimizing the increase of harmonic index caused by the task assignment. Experiments with randomly generated task sets show significant performance improvement of our proposed approach over the existing harmonic partitioned scheduling algorithm for probabilistic real-time systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:003.6.2(Best Paper Award Candidate)
BUFFER-AWARE BOUNDS TO MULTI-POINT PROGRESSIVE BLOCKING IN PRIORITY-PREEMPTIVE NOCS
Speaker:
Leandro Indrusiak, University of York, GB
Authors:
Leandro Indrusiak1, Alan Burns1 and Borislav Nikolic2
1University of York, GB; 2CISTER/INESC-TEC, ISEP, IPP, PT
Abstract
This paper aims to reduce the pessimism of the analysis of the multi-point progressive blocking (MPB) problem in real-time priority-preemptive wormhole networks-on-chip. It shows that the amount of buffering on each network node can influence the worst-case interference that packets can suffer along their routes, and it proposes a novel analytical model that can quantify such interference as a function of the buffer size. It shows that, perhaps counter-intuitively, smaller buffers can result in lower upper-bounds on interference and thus improved schedulability. Didactic examples and large-scale experiments provide evidence of the strength of the proposed approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:303.6.3A DESIGN-SPACE EXPLORATION FOR ALLOCATING SECURITY TASKS IN MULTICORE REAL-TIME SYSTEMS
Speaker:
Monowar Hasan, University of Illinois, BD
Authors:
Monowar Hasan1, Sibin Mohan1, Rodolfo Pellizzoni2 and Rakesh Bobba3
1University of Illinois at Urbana-Champaign, US; 2University of Waterloo, CA; 3Oregon State University, US
Abstract
The increased capabilities of modern real-time systems (RTS) introduce more security threats. Recently, frameworks that integrate security tasks without perturbing the real-time tasks have been proposed, but they only target single core systems. However, modern RTS are migrating towards multicore platforms. This makes the problem of integrating security mechanisms more complex, as designers now have multiple choices for where to allocate the security tasks. In this paper we propose Hydra, a design space exploration algorithm that finds an allocation of security tasks into existing (viz., legacy) multicore RTS using the concept of opportunistic execution. Hydra allows security tasks to operate with existing real-time tasks without perturbing system parameters or normal execution patterns, while still meeting the desired monitoring frequency for intrusion detection. Our evaluation using a representative real-time control system (along with synthetic tasksets for a broader design space exploration) illustrates the efficacy of the proposed mechanism.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:453.6.4DESIGN AND ANALYSIS OF SEMAPHORE PRECEDENCE CONSTRAINTS: A MODEL-BASED APPROACH FOR DETERMINISTIC COMMUNICATIONS
Speaker:
Yassine Ouhammou, LIAS / ENSMA & University of Poitiers, FR
Authors:
Thanh-Dat Nguyen1, Yassine OUHAMMOU1, Emmanuel GROLLEAU1, Julien Forget2, Claire Pagetti3 and Pascal RICHARD1
1LIAS/ENSMA, FR; 2LIFL/University of Lille1, FR; 3ONERA / DTIM, FR
Abstract
Architecture Analysis and Design Language (AADL) is a standard in avionics system design. However, the communication patterns provided by AADL are not sufficient to the current context of Real-Time Embedded System (RTES) in which some multi-periodic communication patterns may occur. We propose an extension of a precedence model between tasks of different periods (multiperiodic communication). This relies on the Semaphore Precedence Constraint (SPC) model that is inspired from the concept of Semaphore, and more specifically on the m−n producer/consumer paradigm. We reinforce the SPC semantics by allowing cycles in the precedence graph. We also present another viewpoint on the periodicity of tasks system using SPC based on a graph apart from the encoding technique presented in the SPC seminal work. An implementation of SPC in AADL and its associated analysis tool are also provided to study the temporal behaviour of systems using SPC.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP1-9, 80ONE-WAY SHARED MEMORY
Speaker and Author:
Martin Schoeberl, Technical University of Denmark, DK
Abstract
Standard multicore processors use the shared main memory via the on-chip caches for communication between cores. However, this form of communication has two limitations: (1) it is hardly time-predictable and therefore not a good solution for real-time systems and (2) this single shared memory is a bottleneck in the system. This paper presents a communication architecture for time-predictable multicore systems where core-local memories are distributed on the chip. A network-on-chip constantly copies data from a sender core-local memory to a receiver core-local memory. As this copying is performed in one direction we call this architecture a one-way shared memory. With the use of time-division multiplexing for the memory accesses and the network-on-chip routers we achieve a time-predictable solution where the communication latency and bandwidth can be bounded. An example architecture for a 3x3 core processor and 32-bit wide links and memory ports provides a cumulative bandwidth of 29 bytes per clock cycle. Furthermore, the evaluation shows that this architecture, due to its simplicity, is small compared to other network-on-chip solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

3.8 Innovative Products for Autonomous Driving (part 1)

Date: Tuesday 20 March 2018
Time: 14:30 - 16:00
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

The workshop on Innovative Products for Autonomous Driving includes 2 sessions (part 2:session 6.8). This session will highlight how to design functional safety products, how 5G will enable connected cars and foundry solutions for manufacturing chips for autonomous driving.

TimeLabelPresentation Title
Authors
14:303.8.1MICROELECTRONICS-DRIVEN INNOVATION IN MOBILITY
Speakers:
Christian Wolf1 and Hans-Jürgen Brand2
1IDT Europe GmbH, DE; 2IDT/ZMDI, DE
Abstract

The 2nd car and mobility revolution is predominantly enabled by innovative semiconductor products. The new megatrends in mobility such as vehicle electrification, vehicle connectivity and autonomous driving are creating a diversifying demand for new automotive semiconductors. The presentation will show the major trends in this area and also focus on approaches how Functional Safety - a key requirement for automotive semiconductors - can be handled in the design process of those products.

15:003.8.25G CONNECTED CARS
Speakers:
Stanislav Mudriievskyi and Vincent Latzko, Technical University Dresden, DE
Abstract

While autonomous driving already promises more comfort and safety, connected driving makes it possible to use the new strategies to improve the safety of road traffic, significantly reduce CO2 emissions and increase traffic efficiency. Additional 5G networking possibilities will remove the fundamental limitation of today's autonomous approaches that are used for controlling the vehicle by means of the onboard installed sensors only. It will be possible to use the information gained by the sensors of all neighbor vehicles as well as the environment or the existing infrastructure (e. g. surveillance cameras at crossroads, highways, geolocal weather sensors, etc.). All these can be virtually merged in the network, resulting in better decision-making.

15:303.8.3FOUNDRY SOLUTIONS FOR AUTONOMOUS DRIVING
Speaker:
Alexander Muffler, X-Fab Semiconductor Foundries AG, DE
Abstract

ICs developed for autonomous driving do not only have to be designed according respective ASIL levels. They also need to be based and manufactured on highly reliable semiconductor processes.
X-FAB is the leading analog/mixed-signal and MEMS foundry group manufacturing silicon wafers for the automotive market with a track record of more than 25 years as automotive foundry. Already at process development, the clear focus is on highly reliable semiconductor processes. This also applies to IPs such as Flash, EEPROM, digital IPs like RAM and ROM, all for high temperature application up to 175°C junction temperature. The PDKs - which are the interface between IC designers and the silicon - are developed by X-FAB with the target to give chip designers all necessary tools on hand to enable first-time-right mixed-signal IC development for harsh environments. X-FAB does not only provide simulation models which behave as close as possible to the real silicon, but also tools for lifetime calculation based on user defined mission profiles. X-FAB also develops NVW IPs explicitly targeting automotive needs. These IPs include for example error correction and detection modes and are especially suited for autonomous driving cars.

16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB03 Session 3

Date: Tuesday 20 March 2018
Time: 15:00 - 17:30
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB03.1TOPOLINANO & MAGCAD: A DESIGN AND SIMULATION FRAMEWORK FOR THE EXPLORATION OF EMERGING TECHNOLOGIES
Authors:
Umberto Garlando and Fabrizio Riente, Politecnico di Torino, IT
Abstract
We developed a design framework that enables the exploration and analysis of emerging beyond-CMOS technologies. It is composed of two powerful tools: ToPoliNano and MagCAD. Different technologies are supported, and new ones could be added thanks to their modular structure. ToPoliNano starts from a VHDL description of a circuit and performs the place&route following the technological constraints. The resulting circuit can be simulated both at logical or physical level. MagCAD is a layout editor where the user can design custom circuits, by placing basic elements of the selected technology. The tool can extract a VHDL netlist based on compact models of placed elements derived from experiments or physical simulations. Circuits can be verified with standard VHDL simulators. The design workflow will be demonstrated at the U-booth to show how those tools could be a valuable help in the studying and development of emerging technologies and to obtain feedbacks from the scientific community.

More information ...
UB03.2GENERATING FULL-CUSTOM SCHEMATICS IN A MIXED-SIGNAL TOP-DOWN DESIGN FLOW
Authors:
Tobias Markus1, Markus Mueller2 and Ulrich Bruening1
1University of Heidelberg, DE; 2Extoll GmbH, DE
Abstract
Design time is one of the precious assets in the cycle of hardware design. The top down methodology has been used in digital designs very successfully and now we also apply it for analog and mixed signal designs. Generating most of the structures automatically saves time and avoids errors. A Top Down Design Flow for Mixed Signal Designs is used which generates the schematic structure from the system RNM representation. Since the structural verilog part of the system level design will automatically generate the schematic structure it is only the functional part which is missing and has to be implemented by the analog designer. Some often used blocks can be used as an entry point to partially generate parts of the design in the schematic and furthermore even parts of the layout. We will demonstrate this design method with an example project.

More information ...
UB03.3DISGUISING THE INTERCONNECTS: EFFICIENT PROTECTION OF DESIGN IP
Authors:
Johann Knechtel1, Satwik Patnaik2, Mohammed Ashraf3 and Ozgur Sinanoglu3
1NYU Abu Dhabi, AE; 2New York University, US; 3New York University Abu Dhabi, AE
Abstract
Ensuring the trustworthiness and security of electronics has become an urgent challenge in recent years. Among various concerns, the protection of design intellectual property (IP) is to be addressed, due to outsourcing trends for the manufacturing supply chain and malicious end-user. In other words, adversaries either residing in the off-shore fab or in the field may want to obtain and pirate the design IP. As classical design tools do not consider such threats, there is clearly a need for security-aware EDA techniques. Here we present novel but proven techniques for efficient protection of design IP, embedded in an industrial-level design flow using Cadence Innovus. The key idea in our work is that disguising the interconnects is supremely suitable to protect design IP, while inducing only little additional cost and providing strong resilience. We share our customized libraries with the community, and we demonstrate our design flow and its security measures.

More information ...
UB03.4HARDENING THE HARDWARE: A REVERSE-ENGINEERING RESILIENT SECURE CHIP
Authors:
Abhrajit Sengupta1, Muhammad Yasin2, Mohammed Nabeel3, Mohammed Ashraf3, Jeyavijayan Rajendran4 and Ozgur Sinanoglu3
1New York University, AE; 2New York University, US; 3New York University Abu Dhabi, AE; 4Texas A&M, US
Abstract
With the globalization of integrated circuit (IC) supply chain, the semi-conductor industry is facing a number of threats, such as Intellectual Property (IP) piracy, hardware Trojans, and counterfeiting. To defend against such threats at the hardware level, logic locking was proposed as a promising countermeasure. Yet, several recent attacks have completely undermined its security by successfully retrieving the secret key. Here, we present stripped-functionality logic locking (SFLL), which resists all existing attacks by hiding a part of the functionality in the form of a secret key. We leverage security-aware synthesis to develop a computer-aided design (CAD) framework that meets the desired security criterion at a minimal cost of 5%, 0.5%, and 8% for power, performance, and area, respectively. Moreover, we taped out a chip, the first such prototype of its kind, by applying our technique on an industry level processor, namely, ARM Cortex-M0 microprocessor in 65$nm$ technology.

More information ...
UB03.5RECONFIGURABLE SELF-TIMED DATAFLOW ACCELERATOR
Authors:
Danil Sokolov, Alessandro de Gennaro and Andrey Mokhov, Newcastle University, GB
Abstract
Many applications require reconfigurable pipelines to handle incoming data items differently depending on their values or the operating mode. Currently, reconfigurable synchronous pipelines are the mainstream of dataflow accelerators. However, there are certain advantages to be gained from self-timed dataflow processing, e.g. robustness to unstable power supply, data-dependent performance, etc. To become attractive for industry, reconfigurable asynchronous pipelines need a formal behavioural model and design automation. This demo will present a design flow for the specification, verification and synthesis of reconfigurable self-timed pipelines using Dataflow Structure formalism in Workcraft(https://workcraft.org/). As a case study we will use an asynchronous accelerator for Ordinal Pattern Encoding(OPE) with reconfigurable pipeline depth. We will exhibit the resultant OPE chip fabricated in TSMC90nm to show the benefits of reconfigurability and asynchrony for dataflow processing.

More information ...
UB03.6SPANNER: SELF-REPAIRING SPIKING NEURAL NETWORK CONTROLLER FOR AN AUTONOMOUS ROBOT
Authors:
Alan Millard1, Anju Johnson1, James Hilder1, David Halliday1, Andy Tyrrell1, Jon Timmis1, Junxiu Liu2, Shvan Karim2, Jim Harkin2 and Liam McDaid2
1University of York, GB; 2Ulster University, GB
Abstract
The human brain is remarkably resilient, and is able to self-repair following injury or a stroke. In contrast, electronic systems typically exhibit limited self-repair capabilities, and cannot recover from faults. We demonstrate a bio-inspired approach to self-repair that allows an autonomous robot to recover from faults in its artificial 'brain'. Astrocytes are support cells in the human brain that interact with neurons to regulate synaptic activity. We have modelled this interaction to create a spiking neural network that can self-repair when synapses between neurons are damaged, by strengthening redundant pathways. We demonstrate a robot platform controlled by a self-repairing spiking neural network that is implemented on an FPGA. We demonstrate that injecting faults into the synapses of the network initially causes the robot to behave erratically, but that the neural controller is able to automatically repair itself, thus allowing the robot to resume normal function.

More information ...
UB03.7T-CREST: THE OPEN-SOURCE REAL-TIME MULTICORE PROCESSOR
Authors:
Martin Schoeberl, Luca Pezzarossa and Jens Sparsø, Technical University of Denmark, DK
Abstract
Future real-time systems, such as advanced control systems or real-time image recognition, need more powerful processors, but still a system where the worst-case execution time (WCET) can be statically predicted. Multicore processors are one answer to the need for more processing power. However, it is still an open research question how to best organize and implement time-predictable communication between processing cores. T-CREST is an open-source multicore processor for research on time-predictable computer architecture. In consists of several Patmos processors connected by various time-predictable communication structures: access to shared off-chip, access to shared on-chip memory, and the Argo network-on-chip for fast inter-processor communication. T-CREST is supported by open-source development tools, such as compilation and WCET analysis. To best of our knowledge, T-CREST is the only fully open-source architecture for research on future real-time multicore architectures.

More information ...
UB03.8FPGA-BASED HARDWARE ACCELERATOR FOR DRUG DISCOVERY
Authors:
Ghaith Tarawneh, Alessandro de Gennaro, Georgy Lukyanov and Andrey Mokhov, Newcastle University, GB
Abstract
We present an FPGA-based hardware accelerator for drug discovery, developed during the EPSRC programme grant POETS (EP/N031768/1) in partnership with e-Therapeutics, an Oxford based drug discovery company. e-Therapeutics is pioneering a novel form of drug discovery based on analyzing protein interactome networks (https://www.youtube.com/watch?v=wQFpTtuzrgA). This approach can discover suitable drug candidates much more efficiently compared to wet lab testing but requires considerable computing power, particularly because commodity computers are generally inefficient at analyzing large-scale networks. The presented accelerator, consisting of an FPGA board with a silicon-mapped protein interactome plus accompanying software formalisms and tools, can deliver a 1000x speed up in this application compared to software running on commodity computers. We will showcase demos in which we run in-silico analysis of protein interactomes to test drug effects and visualize the results in real-time.

More information ...
UB03.9CIJTAG: CONCURRENT IJTAG DEMONSTRATOR
Author:
Krenz-Baath René, Hamm-Lippstadt University of Applied Sciences, DE
Abstract
The flexibility of on-chip instrument access enabled by IEEE 1687 (IJTAG) has shown tremendous improvements in modern industrial designs. Due to a constantly increasing spektrum of tasks performed through 1687 networks such as performing test operations during production test, on-line test operations as well as operating health monitors the test requirements in modern designs increase dramatically with respect to test performance, responsiveness and low power. These requirements have a major impact on the design of such test infrastructures. In complex designs with large test infrastructures it might be challenging to comply with the large spectrum of requirements. Concurrent IJTAG is novel partitioning concept to a reconfigurable test infrastructure in order to enable an independent operation of different sections of the test infrastructure. The proposed demonstrator shows the first FPGA-based implementation of concurrent IJTAG test infrastructures.

More information ...
UB03.10CLAVA-MARGOT: CLAVA + MARGOT = C/C++ TO C/C++ COMPILER AND RUNTIME AUTOTUNING FRAMEWORK
Authors:
João Bispo1, Davide Gadioli2, Pedro Pinto1, Emanuele Vitali2, Hamid Arabnejad1, Gianluca Palermo2, Cristina Silvano2, Jorge G. Barbosa1 and João M. P Cardoso1
1Porto University, PT; 2Politecnico di Milano (POLIMI), IT
Abstract
Current computing platforms consist of heterogeneous architectures. To efficiently target those platforms, compilers can be extend with code transformations and insertion of code to interface to runtime autotuning schemes, which tune application parameters according to: the actual execution, target architecture, and workload. We present an approach consisting of a C/C++ source-to-source compiler (Clava) and an autotuner (mARGOt ). They are part of the toolflow of the FET-HPC ANTAREX project and allow parallelization, multiversioning and code transformations in the context of runtime autotuning. mARGOt is an autotuner that allows application adaptation to changing conditions and goals. Clava is a source-to-source compiler to transform C/C++ programs, including code instrumentation and integration with components such as mARGOt. We will demonstrate how to use Clava to integrate the mARGOT autotuner in an example application, and several mARGOt functionalities exposed through a Clava API.

More information ...
17:30End of session
18:30Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

IP1 Interactive Presentations

Date: Tuesday 20 March 2018
Time: 16:00 - 16:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

LabelPresentation Title
Authors
IP1-1RECOM: AN EFFICIENT RESISTIVE ACCELERATOR FOR COMPRESSED DEEP NEURAL NETWORKS
Speaker:
Houxiang Ji, Shanghai Jiao Tong University, CN
Authors:
Houxiang Ji1, Linghao Song2, Li Jiang1, Hai (Helen) Li3 and Yiran Chen2
1Shanghai Jiao Tong University, CN; 2Duke University, US; 3Duke University/TUM-IAS, US
Abstract
Deep Neural Networks (DNNs) play a key role in prevailing machine learning applications. Resistive random-access memory (ReRAM) is capable of both computation and storage, contributing to the acceleration on DNNs process in memory. Besides, DNNs have a significant amount of zero weights, which provides a possibility to reduce computation cost by skipping ineffectual calculations on zero weights. However, the irregular distribution of zero weights in DNNs makes it difficult for resistive accelerators to take advantage of the sparsity, because resistive accelerators have a high reliance on regular matrix-vector multiplication in ReRAM. In this work, we propose ReCom, the first resistive accelerator to support sparse DNN processing. ReCom is an efficient resistive accelerator for compressed deep neural networks, where DNN weights are structurally compressed to eliminate zero parameters and become more friendly to computation in ReRAM, and zero DNN activations are also considered at the same time. Two technologies, Structurally-compressed Weight Oriented Fetching (SWOF) and In-layer Pipeline for Memory and Computation (IPMC),are particularly proposed to efficiently process the compressed DNNs in ReRAM. In our evaluation, ReCom can achieve 3.37x speedup and 2.41x energy efficiency compared to a state-of-the-art resistive accelerator.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-2SPARSENN: AN ENERGY-EFFICIENT NEURAL NETWORK ACCELERATOR EXPLOITING INPUT AND OUTPUT SPARSITY
Speaker:
Jingyang Zhu, Hong Kong University of Science and Technology, HK
Authors:
Jingyang Zhu, Jingbo Jiang, Xizi Chen and Chi-Ying Tsui, Hong Kong University of Science and Technology, HK
Abstract
Contemporary Deep Neural Network (DNN) contains millions of synaptic connections with tens to hundreds of layers. The large computational complexity poses a challenge to the hardware design. In this work, we leverage the intrinsic activation sparsity of DNN to substantially reduce the execution cycles and the energy consumption. An end-to-end training algorithm is proposed to develop a lightweight (less than 5% overhead) run-time predictor for the output activation sparsity on the fly. Furthermore, an energy-efficient hardware architecture, SparseNN, is proposed to exploit both the input and output sparsity. SparseNN is a scalable architecture with distributed memories and processing elements connected through a dedicated on-chip network. Compared with the state-of-the-art accelerators which only exploit the input sparsity, SparseNN can achieve a 10%-70% improvement in throughput and a power reduction of around 50%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-3ACCLIB: ACCELERATORS AS LIBRARIES
Speaker:
Jacob R. Stevens, Purdue University, US
Authors:
Jacob Stevens1, Yue Du2, Vivek Kozhikkottu3 and Anand Raghunathan1
1Purdue University, US; 2IBM, US; 3Intel Corporation, US
Abstract
Accelerator-based computing, which has been a mainstay of System-on-Chips (SoCs) is of growing interest to a wider range of computing systems. However, the significant design effort required to identify a computational target for acceleration, design a hardware accelerator, verify the correctness of the accelerator, integrate the accelerator into the system, and rewrite applications to use the accelerator, is a major bottleneck to the widespread adoption of accelerator-based computing. The classical approach to this problem is based on top-down methodologies such as automatic HW/SW partitioning and high-level synthesis (HLS). While HLS has advanced significantly and is seeing increased adoption, it does not leverage the ability of experienced human designers to craft highly optimized RTL, nor does it leverage the growing body of already existing hardware accelerators. In this work, we propose ACCLIB, a design framework that allows software developers to utilize existing libraries of pre-designed hardware accelerators automatically with no prior knowledge of the function of the accelerators, with minimal knowledge of hardware design, and with minimal design effort. To accomplish this, ACCLIB uses formal verification techniques to match a target software function with a functionally equivalent accelerator from a library of accelerators. It also generates the required HW/SW interfaces as well as the code necessary to offload the computation to the accelerator. We validate ACCLIB by applying it to accelerate six different applications using a library of hardware accelerators in just over one hour per application, demonstrating that the proposed approach has the potential to lower the barrier to adoption of accelerator-based computing.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-4HPXA: A HIGHLY PARALLEL XML PARSER
Speaker:
Smruti Sarangi, IIT Delhi, IN
Authors:
Isaar Ahmad, Sanjog Patil and Smruti R. Sarangi, IIT Delhi, IN
Abstract
State of the art XML parsing approaches read an XML file byte by byte, and use complex finite state machines to process each byte. In this paper, we propose a new parser, HPXA, which reads and processes 16 bytes at a time. We designed most of the components ab initio, to ensure that they can process multiple XML tokens and tags in parallel. We propose two basic elements - a sparse 1D array compactor, and a hardware unit called LTMAdder that takes its decisions based on adding the rows of a lower triangular matrix. We demonstrate that we are able to process 16 bytes in parallel with very few pipeline stalls for a suite of widely used XML benchmarks. Moreover, for a 28nm technology node, we can process XML data at 106 Gbps, which is roughly 6.5X faster than competing prior work.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-5QOR-AWARE POWER CAPPING FOR APPROXIMATE BIG DATA PROCESSING
Speaker:
Sherief Reda, Brown University, US
Authors:
Seyed Morteza Nabavinejad1, Xin Zhan2, Reza Azimi2, Maziar Goudarzi1 and Sherief Reda2
1Sharif University of Technology, IR; 2Brown University, US
Abstract
To limit the peak power consumption of a cluster, a centralized power capping system typically assigns power caps to the individual servers, which are then enforced using local capping controllers. Consequently, the performance and throughput of the servers are affected, and the runtime of jobs is extended as a result. We observe that servers in big data processing clusters often execute big data applications that have different tolerance for approximate results. To mitigate the impact of power capping, we propose a new power-Capping aware resource manager for Approximate Big data processing (CAB) that takes into consideration the minimum Quality-of-Result (QoR) of the jobs. We use industry standard feedback power capping controllers to enforce a power cap quickly, while, simultaneously modifying the resource allocations to various jobs based on their progress rate, target minimum QoR, and the power cap such that the impact of capping on runtime is minimized. Based on the applied cap and the progress rates of jobs, CAB dynamically allocates the computing resources (i.e., number of cores and memory) to the jobs to mitigate the impact of capping on the finish time. We implement CAB in Hadoop-2.7.3 and evaluate its improvement over other methods on a state-of-the-art 28-core Xeon server. We demonstrate that CAB minimizes the impact of power capping on runtime by up to 39.4% while meeting the minimum QoR constraints.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-6EXACT MULTI-OBJECTIVE DESIGN SPACE EXPLORATION USING ASPMT
Speaker:
Kai Neubauer, University of Rostock, DE
Authors:
Kai Neubauer1, Philipp Wanko2, Torsten Schaub2 and Christian Haubelt1
1University of Rostock, DE; 2University of Potsdam, DE
Abstract
An efficient Design Space Exploration (DSE) is imperative for the design of modern, highly complex embedded systems in order to steer the development towards optimal design points. The early evaluation of design decisions at system-level abstraction layer helps to find promising regions for subsequent development steps in lower abstraction levels by diminishing the complexity of the search problem. In recent works, symbolic techniques, especially Answer Set Programming (ASP) modulo Theories (ASPmT), have been shown to find feasible solutions of highly complex system-level synthesis problems with non-linear constraints very efficiently. In this paper, we present a novel approach to a holistic system-level DSE based on ASPmT. To this end, we include additional background theories that concurrently guarantee compliance with hard constraints and perform the simultaneous optimization of several design objectives. We implement and compare our approach with a state-of-the-art preference handling framework for ASP. Experimental results indicate that our proposed method produces better solutions with respect to both diversity and convergence to the true Pareto front.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-7HIPE: HMC INSTRUCTION PREDICATION EXTENSION APPLIED ON DATABASE PROCESSING
Speaker:
Diego Tomé, Centrum Wiskunde & Informatica (CWI), BR
Authors:
Diego Gomes Tomé1, Paulo Cesar Santos2, Luigi Carro2, Eduardo Cunha de Almeida3 and Marco Antonio Zanata Alves3
1Federal University of Paraná, BR; 2UFRGS, BR; 3UFPR, BR
Abstract
The recent Hybrid Memory Cube (HMC) is a smart memory which includes functional units inside one logic layer of the 3D stacked memory design. In order to execute instructions inside the Hybrid Memory Cube (HMC), the processor needs to send instructions to be executed near data, keeping most of the pipeline complexity inside the processor. Thus, control-flow and data-flow dependencies are all managed inside the processor, in such way that only update instructions are supported by the HMC. In order to solve data-flow dependencies inside the memory, previous work proposed HMC Instruction Vector Extensions(HIVE), which embeds a high number of functional units with an interlock register bank. In this work, we propose HMC Instruction Prediction Extensions (HIPE), that supports predicated execution inside the memory, in order to transform control-flow dependencies into data-flow dependencies. Our mechanism focuses on removing the high latency iteration between the processor and the smart memory during the execution of branches that depends on data processed inside the memory. In this paper, we evaluate a balanced design of HIVE comparing to x86 and HMC executions.After we show the HIPE mechanism results when executing a database workload, which is a strong candidate to use smart memories. We show interesting trade-offs of performance when comparing our mechanism to previous work.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-8PARAMETRIC FAILURE MODELING AND YIELD ANALYSIS FOR STT-MRAM
Speaker:
Sarath Mohanachandran Nair, Karlsruhe Institute of Technology, DE
Authors:
Sarath Mohanachandran Nair, Rajendra Bishnoi and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
The emerging Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising candidate to replace conventional on-chip memory technologies due to its advantages such as non-volatility, high density, scalability and unlimited endurance. However, as the technology scales, yield loss due to extreme parametric variations is becoming a major challenge for STT-MRAM because of its higher sensitivity to process variations as compared to CMOS memories. In addition, the parametric variations in STT-MRAM exacerbates its stochastic switching behavior, leading to both test time fails and reliability failures in the field. Since an STT-MRAM memory array consists of both CMOS and magnetic components, it is important to consider variations in both these components to obtain the failures at the system level. In this work, we model the parametric failures of STT-MRAM at the system level considering the correlation among bit-cells as well as the impact of peripheral components. The proposed approach provides realistic fault distribution maps and equips the designer to investigate the efficacy of different combinations of defect tolerance techniques for an effective design-for-yield exploration.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-9ONE-WAY SHARED MEMORY
Speaker and Author:
Martin Schoeberl, Technical University of Denmark, DK
Abstract
Standard multicore processors use the shared main memory via the on-chip caches for communication between cores. However, this form of communication has two limitations: (1) it is hardly time-predictable and therefore not a good solution for real-time systems and (2) this single shared memory is a bottleneck in the system. This paper presents a communication architecture for time-predictable multicore systems where core-local memories are distributed on the chip. A network-on-chip constantly copies data from a sender core-local memory to a receiver core-local memory. As this copying is performed in one direction we call this architecture a one-way shared memory. With the use of time-division multiplexing for the memory accesses and the network-on-chip routers we achieve a time-predictable solution where the communication latency and bandwidth can be bounded. An example architecture for a 3x3 core processor and 32-bit wide links and memory ports provides a cumulative bandwidth of 29 bytes per clock cycle. Furthermore, the evaluation shows that this architecture, due to its simplicity, is small compared to other network-on-chip solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-10AN EFFICIENT RESOURCE-OPTIMIZED LEARNING PREFETCHER FOR SOLID STATE DRIVES
Speaker:
Rui Xu, University of Science and Technology of China, CN
Authors:
Rui Xu, Xi Jin, Linfeng Tao, Shuaizhi Guo, Zikun Xiang and Teng Tian, Strongly-Coupled Quantum Matter Physics, Chinese Academy of Sciences, School of Physical Sciences, University of Science and Technology of China, Hefei, Anhui, China, CN
Abstract
In recent years, solid-state drives (SSDs) have been widely deployed in modern storage systems. To increase the performance of SSDs, prefetchers for SSDs have been designed both at operating system (OS) layer and flash translation layer (FTL). Prefetchers in FTL have many advantages like OS-independence, easy-using, and compatibility. However, due to the limitation of computing capabilities and memory resources, existing prefetchers in FTL merely employ simple sequential prefetching which may incur high penalty cost for I/O access stream with complex patterns. In this paper, an efficient learning prefetcher implemented in FTL is proposed. Considering the resource limitation of SSDs, a learning algorithm based on Markov chains is employed and optimized so that high hit ratio and low penalty cost can be achieved even for complex access patterns. To validate our design, a simulator with the prefetcher is designed and implemented based on Flashsim. The TPC-H benchmark and an application launch trace are tested on the simulator. According to experimental results of the TPC-H benchmark, more than 90% of memory cost can be saved in comparison with a previous design at OS layer. The hit ratio can be increased by 24.1% and the number of times of misprefetching can be reduced by 95.8% in comparison with the simple sequential prefetching strategy.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-11BRIDGING DISCRETE AND CONTINUOUS TIME MODELS WITH ATOMS
Speaker:
George Ungureanu, KTH Royal Institute of Technology, SE
Authors:
George Ungureanu1, José E. G. de Medeiros2 and Ingo Sander1
1KTH Royal Institute of Technology, SE; 2University of Brasília, BR
Abstract
Recent trends in replacing traditionally digital components with analog counterparts in order to overcome physical limitations have led to an increasing need for rigorous modeling and simulation of hybrid systems. Combining the two domains under the same set of semantics is not straightforward and often leads to chaotic and non-deterministic behavior due to the lack of a common understanding of aspects concerning time. We propose an algebra of primitive interactions between continuous and discrete aspects of systems which enables their description within two orthogonal layers of computation. We show its benefits from the perspective of modeling and simulation, through the example of an RC oscillator modeled in a formal framework implementing this algebra.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-12OHEX: OS-AWARE HYBRIDIZATION TECHNIQUES FOR ACCELERATING MPSOC FULL-SYSTEM SIMULATION
Speaker:
Róbert Lajos Bücs, Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE
Authors:
Róbert Lajos Bücs1, Maximilian Fricke2, Rainer Leupers1, Gerd Ascheid1, Stephan Tobies2 and Andreas Hoffmann2
1Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE; 2Synopsys GmbH, DE
Abstract
Virtual platform (VP) technology is an established enabler of embedded system design. However, the sheer number of CPU models in modern multi-core VPs forms a performance bottleneck. Hybrid simulation addresses this issue by executing parts of the embedded software stack on the host. Although the approach is significantly faster, hybridization can not cope with higher software layers, e.g., Operating Systems (OSs). Thus, this paper presents the OS-aware Host EXtension (OHEX) framework to accelerate VPs while expanding the applicability of hybridization. OHEX is evaluated on various system layers, yielding speedups between 2.99x-21.14x with specific benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-13A HIGHLY EFFICIENT FULL-SYSTEM VIRTUAL PROTOTYPE BASED ON VIRTUALIZATION-ASSISTED APPROACH
Speaker:
Hsin-I Wu, National Tsing Hua University, Department of Computer Science, Hsinchu, Taiwan, TW
Authors:
Hsin-I Wu, Chi-Kang Chen, Tsung-Ying Lu and Ren-Song Tsay, National Tsing Hua University, TW
Abstract
An effective full-system virtual prototype is critical for early-stage systems design exploration. Generally, however, traditional acceleration approaches of virtual prototypes cannot accurately analyze system performance and model non-deterministic inter-component interactions due to the unpredictability of simulation progress. In this paper, we propose an effective virtualization-assisted approach for modeling and performance analysis. First, we develop a deterministic synchronization process that manages the interactions affecting the data dependency in chronological order to model inter-component interactions consistently. Next, we create accurate timing and bus contention models based on runtime operation statistics for analyzing system performance. We implement the proposed virtualization-assisted approach on an off-the-shelf System-on-Chip (SoC) board to demonstrate the effectiveness of our idea. The experimental results show that the proposed approach runs 12~77 times faster than a commercial virtual prototyping tool and performance estimation is only 3~6% apart from real systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-14INDUSTRIAL EVALUATION OF TRANSITION FAULT TESTING FOR COST EFFECTIVE OFFLINE ADAPTIVE VOLTAGE SCALING
Speaker:
Mahroo Zandrahimi, TU Delft, NL
Authors:
Mahroo Zandrahimi1, Philippe Debaud2, Armand Castillejo2 and Zaid Al-Ars1
1Delft University of Technology, NL; 2STMicroelectronics, FR
Abstract
Adaptive voltage scaling (AVS) has been used widely to compensate for process, voltage, and temperature variations as well as power optimization of integrated circuits. The current industrial state-of-the-art AVS approaches using Process Monitoring Boxes (PMBs) have shown several limitations such as huge characterization effort, which makes these approaches very expensive, and a low accuracy that results in extra margins, which consequently lead to yield loss and performance limitations. To overcome those limitations, in this paper we propose an alternative solution using transition fault test patterns, which is able to eliminate the need for PMBs, while improving the accuracy of voltage estimation. The paper shows, using simulation of ISCAS'99 benchmarks with 28nm FD-SOI library, that AVS using transition fault testing (TF-based AVS) results in an error as low as 5.33%. The paper also shows that the PMB approach can only account for 85% of the uncertainty in voltage measurements, which results in power waste, while the TF-based approach can account for 99% of that uncertainty.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-15AN ANALYSIS ON RETENTION ERROR BEHAVIOR AND POWER CONSUMPTION OF RECENT DDR4 DRAMS
Speaker:
Deepak M. Mathew, University of Kaiserslautern, DE
Authors:
Deepak M. Mathew1, Martin Schultheis1, Carl C. Rheinländer1, Chirag Sudarshan1, Matthias Jung2, Christian Weis1 and Norbert Wehn1
1University of Kaiserslautern, DE; 2Fraunhofer IESE, DE
Abstract
DRAM technology is scaling aggressively that results in high leakage power, worse data retention time behavior, and large process variations. Due to these process variations, vendors provide large guard bands on various DRAM currents and timing specifications that are over pessimistic. Detailed knowledge on the DRAM retention behavior and currents for the average case allow to improve memory system performance and energy efficiency of specific applications by moving away from worst case behavior. In this paper, we present an advanced measurement platform to investigate off-the-shelf DDR4 DRAMs' retention behavior, and to precisely measure various DRAM currents (IDDs and IPPs) at a wide range of operating temperatures. Error Checking and Correction (ECC) schemes are popular in correcting randomly scattered single bit errors. Since retention failures also occur randomly, ECCs can be used to improve DRAM retention behavior. Therefore, for the first time, we show the influence of ECC on the retention behavior of recent DDR4 DRAMs, and how it varies across various DRAM architectures considering detailed structure of the DRAM (true-cell devices / mixed-cell devices).

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-16A BOOLEAN MODEL FOR DELAY FAULT TESTING OF EMERGING DIGITAL TECHNOLOGIES BASED ON AMBIPOLAR DEVICES
Speaker:
Davide Bertozzi, DE - University of Ferrara, IT
Authors:
Marcello Dalpasso1, Davide Bertozzi2 and Michele Favalli2
1DEI - UNiv. of Padova, IT; 2DE - Univ. of Ferrara, IT
Abstract
Emerging nanotechnonologies such as ambipolar carbon nanotube field effect transistors (CNTFETs) and silicon nanowire FETs (SiNFETs) provide ambipolar devices allowing the design of more complex logic primitives than those found in today's typical CMOS libraries. When switching, such devices show a behavior not seen in simpler CMOS and FinFET cells, making unsuitable the existing delay fault testing approaches. We provide a Boolean model of switching ambipolar devices to support delay fault testing of logic cells based on such devices both in Boolean and Pseudo-Boolean satisfiability engines.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-17ATPG POWER GUARDS: ON LIMITING THE TEST POWER BELOW THRESHOLD
Speaker:
Virendra Singh, Indian Institute of Technology Bombay, IN
Authors:
Rohini Gulve1 and Virendra Singh2
1Indian Institute of Technology Bombay, IN; 2IIT Bombay, IN
Abstract
Modern circuits with high performance and low power requirements impose strict constraints on manufacturing test generation, particularly on timing test. Delay test is used for performance grading of the circuit. During the application of the test, power consumption has to be less than the functional threshold value, in order to avoid yield loss. This work proposes a new direction to generate power safe test without any changes in DFT (design for testability) structure or existing CAD (computeraided design) tools. We propose a virtual wrapper circuitry around the circuit under test (CUT), for test generation purpose, which acts as a shield to obtain power safe vectors. The wrapper prohibits the generation of test vector if power consumption exceeds the threshold limits. We consider analytical power models for power analysis of candidate test vector patterns. Experiments performed on benchmark circuits show power safe test generation without coverage loss.

Download Paper (PDF; Only available from the DATE venue WiFi)

4.1 Executive Session: Exact Synthesis and SAT

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Saal 2

Chair:
Patrick Vuillod, Synopsys, FR

Co-Chair:
Amaru Luca, Synopsys, US

Exact synthesis and SAT-based methods open new opportunities in design automation flows, where attaining the best possible logic implementation is key. This executive session covers recent advances on these two topics, which are tightly related, from both academic and industrial standpoints. The first paper shows how to find optimal circuit, on small number of variables, using SAT-solvers. The frontiers of circuits achievable by this method are discussed, together with known open problems. The second paper presents recent advancements on exact synthesis, with focus on implicit enumeration methods. Improvements on the SAT-formulation are delineated, which enable complex constraints to be considered while solving exact synthesis. The third paper introduces a redundancy removal engine based on SAT. Its integration in a commercial EDA tool is described, detailing challenges and opportunities arising in an industrial synthesis environment.

TimeLabelPresentation Title
Authors
17:004.1.1IMPROVING CIRCUIT SIZE UPPER BOUNDS USING SAT-SOLVERS
Speaker and Author:
Alexander Kulikov, Steklov Mathematical Institute at St. Petersburg, RU
Abstract
Boolean circuits is arguably the most natural model for computing Boolean functions. Despite intensive research, for many functions, we still do not know what optimal circuits look like. In this paper, we discuss how SAT-solvers can be used for constructing optimal circuits for functions on moderate number of variables. We first discuss why this problem is important and then indicate the current frontiers: what can and cannot be found by state-of-the-art SAT-solvers, and for what functions we are interested in finding efficient circuits.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.1.2PRACTICAL EXACT SYNTHESIS
Speaker:
Winston Haaswijk, EPFL, CH
Authors:
Mathias Soeken1, Winston Haaswijk1, Eleonora Testa1, Alan Mishchenko2, Luca G. Amaru3, Robert K. Brayton2 and Giovanni De Micheli1
1EPFL, CH; 2University of California, Berkeley, US; 3Synopsys Inc., US
Abstract
In this paper, we discuss recent advances in exact synthesis, considering both their efficient implementation and various applications in which they can be employed. We emphasize on solving exact synthesis through Boolean satisfiability (SAT) encodings. Different SAT encodings for exact synthesis are compared, and examined the applications to multi-level logic synthesis, in both area and depth optimization. Another application of SAT based exact synthesis is optimization under many constraints. These constraints can, e.g., be a fixed fanout or delay constraints. Finally, we end our discussion by proposing directions for future research in exact synthesis.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.1.3SAT-BASED REDUNDANCY REMOVAL
Speaker and Author:
Krishanu Debnath, Synopsys, IN
Abstract
Logic optimization is an integral part of digital circuit design. It reduces design area and power consumption, and quite often improves circuit delay as well. Redundancy removal is a key step in logic optimization, in which redundant connections in the circuit are determined and replaced by constant values 0 or 1. The resulting circuit is simplified, resulting in area and power savings. In this paper, we describe a redundancy removal approach for combinational circuits based on a combination of logic simulation and SAT. We show that this approach can handle large industrial-strength designs in a reasonable amount of CPU time.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.2 Domain Specific Design Methodologies

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 6

Chair:
Frédéric Pétrot, Grenoble Institute of Technology, FR

Co-Chair:
Lars Bauer, Karlsruhe Institute of Technology, DE

In the quest for high efficiency, design methodologies specialize to particular domains. At first, a case study for approximate computing in the field of biometric security is presented. The second talk proposes a framework that uses a genetic algorithm to find an optimal mapping of artificial neural networks onto GPU + multi-core systems. Finally, a method is presented that controls by software the tolerance to disturbances provoked by neighboring readings of read-intensive applications in NAND flash.

TimeLabelPresentation Title
Authors
17:004.2.1APPROXIMATE COMPUTING FOR BIOMETRIC SECURITY SYSTEMS: A CASE STUDY ON IRIS SCANNING
Speaker:
Sherief Reda, Brown University, US
Authors:
Soheil Hashemi, Hokchhay Tann, Francesco Buttafuoco and Sherief Reda, Brown University, US
Abstract
Exploiting the error resilience of emerging data-rich applications, approximate computing promotes the introduction of small amount of inaccuracy into computing systems to achieve significant reduction in computing resources such as power, design area, runtime or energy. Successful applications for approximate computing have been demonstrated in the areas of machine learning, image processing and computer vision. In this paper we make the case for a new direction for approximate computing in the field of biometric security with a comprehensive case study of iris scanning. We devise an end-to-end flow from an input camera to the final iris encoding that produces sufficiently accurate final results despite relying on intermediate approximate computational steps. Unlike previous methods which evaluated approximate computing techniques on individual algorithms, our flow consists of a complex SW/HW pipeline of four major algorithms that eventually compute the iris encoding from input live camera feeds. In our flow, we identify overall eight approximation knobs at both the algorithmic and hardware levels to trade-off accuracy with runtime. To identify the optimal values for these knobs, we devise a novel design space exploration technique based on reinforcement learning with a recurrent neural network agent. Finally, we fully implement and test our proposed methodologies using both benchmark dataset images and live images from a camera using an FPGA-based SoC. We show that we are able to reduce the runtime of the system by 48x on top of an already HW accelerated design, while meeting industry-standard accuracy requirements for iris scanning systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.2.2FLASH READ DISTURB MANAGEMENT USING ADAPTIVE CELL BIT-DENSITY WITH IN-PLACE REPROGRAMMING
Speaker:
Sung-Ming Wu, National Chiao-Tung University, TW
Authors:
Tai-Chou Wu, Yu-Ping Ma and Li-Pin Chang, National Chiao-Tung University, TW
Abstract
Read disturbance is a circuit-level noise induced by flash read operations. Read refreshing employs data migration to prevent read disturbance from corrupting useful data. However, it costs frequent block erasure under read-intensive workloads. Inspired by software-controlled cell bit-density, we propose to reserve selected threshold voltage levels as guard levels to extend the tolerance of read disturbance. Blocks with guard levels have a low cell bit-density, but they can store frequently read data without frequent read refreshing. We further propose to convert a high-density block into a low-density one using in-place reprogramming to reduce the need for data migration. Our approach reduced the number of blocks erased due to read refreshing by up to 85% and the average read response time by up to 22%.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.2.3HTF-MPR: A HETEROGENEOUS TENSORFLOW MAPPER TARGETING PERFORMANCE USING GENETIC ALGORITHMS AND GRADIENT BOOSTING REGRESSORS
Speaker:
Nader Bagherzadeh, University of California, Irvine, US
Authors:
Ahmad Albaqsami, Maryam S. Hosseini and Nader Bagherzadeh, University of California, Irvine, US
Abstract
TensorFlow is a library developed by Google to implement Artificial Neural Networks using computational dataflow graphs. The neural network has many iterations during training. A distributed, parallel environment is ideal to speedup learning. Parallelism requires proper mapping of devices to TensorFlow operations. We developed HTF-MPR framework for that reason. HTF-MPR utilizes a genetic algorithm approach to search for the best mapping that outperforms the default Tensorflow mapper. By using Gradient Boosting Regressors to create the fitness predictive model, the search space is expanded which increases the chances of finding a solution mapping. Our results on well-known neural network benchmarks, such as ALEXNET, MNIST softmax classifier, and VGG-16, show an overall speedup in the training stage by 1.18, 3.33, and 1.13, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP1-10, 363AN EFFICIENT RESOURCE-OPTIMIZED LEARNING PREFETCHER FOR SOLID STATE DRIVES
Speaker:
Rui Xu, University of Science and Technology of China, CN
Authors:
Rui Xu, Xi Jin, Linfeng Tao, Shuaizhi Guo, Zikun Xiang and Teng Tian, Strongly-Coupled Quantum Matter Physics, Chinese Academy of Sciences, School of Physical Sciences, University of Science and Technology of China, Hefei, Anhui, China, CN
Abstract
In recent years, solid-state drives (SSDs) have been widely deployed in modern storage systems. To increase the performance of SSDs, prefetchers for SSDs have been designed both at operating system (OS) layer and flash translation layer (FTL). Prefetchers in FTL have many advantages like OS-independence, easy-using, and compatibility. However, due to the limitation of computing capabilities and memory resources, existing prefetchers in FTL merely employ simple sequential prefetching which may incur high penalty cost for I/O access stream with complex patterns. In this paper, an efficient learning prefetcher implemented in FTL is proposed. Considering the resource limitation of SSDs, a learning algorithm based on Markov chains is employed and optimized so that high hit ratio and low penalty cost can be achieved even for complex access patterns. To validate our design, a simulator with the prefetcher is designed and implemented based on Flashsim. The TPC-H benchmark and an application launch trace are tested on the simulator. According to experimental results of the TPC-H benchmark, more than 90% of memory cost can be saved in comparison with a previous design at OS layer. The hit ratio can be increased by 24.1% and the number of times of misprefetching can be reduced by 95.8% in comparison with the simple sequential prefetching strategy.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.3 System Modelling for Simulation and Optimisation

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 1

Chair:
Frederic Mallet, Universite Nice Cote d'Azur, FR

Co-Chair:
Gianluca Palermo, Politecnico di Milano, IT

The session highlights the usefulness of modelling to improve the efficiency of system-level design and simulation. The first paper presents a framework to generate accurate representative workloads from big-data applications. The second paper minimises the energy consumption by reducing accesses to off-chip memory for convolutional neural networks (CNNs). The third paper introduces compile-time analysis to improve parallel SystemC simulation.

TimeLabelPresentation Title
Authors
17:004.3.1(Best Paper Award Candidate)
CAMP: ACCURATE MODELING OF CORE AND MEMORY LOCALITY FOR PROXY GENERATION OF BIG-DATA APPLICATIONS
Speaker:
Andreas Gerstlauer, University of Texas at Austin, US
Authors:
Reena Panda, Xinnian Zheng, Andreas Gerstlauer and Lizy John, The University of Texas at Austin, US
Abstract
Fast and accurate design-space exploration is a critical requirement for enabling future hardware designs. However, big-data applications are often complex targets to evaluate on early performance models (e.g., simulators or RTL models) owing to their complex software-stacks, significantly long run times, system dependencies and the limited speed of performance models. To overcome the challenges in benchmarking complex big-data applications, in this paper, we propose a proxy generation methodology, CAMP that can generate miniature proxy benchmarks, which are representative of the performance of big-data applications and yet converge to results quickly without needing any complex software stack support. Prior system-level proxy generation techniques model core locality features in detail, but abstract out memory locality modeling using simple stride-based models, which results in poor cloning accuracy for most applications. CAMP accurately models both core-performance and memory locality, along with modeling the feedback loop between the two. CAMP replicates core performance by modeling the dependencies between instructions, instruction types, control-flow behavior, etc. CAMP also adds a memory locality profiling approach that captures spatial and temporal locality of applications. Finally, we propose a novel proxy replay methodology that integrates the core and memory locality models to create accurate system-level proxy benchmarks. We demonstrate that CAMP proxies can mimic the original application's performance behavior and that they can capture the performance feedback loop well. For a variety of real-world big-data applications, we show that CAMP achieves an average cloning accuracy of 89%. We believe this is a new capability that can facilitate for overall system (core and memory subsystem) design exploration.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.3.2SMARTSHUTTLE: OPTIMIZING OFF-CHIP MEMORY ACCESSES FOR DEEP LEARNING ACCELERATORS
Speaker:
Guihai Yan, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Jiajun Li, Guihai Yan, Wenyan Lu, Shuhao Jiang, Shijun Gong, Jingya Wu and Xiaowei Li, Institute of Computing Technology, Chinese Academy of Sciences, CN
Abstract
Convolutional Neural Network(CNN) accelerators are rapidly growing in popularity as a promising solution for deep learning based applications. Though optimizations on computation have been intensively studied, the energy efficiency of such accelerators remains limited by off-chip memory accesses since their energy cost is magnitudes higher than other operations. Minimizing off-chip memory access volume, therefore, is the key to higher energy efficiency. However, there exists a dilemma of minimizing the access of which data types. We observed that sticking to minimizing the access of one data type cannot fit the varying shapes of convolutional layers in CNNs. To overcome this problem, this paper proposed a adaptive layer partitioning and scheduling scheme, called SmartShuttle, which can adaptively switch among the specific data reuse oriented scheduling schemes and the corresponding layer partitioning schemes to dynamically match different shapes of convolutional layers. Specifically, SmartShuttle takes both data reusability and sparsity into account since they have significant impact on the memory access volume. The experimental results sho that SmartShuttle achieves a performance at 434.8 multiply and accumulations(MACs)/DRAM access for VGG-16, and 526.3 MACs/DRAM access for AlexNet, which outperforms the state-of-the-art approach (Eyeriss) by 52.2% and 52.6%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.3.3PORT CALL PATH SENSITIVE CONFLICT ANALYSIS FOR INSTANCE-AWARE PARALLEL SYSTEMC SIMULATION
Speaker:
Tim Schmidt, Student, US
Authors:
Tim Schmidt, Zhongqi Cheng and Rainer Doemer, University of California, Irvine, US
Abstract
Many parallel SystemC approaches expect a thread safe and conflict free model from the designer. Alternatively, an advanced compiler can identify and avoid possible parallel access conflicts. While manual conflict resolution can theoretically be more precise, it is impractical for real world applications because of the inherent complexities. Here automatic compiler-based analysis is preferred which provides conservative conflict avoidance with minimal false positives. This paper introduces a novel compiler technique called port call path analysis that greatly reduces the amount of false positive conflicts resulting in significantly increased simulation speed. Experimental results show that the new analysis reduces the amount of false conflicts by up to 98% and, on a 4-core processor, speeds up the simulation up to 3x for a NoC particle simulator and 3.5x for a bitcoin miner SystemC model.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP1-11, 142BRIDGING DISCRETE AND CONTINUOUS TIME MODELS WITH ATOMS
Speaker:
George Ungureanu, KTH Royal Institute of Technology, SE
Authors:
George Ungureanu1, José E. G. de Medeiros2 and Ingo Sander1
1KTH Royal Institute of Technology, SE; 2University of Brasília, BR
Abstract
Recent trends in replacing traditionally digital components with analog counterparts in order to overcome physical limitations have led to an increasing need for rigorous modeling and simulation of hybrid systems. Combining the two domains under the same set of semantics is not straightforward and often leads to chaotic and non-deterministic behavior due to the lack of a common understanding of aspects concerning time. We propose an algebra of primitive interactions between continuous and discrete aspects of systems which enables their description within two orthogonal layers of computation. We show its benefits from the perspective of modeling and simulation, through the example of an RC oscillator modeled in a formal framework implementing this algebra.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP1-12, 436OHEX: OS-AWARE HYBRIDIZATION TECHNIQUES FOR ACCELERATING MPSOC FULL-SYSTEM SIMULATION
Speaker:
Róbert Lajos Bücs, Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE
Authors:
Róbert Lajos Bücs1, Maximilian Fricke2, Rainer Leupers1, Gerd Ascheid1, Stephan Tobies2 and Andreas Hoffmann2
1Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE; 2Synopsys GmbH, DE
Abstract
Virtual platform (VP) technology is an established enabler of embedded system design. However, the sheer number of CPU models in modern multi-core VPs forms a performance bottleneck. Hybrid simulation addresses this issue by executing parts of the embedded software stack on the host. Although the approach is significantly faster, hybridization can not cope with higher software layers, e.g., Operating Systems (OSs). Thus, this paper presents the OS-aware Host EXtension (OHEX) framework to accelerate VPs while expanding the applicability of hybridization. OHEX is evaluated on various system layers, yielding speedups between 2.99x-21.14x with specific benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP1-13, 144A HIGHLY EFFICIENT FULL-SYSTEM VIRTUAL PROTOTYPE BASED ON VIRTUALIZATION-ASSISTED APPROACH
Speaker:
Hsin-I Wu, National Tsing Hua University, Department of Computer Science, Hsinchu, Taiwan, TW
Authors:
Hsin-I Wu, Chi-Kang Chen, Tsung-Ying Lu and Ren-Song Tsay, National Tsing Hua University, TW
Abstract
An effective full-system virtual prototype is critical for early-stage systems design exploration. Generally, however, traditional acceleration approaches of virtual prototypes cannot accurately analyze system performance and model non-deterministic inter-component interactions due to the unpredictability of simulation progress. In this paper, we propose an effective virtualization-assisted approach for modeling and performance analysis. First, we develop a deterministic synchronization process that manages the interactions affecting the data dependency in chronological order to model inter-component interactions consistently. Next, we create accurate timing and bus contention models based on runtime operation statistics for analyzing system performance. We implement the proposed virtualization-assisted approach on an off-the-shelf System-on-Chip (SoC) board to demonstrate the effectiveness of our idea. The experimental results show that the proposed approach runs 12~77 times faster than a commercial virtual prototyping tool and performance estimation is only 3~6% apart from real systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.4 Overcoming the Limitations of Worst-Case IC Design

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 2

Chair:
Vasilis Pavlidis, University of Manchester, GB

Co-Chair:
Giorgios Karakonstantis, Queen's University Belfast, GB

The session illustrates novel approaches to lower the high voltage and timing guard-bands affecting the performance of computing systems. The first talk introduces a methodology to increase the resiliency towards timing errors at ultra-low-voltages. Then, the placement of the timing monitor infrastructure is investigated in the second talk. The illustration of a mechanism to reliably tune the voltage supply concludes the session.

TimeLabelPresentation Title
Authors
17:004.4.1(Best Paper Award Candidate)
TRIDENT: A COMPREHENSIVE TIMING ERROR RESILIENT TECHNIQUE AGAINST CHOKE POINTS AT NTC
Speaker:
Aatreyi Bal, Utah State University, US
Authors:
Aatreyi Bal, Sanghamitra Roy and Koushik Chakraborty, Utah State University, US
Abstract
Near Threshold Computing (NTC) systems have been inherently plagued with heightened process variation (PV) sensitivity. Choke points are an intriguing manifestation of this PV sensitivity. In this paper, we explore the probability of minimum timing violations, caused by choke points, in an NTC system and, their non-trivial impacts on the system reliability. We show that conventional timing error mitigation techniques are inefficient in tackling choke point induced minimum timing violations. Consequently, we propose a comprehensive error mitigation technique, Trident, to tackle choke points, at NTC. Trident offers a 1.37× performance improvement and a 1.1× energy efficiency gain over Razor at NTC, with minimal overheads.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.4.2BAYESIAN THEORY BASED SWITCHING PROBABILITY CALCULATION METHOD OF CRITICAL TIMING PATH FOR ON-CHIP TIMING SLACK MONITORING
Speaker:
Byung Su Kim, Samsung Electronics, Foundry, KR
Authors:
Byung Su Kim1 and Joon-Sung Yang2
1Samsung Electronics, KR; 2Sungkyunkwan University, KR
Abstract
Accurate in-situ monitoring is urgently required for an adaptive performance control system and post silicon validation. For accurate in-situ monitoring, a direct probing method is presented in which monitors directly measure a path delay from real critical timing paths. However, we may not be able to predict when the timing slack monitors would activate since the activation depends on a design structure and input patterns. If a timing slack monitor is rarely activated by timing critical paths, the observability from this monitor would be low and the monitor possibly can be discarded. For this reason, we propose a novel timing slack monitoring methodology based on switching probability of timing critical paths. Switching probability and correlation on critical timing paths are formulated, and the proposed method finds a list of critical path endpoints for the timing slack monitor insertion under given power and area constraints. Experimental results with ISCAS'89 circuits show that, compared to the method which places monitors for all worst critical paths, 16.67 ~ 97.2% of timing slack monitors are removed and 32.56 ~ 96.88% of dynamic power reduction from the monitors is achieved by the proposed method.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.4.3PERFORMANCE BASED TUNING OF AN INDUCTIVE INTEGRATED VOLTAGE REGULATOR DRIVING A DIGITAL CORE AGAINST PROCESS AND PASSIVE VARIATIONS
Speaker:
Venkata Chaitanya Krishna Chekuri, Georgia Institute of Technology, US
Authors:
Venkata Chaitanya Krishna Chekuri, Monodeep Kar, Arvind Singh and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
This paper presents an auto-tuning method for fully integrated voltage regulators (IVRs) driving digital cores against variations in passive as well as process/temperature of the core. The key contribution is to perform auto-tuning of the coefficients of the feedback loop of the IVR based on the performance of the digital cores. Simulations using a high-frequency IVR Simulink model and digital logic in 45nm CMOS process shows that the proposed performance driven auto-tuning demonstrates potential for up to 12% increase in system performance under inductance and threshold variation.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP1-14, 686INDUSTRIAL EVALUATION OF TRANSITION FAULT TESTING FOR COST EFFECTIVE OFFLINE ADAPTIVE VOLTAGE SCALING
Speaker:
Mahroo Zandrahimi, TU Delft, NL
Authors:
Mahroo Zandrahimi1, Philippe Debaud2, Armand Castillejo2 and Zaid Al-Ars1
1Delft University of Technology, NL; 2STMicroelectronics, FR
Abstract
Adaptive voltage scaling (AVS) has been used widely to compensate for process, voltage, and temperature variations as well as power optimization of integrated circuits. The current industrial state-of-the-art AVS approaches using Process Monitoring Boxes (PMBs) have shown several limitations such as huge characterization effort, which makes these approaches very expensive, and a low accuracy that results in extra margins, which consequently lead to yield loss and performance limitations. To overcome those limitations, in this paper we propose an alternative solution using transition fault test patterns, which is able to eliminate the need for PMBs, while improving the accuracy of voltage estimation. The paper shows, using simulation of ISCAS'99 benchmarks with 28nm FD-SOI library, that AVS using transition fault testing (TF-based AVS) results in an error as low as 5.33%. The paper also shows that the PMB approach can only account for 85% of the uncertainty in voltage measurements, which results in power waste, while the TF-based approach can account for 99% of that uncertainty.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP1-15, 693AN ANALYSIS ON RETENTION ERROR BEHAVIOR AND POWER CONSUMPTION OF RECENT DDR4 DRAMS
Speaker:
Deepak M. Mathew, University of Kaiserslautern, DE
Authors:
Deepak M. Mathew1, Martin Schultheis1, Carl C. Rheinländer1, Chirag Sudarshan1, Matthias Jung2, Christian Weis1 and Norbert Wehn1
1University of Kaiserslautern, DE; 2Fraunhofer IESE, DE
Abstract
DRAM technology is scaling aggressively that results in high leakage power, worse data retention time behavior, and large process variations. Due to these process variations, vendors provide large guard bands on various DRAM currents and timing specifications that are over pessimistic. Detailed knowledge on the DRAM retention behavior and currents for the average case allow to improve memory system performance and energy efficiency of specific applications by moving away from worst case behavior. In this paper, we present an advanced measurement platform to investigate off-the-shelf DDR4 DRAMs' retention behavior, and to precisely measure various DRAM currents (IDDs and IPPs) at a wide range of operating temperatures. Error Checking and Correction (ECC) schemes are popular in correcting randomly scattered single bit errors. Since retention failures also occur randomly, ECCs can be used to improve DRAM retention behavior. Therefore, for the first time, we show the influence of ECC on the retention behavior of recent DDR4 DRAMs, and how it varies across various DRAM architectures considering detailed structure of the DRAM (true-cell devices / mixed-cell devices).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.5 Test: innovative infrastructures and ATPG techniques

Date: Tuesday 20 March 2018
Time: 17:00 - 18:00
Location / Room: Konf. 3

Chair:
Danilo Pau, STMicroelectronics, IT

Co-Chair:
Lukasz Rybak, Mentor Graphics Poland, PL

The session addresses hot challenges for 2.5D and 3D integration and asynchronous circuits, and introduces solutions for improving ATPG efficiency

TimeLabelPresentation Title
Authors
17:004.5.1PRE-ASSEMBLY TESTING OF INTERCONNECTS IN EMBEDDED MULTI-DIE INTERCONNECT BRIDGE (EMIB) DIES
Speaker:
Krishnendu Chakrabarty, Duke University, US
Authors:
Sudipta Mondal and Krishnendu Chakrabarty, Duke University, US
Abstract
The embedded multi-die interconnect bridge (EMIB) is an advanced packaging technology for 2.5D integration. This paper presents a bridge test architecture based on the proposed IEEE Std. P1838. The proposed test method enables access to interconnects at a pre-assembly stage by pairing the interconnects using metal shorts and probing on coarse-pitch C4 bumps. It can efficiently detect resistive-open and resistive-short defects in the bridge interconnects and micro-bumps. Simulation results are presented to evaluate the range of defects that can be detected by the proposed method.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.5.2ON THE REUSE OF TIMING RESILIENT ARCHITECTURE FOR TESTING PATH DELAY FAULTS IN CRITICAL PATHS
Speaker:
Luciano Ost, University of Leicester, GB
Authors:
Felipe Kuentzer, Leonardo Juracy and Alexandre Amory, PUCRS University, BR
Abstract
Energy efficiency has become one of the most common and important demands for contemporary applications, increasing the desire for chips that operate near the threshold voltage levels, which unfortunately worsens the effects of process, voltage, and temperature (PVT) variability. An alternative solution to cope with PVT variations are the timing resilient architectures, such as the synchronous Razor family and the asynchronous Blade template, that rely on error detection logic (EDL) to detect and recover from timing violations. On one hand, the use of timing resilient architectures makes the path delay testing more challenging because it is not a matter of simple pass or fails the test. On the other hand, we show that timing resilient architectures, such as Blade, present opportunities to design low-cost online delay testing of the critical paths. Results show the area overhead and fault coverage using functional testing on a 32-bit MIPS CPU and a crypto core.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:454.5.3CHARACTERIZATION OF POSSIBLY DETECTED FAULTS BY ACCURATELY COMPUTING THEIR DETECTION PROBABILITY
Speaker:
Jan Burchard, Mentor, a Siemens Business, DE
Authors:
Jan Burchard1, Dominik Erb2 and Bernd Becker1
1University of Freiburg, DE; 2Infineon Technologies, DE
Abstract
With ever more complex and larger VLSI devices and higher and higher reliability requirements, high quality test with a large fault and defect coverage is becoming even more relevant. At the same time, when unspecified or unknown input values (X-values) have to be considered in a pattern, commercial ATPG tools are sometimes not capable of determining whether a fault can be tested - but there is at least a chance to detect the fault, as 0/X or 1/X could be propagated to at least one output. Consequently, these faults are considered to be possibly detected and often counted towards the overall fault coverage with a weighting factor. However, as the actual probability to detect these faults with the considered test pattern is not taken into account, this could lead to an over- or underestimation of their real fault coverage, falsifying the test results. We introduce a #SAT-based characterization algorithm for this class of faults. This new algorithm is, for the first time, able to accurately compute the detection probability for faults marked as possibly detected by state-of-the-art commercial tools. Our experimental results for the largest ITC'99 benchmarks as well as larger industrial-circuits show that our algorithm can accurately determine the detection probability for most of the possibly detected faults and also identify faults that are completely untestable or found with a probability of 100% irrespective of the assignment of the inputs with an X-value. Furthermore, they show that the detection probability is circuit dependent and consequently should not just be estimated by a simple weighting factor but requires a more in-depth evaluation. Otherwise, there is a high risk that the achieved results could clearly be to optimistic or pessimistic with regard to the real fault coverage.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00IP1-16, 728A BOOLEAN MODEL FOR DELAY FAULT TESTING OF EMERGING DIGITAL TECHNOLOGIES BASED ON AMBIPOLAR DEVICES
Speaker:
Davide Bertozzi, DE - University of Ferrara, IT
Authors:
Marcello Dalpasso1, Davide Bertozzi2 and Michele Favalli2
1DEI - UNiv. of Padova, IT; 2DE - Univ. of Ferrara, IT
Abstract
Emerging nanotechnonologies such as ambipolar carbon nanotube field effect transistors (CNTFETs) and silicon nanowire FETs (SiNFETs) provide ambipolar devices allowing the design of more complex logic primitives than those found in today's typical CMOS libraries. When switching, such devices show a behavior not seen in simpler CMOS and FinFET cells, making unsuitable the existing delay fault testing approaches. We provide a Boolean model of switching ambipolar devices to support delay fault testing of logic cells based on such devices both in Boolean and Pseudo-Boolean satisfiability engines.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:01IP1-17, 845ATPG POWER GUARDS: ON LIMITING THE TEST POWER BELOW THRESHOLD
Speaker:
Virendra Singh, Indian Institute of Technology Bombay, IN
Authors:
Rohini Gulve1 and Virendra Singh2
1Indian Institute of Technology Bombay, IN; 2IIT Bombay, IN
Abstract
Modern circuits with high performance and low power requirements impose strict constraints on manufacturing test generation, particularly on timing test. Delay test is used for performance grading of the circuit. During the application of the test, power consumption has to be less than the functional threshold value, in order to avoid yield loss. This work proposes a new direction to generate power safe test without any changes in DFT (design for testability) structure or existing CAD (computeraided design) tools. We propose a virtual wrapper circuitry around the circuit under test (CUT), for test generation purpose, which acts as a shield to obtain power safe vectors. The wrapper prohibits the generation of test vector if power consumption exceeds the threshold limits. We consider analytical power models for power analysis of candidate test vector patterns. Experiments performed on benchmark circuits show power safe test generation without coverage loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:02IP2-1, 369IN-GROWTH TEST FOR MONOLITHIC 3D INTEGRATED SRAM
Speaker:
Yixun Zhang, Shanghai Jiao Tong University, CN
Authors:
Pu Pang1, Yixun Zhang1, Tianjian Li1, Sung Kyu Lim2, Quan Chen1, Xiaoyao Liang1 and Li Jiang1
1Shanghai Jiao Tong University, CN; 2Georgia Tech, US
Abstract
Monolithic three-dimensional integration (M3I) directly fabricates tiers of integrated circuits upon each other and provides millions of vertical interconnections with interlayer vias (ILVs). It thus brings higher integration density and communication capability compared with three-dimensional stacked integration (3D-SI). However, the Known-Good-Die problem haunting 3D-SI-a faulty tier causes the failure of the entire stack-also occurs in M3I. Lack of efficient test methodologies such as the pre-bond testing in 3D-SI, M3I may have a more significant yield drop and thus its cost may be unacceptable for main-stream adoption. This paper introduces a novel In-growth test method for M3I SRAM. We propose a novel Design-for- Test (DfT) methodology to enable the proposed In-growth test on cell-level partitioned incomplete SRAM cells. We also build a statistical model of cost and discover a prospective judgement to determine whether or not to stop the fabrication, in order to prevent from raising the cost of fabricating more tiers upon the irreparable tiers. We find that a "sweet point" exists in the judgement, which can minimize the overall cost. Experimental results show the effectiveness of our proposed test methodology.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:00End of session
18:30Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.6 Special Session: Securing Power-constrained System-on-Chips: Challenges and Opportunities

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 4

Chair:
Mukhopadhyay Saibal, School of ECE, Georgia Institute of Technology, US

The power is the key constraint for modern System-on-Chip (SoC) design. After decade long research and development, the low-power circuit techniques and run-time power-management schemes are maturing at circuit, logic, and system levels. Now the SoCs face a new but critical design challenge: how to keep them secure - and techniques are being investigated in software and hardware to achieve this goal. This session is dedicated to study the critical, but often complex, interplay between power and security, in different aspects of design-for-security practices in SoCs.

TimeLabelPresentation Title
Authors
17:004.6.1ULTRA-LOW ENERGY CIRCUIT BUILDING BLOCKS FOR SECURITY TECHNOLOGIES
Speaker:
Sanu Mathew, Intel Corporation, US
Authors:
Sanu Mathew1, Sudhir Satpathy2, Vikram Suresh2 and Ram Krishnamurthy2
1Intel Labs, US; 2Intel Corporation, Hillsboro, US
Abstract
Low-area energy-efficient security primitives are key building blocks for enabling end-to-end content protection, user authentication in IoT platforms. This paper describes 3 designs that employ energy-efficient circuit techniques with optimal hardware-friendly arithmetic for seamless integration into area/battery constrained IoT systems: 1) A 2040-gate AES accelerator achieving 289Gbps/W efficiency in 22nm CMOS, 2) Hardened hybrid Physically Unclonable Function (PUF) circuit to generate a 100% stable encryption key. 3) All-digital TRNG to achieve >0.99 min-entropy with 3pJ/bit energy-efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:154.6.2EMBEDDED RANDOMNESS AND DATA DEPENDENCIES DESIGN PARADIGM: ADVANTAGES AND CHALLENGES
Speaker:
Itamar Levi, Université catholique de Louvain (UCL), Belgium, BE
Authors:
Itamar Levi, Alexander Fish and Osnat Keren, Faculty of Engineering, Bar-Ilan University, IL
Abstract
Information leakage through physical channels is a major hurdle in embedded hardware security. This paper overviews the three key factors in the embedded hardware security space, focusing on gray-box (bounded resources) power analysis attacks: the adversary's knowledge and abilities, the security metrics used by adversaries' and security evaluators and gate-level countermeasures. A new design paradigm, dubbed pAsynch, that utilizes internal signals and random signals to uniformly spread the information-carrying energy within the clock period in a specific way with a resolution below the band-width and noise-filtering abilities of advanced measurement equipment is introduced. The advantages and design challenges introduced by the pAsynch paradigm are discussed.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:404.6.3EXPLOITING ON-CHIP POWER MANAGEMENT FOR SIDE-CHANNEL SECURITY
Speaker:
Saibal Mukhopadhyay, Georgia Institute of Technology, US
Authors:
Arvind Singh1, Monodeep Kar1, Sanu Mathew2, Anand Rajan2, Vivek De3 and Saibal Mukhopadhyay1
1Georgia Institute of Technology, US; 2Intel Labs, US; 3Intel Corporation, US
Abstract
The high-performance and energy-efficient encryption engines have emerged as a key component for modern System-On-Chip (SoC) in various platforms including servers, desktops, mobile, and IoT edge devices. A key bottleneck to secure operation of encryption engines is leakage of information through various side-channels. For example, an adversary can extract the secret key by performing statistical analysis on measured power and electromagnetic (EM) emission signatures generated by the hardware during encryption. Countermeasures to such side-channel attacks often come at high power, area, or performance overheads. Therefore, design of side-channel secure encryption engines is a critical challenge for high-performance and/or power-/energy efficient operations. This paper reviews that although low-power requirement imposes critical challenge for side-channel security, but circuit techniques traditionally developed for power management also present new opportunities for side-channel resistance. As a case study, we review the feasibility of using integrated voltage regulator and dynamic voltage frequency scaling normally used for efficient power management, for increasing power-side-channel resistance of AES engines. The hardware measurement results from test-chip fabricated in 130nm process are presented to demonstrate the impact of power management circuits on side-channel security.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:054.6.4ENERGY-SECURE SWARM POWER MANAGEMENT
Speaker:
Pradip Bose, IBM Corporation, US
Authors:
Augusto Vega, Alper Buyuktosunoglu and Pradip Bose, IBM T. J. Watson Research, US
Abstract
We present a visionary concept of a distributed (or decentralized) power/thermal control mechanism that applies the bio-inspired artificial intelligence paradigm of swarm intelligence. The target use case is a future many-core processor. Preliminary results based on a swarm simulator are presented. The talk then addresses the challenge of making such power control systems secure against energy attacks - i.e. maliciously launched virus programs that are designed to disrupt the power control mechanism and cause performance shortfalls or even physical damage from over-heating.
18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.7 Adaptive Reliable Computing Using Memristive and Reconfigurable Hardware

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 5

Chair:
Walter Weber, NAMLAB, DE

Co-Chair:
Alessandro Cilardo, University of Naples Federico II, IT

This session discusses reliability analysis and enhancement of memristive computing, addressing the non-linear behavior and the development of a logic synthesis flow for defect tolerance. The session also focuses on adapting the precision of the heterogenous hardware to fit application requirements.

TimeLabelPresentation Title
Authors
17:004.7.1RESCUING MEMRISTOR-BASED COMPUTING WITH NON-LINEAR RESISTANCE LEVELS
Speaker:
Jilan Lin, Tsinghua University, CN
Authors:
Jilan Lin1, Lixue Xia1, Zhenhua Zhu1, Hanbo Sun1, Yi Cai1, Hui Gao1, Ming Cheng1, Xiaoming Chen2, Yu Wang1 and Huazhong Yang1
1Tsinghua University, Beijing, CN; 2University of Notre Dame, US
Abstract
Emerging metal oxide resistive switching random access memory (RRAM) device and RRAM crossbar have shown great potential in computing matrix-vector multiplication. However, due to the nonlinear distribution of resistance levels in RRAM devices, state-of-the-art multi-bit RRAM cannot accomplish the multi-bit computing task accurately. In this paper, we propose fault-tolerant schemes to rescue RRAM-based computation with nonlinear resistance levels. We classify the resistance level distributions in RRAM devices into three types, and the corresponding models are proposed to analyze the computation characteristics. We propose two theoretical conditions for the resistance levels to determine if an RRAM device can support multi-bit matrix computation. For the linear model, the least squares method is used to reduce the computing error. When the resistance distribution obeys the proposed power model, a logarithmic operation is used to decode the multiplication results and accomplish accuracy computing. For exponential model, since the device cannot complete typical matrix-vector multiplication from hardware level, we propose online and offline quantization methods to make the neural computing algorithms friendly to RRAM device. Simulation results show that the root-mean-square error improves around 4% with the linear model and more than 99% with the power model. After quantization, the accuracy of ResNet-18 using RRAM with exponential resistance levels can be improved to the same accuracy with ideal linear RRAM devices.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:304.7.2PX-CGRA: POLYMORPHIC APPROXIMATE COARSE-GRAINED RECONFIGURABLE ARCHITECTURE
Speaker:
Omid Akbari, University of Tehran, IR
Authors:
Omid Akbari1, Mehdi Kamal1, Ali Afzali-Kusha1, Massoud Pedram2 and Muhammad Shafique3
1University of Tehran, IR; 2University of Southern California, US; 3TU Wien, AT
Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) provide tradeoff between the energy-efficiency of Application Specific Integrated Circuits (ASICs) and the flexibility of General Purpose Processors (GPPs). State-of-the-art CGRAs only support exact architectures and precise application executions. However, a majority of the streaming applications such as multimedia and digital signal processing, which are amenable to CGRAs, are inherently error resilient. Therefore, these applications can greatly benefit from the emerging trend of Approximate Computing that leverages this error-resiliency to provide higher energy efficiency proportional to the tolerable accuracy loss (can even be constrained). This paper, for the first time, introduces the novel concept of Polymorphic Approximate CGRA (PX-CGRA) that employs heterogeneous tiles of Polymorphic-Approximated ALU Clusters (PACs) connected in a 2-D mesh style connection. These PACs can implement different approximate modes as well as accurate modes depending upon their selected configuration as per the run-time requirements of executing applications. For designing an efficient PX-CGRA, we propose a bottom-up design flow. In addition, the flow of application mapping on PX-CGRA is discussed including accuracy-level mapping, scheduling, and binding steps. To comprehensively evaluate the efficacy of the proposed CGRA, the complete PX-CGRA architecture in different sizes as well as with different PACs configurations are synthesized using a 15-nm FinFET technology. Our results show up to 15%-45% energy efficiency improvement for 5%-35% output quality degradation, respectively, when compared to the state-of-the-art exact-mode CGRA. Our proposed architecture and design methodology enable a new era of accuracy-configurable CGRAs to provide significant energy gains.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:004.7.3MULTI-PRECISION CONVOLUTIONAL NEURAL NETWORKS ON HETEROGENEOUS HARDWARE
Speaker:
Mohammad Hosseinabady, University of Bristol, GB
Authors:
Moslem Amiri, Mohammad Hosseinabady, Simon McIntosh-Smith and Jose Nunez-Yanez, University of Bristol, GB
Abstract
Fully binarised convolutional neural networks (CNNs) deliver very high inference performance using single-bit weights and activations, together with XNOR type operators for the kernel convolutions. Current research shows that full binarisation results in a degradation of accuracy and different approaches to tackle this issue are being investigated such as using more complex models as accuracy reduces. This paper proposes an alternative based on a multi-precision CNN framework that combines a binarised and a floating point CNN in a pipeline configuration deployed on heterogeneous hardware. The binarised CNN is mapped onto an FPGA device and used to perform inference over the whole input set while the floating point network is mapped onto a CPU device and performs re-inference only when the classification confidence level is low. A light-weight confidence mechanism enables a flexible trade-off between accuracy and throughput. To demonstrate the concept, we choose a Zynq 7020 device as the hardware target and show that the multi-precision network is able to increase the BNN accuracy from 78.5% to 82.5% and the CPU inference speed from 29.68 to 90.82 images/sec.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:154.7.4LOGIC SYNTHESIS AND DEFECT TOLERANCE FOR MEMRISTIVE CROSSBAR ARRAYS
Speaker:
Onur Tunali, Istanbul Technical University, TR
Authors:
Onur Tunali and Mustafa Altun, Istanbul Technical University, TR
Abstract
Contrary to abundant memory related studies of memristive crossbar structures, logic oriented applications are only gaining popularity in recent years. In this paper, we study logic synthesis, regarding both two-level and multi level designs, and defect aspects of memristor based crossbar architectures. First, we introduce our two-level and multi-level logic synthesis techniques. We elaborate on advantages and disadvantages of both approaches with experimental results regarding area cost. After that, we devise a defect model in alignment with the conventional stuck-at open and closed paradigm. In addition, we determine the effects of defects to the operational capacity of the crossbar. Furthermore, we propose a preliminary defect tolerant Boolean logic mapping approach. In order to evaluate our approach, we conduct extensive Monte Carlo simulations with industrial benchmarks. Finally, we discuss future directions concerning both existing two-level and prospective multi-level logic designs as well as defect tolerance with area redundancy.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP2-2, 199(Best Paper Award Candidate)
A CO-DESIGN METHODOLOGY FOR SCALABLE QUANTUM PROCESSORS AND THEIR CLASSICAL ELECTRONIC INTERFACE
Speaker:
Jeroen van Dijk, Delft University of Technology, NL
Authors:
Jeroen van Dijk1, Andrei Vladimirescu2, Masoud Babaie1, Edoardo Charbon1 and Fabio Sebastiano1
1Delft University of Technology, NL; 2University of California, Berkeley, US
Abstract
A quantum computer fundamentally comprises a quantum processor and a classical controller. The classical electronic controller is used to correct and manipulate the qubits, the core components of a quantum processor. To enable quantum computers scalable to millions of qubits, as required in practical applications, the simultaneous optimization of both the classical electronic and quantum systems is needed. In this paper, a co-design methodology is proposed for obtaining an optimized qubit performance while considering practical trade-offs in the control circuits, such as power consumption, complexity, and cost. The SPINE (SPIN Emulator) toolset is introduced for the co-design and co-optimization of electronic/quantum systems. It comprises a circuit simulator enhanced with a Verilog-A model emulating the quantum behavior of single-electron spin qubits. Design examples show the effectiveness of the proposed methodology in the optimization, design and verification of a whole electronic/quantum system.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP2-3, 757APPROXIMATE QUATERNARY ADDITION WITH THE FAST CARRY CHAINS OF FPGAS
Speaker:
Philip Brisk, University of California, Riverside, US
Authors:
Sina Boroumand1, Hadi P. Afshar2 and Philip Brisk3
1University of Tehran, IR; 2Qualcomm Research, US; 3University of California, Riverside, US
Abstract
A heuristic is presented to efficiently synthesize approximate adder trees on Altera and Xilinx FPGAs using their carry chains. The mapper constructs approximate adder trees using an approximate quaternary adder as the fundamental building block. The approximate adder trees are smaller than exact adder trees, allowing more operators to fit into a fixed-area device, trading off arithmetic accuracy for higher throughput.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP2-4, 424NN COMPACTOR: MINIMIZING MEMORY AND LOGIC RESOURCES FOR SMALL NEURAL NETWORKS
Speaker:
Seongmin Hong, Hongik University, KR
Authors:
Seongmin Hong1, Inho Lee1 and Yongjun Park2
1Hongik University, KR; 2Hanyang University, KR
Abstract
Special neural accelerators are an appealing hardware platform for machine learning systems because they provide both high performance and energy efficiency. Although various neural accelerators have recently been introduced, they are difficult to adapt to embedded platforms because current neural accelerators require high memory capacity and bandwidth for the fast preparation of synaptic weights. Embedded platforms are often unable to meet these memory requirements because of their limited resources. In FPGA-based IoT (internet of things) systems, the problem becomes even worse since computation units generated from logic blocks cannot be fully utilized due to the small size of block memory. In order to overcome this problem, we propose a novel dual-track quantization technique to reduce synaptic weight width based on the magnitude of the value while minimizing accuracy loss. In this value-adaptive technique, large and small value weights are quantized differently. In this paper, we present a fully automatic framework called NN Compactor that generates a compact neural accelerator by minimizing the memory requirements of synaptic weights through dual-track quantization and minimizing the logic requirements of PUs with minimum recognition accuracy loss. For the three widely used datasets of MNIST, CNAE-9, and Forest, experimental results demonstrate that our compact neural accelerator achieves an average performance improvement of 6.4x over a baseline embedded system using minimal resources with minimal accuracy loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

4.8 Components for Secure IoT Systems

Date: Tuesday 20 March 2018
Time: 17:00 - 18:30
Location / Room: Exhibition Theatre

Organiser:
Jürgen Haase, edacentrum, DE

In this Exhibition Workshop leading suppliers from the microelectronics industry present their newest technical solutions for designing and securing the IoT systems of the upcoming digital age. By elaborating on their technical approaches and the experience made during the design and in the field, they will provide attendees with valuable advice for the challenges in their own job

TimeLabelPresentation Title
Authors
17:004.8.1SECURING THE INTERNET OF THINGS WITH TI SIMPLELINK PLATFORM
Speaker:
Roger Monk, Texas Instruments Europe, FR
Abstract

With Billions of IoT devices getting connected to the Internet, it is ever more important to make these devices are as secure and robust as possible.  These devices should be protected from running malicious software and it is critical that the sensitive user data that these device handle is kept secret.  These security requirements add significant responsibility to the system-on-chip solutions at their heart.  The SimpleLink Wi-Fi family of devices have been instrumental in enabling small, power-optimised IoT devices leveraging existing Wi-Fi infrastructure.  The first generation CC3100/CC3200 provides secure network socket connectivity to enable secure data connection to remote servers and services.  The next generation of this family, CC3120/CC3220, significantly extends these capabilities by not only offering the latest network security ciphers, but also an advanced security platform to protect system assets for the entire life-cycle of the product, offering customers further confidence and protection.  This presentation aims to detail the challenges of security in today's IoT products and how the architecture of this latest generation of embedded Wi-Fi platform has been designed to efficiently address the technical challenges presented and how these advanced and differentiated security capabilities can be exposed and enabled for all users via a 'simple' SimpleLink API.

17:304.8.2DEVELOPMENT OF A NEAR-THRESHOLD DIGITAL CELL LIBRARY AND A DESIGN FLOW FOR IOT SENSOR SYSTEMS
Speaker:
Jörg Doblaski, X-FAB, DE
Abstract

Optimized digital standard cells which efficiently operate in the near-threshold voltage (NTV) region are one basic enabler for the next generation of smart sensor systems especially of IoT. While a significant reduction of both dynamic power and leakage power is necessary to meet the power requirements of such systems, a reasonable performance still needs to be supported, to enable on-chip pre-processing and analysis of the sensor data.

The presentation will provide an overview about the development of a near-threshold digital library implemented in X-FAB's 0.18 µm Silicon-on-Insulator technology carried out in the framework of the BMBF-funded project RoMulus [1]. A digital ultra-low-power logic library was developed based on the standard CMOS technique, which operates in NTV region with 700 ... 900mV operating voltage at -20 °C ... 85 °C. Additionally supporting cells like level shifters and an NTV I/O pad cell library with ESD protection have been developed. The cells have been implemented with a NTV test chip, using a power-aware digital implementation flow. The test chip has been manufactured and characterized. The test results prove the function of the NTV logic cells in the specified voltage and temperature range and demonstrate the feasibility and possibilities of the development of further NTV logic cells.

17:504.8.3FULL CUSTOM MEMS DESIGN: NEW METHODS FOR THE ANALYSIS OF PARASITIC ELECTROSTATIC EFFECTS
Speaker:
Axel Hald, Robert Bosch GmbH, DE
Abstract

Microelectromechanical systems (MEMS) are widely used in IoT devices. Due to the lack of sophisticated component libraries for MEMS, highly optimized MEMS sensors are currently designed using a polygon-driven design flow. The advantage of this design flow is its accurate mechanical simulation, but it lacks of methods for analyzing the parasitic electrostatic effects arising from the electric coupling between (stationary) wiring and the mechanical structures. For a robust and secure MEMS design, it is necessary to analyze, to optimize and finally to include these parasitics into the MEMS-ASIC co-simulation.
The presentation will provide an overview about the development of new methods for the analysis of parasitic electrostatic effects by a 3D field-solver carried out in the framework of the BMBF-funded project RoMulus. The developed methods include a rule based structure recognition algorithm, which allows the identification of meaningful MEMS sensor parts out of the plain graphical polygon representation of the MEMS layout. The mapping of the extracted RC-values to the recognized elements of the MEMS sensor enables a detailed analysis and optimization of actual MEMS sensors. This method is upgraded by a feature that enables the parasitics arising from in-plane, sensor-structure motion to be extracted quasi-dynamically.

18:104.8.4A SCHMITT-TRIGGER BASED SUB-THRESHOLD DIGITAL CMOS CIRCUIT DESIGN TECHNIQUE FOR ULTRA LOW-VOLTAGE AND ULTRA LOW-POWER APPLICATIONS
Speaker:
Matthias Keller, University of Freiburg, DE
Abstract

Power optimized digital standard cell libraries are a "must have" when it comes to the implementation of power efficient smart sensor systems for IoT applications. In the framework of the BMBF-funded project RoMulus, the Fritz Huettinger Chair of Microelectronics of the University of Freiburg and X-FAB jointly develop near- and sub-threshold digital standard cell libraries for both ultra low-voltage and ultra low-power applications.

 The presentation provides an overview of our research activities on a sub-threshold digital circuit design technique for ultra low-voltage and ultra low-power applications. At first, the fundamentals of the Schmitt-Trigger based design methodology are presented that allow reducing the supply voltage of digital circuits to a few tens of millivolts in a 130nm CMOS technology. Subsequently, the BMBF-funded project RoMulus is considered in which the Schmitt-Trigger based design methodology is applied to the design of a digital standard cell library in an X-FAB 180nm CMOS process. Measurement results as well as an outlook on the next steps in the project conclude the presentation.

18:30End of session
Exhibition Reception in Exhibition Area
The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

UB04 Session 4

Date: Tuesday 20 March 2018
Time: 17:30 - 19:30
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB04.1EWCMS: AN EMBEDDED WALK-CYCLE MONITORING SYSTEM USING BODY AREA COMMUNICATION AND SECURE LOW-POWER DYNAMIC SIGNALING
Authors:
Shahzad Muzaffar1 and Ibrahim (Abe) M. Elfadel2
1Masdar Institute, Khalifa University of Science and Technology, AE; 2Khalifa University of Science and Technology, AE
Abstract
The demo presents a novel ultra-low power, embedded, and wearable walk-cycle monitoring system with applications in areas such as healthcare, robotics, sports medicine, physical therapy, prosthesis, and animal sports. Customized shoes with sensors continuously measure the forces, and an electronic digital assistant is used to analyze the acquired measurements in real time by employing an IMU free and self-synchronizing method in order to estimate weight and study motion patterns. To achieve ultra-low power operation, the human body is used as a communication medium between the sensors and the digital assistant. The single-channel behavior of the human body is accommodated with a novel, simple yet robust single-wire signaling technique, Pulsed-Index Communication (PIC), that significantly reduces the system footprint and overall power consumption as it eliminates the need for clock and data recovery. The system prototype has been rigorously and successfully tested.

More information ...
UB04.2GENERATING FULL-CUSTOM SCHEMATICS IN A MIXED-SIGNAL TOP-DOWN DESIGN FLOW
Authors:
Tobias Markus1, Markus Mueller2 and Ulrich Bruening1
1University of Heidelberg, DE; 2Extoll GmbH, DE
Abstract
Design time is one of the precious assets in the cycle of hardware design. The top down methodology has been used in digital designs very successfully and now we also apply it for analog and mixed signal designs. Generating most of the structures automatically saves time and avoids errors. A Top Down Design Flow for Mixed Signal Designs is used which generates the schematic structure from the system RNM representation. Since the structural verilog part of the system level design will automatically generate the schematic structure it is only the functional part which is missing and has to be implemented by the analog designer. Some often used blocks can be used as an entry point to partially generate parts of the design in the schematic and furthermore even parts of the layout. We will demonstrate this design method with an example project.

More information ...
UB04.3PRIME: PLATFORM- AND APPLICATION-AGNOSTIC RUN-TIME POWER MANAGEMENT OF HETEROGENEOUS EMBEDDED SYSTEMS
Authors:
Domenico Balsamo, Graeme M. Bragg, Charles Leech and Geoff V. Merrett, University of Southampton, GB
Abstract
Increasing energy efficiency and reliability at runtime is a key challenge of heterogeneous many-core systems. We demonstrate how contributions from the PRiME project integrate to enable application- and platform-agnostic runtime management that respects application performance targets. We consider opportunities to enable runtime management across the system stack and we enable cross-layer interactions to trade-off power and reliability with performance and accuracy. We consider a system as three distinct layers, with abstracted communication between them, which enables the direct comparison of different approaches, without requiring specific application or platform knowledge. Application-agnostic runtime management is demonstrated with a selection of runtime managers from PRiME, including linear regression modelling and predictive thermal management, operating across multiple applications. Platform-independent runtime management is demonstrated using two heterogeneous platforms.

More information ...
UB04.4OTPG: SPECIFICATION-BASED CONSTRUCTION OF ONLINE TPGS FOR MICROPROCESSORS
Authors:
Mikhail Chupilko, Alexander Kamkin and Andrei Tatarnikov, ISP RAS, RU
Abstract
This work presents an approach to construction of online test program generators (TPGs). The approach is intended to use specifications of ISA presented in nML/mmuSL specification languages. They are processed by a meta-generator to obtain their binary representations supplied with meta information and a test generation core compatible with the target microprocessor. The test generation core is loaded as a binary image into the target microprocessor's memory (for experiments we're using QEMU for MIPS) and produces test cases to be processed (incl. results checking) by an executor. It should be noticed that the meta-generator and the executor are not obligatory run at the same microprocessor (especially, if it is highly incomplete). The final goal of the project is to propose a method of obtaining online TPGs for a wide range of ISAs, and to develop a mature tool implementing this method.

More information ...
UB04.7T-CREST: THE OPEN-SOURCE REAL-TIME MULTICORE PROCESSOR
Authors:
Martin Schoeberl, Luca Pezzarossa and Jens Sparsø, Technical University of Denmark, DK
Abstract
Future real-time systems, such as advanced control systems or real-time image recognition, need more powerful processors, but still a system where the worst-case execution time (WCET) can be statically predicted. Multicore processors are one answer to the need for more processing power. However, it is still an open research question how to best organize and implement time-predictable communication between processing cores. T-CREST is an open-source multicore processor for research on time-predictable computer architecture. In consists of several Patmos processors connected by various time-predictable communication structures: access to shared off-chip, access to shared on-chip memory, and the Argo network-on-chip for fast inter-processor communication. T-CREST is supported by open-source development tools, such as compilation and WCET analysis. To best of our knowledge, T-CREST is the only fully open-source architecture for research on future real-time multicore architectures.

More information ...
UB04.10EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
19:30End of session

Exhibition-Reception Exhibition Reception

Date: Tuesday 20 March 2018
Time: 18:30 - 19:30
Location / Room: Exhibition Area

The Exhibition Reception will take place on Tuesday in the exhibition area, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

TimeLabelPresentation Title
Authors
19:30End of session

5.1 Special Day Session on Future and Emerging Technologies: Challenges for the Design of Microfluidic Devices: EDA for your Lab-on-a-Chip

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Saal 2

Chair:
Chakrabarty Krishnendu, Duke University, US

This session introduces experts from design automation to the field of microfluidic devices. Those devices, often also referred to as labs-on-chip, allow for conducting biological, chemical, and/or medical experiments with fluidics on a nano- or even picolitre scale automatically on miniaturized devices. By this, they revolutionized point-of-care diagnostics, chemo-fluidic logic, and more. The speakers in this session will introduce those areas and show how microfluidic devices help here. At the same time, they will cover how design automation can actually advance the further development of this emerging technology and how this can help to broaden the scope of applications for it.

TimeLabelPresentation Title
Authors
08:305.1.1POINT-OF-CARE DIAGNOSTICS 2.0: STANDARDS, DESIGN AUTOMATION, AND CONSUMER ELECTRONICS FOR THE NEXT GENERATION OF DIAGNOSTIC DEVICES
Author:
Emmanuel Delamarche, IBM Research, CH
Abstract
Diagnostics are ubiquitous in healthcare because they support prevention, diagnosis and treatment of diseases. Specifically, point-of-care diagnostics are particularly attractive for identifying diseases near patients, quickly, and in many settings and scenarios. One of our contribution to the field of microfluidics is the development of capillary-driven microfluidic chips for highly miniaturized immunoassays. In this presentation, I will review how to program capillary flow and encode specific functions to form microfluidic elements that can easily be assembled into self-powered devices for immunoassays, reaching unprecedented levels of precision for manipulating samples and reagents. I will also reflect on the fragmented approaches that our community has in developing microfluidic-based diagnostics, which is exacerbated by the fragmented nature of the in vitro diagnostic market. Standards, design automation, consumer electronic components, and smartphones may play a critical role in helping to rationalize our development and utilization of the next generation of point-of-care diagnostic devices.
09:005.1.2DESIGN AUTOMATION IN MICROFLUIDICS: AN EXPERIMENTALIST'S PERSPECTIVE
Author:
William H. Grover, University of California, Riverside, US
Abstract
The field of microfluidic lab-on-a-chip devices is nearly 40 years old. In that time, microfluidic technologies have found several important applications, primarily in bioscience research and healthcare. However, the overall spread of microfluidic technologies has been remarkably slow, especially into important fields like agriculture and environmental monitoring. Why is this? In this talk I will give my answers to this question, based on perspectives gained during 20 years of research in experimental microfluidics. I will share my lab's approaches at automating the design of microfluidic chips, including using randomly-designed chips and databases of chip simulations to automatically find custom designs for given applications. I'll also explain why chip design is only half the problem in microfluidics, and why the other half of the problem—chip fabrication—also deserves our attention. Finally, I will argue that the biggest source of pain for creators of microfluidic chips isn't the chip itself, but the off-chip equipment needed to control the chip, and I will share recent work by my group and others aimed at reducing or eliminating off-chip equipment by integrating its control functions onto the chip itself using "pneumatic logic."
09:305.1.3CHEMOFLUIDICS: PROSPECTS AND CHALLENGES
Author:
Andreas Richter, Technische Universität Dresden, DE
Abstract
The lab-on-a-chip (LoC) concept was originally intended to introduce the irresistible and steady increasing performance of integrated circuit (IC) technology in microfluidics to automate and to parallelize the processes of complete chemical and biochemical laboratories on a single tiny microchip - a ground-breaking concept especially for life science and analytics. However, LoCs have not yet evolved revolutionary impact in microfluidics as known from microelectronics in information technology. The indisputable reason for that is the lack of scalability of the existing microfluidic platforms, especially the predominating microelectromechanical systems. The perspective solution of the basic challenge of scalability is hoped for the logic microfluidics, which is already named as "second breath of microfluidics" with revolutionary potential. The basic idea of this concept is to drastically reduce the effort for the off-chip control by integration of parts of the chip program by valve-based circuits. Currently there are two major approaches for. The micropneumatic logic microfluidics is technologically based on the multi-layer soft lithography and requires pressure-switchable valves, which derivate from pneumatic membrane valves. The second approach is the concept of chemo-fluidic logic microfluidics, the so-called chemofluidics. It is based on components with decision-making functionality for chemical information, the chemo-fluidic switches, chemostats and transistors and allows to fabricate integrated circuits by patterning of active material layers as known from microelectronics. Because of the chemical feedback the chip control is placed directly on the chip and exclusively controlled by chemical and other events. Since the components of the chip take their energy needed directly from the fluidic chemicals the chemo-fluidic logic chips are fully autonomous micro-systems processing their chip programs automatically and self-powered. A first set of hardware instructions including combinatorial logic gates, RS-flipflops and oscillators principally enabling to realise each more complex function is demonstrated. However, the concept of chemofluidics is still in its infancy. Many challenges on the levels of technology, design and system still await resolution.
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

5.2 Smart Energy and Automotive Systems

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 6

Chair:
Sebastian Steinhorst, Technical University of Munich, DE

Co-Chair:
Lulu Chan, NXP Semiconductors, NL

This session presents the latest advancements in battery and photovoltaic system management and optimization, as well as novel approaches towards efficient environmental mapping for autonomous driving and cloud-connected vehicles.

TimeLabelPresentation Title
Authors
08:305.2.1SOH-AWARE ACTIVE CELL BALANCING STRATEGY FOR HIGH POWER BATTERY PACKS
Speaker:
Alma Proebstl, Technical University of Munich, DE
Authors:
Alma Proebstl1, Sangyoung Park1, Swaminathan Narayanaswamy2, Sebastian Steinhorst1 and Samarjit Chakraborty1
1Technical University of Munich, DE; 2TUM CREATE, SG
Abstract
Short drive range due to limited battery capacity and high battery depreciation costs persist to be the main deterrents to the wide adoption of Electric Vehicles (EVs). High power battery packs consisting of a large number of battery cells require extensive management, such as State of Charge (SOC) balancing and thermal management, in order to keep the operating conditions within a safe range. In this paper, we propose a novel State of Health (SOH)-aware active cell balancing technique, which is capable of extending the cycle life of the whole battery pack. In contrast to the state-of-the-art active cell balancing techniques, the proposed technique allows cells to have different SOC values such that aging is mitigated when an EV trip does not require the full capacity. Based on the observation that preferring cells with higher SOH over cells with lower SOH extends cycle life, the technique identifies the charge transfers between cells that would benefit the most. We find that with our proposed scheme, aging could be mitigated by up to 23.5% over passive cell balancing and 17.6% over active SOC cell balancing.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.2.2(Best Paper Award Candidate)
GIS-BASED OPTIMAL PHOTOVOLTAIC PANEL FLOORPLANNING FOR RESIDENTIAL INSTALLATIONS
Speaker:
Sara Vinco, Politecnico di Torino, IT
Authors:
Sara Vinco, Lorenzo Bottaccioli, Edoardo Patti, Andrea Acquaviva, Enrico Macii and Massimo Poncino, Politecnico di Torino, IT
Abstract
Shading is a crucial issue for the placement of PV installations, as it heavily impacts power production and the corresponding return of investment. Nonetheless, residential rooftop installations still rely on rule-of-thumb criteria and on gross estimates of the shading patterns, while more optimized approaches focus solely on the identification of suitable surfaces (e.g., roofs) in a larger geographic area (e.g., city or district). This work addresses the challenge of identifying an optimal (with respect to the overall energy production) placement of PV panels on a roof. The novel aspect of the proposed solution lies in the possibility of having a sparse, irregular placement of individual modules so as to better exploit the variance of solar data. The latter are represented in terms of the distribution of irradiance and temperature values over the roof, as elaborated from historical traces and Geographical Information System (GIS) data. Experimental results will prove the effectiveness of the algorithm through three real world case studies, and that the generated optimal solutions allow to increase power production by up to 28% with respect to rule-of-thumb solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.2.3CELL-BASED UPDATE ALGORITHM FOR OCCUPANCY GRID MAPS AND HYBRID MAP FOR ADAS ON EMBEDDED GPUS
Speaker:
Jörg Fickenscher, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DE
Authors:
Jörg Fickenscher1, Jens Schlumberger1, Frank Hannig1, Mohamed Essayed Bouzouraa2 and Jürgen Teich1
1Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DE; 2Concept Development Automated Driving, AUDI AG, DE
Abstract
Advanced Driver Assistance Systems (ADASs), such as autonomous driving, require the continuous computation and update of detailed environment maps. Today's standard processors in automotive Electronic Control Units (ECUs) struggle to provide enough computing power for those tasks. Here, new architectures, like Graphics Processing Units (GPUs) might be a promising accelerator candidate for ECUs. Current algorithms have to be adapted to these new architectures when possible, or new algorithms have to be designed to take advantage of these architectures. In this paper, we propose a novel parallel update algorithm, called cell-based update algorithm for occupancy grid maps, which exploits the highly parallel architecture of GPUs and overcomes the shortcomings of previous implementations based on the Bresenham algorithm on such architectures. A second contribution is a new hybrid map, which takes the advantages of the classic occupancy grid map and reduces the computational effort of those. All algorithms are parallelized and implemented on a discrete GPU as well as on an embedded GPU (Nvidia Tegra K1 Jetson board). Compared with the stateof- the-art Bresenham algorithm as used in the case of occupancy grid maps, our parallelized cell-based update algorithm and our proposed hybrid map approach achieve speedups of up to 2.5 and 4.5, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-5, 203IMPROVING FAST CHARGING EFFICIENCY OF RECONFIGURABLE BATTERY PACKS
Speaker:
Alexander Lamprecht, TUM CREATE, SG
Authors:
Alexander Lamprecht1, Swaminathan Narayanaswamy1 and Sebastian Steinhorst2
1TUM CREATE, SG; 2Technical University of Munich, DE
Abstract
Recently, reconfigurable battery packs that can dynamically modify the electrical connection topology of their individual cells are gaining importance. While several circuit architectures and management algorithms are proposed in the literature, the electrical characteristics of the reconfiguration circuit architectures are not sufficiently studied so far. In this paper, we derive a detailed analytical model for a state-of-the-art reconfiguration architecture capturing the losses introduced by the parasitic resistances of the circuit components. For the first time, we propose a novel fast charging strategy using the reconfiguration architecture that significantly reduces the power losses in comparison to conventional battery packs. Moreover, using the analytical model, we highlight the challenges faced by existing reconfiguration architectures using state-of-the-art components and we derive the specifications for the switches which are essential for improving the energy efficiency of such reconfigurable battery packs.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-6, 816CLOUD-ASSISTED CONTROL OF GROUND VEHICLES USING ADAPTIVE COMPUTATION OFFLOADING TECHNIQUES
Speaker:
Soheil Samii, General Motors R&D, Warren, MI 48090, US
Authors:
Arun Adiththan1, Ramesh S2 and Soheil Samii2
1City University of New York, US; 2General Motors R&D, US
Abstract
The existing approaches to design efficient safety-critical control applications is constrained by limited in-vehicle sensing and computational capabilities. In the context of automated driving, we argue that there is a need to leverage resources "out-of-the-vehicle" to meet the sensing and powerful processing requirements of sophisticated algorithms (e.g., deep neural networks). To realize the need, a suitable computation offloading technique that meets the vehicle safety and stability requirements, even in the presence of unreliable communication network, has to be identified. In this work, we propose an adaptive offloading technique for control computations into the cloud. The proposed approach considers both current network conditions and control application requirements to determine the feasibility of leveraging remote computation and storage resources. As a case study, we describe a cloud-based path following controller application that leverages crowdsensed data for path planning.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

5.3 Heterogeneous multi-level caching

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 1

Chair:
Jeronimo Castrillon, Technische Universität Dresden, DE

Co-Chair:
Lei Ju, Shandong University, CN

This session discusses different aspects and optimization for multi-level caching, involving various aspects of non-volatile memory technologies and embedded systems. The first paper proposes a novel management of last-level-cache for PCM-based main memory. The next paper presents a multi-level cache architecture with time-randomized behaviour, amenable for WCET analysis. The last paper explores the trade-off between retention time, performance, and energy for STT-RAM-based caches.

TimeLabelPresentation Title
Authors
08:305.3.1WALL: A WRITEBACK-AWARE LLC MANAGEMENT FOR PCM-BASED MAIN MEMORY SYSTEMS
Speaker:
Bahareh Pourshirazi, University of Illinois at Chicago, US
Authors:
Bahareh Pourshirazi1, Majed Valad Beigi2, Zhichun Zhu1 and Gokhan Memik2
1University of Illinois at Chicago, US; 2Northwestern University, US
Abstract
In this paper, we propose WALL, a novel writeback-aware LLC management scheme to reduce the number of LLC writebacks and consequently improve performance, energy efficiency, and lifetime of a PCM-based main memory system. First, we investigate the writeback behavior of LLC sets and show that writebacks are not uniformly distributed among sets; some sets observe much higher writeback rates than others. We then propose a writeback-aware set-balancing mechanism, which employs the underutilized LLC sets with few writebacks as an auxiliary storage for storing the evicted dirty lines of sets with frequent writebacks. We also propose a simple and effective writeback-aware replacement policy to avoid the eviction of the writeback blocks that are highly reused after being evicted from the cache. Our experimental results show that WALL achieves an average of 26.6% reduction in the total number of LLC writebacks, compared to the baseline scheme, which uses the LRU replacement policy. As a result, WALL can reduce the memory energy consumption by 19.2% and enhance PCM lifetime by 1.25×, on average, on an 8-core system with a 4GB PCM main memory, running memory intensive applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.3.2DESIGN AND INTEGRATION OF HIERARCHICAL-PLACEMENT MULTI-LEVEL CACHES FOR REAL-TIME SYSTEMS
Speaker:
Pedro Benedicte, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Pedro Benedicte1, Carles Hernandez2, Jaume Abella3 and Francisco Cazorla4
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES; 3Barcelona Supercomputing Center (BSC-CNS), ES; 4Barcelona Supercomputing Center and IIIA-CSIC, ES
Abstract
Enabling timing analysis in the presence of caches has been pursued by the real-time embedded systems (RTES) community for years due to cache's huge potential to reduce software's worst-case execution time (WCET). However, caches heavily complicate timing analysis due to hard-to-predict access patterns, with few works dealing with time analyzability of multi-level cache hierarchies. For measurement-based timing analysis (MBTA) techniques - widely used in domains such as avionics, automotive, and rail - we propose several cache hierarchies amenable to MBTA. We focus on a probabilistic variant of MBTA (or MBPTA) that requires caches with time-randomized behavior whose execution time variability can be captured in the measurements taken during system's test runs. For this type of caches, we explore and propose different multi-level cache setups. From those, we choose a cost-effective cache hierarchy that we implement and integrate in a 4-core LEON3 RTL processor model and prototype in a FPGA. Our results show that our proposed setup implemented in RTL results in better (reduced) WCET estimates with similar implementation cost and no impact on average performance w.r.t. other MBPTA-amenable setups.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.3.3LARS: LOGICALLY ADAPTABLE RETENTION TIME STT-RAM CACHE FOR EMBEDDED SYSTEMS
Speaker:
Tosiron Adegbija, University of Arizona, US
Authors:
Kyle Kuan and Tosiron Adegbija, University of Arizona, US
Abstract
STT-RAMs have been studied as a promising alternative to SRAMs in embedded systems' caches and main memories. STT-RAMs are attractive due to their low leakage power and high density; STT-RAMs, however, also have drawbacks of long write latency and high dynamic write energy. A popular solution to this drawback relaxes the retention time to lower both write latency and energy, and uses a dynamic refresh scheme that refreshes data blocks to prevent them from prematurely expiring. However, the refreshes can incur overheads, thus limiting optimization potential. In addition, this solution only provides a single retention time, and cannot adapt to applications' variable retention time requirements. In this paper, we propose LARS (Logically Adaptable Retention Time STT-RAM) cache as a viable alternative for reducing the write energy and latency. LARS cache comprises of multiple STT-RAM units with different retention times, with only one unit on at a given time. LARS dynamically determines which STT-RAM unit to power on during runtime, based on executing applications' needs. Our experiments show that LARS cache is low-overhead, and can reduce the average energy and latency by 35.8% and 13.2%, respectively, as compared to the dynamic refresh scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-7, 789FUSIONCACHE: USING LLC TAGS FOR DRAM CACHE
Speaker:
Evangelos Vasilakis, Chalmers University of Technology, SE
Authors:
Evangelos Vasilakis1, Vassilis Papaefstathiou2, Pedro Trancoso1 and Ioannis Sourdis1
1Chalmers University of Technology, SE; 2FORTH-ICS, GR
Abstract
DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D stacked DRAM. Although they can capture the spatial and temporal data locality of applications, their access latency is still substantially higher than conventional on-chip SRAM caches. Moreover, their tag access latency and storage overheads are excessive. Storing tags for a large DRAM cache in SRAM is impractical as it would occupy a significant fraction of the processor chip. Storing them in the DRAM itself incurs high access overheads. Attempting to cache the DRAM tags on the processor adds a constant delay to the access time. In this paper, we introduce FusionCache, a DRAM cache that offers more efficient tag accesses by fusing DRAM cache tags with the tags of the on-chip Last Level Cache (LLC). We observe that, in an inclusive cache model where the DRAM cachelines are multiples of on-chip SRAM cachelines, LLC tags could be re-purposed to access a large part of the DRAM cache contents. Then, accessing DRAM cache tags incurs zero additional latency in the common case.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

5.4 Special Session: Lightweight Security for Resources-Constrained Internet-of-Things Applications

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 2

Chair:
Halak Basel, Southampton University, GB

Co-Chair:
Jin Yier, University of Florida, US

This special sessions includes four papers: the first paper addresses the first question, it presents a lightweight cryptographic primitive based on physical unclonable functions, the second and third papers tackle the second and the third questions. They present two security protocols, for authentication and attestation respectively, which are specifically developed for resources-constrained IoT platforms. The forth paper addresses the last challenge, it presents a solution which exploits existing on-chip hardware structure to detect abnormal and suspicious behaviours of an embedded system.

TimeLabelPresentation Title
Authors
08:305.4.1COST EFFICIENT DESIGN OF MODELLING ATTACKS-RESISTANT PHYSICAL UNCLONABLE FUNCTIONS
Speaker:
Basel Halak, Southampton University, GB
Authors:
Mohd Syafiq Mispan1, Haibo Su1, Mark Zwolinski2 and Basel Halak3
1Electronics and Computer Science Department, Southampton University, GB; 2University of Southampton, GB; 3Southampton University, GB
Abstract
Physical Unclonable Functions (PUFs) exploit the intrinsic manufacturing process variations to generate a unique signature for each silicon chip; this technology allows building lightweight cryptographic primitive suitable for resource-constrained devices. However, the vast majority of existing PUF design is susceptible to modeling attacks using machine learning technique, this means it is possible for an adversary to build a mathematical clone of the PUF that have the same challenge/response behavior of the device. Existing approaches to solve this problem include the use of hash functions, which can be prohibitively expensive and render PUF technology as the suitable candidate for lightweight security. This work presents a challenge permutation and substitution techniques which are both area and energy efficient. We implemented two examples of the proposed solution in 65-nm CMOS technology, the first using a delay-based structure design (an Arbiter-PUF), and the second using sub-threshold current design (two-choose-one PUF or TCO-PUF). The resiliency of both architectures against modeling attacks is tested using an artificial neural network machine learning algorithm. The experiment results show that it is possible to reduce the predictability of PUFs to less than 70% and a fractional area and power costs compared to existing hash function approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
08:525.4.2DEVICE ATTESTATION: PAST, PRESENT, AND FUTURE
Speaker:
Yier Jin, University of Florida, US
Authors:
Orlando Arias1, Dean Sullivan1, Fahim Rahman2, Mark M. Teranipoor2 and Yier Jin2
1University of Central Florida, US; 2University of Florida, US
Abstract
In recent years we have seen a rise in popularity of networked devices. From traffic signals in a city's busiest intersection and energy metering appliances, to internet-connected security cameras, these embedded devices have become entrenched in everyday life. As a consequence, a need to ensure secure and reliable operation of these devices has also risen. Device attestation is a promising solution to the operational demands of embedded devices, especially those widely used in Internet of Things and Cyber-Physical System. In this paper, we summarize the basics of device attestation. We then present a summary of attestation approaches by classifying them based on their functionality and reliability guarantees they provide to networked devices. Lastly, we discuss the limitations and potential issues current mechanisms exhibit and propose new research directions.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:145.4.3A RECONFIGURABLE SCAN NETWORK BASED IC IDENTIFICATION FOR EMBEDDED DEVICES
Speaker:
Omid Aramoon, University of Maryland, US
Authors:
Omid Aramoon1, Xi Chen1 and Gang Qu2
1University of Maryland, US; 2Univ. of Maryland, College Park, US
Abstract
Most of the Internet of Things (IoT) and embedded devices are resource constrained, making it impractical to secure them with the traditional computationally expensive crypto-based solutions. However, security and privacy are crucial in many IoT applications such as health monitoring. In this paper, we consider one of the most fundamental security problems: how to identify and authenticate an embedded device. We consider the fact that embedded devices are designed by reusing IP cores with reconfigurable scan network (RSN) as the standard testing facility and propose to generate unique integrated circuit (IC) identifications (IDs) based on different configurations for the RSN. These circuit IDs not only solve the IC and device identification and authentication problems, they can also be considered as a lightweight security primitive in other applications such as IC metering and IP fingerprinting. We demonstrate through the ITC'02 benchmarks that the proposed approach can easily create from 107 to 10186 unique IDs without any overhead. Finally, our method complies with the IEEE standards and thus has high practical value.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:365.4.4EARLY DETECTION OF SYSTEM-LEVEL ANOMALOUS BEHAVIOUR USING HARDWARE PERFORMANCE COUNTERS
Speaker:
Mark Zwolinski, University of Southampton, GB
Authors:
Lai Leng Woo1, Basel Halak2 and Mark Zwolinski3
1Electronics and Computer Science Department, Southampton University, GB; 2Southampton University, GB; 3University of Southampton, GB
Abstract
Embedded systems suffer from reliability issues such as variations in temperature and voltage, single event effects and component degradation, as well as being exposed to various security attacks such as control hijacking, malware, reverse engineering, eavesdropping and many others. Both reliability problems and security attacks can cause the system to behave anomalously. In this paper, we will present a detection technique that is able to detect a change in the system before the system encounters a failure, by using data from Hardware Performance Counters (HPCs). Previously, we have shown how HPC data can be used to create an execution profile of a system based on measured events and any deviation from this profile indicates an anomaly has occurred in the system. The first step in developing a detector is to analyse the HPC data and extract the features from the collected data to build a forecasting model. Anomalies are assumed to happen if the observed value falls outside a given confidence interval, which is calculated based on the forecast values and prediction confidence. The detector is designed to provide a warning to the user if anomalies that are detected occur consecutively for a certain number of times. We evaluate our detection algorithm on benchmarks that are affected by single bit flip faults. Our initial results show that the detection algorithm is suitable for use for this kind of univariate time series data and is able to correctly identify anomalous data from normal data.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

5.5 Emerging Technologies for Future Computing

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 3

Chair:
Aida Todri-Sanial, CNRS, FR

Co-Chair:
Mariagrazia Graziano, Politecnico di Torino, IT

A wide overview of emerging technologies to enable novel computing paradigms. The session covers topics from carbon nanotube thin film transistors for flexible electronics, novel 3D interconnects using inductive coupling links, physical design of quantum cellular automata, and improving reliability of quantum logic cell implementation.

TimeLabelPresentation Title
Authors
08:305.5.1(Best Paper Award Candidate)
COMPACT MODELING OF CARBON NANOTUBE THIN FILM TRANSISTORS FOR FLEXIBLE CIRCUIT DESIGN
Speaker:
Leilai Shao, University of California, Santa Barbara, US
Authors:
Leilai Shao1, Tsung-Ching Huang2, Ting Lei3, Zhenan Bao3, Raymond Beausoleil4 and Tim Cheng5
1University of California Santa Barbara, US; 2Hewlett Packard Labs, US; 3Stanford University, US; 4HPE Labs, US; 5HKUST, HK
Abstract
Carbon nanotube thin film transistor (CNT-TFT) is a promising candidate for flexible electronics, because of its high carrier mobility and great mechanical flexibility. An accurate and trustworthy device model for CNT-TFTs, however, is still missing. In this paper, we present a SPICE-compatible compact model for CNT-TFT circuit simulation and validate the proposed model based on fabricated CNT-TFTs and Pseudo-CMOS circuits. The proposed CNT-TFT model enables circuit designers to explore design space by adjusting device parameters, supply voltages and transistor sizes to optimize the noise margin (NM) and power-delay product (PDP), which are the key merits for larger scale CNT-TFT circuits. We further propose a design framework to effectively optimize the NM and PDP to facilitate greater automation of flexible circuit design based on CNT-TFTs.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.5.2A HIGH-SPEED DESIGN METHODOLOGY FOR INDUCTIVE COUPLING LINKS IN 3D-ICS
Speaker:
Benjamin Fletcher, University of Southampton, GB
Authors:
Benjamin Fletcher1, Shidhartha Das2 and Terrence Mak1
1University of Southampton, GB; 2ARM Ltd., GB
Abstract
Inductive coupling links (ICLs) are gaining traction as an alternative to through silicon vias (TSVs) for 3D integration, promising high-bandwidth connectivity without the inflated fabrication costs associated with TSV-enabled processes. For power-efficient ICL design, optimisation of the utilized physical inductor geometries is essential, however typically necessitates the use of finite element analysis (FEA) in addition to manual parameter fitting, a process that can take several hours even for a single geometry. As a result, the generation of optimised inductor designs poses a significant challenge. In this paper, we address this challenge, presenting a CAD-tool for Optimisation of Inductive coupling Links for 3D-ICs (COIL-3D). COIL-3D uses a rapid solver based upon semi-empirical expressions to quickly and accurately characterise a given link, in conjunction with a high-speed refined optimisation flow to find optimal inductor geometries for use in ICL-based 3D-ICs. The proposed solver achieves an average accuracy within 9.1% of commercial FEA software tools, and the proposed optimisation flow reduces the search time by 26 orders of magnitude. This work unlocks new potential for power-efficient 3D integration using inductive coupling links.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.5.3AN EXACT METHOD FOR DESIGN EXPLORATION OF QUANTUM-DOT CELLULAR AUTOMATA
Speaker:
Marcel Walter, University of Bremen, DE
Authors:
Marcel Walter1, Robert Wille2, Daniel Grosse3, Frank Sill Torres4 and Rolf Drechsler3
1University of Bremen, DE; 2Johannes Kepler University Linz, AT; 3University of Bremen/DFKI GmbH, DE; 4Federal University of Minas Gerais, BR
Abstract
Quantum-dot Cellular Automata (QCA) are an emerging computation technology in which basic states are represented by nanosize particles and logic operations are conducted through corresponding effects such as Coulomb interaction. This allows to overcome physical boundaries of conventional solutions such as CMOS and, hence, constitutes a promising direction for future computing devices. Despite these promises, however, the development of (automatic) design methods for QCAs is still in its infancy. In fact, QCA circuits are mainly designed manually thus far and only few heuristics are available. This frequently leads to unsatisfactory results and generally makes it hard to evaluate the quality of respective QCA designs. In this work, we propose an exact solution for the design of QCA circuits that can be configured e. g. to generate circuits that satisfy certain design objectives and/or physical constraints. For the first time, this allows for design exploration of QCA circuits. Experimental evaluations and case studies demonstrate the benefit of the proposed solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:455.5.4ACCURATE MARGIN CALCULATION FOR SINGLE FLUX QUANTUM LOGIC CELLS
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Soheil Nazar Shahsavani, Bo Zhang and Massoud Pedram, University of Southern California, US
Abstract
This paper presents a novel method for accurate margin calculation of single flux quantum (SFQ) logic cells in a superconducting electronic circuit. The proposed method can be utilized as a figure of merit to estimate the robustness of a logic cell without the need for expensive Monte-Carlo simulations. This is achieved through efficient state-space exploration of all parameters in the cell structure. Using proposed approach, distinct parameter dispersion (DPD) based yield of SFQ cells increases by 55% on average, compared with state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-8, 858IMPROVED SYNTHESIS OF CLIFFORD+T QUANTUM FUNCTIONALITY
Speaker:
Philipp Niemann, German Research Center for Articial Intelligence (DFKI GmbH), DE
Authors:
Philipp Niemann1, Robert Wille2 and Rolf Drechsler3
1Cyber-Physical Systems, DFKI GmbH, DE; 2Johannes Kepler University Linz, AT; 3University of Bremen/DFKI GmbH, DE
Abstract
The Clifford+T library provides robust and fault-tolerant realizations for quantum computations. Consequently, (logic) synthesis of Clifford+T quantum circuits became an important research problem. However, previously proposed solutions are either only applicable to very small quantum systems or lead to circuits that are far from being optimal—mainly caused by a local, i.e. column-wise, consideration of the underlying transformation matrix to be synthesized. In this paper, we suggest an improved approach that considers the matrix globally and, by this, overcomes many of these drawbacks. Preliminary evaluations show the promises of this direction.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-9, 462ENERGY-EFFICIENT CHANNEL ALIGNMENT OF DWDM SILICON PHOTONIC TRANSCEIVERS
Speaker:
Yuyang Wang, University of California, Santa Barbara, US
Authors:
Yuyang Wang1, M. Ashkan Seyedi2, Rui Wu1, Jared Hulme2, Marco Fiorentino2, Raymond G. Beausoleil2 and Kwang-Ting Cheng3
1University of California, Santa Barbara, US; 2Hewlett Packard Labs, US; 3Hong Kong University of Science and Technology, HK
Abstract
The comb laser-driven microring-based dense wavelength division multiplexing silicon photonics is a promising candidate for next-generation optical interconnects. However, existing solutions for exploring the power-performance trade-off of such systems have been restricted to a limited design space, resulting from the unnecessary constraints of using an identical spacing for laser comb lines and microring channels, and of utilizing consecutive laser comb lines for data transmission. We propose an energy-efficient channel alignment scheme that aligns the microring channels to a subset of laser comb lines that are non-uniformly distributed in the free spectrum range of the microrings. Based on a well-established process variation model, our simulations show that the proposed scheme significantly reduces the microring tuning power in the presence of denser comb lines. The power saved from microring tuning can improve the overall system energy efficiency despite some power wasted in unused laser comb lines. We further conducted a case study for design space exploration using the proposed channel alignment scheme, seeking the most energy-efficient configuration in order to achieve a target aggregated data rate.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:02IP2-10, 267A PHYSICAL SYNTHESIS FLOW FOR EARLY TECHNOLOGY EVALUATION OF SILICON NANOWIRE BASED RECONFIGURABLE FETS
Speaker:
Shubham Rai, Chair For Processor Design, CFAED, Technische Universität Dresden, Dresden, DE
Authors:
Shubham Rai1, Ansh Rupani2, Dennis Walter1, Michael Raitza1, André Heinzig3, Christian Mayr1, Walter Weber4 and Akash Kumar1
1Technische Universität Dresden, DE; 2Birla Institute of Technology and Science Pilani, Hyderabad Campus, IN; 3NaMLab GmbH, DE; 4NaMLab gGmbH and CfAED, DE
Abstract
Silicon Nanowire based reconfigurable transistors (RFETs) provide an additional gate terminal called the program gate which gives the freedom of programming p-type or n-type functionality for the same device at runtime. This enables the circuit designers to pack more functionality per computational unit. This saves processing costs as only one device type is required. No doping and associated lithography steps are needed for this technology. In this paper, we present a complete design flow including both logic and physical synthesis for circuits based on SiNW RFETs. We propose layouts of logic gates, Liberty and LEF (library extension format) files for the physical synthesis flow and make these available under an open source license to enable further research in the domain of these novel, functionally enhanced transistors. We develop a table model based on a transistor cell with relaxed dimensions following an SOI-based 22 nm technology having a gate pitch of 110 nm and modeled our logic gates on dual gate RFETs. For the sake of comparison, we use the same tool flow for CMOS. We show that in the first of its kind comparison, for these fully symmetrical reconfigurable transistors, the area after placement and routing for SiNW based circuits is 17% more than that of CMOS for MCNC benchmark. Further, we discuss areas of improvement for obtaining better area results from the silicon nanowire based RFETs from a fabrication and technology point of view. The future use of self-aligned techniques to structure two independent gates within a smaller pitch holds the promise of substantial area reduction.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

5.6 Reliability improvement and evaluation techniques

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 4

Chair:
Stefano Di Carlo, Politecnico di Torino, IT

Co-Chair:
Vasileios Tenentes, University of Southampton, GB

This session introduces reliability improvement approaches using dynamic recovery, redundant multithreading, aging mitigation and optimization of metastability effects, spanning from the system to the circuit layer. Also, cross-layer resilience evaluation via fault injection for complex microprocessors is presented.

TimeLabelPresentation Title
Authors
08:305.6.1IMPROVING RELIABILITY FOR REAL-TIME SYSTEMS THROUGH DYNAMIC RECOVERY
Speaker:
Yue Ma, University of Notre Dame, US
Authors:
Yue Ma1, Tam Chantem2, Robert P. Dick3 and Xiaobo Sharon Hu1
1University of Notre Dame, US; 2Virginia Tech, US; 3University of Michigan, US
Abstract
Technology scaling has increased concerns about transient faults due to soft errors and permanent faults due to lifetime wear processes. Although researchers have investigated related problems, they have either considered only one of the two reliability concerns or presented simple recovery allocation algorithms that cannot effectively use available time slack to improve soft-error reliability. This paper introduces a framework for improving soft-error reliability while satisfying lifetime reliability and real-time constraints. We present a dynamic recovery allocation technique that guarantees to recover any failed task if the remaining slack is adequate. Based on this technique, we propose two scheduling algorithms for task sets with different characteristics to improve system-level soft-error reliability. Lifetime reliability requirements are satisfied by reducing core frequencies for appropriate tasks, thereby reducing wear due to temperature and thermal cycling. Simulation results show that the proposed framework reduces the probability of failure by at least 8% and 73% on average compared to existing approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.6.2OPTIMAL METASTABILITY-CONTAINING SORTING NETWORKS
Speaker:
Johannes Bund, Saarland University, DE
Authors:
Johannes Bund1, Christoph Lenzen2 and Moti Medina3
1Saarland University, Saarland Informatics Campus, DE; 2Max Planck Institute for Informatics, Saarland Informatics Campus, DE; 3From 1/10/2017 in The Department of Electrical and Computer Engineering Ben-Gurion University, IL
Abstract
When setup/hold times of bistable elements are violated, they may become metastable, i.e., enter a transient state that is neither digital 0 nor 1 [Marino 81]. In general, metastability cannot be avoided, a problem that manifests whenever taking discrete measurements of analog values. Metastability of the output then reflects uncertainty as to whether a measurement should be rounded up or down to the next possible integral measurement outcome. Surprisingly, Lenzen & Medina (ASYNC 2016) showed that metastability can be contained, i.e., measurement values can be correctly sorted without resolving metastability first. However, both their work and the state of the art by Bund et al. (DATE 2017) leave open whether such a solution can be as small and fast as standard sorting networks. We show that this is indeed possible, by providing a circuit that sorts Gray code inputs (possibly containing a metastable bit) and has asymptotically optimal depth and size. Concretely, for 10-channel sorting networks and 16-bit wide inputs, we improve by 48.46% in delay and by 71.58% in area over Bund et al. Our simulations indicate that straightforward transistor-level optimization is likely to result in performance on par with standard (non-containing) solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.6.3MAUI: MAKING AGING USEFUL, INTENTIONALLY
Speaker:
Shou-Chun Li, Department of Computer Science, National Chiao Tung University, TW
Authors:
Kai-Chiang Wu1, Tien-Hung Tseng2 and Shou-Chun Li1
1National Chiao Tung University, TW; 2National Chiao Tung University, Taiwan, TW
Abstract
Device aging, which causes significant loss on circuit performance and lifetime, has been a primary factor in reliability degradation of nanoscale designs. In this paper, we propose to take advantage of aging-induced clock skews (i.e., make them useful for aging tolerance) by manipulating these time-varying skews to compensate for the performance degradation of logic networks. The goal is to assign achievable/reasonable aging-induced clock skews in a circuit, such that its overall performance degradation due to aging can be minimized, that is, the lifespan can be maximized. On average, 25% aging tolerance can be achieved with insignificant design overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:455.6.4EXPERT: EFFECTIVE AND FLEXIBLE ERROR PROTECTION BY REDUNDANT MULTITHREADING
Speaker:
HwiSoo So, Yonsei University, KR
Authors:
HwiSoo So1, Moslem Didehban2, Yohan Ko1, Aviral Shrivastava2 and Kyoungwoo Lee1
1Yonsei University, KR; 2Arizona State University, US
Abstract
Resiliency is a first-order design concern in modern microprocessor design. Compiler-level Redundant MultiThreading (RMT) schemes are promising because of their capability to detect the manifestation of hardware transient and permanent faults. In this work, we propose EXPERT, a compiler-level RMT scheme which can detect the manifestation of hardware faults in all hardware components. EXPERT transformation generates a checker thread for program main execution thread. These redundant threads execute simultaneously on two physically different cores of a multi-core processor. They perform mostly same computations, however, after each memory write operation committed by the main thread, the checker thread loads back the written data from the memory and checks it against its own locally computed values. If they match, execution continues. Otherwise, the error flag will be raised. Our processor-wide statistical transient and permanent fault injection experiments show that EXPERT error coverage is ~65 better than the state-of-the-art scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-11, 579ETISS-ML: A MULTI-LEVEL INSTRUCTION SET SIMULATOR WITH RTL-LEVEL FAULT INJECTION SUPPORT FOR THE EVALUATION OF CROSS-LAYER RESILIENCY TECHNIQUES
Speaker:
Martin Dittrich, Technical University of Munich, DE
Authors:
Daniel Mueller-Gritschneder1, Martin Dittrich1, Josef Weinzierl1, Eric Cheng2, Subhasish Mitra2 and Ulf Schlichtmann1
1Technical University of Munich, DE; 2Stanford University, US
Abstract
ETISS is an instruction set simulator (ISS) for Virtual Prototypes (VPs) modeled with SystemC/TLM. In this paper, we propose the extension ETISS-ML, which enables a multi-level simulation that switches between ISS-level and register transfer level (RTL) to accurately evaluate the impact of soft errors in the pipeline of a RISC processor. ETISS-ML achieves close-to-RTL-accurate fault injection simulation results with close-to-ISS simulation performance with a speed up gain up to 100x compared to RTL. For this, we propose an approach to dynamically determine the length of the RTL simulation period. The high simulation performance of ETISS-ML enables an ultra-efficient and accurate evaluation of cross-layer resiliency techniques for embedded applications, which requires running a large number of fault injections for long simulation scenarios. This is demonstrated on a case study of a Microcontroller Unit (MCU) executing a control algorithm for adaptive cruise control.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-12, 806PRECISE EVALUATION OF THE FAULT SENSITIVITY OF OOO SUPERSCALAR PROCESSORS
Speaker:
Antonio Carlos Schneider Beck, Federal University of Rio Grande do Sul, BR
Authors:
Rafael Tonetto1, Gabriel Luca Nazar2 and Antonio Carlos Schneider Beck2
1Federal University of Rio Grande do Sul, BR; 2Universidade Federal do Rio Grande do Sul, BR
Abstract
Since superscalar processors lead the market, their resiliency evaluation by means of fault injection grows in importance. Fault injection strategies usually trade-off their levels of accuracy: low-level HW-based methods are accurate, but very expensive, need special equipment and the actual hardware, and lack controllability; while high-level simulation-based strategies are flexible, fast, easily accessible and have high controllability, but are not accurate since they are based on models that do not always reflect the low-level implementation, mainly when it comes to complex designs like out-of-order multiple-issue processors. In this work, we propose a cycle-accurate fault injection platform for superscalar processors, which has a smart checkpointing mechanism to accelerate injection time, attenuating the shortcomings imposed by the aforementioned fault injection methods while providing the same level of abstraction as detailed RTL models. Leveraging from this new platform, we evaluate a complex and parameterizable Out-of-Order processor (BOOM) by experimenting with different issue widths and analyzing the sensitivity of several hardware structures of the processor.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

5.7 Software-centric techniques for embedded systems

Date: Wednesday 21 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 5

Chair:
Marc Geilen, Eindhoven University of Technology, NL

Co-Chair:
Daniel Ziener, University of Twente, NL

Modern heterogeneous architectures pose new challenges for energy-efficient embedded realizations. The talks in this session address these challenges using software techniques, such as approximate computing, task scheduling, and memory and power-management.

TimeLabelPresentation Title
Authors
08:305.7.1HEPREM: ENABLING PREDICTABLE GPU EXECUTION ON HETEROGENEOUS SOC
Speaker:
Björn Forsberg, ETH Zürich, CH
Authors:
Björn Forsberg1, Luca Benini2 and Andrea Marongiu3
1ETH Zürich, CH; 2Università di Bologna, IT; 3IIS, ETH Zurich, CH
Abstract
Heterogeneous systems-on-a-chip are increasingly embracing shared memory designs, in which a single DRAM is used for both the main CPU and an integrated GPU. This architectural paradigm reduces the overheads associated with data movements and simplifies programmability. However, the deployment of real-time workloads on such architectures is troublesome, as memory contention significantly increases execution time of tasks and the pessimism in worst-case execution time (WCET) estimates. The Predictable Execution Model (PREM) separates memory and computation phases in real-time codes, then arbitrates memory phases from different tasks such that only one core at a time can access the DRAM. This paper revisits the original PREM proposal in the context of heterogeneous SoCs, proposing a compiler-based approach to make GPU codes PREM-compliant. Starting from high-level specifications of computation offloading, suitable program regions are selected and separated into memory and compute phases. Our experimental results show that the proposed technique is able to reduce the sensitivity of GPU kernels to memory interference to near zero, and achieves up to a 20× reduction in the measured WCET.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:005.7.2CIRCUIT CARVING: A METHODOLOGY FOR THE DESIGN OF APPROXIMATE HARDWARE
Speaker:
Ilaria Scarabottolo, USI Lugano, CH
Authors:
Ilaria Scarabottolo, Giovanni Ansaloni and Laura Pozzi, USI Lugano, CH
Abstract
Systems-on-Chip (SoCs) commonly couple low-power processors and dedicated hardware accelerators, which allow the execution of high-workload and/or timing-critical applications while relying on constrained resources. The functions performed by accelerators are often robust with respect to approximations that, when implemented in HW, can lead to circuits with tangibly lower area and power consumption. Research in approximate computing aims at developing effective strategies to explore the ensuing correctness/efficiency trade-off. In this context, we address the challenge of approximate circuit design in an innovative way, called here Circuit Carving, which consists in identifying the maximum portion of an exact circuit that can be discarded from it, or carved out, to derive an inexact version not exceeding an error threshold. We achieve this goal by proposing an algorithm based on binary tree exploration, bounded by conditions extracted from the circuit topology. Our approach can be applied to any combinatorial circuit, without a-priori knowledge of its functionality. The proposed algorithm allows back-tracking in order to never be trapped in local minima, and identifies the exact influence of each circuit gate on the output correctness, resulting in inexact circuits with higher efficiency and accuracy with respect to state-of-the-art greedy strategies.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:155.7.3ICNN: AN ITERATIVE IMPLEMENTATION OF CONVOLUTIONAL NEURAL NETWORKS TO ENABLE ENERGY AND COMPUTATIONAL COMPLEXITY AWARE DYNAMIC APPROXIMATION
Speaker:
Avesta Sasan, George Mason University, US
Authors:
Katayoun Neshatpour, Farnaz Behnia, Houman Homayoun and Avesta Sasan, George Mason University, US
Abstract
With Convolutional Neural Networks (CNN) becoming more of a commodity in the computer vision field, many have attempted to improve CNN in a bid to achieve better accuracy to a point that CNN accuracies have surpassed that of human's capabilities. However, with deeper networks, the number of computations and consequently the power needed per classification has grown considerably. In this paper, we propose Iterative CNN (ICNN) by reformulating the CNN from a single feed-forward network to a series of sequentially executed smaller networks. Each smaller network processes a small set of sub-sampled input features and enhances the accuracy of the classification. Upon reaching an acceptable classification confidence, further possessing of smaller networks is discarded. The proposed network architecture allows the CNN function to be dynamically approximated by creating the possibility of early termination and performing the classification with far fewer operations compared to a conventional CNN. Initial results show that this iterative approach competes with the original larger networks in terms of accuracy while incurring far less computational complexity by detecting many images in early iterations.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:305.7.4TASK SCHEDULING FOR MANY-CORES WITH S-NUCA CACHES
Speaker:
Anuj Pathania, Karlsruhe Institute of Technology, IN
Authors:
Anuj Pathania and Joerg Henkel, Karlsruhe Institute of Technology, DE
Abstract
A many-core processor may comprise a large number of processing cores on a single chip. The many-core's last-level shared cache can potentially be physically distributed alongside the cores in the form of cache banks connected through a Network on Chip~(NoC). Static Non-Uniform Cache Access (S-NUCA) memory address mapping policy provides a scalable mechanism for providing the cores quick access to the entire last-level cache. By design, S-NUCA introduces a unique topology-based performance heterogeneity and we introduce a scheduler that can exploit it. The proposed scheduler improves performance of the many-core by 9.93% in comparison to a state-of-the-art generic many-core scheduler with minimal run-time overheads.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:455.7.5KVSSD: CLOSE INTEGRATION OF LSM TREES AND FLASH TRANSLATION LAYER FOR WRITE-EFFICIENT KV STORE
Speaker:
Sung-Ming Wu, National Chiao-Tung University, TW
Authors:
Sung-Ming Wu, Kai-Hsiang Lin and Li-Pin Chang, National Chiao-Tung University, TW
Abstract
Log-Structured-Merge (LSM) trees are a write-optimized data structure for lightweight, high-performance Key-Value (KV) store. Solid State Disks (SSDs) provide acceleration of KV operations on LSM trees. However, this hierarchical design involves multiple software layers, including the LSM tree, host file system, and Flash Translation Layer (FTL), causing cascading write amplifications. We propose KVSSD, a close integration of LSM trees and the FTL, to manage write amplifications from different layers. KVSSD exploits the FTL mapping mechanism to implement copy-free compaction of LSM trees, and it enables direct data allocation in flash memory for efficient garbage collection. In our experiments, compared to the hierarchical design, our KVSSD reduced the write amplification by 88% and improved the throughput by 347%.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP2-13, 231(Best Paper Award Candidate)
STREAMFTL: STREAM-LEVEL ADDRESS TRANSLATION SCHEME FOR MEMORY CONSTRAINED FLASH STORAGE
Speaker:
Dongkun Shin, Sungkyunkwan University, KR
Authors:
Hyukjoong Kim, Kyuhwa Han and Dongkun Shin, Sungkyunkwan University, KR
Abstract
Although much research efforts have been devoted to reducing the size of address mapping table which consumes DRAM space in solid state drives (SSDs), most SSDs still use page-level mapping for high performance in their firmware called flash translation layer (FTL). In this paper, we propose a novel FTL scheme, called StreamFTL. In order to reduce the size of the mapping table in SSDs, StreamFTL maintains a mapping entry for each stream, which consists of several logical pages written at contiguous physical pages. Unlike extent, which is used by previous FTL schemes, the logical pages in a stream do not need to be contiguous. We show that StreamFTL can reduce the size of the mapping table by up to 90% compared to page-level mapping scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP2-14, 512ONLINE CONCURRENT WORKLOAD CLASSIFICATION FOR MULTI-CORE ENERGY MANAGEMENT
Speaker:
Karunakar Reddy Basireddy, University of Southampton, GB
Authors:
Karunakar Reddy Basireddy1, Amit Kumar Singh2, Geoff V. Merrett1 and Bashir M. Al-Hashimi1
1University of Southampton, GB; 2University of Essex, GB
Abstract
Modern embedded multi-core processors are organized as clusters of cores, where all cores in each cluster operate at a common Voltage-frequency (V-f ). Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g. compute- and memory-intensive) depending on the instruction mix and resource sharing. Runtime adaptation is key to achieving energy savings without trading-off application performance with such workload variabilities. In this paper, we propose an online energy management technique that performs concurrent workload classification using the metric Memory Reads Per Instruction (MRPI) and pro-actively selects an appropriate V-f setting through workload prediction. Subsequently, it monitors the workload prediction error and performance loss, quantified by Instructions Per Second (IPS) at runtime and adjusts the chosen V-f to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

IP2 Interactive Presentations

Date: Wednesday 21 March 2018
Time: 10:00 - 10:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

LabelPresentation Title
Authors
IP2-1IN-GROWTH TEST FOR MONOLITHIC 3D INTEGRATED SRAM
Speaker:
Yixun Zhang, Shanghai Jiao Tong University, CN
Authors:
Pu Pang1, Yixun Zhang1, Tianjian Li1, Sung Kyu Lim2, Quan Chen1, Xiaoyao Liang1 and Li Jiang1
1Shanghai Jiao Tong University, CN; 2Georgia Tech, US
Abstract
Monolithic three-dimensional integration (M3I) directly fabricates tiers of integrated circuits upon each other and provides millions of vertical interconnections with interlayer vias (ILVs). It thus brings higher integration density and communication capability compared with three-dimensional stacked integration (3D-SI). However, the Known-Good-Die problem haunting 3D-SI-a faulty tier causes the failure of the entire stack-also occurs in M3I. Lack of efficient test methodologies such as the pre-bond testing in 3D-SI, M3I may have a more significant yield drop and thus its cost may be unacceptable for main-stream adoption. This paper introduces a novel In-growth test method for M3I SRAM. We propose a novel Design-for- Test (DfT) methodology to enable the proposed In-growth test on cell-level partitioned incomplete SRAM cells. We also build a statistical model of cost and discover a prospective judgement to determine whether or not to stop the fabrication, in order to prevent from raising the cost of fabricating more tiers upon the irreparable tiers. We find that a "sweet point" exists in the judgement, which can minimize the overall cost. Experimental results show the effectiveness of our proposed test methodology.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-2A CO-DESIGN METHODOLOGY FOR SCALABLE QUANTUM PROCESSORS AND THEIR CLASSICAL ELECTRONIC INTERFACE
Speaker:
Jeroen van Dijk, Delft University of Technology, NL
Authors:
Jeroen van Dijk1, Andrei Vladimirescu2, Masoud Babaie1, Edoardo Charbon1 and Fabio Sebastiano1
1Delft University of Technology, NL; 2University of California, Berkeley, US
Abstract
A quantum computer fundamentally comprises a quantum processor and a classical controller. The classical electronic controller is used to correct and manipulate the qubits, the core components of a quantum processor. To enable quantum computers scalable to millions of qubits, as required in practical applications, the simultaneous optimization of both the classical electronic and quantum systems is needed. In this paper, a co-design methodology is proposed for obtaining an optimized qubit performance while considering practical trade-offs in the control circuits, such as power consumption, complexity, and cost. The SPINE (SPIN Emulator) toolset is introduced for the co-design and co-optimization of electronic/quantum systems. It comprises a circuit simulator enhanced with a Verilog-A model emulating the quantum behavior of single-electron spin qubits. Design examples show the effectiveness of the proposed methodology in the optimization, design and verification of a whole electronic/quantum system.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-3APPROXIMATE QUATERNARY ADDITION WITH THE FAST CARRY CHAINS OF FPGAS
Speaker:
Philip Brisk, University of California, Riverside, US
Authors:
Sina Boroumand1, Hadi P. Afshar2 and Philip Brisk3
1University of Tehran, IR; 2Qualcomm Research, US; 3University of California, Riverside, US
Abstract
A heuristic is presented to efficiently synthesize approximate adder trees on Altera and Xilinx FPGAs using their carry chains. The mapper constructs approximate adder trees using an approximate quaternary adder as the fundamental building block. The approximate adder trees are smaller than exact adder trees, allowing more operators to fit into a fixed-area device, trading off arithmetic accuracy for higher throughput.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-4NN COMPACTOR: MINIMIZING MEMORY AND LOGIC RESOURCES FOR SMALL NEURAL NETWORKS
Speaker:
Seongmin Hong, Hongik University, KR
Authors:
Seongmin Hong1, Inho Lee1 and Yongjun Park2
1Hongik University, KR; 2Hanyang University, KR
Abstract
Special neural accelerators are an appealing hardware platform for machine learning systems because they provide both high performance and energy efficiency. Although various neural accelerators have recently been introduced, they are difficult to adapt to embedded platforms because current neural accelerators require high memory capacity and bandwidth for the fast preparation of synaptic weights. Embedded platforms are often unable to meet these memory requirements because of their limited resources. In FPGA-based IoT (internet of things) systems, the problem becomes even worse since computation units generated from logic blocks cannot be fully utilized due to the small size of block memory. In order to overcome this problem, we propose a novel dual-track quantization technique to reduce synaptic weight width based on the magnitude of the value while minimizing accuracy loss. In this value-adaptive technique, large and small value weights are quantized differently. In this paper, we present a fully automatic framework called NN Compactor that generates a compact neural accelerator by minimizing the memory requirements of synaptic weights through dual-track quantization and minimizing the logic requirements of PUs with minimum recognition accuracy loss. For the three widely used datasets of MNIST, CNAE-9, and Forest, experimental results demonstrate that our compact neural accelerator achieves an average performance improvement of 6.4x over a baseline embedded system using minimal resources with minimal accuracy loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-5IMPROVING FAST CHARGING EFFICIENCY OF RECONFIGURABLE BATTERY PACKS
Speaker:
Alexander Lamprecht, TUM CREATE, SG
Authors:
Alexander Lamprecht1, Swaminathan Narayanaswamy1 and Sebastian Steinhorst2
1TUM CREATE, SG; 2Technical University of Munich, DE
Abstract
Recently, reconfigurable battery packs that can dynamically modify the electrical connection topology of their individual cells are gaining importance. While several circuit architectures and management algorithms are proposed in the literature, the electrical characteristics of the reconfiguration circuit architectures are not sufficiently studied so far. In this paper, we derive a detailed analytical model for a state-of-the-art reconfiguration architecture capturing the losses introduced by the parasitic resistances of the circuit components. For the first time, we propose a novel fast charging strategy using the reconfiguration architecture that significantly reduces the power losses in comparison to conventional battery packs. Moreover, using the analytical model, we highlight the challenges faced by existing reconfiguration architectures using state-of-the-art components and we derive the specifications for the switches which are essential for improving the energy efficiency of such reconfigurable battery packs.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-6CLOUD-ASSISTED CONTROL OF GROUND VEHICLES USING ADAPTIVE COMPUTATION OFFLOADING TECHNIQUES
Speaker:
Soheil Samii, General Motors R&D, Warren, MI 48090, US
Authors:
Arun Adiththan1, Ramesh S2 and Soheil Samii2
1City University of New York, US; 2General Motors R&D, US
Abstract
The existing approaches to design efficient safety-critical control applications is constrained by limited in-vehicle sensing and computational capabilities. In the context of automated driving, we argue that there is a need to leverage resources "out-of-the-vehicle" to meet the sensing and powerful processing requirements of sophisticated algorithms (e.g., deep neural networks). To realize the need, a suitable computation offloading technique that meets the vehicle safety and stability requirements, even in the presence of unreliable communication network, has to be identified. In this work, we propose an adaptive offloading technique for control computations into the cloud. The proposed approach considers both current network conditions and control application requirements to determine the feasibility of leveraging remote computation and storage resources. As a case study, we describe a cloud-based path following controller application that leverages crowdsensed data for path planning.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-7FUSIONCACHE: USING LLC TAGS FOR DRAM CACHE
Speaker:
Evangelos Vasilakis, Chalmers University of Technology, SE
Authors:
Evangelos Vasilakis1, Vassilis Papaefstathiou2, Pedro Trancoso1 and Ioannis Sourdis1
1Chalmers University of Technology, SE; 2FORTH-ICS, GR
Abstract
DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D stacked DRAM. Although they can capture the spatial and temporal data locality of applications, their access latency is still substantially higher than conventional on-chip SRAM caches. Moreover, their tag access latency and storage overheads are excessive. Storing tags for a large DRAM cache in SRAM is impractical as it would occupy a significant fraction of the processor chip. Storing them in the DRAM itself incurs high access overheads. Attempting to cache the DRAM tags on the processor adds a constant delay to the access time. In this paper, we introduce FusionCache, a DRAM cache that offers more efficient tag accesses by fusing DRAM cache tags with the tags of the on-chip Last Level Cache (LLC). We observe that, in an inclusive cache model where the DRAM cachelines are multiples of on-chip SRAM cachelines, LLC tags could be re-purposed to access a large part of the DRAM cache contents. Then, accessing DRAM cache tags incurs zero additional latency in the common case.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-8IMPROVED SYNTHESIS OF CLIFFORD+T QUANTUM FUNCTIONALITY
Speaker:
Philipp Niemann, German Research Center for Articial Intelligence (DFKI GmbH), DE
Authors:
Philipp Niemann1, Robert Wille2 and Rolf Drechsler3
1Cyber-Physical Systems, DFKI GmbH, DE; 2Johannes Kepler University Linz, AT; 3University of Bremen/DFKI GmbH, DE
Abstract
The Clifford+T library provides robust and fault-tolerant realizations for quantum computations. Consequently, (logic) synthesis of Clifford+T quantum circuits became an important research problem. However, previously proposed solutions are either only applicable to very small quantum systems or lead to circuits that are far from being optimal—mainly caused by a local, i.e. column-wise, consideration of the underlying transformation matrix to be synthesized. In this paper, we suggest an improved approach that considers the matrix globally and, by this, overcomes many of these drawbacks. Preliminary evaluations show the promises of this direction.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-9ENERGY-EFFICIENT CHANNEL ALIGNMENT OF DWDM SILICON PHOTONIC TRANSCEIVERS
Speaker:
Yuyang Wang, University of California, Santa Barbara, US
Authors:
Yuyang Wang1, M. Ashkan Seyedi2, Rui Wu1, Jared Hulme2, Marco Fiorentino2, Raymond G. Beausoleil2 and Kwang-Ting Cheng3
1University of California, Santa Barbara, US; 2Hewlett Packard Labs, US; 3Hong Kong University of Science and Technology, HK
Abstract
The comb laser-driven microring-based dense wavelength division multiplexing silicon photonics is a promising candidate for next-generation optical interconnects. However, existing solutions for exploring the power-performance trade-off of such systems have been restricted to a limited design space, resulting from the unnecessary constraints of using an identical spacing for laser comb lines and microring channels, and of utilizing consecutive laser comb lines for data transmission. We propose an energy-efficient channel alignment scheme that aligns the microring channels to a subset of laser comb lines that are non-uniformly distributed in the free spectrum range of the microrings. Based on a well-established process variation model, our simulations show that the proposed scheme significantly reduces the microring tuning power in the presence of denser comb lines. The power saved from microring tuning can improve the overall system energy efficiency despite some power wasted in unused laser comb lines. We further conducted a case study for design space exploration using the proposed channel alignment scheme, seeking the most energy-efficient configuration in order to achieve a target aggregated data rate.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-10A PHYSICAL SYNTHESIS FLOW FOR EARLY TECHNOLOGY EVALUATION OF SILICON NANOWIRE BASED RECONFIGURABLE FETS
Speaker:
Shubham Rai, Chair For Processor Design, CFAED, Technische Universität Dresden, Dresden, DE
Authors:
Shubham Rai1, Ansh Rupani2, Dennis Walter1, Michael Raitza1, André Heinzig3, Christian Mayr1, Walter Weber4 and Akash Kumar1
1Technische Universität Dresden, DE; 2Birla Institute of Technology and Science Pilani, Hyderabad Campus, IN; 3NaMLab GmbH, DE; 4NaMLab gGmbH and CfAED, DE
Abstract
Silicon Nanowire based reconfigurable transistors (RFETs) provide an additional gate terminal called the program gate which gives the freedom of programming p-type or n-type functionality for the same device at runtime. This enables the circuit designers to pack more functionality per computational unit. This saves processing costs as only one device type is required. No doping and associated lithography steps are needed for this technology. In this paper, we present a complete design flow including both logic and physical synthesis for circuits based on SiNW RFETs. We propose layouts of logic gates, Liberty and LEF (library extension format) files for the physical synthesis flow and make these available under an open source license to enable further research in the domain of these novel, functionally enhanced transistors. We develop a table model based on a transistor cell with relaxed dimensions following an SOI-based 22 nm technology having a gate pitch of 110 nm and modeled our logic gates on dual gate RFETs. For the sake of comparison, we use the same tool flow for CMOS. We show that in the first of its kind comparison, for these fully symmetrical reconfigurable transistors, the area after placement and routing for SiNW based circuits is 17% more than that of CMOS for MCNC benchmark. Further, we discuss areas of improvement for obtaining better area results from the silicon nanowire based RFETs from a fabrication and technology point of view. The future use of self-aligned techniques to structure two independent gates within a smaller pitch holds the promise of substantial area reduction.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-11ETISS-ML: A MULTI-LEVEL INSTRUCTION SET SIMULATOR WITH RTL-LEVEL FAULT INJECTION SUPPORT FOR THE EVALUATION OF CROSS-LAYER RESILIENCY TECHNIQUES
Speaker:
Martin Dittrich, Technical University of Munich, DE
Authors:
Daniel Mueller-Gritschneder1, Martin Dittrich1, Josef Weinzierl1, Eric Cheng2, Subhasish Mitra2 and Ulf Schlichtmann1
1Technical University of Munich, DE; 2Stanford University, US
Abstract
ETISS is an instruction set simulator (ISS) for Virtual Prototypes (VPs) modeled with SystemC/TLM. In this paper, we propose the extension ETISS-ML, which enables a multi-level simulation that switches between ISS-level and register transfer level (RTL) to accurately evaluate the impact of soft errors in the pipeline of a RISC processor. ETISS-ML achieves close-to-RTL-accurate fault injection simulation results with close-to-ISS simulation performance with a speed up gain up to 100x compared to RTL. For this, we propose an approach to dynamically determine the length of the RTL simulation period. The high simulation performance of ETISS-ML enables an ultra-efficient and accurate evaluation of cross-layer resiliency techniques for embedded applications, which requires running a large number of fault injections for long simulation scenarios. This is demonstrated on a case study of a Microcontroller Unit (MCU) executing a control algorithm for adaptive cruise control.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-12PRECISE EVALUATION OF THE FAULT SENSITIVITY OF OOO SUPERSCALAR PROCESSORS
Speaker:
Antonio Carlos Schneider Beck, Federal University of Rio Grande do Sul, BR
Authors:
Rafael Tonetto1, Gabriel Luca Nazar2 and Antonio Carlos Schneider Beck2
1Federal University of Rio Grande do Sul, BR; 2Universidade Federal do Rio Grande do Sul, BR
Abstract
Since superscalar processors lead the market, their resiliency evaluation by means of fault injection grows in importance. Fault injection strategies usually trade-off their levels of accuracy: low-level HW-based methods are accurate, but very expensive, need special equipment and the actual hardware, and lack controllability; while high-level simulation-based strategies are flexible, fast, easily accessible and have high controllability, but are not accurate since they are based on models that do not always reflect the low-level implementation, mainly when it comes to complex designs like out-of-order multiple-issue processors. In this work, we propose a cycle-accurate fault injection platform for superscalar processors, which has a smart checkpointing mechanism to accelerate injection time, attenuating the shortcomings imposed by the aforementioned fault injection methods while providing the same level of abstraction as detailed RTL models. Leveraging from this new platform, we evaluate a complex and parameterizable Out-of-Order processor (BOOM) by experimenting with different issue widths and analyzing the sensitivity of several hardware structures of the processor.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-13STREAMFTL: STREAM-LEVEL ADDRESS TRANSLATION SCHEME FOR MEMORY CONSTRAINED FLASH STORAGE
Speaker:
Dongkun Shin, Sungkyunkwan University, KR
Authors:
Hyukjoong Kim, Kyuhwa Han and Dongkun Shin, Sungkyunkwan University, KR
Abstract
Although much research efforts have been devoted to reducing the size of address mapping table which consumes DRAM space in solid state drives (SSDs), most SSDs still use page-level mapping for high performance in their firmware called flash translation layer (FTL). In this paper, we propose a novel FTL scheme, called StreamFTL. In order to reduce the size of the mapping table in SSDs, StreamFTL maintains a mapping entry for each stream, which consists of several logical pages written at contiguous physical pages. Unlike extent, which is used by previous FTL schemes, the logical pages in a stream do not need to be contiguous. We show that StreamFTL can reduce the size of the mapping table by up to 90% compared to page-level mapping scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-14ONLINE CONCURRENT WORKLOAD CLASSIFICATION FOR MULTI-CORE ENERGY MANAGEMENT
Speaker:
Karunakar Reddy Basireddy, University of Southampton, GB
Authors:
Karunakar Reddy Basireddy1, Amit Kumar Singh2, Geoff V. Merrett1 and Bashir M. Al-Hashimi1
1University of Southampton, GB; 2University of Essex, GB
Abstract
Modern embedded multi-core processors are organized as clusters of cores, where all cores in each cluster operate at a common Voltage-frequency (V-f ). Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g. compute- and memory-intensive) depending on the instruction mix and resource sharing. Runtime adaptation is key to achieving energy savings without trading-off application performance with such workload variabilities. In this paper, we propose an online energy management technique that performs concurrent workload classification using the metric Memory Reads Per Instruction (MRPI) and pro-actively selects an appropriate V-f setting through workload prediction. Subsequently, it monitors the workload prediction error and performance loss, quantified by Instructions Per Second (IPS) at runtime and adjusts the chosen V-f to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-15AIM: FAST AND ENERGY-EFFICIENT AES IN-MEMORY IMPLEMENTATION FOR EMERGING NON-VOLATILE MAIN MEMORY
Speaker:
Jingtong Hu, University of Pittsburgh, US
Authors:
Mimi Xie1, Shuangchen Li2, Alvin Glova2, Jingtong Hu1, Yuangang Wang3 and Yuan Xie2
1University of Pittsburgh, US; 2University of California, Santa Barbara, US; 3Huawei Technologies, China, CN
Abstract
Non-volatile main memory-based systems pose an opportunity for an attacker to readily access sensitive information on the memory because of its long retention time. While real-time memory encryption with dedicated AES engine can address this vulnerability, it incurs extra performance and energy overheads. As an alternative, we propose an AES in-memory implementation, AIM, to encrypt the whole/part of the memory only when it is necessary. We leverage the benefits offered by the in-memory computing architecture to address the challenges of the bandwidth intensive encryption application. We take advantage of NVM's intrinsic logic operation capability to implement the AES task. Embracing the massive parallelism inside the memory, AIM outperforms existing mechanisms with higher throughput yet lower energy consumption. Compared with state-of-the-art AES engine running at 2.1GHz, AIM can speed up the encryption process by 80 times for a 1GB NVM.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-16SAT-BASED BIT-FLIPPING ATTACK ON LOGIC ENCRYPTIONS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Yuanqi Shen, Amin Rezaei and Hai Zhou, Northwestern University, US
Abstract
Logic encryption is a hardware security technique that uses extra key inputs to prevent unauthorized use of a circuit. With the discovery of the SAT-based attack, new encryption techniques such as SARLock and Anti-SAT are proposed, and further combined with traditional logic encryption techniques, to guarantee both high error rates and resilience to the SAT-based attack. In this paper, the SAT-based bit-flipping attack is presented. It first separates the two groups of keys via SAT-based bit-flippings, and then attacks the traditional encryption and the SAT-resilient encryption, by conventional SAT-based attack and by-passing attack, respectively. The experimental results show that the bit-flipping attack successfully returns a circuit with the correct functionality and significantly reduces the execution time compared with other advanced attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-17AMS VERIFICATION METHODOLOGY REGARDING SUPPLY MODULATION IN RF SOCS INDUCED BY DIGITAL STANDARD CELLS
Speaker:
Fabian Speicher, RWTH Aachen University, DE
Authors:
Fabian Speicher, Jonas Meier, Soheil Aghaie, Ralf Wunderlich and Stefan Heinen, RWTH Aachen University, DE
Abstract
Nanoscale CMOS enables and forces the use of digital-centric RF architectures, where timing resolution is traded for analog resolution. Simultaneously, digital circuits act as aggressors endangering the performance of the time continuous digital and analog parts. The switching activities of logic cells result in power supply variations which lead to jitter in the digital signal paths and causes interferers coupling to the analog paths, appearing as e.g. phase noise, crosstalk, unwanted frequency conversion, etc. Since todays commonly used AMS simulation methods are limited to register-transfer level (RTL) models for the digital domain, the electrical behavior caused by digital switching is not considered. Here, a method for modeling logic cells with regard to power supply noise is presented using the available characterization data of a standard cell library. It covers the influence of switching on the supply voltage as well as influences of supply variations on the digital path delay and their feedthrough to blocks of the RF domain. A fast event-driven simulation of an entire AMS system regarding the mentioned aspects is enabled. The method is demonstrated on a digital-centric transmitter to detect the effects on system level.

Download Paper (PDF; Only available from the DATE venue WiFi)

UB05 Session 5

Date: Wednesday 21 March 2018
Time: 10:00 - 12:00
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB05.1CCF: A CGRA COMPILATION FRAMEWORK
Authors:
Shail Dave and Aviral Shrivastava, Arizona State University, US
Abstract
Coarse-grained reconfigurable array (CGRA) can efficiently accelerate even non-parallel loops. Although scores of techniques have been developed in the past decade to map loops on CGRA PEs, several challenges in enabling acceleration of general-purpose applications on CGRAs remained unresolved, in particular, the automatic code generation for the CGRA accelerator coupled with modern processor cores. In this demonstration, we showcase CCF - CGRA compiler framework. CCF is implemented in LLVM 4.0 and includes a set of transformation and analysis passes. We show that given performance-critical loops annotated in embedded applications, how CCF extracts the loop, constructs the data dependency graph (DDG), maps it onto CGRA architecture, off-loads necessary configuration instructions for CGRA PEs, and automatically communicates data between the CPU and CGRA.

More information ...
UB05.2TOPOLINANO & MAGCAD: A DESIGN AND SIMULATION FRAMEWORK FOR THE EXPLORATION OF EMERGING TECHNOLOGIES
Authors:
Umberto Garlando and Fabrizio Riente, Politecnico di Torino, IT
Abstract
We developed a design framework that enables the exploration and analysis of emerging beyond-CMOS technologies. It is composed of two powerful tools: ToPoliNano and MagCAD. Different technologies are supported, and new ones could be added thanks to their modular structure. ToPoliNano starts from a VHDL description of a circuit and performs the place&route following the technological constraints. The resulting circuit can be simulated both at logical or physical level. MagCAD is a layout editor where the user can design custom circuits, by placing basic elements of the selected technology. The tool can extract a VHDL netlist based on compact models of placed elements derived from experiments or physical simulations. Circuits can be verified with standard VHDL simulators. The design workflow will be demonstrated at the U-booth to show how those tools could be a valuable help in the studying and development of emerging technologies and to obtain feedbacks from the scientific community.

More information ...
UB05.3OISC MULTICORE STENCIL PROCESSOR: ONE INSTRUCTION-SET COMPUTER-BASED MULTICORE PROCESSOR FOR STENCIL COMPUTING
Authors:
Kaoru Saso, Jing Yuan Zhao and Yuko Hara-Azumi, School of Engineering, Tokyo Institute of Technology, JP
Abstract
Subtract and Branch on NEGative with 4 operands (SUBNEG4) is one of One Instruction-Set Computers that execute only one type of instruction. Thanks to its simplicity, SUBNEG4 has only 1/20x circuit area and 1/10x power consumption against MIPS processor. As SUBNEG4 is Turing-complete, it is suitable for parallel computing by multiple cores, while keeping its low-power feature. Our on-going project is seeking for effective use and deployment of SUBNEG4 cores on embedded systems. Our booth will demonstrate the significant speed-up by a SUBNEG4-based many-core processor against a conventional processor, for stencil computing. Our 64-core processor efficiently handles 2D von-Neumann neighborhood stencils, e.g., wave simulation by Verlet integration and 2D Jacobi iteration, to compute 64 points simultaneously. We show that small many-core processors can be realized even with such large number of cores while achieving good speed-up for heaving computation.

More information ...
UB05.4ROS X FPGA FOR ROBOT-CLOUD SYSTEM: ROBOT-CLOUD COOPERATIVE VISUAL SLAM PROCESSING USING ROS-COMPLIANT FPGA COMPONENT
Authors:
Takeshi Ohkawa, Yuhei Sugata, Aoi Soya, Kanemitsu Ootsu and Takashi Yokota, Utsunomiya University, JP
Abstract
Distributed processing in robot-cloud cooperative system is discussed in terms of processing performance and communication performance. Cooperation of robots and cloud-servers is inevitable for realizing intelligent robots in the next generation society and industry. To improve processing performance of the cooperative system, we utilize ROS-compliant FPGA component as a robot-side embedded processing for low-power and high-performance image processing. We prepare two demonstrations. (1)Key-point Detection from camera image using Fully-hardwired ROS-Compliant FPGA component In the evaluation, the processing performance of the component is almost same as PC, while it operates at more than 10 times less power (5W), compared to PC (50W). (2)Distributed Visual SLAM using two-wheeled robot (TurtleBot3) Distributed Visual SLAM (Simultaneous Localization and Mapping) are presented as a concrete example of the robot-cloud cooperative system.

More information ...
UB05.5VIRTUAL PROTOTYPE MAKANI: ANALYZING THE USAGE OF POWER MANAGEMENT TECHNIQUES AND EXTRA-FUNCTIONAL PROPERTIES BY USING VIRTUAL PROTOTYPING
Author:
Sören Schreiner, OFFIS – Institute for Information Technology, DE
Abstract
My Phd work consists of analyzing the correct usage of power management techniques, as well as the analysis of extra-functional properties, including power and timing properties, in MPSoCs. Especially in safety-critical environments the power management gets safety-critical too, since it is able to influence the overall system behavior. To demonstrate my methodologies a mixed-critical multi-rotor system and its corresponding virtual prototype is used. The multi-rotor system's avionics is served by a Xilinx Zynq 7000 MPSoC. The hardware architecture includes ARM and MicroBlaze cores, a NoC for communication and peripherals. The MPSoC processes the flight algorithms with triple modular redundancy and a mission-critical video processing task. The virtual prototype consists of a virtual platform and an environmental model. The virtual platform is equipped with my measuring tool libraries to generate traces of the observed power management techniques and the extra-functional properties.

More information ...
UB05.6SPANNER: SELF-REPAIRING SPIKING NEURAL NETWORK CONTROLLER FOR AN AUTONOMOUS ROBOT
Authors:
Alan Millard1, Anju Johnson1, James Hilder1, David Halliday1, Andy Tyrrell1, Jon Timmis1, Junxiu Liu2, Shvan Karim2, Jim Harkin2 and Liam McDaid2
1University of York, GB; 2Ulster University, GB
Abstract
The human brain is remarkably resilient, and is able to self-repair following injury or a stroke. In contrast, electronic systems typically exhibit limited self-repair capabilities, and cannot recover from faults. We demonstrate a bio-inspired approach to self-repair that allows an autonomous robot to recover from faults in its artificial 'brain'. Astrocytes are support cells in the human brain that interact with neurons to regulate synaptic activity. We have modelled this interaction to create a spiking neural network that can self-repair when synapses between neurons are damaged, by strengthening redundant pathways. We demonstrate a robot platform controlled by a self-repairing spiking neural network that is implemented on an FPGA. We demonstrate that injecting faults into the synapses of the network initially causes the robot to behave erratically, but that the neural controller is able to automatically repair itself, thus allowing the robot to resume normal function.

More information ...
UB05.7EVOAPPROX: LIBRARIES AND GENERATORS OF APPROXIMATE CIRCUITS
Authors:
Lukas Sekanina, Zdeněk Vašíček and Vojtěch Mrazek, Brno University of Technology, CZ
Abstract
Our contribution deals with a fully automated functional approximation methodology for combinational digital circuits. We present open libraries of approximate circuits and tools performing desired approximations. Our approach uses a multi-objective genetic programming-based method to automatically design approximate k-bit adders and multipliers (k = 8, 12, 16, 32). All circuits can be downloaded from [1] at the level of a source code (C, Verilog, and Matlab). Several error metrics are pre-calculated and formal guarantees are given in terms of these errors. By means of an interactive web interface the user can easily find the best trade-off between the error and electrical parameters provided for 45/90/180 nm technology process. We will also demonstrate the circuit design flow developed. References: [1] http://www.fit.vutbr.cz/research/groups/ehw/approxlib/

More information ...
UB05.8REPABIT: AUTOMATED DESIGN OF RELOCATABLE PARTIAL BITSTREAMS
Authors:
Jens Rettkowski1 and Diana Göhringer2
1Ruhr-University Bochum, DE; 2Technische Universität Dresden, DE
Abstract
Dynamic partial reconfiguration of FPGAs enables the replacement of hardware modules at runtime without disturbing remaining hardware modules. The standard vendor tools generate a specific partial bitstream for each reconfigurable region. Relocation generates a partial bitstream in such a way, that it can be moved to different regions. Hence, the number and the time to generate bitstreams is reduced. In this work, RePaBit is presented that automates the generation of relocatable partial bitstreams for Xilinx Vivado. TCL scripts check the compatibility of resource footprints and arrange identical partition pins in all regions for the connection of relocatable modules with the remaining design. Feedthrough routes are avoided using the isolation design flow from Xilinx. The results show an overhead of LUTs by 0.7% and a frequency reduction of only 1.5%. Nevertheless, RePaBit simplifies the design and reduces the design time as well as the needed memory for storing the partial bitstreams.

More information ...
UB05.9CIJTAG: CONCURRENT IJTAG DEMONSTRATOR
Author:
Krenz-Baath René, Hamm-Lippstadt University of Applied Sciences, DE
Abstract
The flexibility of on-chip instrument access enabled by IEEE 1687 (IJTAG) has shown tremendous improvements in modern industrial designs. Due to a constantly increasing spektrum of tasks performed through 1687 networks such as performing test operations during production test, on-line test operations as well as operating health monitors the test requirements in modern designs increase dramatically with respect to test performance, responsiveness and low power. These requirements have a major impact on the design of such test infrastructures. In complex designs with large test infrastructures it might be challenging to comply with the large spectrum of requirements. Concurrent IJTAG is novel partitioning concept to a reconfigurable test infrastructure in order to enable an independent operation of different sections of the test infrastructure. The proposed demonstrator shows the first FPGA-based implementation of concurrent IJTAG test infrastructures.

More information ...
UB05.10EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
12:00End of session
12:30Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

6.1 Special Day Session on Future and Emerging Technologies: Transistors for Digital NanoSystems: The Road Ahead

Date: Wednesday 21 March 2018
Time: 11:00 - 12:30
Location / Room: Saal 2

Chair:
Aitken Rob, ARM, US

This session presents energy-efficient digital design approaches using new transistor ideas and their experimental demonstrations. Examples include negative capacitance-based gate control, carbon nanotube-based channels, and polarity-control by design.

TimeLabelPresentation Title
Authors
11:006.1.1NEGATIVE CAPACITANCE TRANSISTORS
Author:
Michael Hoffmann, NaMLab gGmbH, DE
Abstract
A transistor looking from the gate essentially acts as a series combination of two capacitors: the gate oxide capacitor and the channel capacitor. When the gate oxide is an appropriate ferroelectric, this series combination can stabilize the ferroelectric material at a state of negative capacitance. At this state, the total capacitance of the series combination is enhanced, leading to more charge at the channel at the same voltage. This boost of charge, in turn, leads to larger current at the same voltage. In fact, this boost makes it possible to reduce supply voltage of transistors below the traditional Boltzmann limit --- often termed as the Boltzmann tyranny. In the recent years, many groups around the world, both in academy and in the industry, have demonstrated the fundamental effect and the Negative Capacitance Transistors. In this work, we shall describe the physical origin of the negative capacitance effect and our current understanding of the recent experimental work. We shall also discuss potential ways to optimize devices that could lead to significant improvement in energy efficiency for next generation computers.
11:306.1.2CARBON NANOTUBE FILM-BASED CMOS AND OPTOELECTRONIC DEVICES AND INTEGRATED SYSTEMS
Author:
Lian-Mao Peng, Peking University, CN
Abstract
Carbon nanotube (CNT)-based electronics has been considered one of the most promising candidates to replace Si complementary metal-oxide-semiconductor (CMOS) technology, which will soon meet its performance limit. Prototype device studies on individual CNTs revealed that CNT based devices have the potential to outperform Si CMOS technology in both performance and power consumption, especially at sub-10 nm technology nodes, which are close to the theoretical limits; and various optoelectronic device such as light-emitting diodes, photodetectors and photovoltaic (PV) cells have been demonstrated. In this talk, I will discuss the use of randomly oriented CNT film to build CNT CMOS and optoelectronic devices, and show that the performance of CNT film devices and systems can be dramatically improved by optimizing the material purity, device structure and fabrication processes, thus yielding CNT devices with outstanding performance comparable to that of Si CMOS and ICs working in the GHz regime, and integrated electronic and optoelectronic systems for communications between nanoelectronic circuits using CNT devices.
12:006.1.3TOWARDS HIGH-PERFORMANCE POLARITY-CONTROLLABLE FETS WITH 2D MATERIALS
Speaker:
Pierre-Emmanuel Gaillardon, University of Utah, US
Authors:
Giovanni V. Resta1, Jorge Romero Gonzalez2, Yashwanth Balaji3, Tarun Kumar Agarwal3, Dennis Lin3, Francky Catthoor3, Iuliana P. Radu3, Giovanni De Micheli1 and Pierre-Emmanuel Gaillardon4
1Integrated System Laboratory – EPFL, CH; 2Laboratory of NanoIntegrated Systems (LNIS), Department of Electrical and Computer Engineering, University of Utah, US; 3IMEC, BE; 4University of Utah, US
Abstract
As scaling of conventional silicon-based electronics is reaching its ultimate limit, two-dimensional semiconducting materials of the transition-metal-dichalcogenides family, such as MoS2 and WSe2, are considered as viable candidates for next-generation electronic devices. Fully relying on electrostatic doping, polarity-controllable devices, that use additional gate terminals to modulate the Schottky barriers at source and drain, can strongly take advantages of 2D materials to achieve high on/off ratio and low leakage floor. Here, we provide an overview of the latest advances in 2D material processes and growth. Then, we report on the experimental demonstration of polarity-controllable devices fabricated on 2D-WSe2 and study the scaling trends of such devices using ballistic self-consistent quantum simulations. Finally, we discuss the circuit-level opportunities of such technology.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

6.2 Memory Security

Date: Wednesday 21 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 6

Chair:
Francesco Regazzoni, ALaRI USI, CH

Co-Chair:
Todd Austin, University of Michigan, US

Papers in this session address the problem of dealing with secure memory architectures and cover the whole memory hierarchy from cache to storage. Different levels of the hierarchy need different protection mechanisms, such as: protecting information from attacks on untrusted clouds and ensuring integrity of the main memory used by the CPU. Finally, the last paper identifies cache attacks and vulnerabilities on MPSoCs using Networks on Chips.

TimeLabelPresentation Title
Authors
11:006.2.1DYNAMIC SKEWED TREE FOR FAST MEMORY INTEGRITY VERIFICATION
Speaker:
Saru Vig, Nanyang Technological University, SG
Authors:
Saru Vig, Jiang Guiyuan and Lam Siew Kei, Nanyang Technological University, SG
Abstract
Memory authentication techniques often employ an integrity tree as a countermeasure against replay, spoofing and splicing attacks. However, the balanced memory integrity trees used in existing approaches lead to excessive memory access overheads for runtime verification. In this paper, we propose a framework to dynamically construct a customized integrity tree based on the data access patterns to reduce the overhead of runtime verification. The proposed framework can adapt the memory integrity tree structure at runtime such that the nodes that correspond to frequently accessed data are placed closer to the root. We have validated the effectiveness of our approach on the Altera NIOS II processor with an external DRAM. Experimental results based on applications from widely used CHStone and SNU Real-Time benchmarks demonstrate that the proposed approach can lead to an average performance gain of 30% compared to the conventional means of using balanced memory integrity trees. In addition, to preserve data confidentiality, we implemented the encryption/decryption operations using custom instructions on the NIOS II processor to notably reduce the overall overhead of memory security.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:306.2.2EARTHQUAKE - A NOC-BASED OPTIMIZED DIFFERENTIAL CACHE-COLLISION ATTACK FOR MPSOCS
Speaker:
Cezar Rodolfo W. Reinbrecht, UFRGS, BR
Authors:
Cezar Rodolfo Wedig Reinbrecht1, Bruno Endres Forlin1, Andreas Zankl2 and Johanna Sepulveda3
1UFRGS, BR; 2Fraunhofer Institute for Applied and Integrated Security, DE; 3Technical University of Munich, DE
Abstract
Multi-Processor Systems-on-Chips (MPSoCs) are a platform for a wide variety of applications and use-cases. The high on-chip connectivity, the programming flexibility, and the reuse of IPs, however, also introduce security concerns. Problems arise when applications with different trust and protection levels share resources of the MPSoC, such as processing units, cache memories, and the Network-on-Chip (NoC) communication structure. If a program gets compromised, an adversary can observe the use of these resources and infer (potentially secret) information from other applications. In this work, we explore the cache-based attack by Bogdanov et al., which infers the cache activity of a target program through timing measurements and exploits collisions that occur when the same cache location is accessed for different program inputs. We implement this differential cache-collision attack on the MPSoC Glass and introduce an optimized variant of it, the Earthquake Attack, which leverages the NoC-based communication to increase attack efficiency. Our results show that Earthquake performs well under different cache line and MPSoC configurations, illustrating that cache-collision attacks are considerable threats on MPSoCs.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:006.2.3(Best Paper Award Candidate)
A FAST AND RESOURCE EFFICIENT FPGA IMPLEMENTATION OF SECRET SHARING FOR STORAGE APPLICATIONS
Speaker:
Jakob Stangl, Austrian Institute of Technology (AIT), AT
Authors:
Jakob Stangl1, Thomas Lorünser2 and Sai Dinakarrao1
1TU Wien, AT; 2Austrian Institute of Technology, AT
Abstract
Outsourcing data into the cloud gives wide benefits and opportunities to customers. Beside these advantages, new challenges such as confidentiality and accessibility have to be addressed. One approach to overcome these challenges is by applying secret sharing in a distributed storage setting, known as cloud of clouds approach. For this purpose we present a new hardware architecture of a wide parametrizable secret sharing core. Performance metrics for various applied bit-widths of secret words are given, which are crucial for benefits of higher level protocols in the cloud of clouds approach. Additionally, a complete system which is able to operate in a network environment is presented. The achieved throughputs are in the order of Gbit/s. It is significantly faster than similar comparable hardware architectures and orders of magnitude higher than software implementations.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP2-15, 958AIM: FAST AND ENERGY-EFFICIENT AES IN-MEMORY IMPLEMENTATION FOR EMERGING NON-VOLATILE MAIN MEMORY
Speaker:
Jingtong Hu, University of Pittsburgh, US
Authors:
Mimi Xie1, Shuangchen Li2, Alvin Glova2, Jingtong Hu1, Yuangang Wang3 and Yuan Xie2
1University of Pittsburgh, US; 2University of California, Santa Barbara, US; 3Huawei Technologies, China, CN
Abstract
Non-volatile main memory-based systems pose an opportunity for an attacker to readily access sensitive information on the memory because of its long retention time. While real-time memory encryption with dedicated AES engine can address this vulnerability, it incurs extra performance and energy overheads. As an alternative, we propose an AES in-memory implementation, AIM, to encrypt the whole/part of the memory only when it is necessary. We leverage the benefits offered by the in-memory computing architecture to address the challenges of the bandwidth intensive encryption application. We take advantage of NVM's intrinsic logic operation capability to implement the AES task. Embracing the massive parallelism inside the memory, AIM outperforms existing mechanisms with higher throughput yet lower energy consumption. Compared with state-of-the-art AES engine running at 2.1GHz, AIM can speed up the encryption process by 80 times for a 1GB NVM.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP2-16, 748SAT-BASED BIT-FLIPPING ATTACK ON LOGIC ENCRYPTIONS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Yuanqi Shen, Amin Rezaei and Hai Zhou, Northwestern University, US
Abstract
Logic encryption is a hardware security technique that uses extra key inputs to prevent unauthorized use of a circuit. With the discovery of the SAT-based attack, new encryption techniques such as SARLock and Anti-SAT are proposed, and further combined with traditional logic encryption techniques, to guarantee both high error rates and resilience to the SAT-based attack. In this paper, the SAT-based bit-flipping attack is presented. It first separates the two groups of keys via SAT-based bit-flippings, and then attacks the traditional encryption and the SAT-resilient encryption, by conventional SAT-based attack and by-passing attack, respectively. The experimental results show that the bit-flipping attack successfully returns a circuit with the correct functionality and significantly reduces the execution time compared with other advanced attacks.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

6.3 Advances in AMS/RF Design & Test Automation and Beyond

Date: Wednesday 21 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 1

Chair:
Marie-Minerve Louerat, LIP6, FR

This session brings together new design and test automation developments for AMS/RF systems and beyond. Papers in the session cover a wide range of exciting topics from circuit optimization to design tools and verification. The topics include innovative combination of principal component analysis and evolutionary computation applied to analog/RF IC optimization; hybrid automation approach for SAR ADC design aimed at IoT applications; and design space exploration for wireless systems. Interactive papers discuss AMS circuit testbenches, modeling and simulation of systems that combine continuous and discrete time components, and AMS verification.

TimeLabelPresentation Title
Authors
11:006.3.1ENHANCED ANALOG AND RF IC SIZING METHODOLOGY USING PCA AND NSGA-II OPTIMIZATION KERNEL
Speaker:
Nuno Lourenco, Instituto de Telecomunicações, PT
Authors:
Tiago Pessoa, Nuno Lourenco, Ricardo Martins, Ricardo Povoa and Nuno Horta, Instituto de Telecomunicações /Instituto Superior Técnico – Universidade de Lisboa, PT
Abstract
State-of-the-art design of analog and radio frequency integrated circuits is often accomplished using sizing optimization. In this paper, an innovative combination of principal component analysis (PCA) and evolutionary computation is used to increase the optimizer's efficiency. The adopted NSGA-II optimization kernel is improved by applying the genetic operators of mutation and crossover on a transformed design-space, obtained from the latest set of solutions (the parents) using PCA. By applying crossover and mutation on variables that are projections of the principal components, the optimization moves more effectively, finding solutions with better performances, in the same amount of time, than the standard NSGA-II optimization kernel. The proposed method was validated in the optimization of two widely used analog circuits, an amplifier and a voltage controlled oscillator, reaching wider solutions sets, and in some cases, solutions sets that can be almost 3 times better in terms of hypervolume.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:306.3.2A SYSTEMC-BASED SIMULATOR FOR DESIGN SPACE EXPLORATION OF SMART WIRELESS SYSTEMS
Speaker:
Gabriele Miorandi, University of Verona, IT
Authors:
Gabriele Miorandi1, Francesco Stefanni2, Federico Fraccaroli3 and Davide Quaglia1
1University of Verona, IT; 2EDALab s.r.l., IT; 3Wagoo Italia s.r.l.s., IT
Abstract
Smart wireless techniques are at the core of many today's telecommunication and networked embedded systems where performance are enhanced by intertwining radio frequency (RF) and digital aspects. Therefore their design requires to focus on both domains. Traditional approaches for their simulation rely either on different domain-specific tools or on analog-mixed-signal modeling languages. In the former case, the simulation of the whole platform in the same session is not possible while in the latter case, simulation performance are limited by the computationally most intensive domain (usually RF). We present an extension of the SystemC Network Simulation Library that allows to simulate antenna details and node position together with digital hardware and software. The validation on a real wearable system shows that the proposed simulation approach achieves a good trade-off between accuracy and speed thus allowing fast exploration of various configurations in the early phase of the design flow without recurring to the expensive and time-consuming creation of physical prototypes.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:006.3.3A CIRCUIT-DESIGN-DRIVEN TOOL WITH A HYBRID AUTOMATION APPROACH FOR SAR ADCS IN IOT
Speaker:
Ming Ding, IMEC/Holst Centre, NL
Authors:
Ming Ding1, Guibin Chen2, Pieter Harpe3, Benjamin Busze1, Yao-Hong Liu1, Christian Bachmann1, Kathleen Philips1 and Arthur van Roermund3
1imec-Holst Centre, NL; 2imec-Holst Centre, Eindhoven University of Technology, NL; 3Eindhoven University of Technology, NL
Abstract
A circuit-design-driven tool with a hybrid design automation approach for asynchronous SAR ADCs in IoT applications is presented. To minimize the circuit design time while still being able to maintain ADC performance, the hybrid approach allocates automation and manual effort properly for each block: fully-synthesized control logic, highly-automated DAC and S&H circuit, library-based comparator and template-based layout generation. A user interface governs the automated design flow from specification and circuit implementation to layout generation. Two prototypes are generated using the proposed flow in 40nm CMOS: an 8b 32MS/s and a 12b 1MS/s SAR ADC. The measured and the simulated ADC performance are in good agreement, showing the robustness of the proposed method. At 1V supply, two chips consume 187uW and 16.7uW, achieving 30.7fJ/conversion.step and 18.1fJ/conversion.step respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:15IP2-17, 902AMS VERIFICATION METHODOLOGY REGARDING SUPPLY MODULATION IN RF SOCS INDUCED BY DIGITAL STANDARD CELLS
Speaker:
Fabian Speicher, RWTH Aachen University, DE
Authors:
Fabian Speicher, Jonas Meier, Soheil Aghaie, Ralf Wunderlich and Stefan Heinen, RWTH Aachen University, DE
Abstract
Nanoscale CMOS enables and forces the use of digital-centric RF architectures, where timing resolution is traded for analog resolution. Simultaneously, digital circuits act as aggressors endangering the performance of the time continuous digital and analog parts. The switching activities of logic cells result in power supply variations which lead to jitter in the digital signal paths and causes interferers coupling to the analog paths, appearing as e.g. phase noise, crosstalk, unwanted frequency conversion, etc. Since todays commonly used AMS simulation methods are limited to register-transfer level (RTL) models for the digital domain, the electrical behavior caused by digital switching is not considered. Here, a method for modeling logic cells with regard to power supply noise is presented using the available characterization data of a standard cell library. It covers the influence of switching on the supply voltage as well as influences of supply variations on the digital path delay and their feedthrough to blocks of the RF domain. A fast event-driven simulation of an entire AMS system regarding the mentioned aspects is enabled. The method is demonstrated on a digital-centric transmitter to detect the effects on system level.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:16IP3-1, 611TESTBENCH QUALIFICATION FOR SYSTEMC-AMS TIMED DATA FLOW MODELS
Speaker:
Muhammad Hassan, DFKI GmbH, DE
Authors:
Muhammad Hassan1, Daniel Grosse2, Hoang M. Le3, Thilo Voertler4, Karsten Einwich4 and Rolf Drechsler2
1Cyber Physical Systems, DFKI, DE; 2University of Bremen/DFKI GmbH, DE; 3University of Bremen, DE; 4COSEDA Technologies GmbH, DE
Abstract
Analog-Mixed Signal (AMS) circuits have become increasingly important for today's SoCs. The Timed Data Flow (TDF) model of computation available in SystemC-AMS offers here a good tradeoff between accuracy and simulation-speed at the system-level. One of the main challenges in system-level verification is the quality of the testbench. In this paper, we present a testbench qualification approach for SystemC-AMS TDF models. Our contribution is twofold: First, we propose specific mutation models for the class of filters implemented as TDF models. This requires to analyze the Laplace transfer function of the filter design. Second, we present the mutation based qualification approach based on the proposed specific mutations as well as standard behavioral mutations. This allows to find serious quality issues in the testbench. Our experimental results for a real-world AMS system demonstrate the applicability and efficacy of our approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:17IP3-2, 143AN ALGEBRA FOR MODELING CONTINUOUS TIME SYSTEMS
Speaker:
José Medeiros, University of Brasilia, BR
Authors:
José E. G. de Medeiros1, George Ungureanu2 and Ingo Sander2
1University of Brasília, BR; 2KTH Royal Institute of Technology, SE
Abstract
Advancements on analog integrated design have led to new possibilities for complex systems combining both continuous and discrete time modules on a signal processing chain. However, this also increases the complexity any design flow needs to address in order to describe a synergy between the two domains, as the interactions between them should be better understood. We believe that a common language for describing continuous and discrete time computations is beneficial for such a goal and a step towards it is to gain insight and describe more fundamental building blocks. In this work we present an algebra based on the General Purpose Analog Computer, a theoretical model of computation recently updated as a continuous time equivalent of the Turing Machine.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

6.4 Modeling, Control and Scheduling for Cyber-Physical Systems

Date: Wednesday 21 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 2

Chair:
Shiyan Hu, Michigan Tech., US

Co-Chair:
Franco Fummi, University of Verona, IT

The fast advancement of cyber-physical systems has been presenting significant design challenges. The papers in this session address these CPS design challenges across layers of control, communication, computation and embedded microarchitecture. They include methodologies for modeling and integrating heterogeneous models to build CPS virtual platforms, routing and scheduling messages with control stability consideration for networked CPS, designing feedback control of EtherCAT networks for reliability enhancement, and scheduling tasks with consideration of cache to maximize control performance.

TimeLabelPresentation Title
Authors
11:006.4.1AUTOMATIC INTEGRATION OF CYCLE-ACCURATE DESCRIPTIONS WITH CONTINUOUS-TIME MODELS FOR CYBER-PHYSICAL VIRTUAL PLATFORMS
Speaker:
Franco Fummi, University of Verona, IT
Authors:
Michele Lora1, Stefano Centomo1, Davide Quaglia1 and Franco Fummi2
1University of Verona, IT; 2Universita' di Verona, IT
Abstract
Development of cyber-physical systems' control algorithms usually relies on architecture-agnostic abstract models, often leading to ineffective implementations. This paper presents a technique to automatically integrate cycle-accurate models of digital HW components with continuous-time physical models. It proposes a solution to the semantic gap between the involved models of computation. Furthermore, model generation and integration for both Simulink-based proprietary environment and FMI-based portable standard are presented. The aim of such techniques is to produce cyber-physical virtual-platforms: a powerful tool to refine control algorithms up to their SW implementations on the actual HW platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:306.4.2STABILITY-AWARE INTEGRATED ROUTING AND SCHEDULING FOR CONTROL APPLICATIONS IN ETHERNET NETWORKS
Speaker:
Rouhollah Mahfouzi, Linköping University, SE
Authors:
Rouhollah Mahfouzi1, Amir Aminifar2, Soheil Samii3, Ahmed Rezine1, Petru Eles1 and Zebo Peng1
1Linköping University, SE; 2Swiss Federal Institute of Technology in Lausanne (EPFL), CH; 3General Motors Research & Development, US
Abstract
Real-time communication over Ethernet is becoming important in various application areas of cyber-physical systems such as industrial automation and control, avionics, and automotive networking. Since such applications are typically time critical, Ethernet technology has been enhanced to support time-driven communication through the IEEE 802.1 TSN standards. The performance and stability of control applications is strongly impacted by the timing of the network communication. Thus, in order to guarantee stability requirements, when synthesizing the communication schedule and routing, it is needed to consider the degree to which control applications can tolerate message delays and jitters. In this paper we jointly solve the message scheduling and routing problem for networked cyber-physical systems based on the time-triggered Ethernet TSN standards. Moreover, we consider this communication synthesis problem in the context of control applications and guarantee their worst-case stability, taking explicitly into consideration the impact of communication delay and jitter on control quality. Considering the inherent complexity of the network communication synthesis problem, we also propose new heuristics to improve synthesis efficiency without any major loss of quality. Experiments demonstrate the effectiveness of the proposed solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:006.4.3FEEDBACK CONTROL OF REAL-TIME ETHERCAT NETWORKS FOR RELIABILITY ENHANCEMENT IN CPS
Speaker:
Tongquan Wei, East China Normal University, CN
Authors:
Liying Li1, Peijin Cong1, Kun Cao1, Junlong Zhou2, Tongquan Wei1, Mingsong Chen1 and Xiaobo Sharon Hu3
1East China Normal University, CN; 2Nanjing University of Science and Technology, CN; 3University of Notre Dame, US
Abstract
EtherCAT has become one of the leading real-time Ethernet solutions for networked industrial systems where a reliable communication infrastructure is needed due to highly error-prone environments. However, existing work on EtherCAT mainly focuses on clock synchronization and timeliness improvement. The reliability of EtherCAT-based networked systems has largely been ignored. In this paper, we present a PID-based feedback control scheme that aims at enhancing reliability of networked systems under timing and system resource constraints. Instead of automatic repeat request method (ARQ), a forward error control technique is introduced to achieve the required system reliability at a lower deadline miss rate of messages. The PID-based feedback control scheme can also improve the stability of a system in terms of deadline miss rate in the presence of bursty errors. Simulation results show that the proposed scheme can achieve reliability enhancement of up to 79% compared to benchmarking methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:156.4.4CACHE-AWARE TASK SCHEDULING FOR MAXIMIZING CONTROL PERFORMANCE
Speaker:
Wanli Chang, Singapore Institute of Technology, SG
Authors:
Wanli Chang1, Debayan Roy2, Xiaobo Sharon Hu3 and Samarjit Chakraborty2
1Singapore Institute of Technology, SG; 2Technical University of Munich, DE; 3University of Notre Dame, US
Abstract
Embedded control applications are widely implemented on small, low-cost and resource-constrained microcontrollers, e.g., in the automotive domain. Conventionally, control algorithms are designed using model-based approaches, without considering the details of the implementation platform. This leads to inefficient utilization of the resources. With the emergence of the cyber-physical system (CPS)-oriented thinking, there has lately been a strong interest in co-design of control algorithms and their implementation platforms. Some recent efforts have shown that a schedule on multiple applications with more on-chip cache reuse is able to improve the control performance. However, it has not been studied how the control performance can be maximized for a given schedule and how an optimal schedule can be computed. In this work, we propose a two-stage framework to compute the schedule maximizing the overall control performance of all the applications. First, a holistic controller design taking all the sampling periods and sensing-to-actuation delays in a schedule into account is presented, aiming to maximize the overall control performance. Second, a hybrid search algorithm for discrete decision space is reported to efficiently compute an optimal schedule. Experimental results on a case study with multiple automotive applications show that a significant improvement of 10-20% in control performance can be achieved by the proposed cache-aware scheduling approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP3-3, 135TTW: A TIME-TRIGGERED WIRELESS DESIGN FOR CPS
Speaker:
Romain Jacob, ETH Zurich, CH
Authors:
Romain Jacob1, Licong Zhang2, Marco Zimmerling3, Jan Beutel1, Samarjit Chakraborty2 and Lothar Thiele1
1ETH Zurich, CH; 2Technical University of Munich, DE; 3Technische Universität Dresden, DE
Abstract
Wired fieldbuses have long been proven effective in supporting Cyber-Physical Systems (CPS). However, various domains are now striving for wireless solutions due to ease of deployment or novel functionality requiring the ability to support mobile devices. Low-power wireless protocols have been proposed in response to this need, but requirements of a large class of CPS applications can still not be satisfied. We thus propose Time-Triggered Wireless (TTW), a distributed low-power wireless system design that minimizes communication energy consumption and offers end-to-end timing predictability, runtime adaptability, reliability, and low latency. Evaluation shows a 2x reduction in communication latency and 33-40% lower radio-on time compared with DRP, the closest related work, validating the suitability of TTW for new exciting wireless CPS applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP3-4, 429PHYLAX: SNAPSHOT-BASED PROFILING OF REAL-TIME EMBEDDED DEVICES VIA JTAG INTERFACE
Speaker:
Eduardo Chielle, New York University Abu Dhabi, BR
Authors:
Charalambos Konstantinou1, Eduardo Chielle2 and Michail Maniatakos2
1New York University, US; 2New York University Abu Dhabi, AE
Abstract
Real-time embedded systems play a significant role in the functionality of critical infrastructure. Legacy microprocessor-based embedded systems, however, have not been developed with security in mind. Applying traditional security mechanisms in such systems is challenging due to computing constraints and/or real-time requirements. Their typical 20-30 year lifespan further exacerbates the problem. In this work, we propose PHYLAX, a plug-and-play solution to detect intrusions in already installed embedded devices. PHYLAX is an external monitoring tool which does not require code instrumentation. Also, our tool adapts and prioritizes intrusion detection based on the requirements of the underlying infrastructure (power grid, chemical factory, etc.) as well as the computing capabilities of the target embedded system (CPU model, memory size, etc.). PHYLAX can be employed on any legacy device which incorporates a JTAG interface. As a case study, we present the inclusion of PHYLAX on a power grid recloser controller.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:32IP3-5, 463CHARACTERIZING DISPLAY QOS BASED ON FRAME DROPPING FOR POWER MANAGEMENT OF INTERACTIVE APPLICATIONS ON SMARTPHONES
Speaker:
Chung-Ta King, National Tsing Hua University, TW
Authors:
Kuan-Ting Ho1, Chung-Ta King1, Bhaskar Das1 and Yung-Ju Chang2
1National Tsing Hua University, TW; 2National Chiao Tung University, TW
Abstract
User-centric power management in smartphones aims to conserve power without affecting user's perceived quality of experience. Most existing works focus on periodically updated applications such as games and video players and use a fixed frame rate, measured in frame per second (FPS), as the metric to quantify the display quality of service (QoS). The idea is to adjust the CPU/GPU frequency just enough to maintain the frame rate at a user satisfactory level. However, when applied to aperiodically-updated interactive applications, e.g. Facebook or Instagram, that draw the frame buffer at a varying rate in response to user inputs, such a power management strategy becomes too conservative. Based on real user experiments, we observe that users can tolerate a certain percentage of frame drops when running aperiodically updated applications without affecting their perceived display quality. Hence, we introduce a new metric to characterize display quality of service, called the frame drawn ratio (FDR), and propose a new CPU/GPU frequency governor based on the FDR metric. The experiments by real users show that the proposed governor can conserve 17.2% power in average when compared to the default governor, while maintaining the same or even better QoE rating.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

6.5 Special Session: Three Years of Low-Power Image Recognition Challenge

Date: Wednesday 21 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 3

Chair:
Yung-Hsiang Lu, Purdue University, US

Reducing power consumption has been one of the most important goals since the creation of electronic systems. Energy efficiency is increasingly important as battery-powered systems equipped with cameras (such as drones and body cameras) are widely used. It is desirable using the on-board computers to recognize objects in the images captured by these cameras. The Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015. LPIRC considers both energy consumption and the accuracy in detecting and locating objects in images. The special session includes presentations given by the winners of the first three years of LPIRC explaining their winning solutions.

TimeLabelPresentation Title
Authors
11:006.5.1THREE YEARS OF LOW-POWER IMAGE RECOGNITION CHALLENGE: INTRODUCTION TO SPECIAL SESSION
Speaker:
Yung-Hsiang Lu, Purdue University, US
Authors:
Kent Gauen1, Ryan Dailey1, Yung-Hsiang Lu1, Eunbyung Park2, Wei Liu2, Alexander Berg2 and Yiran Chen3
1Purdue University, US; 2University of North Carolina at Chapel Hill, US; 3Duke University, US
Abstract
Reducing power consumption has been one of the most important goals since the creation of electronic systems. Energy efficiency is increasingly important as battery-powered systems (such as smartphones, drones, and body cameras) are widely used. It is desirable using the on-board computers to recognize objects in the images captured by these cameras. The Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015. The special session includes presentations given by the winners of the first three years of LPIRC. This paper explains the rules of the competition and the rationale, summarizes the teams' scores, and describes the lessons learned in the first three years. The paper suggests possible improvements of future challenges.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:156.5.2REAL-TIME OBJECT DETECTION TOWARDS HIGH POWER EFFICIENCY
Speaker:
Jincheng Yu, Tsinghua University, CN
Authors:
Jincheng Yu1, Kaiyuan Guo1, Yiming Hu1, Xuefei Ning1, Jiantao Qiu1, Huizi Mao1, Song Yao2, Tianqi Tang1, Boxun Li1, Yu Wang1 and Huazhong Yang1
1Tsinghua University, Beijing, CN; 2DeePhi Technology, Beijing, CN
Abstract
In recent years, Convolutional Neural Network (CNN) has been widely applied in computer vision tasks and has achieved significant improvement in image object detection. The CNN methods consume more computation as well as storage, so GPU is introduced for real-time object detection. However, due to the high power consumption of GPU, it is difficult to adopt GPU in mobile applications like automatic driving. The previous work proposes some optimizing techniques to lower the power consumption of object detection on mobile GPU or FPGA. In the first Low-Power Image Recognition Challenge (LPIRC), our system achieved the best result with mAP/Energy on mobile GPU platforms. We further research the acceleration of detection algorithms and implement two more systems for real-time detection on FPGA with higher energy efficiency. In this paper, we will introduce the object detection algorithms and summarize the optimizing techniques in three of our previous energy efficient detection systems on different hardware platforms for object detection.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:406.5.3A RETROSPECTIVE EVALUATION OF ENERGY-EFFICIENT OBJECT DETECTION SOLUTIONS ON EMBEDDED DEVICES
Speaker:
Ying Wang, Chinese Academy of Sciences, CN
Authors:
Ying Wang, Zhenyu Quan, Yinhe Han, Jiajun Li, Huawei Li and Xiaowei Li, Institute of Computing Technology Chinese Academy of Sciences, Beijing, CN
Abstract
The field of image and video recognition has been propelled by the rapid development of deep learning in recent years. With its fascinating accuracy and generalization ability, deep CNNs have shown remarkable performance in large-scale and real-life image dataset. However, accommodating computation-intensive CNN-based image detection frameworks on power-constrained devices is considered more challenging than desktop or warehouse computing systems. Instead of emphasizing purely on detection accuracy, Low Power Image Recognition Challenge (LPIRC) is initiated to highlight the energy-efficiency of different image recognition solutions, and it witnesses the advancement of cost-effective image recognition technology in aspects of both algorithmic and architecture innovation. This paper introduces the cost-effective CNN-based object detection solutions that reached an improved tradeoff between energy and accuracy for mobile CPU+GPU SoCs, which is the winner of LPIRC2016, and it also analyzes the implications of both recent hardware and algorithm advancement on such a technique. It is demonstrated in our evaluation that the performance growth of embedded SoCs and CNN models have clearly contributed to a sheer growth of mAP/WH in current CNN-based object detection solutions, and also shifted the balance between accuracy and energy-cost in the contest solution design when we seek to maximize the efficiency score defined by LPIRC through design parameter exploration.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:056.5.4JOINT OPTIMIZATION OF SPEED, ACCURACY, AND ENERGY FOR EMBEDDED IMAGE RECOGNITION SYSTEMS
Speaker:
Soonhoi Ha, Seoul National University, KR
Authors:
Duseok Kang, Jintaek Kang, Donghyun Kang, Sungjoo Yoo and Soonhoi Ha, Seoul National University, KR
Abstract
This paper presents the image recognition system that won the first prize in the LPIRC (Low Power Image Recognition Challenge) in 2017. The goal of the challenge is to maximize the ratio between the accuracy and energy consumption within a time limit of 10 minutes for the processing of 20,000 images. Among three conflicting goals of accuracy, speed, and energy consumption, we considered the trade-off between accu- racy and speed first to select Nvidia Jetson TX2 as the hardware platform and Tiny YOLO as the image recognition algorithm. Next, we applied a series of software optimization techniques to improve throughput, such as pipelining, multithreading, Tucker decomposition, and 16-bit quantization. Lastly, we explored the CPU and GPU frequencies to minimize the total energy consumption. As a result, we could achieve an accuracy of 0.24 mAP with energy consumption of 2.08Wh, which corresponds to the score of 0.11931, 2.7 times higher than the winner of LPIRC 2016.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

6.8 Innovative Products for Autonomous Driving (part 2)

Date: Wednesday 21 March 2018
Time: 11:00 - 12:30
Location / Room: Exhibition Theatre

Organiser:
Hans-Jürgen Brand, IDT/ZMDI, DE

The workshop on Innovative Products for Autonomous Driving includes 2 sessions (part 1:session 3.8). This session will highlight how to do ultra-low-voltage design, how to accelerate physical signoff and a 22 nm FDSOI System-on-Chip development for Advanced Driver Assistance System.

TimeLabelPresentation Title
Authors
11:006.8.122FDX ULTRA-LOW-VOLTAGE DESIGN BASED ON ADAPTIVE BODY BIAS
Speaker:
Holger Eisenreich, Racyics GmbH, DE
Abstract

22FDX body bias allows to compensate process, temperature and slow voltage variations. Applied in Adaptive Body Bias (ABB) scheme, this technology feature enables Ultra-Low-Voltage implementations down to 0.4V for IoT-like designs with unparalleled energy efficiency. Racyics will present its 22FDX ABB IP platform and the related ABB-aware implementation and sign-off methodology.

11:306.8.2A NEW ADAS CHIP DESIGN IN 22 NM FDSOI TECHNOLOGY FOR AUTOMOTIVE COMPUTER VISION APPLICATIONS
Speaker:
Jens Benndorf, Dream Chip Technologies, DE
Abstract

The presentation explores the European collaboration on a 22 nm FDSOI System-on-Chip development for Advanced Driver Assistance System.
It will highlight partner co-operation and key trade-offs necessary to deliver the target performance. The scope includes:
• Short company introduction
• Project setup and target applications
• Chip architecture and performance requirements
• Hardware Demonstration System
• Outlook
• Summary

12:006.8.3ACCELERATING PHYSICAL SIGNOFF FOR LEADING EDGE CHIP DESIGNS
Speaker:
David DeMarcos, Synopsys, DE
Abstract

Physical Verification with IC Validator in the Synopsys Design Platform provides technology-leading, production-proven signoff solutions for design rule checking (DRC), connectivity verification layout-vs.-schematic (LVS), metal fill insertion, and design-for-manufacturability (DFM) enhancements. IC Validator is supported by all major foundries as a signoff solution for established-node designs, as well as advanced emerging-node designs at 20nm and below. It includes productivity links to leading design tools such as IC Compiler™/IC Compiler II physical implementation, StarRC™  parasitic extraction, and Custom Compiler™  mixed-signal design. IC Validator's In-Design physical verification speeds up design closure with timing-aware metal fill and DRC fixing within the IC Compiler and IC Compiler II environments.

12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB06 Session 6

Date: Wednesday 21 March 2018
Time: 12:00 - 14:00
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB06.1SYSTEM-LEVEL OPERATING CONDITION CHECKS: AUTOMATED AUGMENTATION OF VERILOGAMS MODELS
Authors:
Georg Gläser1, Martin Grabmann1, Gerrit Kropp1 and Andreas Fürtig2
1Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE; 2Goethe University Frankfurt, DE
Abstract
Analog/Mixed-Signal design and verification strongly relies on more or less abstract models to make extensive simulations feasible. Maintaining consistent behavior between system model and implementation is crucial for a correct verification. This also involves the operating conditions: A faulty model might introduce false-positive verification results despite of e.g. an incorrect supply voltage or missing bias currents. We present an automated workflow for extracting these checks from a transistor-level implementation and transfer it into a given Verilog-AMS model. The correctness of our approach is proved by evaluating the model coverage between the implementation and the model. As a demonstration scenario, we use a demodulator component of a HF RFID communication system. We extract the acceptance region from the transistor-level schematic and automatically generate and integrate a model safe-guard unit for performing the operating condition check.

More information ...
UB06.2TOPOLINANO & MAGCAD: A DESIGN AND SIMULATION FRAMEWORK FOR THE EXPLORATION OF EMERGING TECHNOLOGIES
Authors:
Umberto Garlando and Fabrizio Riente, Politecnico di Torino, IT
Abstract
We developed a design framework that enables the exploration and analysis of emerging beyond-CMOS technologies. It is composed of two powerful tools: ToPoliNano and MagCAD. Different technologies are supported, and new ones could be added thanks to their modular structure. ToPoliNano starts from a VHDL description of a circuit and performs the place&route following the technological constraints. The resulting circuit can be simulated both at logical or physical level. MagCAD is a layout editor where the user can design custom circuits, by placing basic elements of the selected technology. The tool can extract a VHDL netlist based on compact models of placed elements derived from experiments or physical simulations. Circuits can be verified with standard VHDL simulators. The design workflow will be demonstrated at the U-booth to show how those tools could be a valuable help in the studying and development of emerging technologies and to obtain feedbacks from the scientific community.

More information ...
UB06.3ADVANCED SIMULATION OF QUANTUM COMPUTATIONS
Authors:
Zulehner Alwin and Robert Wille, Johannes Kepler University Linz, AT
Abstract
Quantum computation is a promising emerging technology which allows for substantial speed-ups compared to classical computation. Since physical realizations of quantum computers are in their infancy, most research in this domain still relies on simulations on classical machines. This causes an exponential overhead which current simulators try to tackle with straight forward array-based representations and massive hardware power. There also exist solutions based on decision diagrams (graph-based approaches) that try to tackle the complexity by exploiting redundancies in quantum states and operations. However, they did not get established since they yield speedups only for certain benchmarks. Here, we demonstrate a new graph-based simulation approach which clearly outperforms state-of-the-art simulators. By this, users can efficiently execute quantum algorithms even if the respective quantum computers are not broadly available yet.

More information ...
UB06.4ROS X FPGA FOR ROBOT-CLOUD SYSTEM: ROBOT-CLOUD COOPERATIVE VISUAL SLAM PROCESSING USING ROS-COMPLIANT FPGA COMPONENT
Authors:
Takeshi Ohkawa, Yuhei Sugata, Aoi Soya, Kanemitsu Ootsu and Takashi Yokota, Utsunomiya University, JP
Abstract
Distributed processing in robot-cloud cooperative system is discussed in terms of processing performance and communication performance. Cooperation of robots and cloud-servers is inevitable for realizing intelligent robots in the next generation society and industry. To improve processing performance of the cooperative system, we utilize ROS-compliant FPGA component as a robot-side embedded processing for low-power and high-performance image processing. We prepare two demonstrations. (1)Key-point Detection from camera image using Fully-hardwired ROS-Compliant FPGA component In the evaluation, the processing performance of the component is almost same as PC, while it operates at more than 10 times less power (5W), compared to PC (50W). (2)Distributed Visual SLAM using two-wheeled robot (TurtleBot3) Distributed Visual SLAM (Simultaneous Localization and Mapping) are presented as a concrete example of the robot-cloud cooperative system.

More information ...
UB06.5WIRELESS SENSOR SYSTEM WITH ELECTROMAGNETIC ENERGY HARVESTER FOR INDUSTRY 4.0 APPLICATIONS
Authors:
Bianca Leistritz, Elena Chervakova, Sven Engelhardt, Axl Schreiber and Wolfram Kattanek, Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE
Abstract
An energy-autonomous and adaptive wireless multi-sensor system for a wide range of Industry 4.0 applications is presented here. By taking a holistic view of the sensor system and of the specific interactions of its components, technological barriers of individual system elements can be overcome. The energy supply of the demonstrator is realized by an miniaturized electromagnetic energy harvester, which could be easily and quickly adapted to the application-specific boundary conditions with the help of a computer assisted design process. Variations in the available energy are monitored by advanced energy management functions. The modular hardware and software platform is demonstrated by an adaptive measurement and data transmission rate. Communication takes place by means of industry 4.0 compliant standard protocols. The demonstrator was developed in the research group Green-ISAS funded by the Free State of Thuringia from the European Social Fund (ESF) under grant no. 2016 FGR 0055.

More information ...
UB06.6TTOOL/OMC: OPTIMIZED COMPILATION OF EXECUTABLE UML/SYSML DIAGRAMS FOR THE DESIGN OF DATA-FLOW APPLICATIONS
Authors:
Andrea Enrici1, Julien Lallet1, Renaud Pacalet2 and Ludovic Apvrille2
1Nokia Bell Labs, FR; 2Télécom ParisTech, FR
Abstract
Future 5G networks are expected to increase data rates by a factor of 10x. To meet this requirement, baseband stations will be equipped with both programmable (e.g., CPUs, DSPs) and reconfigurable components (e.g., FPGAs). Efficiently programming these architectures is not trivial due to the inner complexity and interactions of these two types of components. This raises the need for unified design flows capable of rapidly partitioning and programming these mixed architectures. Our demonstration will show the complete system-level design and Design Space Exploration, based on UML/SysML diagrams, of a 5G data-link layer receiver, that is partitioned onto both programmable and reconfigurable hardware. We realize an implementation of such a UML/SysML design by compiling it into an executable C application whose memory footprint is optimized with respect to a given scheduling. We will validate the effectiveness of our solution by comparing automated vs manual designs.

More information ...
UB06.7PRIME: PLATFORM- AND APPLICATION-AGNOSTIC RUN-TIME POWER MANAGEMENT OF HETEROGENEOUS EMBEDDED SYSTEMS
Authors:
Domenico Balsamo, Graeme M. Bragg, Charles Leech and Geoff V. Merrett, University of Southampton, GB
Abstract
Increasing energy efficiency and reliability at runtime is a key challenge of heterogeneous many-core systems. We demonstrate how contributions from the PRiME project integrate to enable application- and platform-agnostic runtime management that respects application performance targets. We consider opportunities to enable runtime management across the system stack and we enable cross-layer interactions to trade-off power and reliability with performance and accuracy. We consider a system as three distinct layers, with abstracted communication between them, which enables the direct comparison of different approaches, without requiring specific application or platform knowledge. Application-agnostic runtime management is demonstrated with a selection of runtime managers from PRiME, including linear regression modelling and predictive thermal management, operating across multiple applications. Platform-independent runtime management is demonstrated using two heterogeneous platforms.

More information ...
UB06.8RISC-V PROCESSOR MODELING IN IP-XACT USING KACTUS2
Authors:
Esko Pekkarinen and Timo Hämäläinen, Tampere University of Technology, FI
Abstract
The complexity of modern embedded system design is managed by advanced, high-level design methodologies such as IP-XACT. However, integrating IP-XACT as a part of an existing design flow and packaging legacy sources is too often inhibited by the inherit differences between IP-XACT and the traditional hardware description languages. In this work, we take an existing Verilog implementation of a RISC-V microprocessor and package it with our open-source IP-XACT tool Kactus2. The resulting IP-XACT description will be publicly available and based on the modeling experience we report the observed pitfalls in the transition from HDL to IP-XACT.

More information ...
UB06.10EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
14:00End of session
16:00Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.0 LUNCH TIME KEYNOTE SESSION: From Inverse Design to Implementation of Robust and Efficient Photonics for Computing

Date: Wednesday 21 March 2018
Time: 13:50 - 14:20
Location / Room: Saal 2

Chair:
Ayse Coskun, Boston University, US

It is estimated that nearly 10% of the world electricity is consumed in information processing and computing, including data centers [D.A.B. Miller, Journal of Lightwave Technology, 2017]. It is clear that the exponential growth in use of these technologies is not sustainable unless dramatic changes are made to computing hardware, in order to increase its speed and energy efficiency. Optical interconnects are considered a solution to these obstacles, with potential to reduce energy consumption in on-chip optical interconnects to atto-Joule per bit (aJ/bit), while increasing operating speed beyond 20GHz. However, the state of the art photonics is bulky, inefficient, sensitive to environment, lossy, and its performance is severely degraded in real-world environment as opposed to ideal laboratory conditions, which has prevented from using it in many practical applications, including interconnects. Therefore, it is clear that new approaches for implementing photonics are crucial. We have recently developed a computational approach to inverse-design photonics based on desired performance, with fabrication constraints and structure robustness incorporated in design process. Our approach performs physics guided search through the full parameter space until the optimal solution is reached. Resulting device designs are non-intuitive, but are fabricable using standard techniques, resistant to temperature variations of hundreds of degrees, typical fabrication errors, and they outperform state of the art counterparts by many orders of magnitide in footprint, efficiency and stability. This is completely different from conventional approach to design photonics, which is almost always performed by brute-force or intuition-guided tuning of a few parameters of known structures, until satisfactory performance is achieved, and which almost always leads to sub-optimal designs. Apart from integrated photonics, our approach is also applicable to any other optical and quantum optical devices and systems.

TimeLabelPresentation Title
Authors
13:507.0.2KEYNOTE SPEAKER
Author:
Jelena Vuckovic, Stanford University, US
Abstract
It is estimated that nearly 10% of the world electricity is consumed in information processing and computing, including data centers [D.A.B. Miller, Journal of Lightwave Technology, 2017]. It is clear that the exponential growth in use of these technologies is not sustainable unless dramatic changes are made to computing hardware, in order to increase its speed and energy efficiency. Optical interconnects are considered a solution to these obstacles, with potential to reduce energy consumption in on-chip optical interconnects to atto-Joule per bit (aJ/bit), while increasing operating speed beyond 20GHz. However, the state of the art photonics is bulky, inefficient, sensitive to environment, lossy, and its performance is severely degraded in real-world environment as opposed to ideal laboratory conditions, which has prevented from using it in many practical applications, including interconnects. Therefore, it is clear that new approaches for implementing photonics are crucial. We have recently developed a computational approach to inverse-design photonics based on desired performance, with fabrication constraints and structure robustness incorporated in design process. Our approach performs physics guided search through the full parameter space until the optimal solution is reached. Resulting device designs are non-intuitive, but are fabricable using standard techniques, resistant to temperature variations of hundreds of degrees, typical fabrication errors, and they outperform state of the art counterparts by many orders of magnitide in footprint, efficiency and stability. This is completely different from conventional approach to design photonics, which is almost always performed by brute-force or intuition-guided tuning of a few parameters of known structures, until satisfactory performance is achieved, and which almost always leads to sub-optimal designs. Apart from integrated photonics, our approach is also applicable to any other optical and quantum optical devices and systems.
14:20End of session
16:00Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB07 Session 7

Date: Wednesday 21 March 2018
Time: 14:00 - 16:00
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB07.1GENERATING FULL-CUSTOM SCHEMATICS IN A MIXED-SIGNAL TOP-DOWN DESIGN FLOW
Authors:
Tobias Markus1, Markus Mueller2 and Ulrich Bruening1
1University of Heidelberg, DE; 2Extoll GmbH, DE
Abstract
Design time is one of the precious assets in the cycle of hardware design. The top down methodology has been used in digital designs very successfully and now we also apply it for analog and mixed signal designs. Generating most of the structures automatically saves time and avoids errors. A Top Down Design Flow for Mixed Signal Designs is used which generates the schematic structure from the system RNM representation. Since the structural verilog part of the system level design will automatically generate the schematic structure it is only the functional part which is missing and has to be implemented by the analog designer. Some often used blocks can be used as an entry point to partially generate parts of the design in the schematic and furthermore even parts of the layout. We will demonstrate this design method with an example project.

More information ...
UB07.2CLAVA-MARGOT: CLAVA + MARGOT = C/C++ TO C/C++ COMPILER AND RUNTIME AUTOTUNING FRAMEWORK
Authors:
João Bispo1, Davide Gadioli2, Pedro Pinto1, Emanuele Vitali2, Hamid Arabnejad1, Gianluca Palermo2, Cristina Silvano2, Jorge G. Barbosa1 and João M. P Cardoso1
1Porto University, PT; 2Politecnico di Milano (POLIMI), IT
Abstract
Current computing platforms consist of heterogeneous architectures. To efficiently target those platforms, compilers can be extend with code transformations and insertion of code to interface to runtime autotuning schemes, which tune application parameters according to: the actual execution, target architecture, and workload. We present an approach consisting of a C/C++ source-to-source compiler (Clava) and an autotuner (mARGOt ). They are part of the toolflow of the FET-HPC ANTAREX project and allow parallelization, multiversioning and code transformations in the context of runtime autotuning. mARGOt is an autotuner that allows application adaptation to changing conditions and goals. Clava is a source-to-source compiler to transform C/C++ programs, including code instrumentation and integration with components such as mARGOt. We will demonstrate how to use Clava to integrate the mARGOT autotuner in an example application, and several mARGOt functionalities exposed through a Clava API.

More information ...
UB07.3OISC MULTICORE STENCIL PROCESSOR: ONE INSTRUCTION-SET COMPUTER-BASED MULTICORE PROCESSOR FOR STENCIL COMPUTING
Authors:
Kaoru Saso, Jing Yuan Zhao and Yuko Hara-Azumi, School of Engineering, Tokyo Institute of Technology, JP
Abstract
Subtract and Branch on NEGative with 4 operands (SUBNEG4) is one of One Instruction-Set Computers that execute only one type of instruction. Thanks to its simplicity, SUBNEG4 has only 1/20x circuit area and 1/10x power consumption against MIPS processor. As SUBNEG4 is Turing-complete, it is suitable for parallel computing by multiple cores, while keeping its low-power feature. Our on-going project is seeking for effective use and deployment of SUBNEG4 cores on embedded systems. Our booth will demonstrate the significant speed-up by a SUBNEG4-based many-core processor against a conventional processor, for stencil computing. Our 64-core processor efficiently handles 2D von-Neumann neighborhood stencils, e.g., wave simulation by Verlet integration and 2D Jacobi iteration, to compute 64 points simultaneously. We show that small many-core processors can be realized even with such large number of cores while achieving good speed-up for heaving computation.

More information ...
UB07.4ROS X FPGA FOR ROBOT-CLOUD SYSTEM: ROBOT-CLOUD COOPERATIVE VISUAL SLAM PROCESSING USING ROS-COMPLIANT FPGA COMPONENT
Authors:
Takeshi Ohkawa, Yuhei Sugata, Aoi Soya, Kanemitsu Ootsu and Takashi Yokota, Utsunomiya University, JP
Abstract
Distributed processing in robot-cloud cooperative system is discussed in terms of processing performance and communication performance. Cooperation of robots and cloud-servers is inevitable for realizing intelligent robots in the next generation society and industry. To improve processing performance of the cooperative system, we utilize ROS-compliant FPGA component as a robot-side embedded processing for low-power and high-performance image processing. We prepare two demonstrations. (1)Key-point Detection from camera image using Fully-hardwired ROS-Compliant FPGA component In the evaluation, the processing performance of the component is almost same as PC, while it operates at more than 10 times less power (5W), compared to PC (50W). (2)Distributed Visual SLAM using two-wheeled robot (TurtleBot3) Distributed Visual SLAM (Simultaneous Localization and Mapping) are presented as a concrete example of the robot-cloud cooperative system.

More information ...
UB07.5WIRELESS SENSOR SYSTEM WITH ELECTROMAGNETIC ENERGY HARVESTER FOR INDUSTRY 4.0 APPLICATIONS
Authors:
Bianca Leistritz, Elena Chervakova, Sven Engelhardt, Axl Schreiber and Wolfram Kattanek, Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE
Abstract
An energy-autonomous and adaptive wireless multi-sensor system for a wide range of Industry 4.0 applications is presented here. By taking a holistic view of the sensor system and of the specific interactions of its components, technological barriers of individual system elements can be overcome. The energy supply of the demonstrator is realized by an miniaturized electromagnetic energy harvester, which could be easily and quickly adapted to the application-specific boundary conditions with the help of a computer assisted design process. Variations in the available energy are monitored by advanced energy management functions. The modular hardware and software platform is demonstrated by an adaptive measurement and data transmission rate. Communication takes place by means of industry 4.0 compliant standard protocols. The demonstrator was developed in the research group Green-ISAS funded by the Free State of Thuringia from the European Social Fund (ESF) under grant no. 2016 FGR 0055.

More information ...
UB07.6EMBEDDED ACCELERATION OF IMAGE CLASSIFICATION APPLICATIONS FOR STEREO VISION SYSTEMS
Authors:
Mohammad Loni1, Carl Ahlberg2, Masoud Daneshtalab2, Mikael Ekström2 and Mikael Sjödin2
1MDH, SE; 2Mälardalen University, SE
Abstract
Autonomous systems are used in a broad range of applications from indoor utensils to medical application. Stereo vision cameras probably are the most flexible sensing way in these systems since they can extract depth, luminance, color, and shape information. However, stereo vision based applications suffer from huge image sizes, computational complexity and high energy consumption. To tackle these challenges, we first developed GIMME2 [1], a high-throughput, and cost efficient FPGA-based stereo vision system. In the next step, we present a novel approximation accelerator which is also compatible with GIMME2. Our accelerator tries to map neural network (NN) based image classification algorithms to FPGA by using DeepMaker which is an evolutionary based module embed in our accelerator that regenerates a near-optimal NN in term of accuracy. Then, the back-end side of DeepMaker maps the generated NN to FPGA. We will demo a GIMME2-based accelerator for image classification applications.

More information ...
UB07.7T-CREST: THE OPEN-SOURCE REAL-TIME MULTICORE PROCESSOR
Authors:
Martin Schoeberl, Luca Pezzarossa and Jens Sparsø, Technical University of Denmark, DK
Abstract
Future real-time systems, such as advanced control systems or real-time image recognition, need more powerful processors, but still a system where the worst-case execution time (WCET) can be statically predicted. Multicore processors are one answer to the need for more processing power. However, it is still an open research question how to best organize and implement time-predictable communication between processing cores. T-CREST is an open-source multicore processor for research on time-predictable computer architecture. In consists of several Patmos processors connected by various time-predictable communication structures: access to shared off-chip, access to shared on-chip memory, and the Argo network-on-chip for fast inter-processor communication. T-CREST is supported by open-source development tools, such as compilation and WCET analysis. To best of our knowledge, T-CREST is the only fully open-source architecture for research on future real-time multicore architectures.

More information ...
UB07.8IIP GENERATORS TO EASE ANALOG IC DESIGN
Authors:
Benjamin Prautsch, Uwe Eichler and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS/EAS, DE
Abstract
Semiconductor technology has shown significant progress over the last decades. Digital EDA (electronic design automation) allowed that this progress could be converted to high-performance digital ICs. Analog components are part of Systems-on-Chip (SoC) too, but analog EDA lags far behind. Therefore, a lot of effort was spent to automate analog IC design. Mayor results are constraint-based layout-aware optimization tools using predefined layout templates or pure automation as well as analog generators containing expert knowledge. While optimization is a holistic top-down approach, generators allow parameterized and fast bottom-up generation of critical schematic and layout parts, pre-planned by experienced designers. With IIP Generators, we follow three use cases to ease analog design: 1) design on higher hierarchy levels, 2) development of hierarchical high-level IIPs, and 3) automated design porting due to highly technology-independent blocks down to 22nm.

More information ...
UB07.9CIJTAG: CONCURRENT IJTAG DEMONSTRATOR
Author:
Krenz-Baath René, Hamm-Lippstadt University of Applied Sciences, DE
Abstract
The flexibility of on-chip instrument access enabled by IEEE 1687 (IJTAG) has shown tremendous improvements in modern industrial designs. Due to a constantly increasing spektrum of tasks performed through 1687 networks such as performing test operations during production test, on-line test operations as well as operating health monitors the test requirements in modern designs increase dramatically with respect to test performance, responsiveness and low power. These requirements have a major impact on the design of such test infrastructures. In complex designs with large test infrastructures it might be challenging to comply with the large spectrum of requirements. Concurrent IJTAG is novel partitioning concept to a reconfigurable test infrastructure in order to enable an independent operation of different sections of the test infrastructure. The proposed demonstrator shows the first FPGA-based implementation of concurrent IJTAG test infrastructures.

More information ...
UB07.10EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.1 Special Day Session on Future and Emerging Technologies: Theoretical and practical aspects of verification of quantum computers

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Saal 2

Chair:
Naveh Yehuda, IBM Research, IL

Quantum computing is emerging at a meteoric pace from a pure academic field to a fully industrial framework. Rapid advances are happening both in the physical realizations of quantum chips, and in their potential software applications. In contrast, we are not seeing that rapid growth in the design and verification methodologies for scaled-up quantum machines. In this session we describe the field of verification of quantum computers. We discuss the underlying concepts of this field, its theoretical and practical challenges, and state-of-the-art approaches to addressing those challenges. The goal of this session is to help facilitate early efforts to adapt and create verification methodologies for quantum computers and systems. Without such early efforts, a debilitating gap may form between the state-of-the-art of low level physical technologies for quantum computers, and our ability to build medium, large, and very large scale integrated quantum circuits (M/L/VLSIQ).

TimeLabelPresentation Title
Authors
14:307.1.1VERIFICATION OF QUANTUM COMPUTING
Speaker:
Petros Wallden, School of Informatics, University of Edinburgh, GB
Author:
Elham Kashefi, School of Informatics, University of Edinburgh, UK & CNRS LIP6, GB
Abstract
Quantum computers promise to efficiently solve not only problems believed to be intractable for classical computers, but also problems for which verifying the solution is also considered intractable. This raises the question of how one can check whether quantum computers are indeed producing correct results. This task, known as quantum verification, has been highlighted as a significant challenge on the road to scalable quantum computing technology. We review the most significant approaches to quantum verification and compare them in terms of structure, complexity and required resources. We also comment on the use of cryptographic techniques which, for many of the presented protocols, has proven extremely useful in performing verification. Finally, we discuss issues related to fault tolerance, experimental implementations and the outlook for future protocols.
14:507.1.2GAINING INSIGHT INTO NEAR-TERM QUANTUM DEVICES WITH TAILOR-MADE APPLICATIONS
Author:
James R. Wootton, University of Basel, CH
Abstract
Many interesting algorithms have been designed for large scale fault-tolerant quantum computers. However, most will not be suitable for the smaller and noisier devices of the next decade. To understand how these devices function, we must therefore use applications specifically designed for their capabilities. In this talk we briefly introduce two possibilities. One is quantum error correction, which allows us to directly analyze imperfections in a device, as well as determine how well we can control them. The other is games, which can provide general insights into the capabilities of a device in a widely relatable manner.
15:157.1.3THE ENGINEERING CHALLENGES IN QUANTUM COMPUTING
Author:
Koen Bertels, Delft University of Technology, NL
Abstract
In this presentation we will present the ongoing work that focuses on defining and building a micro-architecture for a quantum computer. We will present the essence of quantum computing, the challenges as well as our current long term (>5 years) and short term (<5 years) in this respect and we will discuss the system vision as well as the Transmon and Spinqubit processor prototypes that we have developed with the colleagues L. DiCarmo and L. Vandersypen at QuTech.
15:407.1.4QUANTUM VERIFICATION: WHAT CAN WE ADOPT AND LEARN FROM CLASSICAL VERIFICATION
Author:
Yehuda Naveh, IBM Research - Haifa, IL
Abstract
I will provide a view of the challenges of verifying quantum computers through the lenses of classical verification methodologies. I will argue that while the fields are inherently different (e.g., quantum verification is a challenge already at the 50-qubit chip levels, while classical verification challenges stem mostly from complex micro-architectural structures present only at multi-million transistor chips), many methods of classical verification may still be adapted to the quantum regime. These include abstracted simulation and modeling languages, directed constraint-based random benchmarking, coverage measures, and more. My hope is that learning from the long history of classical verification will make the process of reaching robust, efficient, and stable verification methodologies for quantum computers much faster and less painful than has been for the classical case.
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.2 Run-time power estimation and optimization

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 6

Chair:
Pascal Vivet, CEA-Leti, FR

Co-Chair:
Donghwa Shin, Yeungnam Univ. Daegu, KR

In this session, the first paper presents energy efficiency optimization for CPU-GPU heterogeneous architectures using machine-learning. The next two papers present run-time power modeling and estimation methods for embedded systems. Finally, the last paper presents an online reconfiguration method for photovoltaic power system.

TimeLabelPresentation Title
Authors
14:307.2.1AIRAVAT: IMPROVING ENERGY EFFICIENCY OF HETEROGENEOUS APPLICATIONS
Speaker:
Trinayan Baruah, Northeastern University, US
Authors:
Trinayan Baruah1, Yifan Sun1, Shi Dong1, David Kaeli1 and Norm Rubin2
1Northeastern University, Boston, US; 2NVIDIA, US
Abstract
We are seeing an emerging class of applications that attempt to make use of both the CPU and GPU in a heterogeneous system. The peak performance for these applications is achieved when both the CPU and GPU are used collaboratively. However, along with this increased gain in performance, power and energy management is a larger challenge. In this paper we address the issue of executing applications that utilize both the CPU and GPU in an energy efficient way. Towards this end we propose a power management framework named Airavat that tunes the CPU, GPU and memory frequencies, synergestically, in order to improve the energy efficiency of collaborative CPU-GPU applications. Airavat uses machine learning-based prediction models, combined with feedback based Dynamic Voltage and Frequency Scaling to improve the energy efficiency of such applications. We demonstrate our framework on the Jetson TX1 and observe an improvement in terms of Energy Delay Product(EDP) by 24% with only a minimal performance loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.2.2ALL-DIGITAL EMBEDDED METERS FOR ON-LINE POWER ESTIMATION
Speaker:
Daniele Jahier Pagliari, Politecnico di Torino, IT
Authors:
Daniele Jahier Pagliari, Valentino Peluso, Yukai Chen, Andrea Calimera, Enrico Macii and Massimo Poncino, Politecnico di Torino, IT
Abstract
Modern low power designs use multiple knobs for concurrent dynamic and leakage power optimization; supply voltage and threshold voltage are the most adopted. An efficient control of these knobs needs management policies aware of the power breakdown. This implies the availability of smart on-chip strategies for dynamic and leakage power estimation at runtime. In this paper, we address this issue proposing the implementation of embedded dynamic/static power meters that use an optimized regression model fed with data collected from in-situ activity monitors. The number of sensors, their bitwidth and optimal placement are obtained through an automated design flow. The methodology works for general logic and applies not just to processor cores, but also to application-specific designs. We apply our solution to a representative class of benchmarks, showing that it can achieve an average estimation error smaller than 3%, with limited area and power overheads.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.2.3POWERPROBE: RUN-TIME POWER MODELING THROUGH AUTOMATIC RTL INSTRUMENTATION
Speaker:
Davide Zoni, Politecnico di Milano, IT
Authors:
Davide Zoni, Luca Cremona and William Fornaciari, Politecnico di Milano, IT
Abstract
Online power monitoring represents a de-facto solution to enable energy- and power-aware run-time optimizations for current and future computing architectures. Traditionally, the performance counters of the target architecture are used to feed a software-based, power model that is continuously updated to deliver the required run-time power estimates. The solution introduces a non-negligible performance and energy overhead. Moreover, it is limited to the availability of such performance counters that, however, are not primarily intended for online power monitoring. This paper introduces PowerProbe, a run-time power monitoring methodology that automatically extracts and implements a power model from the RTL description of the target architecture. The solution does not leverage any performance counter to ensure wide applicability. Moreover, the use of ad-hoc hardware that continuously updates the power estimate minimizes both the performance and the power overheads. We employ a fully compliant OpenRisc 1000 implementation to validate PowerProbe. The results highlight an average prediction error within 9% (standard deviation less than 2%), with a power and area overheads limited to 6.89% and 4.71%, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:457.2.4DESIGN OPTIMIZATION OF PHOTOVOLTAIC ARRAY ON A CURVED SURFACE
Speaker:
Sangyoung Park, Technical University of Munich, DE
Authors:
Sangyoung Park and Samarjit Chakraborty, Technical University of Munich, DE
Abstract
Flexible photovoltaic (PV) arrays often have to be mounted on surfaces that have a significant amount of curvature. These include solar-powered vehicles, planes, and also some wearable devices. However, this inevitably leads to non-uniform solar irradiance among connected PV cells. If one cell among series-connected PV cells receives significantly lower solar irradiance, the overall power generation of the string is reduced. While previous works dealt with this by employing sophisticated run-time techniques, we show that design-time approaches that determine the electrical series-parallel connection of a PV array could also significantly enhance the power output. In this paper, we propose a k-means clustering-based algorithm to group PV cells/modules with similar solar irradiance to form a PV string, even allowing irregular arrays, to maximize the power generation of the array for a given irradiance profile. Our experimental results show that the power generation of a PV array could be increased by 84% compared to usual PV array organizations that do not take the curvature of the mounted surface into account.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-6, 554PREDICTION-BASED FAST THERMOELECTRIC GENERATOR RECONFIGURATION FOR ENERGY HARVESTING FROM VEHICLE RADIATORS
Speaker:
Xue Lin, Northeastern University, US
Authors:
Hanchen Yang1, Feiyang Kang2, Caiwen Ding3, Ji Li4, Jaemin Kim5, Donkyu Baek6, Shahin Nazarian4, Xue Lin7, Paul Bogdan4 and Naehyuck Chang8
1Beijing University of Posts and Telecommunications, CN; 2Zhejiang University, CN; 3Syracuse University, US; 4University of Southern California, US; 5Seoul National University, KR; 6Korea Advanced Institute of Science and Technology, KR; 7Northeastern University, US; 8KAIST, KR
Abstract
Thermoelectric generation has increasingly drawn attention for being environmentally friendly. However, only a few of the prior researches on thermoelectric generators (TEG) have focused on improving efficiency at system level. They attempt to capture the electrical property changes on TEG modules as the temperature fluctuates on vehicle radiators. The most recent reconfiguration algorithm shows large improvements on output performance but suffers from major drawback on computational time and energy overhead, and non-scalability in terms of array size and processing frequency. In this paper, we propose a novel TEG array reconfiguration algorithm that determines near-optimal configuration with an acceptable computational time. More precisely, with O(N) time complexity, our prediction-based fast TEG reconfiguration algorithm enables all modules to work at or near their maximum power points (MPP). Additionally, we incorporate prediction methods to further reduce the runtime and switching overhead during the reconfiguration process. Experimental results present 30% performance improvement, almost 100x reduction on switching overhead and 13x enhancement on computational speed compared to the baseline and prior work. The scalability of our algorithm makes it applicable to larger scale systems such as industrial boilers and heat exchangers.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01IP3-7, 141A PARAMETERIZED TIMING-AWARE FLIP-FLOP MERGING ALGORITHM FOR CLOCK POWER REDUCTION
Speaker:
Chaochao Feng, National University of Defense Technology, CN
Authors:
Chaochao Feng1, Daheng Yue1, Zhenyu Zhao1 and Zhuofan Liao2
1National University of Defense Technology, CN; 2Changsha University of Science and Technology, CN
Abstract
In modern integrated circuits, the clock power contributes a dominant part of the chip power. Clock power can be reduced effectively by utilizing multi-bit flip-flops. In this paper, a parameterized timing-aware flip-flop merging algorithm is proposed for clock power reduction. The single-bit flip-flops are merged into multi-bit flip-flops after placement & optimization and before clock network synthesis with consideration of function, scan chain information, distance and timing constraints. The algorithm can be configured with different parameters, such as the bit-number of MBFF, the setup timing margin and the distance margin. Experimental results under an industrial design show that compared with the basic design without MBFF, the design with 2-bit, 4-bit, 6-bit, and 8-bit MBFFs can save 7.5%, 12%, 11.8% and 11.1% total power consumption respectively. Using MBFF4 to replace 1-bit FFs is the best choice for the design optimization, which achieves minimum area and total power consumption. We also compare the designs with MBFF4 replacement under five different setup timing margins and distance margins. Without violating any timing constraint, it is better to set the setup timing margin as small as possible to achieve best power optimization. The distance margin (100μm, 30μm) is the best choice for this industry design to achieve minimum power consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:02IP3-8, 621FAST CHIP-PACKAGE-PCB COANALYSIS METHODOLOGY FOR POWER INTEGRITY OF MULTI-DOMAIN HIGH-SPEED MEMORY: A CASE STUDY
Speaker:
Seungwon Kim, Ulsan National Institute of Science and Technology, KR
Authors:
Seungwon Kim1, Ki Jin Han2, Youngmin Kim3 and Seokhyeong Kang1
1Ulsan National Institute of Science and Technology (UNIST), KR; 2Dongguk University, KR; 3Kwangwoon University, KR
Abstract
The power integrity of high-speed interfaces is an increasingly important issue in mobile memory systems. However, because of complicated design variations such as adjacent VDD domain coupling, conventional case-specific modeling is limited in analyzing trends in results from parametric variations. Moreover, conventional industrial methods can be simulated only after the design layout is completed and it requires a lot of back-annotation processes, which result in delayed delays time to market. In this paper, we propose a chip-package-PCB coanalysis methodology applied to our multi-domain high-speed memory system model with a current generation method. Our proposed parametric simulation model can analyze the tendency of power integrity results from variable sweeps and Monte Carlo simulations, and it shows a significantly reduced runtime compared to the conventional EDA methodology under JEDEC LPPDR4 environment.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.3 Advances in Logic Synthesis and Technology Mapping

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 1

Chair:
Luciano Lavagno, Politecnico di Torino, IT

Co-Chair:
Mathias Soeken, EPFL, CH

This session presents recent progress in logic synthesis and technology mapping. The first paper discusses improvements to Boolean resynthesis using a theory on Boolean filtering and a more general notion of permissible functions. The second paper applies methods based on Boolean relations for the optimization of combinational logic networks. The third paper proposes a technology mapping approach for silicon nanowire reconfigurable FETs. The fourth paper presents for the first time an approximate synthesis methods for threshold logic circuits through an iterative approach that guarantees an error bound.

TimeLabelPresentation Title
Authors
14:307.3.1IMPROVEMENTS TO BOOLEAN RESYNTHESIS
Speaker:
Mathias Soeken, EPFL, CH
Authors:
Luca Amaru1, Mathias Soeken2, Patrick Vuillod3, Jiong Luo1, Alan Mishchenko4, Janet Olson1, Robert Brayton4 and Giovanni De Micheli2
1Synopsys Inc., US; 2Integrated System Laboratory – EPFL, CH; 3Synopsys Inc., FR; 4UC Berkeley, US
Abstract
Boolean resynthesis techniques are increasingly used in electronic design automation, to improve quality of results where algebraic methods hit local minima. Boolean methods rely on complete functional properties of a logic circuit, eventually including don't cares. In order to gather such properties, computationally expensive engines are required, e.g., truth tables, SAT and BDDs, which in turn determine the scalability of Boolean resynthesis. In this paper, we present theoretical and practical improvements to Boolean resynthesis, enabling more optimization opportunities to be found at the same, or smaller, runtime cost than state-of-the-art methods. Our contributions include: (i) a theory of Boolean filtering, to drastically reduce the number of gates processed and still retain all possible optimization opportunities, (ii) a weaker notion of maximum set of permissible functions, which can be computed efficiently via truth tables, (iii) a parallel package for truth table computation tailored to speedup Boolean methods, (iv) a generalized refactoring engine which supports multiple representation forms and (v) a practical Boolean resynthesis flow, which combines the techniques proposed so far. Using our Boolean resynthesis on the EPFL benchmarks, we improve 9 of the best known area results in the synthesis competition. Embedded in a commercial EDA flow for ASICs, our Boolean resynthesis flow reduces the area by 2.67%, and total negative slack by 5.48%, after physical implementation, at negligible runtime cost.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.3.2LOGIC OPTIMIZATION WITH CONSIDERING BOOLEAN RELATIONS
Speaker:
Chia-Cheng Wu, National Tsing Hua University, TW
Authors:
Tung-Yuan Lee1, Chia-Cheng Wu1, Chia-Chun Lin1, Yung-Chih Chen2 and Chun-Yao Wang3
1National Tsing Hua University, TW; 2Yuan Ze University, TW; 3Dept. CS, National Tsing Hua University, TW
Abstract
Boolean Relation (BR) is a many-to-many mapping between two domains. Logic optimization considering BR can exploit the potential flexibility existed in logic networks to minimize the circuits. In this paper, we present a logic optimization approach considering BR. The approach identifies a proper sub-circuit and locally changes its functionality by solving the corresponding BR in the sub-circuit without altering the overall functionality of the circuit. We conducted experiments on a set of MCNC benchmarks that cannot be further optimized by resyn2 script in ABC. The experimental results show that the node counts of these benchmarks can be further reduced. Additionally, when we apply our approach followed by the resyn2 script repeatedly, we can obtain 6.11% improvements in average.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.3.3TECHNOLOGY MAPPING FLOW FOR EMERGING RECONFIGURABLE SILICON NANOWIRE TRANSISTORS
Speaker:
Shubham Rai, Chair For Processor Design, CFAED, Technische Universität Dresden, Dresden, DE
Authors:
Shubham Rai, Michael Raitza and Akash Kumar, Technische Universität Dresden, DE
Abstract
Efficient circuit designs can make use of ambipolar nature of silicon nanowire (SiNW) over CMOS. Conventional circuit Design-Flow fails to use this inherent functional flexibility as CMOS based mapping considers a single logical output from logic gates. To address this, we propose an area-optimized technology mapping which uses this innate reconfigurability, offered by SiNW transistors for efficient circuit designs. To enable this objective, we use higher order functions (HOF) to encapsulate this extended functionality. Additionally, the electrical properties of SiNW allow us to take advantage of the available inverted forms of fan-ins for additional savings of area for XOR logic family. Experimental results using our technology mapping shows that area of SiNW based logic design is less by an average of 18:38% as compared to CMOS flow for complete mcnc benchmarks suite. Further, we evaluate our flow for both reconfigurability-aware and static layout for SiNW based logic gates. The whole flow including the new SiNW based genlib and the modified ABC tool is made available under open source license to enable further research for any kind of emerging ambipolar transistors.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:457.3.4EFFICIENT SYNTHESIS OF APPROXIMATE THRESHOLD LOGIC CIRCUITS WITH AN ERROR RATE GUARANTEE
Speaker:
Chia-Chun Lin, National Tsing Hua University, TW
Authors:
Yung-An Lai1, Chia-Chun Lin1, Chia-Cheng Wu1, Yung-Chih Chen2 and Chun-Yao Wang3
1National Tsing Hua University, TW; 2Yuan Ze University, TW; 3Dept. CS, National Tsing Hua University, TW
Abstract
Recently, Threshold logic attracts a lot of attention due to the advances of its physical implementation and the strong binding to neural networks. Approximate computing is a new design paradigm that focuses on error-tolerant applications, e.g., machine learning or pattern recognition. In this paper, we integrate threshold logic with approximate computing and propose a synthesis algorithm to obtain cost-efficient approximate threshold logic circuits with an error rate guarantee. We conduct experiments on IWLS 2005 benchmarks. The experimental results show that the proposed algorithm can efficiently explore the approximability of each benchmark. For a 5% error rate constraint, the circuit cost can be reduced by up to 65%, and 22.8% on average. Compared with a naive method, our approach has a speedup of 2.42 under a 5% error rate constraint.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-9, 367APPROXIMATE HARDWARE GENERATION USING SYMBOLIC COMPUTER ALGEBRA EMPLOYING GRöBNER BASIS
Speaker:
Saman Fröhlich, DFKI GmbH, DE
Authors:
Saman Froehlich1, Daniel Grosse2 and Rolf Drechsler2
1Cyber-Physical Systems, DFKI GmbH, DE; 2University of Bremen/DFKI GmbH, DE
Abstract
Many applications are inherently error tolerant. Approximate Computing is an emerging design paradigm, which gives the opportunity to make use of this error tolerance, by trading off accuracy for performance. The behavior of a circuit can be defined at an arithmetic level, by describing the input and output relation as a polynomial. Symbolic Computer Algebra (SCA) has been employed to verify that a given circuit netlist matches the behavior specified at the arithmetic level. In this paper, we present a method that relaxes the exactness requirement of the implementation. We propose a heuristic method to generate an approximation for a given netlist and use SCA to ensure that the result is within application-specific bounds for given error-metrics. In addition, our approach allows for automatic generation of approximate hardware wrt. applicationspecific input probabilities. To the best of our knowledge taking input probabilities, which are known for many practical applications, into account has not been considered before. We employ the proposed approach to generate approximate adders and show that the results outperform state-of-the-art, handcrafted approximate hardware.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01IP3-10, 600RECONFIGURABLE IMPLEMENTATION OF $GF(2^M)$ BIT-PARALLEL MULTIPLIERS
Speaker and Author:
José L. Imaña, Complutense University of Madrid, ES
Abstract
Hardware implementations of arithmetic operations over binary finite fields $GF(2^m)$ are widely used in several important applications, such as cryptography, digital signal processing and error-control codes. In this paper, efficient reconfigurable implementations of bit-parallel canonical basis multipliers over binary fields generated by type II irreducible pentanomials $f(y) = y^m + y^{n+2} + y^{n+1} + y^n +1$ are presented. These pentanomials are important because all five binary fields recommended by NIST for ECDSA can be constructed using such polynomials. In this work, a new approach for $GF(2^m)$ multiplication based on type II pentanomials is given and several post-place and route implementation results in Xilinx Artix-7 FPGA are reported. Experimental results show that the proposed multiplier implementations improve the area$imes$time parameter when compared with similar multipliers found in the literature.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.4 DRAM and NVMs

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 2

Chair:
Francisco Cazorla, BSC, ES

Co-Chair:
Olivier Sentieys, IRISA, FR

Memory is one of the major bottlenecks for performance and power. The first paper addresses this inefficiency by proposing a unified LLC+DRAM memory controller to harvest row buffer hits and to increase memory bandwidth. The next paper adopts approximate computing to increase the performance of STT and to reduce the number of writes to PCM. Finally, the third paper proposes an adaptive write current scaling approach that adjusts the write current at runtime while considering the write rates of the running application.

TimeLabelPresentation Title
Authors
14:307.4.1ROW-BUFFER HIT HARVESTING IN ORCHESTRATED LAST-LEVEL CACHE AND DRAM SCHEDULING FOR HETEROGENEOUS MULTICORE SYSTEMS
Speaker:
Xun Jiao, University of California, San Diego, US
Authors:
Yang Song1, Olivier Alavoine2 and Bill Lin1
1University of California, San Diego, US; 2Qualcomm Inc., US
Abstract
In heterogeneous multicore systems, the memory subsystem, including the last-level cache and DRAM, is widely shared among the CPU, the GPU, and the real-time cores. Due to their distinct memory traffic patterns, heterogeneous cores result in more frequent cache misses at the last-level cache. As cache misses travel through the memory subsystem, two schedulers are involved for the last-level cache and DRAM respectively. Prior studies treated the scheduling of the last-level cache and DRAM as independent stages. However, with no orchestration and limited visibility of memory traffic, neither scheduling stage is able to ensure optimal scheduling decisions for memory efficiency. Unnecessary precharges and row activations happen in DRAM when the memory scheduler is ignorant of incoming cache misses and DRAM row-buffer states are invisible to the last-level cache. In this paper, we propose a unified memory controller for the the last-level cache and DRAM with orchestrated schedulers. The memory scheduler harvests row-buffer hit opportunities in cache request buffers during spare time without inducing significant implementation cost. Extensive evaluations show that the proposed controller improves the total memory bandwidth of DRAM by 16.8% on average and saves DRAM energy by up to 29.7% while achieving comparable CPU IPC. In addition, we explore the impact of last-level cache bypassing techniques on the proposed memory controller.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.4.2ADAM: ADAPTIVE APPROXIMATION MANAGEMENT FOR THE NON-VOLATILE MEMORY HIERARCHIES
Speaker:
Muhammad Abdullah Hanif, TU Wien, AT
Authors:
Mohammad Taghi Teimoori1, Muhammad Abdullah Hanif2, Alireza Ejlali1 and Muhammad Shafique2
1Sharif University of Technology, IR; 2TU Wien, AT
Abstract
Existing memory approximation techniques focus on employing approximations at an individual level of the memory hierarchy (e.g., cache, scratchpad, or main memory). However, to exploit the full potential of approximations, there is a need to manage different approximation knobs across the complete memory hierarchy. Towards this, we model a system including STT-RAM scratchpad and PCM main memory with different approximation knobs (e.g., read/write pulse magnitude/duration) in order to synergistically trade data accuracy for both STT-RAM access delay and PCM lifetime by means of an integer linear programming (ILP) problem at design-time. Furthermore, a run-time algorithm is proposed to adaptively tune the approximation knobs of both STT-RAM and PCM to obtain high energy savings while keeping the quality within acceptable ranges across the complete memory hierarchy. We evaluated our proposed technique in a baseline system consisting 1mB STT-RAM scratchpad and 1GB PCM main memory. The experimental results demonstrate that our proposed technique improves the execution time and the lifetime by up to 23% and 2.3X, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.4.3A CROSS-LAYER ADAPTIVE APPROACH FOR PERFORMANCE AND POWER OPTIMIZATION IN STT-MRAM
Speaker:
Nour Sayed, KIT - Karlsruhe Institute of Technology, DE
Authors:
Nour Sayed, Rajendra Bishnoi, Fabian Oboril and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising candidate as a universal on-chip memory technology due to non-volatility, high density and scalability. However, high write energy and latency are major challenges in this memory technology due to the asymmetry and stochastic nature of the write operation. Typically, the write current is set for the minimum energy point, which can further impact the write latency. To mitigate these issues, we propose an adaptive write current scaling technique that adjusts the write current, and hence the write latency and energy based on the performance needs at run-time. Using this technique, optimal energy and performance points for write current are obtained using detailed device and system level analysis. Furthermore, we use run- time adaptation of write current by predicting the write access rate for the next execution phase. We evaluate the efficiency of the proposed approach on SPEC2000 applications for STT-MRAM-based L1 and L2-cache levels. The results show that the effective write latency of L1 and L2 is reduced by 52.4% and 55.7% with 7.6% and 1.4% area overheads, respectively, corresponding to the overall system performance optimization of 15.5% while the total memory energy consumption is increasing by only 3.2%.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-11, 331PROCESSING IN 3D MEMORIES TO SPEEDUP OPERATIONS ON COMPLEX DATA STRUCTURES
Speaker:
Luigi Carro, UFRGS, BR
Authors:
Paulo Cesar Santos1, Geraldo Francisco de Oliveira Junior1, Joao Paulo Lima1, Marco Antonio Zanata Alves2, Luigi Carro1 and Antonio Carlos Schneider Beck1
1UFRGS, BR; 2UFPR, BR
Abstract
Pointer chasing has been, for years, the kernel operation employed by diverse data structures, from graphs to hash tables and dictionaries. However, due to the bewildering growth in the volume of data that current applications have to deal with, performing pointer chasing operations have become a major source of performance and energy bottleneck, due to its sparse memory access behavior. In this work, we aim to tackle this problem by taking advantage of the already available parallelism present in today's 3D-stacked memories. We present a simple mechanism that can accelerate pointer chasing operations by making use of a state-of-the-art PIM design that executes in-memory vector operations. The key idea behind our design is to run speculative loads, in parallel, based on a given memory address in a reconfigurable window of addresses. Our design can perform pointer-chasing operations on b+tree 4.9x faster when compared to modern baseline systems. Besides that, since our device avoids data movement and alleviates the memory hierarchy's inefficiency due to poor spatial data locality, we can also reduce energy consumption by 85% when compared to the baseline.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.5 Reliability Modeling and Mitigation

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 3

Chair:
Said Hamdioui, TU Delft, NL

Co-Chair:
Bram Kruseman, NXP, NL

This session covers various reliability modeling, characterization and mitigation approaches at different abstraction levels. The first paper uses deep learning for variability characterization. The second paper provides aging mitigation schemes for voltage regulators. The third paper addresses program vulnerability in GPU applications.

TimeLabelPresentation Title
Authors
14:307.5.1(Best Paper Award Candidate)
LOW-COST HIGH-ACCURACY VARIATION CHARACTERIZATION FOR NANOSCALE IC TECHNOLOGIES VIA NOVEL LEARNING-BASED TECHNIQUES
Speaker:
Zhijian Pan, Tsinghua University, CN
Authors:
Zhijian Pan1, Miao Li2, Jian Yao2, Hong Lu2, Zuochang Ye1, Yanfeng Li2 and Yan Wang1
1Tsinghua University, CN; 2Platform Design Automation, Inc., CN
Abstract
Faster and more accurate variation characteri-zations of semiconductor devices/circuits are in great demand as process technologies scale down to Fin-FET era. Traditional methods with intensive data testing are extremely costly. In this paper, we propose a novel learning-based high-accuracy data pre-diction framework inspired by learning methods from computer vision to efficiently characterize variabilities of device/circuit behaviors induced by manufacturing process variations. The key idea is to adaptively learn the underlying data pattern among data with variations from a small set of already obtained data and utilize it to accurately predict the unmeasured data with minimum physical measurement cost. To realize this idea, novel regression modeling techniques based on Gaussian process regression and partial least squares regression with feature extraction and matching are developed. We applied our approach to real-time variation characterization for transistors with multiple geometries from a foundry 28nm CMOS process. The results show that the framework achieves about 14x time speed-up with on average 0.1% error for variation data prediction and under 0.3% error for statistical extraction compared to traditional physical measure-ments, which demonstrates the efficacy of the framework for accurate and fast variation analysis and statistical modeling.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.5.2MITIGATION OF NBTI INDUCED PERFORMANCE DEGRADATION IN ON-CHIP DIGITAL LDOS
Speaker:
Longfei Wang, University of South Florida, US
Authors:
Longfei Wang1, S. Karen Khatamifard2, Ulya Karpuzcu2 and Selcuk Kose1
1University of South Florida, US; 2University of Minnesota, US
Abstract
On-chip digital low-dropout voltage regulators (LDOs) have recently gained impetus and drawn significant attention for integration within both mobile devices and microprocessors. Although the benefits of easy integration and fast response speed surpass analog LDOs and other voltage regulator types, NBTI induced performance degradation is typically overlooked. The conventional bi-directional shift register based controller can even exacerbate the degradation, which has been demonstrated theoretically and through practical applications. In this paper, a novel uni-directional shift register is proposed to evenly distribute the electrical stress and mitigate the NBTI effects under arbitrary load conditions with nearly no extra power and area overhead. The benefits of the proposed design as well as reliability aware design considerations are explored and highlighted through simulation of an IBM POWER8 like processor under several benchmark applications. It is demonstrated that the proposed NBTI-aware design can achieve up to 43.2% performance improvement as compared to a conventional one.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.5.3EVALUATING THE IMPACT OF EXECUTION PARAMETERS ON PROGRAM VULNERABILITY IN GPU APPLICATIONS
Speaker:
Fritz Previlon, Northeastern University, US
Authors:
Fritz Previlon1, Charu Kalra1, Paolo Rech2 and David Kaeli1
1Northeastern University, US; 2Universidade Federal do Rio Grande do Sul, BR
Abstract
While transient faults continue to be a major concern for the High Performance Computing (HPC) community, we still lack a clear understanding of how these faults propagate in applications. This paper addresses two particular aspects of the vulnerabilities of HPC applications as run on Graphics Processing Units (GPUs): their dependence on input data and on thread-block size. To characterize fault propagation as a function of input parameters, we leverage an ISA-level fault injection framework and carry out an extensive fault injection campaign to characterize the vulnerability of a suite of GPU applications. Our results show that the vulnerability of most of the programs studied are insensitive to changes in input values, except in less common cases when input values were highly biased, i.e., values that exhibit a special vulnerability behavior. For example, the multiplication property of any value with a zero value (zero times any number is equal to zero) makes it a biased input for multiplication operations. Our study also examines the effects of changing the GPU thread-block size and its impact on vulnerability. We found that, similar to performance, the vulnerability of an application can depend on the block size of the kernels in the application. In some applications, we found that the silent data corruption rate can vary by as much as 8% when changing the block size of a kernel.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00IP3-12, 387AN EFFICIENT NBTI-AWARE WAKE-UP STRATEGY FOR POWER-GATED DESIGNS
Speaker:
Yu-Guang Chen, Yuan Ze University, TW
Authors:
Kun-Wei Chiu1, Yu-Guang Chen2 and Ing-Chao Lin1
1National Cheng Kung University, TW; 2Yuan Ze University, TW
Abstract
The wake-up process of a power-gated design may induce an excessive surge current and threaten the signal integrity. A proper wake-up sequence should be carefully designed to avoid surge current violations. On the other hand, PMOS sleep transistors may suffer from the negative-bias temperature instability (NBTI) effect which results in decreased driving current. Conventional wake-up sequence decision approaches do not consider the NBTI effect, which may result in a longer or unacceptable wake-up time after circuit aging. Therefore, in this paper, we propose a novel NBTI-aware wake-up strategy to reduce the average wake-up time within a circuit lifetime. Our strategy first finds a set of proper wake-up sequences for different aging scenarios (i.e. after a certain period of aging), and then dynamically reconfigures the wake-up sequences at runtime. The experimental results show that compared to a traditional fixed wake-up sequence approach, our strategy can reduce average wake-up time by as much as 45.04% with only 3.7% extra area overhead for the reconfiguration structure.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:01IP3-13, 835DESIGNING RELIABLE PROCESSOR CORES IN ULTIMATE CMOS AND BEYOND: A DOUBLE SAMPLING SOLUTION
Speaker:
Nacer-Eddine Zergainoh, TIMA, FR
Authors:
Thierry Bonnoit, Fraidy Bouesse, Nacer-Eddine Zergainoh and Michael Nicolaidis, TIMA, FR
Abstract
The double sampling paradigm is an efficient method to protect the circuits against soft-errors. But the data that are going out of the area protected by double sampling are still vulnerable. To eliminate this weakness without having additional constraints on the datapaths, the most common solution adds a contaminable buffer stage between the two areas. Therefore, this stage avoids the propagation of the potentially corrupted data further in the circuit when an error is detected in the double sampling area. But the issue is that this stage must itself be protected against soft-errors, which drastically increases the cost of the solution. In this paper we characterize the additional implementation constraints due to this vulnerability. We proposed an architectural solution that uses three latches to remove those constraints and protect the area outside the double sampling domain without adding a buffer stage. We present an implementation of this solution on the LEON3 processor, and we compare the results in terms of additional cost and efficiency with other solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.6 Special Session: Next Generation Processors and Architectures for Deep Learning

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 4

Chair:
Theocharides Theocharis, University of Cyprus, CY

Co-Chair:
Shafique Muhammad, TU Wien, AT

Machine Learning is nowadays embedded in several computing devices, consumer electronics and cyber-physical systems. Smart sensors are deployed everywhere, in applications such as wearables and perceptual computing devices, and intelligent algorithms power the so called "Internet of Things". Similarly, smart cyber-physical systems emerge as a vital computation paradigm in a vast application spectrum ranging from consumer electronics to large-scale complex critical infrastructures. The need for smartification of such systems, and intelligent data analytics (especially in the era of Big Data), emphasizes the need of revolutionizing the way we build processors and systems geared towards machine learning (and deep learning in particular). Issues related from memory, to interconnect, and spanning across the hardware and software spectrum, need to be addressed typically by advances in technology, design methodologies, and new programming paradigms among others. The emergence of powerful embedded devices and ultra-low-power hardware has enabled us to transfer the paradigm of deep learning architectures and systems, from high-end costly clusters/supercomputers, to affordable systems and even mobile devices. Such systems and devices have not received the required attention so far in research and development, until the last few years, when such systems where initially proposed to accelerate deep learning, providing previously unattainable levels of performance of such algorithms, whilst maintaining the power and reliability constraints imposed by the nature of these embedded applications. This special session aims to present a holistic overview of emerging works in such architecture and systems, using all available technology spectrums, and bring together views from academia and industry in order to exchange information and explain how we can take advantage of existing and emerging hardware technologies in addressing the associated challenges.

TimeLabelPresentation Title
Authors
14:307.6.1RERAM-BASED ACCELERATOR FOR DEEP LEARNING
Speaker:
Hai Li, Duke university, US
Authors:
Li Bing1, Linghao Song2, Fan Chen2, Xuehai Qian3, Yiran Chen2 and Hai (Helen) Li4
1Duke university, US; 2Duke University, US; 3University of South California, US; 4Duke University/TUM-IAS, US
Abstract
Big data computing applications such as deep learning and graph analytic usually incur a large amount of data movements. Deploying such applications on conventional von Neumann architecture that separates the processing units and memory components likely leads to performance bottleneck due to the limited memory bandwidth. A common approach is to develop architecture and memory co-design methodologies to overcome the challenge. Our research follows the same strategy by leveraging resistive memory (ReRAM) to further enhance the performance and energy efficiency. Specifically, we employ the general principles behind processing-in-memory to design efficient ReRAM based accelerators that support both testing and training operations. Related circuit and architecture optimization will be discussed too.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:457.6.2EXPLOITING APPROXIMATE COMPUTING FOR DEEP LEARNING ACCELERATION
Speaker:
Jungook Choi, IBM Research, US
Authors:
Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Viji Srinivasan and Swagath Venkataramani, IBM T. J. Watson Research Center, US
Abstract
Deep Neural Networks (DNNs) have emerged as a powerful and versatile set of techniques to address challenging artificial intelligence (AI) problems. Applications in domains such as image/video processing, natural language processing, speech synthesis and recognition, genomics and many others have embraced deep learning as the foundational technique. DNNs achieve superior accuracy for these applications using very large models which require 100s of MBs of data storage, ExaOps of computation and high bandwidth for data movement. Despite advances in computing systems, training state-of-the-art DNNs on large datasets takes several days/weeks, directly limiting the pace of innovation and adoption. In this paper, we discuss how these challenges can be addressed via approximate computing. Based on our earlier studies demonstrating that DNNs are resilient to numerical errors from approximate computing, we present techniques to reduce communication overhead of distributed deep learning training via adaptive residual gradient compression ({em AdaComp}), and computation cost for deep learning inference via Prameterized clipping ACTivation ({em PACT}) based network quantization. Experimental evaluation demonstrates order of magnitude savings in communication overhead for training and computational cost for inference while not compromising application accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:107.6.3AN OVERVIEW OF NEXT-GENERATION ARCHITECTURES FOR MACHINE LEARNING: ROADMAP, OPPORTUNITIES AND CHALLENGES IN THE IOT ERA
Speaker:
Muhammad Shafique, Vienna University of Technology (TU Wien), AT
Authors:
Muhammad Shafique1, Theocharis Theocharides2, Christos Bouganis3, Muhammad Abdullah Hanif1, Faiq Khalid Lodhi4, Rehan Hafiz5 and Semeen Rehman1
1TU Wien, AT; 2University of Cyprus, CY; 3Imperial College London, GB; 4Department of Computer Engineering, Vienna University of Technology, AT; 5ITU, PK
Abstract
The number of connected Internet of Things (IoT) devices are expected to reach over 20 billion by 2020. These range from basic sensor nodes that log and report the data to the ones that are capable of processing the incoming information and taking an action accordingly. Machine learning, and in particular deep learning, is the de facto processing paradigm for intelligently processing these immense volumes of data. However, the resource inhibited environment of IoT devices, owing to their limited energy budget and lower compute capabilities, render them a challenging platform for deployment of desired data analytics. This paper provides an overview of the current and emerging trends in designing highly efficient, reliable, secure and scalable machine learning architectures for such devices. The paper highlights the focal challenges and obstacles being faced by the community in achieving its desired goals. The paper further presents a research roadmap that can help in addressing the highlighted challenges and thereby designing scalable, high-performance, and energy efficient architectures for performing machine learning on the edge.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:357.6.4INFERENCE OF QUANTIZED NEURAL NETWORKS ON HETEROGENEOUS ALL-PROGRAMMABLE DEVICES
Speaker:
Thomas Preusser, Xilinx Inc., IE
Authors:
Thomas Preußer1, Giulio Gambardella2, Nicholas Fraser2 and Michaela Blott3
1Technische Universität Dresden, DE; 2Xilinx Research Labs, IE; 3Xilinx, IE
Abstract
Neural networks have established as a generic and powerful means to approach challenging problems such as image classification, object detection or decision making. Their successful employment foots on an enormous demand of compute. The quantization of network parameters and the processed data has proven a valuable measure to reduce the challenges of network inference so effectively that the feasible scope of applications is expanded even into the embedded domain. This paper describes the making of a real-time object detection in a live video stream processed on an embedded all-programmable device. The presented case illustrates how the required processing is tamed and parallelized across both the CPU cores and the programmable logic and how the most suitable resources and powerful extensions, such as NEON vectorization, are leveraged for the individual processing steps. The crafted result is an extended Darknet framework implementing a fully integrated, end-to-end solution from video capture over object annotation to video output applying neural network inference at different quantization levels running at 16~frames per second on an embedded Zynq UltraScale+ (XCZU3EG) platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.7 Rigorous design, analysis, and monitoring of dependable embedded systems

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Konf. 5

Chair:
Petru Eles, Linköping University, SE

Co-Chair:
Akash Kumar, Technische Universität Dresden, DE

Dependability is a crucial aspect of embedded software systems. This session focuses on achieving dependability in different stages of the embedded software life cycle: requirements engineering, design, and maintenance. In particular, the papers presented in this session will address (1) contract based requirement engineering for cyber-physical systems, (2) formal analysis of code using SMT-based symbolic execution to deal with hardware faults, and (3) non-intrusive runtime trace analysis using FPGAs.

TimeLabelPresentation Title
Authors
14:307.7.1CHASE: CONTRACT-BASED REQUIREMENT ENGINEERING FOR CYBER-PHYSICAL SYSTEM DESIGN
Speaker:
Pierluigi Nuzzo, University of Southern California, US
Authors:
Pierluigi Nuzzo1, Michele Lora2, Yishai Feldman3 and Alberto Sangiovanni-Vincentelli4
1University of Southern California, US; 2University of Verona, IT; 3IBM Research, Haifa, IL; 4University of California at Berkeley, US
Abstract
This paper presents CHASE, a framework for requirement capture, formalization, and validation for cyber-physical systems. CHASE combines a practical front-end formal specification language based on patterns with a rigorous verification back-end based on assume-guarantee contracts. The front-end language can express temporal properties of networks using a declarative style, and supports automatic translation from natural-language constructs to low-level mathematical languages. The verification back-end leverages the mathematical formalism of contracts to reason about system requirements and determine inconsistencies and dependencies between them. CHASE features a modular and extensible software infrastructure that can support different domain-specific languages, modeling formalisms, and analysis tools. We illustrate its effectiveness on industrial design examples, including control of aircraft power distribution networks and arbitration of a mixed-criticality automotive bus.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:007.7.2RESILIENCE EVALUATION VIA SYMBOLIC FAULT INJECTION ON INTERMEDIATE CODE
Speaker:
Hoang M. Le, University of Bremen, DE
Authors:
Hoang M. Le1, Vladimir Herdt1, Daniel Grosse2 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen/DFKI GmbH, DE
Abstract
There is a growing need for error-resilient software that can tolerate hardware faults as well as for new resilience evaluation techniques. For the latter, a promising direction is to apply formal techniques in fault injection-based evaluations to improve the coverage of evaluation results. Building on the recent development of Software-implemented Fault Injection (SWiFI) techniques on compiler's intermediate code, this paper proposes a novel resilience evaluation framework combining LLVM-based SWiFI and SMT-based symbolic execution. This novel combination offers significant advantages over state-of-the-art approaches with respect to accuracy and coverage.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:307.7.3ONLINE ANALYSIS OF DEBUG TRACE DATA FOR EMBEDDED SYSTEMS
Speaker:
Philip Gottschling, TU Darmstadt, DE
Authors:
Normann Decker1, Boris Dreyer2, Philip Gottschling2, Christian Hochberger2, Alexander Lange3, Martin Leucker1, Torben Scheffel1, Simon Wegener4 and Alexander Weiss3
1Universität zu Lübeck, DE; 2TU Darmstadt, DE; 3Accemic Technologies GmbH, DE; 4AbsInt Angewandte Informatik GmbH, DE
Abstract
Modern multi-core Systems-on-Chip (SoC) provide very high computational power. On the downside, they are hard to debug and it is often very difficult to understand what is going on in these chips because of the limited observability inside the SoC. Chip manufacturers try to compensate this difficulty by providing highly compressed trace data from the individual cores. In the past, the common way to deal with this data was storing it for later offline analysis, which severely limits the time span that can be observed. In this contribution, we present an FPGA-based solution that is able to process the trace data in real-time, enabling continuous observation of the state of a core. Moreover, we discuss applications enabled by this technology.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

7.8 22FDX - the superior technology for IoT, RF, Automotive and Mobility: Advanced Design Methodologies for Ultra-low Power Solutions

Date: Wednesday 21 March 2018
Time: 14:30 - 16:00
Location / Room: Exhibition Theatre

Organiser:
Claudia Kretzschmar, GLOBALFOUNDRIES, DE

22FDX is the choice for applications in mobility, IoT, RF and mmWave as well as Automotive applications. It provides low active and standby power at a very small area. It is equally suited for digital as well as analog/RF/mmWave applications. The back gate bias capability provides an additional degree of freedom to the designer allowing the usage of near-threshold operation. Back gate biasing opens the possibility for many innovative design features like boosting the operation speed when needed as well as compensating for aging and process, temperature and voltage variations. Compared to other advanced node technologies 22FDX has a very low mask count which makes the technology a perfect fit for low-cost applications.

This session will give an introduction into the technology and provide an overview over design methodology. Adaptive body biasing is one of the innovative design methods that will be presented in the third talk applied to extreme low-voltage MPSoC. This session will be concluded with the design of a SoC base on the open-source PULPissimo architecture, built around a 32-bit RISC-V core.

TimeLabelPresentation Title
Authors
14:307.8.122FDX: A TECHNOLOGY ALTERNATIVE TO THE MAINSTREAM OPTIMIZED FOR IOT APPLICATIONS
Speaker:
Jürgen Faul, GLOBALFOUNDRIES Fab1 LLC & Co. KG, DE
Abstract

Serving the new trend in semiconductor industries to connect everything with everything, computing power does not matter as much as low leakage and/or low dynamic power at low cost.

GLOBALFOUNDRIES offers a technology with less complexity than FinFET, same gate length scaling enabled by fully-depleted channels, but with additional features like back-gate biasing which perfectly suits the IoT market needs.

Back-biasing is unique to FDSOI technologies and provides an additional degree of freedom to circuit and chip designers. Prominent examples for back-biasing utilization are extremely low Vdd operation and chip-level global corner trimming, static by OTP or eFuse as well as dynamic for power and temperature compensation.

This talk will give an overview on technology capabilities and features.

14:507.8.222FDX DESIGN METHODOLOGY ENABLING OPTIMIZED POWER PERFORMANCE AND AREA FOR IOT AND MOBILE AP DESIGNS
Speaker:
Ulrich Hensel, GLOBALFOUNDRIES Fab1 LLC & Co. KG, DE
Abstract

GLOBALFOUNDRIES 22FDX technology offers body biasing as a new optimization tool for designers to optimize power performance and area of their circuit. GLOBALFOUNDRIES and its IP eco-system partners have collaborated to develop a design platform that enables designers to take full advantage of these technology capabilities for low voltage IoT all the way to mobile application processor designs.

The talk will present benchmark implementation results that demonstrate the benefits of the 22FDX design platform for IoT and mobile application processor design points.

GLOBALFOUNDRIES' design guidelines and reference flows help the designer to easily adopt the 22FDX design platform. The talk will walk through the 22FDX specifics of the digital design flow and point out how to implement a digital design using body bias. 

15:107.8.3ADAPTIVE BODY BIAS FOR A 0.4V OPERABLE MPSOC IN 22FDX AS AN EXAMPLE FOR BIG DATA HANDLING
Speaker:
Christian Mayr, Technische Universität Dresden, DE
Abstract

One of the hottest buzzwords today is big data, i.e. the massive amounts of data that are produced by an ever expanding number of sensors across disciplines from archaeology to soccer. Some examples: in 2016, the amount of data transmitted globally for the first time exceeded a Zettabyte (1021), using up about 10% of the world energy supply. In 2015, the number of image sensors has risen above the number of humans on earth.

Thus, there is a need for dedicated data processing/machine learning chips that handle this load automatedly and reduce it to a data extract usable by humans. Deployment of these chips can be anywhere along the processing chain, e.g. integrated with the sensor interface to reduce data load at the source or as data aggregator in a server farm.

Prof. Mayr will give an overview of the multi-processor systems-on-chip (MPSoC) and sensor interfacing developed at his chair. As some of the first MPSoCs in 22nm FDSOI which use adaptive body biasing to compensate for process variability, they operate as low as 0.4V. Through a combination of dedicated accelerators (e.g. for machine learning) and conventional CPUs, these MPSoCs achieve an optimal compromise between energy efficiency and configurability. Applications pursued at the chair include: sensor nodes for the tactile internet, autonomous driving, neural implants and brain simulation.

15:307.8.4QUENTIN: A NEAR-THRESHOLD SOC FOR ENERGY-EFFICIENT IOT END-NODES IN 22NM FDX TECHNOLOGY
Speaker:
Davide Rossi, Università di Bologna, IT
Abstract

Co-authors: Pasquale Davide Schiavone1, Davide Rossi2, Antonio Pullini1, Francesco Conti1,2, Frank K. Gurkaynak1, Luca Benini1,2
1. Integrated System laboratory, ETH, Zurich, Switzerland; 2. Energy Efficient Embedded Systems Laboratory, University Of Bologna, Bologna, Italy

An increasing number of end-node IoT applications require high performance and extreme energy efficiency to deal with the high computational requirements of near-sensor data analytics algorithms, within a power envelope of few milliWatts for long battery lifetime. A significant improvement of energy efficiency for digital computing systems can be achieved exploiting near-threshold operation. We present Quentin: a near-threshold SoC based on the open-source PULPissimo architecture, implemented in 22nm FDX technology. The proposed SoC is built around a 32-bit RISC-V core "RISCY" optimized for energy efficient digital signal processing, 512 kByte of L2 memory, and an autonomous I/O subsystem featuring an IO DMA coupled with a standard set of peripherals. RISCY features a 32-bit, 4-stages in-order pipeline implementing the RV32IMFC RISC-V ISA plus domain-specific extensions for near-sensor data analytics such as packed-SIMD (additions, comparisons, logic, shuffle and dot product), bit manipulation, hardware loops, etc. This talk will present the implementation of the SoC in 22nm FDX, featuring an area of 2 mm2, a maximum operating frequency of 170 MHz (SSG, 0.59V, -40 C) and an estimated power consumption of 5 mW.

16:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

IP3 Interactive Presentations

Date: Wednesday 21 March 2018
Time: 16:00 - 16:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

LabelPresentation Title
Authors
IP3-1TESTBENCH QUALIFICATION FOR SYSTEMC-AMS TIMED DATA FLOW MODELS
Speaker:
Muhammad Hassan, DFKI GmbH, DE
Authors:
Muhammad Hassan1, Daniel Grosse2, Hoang M. Le3, Thilo Voertler4, Karsten Einwich4 and Rolf Drechsler2
1Cyber Physical Systems, DFKI, DE; 2University of Bremen/DFKI GmbH, DE; 3University of Bremen, DE; 4COSEDA Technologies GmbH, DE
Abstract
Analog-Mixed Signal (AMS) circuits have become increasingly important for today's SoCs. The Timed Data Flow (TDF) model of computation available in SystemC-AMS offers here a good tradeoff between accuracy and simulation-speed at the system-level. One of the main challenges in system-level verification is the quality of the testbench. In this paper, we present a testbench qualification approach for SystemC-AMS TDF models. Our contribution is twofold: First, we propose specific mutation models for the class of filters implemented as TDF models. This requires to analyze the Laplace transfer function of the filter design. Second, we present the mutation based qualification approach based on the proposed specific mutations as well as standard behavioral mutations. This allows to find serious quality issues in the testbench. Our experimental results for a real-world AMS system demonstrate the applicability and efficacy of our approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-2AN ALGEBRA FOR MODELING CONTINUOUS TIME SYSTEMS
Speaker:
José Medeiros, University of Brasilia, BR
Authors:
José E. G. de Medeiros1, George Ungureanu2 and Ingo Sander2
1University of Brasília, BR; 2KTH Royal Institute of Technology, SE
Abstract
Advancements on analog integrated design have led to new possibilities for complex systems combining both continuous and discrete time modules on a signal processing chain. However, this also increases the complexity any design flow needs to address in order to describe a synergy between the two domains, as the interactions between them should be better understood. We believe that a common language for describing continuous and discrete time computations is beneficial for such a goal and a step towards it is to gain insight and describe more fundamental building blocks. In this work we present an algebra based on the General Purpose Analog Computer, a theoretical model of computation recently updated as a continuous time equivalent of the Turing Machine.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-3TTW: A TIME-TRIGGERED WIRELESS DESIGN FOR CPS
Speaker:
Romain Jacob, ETH Zurich, CH
Authors:
Romain Jacob1, Licong Zhang2, Marco Zimmerling3, Jan Beutel1, Samarjit Chakraborty2 and Lothar Thiele1
1ETH Zurich, CH; 2Technical University of Munich, DE; 3Technische Universität Dresden, DE
Abstract
Wired fieldbuses have long been proven effective in supporting Cyber-Physical Systems (CPS). However, various domains are now striving for wireless solutions due to ease of deployment or novel functionality requiring the ability to support mobile devices. Low-power wireless protocols have been proposed in response to this need, but requirements of a large class of CPS applications can still not be satisfied. We thus propose Time-Triggered Wireless (TTW), a distributed low-power wireless system design that minimizes communication energy consumption and offers end-to-end timing predictability, runtime adaptability, reliability, and low latency. Evaluation shows a 2x reduction in communication latency and 33-40% lower radio-on time compared with DRP, the closest related work, validating the suitability of TTW for new exciting wireless CPS applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-4PHYLAX: SNAPSHOT-BASED PROFILING OF REAL-TIME EMBEDDED DEVICES VIA JTAG INTERFACE
Speaker:
Eduardo Chielle, New York University Abu Dhabi, BR
Authors:
Charalambos Konstantinou1, Eduardo Chielle2 and Michail Maniatakos2
1New York University, US; 2New York University Abu Dhabi, AE
Abstract
Real-time embedded systems play a significant role in the functionality of critical infrastructure. Legacy microprocessor-based embedded systems, however, have not been developed with security in mind. Applying traditional security mechanisms in such systems is challenging due to computing constraints and/or real-time requirements. Their typical 20-30 year lifespan further exacerbates the problem. In this work, we propose PHYLAX, a plug-and-play solution to detect intrusions in already installed embedded devices. PHYLAX is an external monitoring tool which does not require code instrumentation. Also, our tool adapts and prioritizes intrusion detection based on the requirements of the underlying infrastructure (power grid, chemical factory, etc.) as well as the computing capabilities of the target embedded system (CPU model, memory size, etc.). PHYLAX can be employed on any legacy device which incorporates a JTAG interface. As a case study, we present the inclusion of PHYLAX on a power grid recloser controller.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-5CHARACTERIZING DISPLAY QOS BASED ON FRAME DROPPING FOR POWER MANAGEMENT OF INTERACTIVE APPLICATIONS ON SMARTPHONES
Speaker:
Chung-Ta King, National Tsing Hua University, TW
Authors:
Kuan-Ting Ho1, Chung-Ta King1, Bhaskar Das1 and Yung-Ju Chang2
1National Tsing Hua University, TW; 2National Chiao Tung University, TW
Abstract
User-centric power management in smartphones aims to conserve power without affecting user's perceived quality of experience. Most existing works focus on periodically updated applications such as games and video players and use a fixed frame rate, measured in frame per second (FPS), as the metric to quantify the display quality of service (QoS). The idea is to adjust the CPU/GPU frequency just enough to maintain the frame rate at a user satisfactory level. However, when applied to aperiodically-updated interactive applications, e.g. Facebook or Instagram, that draw the frame buffer at a varying rate in response to user inputs, such a power management strategy becomes too conservative. Based on real user experiments, we observe that users can tolerate a certain percentage of frame drops when running aperiodically updated applications without affecting their perceived display quality. Hence, we introduce a new metric to characterize display quality of service, called the frame drawn ratio (FDR), and propose a new CPU/GPU frequency governor based on the FDR metric. The experiments by real users show that the proposed governor can conserve 17.2% power in average when compared to the default governor, while maintaining the same or even better QoE rating.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-6PREDICTION-BASED FAST THERMOELECTRIC GENERATOR RECONFIGURATION FOR ENERGY HARVESTING FROM VEHICLE RADIATORS
Speaker:
Xue Lin, Northeastern University, US
Authors:
Hanchen Yang1, Feiyang Kang2, Caiwen Ding3, Ji Li4, Jaemin Kim5, Donkyu Baek6, Shahin Nazarian4, Xue Lin7, Paul Bogdan4 and Naehyuck Chang8
1Beijing University of Posts and Telecommunications, CN; 2Zhejiang University, CN; 3Syracuse University, US; 4University of Southern California, US; 5Seoul National University, KR; 6Korea Advanced Institute of Science and Technology, KR; 7Northeastern University, US; 8KAIST, KR
Abstract
Thermoelectric generation has increasingly drawn attention for being environmentally friendly. However, only a few of the prior researches on thermoelectric generators (TEG) have focused on improving efficiency at system level. They attempt to capture the electrical property changes on TEG modules as the temperature fluctuates on vehicle radiators. The most recent reconfiguration algorithm shows large improvements on output performance but suffers from major drawback on computational time and energy overhead, and non-scalability in terms of array size and processing frequency. In this paper, we propose a novel TEG array reconfiguration algorithm that determines near-optimal configuration with an acceptable computational time. More precisely, with O(N) time complexity, our prediction-based fast TEG reconfiguration algorithm enables all modules to work at or near their maximum power points (MPP). Additionally, we incorporate prediction methods to further reduce the runtime and switching overhead during the reconfiguration process. Experimental results present 30% performance improvement, almost 100x reduction on switching overhead and 13x enhancement on computational speed compared to the baseline and prior work. The scalability of our algorithm makes it applicable to larger scale systems such as industrial boilers and heat exchangers.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-7A PARAMETERIZED TIMING-AWARE FLIP-FLOP MERGING ALGORITHM FOR CLOCK POWER REDUCTION
Speaker:
Chaochao Feng, National University of Defense Technology, CN
Authors:
Chaochao Feng1, Daheng Yue1, Zhenyu Zhao1 and Zhuofan Liao2
1National University of Defense Technology, CN; 2Changsha University of Science and Technology, CN
Abstract
In modern integrated circuits, the clock power contributes a dominant part of the chip power. Clock power can be reduced effectively by utilizing multi-bit flip-flops. In this paper, a parameterized timing-aware flip-flop merging algorithm is proposed for clock power reduction. The single-bit flip-flops are merged into multi-bit flip-flops after placement & optimization and before clock network synthesis with consideration of function, scan chain information, distance and timing constraints. The algorithm can be configured with different parameters, such as the bit-number of MBFF, the setup timing margin and the distance margin. Experimental results under an industrial design show that compared with the basic design without MBFF, the design with 2-bit, 4-bit, 6-bit, and 8-bit MBFFs can save 7.5%, 12%, 11.8% and 11.1% total power consumption respectively. Using MBFF4 to replace 1-bit FFs is the best choice for the design optimization, which achieves minimum area and total power consumption. We also compare the designs with MBFF4 replacement under five different setup timing margins and distance margins. Without violating any timing constraint, it is better to set the setup timing margin as small as possible to achieve best power optimization. The distance margin (100μm, 30μm) is the best choice for this industry design to achieve minimum power consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-8FAST CHIP-PACKAGE-PCB COANALYSIS METHODOLOGY FOR POWER INTEGRITY OF MULTI-DOMAIN HIGH-SPEED MEMORY: A CASE STUDY
Speaker:
Seungwon Kim, Ulsan National Institute of Science and Technology, KR
Authors:
Seungwon Kim1, Ki Jin Han2, Youngmin Kim3 and Seokhyeong Kang1
1Ulsan National Institute of Science and Technology (UNIST), KR; 2Dongguk University, KR; 3Kwangwoon University, KR
Abstract
The power integrity of high-speed interfaces is an increasingly important issue in mobile memory systems. However, because of complicated design variations such as adjacent VDD domain coupling, conventional case-specific modeling is limited in analyzing trends in results from parametric variations. Moreover, conventional industrial methods can be simulated only after the design layout is completed and it requires a lot of back-annotation processes, which result in delayed delays time to market. In this paper, we propose a chip-package-PCB coanalysis methodology applied to our multi-domain high-speed memory system model with a current generation method. Our proposed parametric simulation model can analyze the tendency of power integrity results from variable sweeps and Monte Carlo simulations, and it shows a significantly reduced runtime compared to the conventional EDA methodology under JEDEC LPPDR4 environment.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-9APPROXIMATE HARDWARE GENERATION USING SYMBOLIC COMPUTER ALGEBRA EMPLOYING GRöBNER BASIS
Speaker:
Saman Fröhlich, DFKI GmbH, DE
Authors:
Saman Froehlich1, Daniel Grosse2 and Rolf Drechsler2
1Cyber-Physical Systems, DFKI GmbH, DE; 2University of Bremen/DFKI GmbH, DE
Abstract
Many applications are inherently error tolerant. Approximate Computing is an emerging design paradigm, which gives the opportunity to make use of this error tolerance, by trading off accuracy for performance. The behavior of a circuit can be defined at an arithmetic level, by describing the input and output relation as a polynomial. Symbolic Computer Algebra (SCA) has been employed to verify that a given circuit netlist matches the behavior specified at the arithmetic level. In this paper, we present a method that relaxes the exactness requirement of the implementation. We propose a heuristic method to generate an approximation for a given netlist and use SCA to ensure that the result is within application-specific bounds for given error-metrics. In addition, our approach allows for automatic generation of approximate hardware wrt. applicationspecific input probabilities. To the best of our knowledge taking input probabilities, which are known for many practical applications, into account has not been considered before. We employ the proposed approach to generate approximate adders and show that the results outperform state-of-the-art, handcrafted approximate hardware.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-10RECONFIGURABLE IMPLEMENTATION OF $GF(2^M)$ BIT-PARALLEL MULTIPLIERS
Speaker and Author:
José L. Imaña, Complutense University of Madrid, ES
Abstract
Hardware implementations of arithmetic operations over binary finite fields $GF(2^m)$ are widely used in several important applications, such as cryptography, digital signal processing and error-control codes. In this paper, efficient reconfigurable implementations of bit-parallel canonical basis multipliers over binary fields generated by type II irreducible pentanomials $f(y) = y^m + y^{n+2} + y^{n+1} + y^n +1$ are presented. These pentanomials are important because all five binary fields recommended by NIST for ECDSA can be constructed using such polynomials. In this work, a new approach for $GF(2^m)$ multiplication based on type II pentanomials is given and several post-place and route implementation results in Xilinx Artix-7 FPGA are reported. Experimental results show that the proposed multiplier implementations improve the area$imes$time parameter when compared with similar multipliers found in the literature.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-11PROCESSING IN 3D MEMORIES TO SPEEDUP OPERATIONS ON COMPLEX DATA STRUCTURES
Speaker:
Luigi Carro, UFRGS, BR
Authors:
Paulo Cesar Santos1, Geraldo Francisco de Oliveira Junior1, Joao Paulo Lima1, Marco Antonio Zanata Alves2, Luigi Carro1 and Antonio Carlos Schneider Beck1
1UFRGS, BR; 2UFPR, BR
Abstract
Pointer chasing has been, for years, the kernel operation employed by diverse data structures, from graphs to hash tables and dictionaries. However, due to the bewildering growth in the volume of data that current applications have to deal with, performing pointer chasing operations have become a major source of performance and energy bottleneck, due to its sparse memory access behavior. In this work, we aim to tackle this problem by taking advantage of the already available parallelism present in today's 3D-stacked memories. We present a simple mechanism that can accelerate pointer chasing operations by making use of a state-of-the-art PIM design that executes in-memory vector operations. The key idea behind our design is to run speculative loads, in parallel, based on a given memory address in a reconfigurable window of addresses. Our design can perform pointer-chasing operations on b+tree 4.9x faster when compared to modern baseline systems. Besides that, since our device avoids data movement and alleviates the memory hierarchy's inefficiency due to poor spatial data locality, we can also reduce energy consumption by 85% when compared to the baseline.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-12AN EFFICIENT NBTI-AWARE WAKE-UP STRATEGY FOR POWER-GATED DESIGNS
Speaker:
Yu-Guang Chen, Yuan Ze University, TW
Authors:
Kun-Wei Chiu1, Yu-Guang Chen2 and Ing-Chao Lin1
1National Cheng Kung University, TW; 2Yuan Ze University, TW
Abstract
The wake-up process of a power-gated design may induce an excessive surge current and threaten the signal integrity. A proper wake-up sequence should be carefully designed to avoid surge current violations. On the other hand, PMOS sleep transistors may suffer from the negative-bias temperature instability (NBTI) effect which results in decreased driving current. Conventional wake-up sequence decision approaches do not consider the NBTI effect, which may result in a longer or unacceptable wake-up time after circuit aging. Therefore, in this paper, we propose a novel NBTI-aware wake-up strategy to reduce the average wake-up time within a circuit lifetime. Our strategy first finds a set of proper wake-up sequences for different aging scenarios (i.e. after a certain period of aging), and then dynamically reconfigures the wake-up sequences at runtime. The experimental results show that compared to a traditional fixed wake-up sequence approach, our strategy can reduce average wake-up time by as much as 45.04% with only 3.7% extra area overhead for the reconfiguration structure.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-13DESIGNING RELIABLE PROCESSOR CORES IN ULTIMATE CMOS AND BEYOND: A DOUBLE SAMPLING SOLUTION
Speaker:
Nacer-Eddine Zergainoh, TIMA, FR
Authors:
Thierry Bonnoit, Fraidy Bouesse, Nacer-Eddine Zergainoh and Michael Nicolaidis, TIMA, FR
Abstract
The double sampling paradigm is an efficient method to protect the circuits against soft-errors. But the data that are going out of the area protected by double sampling are still vulnerable. To eliminate this weakness without having additional constraints on the datapaths, the most common solution adds a contaminable buffer stage between the two areas. Therefore, this stage avoids the propagation of the potentially corrupted data further in the circuit when an error is detected in the double sampling area. But the issue is that this stage must itself be protected against soft-errors, which drastically increases the cost of the solution. In this paper we characterize the additional implementation constraints due to this vulnerability. We proposed an architectural solution that uses three latches to remove those constraints and protect the area outside the double sampling domain without adding a buffer stage. We present an implementation of this solution on the LEON3 processor, and we compare the results in terms of additional cost and efficiency with other solutions.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-14DESIGN OF A TIME-PREDICTABLE MULTICORE PROCESSOR: THE T-CREST PROJECT
Speaker and Author:
Martin Schoeberl, Technical University of Denmark, DK
Abstract
Real-time systems need to deliver results in time and often this timely production of a result needs to be guaranteed. Static timing analysis can be used to bound the worst-case execution time of tasks. However, this timing analysis is only possible if the processor architecture is analysis friendly. This paper presents the T-CREST processor, a real-time multicore processor developed to be time-predictable and an easy target for static worst-case execution time analysis. We present how to achieve time-predictability at all levels of the architecture, from the processor pipeline, via a network-on-chip, up to the memory controller. The main architectural feature to provide time predictability is to use static arbitration of shared resources in a time-division multiplexing way.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-15ERROR RESILIENCE ANALYSIS FOR SYSTEMATICALLY EMPLOYING APPROXIMATE COMPUTING IN CONVOLUTIONAL NEURAL NETWORKS
Speaker:
Muhammad Abdullah Hanif, Vienna University of Technology, Vienna, AT
Authors:
Muhammad Abdullah Hanif1, Rehan Hafiz2 and Muhammad Shafique1
1TU Wien, AT; 2ITU, PK
Abstract
Approximate computing is an emerging paradigm for error resilient applications as it leverages accuracy loss for improving power, energy, area, and/or performance of an application. The spectrum of error resilient applications includes the domains of Image and video processing, Artificial intelligence (AI) and Machine Learning (ML), data analytics, and other Recognition, Mining, and Synthesis (RMS) applications. In this work, we address one of the most challenging question, i.e., how to systematically employ approximate computing in Convolutional Neural Networks (CNNs), which are one of the most compute-intensive and the pivotal part of AI. Towards this, we propose a methodology to systematically analyze error resilience of deep CNNs and identify parameters that can be exploited for improving performance/efficiency of these networks for inference purposes. We also present a case study for significance-driven classification of filters for different convolutional layers, and propose to prune those having the least significance, and thereby enabling accuracy vs. efficiency tradeoffs by exploiting their resilience characteristics in a systematic way.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-16DEMAS: AN EFFICIENT DESIGN METHODOLOGY FOR BUILDING APPROXIMATE ADDERS FOR FPGA-BASED SYSTEMS
Speaker:
Semeen Rehman, Vienna University of Technology (TU Wien), AT
Authors:
Bharath Srinivas Prabakaran1, Semeen Rehman1, Muhammad Abdullah Hanif1, Salim Ullah2, Ghazal Mazaheri3, Akash Kumar2 and Muhammad Shafique1
1TU Wien, AT; 2Technische Universität Dresden, DE; 3UC Riverside, US
Abstract
The current state-of-the-art approximate adders are mostly ASIC-based, i.e., they focus solely on gate and/or transistor level approximations (e.g., through circuit simplification or truncation) to achieve area, latency, power and/or energy savings at the cost of accuracy loss. However, when these designs are synthesized for FPGA-based systems, they do not offer similar reductions in area, latency and power/energy due to the underlying architectural differences between ASICs and FPGAs. In this paper, we present a novel generic design methodology to synthesize and implement approximate adders for any FPGA-based system by considering the underlying resources and architectural differences. Using our methodology, we have designed, analyzed and presented eight different multi-bit adder architectures. Compared to the 16-bit accurate adder, our designs are successful in achieving area, latency and power-delay product gains of 50%, 38%, and 53%, respectively. We also compare our approximate adders to state-of-the-art approximate adders specialized for ASIC and FPGA fabrics and demonstrate the benefits of our approach. We will make the RTL and behavioral models of our and state-of-the-art designs open-source at https://sourceforge.net/projects/approxfpgas/ to further fuel the research and development in the FPGA community and to ensure reproducible research.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-17GAIN SCHEDULED CONTROL FOR NONLINEAR POWER MANAGEMENT IN CMPS
Speaker:
Nikil Dutt, University of California, Irvine, US
Authors:
Bryan Donyanavard, Amir M. Rahmani, Tiago Muck, Kasra Moazzemi and Nikil Dutt, University of California, Irvine, US
Abstract
Dynamic voltage and frequency scaling (DVFS) is a well-established technique for power management of thermal- or energy-sensitive chip multiprocessors (CMPs). In this context, linear control theoretic solutions have been successfully implemented to control the voltage-frequency knobs. However, modern CMPs with a large range of operating frequencies and multiple voltage levels display nonlinear behavior in the relationship between frequency and power. State-of-the-art linear controllers therefore leave room for opportunity in optimizing DVFS operation. We propose a Gain Scheduled Controller (GSC) for nonlinear runtime power management of CMPs that simplifies the controller implementation of systems with varying dynamic properties by utilizing an adaptive control theoretic approach in conjunction with static linear controllers. Our design improves the stability, accuracy, settling time, and overshoot of the controller over a linear controller with minimal overhead. We implement our approach on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that the system's response to changes in target power is improved by 2x while operating up to 12% more efficiently.

Download Paper (PDF; Only available from the DATE venue WiFi)

UB08 Session 8

Date: Wednesday 21 March 2018
Time: 16:00 - 18:00
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB08.1EWCMS: AN EMBEDDED WALK-CYCLE MONITORING SYSTEM USING BODY AREA COMMUNICATION AND SECURE LOW-POWER DYNAMIC SIGNALING
Authors:
Shahzad Muzaffar1 and Ibrahim (Abe) M. Elfadel2
1Masdar Institute, Khalifa University of Science and Technology, AE; 2Khalifa University of Science and Technology, AE
Abstract
The demo presents a novel ultra-low power, embedded, and wearable walk-cycle monitoring system with applications in areas such as healthcare, robotics, sports medicine, physical therapy, prosthesis, and animal sports. Customized shoes with sensors continuously measure the forces, and an electronic digital assistant is used to analyze the acquired measurements in real time by employing an IMU free and self-synchronizing method in order to estimate weight and study motion patterns. To achieve ultra-low power operation, the human body is used as a communication medium between the sensors and the digital assistant. The single-channel behavior of the human body is accommodated with a novel, simple yet robust single-wire signaling technique, Pulsed-Index Communication (PIC), that significantly reduces the system footprint and overall power consumption as it eliminates the need for clock and data recovery. The system prototype has been rigorously and successfully tested.

More information ...
UB08.2EVOAPPROX: LIBRARIES AND GENERATORS OF APPROXIMATE CIRCUITS
Authors:
Lukas Sekanina, Zdeněk Vašíček and Vojtěch Mrazek, Brno University of Technology, CZ
Abstract
Our contribution deals with a fully automated functional approximation methodology for combinational digital circuits. We present open libraries of approximate circuits and tools performing desired approximations. Our approach uses a multi-objective genetic programming-based method to automatically design approximate k-bit adders and multipliers (k = 8, 12, 16, 32). All circuits can be downloaded from [1] at the level of a source code (C, Verilog, and Matlab). Several error metrics are pre-calculated and formal guarantees are given in terms of these errors. By means of an interactive web interface the user can easily find the best trade-off between the error and electrical parameters provided for 45/90/180 nm technology process. We will also demonstrate the circuit design flow developed. References: [1] http://www.fit.vutbr.cz/research/groups/ehw/approxlib/

More information ...
UB08.4ROS X FPGA FOR ROBOT-CLOUD SYSTEM: ROBOT-CLOUD COOPERATIVE VISUAL SLAM PROCESSING USING ROS-COMPLIANT FPGA COMPONENT
Authors:
Takeshi Ohkawa, Yuhei Sugata, Aoi Soya, Kanemitsu Ootsu and Takashi Yokota, Utsunomiya University, JP
Abstract
Distributed processing in robot-cloud cooperative system is discussed in terms of processing performance and communication performance. Cooperation of robots and cloud-servers is inevitable for realizing intelligent robots in the next generation society and industry. To improve processing performance of the cooperative system, we utilize ROS-compliant FPGA component as a robot-side embedded processing for low-power and high-performance image processing. We prepare two demonstrations. (1)Key-point Detection from camera image using Fully-hardwired ROS-Compliant FPGA component In the evaluation, the processing performance of the component is almost same as PC, while it operates at more than 10 times less power (5W), compared to PC (50W). (2)Distributed Visual SLAM using two-wheeled robot (TurtleBot3) Distributed Visual SLAM (Simultaneous Localization and Mapping) are presented as a concrete example of the robot-cloud cooperative system.

More information ...
UB08.5WIRELESS SENSOR SYSTEM WITH ELECTROMAGNETIC ENERGY HARVESTER FOR INDUSTRY 4.0 APPLICATIONS
Authors:
Bianca Leistritz, Elena Chervakova, Sven Engelhardt, Axl Schreiber and Wolfram Kattanek, Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE
Abstract
An energy-autonomous and adaptive wireless multi-sensor system for a wide range of Industry 4.0 applications is presented here. By taking a holistic view of the sensor system and of the specific interactions of its components, technological barriers of individual system elements can be overcome. The energy supply of the demonstrator is realized by an miniaturized electromagnetic energy harvester, which could be easily and quickly adapted to the application-specific boundary conditions with the help of a computer assisted design process. Variations in the available energy are monitored by advanced energy management functions. The modular hardware and software platform is demonstrated by an adaptive measurement and data transmission rate. Communication takes place by means of industry 4.0 compliant standard protocols. The demonstrator was developed in the research group Green-ISAS funded by the Free State of Thuringia from the European Social Fund (ESF) under grant no. 2016 FGR 0055.

More information ...
UB08.6WARE: WEARABLE ELECTRONICS DIRECTIONAL AUGMENTED REALITY
Authors:
Gabriele Miorandi1, Walter Vendraminetto2, Federico Fraccaroli3, Davide Quaglia1 and Gianluca Benedetti4
1University of Verona, IT; 2EDALab Srl, IT; 3Wagoo LLC, IT; 4Wagoo Italia srls, IT
Abstract
Augmented Reality (AR) currently require large form factors, weight, cost and frequent recharging cycles that reduce usability. Connectivity, image processing, localization, and direction evaluation lead to high processing and power requirements. A multi-antenna system, patented by the industrial partner, enables a new generation of smart eye-wear that elegantly requires less hardware, connectivity, and power to provide AR functionalities. They will allow users to directionally locate nearby radio emitting sources that highlight objects of interest (e.g., people or retail items) by using existing standards like Bluetooth Low Energy, Apple's iBeacon and Google's Eddystone. This booth will report the current level of research addressed by the Computer Science Department of University of Verona, Wagoo LLC, and Wagoo Italia srls. In the presented demo, different objects emit an "I am here" signal and a prototype of the smart glasses shows the information related to the observed object.

More information ...
UB08.7T-CREST: THE OPEN-SOURCE REAL-TIME MULTICORE PROCESSOR
Authors:
Martin Schoeberl, Luca Pezzarossa and Jens Sparsø, Technical University of Denmark, DK
Abstract
Future real-time systems, such as advanced control systems or real-time image recognition, need more powerful processors, but still a system where the worst-case execution time (WCET) can be statically predicted. Multicore processors are one answer to the need for more processing power. However, it is still an open research question how to best organize and implement time-predictable communication between processing cores. T-CREST is an open-source multicore processor for research on time-predictable computer architecture. In consists of several Patmos processors connected by various time-predictable communication structures: access to shared off-chip, access to shared on-chip memory, and the Argo network-on-chip for fast inter-processor communication. T-CREST is supported by open-source development tools, such as compilation and WCET analysis. To best of our knowledge, T-CREST is the only fully open-source architecture for research on future real-time multicore architectures.

More information ...
UB08.10EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
18:00End of session

8.1 Special Day Session on Future and Emerging Technologies: NanoSystems: Connecting Devices, Architectures, and Applications

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Saal 2

Chair:
Mohamed M. Sabry Aly, Nanyang Technological University, SG

This session presents end-to-end approaches for building nanosystems that enable new applications by connecting nanotechnologies for logic, memory and sensing with new architecture design. Projected energy efficiency benefits are significant. Hardware prototypes demonstrate the feasibility of such nanosystems.

TimeLabelPresentation Title
Authors
17:008.1.13D NANOSYSTEMS: THE PATH TO 1,000X ENERGY EFFICIENCY
Author:
Max Shulaker, MIT, US
Abstract
While trillions of sensors connected to the "Internet of Everything" (IoE) promise to transform our lives, they simultaneously pose major obstacles which we are already encountering today. The massive amount of generated raw data (i.e., the "data deluge") is quickly exceeding computing capabilities of existing systems, and cannot be overcome by isolated improvements in sensors, transistors, memories or architectures alone. Rather, an end-to-end approach is needed, whereby the unique benefits of new emerging nanotechnologies - for sensors, memories and transistors - are exploited to realize new nanosystem architectures that are not possible using today's technologies. However, emerging nanomaterials and nanodevices suffer from significant imperfections and variations. Thus, realizing working circuits, let alone transformative nanosystems, has been infeasible. In this talk, I present a path towards realizing future nanosystems, and show how recent progress in several emerging nanotechnologies (carbon nanotubes for logic, non-volatile memories for data storage, and new materials for sensing) enables us to realize such nanosystems today. As a case-study, I will discuss how by leveraging emerging nanotechnologies, we have realized the first monolithically-integrated three-dimensional (3D) nanosystem architectures with vertically-integrated layers of logic, memory, and sensing circuits. With dense and fine-grained connectivity between millions of on-chip sensors, data storage, and embedded computation, such nanosystems can capture terabytes of data from the outside world every second, and produce "processed information" by performing in-situ classification of the sensor data using on-chip accelerators. As a demonstration, we tailor a demo system for gas classification, for real-time health monitoring from breath.
17:308.1.2RESISTIVE RAM FOR NEW COMPUTING SYSTEMS: FROM DEEP LEARNING TO BIOMIMICRY
Author:
Elisa Vianello, CEA LETI, FR
Abstract
Resistive random-access memory (RRAM) is a memory technology that promises high-capacity, non-volatile data storage, low voltages, fast programming and reading time (few 10's of ns, even <1ns), single bit alterability, execution in place, good cycling performance (higher than Flash), density. Moreover RRAM can be easily integrated in the Back-End-Of-Line of advanced CMOS logic. This will revolutionize traditional memory hierarchy and facilitate the implementation of in-memory computing architectures and Deep Learning accelerators. To further improve the connectivity between memory arrays and computing, a combination of logic 3D Sequential Integration (3DSI) and memory arrays is a promising solution. Thanks to low processing thermal budget (<400°C), thermal stability (>500°C) and low cost (few additional masks), RRAM technologies are good candidates to be inserted in between sequentially stacked MOSFET tiers. RRAMs are also promising candidates for implementing energy-efficient bioinspired synapses, creating a path towards online real time unsupervised learning and life-long learning abilities. We will also explore the use of RRAM for future circuits and systems inspired by the emerging paradigm of biomimicry.
18:008.1.3HOW MIGHT NEW TECHNOLOGIES FOR SENSING SHAPE THE FUTURE OF COMPUTING
Author:
Naveen Verma, Princeton University, US
Abstract
The tremendous value computation has shown across applications is driving its expansion from cyber systems to systems that pervade every aspect of our lives. This is being fueled especially by algorithms from artificial intelligence, leading to systems qualified for such integration in our lives, with cognitive capabilities approaching those of humans. A fascinating consequence for system designers is that a tight coupling now results between the data sensed from the physical world and the computations performed on that data. This enforces a unification of design spaces, where new sensing technologies open up new algorithmic opportunities, which in turn open up new architectural options, bringing the potential to overcome traditional bottlenecks in computing. But, a conceptual unification is not enough, a technological unification is also needed. This talk explores such a unification, via hybrid systems based on Large-Area Electronics (LAE) and silicon-CMOS technologies. LAE enables diverse, expansive, and form-fitting sensors, which can be associated with physical objects. This yields sematic structure in the sensor data, which can be exploited towards simpler machine-learning models that are both more data efficient, in terms of learning, and specifically well suited for energy-aggressive hardware architectures. Illustrations will be presented based on integrated LAE-CMOS sensory-compute system prototypes.
18:30End of session

8.2 EU Projects: Novel Technologies, Predictable Architectures and Worst-Case Execution Times

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 6

Chair:
Paul Pop, Technical University of Denmark, DK

Co-Chair:
Petru Eles, Linköping University, SE

This session presents the following EU projects: ARGO—developing a Worst-Case Execution Time (WCET)-aware parallelizing compilation toolchain, GREAT—aiming at using multifunctional standardized stack as a universal spintronic technology for IoT, CONNECT—highlighting the recent advancements of carbon nanotubes as interconnect material and T-CREST—which developed a a real-time multicore processor to be time-predictable and an easy target for static worst-case execution time analysis.

TimeLabelPresentation Title
Authors
17:008.2.1USING POLYHEDRAL TECHNIQUES TO TIGHTEN WCET ESTIMATES OF OPTIMIZED CODE: A CASE STUDY WITH ARRAY CONTRACTION
Speaker:
Steven Derrien, Université de Rennes 1 / IRISA, FR
Authors:
Thomas Lefeuvre1, Imen Fassi1, Christoph Cullmann2, Gernot Gebhard2, Emin Koray Kasnakli3, Isabelle Puaut1 and Steven Derrien1
1University of Rennes 1/IRISA, FR; 2Absint GmbH, DE; 3Fraunhofer IIS, DE
Abstract
The ARGO H2020 European project aims at developing a Worst-Case Execution Time (WCET)-aware parallelizing compilation toolchain. This toolchain operates on Scilab and XCoS inputs, and targets ScratchPad memory (SPM)-based multi-cores. Data-layout and loop transformations play a key role in this flow as they improve SPM efficiency and reduce the number of accesses to shared main memory. In this paper, we study how these transformations impact WCET estimates. We demonstrate that they can bring significant improvements of WCET estimates (up to 2.7x) provided that the WCET analysis process is guided with automatically generated flow annotations obtained using polyhedral counting techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.2.2USING MULTIFUNCTIONAL STANDARDIZED STACK AS UNIVERSAL SPINTRONIC TECHNOLOGY FOR IOT
Speaker:
Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Authors:
Mehdi Tahoori1, Sarath Mohanachandran Nair1, Rajendra Bishnoi1, Sophiane SENNI2, Jad Mohdad2, Frederick Mailly2, Lionel Torres2, Pascal Benoit2, Abdoulaye Gamatie2, Pascal Nouet2, Kotb Jabeur3, Pierre Vanhauwaert3, Alexandru Atitoaie4, Iona Firastrau4, Gregory Di Pendina3 and Guillaume Prenat3
1Karlsruhe Institute of Technology, DE; 2LIRMM, FR; 3CEA, Univ. Grenoble Alpes, Grenoble, FR; 4Transilvania University of Brasov, RO
Abstract
For monolithic heterogeneous integration, fast yet low-power processing and storage, and high integration density, the objective of the EU GREAT project is to co-integrate multiple digital and analog functions together within CMOS by adapting the Magnetic Tunneling Junctions (MTJs) into a single baseline technology enabling logic, memory, and analog functions, particularly for Internet of Things (IoT) platforms. This will lead to a unique STT-MTJ cell technology called Multifunctional Standardized Stack (MSS). This paper presents the progress in the project from the technology, compact modeling, process design kit, standard cells, as well as memory and system level design evaluation and exploration. The proposed technology and toolsets are giant leaps towards heterogeneous integrated technology and architectures for IoT.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.2.3PROGRESS ON CARBON NANOTUBE BEOL INTERCONNECTS
Speaker:
Benjamin Uhlig, Fraunhofer IPMS, Dresden, DE
Authors:
Benjamin Uhlig1, Jie Liang2, Jaehyun Lee3, Raphael Ramos4, Abitha Dhavamani1, Nicole Nagy1, Jean Dijon4, Hanako Okuno5, Dipankar Kalita5, Vihar Georgiev3, Asen Asenov3, Salvatore Amoroso6, Liping Wang6, Campbell Millar6, Fabian Koenemann7, Bernd Gotsmann7, Goncalo Goncalves8, Bingan Chen8, Reetu Raj Pandey2, Rongmei Chen2 and Aida Todri-Sanial2
1Fraunhofer IPMS, DE; 2CNRS-LIRMM/University of Montpellier, FR; 3School of Engineering, University of Glasgow, GB; 4CEA-LITEN/University Grenoble Alpes, FR; 5CEA-INAC/University Grenoble Alpes, FR; 6Synopsys Inc., GB; 7IBM Research Zurich, CH; 8Aixtron Ltd., GB
Abstract
This article is a review of the current progress and results obtained in the European H2020 CONNECT project. Amongst all the research on carbon nanotube interconnects, those discussed here cover 1) process & growth of carbon nanotube interconnects compatible with back-end-of-line integration, 2) modeling and simulation from atomistic to circuit-level benchmarking and performance prediction, and 3) characterization and electrical measurements. We provide an overview of the current advancements on carbon nanotube interconnects and also regarding the prospects for designing energy efficient integrated circuits. Each selected category is presented in an accessible manner aiming to serve as a review and informative cornerstone on carbon nanotube interconnects.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.2.4A WCET-AWARE PARALLEL PROGRAMMING MODEL FOR PREDICTABILITY ENHANCED MULTI-CORE ARCHITECTURES
Speaker:
Simon Reder, Karlsruhe Institute of Technology (KIT), DE
Authors:
Simon Reder1, Leonard Masing1, Harald Bucher1, Timon ter Braak2, Timo Stripf3 and Jürgen Becker1
1Karlsruhe Institute of Technology, DE; 2Recore Systems B.V., NL; 3emmtrix Technologies GmbH, DE
Abstract
Increasing performance requirements for cyber-physical systems in real-time applications raise the necessity to migrate to multi-core processor systems. However, commercial of the shelf multi-core systems are often inappropriate for the real-time domain and real-time capable multi-core programming models are rare. In this paper, we present a solution developed within the EU research project ARGO. By means of a predictability-enhanced NoC-based multi-/many-core architecture, we investigate hardware properties that can help to improve the predictability of the platform and the programming model. Both platform and programming model are complemented by a WCET-aware Architecture Description Language (ADL). This enables a certain degree of hardware abstraction while preserving the relevant details for accurate multi-core WCET analysis algorithms. Target platform and programming model are designed to be statically analyzable by multi-core WCET computation tools, that are part of the automated WCET-aware software parallelization tool flow developed in the ARGO project.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP3-14, 1007DESIGN OF A TIME-PREDICTABLE MULTICORE PROCESSOR: THE T-CREST PROJECT
Speaker and Author:
Martin Schoeberl, Technical University of Denmark, DK
Abstract
Real-time systems need to deliver results in time and often this timely production of a result needs to be guaranteed. Static timing analysis can be used to bound the worst-case execution time of tasks. However, this timing analysis is only possible if the processor architecture is analysis friendly. This paper presents the T-CREST processor, a real-time multicore processor developed to be time-predictable and an easy target for static worst-case execution time analysis. We present how to achieve time-predictability at all levels of the architecture, from the processor pipeline, via a network-on-chip, up to the memory controller. The main architectural feature to provide time predictability is to use static arbitration of shared resources in a time-division multiplexing way.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.3 Real time intelligent methods for energy-efficient approaches in CNN and biomedical applications

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 1

Chair:
Theocharis Theocharides, University of Cyprus, CY

Co-Chair:
Jose L. Ayala, Dpto Arquitectura de Computadores - UCM, ES

Mobile devices and wearables require increased integration of technology for real-time applications, in particular in health and transport technology. This enables the possibility to implement machine-learning techniques directly on board. This session will firstly outline applications to detect and predict pathological health conditions, before examining real-time applications in UAVs.

TimeLabelPresentation Title
Authors
17:008.3.1ONLINE EFFICIENT BIO-MEDICAL VIDEO TRANSCODING ON MPSOCS THROUGH CONTENT-AWARE WORKLOAD ALLOCATION
Speaker:
Arman Iranfar, Embedded Systems Lab (ESL), EPFL, CH
Authors:
Arman Iranfar1, Ali Pahlevan1, Marina Zapater1, Martin Žagar2, Mario Kovač2 and David Atienza1
1Embedded Systems Lab (ESL), EPFL, CH; 2University of Zagreb, HR
Abstract
Bio-medical image processing in the field of telemedicine, and in particular the definition of systems that allow medical diagnostics in a collaborative and distributed way is experiencing an undeniable growth. Due to the high quality of bio-medical videos and the subsequent large volumes of data generated, to enable medical diagnosis on-the-go it is imperative to efficiently transcode and stream the stored videos on real time, without quality loss. However, online video transcoding is a high-demanding computationally-intensive task and its efficient management in Multiprocessor Systems-on-Chip (MPSoCs) poses an important challenge. In this work we propose an efficient motion- and texture-aware frame-level parallelization approach to enable online medical imaging transcoding on MPSoCs for next generation video encoders. By exploiting the unique characteristics of bio-medical videos and the medical procedure that enable diagnosis, we split frames into tiles based on their motion and texture, deciding the most adequate level of parallelization. Then, we employ the available encoding parameters to satisfy the required video quality and compression. Moreover, we propose a new fast motion search algorithm for bio-medical videos that allows to drastically reduce the computational complexity of the encoder, thus achieving the frame rates required for online transcoding. Finally, we heuristically allocate the threads to the most appropriate available resources and set the operating frequency of each one. We evaluate our work on an enterprise multi-core server achieving online medical imaging with 1.6x higher throughput and 44% less power consumption when compared to the state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.3.2HIGHLY EFFICIENT AND ACCURATE SEIZURE PREDICTION ON CONSTRAINED IOT DEVICES
Speaker:
Farzad Samie, Karlsruhe Institute of Technology (KIT), DE
Authors:
Farzad Samie, Sebastian Paul, Lars Bauer and Joerg Henkel, Karlsruhe Institute of Technology, DE
Abstract
In this paper we present an efficient and accurate algorithm for epileptic seizure prediction on low-power and portable IoT devices. State-of-the-art algorithms suffer from two issues: computation intensive features and large internal memory requirement, which make them inapplicable for constrained devices. We reduce the memory requirement of our algorithm by reducing the size of data segments (i.e. the window of input stream data on which the processing is performed), and the number of required EEG channels. To respect the limitation of the processing capability, we reduce the complexity of our exploited features by only considering the simple features, which also contributes to reducing the memory requirements. Then, we provide new relevant features to compensate the information loss due to the simplifications (i.e. less number of channels, simpler features, shorter segment, etc.). We measured the energy consumption (12.41 mJ) and execution time (565 ms) for processing each segment (i.e. 5.12 seconds of EEG data) on a low-power MSP432 device. Even though the state-of-art does not fit to IoT devices, we evaluate the classification performance and show that our algorithm achieves the highest AUC score (0.79) for the held-out data and outperforms the state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.3.3A WEARABLE LONG-TERM SINGLE-LEAD ECG PROCESSOR FOR EARLY DETECTION OF CARDIAC ARRHYTHMIA
Speaker:
Muhammad Awais Bin Altaf, Lahore University of Management Sciences (LUMS), PK
Authors:
Syed Muhammad Abubakar, Wala Saadeh and Muhammad Awais Bin Altaf, Lahore University of Management Sciences (LUMS), PK
Abstract
Cardiac arrhythmia (CA) is one of the most serious heart diseases that lead to a very large number of annual casualties around the world. The traditional electrocardiography (ECG) devices usually fail to capture arrhythmia symptoms during patients' hospital visits due to their recurrent nature. This paper presents a wearable long term single-lead ECG processor for the CA detection at an early stage. To achieve on-sensor integration and long-term continuous monitoring, an ultra-low complexity feature extraction engine using reduced feature set of four (RFS4) is proposed. It reduces the area by >25% compared to the conventional QRS complex detection algorithms without compromising the accuracy. Moreover, RFS4 eliminates the need for complex machine learning decision logics for the detection of premature ventricular contraction (PVC) and nonsustained ventricular tachycardia (NVT). To ensure correct functional verification, the proposed system is implemented on FPGA and tested using the MIT-BIH ECG arrhythmia database. It achieves a sensitivity and specificity of 94.64% and 99.41%, respectively. The proposed processor is also synthesized using 0.18um CMOS technology with an overall energy efficiency of 139 nJ/detection.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.3.4DRONET: EFFICIENT CONVOLUTIONAL NEURAL NETWORK DETECTOR FOR REAL-TIME UAV APPLICATIONS
Speaker:
Christos Kyrkou, University of Cyprus, KIOS CoE, CY
Authors:
Christos Kyrkou1, George Plastiras1, Stylianos Venieris2, Theocharis Theocharides1 and Christos Bouganis2
1University of Cyprus, CY; 2Imperial College London, GB
Abstract
Unmanned Aerial Vehicles (drones) are emerging as a promising technology for both environmental and infrastructure monitoring, with broad use in a plethora of applications. Many such applications require the use of computer vision algorithms in order to analyse the information captured from an on-board camera. Such applications include detecting vehicles for emergency response and traffic monitoring. This paper therefore, explores the trade-offs involved in the development of a single-shot object detector based on deep convolutional neural networks (CNNs) that can enable UAVs to perform vehicle detection under a resource constrained environment such as in a UAV. The paper presents a holistic approach for designing such systems; the data collection and training stages, the CNN architecture, and the optimizations necessary to efficiently map such a CNN on a lightweight embedded processing platform suitable for deployment on UAVs. Through the analysis we propose a CNN architecture that is capable of detecting vehicles from aerial UAV images and can operate between 5-18 frames-per-second for a variety of platforms with an overall accuracy of ~95%. Overall, the proposed architecture is suitable for UAV applications, utilizing low-power embedded processors that can be deployed on commercial UAVs.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP3-15, 889ERROR RESILIENCE ANALYSIS FOR SYSTEMATICALLY EMPLOYING APPROXIMATE COMPUTING IN CONVOLUTIONAL NEURAL NETWORKS
Speaker:
Muhammad Abdullah Hanif, Vienna University of Technology, Vienna, AT
Authors:
Muhammad Abdullah Hanif1, Rehan Hafiz2 and Muhammad Shafique1
1TU Wien, AT; 2ITU, PK
Abstract
Approximate computing is an emerging paradigm for error resilient applications as it leverages accuracy loss for improving power, energy, area, and/or performance of an application. The spectrum of error resilient applications includes the domains of Image and video processing, Artificial intelligence (AI) and Machine Learning (ML), data analytics, and other Recognition, Mining, and Synthesis (RMS) applications. In this work, we address one of the most challenging question, i.e., how to systematically employ approximate computing in Convolutional Neural Networks (CNNs), which are one of the most compute-intensive and the pivotal part of AI. Towards this, we propose a methodology to systematically analyze error resilience of deep CNNs and identify parameters that can be exploited for improving performance/efficiency of these networks for inference purposes. We also present a case study for significance-driven classification of filters for different convolutional layers, and propose to prune those having the least significance, and thereby enabling accuracy vs. efficiency tradeoffs by exploiting their resilience characteristics in a systematic way.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.4 Efficient and reliable memory and computing architectures

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 2

Chair:
Göhringer Diana, Technische Universität Dresden, DE

Co-Chair:
Jie Han, University of Alberta, CA

This session covers the exploitation of techniques to improve energy efficiency and resilience in memory and computing architectures. The first paper proposes a method using hybrid vertex-edge memory hierarchy to reduce the energy consumption of resistive random-access memory (ReRAM) systems. This method significantly improves the energy efficiency compared to conventional DRAM-based systems. The second paper examines the use of neural networks to improve resilience. The third paper addresses the performance bottleneck in von Neumann architectures by proposing an efficient algorithm for matrix multiplication in the memory of resistive associative processors (ReAPs). Finally, the last paper covers run-time application mapping on manycore systems for aging and process variations.

TimeLabelPresentation Title
Authors
17:008.4.1(Best Paper Award Candidate)
HYVE: HYBRID VERTEX-EDGE MEMORY HIERARCHY FOR ENERGY-EFFICIENT GRAPH PROCESSING
Speaker:
Tianhao Huang, Tsinghua University, CN
Authors:
Tianhao Huang, Guohao Dai, Yu Wang and Huazhong Yang, Tsinghua University, CN
Abstract
High energy consumption of conventional memory modules (e.g. DRAMs) hinders the further improvement of large-scale graph processing's energy efficiency. The emerging metal-oxide resistive random-access memory (ReRAM) and ReRAM crossbar have shown great potential in providing the energy-efficient memory module. However, the performance of ReRAMs suffers from data access patterns with poor locality and a large amount of written data, which are common in graph processing. In this paper, we propose a Hybrid Vertex-Edge memory hierarchy, HyVE, to avoid random access and data written to ReRAM modules. With data allocation and scheduling over vertices and edges, HyVE reduces memory energy consumption by 69% compared with conventional memory system in graph processing. Moreover, we adopt bank level power-gating scheme to further reduce the stand-by power. Our evaluations show that the optimized design achieve at least 2.0x improvement of energy efficiency against DRAM-based design.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.4.2ACCURATE NEURON RESILIENCE PREDICTION FOR A FLEXIBLE RELIABILITY MANAGEMENT IN NEURAL NETWORK ACCELERATORS
Speaker:
Christoph Schorn, Robert Bosch GmbH, DE
Authors:
Christoph Schorn1, Andre Guntoro1 and Gerd Ascheid2
1Robert Bosch GmbH, DE; 2RWTH Aachen University, DE
Abstract
Deep neural networks have become a ubiquitous tool for mastering complex classification tasks. Current research focuses on the development of power-efficient and fast neural network hardware accelerators for mobile and embedded devices. However, when used in safety-critical applications, for example autonomously operating vehicles, the reliability of such accelerators becomes a further optimization criterion which can stand in contrast to power-efficiency and latency. Furthermore, ensuring hardware reliability becomes increasingly challenging for shrinking structure widths and rising power densities in the nanometer semiconductor technology era. One solution to this challenge is the exploitation of fault tolerant parts in deep neural networks. In this paper we propose a new method for predicting the error resilience of neurons in deep neural networks and show that this method significantly improves upon existing methods in terms of accuracy as well as interpretability. We evaluate prediction accuracy by simulating hardware faults in networks trained on the CIFAR-10 and ILSVRC image classification benchmarks and protecting neurons according to the resilience estimations. In addition, we demonstrate how our resilience prediction can be used for a flexible trade-off between reliability and efficiency in neural network hardware accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.4.3RAPID IN-MEMORY MATRIX MULTIPLICATION USING ASSOCIATIVE PROCESSOR
Speaker:
Hasan Erdem Yantir, University of California Irvine, US
Authors:
Neggaz Mohamed Ayoub1, Hasan Erdem Yantır2, Smail Niar3, Ahmed Eltawil2 and Fadi Kurdahi2
1University of Valenciennes, FR; 2University of California, Irvine, US; 3LAMIH-University of Valenciennes, FR
Abstract
Memory hierarchy latency is one of the main problems that prevents processors from achieving high performance. To eliminate the need of loading/storing large sets of data, Resistive Associative Processors (ReAP) have been proposed as a solution to the von Neumann bottleneck. In ReAPs, logic and memory structures are combined together to allow in-memory computations. In this paper, we propose a new algorithm to compute the matrix multiplication inside the memory that exploits the benefits of ReAP. The proposed approach is based on the Cannon algorithm and uses a series of rotations without duplicating the data. It runs in O(n), where n is the dimension of the matrix. The method also applies to a large set of row by column matrix-based applications. Experimental results show several orders of magnitude increase in performance and reduction in energy and area when compared to the latest FPGA and CPU implementations.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.4.4HIMAP: A HIERARCHICAL MAPPING APPROACH FOR ENHANCING LIFETIME RELIABILITY OF DARK SILICON MANYCORE SYSTEMS
Speaker:
Vivek Chaturvedi, Nanyang Technological University, SG
Authors:
Vijeta Rathore1, Vivek Chaturvedi1, Amit Kumar Singh2, Thambipillai Srikanthan1, Rohith R1, Siew Kei Lam1 and Muhammad Shafique3
1Nanyang Technological University, SG; 2University of Essex, GB; 3TU Wien, AT
Abstract
Technology scaling into the nano-scale CMOS regime has resulted in increased leakage and roadblock on voltage scaling, which has led to several issues like high power density and elevated on-chip temperature. This consequently aggravates device aging, compromising lifetime reliability of the manycore systems. This paper proposes extit{HiMap}, a dynamic hierarchical mapping approach to maximize lifetime reliability of manycore systems while satisfying performance, power, and thermal constraints. HiMap is process variation- and aging-aware. It comprises of two levels: (1) it identifies a region of cores suitable for mapping, and (2) it maps threads in the region and intersperses dark cores for thermal mitigation while considering the current health of the cores. Both the levels strive to reduce aging variance across the chip. We evaluated HiMap for 64-core and 256-core systems. Results demonstrate an improved system lifetime reliability by up to 2 years at the end of 3.25 years of use, as compared to the state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP3-16, 906DEMAS: AN EFFICIENT DESIGN METHODOLOGY FOR BUILDING APPROXIMATE ADDERS FOR FPGA-BASED SYSTEMS
Speaker:
Semeen Rehman, Vienna University of Technology (TU Wien), AT
Authors:
Bharath Srinivas Prabakaran1, Semeen Rehman1, Muhammad Abdullah Hanif1, Salim Ullah2, Ghazal Mazaheri3, Akash Kumar2 and Muhammad Shafique1
1TU Wien, AT; 2Technische Universität Dresden, DE; 3UC Riverside, US
Abstract
The current state-of-the-art approximate adders are mostly ASIC-based, i.e., they focus solely on gate and/or transistor level approximations (e.g., through circuit simplification or truncation) to achieve area, latency, power and/or energy savings at the cost of accuracy loss. However, when these designs are synthesized for FPGA-based systems, they do not offer similar reductions in area, latency and power/energy due to the underlying architectural differences between ASICs and FPGAs. In this paper, we present a novel generic design methodology to synthesize and implement approximate adders for any FPGA-based system by considering the underlying resources and architectural differences. Using our methodology, we have designed, analyzed and presented eight different multi-bit adder architectures. Compared to the 16-bit accurate adder, our designs are successful in achieving area, latency and power-delay product gains of 50%, 38%, and 53%, respectively. We also compare our approximate adders to state-of-the-art approximate adders specialized for ASIC and FPGA fabrics and demonstrate the benefits of our approach. We will make the RTL and behavioral models of our and state-of-the-art designs open-source at https://sourceforge.net/projects/approxfpgas/ to further fuel the research and development in the FPGA community and to ensure reproducible research.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP3-17, 515GAIN SCHEDULED CONTROL FOR NONLINEAR POWER MANAGEMENT IN CMPS
Speaker:
Nikil Dutt, University of California, Irvine, US
Authors:
Bryan Donyanavard, Amir M. Rahmani, Tiago Muck, Kasra Moazzemi and Nikil Dutt, University of California, Irvine, US
Abstract
Dynamic voltage and frequency scaling (DVFS) is a well-established technique for power management of thermal- or energy-sensitive chip multiprocessors (CMPs). In this context, linear control theoretic solutions have been successfully implemented to control the voltage-frequency knobs. However, modern CMPs with a large range of operating frequencies and multiple voltage levels display nonlinear behavior in the relationship between frequency and power. State-of-the-art linear controllers therefore leave room for opportunity in optimizing DVFS operation. We propose a Gain Scheduled Controller (GSC) for nonlinear runtime power management of CMPs that simplifies the controller implementation of systems with varying dynamic properties by utilizing an adaptive control theoretic approach in conjunction with static linear controllers. Our design improves the stability, accuracy, settling time, and overshoot of the controller over a linear controller with minimal overhead. We implement our approach on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that the system's response to changes in target power is improved by 2x while operating up to 12% more efficiently.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP4-1, 376EFFICIENT MAPPING OF QUANTUM CIRCUITS TO THE IBM QX ARCHITECTURES
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner, Alexandru Paler and Robert Wille, Johannes Kepler University Linz, AT
Abstract
In March 2017, IBM launched the project IBM Q with the goal to provide access to quantum computers for a broad audience. This allowed users to conduct quantum experiments on a 5-qubit and, since June 2017, also on a 16-qubit quantum computer (called IBM QX2 and IBM QX3, respectively). In order to use these, the desired quantum functionality (e.g. provided in terms of a quantum circuit) has to properly be mapped so that the underlying physical constraints are satisfied - a complex task. This demands for solutions to automatically and efficiently conduct this mapping process. In this paper, we propose such an approach which satisfies all constraints given by the architecture and, at the same time, aims to keep the overhead in terms of additionally required quantum gates minimal. The proposed approach is generic and can easily be configured for future architectures. Experimental evaluations show that the proposed approach clearly outperforms IBM's own mapping solution with respect to runtime as well as resulting costs.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.5 From NBTI to IoT security: industrial experiences

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 3

Chair:
Doris Keitel-Schulz, Infineon Technologies, DE

Co-Chair:
Norbert Wehn, University of Kaiserslautern, DE

This session covers industrial experiences from technologies such as NBTI mitigation and adaptive volatge scaling to system level aspects including safety-critical applications and IoT security.

TimeLabelPresentation Title
Authors
17:008.5.1NBTI AGED CELL REJUVENATION WITH BACK BIASING AND RESULTING CRITICAL PATH REORDERING FOR DIGITAL CIRCUITS IN 28NM FDSOI
Speaker:
Lorena Anghel, TIMA Labs, FR
Authors:
Ajith Sivadasan1, Riddhi Jitendrakumar Shah2, Vincent Huard3, Florian Cacho3 and Lorena Anghel4
1TIMA Labs & ST Microelectronics, FR; 2TIMA Labs, FR; 3STMicroelectronics, FR; 4Grenoble-Alpes University, FR
Abstract
Abstract— Increasing demands from Autonomous Driving and IoT markets are pushing the need for products with advanced CMOS nodes that guarantee a high level of performance and at the same time having to comply with industrial regulatory standards like ISO26262, AEC-Q100 etc. Implementation of NBTI & DiR Reliability models for 28nm FDSOI, developed in-house, are fundamental means to evaluate the reliability of digital IPs during the design phase. Process, Temperature, Voltage, Workload based Aging are mission profile parameters traditionally taken into account for design margin evaluations and critical path pruning. A precise critical path selection methodology is highly important considering the In-situ monitor Insertion and Critical Path Replica generation strategies to be applied to Runtime Reliability assessment with the vision to move towards dynamic wear out management solutions. This paper recommends the consideration of the silicon technology feature of back biasing as an important parameter while selecting Critical Paths for circuits fabricated with FDSOI process. Back biasing is an add-on feature of this technology with ABB (adaptive back biasing) techniques having been used to compensate for PVT variations or aimed at a gain in the overall digital circuit performance. This technique is now being increasingly applied to aging mitigation. The back-biasing gain for an aged digital IP is quantified while performing design stage Gate Level Analysis yielding interesting insights on its impact on the operational frequency determining critical path rankings. Keywords— NBTI, Back/Body Biasing, Aging, Critical Path, Reliability

Download Paper (PDF; Only available from the DATE venue WiFi)
17:158.5.2AN INDUSTRIAL CASE STUDY OF LOW COST ADAPTIVE VOLTAGE SCALING USING DELAY TEST PATTERNS
Speaker:
Mahroo Zandrahimi, TU Delft, NL
Authors:
Mahroo Zandrahimi1, Philippe Debaud2, Armand Castillejo2 and Zaid Al-Ars1
1Delft University of Technology, NL; 2STMicroelectronics, FR
Abstract
In deep sub-micron technologies, the increasing effect of process and environmental variations has lead chip manufacturers to use adaptive voltage scaling techniques in order to adapt operation parameters exclusively to each chip. The increasing effect of process variation is limiting the effectiveness of current chip monitoring approaches, such as on-chip performance monitor boxes (PMBs), which results in yield loss and high design margins, thus high power consumption. This paper proposes an alternative solution for adaptive voltage scaling using delay test patterns, which is able to eliminate the need for PMBs, and thus the long expensive characterization phase of tuning PMBs to each design, while improving the yield as well as power optimization. Results show, using an industrial grade 28nm FD- SOI library developed for low power devices, that delay testing for performance prediction reduces the inaccuracy down to 1.85%.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.5.3A CASE STUDY FOR USING DYNAMIC PARTITIONING BASED SOLUTION IN VOLUME DIAGNOSIS
Speaker:
Wu Yang, Mentor, A Siemens Business, US
Authors:
Tao Wang1, Zhangchun Shi1, Junlin Huang1, Huaxing Tang2, Wu Yang2 and Junna Zhong3
1Hisilicon Technologies Co., Ltd, CN; 2Mentor, A Siemens Business, US; 3Mentor, A Siemens Business, CN
Abstract
Diagnosis driven yield analysis (DDYA) has been widely adopted for advanced technology node product yield ramp [1]. However gigantic design size and high pattern count demand intense computation resources to diagnose volume failure data, and the diagnosis throughput becomes the bottleneck for the DDYA flow. This paper presents a case study which uses the fully automated dynamic partitioning based diagnosis solution to dramatically improve the throughput. Experimental results based on real silicon data manufactured by a 16nm FinFET technology shows more than 3X reduction for memory footprint and more than 4X improvement for runtime, which eliminates the throughput bottleneck.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:458.5.4ON-LINE RF BUILT-IN SELF-TEST USING NOISE INJECTION AND TRANSMITTER SIGNAL MODULATION BY PHASE SHIFTER
Speaker and Author:
Jan Schat, NXP Semiconductors, DE
Abstract
For on-chip self-test of radar ICs, loopback test using a signal feedback path from transmitter to receiver is state-of-the-art. Usually, such a loopback test is performed periodically after a number of application-mode chirps. The traditional loopback test has two drawbacks, however: It is performed intermittent to the application mode, not within the application mode. Moreover, it cannot detect the case that the attenuation from transmitter to receiver becomes too low due to defects on the IC, or due to targets very near to the antennas. This paper proposes an advanced loopback test not intermittent to the application, but during application mode. That way, spurious defects like transient faults (also known as Single Event Upsets) can be detected; moreover, an error-prone plausibility check of the received signal is avoided. To detect receiver saturation due to near targets, modulating the transmitter output signal using a phase shifter is proposed.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.5.5NEURAL NETWORKS FOR SAFETY-CRITICAL APPLICATIONS - CHALLENGES, EXPERIMENTS AND PERSPECTIVES
Speaker:
Chih-Hong Cheng, fortiss, DE
Authors:
Chih-Hong Cheng1, Frederik Diehl2, Yassine Hamza2, Gereon Hinz2, Georg Nührenberg2, Markus Rickert2, Harald Ruess2 and Michael Troung-Le2
1fortiss - Landesforschungsinstitut des Freistaats Bayern, DE; 2fortiss GmbH, DE
Abstract
We propose a methodology for designing dependable Artificial Neural Networks (ANNs) by extending the concepts of understandability, correctness, and validity that are crucial ingredients in existing certification standards. We apply the concept in a concrete case study for designing a highway ANN-based motion predictor to guarantee safety properties such as impossibility for the ego vehicle to suggest moving to the right lane if there exists another vehicle on its right.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.5.6IOT SECURITY ASSESSMENT THROUGH THE INTERFACES P-SCAN TEST BENCH PLATFORM
Speaker:
Thomas Maurin, CEA, Leti, Univ. Grenoble Alpes, FR
Authors:
Thomas Maurin1, Laurent-Frédéric Ducreux1, George Caraiman2 and Philippe Sissoko2
1CEA Leti, Univ. Grenoble Alpes, FR; 2LCIE Bureau Veritas, FR
Abstract
The recent, massive and always-growing usage of communicating objects exchanging data over interconnected networks makes these objects vulnerable to cyber-attacks.Ranging from mainstream industrial devices to IoT products, the P-SCAN test platform is designed as a convenient solution to democratize connected objects security assessment. Associated to guidelines easing the definition of a device security target, the platform provides a library of test suites which enables automating the process of testing security features on the device's communication interfaces. As technologies evolve, the platform is designed to be scalable and customisable (new interfaces, new standard test suites, specific test cases with respect to new Common Vulnerabilities and Exposures) to detect potential vulnerabilities. This paper explains the identified business needs and market segment, the related value proposition and gives an overview of the provided technical solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.6 Designing reliable embedded architectures under uncertainty

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 4

Chair:
Oliver Bringmann, Universität Tübingen, DE

Co-Chair:
Amit Singh, University of Essex, GB

Reliability is a target which can be reached in some different ways, by using, for instance, fault-tolerant architectures or by exploiting adaptable architecture. The session presents original contributions in both directions.On the first paper, reconfigurable VLIW processors are targeted by means of dynamic binary translation to explore a performance-energy trade-off. The following two papers propose solutions for fault prevention, detection and isolation, without compromising performance. In the last paper, the potential use of approximate, low-power functional units is targeted while remaining within the overall error budget of an application.

TimeLabelPresentation Title
Authors
17:008.6.1SUPPORTING RUNTIME RECONFIGURABLE VLIWS CORES THROUGH DYNAMIC BINARY TRANSLATION
Speaker:
Simon Rokicki, Univ Rennes, INRIA, CNRS, IRISA, FR
Authors:
Simon Rokicki1, Erven Rohou2 and Steven Derrien3
1Irisa, FR; 2Inria, FR; 3University of Rennes 1/IRISA, FR
Abstract
Single ISA Heterogeneous multi-cores such as the ARM big.LITTLE have proven to be an attractive solution to explore different energy/performance trade-offs. Such architectures combine Out of Order cores with smaller in-order ones to offer different power/energy profiles. They however do not really exploit the characteristics of workloads (compute intensive vs control dominated). In this work, we propose to enrich these architectures with runtime configurable VLIW cores, which are very efficient at compute intensive kernels. To preserve the single ISA programming model, we resort to Dynamic Binary Translation, and use this technique to enable dynamic code specialization for runtime reconfigurable VLIW cores. Our proposed DBT framework targets the RISC-V ISA, for which both OoO and in-order implementations exist. Our experimental results show that our approach can lead to best-case performance and energy efficiency when compared against static VLIW configurations.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.6.2USFI: ULTRA-LIGHTWEIGHT SOFTWARE FAULT ISOLATION FOR IOT-CLASS DEVICES
Speaker:
Zelalem Aweke, University of Michigan, US
Authors:
Zelalem Birhanu Aweke and Todd Austin, University of Michigan, US
Abstract
Embedded device security is a particularly difficult challenge, as the quantity of devices makes them attractive tar- gets, while their cost-sensitive design leads to less-than-desirable security implementations. Most current low-end embedded de- vices do not include any form of security or only include simple memory protection support. One line of research in crafting low- cost security for low-end embedded devices has focused on sand- boxing trusted code from untrusted code using both hardware and software techniques. These previous attempts suffer from large trusted code bases (e.g., including the entire kernel), high runtime overheads (e.g., due to code instrumentation), partial protection (e.g., only provide write protection), or heavyweight hardware modifications. In this work, we leverage the rudimentary memory protection support found in modern IoT-class microcontrollers to build a low-profile, low-overhead, flexible sandboxing mechanism that can provide isolation between tightly-coupled software modules. With our approach, named uSFI, only the trust management code must be trusted. Through the use of a static verifier and monitored inter-module transitions, module code at all privilege levels (including the kernel) is able to run uninstrumented and untrusted code. We implemented uSFI on an ARMv7-M based processor, both bare metal and running the freeRTOS kernel, and analyzed the performance using the MiBench embedded benchmark suite and two additional highly detailed applications. We found that performance overheads were minimal, with at most 1.1% slowdown, and code size overheads were also low, at a maximum of 10%. In addition, our trusted code base is trivially small at only 150 lines of code.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.6.3CONVERGING SAFETY AND HIGH-PERFORMANCE DOMAINS: INTEGRATING OPENMP INTO ADA
Speaker:
Sara Royuela, Barcelona Supercomputing Center, ES
Authors:
Sara Royuela1, Eduardo Quinones1 and Luis Miguel Pinho2
1barcelona supercomputing center, ES; 2polytechnic institute of porto, PT
Abstract
The use of parallel heterogeneous embedded architectures is needed to implement the level of performance required in advanced safety-critical systems. Hence, there is a demand for using high level parallel programming models capable of efficiently exploiting the performance opportunities. In this paper, we evaluate the incorporation of OpenMP, a parallel programming model used in HPC, into Ada, a language spread in safety-critical domains. We demonstrate that the execution model of OpenMP is compatible with the recently proposed Ada tasklet model, meant to exploit fine-grain structured parallelism. Moreover, we show the compatibility of the OpenMP and tasklet models, enabling the use of OpenMP directives in Ada to further exploit unstructured parallelism and heterogeneous computation. Finally, we state the safety properties of OpenMP and analyze the interoperability between the OpenMP and Ada runtimes. Overall, we conclude that OpenMP can be effectively incorporated into Ada without jeopardizing its safety properties.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:158.6.4COMPILER-DRIVEN ERROR ANALYSIS FOR DESIGNING APPROXIMATE ACCELERATORS
Speaker:
Jorge Castro-Godínez, Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), DE
Authors:
Jorge Castro-Godinez1, Sven Esser1, Muhammad Shafique2, Santiago Pagani1 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2TU Wien, AT
Abstract
Approximate Computing has emerged as a design paradigm suitable to applications with inherent error resilience. This paradigm aims to reduce the associated computing costs (such as execution time, area, or energy) of exact calculations by reducing the quality of their results. Several approximate arithmetic circuits have been proposed, which can be used to implement hardware blocks such as approximate accelerators. However, to satisfy quality constraints in these accelerators, it is imperative to assess how the errors introduced by approximate circuits propagate through other exact and approximate computations, and finally accumulate at the output. This is, in particular, crucial to enable high-level synthesis of approximate accelerators. This work proposes a compiler-driven error analysis methodology to evaluate the behavior of errors generated from approximate adders in the design of approximate accelerators. We present CEDA, a tool to perform a static analysis of the error propagation. This tool uses #pragma-based annotated C/C++ source code as input. With these annotations, exact additions are replaced by approximate ones during the code analysis to estimate the error at the output. The error estimations produced by our tool are comparable to those obtained through simulations.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP4-2, 105PARALLEL CODE GENERATION OF SYNCHRONOUS PROGRAMS FOR A MANY-CORE ARCHITECTURE
Speaker:
Amaury Graillat, Verimag - Univ. Grenoble Alpes, FR
Authors:
Amaury Graillat1, Matthieu Moy2, Pascal Raymond3 and Benoît Dupont de Dinechin4
1Verimag - Univ. Grenoble Alpes, FR; 2Univ. Grenoble Alpes, Verimag, FR; 3VERIMAG/CNRS, FR; 4Kalray, FR
Abstract
AmEmbedded systems tend to require more and more computational power. Many-core architectures are good candi- dates since they offer power and are considered more time predictable than classical multi-cores. Data-flow Synchronous languages such as Lustre or Scade are widely used for avionic critical software. Programs are described by networks of computational nodes. Implementation of such programs on a many-core architecture must ensure a bounded response time and preserve the functional behavior by taking interference into account. We consider the top-level node of a Lustre application as a software architecture description where each sub-node corresponds to a potential parallel task. Given a mapping (tasks to cores), we automatically generate code suitable for the targeted many-core architecture. This minimizes memory interferences and allows usage of a framework to compute the Worst-Case Response Time.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:32IP4-3, 272SOCRATES - A SEAMLESS ONLINE COMPILER AND SYSTEM RUNTIME AUTOTUNING FRAMEWORK FOR ENERGY-AWARE APPLICATIONS
Speaker:
Gianluca Palermo, Politecnico di Milano, IT
Authors:
Davide Gadioli1, Ricardo Nobre2, Pedro Pinto3, Emanuele Vitali1, Amir H. Ashouri4, Gianluca Palermo1, Cristina Silvano1 and João M. P. Cardoso5
1Politecnico di Milano, IT; 2University of Porto / INESC TEC, PT; 3Faculty of Engineering, University of Porto, PT; 4University of Toronto, Canada, CA; 5University of Porto, PT
Abstract
Configuring program parallelism and selecting optimal compiler options according to the underlying platform architecture is a difficult task if completely demanded to the programmer or done by using a default one-fits-all policy generated by the compiler or runtime system. Given the dynamics of the problem, a runtime selection of the best configuration is obviously the desirable solution. However, implementing this solution into the application requires the insertion of a lot of glue code for profiling and runtime selection. This represents a programming wall to actually make it feasible. This paper presents a structured approach called SOCRATES, based on a Domain Specific Language (LARA) and a runtime autotuner (mARGOt), to alleviate this effort. LARA has been used to hide the glue code insertion, thus separating the pure functional application description from extra-functional requirements. mARGOT has been used for the automatic selection of the best configuration according to the runtime evolution of the application. To demonstrated the effectiveness of the proposed approach, we evaluated SOCRATES by varying the application workloads, hardware resources and energy efficiency requirements for 12 OpenMP Polybench/C with respect to a standard one-fits-all solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:33IP4-4, 377NON-INTRUSIVE PROGRAM TRACING OF NON-PREEMPTIVE MULTITASKING SYSTEMS USING POWER CONSUMPTION
Speaker:
Kamal Lamichhane, University of Waterloo, CA
Authors:
Kamal Lamichhane, Carlos Moreno and Sebastian Fischmeister, University of Waterloo, CA
Abstract
System tracing, runtime monitoring, execution reconstruction are useful techniques for protecting the safety and integrity of systems. Furthermore, with time-aware or overhead-aware techniques being available, these techniques can also be used to monitor and secure production systems. As operating systems gain in popularity, even in deeply embedded systems, these techniques face the challenge to support multitasking. In this paper, we propose a novel non-intrusive technique, which efficiently reconstructs the execution trace of non-preemptive multitasking system by observing power consumption characteristics. Our technique uses the control-flow graph (CFG) of the application program to identify the most likely block of code that the system is executing at any given point in time. For the purpose of the experimental evaluation, we first instrument the source code to obtain power consumption information for each basic block, which is used as the training data for our Dynamic Time Warping and k-Nearest Neighbours (k-NN) classifier. Once the system is trained, this technique is used to identify live code-block execution (LCBE). We show that the technique can reconstruct the execution flow of programs in a multi-tasking environment with high accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session

8.8 22FDX - the superior technology for IoT, RF, Automotive and Mobility: Best-in Class RF, 5G and mmWave designs

Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Exhibition Theatre

Organiser:
Claudia Kretzschmar, GLOBALFOUNDRIES, DE

This session focusses on the RF and mmWave capabilities of 22FDX where the technology has a great advantage over bulk or FinFET technologies: 22FDX is the best choice for any application where lowest analog and RF/mmWave circuit power consumption is desired. It offers a high peak frequency performance (Ft, Fmax), enables great integration of PA due to a high stacking efficiency and has one of the best CMOS switch behaviors due to low Ron and better Ron*Coff than FinFET or bulk technologies and a low-loss BEOL.

22FDX combines the best of RF performance of SiGe and PDSOI into one process technology, giving designers the opportunity to design best in class switches, LNA's and PA's onto a single die, integrated with transceiver and digital baseband

The first talk of this session will present RF integrated circuits for multi-Gbps communication and provide two examples which are best in class regarding frequency of operation, broadest frequency band, energy efficiency and area.

The second presentation highlights the strengths of 22FDX regarding noise and linearity demonstrating smart mixed-signal calibration techniques in order to meet the performance targets at minimum power consumption and smallest silicon area.

Another outstanding application are N-Path Filters for "Software Defined Radios" which are being addressed in the third talk. N-path filters benefit from CMOS scaling as switch parasitics improve, and increasingly higher digital clock frequencies are feasible.

The fourth presentation concludes this session describing a mmWave circuit design example on how to utilize the 22FDX features.

TimeLabelPresentation Title
Authors
17:008.8.1BEST-IN CLASS RF INTEGRATED CIRCUITS FOR MULTI-GBPS COMMUNICATION IN 22FDX
Speaker:
Corrado Carta, Technical University Dresden, DE
Abstract

This talk presents an overview of the ongoing circuit design activities in cooperation with GLOBALFOUNDRIES. Several RF integrated circuits for multi-Gbps communication have been demonstrated in 22FDX. Two examples will be presented in details: a Travelling Wave Amplifier (TWA) and a Mach-Zehnder Modulator (MZM) driver.

The gain cell employed for the TWAs is based on a conventional cascode topology and its layout is optimized to avoid performance degradation and instability. The TWA delivers a maximum gain of 10 dB over a 3-dB band of 110 GHz. The measured output power at 1 dB compression of the gain (o1dBcp) is 12.5 dBm at 20 GHz. Compared against the state of the art for TWAs, the presented design achieves the highest frequency of operation, as well as the broadest frequency band, and the highest o1dBcp. This circuit was awarded the best student paper award at the 2017 IEEE Asia Pacific Microwave Conference (APMC).

The high-voltage MZM driver is realized with stacked, low-impedance and low-loss switches which allow a voltage swing significantly larger than the breakdown voltage of individual devices. To enable this topology the signal is transferred to different voltage domains with level shifters, then amplified and conditioned to the rail-to-rail swing which ultimately drives the output power-stage. The bandwidth of the driver is increased with shunt and series inductive peaking. Stacking and coupling inductors result in only 0.018 mm2 of active area, which is the most compact reported for MZM drivers. Measurements show a 3.75 Vpp swing on a 50 Ω load. Error-free (BER< 10 −12) transmission at 30 Gbit/s is demonstrated. By using a switched output stage, only a small static dc-power is consumed, resulting in the most energy-efficient MZM driver with only 2.2 pJ/bit.

17:208.8.2SMART DATA CONVERTERS FOR WIRELINE AND WIRELESS SYSTEMS USING 22FDX
Speaker:
Friedel Gerfers, Technical University Berlin, DE
Abstract

High-performance, high-precision, energy-efficient data converters are indispensable in mixed-signal and RF ICs that enable next generation wireless, mobile computing, automotive, medical, and IoT applications. These applications demand the development of data conversion including mixed-signal calibration techniques that emphasize performance, accuracy, robustness, and energy efficiency with reduced silicon area / cost.

In this presentation, we will discuss the latest ADC and DAC architectures and their respective design challenges required for 5G RF systems, optical communication and automotive systems using 22FDX. The stringent performance requirements in terms of noise and linearity demand smart mixed-signal calibration techniques in order to meet the performance targets at minimum power consumption and smallest silicon area. In addition we highlight technological benefits of 22FDX, which enable power savings on both system and circuit level.

17:408.8.3N-PATH FILTERS AND MIXERS CONTROLLABLE BY A DIGITAL MULTI-PHASE CLOCK
Speaker:
Eric Klumperink, University of Twente, NL
Abstract

FDSOI technology offers both power-efficient and high-performance digital, amongst others because SOI-MOSFET switches have lower parasitic capacitances compared to bulk-technologies. These benefits are not only relevant for digital signal processing, but can also benefit analog radio frequency circuits. For "Software Defined Radios", very selective Radio Frequency bandpass-filters are wanted with a flexibly programmable center frequency to choose the channel. Also, highly linear mixers for frequency down-conversion to baseband before A/D conversion are needed. It turns out that these functions can both be implemented exploiting switches, combined with linear capacitors and resistors, realizing so called "N-path filters" or "Frequency Translated filters". Moreover, the reception frequency is defined by the frequency of a digital clock, which can be implemented using digital dividers and logic. The resulting N-path filters benefit from CMOS scaling as switch parasitics improve, and increasingly higher digital clock frequencies are feasible. This contribution will review the developments in CMOS N-path filters over the last decade, highlighting promising achieved results, while also discussing some implementation aspects and simulation results in 22nm FDSOI.

18:008.8.4MM-WAVE CIRCUIT DESIGN USING GLOBALFOUNDRIES 22FDX
Speaker:
Janne Aikio, University of Oulu, FI
Abstract

Co-authors: Janne P. Aikio1 , Mikko Hietanen2 , Henri Hurskainen2 , Timo Rahkonen1 , Aarno Pärssinen2
1. Circuits and systems Research unit, University of Oulu; 2. Center for Wireless Communication - Radio Technologies, University of Oulu

Design complexity is increasing in all aspects when moving towards 5G wireless systems. Enhanced mobile broadband (eMBB) communications require increased bandwidth and thus also higher carrier frequencies starting from lower mmW regime. Solutions call for increased speed of transistors and better passives that are not easily available in bulk CMOS technologies. 22FDX has many potential features to support complete transceiver implementation from mmW antenna interface down to baseband processing.

22FDX provides several enhancements such as wide variety of devices: fast devices for PA and LNA circuits and slower devices for static switches, for example. Another interesting feature is the back-gating option, which we used to decrease the knee voltage of the transistors of a divider circuit. The circuit library is extensive and devices for mm-wave application contain modelling and layout up to fifth metal layer of the stack. This approach simplifies routing and reduces the extraction and simulation time significantly.

We will provide design experience from the first time access to this technology and how we used different modelling approaches precharacterized from library cells to EM simulations with Momentum, and designed test structures including active and passive devices for verification as well as complete amplifiers (LNA & PA) at 28 GHz based mostly on standard cells provided by foundry.

By the time of writing the chip fabrication is still on-going and thus we cannot guarantee if initial measurement results would be available by the time of the conference.

18:30End of session

DATE-Party DATE Party | Networking Event; A bus shuttle from the congress centre to the Hygiene-Museum Dresden will be organized, starting at 1900 from the main entrance of the ICC Dresden.

Date: Wednesday 21 March 2018
Time: 19:30 - 23:00
Location / Room: Deutsches Hygiene-Museum Dresden

The highlight of the DATE week will again be the DATE Party, which offers the perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. Thus, it states one of the main networking opportunities during the DATE week.

The party is scheduled on March 21, 2018, from 1900 to 2300, and will take place in the Deutsches Hygiene-Museum Dresden.

During the evening, all delegates will have the chance to visit the different expositions for free.

All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: EUR 70.00 per person.

TimeLabelPresentation Title
Authors
23:00End of session

9.1 Special Day Session on Designing Autonomous Systems: Embedded Machine Learning

Date: Thursday 22 March 2018
Time: 08:30 - 10:00
Location / Room: Saal 2

Chair:
Ahmed Jerraya, CEA, FR

Autonomous systems will use machine learning techniques to deal with uncertainty and to accomplish intelligent tasks. In the case of cyberphysical systems, machine learning allow to solve complex tasks however it requires novel hardware and software solutions to achieve the required performances. This session introduces the knowledge chain for embedded machine learning. It includes talks on application needs, existing approaches and design tools.

TimeLabelPresentation Title
Authors
08:309.1.1AUTOMOTIVE APPLICATION REQUIREMENTS FOR EMBEDDED MACHINE LEARNING
Speaker and Author:
Dirk Ziegenbein, Robert Bosch GmbH, DE
Abstract
Machine learning technologies such as neural networks show great potential for enabling automated driving functions in an open world context. Concrete application examples are video-based object detection and classification as well as trajectory planning. However, these technologies can only be released for series production if they can be demonstrated to be sufficiently safe. Since existing functional safety standards such as ISO 26262 do not transfer well to demonstrating the safety of machine learning and in particular regarding functional insufficiencies in open context systems, convincing arguments need to be developed for the safety of automated driving systems based on machine learning technologies. The talk introduces a categorization of machine learning approaches and use-cases in the automotive domain and outlines requirements and approaches for bringing machine learning based systems to the road.
09:009.1.2OVERVIEW OF THE STATE OF THE ART IN EMBEDDED MACHINE LEARNING
Speaker:
Frédéric Pétrot, TIMA Lab, Univ. Grenoble Alpes, FR
Authors:
Liliana Andrade, Adrien Prost-Boucle and Frédéric Pétrot, Univ. Grenoble Alpes, CNRS, Grenoble INP, FR
Abstract
Nowadays, the main challenges in embedded machine learning are related to artificial neural networks. Inspired by the biological neural networks, artificial neural networks are able to solve complex problems, by performing a tremendous amount of relatively simple parallel computations. Embedding such networks in autonomous devices raises the issues of energy efficiency, resource usage and accuracy. The aim of this paper is to provide a comprehensive analysis of the efforts made in recent years to implement artificial neural network architectures suitable for embedded applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.1.3PNEURO: A SCALABLE ENERGY-EFFICIENT PROGRAMMABLE HARDWARE ACCELERATOR FOR NEURAL NETWORKS
Speaker:
Nicolas Ventroux, CEA LIST, FR
Authors:
Alexandre Carbon1, Jean-Marc PHILIPPE1, Olivier Bichler1, Renaud Schmit1, David Briand1, Benoit Tain1, Nicolas Ventroux1, Michel Paindavoine2 and Olivier Brousse2
1CEA LIST, FR; 2GlobalSensing Technologies, FR
Abstract
Artificial intelligence and especially Machine Learning recently gained a lot of interest from the industry. Indeed, new generation of neural networks built with a large number of successive computing layers enables a large amount of new applications and services implemented from smart sensors to data centers. These Deep Neural Networks (DNN) can interpret signals to recognize objects or situations to drive decision processes. However, their integration into embedded systems remains challenging due to their high computing needs. This paper presents PNeuro, a scalable energy-efficient hardware accelerator for the inference phase of DNN processing chains. Simple programmable processing elements architectured in SIMD clusters perform all the operations needed by DNN (convolutions, pooling, non-linear functions, etc.). An FDSOI 28nm prototype shows an energy efficiency of 700GMACS/s/W at 800 MHz. These results open important perspectives regarding the development of smart energy-efficient solutions based on Deep Neural Networks.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

9.2 Emerging architectures and technologies for ultra low power and efficient embedded systems

Date: Thursday 22 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 6

Chair:
Johanna Sepulveda, Technical University of Munich, DE

Co-Chair:
Amato Paolo, Micron Technology, IT

New waves of architectures and technologies are emerging with the potential of bringing high efficiency and ultra low power in future embedded systems. This session on one hand focuses on the datapath with advances in neural networks, deep learning, and mix-precision accelerators. On the other hand it presents new technologies for non-volatile memories, and bus coding techniques for volatile memories

TimeLabelPresentation Title
Authors
08:309.2.1(Best Paper Award Candidate)
FFT-BASED DEEP LEARNING DEPLOYMENT IN EMBEDDED SYSTEMS
Speaker:
Sheng Lin, Syracuse University, US
Authors:
Sheng Lin1, Ning Liu1, Mahdi Nazemi2, Hongjia Li1, Caiwen Ding1, Yanzhi Wang3 and Massoud Pedram3
1Syracuse University, US; 2USC, US; 3University of Southern California, US
Abstract
Deep learning has delivered its powerfulness in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to thousands of neurons. Embedded platforms are now becoming essential for deep learning deployment due to their portability, versatility, and energy efficiency. The large model size of DNNs, while providing excellent accuracy, also burdens the embedded platforms with intensive computation and storage. Researchers have investigated on reducing DNN model size with negligible accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN training and inference model suitable for embedded platforms with reduced asymptotic complexity of both computation and storage, making our approach distinguished from existing approaches. We develop the training and inference algorithms based on FFT as the computing kernel and deploy the FFT-based inference model on embedded platforms with extraordinary processing speed.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.2.2A TRANSPRECISION FLOATING-POINT PLATFORM FOR ULTRA-LOW POWER COMPUTING
Speaker:
Giuseppe Tagliavini, Università di Bologna, IT
Authors:
Giuseppe Tagliavini1, Stefan Mach2, Davide Rossi1, Andrea Marongiu2 and Luca Benini1
1Università di Bologna, IT; 2IIS, ETH Zurich, CH
Abstract
In modern low-power embedded platforms, the execution of floating-point (FP) operations emerges as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy consumed by a core and its data memory is related to FP computations. The adoption of FP formats requiring a lower number of bits is an interesting opportunity to reduce energy consumption, since it allows to simplify the arithmetic circuitry and to reduce the memory bandwidth required to transfer data between memory and registers by enabling vectorization. From a theoretical point of view, the adoption of multiple FP types perfectly fits with the principle of transprecision computing, allowing fine-grained control of approximation while meeting specified constraints on the precision of final results. In this paper we propose an extended FP type system with complete hardware support to enable transprecision computing on low-power embedded processors, including two standard formats (binary32 and binary16) and two new formats (binary8 and binary16alt). First, we introduce a software library that enables exploration of FP types by tuning both precision and dynamic range of program variables. Then, we present a methodology to integrate our library with an external tool for precision tuning, and experimental results that highlight the clear benefits of introducing the new formats. Finally, we present the design of a transprecision FP unit capable of handling 8-bit and 16-bit operations in addition to standard 32-bit operations. Experimental results on FP-intensive benchmarks show that up to 90% of FP operations can be safely scaled down to 8-bit or 16-bit formats. Thanks to precision tuning and vectorization, execution time is decreased by 12% and memory accesses are reduced by 27% on average, leading to a reduction of energy consumption up to 30%.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.2.3A PERIPHERAL CIRCUIT REUSE STRUCTURE INTEGRATED WITH A RETIMED DATA FLOW FOR LOW POWER RRAM CROSSBAR-BASED CNN
Speaker:
Keni Qiu, Capital Normal University, CN
Authors:
Keni Qiu1, Weiwen Chen1, Yuanchao Xu1, Lixue Xia2, Yu Wang2 and Zili Shao3
1Capital Normal University, CN; 2Tsinghua University, CN; 3The Hong Kong Polytechnic University, HK
Abstract
Convolutional computations implemented in RRAM crossbar-based Computing System (RCS) demonstrate the outstanding advantages of high performance and low power. However, current designs are energy-unbalanced among the three parts of RRAM crossbar computation, peripheral circuits and memory accesses, and the latter two factors can significantly limit the potential gains of RCS. Addressing the problem of high power overhead of peripheral circuits in RCS, this paper proposes a Peripheral Circuit Unit (PeriCU)-Reuse scheme to meet power budgets in energy constrained embedded systems. The underlying idea is to put the expensive ADCs/DACs onto spotlight and arrange multiple convolution layers to be sequentially served by the same PeriCU. In the solution, the first step is to determine the number of PeriCUs which are organized by cycle frames. Inside a cycle frame, the layers are computed in parallel inter-PeriCUs while sequentially intra-PeriCU. Furthermore, a layer retiming technique is exploited to further improve the energy of RCS by assigning two adjacent layers within the same PeriCU so as to bypass the energy consuming memory accesses. The experiments of five convolutional applications validate that the PeriCU-Reuse scheme integrated with the retiming technique can efficiently meet variable power budgets, and further reduce energy consumption efficiently.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:459.2.4OPTIMAL DC/AC DATA BUS INVERSION CODING
Speaker:
Jan Lucas, TU Berlin, DE
Authors:
Jan Lucas, Sohan Lal and Ben Juurlink, TU Berlin, DE
Abstract
GDDR5 and DDR4 memories use data bus inversion (DBI) coding to reduce termination power and decrease the number of output transitions. Two main strategies exist for encoding data using DBI: DBI DC minimizes the number of outputs transmitting a zero, while DBI AC minimizes the number of signal transitions. We show that neither of these strategies is optimal and reduction of interface power of up to 6% can be achieved by taking both the number of zeros and the number of signal transitions into account when encoding the data. We then demonstrate that a hardware implementation of optimal DBI coding is feasible, results in a reduction of system power and requires only an insignificant additional die area.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP4-5, 778ENERGY-PERFORMANCE DESIGN EXPLORATION OF A LOW-POWER MICROPROGRAMMED DEEP-LEARNING ACCELERATOR
Speaker:
Andrea Calimera, Politecnico di Torino, IT
Authors:
Andrea Calimera1, Mario R. Casu2, Giulia Santoro1, Valentino Peluso1 and Massimo Alioto3
1Politecnico di Torino, IT; 2Politecnico di Torino, Department of Electronics and Telecommunications, IT; 3National University of Singapore, SG
Abstract
This paper presents the design space exploration of a novel microprogrammable accelerator in which PEs are connected with a Network-on-Chip and benefit from low-power features enabled through a practical implementation of a Dual- Vdd assignment scheme. An analytical model, fitted with postlayout data obtained with a 28nm FDSOI design kit, returns implementations with optimal energy-performance tradeoff by taking into consideration all the key design-space variables. The obtained Pareto analysis helps us infer optimization rules aimed at improving quality of design.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP4-6, 178GENPIM: GENERALIZED PROCESSING IN-MEMORY TO ACCELERATE DATA INTENSIVE APPLICATIONS
Speaker:
Tajana Rosing, UC San Diego, US
Authors:
Mohsen Imani, Saransh Gupta and Tajana Rosing, University of California, San Diego, US
Abstract
Big data has become a serious problem as data volumes have been skyrocketing for the past few years. Storage and CPU technologies are overwhelmed by the amount of data they have to handle. Traditional computer architectures show poor performance which processing such huge data. Processing in-memory is a promising technique to address data movement issue by locally processing data inside memory. However, there are two main issues with stand-alone PIM designs: (i) PIM is not always computationally faster than CMOS logic, (ii) PIM cannot process all operations in many applications. Thus, not many applications can benefit from PIM. To generalize the use of PIM, we designed GenPIM, a general processing in-memory architecture consisting of the conventional processor as well as the PIM accelerators. GenPIM supports basic PIM functionalities in specialized non-volatile memory including: bitwise operations, search operation, addition and multiplication. For each application, GenPIM identifies the part which uses PIM operations, and processes the rest of non-PIM operations or not data intensive part of applications in general purpose cores. GenPIM also enables configurable PIM approximation by relaxing in-memory computation. We test the efficiency of proposed design over different emerging machine learning, compression and security applications. Our experimental evaluation shows that our design can achieve 10.9x improvement in energy efficiency and 6.4x speedup as compared to processing data in conventional cores. The results can be improved by 21.0% in energy consumption and 30.6% in performance by enabling PIM approximation while ensuring less than 2% quality loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

9.3 Advances in Reconfigurable Computing

Date: Thursday 22 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 1

Chair:
Jürgen Teich, Friedrich-Alexander Universität, DE

Co-Chair:
Florent de Dinechin, INSA-Lyon, FR

This session presents four papers advancing the current state of the art in Coarse Grain Reconfigurable Architectures and two interactive presentations dealing with posit arithmetic and convolution neural networks.

TimeLabelPresentation Title
Authors
08:309.3.1LASER: A HARDWARE/SOFTWARE APPROACH TO ACCELERATE COMPLICATED LOOPS ON CGRAS
Speaker:
Shail Dave, Arizona State University, US
Authors:
Mahesh Balasubramanian1, Shail Dave1, Aviral Shrivastava1 and Reiley Jeyapaul2
1Arizona State University, US; 2ARM Research, GB
Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are popular accelerators predominantly used in streaming, filtering, and decoding applications. Due to their high performance and high power-efficiency, CGRAs can be a promising solution to accelerate the loops of general purpose applications also. However, the loops in general purpose applications are often complicated, like loops with perfect and imperfect nests and loops with nested if-then-else's (conditionals). We argue that the existing hardware-software solutions to execute branches and conditions are inefficient. In order to efficiently execute complicated loops on CGRAs, we present a hardware-software hybrid solution: LASER -- a comprehensive technique to accelerate compute-intensive loops of applications. In LASER, compiler transforms complex loops, maps them to the CGRA, and lays them out in the memory in a specific manner, such that the hardware can fetch and execute the instructions from the right path at runtime. LASER achieves a geomean performance improvement of 40.91% and utilization of 43.43% with 46% lower energy consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.3.2A TIME-MULTIPLEXED FPGA OVERLAY WITH LINEAR INTERCONNECT
Speaker:
Xiangwei Li, Nanyang Technological University, SG
Authors:
Xiangwei Li1, Abhishek Kumar Jain2, Douglas L. Maskell1 and Suhaib A. Fahmy3
1Nanyang Technological University, SG; 2Lawrence Livermore National Laboratory, US; 3University of Warwick, GB
Abstract
Coarse-grained overlays improve FPGA design productivity by providing fast compilation and software like programmability. Soft processor based overlays with well-defined ISAs are attractive to application developers due to their ease of use. However, these overlays have significant FPGA resource overheads. Time multiplexed (TM) CGRA-like overlays represent an interesting alternative as they are able to change their behavior on a cycle by cycle basis while the compute kernel executes. This reduces the FPGA resource needed, but at the cost of a higher initiation interval (II) and hence reduced throughput. The fully flexible routing network of current CGRA-like overlays results in high FPGA resource usage. However, many application kernels are acyclic and can be implemented using a much simpler linear feed-forward routing network. This paper examines a DSP block based TM overlay with linear interconnect where the overlay architecture takes account of the application kernels' characteristics and the underlying FPGA architecture, so as to minimize the II and the FPGA resource usage. We examine a number of architectural extensions to the DSP block based functional unit to improve the II, throughput and latency. The results show an average 70% reduction in II, with corresponding improvements in throughput and latency.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.3.3URECA: A COMPILER SOLUTION TO MANAGE UNIFIED REGISTER FILE FOR CGRAS
Speaker:
Shail Dave, Arizona State University, US
Authors:
Shail Dave, Mahesh Balasubramanian and Aviral Shrivastava, Arizona State University, US
Abstract
Coarse-grained reconfigurable array (CGRA) is a promising solution to accelerate loops featuring loop-carried dependencies or low trip-counts. One challenge in compiling for CGRAs is to efficiently manage both recurring (repeatedly written and read) and nonrecurring (read-only) variables of loops. Although prior works manage recurring variables in rotating register file (RF), they access the nonrecurring variables through the on-chip memory. It increases memory accesses and degrades the performance. Alternatively, both the variables can be managed in separate rotating and nonrotating RFs. But, it increases code size and effective utilization of the registers becomes challenging. Instead, this paper proposes URECA, a compiler solution to manage the variables in a single nonrotating RF. During mapping loop operations on CGRA, the compiler allocates necessary registers and splits RF in rotating and nonrotating parts. It also pre-loads read-only values in the unified RF, which are then directly accessed at run-time. Evaluating compute-intensive benchmarks from MiBench show that URECA provides a geomean speedup of 11.41x over sequential loop execution. It improves the loop acceleration through CGRAs by 1.74x at 32% reduced energy consumption over state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:459.3.4OPTIMIZING THE DATA PLACEMENT AND TRANSFORMATION FOR MULTI-BANK CGRA COMPUTING SYSTEM.
Speaker:
Zhongyuan Zhao, Shanghai Jiao Tong University, CN
Authors:
Zhongyuan Zhao1, Yantao Liu1, Weiguang Sheng1, Tushar Krishna2, Qin Wang1 and Zhigang Mao1
1Shanghai Jiao Tong University, CN; 2Georgia Institute of Technology, US
Abstract
This paper provides a data placement optimization approach for Coarse-Grained Reconfigurable Architecture (CGRA) based computing platform in order to simultaneously optimize the performance of CGRA execution and data transformation between main memory and multi-bank memory. To achieve this goal, we have developed a performance model to evaluate the efficiency of data transformation and CGRA execution. This model is used for comparing the performances difference when using different data placement strategies. We search for the optimal data placement method by firstly choosing the method which generates the best CGRA execution efficiency from the candidates who can generate the optimal data transformation efficiency. Then we choose the best data placement strategy by comparing the performance of the selected strategy with the one generated through existing multi-bank optimization algorithm. Evaluation shows our approach is capable of optimizing the performance to 2.76x of state-of-the-art method when considering both data-transformation and CGRA execution efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP4-7, 120UNIVERSAL NUMBER POSIT ARITHMETIC GENERATOR ON FPGA
Speaker:
Hayden K.-H. So, The University of Hong Kong, HK
Authors:
Manish Kumar Jaiswal and Hayden So, The University of Hong Kong, HK
Abstract
Posit number system format includes a run-time varying exponent component, defined by a combination of regime-bit (with run-time varying length) and exponent-bit (with size of up to ES bits, the exponent size). This also leads to a run-time variation in its mantissa field size and position. This run-time variation in posit format poses a hardware design challenge. Being a recent development, posit lacks for its adequate hardware arithmetic architectures. Thus, this paper is aimed towards the posit arithmetic algorithmic development and their generic hardware generator. It is focused on basic posit arithmetic (floating-point to posit conversion, posit to floating point con- version, addition/subtraction and multiplication). These are also demonstrated on a FPGA platform. Target is to develop an open- source solution for generating basic posit arithmetic architectures with parameterized choices. This contribution would enable further exploration and evaluation of posit system.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP4-8, 347BLOCK CONVOLUTION: TOWARDS MEMORY-EFFICIENT INFERENCE OF LARGE-SCALE CNNS ON FPGA
Speaker:
Gang Li, Institute of Automation, Chinese Academy of Sciences, CN
Authors:
Gang Li, Fanrong Li, Tianli Zhao and Jian Cheng, Institute of Automation, Chinese Academy of Sciences, CN
Abstract
FPGA-based CNN accelerators are gaining popularity due to high energy efficiency and great flexibility in recent years. However, as the networks grow in depth and width, the great volume of intermediate data is too large to store on chip, data transfers between on-chip memory and off-chip memory should be frequently executed, which leads to unexpected off-chip memory access latency and energy consumption. In this paper, we propose a block convolution approach, which is a memory-efficient, simple yet effective block-based convolution to completely avoid intermediate data from streaming out to off-chip memory during network inference. Experiments on the very large VGG-16 network show that the improved top-1/top-5 accuracy of 72.60%/91.10% can be achieved on the ImageNet classification task with the proposed approach. As a case study, we implement the VGG-16 network with block convolution on Xilinx Zynq ZC706 board, achieving a frame rate of 12.19fps under 150MHz working frequency, with all intermediate data staying on chip.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

9.4 EU projects: Novel Platforms - from Self-Aware MPSoCs to Server Ecosystems

Date: Thursday 22 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 2

Chair:
Martin Schoeberl, Technical University of Denmark, DK

Co-Chair:
Flavius Gruian, Lund University, SE

This session presents three EU projects. The EU projects are: dReDBox—developing the next generation, low-power, across form-factor datacenters, enabling the creation of A338:AN521-as-a-unit, UniServer—developing a universal system architecture and software ecosystem for servers targeting cloud data-centers as well as upcoming edge-computing markets and OPRECOMP—developing concepts, methods, hardware and software building blocks for practical transprecision computing systems.

TimeLabelPresentation Title
Authors
08:309.4.1DREDBOX: MATERIALIZING A FULL-STACK RACK-SCALE SYSTEM PROTOTYPE OF A NEXT-GENERATION DISAGGREGATED DATACENTER
Speaker:
Dimitris Syrivelis, IBM Research, Ireland, GR
Authors:
Maciel Bielski1, Ilias Syrirgos2, Kostas Katrinis3, Dimitris Syrivelis3, Andrea Reale3, Dimitris Theodoropoulos4, Nikolaos Alachiotis5, Dionisios Pnevmatikatos6, Evert Pap7, George Zervas8, Vaibhawa Mishra8, Arsalan Saljoghei8, Alvise Rigo9, Jose Fernando Zazo10, Sergio Lopez-Buedo10, Ferad Zyulkyarov11, Michael Enrico12 and Osar-Gonzalez de Dios13
1Virtual Open Systems, FR; 2University of Thessaly, GR; 3IBM Research, Ireland, IE; 4Foundation for Research and Technology Hellas (FORTH), GR; 5Foundation for Research and Technologoy Hellas (FORTH), GR; 6ECE Department, Technical Univrsity of Crete & FORTH-ICS, GR; 7SINTECS, Netherlands, NL; 8University College London, GB; 9Virtual Open Systems, France, FR; 10NAUDIT HPCN, Spain, ES; 11Barcelona Supercomputing Center, Spain, ES; 12POLATIS, UK, GB; 13TELEFONICA, Spain, ES
Abstract
Current datacenters are based on server machines, whose mainboard and hardware components form the baseline, monolithic building block that the rest of the system software, middleware and application stack are built upon. This leads to the following limitations: (a) resource proportionality of a multitray system is bounded by the basic building block (mainboard), (b) resource allocation to processes or virtual machines (VMs) is bounded by the available resources within the boundary of the mainboard, leading to spare resource fragmentation and inefficiencies, and (c) upgrades must be applied to each and every server even when only a specific component needs to be upgraded. The dRedBox project (Disaggregated Recursive Datacentre-ina- Box) addresses the above limitations, and proposes the next generation, low-power, across form-factor datacenters, departing from the paradigm of the mainboard-as-a-unit and enabling the creation of function-block-as-a-unit. Hardware-level disaggregation and software-defined wiring of resources is supported by a full-fledged Type-1 hypervisor that can execute commodity virtual machines, which communicate over a low-latency and high-throughput software-defined optical network. To evaluate its novel approach, dRedBox will demonstrate application execution in the domains of network functions virtualization, infrastructure analytics, and real-time video surveillance.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.4.2AN ENERGY-EFFICIENT AND ERROR-RESILIENT SERVER ECOSYSTEM EXCEEDING CONSERVATIVE SCALING LIMITS
Speaker:
Georgios Karakonstantis, Queen, GB
Authors:
Georgios Karakonstantis1, Konstantinos Tovletoglou2, Lev Mukhanov1, Hans Vandierendonck1, Dimitrios Nikolopoulos2, Peter Lawthers3, Panos Koutsovasilis4, Manolis Maroudas5, Christos D. Antonopoulos4, Christos Kalogirou4, Nikolaos Bellas6, Spyros Lalis4, Srikumar Venugopal7, Arnau Prat-Perez8, Alejandro Lampropulos9, Marios Kleanthous10, Andreas Diavastos11, Zacharias Hadjilambrou11, Panagiota Nikolaou11, Yanos Sazeides12, Pedro Trancoso11, George Papadimitriou13, Manolis Kaliorakis13, Athanasios Chatzidimitriou13, Dimitris Gizopoulos13 and Shidhartha Das14
1Queen's University Belfast, GB; 2Queen's University of Belfast, GB; 3A.M.C.C. Deutschland, DE; 4University of Thessaly, GR; 5University of Thessaly, GB; 6University of Thessaly & CERTH, GR; 7IBM Research - Ireland, IE; 8Sparsity, ES; 9Worldsensing, ES; 10Meritorious, CY; 11University of Cyprus, CY; 12u cyprus, CY; 13University of Athens, GR; 14ARM Ltd., GB
Abstract
The explosive growth of Internet-connected devices will soon result in a flood of generated data, which will increase the demand for network bandwidth as well as compute power to process the generated data. Consequently, there is a need for more energy efficient servers to empower traditional centralized Cloud data-centers as well as emerging decentralized data-centers at the Edges of the Cloud. In this paper, we present our approach, which aims at developing a new class of micro-servers - the UniServer - that exceed the conservative energy and performance scaling boundaries by introducing novel mechanisms at all layers of the design stack. The main idea lies on the realization of the intrinsic hardware heterogeneity and the development of mechanisms that will automatically expose the unique varying capabilities of each hardware component within commercial micro-servers and allow their operation at new extended operating points. Low overhead schemes are employed to monitor and predict the hardware behavior and report it to the system software. The system software including a virtualization and resource management layer is responsible for optimizing the system operation in terms of energy or performance, while guaranteeing non-disruptive operation under the extended operating points. Our characterization results on a 64-bit ARMv8 micro-server in 28nm process reveal large voltage margins in terms of Vmin variation among the 8 cores of the CPU chip, among 3 different sigma chips, and among different benchmarks with the potential to obtain up-to 38.8% energy savings. Similarly, DRAM characterizations show that refresh rate and voltage can be relaxed by 43x and 5%, respectively, leading to 23.2% power savings on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.4.3THE TRANSPRECISION COMPUTING PARADIGM: CONCEPT, DESIGN, AND APPLICATIONS
Speaker:
Dionysios Diamantopoulos, IBM Research - Zurich, CH
Authors:
Cristiano Malossi1, Michael Schaffner2, Anca Molnos3, Luca Gammaitoni4, Giuseppe Tagliavini5, Andrew Emerson6, Andrés Tomás7, Dimitrios S. Nikolopoulos8, Eric Flamand9 and Norbert Wehn10
1IBM Research - Zurich, CH; 2ETHZ, CH; 3CEA, FR; 4Università di Perugia, IT; 5Università di Bologna, IT; 6CINECA, IT; 7Universitat Jaume I, ES; 8Queen's University of Belfast, GB; 9GreenWaves Technologies, FR; 10University of Kaiserslautern, DE
Abstract
Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore's law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny nodes of the Internet-of-Things, to large HPC computing centers, sub-picoJoule/operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap, and expected impact of the European project OPRECOMP. OPRECOMP aims to (i) develop the first complete transprecision computing framework, (ii) apply it to a wide range of hardware platforms, from the sub-milliWatt up to the MegaWatt range, and (iii) demonstrate impact in a wide range of computational domains, spanning IoT, Big Data Analytics, Deep Learning, and HPC simulations. By combining together into a seamless design transprecision advances in devices, circuits, software tools, and algorithms, we expect to achieve major energy efficiency improvements, even when there is no freedom to relax end-to-end application quality of results. Indeed, OPRECOMP aims at demolishing the ultra-conservative ``precise'' computing abstraction, replacing it with a more flexible and efficient one, namely transprecision computing.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

9.5 Physical Attacks

Date: Thursday 22 March 2018
Time: 08:30 - 10:00
Location / Room: Konf. 3

Chair:
Bilge Kavun Elif, Infineon Technologies, DE

Co-Chair:
Batina Lejla, Radboud University, NL

Electronic circuits are increasingly processing sensitive confidential data, such as personal information. In this session, new types of attacks to extract such data out of circuits are discussed in-depth. They encompass passive side-channel attacks and active manipulations of circuits.

TimeLabelPresentation Title
Authors
08:309.5.1(Best Paper Award Candidate)
AN INSIDE JOB: REMOTE POWER ANALYSIS ATTACKS ON FPGAS
Speaker:
Falk Schellenberg, Ruhr-Universität Bochum, DE
Authors:
Falk Schellenberg1, Dennis Gnad2, Amir Moradi1 and Mehdi Tahoori2
1Ruhr University Bochum, DE; 2Karlsruhe Institute of Technology, DE
Abstract
Hardware Trojans have gained increasing interest during the past few years. Undeniably, the detection of such malicious designs needs a deep understanding of how they can practically be built and developed. In this work we present a design methodology dedicated to FPGAs which allows measuring a fraction of the dynamic power consumption. More precisely, we develop internal sensors which are based on FPGA primitives, and transfer the internally-measured side-channel leakages outside. These are distributed and calibrated delay sensors which can indirectly measure voltage fluctuations due to power consumption. By means of a cryptographic core as a case study, we present different settings and parameters for our employed sensors. Using their side-channel measurements, we further exhibit practical key-recovery attacks confirming the applicability of the underlying measurement methodology. This opens a new door to integrate hardware Trojans in a) applications where the FPGA is remotely accessible and b) FPGA-based multi-user platforms where the reconfigurable resources are shared among different users. This type of Trojan is highly difficult to detect since there is no signal connection between targeted (cryptographic) core and the internally-deployed sensors.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:009.5.2CONFIDENT LEAKAGE DETECTION - A SIDE-CHANNEL EVALUATION FRAMEWORK BASED ON CONFIDENCE INTERVALS
Authors:
Florian Bache1, Christina Plump1 and Tim Güneysu2
1University of Bremen, DE; 2University of Bremen & DFKI, DE
Abstract
Cryptographic devices that potentially operate in hostile physical environments need to be secured against side-channel attacks. In order to ensure the effectiveness of the required countermeasures, scientists, developers, and evaluators need efficient methods to test the level of security of a device. In this paper we propose a new framework based on confidence intervals that extends established t-test based approaches for test-vector leakage assessment (TVLA). In comparison to previous TVLA approaches the new methodology does not only enable the detection of leakage but can also assert its absence. The framework is robust against noise in the evaluation system and thereby avoids false negatives. These improvements can be achieved without overhead in measurement complexity and with a minimum of additional computational costs compared to previous approaches. We evaluate our method under realistic conditions by applying it to a protected implementation of AES.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:309.5.3ØZONE: EFFICIENT EXECUTION WITH ZERO TIMING LEAKAGE FOR MODERN MICROARCHITECTURES
Speaker:
Zelalem Aweke, University of Michigan, US
Authors:
Zelalem Birhanu Aweke and Todd Austin, University of Michigan, US
Abstract
Time variation during program execution can leak sensitive information. Time variations due to program control flow and hardware resource contention have been used to steal encryption keys in cipher implementations such as AES and RSA. A number of approaches to mitigate timing-based side-channel attacks have been proposed including cache partitioning, control- flow obfuscation and injecting timing noise into the outputs of code. While these techniques make timing-based side-channel attacks more difficult, they do not eliminate the risks. Prior techniques are either too specific or too expensive, and all leave remnants of the original timing side channel for later attackers to attempt to exploit. In this work, we show that the state-of-the-art techniques in timing side-channel protection, which limit timing leakage but do not eliminate it, still have significant vulnerabilities to timing-based side-channel attacks. To provide a means for total protection from timing-based side-channel attacks, we develop Ozone, the first zero timing leakage execution resource for a modern microarchitecture. Code in Ozone executes under a special hardware thread that gains exclusive access to a single core's resources for a fixed (and limited) number of cycles during which it cannot be interrupted. Memory access under Ozone thread execution is limited to pre-allocated cache lines that can not be evicted, and all Ozone threads begin execution with a known fixed microarchitectural state. We evaluate Ozone using a number of security sensitive kernels that have previously been targets of timing side-channel attacks, and show that Ozone eliminates timing leakage with minimal performance overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
09:459.5.4SCADPA: SIDE-CHANNEL ASSISTED DIFFERENTIAL-PLAINTEXT ATTACK ON BIT PERMUTATION BASED CIPHERS
Speaker:
Jakub Breier, Nanyang Technological University, SG
Authors:
Jakub Breier1, Dirmanto Jap1 and Shivam Bhasin2
1Nanyang Technological University, SG; 2Temasek Laboratories, Nanyang Technological University, SG
Abstract
Bit permutations are a common choice for diffusion function in lightweight block ciphers, owing to their low implementation footprint. In this paper, we present a novel Side-Channel Assisted Differential-Plaintext Attack (SCADPA), exploiting specific vulnerabilities of bit permutations. SCADPA is a chosen-plaintext attack, knowledge of the ciphertext is not required. Unlike statistical methods, commonly used for distinguisher in standard power analysis, the proposed method is more differential in nature. The attack shows that diffusion layer can play a significant role in distinguishing the internal cipher state. We demonstrate how to practically exploit such vulnerability to extract the secret key. Results on microcontroller-based PRESENT-80 cipher lead to full key retrieval using as low as 17 encryptions. It is possible to automate the attack by using a thresholding method detailed in the paper. Several case studies are presented, using various attacker models and targeting different encryption modes (such as CTR and CBC). We provide a discussion on how to avoid such attack from the design point of view.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00IP4-9, 167EXAMINING THE CONSEQUENCES OF HIGH-LEVEL SYNTHESIS OPTIMIZATIONS ON THE POWER SIDE CHANNEL
Speaker:
Lu Zhang, Northwestern Polytechnical University, CN
Authors:
Lu Zhang1, Wei Hu2, Armaiti Ardeshiricham2, Yu Tai1, Jeremy Blackstone2, Dejun Mu1 and Ryan Kastner2
1Northwestern Polytechnical University, CN; 2University of California, San Diego, US
Abstract
High-level synthesis (HLS) allows hardware designers to think algorithmically and not have to worry about low-level, cycle-by-cycle details. This provides the ability to quickly explore the architectural design space and tradeoff between resource utilization and performance. Unfortunately, evaluating the security is not a standard part of the HLS design flow. In this work, we aim to understand the effects of HLS optimizations with respect to power side-channel leakage. We use Vivado HLS to develop different cryptographic cores, implement them on a Xilinx Spartan 6 FPGA, and collect power traces. We evaluate the designs with respect to resource utilization, performance, and side-channel leakage through power consumption. Furthermore, we analyze the first-order leakage of the HLS-based designs alongside well-known register transfer level (RTL) cryptographic cores. We describe an evaluation procedure for hardware designers and use it to make insightful recommendations on how to design the best architecture in cryptographic domain.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:01IP4-10, 564DFARPA: DIFFERENTIAL FAULT ATTACK RESISTANT PHYSICAL DESIGN AUTOMATION
Speaker:
Debdeep Mukhopadhyay, Indian Institute of Technology Kharagpur, IN
Authors:
Mustafa Khairallah1, Rajat Sadhukhan2, Radhamanjari Samanta2, Jakub Breier1, Shivam Bhasin3, Rajat Subhra Chakraborty2, Anupam Chattopadhyay1 and Debdeep Mukhopadhyay2
1Nanyang Technological University, SG; 2Indian Institute of Technology Kharagpur, IN; 3Temasek Laboratories, Nanyang Technological University, SG
Abstract
Differential Fault Analysis (DFA), aided by sophisticated mathematical analysis techniques for ciphers and precise fault injection methodologies, has become a potent threat to cryptographic implementations. In this paper, we propose, to the best of the our knowledge, the first ``DFA-aware" physical design automation methodology, that effectively mitigates the threat posed by DFA. We first develop a novel floorplan heuristic, which resists the simultaneous corruption of cipher states necessary for successful fault attack, by exploiting the fact that most fault injections are localized in practice. Our technique results in the computational complexity of the fault attack to shoot up to exhaustive search levels, making them practically infeasible. In the second part of the work, we develop a routing mechanism, which tackles more precise and costly fault injection techniques, like laser and electromagnetic guns. We propose a routing technique by integrating a specially designed ring oscillator based sensor circuit around the potential fault attack targets without incurring any performance overhead.We demonstrate the effectiveness of our technique by applying it on state of the art ciphers.

Download Paper (PDF; Only available from the DATE venue WiFi)
10:00End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

IP4 Interactive Presentations

Date: Thursday 22 March 2018
Time: 10:00 - 10:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

LabelPresentation Title
Authors
IP4-1EFFICIENT MAPPING OF QUANTUM CIRCUITS TO THE IBM QX ARCHITECTURES
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner, Alexandru Paler and Robert Wille, Johannes Kepler University Linz, AT
Abstract
In March 2017, IBM launched the project IBM Q with the goal to provide access to quantum computers for a broad audience. This allowed users to conduct quantum experiments on a 5-qubit and, since June 2017, also on a 16-qubit quantum computer (called IBM QX2 and IBM QX3, respectively). In order to use these, the desired quantum functionality (e.g. provided in terms of a quantum circuit) has to properly be mapped so that the underlying physical constraints are satisfied - a complex task. This demands for solutions to automatically and efficiently conduct this mapping process. In this paper, we propose such an approach which satisfies all constraints given by the architecture and, at the same time, aims to keep the overhead in terms of additionally required quantum gates minimal. The proposed approach is generic and can easily be configured for future architectures. Experimental evaluations show that the proposed approach clearly outperforms IBM's own mapping solution with respect to runtime as well as resulting costs.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-2PARALLEL CODE GENERATION OF SYNCHRONOUS PROGRAMS FOR A MANY-CORE ARCHITECTURE
Speaker:
Amaury Graillat, Verimag - Univ. Grenoble Alpes, FR
Authors:
Amaury Graillat1, Matthieu Moy2, Pascal Raymond3 and Benoît Dupont de Dinechin4
1Verimag - Univ. Grenoble Alpes, FR; 2Univ. Grenoble Alpes, Verimag, FR; 3VERIMAG/CNRS, FR; 4Kalray, FR
Abstract
AmEmbedded systems tend to require more and more computational power. Many-core architectures are good candi- dates since they offer power and are considered more time predictable than classical multi-cores. Data-flow Synchronous languages such as Lustre or Scade are widely used for avionic critical software. Programs are described by networks of computational nodes. Implementation of such programs on a many-core architecture must ensure a bounded response time and preserve the functional behavior by taking interference into account. We consider the top-level node of a Lustre application as a software architecture description where each sub-node corresponds to a potential parallel task. Given a mapping (tasks to cores), we automatically generate code suitable for the targeted many-core architecture. This minimizes memory interferences and allows usage of a framework to compute the Worst-Case Response Time.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-3SOCRATES - A SEAMLESS ONLINE COMPILER AND SYSTEM RUNTIME AUTOTUNING FRAMEWORK FOR ENERGY-AWARE APPLICATIONS
Speaker:
Gianluca Palermo, Politecnico di Milano, IT
Authors:
Davide Gadioli1, Ricardo Nobre2, Pedro Pinto3, Emanuele Vitali1, Amir H. Ashouri4, Gianluca Palermo1, Cristina Silvano1 and João M. P. Cardoso5
1Politecnico di Milano, IT; 2University of Porto / INESC TEC, PT; 3Faculty of Engineering, University of Porto, PT; 4University of Toronto, Canada, CA; 5University of Porto, PT
Abstract
Configuring program parallelism and selecting optimal compiler options according to the underlying platform architecture is a difficult task if completely demanded to the programmer or done by using a default one-fits-all policy generated by the compiler or runtime system. Given the dynamics of the problem, a runtime selection of the best configuration is obviously the desirable solution. However, implementing this solution into the application requires the insertion of a lot of glue code for profiling and runtime selection. This represents a programming wall to actually make it feasible. This paper presents a structured approach called SOCRATES, based on a Domain Specific Language (LARA) and a runtime autotuner (mARGOt), to alleviate this effort. LARA has been used to hide the glue code insertion, thus separating the pure functional application description from extra-functional requirements. mARGOT has been used for the automatic selection of the best configuration according to the runtime evolution of the application. To demonstrated the effectiveness of the proposed approach, we evaluated SOCRATES by varying the application workloads, hardware resources and energy efficiency requirements for 12 OpenMP Polybench/C with respect to a standard one-fits-all solution.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-4NON-INTRUSIVE PROGRAM TRACING OF NON-PREEMPTIVE MULTITASKING SYSTEMS USING POWER CONSUMPTION
Speaker:
Kamal Lamichhane, University of Waterloo, CA
Authors:
Kamal Lamichhane, Carlos Moreno and Sebastian Fischmeister, University of Waterloo, CA
Abstract
System tracing, runtime monitoring, execution reconstruction are useful techniques for protecting the safety and integrity of systems. Furthermore, with time-aware or overhead-aware techniques being available, these techniques can also be used to monitor and secure production systems. As operating systems gain in popularity, even in deeply embedded systems, these techniques face the challenge to support multitasking. In this paper, we propose a novel non-intrusive technique, which efficiently reconstructs the execution trace of non-preemptive multitasking system by observing power consumption characteristics. Our technique uses the control-flow graph (CFG) of the application program to identify the most likely block of code that the system is executing at any given point in time. For the purpose of the experimental evaluation, we first instrument the source code to obtain power consumption information for each basic block, which is used as the training data for our Dynamic Time Warping and k-Nearest Neighbours (k-NN) classifier. Once the system is trained, this technique is used to identify live code-block execution (LCBE). We show that the technique can reconstruct the execution flow of programs in a multi-tasking environment with high accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-5ENERGY-PERFORMANCE DESIGN EXPLORATION OF A LOW-POWER MICROPROGRAMMED DEEP-LEARNING ACCELERATOR
Speaker:
Andrea Calimera, Politecnico di Torino, IT
Authors:
Andrea Calimera1, Mario R. Casu2, Giulia Santoro1, Valentino Peluso1 and Massimo Alioto3
1Politecnico di Torino, IT; 2Politecnico di Torino, Department of Electronics and Telecommunications, IT; 3National University of Singapore, SG
Abstract
This paper presents the design space exploration of a novel microprogrammable accelerator in which PEs are connected with a Network-on-Chip and benefit from low-power features enabled through a practical implementation of a Dual- Vdd assignment scheme. An analytical model, fitted with postlayout data obtained with a 28nm FDSOI design kit, returns implementations with optimal energy-performance tradeoff by taking into consideration all the key design-space variables. The obtained Pareto analysis helps us infer optimization rules aimed at improving quality of design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-6GENPIM: GENERALIZED PROCESSING IN-MEMORY TO ACCELERATE DATA INTENSIVE APPLICATIONS
Speaker:
Tajana Rosing, UC San Diego, US
Authors:
Mohsen Imani, Saransh Gupta and Tajana Rosing, University of California, San Diego, US
Abstract
Big data has become a serious problem as data volumes have been skyrocketing for the past few years. Storage and CPU technologies are overwhelmed by the amount of data they have to handle. Traditional computer architectures show poor performance which processing such huge data. Processing in-memory is a promising technique to address data movement issue by locally processing data inside memory. However, there are two main issues with stand-alone PIM designs: (i) PIM is not always computationally faster than CMOS logic, (ii) PIM cannot process all operations in many applications. Thus, not many applications can benefit from PIM. To generalize the use of PIM, we designed GenPIM, a general processing in-memory architecture consisting of the conventional processor as well as the PIM accelerators. GenPIM supports basic PIM functionalities in specialized non-volatile memory including: bitwise operations, search operation, addition and multiplication. For each application, GenPIM identifies the part which uses PIM operations, and processes the rest of non-PIM operations or not data intensive part of applications in general purpose cores. GenPIM also enables configurable PIM approximation by relaxing in-memory computation. We test the efficiency of proposed design over different emerging machine learning, compression and security applications. Our experimental evaluation shows that our design can achieve 10.9x improvement in energy efficiency and 6.4x speedup as compared to processing data in conventional cores. The results can be improved by 21.0% in energy consumption and 30.6% in performance by enabling PIM approximation while ensuring less than 2% quality loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-7UNIVERSAL NUMBER POSIT ARITHMETIC GENERATOR ON FPGA
Speaker:
Hayden K.-H. So, The University of Hong Kong, HK
Authors:
Manish Kumar Jaiswal and Hayden So, The University of Hong Kong, HK
Abstract
Posit number system format includes a run-time varying exponent component, defined by a combination of regime-bit (with run-time varying length) and exponent-bit (with size of up to ES bits, the exponent size). This also leads to a run-time variation in its mantissa field size and position. This run-time variation in posit format poses a hardware design challenge. Being a recent development, posit lacks for its adequate hardware arithmetic architectures. Thus, this paper is aimed towards the posit arithmetic algorithmic development and their generic hardware generator. It is focused on basic posit arithmetic (floating-point to posit conversion, posit to floating point con- version, addition/subtraction and multiplication). These are also demonstrated on a FPGA platform. Target is to develop an open- source solution for generating basic posit arithmetic architectures with parameterized choices. This contribution would enable further exploration and evaluation of posit system.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-8BLOCK CONVOLUTION: TOWARDS MEMORY-EFFICIENT INFERENCE OF LARGE-SCALE CNNS ON FPGA
Speaker:
Gang Li, Institute of Automation, Chinese Academy of Sciences, CN
Authors:
Gang Li, Fanrong Li, Tianli Zhao and Jian Cheng, Institute of Automation, Chinese Academy of Sciences, CN
Abstract
FPGA-based CNN accelerators are gaining popularity due to high energy efficiency and great flexibility in recent years. However, as the networks grow in depth and width, the great volume of intermediate data is too large to store on chip, data transfers between on-chip memory and off-chip memory should be frequently executed, which leads to unexpected off-chip memory access latency and energy consumption. In this paper, we propose a block convolution approach, which is a memory-efficient, simple yet effective block-based convolution to completely avoid intermediate data from streaming out to off-chip memory during network inference. Experiments on the very large VGG-16 network show that the improved top-1/top-5 accuracy of 72.60%/91.10% can be achieved on the ImageNet classification task with the proposed approach. As a case study, we implement the VGG-16 network with block convolution on Xilinx Zynq ZC706 board, achieving a frame rate of 12.19fps under 150MHz working frequency, with all intermediate data staying on chip.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-9EXAMINING THE CONSEQUENCES OF HIGH-LEVEL SYNTHESIS OPTIMIZATIONS ON THE POWER SIDE CHANNEL
Speaker:
Lu Zhang, Northwestern Polytechnical University, CN
Authors:
Lu Zhang1, Wei Hu2, Armaiti Ardeshiricham2, Yu Tai1, Jeremy Blackstone2, Dejun Mu1 and Ryan Kastner2
1Northwestern Polytechnical University, CN; 2University of California, San Diego, US
Abstract
High-level synthesis (HLS) allows hardware designers to think algorithmically and not have to worry about low-level, cycle-by-cycle details. This provides the ability to quickly explore the architectural design space and tradeoff between resource utilization and performance. Unfortunately, evaluating the security is not a standard part of the HLS design flow. In this work, we aim to understand the effects of HLS optimizations with respect to power side-channel leakage. We use Vivado HLS to develop different cryptographic cores, implement them on a Xilinx Spartan 6 FPGA, and collect power traces. We evaluate the designs with respect to resource utilization, performance, and side-channel leakage through power consumption. Furthermore, we analyze the first-order leakage of the HLS-based designs alongside well-known register transfer level (RTL) cryptographic cores. We describe an evaluation procedure for hardware designers and use it to make insightful recommendations on how to design the best architecture in cryptographic domain.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-10DFARPA: DIFFERENTIAL FAULT ATTACK RESISTANT PHYSICAL DESIGN AUTOMATION
Speaker:
Debdeep Mukhopadhyay, Indian Institute of Technology Kharagpur, IN
Authors:
Mustafa Khairallah1, Rajat Sadhukhan2, Radhamanjari Samanta2, Jakub Breier1, Shivam Bhasin3, Rajat Subhra Chakraborty2, Anupam Chattopadhyay1 and Debdeep Mukhopadhyay2
1Nanyang Technological University, SG; 2Indian Institute of Technology Kharagpur, IN; 3Temasek Laboratories, Nanyang Technological University, SG
Abstract
Differential Fault Analysis (DFA), aided by sophisticated mathematical analysis techniques for ciphers and precise fault injection methodologies, has become a potent threat to cryptographic implementations. In this paper, we propose, to the best of the our knowledge, the first ``DFA-aware" physical design automation methodology, that effectively mitigates the threat posed by DFA. We first develop a novel floorplan heuristic, which resists the simultaneous corruption of cipher states necessary for successful fault attack, by exploiting the fact that most fault injections are localized in practice. Our technique results in the computational complexity of the fault attack to shoot up to exhaustive search levels, making them practically infeasible. In the second part of the work, we develop a routing mechanism, which tackles more precise and costly fault injection techniques, like laser and electromagnetic guns. We propose a routing technique by integrating a specially designed ring oscillator based sensor circuit around the potential fault attack targets without incurring any performance overhead.We demonstrate the effectiveness of our technique by applying it on state of the art ciphers.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-11AN ENERGY-EFFICIENT STOCHASTIC COMPUTATIONAL DEEP BELIEF NETWORK
Speaker:
Yidong Liu, University of Alberta, CA
Authors:
Yidong Liu1, Yanzhi Wang2, Fabrizio Lombardi3 and Jie Han1
1University of Alberta, CA; 2Syracuse university, US; 3Northeastern University, US
Abstract
Deep neural networks (DNNs) are effective machine learning models to solve a large class of recognition problems, including the classification of nonlinearly separable patterns. The applications of DNNs are, however, limited by the large size and high energy consumption of the networks. Recently, stochastic computation (SC) has been considered to implement DNNs to reduce the hardware cost. However, it requires a large number of random number generators (RNGs) that lower the energy efficiency of the network. To overcome these limitations, we propose the design of an energy-efficient deep belief network (DBN) based on stochastic computation. An approximate SC activation unit (A-SCAU) is designed to implement different types of activation functions in the neurons. The A-SCAU is immune to signal correlations, so the RNGs can be shared among all neurons in the same layer with no accuracy loss. The area and energy of the proposed design are 5.27% and 3.31% (or 26.55% and 29.89%) of a 32-bit floating-point (or an 8-bit fixed-point) implementation. It is shown that the proposed SC-DBN design achieves a higher classification accuracy compared to the fixed-point implementation. The accuracy is only lower by 0.12% than the floating-point design at a similar computation speed, but with a significantly lower energy consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-12PUSHING THE NUMBER OF QUBITS BELOW THE "MINIMUM": REALIZING COMPACT BOOLEAN COMPONENTS FOR QUANTUM LOGIC
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner and Robert Wille, Johannes Kepler University Linz, AT
Abstract
Research on quantum computers has gained attention since they are able to solve certain tasks significantly faster than classical machines (in some cases, exponential speed-ups are possible). Since quantum computations typically contain large Boolean components, design automation techniques are required to realize the respective Boolean functions in quantum logic. They usually introduce a significant amount of additional qubits - a highly limited resource. In this work, we propose an alternative method for the realization of Boolean components for quantum logic. In contrast to the current state-of-the-art, we dedicatedly address the main reasons causing the additionally required qubits (namely the number of the most frequently occurring output pattern as well as the number of primary outputs of the function to be realized) and propose to manipulate the function so that both issues are addressed. The resulting methods allow to push the number of required qubits below what is currently considered the minimum.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-13POWER OPTIMIZATION THROUGH PERIPHERAL CIRCUIT REUSING INTEGRATED WITH LOOP TILING FOR RRAM CROSSBAR-BASED CNN
Authors:
Yuanhui Ni, Weiwen Chen and Keni Qiu, Capital Normal University, CN
Abstract
Convolutional neural networks (CNNs) have been proposed to be widely adopted to make predictions on a large amount of data in modern embedded systems. Prior studies have shown that convolutional computations which consist of numbers of multiply and accumulate (MAC) operations, serve as the most computationally expensive portion in CNN. Compared to the manner of executing MAC operations in GPU and FPGA, CNN implementation in the RRAM crossbar-based computing system (RCS) demonstrates the outstanding advantages of high performance and low power. However, the current design is energy-unbalanced among the three parts of RRAM crossbar computation, peripheral circuits and memory accesses, the latter two factors can significantly limit the potential gains of RCS. Addressing the problem of high power overhead of peripheral circuits in RCS, this paper adopts the Peripheral Circuit Unit (PeriCU)-Reuse scheme to meet a certain power budget. The underlying idea is to put the expensive AD/DAs onto spotlight and arrange multiple convolution layers to be sequentially served by the same PeriCU. Furthermore, it is observed that memory accesses can be bypassed if two adjacent layers are assigned in the different PeriCU. Then a loop tiling technique is proposed to further improve the energy and throughput of RCS. The experiments of two convolutional applications validate that the PeriCU-Reuse scheme integrated with the loop tiling techniques can efficiently meet power requirement, and further reduce energy consumption by 61.7%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-14ORIENT: ORGANIZED INTERLEAVED ECCS FOR NEW STT-RAM CACHES
Speaker:
Hamed Farbeh, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Authors:
Zahra Azad1, Hamed Farbeh2 and Amir Mahdi Hosseini Monazzah3
1Sharif University of Technology, IR; 2School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR; 3Department of Computer Engineering, Sharif University of Technology, Tehran, Iran, IR
Abstract
Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising alternative to SRAM in cache memories. However, STT-MRAMs face with high probability of write errors due to its stochastic switching behavior. To correct the write errors, Error-Correcting Codes (ECCs) used in SRAM caches are conventionally employed. A cache line consists of several codewords and the data bits are selected in such a way that the maximum correction capability is provided based on the error patterns in SRAMs. However, the different write error patterns in STT-MRAM caches leads to inefficiency of conventional ECC configurations. In this paper, first we investigate the efficiency of ECC configurations and demonstrate that the vulnerability of codewords in a cache line varies by up to 17x. This variation means that, while some words are overprotected, some others are highly probable to experience uncorrectable errors. Then, we propose an ECC bit selection scheme, so-called ORIENT, to reduce the vulnerability variation of codewords to 1.4x. The simulation results show that conventional ECC configuration increases the write error rate by up to about 64.4% compared with the optimum ECC bit selection, whereas this value for ORIENT is only 4.5%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-15ERASMUS: EFFICIENT REMOTE ATTESTATION VIA SELF-MEASUREMENT FOR UNATTENDED SETTINGS
Speaker:
Norrathep Rattanavipanon, University of California, Irvine, TH
Authors:
Xavier Carpent1, Norrathep Rattanavipanon2 and Gene Tsudik2
1UC Irvine, US; 2UCI, US
Abstract
Remote attestation (RA) is a popular means of detecting malware in embedded and IoT devices. RA is usually realized as a protocol via which a trusted verifier measures software integrity of an untrusted remote device called prover. All prior RA techniques require on-demand operation. We identify two drawbacks of this approach in the context of unattended devices: First, it fails to detect mobile malware that enters and leaves the prover between successive RA instances. Second, it requires the prover to engage in a potentially expensive computation, which can negatively impact safety-critical or real-time devices. To this end, we introduce the concept of self-measurement whereby a prover periodically (and securely) measures and records its own software state. A verifier then collects and verifies these measurements. We demonstrate a concrete technique called ERASMUS, justify its features and evaluate its performance. We show that ERASMUS is well-suited for safety-critical applications. We also define a new metric -- Quality of Attestation (QoA).

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-16END-TO-END LATENCY ANALYSIS OF CAUSE-EFFECT CHAINS IN AN ENGINE MANAGEMENT SYSTEM
Speaker:
Junchul Choi, Seoul National University, KR
Authors:
Junchul Choi, Donghyun Kang and Soonhoi Ha, Seoul National University, KR
Abstract
An engine management system consists of periodic or sporadic real-time tasks. A task is a set of runnables that may be fully preemptive or partially at runnable boundaries. A cause-effect chain is defined as a chain of runnables that are connected by the read/write dependency. We propose a novel analytical technique to estimate the end-to-end latency of a cause-effect chain by considering conservatively estimated schedule time bounds of associated runnables. The proposed approach is verified with an industrial-strength automotive benchmark.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-17GENERAL FLOORPLANNING METHODOLOGY FOR 3D ICS WITH AN ARBITRARY BONDING STYLE
Speaker:
Chien-Yu Huang, Department of Electrical Engineering, National Cheng Kung University, TW
Authors:
Jai-Ming Lin and Chien-Yu Huang, Department of Electrical Engineering, National Cheng Kung University, TW
Abstract
This paper proposes a general floorplanning methodology which can be applied to 3D ICs with an arbitrary bonding style. Some researches have shown that a 3D IC with the hybrid bonding style, which includes face-to-back and face-to-face, may obtain better results than that simply using the face-to-back bonding style. We respectively present an approach to assign modules to tiers for each kind of bonding style. Further, a new utilization function, called cosine-shaped function, is proposed to estimate utilizations of bins required by the analytical-based approach. Our experimental results show the cosine shaped function can obtain a little better result than the bell-shaped function on IBM benchmarks for 2D floorplanning. We also show that the proposed 3D floorplanning methodology consumes less TSVs and induces shorter wirelength compared to previous work in the hybrid bonding style.

Download Paper (PDF; Only available from the DATE venue WiFi)

UB09 Session 9

Date: Thursday 22 March 2018
Time: 10:00 - 12:00
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB09.1CCF: A CGRA COMPILATION FRAMEWORK
Authors:
Shail Dave and Aviral Shrivastava, Arizona State University, US
Abstract
Coarse-grained reconfigurable array (CGRA) can efficiently accelerate even non-parallel loops. Although scores of techniques have been developed in the past decade to map loops on CGRA PEs, several challenges in enabling acceleration of general-purpose applications on CGRAs remained unresolved, in particular, the automatic code generation for the CGRA accelerator coupled with modern processor cores. In this demonstration, we showcase CCF - CGRA compiler framework. CCF is implemented in LLVM 4.0 and includes a set of transformation and analysis passes. We show that given performance-critical loops annotated in embedded applications, how CCF extracts the loop, constructs the data dependency graph (DDG), maps it onto CGRA architecture, off-loads necessary configuration instructions for CGRA PEs, and automatically communicates data between the CPU and CGRA.

More information ...
UB09.2TOPOLINANO & MAGCAD: A DESIGN AND SIMULATION FRAMEWORK FOR THE EXPLORATION OF EMERGING TECHNOLOGIES
Authors:
Umberto Garlando and Fabrizio Riente, Politecnico di Torino, IT
Abstract
We developed a design framework that enables the exploration and analysis of emerging beyond-CMOS technologies. It is composed of two powerful tools: ToPoliNano and MagCAD. Different technologies are supported, and new ones could be added thanks to their modular structure. ToPoliNano starts from a VHDL description of a circuit and performs the place&route following the technological constraints. The resulting circuit can be simulated both at logical or physical level. MagCAD is a layout editor where the user can design custom circuits, by placing basic elements of the selected technology. The tool can extract a VHDL netlist based on compact models of placed elements derived from experiments or physical simulations. Circuits can be verified with standard VHDL simulators. The design workflow will be demonstrated at the U-booth to show how those tools could be a valuable help in the studying and development of emerging technologies and to obtain feedbacks from the scientific community.

More information ...
UB09.3CONSTRAINED RANDOM APPLICATION GENERATION FOR FIRMWARE-BASED POWER MANAGEMENT VALIDATION
Authors:
Vladimir Herdt1, Hoang M. Le1, Daniel Große2 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen, DFKI GmbH, DE
Abstract
Efficient power management (PM) is very important for modern SoCs. To handle the every rising complexity of embedded system design, power aware virtual prototypes (VPs) are employed to enable an early power analysis. Most modern SoCs implement the PM strategy in firmware (FW) due to ease of development. Validation of these strategies at VP level is crucial as undetected flaws will propagate. However, existing validation approaches are based on engineered software (SW), which might miss rare corner cases. We propose a demonstrator based on a novel approach to assess the power-versus-performance trade-off of FW-based PM. Instead of executing real SW applications, our approach makes use of workload scenarios described by a set of constraints to automatically generate SW with a specific power consumption profile. The main novelty is the modeling of scenarios based on constrained random techniques that are very successful in the area of SoC/HW functional validation.

More information ...
UB09.4POWER-AWARE SOFTWARE MAPPING OF PARALLEL APPLICATIONS ONTO HETEROGENEOUS MPSOCS
Authors:
Gereon Onnebrink and Rainer Leupers, RWTH Aachen University, DE
Abstract
Heterogeneous multi- and many-processor systems-on-chip provide the best trade-off between performance, cost, and power. One of the biggest hurdles to exploit multicore architectures from the SW side. Considering an application that has been properly partitioned into multiple concurrent tasks, and programmed in a parallel language, the process of mapping those tasks onto the processors with optimal DVFS is a huge challenge for a certain design goal. An automatic approach is needed that determines the optimal decision. A great amount of research has been conducted aiming to optimise the performance of a parallelised application. Another research track is the ESL power estimation methodology. Combining both, a novel power-aware software mapping heuristic has been implemented to develop performance and power co-optimized parallel software. This algorithm can be used to identify the gain of sophisticated power management techniques by providing the power-performance trade-off.

More information ...
UB09.5VIRTUAL PROTOTYPE MAKANI: ANALYZING THE USAGE OF POWER MANAGEMENT TECHNIQUES AND EXTRA-FUNCTIONAL PROPERTIES BY USING VIRTUAL PROTOTYPING
Author:
Sören Schreiner, OFFIS – Institute for Information Technology, DE
Abstract
My Phd work consists of analyzing the correct usage of power management techniques, as well as the analysis of extra-functional properties, including power and timing properties, in MPSoCs. Especially in safety-critical environments the power management gets safety-critical too, since it is able to influence the overall system behavior. To demonstrate my methodologies a mixed-critical multi-rotor system and its corresponding virtual prototype is used. The multi-rotor system's avionics is served by a Xilinx Zynq 7000 MPSoC. The hardware architecture includes ARM and MicroBlaze cores, a NoC for communication and peripherals. The MPSoC processes the flight algorithms with triple modular redundancy and a mission-critical video processing task. The virtual prototype consists of a virtual platform and an environmental model. The virtual platform is equipped with my measuring tool libraries to generate traces of the observed power management techniques and the extra-functional properties.

More information ...
UB09.6WARE: WEARABLE ELECTRONICS DIRECTIONAL AUGMENTED REALITY
Authors:
Gabriele Miorandi1, Walter Vendraminetto2, Federico Fraccaroli3, Davide Quaglia1 and Gianluca Benedetti4
1University of Verona, IT; 2EDALab Srl, IT; 3Wagoo LLC, IT; 4Wagoo Italia srls, IT
Abstract
Augmented Reality (AR) currently require large form factors, weight, cost and frequent recharging cycles that reduce usability. Connectivity, image processing, localization, and direction evaluation lead to high processing and power requirements. A multi-antenna system, patented by the industrial partner, enables a new generation of smart eye-wear that elegantly requires less hardware, connectivity, and power to provide AR functionalities. They will allow users to directionally locate nearby radio emitting sources that highlight objects of interest (e.g., people or retail items) by using existing standards like Bluetooth Low Energy, Apple's iBeacon and Google's Eddystone. This booth will report the current level of research addressed by the Computer Science Department of University of Verona, Wagoo LLC, and Wagoo Italia srls. In the presented demo, different objects emit an "I am here" signal and a prototype of the smart glasses shows the information related to the observed object.

More information ...
UB09.7OTPG: SPECIFICATION-BASED CONSTRUCTION OF ONLINE TPGS FOR MICROPROCESSORS
Authors:
Mikhail Chupilko, Alexander Kamkin and Andrei Tatarnikov, ISP RAS, RU
Abstract
This work presents an approach to construction of online test program generators (TPGs). The approach is intended to use specifications of ISA presented in nML/mmuSL specification languages. They are processed by a meta-generator to obtain their binary representations supplied with meta information and a test generation core compatible with the target microprocessor. The test generation core is loaded as a binary image into the target microprocessor's memory (for experiments we're using QEMU for MIPS) and produces test cases to be processed (incl. results checking) by an executor. It should be noticed that the meta-generator and the executor are not obligatory run at the same microprocessor (especially, if it is highly incomplete). The final goal of the project is to propose a method of obtaining online TPGs for a wide range of ISAs, and to develop a mature tool implementing this method.

More information ...
UB09.8IIP GENERATORS TO EASE ANALOG IC DESIGN
Authors:
Benjamin Prautsch, Uwe Eichler and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS/EAS, DE
Abstract
Semiconductor technology has shown significant progress over the last decades. Digital EDA (electronic design automation) allowed that this progress could be converted to high-performance digital ICs. Analog components are part of Systems-on-Chip (SoC) too, but analog EDA lags far behind. Therefore, a lot of effort was spent to automate analog IC design. Mayor results are constraint-based layout-aware optimization tools using predefined layout templates or pure automation as well as analog generators containing expert knowledge. While optimization is a holistic top-down approach, generators allow parameterized and fast bottom-up generation of critical schematic and layout parts, pre-planned by experienced designers. With IIP Generators, we follow three use cases to ease analog design: 1) design on higher hierarchy levels, 2) development of hierarchical high-level IIPs, and 3) automated design porting due to highly technology-independent blocks down to 22nm.

More information ...
UB09.9ABSYNTH: A COMPREHENSIVE APPROACH TO FRONT TO BACK ANALOG BLOCK DESIGN AUTOMATION
Authors:
Abhaya Chandra Kammara S.1, Sidney Pontes-Filho2 and Andreas König2
1ISE, TU Kaiserslautern, DE; 2University of Kaiserslautern, DE
Abstract
ABSYNTH was first presented in CEBIT 2014 where complete, practical circuit sizing approaches have been shown using meta-heuristics on trusted simulators. This tool was then proven by its use in design of several cells in a research project. Here, we present the extension to our nested optimization approach that creates a symmetric and well matched layout in every step for every instance in the population of the swarm, that is extracted in our flow to provide feedback to the cost function impacting on the population update for more viable and robust circuits. The layout optimization presented in this DEMO works with Cadence Layout design tools. Our initial focus is, motivated by Industry 4.0, IoT, on cells for signal conditioning electronics with reconfigurability and Self-X features.[1] Abhaya C. Kammara, L.Palanichamy, and A. König, "Multi-Objective optimization and visualization for analog automation", Complex. Intell. Syst, Springer, DOI 10.1007/s40747-016-0027-3, 2016

More information ...
UB09.10EXPERIENCE-BASED AUTOMATION OF ANALOG IC DESIGN
Authors:
Florian Leber and Juergen Scheible, Reutlingen University, DE
Abstract
While digital design automation is highly developed, analog design automation still remains behind the demands. Previous circuit synthesis approaches, which are usually based on optimization algorithms, do not satisfy industrial requirements. A promising alternative is given by procedural approaches (also known as "generators"): They (a) emulate experts' decisions, thus (b) make expert knowledge re-usable and (c) can consider all relevant aspects and constraints implicitly. Nowadays, generators are successfully applied in analog layout (Pcells, Pycells). We aim at an entire design flow completely based on procedural automation techniques. This flow will consist of procedures for the generation of schematics and layouts for every typical analog circuit class, such as amplifier, bandgap, filter a.s.o. In our presentation we give an overview on such a design flow and we show an approach for capturing an analog circuit designer's strategy as an executable "expert design plan".

More information ...
12:00End of session
12:30Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.1 Special Day Session on Designing Autonomous Systems: Digitalization in automotive and industrial systems

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Saal 2

Chair:
Matthias Traub, BMW, DE

Autonomous systems are an important part of todays and future solutions for the automotive and industrial sector. The research and development activities to enable highly/fully automated driving and industry 4.0 have to deal with a lot of new requirements (e.g. fail operational, cyber security), technologies (connectivity over 5G, neuronal networks, future computing platforms) and topics (data analytics, artificial intelligence). Furthermore processes, methods und tools lag behind and need to speed up to cope with all the consequences in validation and verification. The short paper will give an overview over these challenges and the actual state of research and the development in the field of digital autonomous systems.

TimeLabelPresentation Title
Authors
11:0010.1.1AUTO IN MOTION - TRENDS IN AUTOMOTIVE ENGINEERING
Author:
Eric Sax, Karlsruhe Institute of Technology, DE
Abstract
Technology is on the rise and enables new functions in modern cars. Automated vehicles, connected functions and alternative drives open new levels of safety, comfort and business models. But the processes, methods und tools lack behind and need to speed up to cope with all the consequences in validation and verification.
11:2510.1.2DRIVING AND BEING DRIVEN: FUTURE MOBILITY AND TECHNOLOGICAL ENABLERS
Author:
Hans-Jörg Vögel, BMW Group, DE
Abstract
Upcoming architectures for vehicular electrics, electronics, communication and software will face - among others - technological challenges driven by strong strategic trends. Those trends - Automated Driving, Artificial Intelligence, Drivetrain Electrification and Connected Systems and Services - are driving increasing system complexity, demanding formal verifiability and semantically rich system integration; and they are driving innovation, requiring new technologies,  subsystems and components to be integrated as well as new novel approaches in system integration and operation. Automation, in particular, not only introduces a plethora of additional sensors, compute-heavy algorithms, and the corresponding load on the vehicle's physical network, but also requires formalized functional safety engineering to a much greater extent than previous vehicle generations. Moreover, intelligent algorithms not only help automated vehicles drive safely, but artificial intelligence will be making its way into quite every aspect of functional vehicle design with the advent of intelligent personal assistants, affective multi-modal natural user interaction, situational awareness and augmented reality. The vehicle's self-awareness required is further driving architectural design challenges. Introduction of electrical drivetrains while maintaining or gradually phasing out conventional combustion engines from product line architectures is on the one side of the electrical power coin. On the other side, compute hungry algorithms are also raising questions of power consumption and power dissipation. Besides, next-generation wireless communication with antenna array placement requirements, as well as questions of information security and privacy are among technological drivers dominating automotive discussions. They are part of the foundation to deliver services and help organize people's mobile lives.
11:5010.1.3DIGITALIZATION IN THE INDUSTRY AUTOMATION - FROM THE COMPONENT TO THE CLOUD
Author:
Thilo Streichert, Festo AG & Co. KG, DE
Abstract
Products from Festo are becoming increasingly intelligent and incorporate more and more software-based applications as well as the ability to integrate embedded functions. Smart products optimize themselves, adapt to external influences, identify themselves and have a digital map in the form of a product key. Integrated sensor technology ensures that processes are transparent and self-diagnosing, thus enabling preventive maintenance. Universal standards and interfaces such as TSN and OPC UA fulfil the requirements for smart products suitable for plug&produce applications and Industry 4.0. In this contribution, the transformation from mechanical actuators to software-defined motion is presented. We will show how these automation components will be integrated in a networked machine cell and result in a higher overall equipment efficiency.
12:1010.1.45G CHALLENGES FOR CONNECTED, COOPERATIVE AND AUTOMATED TRANSPORT SYSTEMS
Author:
Jérôme Härri, Eurecom - Communication systems, FR
Abstract
The Wifi-based V2X Communication technology ITS-G5 (a.k.a DSRC in the US) is the currently only market-ready solution to provide safety-critical V2X communications for future C-ITS applications, such as road hazard warning, lane-change warning or intersection-collision warning. Yet, future Connected Cooperative Automated Transport Systems (CATS) are expected to require ultra-reliable and low latency connectivity that neither the current WiFi technology nor even the new Cellular V2X technology can provide. This talk will first provide a rapid overview of the benefit of these future CATS  as well as their expected communication requirements , then briefly review the state and capabilities of the current Wifi and Cellular V2X technologies, and finally describe the 5G challenges and roadmap  to meet the communication, networking and services required by these future Connected Cooperative Automated Transport Systems.
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.2 Neural Networks and Neurotechnology

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 6

Chair:
Jim Harkin, University of Ulster, GB

Co-Chair:
Robert Wille, University of Linz, AT

New approaches to energy efficiency in neural networks using approximate computing and non-volatile FeFET memory are presented. A novel inductive coupling interconnect approach for neurotechnology applications and an optimisation strategy for efficiently mapping spiking neural networks to neuromorphic hardware are also presented.

TimeLabelPresentation Title
Authors
11:0010.2.1DESIGN AND OPTIMIZATION OF FEFET-BASED CROSSBARS FOR BINARY CONVOLUTION NEURAL NETWORKS
Speaker:
Xiaoming Chen, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Xiaoming Chen, Xunzhao Yin, Michael Niemier and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
Binary convolution neural networks (CNNs) have attracted much attention for embedded applications due to low hardware cost and acceptable accuracy. Nonvolatile, resistive random-access memories (RRAMs) have been adopted to build crossbar accelerators for binary CNNs. However, RRAMs still face fundamental challenges such as sneak paths, high write energy, etc. We exploit another emerging nonvolatile device---- ferroelectric field-effect transistor (FeFET), to build crossbars to improve the energy efficiency for binary CNNs. Due to the three-terminal transistor structure, an FeFET can function as both a nonvolatile storage element and a controllable switch, such that both write and read power can be reduced. Simulation results demonstrate that compared with two RRAMbased crossbar structures, our FeFET-based design improves write power by 5600X and 395X, and read power by 4.1X and 3.1X. We also tackle an important challenge in crossbar-based CNN accelerators: when a crossbar array is not large enough to hold the weights of one convolution layer, how do we partition the workload and map computations to the crossbar array? We introduce a hardware-software co-optimization solution for this problem that is universal for any crossbar accelerators.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.2.2LOW-POWER 3D INTEGRATION USING INDUCTIVE COUPLING LINKS FOR NEUROTECHNOLOGY APPLICATIONS
Speaker:
Benjamin Fletcher, University of Southampton, GB
Authors:
Benjamin Fletcher1, Shidhartha Das2, Chi-Sang Poon3 and Terrence Mak1
1University of Southampton, GB; 2ARM Ltd., GB; 3Massachusetts Institute of Technology, US
Abstract
Three dimensional system integration offers the ability to stack multiple dies, fabricated in disparate technologies, within a single IC. For this reason, it is gaining popularity for use in sensor devices which perform concurrent analogue and digital processing, as both analogue and digital dies can be coupled together. One such class of devices are closed-loop neuromodulators; neurostimulators which perform real-time digital signal processing (DSP) to deliver bespoke treatment. Due to their implantable nature, these devices are inherently governed by very strict volume constraints, power budgets, and must operate with high reliability. To address these challenges, this paper presents a low-power inductive coupling link (ICL) transceiver for 3D integration of digital CMOS and analogue BiCMOS dies for use in closed-loop neuromodulators. The use of an ICL, as opposed to through silicon vias (TSVs), ensures high reliability and fabrication yield in addition to circumventing the use of voltage level conversion between disparate dies, improving power efficiency. The proposed transceiver is experimentally evaluated using SPICE as well as nine traditional TSV baseline solutions. Results demonstrate that, whilst the achievable bandwidth of the TSV-based approaches is much higher, for the typical data rates demanded by neuromodulator applications (0.5 - 1 Gbps) the ICL design consumes on average 36.7% less power through avoiding the use of voltage level shifters.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.2.3MAPPING OF LOCAL AND GLOBAL SYNAPSES ON SPIKING NEUROMORPHIC HARDWARE
Speaker:
Francky Catthoor, IMEC Fellow, BE
Authors:
Anup Das1, Yuefeng Wu1, Khanh Huynh1, Francesco Dell Anna2, Francky Catthoor2 and Siebren Schaafsma1
1IMEC, NL; 2IMEC, BE
Abstract
Spiking neural networks (SNNs) are widely deployed to solve complex pattern recognition, function approximation and image classification tasks. With the growing size and complexity of these networks, hardware implementation becomes challenging because scaling up the size of a single array (crossbar) of fully connected neurons is no longer feasible due to strict energy budget. Modern neromorphic hardware integrate small-sized crossbars with time-multiplexed interconnects. Partitioning SNNs becomes essential in order to map them on neuromorphic hardware with the major aim to reduce the global communication latency and energy overhead. To achieve this goal, we propose our instantiation of particle swarm optimization, which partitions SNNs into local synapses (mapped on crossbars) and global synapses (mapped on time-multiplexed interconnects), with the objective of reducing spike communication on the interconnect. This improves latency, power consumption as well as application performance by reducing inter-spike interval distortion and spike disorders. Our framework is implemented in Python, interfacing CARLsim, a GPU-accelerated application-level spiking neural network simulator with an extended version of Noxim, for simulating time-multiplexed interconnects. Experiments are conducted with realistic and synthetic SNN-based applications with different computation models, topologies and spike coding schemes. Using power numbers from in-house neuromorphic chips, we demonstrate significant reductions in energy consumption and spike latency over PACMAN, the widely-used partitioning technique for SNNs on SpiNNaker.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:1510.2.4ENERGY-EFFICIENT NEURAL NETWORKS USING APPROXIMATE COMPUTATION REUSE
Speaker:
Xun Jiao, University of California San Diego, US
Authors:
Xun Jiao1, Vahideh Akhlaghi1, Yu Jiang2 and Rajesh Gupta1
1University of California, San Diego, US; 2Tsinghua University, CN
Abstract
As a problem-solving method, neural networks have shown broad success for medical applications, speech recognition, and natural language processing. Current hardware implementations of neural networks exhibit high energy consumption due to the intensive computing workloads. This paper proposes a methodology to design an energy-efficient neural network that effectively exploits computation reuse opportunities. To do so, we use Bloom filters (BFs) by tightly integrating them with computation units. BFs store and recall frequently occurring input patterns to reuse computations. We expand the opportunities for computation reuse by storing frequent input patterns specific to a given layer and using approximate pattern matching with hashing for limited data precision. This reconfigurable matching is key to achieving a "controllable approximation" for neural networks. To lower the energy consumption of BFs, we also use low-pow memristor arrays to implement BFs. Our experimental results show that for convolutional neural networks, the BFs enable 47.5% energy saving of multiplication operations while incurring only 1% accuracy drop. While the actual savings will vary depending upon the extent of approximation and reuse, this paper presents a method for reducing computing workloads and improving energy efficiency.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP4-11, 428AN ENERGY-EFFICIENT STOCHASTIC COMPUTATIONAL DEEP BELIEF NETWORK
Speaker:
Yidong Liu, University of Alberta, CA
Authors:
Yidong Liu1, Yanzhi Wang2, Fabrizio Lombardi3 and Jie Han1
1University of Alberta, CA; 2Syracuse university, US; 3Northeastern University, US
Abstract
Deep neural networks (DNNs) are effective machine learning models to solve a large class of recognition problems, including the classification of nonlinearly separable patterns. The applications of DNNs are, however, limited by the large size and high energy consumption of the networks. Recently, stochastic computation (SC) has been considered to implement DNNs to reduce the hardware cost. However, it requires a large number of random number generators (RNGs) that lower the energy efficiency of the network. To overcome these limitations, we propose the design of an energy-efficient deep belief network (DBN) based on stochastic computation. An approximate SC activation unit (A-SCAU) is designed to implement different types of activation functions in the neurons. The A-SCAU is immune to signal correlations, so the RNGs can be shared among all neurons in the same layer with no accuracy loss. The area and energy of the proposed design are 5.27% and 3.31% (or 26.55% and 29.89%) of a 32-bit floating-point (or an 8-bit fixed-point) implementation. It is shown that the proposed SC-DBN design achieves a higher classification accuracy compared to the fixed-point implementation. The accuracy is only lower by 0.12% than the floating-point design at a similar computation speed, but with a significantly lower energy consumption.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP4-12, 375PUSHING THE NUMBER OF QUBITS BELOW THE "MINIMUM": REALIZING COMPACT BOOLEAN COMPONENTS FOR QUANTUM LOGIC
Speaker:
Alwin Zulehner, Johannes Kepler University Linz, AT
Authors:
Alwin Zulehner and Robert Wille, Johannes Kepler University Linz, AT
Abstract
Research on quantum computers has gained attention since they are able to solve certain tasks significantly faster than classical machines (in some cases, exponential speed-ups are possible). Since quantum computations typically contain large Boolean components, design automation techniques are required to realize the respective Boolean functions in quantum logic. They usually introduce a significant amount of additional qubits - a highly limited resource. In this work, we propose an alternative method for the realization of Boolean components for quantum logic. In contrast to the current state-of-the-art, we dedicatedly address the main reasons causing the additionally required qubits (namely the number of the most frequently occurring output pattern as well as the number of primary outputs of the function to be realized) and propose to manipulate the function so that both issues are addressed. The resulting methods allow to push the number of required qubits below what is currently considered the minimum.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.3 From Non-Volatile Flip-Flops to Storage Systems

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 1

Chair:
Alexandre Levisse, EPFL, CH

Co-Chair:
Weisheng Zhao, Beihang University, CN

This session combines research from circuit to system level on non-volatile memories. The first paper proposes an STT-MRAM-based multi-bit non-volatile flip-flop. The other papers addresse system-level challenges, such as write disturbance mitigation, wear levelling scheme and latency reduction, for various technologies (PCM, SSD-flash).

TimeLabelPresentation Title
Authors
11:0010.3.1(Best Paper Award Candidate)
MULTI-BIT NON-VOLATILE SPINTRONIC FLIP-FLOP
Speaker:
Rajendra Bishnoi, Karlsruhe Institute of Technology, DE
Authors:
Christopher Münch, Rajendra Bishnoi and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Abstract
As leakage increases proportionally with the technology downscaling, it becomes extremely challenging to manage to meet the total power budget. This is because, CMOS-based logic blocks can not be completely power-gated as their flip-flops always require a retention supply. Alternatively, their data can be stored in a separate memory during the standby mode, however, that result in a huge area and energy overhead. Spin Transfer Torque (STT) based non-volatile flip-flops can offer normally-off/instant-on computing features to reduce leakage by complete power shut-down without the need to transfer and restore system states before and after the power-down phases. The non-volatile component of such flip-flops can be easily shared for the overall design optimizations. In this paper, we design a unique multi-bit non-volatile flip-flop architecture using STT devices to reduce the area and energy costs associated with non-volatile components. This architecture is developed based on the resource sharing principle using a custom designing technique that enables the optimization for the area and energy consumption. We have developed a framework in which we have replaced the conventional neighbor flip-flops in the layout with our proposed multi-bit non-volatile designs. Results show that, at system-level, using our proposed multi-bit flip-flop architecture, we significantly improve the area and energy compared to the standard single bit non-volatile flip-flop designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.3.2ADAM: ARCHITECTURE FOR WRITE DISTURBANCE MITIGATION IN SCALED PHASE CHANGE MEMORY
Speaker:
Shivam Swami, University of Pittsburgh, US
Authors:
Shivam Swami and Kartik Mohanram, University of Pittsburgh, US
Abstract
With technology scaling, phase change memory (PCM) has become highly vulnerable to write disturbance (WD) errors. A PCM WD error occurs when a cell write dissipates heat to idle cells in the same/adjacent word lines (WLs), disturbing the states of those cells. Whereas state-of-the-art solutions, e.g., data insulation (DIN) and super dense PCM (SD-PCM), have successfully addressed WL PCM WD errors, reducing (i) bit line (BL) WD errors and (ii) the performance penalties of aggregate (WL+BL) WD error recovery remain areas of active research and development. Architecture for Write DisturbAnce Mitigation, ADAM, is a low cost, high performance pattern-based data compression and alignment solution to reduce the aggregate (WL+BL) WD error rate in PCM. At no impact to inter-cell spacing, ADAM increases the lateral separation between the cells storing useful data in adjacent WLs, ensuring that the heat dissipated to adjacent WLs minimally impacts the cells storing useful data. For one compression tag bit per 512-bit cache line, ADAM provides an effective solution to reduce the number of WL and BL cells vulnerable to WD errors. ADAM also integrates a novel Deferred WD Correction scheme, DEFT, that opportunistically defers latency-intensive WD error recovery of cached data in the adjacent WLs without impacting memory reliability. ADAM is evaluated on single-/multi-level cell (SLC/MLC) PCM using the SPEC CPU2006 benchmarks. Results for SLC (MLC) PCM show that in comparison to state-of-the-art SD-PCM, ADAM reduces the aggregate WD error rate by 32% (60%); this translates to a 50% (61%) reduction in error correction energy and a 7% (15%) improvement in system performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.3.3PROGRAM ERROR RATE-BASED WEAR LEVELING FOR NAND FLASH MEMORY
Speaker:
Fei Wu, Wuhan National Laboratory for Optoelectronics,Huazhong University of Science and Technology, CN
Authors:
Xin Shi1, Fei Wu1, Shunzhuo Wang1, Changsheng Xie1 and Zhonghai Lu2
1Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, CN; 2KTH Royal Institute of Technology, SE
Abstract
Wear leveling scheme has became a fundamental issue in the design of Solid State Disk (SSD) based on NAND Flash memory. Existing schemes aim to equalize the number of programming/erase (P/E) cycles and memory raw bit error rates (BER) among all the flash blocks. However, due to fabrication process variation, different blocks of the same flash chip usually have largely different endurance in terms of BER and program error rate (PER). Such conventional design cannot obtain the wear status of flash blocks precisely. This paper proposes PER-WL,an efficient PER-based wear leveling scheme that uses PER statistics as the measurement of flash block wear-out pace, and performs block data swapping to improve the wear leveling efficiency. In our evaluation with four realistic workloads, PER-based wear leveling scheme can achieve 17% and 9% variance of program error rate reduction, 8% and 3% program error rate reduction with 5% and 2% system performance degradation when compared to two state-of-the-art wear leveling schemes on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:1510.3.4SHADOWGC: COOPERATIVE GARBAGE COLLECTION WITH MULTI-LEVEL BUFFER FOR PERFORMANCE IMPROVEMENT IN NAND FLASH-BASED SSDS
Speaker:
Jinhua Cui, Xi'an Jiaotong University, CN
Authors:
Jinhua Cui1, Youtao Zhang2, Jianhang Huang1, Weiguo Wu1 and Jun Yang2
1Xi'an Jiaotong University, CN; 2University of Pittsburgh, US
Abstract
Garbage collection, an essential background activity in NAND flash based Solid-State Drives, often introduces large runtime overhead. Recent studies showed that it is beneficial to separate the flash pages that have dirty copies in the write buffers from those that do not. However, the existing schemes exploring this observation have limitations, which prevent them from maximizing the performance improvement. In this paper, we address the above challenge through ShadowGC, a novel GC design that exploits the pages in both host-side and device-side write buffers and adopts different read and write strategies to minimize the GC overhead. When garbage collecting flash pages that have dirty copies in the device-side write buffer, ShadowGC reads data from the write buffer such that the page relocation operations in GC merge with the write-back operations of the buffer. When garbage collecting flash pages that have dirty copies in the host-side write buffer, ShadowGC moves them to dedicated blocks and speeds up the movement with fast-write operations. Our experimental results show that, on average, ShadowGC reduces the write amplification by 16.2% and the GC latency by 20.5% over the state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP4-13, 755POWER OPTIMIZATION THROUGH PERIPHERAL CIRCUIT REUSING INTEGRATED WITH LOOP TILING FOR RRAM CROSSBAR-BASED CNN
Authors:
Yuanhui Ni, Weiwen Chen and Keni Qiu, Capital Normal University, CN
Abstract
Convolutional neural networks (CNNs) have been proposed to be widely adopted to make predictions on a large amount of data in modern embedded systems. Prior studies have shown that convolutional computations which consist of numbers of multiply and accumulate (MAC) operations, serve as the most computationally expensive portion in CNN. Compared to the manner of executing MAC operations in GPU and FPGA, CNN implementation in the RRAM crossbar-based computing system (RCS) demonstrates the outstanding advantages of high performance and low power. However, the current design is energy-unbalanced among the three parts of RRAM crossbar computation, peripheral circuits and memory accesses, the latter two factors can significantly limit the potential gains of RCS. Addressing the problem of high power overhead of peripheral circuits in RCS, this paper adopts the Peripheral Circuit Unit (PeriCU)-Reuse scheme to meet a certain power budget. The underlying idea is to put the expensive AD/DAs onto spotlight and arrange multiple convolution layers to be sequentially served by the same PeriCU. Furthermore, it is observed that memory accesses can be bypassed if two adjacent layers are assigned in the different PeriCU. Then a loop tiling technique is proposed to further improve the energy and throughput of RCS. The experiments of two convolutional applications validate that the PeriCU-Reuse scheme integrated with the loop tiling techniques can efficiently meet power requirement, and further reduce energy consumption by 61.7%.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP4-14, 697ORIENT: ORGANIZED INTERLEAVED ECCS FOR NEW STT-RAM CACHES
Speaker:
Hamed Farbeh, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Authors:
Zahra Azad1, Hamed Farbeh2 and Amir Mahdi Hosseini Monazzah3
1Sharif University of Technology, IR; 2School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR; 3Department of Computer Engineering, Sharif University of Technology, Tehran, Iran, IR
Abstract
Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising alternative to SRAM in cache memories. However, STT-MRAMs face with high probability of write errors due to its stochastic switching behavior. To correct the write errors, Error-Correcting Codes (ECCs) used in SRAM caches are conventionally employed. A cache line consists of several codewords and the data bits are selected in such a way that the maximum correction capability is provided based on the error patterns in SRAMs. However, the different write error patterns in STT-MRAM caches leads to inefficiency of conventional ECC configurations. In this paper, first we investigate the efficiency of ECC configurations and demonstrate that the vulnerability of codewords in a cache line varies by up to 17x. This variation means that, while some words are overprotected, some others are highly probable to experience uncorrectable errors. Then, we propose an ECC bit selection scheme, so-called ORIENT, to reduce the vulnerability variation of codewords to 1.4x. The simulation results show that conventional ECC configuration increases the write error rate by up to about 64.4% compared with the optimum ECC bit selection, whereas this value for ORIENT is only 4.5%.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:32IP5-16, 594EFFICIENT WEAR LEVELING FOR INODES OF FILE SYSTEMS ON PERSISTENT MEMORIES
Speaker:
Xianzhang Chen, Chongqing University, CN
Authors:
Xianzhang Chen1, Edwin Sha2, Yuansong Zeng1, Chaoshu Yang1, Weiwen Jiang1 and Qingfeng Zhuge3
1Chongqing University, CN; 2Chongqing University, US; 3East China Normal University, CN
Abstract
Existing persistent memory file systems achieve high-performance file accesses by exploiting advanced characteristics of persistent memories (PMs), such as PCM. However, they ignore the limited endurance of PMs. Particularly, the frequently updated inodes are stored on fixed locations throughout their lifetime, which can easily damage PM with common file operations. To address such issues, we propose a new mechanism, Virtualized Inode (VInode), for the wear leveling of inodes of persistent memory file systems. In VInode, we develop an algorithm called Pages as Communicating Vessels (PCV) to efficiently find and migrate the heavily written inodes. We implement VInode in SIMFS, a typical persistent memory file system. Experiments are conducted with well-known benchmarks. Compared with original SIMFS, experimental results show that VInode can reduce the maximum value and standard deviation of the write counts of pages to 1800x and 6200x lower, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.4 Cryptographic Hardware

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 2

Chair:
Nele Mentens, KU Leuven, BE

Co-Chair:
Tim Güneysu, Ruhr-Universität Bochum, DE

This session presents novel ideas realized in hardware for cryptographic systems. The contributions range from implementations of leakage resilient cryptography in ASICs, to FPGA realizations of novel public-key primitives as well as optimization of FPGA resources used by random number generation schemes.

TimeLabelPresentation Title
Authors
11:0010.4.1BINARY RING-LWE HARDWARE WITH POWER SIDE-CHANNEL COUNTERMEASURES
Speaker:
Ye Wang, The University of Texas at Austin, US
Authors:
Aydin Aysu, Mohit Tiwari and Michael Orshansky, University of Texas at Austin, US
Abstract
We describe the first hardware implementation of a quantum-secure encryption scheme along with its low-cost power side-channel countermeasures. The encryption uses an implementation-friendly Binary-Ring-Learning-with-Errors (B-RLWE) problem with binary errors that can be efficiently generated in hardware. We demonstrate that a direct implementation of B-RLWE exhibits vulnerability to power side-channel attacks, even to Simple Power Analysis, due to the nature of binary coefficients. We mitigate this vulnerability with a redundant addition and memory update. To further protect against Differential Power Analysis (DPA), we use a B-RLWE specific opportunity to construct a lightweight yet effective countermeasure based on randomization of intermediate states and masked threshold decoding. On a SAKURA-G FPGA board, we show that our method increases the required number of measurements for DPA attacks by 40X compared to unprotected design. Our results also quantify the trade-off between side-channel security and hardware area-cost of B-RLWE.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.4.2HIGH SPEED ASIC IMPLEMENTATIONS OF LEAKAGE-RESILIENT CRYPTOGRAPHY
Speaker:
Thomas Unterluggauer, Graz University of Technology, AT
Authors:
Robert Schilling1, Thomas Unterluggauer2, Stefan Mangard2, Frank Gurkaynak3, Michael Muehlberghuber4 and Luca Benini5
1Graz University of Technology / Know Center GmbH, AT; 2Graz University of Technology, AT; 3ETH Zurich, CH; 4Integrated Systems Laboratory (ETH Zurich), CH; 5Università di Bologna, IT
Abstract
Embedded devices in the Internet-of-Things require encryption functionalities to secure their communication. However, side-channel attacks and in particular differential power analysis (DPA) attacks pose a serious threat to cryptographic implementations. While state-of-the-art countermeasures like masking slow down the performance and can only prevent DPA up to a certain order, leakage-resilient schemes are designed to stay secure even in the presence of side-channel leakage. Although several leakage-resilient schemes have been proposed, there are no hardware implementations to demonstrate their practicality and performance on measurable silicon. In this work, we present an ASIC implementation of a multi-core System-on-Chip extended with a software-programmable accelerator for leakage-resilient cryptography. The accelerator is deeply embedded in the shared memory architecture of the many-core system, supports different configurations, contains a high-throughput implementation of the 2PRG primitive based on AES-128, offers two side-channel protected re-keying functions, and is the first fabricated design of the side-channel secure authenticated encryption scheme ISAP. The accelerator reaches a maximum throughput of 7.49,Gbit/s and a best-case energy efficiency of 137,Gbit/s/W making this accelerator suitable for high-speed secure IoT applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.4.3OPTIMIZATION OF THE PLL CONFIGURATION IN A PLL-BASED TRNG DESIGN
Speaker:
Elie Noumon Allini, Laboratoire Hubert Curien, University of Saint-Etienne, FR
Authors:
Elie Noumon Allini, Oto Petura, Viktor Fischer and Florent Bernard, Hubert Curien Laboratory, Jean Monnet University, FR
Abstract
Several recent designs show that the phase locked- loops (PLLs) are well suited for building true random number generators (TRNG) in logic devices and especially in FPGAs, in which PLLs are physically isolated from the rest of the device. However, the setup of the PLL configuration for the PLL-based TRNG is a challenging task. Indeed, the designer has to take into account physical constraints of the hardwired block, when trying to achieve required performance (bit rate) and security (entropy rate per bit). In this paper, we introduce a method aimed at choosing PLL parameters (e.g. input frequency, multiplication and division factors of the PLL) that satisfy hardware constraints, while achieving the highest possible bit rate or entropy rate according to application requirements. The proposed method is fast enough to produce all possible configurations in a short time. Comparing to the previous method based on a generic algorithm, which was able to find only a locally optimized solution and only for one PLL in tens of seconds, the new method finds exhaustive set of possible configurations of one- or two-PLL TRNG in few seconds, while the found configurations can be ordered depending on their performance or sensitivity to jitter.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP4-15, 187ERASMUS: EFFICIENT REMOTE ATTESTATION VIA SELF-MEASUREMENT FOR UNATTENDED SETTINGS
Speaker:
Norrathep Rattanavipanon, University of California, Irvine, TH
Authors:
Xavier Carpent1, Norrathep Rattanavipanon2 and Gene Tsudik2
1UC Irvine, US; 2UCI, US
Abstract
Remote attestation (RA) is a popular means of detecting malware in embedded and IoT devices. RA is usually realized as a protocol via which a trusted verifier measures software integrity of an untrusted remote device called prover. All prior RA techniques require on-demand operation. We identify two drawbacks of this approach in the context of unattended devices: First, it fails to detect mobile malware that enters and leaves the prover between successive RA instances. Second, it requires the prover to engage in a potentially expensive computation, which can negatively impact safety-critical or real-time devices. To this end, we introduce the concept of self-measurement whereby a prover periodically (and securely) measures and records its own software state. A verifier then collects and verifies these measurements. We demonstrate a concrete technique called ERASMUS, justify its features and evaluate its performance. We show that ERASMUS is well-suited for safety-critical applications. We also define a new metric -- Quality of Attestation (QoA).

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP5-14, 873NON-INTRUSIVE TESTING TECHNIQUE FOR DETECTION OF TROJANS IN ASYNCHRONOUS CIRCUITS
Speaker:
Rodrigo Possamai Bastos, TIMA Laboratory, CNRS/Grenoble INP/UJF, FR
Authors:
Leonel Acunha Guimarães, Thiago Ferreira de Paiva Leite, Rodrigo Possamai Bastos and Laurent Fesquet, TIMA - Grenoble Institute of Technology, FR
Abstract
Asynchronous circuits, as any IC, are vulnerable to hardware Trojans (HTs), which might be maliciously implanted in IC designs during outsourced fabrication phases. In this paper, a new testing technique to detect HTs by exploiting the regular side-channel properties of quasi-delay insensitive (QDI) asynchronous circuits is proposed. The technique does not need neither additional circuitry nor significant adjustments in the post-fabrication testing phase. Simulation results show that the proposed technique is able to detect HTs with dimensions smaller than 1% of the original circuit.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.5 Mixed-Criticality and Fault-Tolerant Real-Time Embedded Systems

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 3

Chair:
Leandro Indrusiak, Univ. of York, GB

Co-Chair:
Andy Pimentel, University of Amsterdam, DE

The session presents advances in mixed criticality systems related to Availability, Memory Bandwidth and Fault-Tolerance. The first paper details on service degradation in mixed criticality systems. The second paper handles mixed-critical workloads in the presences of memory contention. The third paper considers fault-tolerance to be incorporated into control algorithms.

TimeLabelPresentation Title
Authors
11:0010.5.1AVAILABILITY ENHANCEMENT AND ANALYSIS FOR MIXED-CRITICALITY SYSTEMS ON MULTI-CORE
Speaker:
Roberto Medina, Télécom ParisTech, FR
Authors:
Roberto Medina, Etienne Borde and Laurent Pautet, Télécom ParisTech, FR
Abstract
In the critical systems domain, Mixed Criticality Systems (MCS) improve considerably the usage of computation resources by running tasks with different levels of criticality on multi-core processors. To ensure the safety of MCS, services provided by low criticality tasks are degraded or stopped whenever high criticality tasks need more computation time than initially credited. The evaluation of this degradation is hardly considered in the literature although low criticality services are of prime importance for the quality of service (QoS) of critical systems. In this paper, we propose a method to evaluate the availability of low criticality services, i.e. how often these services are delivered in MCS. We also propose a task model that improves this availability, demonstrated thanks to our evaluation method on an illustrative example of MCS.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:3010.5.2MIXED-CRITICALITY SCHEDULING WITH MEMORY BANDWIDTH REGULATION
Speaker:
Muhammad Ali Awan, CISTER/INESC-TEC and ISEP/IPP, Porto, Portugal, PT
Authors:
Muhammad Ali Awan1, Pedro Souto2, Konstantinos Bletsas1, Benny Akesson1 and Eduardo Tovar1
1CISTER/INESC-TEC, ISEP, PT; 2Faculty of Engineering of the University of Porto, PT
Abstract
Mixed-criticality (MC) multicore system design must reconcile safety guarantees and high performance. The interference among cores on shared resources in such systems leads to unpredictable temporal behaviour. Memory bandwidth regulation among different cores can be a useful tool to mitigate the interference when accessing main memory. However, for mixed-criticality systems conforming to the (well-established) Vestal model, the existing schedulability analyses are oblivious to memory stalling effects, including stalls from memory bandwidth regulation. This makes it unsafe. In this paper, we address this issue by formulating a schedulability analysis for mixed-criticality fixed-priority-scheduled multicore systems using per-core memory access regulation. We also propose multiple heuristics for memory bandwidth allocation and task-to-core assignment. We implement our analysis and heuristics in a tool and evaluate them, performance-wise, through extensive experiments. Our experiments show that stall-oblivious schedulability analysis may be optimistic due to contention on shared memory resources.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:0010.5.3DESIGN AND VALIDATION OF FAULT-TOLERANT EMBEDDED CONTROLLERS
Speaker:
Soumyajit Dey, IIT Kharagpur, IN
Authors:
Saurav Kumar Ghosh1, Soumyajit Dey2, Dip Goswami3, Daniel Mueller-Gritschneder4 and Samarjit Chakraborty4
1Dept. of CSE, IIT Kharagpur, IN; 2Indian Institute of Technology Kharagpur, IN; 3Eindhoven University of Technology, NL; 4Technical University of Munich, DE
Abstract
Embedded control systems are an important and often safety-critical class of applications that need to operate reliably even in the presence of faults. We show that intermittent fault scenarios caused by wear-out effects due to a higher density and a smaller geometry of the embedded electronic components may become a reliability concern for real-time embedded control applications. To mitigate the effects of such intermittent faults, we propose a novel fault-tolerant controller design method such that the resulting controllers ensure closed loop stability (i.e., guarantee safety) with only possibly degraded performance under such fault scenarios. In order to measure the amortized performance offered by the software implementations of such fault-tolerant controllers, we provide a program analysis methodology that statically estimates the quality of control guaranteed by the C code implementation of the fault-tolerant control law. This combination of fault-tolerant controller design followed by performance feedback computed using a formal analysis is illustrated with a case study from the automotive domain.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP4-16, 181END-TO-END LATENCY ANALYSIS OF CAUSE-EFFECT CHAINS IN AN ENGINE MANAGEMENT SYSTEM
Speaker:
Junchul Choi, Seoul National University, KR
Authors:
Junchul Choi, Donghyun Kang and Soonhoi Ha, Seoul National University, KR
Abstract
An engine management system consists of periodic or sporadic real-time tasks. A task is a set of runnables that may be fully preemptive or partially at runnable boundaries. A cause-effect chain is defined as a chain of runnables that are connected by the read/write dependency. We propose a novel analytical technique to estimate the end-to-end latency of a cause-effect chain by considering conservatively estimated schedule time bounds of associated runnables. The proposed approach is verified with an industrial-strength automotive benchmark.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:31IP5-12, 442TOWARDS FULLY AUTOMATED TLM-TO-RTL PROPERTY REFINEMENT
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Vladimir Herdt1, Hoang M. Le1, Daniel Grosse2 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen/DFKI GmbH, DE
Abstract
An ESL design flow starts with a TLM description, which is thoroughly verified and then refined to a RTL description in subsequent steps. The properties used for TLM verification are refined alongside the TLM description to serve as starting point for RTL property checking. However, a manual transformation of properties from TLM to RTL is error prone and time consuming. Therefore, in this paper we propose a fully automated TLM-to-RTL property refinement based on a symbolic analysis of transactors. We demonstrate the applicability of our property refinement approach using a case study.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.6 Special Session: Computing with Ferroelectric FETs - Devices, Models, Systems, and Applications

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Konf. 4

Chair:
Michael Niemier, University of Notre Dame, US

Co-Chair:
Ian O’Connor, Ecole Centrale de Lyon, FR

In this session, we consider devices, circuits, and systems comprised of transistors with integrated ferroelectrics. Said structures are actively being considered by various semiconductor manufacturers as they can address a large and unique design space. Transistors with integrated ferroelectrics could (i) enable a better switch (i.e., offer steeper subthreshold swings), (ii) are CMOS compatible, (iii) have multiple operating modes (i.e., I-V characteristics can also enable compact, 1-transistor, non-volatile storage elements, as well as analog synaptic behavior), and (iv) have been experimentally demonstrated (i.e., with respect to all of the aforementioned operating modes). These device-level characteristics offer unique opportunities at the circuit, architectural, and system-level, and are considered from device, circuit/architecture, and foundry-level perspectives.

TimeLabelPresentation Title
Authors
11:0010.6.1OUTLOOK FOR LOW-POWER BEYOND-CMOS DEVICES
Author:
An Chen, Semiconductor Research Corporation, US
Abstract
tbd
11:3010.6.2EXPLOITING FERROELECTRIC FETS: FROM LOGIC-IN-MEMORY TO NEURAL NETWORKS AND BEYOND
Speaker and Author:
Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
tbd
12:0010.6.3FEFETS: FROM NON-VOLATILE MEMORY TO NON-VOLATILE COMPUTING
Author:
Stefan Slesazeck, NaMLab gGmbH, DE
Abstract
tbd
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

10.8 An Industry Approach to FPGA and SOC System Development and Verification

Date: Thursday 22 March 2018
Time: 11:00 - 12:30
Location / Room: Exhibition Theatre

Organiser:
Alexander Schreiber, The MathWorks, DE

Speaker:
John Zhao, MathWorks, US

MATLAB and Simulink provide a rich environment for embedded-system development, with libraries of proven, specialized algorithms ready to use for specific applications. The environment enables a model-based design workflow for fast prototyping and implementation of the algorithms on heterogeneous embedded targets, such as MPSoC. A system-level design approach enables architectural exploration and partitioning, as well as coordination between SW and HW development workflows. Functional verification throughout the design process improves coverage and test-case generation while reducing the time and resources required.

In this exhibition theater session, you will learn

  • Automatically generate synthesizable RTL code from you MATLAB and Simulink algorithms targeting FPGA, ASIC or Programmable SoC
  • A HW/SW co-design workflow that combines system level design and simulation with automatic code generation
  • Functional verification using MATLAB and Simulink in a SystemVerilog workflow illustrated by a detailed example
TimeLabelPresentation Title
Authors
12:30End of session
Lunch Break in Großer Saal and Saal 1



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB10 Session 10

Date: Thursday 22 March 2018
Time: 12:00 - 14:30
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB10.1ARCHON: AN ARCHITECTURE-OPEN RESOURCE-DRIVEN CROSS-LAYER MODELLING FRAMEWORK
Authors:
Fei Xia1, Ashur Rafiev1, Mohammed Al-Hayanni2, Alexei Iliasov1, Rishad Shafik1, Alexander Romanovsky1 and Alex Yakovlev1
1Newcastle University, GB; 2Newcastle University, UK and University of Technology and HCED, IQ
Abstract
This demonstration showcases a modelling method for large complex computing systems focusing on many-core types and concentrating on the crosslayer aspects. The resource-driven models aim to help system designers reason about, analyse, and ultimately design such systems across all conventional computing and communication layers, from application, operating system, down to the finest hardware details. The framework and tool support the notion of selective abstraction and are suitable for studying such non-functional properties such as performance, reliability and energy consumption.

More information ...
UB10.2FPGA-BASED HARDWARE ACCELERATOR FOR DRUG DISCOVERY
Authors:
Ghaith Tarawneh, Alessandro de Gennaro, Georgy Lukyanov and Andrey Mokhov, Newcastle University, GB
Abstract
We present an FPGA-based hardware accelerator for drug discovery, developed during the EPSRC programme grant POETS (EP/N031768/1) in partnership with e-Therapeutics, an Oxford based drug discovery company. e-Therapeutics is pioneering a novel form of drug discovery based on analyzing protein interactome networks (https://www.youtube.com/watch?v=wQFpTtuzrgA). This approach can discover suitable drug candidates much more efficiently compared to wet lab testing but requires considerable computing power, particularly because commodity computers are generally inefficient at analyzing large-scale networks. The presented accelerator, consisting of an FPGA board with a silicon-mapped protein interactome plus accompanying software formalisms and tools, can deliver a 1000x speed up in this application compared to software running on commodity computers. We will showcase demos in which we run in-silico analysis of protein interactomes to test drug effects and visualize the results in real-time.

More information ...
UB10.3ADVANCED SIMULATION OF QUANTUM COMPUTATIONS
Authors:
Zulehner Alwin and Robert Wille, Johannes Kepler University Linz, AT
Abstract
Quantum computation is a promising emerging technology which allows for substantial speed-ups compared to classical computation. Since physical realizations of quantum computers are in their infancy, most research in this domain still relies on simulations on classical machines. This causes an exponential overhead which current simulators try to tackle with straight forward array-based representations and massive hardware power. There also exist solutions based on decision diagrams (graph-based approaches) that try to tackle the complexity by exploiting redundancies in quantum states and operations. However, they did not get established since they yield speedups only for certain benchmarks. Here, we demonstrate a new graph-based simulation approach which clearly outperforms state-of-the-art simulators. By this, users can efficiently execute quantum algorithms even if the respective quantum computers are not broadly available yet.

More information ...
UB10.4RISC-V PROCESSOR MODELING IN IP-XACT USING KACTUS2
Authors:
Esko Pekkarinen and Timo Hämäläinen, Tampere University of Technology, FI
Abstract
The complexity of modern embedded system design is managed by advanced, high-level design methodologies such as IP-XACT. However, integrating IP-XACT as a part of an existing design flow and packaging legacy sources is too often inhibited by the inherit differences between IP-XACT and the traditional hardware description languages. In this work, we take an existing Verilog implementation of a RISC-V microprocessor and package it with our open-source IP-XACT tool Kactus2. The resulting IP-XACT description will be publicly available and based on the modeling experience we report the observed pitfalls in the transition from HDL to IP-XACT.

More information ...
UB10.5RECONFIGURABLE SELF-TIMED DATAFLOW ACCELERATOR
Authors:
Danil Sokolov, Alessandro de Gennaro and Andrey Mokhov, Newcastle University, GB
Abstract
Many applications require reconfigurable pipelines to handle incoming data items differently depending on their values or the operating mode. Currently, reconfigurable synchronous pipelines are the mainstream of dataflow accelerators. However, there are certain advantages to be gained from self-timed dataflow processing, e.g. robustness to unstable power supply, data-dependent performance, etc. To become attractive for industry, reconfigurable asynchronous pipelines need a formal behavioural model and design automation. This demo will present a design flow for the specification, verification and synthesis of reconfigurable self-timed pipelines using Dataflow Structure formalism in Workcraft(https://workcraft.org/). As a case study we will use an asynchronous accelerator for Ordinal Pattern Encoding(OPE) with reconfigurable pipeline depth. We will exhibit the resultant OPE chip fabricated in TSMC90nm to show the benefits of reconfigurability and asynchrony for dataflow processing.

More information ...
UB10.6TTOOL/OMC: OPTIMIZED COMPILATION OF EXECUTABLE UML/SYSML DIAGRAMS FOR THE DESIGN OF DATA-FLOW APPLICATIONS
Authors:
Andrea Enrici1, Julien Lallet1, Renaud Pacalet2 and Ludovic Apvrille2
1Nokia Bell Labs, FR; 2Télécom ParisTech, FR
Abstract
Future 5G networks are expected to increase data rates by a factor of 10x. To meet this requirement, baseband stations will be equipped with both programmable (e.g., CPUs, DSPs) and reconfigurable components (e.g., FPGAs). Efficiently programming these architectures is not trivial due to the inner complexity and interactions of these two types of components. This raises the need for unified design flows capable of rapidly partitioning and programming these mixed architectures. Our demonstration will show the complete system-level design and Design Space Exploration, based on UML/SysML diagrams, of a 5G data-link layer receiver, that is partitioned onto both programmable and reconfigurable hardware. We realize an implementation of such a UML/SysML design by compiling it into an executable C application whose memory footprint is optimized with respect to a given scheduling. We will validate the effectiveness of our solution by comparing automated vs manual designs.

More information ...
UB10.7USING FORMAL METHODS FOR AUTOMATIC PLATFORM-INDEPENDENT CODE GENERATION OF RUN-TIME MANAGEMENT
Authors:
Mohammadsadegh Dalvandi, Michael Butler and Asieh Salehi Fathabadi, University of Southampton, GB
Abstract
Run-Time Management (RTM) systems are used in embedded systems to dynamically adapt hardware performance to minimise energy consumption. In this demonstration, we present a framework for automatic generation of RTM implementations from platform-independent formal models. The methodology in designing the RTM systems uses a high-level mathematical language, Event-B, which can describe systems at different abstraction levels. A code generation tool is used to translate platform-independent Event-B RTM models to platform-specific implementations in C. Formal verification is used to ensure correctness of the Event-B models. The portability offered by our methodology is demonstrated by modelling a Reinforcement Learning (RL) based RTM and generating implementations for two different platforms that all achieve energy savings on the respective platforms. The generated RTM code has been integrated with the PRiME framework, a cross-layer framework for embedded power management.

More information ...
UB10.8IIP GENERATORS TO EASE ANALOG IC DESIGN
Authors:
Benjamin Prautsch, Uwe Eichler and Torsten Reich, Fraunhofer Institute for Integrated Circuits IIS/EAS, DE
Abstract
Semiconductor technology has shown significant progress over the last decades. Digital EDA (electronic design automation) allowed that this progress could be converted to high-performance digital ICs. Analog components are part of Systems-on-Chip (SoC) too, but analog EDA lags far behind. Therefore, a lot of effort was spent to automate analog IC design. Mayor results are constraint-based layout-aware optimization tools using predefined layout templates or pure automation as well as analog generators containing expert knowledge. While optimization is a holistic top-down approach, generators allow parameterized and fast bottom-up generation of critical schematic and layout parts, pre-planned by experienced designers. With IIP Generators, we follow three use cases to ease analog design: 1) design on higher hierarchy levels, 2) development of hierarchical high-level IIPs, and 3) automated design porting due to highly technology-independent blocks down to 22nm.

More information ...
UB10.9ABSYNTH: A COMPREHENSIVE APPROACH TO FRONT TO BACK ANALOG BLOCK DESIGN AUTOMATION
Authors:
Abhaya Chandra Kammara S.1, Sidney Pontes-Filho2 and Andreas König2
1ISE, TU Kaiserslautern, DE; 2University of Kaiserslautern, DE
Abstract
ABSYNTH was first presented in CEBIT 2014 where complete, practical circuit sizing approaches have been shown using meta-heuristics on trusted simulators. This tool was then proven by its use in design of several cells in a research project. Here, we present the extension to our nested optimization approach that creates a symmetric and well matched layout in every step for every instance in the population of the swarm, that is extracted in our flow to provide feedback to the cost function impacting on the population update for more viable and robust circuits. The layout optimization presented in this DEMO works with Cadence Layout design tools. Our initial focus is, motivated by Industry 4.0, IoT, on cells for signal conditioning electronics with reconfigurability and Self-X features.[1] Abhaya C. Kammara, L.Palanichamy, and A. König, "Multi-Objective optimization and visualization for analog automation", Complex. Intell. Syst, Springer, DOI 10.1007/s40747-016-0027-3, 2016

More information ...
14:30End of session
15:30Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.0 LUNCH TIME KEYNOTE SESSION: Autonomous Driving: Ready to Market? Which are the Remaining Top Challenges?

Date: Thursday 22 March 2018
Time: 13:20 - 13:50
Location / Room: Saal 2

Chair:
Ayse Coskun, Boston University, US

During the last years a lot of prototypes for automated/autonomous driving vehicles have been presented to the public. Depending on the use case car manufacturers or tech companies have used an evolutionary or a revolutionary approach. While the evolutionary way should be more reasonable applied for owned cars due to cost restraints and the need for the functionality to work more or less by "something everywhere", the revolutionary approach following the strategy "everything somewhere" seems to be the better solution for fleets of autonomous cabs or shuttles. Although we have seen a lot of functional concepts for both approaches to automation, there are still some big challenges to be solved. On one hand the whole automation function has to be designed redundantly to ensure a sufficient functional safety level. In this context the use of Artificial Intelligence based networks could be a solution in particular neuronal networks based on deep learning. On the other hand there is still the question "how good is good enough" having in mind that perfectly working systems cannot be realized and how can the necessary verification/validation process be implemented. The public funded project PEGASUS is working to provide first answers. However: do we have considered all impacts of automated mobility?

TimeLabelPresentation Title
Authors
13:2011.0.1KEYNOTE SPEAKER
Author:
Thomas Form, Head of Electronics and Vehicle Research, Volkswagen AG, and co-ordinator of the Pegasus research project on safety of automated driving, DE
Abstract
During the last years a lot of prototypes for automated/autonomous driving vehicles have been presented to the public. Depending on the use case car manufacturers or tech companies have used an evolutionary or a revolutionary approach. While the evolutionary way should be more reasonable applied for owned cars due to cost restraints and the need for the functionality to work more or less by "something everywhere", the revolutionary approach following the strategy "everything somewhere" seems to be the better solution for fleets of autonomous cabs or shuttles. Although we have seen a lot of functional concepts for both approaches to automation, there are still some big challenges to be solved. On one hand the whole automation function has to be designed redundantly to ensure a sufficient functional safety level. In this context the use of Artificial Intelligence based networks could be a solution in particular neuronal networks based on deep learning. On the other hand there is still the question "how good is good enough" having in mind that perfectly working systems cannot be realized and how can the necessary verification/validation process be implemented. The public funded project PEGASUS is working to provide first answers. However: do we have considered all impacts of automated mobility?
13:50End of session
15:30Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.1 Special Day Session on Designing Autonomous Systems: Smart Vision Systems

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Saal 2

Chair:
Bernhard Rinner, Alpen-Adria-Universität Klagenfurt, AT

Smart vision systems that capture data in both private and public environments are now ubiquitous and have applications in security, disaster response, robotics, and smart environments, among others. Processing this data manually is an immensely tedious - and for some applications - an infeasible task, and an enhanced level of automation and self-awareness in the overall system is a key to overcome the design challenges. This special session addresses design aspects of smart vision systems realized at different levels: the image sensor, the camera node and the system level.

TimeLabelPresentation Title
Authors
14:0011.1.1THE CAMEL APPROACH TO STACKED SENSOR SMART CAMERAS
Speaker:
Marilyn Wolf, Georgia Institute of Technology, US
Authors:
Saibal Mukhopadhyay1, Marilyn Wolf1 and Evan Gebahrdt2
1Georgia Institute of Technology, US; 2School of ECE, Georgia Institute of Technology, US
Abstract
Stacked image sensor systems combine an image sensor, memory, and processors using 3D technology. Stacking camera components that have traditionally been packaged separately provides several benefits: very high bandwidth out of the image sensor, allowing for higher frame rates; very low latency, providing opportunities for image processing and computer vision algorithms which can adapt at very high rates; and lower power consumption. This paper will describe the characteristics of stacked image sensor systems and novel algorithmic and systems concepts that are made possible by these stacked sensors.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:1811.1.2A DESIGN TOOL FOR HIGH PERFORMANCE IMAGE PROCESSING ON MULTICORE PLATFORMS
Speaker:
Shuvra Bhattacharyya, University of Maryland, College Park, MD, USA and Tampere University of Technology, Finland, US
Authors:
Jiahao Wu1, Timothy Blattner2, Walid Keyrouz2 and Shuvra S. Bhattacharyya1
1University of Maryland, US; 2National Institute of Standards and Technology, US
Abstract
Design and implementation of smart vision systems often involve the mapping of complex image processing algorithms into efficient, real-time implementations on multicore platforms. In this paper, we describe a novel design tool that is developed to address this important challenge. A key component of the tool is a new approach to hierarchical dataflow scheduling that integrates a global scheduler and multiple local schedulers. The local schedulers are lightweight modules that work independently. The global scheduler interacts with the local schedulers to optimize overall memory usage and execution time. The proposed design tool is demonstrated through a case study involving an image stitching application for large scale microscopy images.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3611.1.3QUASAR, A HIGH-LEVEL PROGRAMMING LANGUAGE AND DEVELOPMENT ENVIRONMENT FOR DESIGNING SMART VISION SYSTEMS ON EMBEDDED PLATFORMS
Speaker:
Bart Goossens, Ghent University - imec, BE
Authors:
Bart Goossens, Hiêp Luong, Jan Aelterman and Wilfried Philips, Ghent University, Dept. of Telecommunications and Information Processing, BE
Abstract
We present Quasar, a new programming framework that handles many complex aspects in the design of smart vision systems on embedded platforms, such as parallelization, data flow management, scheduling and load balancing. Quasar, as a highlevel programming language, is nearly hardware-agnostic, has a low barrier of entry and is therefore well suited for algorithm design and rapid prototyping. Through several benchmarks and application use cases we demonstrate that programs written in Quasar have a performance that is on a par with (or better than) hand-tuned CUDA and OpenACC code while the development requires much less time and is future-proof.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:5411.1.4CONCURRENT FOCAL-PLANE GENERATION OF COMPRESSED SAMPLES FROM TIME-ENCODED PIXEL VALUES
Speaker:
Ricardo Carmona-Galan, Instituto de Microelectronica de Sevilla (CSIC-Univ. de Sevilla), ES
Authors:
Marco Trevisi1, Héctor C Bandala2, Jorge Fernández-Berni1, Ricardo Carmona-Galán1 and Ángel Rodríguez-Vázquez1
1Instituto de Microelectrónica de Sevilla (IMSE-CNM), CSIC-Universidad de Sevilla, ES; 2Dept. Electronics, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), MX
Abstract
Compressive sampling allows wrapping the relevant content of an image in a reduced set of data. It exploits the sparsity of natural images. This principle can be employed to deliver images over a network under a restricted data rate and still receive enough meaningful information. An efficient implementation of this principle lies in the generation of the compressed samples right at the imager. Otherwise, i. e. digitizing the complete image and then composing the compressed samples in the digital plane, the required memory and processing resources can seriously compromise the budget of an autonomous camera node. In this paper we present the design of a pixel architecture that encodes light intensity into time, followed by a global strategy to pseudo-randomly combine pixel values and generate, on-chip and on-line, the compressed samples.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1211.1.5CONTACTLESS FINGER AND FACE CAPTURING ON A SECURE HANDHELD EMBEDDED DEVICE
Speaker:
Axel Weissenfeld, AIT Austrian Institute of Technology GmbHA, AT
Authors:
Axel Weissenfeld and Bernhard Strobl, Austrian Institute of Technology, AT
Abstract
Traveler flows and crossings at the external borders of the EU are increasing and are expected to increase even more in the future; trends which encompass great challenges for travelers, border guards and the border infrastructure. In this paper we present a new handheld device, which enables border control authorities to check European, visa-holding and frequent third country travelers in a comfortable, fast and secure way. The mobile solution incorporates new multimodal biometric capturing and matching units for face and 4-finger authentication. Thereby, the focus is on the capturing unit and fingerprint verification, which is evaluated in detail. On the other hand, the use in border control requires high security measurements and trustworthy use of credentials, which are also presented. Tests of the handheld device at a land border indicate great acceptance by travelers and border guards.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.2 Timing and Power Driven Physical Design

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Konf. 6

Chair:
Miguel Silveira, INESC-ID/IST, PT

Co-Chair:
Patrick Groeneveld, Cadence Design Systems, US

The first two papers in this session present timing analysis algorithms to handle non-ideal physical conditions. In particular the first paper extends the involution model adding non-deterministic delay variations. The second paper presents a method for estimating the worst case delay based on the extreme value theory. The remaining two papers in this session deal with floorplanning and placement. One paper addresses 3D ICs for multiple supply voltages. The other shows a way to accelerate the analytical placement employing GPU cores.

TimeLabelPresentation Title
Authors
14:0011.2.1(Best Paper Award Candidate)
A FAITHFUL BINARY CIRCUIT MODEL WITH ADVERSARIAL NOISE
Speaker:
Jürgen Maier, TU Wien, AT
Authors:
Matthias Fuegger1, Jürgen Maier2, Robert Najvirt2, Thomas Nowak3 and Ulrich Schmid2
1LSV, CNRS & ENS Paris-Saclay, FR; 2TU Wien, AT; 3Universite Paris Sud, FR
Abstract
Accurate delay models are important for static and dynamic timing analysis of digital circuits, and mandatory for formal verification. However, Függer et al. [IEEE TC 2016] proved that pure and inertial delays, which are employed for dynamic timing analysis in state-of-the-art tools like ModelSim, NC-Sim and VCS, do not yield faithful digital circuit models. Involution delays, which are based on delay functions that are mathematical involutions depending on the previous-output-to-input time offset, were introduced by Függer et al. [DATE'15] as a faithful alternative (that can easily be used with existing tools). Although involution delays were shown to predict real signal traces reasonably accurately, any model with a deterministic delay function is naturally limited in its modeling power. In this paper, we thus extend the involution model, by adding non-deterministic delay variations (random or even adversarial), and prove analytically that faithfulness is not impaired by this generalization. Albeit the amount of non-determinism must be considerably restricted to ensure this property, the result is surprising: the involution model differs from non-faithful models mainly in handling fast glitch trains, where small delay shifts have large effects. This originally suggested that adding even small variations should break the faithfulness of the model, which turned out not to be the case. Moreover, the results of our simulations also confirm that this generalized involution model has larger modeling power and, hence, applicability.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.2.2EVT-BASED WORST CASE DELAY ESTIMATION UNDER PROCESS VARIATION
Speaker:
Charalampos Antoniadis, University of Thessaly, GR
Authors:
Charalampos Antoniadis, Dimitrios Garyfallou, Nestor Evmorfopoulos and Georgios Stamoulis, University of Thessaly, GR
Abstract
Manufacturing process variation in sub-20nm processes has introduced ever increasing overhead in Static Timing Analysis (STA) in order to guarantee the reliable operation of the circuit. Chip designers apply corner-based analysis and add guard-bands to design parameters in order to take into account the impact of process variation on timing. However, the aforementioned techniques are either too slow as the number of design parameters proliferates with the integration of more components into a chip or inaccurate due to the assumption that the worst case delay resides at the corners of design parameters. In this paper, we present a novel statistical methodology, which relies on Extreme Value Theory (EVT), to estimate the worst case delay of VLSI circuits under variations in gate/interconnect parameters. Despite the previous statistical approaches toward maximum delay estimation, our methodology can be applied regardless of the underlying gate/interconnect delay model or any assumption about the distribution of the Arrival Time (AT) at every circuit node, making it very appealing for integration to any level of timing analysis abstraction (from spice-to-gate level) and provide fast yet accurate results. Experimental results on ISCAS85/ISCAS89 circuits show that the estimated maximum AT at the Primary Outputs (POs) can be within 5% of the true maximum AT, at the cost of a few thousand Monte Carlo simulations.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.2.3CO-SYNTHESIS OF FLOORPLANNING AND POWERPLANNING IN 3D ICS FOR MULTIPLE SUPPLY VOLTAGE DESIGNS
Speaker:
Jhih-Ying Yang, Department of Electrical Engineering, National Cheng Kung University, TW
Authors:
Jai-Ming Lin, Chien-Yu Huang and Jhih-Ying Yang, Department of Electrical Engineering, National Cheng Kung University, TW
Abstract
This paper addresses a 3D floorplanning methodology, which considers floorplanning and powerplanning at the same time for Multiple Supply Voltage (MSV) circuits. Physical design becomes more complex for MSV designs since modules with the same power domain have to be placed at close locations in 3D space to facilitate powerplanning and reduce IR-drop, which would deteriorate wirelength. By properly partitioning modules of the same power domain into several voltage islands and increasing overlap area of the voltage islands in contiguous dies, we can reduce routing resource usage without increasing wirelength significantly. Further, unlike previous works, our approach not only can handle a netlist with soft modules and hard modules but also can meet the fixed-outline constraint. The experimental results show that our methodology gets better results than other approach in designs with single voltage domain and is also promising for MSV designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.2.4ACCELERATE ANALYTICAL PLACEMENT WITH GPU: A GENERIC APPROACH
Speaker:
Martin D. F. Wong, University of Illinois Urbana-Champaign, US
Authors:
Chun-Xun Lin and Martin Wong, University of Illinois at Urbana-Champaign, US
Abstract
This paper presents a generic approach of exploiting GPU parallelism to speed up the essential computations in VLSI nonlinear analytical placement. We consider the computation of wirelength and density which are widely used as cost and constraint in nonlinear analytical placement. For wirelength gradient computing, we utilize the sparse characteristic of circuit graph to transform the compute-intensive portions into sparse matrix multiplications, which effectively optimizes the memory access pattern and mitigates the imbalance workload. For density, we introduce a computation flattening technique to achieve load balancing among threads and a High-Precision representation is integrated into our approach to guarantee the reproducibility. We have evaluated our method on a set of contest benchmarks from industry. The experimental results demonstrate our GPU method achieves a better performance over both the CPU methods and the straightforward GPU implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP4-17, 425GENERAL FLOORPLANNING METHODOLOGY FOR 3D ICS WITH AN ARBITRARY BONDING STYLE
Speaker:
Chien-Yu Huang, Department of Electrical Engineering, National Cheng Kung University, TW
Authors:
Jai-Ming Lin and Chien-Yu Huang, Department of Electrical Engineering, National Cheng Kung University, TW
Abstract
This paper proposes a general floorplanning methodology which can be applied to 3D ICs with an arbitrary bonding style. Some researches have shown that a 3D IC with the hybrid bonding style, which includes face-to-back and face-to-face, may obtain better results than that simply using the face-to-back bonding style. We respectively present an approach to assign modules to tiers for each kind of bonding style. Further, a new utilization function, called cosine-shaped function, is proposed to estimate utilizations of bins required by the analytical-based approach. Our experimental results show the cosine shaped function can obtain a little better result than the bell-shaped function on IBM benchmarks for 2D floorplanning. We also show that the proposed 3D floorplanning methodology consumes less TSVs and induces shorter wirelength compared to previous work in the hybrid bonding style.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-1, 932A PLACEMENT ALGORITHM FOR SUPERCONDUCTING LOGIC CIRCUITS BASED ON CELL GROUPING AND SUPER-CELL PLACEMENT
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Soheil Nazar Shahsavani, Alireza Shafaei Bejestan and Massoud Pedram, University of Southern California, US
Abstract
This paper presents a novel clustering based placement algorithm for single flux quantum (SFQ) family of superconductive electronic circuits. In these circuits nearly all cells receive a clock signal and a placement algorithm that ignores the clock routing cost will not produce high quality solutions. To address this issue, proposed approach simultaneously minimizes the total wirelength of the signal nets and area overhead of the clock routing. Furthermore, construction of a perfect H-tree in SFQ logic circuits is not viable solution due to the resulting very high routing overhead and the in-feasibility of building exact zero-skew clock routing trees. Instead a hybrid clock tree must be used whereby higher levels of the clock tree (i.e., those closer to the clock source) are based on H-tree construction whereas lower levels of the clock tree follow a linear (i.e., chain-like) structure. The proposed approach is able to reduce the overall half-perimeter wirelength by 15% and area by 8% compared with state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:32IP5-2, 851ABAX: 2D/3D LEGALISER SUPPORTING LOOK-AHEAD LEGALISATION AND BLOCKAGE STRATEGIES
Speaker:
Nikolaos Sketopoulos, University of Thessaly, GR
Authors:
Nikolaos Sketopoulos, Christos Sotiriou and Stavros Simoglou, Department of Electrical and Computer Engineering, University of Thessaly, GR
Abstract
Abax is a modern version of the classical Abacus, minimum displacement, greedy legaliser. Abax supports single-tier 2D or 3D legalisation for multiple, logic-on-logic 3D-IC tiers, efficient look-ahead legalisation of intermediate Global Placement (GP) iterations, Hard Macros, Blockages, row density constraints and multiple local cell displacement functions and cell orderings. For 3D-IC, Abax can produce multi-tier 3D-IC placements by performing Legalisation-based Partitioning. For efficient Look-ahead Legalisation, Abax supports two new local displacement cost functions, multi-cell mean and multi-cell total. We show that the classical single-cell displacement and multi-cell total can result in artifacts when legalising early intermediate GPs, and that multi-cell mean is the best candidate for Look-ahead Legalisation. Obstructions, i.e. Hard Macros and Blockages are handled by using two strategies. We present legalisation results for the ISPD2014 and ISPD2015 benchmarks, by using GP generated from Eh?Placer, and HPWL measurement by using RippleDP. For 3D, two-tier legalisation we illustrate a ~30% reduction in HPWL for a set of ISPD2014 benchmarks. For 2D legalisation on the ISPD2015 benchmarks, our average HPWL increase over GP is 3.03%, compared to 7.21% of the Eh?Placer legaliser, and 43.16% of the RippleDP legaliser.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:33IP5-3, 864LESAR: A DYNAMIC LINE-END SPACING AWARE DETAILED ROUTER
Speaker:
Yih-Lang Li, Computer Science Department, NCTU, TH
Authors:
Ying-Chi Wei, Radhamanjari Samanta and Yih-Lang Li, National Chiao-Tung University, TW
Abstract
As the VLSI technology scales down, 193nm optical lithography reaches the limit and one-dimensional (1D) unidirectional style lithography technique emerges as one of the most promising solutions for coming advanced technology nodes. The 1D process first generates unidirectional dense metal lines and then use line-end cutting to form the target patterns with cut masks. If cuts are too close, they will lead to conflicts. Line-end spacing rules become dynamic rather than static because of cut mask and also now need to be followed strictly. Line-end spacing check between two line-end pairs in the same mask has also been regarded as compulsory line-end spacing constraints that have not discussed in previous works yet. Complying with these rules during APR has become a new bottleneck. In this work, we propose to make the router aware of the dynamic line-end spacing rules, including end-end spacing and parity spacing constraints. Experimental results of our proposed router demonstrates that it can effectively expel all end-end spacing violations as well as 75% of parity spacing violations in a reasonable runtime increase of 14%.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.3 More than Moore Interconnects

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Konf. 1

Chair:
Sébastien Le Beux, Ecole Centrale de Lyon – University of Lyon, FR

Co-Chair:
Davide Zoni, Politecnico di Milano, IT

In this session, the application of emerging technologies such as 3D Integration and Silicon Photonics broadens the capabilities of chip-scale interconnects and on-chip resource allocation mechanisms. The first two papers apply 3D integration to the design of networks-on-chip by providing enhanced collective communication mechanisms and resiliency to soft errors, respectively. The third paper shows how to leverage the NoC to manage resource allocations in chip multiprocessors. The final paper applies silicon photonics to the design of chip-scale interconnection networks for high-performance computing systems.

TimeLabelPresentation Title
Authors
14:0011.3.1HIGH PERFORMANCE COLLECTIVE COMMUNICATION-AWARE 3D NETWORK-ON-CHIP ARCHITECTURES
Speaker:
Biresh Joardar, Washington State University, US
Authors:
Biresh Joardar, Karthi Duraisamy and Partha Pande, Washington State University, US
Abstract
3D Network-on-Chip (NoC) architectures are capable of achieving better performance and lower energy consumption compared to their planar counterparts. However, conventional 3D NoCs are not efficient in handling collective communication. Existing works mainly explore Path and Tree multicast distribution schemes for 3D NoCs. However, both these mechanisms involve high network latency and lack scalability. In this work, we propose a SMART (Single-cycle Multi-hop Asynchronous Repeated Traversal) 3D NoC architecture that is capable of achieving high-performance collective communication. The proposed High-Performance SMART (HP-SMART) 3D NoC achieves 65% and 31% latency improvements compared to the existing Path and Tree multicast-based 3D NoCs respectively. HP-SMART 3D NoC also achieves significant improvement in message latency compared to its 2D counterpart.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.3.2A SOFT-ERROR RESILIENT ROUTE COMPUTATION UNIT FOR 3D NETWORKS-ON-CHIPS
Speaker:
Alexandre Coelho, TIMA Laboratory, FR
Authors:
Alexandre Coelho, Amir Charif, Nacer-Eddine Zergainoh, Juan Fraire and Raoul Velazco, Université Grenoble Alpes, CNRS, Grenoble INP, FR
Abstract
Three-dimensional Networks-on-Chips (3D-NoCs) have emerged as an alternative to further enhance the performance, functionality, and packaging density of 2D-NoCs. However, the increasing complexity of NoC routers, the continuous miniaturization of silicon technology, the lower operating voltages, and the higher operating frequencies have made the NoC increasingly vulnerable to soft errors. In particular, transient faults occurring in the route computation unit (RCU) can provoke misrouting which may lead to severe effects such as deadlocks or packet loss, corrupting the operation of the entire chip. By combining a reliable fault detection circuit leveraging circuit-level double-sampling, with a cost-effective rerouting mechanism, we develop a full fault-tolerance solution that can efficiently detect and correct such fatal errors before the affected packets leave the router. To validate the proposed solution, we also introduce a novel method for simulation-based fault-injection based on the NoC's gate-level netlist. Experimental results obtained from a partially and vertically connected 3D-NoC indicate that our solution can provide a high level of reliability in the presence of errors, at the expense of an area and power overhead of 4.1% and 6.8% respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.3.3SPA: SIMPLE POOL ARCHITECTURE FOR APPLICATION RESOURCE ALLOCATION IN MANY-CORE SYSTEMS
Speaker:
Iraklis Anagnostopoulos, Southern Illinois University Carbondale, US
Authors:
Jayasimha Sai Koduri and Iraklis Anagnostopoulos, Southern Illinois University Carbondale, US
Abstract
The technology push by Moore's law brings a paradigm shift in the adaption of many core systems which replace high frequency superscalar processors with many simpler ones. On the software side, in order to utilize the available computational power, applications are following the high performance parallel/multi-threading model. Thus, many-core systems raise the challenges of resource allocation and fragmentation making necessary efficient run-time resource management techniques. In this paper, we propose SPA, a Simple Pool Architecture for managing resource allocation in many-core systems. The proposed framework follows a distributed approach in which cores are organized into clusters and multiple clusters form a pool. Clusters are created based on system's characteristics and the allocation of cores is performed in a distributed manner so as to take advantage of spatial features, shared resources and reduce scattering of cores. Experimental results show that SPA produces on average 15% better application response time while waiting time is reduced by 45% on average compared to other state-of-art methodologies.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.3.4RSON: AN INTER/INTRA-CHIP SILICON PHOTONIC NETWORK FOR RACK-SCALE COMPUTING SYSTEMS
Speaker:
Peng Yang, Hong Kong University of Science and Technology, CN
Authors:
Peng Yang1, Zhengbin Pang2, Zhifei Wang1, Zhehui Wang1, Min Xie2, Xuanqi Chen1, Luan H.K. Duong1 and Jiang Xu1
1Hong Kong University of Science and Technology, HK; 2National University of Defense Technology, CN
Abstract
The increasing demand for more computational power from scientific computing, big data processing, and machine learning is pushing the development of HPC (high-performance computing) systems. As the basic HPC building blocks, modularized server racks with a large number of multicore nodes are facing performance and energy efficiency challenges. This paper proposes RSON, an optical network for rack-scale computing systems. RSON connects processor cores, caches, local memories, and remote memories through a novel inter/intra-chip silicon photonic network architecture. We develop a low-latency scalable channel partition and low-power dynamic path priority control scheme for RSON. Experimental results show that RSON can help rack-scale computing systems achieve up to 6.8X higher performance under the same energy consumption than state-of-the-art systems under the latest APEX (application performance at extreme scale) benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-4, 228UNDERSTANDING TURN MODELS FOR ADAPTIVE ROUTING: THE MODULAR APPROACH
Speaker:
Edoardo Fusella, Department of Electrical Engineering and Information Technologies, University of Naples Federico II, IT
Authors:
Edoardo Fusella and Alessandro Cilardo, University of Naples Federico II, IT
Abstract
Routing algorithms were extensively studied first in multi-computer systems, then in multi- and many-core architectures. Among the commonly used routing techniques, the turn model seems the most promising solution when targeting adaptiveness. Based on the turn model, several alternative approaches with different turn prohibition schemes were proposed. This paper gives a new theoretical background for designing deadlock-free partially adaptive logic-based distributed routing algorithms that are based on the turn model. Two properties are presented, including a necessary and sufficient condition to prove that a routing algorithm is deadlock-free as long as turn restrictions follow a modular distribution. Existing approaches can be considered a subset of the solution space identified by this work. Finally, we propose a novel routing algorithm exhibiting encouraging performance improvements over state-ofthe-art approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-5, 29QUATER-IMAGINARY BASE FOR COMPLEX NUMBER ARITHMETIC CIRCUITS
Speaker:
Souradip Sarkar, Nokia Bell Labs, BE
Authors:
Souradip Sarkar and Manil Dev Gomony, Nokia Bell Labs, BE
Abstract
Arithmetic operations involving complex numbers are widely used in the signal processing functions in the physical layer of modern wireless and wireline communication systems, electronic instrumentation and control systems. With the ever increasing throughput requirements of such systems, the power consumption of the hardware realization is increasing beyond the allowed budget. Arithmetic circuits based on binary numeral system that have been optimized rigorously over the past few decades are currently being used for the computation involving complex numbers. In this paper, we present the potential of arithmetic circuits for complex number computations based on the Quater-imaginary (QI) base numeral system to reduce power consumption. We show that for a simple multiplier implementation in the QI base, the savings in power and area consumption could be up to 40% when synthesized in 28nm TSMC standard cell technology node.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.4 Evaluating and optimizing memory and timing across HW and SW boundaries

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Konf. 2

Chair:
Sara Vinco, Politecnico di Torino, IT

Co-Chair:
Todd Austin, University of Michigan, US

The session presents new solutions to evaluate and optimize data location and application timing. The first paper presents a hybrid memory emulator to analyze the performance characteristics of non-volatile memories. The second paper proposes a tool and a library to synthesize distributed protocols on concurrent systems. The third paper presents an optimization framework to find the best data partitioning schemes for processing-in-memory architectures. Finally, the last paper uses loop acceleration to advance source level timing simulation.

TimeLabelPresentation Title
Authors
14:0011.4.1(Best Paper Award Candidate)
HME: A LIGHTWEIGHT EMULATOR FOR HYBRID MEMORY
Speaker:
Jie Xu, Huazhong University of Science and Technology, CN
Authors:
Zhuohui Duan, Haikun Liu, Xiaofei Liao and Hai Jin, Huazhong University of Science and Technology, CN
Abstract
Emerging non-volatile memory (NVM) technologies have been widely studied in recent years. Those studies mainly rely on cycle-accurate architecture simulators because the commercial NVM hardware is still unavailable. However, current simulation approaches are either too slow, or cannot simulate complex and large-scale workloads. In this paper, we propose a DRAM-based hybrid memory emulator, called HME, to emulate the performance characteristics of NVM devices. HME exploits hardware features available in commodity Non-Uniform Memory Access (NUMA) architectures to emulate two kinds of memories: fast, local DRAM, and slower, remote NVM on other NUMA nodes. HME can emulate a wide range of NVM latencies by injecting software-created memory access delays on the remote NUMA nodes. To evaluate the impact of hybrid memories on the application performance, we also provide application programming interfaces to allocate memory from NVM or DRAM regions. We evaluate the accuracy of the read/write delay injection models by using SPEC CPU2006 and compare the results with a state-of-the-art NVM emulator Quartz. Experimental results demonstrate that the average emulation errors of NVM read and write latencies are less than 5% in HME, which is much lower than Quartz. Moreover, the application performance overhead in HME is one order of magnitude lower than Quartz.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.4.2VERC3: A LIBRARY FOR EXPLICIT STATE SYNTHESIS OF CONCURRENT SYSTEMS
Speaker:
Marco Elver, University of Edinburgh, GB
Authors:
Marco Elver, Christopher J. Banks, Paul Jackson and Vijay Nagarajan, University of Edinburgh, GB
Abstract
We propose an alternative, explicit state only, approach to concurrent system synthesis. In particular, the focus of this work is on the synthesis of distributed protocols. Given a correctness specification and a protocol skeleton (i.e. incomplete with holes), the goal is to synthesize the holes. At the heart of our technique is a dynamic programming based algorithm that prunes inferred failure candidates. The algorithm exploits the fact that typically only a few transitions are needed to reach an erroneous state in a faulty distributed protocol. Therefore, it is unlikely that every hole to be synthesized is contributing towards the error; thus, faulty protocol candidates where only a subset of holes were used can be used to infer failures of later candidates with a superset of holes. We evaluate the tool using a cache coherence protocol synthesis case study. Specifically, we study a directory based MSI protocol, assuming an unordered interconnect which gives rise to numerous race conditions which must be resolved via introducing transient states---a common cause of complexity and bugs in such protocols. In the case study, we therefore focus on synthesizing the transient state actions (we consider up to 12 holes out of possible 35). With the proposed candidate pruning optimization, we report up to 43x improvement over a naive candidate enumeration scheme. We make available the tool and C++ library, VerC3.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.4.3PROMETHEUS: PROCESSING-IN-MEMORY HETEROGENEOUS ARCHITECTURE DESIGN FROM A MULTI-LAYER NETWORK THEORETIC STRATEGY
Speaker:
Yao Xiao, USC, US
Authors:
Yao Xiao, Shahin Nazarian and Paul Bogdan, University of Southern California, US
Abstract
With increasing demand for distributed intelligent physical systems performing big data analytics on the field and in real-time, processing-in-memory (PIM) architectures integrating 3D-stacked memory and logic layers could provide higher performance and energy efficiency. Towards this end, the PIM design requires principled and rigorous optimization strategies to identify interactions and manage data movement across different vaults. In this paper, we introduce Prometheus, a novel PIM-based framework that constructs a comprehensive model of computation and communication (MoCC) based on a static and dynamic compilation of an application. Firstly, by adopting a low level virtual machine (LLVM) intermediate representation (IR), an input application is modeled as a two-layered graph consisting of (i) a computation layer in which the nodes denote computation IR instructions and edges denote data dependencies among instructions, and (ii) a communication layer in which the nodes denote memory operations (e.g., load/store) and edges represent memory dependencies detected by alias analysis. Secondly, we develop an optimization framework that partitions the multi-layer network into processing communities within which the computational workload is maximized while balancing the load among computational clusters. Thirdly, we propose a community-to-vault mapping algorithm for designing a scalable hybrid memory cube (HMC)-based system where vaults are interconnected through a network-on-chip (NoC) approach rather than a crossbar architecture. This ensures scalability to hundreds of vaults in each cube. Experimental results demonstrate that Prometheus consisting of 64 HMC-based vaults improves system performance by 9.8x and achieves 2.3x energy reduction, compared to conventional systems.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.4.4ADVANCING SOURCE-LEVEL TIMING SIMULATION USING LOOP ACCELERATION
Speaker:
Joscha Benz, University of Tuebingen, DE
Authors:
Joscha Benz1, Christoph Gerum1 and Oliver Bringmann2
1University of Tuebingen, DE; 2University of Tuebingen / FZI, DE
Abstract
Source-level timing simulation (STLS) is an important technique for early examination of timing behavior, as it is very fast and accurate. A factor occasionally more important than precision is simulation speed, especially in design space exploration or very early phases of development. Additionally, practices like rapid prototyping also benefit from high-performance timing simulation. Therefore, we propose to further reduce simulation run-time by utilizing a method called loop acceleration. Accelerating a loop in the context of SLTS means deriving the timing of a loop prior to simulation to increase simulation speed of that loop. We integrated this technique in our SLTS framework and conducted an comprehensive evaluation using the Mälardalen benchmark suite. We were able to reduce simulation time by up to 43% of the original time, while the introduced accuracy loss did not exceed 8 percentage points.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-13, 782IN-MEMORY COMPUTING USING PATHS-BASED LOGIC AND HETEROGENEOUS COMPONENTS
Speaker:
Alvaro Velasquez, University of Central Florida, US
Authors:
Alvaro Velasquez and Sumit Kumar Jha, University of Central Florida, US
Abstract
The memory-processor bottleneck and scaling difficulties of the CMOS transistor have given rise to a plethora of research initiatives to overcome these challenges. Popular among these is in-memory crossbar computing. In this paper, we propose a framework for synthesizing logic-in-memory circuits based on the behavior of paths of electric current throughout the memory. Limitations of using only bidirectional components with this approach are also established. We demonstrate the effectiveness of our approach by generating n-bit addition circuits that can compute using a constant number of read and write cycles.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.5 Microfluidic Devices and Inexact Computing

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Konf. 3

Chair:
Martin Trefzer, University of York, GB

Co-Chair:
Lukas Sekanina, University of Brno, CZ

The first two presentations cover applications for microfluidic devices. The first one considers sample preparation, i.e. how to efficiently prepare certain dilutions and mixtures of fluids with a given amount of storages. The second one considers programmable versions of these devices that allow for the realization of general purpose applications. The last two presentations introduce new circuit structures for computing technologies that rely on approximation and probabilities. More precisely, an adaptive approximated divider design and manipulating circuits for stochastic computing are presented.

TimeLabelPresentation Title
Authors
14:0011.5.1STORAGE-AWARE SAMPLE PREPARATION USING FLOW-BASED MICROFLUIDIC LAB-ON-CHIP
Speaker:
Robert Wille, Institute for Integrated Circuits, Johannes Kepler University Linz, 4040 Linz, Austria, AT
Authors:
Sukanta Bhattacharjee1, Robert Wille2, Juinn-Dar Huang3 and Bhargab Bhattacharya1
1Indian Statistical Institute, Kolkata, IN; 2Johannes Kepler University Linz, AT; 3National Chiao Tung University,Hsinchu, TW
Abstract
Recent advances in microfluidics have been the major driving force behind the ubiquity of Labs-on-Chip (LoC) in biochemical protocol automation. The preparation of dilutions and mixtures of fluids is a basic step in sample preparation for which several algorithms and chip-architectures are well known. Dilution and mixing are implemented on biochips through a sequence of basic fluid-mixing and splitting operations performed in certain ratios. These steps are abstracted using a mixing graph. During this process, on-chip storage-units are needed to store intermediate fluids to be used later in the sequence. This allows to optimize the reactant-costs, to reduce the sample-preparation time, and/or to achieve the desired ratio. However, the number of storage-units is usually limited in given LoC architectures. Since this restriction is not considered by existing methods for sample preparation, the results that are obtained are often found to be useless (in the case when more storage-units are required than available) or more expensive than necessary (in the case when storage-units are available but not used, e.g., to further reduce the number of mixing operations or reactant-cost). In this paper, we present a storage-aware algorithm for sample preparation with flow-based LoCs which addresses these issues. We present a SAT-based approach to construct a mixing graph that enables the best usage of available storage-units while optimizing sample-preparation cost and/or time. Experimental results on several test cases reveal the scope, effectiveness, and the flexibility of the proposed method.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.5.2PUMP-AWARE FLOW ROUTING ALGORITHM FOR PROGRAMMABLE MICROFLUIDIC DEVICES
Speaker:
Tsung-Yi Ho, National Tsing Hua University, TW
Authors:
Guan-Ru Lai1, Chun-Yu Lin2 and Tsung-Yi Ho2
1TSMC, TW; 2National Tsing Hua University, TW
Abstract
As the biochemical experiment becomes more complicated and more diverse, the process of developing a specific-purpose microfluidic biochip for a new task can be very expensive and time consuming. Therefore, the programmable microfluidic devices (PMDs) are proposed as general purpose devices which can perform multiple functions without any hardware modification. Because the PMDs are controlled by pure software program, the assays can be done in parallel and the total completion time can be reduced. However, the high parallelism may cause congestion problem as different reagents are not allowed to cross each other to avoid unexpected mixing. Moreover, since reagents are pushed by the off-chip pump, the free channel from an off-chip pump to the actuated reagent is also prohibited to pass through. This could further complicate the congestion problem and increase the assay completion time significantly. However, some vulnerable reagents may spoil over time during the experiment. For timing critical application, it is indispensable to ensure the total assay completion time is within an upper limit. Therefore, we propose a pump-aware flow routing algorithm which deals with the complex routing congestion while minimizing the assay completion time within an upper limit

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.5.3ADAPTIVE APPROXIMATION IN ARITHMETIC CIRCUITS: A LOW-POWER UNSIGNED DIVIDER DESIGN
Speaker:
Honglan Jiang, University of Alberta, CA
Authors:
Honglan Jiang1, Leibo Liu2, Fabrizio Lombardi3 and Jie Han1
1University of Alberta, CA; 2Tsinghua University, CN; 3Northeastern University, US
Abstract
Many approximate arithmetic circuits have been proposed for high-performance and low-power applications. However, most designs are either hardware-fficient with a low accuracy or very accurate with a limited hardware saving, mostly due to the use of a static approximation. In this paper, an adaptive approximation approach is proposed for the design of a divider. In this design, division is computed by using a reduced-width divider and a shifter by adaptively pruning the input bits. Specifically, for a 2n/n division 2k/k bits are selected starting from the most significant '1' in the dividend/divisor. At the same time, redundant least significant bits (LSBs) are truncated or if the number of remaining LSBs is smaller than 2k for the dividend or k for the divisor, '0's are appended to the LSBs of the input. To avoid overflow, a 2(k+1)/(k+1) divider is used to compute the division of the 2k-bit dividend and the k-bit divisor, both with the most significant bit being '0'. Thus, k<n is a key variable that determines the size of the divider and the accuracy of the approximate design. Finally, an error correction circuit is proposed to recover the error caused by the shifter by using OR gates. The synthesis results in an industrial 28nm CMOS process show that the proposed 16/8 approximate divider using an 8/4 accurate divider is 2.5x as fast and consumes 34.42% of the power of the accurate 16/8 design. Compared with the other approximate dividers, the proposed design is significantly more accurate at a similar power-delay product. Moreover, simulation results show that the proposed approximate divider outperforms the other designs in two image processing applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.5.4CORRELATION MANIPULATING CIRCUITS FOR STOCHASTIC COMPUTING
Speaker:
Vincent Lee, University of Washington, US
Authors:
Vincent T. Lee, Armin Alaghi and Luis Ceze, University of Washington, US
Abstract
Stochastic computing (SC) is an emerging computing technique that promises high density, low power, and error tolerant solutions. In SC, values are encoded as unary bitstreams and SC arithmetic circuits operate on one or more bitstreams. In many cases, the input bitstreams must be correlated or uncorrelated for SC arithmetic to produce accurate results. As a result, a key challenge for designing SC accelerators is manipulating the impact of correlation across SC operations. This paper presents and evaluates a set of novel correlation manipulating circuits to manage correlation in SC computation: a synchronizer, desynchronizer, and decorrelator. We then use these circuits to propose improved SC maximum, minimum, and saturating adder designs. Compared to existing correlation manipulation techniques, our circuits are more accurate and up to 3x more energy efficient. In the context of an image processing pipeline, these circuits can reduce the total energy consumption by up to 24%.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-6, 637FAULT-TOLERANT VALVE-BASED MICROFLUIDIC ROUTING FABRIC FOR DROPLET BARCODING IN SINGLE-CELL ANALYSIS
Speaker:
Yasamin Moradi, Technical University of Munich (TUM), DE
Authors:
Yasamin Moradi1, Mohamed Ibrahim2, Krishnendu Chakrabarty2 and Ulf Schlichtmann1
1Technical University of Munich, DE; 2Duke University, US
Abstract
High-throughput single-cell genomics is used to gain insights into diseases such as cancer. Motivated by this important application, microfluidics has emerged as a key technology for developing comprehensive biochemical procedures for studying DNA, RNA, proteins, and many other cellular components. Recently, a hybrid microfluidic platform has been proposed to efficiently automate the analysis of a heterogeneous sequence of cells. In this design, a valve-based routing fabric based on transposers is used to label/barcode the target cells. However, the design proposed in prior work overlooked defects that are likely to occur during chip fabrication and system integration. We address the above limitation by investigating the fault tolerance of the valve-based routing fabric. We develop a theory of failure assessment and introduce a design technique for achieving fault tolerance. Simulation results show that the proposed method leads to a slight increase in the fabric size and decrease in cell-analysis throughput, but this is only a small price to pay for the added assurance of fault tolerance in the new design.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-7, 408OPTIMIZING POWER-ACCURACY TRADE-OFF IN APPROXIMATE ADDERS
Speaker:
Celia Dharmaraj, Indian Institute of Technology Madras, IN
Authors:
Celia Dharmaraj, Vinita Vasudevan and Nitin Chandrachoodan, Indian Institute of Technology Madras, IN
Abstract
Approximate circuit design has gained significance in recent years targeting applications like media processing where full accuracy is not required. In this paper, we propose an approximate adder in which the approximate part of the sum is obtained by finding a single optimal level that minimizes the mean error distance. Therefore hardware needed for the approximate part computation can be removed, which effectively results in very low power consumption. We compare the proposed adder with various approximate adders in the literature in terms of power and accuracy metrics. The power savings of our adder is shown to be 17% to 55% more than power savings of the existing approximate adders over a significant range of accuracy values. Further, in an image addition application, this adder is shown to provide the best trade-off between PSNR and power.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.6 Memory: new technologies and reliability-related issues

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Konf. 4

Chair:
Carles Hernandez, Barcelona Supercomputing Center (BSC), ES

Co-Chair:
Shahar Kvatinsky, Technion, IL

The session covers computation using emerging memory technologies, investigating techniques to protect against process variation and soft errors

TimeLabelPresentation Title
Authors
14:0011.6.1XNOR-RRAM: A SCALABLE AND PARALLEL RESISTIVE SYNAPTIC ARCHITECTURE FOR BINARY NEURAL NETWORKS
Speaker:
Shimeng Yu, Arizona State University, CN
Authors:
Xiaoyu Sun, Shihui Yin, Xiaochen Peng, Rui Liu, Jae-sun Seo and Shimeng Yu, Arizona State University, US
Abstract
Recent advances in deep learning have shown that Binary Neural Networks (BNNs) are capable of providing a satisfying accuracy on various image datasets with significant reduction in computation and memory cost. With both weights and activations binarized to +1 or -1 in BNNs, the high-precision multiply-and-accumulate (MAC) operations can be replaced by XNOR and bit-counting operations. In this work, we propose a RRAM synaptic architecture (XNOR-RRAM) with a bit-cell design of complementary word lines that implements equivalent XNOR and bit-counting operation in a parallel fashion. For large-scale matrices in fully connected layers or when the convolution kernels are unrolled in multiple channels, the array partition is necessary. Multi-level sense amplifiers (MLSAs) are employed as the intermediate interface for accumulating partial weighted sum. However, a low bit-level MLSA and intrinsic offset of MLSA may degrade the classification accuracy. We investigate the impact of sensing offsets on classification accuracy and analyze various design options with different sub-array sizes and sensing bit-levels. Experimental results with RRAM models and 65nm CMOS PDK show that the system with 128×128 sub-array size and 3-bit MLSA can achieve accuracies of 98.43% for MLP on MNIST and 86.08% for CNN on CIFAR-10, showing 0.34% and 2.39% degradation respectively compared to the accuracies of ideal BNN algorithms. The projected energy-efficiency of XNOR-RRAM is 141.18 TOPS/W, showing ~33X improvement compared to the conventional RRAM synaptic architecture with sequential row-by-row read-out.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.6.2A NOVEL FAULT TOLERANT CACHE ARCHITECTURE BASED ON ORTHOGONAL LATIN SQUARES THEORY
Speaker:
Georgios Keramidas, Think Silicon S.A., GR
Authors:
Filippos Filippou1, Georgios Keramidas2, Michail Mavropoulos1 and Dimitris Nikolos1
1University of Patras, GR; 2Think Silicon S.A./Technological Educational Institute of Western Greece, GR
Abstract
Aggressive dynamic voltage and frequency scaling is widely used to reduce the power consumption of microprocessors. Unfortunately, voltage scaling increases the impact of process variations on memory cells resulting in an exponential increase in the number of malfunctioning memory cells. As a result, various cache fault-tolerant (CFT) techniques have been proposed. In this work, we propose a new CFT technique which applies a systematic redistribution (permutation) of the cache blocks (assuming various block granularity levels) within the cache structure using the orthogonal Latin Square concept and taking as input the location of the malfunctioning cells in the cache array. The aim of the redistribution is twofold. First, to uniformly distribute the faulty blocks to sets and second, to gather the faulty subblocks to a minimum number of blocks, so as the fault free blocks are maximized. Our evaluation results using the benchmarks of SPEC2006 suite, 100 memory fault maps, and four percentages of malfunctioning cells show that our proposal exhibits strong capability to reduce cache performance degradation especially in situations with high percentages of faulty cells and compares favorably to already known techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.6.3TECHNOLOGY-AWARE LOGIC SYNTHESIS FOR RERAM BASED IN-MEMORY COMPUTING
Speaker:
Debjyoti Bhattacharjee, Nanyang Technological University, SG
Authors:
Debjyoti Bhattacharjee1, Luca Amaru2 and Anupam Chattopadhyay1
1Nanyang Technological University, SG; 2Synopsys, US
Abstract
Resistive RAMs (ReRAMs) have gained prominence for design of logic-in-memory circuits and architectures due to fast read/write speeds, high endurance, density and logic operation capabilities. ReRAM crossbar arrays allow constrained bit-level parallel operations. In this paper, for the first time, we propose optimization techniques during logic synthesis, which are specifically targeted for leveraging the parallelism offered by ReRAM crossbar arrays. Our method uses Majority-Inverter Graph (MIG) for the internal representation of the Boolean functions. The novel optimization techniques, when applied to the MIG, exposes the bit-level parallelism, and is further coupled with an efficient technology mapping flow. The entire synthesis process is benchmarked exhaustively over large arithmetic functions using a representative ReRAM crossbar architecture, while varying the crossbar dimensions. For the hard benchmarks, we obtained 10% reduction in the number of nodes with 16% reduction in delay on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.6.4SMARTAG: ERROR CORRECTION IN CACHE TAG ARRAY BY EXPLOITING ADDRESS LOCALITY
Speaker:
Hamed Farbeh, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Authors:
Seyedeh Golsana Ghaemi1, Iman Ahmadpour2, Mehdi Ardebili3 and Hamed Farbeh4
1Sharif University of Technology, IR; 2Sharif University of technology, IR; 3Tehran University, IR; 4School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Abstract
Soft errors in on-chip caches are the major cause of processors failure. Partitioning the cache into data and tag arrays, recent reports show that the vulnerability of the latter is as high as or even higher than that of the former. Although Error-Correcting Codes (ECCs) are widely used to protect the data array, their overheads are not affordable in the tag array and its protection is conventionally limited to parity code. In this paper, we propose Similarity-Managed Robust Tag (SMARTag) technique to provide the error correction capability in parity-protected tags. SMARTag exploits the inherent similarity between the upper parts of the tags in a cache set to share these parts between addresses and ECCs. Using SMARTag, the cache access time is intact since the ECC part is bypassed in normal cache operation and no extra memory is required since ECCs are stored in available tag space. The simulation results show that SMARTag is capable of correcting more than 98% of errors in the tag array, on average, and its energy consumption, area, and performance overhead is less than 0.2%.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-8, 287IMPROVING THE ERROR BEHAVIOR OF DRAM BY EXPLOITING ITS Z-CHANNEL PROPERTY
Speaker:
Kira Kraft, University of Kaiserslautern, DE
Authors:
Kira Kraft1, Matthias Jung2, Chirag Sudarshan1, Deepak M. Mathew1, Christian Weis1 and Norbert Wehn1
1University of Kaiserslautern, DE; 2Fraunhofer IESE, DE
Abstract
In this paper, we present a new communication theoretic channel model for Dynamic Random Access Memory (DRAM) retention errors, that relies on the fully asymmetric retention error behavior of DRAM cells. This new model shows that the traditional approach is over pessimistic and we confirm this with real measurements of DDR3 and DDR4 DRAM devices. Together with an exploitation of the vendor specific true- and anti-cell structure, a low complexity bit-flipping approach is presented, that can largely increase DRAM's reliability with minimum overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-9, 668ARCHITECTURE AND OPTIMIZATION OF ASSOCIATIVE MEMORIES USED FOR THE IMPLEMENTATION OF LOGIC FUNCTIONS BASED ON NANOELECTRONIC 1S1R CELLS
Speaker:
Arne Heittmann, RWTH-Aachen University, DE
Authors:
Arne Heittman and Tobias G. Noll, RWTH Aachen University, DE
Abstract
A neuromorphic architecture based on Binary Associative memories and nanoelectronic resistive switches is proposed for the realization of arbitrary logic/arithmetic functions. Subsets of non-trivial code sets based on error detecting 2-out-of-n-codes are thoroughly used to encode operands, results, and intermediate states in order to enhance the circuit reliability by mitigating the impact of device variability. 2-ary functions can be implemented by cascading a mixer memory, a correlator memory, and a response memory. By introduction of a new cost function based on class-specific word-line-coverage, stochastic optimization is applied with the aim to minimize the overall number of active amplifiers. For various exemplary functions optimized architectures are compared against solutions obtained using a standard-cost function. It is shown that the consideration of word-line-coverage results in a significant circuit compaction.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:32IP5-17, 62EXPLORING NON-VOLATILE MAIN MEMORY ARCHITECTURES FOR HANDHELD DEVICES
Speaker:
Virendra Singh, Indian Institute of Technology Bombay, IN
Authors:
Sneha Ved and Manu Awasthi, Indian Institute of Technology Gandhinagar, IN
Abstract
As additional functionality is being added to contemporary handheld devices, the SoCs inside these devices are becoming increasingly complex. Similarly, the applications executing on these handhelds are beginning to exhibit an ever increasing memory footprint. To support these trends, main memory capacity of these SoCs has been increasing over time. Due to these developments, memory system's contribution to the overall system power has increased dramatically. Non-volatile memories have been used in server architectures to increase capacity as well as keep memory system's power consumption in check. However, in the handheld domain, where user experience and battery life are of paramount importance, the applicability of such technologies has not been widely studied. In this paper, we propose and evaluate a number of hybrid memory architectures using mobile DRAM and PCM. We show that intelligent memory architectures, cognizant of workload's memory access patterns can provide significant energy savings without compromising on user experience. Using proposed approach, we can devise architectures that exhibit significant energy savings with only a 2.8% performance loss.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.7 Building Resistant Systems: From Temperature Awareness to Attack Resistance

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Konf. 5

Chair:
Marina Zapater, EPFL, CH

Co-Chair:
Georg Becker, ESMT Berlin, DE

This session explores new methods in building reliable and secure systems, especially in larger SoCs. Temperature-induced stress can impact the reliability of digital systems. Temperature fluctuations can also be exploited as side channels. This session first explores the two side of this coin by discussing a novel temperature-aware chiplet placing algorithm in 2.5D systems and by showing how transmission bandwidth encoded as a temperature signal can be maximized. Then, the rest of the session highlights advances in PUFs that are resistant against lastest-generation attacks, and particularly integration of PUFs in larger systems.

TimeLabelPresentation Title
Authors
14:0011.7.1LEVERAGING THERMALLY-AWARE CHIPLET ORGANIZATION IN 2.5D SYSTEMS TO RECLAIM DARK SILICON
Speaker:
Yenai Ma, Boston University, US
Authors:
Furkan Eris1, Ajay Joshi1, Andrew B. Kahng2, Yenai Ma1, Saiful Mojumder1 and Tiansheng Zhang1
1Boston University, US; 2UCSD, US
Abstract
As on-chip power densities of manycore systems continue to increase, one cannot simultaneously run all the cores due to thermal constraints. This phenomenon, known as the 'dark silicon' problem, leads to inactive regions on the chip and limits the performance of manycore systems. This paper proposes to reclaim dark silicon through a thermally-aware chiplet organization technique in 2.5D manycore systems. The proposed technique adjusts the interposer size and the spacing between adjacent chiplets to reduce the peak temperature of the overall system. In this way, a system can operate with a larger number of active cores at a higher frequency without violating thermal constraints, thereby achieving higher performance. To determine the chiplet organization that jointly maximizes performance and minimizes manufacturing cost, we formulate and solve an optimization problem that considers temperature and interposer size constraints of 2.5D systems. We design a multi-start greedy approach to find (near-)optimal solutions efficiently. Our analysis demonstrates that by using our proposed technique, an optimized 2.5D manycore system improves performance by 41% and 16% on average and by up to 87% and 39% for temperature thresholds of 85oC and 105oC, respectively, compared to a traditional single-chip system at the same manufacturing cost. When maintaining the same performance as an equivalent single-chip system, our approach is able to reduce the system manufacturing cost by 36%.

Download Paper (PDF; Only available from the DATE venue WiFi)
14:3011.7.2ISING-PUF: A MACHINE LEARNING ATTACK RESISTANT PUF FEATURING LATTICE LIKE ARRANGEMENT OF ARBITER-PUFS
Speaker:
Hiromitsu Awano, The University of Tokyo, JP
Authors:
Hiromitsu Awano1 and Takashi Sato2
1The University of Tokyo, JP; 2Kyoto University, JP
Abstract
A concept of Ising-PUF, a novel PUF structure that utilizes chaotic behavior of mutually interacting small PUFs, is proposed. Ising-PUF consists of a lattice like arrangement of small PUFs, each of which contains a spin register that stores the response of the small PUF, which also serves as a challenge of its neighbors. The spin patterns that develop along time determine the 1-bit response of the Ising-PUF. Utilizing state-memorizing nature of the spin registers, Ising-PUF attains a challenge hysteresis, i.e., allowing sequence of challenge inputs that continuously stimulate its chaotic behavior, which provides the drastically large challenge-to-response space. Experimental results demonstrate nearly ideal metrics; inter-chip Hamming distance (HD) of 50.1% and inter-environment HD of 2.26%. Further, Ising-PUF is remarkably tolerant to machine learning attacks, demonstrating that, even with a deep neural network using a 50k training CRPs, the prediction accuracy remains 50%, which is comparable to a random guess.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:0011.7.3EFFICIENT HELPER DATA REDUCTION IN SRAM PUFS VIA LOSSY COMPRESSION
Speaker:
Ye Wang, University of Texas at Austin, US
Authors:
Ye Wang and Michael Orshansky, University of Texas at Austin, US
Abstract
Fuzzy extractors used in PUF-based key generation require storage of helper data in non-volatile memory (NVM). The challenge of using SRAM PUF-based key generation on FPGAs is that high-capacity NVM, such as Flash, is not available on chip. Only expensive one-time-programmable (OTP) memory with limited capacity, such as e-fuses, can be utilized to store helper data. Our work allows a significant reduction of helper data size (HDS) through two innovative techniques. The first uses bit-error-rate (BER)-aware lossy compression: by treating a fraction of reliable bits as unreliable, it effectively reduces the size of the reliability mask. Considering practical costs of error characterization, the second technique permits across-temperature HDS minimization strategies based on bit-selection (with or without subsequent compression) using room-temperature only characterization. The method is based on stochastic concentration theory and allows efficiently forming confidence intervals for true worst-case BER. We use it to enable lossy compression and key reconstruction with success arbitrarily close to certainty. Results show that compared to maskless alternative, the proposed algorithm achieves an up to 4.5X HDS reduction with only 60% raw bits. Compared to lossless compression, we achieve a further 25% total HDS reduction, at the cost of doubling the number of raw PUF bits, for a 128-bit key. When bit-specific across-temperature characterization is not possible, our method achieves a significant 2.4X helper data reduction compared to the maskless alternative for extracting a 128-bit key and a 3X reduction for a 256-bit key.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:1511.7.4IMPROVING THE EFFICIENCY OF THERMAL COVERT CHANNELS IN MULTI-/MANY-CORE SYSTEMS
Speaker:
Zijun Long, South China University of Technology, CN
Authors:
Zijun Long1, Xiaohang Wang1, Yingtao Jiang2, Guofeng Cui1, Yiming Zhao1, Li Zhang1 and Terrence Mark3
1South China University of Technology, CN; 2university of nevada, las vegas, USA, US; 3University of Southampton,UK, GB
Abstract
In many-core chips seen in mobile computing, data center, AI, and elsewhere, thermal covert channels could be established to transmit data (e.g., passwords), supposedly to be kept secret and private. Effectiveness of a thermal covert channel, measured by its transmission rate and bit error rate (BER), is so much dependent on the thermal noise/interference imposed on the channel. In this paper, we present a few techniques to improve the capacity of thermal covert channel by overcoming the thermal interference. In particular, data in a thermal covert channel are encoded and represented following a new thermal signaling scheme where logic value, 0 or 1, modules the thermal signal's duty cycle. Next, we show in this study that proper selection of transmission frequency can significantly minimize thermal interference. In addition, we propose a robust end-to-end communication protocol for reliable communications. Our experiments have confirmed that, compared to an existing thermal covert channel attack [1] [2], a thermal covert channel enhanced with all the improvements proposed in this study is seeing significant BER reduction (by as much as 75%), and transmission rate boost (by more than threefold). Building such a strong thermal covert channel is the key step towards developing robust defense and countermeasures against information leaking over thermal covert channel.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30IP5-10, 771ACCURATE PREDICTION OF SMARTPHONES' SKIN TEMPERATURE BY CONSIDERING EXOTHERMIC COMPONENTS
Speaker:
Jihoon Park, Yonsei University, KR
Authors:
Jihoon Park and Hojung Cha, Dept. of Computer Science, Yonsei University, KR
Abstract
Smartphones' surface temperature, also called skin temperature, can rapidly heat up in certain cases, and this causes a variety of safety problems. Therefore, the thermal management of smartphones should consider the skin temperature, and its accurate prediction is important. However, due to the complicated relationship among the many exothermic components in the device, predicting skin temperature is extremely difficult. In this paper, we develop a thermal prediction model that accurately predicts the skin temperature of a mobile device. In an experiment with smartphones, we show that the proposed model achieves an accuracy of 98%, with a ±0.4 °C margin of error. To the best of our knowledge, our work is the first to reveal the complex relationship between the various components inside of a smartphone and its skin temperature.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:31IP5-11, 735TRUSTWORTHY PROOFS FOR SENSOR DATA USING FPGA BASED PHYSICALLY UNCLONABLE FUNCTIONS
Speaker:
Urbi Chatterjee, Indian Institute of Technology Kharagpur, IN
Authors:
Urbi Chatterjee1, Durga Prasad Sahoo2, Debdeep Mukhopadhyay3 and Rajat Subhra Chakraborty1
1Indian Institute of Technology Kharagpur, IN; 2Bosch India (RBEI/ETI), IN; 3Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, IN
Abstract
The Internet of Things (IoT) is envisaged to consist of billions of connected devices coupled with sensors which generate huge volumes of data enabling control-and-command in this paradigm. However, integrity of this data is of utmost concern, and is promisingly addressed leveraging the inherent unreliability of Physically Unclonable Functions (PUFs) w.r.t. ambient parameter variations, using the concept of Virtual Proofs (VPs). Advantage of these protocols is that they do not use explicit keys and aim at proving the authenticity of the sensor. Since the existing PUF-based protocols do not use the sensor data as a part of challenge (i.e. input) to PUFs, there is no guarantee of uniqueness of PUF's challenge-response behavior over multiple levels of ambient parameters. Few of these protocols needs to sequential search in the challenge-response database. To alleviate these issues, we develop a new class of authenticated sensing protocols where the sensor data is combined with the external challenge by utilizing the Strict Avalanche Criterion of the PUF. We validate the proposed protocol through actual experiments on FPGA using Double Arbiter PUFs (DAPUFs), which are implemented with superior uniformity, uniqueness, and reliability on Xilinx Artix-7 FPGAs. According to the FPGA- based validation, the proposed protocol with DAPUF can be effectively used to authenticate wide variations of temperature from −20◦C to 80◦C.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:32IP5-15, 878TOWARDS INTER-VENDOR COMPATIBILITY OF TRUE RANDOM NUMBER GENERATORS FOR FPGAS
Speaker:
Miloš Grujić, imec-COSIC, KU Leuven, BE
Authors:
Miloš Grujić, Bohan Yang, Vladimir Rozic and Ingrid Verbauwhede, imec-COSIC, KU Leuven, BE
Abstract
True random number generators (TRNGs) are fundamental constituents of secure embedded cryptographic systems. In this paper, we introduce a general methodology for porting TRNG across different FPGA vendor families. In order to demonstrate our methodology, we applied it to the delay-chain based TRNG (DC-TRNG) on Intel Cyclone IV and Cyclone V FPGAs. We examine vendor-agnostic generality of the underlying DC-TRNG principle and propose modifications to address differences in structure of FPGAs. Implementation of the DC-TRNG on Cyclone IV uses 149 LEs (<0.1% of available resources) and has a throughput of 5Mbps, while on Cyclone V it occupies 230 ALMs (<1.5% of resources) with an output rate of 12.5 Mbps. The quality of the random bits produced by the DC-TRNG on Intel Cyclone IV and V is further confirmed by using NIST statistical test suite.

Download Paper (PDF; Only available from the DATE venue WiFi)
15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

11.8 A Powerful Framework for Functional Safety

Date: Thursday 22 March 2018
Time: 14:00 - 15:30
Location / Room: Exhibition Theatre

Organiser:
Astrid Ernst, Mentor, DE

In the program of the technical conference, on Thursday is the Special Day for "Autonomous Driving". In the Exhibition Theatre this will be complemented by a workshop on how to design chips and electronic systems fulfilling the functional safety requirements of devices used in the car. This a challenge which is absolutely key for the automotive industry - no autonomous driving without functional safety, no matter how cool the driving algorithms might be.

TimeLabelPresentation Title
Authors
14:0011.8.1INTRODUCTION
Speaker:
Dirk Hansen, Mentor, DE
Abstract

As semiconductor value in a modern car expands, reliability and safety of electronics must improve dramatically. If simple electronics in Bluetooth and power seats cause the most problems in cars today, as indicated by various reliability and dependability surveys, how are we going to make the shift to much more complex electronics systems that are needed for self-driving cars?  It will become imperative to improve the quality of semiconductors going forward and we must get much better in verifying and validating these complex automotive systems knowing that lives will be at risk with autonomous driving. This will increase the test cycles, visibility and coverage to improve the safety and reliability.

In this session, we will present technologies and methodologies allowing us to handle an explosion of test scenarios to verify electronics and algorithms of driverless cars. We will explain a mature Development Process and show how Requirement driven development provides proof that design was built and tested as intended.

14:1011.8.2IC VERIFICATION: SHIFT-LEFT THE PATH TO ISO 26262 COMPLIANCE FOR DIGITAL IC DEVELOPMENT
Speaker:
Dirk Hansen, Mentor, DE
Abstract

If you are developing IP or semiconductors targeting ADAS or autonomous driving, you must develop in accordance with ISO 26262 to ensure safety of your products. The challenge is that this imposes additional development practices, flows and verification needs beyond your normal IC development. In this presentation we will provide an overview of Mentor's solution, both today and tomorrow, to address the functional safety needs for IC development (focused primarily on the digital side) and how we are helping customers "shift-left" their path to compliance. We will also discuss why Mentor + Siemens is the perfect match to address these automotive challenges.

14:5011.8.3DFT PART: TEST SOLUTIONS FOR THE AUTOMOTIVE MARKET
Speaker:
Ralph Sommer, Mentor, DE
Abstract

The amount of electronic content in passenger cars continues to grow rapidly, driven largely by the integration of various ADAS and autonomous driving capabilities. It is of course critical that these devices adhere to the highest possible quality and reliability requirements. Meeting the functional safety requirements mandated by the ISO 26262 standard requires the integration of advanced self-test and monitoring capabilities throughout the vehicle's electronics. The capabilities must not only have the ability to fully test all electronics during power-up, but more importantly must provide the ability to perform periodic tests throughout the functional operation of the vehicle. The Mentor Tessent product family offers a new generation of test solutions to address these evolving challenges.

15:30End of session
Coffee Break in Exhibition Area



Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018

  • Coffee Break 10:30 - 11:30
  • Lunch Break 13:00 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20
  • Coffee Break 16:00 - 17:00

Wednesday, March 21, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:30
  • Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20
  • Coffee Break 16:00 - 17:00

Thursday, March 22, 2018

  • Coffee Break 10:00 - 11:00
  • Lunch Break 12:30 - 14:00
  • Keynote Lecture in "Saal 2" 13:20 - 13:50
  • Coffee Break 15:30 - 16:00

UB11 Session 11

Date: Thursday 22 March 2018
Time: 14:30 - 16:30
Location / Room: Booth 1, Exhibition Area

LabelPresentation Title
Authors
UB11.1ARCHON: AN ARCHITECTURE-OPEN RESOURCE-DRIVEN CROSS-LAYER MODELLING FRAMEWORK
Authors:
Fei Xia1, Ashur Rafiev1, Mohammed Al-Hayanni2, Alexei Iliasov1, Rishad Shafik1, Alexander Romanovsky1 and Alex Yakovlev1
1Newcastle University, GB; 2Newcastle University, UK and University of Technology and HCED, IQ
Abstract
This demonstration showcases a modelling method for large complex computing systems focusing on many-core types and concentrating on the crosslayer aspects. The resource-driven models aim to help system designers reason about, analyse, and ultimately design such systems across all conventional computing and communication layers, from application, operating system, down to the finest hardware details. The framework and tool support the notion of selective abstraction and are suitable for studying such non-functional properties such as performance, reliability and energy consumption.

More information ...
UB11.3OISC MULTICORE STENCIL PROCESSOR: ONE INSTRUCTION-SET COMPUTER-BASED MULTICORE PROCESSOR FOR STENCIL COMPUTING
Authors:
Kaoru Saso, Jing Yuan Zhao and Yuko Hara-Azumi, School of Engineering, Tokyo Institute of Technology, JP
Abstract
Subtract and Branch on NEGative with 4 operands (SUBNEG4) is one of One Instruction-Set Computers that execute only one type of instruction. Thanks to its simplicity, SUBNEG4 has only 1/20x circuit area and 1/10x power consumption against MIPS processor. As SUBNEG4 is Turing-complete, it is suitable for parallel computing by multiple cores, while keeping its low-power feature. Our on-going project is seeking for effective use and deployment of SUBNEG4 cores on embedded systems. Our booth will demonstrate the significant speed-up by a SUBNEG4-based many-core processor against a conventional processor, for stencil computing. Our 64-core processor efficiently handles 2D von-Neumann neighborhood stencils, e.g., wave simulation by Verlet integration and 2D Jacobi iteration, to compute 64 points simultaneously. We show that small many-core processors can be realized even with such large number of cores while achieving good speed-up for heaving computation.

More information ...
UB11.6SPANNER: SELF-REPAIRING SPIKING NEURAL NETWORK CONTROLLER FOR AN AUTONOMOUS ROBOT
Authors:
Alan Millard1, Anju Johnson1, James Hilder1, David Halliday1, Andy Tyrrell1, Jon Timmis1, Junxiu Liu2, Shvan Karim2, Jim Harkin2 and Liam McDaid2
1University of York, GB; 2Ulster University, GB
Abstract
The human brain is remarkably resilient, and is able to self-repair following injury or a stroke. In contrast, electronic systems typically exhibit limited self-repair capabilities, and cannot recover from faults. We demonstrate a bio-inspired approach to self-repair that allows an autonomous robot to recover from faults in its artificial 'brain'. Astrocytes are support cells in the human brain that interact with neurons to regulate synaptic activity. We have modelled this interaction to create a spiking neural network that can self-repair when synapses between neurons are damaged, by strengthening redundant pathways. We demonstrate a robot platform controlled by a self-repairing spiking neural network that is implemented on an FPGA. We demonstrate that injecting faults into the synapses of the network initially causes the robot to behave erratically, but that the neural controller is able to automatically repair itself, thus allowing the robot to resume normal function.

More information ...
UB11.7USING FORMAL METHODS FOR AUTOMATIC PLATFORM-INDEPENDENT CODE GENERATION OF RUN-TIME MANAGEMENT
Authors:
Mohammadsadegh Dalvandi, Michael Butler and Asieh Salehi Fathabadi, University of Southampton, GB
Abstract
Run-Time Management (RTM) systems are used in embedded systems to dynamically adapt hardware performance to minimise energy consumption. In this demonstration, we present a framework for automatic generation of RTM implementations from platform-independent formal models. The methodology in designing the RTM systems uses a high-level mathematical language, Event-B, which can describe systems at different abstraction levels. A code generation tool is used to translate platform-independent Event-B RTM models to platform-specific implementations in C. Formal verification is used to ensure correctness of the Event-B models. The portability offered by our methodology is demonstrated by modelling a Reinforcement Learning (RL) based RTM and generating implementations for two different platforms that all achieve energy savings on the respective platforms. The generated RTM code has been integrated with the PRiME framework, a cross-layer framework for embedded power management.

More information ...
UB11.9ABSYNTH: A COMPREHENSIVE APPROACH TO FRONT TO BACK ANALOG BLOCK DESIGN AUTOMATION
Authors:
Abhaya Chandra Kammara S.1, Sidney Pontes-Filho2 and Andreas König2
1ISE, TU Kaiserslautern, DE; 2University of Kaiserslautern, DE
Abstract
ABSYNTH was first presented in CEBIT 2014 where complete, practical circuit sizing approaches have been shown using meta-heuristics on trusted simulators. This tool was then proven by its use in design of several cells in a research project. Here, we present the extension to our nested optimization approach that creates a symmetric and well matched layout in every step for every instance in the population of the swarm, that is extracted in our flow to provide feedback to the cost function impacting on the population update for more viable and robust circuits. The layout optimization presented in this DEMO works with Cadence Layout design tools. Our initial focus is, motivated by Industry 4.0, IoT, on cells for signal conditioning electronics with reconfigurability and Self-X features.[1] Abhaya C. Kammara, L.Palanichamy, and A. König, "Multi-Objective optimization and visualization for analog automation", Complex. Intell. Syst, Springer, DOI 10.1007/s40747-016-0027-3, 2016

More information ...
16:30End of session

IP5 Interactive Presentations

Date: Thursday 22 March 2018
Time: 15:30 - 16:00
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

LabelPresentation Title
Authors
IP5-1A PLACEMENT ALGORITHM FOR SUPERCONDUCTING LOGIC CIRCUITS BASED ON CELL GROUPING AND SUPER-CELL PLACEMENT
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Soheil Nazar Shahsavani, Alireza Shafaei Bejestan and Massoud Pedram, University of Southern California, US
Abstract
This paper presents a novel clustering based placement algorithm for single flux quantum (SFQ) family of superconductive electronic circuits. In these circuits nearly all cells receive a clock signal and a placement algorithm that ignores the clock routing cost will not produce high quality solutions. To address this issue, proposed approach simultaneously minimizes the total wirelength of the signal nets and area overhead of the clock routing. Furthermore, construction of a perfect H-tree in SFQ logic circuits is not viable solution due to the resulting very high routing overhead and the in-feasibility of building exact zero-skew clock routing trees. Instead a hybrid clock tree must be used whereby higher levels of the clock tree (i.e., those closer to the clock source) are based on H-tree construction whereas lower levels of the clock tree follow a linear (i.e., chain-like) structure. The proposed approach is able to reduce the overall half-perimeter wirelength by 15% and area by 8% compared with state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-2ABAX: 2D/3D LEGALISER SUPPORTING LOOK-AHEAD LEGALISATION AND BLOCKAGE STRATEGIES
Speaker:
Nikolaos Sketopoulos, University of Thessaly, GR
Authors:
Nikolaos Sketopoulos, Christos Sotiriou and Stavros Simoglou, Department of Electrical and Computer Engineering, University of Thessaly, GR
Abstract
Abax is a modern version of the classical Abacus, minimum displacement, greedy legaliser. Abax supports single-tier 2D or 3D legalisation for multiple, logic-on-logic 3D-IC tiers, efficient look-ahead legalisation of intermediate Global Placement (GP) iterations, Hard Macros, Blockages, row density constraints and multiple local cell displacement functions and cell orderings. For 3D-IC, Abax can produce multi-tier 3D-IC placements by performing Legalisation-based Partitioning. For efficient Look-ahead Legalisation, Abax supports two new local displacement cost functions, multi-cell mean and multi-cell total. We show that the classical single-cell displacement and multi-cell total can result in artifacts when legalising early intermediate GPs, and that multi-cell mean is the best candidate for Look-ahead Legalisation. Obstructions, i.e. Hard Macros and Blockages are handled by using two strategies. We present legalisation results for the ISPD2014 and ISPD2015 benchmarks, by using GP generated from Eh?Placer, and HPWL measurement by using RippleDP. For 3D, two-tier legalisation we illustrate a ~30% reduction in HPWL for a set of ISPD2014 benchmarks. For 2D legalisation on the ISPD2015 benchmarks, our average HPWL increase over GP is 3.03%, compared to 7.21% of the Eh?Placer legaliser, and 43.16% of the RippleDP legaliser.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-3LESAR: A DYNAMIC LINE-END SPACING AWARE DETAILED ROUTER
Speaker:
Yih-Lang Li, Computer Science Department, NCTU, TH
Authors:
Ying-Chi Wei, Radhamanjari Samanta and Yih-Lang Li, National Chiao-Tung University, TW
Abstract
As the VLSI technology scales down, 193nm optical lithography reaches the limit and one-dimensional (1D) unidirectional style lithography technique emerges as one of the most promising solutions for coming advanced technology nodes. The 1D process first generates unidirectional dense metal lines and then use line-end cutting to form the target patterns with cut masks. If cuts are too close, they will lead to conflicts. Line-end spacing rules become dynamic rather than static because of cut mask and also now need to be followed strictly. Line-end spacing check between two line-end pairs in the same mask has also been regarded as compulsory line-end spacing constraints that have not discussed in previous works yet. Complying with these rules during APR has become a new bottleneck. In this work, we propose to make the router aware of the dynamic line-end spacing rules, including end-end spacing and parity spacing constraints. Experimental results of our proposed router demonstrates that it can effectively expel all end-end spacing violations as well as 75% of parity spacing violations in a reasonable runtime increase of 14%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-4UNDERSTANDING TURN MODELS FOR ADAPTIVE ROUTING: THE MODULAR APPROACH
Speaker:
Edoardo Fusella, Department of Electrical Engineering and Information Technologies, University of Naples Federico II, IT
Authors:
Edoardo Fusella and Alessandro Cilardo, University of Naples Federico II, IT
Abstract
Routing algorithms were extensively studied first in multi-computer systems, then in multi- and many-core architectures. Among the commonly used routing techniques, the turn model seems the most promising solution when targeting adaptiveness. Based on the turn model, several alternative approaches with different turn prohibition schemes were proposed. This paper gives a new theoretical background for designing deadlock-free partially adaptive logic-based distributed routing algorithms that are based on the turn model. Two properties are presented, including a necessary and sufficient condition to prove that a routing algorithm is deadlock-free as long as turn restrictions follow a modular distribution. Existing approaches can be considered a subset of the solution space identified by this work. Finally, we propose a novel routing algorithm exhibiting encouraging performance improvements over state-ofthe-art approaches.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-5QUATER-IMAGINARY BASE FOR COMPLEX NUMBER ARITHMETIC CIRCUITS
Speaker:
Souradip Sarkar, Nokia Bell Labs, BE
Authors:
Souradip Sarkar and Manil Dev Gomony, Nokia Bell Labs, BE
Abstract
Arithmetic operations involving complex numbers are widely used in the signal processing functions in the physical layer of modern wireless and wireline communication systems, electronic instrumentation and control systems. With the ever increasing throughput requirements of such systems, the power consumption of the hardware realization is increasing beyond the allowed budget. Arithmetic circuits based on binary numeral system that have been optimized rigorously over the past few decades are currently being used for the computation involving complex numbers. In this paper, we present the potential of arithmetic circuits for complex number computations based on the Quater-imaginary (QI) base numeral system to reduce power consumption. We show that for a simple multiplier implementation in the QI base, the savings in power and area consumption could be up to 40% when synthesized in 28nm TSMC standard cell technology node.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-6FAULT-TOLERANT VALVE-BASED MICROFLUIDIC ROUTING FABRIC FOR DROPLET BARCODING IN SINGLE-CELL ANALYSIS
Speaker:
Yasamin Moradi, Technical University of Munich (TUM), DE
Authors:
Yasamin Moradi1, Mohamed Ibrahim2, Krishnendu Chakrabarty2 and Ulf Schlichtmann1
1Technical University of Munich, DE; 2Duke University, US
Abstract
High-throughput single-cell genomics is used to gain insights into diseases such as cancer. Motivated by this important application, microfluidics has emerged as a key technology for developing comprehensive biochemical procedures for studying DNA, RNA, proteins, and many other cellular components. Recently, a hybrid microfluidic platform has been proposed to efficiently automate the analysis of a heterogeneous sequence of cells. In this design, a valve-based routing fabric based on transposers is used to label/barcode the target cells. However, the design proposed in prior work overlooked defects that are likely to occur during chip fabrication and system integration. We address the above limitation by investigating the fault tolerance of the valve-based routing fabric. We develop a theory of failure assessment and introduce a design technique for achieving fault tolerance. Simulation results show that the proposed method leads to a slight increase in the fabric size and decrease in cell-analysis throughput, but this is only a small price to pay for the added assurance of fault tolerance in the new design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-7OPTIMIZING POWER-ACCURACY TRADE-OFF IN APPROXIMATE ADDERS
Speaker:
Celia Dharmaraj, Indian Institute of Technology Madras, IN
Authors:
Celia Dharmaraj, Vinita Vasudevan and Nitin Chandrachoodan, Indian Institute of Technology Madras, IN
Abstract
Approximate circuit design has gained significance in recent years targeting applications like media processing where full accuracy is not required. In this paper, we propose an approximate adder in which the approximate part of the sum is obtained by finding a single optimal level that minimizes the mean error distance. Therefore hardware needed for the approximate part computation can be removed, which effectively results in very low power consumption. We compare the proposed adder with various approximate adders in the literature in terms of power and accuracy metrics. The power savings of our adder is shown to be 17% to 55% more than power savings of the existing approximate adders over a significant range of accuracy values. Further, in an image addition application, this adder is shown to provide the best trade-off between PSNR and power.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-8IMPROVING THE ERROR BEHAVIOR OF DRAM BY EXPLOITING ITS Z-CHANNEL PROPERTY
Speaker:
Kira Kraft, University of Kaiserslautern, DE
Authors:
Kira Kraft1, Matthias Jung2, Chirag Sudarshan1, Deepak M. Mathew1, Christian Weis1 and Norbert Wehn1
1University of Kaiserslautern, DE; 2Fraunhofer IESE, DE
Abstract
In this paper, we present a new communication theoretic channel model for Dynamic Random Access Memory (DRAM) retention errors, that relies on the fully asymmetric retention error behavior of DRAM cells. This new model shows that the traditional approach is over pessimistic and we confirm this with real measurements of DDR3 and DDR4 DRAM devices. Together with an exploitation of the vendor specific true- and anti-cell structure, a low complexity bit-flipping approach is presented, that can largely increase DRAM's reliability with minimum overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-9ARCHITECTURE AND OPTIMIZATION OF ASSOCIATIVE MEMORIES USED FOR THE IMPLEMENTATION OF LOGIC FUNCTIONS BASED ON NANOELECTRONIC 1S1R CELLS
Speaker:
Arne Heittmann, RWTH-Aachen University, DE
Authors:
Arne Heittman and Tobias G. Noll, RWTH Aachen University, DE
Abstract
A neuromorphic architecture based on Binary Associative memories and nanoelectronic resistive switches is proposed for the realization of arbitrary logic/arithmetic functions. Subsets of non-trivial code sets based on error detecting 2-out-of-n-codes are thoroughly used to encode operands, results, and intermediate states in order to enhance the circuit reliability by mitigating the impact of device variability. 2-ary functions can be implemented by cascading a mixer memory, a correlator memory, and a response memory. By introduction of a new cost function based on class-specific word-line-coverage, stochastic optimization is applied with the aim to minimize the overall number of active amplifiers. For various exemplary functions optimized architectures are compared against solutions obtained using a standard-cost function. It is shown that the consideration of word-line-coverage results in a significant circuit compaction.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-10ACCURATE PREDICTION OF SMARTPHONES' SKIN TEMPERATURE BY CONSIDERING EXOTHERMIC COMPONENTS
Speaker:
Jihoon Park, Yonsei University, KR
Authors:
Jihoon Park and Hojung Cha, Dept. of Computer Science, Yonsei University, KR
Abstract
Smartphones' surface temperature, also called skin temperature, can rapidly heat up in certain cases, and this causes a variety of safety problems. Therefore, the thermal management of smartphones should consider the skin temperature, and its accurate prediction is important. However, due to the complicated relationship among the many exothermic components in the device, predicting skin temperature is extremely difficult. In this paper, we develop a thermal prediction model that accurately predicts the skin temperature of a mobile device. In an experiment with smartphones, we show that the proposed model achieves an accuracy of 98%, with a ±0.4 °C margin of error. To the best of our knowledge, our work is the first to reveal the complex relationship between the various components inside of a smartphone and its skin temperature.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-11TRUSTWORTHY PROOFS FOR SENSOR DATA USING FPGA BASED PHYSICALLY UNCLONABLE FUNCTIONS
Speaker:
Urbi Chatterjee, Indian Institute of Technology Kharagpur, IN
Authors:
Urbi Chatterjee1, Durga Prasad Sahoo2, Debdeep Mukhopadhyay3 and Rajat Subhra Chakraborty1
1Indian Institute of Technology Kharagpur, IN; 2Bosch India (RBEI/ETI), IN; 3Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, IN
Abstract
The Internet of Things (IoT) is envisaged to consist of billions of connected devices coupled with sensors which generate huge volumes of data enabling control-and-command in this paradigm. However, integrity of this data is of utmost concern, and is promisingly addressed leveraging the inherent unreliability of Physically Unclonable Functions (PUFs) w.r.t. ambient parameter variations, using the concept of Virtual Proofs (VPs). Advantage of these protocols is that they do not use explicit keys and aim at proving the authenticity of the sensor. Since the existing PUF-based protocols do not use the sensor data as a part of challenge (i.e. input) to PUFs, there is no guarantee of uniqueness of PUF's challenge-response behavior over multiple levels of ambient parameters. Few of these protocols needs to sequential search in the challenge-response database. To alleviate these issues, we develop a new class of authenticated sensing protocols where the sensor data is combined with the external challenge by utilizing the Strict Avalanche Criterion of the PUF. We validate the proposed protocol through actual experiments on FPGA using Double Arbiter PUFs (DAPUFs), which are implemented with superior uniformity, uniqueness, and reliability on Xilinx Artix-7 FPGAs. According to the FPGA- based validation, the proposed protocol with DAPUF can be effectively used to authenticate wide variations of temperature from −20◦C to 80◦C.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-12TOWARDS FULLY AUTOMATED TLM-TO-RTL PROPERTY REFINEMENT
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Vladimir Herdt1, Hoang M. Le1, Daniel Grosse2 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen/DFKI GmbH, DE
Abstract
An ESL design flow starts with a TLM description, which is thoroughly verified and then refined to a RTL description in subsequent steps. The properties used for TLM verification are refined alongside the TLM description to serve as starting point for RTL property checking. However, a manual transformation of properties from TLM to RTL is error prone and time consuming. Therefore, in this paper we propose a fully automated TLM-to-RTL property refinement based on a symbolic analysis of transactors. We demonstrate the applicability of our property refinement approach using a case study.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-13IN-MEMORY COMPUTING USING PATHS-BASED LOGIC AND HETEROGENEOUS COMPONENTS
Speaker:
Alvaro Velasquez, University of Central Florida, US
Authors:
Alvaro Velasquez and Sumit Kumar Jha, University of Central Florida, US
Abstract
The memory-processor bottleneck and scaling difficulties of the CMOS transistor have given rise to a plethora of research initiatives to overcome these challenges. Popular among these is in-memory crossbar computing. In this paper, we propose a framework for synthesizing logic-in-memory circuits based on the behavior of paths of electric current throughout the memory. Limitations of using only bidirectional components with this approach are also established. We demonstrate the effectiveness of our approach by generating n-bit addition circuits that can compute using a constant number of read and write cycles.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-14NON-INTRUSIVE TESTING TECHNIQUE FOR DETECTION OF TROJANS IN ASYNCHRONOUS CIRCUITS
Speaker:
Rodrigo Possamai Bastos, TIMA Laboratory, CNRS/Grenoble INP/UJF, FR
Authors:
Leonel Acunha Guimarães, Thiago Ferreira de Paiva Leite, Rodrigo Possamai Bastos and Laurent Fesquet, TIMA - Grenoble Institute of Technology, FR
Abstract
Asynchronous circuits, as any IC, are vulnerable to hardware Trojans (HTs), which might be maliciously implanted in IC designs during outsourced fabrication phases. In this paper, a new testing technique to detect HTs by exploiting the regular side-channel properties of quasi-delay insensitive (QDI) asynchronous circuits is proposed. The technique does not need neither additional circuitry nor significant adjustments in the post-fabrication testing phase. Simulation results show that the proposed technique is able to detect HTs with dimensions smaller than 1% of the original circuit.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-15TOWARDS INTER-VENDOR COMPATIBILITY OF TRUE RANDOM NUMBER GENERATORS FOR FPGAS
Speaker:
Miloš Grujić, imec-COSIC, KU Leuven, BE
Authors:
Miloš Grujić, Bohan Yang, Vladimir Rozic and Ingrid Verbauwhede, imec-COSIC, KU Leuven, BE
Abstract
True random number generators (TRNGs) are fundamental constituents of secure embedded cryptographic systems. In this paper, we introduce a general methodology for porting TRNG across different FPGA vendor families. In order to demonstrate our methodology, we applied it to the delay-chain based TRNG (DC-TRNG) on Intel Cyclone IV and Cyclone V FPGAs. We examine vendor-agnostic generality of the underlying DC-TRNG principle and propose modifications to address differences in structure of FPGAs. Implementation of the DC-TRNG on Cyclone IV uses 149 LEs (<0.1% of available resources) and has a throughput of 5Mbps, while on Cyclone V it occupies 230 ALMs (<1.5% of resources) with an output rate of 12.5 Mbps. The quality of the random bits produced by the DC-TRNG on Intel Cyclone IV and V is further confirmed by using NIST statistical test suite.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-16EFFICIENT WEAR LEVELING FOR INODES OF FILE SYSTEMS ON PERSISTENT MEMORIES
Speaker:
Xianzhang Chen, Chongqing University, CN
Authors:
Xianzhang Chen1, Edwin Sha2, Yuansong Zeng1, Chaoshu Yang1, Weiwen Jiang1 and Qingfeng Zhuge3
1Chongqing University, CN; 2Chongqing University, US; 3East China Normal University, CN
Abstract
Existing persistent memory file systems achieve high-performance file accesses by exploiting advanced characteristics of persistent memories (PMs), such as PCM. However, they ignore the limited endurance of PMs. Particularly, the frequently updated inodes are stored on fixed locations throughout their lifetime, which can easily damage PM with common file operations. To address such issues, we propose a new mechanism, Virtualized Inode (VInode), for the wear leveling of inodes of persistent memory file systems. In VInode, we develop an algorithm called Pages as Communicating Vessels (PCV) to efficiently find and migrate the heavily written inodes. We implement VInode in SIMFS, a typical persistent memory file system. Experiments are conducted with well-known benchmarks. Compared with original SIMFS, experimental results show that VInode can reduce the maximum value and standard deviation of the write counts of pages to 1800x and 6200x lower, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-17EXPLORING NON-VOLATILE MAIN MEMORY ARCHITECTURES FOR HANDHELD DEVICES
Speaker:
Virendra Singh, Indian Institute of Technology Bombay, IN
Authors:
Sneha Ved and Manu Awasthi, Indian Institute of Technology Gandhinagar, IN
Abstract
As additional functionality is being added to contemporary handheld devices, the SoCs inside these devices are becoming increasingly complex. Similarly, the applications executing on these handhelds are beginning to exhibit an ever increasing memory footprint. To support these trends, main memory capacity of these SoCs has been increasing over time. Due to these developments, memory system's contribution to the overall system power has increased dramatically. Non-volatile memories have been used in server architectures to increase capacity as well as keep memory system's power consumption in check. However, in the handheld domain, where user experience and battery life are of paramount importance, the applicability of such technologies has not been widely studied. In this paper, we propose and evaluate a number of hybrid memory architectures using mobile DRAM and PCM. We show that intelligent memory architectures, cognizant of workload's memory access patterns can provide significant energy savings without compromising on user experience. Using proposed approach, we can devise architectures that exhibit significant energy savings with only a 2.8% performance loss.

Download Paper (PDF; Only available from the DATE venue WiFi)

12.1 Special Day Session on Designing Autonomous Systems: Self-awareness for Autonomous Systems

Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Saal 2

Chair:
Nikil Dutt, University of California at Irving, US

With increasing interest in the deployment of autonomous vehicles and robots, a critical open challenge is to empower these systems with self-awareness for achieving truly autonomous operation. Self-awareness principles hold the promise to manage effectively continuous change and evolution, application interference, environment dynamics and system uncertainty, thereby adhering to safety, availability, and security guarantees as needed. The goal of this special session is to make the audience appreciate the benefits of self-awareness for systems autonomy, highlight challenges for self-awareness using two application contexts (unmanned aerial systems and autonomous vehicles), and outline EDA and HW/SW challenges to support self-awareness for systems autonomy.

TimeLabelPresentation Title
Authors
16:0012.1.1SELF-AWARENESS AND BEHAVIOR ASSURANCE FOR UNMANNED AERIAL VEHICLE (UAV) SYSTEMS
Speaker:
Prakash Sarathy, Northrop Grumman, US
Authors:
Peter Lee1 and Prakash Sarathy2
1Corenova Technologies Inc., US; 2Northrup Grumman Corporation, US
Abstract
Self-awareness principles hold the promise to manage effectively continuous change and evolution, application interference, environment dynamics and system uncertainty, thereby adhering to safety, availability, and security guarantees in complex systems. This is certainly true for unmanned vehicles that are capable of autonomous operations. In this paper, we provide a brief overview of factors driving a significant increase in autonomous vehicles for both civilian and military markets, and provide a summary of current capabilities and limitations of the state-of-practice. Furthermore, we examine the nature of current and future autonomous operations in the context of desirable characteristics such as reliability, safety, resiliency and robustness properties and identify extant challenges to achieving them. We intend to highlight some approaches and technologies that offer promise in addressing the aforementioned challenges from a technical perspective. To further ground this discussion two distinct real-life use cases will be discussed, namely a) Controlled airspace launch and recovery and b) coordinated operations in a controlled airspace. These usecases will serve to underscore the significant obstacles that lie ahead, in order to realize a full spectrum of autonomous capabilities in the near future.
16:3012.1.3PROMISES AND CHALLENGES IN DEPLOYING SELF-AWARENESS IN AUTONOMOUS VEHICLES
Author:
Christoph Stiller, Karlsruhe Institute of Technology (KIT), DE
Abstract
This talk will a) review levels of autonomy for self-driving cars; b) outline approaches and challenges for achieving full autonomy for self-driving cars; c) highlight differences and additional challenges over autonomous aircraft for self-driving cars; and d) elaborate the potential of cooperativity for automated driving. The talk will draw on lessons learned and challenges to be overcome from the perspective of the German Research Priority Program "Cooperative Interactive Automobiles".
17:0012.1.3DESIGN METHODOLOGIES FOR ENABLING SELF-AWARENESS IN AUTONOMOUS SYSTEMS
Speaker:
Andreas Herkersdorf, Technical University Munich (TUM), DE
Authors:
Armin Sadighi1, Bryan Donyanavard2, Thawra Kadeed3, Kasra Moazzemi2, Tiago Muck2, Ahmed Nassar2, Amir Rahmani2, Thomas Wild1, Nikil Dutt4, Rolf Ernst3, Andreas Herkersdorf5 and Fadi Kurdahi4
1Technical University of Munich (TUM), DE; 2University of California at Irvine (UCI), US; 3TU Braunschweig, DE; 4University of California, Irvine, US; 5TU München, DE
Abstract
This paper deals with challenges and possible solutions for incorporating self-awareness principles in EDA design flows for autonomous systems. We present a holistic approach that enables self-awareness across the software/hardware stack, from systems-on-chip to systems-of-systems (autonomous car) contexts. We use the Information Processing Factory (IPF) metaphor as an exemplar to show how self-awareness can be achieved across multiple abstraction levels, and discuss new research challenges. The IPF approach represents a paradigm shift in platform design by envisioning the move towards a consequent platform-centric design in which the combination of self-organizing learning and formal reactive methods guarantee the applicability of such cyber-physical systems in safety-critical and high-availability applications.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.2 Searching for corner cases, bugs and security vulnerabilities

Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 6

Chair:
Daniel Grosse, University of Bremen / DFKI, DE

Co-Chair:
Jaan Raik, Tallinn University of Technology, EE

This session presents innovative solutions to identify challenging situations in RTL designs. The first work presents a fully automated and scalable approach for concolic generation of direct test on RTL models. The second work uses historical debugging data to predict future bug locations. The last presentation automatically mines formal assertions to highlight security vulnerabilities on IP firmware.

TimeLabelPresentation Title
Authors
16:0012.2.1DIRECTED TEST GENERATION USING CONCOLIC TESTING ON RTL MODELS
Speaker:
Jonathan Cruz, University of Florida, US
Authors:
Alif Ahmed, Farimah Farahmandi and Prabhat Mishra, University of Florida, US
Abstract
Functional validation is one of the most time consuming steps in System-on-Chip (SoC) design methodology. In today's industrial practice, simulating designs using billions of random or constrained-random tests can lead to high functional coverage. However, it is hard to cover the remaining small fraction of corner cases and rare functional scenarios. While formal methods are promising in such cases, it is infeasible to apply them on large designs. In this paper, we propose a fully automated and scalable approach for generating directed tests using concolic testing of RTL models. While application of concolic testing on hardware designs has shown some promising results, existing approaches are tuned for improving overall coverage, rather than covering a specific target. We developed a Control Flow Graph (CFG) assisted directed test generation method that can efficiently generate a test to activate a given target. Our experimental results demonstrate that our approach is both efficient and scalable compared to the state-of-the-art test generation methods.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.2.2SUSPECT SET PREDICTION IN RTL BUG HUNTING
Speaker:
Neil Veira, University of Toronto, CA
Authors:
Neil Veira, Zissis Poulos and Andreas Veneris, University of Toronto, CA
Abstract
We propose a framework for predicting erroneous design components from partially observed solution sets that are found through automated debugging tools. The proposed method involves learning design component dependencies by using historical debugging data, and representing these dependencies by means of a probabilistic graph. Using this representation, one can run a debugging tool non-exhaustively, obtain a partial set of potentially erroneous components and then predict the remaining by applying a cost-effective belief propagation pass. The method can reduce debugging runtime when it comes to multiple debugging sessions by 15x on the average while achieving a 91% average prediction accuracy.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.2.3SYMBOLIC ASSERTION MINING FOR SECURITY VALIDATION
Speaker:
Alessandro Danese, University of Verona, IT
Authors:
Alessandro Danese1, Valeria Bertacco2 and Graziano Pravadelli1
1University of Verona, IT; 2University of Michigan, US
Abstract
This paper presents DOVE, a validation framework to identify points of vulnerability inside IP firmwares. The framework relies on the symbolic simulation of the firmware to search for corner cases in its computational paths that may hide vulnerabilities. Then, DOVE automatically mine a compact set of formal assertions representing these unlikely paths to guide the analysis of the verification engineers. Experimental results on two case studies show the effectiveness of the generated assertions in pinpointing actual vulnerabilities and its efficiency in terms of execution time.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.3 Verification and Formal Synthesis

Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 1

Chair:
Christoph Scholl, University of Freiburg, DE

Co-Chair:
Gianpiero Cabodi, Politecnico di Torino, IT

This session explores formal design methodology for asynchroneous pipelines, reasons about decomposing assume/guarantee contracts, links high-level models and RTL designs, and also improves arithmetic circuit verification.

TimeLabelPresentation Title
Authors
16:0012.3.1IMPROVING AND EXTENDING THE ALGEBRAIC APPROACH FOR VERIFYING GATE-LEVEL MULTIPLIERS
Speaker:
Armin Biere, Johannes Kepler Universität Linz, AT
Authors:
Daniela Ritirc, Armin Biere and Manuel Kauers, Johannes Kepler University Linz, AT
Abstract
The currently most effective approach for verifying gate-level multipliers uses Computer Algebra. It reduces a word-level multiplier specification by a Gröbner basis derived from a gate-level implementation. This reduction produces zero if and only if the circuit is a multiplier. We improve this approach by extracting full- and half-adder constraints to reduce the Gröbner basis, which speeds up computation substantially. Refactoring the specification in terms of partial products instead of inputs yields further improvements. As a third contribution we extend these algebraic techniques to verify the equivalence of bit-level multipliers without using a word-level specification.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.3.2RECONFIGURABLE ASYNCHRONOUS PIPELINES: FROM FORMAL MODELS TO SILICON
Speaker:
Danil Sokolov, Newcastle University, GB
Authors:
Danil Sokolov, Alessandro de Gennaro and Andrey Mokhov, Newcastle University, GB
Abstract
Dataflow pipelines are widely used in the design of high-throughput computation systems. Real-life applications often require dynamically reconfigurable pipelines to differently process data items or adjust to the current operating mode. Reconfigurable synchronous pipelines are known since 1980s and are well supported by formal models and tools. Reconfigurable asynchronous pipelines on the other hand, have neither a formal behavioural model, nor mature EDA support, making them unattractive to industry. This paper presents a model and an open-source tool for the design and verification of reconfigurable asynchronous pipelines, and validates this approach in silicon.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.3.3AUTOMATIC GENERATION OF HARDWARE CHECKERS FROM FORMAL MICRO-ARCHITECTURAL SPECIFICATIONS
Speaker:
Alexander Fedotov, Eindhoven University of Technology, NL
Authors:
Alexander Fedotov and Julien Schmaltz, Eindhoven University of Technology, NL
Abstract
To manage design complexity, high-level models are used to evaluate the functionality and performance of design solutions. There is a significant gap between these high-level models and the Register Transfer Level (RTL) implementations actually produced by designers. We address the challenge of bridging this gap, namely, relating abstract specifications to RTL implementations. An important feature of our proposed approach is to support emph{non-deterministic} specifications. From such a non-deterministic model, we automatically compute a representation of its observable behaviour. We then turn this representation into a System Verilog checker. The checker is connected to the input and output interfaces of the RTL implementation. The resulting combination is given to a commercial EDA tool to prove inclusion of the traces of the implementation into the traces of the specification. Our method is implemented for the formal micro-architectural description language (MaDL) — based on the xMAS formalism originally proposed by Intel — and exemplified on several examples.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.3.4SPECIFICATION DECOMPOSITION FOR SYNTHESIS FROM LIBRARIES OF LTL ASSUME/GUARANTEE CONTRACTS
Speaker:
Antonio Iannopollo, UC Berkeley, IT
Authors:
Antonio Iannopollo, Stavros Tripakis and Alberto Sangiovanni Vincentelli, University of California, Berkeley, US
Abstract
Contract-Based Design is a methodology that allows for compositional design of complex systems. Given a contract representing a specification, it is possible to formally satisfy it by composing a number of simpler contracts. When these simpler contracts are chosen from a library of existing solutions, we talk about synthesis from contract libraries. There are techniques to automate the synthesis process, but they are computationally intensive, especially for complex specifications. In this paper, we describe an efficient technique to partition a specification, i.e., an LTL-based Assume/Guarantee contract, in a number of simpler sub-specifications which can be satisfied independently. Once all these smaller problems are solved, it is possible to safely merge their solutions to satisfy the original specification. We show the effectiveness of our technique in an industrial case study.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.4 Hardware-assisted Security

Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 2

Chair:
Ilia Polian, University of Stuttgart, DE

Co-Chair:
Nele Mentens, KU Leuven, BE

Security of today's system cannot be achieved by software techniques alone. This session presents hardware anchors which provide security at circuit and system level and allow its efficient verification.

TimeLabelPresentation Title
Authors
16:0012.4.1HARDWARE-ASSISTED ROOTKIT DETECTION VIA ON-LINE STATISTICAL FINGERPRINTING OF PROCESS EXECUTION
Speaker:
Yiorgos Makris, University of Texas at Dallas, US
Authors:
Liwei Zhou and Yiorgos Makris, The University of Texas at Dallas, US
Abstract
Kernel rootkits generally attempt to maliciously tamper kernel objects and surreptitiously distort program execution flow. Herein, we introduce a hardware-assisted hierarchical on-line system which detects such kernel rootkits by identifying deviation of dynamic intra-process execution profiles based on architecture-level semantics captured directly in hardware. The underlying key insight is that, in order to take effect, malicious manipulation of kernel objects must distort the execution flow of benign processes, thereby leaving abnormal traces in architecture-level semantics. While traditional detection methods rely on software modules to collect such traces, their implementations are susceptible to being compromised through software attacks. In contrast, our detection system maintains immunity to software attacks by resorting to hardware for trace collection. The proposed method is demonstrated on a Linux-based operating system running on a 32-bit x86 architecture, implemented in Simics. Experimental results, using real-world kernel rootkits, corroborate the effectiveness of this method, while a predictive 45nm PDK is used to evaluate hardware overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.4.2SECURING CONDITIONAL BRANCHES IN THE PRESENCE OF FAULT ATTACKS
Speaker:
Robert Schilling, Graz University of Technology, AT
Authors:
Robert Schilling1, Mario Werner2 and Stefan Mangard2
1Graz University of Technology / Know Center GmbH, AT; 2Graz University of Technology, AT
Abstract
In typical software, many comparisons and subsequent branch operations are highly critical in terms of security. Examples include password checks, signature checks, secure boot, and user privilege checks. For embedded devices, these security-critical branches are a preferred target of fault attacks as a single bit flip or skipping a single instruction can lead to complete access to a system. In the past, numerous redundancy schemes have been proposed in order to provide control-flow-integrity (CFI) and to enable error detection on processed data. However, current countermeasures for general purpose software do not provide protection mechanisms for conditional branches. Hence, critical branches are in practice often simply duplicated. We present a generic approach to protect conditional branches, which links an encoding-based comparison result with the redundancy of CFI protection mechanisms. The presented approach can be used for all types of data encodings and CFI mechanisms and maintains their error-detection capabilities throughout all steps of a conditional branch. We demonstrate our approach by realizing an encoded comparison based on AN-codes, which is a frequently used encoding scheme to detect errors on data during arithmetic operations. We extended the LLVM compiler so that standard code and conditional branches can be protected automatically and analyze its security. Our design shows that the overhead in terms of size and runtime is lower than state-of-the-art duplication schemes.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.4.3TOWARDS PROVABLY-SECURE PERFORMANCE LOCKING
Speaker:
Abhrajit Sengupta, Texas A&M University, US
Authors:
Monir Zaman1, Abhrajit Sengupta2, Danqing Liu3, Ozgur Sinanoglu4, Yiorgos Makris1 and Jeyavijayan Rajendran3
1The University of Texas at Dallas, US; 2New York University, US; 3Texas A&M University, US; 4New York University Abu Dhabi, AE
Abstract
Locking the functionality of an integrated circuit (IC) thwarts attacks such as intellectual property (IP) piracy, hardware Trojans, overbuilding, and counterfeiting. Although functional locking has been extensively investigated, locking the performance of an IC has been little explored. In this paper, we develop provably-secure performance locking, where only on applying the correct key the IC shows superior performance; for an incorrect key, the performance of the IC degrades significantly. This leads to a new business model, where the companies can design a single IC capable of different performances for different users. We develop mathematical definitions of security and theoretically, and experimentally prove the security against the state-of-the-art-attacks. We implemented performance locking on a FabScalar microprocessor, achieving a degradation in instructions per clock cycle (IPC) of up to 77% on applying an incorrect key, with an overhead of 0.6%, 0.2%, and 0% for area, power, and delay, respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.4.4AN AUTOMATED CONFIGURABLE TROJAN INSERTION FRAMEWORK FOR DYNAMIC TRUST BENCHMARKS
Speaker:
Jonathan Cruz, University of Florida, US
Authors:
Jonathan Cruz, Yuanwen Huang, Prabhat Mishra and Swarup Bhunia, University of Florida, US
Abstract
Abstract—Malicious hardware modification, also known as hardware Trojan attack, has emerged as a serious security concern for electronic systems. Such attacks compromise the basic premise of hardware root of trust. Over the past decade, significant research efforts have been directed to carefully analyze the trust issues arising from hardware Trojans and to protect against them. This vast body of work often needs to rely on well-defined set of trust benchmarks that can reliably evaluate the effectiveness of the protection methods. In recent past, efforts have been made to develop a benchmark suite to analyze the effectiveness of pre-silicon Trojan detection and prevention methodologies. However, there are only a limited number of Trojan inserted benchmarks available. Moreover, there is an inherent bias as the researcher is aware of Trojan properties such as location and trigger condition since the current benchmarks are static. In order to create an unbiased and robust benchmark suite to evaluate the effectiveness of any protection technique, we have developed a comprehensive framework of automatic hardware Trojan insertion. Given a netlist, the framework will automatically generate a design with single or multiple Trojan instances based user-specified Trojan properties. It allows a wide variety of configurations, such as the type of Trojan, Trojan activation probability, number of triggers, and choice of payload. The tool ensures that the inserted Trojan is a valid one and allow for provisions to optimize the Trojan footprint (area and switching). Experiments demonstrate that a state-ofthe- art Trojan detection technique provides poor efficacy when using benchmarks generated by our tool. This tool is available for download from http://www.trust-hub.org/.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.5 Lifetime Improvement for Persistent Memory

Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 3

Chair:
Arne Heittman, RWTH, DE

Co-Chair:
Chengmo Yang, University of Delaware, US

This session concentrates on methods for prolonging the lifetime of persistent main memory. Based on reference frequencies, papers in this session promote novel approaches to mitigate the adverse impact of writes from prospectives of cache replacement policy, frequent pattern compression, SLC/MLC hybrid organization, inode virtualization, as well as Android application specific address mapping.

TimeLabelPresentation Title
Authors
16:0012.5.1EXTENDING THE LIFETIME OF NVMS WITH COMPRESSION
Speaker:
Jie Xu, Huazhong University of Science and Technology, CN
Authors:
Jie Xu, Dan Feng, Yu Hua, Wei Tong, Jingning Liu and Chunyan Li, Wuhan National Lab for Optoelectronics, School of Computer Science and Technology Huazhong University of Science and Technology, Wuhan, China, CN
Abstract
Emerging Non-Volatile Memories (NVMs) such as Phase Change Memory (PCM) and Resistive RAM (RRAM) are promising to replace traditional DRAM technology. However, they suffer from limited write endurance and high write energy consumption. Encoding methods such as Flip-N-Write, FlipMin and CAFO can reduce the bit flips of NVMs by exploiting additional capacity to store the tag bits of encoding methods. The effects of encoding methods are limited by the capacity overhead of the tag bits. In this paper, we propose COE to COmpress cacheline for Extending the lifetime of NVMs. COE exploits the space saved by compression to store the tag bits of data encoding methods. Through combining data compression techniques with data encoding methods, COE can reduce the bit flips with negligible capacity overhead. We further observe that the saved space size of each compressed cacheline varies, and different encoding methods have different tradeoffs between capacity overhead and effects. To fully exploit the space saved by compression for improving lifetime, we select the proper encoding methods according to the saved space size. Experimental results show that our scheme can reduce the bit flips by 14.2%, decrease the energy consumption by 11.8% and improve the lifetime by 27.5% with only 0.2% capacity overhead.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.5.2HETEROGENEOUS PCM ARRAY ARCHITECTURE FOR RELIABILITY, PERFORMANCE AND LIFETIME ENHANCEMENT
Speaker:
Taehyun Kwon, Sungkyunkwan University, KR
Authors:
Taehyun Kwon1, Muhammad Imran1, Jung Min You2 and Joon-Sung Yang1
1Sungkyunkwan University, KR; 2SungKyunKwan University, KR
Abstract
Conventional DRAM and flash memory are reaching their scaling limits thus motivating research in various emerging memory technologies as a potential replacement. Among these, phase change memory (PCM) has received considerable attention owing to its high scalability and multi-level cell (MLC) operation for high storage density. However, due to the resistance drift over time, the soft error rate in MLC PCM is high. Additionally, the iterative programming in MLC negatively impacts performance and cell endurance. The conventional methods to overcome the drift problem incur large overheads, impact memory lifetime and are inadequate in terms of acceptable soft error rate (SER). In this paper, we propose a new PCM memory architecture with heterogeneous PCM arrays to increase reliability, performance and lifetime. The basic storage unit in the proposed architecture consists of two single-level cells (SLCs) and one four-level cell (4LC). Using the reduced number of 4LCs compared to conventional homogeneous 4LC PCM arrays, the drift-induced error rate is considerably reduced. By alternating each cell operation between SLC and 4LC over time, the overall lifetime can also be significantly enhanced. The proposed architecture achieves up to 10^5 times lower soft error rate with considerably less ECC overhead. With simple ECC scheme, about 22% performance improvement is achieved and additionally, the overall lifetime is also enhanced by about 57%.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.5.3AN EFFICIENT PCM-BASED MAIN MEMORY SYSTEM VIA EXPLOITING FINE-GRAINED DIRTINESS OF CACHELINES
Speaker:
Jie Xu, Huazhong University of Science and Technology, CN
Authors:
Jie Xu1, Dan Feng2, Yu Hua2, Wei Tong2, Jingning Liu2, Chunyan Li2 and Zheng Li3
1Wuhan National Lab for Optoelectronics, School of Computer Science and Technology Huazhong University of Science and Technology, Wuhan, China, CN; 2Wuhan National Lab for Optoelectronics, CN; 3WNLO, CN
Abstract
Phase Change Memory (PCM) has the potential to replace traditional DRAM memory due to its better scalability and non-volatility. However, PCM also suffers from high write latency and energy consumption. To mitigate the write overhead of PCM-based main memory, we propose a Fine-grained Dirtiness Aware (FDA) last-level cache (LLC) victimization scheme. The key idea of FDA is to preferentially evict cachelines with fewer dirty words when victimizing dirty cachelines. The modified word is defined to be dirty. FDA exploits two key observations. First, the write service time of a cacheline is proportional to the number of dirty words. Second, a cacheline with fewer dirty words has the same or lower reference frequency compared with other dirty cachelines. Therefore, evicting cachelines with fewer dirty words can reduce the write service time of cachelines and will not increase the miss rate. To reduce the write service time of cachelines, FDA evicts the cacheline with the fewest dirty words when victimizing dirty cachelines. We also present FDARP to decrease the miss rate by further synergizing the number of dirty words with Re-reference Prediction Value. Experimental results show that FDA (FDARP) can improve the IPC performance by 8.3% (14.8%), decrease the write service time of cachelines by 37.0% (36.3%) and reduce write energy consumption of PCM by 27.0% (32.5%) under the mixed benchmarks.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.5.4DFPC: A DYNAMIC FREQUENT PATTERN COMPRESSION SCHEME IN NVM-BASED MAIN MEMORY
Speaker:
Yuncheng Guo, Huazhong University of Science and Technology, CN
Authors:
Yuncheng Guo, Yu Hua and Pengfei Zuo, Huazhong University of Science and Technology, CN
Abstract
Non-volatile memory technologies (NVMs) are promising candidates as the next-generation main memory due to high scalability and low energy consumption. However, the performance bottlenecks, such as high write latency and low cell endurance, still exist in NVMs. To address these problems, frequent pattern compression schemes have been widely used, which however suffer from the lack of flexibility and adaptability. In order to overcome these shortcomings, we propose a well-adaptive NVM write scheme, called Dynamic Frequent Pattern Compression (DFPC), to significantly reduce the amount of write units and extend the lifetime. Instead of only using static frequent patterns in existing FPC schemes, which are pre-defined and not always efficient for all applications, the idea behind DFPC is to exploit the characteristics of data distribution in execution to obtain dynamic patterns, which often appear in the real-world applications. To further improve the compression ratio, we exploit the value locality in a cache line to extend the granularity of dynamic patterns. Hence DFPC can encode the contents of cache lines with more kinds of frequent data patterns. We implement DFPC in GEM5 with NVMain and execute 8 applications from SPEC CPU2006 to evaluate our scheme. Experimental results demonstrate the efficacy and efficiency of DFPC.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session

12.6 Special Session: Computing with Emerging Memories: How Good can it be?

Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 4

Chair:
Pierre-Emmanuel Gaillardon, University of Utah, US

Co-Chair:
Ian O’Connor, Ecole Centrale de Lyon, FR

With the recent evolutions of nanometer transistor technologies, power consumption emerged as the most critical limitation. Within advanced processors and computing architectures, the processor-memory communication accounts for a significant part of the energy requirement. While alternative design approaches, such as the use of optimized accelerators or advanced power management techniques are successfully employed in contemporary designs, the trend keeps worsening due to the ever-increasing gap between on-chip and off-chip memory data rates. This trend, known as Von Neumann bottleneck, not only limits the system performance, but also acts nowadays as a limiter of the energy scaling. The quest towards more energy-efficiency requires solutions that solve the Von Neumann bottleneck by tightly intertwining computing with memories. In this hot topic session, we intend to elaborate on in-memory computing by identifying and compare the latest computing models in light of conventional, e.g., SRAMs, and emerging memory technologies, e.g., RRAMs, STT-MRAMs. In-memory computing is considered here in the general sense of computing information locally within large data storage.

TimeLabelPresentation Title
Authors
16:0012.6.1PRACTICAL CHALLENGES IN DELIVERING THE PROMISES OF REAL PROCESSING-IN-MEMORY MACHINES
Speaker:
Nishil Talati, Technion - Israel Institute of Technology, IL
Authors:
Nishil Talati1, Ameer Haj Ali2, Rotem Ben Hur2, Nimrod Wald2, Ronny Ronen2, Pierre-Emmanuel Gaillardon3 and Shahar Kvatinsky1
1Technion, IL; 2Technion - Israel Institute of Technology, IL; 3University of Utah, US
Abstract
Processing-in-Memory (PiM) machines promise to overcome the von Neumann bottleneck in order to further scale performance and energy efficiency of computing systems by reducing the extent of data transfer and offering ample parallelism. In this paper, we take the memristive Memory Processing Unit (mMPU) as a case study of a PiM machine and scrutinize it in practical scenarios. Specifically, we explore the limitations of parallelism and data transfer elimination. We argue that lack of operand locality and arrangement might make data transfer inevitable in the mMPU. We then devise techniques to move data within the mMPU, without transferring it off-chip, and quantify their costs. Additionally, we present electrical parameters that might limit the parallelism offered by the mMPU and evaluate their impact. Using benchmarks from the LGsynth91 suite, their vector extensions, and a few synthetic data-parallel workloads, we show that the internal data transfer results in an increase of up to 1.5x in the execution time, while the limited parallelism increases it by 1.1x to 2x.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:3012.6.2SMART INSTRUCTION CODES FOR IN-MEMORY COMPUTING ARCHITECTURES COMPATIBLE WITH STANDARD SRAM INTERFACES
Speaker:
Maha Kooli, CEA-Leti, FR
Authors:
Maha Kooli1, Henri-Pierre CHARLES2, Bastien Giraud3 and Jean-Philippe Noel2
1CEA/LETI, FR; 2CEA, FR; 3CEA LETI, FR
Abstract
This paper presents the computing model for the In-Memory Computing architecture based on SRAM memory that embeds computing abilities. This memory concept offers significant performance gains in terms of energy consumption and execution time. To handle the interaction between the memory and the CPU, new memory instruction codes were designed. Those instructions are communicated by the CPU to the memory, using standard SRAM buses. This implementation allows (1) to embed In-Memory Computing capabilities on a system without Instruction Set Architecture (ISA) modification, and (2) to finely interlace CPU instructions and in-memory computing instructions.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:0012.6.4MEMRISTIVE DEVICES FOR COMPUTATION-IN-MEMORY
Speaker:
Said Hamdioui, Delft University of Technology, NL
Authors:
Jintao Yu, HoangAnh DuNguyen, Mottaqiallah Taouil and Said Hamdioui, TU Delft, NL
Abstract
CMOS technology and its continuous scaling have made electronics and computers accessible and affordable for almost everyone on the globe; in addition, they have enabled the solutions of a wide range of societal problems and applications. Today, however, both the technology and the computer architectures are facing severe challenges/walls making them incapable of providing the demanded computing power with tight constraints. This motivates the need for the exploration of novel architectures based on new device technologies; not only to sustain the financial benefit of technology scaling, but also to develop solutions for extremely demanding emerging applications. This paper presents two computation-in-memory based accelerators making use of emerging memristive devices; they are Memristive Vector Processor and RRAM Automata Processor. The preliminary results of these two accelerators show significant improvement in terms of latency, energy and area as compared to today's architectures and design.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:1512.6.3COMPUTING-IN-MEMORY WITH SPINTRONICS
Speaker:
Shubham Jain, Purdue University, US
Authors:
Shubham Jain1, Sachin Sapatnekar2, Jian-Ping Wang2, Kaushik Roy1 and Anand Raghunathan1
1Purdue University, US; 2Department of Electrical and Computer Engineering, University of Minnesota, US
Abstract
In-memory computing is a promising approach to alleviating the processor-memory data transfer bottleneck in computing systems. While spintronics has attracted great interest as a non-volatile memory technology, recent work has shown that its unique properties can also enable in-memory computing. We summarize efforts in this direction, and describe three different designs that enhance STT-MRAM to perform logic, arithmetic, and vector operations and evaluate transcendental functions within memory arrays.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30End of session