IP1 Interactive Presentations


Date: Tuesday 28 March 2017
Time: 16:00 - 16:30
Location / Room: IP sessions (in front of rooms 4A and 5A)

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated with each IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in the corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session, the 'Best IP of the Day' award is presented.

Label    Presentation Title    Authors
IP1-1  STRUCTURAL DESIGN OPTIMIZATION FOR DEEP CONVOLUTIONAL NEURAL NETWORKS USING STOCHASTIC COMPUTING
Speaker:
Yanzhi Wang, Syracuse University, US
Authors:
Zhe Li1, Ao Ren1, Ji Li2, Qinru Qiu1, Bo Yuan3, Jeffrey Draper2 and Yanzhi Wang1
1Syracuse University, US; 2University of Southern California, US; 3City University of New York, City College, US
Abstract
Deep Convolutional Neural Networks (DCNNs) have been demonstrated to be effective models for understanding image content. Because of their deep structure, the computation behind DCNNs relies heavily on the capability of the hardware resources, and DCNNs have been implemented on various large-scale computing platforms. However, there is a trend toward embedding DCNNs into lightweight local systems, which requires low power/energy consumption and a small hardware footprint. Stochastic Computing (SC) radically simplifies the hardware implementation of arithmetic units and has the potential to satisfy the low-power, small-footprint needs of DCNNs. Local connectivity and down-sampling operations, however, make DCNNs more complex to implement using SC. In this paper, eight feature extraction designs for DCNNs using SC, organized in two groups, are explored and optimized in detail from the perspective of calculation precision: we permute two SC implementations of inner-product calculation, two down-sampling schemes, and two structures of DCNN neurons. For each of the eight feature extraction designs, we evaluate the resulting DCNN in terms of network accuracy and hardware performance. Through exploration and optimization, the accuracies of the SC-based DCNNs are guaranteed compared with software implementations on CPU/GPU and binary-based ASIC synthesis, while area, power, and energy are reduced by up to 776X, 190X, and 32835X, respectively.
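As a hedged illustration of the kind of arithmetic SC simplifies (a generic textbook scheme, not one of the eight feature-extraction designs evaluated in the paper), the sketch below multiplies two values in bipolar stochastic coding, where a value x in [-1, 1] is encoded as a bit-stream with P(1) = (x + 1) / 2 and multiplication reduces to a bitwise XNOR.

```python
# Illustrative bipolar stochastic-computing (SC) multiply: a value x in [-1, 1]
# is encoded as a Bernoulli bit-stream with P(1) = (x + 1) / 2; multiplication
# is then a single XNOR gate per bit. Generic SC scheme, not the paper's design.
import numpy as np

rng = np.random.default_rng(0)
N = 4096  # bit-stream length (longer streams -> higher precision)

def encode(x, n=N):
    """Bipolar SC encoding: Bernoulli stream with P(1) = (x + 1) / 2."""
    return (rng.random(n) < (x + 1) / 2).astype(np.uint8)

def decode(stream):
    """Recover the bipolar value from the fraction of ones."""
    return 2 * stream.mean() - 1

def sc_multiply(sa, sb):
    """Bipolar SC multiplication is a bitwise XNOR of the two streams."""
    return 1 - (sa ^ sb)

a, b = 0.6, -0.4
product = decode(sc_multiply(encode(a), encode(b)))
print(f"SC product ~ {product:.3f} (exact {a * b:.3f})")
```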

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-2  APPROXQA: A UNIFIED QUALITY ASSURANCE FRAMEWORK FOR APPROXIMATE COMPUTING
Speaker:
Ting Wang, The Chinese University of Hong Kong, HK
Authors:
Ting Wang, Qian Zhang and Qiang Xu, The Chinese University of Hong Kong, HK
Abstract
Approximate computing, which trades off computation quality against computational effort (e.g., energy) by exploiting the inherent error resilience of emerging applications (e.g., recognition and mining), has garnered significant attention recently. Needless to say, quality assurance is indispensable for a satisfactory user experience with approximate computing, yet this issue has remained largely unexplored in the literature. In this work, we propose a novel framework, ApproxQA, to tackle this problem, in which approximation mode tuning and rollback recovery are considered in a unified manner when quality violations occur. Specifically, ApproxQA resorts to a two-level controller: the high-level approximation controller tunes approximation modes at a coarse-grained scale based on Q-learning, while the low-level rollback controller judiciously determines whether to perform rollback recovery at a fine-grained scale based on the target quality requirement. ApproxQA can provide statistical quality assurance even when the underlying quality checkers are not reliable. Experimental results on various benchmark applications demonstrate that it significantly outperforms existing solutions in terms of energy efficiency with quality assurance.
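As a rough sketch of the kind of Q-learning-based mode tuning the abstract mentions (the states, actions, reward shaping and toy "application" below are assumptions for illustration only, not the ApproxQA controller):

```python
# Illustrative tabular Q-learning loop for choosing an approximation mode.
# States, actions, reward shaping and the toy plant are assumptions made for
# illustration; this is not the ApproxQA two-level controller.
import random

MODES = [0, 1, 2, 3]          # 0 = exact ... 3 = most aggressive approximation
STATES = ["ok", "violation"]  # coarse quality state observed after each batch
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in STATES for a in MODES}

def run_batch(mode):
    """Toy plant: more aggressive modes save more energy but risk violations."""
    energy_saving = mode / len(MODES)
    violation = random.random() < 0.15 * mode
    return energy_saving, violation

state = "ok"
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < EPSILON:
        action = random.choice(MODES)
    else:
        action = max(MODES, key=lambda a: Q[(state, a)])
    saving, violated = run_batch(action)
    reward = saving - (1.0 if violated else 0.0)   # penalize quality violations
    next_state = "violation" if violated else "ok"
    best_next = max(Q[(next_state, a)] for a in MODES)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state

print({a: round(Q[("ok", a)], 2) for a in MODES})
```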

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-3  (Best Paper Award Candidate)
EVOAPPROX8B: LIBRARY OF APPROXIMATE ADDERS AND MULTIPLIERS FOR CIRCUIT DESIGN AND BENCHMARKING OF APPROXIMATION METHODS
Speaker:
Lukas Sekanina, Brno University of Technology, CZ
Authors:
Vojtech Mrazek, Radek Hrbacek, Zdenek Vasicek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
Approximate circuits and approximate circuit design methodologies have attracted significant attention from researchers as well as industry in recent years. In order to accelerate the approximate circuit and system design process and to support fair benchmarking of circuit approximation methods, we propose a library of approximate adders and multipliers called EvoApprox8b. This library contains 430 non-dominated 8-bit approximate adders created from 13 conventional adders and 471 non-dominated 8-bit approximate multipliers created from 6 conventional multipliers. These implementations were evolved by multi-objective Cartesian genetic programming. The EvoApprox8b library provides Verilog, Matlab and C models of all approximate circuits. In addition to standard circuit parameters, the error is given for seven different error metrics. The EvoApprox8b library is available at: www.fit.vutbr.cz/research/groups/ehw/approxlib
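A hedged sketch of how such library models are typically characterized is given below; the lower-part-OR adder is a generic stand-in, not one of the EvoApprox8b circuits, and the three metrics shown are merely examples of the kinds of error figures a library entry reports.

```python
# Exhaustively characterize an 8-bit approximate adder over all 65,536 input
# pairs. The adder below (exact upper part, bitwise OR on the low k bits) is
# a generic stand-in, not an actual EvoApprox8b circuit.
K = 3  # number of approximated low-order bits

def approx_add8(a, b, k=K):
    """Lower-part-OR adder: exact upper part, bitwise OR on the low k bits."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)
    high = ((a >> k) + (b >> k)) << k
    return high + low

errors = [abs(approx_add8(a, b) - (a + b)) for a in range(256) for b in range(256)]
n = len(errors)
print("error rate      :", sum(e != 0 for e in errors) / n)
print("mean error dist.:", sum(errors) / n)
print("worst-case error:", max(errors))
```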

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-4  (Best Paper Award Candidate)
DROOP MITIGATING LAST LEVEL CACHE ARCHITECTURE FOR STTRAM
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
Authors:
Radha Krishna Aluru1 and Swaroop Ghosh2
1University of South Florida, US; 2Pennsylvania State University, US
Abstract
Spin-Transfer Torque magnetic Random Access Memory (STT-RAM) is one of the emerging technologies in the domain of dense non-volatile memories and is especially attractive for the last-level cache (LLC). The current needed to reorient the magnetization is at present too high (~100 μA per bit), especially for the write operation. When a full cache line (512 bits) is written, this extremely high current, compared to MRAM, results in a voltage droop in the conventional cache architecture. Due to this droop, the write operation fails halfway through when writing to the bank of the cache farthest from the supply. In this paper, we propose a new cache architecture that mitigates this droop and makes the write operation successful. Instead of writing the entire 512-bit cache line contiguously in a single bank, our architecture writes these 512 bits to multiple locations across the cache in eight 64-bit parts. Detailed simulation results (both circuit-level and micro-architectural) comparing the proposed architecture against the conventional one are presented.
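A minimal sketch of the interleaving idea described above, spreading the eight 64-bit chunks of a 512-bit line over different banks, follows; the bank count and the rotation-based mapping are illustrative assumptions, not the exact placement scheme of the paper.

```python
# Illustrative interleaving of a 512-bit cache line into eight 64-bit chunks
# written to different banks, so no single bank carries the full write current.
# The bank count and mapping function are assumptions, not the paper's scheme.
NUM_BANKS = 8
CHUNKS_PER_LINE = 512 // 64

def chunk_to_bank(line_addr, chunk_idx, num_banks=NUM_BANKS):
    """Rotate the starting bank per line so consecutive chunks hit distinct banks."""
    return (line_addr + chunk_idx) % num_banks

def write_line(line_addr, data_512bit):
    for i in range(CHUNKS_PER_LINE):
        chunk = (data_512bit >> (64 * i)) & ((1 << 64) - 1)
        bank = chunk_to_bank(line_addr, i)
        print(f"line {line_addr}, chunk {i} (0x{chunk:016x}) -> bank {bank}")

write_line(5, (0xDEADBEEF << 448) | 42)
```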

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-5  MODELING INSTRUCTION CACHE AND INSTRUCTION BUFFER FOR PERFORMANCE ESTIMATION OF VLIW ARCHITECTURES USING NATIVE SIMULATION
Speaker:
Omayma Matoussi, Grenoble INP, TIMA laboratory, FR
Authors:
Omayma Matoussi1 and Frédéric Pétrot2
1TIMA Laboratory, Grenoble, FR; 2TIMA Laboratory, Grenoble Institute of Technology, FR
Abstract
In this work, we propose an icache performance estimation approach that focuses on a component necessary to handle instruction parallelism in a very long instruction word (VLIW) processor: the instruction buffer (IB). Our annotation approach is founded on an intermediate-level native-simulation framework. It is evaluated against a cycle-accurate instruction set simulator, yielding an average cycle count error of 9.3% and an average speedup of 10x.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-6  ANALOG FAULT TESTING THROUGH ABSTRACTION
Speaker:
Enrico Fraccaroli, Università degli Studi di Verona, IT
Authors:
Enrico Fraccaroli and Franco Fummi, Università degli Studi di Verona, IT
Abstract
Although analog SPICE-like simulators have reached maturity, most of them were not originally conceived for simulating faulty circuits. With the advent of smart systems, fault testing has to deal with models encompassing both analog and digital blocks. Due to their complexity, the industry still lacks effective testing approaches for these analog and mixed-signal (AMS) models; the main problem is the computational time required to run an analog fault simulation campaign. To this end, this paper presents an automatic procedure which: 1) injects faults into an analog circuit, 2) abstracts both faulty and fault-free models from the circuit level to the functional level, and 3) builds an efficient fault simulation framework. The processes of fault injection, faulty model abstraction and framework generation are reported in detail, as well as how simulation is carried out. This abstraction process, which preserves the faulty behaviors, yields a speed-up of several orders of magnitude, thus making an extensive analog fault campaign feasible.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-7  BISCC: EFFICIENT PRE THROUGH POST SILICON VALIDATION OF MIXED-SIGNAL/RF SYSTEMS USING BUILT IN STATE CONSISTENCY CHECKING
Speaker:
Abhijit Chatterjee, Georgia Institute of Technology, US
Authors:
Sabyasachi Deyati1, Barry Muldrey1 and Abhijit Chatterjee2
1Georgia Institute of Technology, US; 2Georgia Tech, US
Abstract
High levels of integration in SoCs and SoPs are making pre- as well as post-silicon validation of mixed-signal systems increasingly difficult due to: (a) the lack of automated pre- and post-silicon design checking algorithms and (b) the lack of controllability and observability of internal circuit nodes post-silicon. While digital scan chains provide observability of internal digital circuit states, analog scan chains suffer from signal integrity, bandwidth and circuit loading issues. In this paper, we propose a novel technique based on built-in state consistency checking that allows both pre- and post-silicon validation of mixed-signal/RF systems without the need to rely on manually generated checks. The method is supported by a design-for-validation (DfV) methodology which systematically inserts a minimum amount of circuitry into mixed-signal systems for design bug detection and diagnosis. The core idea is to apply two spectrally diverse stimuli to the circuit under test (CUT) in such a way that they result in the same circuit state (observed voltage/current values at internal or external circuit nodes). By comparing the resulting state values, design bugs are detected efficiently without manually generated checks. No assumption is made about the nature of the detected bugs; the applied stimuli are steered towards those most likely to expose design bugs. Test cases for both pre- and post-silicon design bug detection and diagnosis demonstrate the viability of the proposed BISCC approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-8  COMPUTING WITH NANO-CROSSBAR ARRAYS: LOGIC SYNTHESIS AND FAULT TOLERANCE
Speaker:
Mustafa Altun, Istanbul Technical University, TR
Authors:
Mustafa Altun1, Valentina Ciriani2 and Mehdi Tahoori3
1Istanbul Technical University, TR; 2University of Milan, IT; 3Karlsruhe Institute of Technology, DE
Abstract
Nano-crossbar arrays have emerged as a strong candidate technology to replace CMOS in the near future. They are regular and dense structures, and they can be fabricated such that each crosspoint can be used as a conventional electronic component such as a diode, a FET, or a switch. This is a unique opportunity that allows us to integrate well-developed conventional circuit design techniques into nano-crossbar arrays. Motivated by this, our project aims to develop a complete synthesis and performance optimization methodology for switching nano-crossbar arrays that leads to the design and construction of an emerging nanocomputer. The first two work packages of the project are presented in this paper: logic synthesis, which aims to implement Boolean functions with nano-crossbar arrays under area optimization, and fault tolerance, which aims to provide a full methodology in the presence of high fault densities and extreme parametric variations in nano-crossbar architectures.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-9  SECURECLOUD: SECURE BIG DATA PROCESSING IN UNTRUSTED CLOUDS
Speaker:
Rafael Pires, University of Neuchâtel, CH
Abstract
We present the SecureCloud EU Horizon 2020 project, whose goal is to enable new big data applications that use sensitive data in the cloud without compromising data security and privacy. For this, SecureCloud designs and develops a layered architecture that allows for (i) the secure creation and deployment of secure micro-services; (ii) the secure integration of individual micro-services to full-fledged big data applications; and (iii) the secure execution of these applications within untrusted cloud environments. To provide security guarantees, SecureCloud leverages novel security mechanisms present in recent commodity CPUs, in particular, Intel's Software Guard Extensions (SGX). SecureCloud applies this architecture to big data applications in the context of smart grids. We describe the SecureCloud approach, initial results, and considered use cases.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-10  WCET-AWARE PARALLELIZATION OF MODEL-BASED APPLICATIONS FOR MULTI-CORES: THE ARGO APPROACH
Speaker:
Steven Derrien, Universite de Rennes 1, FR
Authors:
Steven Derrien1, Isabelle Puaut2, Panayiotis Alefragis3, Marcus Bednara4, Harald Bucher5, Clément David6, Yann Debray6, Umut Durak7, Imen Fassi2, Christian Ferdinand8, Damien Hardy2, Angeliki Kritikakou2, Gerard Rauwerda9, Simon Reder5, Martin Sicks8, Timo Stripf5, Kim Sunesen9, Timon ter Braak9, Nikolaos Voros3 and Jürgen Becker5
1IRISA, FR; 2University of Rennes 1 / IRISA, FR; 3TWG, GR; 4Fraunhofer IIS, DE; 5Karlsruhe Institute of Technology, DE; 6Scilab, FR; 7DLR, DE; 8AbsInt, FR; 9Recore Systems, FR
Abstract
Parallel architectures are no longer confined to the domain of high-performance computing; they are also increasingly used in embedded time-critical systems. The ARGO H2020 project provides a programming paradigm and associated tool flow to exploit the full potential of these architectures in terms of development productivity, time-to-market, exploitation of the platform's computing power and guaranteed real-time performance. In this paper we give an overview of the objectives of ARGO and explore the challenges introduced by our approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-11  EXPLORING THE UNKNOWN THROUGH SUCCESSIVE GENERATIONS OF LOW POWER AND LOW RESOURCE VERSATILE AGENTS
Speaker:
Martin Andraud, Eindhoven University of Technology, NL
Authors:
Martin Andraud1 and Marian Verhelst2
1Eindhoven University of Technology, NL; 2Katholieke Universiteit Leuven, BE
Abstract
The Phoenix project aims to develop a new approach to exploring unknown environments, based on multiple measurement campaigns carried out by extremely tiny devices, called agents, that gather data from multiple sensors. These low-power, low-resource agents are configured specifically for each measurement campaign to achieve the exploration goal in the smallest number of iterations. Thus, the main design challenge is to make the agents as reconfigurable as possible. This paper introduces the Phoenix project in more detail and presents the first developments in the agent design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-12  POWER PROFILING OF MICROCONTROLLER'S INSTRUCTION SET FOR RUNTIME HARDWARE TROJANS DETECTION WITHOUT GOLDEN CIRCUIT MODELS
Speaker:
Falah Awwad, College of Engineering / Department of Electrical Engineering, UAE University, AE
Authors:
Faiq Khalid Lodhi1, Syed Rafay Hasan2, Osman Hasan1 and Falah Awwad3
1School of Electrical Engineering and Computer Science National University of Sciences and Technology (NUST), PK; 2Department of Electrical and Computer Engineering, Tennessee Technological University, US; 3College of Engineering, United Arab Emirates University, AE
Abstract
Globalization trends in integrated circuit (IC) design are leading to increased vulnerability of ICs to hardware Trojans (HT). Recently, several side-channel-parameter-based techniques have been developed to detect hardware Trojans, but they require a golden circuit as a reference model, and due to the widespread use of IPs, most system-on-chip (SoC) designs do not have a golden reference. Hardware Trojans in intellectual property (IP)-based SoC designs are therefore considered a major concern for future integrated circuits. Most state-of-the-art runtime hardware Trojan detection techniques presume that Trojans will lead to an anomaly in the SoC integration units. In this paper, we argue that an intelligent intruder may intrude into an IP-based SoC without disturbing normal SoC operation or violating any protocols. To overcome this limitation, we propose a methodology to extract the power profile of the micro-controller's instruction set, which is in turn used to train a machine learning algorithm. In this technique, the power profile is obtained by extracting the power behavior of the micro-controller for different assembly language instructions. The trained model is then embedded into the integrated circuit at the SoC integration level, where it classifies the power profile during runtime to detect intrusions. We applied the proposed technique to an MC8051 micro-controller in VHDL, obtained the power profile of its instruction set, and then trained models using deep learning, k-NN, decision tree and naive Bayesian machine learning tools. The cross-validation comparison of these learning algorithms, when applied to MC8051 Trojan benchmarks, shows that we can achieve 87% to 99% accuracy. To the best of our knowledge, this is the first work in which the power profile of a microprocessor's instruction set is used in conjunction with machine learning for runtime HT detection.
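A hedged sketch of the classification step follows; the synthetic feature vectors, class labels and the scikit-learn k-NN classifier are placeholders for illustration, not the MC8051 power measurements or the trained models from the paper.

```python
# Illustrative k-NN classification of per-instruction power profiles into
# "clean" vs "Trojan-infected" behavior, evaluated by cross-validation.
# The synthetic feature vectors and labels are placeholders, not MC8051 data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Each sample: mean power per instruction group (e.g., MOV, ADD, JMP, MUL).
clean = rng.normal(loc=[1.0, 1.2, 0.9, 1.5], scale=0.05, size=(200, 4))
trojan = rng.normal(loc=[1.1, 1.2, 0.95, 1.6], scale=0.05, size=(200, 4))

X = np.vstack([clean, trojan])
y = np.array([0] * len(clean) + [1] * len(trojan))  # 0 = clean, 1 = HT present

knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=5)  # cross-validation accuracy
print(f"mean cross-validation accuracy: {scores.mean():.2f}")
```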

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-13  ACCOUNTING FOR SYSTEMATIC ERRORS IN APPROXIMATE COMPUTING
Speaker:
Martin Bruestel, Technical University Dresden, DE
Authors:
Martin Bruestel1 and Akash Kumar2
1Technical University Dresden, DE; 2Technische Universitaet Dresden, DE
Abstract
Approximate computing is gaining more and more attention as a potential solution to the problem of increasing energy demand in computing. Several recent works focus on the application of deterministic approximate computing to arithmetic computations: circuits for addition and multiplication are simplified, trading exactness for energy and/or speed. Recent approximation techniques for adders focus on modifying the truth tables of individual full adders or on shortening carry chains. While the resulting error is usually characterized with statistical measures over the range of possible input/output combinations, the actual adder is a static nonlinear system with respect to arithmetic operations and signal processing. The resulting unexpected effects present a challenge for adopting approximate computing as a widespread, standard application-level optimization technique. This paper focuses on the deterministic effects of approximate multi-bit adders, which are especially evident for certain input data in otherwise well-specified systems, showing the necessity of looking beyond purely statistical measures. We show which fundamental principles are violated depending on the chosen approximation scheme, and how this choice affects practical applications. This can serve as a basis for designers to make informed decisions about the use of approximate adders at the application level.
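A minimal sketch of the point being made, namely that an approximate adder's error is a deterministic, input-dependent function rather than random noise, is shown below; the lower-part-OR adder is a generic example, not one of the paper's case studies.

```python
# The error of a deterministic approximate adder is a fixed function of its
# inputs, not noise: the same operands always produce the same error, and some
# operands are error-free. Lower-part-OR adder used as a generic example.
K = 4  # approximated low-order bits

def approx_add(a, b, k=K):
    mask = (1 << k) - 1
    return (((a >> k) + (b >> k)) << k) + ((a & mask) | (b & mask))

for a, b in [(3, 5), (3, 5), (16, 32), (7, 7)]:
    err = approx_add(a, b) - (a + b)
    print(f"{a} + {b}: approx = {approx_add(a, b)}, error = {err}")
# (3, 5) repeats with the identical error; (16, 32) has zero low bits, so the
# error is 0; the systematic under-estimation also breaks identities such as
# x + x == 2 * x that downstream signal-processing code may implicitly assume.
```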

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-14  GAUSSIAN MIXTURE ERROR ESTIMATION FOR APPROXIMATE CIRCUITS
Speaker:
Amin Ghasemazar, The University of British Columbia, CA
Authors:
Amin Ghasemazar and Mieszko Lis, University of British Columbia, CA
Abstract
In application domains where perceived quality is limited by human senses, where data are inherently noisy, or where models are naturally inexact, approximate computing offers an attractive tradeoff between accuracy and energy or performance. While several approximate functional units have been proposed to date, the question of how these techniques can be systematically integrated into a design flow remains open. Ideally, units like adders or multipliers could be automatically replaced with their approximate counterparts as part of the design flow. This, however, requires accurately modelling approximation errors to avoid compromising output quality. Prior proposals have either focused on describing errors per-bit or significantly limited estimation accuracy to reduce otherwise exponential storage requirements. When multiple approximate modules are chained, these limitations become critical, and propagated error estimates can be orders of magnitude off. In this paper, we propose an approach where both input distributions and approximation errors are modelled as Gaussian mixtures. This naturally represents the multiple sources of error that arise in many approximate circuits while maintaining reasonable memory requirements. Estimation accuracy is significantly better than prior art (up to 7.2× lower Hellinger distance) and errors can be accurately propagated through a cascade of approximate operations; estimates of quality metrics like MSE and MED are within a few percent of simulation-derived values.
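A rough, illustrative sketch of the core idea, modelling an approximate unit's error as a Gaussian mixture and deriving a quality metric from the fit, is given below; the synthetic error samples and the two-component scikit-learn fit are assumptions for illustration, not the paper's estimator or propagation machinery.

```python
# Fit a Gaussian mixture to the error samples of a hypothetical approximate
# adder and compare the MSE predicted by the model with the empirical MSE.
# Synthetic errors and the 2-component fit are illustrative assumptions only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical error behaviour: mostly small errors, occasionally a large one
# (e.g., a dropped carry), which a single Gaussian would model poorly.
errors = np.concatenate([rng.normal(-0.5, 0.3, 9000),
                         rng.normal(-8.0, 1.0, 1000)])

gmm = GaussianMixture(n_components=2, random_state=0).fit(errors.reshape(-1, 1))

# E[err^2] under the mixture: sum_k w_k * (mu_k^2 + var_k)
w = gmm.weights_
mu = gmm.means_.ravel()
var = gmm.covariances_.ravel()
mse_model = float(np.sum(w * (mu**2 + var)))
print(f"empirical MSE: {np.mean(errors**2):.3f}, mixture-model MSE: {mse_model:.3f}")
```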

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-15  (Best Paper Award Candidate)
ENHANCING SYMBOLIC SYSTEM SYNTHESIS THROUGH ASPMT WITH PARTIAL ASSIGNMENT EVALUATION
Speaker:
Kai Neubauer, University of Rostock, DE
Authors:
Kai Neubauer1, Philipp Wanko2, Torsten Schaub2 and Christian Haubelt1
1University of Rostock, DE; 2University of Potsdam, DE
Abstract
The design of embedded systems is becoming continuously more complex, such that efficient design methods are crucial for competitive results in terms of design time and performance. Recently, combined Answer Set Programming (ASP) and Quantifier-Free Integer Difference Logic (QF-IDL) solving has been shown to be a promising approach to system synthesis. However, this approach still has several restrictions limiting its applicability. In the paper at hand, we propose a novel ASP modulo theories (ASPmT) system synthesis approach, which (i) supports more sophisticated system models, (ii) tightly integrates the QF-IDL solving into the ASP solving, and (iii) makes use of partial assignment checking. As a result, more realistic systems are considered and the early exclusion of infeasible solutions improves the entire system synthesis.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-16  3DFAR: A THREE-DIMENSIONAL FABRIC FOR RELIABLE MULTICORE PROCESSORS
Speaker:
Valeria Bertacco, University of Michigan, US
Authors:
Javad Bagherzadeh and Valeria Bertacco, University of Michigan, US
Abstract
In the past decade, silicon technology trends into the nanometer regime have led to significantly higher transistor failure rates, and these trends are expected to worsen with future devices. To enhance reliability, several approaches leverage the inherent core-level and processor-level redundancy present in large chip multiprocessors. However, all of these methods incur high overheads, making them impractical. In this paper, we propose 3DFAR, a novel architecture leveraging 3-dimensional fabric layouts to efficiently enhance reliability in the presence of faults. Our key idea is a fine-grained reconfigurable pipeline for multicore processors, which minimizes the routing delay among spare units of the same type by exploiting physical layout locality and efficient interconnect switches distributed over multiple vertical layers. Our evaluation shows that 3DFAR outperforms state-of-the-art reliable 2D solutions, at a minimal area cost of only 7% over an unprotected design.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-17  EVALUATING IMPACT OF HUMAN ERRORS ON THE AVAILABILITY OF DATA STORAGE SYSTEMS
Speaker:
Hossein Asadi, Sharif University of Technology, IR
Authors:
Mostafa Kishani, Reza Eftekhari and Hossein Asadi, Sharif University of Technology, IR
Abstract
In this paper, we investigate the effect of incorrect disk replacement service on the availability of data storage systems. To this end, we first conduct Monte Carlo simulations to evaluate the availability of the disk subsystem, considering both disk failures and incorrect disk replacement service. We also propose a Markov model that corroborates the Monte Carlo simulation results, and we further extend the proposed model to account for an automatic disk fail-over policy. The results obtained with the proposed model show that overlooking the impact of incorrect disk replacement can lead to an unavailability underestimation of up to three orders of magnitude. Moreover, this study suggests that, once human errors are considered, conventional beliefs about the dependability of different RAID mechanisms should be revised: in the presence of human errors, RAID1 can result in lower availability than RAID5.
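A toy Monte Carlo sketch in the spirit of the evaluation described above follows: a two-disk RAID1 group in which a repair action may, with some probability, remove the healthy disk instead of the failed one. All rates, the repair time and the probability of incorrect replacement are exaggerated illustrative assumptions, not the paper's parameters or model.

```python
# Toy Monte Carlo availability model of a two-disk RAID1 group in which the
# repair action may, with probability P_WRONG, pull the healthy disk instead
# of the failed one. Rates are deliberately exaggerated to keep the run short;
# they are illustrative assumptions, not the paper's parameters.
import random

FAIL_PER_HOUR = 1 / 10_000    # per-disk failure rate (unrealistically high)
REPAIR_HOURS = 24             # time to replace and rebuild one disk
P_WRONG = 0.05                # probability the wrong (healthy) disk is pulled
HOURS = 2_000_000

random.seed(0)
down_hours = 0
good_disks, repair_left = 2, 0

for _ in range(HOURS):
    # independent failures of the currently working disks
    good_disks -= sum(random.random() < FAIL_PER_HOUR for _ in range(good_disks))
    if good_disks < 2 and repair_left == 0:
        repair_left = REPAIR_HOURS                    # start a replacement
        if good_disks == 1 and random.random() < P_WRONG:
            good_disks = 0                            # healthy disk pulled by mistake
    if repair_left > 0:
        repair_left -= 1
        if repair_left == 0:
            good_disks = min(2, good_disks + 1)       # one disk restored per repair
    if good_disks == 0:
        down_hours += 1                               # array unavailable this hour

print(f"estimated unavailability: {down_hours / HOURS:.2e}")
```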

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-18  GPUGUARD: TOWARDS SUPPORTING A PREDICTABLE EXECUTION MODEL FOR HETEROGENEOUS SOC
Speaker:
Björn Forsberg, ETH Zürich, CH
Authors:
Björn Forsberg1, Andrea Marongiu2 and Luca Benini3
1ETH Zürich, CH; 2Swiss Federal Institute of Technology in Zurich (ETHZ), CH; 3Università di Bologna, IT
Abstract
The deployment of real-time workloads on commercial off-the-shelf (COTS) hardware is attractive, as it reduces the cost and time-to-market of new products. Most modern high-end embedded SoCs rely on a heterogeneous design, coupling a general-purpose multi-core CPU to a massively parallel accelerator, typically a programmable GPU, sharing a single global DRAM. However, because of non-predictable hardware arbiters designed to maximize average or peak performance, it is very difficult to provide timing guarantees on such systems. In this work we present our ongoing work on GPUguard, a software technique that predictably arbitrates main memory usage in heterogeneous SoCs. A prototype implementation for the NVIDIA Tegra TX1 SoC shows that GPUguard is able to reduce the adverse effects of memory sharing, while retaining a high throughput on both the CPU and the accelerator.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-19  A NON-INTRUSIVE, OPERATING SYSTEM INDEPENDENT SPINLOCK PROFILER FOR EMBEDDED MULTICORE SYSTEMS
Speaker:
Lin Li, Infineon Technologies, DE
Authors:
Lin Li1, Philipp Wagner2, Albrecht Mayer1, Thomas Wild2 and Andreas Herkersdorf3
1Infineon Technologies, DE; 2Technical University of Munich, DE; 3TU München, DE
Abstract
Locks are widely used as a synchronization method to guarantee mutual exclusion for accesses to shared resources in multi-core embedded systems. They have been studied for years to improve performance, fairness, predictability, etc., and a variety of lock implementations optimized for different scenarios have been proposed. In practice, choosing a lock type for a specific scenario is usually based on the developer's assumptions, which may not match the actual situation; applying the wrong lock type may result in lower performance and unfairness. Thus, a lock profiling tool is needed to increase system transparency and ensure proper lock usage. In this paper, an operating-system-independent lock profiling approach is proposed, since many different operating systems are used in the embedded field. The approach detects lock acquisition and lock release using hardware tracing, based on hardware-level spinlock characteristics rather than specific libraries or APIs. Spinlocks are identified automatically, lock profiling statistics can be measured, and performance-harmful lock behaviors are detected. With this information, the lock usage can be improved by the software developer. A prototype was implemented as a Java tool to conduct hardware tracing and to analyze the locks inside applications running on Infineon AURIX microcontrollers.

Download Paper (PDF; Only available from the DATE venue WiFi)