IP3 Interactive Presentations


Date: Wednesday 21 March 2018
Time: 16:00 - 16:30
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Label, Presentation Title, Authors
IP3-1 TESTBENCH QUALIFICATION FOR SYSTEMC-AMS TIMED DATA FLOW MODELS
Speaker:
Muhammad Hassan, DFKI GmbH, DE
Authors:
Muhammad Hassan1, Daniel Grosse2, Hoang M. Le3, Thilo Voertler4, Karsten Einwich4 and Rolf Drechsler2
1Cyber Physical Systems, DFKI, DE; 2University of Bremen/DFKI GmbH, DE; 3University of Bremen, DE; 4COSEDA Technologies GmbH, DE
Abstract
Analog-Mixed Signal (AMS) circuits have become increasingly important for today's SoCs. The Timed Data Flow (TDF) model of computation available in SystemC-AMS offers a good tradeoff between accuracy and simulation speed at the system level. One of the main challenges in system-level verification is the quality of the testbench. In this paper, we present a testbench qualification approach for SystemC-AMS TDF models. Our contribution is twofold: First, we propose specific mutation models for the class of filters implemented as TDF models. This requires analyzing the Laplace transfer function of the filter design. Second, we present the mutation-based qualification approach based on the proposed specific mutations as well as standard behavioral mutations. This makes it possible to find serious quality issues in the testbench. Our experimental results for a real-world AMS system demonstrate the applicability and efficacy of our approach.

IP3-2 AN ALGEBRA FOR MODELING CONTINUOUS TIME SYSTEMS
Speaker:
José Medeiros, University of Brasília, BR
Authors:
José E. G. de Medeiros1, George Ungureanu2 and Ingo Sander2
1University of Brasília, BR; 2KTH Royal Institute of Technology, SE
Abstract
Advances in analog integrated design have led to new possibilities for complex systems combining both continuous and discrete time modules in a signal processing chain. However, this also increases the complexity any design flow needs to address in order to describe a synergy between the two domains, as the interactions between them should be better understood. We believe that a common language for describing continuous and discrete time computations is beneficial for such a goal, and a step towards it is to gain insight into and describe more fundamental building blocks. In this work we present an algebra based on the General Purpose Analog Computer, a theoretical model of computation recently updated as a continuous time equivalent of the Turing Machine.

IP3-3 TTW: A TIME-TRIGGERED WIRELESS DESIGN FOR CPS
Speaker:
Romain Jacob, ETH Zurich, CH
Authors:
Romain Jacob1, Licong Zhang2, Marco Zimmerling3, Jan Beutel1, Samarjit Chakraborty2 and Lothar Thiele1
1ETH Zurich, CH; 2Technical University of Munich, DE; 3Technische Universität Dresden, DE
Abstract
Wired fieldbuses have long been proven effective in supporting Cyber-Physical Systems (CPS). However, various domains are now striving for wireless solutions due to ease of deployment or novel functionality requiring the ability to support mobile devices. Low-power wireless protocols have been proposed in response to this need, but requirements of a large class of CPS applications can still not be satisfied. We thus propose Time-Triggered Wireless (TTW), a distributed low-power wireless system design that minimizes communication energy consumption and offers end-to-end timing predictability, runtime adaptability, reliability, and low latency. Evaluation shows a 2x reduction in communication latency and 33-40% lower radio-on time compared with DRP, the closest related work, validating the suitability of TTW for new exciting wireless CPS applications.

IP3-4 PHYLAX: SNAPSHOT-BASED PROFILING OF REAL-TIME EMBEDDED DEVICES VIA JTAG INTERFACE
Speaker:
Eduardo Chielle, New York University Abu Dhabi, AE
Authors:
Charalambos Konstantinou1, Eduardo Chielle2 and Michail Maniatakos2
1New York University, US; 2New York University Abu Dhabi, AE
Abstract
Real-time embedded systems play a significant role in the functionality of critical infrastructure. Legacy microprocessor-based embedded systems, however, have not been developed with security in mind. Applying traditional security mechanisms in such systems is challenging due to computing constraints and/or real-time requirements. Their typical 20-30 year lifespan further exacerbates the problem. In this work, we propose PHYLAX, a plug-and-play solution to detect intrusions in already installed embedded devices. PHYLAX is an external monitoring tool which does not require code instrumentation. Also, our tool adapts and prioritizes intrusion detection based on the requirements of the underlying infrastructure (power grid, chemical factory, etc.) as well as the computing capabilities of the target embedded system (CPU model, memory size, etc.). PHYLAX can be employed on any legacy device which incorporates a JTAG interface. As a case study, we present the inclusion of PHYLAX on a power grid recloser controller.

IP3-5 CHARACTERIZING DISPLAY QOS BASED ON FRAME DROPPING FOR POWER MANAGEMENT OF INTERACTIVE APPLICATIONS ON SMARTPHONES
Speaker:
Chung-Ta King, National Tsing Hua University, TW
Authors:
Kuan-Ting Ho1, Chung-Ta King1, Bhaskar Das1 and Yung-Ju Chang2
1National Tsing Hua University, TW; 2National Chiao Tung University, TW
Abstract
User-centric power management in smartphones aims to conserve power without affecting the user's perceived quality of experience (QoE). Most existing works focus on periodically updated applications such as games and video players and use a fixed frame rate, measured in frames per second (FPS), as the metric to quantify the display quality of service (QoS). The idea is to adjust the CPU/GPU frequency just enough to maintain the frame rate at a user-satisfactory level. However, when applied to aperiodically updated interactive applications, e.g. Facebook or Instagram, that draw the frame buffer at a varying rate in response to user inputs, such a power management strategy becomes too conservative. Based on real user experiments, we observe that users can tolerate a certain percentage of frame drops when running aperiodically updated applications without affecting their perceived display quality. Hence, we introduce a new metric to characterize display quality of service, called the frame drawn ratio (FDR), and propose a new CPU/GPU frequency governor based on the FDR metric. Experiments with real users show that the proposed governor can conserve 17.2% power on average when compared to the default governor, while maintaining the same or even better QoE rating.

IP3-6 PREDICTION-BASED FAST THERMOELECTRIC GENERATOR RECONFIGURATION FOR ENERGY HARVESTING FROM VEHICLE RADIATORS
Speaker:
Xue Lin, Northeastern University, US
Authors:
Hanchen Yang1, Feiyang Kang2, Caiwen Ding3, Ji Li4, Jaemin Kim5, Donkyu Baek6, Shahin Nazarian4, Xue Lin7, Paul Bogdan4 and Naehyuck Chang8
1Beijing University of Posts and Telecommunications, CN; 2Zhejiang University, CN; 3Syracuse University, US; 4University of Southern California, US; 5Seoul National University, KR; 6Korea Advanced Institute of Science and Technology, KR; 7Northeastern University, US; 8KAIST, KR
Abstract
Thermoelectric generation has increasingly drawn attention for being environmentally friendly. However, only a few prior studies on thermoelectric generators (TEGs) have focused on improving efficiency at the system level. They attempt to capture the electrical property changes in TEG modules as the temperature fluctuates on vehicle radiators. The most recent reconfiguration algorithm shows large improvements in output performance but suffers from major drawbacks in computational time and energy overhead, and from non-scalability in terms of array size and processing frequency. In this paper, we propose a novel TEG array reconfiguration algorithm that determines a near-optimal configuration within an acceptable computational time. More precisely, with O(N) time complexity, our prediction-based fast TEG reconfiguration algorithm enables all modules to work at or near their maximum power points (MPP). Additionally, we incorporate prediction methods to further reduce the runtime and switching overhead during the reconfiguration process. Experimental results show a 30% performance improvement, an almost 100x reduction in switching overhead and a 13x improvement in computational speed compared to the baseline and prior work. The scalability of our algorithm makes it applicable to larger scale systems such as industrial boilers and heat exchangers.

IP3-7 A PARAMETERIZED TIMING-AWARE FLIP-FLOP MERGING ALGORITHM FOR CLOCK POWER REDUCTION
Speaker:
Chaochao Feng, National University of Defense Technology, CN
Authors:
Chaochao Feng1, Daheng Yue1, Zhenyu Zhao1 and Zhuofan Liao2
1National University of Defense Technology, CN; 2Changsha University of Science and Technology, CN
Abstract
In modern integrated circuits, the clock power contributes a dominant part of the chip power. Clock power can be reduced effectively by utilizing multi-bit flip-flops. In this paper, a parameterized timing-aware flip-flop merging algorithm is proposed for clock power reduction. The single-bit flip-flops are merged into multi-bit flip-flops after placement & optimization and before clock network synthesis with consideration of function, scan chain information, distance and timing constraints. The algorithm can be configured with different parameters, such as the bit-number of MBFF, the setup timing margin and the distance margin. Experimental results under an industrial design show that compared with the basic design without MBFF, the design with 2-bit, 4-bit, 6-bit, and 8-bit MBFFs can save 7.5%, 12%, 11.8% and 11.1% total power consumption respectively. Using MBFF4 to replace 1-bit FFs is the best choice for the design optimization, which achieves minimum area and total power consumption. We also compare the designs with MBFF4 replacement under five different setup timing margins and distance margins. Without violating any timing constraint, it is better to set the setup timing margin as small as possible to achieve best power optimization. The distance margin (100μm, 30μm) is the best choice for this industry design to achieve minimum power consumption.

IP3-8 FAST CHIP-PACKAGE-PCB COANALYSIS METHODOLOGY FOR POWER INTEGRITY OF MULTI-DOMAIN HIGH-SPEED MEMORY: A CASE STUDY
Speaker:
Seungwon Kim, Ulsan National Institute of Science and Technology, KR
Authors:
Seungwon Kim1, Ki Jin Han2, Youngmin Kim3 and Seokhyeong Kang1
1Ulsan National Institute of Science and Technology (UNIST), KR; 2Dongguk University, KR; 3Kwangwoon University, KR
Abstract
The power integrity of high-speed interfaces is an increasingly important issue in mobile memory systems. However, because of complicated design variations such as adjacent VDD domain coupling, conventional case-specific modeling is limited in analyzing trends in results from parametric variations. Moreover, conventional industrial methods can be simulated only after the design layout is completed, and they require many back-annotation steps, which delays time to market. In this paper, we propose a chip-package-PCB coanalysis methodology applied to our multi-domain high-speed memory system model with a current generation method. Our proposed parametric simulation model can analyze the tendency of power integrity results from variable sweeps and Monte Carlo simulations, and it shows a significantly reduced runtime compared to the conventional EDA methodology under a JEDEC LPDDR4 environment.

IP3-9 APPROXIMATE HARDWARE GENERATION USING SYMBOLIC COMPUTER ALGEBRA EMPLOYING GRÖBNER BASIS
Speaker:
Saman Fröhlich, DFKI GmbH, DE
Authors:
Saman Froehlich1, Daniel Grosse2 and Rolf Drechsler2
1Cyber-Physical Systems, DFKI GmbH, DE; 2University of Bremen/DFKI GmbH, DE
Abstract
Many applications are inherently error tolerant. Approximate Computing is an emerging design paradigm that makes use of this error tolerance by trading off accuracy for performance. The behavior of a circuit can be defined at an arithmetic level by describing the input and output relation as a polynomial. Symbolic Computer Algebra (SCA) has been employed to verify that a given circuit netlist matches the behavior specified at the arithmetic level. In this paper, we present a method that relaxes the exactness requirement of the implementation. We propose a heuristic method to generate an approximation for a given netlist and use SCA to ensure that the result is within application-specific bounds for given error metrics. In addition, our approach allows for automatic generation of approximate hardware with respect to application-specific input probabilities. To the best of our knowledge, taking input probabilities, which are known for many practical applications, into account has not been considered before. We employ the proposed approach to generate approximate adders and show that the results outperform state-of-the-art, handcrafted approximate hardware.

IP3-10 RECONFIGURABLE IMPLEMENTATION OF $GF(2^M)$ BIT-PARALLEL MULTIPLIERS
Speaker and Author:
José L. Imaña, Complutense University of Madrid, ES
Abstract
Hardware implementations of arithmetic operations over binary finite fields $GF(2^m)$ are widely used in several important applications, such as cryptography, digital signal processing and error-control codes. In this paper, efficient reconfigurable implementations of bit-parallel canonical basis multipliers over binary fields generated by type II irreducible pentanomials $f(y) = y^m + y^{n+2} + y^{n+1} + y^n + 1$ are presented. These pentanomials are important because all five binary fields recommended by NIST for ECDSA can be constructed using such polynomials. In this work, a new approach for $GF(2^m)$ multiplication based on type II pentanomials is given and several post-place and route implementation results in Xilinx Artix-7 FPGA are reported. Experimental results show that the proposed multiplier implementations improve the area$\times$time parameter when compared with similar multipliers found in the literature.
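As a software illustration of the arithmetic involved, the following Python sketch multiplies two elements of $GF(2^m)$ in the canonical (polynomial) basis and reduces the product modulo a type II pentanomial. It is a plain bit-serial reference model, not the bit-parallel hardware multiplier proposed in the paper; the function names and the toy parameters $m = 8$, $n = 2$ are assumptions for illustration only. Field elements are stored as integers whose bit $i$ is the coefficient of $y^i$.

```python
def type2_pentanomial(m, n):
    """Bit mask of f(y) = y^m + y^(n+2) + y^(n+1) + y^n + 1."""
    return (1 << m) | (1 << (n + 2)) | (1 << (n + 1)) | (1 << n) | 1

def gf2m_mul(a, b, m, f):
    """Multiply a and b in GF(2^m), reducing modulo the polynomial f."""
    # Carry-less (XOR-based) polynomial multiplication.
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # Reduce: cancel every coefficient of degree >= m, highest degree first.
    for i in range(2 * m - 2, m - 1, -1):
        if (prod >> i) & 1:
            prod ^= f << (i - m)
    return prod

# Toy field GF(2^8) with f(y) = y^8 + y^4 + y^3 + y^2 + 1 (m = 8, n = 2).
F = type2_pentanomial(8, 2)
print(hex(F))                          # 0x11d
print(hex(gf2m_mul(0x02, 0x80, 8, F)))  # y * y^7 = y^8, which reduces to 0x1d
```

A model like this is mainly useful as a golden reference when checking a hardware multiplier's outputs in simulation.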

IP3-11 PROCESSING IN 3D MEMORIES TO SPEEDUP OPERATIONS ON COMPLEX DATA STRUCTURES
Speaker:
Luigi Carro, UFRGS, BR
Authors:
Paulo Cesar Santos1, Geraldo Francisco de Oliveira Junior1, Joao Paulo Lima1, Marco Antonio Zanata Alves2, Luigi Carro1 and Antonio Carlos Schneider Beck1
1UFRGS, BR; 2UFPR, BR
Abstract
Pointer chasing has been, for years, the kernel operation employed by diverse data structures, from graphs to hash tables and dictionaries. However, due to the bewildering growth in the volume of data that current applications have to deal with, pointer chasing has become a major performance and energy bottleneck because of its sparse memory access behavior. In this work, we aim to tackle this problem by taking advantage of the already available parallelism present in today's 3D-stacked memories. We present a simple mechanism that can accelerate pointer chasing operations by making use of a state-of-the-art PIM design that executes in-memory vector operations. The key idea behind our design is to run speculative loads, in parallel, based on a given memory address in a reconfigurable window of addresses. Our design can perform pointer-chasing operations on B+ trees 4.9x faster when compared to modern baseline systems. Besides that, since our device avoids data movement and alleviates the memory hierarchy's inefficiency due to poor spatial data locality, we can also reduce energy consumption by 85% when compared to the baseline.

IP3-12 AN EFFICIENT NBTI-AWARE WAKE-UP STRATEGY FOR POWER-GATED DESIGNS
Speaker:
Yu-Guang Chen, Yuan Ze University, TW
Authors:
Kun-Wei Chiu1, Yu-Guang Chen2 and Ing-Chao Lin1
1National Cheng Kung University, TW; 2Yuan Ze University, TW
Abstract
The wake-up process of a power-gated design may induce an excessive surge current and threaten the signal integrity. A proper wake-up sequence should be carefully designed to avoid surge current violations. On the other hand, PMOS sleep transistors may suffer from the negative-bias temperature instability (NBTI) effect which results in decreased driving current. Conventional wake-up sequence decision approaches do not consider the NBTI effect, which may result in a longer or unacceptable wake-up time after circuit aging. Therefore, in this paper, we propose a novel NBTI-aware wake-up strategy to reduce the average wake-up time within a circuit lifetime. Our strategy first finds a set of proper wake-up sequences for different aging scenarios (i.e. after a certain period of aging), and then dynamically reconfigures the wake-up sequences at runtime. The experimental results show that compared to a traditional fixed wake-up sequence approach, our strategy can reduce average wake-up time by as much as 45.04% with only 3.7% extra area overhead for the reconfiguration structure.

IP3-13 DESIGNING RELIABLE PROCESSOR CORES IN ULTIMATE CMOS AND BEYOND: A DOUBLE SAMPLING SOLUTION
Speaker:
Nacer-Eddine Zergainoh, TIMA, FR
Authors:
Thierry Bonnoit, Fraidy Bouesse, Nacer-Eddine Zergainoh and Michael Nicolaidis, TIMA, FR
Abstract
The double sampling paradigm is an efficient method to protect circuits against soft errors. But the data leaving the area protected by double sampling are still vulnerable. To eliminate this weakness without adding constraints on the datapaths, the most common solution adds a contaminable buffer stage between the two areas. This stage prevents potentially corrupted data from propagating further into the circuit when an error is detected in the double sampling area. The issue is that this stage must itself be protected against soft errors, which drastically increases the cost of the solution. In this paper we characterize the additional implementation constraints due to this vulnerability. We propose an architectural solution that uses three latches to remove those constraints and protect the area outside the double sampling domain without adding a buffer stage. We present an implementation of this solution on the LEON3 processor, and we compare the results in terms of additional cost and efficiency with other solutions.

IP3-14 DESIGN OF A TIME-PREDICTABLE MULTICORE PROCESSOR: THE T-CREST PROJECT
Speaker and Author:
Martin Schoeberl, Technical University of Denmark, DK
Abstract
Real-time systems need to deliver results in time and often this timely production of a result needs to be guaranteed. Static timing analysis can be used to bound the worst-case execution time of tasks. However, this timing analysis is only possible if the processor architecture is analysis friendly. This paper presents the T-CREST processor, a real-time multicore processor developed to be time-predictable and an easy target for static worst-case execution time analysis. We present how to achieve time-predictability at all levels of the architecture, from the processor pipeline, via a network-on-chip, up to the memory controller. The main architectural feature to provide time predictability is to use static arbitration of shared resources in a time-division multiplexing way.
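The static arbitration idea can be illustrated with a few lines of Python. This is a generic round-robin time-division multiplexing schedule, not the actual T-CREST arbiter: the function names, the equal slot length for every core, and the rule that a request is served at the start of the core's next slot are all assumptions made for the sketch. The point is that the waiting time depends only on the current cycle and the static schedule, never on what other cores do, so it can be bounded at analysis time.

```python
def tdm_owner(cycle, num_cores, slot_length=1):
    """Index of the core that owns the shared resource in the given cycle."""
    return (cycle // slot_length) % num_cores

def wait_until_slot(cycle, core, num_cores, slot_length=1):
    """Cycles a request issued at `cycle` waits until the core's next slot starts."""
    period = num_cores * slot_length
    slot_start = core * slot_length
    return (slot_start - cycle) % period

def worst_case_wait(num_cores, slot_length=1):
    """Upper bound on the wait: a request that just missed the start of its slot."""
    return num_cores * slot_length - 1

# Four cores, three cycles per slot: core 2 owns cycles 6..8, 18..20, ...
print(tdm_owner(7, 4, 3))           # 2
print(wait_until_slot(7, 2, 4, 3))  # 11, the request just missed the slot start
print(worst_case_wait(4, 3))        # 11
```

Because the bound is a function of the schedule alone, a static worst-case execution time analysis can add it to every shared-resource access without modeling the other cores.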

IP3-15 ERROR RESILIENCE ANALYSIS FOR SYSTEMATICALLY EMPLOYING APPROXIMATE COMPUTING IN CONVOLUTIONAL NEURAL NETWORKS
Speaker:
Muhammad Abdullah Hanif, Vienna University of Technology, Vienna, AT
Authors:
Muhammad Abdullah Hanif1, Rehan Hafiz2 and Muhammad Shafique1
1TU Wien, AT; 2ITU, PK
Abstract
Approximate computing is an emerging paradigm for error resilient applications as it leverages accuracy loss for improving power, energy, area, and/or performance of an application. The spectrum of error resilient applications includes the domains of image and video processing, Artificial Intelligence (AI) and Machine Learning (ML), data analytics, and other Recognition, Mining, and Synthesis (RMS) applications. In this work, we address one of the most challenging questions, i.e., how to systematically employ approximate computing in Convolutional Neural Networks (CNNs), which are among the most compute-intensive and pivotal parts of AI. Towards this, we propose a methodology to systematically analyze error resilience of deep CNNs and identify parameters that can be exploited for improving performance/efficiency of these networks for inference purposes. We also present a case study for significance-driven classification of filters for different convolutional layers, and propose to prune those having the least significance, thereby enabling accuracy vs. efficiency tradeoffs by exploiting their resilience characteristics in a systematic way.

IP3-16 DEMAS: AN EFFICIENT DESIGN METHODOLOGY FOR BUILDING APPROXIMATE ADDERS FOR FPGA-BASED SYSTEMS
Speaker:
Semeen Rehman, Vienna University of Technology (TU Wien), AT
Authors:
Bharath Srinivas Prabakaran1, Semeen Rehman1, Muhammad Abdullah Hanif1, Salim Ullah2, Ghazal Mazaheri3, Akash Kumar2 and Muhammad Shafique1
1TU Wien, AT; 2Technische Universität Dresden, DE; 3UC Riverside, US
Abstract
The current state-of-the-art approximate adders are mostly ASIC-based, i.e., they focus solely on gate and/or transistor level approximations (e.g., through circuit simplification or truncation) to achieve area, latency, power and/or energy savings at the cost of accuracy loss. However, when these designs are synthesized for FPGA-based systems, they do not offer similar reductions in area, latency and power/energy due to the underlying architectural differences between ASICs and FPGAs. In this paper, we present a novel generic design methodology to synthesize and implement approximate adders for any FPGA-based system by considering the underlying resources and architectural differences. Using our methodology, we have designed, analyzed and presented eight different multi-bit adder architectures. Compared to the 16-bit accurate adder, our designs are successful in achieving area, latency and power-delay product gains of 50%, 38%, and 53%, respectively. We also compare our approximate adders to state-of-the-art approximate adders specialized for ASIC and FPGA fabrics and demonstrate the benefits of our approach. We will make the RTL and behavioral models of our and state-of-the-art designs open-source at https://sourceforge.net/projects/approxfpgas/ to further fuel the research and development in the FPGA community and to ensure reproducible research.

IP3-17 GAIN SCHEDULED CONTROL FOR NONLINEAR POWER MANAGEMENT IN CMPS
Speaker:
Nikil Dutt, University of California, Irvine, US
Authors:
Bryan Donyanavard, Amir M. Rahmani, Tiago Muck, Kasra Moazzemi and Nikil Dutt, University of California, Irvine, US
Abstract
Dynamic voltage and frequency scaling (DVFS) is a well-established technique for power management of thermal- or energy-sensitive chip multiprocessors (CMPs). In this context, linear control theoretic solutions have been successfully implemented to control the voltage-frequency knobs. However, modern CMPs with a large range of operating frequencies and multiple voltage levels display nonlinear behavior in the relationship between frequency and power. State-of-the-art linear controllers therefore leave room for opportunity in optimizing DVFS operation. We propose a Gain Scheduled Controller (GSC) for nonlinear runtime power management of CMPs that simplifies the controller implementation of systems with varying dynamic properties by utilizing an adaptive control theoretic approach in conjunction with static linear controllers. Our design improves the stability, accuracy, settling time, and overshoot of the controller over a linear controller with minimal overhead. We implement our approach on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that the system's response to changes in target power is improved by 2x while operating up to 12% more efficiently.
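The general principle of gain scheduling can be sketched as follows: several fixed linear controllers are designed for different operating regions, and a scheduling variable selects which set of gains is active. The Python sketch below uses a PI controller; the class name, the choice of PI control, the scheduling variable, and all numeric gains are hypothetical and are not taken from the paper.

```python
class GainScheduledPI:
    """PI controller whose gains are looked up from a static schedule."""

    def __init__(self, schedule):
        # schedule: list of (upper_bound, kp, ki) sorted by upper_bound, where
        # upper_bound limits the scheduling variable (e.g. frequency in GHz).
        self.schedule = schedule
        self.integral = 0.0

    def gains_for(self, operating_point):
        """Pick the gains of the first region that contains the operating point."""
        for upper_bound, kp, ki in self.schedule:
            if operating_point <= upper_bound:
                return kp, ki
        # Above the last bound: fall back to the gains of the highest region.
        _, kp, ki = self.schedule[-1]
        return kp, ki

    def step(self, target, measured, operating_point, dt=1.0):
        """Return the control output for one sampling period."""
        kp, ki = self.gains_for(operating_point)
        error = target - measured
        # Accumulate the already-scaled integral term so that switching
        # regions does not cause a jump in the integral contribution.
        self.integral += ki * error * dt
        return kp * error + self.integral

# Hypothetical schedule: aggressive gains at low frequency, gentler at high.
controller = GainScheduledPI([(1.0, 2.0, 0.1), (2.0, 0.5, 0.05)])
print(controller.gains_for(0.5))
print(controller.gains_for(1.5))
```

Each region's controller stays linear and simple to design, while the lookup adapts the overall behavior to the nonlinear relationship between frequency and power.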
