Booklet Proof Reading

Goto Session:

UB01 Session 1
UB02 Session 2
UB03 Session 3
UB04 Session 4
UB05 Session 5
UB06 Session 6
UB07 Session 7
UB08 Session 8
UB09 Session 9
UB10 Session 10
UB11 Session 11
1.1 Opening Session: Plenary, Awards Ceremony & Keynote Addresses
UB01 Session 1
2.1 Executive Panel - New Opportunities in the Internet of Things
2.2 Adaptability for Low Power Computing
2.3 System Level Design Methods
2.4 Automotive Systems and Smart Energy Systems
2.5 Power of Assertions
2.6 Design and Analysis of Dependable Systems
2.7 Compilation and Code Transformations for Reconfigurable Computing
2.8 Facilities for Design and Fabrication for FD-SOI IC
UB02 Session 2
3.0 LUNCH TIME KEYNOTE SESSION: "How micro-electronic will change your life style" sponsored by Mentor Graphics
3.1 Executive Panel - Extending Moore's Law & Heterogeneous Integration
3.2 Passive Implementation Attacks and Countermeasures
3.3 Loop Acceleration
3.4 Tackling Memory Walls with Emerging Architectures and Technologies
3.5 Breaking Simulation Boundaries
3.6 Hot Topic - Memristor based Computation-in-Memory Architecture for Data-Intensive Applications
3.7 Model-based Analysis and Verification
3.8 Hot Topic - Design Methodologies for a Cyber-Physical Systems Approach to Personalized Medicine-on-a-Chip: Challenges and Opportunities
UB03 Session 3
IP1 Interactive Presentations
4.1 Executive Panel - Trends and Challenges in Today's Automotive Semiconductors
4.2 Implementation and Verification of Security Components
4.3 Multi-/Manycore Scheduling
4.4 Exploring Reliability and Efficiency Tradeoffs at the Architectural Level
4.5 Industrial Test and Validation Experiments
4.6 Online Testing and Reliable Memories
4.7 How Resilient Are Emerging Technologies?
4.8 Strength by Interdisciplinary Research: The Cadence Academic Network
UB04 Session 4
Exhibition-Reception Exhibition Reception
5.1 SPECIAL DAY Hot Topic: Applications of IoT
5.2 Hardware Trojan and Active Implementation Attacks
5.3 Variability Challenges in Nanoscale Circuits
5.4 Emerging Technologies for NoCs
5.5 Critical Embedded Systems
5.6 Analyzing and Improving Memories
5.7 Architectures and Design for Cyber-Physical Systems
5.8 Hot Topic - The Next Generation of Virtual Prototyping: Ultra-fast yet Accurate Simulation of HW/SW Systems
IP2 Interactive Presentations
UB05 Session 5
6.1 SPECIAL DAY Hot Topic: Platforms for the IoT
6.2 Physical Unclonable Functions
6.3 Emerging Low Power Techniques
6.4 Bridging the Moore's Law Gap with Application-Specific Architectures
6.5 Multimedia and Consumer Electronics
6.6 Panel - The Future of Electronics, Semiconductor, and Design in Europe
6.7 Application-Mapping Strategies for Many-Cores
6.8 Panel - The Future of Electronics, Semiconductor, and Design in Europe -> takes place as session 6.6 in Salle Bayard
UB06 Session 6
7.0 SPECIAL DAY Keynotes
UB07 Session 7
7.1 SPECIAL DAY Hot Topic: Design Tools for the IoT
7.2 Hot Topic - Trading Accuracy for Efficient Computing
7.3 Hot Topic - Advances in Hardware Trojans Detection
7.4 Routing Advances for Fault-tolerant and Multicast NoCs
7.5 System Reliability: from Runtime to Design Languages
7.6 Test Power and 3-D Fault Tolerance
7.7 Energy-efficient Computing
7.8 Critical Research Areas Driven by Industry Transformations
IP3 Interactive Presentations
UB08 Session 8
8.1 SPECIAL DAY Panel: Security and Verification for the IoT
8.2 Flash Memories & Numerical Approximation
8.3 Dynamic Thermal Management for Multi-cores
8.4 Industrial System Design Opportunities
8.5 Hot Topic - Spintronics based Computing
8.6 Statistical Answers to Analog/Mixed Signal Design and Test Problems
8.7 Compilers and Tools for Performance
8.8 Share a Fab - Multi Project Wafers Enable Your Innovations
Party DATE Party
9.1 SPECIAL DAY Hot Topic: Game-changing Innovative Technology Platforms for Health Care
9.2 Hot Topic - Transparent Use of Accelerators in Heterogeneous Computing Systems
9.3 NoC Optimization
9.4 Advanced Trends in Alternative Technologies
9.5 Modeling and Simulation of Extra-Functional Properties
9.6 Design, Synthesis and Validation of Analog Circuits
9.7 Test Generation, Fault Simulation and Diagnosis
9.8 Hot Topic - Monolithic 3D: A Path to Real 3D Integrated Chips
IP4 Interactive Presentations
UB09 Session 9
10.1 SPECIAL DAY Hot Topic: Wearable Medical Applications
10.2 Emerging Memory Architectures
10.3 Modern Architectures for Real-Time Systems
10.4 Energy Aware Data Center: Design and Management
10.5 Reconfigurable Architectures and Applications
10.6 Circuit Design and Test: From Characterization to Measurement
10.7 Expanding the Applicability of Formal Methods
10.8 From IP to EDA Tools Enterprise Management: What is so special?
UB10 Session 10
11.0 SPECIAL DAY Keynote
11.1 SPECIAL DAY Hot Topic: Implantable Medical Applications
11.2 Variability and Robustness for Emerging Technologies
11.3 Hot Topic - Multi/Many-Core Programming: Where Are We Standing?
11.4 Logic Synthesis: the Faithful, the Approximate and the Stochastic
11.5 Ultra-low Power Devices for Health and Rehabilitation
11.6 Video Architectures for Multimedia and Communications
11.7 Exploiting Dark Silicon
11.8 Exhibition Keynote - Designing Systems for the Connected Autonomous Future: An Industry Perspective
UB11 Session 11
12.8 Tutorial: An Industry Approach to FPGA/ARM System Development and Verification
IP5 Interactive Presentations
12.1 SPECIAL DAY Hot Topic: Technology and Design Platforms for Diagnostics
12.2 Solver Advances and Emerging Applications
12.3 Patterning, Pairing, Placement and Packing
12.4 High-Level Specifications and Models
12.5 New Perspectives in Next-Generation Medical Systems
12.6 Medical Design Automation: Is All That Simulation and Model Reduction Getting Into Your "Head"?
12.7 Brain Health and Mental Disorders: new challenges for electronic engineers

UB01 Session 1

Date: Tuesday 15 March 2016
Time: 10:30 - 12:30
Location / Room:

Label	Presentation Title Authors
12:30	End of session

UB02 Session 2

Date: Tuesday 15 March 2016
Time: 12:30 - 15:00
Location / Room:

Label	Presentation Title Authors
15:00	End of session

UB03 Session 3

Date: Tuesday 15 March 2016
Time: 15:00 - 17:30
Location / Room:

Label	Presentation Title Authors
17:30	End of session

UB04 Session 4

Date: Tuesday 15 March 2016
Time: 17:30 - 19:30
Location / Room:

Label	Presentation Title Authors
19:30	End of session

UB05 Session 5

Date: Wednesday 16 March 2016
Time: 10:00 - 12:00
Location / Room:

Label	Presentation Title Authors
12:00	End of session

UB06 Session 6

Date: Wednesday 16 March 2016
Time: 12:00 - 14:00
Location / Room:

Label	Presentation Title Authors
14:00	End of session

UB07 Session 7

Date: Wednesday 16 March 2016
Time: 14:00 - 16:00
Location / Room:

Label	Presentation Title Authors
16:00	End of session

UB08 Session 8

Date: Wednesday 16 March 2016
Time: 16:00 - 18:00
Location / Room:

Label	Presentation Title Authors
18:00	End of session

UB09 Session 9

Date: Thursday 17 March 2016
Time: 10:00 - 12:00
Location / Room:

Label	Presentation Title Authors
12:00	End of session

UB10 Session 10

Date: Thursday 17 March 2016
Time: 12:00 - 14:30
Location / Room:

Label	Presentation Title Authors
14:30	End of session

UB11 Session 11

Date: Thursday 17 March 2016
Time: 14:30 - 16:30
Location / Room:

Label	Presentation Title Authors
16:30	End of session

1.1 Opening Session: Plenary, Awards Ceremony & Keynote Addresses

Date: Tuesday 10 March 2015
Time: 08:30 - 10:30
Location / Room: Auditorium Dauphiné

Chair:
Wolfgang Nebel, University of Oldenburg, DE

Co-Chair:
David Atienza, EPFL, CH

Time	Label	Presentation Title Authors
08:30	1.1.1	WELCOME ADDRESSES Speakers: Wolfgang Nebel¹ and David Atienza² ¹DATE 2015 General Chair, OFFIS, DE; ²DATE 2015 Program Chair, EPFL, CH
08:45	1.1.2	PRESENTATION OF DISTINGUISHED AWARDS
09:10	1.1.3	KEYNOTE ADDRESS: Speaker: Geneviève Fioraso, secrétaire d'Etat chargée de l'Enseignement supérieur et de la Recherche (to be confirmed), FR
	1.1.4	KEYNOTE ADDRESS: EUROPEAN MICROELECTRONICS STRATEGY Speaker: Günther H. Oettinger, EU Commission, DE
09:50	1.1.5	KEYNOTE ADDRESS: ST TECHNOLOGIES FULLY ADDRESSING INTERNET OF THINGS APPLICATIONS FROM LP DIGITAL TO RF-CMOS, ENVM AND SENSORS Speaker: Jean Marc Chery, STMicroelectronics, FR
10:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

UB01 Session 1

Date: Tuesday 10 March 2015
Time: 10:30 - 12:30
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB01.1	4-LOOP: 4-CORE LEON 3 WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM Presenters: Giacomo Valente, Vittoriano Muttillo and Andrea Moro, University of L'Aquila, IT Authors: Vittoriano Muttillo and Fabio Federici, Abstract Multi-processor SoC based on soft-cores are increasing the range of applications that could be implemented by exploiting FPGAs. In this context, this demo presents a symmetric multi-processor system, composed of four Leon 3 cores and a custom Linux kernel, able to execute OpenMP-based applications and enhanced with a hardware profiling system. OpenMP support and the novel profiling system are the results of R&D activities conducted by several students and Professors at University of L'Aquila. More information ...
UB01.2	ORIENTOMA: A WEARABLE ORIENTATION SYSTEM FOR BLIND AND VISUALLY IMPAIRED PEOPLE Presenter: Giuseppe AiroFarulla, Politecnico di Torino, IT Authors: Marco Indaco and Ludovico Russo, Politecnico di Torino, IT Abstract This work aims to design and implement a low-cost wear-able orientation system able to help blind (or visually im-paired) people to walk and orienting autonomously in un-known environments. The basic principle is to create a system able to understand information from the context where the user is and to convey such information using a way that is intelligible for blind people. Such information may come from suitable sensors integrated in the user's smartphone, which represents the core of the system, and/or from other wearable devices each people is sup-posed to have, such as a smartwatch. More information ...
UB01.3	BONDCALC: THE BOND CALCULATOR Presenter: Carl Christoph Jung, Reutlingen University, DE Authors: Christian Silber¹ and Juergen Scheible² ¹Robert Bosch GmbH, DE; ²Reutlingen University, DE Abstract The Bond Calculator is a fast and exact tool to help designers to choose a bond wire, which does not fuse. The Bond Calculator is orders of magnitude faster than FEM and Easy-to-use. The Bond Calculator helps designers to estimate the temperature at the bond connection itself, by calculating the time and space dependence of the power delivered from the bond wire to the chip. These temperature changes can affect the durability of the bond connection. The Bond Calculator uses a simplified simulation model to calculate the temperature profile in a bond wire from the induced current profile. This software tool has been validated by FEM and measurement. More information ...
UB01.4	DESIGNING AND EVALUATING RESOURCE MANAGEMENT POLICIES FOR HETEROGENEOUS SYSTEM ARCHITECTURES Presenter: Gianluca Durelli, Politecnico di Milano, IT Authors: Cristiana Bolchini, Antonio Miele, Gabriele Pallotta, Marcello Pogliani and Marco Santambrogio, Politecnico di Milano, IT Abstract Current trends in computing architectures are going in the direction of heterogeneous systems (i.e. constituted by CPUs, GPUs, and FPGAs). The design space to effectively exploit these platforms is huge. Within this context, research is moving towards systems able to adapt themselves to a wide range of workloads to optimize performance/energy trade-offs. We propose a virtual platform (VP) to help designers to develop adaptive policies. The VP allows to perform an high-level evaluation of the policies with the possibility to customize both the architecture and the workload mix. More information ...
UB01.5	IMPLEMENTATIONS OF THE SEMI-GLOBAL MATCHING 3D VISION ALGORITHM FOR AUTOMOTIVE APPLICATIONS Presenter: Affaq Qamar, Politecnico di Torino, IT Author: Luciano Lavagno, Politecnico di Torino, IT Abstract The demo will show our real-time hardware implementations on a Xilinx® ZynqTM System-on-Chip of the Semi-Global Matching (SGM) algorithm, which is frequently used in stereo vision systems, e.g. for automotive applications. We will also compare the quality of results, flexibility and design time that we achieved using both High-Level Synthesis (HLS) and manual RTL design. The use of HLS is particularly promising because the automotive industry is very sensitive to production costs, hence it requires various implementations of the same algorithm, with very different resolutions, costs, and performance levels, for different target market segments. SGM mainly consists of three sequential processing steps which are, (i) cost cube calculation, (ii) path cost computation and (iii) disparity estimation and minimization. The path cost computation further involves processing of pixel wise cost cube data into eight distinct directions. The initial algorithmic "golden" model used very large arrays, which had to be mapped to an external DRAM and brought into the on-chip RAM of the FPGA on demand. This required both adding the memory transfer loops and inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Moreover, the initial single-threaded algorithm had to be parallelized, by converting the top-level sweeps of the image in eight directions into forward and backward passes. Both manual RTL and HLS designs were suitable to achieve the target real-time performance. The design space was thus explored by making several fairly different micro-architectural choices. In the end, it was possible to obtain an implementation which is comparable to the manual RTL design. The authors intend to demonstrate the FPGA based HW implementation of the SGM algorithm (upon permission from the industrial partners) and discuss the HLS flow and comparison strategy. More information ...
UB01.6	MAMMA: SPEECH ENHANCEMENT DEMO EXPLOITING MEMS MICROPHONE ARRAY FOR PEOPLE WITH DISABILITIES Presenter: Luca Sarti, University of Pisa, IT Authors: Alessandro Palla¹, Luca Fanucci¹ and Roberto Sannino² ¹University of Pisa, IT; ²STMicroelectronics, IT Abstract Disabled people, especially the ones with motor skill impairments, have difficulties in interaction with electronic devices. Indeed voice recognition could be exploited, but its performance strongly depends by the environmental noise. We propose a wearable speech enhancement system based on MEMS microphone array and an ARM Cortex M4 CPU featuring a beamforming technique and an adaptive acoustic echo cancellation filtering in order to increase SNR of acquired voice stream. An increase by 16.5 dB in the SNR is obtained when noise and voice come from opposite directions. Theoretical analysis and in-system measurements prove the effectiveness of the proposed solution. More information ...
UB01.7	WORKCRAFT: FRAMEWORK FOR INTERPRETED GRAPHS Presenter: Danil Sokolov, Newcastle University, GB Abstract Workcraft is a cross-platform framework for capture, simulation, synthesis and verification of graph models. It supports a wide range of popular graph formalisms and provides a plugin-based framework for modelling and analysis of new model types. More information ...
UB01.9	ISP RAS VERIFICATION TOOLS: INTEGRATED APPROACH TO HARDWARE VERIFICATION AT UNIT AND SYSTEM LEVELS BASED ON STATIC AND DYNAMIC METHODS Presenter: Andrei Tatarnikov, Institute for System Programming of the Russian Academy of Sciences (ISP RAS), RU Authors: Mikhail Chupilko, Alexander Kamkin, Artem Kotsynyak and Sergey Smolov, Institute for System Programming of the Russian Academy of Sciences (ISP RAS), RU Abstract Verification has long been recognized as an integral part of the hardware design process. As each hardware design is developed from unit- and core-level point of view, verification process should account this fact and provide means for dealing with both of them. Applied approaches include both static (formal methods, source code analysis) and dynamic (testing) methods. To facilitate verification, it is important to provide a uniform methodology that would allow integrating different approaches. In this work, we present a set of verification tools that takes advantage exactly of combining static and dynamic approaches. This allows knowledge sharing between tools, which helps to build more accurate models of hardware designs to be used in verification activities at different levels of abstraction. Brief descriptions of the tools are given below. MicroTESK is a reconfigurable (retargetable and extendable) model-based test program generator for microprocessors and other programmable devices. Lightweight formal specifications customize the generator for a particular architecture and provide knowledge about situations to be covered by tests. A convenient test template framework allows rapid development of complex verification scenarios. Being retargetable, MicroTESK is able to support various RISC and CISC architectures. C++TESK is an open-source C++ based toolkit intended for automated functional testing of software components (mostly in C/C++) and RTL (HDL) models of digital hardware (in Verilog and VHDL). The main part of the toolkit is a library of C++ classes and macros that define facilities for constructing formal specifications (reference models), adapters of components under test, test scenarios and test coverage metrics. Basing on C++ descriptions provided by a user, a test system is compiled. It allows automatically generating and applying sequences of stimuli to the component under test, checking correctness of its reactions and collecting statistics on test execution. Besides the basic library, the toolkit includes a report generator, means for parallelizing test execution on computer clusters, and Eclipse-based IDE. The toolkit is planned to be integrated into UVM methodology. Retrascope is an extendable toolkit for RTL (HDL) models transformation and functional verification at unit level. Analyzing source HDL-code, it extracts control and data flows, transforms them into Extended Finite State Machines (EFSM), and generates covering test sequences for them. The toolkit supports RTL modules written in VHDL and Verilog. It can be used both from command line and from Eclipse-based IDE. More information ...
12:30	End of session
13:00	Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.1 Executive Panel - New Opportunities in the Internet of Things

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Salle Oisans

Organiser:
Yervant Zorian, Fellow & Chief Architect, Synopsys, US

Billions of devices connected to the internet is not too far from today's reality. Such an Internet of Things offers advanced connectivity between built-in sensors, field operation devices, and cloud systems, covering a variety of applications, including medical, home automation, energy, transportation, environmental monitoring, etc. This results in several new approaches and innovative methods that work together to enable the network of smart devices. Executives in this session will discuss the impact of IoT on the semiconductor industry and the new opportunities it may bring in designing today's Internet of Things.

Panelists:

Mojy Chian, Silicon Cloud International, US
Christoph Heer, Intel, DE
Subramani Kengeri, GLOBALFOUNDRIES, US
Philipe Magarshack, STMicroelectronics, FR
Remy Pottier, ARM, GB
Yankin Tanurhan, Synopsys, US

13:00

End of session
Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00

2.2 Adaptability for Low Power Computing

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Belle Etoile

Chair:
Patrick Knocke, OFFIS, DE

Co-Chair:
Ruzica Jevtic, Universidad Carlos III, ES

Run-time adaptability is increasingly exploited to improve efficiency of energy-scarce systems. This however inevitably brings serious increases in system complexity to optimally control the adaptability knobs and threatens system reliability. This session groups several approaches to achieve effective run-time reconfiguration at various levels of granularity. Adaptive strategies for multi-core task allocation, NV back-up storage, PV energy harvesting and multi-domain clock gating are presented.

Time	Label	Presentation Title Authors
11:30	2.2.1	CLOCK DOMAIN CROSSING AWARE SEQUENTIAL CLOCK GATING Speakers: Mohit Kumar¹, Jianfeng Liu², Mi-Suk Hong², Kyungtae Do², JungYun Choi², Jaehong Park², Abhishek Ranjan¹, Manish Kumar¹ and Nikhil Tripathi¹ ¹Calypto Design Systems, IN; ²S.LSI, Samsung Electronics Co. Ltd., KR Abstract Power has become the overriding concern for most modern electronic applications today. To reduce clock power, which is a significant portion of the dynamic power consumed by a design, sequential clock gating is increasingly getting used over and above combinational clock gating. With the shrinking device sizes and increasingly complex designs, data is frequently transferred from one clock domain to the other. The sequential clock gating optimizations can use signals from across sequential boundaries and thus, can introduce new clock domain crossing (CDC) violations which can cause catastrophic functional issues in the fabricated chip. Hence, it has become very important that sequential clock gating optimizations be CDC aware. In this paper, we present an algorithm to handle CDC violations as part of the objective function for sequential clock gating optimizations. With the proposed algorithm, we have obtained an average of 22% sequential power savings — this is within 3% of the power savings obtained by the CDC unaware sequential clock gating. In comparison, the state-of-the-art two-pass solution is leading to an almost complete loss of power savings. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.2.2	AN ENERGY EFFICIENT BACKUP SCHEME WITH LOW INRUSH CURRENT FOR NONVOLATILE SRAM IN ENERGY HARVESTING SENSOR NODES Speakers: Hehe Li¹, Yongpan Liu¹, Qinghang Zhao¹, Guangyu Sun², Chao Zhang², Yizi Gu¹, Rong Luo¹, Huazhong Yang¹, Meng-Fan Chang³ and Xiao Sheng¹ ¹Department of Electronic Engineering, Tsinghua University, CN; ²Center for Energy-Efficient Computing and Applications, EECS, Peking University, CN; ³Department of Electrical Engineering, National Tsing Hua University, TW Abstract In modern energy harvesting sensor nodes, nonvolatile SRAM (nvSRAM) has been widely investigated as a promising on-chip memory architecture because of its zero standby power, resilience to power failures, and fast read/write operations. However, conventional approaches transfer all data from SRAM into NVM during the backup process. Thus, large on-chip energy storage capacitors are normally required. In addition, high peak inrush current is generated instantaneously, which has a negative impact on energy efficiency and circuit reliability. To mitigate these problems, we propose a novel holistic backup flow, which consists of a partial backup process and a run-time pre-writeback scheme for nvSRAM based caches. A statistics based dead-block predictor is employed to achieve a fast and low power partial backup process. We also present an adaptive pre-writeback point allocation strategy to further reduce the backup load. Simulation results show that, with our proposed backup scheme, energy storage capacitance is reduced by 34% and inrush current is reduced by 54% on average compared to the conventional full backup scheme. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.2.3	RACE TO IDLE OR NOT: BALANCING THE MEMORY SLEEP TIME WITH DVS FOR ENERGY MINIMIZATION Speakers: Chenchen Fu¹, Minming Li¹ and Jason Xue² ¹Department of Computer Science, City University of Hong Kong, HK; ²City University of Hong Kong, HK Abstract Reducing energy consumption is a critical problem in most of the computing systems today. In recent years, dynamic voltage scaling (DVS) has been often applied in the multi-core processor systems. The leakage power of the main memory shared by the multiple DVS cores is becoming a larger problem with technology scaling. This paper focuses on minimizing the system-wide energy consumption by applying DVS on each core and turning the memory to sleep when all the cores have common idle time. This work presents systematic analysis for the target problem based on different system models and task models. For tasks with common release time , optimal schemes are presented for the systems both with and without considering the static power of the cores. For the general task model, a heuristic online algorithm is proposed. Furthermore, the scheme is extended to handle the problem when the transition overhead between the active and sleep modes is not negligible. The experimental results show that the heuristic algorithm can reduce the energy consumption of the overall system by 8.73% in average (up to 28.44%) compared to a state-of-the-art multi-core DVS scheduling scheme. Download Paper (PDF; Only available from the DATE venue WiFi)
12:45	2.2.4	EVENT-DRIVEN AND SENSORLESS PHOTOVOLTAIC SYSTEM RECONFIGURATION FOR ELECTRIC VEHICLES Speakers: Xue Lin¹, Yanzhi Wang¹, Massoud Pedram¹, Jaemin Kim² and Naehyuck Chang³ ¹University of Southern California, US; ²Seoul National University, KR; ³Korea Advanced Institute of Science and Technology, KR Abstract This work investigates the problem of increasing the electrical energy generation efficiency of photovoltaic (PV) systems on electrical vehicles (EVs). Although PV power alone seems simply not sufficient to power an EV, the onboard PV system is still meaningful in mitigating the power demand of EV charging from the grid and reducing the environmental impact of EVs. The PV cell modules of an onboard PV system are mounted on the rooftop, hood, trunk, and door panels of an EV to fully make use of the vehicle surface areas. However, due to the non-uniform distribution and rapid change of solar irradiance, an onboard PV system suffers from significant efficiency degradation. To address this problem, this work borrows the dynamic PV array reconfiguration architecture in previous work with the accommodation of the rapidly changing solar irradiance in the onboard scenario. Most importantly, this work differs from previous work in that (i) we propose an event-driven PV array reconfiguration framework replacing the periodic reconfiguration framework in previous work to reduce the computation and energy overhead of the PV array reconfiguration; (ii) we provide a sensorless (and also event-driven) PV array reconfiguration framework, which further reduces the cost of a vehicular PV system, by proposing a solar irradiance estimation algorithm for obtaining the instantaneous solar irradiance level on each PV cell module. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-1, 208	HIGH-RESOLUTION ONLINE POWER MONITORING FOR MODERN MICROPROCESSORS Speakers: Fabian Oboril, Jos Ewert and Mehdi Tahoori, Karlsruhe Institute of Technology, DE Abstract The power consumption of computing systems is nowadays a major design constraint that affects performance and reliability. To co-optimize these aspects, fine-grained adaptation techniques at runtime are of growing importance. However, to use these tools efficiently, fine-grained information about the power consumption of various on-chip components at runtime is required. Therefore, here we propose a novel software-implemented high-resolution (spatial and temporal) power monitoring approach that relies on micro-models to estimate the power consumption of all microarchitectural components inside a processor core. Combined with a self-calibration technique that uses an available on-chip power sensor, our power estimation approach can achieve an accuracy of more than 99 % and provides deep insights about the power dissipation inside a processor core during workload execution. Download Paper (PDF; Only available from the DATE venue WiFi)
13:01	IP1-2, 1013	REDUCING ENERGY CONSUMPTION IN MICROCONTROLLER-BASED PLATFORMS WITH LOW DESIGN MARGIN CO-PROCESSORS Speakers: Andres Gomez¹, Christian Pinto², Andrea Bartolini³, Davide Rossi², Hamed Fatemi⁴, Jose Pineda de Gyvez⁴ and Luca Benini⁵ ¹Swiss Federal Institute of Technology in Zurich (ETHZ), CH; ²Università di Bologna, IT; ³Università di Bologna, IT / ETH Zürich, CH; ⁴NXP Semiconductors, NL; ⁵Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), IT Abstract Advanced energy minimization techniques (i.e. DVFS, Thermal Management, etc) and their high-level HW/SW requirements are well established in high-throughput multi-core systems. These techniques would have an intolerable overhead in low-cost, performance-constrained microcontroller units (MCU's). These devices can further reduce power by operating at a lower voltage, at the cost of increased sensitivity to PVT variation and increased design margins. In this paper, we propose an runtime environment for next-generation dual-core MCU platforms. These platforms complement a single-core with a low area overhead, reduced design margin shadow-processor. The runtime decreases the overall energy consumption by exploiting design corner heterogeneity between the two cores, rather than increasing the throughput. This allows the platform's power envelope to be dynamically adjusted to application-specific requirements. Our simulations show that, depending on the ratio of core to platform energy, total energy savings can be up to 20%. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.3 System Level Design Methods

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Stendhal

Chair:
Yuichi Nakamura, NEC, JP

Co-Chair:
Andreas Herkersdorf, TU München, DE

This session tackles complex system-level design problems in state-of-the-art FPGA-based designs and schedulability-critical systems. The first talk proposes a runtime system assigning multi-clock domains in FPGA-based designs for minimizing the makespan of multiple tasks. The second talk studies novel multi-cycling optimization in high-level synthesis which is driven by software profiling. The third talk presents useful schedulability analysis and formulation on execution time bound for integrated modular avionic systems. Finally two IP talks propose an automated design flow for asynchronous dataflow networks to achieve better performance and area as well as feature localization for SystemC designs.

Time	Label	Presentation Title Authors
11:30	2.3.1	ONLINE BINDING OF APPLICATIONS TO MULTIPLE CLOCK DOMAINS IN SHARED FPGA-BASED SYSTEMS Speakers: Farzad Samie¹, Lars Bauer¹, Chih-Ming Hsieh² and Joerg Henkel¹ ¹Karlsruhe Institute of Technology (KIT), DE; ²Karlsruhe Institute of Technology, DE Abstract Modern FPGA-based platforms provide multiple clock domains and their frequencies can be changed at runtime by using PLLs and clock multiplexers. This is especially beneficial for platforms that run several applications simultaneously (e.g. modern wireless sensor nodes that are shared by multiple users), as different processing modules may be fed by different clock frequencies at different time windows. However, since the number of clock domains on a platform is limited, several processing modules need to share the same clock domain. In this paper, we study the problem of binding multiple applications to multiple clock domains, such that the latest finishing time of any application (i.e. the makespan) is minimized. We present an Integer Linear Programming (ILP) formulation and then propose a novel algorithm that (i) quickly identifies those applications that are dominated by others (and thus can be ignored without losing optimality) and that (ii) uses the ascending property of the optimal binding to reduce the search space. The experimental results show up to 17% makespan reduction compared to state-of-theart. The overhead when executing on a low-power SmartFusion2 SoC equipped with an ARM Cortex-M3 core is on average 8.9 ms, i.e. our algorithm is suitable for runtime decisions. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.3.2	PROFILING-DRIVEN MULTI-CYCLING IN FPGA HIGH-LEVEL SYNTHESIS Speakers: Stefan Hadjis¹, Andrew Canis¹, Ryoya Sobue², Yuko Hara-Azumi³, Hiroyuki Tomiyama² and Jason Anderson¹ ¹University of Toronto, CA; ²Ritsumeikan University, JP; ³Tokyo Institute of Technology, JP Abstract Multi-cycling is a well-known strategy to improve performance in digital design, wherein the required time for selected combinational paths is lengthened to multiple clock cycles (rather than just one). The approach can be applied to paths associated with computations whose results are not needed immediately -- such paths are allowed multiple clock cycles to "complete" reducing the opportunity for them to form the critical path of the circuit. In this paper, we consider multi-cycling in the high-level synthesis context (HLS) and use software profiling to guide multi-cycling optimizations. Specifically, prior to HLS, we execute the program in software with typical datasets to gather data on the number of times each code segment executes. During HLS, we then extend the schedule for infrequently executed code segments and apply multi-cycling to the dilated schedules, which exhibit greater opportunities for multi-cycling. In essence, our approach ensures that non-frequently executed code segments will not form the critical path of the HLS-generated circuit. In an experimental study targeting the Altera Stratix IV FPGA, we evaluate the impact on speed performance and area for both traditional multi-cycling, as well as the proposed software profiling-driven multi-cycling, and show that profiling-driven multi-cycling leads to a geomean speedup of over 10% across 13 benchmark circuits, with some circuit speedups in excess of 30%. Circuit area is reduced by 11%, yielding a mean 20% improvement in area-delay product. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.3.3	SCHEDULABILITY BOUND FOR INTEGRATED MODULAR AVIONICS PARTITIONS Speakers: Jung-Eun Kim¹, Tarek Abdelzaher² and Lui Sha¹ ¹Department of Computer Science, University of Illinois at Urbana-Champaign, US; ²University of Illinois, US Abstract In the avionics industry, as a hierarchical scheduling architecture Integrated Modular Avionics System has been widely adopted for its isolating capability. In practice, in an early development phase, a system developer does not know much about task execution times, but only task periods and IMA partition information. In such a case the schedulability bound for a task in a given partition tells a developer how much of the execution time the task can have to be schedulable. Once the developer knows the bound, then the developer can deal with any combination of execution times under the bound, which is safe in terms of schedulability. We formulate the problem as linear programming that is commonly used in the avionics industry for schedulability analysis, and compare the bound with other existing ones which are obtained with no period information. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-3, 759	DE-ELASTISATION: FROM ASYNCHRONOUS DATAFLOWS TO SYNCHRONOUS CIRCUITS Speakers: Mahdi Jelodari Mamaghani, Jim Garside and Doug Edwards, University of Manchester, GB Abstract Whilst asynchronous VLSI programming provides a flexible abstract formalism to realise concurrent systems, the resulting performance is still an issue when adapting the flow in the industrial context. The asynchronous design paradigm provides `elasticity' which enables the system to tolerate delays in communication and computation; the drawback is that it imposes a communication overhead to the system which becomes prohibitively expensive when applied at a fine-grained level. This paper proposes a 'de-elastisation' technique in a CAD flow for asynchronous dataflow networks to improve the circuits' performance and area. To preserve the architectural advantages of asynchronous design (e.g. short cycles) the type of circuits are classified into blocking and non-blocking loops upon which our de-elastisation scheme relies. The technique is incorporated in the Teak CAD flow. Experimental results on several substantial case studies show significant performance and area improvement. This work shows 3x improvement for the first category of circuits, suitable for iterative realisations and DSP-like architectures and 4x for the second category which are suitable for concurrent realisations. Download Paper (PDF; Only available from the DATE venue WiFi)
13:01	IP1-4, 677	AUTOMATED FEATURE LOCALIZATION FOR DYNAMICALLY GENERATED SYSTEMC DESIGNS Speakers: Jannis Stoppe¹, Robert Wille¹ and Rolf Drechsler² ¹University of Bremen, DE; ²University of Bremen/DFKI GmbH, DE Abstract Due to the large complexity of today's circuits and systems, all components e.g. in a System of Chip (SoC) cannot be designed from scratch anymore. As a consequence, designers frequently work on components which they did not create themselves and, hence, design understanding becomes a critical issue. Approaches for feature localization may help here by pinpointing to distinguished characteristics of a design. However, existing approaches for feature localization of SoCs mainly focused on the Register Transfer Level; existing solutions for the Electronic System Level (using languages such as SystemC) have severe limits. In this work, we propose an approach for advanced feature localization in SystemC designs. By this, we overcome main limitations of previously proposed solutions, in particular the missing support for dynamic descriptions, while keep the proposed solution as non-intrusive as possible. The benefits of the proposed approach are confirmed by means of a case study. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.4 Automotive Systems and Smart Energy Systems

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Chartreuse

Chair:
Bart Vermeulen, NXP Semiconductors, NL

Co-Chair:
Geoff Merrett, University of Southampton, GB

This session covers energy optimisation for embedded systems and emerging automotive systems and networks, including Ethernet and IP. To create effective Ethernet-enabled automotive networks, topics including service discovery, bridging and traffic shaping are addressed.

Time	Label	Presentation Title Authors
11:30	2.4.1	(Best Paper Award Candidate) WORKLOAD UNCERTAINTY CHARACTERIZATION AND ADAPTIVE FREQUENCY SCALING FOR ENERGY MINIMIZATION OF EMBEDDED SYSTEMS Speakers: Anup Das¹, Akash Kumar², Bharadwaj Veeravalli², Rishad Shafik¹, Geoff Merrett¹ and Bashir Al-Hashimi³ ¹University of Southampton, GB; ²National University of Singapore, SG; ³University of Southampton, Abstract A primary design optimization objective for multicore embedded systems is to minimize the energy consumption of applications while satisfying their performance requirement. A system-level approach to this problem is to scale the frequency of the processing cores based on the readings obtained from the hardware performance monitors. However, performance monitor readings contain uncertainty, which becomes prominent when applications are executed in a multicore environment. This uncertainty can be attributed to factors such as cache contention and DRAM access time, that are very difficult to predict dynamically. We demonstrate that such uncertainty can be controlled to make better decision on the processor frequency in order to minimize energy consumption. To achieve this, we propose a multinomial logistic regression model, which combines probabilistic interpretation with maximum likelihood (ML) estimation to classify an incoming workload, at run-time, into a finite set of classes. Every workload class corresponds to a frequency pre-determined using an appropriate training set and results in minimum energy consumption. The classifier incorporates (1) uncertainty with arbitrary probability distribution to estimate the actual frame workload; and (2) the frequency switching overhead, neither of which are considered in any of the existing approaches. The classified frequency is applied on the processing cores to execute the workload. The proposed approach is engineered into an embedded multicore system and is validated with a set of standard multimedia applications. Results demonstrate that the proposed approach minimizes energy consumption by an average 20% as compared to the existing techniques. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.4.2	FORMAL ANALYSIS OF THE STARTUP DELAY OF SOME/IP SERVICE DISCOVERY Speakers: Jan Reinke Seyler¹, Thilo Streichert¹, Michael Glaß², Nicolas Navet³ and Jürgen Teich² ¹Daimler AG, DE; ²Friedrich-Alexander-Universität Erlangen-Nürnberg, DE; ³Université du Luxembourg, LU Abstract An automotive network needs to start up within the millisecond range. This includes the physical startup, the software boot time, and the configuration of the network. The introduction of Ethernet into the automotive industry expanded the design space drastically and is increasing the complexity of configuring every element in the network. To add more flexibility to automotive Ethernet networks, the concept of Service Discovery was migrated from consumer electronics to AUTOSAR within the SOME/IP middleware. A network is not fully functional until every client found its service. Consequently, this time interval adds to the startup time of a network. This work presents a formal analysis model to calculate the waiting time of every client to receive the first offer from its service. The model is able to determine the worst case of a given parameter set. Based on this, a method for calculating the total startup time of a system is derived. The model is implemented in a free-to-use octave program and validated by comparing the analytical results to a timing-accurate simulation and an experimental setup. In any case, the worst-case assumption holds true and the gap between the maximum of the simulation and the presented method is less than 1.3%. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.4.3	ANALYSIS OF ETHERNET-SWITCH TRAFFIC SHAPERS FOR IN-VEHICLE NETWORKING APPLICATIONS Speakers: Sivakumar Thangamuthu¹, Nicola Concer², Pieter Cuijpers³ and Johan Lukkien³ ¹NXP Semiconductors, IN; ²NXP Semiconductors, NL; ³Technische Universiteit Eindhoven, NL Abstract Switched Ethernet has been proposed as network technology for automotive and industrial applications. IEEE AVB is a collection of standards that specifies (among other elements) a set of network traffic shaping mechanisms (i.e., rules to regulate the traffic flow) to have guaranteed Quality of Service for Audio/Video traffic. However, in-vehicle control applications like advanced driver-assistance systems require much lower latencies than provided by this standard. Within the context of IEEE TSN (Time Sensitive Networking), three new traffic shaping mechanisms are considered, named Burst Limiting, Time Aware and Peristaltic shaper respectively. In this paper we explain and compare these shapers, we examine their worst case end-to-end latencies analytically and we investigate their behavior through a simulation of a particular setup. We show that the shapers hardly satisfy the requirements for 100Mbps Ethernet, but can come close under further restrictions. We also show the impact the shapers have on AVB traffic. Download Paper (PDF; Only available from the DATE venue WiFi)
12:45	2.4.4	REAL-TIME CAPABLE CAN TO AVB ETHERNET GATEWAY USING FRAME AGGREGATION AND SCHEDULING Speakers: Christian Herber, Andre Richter, Thomas Wild and Andreas Herkersdorf, Technische Universität München, DE Abstract Ethernet is a key technology to satisfy the communication requirements of future automotive embedded systems. Audio/Video Bridging (AVB) Ethernet is a set of IEEE standards that allows synchronous and time-sensitive communication. It is the favored candidate for backbone and camera applications, but is not expected to replace Controller Area Network (CAN). Instead, both have to coexist in future architectures. No research has been conducted regarding CAN to AVB gateways, and approaches for similar protocols are either not fit or inefficient. In this paper, we present a CAN to AVB Ethernet gateway that allows efficient, real-time capable forwarding. We aggregate and schedule multiple CAN frames into a single AVB Ethernet frame to minimize bandwidth requirements. We evaluate static and dynamic scheduling approaches and determine optimal gateway configurations, showing that the necessary bandwidth reservation is reduced by 72% compared to similar approaches. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-5, 51	INDUCTOR OPTIMIZATION FOR ACTIVE CELL BALANCING USING GEOMETRIC PROGRAMMING Speakers: Matthias Kauer¹, Swaminathan Narayanaswamy¹, Martin Lukasiewycz¹, Sebastian Steinhorst¹ and Samarjit Chakraborty² ¹TUM CREATE, SG; ²TU Munich, DE Abstract This paper proposes an optimization methodology for inductor components in active cell balancing architectures of electric vehicle battery packs. For this purpose, we introduce a new mathematical model to quantitatively describe the charge transfer of a family of inductor-based circuits. Utilizing worst case assumptions, this model yields a nonlinear program for designing the inductor and selecting the transfer current. In the next step, we transform this problem into a geometric program that can be efficiently solved. The optimized inductor reduces energy dissipation by at least 20% in various scenarios compared to a previous approach which selected an optimal off-the-shelf inductor. Download Paper (PDF; Only available from the DATE venue WiFi)
13:01	IP1-6, 174	LIGHTWEIGHT AUTHENTICATION FOR SECURE AUTOMOTIVE NETWORKS Speakers: Philipp Mundhenk¹, Sebastian Steinhorst¹, Martin Lukasiewycz¹, Suhaib A. Fahmy² and Samarjit Chakraborty³ ¹TUM CREATE, SG; ²School of Computer Engineering, Nanyang Technological University, SG; ³TU Munich, DE Abstract We propose a framework to bridge the gap between secure authentication in automotive networks and on the internet. Our proposed framework allows runtime key exchanges with minimal overhead for resource-constrained in-vehicle networks. It combines symmetric and asymmetric cryptography to establish secure communication and enable secure updates of keys and software throughout the lifetime of the vehicle. For this purpose, we tailor authentication protocols for devices and authorization protocols for streams to the automotive domain. As a result, our framework natively supports multicast and broadcast communication. We show that our lightweight framework is able to initiate secure message streams over 15 times faster than conventional frameworks, for the first time meeting the real-time requirements of automotive networks. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.5 Power of Assertions

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Meije

Chair:
Franco Fummi, University of Verona, IT

Co-Chair:
Pablo Sanchez, University of Cantabria, ES

Assertions play a critical role in verification of hardware systems. This session is focusing on new applications of assertions in a wide variety of validation scenarios such as faster bug localization and post-silicon validation as well as reusing properties across abstraction levels.

Time	Label	Presentation Title Authors
11:30	2.5.1	AUTOMATIC EXTRACTION OF ASSERTIONS FROM EXECUTION TRACES OF BEHAVIOURAL MODELS Speakers: Alessandro Danese, Tara Ghasempouri and Graziano Pravadelli, University of Verona, IT Abstract Several approaches exist for specification mining of hardware designs. Most of them work at RTL and they extract assertions in the form of temporal relations between Boolean variables. Other approaches work at system level (e.g., TLM) to mine assertions that specify the behaviour of the communication protocol. However, these techniques do not generate assertions addressing the design functionality. Thus, there is a lack of studies related to the automatic mining of assertions for capturing the functionality of behavioural models, where logic expressions among more abstracted (e.g., numeric) variables than bits and bit vectors are necessary. This paper is intended to fill in the gap, by proposing a tool for automatic extraction of temporal assertions from execution traces of behavioural models by adopting a mix of static and dynamic techniques. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.5.2	A METHODOLOGY FOR AUTOMATED DESIGN OF EMBEDDED BIT-FLIPS DETECTORS IN POST-SILICON VALIDATION Speakers: Pouya Taatizadeh and Nicola Nicolici, McMaster University, CA Abstract Post-silicon validation is concerned with detecting design errors that escape to silicon prototypes and need to be fixed before committing to high-volume manufacturing. Electrical errors are particularly difficult to catch during the pre-silicon phase because of the insufficient accuracy of device models, which is often traded-off against simulation time. This challenge is further aggravated by the rising number of voltage domains, especially if subtle errors are excited in unique electrical states. Since these electrically-induced subtle errors most commonly manifest in the logic domain as bit-flips, to the best of our knowledge there are no systematic methods to design embedded hardware monitors for generic logic blocks that can detect bit-flips with low detection latency. Toward this goal, we propose a methodology that relies on design assertions that are ranked based on their potential to detect bit-flips and subsequently mapped into user-constrained embedded hardware monitors with the aim to increase bit-flip coverage estimate. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.5.3	DATA MINING DIAGNOSTICS AND BUG MRIS FOR HW BUG LOCALIZATION Speakers: Monica Farkash¹, Bryan Hickerson² and Balavinayagam Samynathan¹ ¹University of Texas at Austin, US; ²IBM, US Abstract This paper addresses the challenge of minimizing the time and resources required to localize bugs in HW dynamic functional verification. Our diagnostics solution eliminates the need to back trace from point of failure to its origin, decreasing the overall debugging time. The proposed solution dynamically analyses data extracted from sets of passing and failing tests to identify behavior discrepancies, which it expresses as source code statements, coverage events and timing during simulation. It also provides a visual diagnostic support, an image of the behavior discrepancies in time which we call a Machine Reasoning Image (MRI). This paper describes in detail our data mining solution based on coverage data, HDL hierarchies and time analysis of coverage events. Our approach brings a data mining solution to the problem of HW bug localization. It defines new concepts, provides in-depth analysis, presents supporting algorithms, and shows actual results on archetypical problems from PowerPC core verification as an industrial application. Download Paper (PDF; Only available from the DATE venue WiFi)
12:45	2.5.4	RTL PROPERTY ABSTRACTION FOR TLM ASSERTION-BASED VERIFICATION Speakers: Nicola Bombieri¹, Riccardo Filippozzi¹, Graziano Pravadelli¹ and Francesco Stefanni² ¹University of Verona, IT; ²EDALab s.r.l., IT Abstract Different techniques and commercial tools are at the state of the art to reuse existing RTL IP models to generate more abstract (i.e., TLM) IP implementations for system-level design. In contrast, reusing, at TLM, an assertion-based verification (ABV) environment originally developed for an RTL IP is still an open problem. The lack of an effective and efficient solution forces verification engineers to shoulder a time consuming and error-prone manual re-definition, at TLM, of existing assertion libraries. This paper is intended to fill in the gap by presenting a technique to automatically abstract properties defined for RTL IPs and to create dynamic ABV environments for the corresponding TLM models. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-7, 970	MINIMIZING THE NUMBER OF PROCESS CORNER SIMULATIONS DURING DESIGN VERIFICATION Speakers: Michael Shoniker, Bruce Cockburn, Jie Han and Witold Pedrycz, University of Alberta, CA Abstract Integrated circuit designs need to be verified in simulation over a large number of process corners that represent the expected range of transistor properties, supply voltages, and die temperatures. Each process corner can require substantial simulation time. Unfortunately, the required number of corners has been growing rapidly in the latest semiconductor technologies. We consider the problem of minimizing the required number of process corner simulations by iteratively learning a model of the output functions in order to confidently estimate key maximum and/or minimum properties of those functions. Depending on the output function, the required number of corner simulations can be reduced by factors of up to 95%. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.6 Design and Analysis of Dependable Systems

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Bayard

Chair:
Arne Hamann, Robert Bosch GmbH, DE

Co-Chair:
Viacheslav Izosimov, Semcon/KTH, SE

This section introduces new methods for uncertainty-aware reliability analysis and soft error vulnerability estimation as well as techniques for error recovery in safety-critical systems and security attacks through the JTAG port

Time	Label	Presentation Title Authors
11:30	2.6.1	LOW-COST CHECKPOINTING IN AUTOMOTIVE SAFETY-RELEVANT SYSTEMS Speakers: Carles Hernandez and Jaume Abella, Barcelona Supercomputing Center (BSC-CNS), ES Abstract The use of checkpointing and roll-back recovery (CRR) schemes is common practice to increase the likelihood of a task completing with the correct result despite the presence of faults. However, the use of CRR mechanisms is challenging in the severely constrained design space of safety-relevant embedded systems, such as those controlling critical functions in the automotive domain. CRR schemes introduce non-negligible time and memory overheads that may jeopardize the feasibility of their implementation. In this paper we propose a low-cost checkpointing mechanism suitable for safety-relevant embedded systems deploying light-lockstep architectures. The proposed checkpointing mechanism increases the reliability of the system while keeping timing and memory overhead low enough. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.6.2	UNCERTAINTY-AWARE RELIABILITY ANALYSIS AND OPTIMIZATION Speakers: Faramarz Khosravi, Malte Müller, Michael Glaß and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE Abstract Due to manufacturing tolerances and aging effects, future embedded systems have to cope with unreliable components. The intensity of such effects depends on uncertain aspects like environmental or usage conditions such that highly safety-critical systems are pessimistically designed for worst-case mission profiles. In this work, we propose to explicitly model the uncertain characteristics of system components, i.e. we model components using reliability functions with parameters distributed between a best and worst case. Since destructive effects like temperature may affect several components simultaneously (e.g. those in the same package), a correlation between uncertainties of components exists. The proposed uncertainty-aware method combines a formal analysis approach and a Monte Carlo simulation to consider uncertain characteristics and their different correlations. It delivers a holistic view on the system's reliability with best/worst/average-case behavior but also insights on variance and quantiles. However, existing optimization approaches typically assume design objectives to be single values or follow a predefined distribution known as noise. As a remedy, we propose a dominance criterion for meta-heuristic optimization approaches like evolutionary algorithms that enables the comparison of system implementations with arbitrarily distributed characteristics. Our presented experimental results show that (a) the proposed analysis comes at low overhead while capturing existing uncertainties with sufficient accuracy, and (b) the optimization process is significantly enhanced when guiding the search process by additional aspects like variance and the 95% quantile, delivering better system implementations as found by uncertainty-oblivious optimization approaches. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.6.3	EFFICIENT SOFT ERROR VULNERABILITY ESTIMATION OF COMPLEX DESIGNS Speakers: Shahrzad Mirkhani¹, Subhasish Mitra², Chen-Yong Cher³ and Jacob Abraham⁴ ¹University of Texas at Austin, US; ²Stanford University, US; ³IBM Research, US; ⁴University of Texas, US Abstract Analyzing design vulnerability for soft errors has become a challenging process in large systems with a large number of memory elements. Error injection in a complex system with a sufficiently large sample of error candidates for reasonable accuracy takes a large amount of time. In this paper we describe RAVEN, a statistical method to estimate the outcomes of a system in the presence of soft errors injected into flip-flops, as well as the vulnerability for each memory element. This method takes advantage of fast local simulations for each error injection, and calculates the probabilities for the system outcomes for every possible soft error in a period of time. Experimental results, on an out-of-order processor with SPECINT2000 workloads, show that RAVEN is an order of magnitude faster compared with traditional error injection while maintaining accuracy. Download Paper (PDF; Only available from the DATE venue WiFi)
12:45	2.6.4	DETECTION OF ILLEGITIMATE ACCESS TO JTAG VIA STATISTICAL LEARNING IN CHIP Speakers: Xuanle Ren¹, Vítor Grade Tavares² and Shawn Blanton¹ ¹Carnegie Mellon University, US; ²Faculdade de Engenharia da Universidade do Porto, PT Abstract IEEE 1149.1, commonly known as the joint test action group (JTAG), is the standard for the test access port and the boundary-scan architecture. The JTAG is primarily utilized at the time of the integrated circuit (IC) manufacture but also in the field, giving access to internal sub-systems of the IC, or for failure analysis and debugging. Because the JTAG needs to be left intact and operational for use, it inevitably provides a "backdoor" that can be exploited to undermine the security of the chip. Potential attackers can then use the JTAG to dump critical data or reverse engineer IP cores, for example. Since an attacker will use the JTAG differently from a legitimate user, it is possible to detect the difference using machine-learning algorithms. A JTAG protection scheme, SLIC-J, is proposed to monitor user behavior and detect illegitimate accesses to the JTAG. Specifically, JTAG access is characterized using a set of specifically-defined features, and then an on-chip classifier is used to predict whether the user is legitimate or not. To validate the effectiveness of the approach, both legitimate and illegitimate JTAG accesses are simulated using the OpenSPARC T2 benchmark. The results show that the detection accuracy is 99.2%, and the escape rate is 0.8%. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-8, 24	AN APPROXIMATE VOTING SCHEME FOR RELIABLE COMPUTING Speakers: Ke Chen¹, Jie Han² and Fabrizio Lombardi¹ ¹Northeastern University, US; ²University of Alberta, CA Abstract This paper relies on the principles of inexact computing to alleviate the issues arising in static masking by voting for reliable computing. A scheme that utilizes approximate voting is proposed; it is referred to as inexact double modular redundancy (IDMR). IDMR does not resort to triplication, thus saving overhead due to modular replication; moreover, this scheme is adaptive in its operation, i.e., it allows a threshold to determine the validity of the module outputs. IDMR operates by initially establishing the difference between the values of the outputs of the two modules; only if the difference is below a preset threshold, then the voter calculates the average value of the two module outputs. An extensive analysis of the voting circuits and an application to image processing are presented. Download Paper (PDF; Only available from the DATE venue WiFi)
13:01	IP1-9, 278	FLINT: LAYOUT-ORIENTED FPGA-BASED METHODOLOGY FOR FAULT TOLERANT ASIC DESIGN Speakers: Rochus Nowosielski, Lukas Gerlach, Stephan Bieband, Guillermo Paya-Vaya and Holger Blume, Leibniz Universität Hannover, Institute of Microelectronic Systems, DE Abstract Research of efficient fault tolerance techniques for digital systems requires insight into the fault propagation mechanism inside the ASIC design. Radiation, high temperature, or charge sharing effects in ultra-deep submicron technologies influence fault generation and propagation dependent on die location. The proposed methodology links efficient fault injection to fault propagation in the floorplan view of a standard cell ASIC. This is achieved by instrumentation of the gate netlist after place&route, emulation in an FPGA system and experiment control via interactive user interface. Further, automated fault injection campaigns allow exhaustive fault tolerance evaluations taking single faults as well as adjacent cell faults into account. The proposed methodology can be used to identify vulnerable cell nodes in the design and allow the classification of placement strategies of fault tolerant ASIC designs. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.7 Compilation and Code Transformations for Reconfigurable Computing

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Les Bans

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Marco Platzner, University of Paderborn, DE

This session presents techniques for efficient compilation to CGRAs and a code transformation approach to enhance embedded system security.

Time	Label	Presentation Title Authors
11:30	2.7.1	JOINT AFFINE TRANSFORMATION AND LOOP PIPELINING FOR MAPPING NESTED LOOP ON CGRA Speakers: Shouyi YIN¹, Dajiang Liu¹, Leibo Liu², Shaojun Wei¹ and Yike Guo³ ¹Tsinghua University, CN; ²Institute of Microelectronics and The National Lab for Information Science and Technology, Tsinghua University, CN; ³Imperial College, London, UK, GB Abstract Coarse-Grained Reconfigurable Architectures (CGRAs) are the promising architectures with high performance, high power- efficiency and attractions of flexibility. The computation-intensive portions of application, i.e. loops, are often implemented on CGRAs for acceleration. The loop pipelining techniques are usually used to exploit the parallelism of loops. However, for nested loops, the existing loop pipelining methods often result in poor hardware utilization and low execution performance. To tackle this problem, this paper makes two contributions: 1) a pipelining-beneficial affine transformation method which can optimize the initiation interval (II) of nested loop and enable multiple loop pipelines merging; 2) a multi-pipeline merging method which can improve hardware utilization further. The experimental results show that our approach can improve the performance of nested loop by up to 56% on average, as compared to the state-of-the-art techniques. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.7.2	PATH SELECTION BASED ACCELERATION OF CONDITIONALS IN CGRAS Speakers: Shri Hari Rajendran Radhika, Aviral Shrivastava and Mahdi Hamzeh, Arizona State University, US Abstract Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achieving high performance at low power consumption. While CGRAs can efficiently accelerate loop kernels, accelerating loops with control flow (loops with if-then-else structures) is quite challenging. Existing techniques use predication to handle control flow execution - in which they execute operations from both the paths, but commit only the result of operations from the path taken by branch at run time. However, this results in increased resource usage and therefore poor mapping and lower acceleration. The state-of-the- art dual issue scheme fetches instructions from both the paths, but executes only the ones from the correct path but this scheme has an overhead in instruction fetch bandwidth. In this paper, we propose a solution in which after resolving the branching condition, we fetch and execute instructions only from the path taken by branch. Experimental results show that our solution achieves 34.6% better performance and 52.1% lower energy consumption on an average compared to state of the art dual issue scheme without imposing any overhead in instruction fetch bandwidth. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.7.3	HARDWARE-ASSISTED CODE OBFUSCATION FOR FPGA SOFT MICROPROCESSORS Speakers: Meha Kainth, Lekshmi Krishnan, Chaitra Narayana, Sandesh Virupaksha and Russell Tessier, University of Massachusetts, US Abstract Soft microprocessors are vital components of many embedded FPGA systems. As the application domain for FPGAs expands, the security of the software used by soft processors increases in importance. Although software confidentiality approaches (e.g. encryption) are effective, code obfuscation is known to be an effective enhancement that further deters code understanding for attackers. The availability of specialization in FPGAs provides a unique opportunity for code obfuscation on a per-application basis with minimal hardware overhead. In this paper we describe a new technique to obfuscate soft microprocessor code which is located outside the FPGA chip in an unprotected area. Our approach provides customizable, data-dependent control flow modification to make it difficult for attackers to easily understand program behavior. The application of the approach to three benchmarks illustrates a control flow cyclomatic complexity increase of about 7x with a modest logic overhead for the soft processor. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-10, 97	A UNIFIED HARDWARE/SOFTWARE MPSOC SYSTEM CONSTRUCTION AND RUN-TIME FRAMEWORK Speakers: Sam Skalicky¹, Andrew Schmidt², Matthew French² and Sonia Lopez¹ ¹Rochester Institute of Technology, US; ²USC/ISI, US Abstract With the continual enhancement of heterogeneous resources in FPGA devices, utilizing these resources becomes a challenging burden for developers. Especially with the inclusion of sophisticated multiple processor system-on-chips, the necessary skill set to effectively leverage these resources spans both hardware and software expertise. The maturation of high level synthesis tools and programming languages aim to alleviate these complexities, yet there still exist systematic gaps that must be bridged to provide a more cohesive hardware/software development environment. High level MPSoC design initiatives such as Redsharc have reduced the costs of entry, simplifying application implementation. We propose a unified hardware/software framework for system construction, leveraging Redsharc's APIs, efficient on-chip interconnects, and run-time controllers. We present system level abstractions that enable compilation and implementation tools for hardware and software to be merged into a single configurable system development environment. Finally, we demonstrate our proposed framework with Redsharc, using AES encryption/decryption spanning software implementations on ARM and MicroBlaze processors and hardware kernels. Download Paper (PDF; Only available from the DATE venue WiFi)
13:01	IP1-11, 15	(AS)^2: ACCELERATOR SYNTHESIS USING ALGORITHMIC SKELETONS FOR RAPID DESIGN SPACE EXPLORATION Speakers: Shakith Fernando¹, Mark Wijtvliet¹, Cedric Nugteren¹, Akash Kumar² and Henk Corporaal³ ¹Eindhoven University of Technology, NL; ²National University of Singapore, SG; ³TU/e (Eindhoven University of Technology), NL Abstract Hardware accelerators in heterogeneous multiprocessor system-on-chips are becoming popular as a means of meeting performance and energy efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Therefore, time consuming manual iterations are required to explore efficient accelerator alternatives: the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)^2: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithms' functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels), enabling FPGA specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs, for the accelerators, can be efficiently and rapidly synthesized. We show that (AS)^2 is a promising direction for accelerator synthesis as it generates a Pareto front of 8 design points in under half an hour, for each of the three image processing kernels. Download Paper (PDF; Only available from the DATE venue WiFi)
13:02	IP1-12, 646	ASSISTED GENERATION OF FRAME CONDITIONS FOR FORMAL MODELS Speakers: Philipp Niemann, Frank Hilken, Martin Gogolla and Robert Wille, University of Bremen, DE Abstract Modeling languages such as UML or SysML allow for the validation and verification of the structure and the behavior of designs even in the absence of a specific implementation. However, formal models inherit a severe drawback: Most of them hardly provide a comprehensive and determinate description of transitions from one system state to another. This problem can be addressed by additionally specifying so-called frame conditions. However, only naive "workarounds" based on trivial heuristics or completely relying on a manual creation have been proposed for their generation thus far. In this work, we aim for a solution which neither leaves the burden of generating frame conditions entirely on the designer (avoiding the introduction of another time-consuming and expensive design step) nor is completely automatic (which, due to ambiguities, is not possible anyway). For this purpose, a systematic design methodology for the assisted generation of frame conditions is proposed. Download Paper (PDF; Only available from the DATE venue WiFi)
13:03	IP1-13, 1052	TOWARDS A META-LANGUAGE FOR THE CONCURRENCY CONCERN IN DSLS Speakers: Julien Deantoni¹, Papa Issa Diallo², Ciprian Teodorov², Joel Champeau² and Benoit Combemale³ ¹I3S, University of Nice Sophia Antipolis, FR; ²Lab-STICC - ENSTA Bretagne, FR; ³IRISA, Universty of Rennes1, FR Abstract Concurrency is of primary interest in the development of complex software-intensive systems, as well as the deployment on modern platforms. Furthermore, Domain-Specific Languages (DSLs) are increasingly used in industrial processes to separate and abstract the various concerns of complex systems. % However, reifying the definition of the DSL concurrency remains a challenge. This not only prevents leveraging the concurrency concern of a particular domain or platform, but it also hinders: (1) the development of a complete understanding of the DSL semantics; (2) the effectiveness of concurrency-aware analysis techniques; (3) the analysis of the deployment on parallel architectures. % In this paper, we introduce the key ideas leading toward MoCCML, a dedicated meta-language for formally specifying the concurrency concern within the definition of a DSL. The concurrency constraints can reflect the knowledge in a particular domain, but also the constraints of a particular platform. MoCCML comes with a complete language workbench to help a DSL designer in the definition of the concurrency directly within the concepts of the DSL itself, and a generic workbench to simulate and analyze any model conforming to this DSL. % MoCCML is illustrated on the definition of an lightweight extension of SDF (Synchronous Data Flow). Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

2.8 Facilities for Design and Fabrication for FD-SOI IC

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Salle Lesdiguières

Organiser:
Ahmed Jerraya, CEA-Leti, FR

Moderator:
Carlo Reita, CEA-Leti, FR

Panelists:
Gerd Teepe, GLOBALFOUNDRIES, DE
Patrick Blouet, STMicroelectronics, FR
Olivier Thomas, CEA-Leti, FR

FD-SOI technology enables low cost and energy efficient designs best suited for today consumer, IoT and automotive applications, in continuity with traditional planar technologies simpler to design and manufacture with. The talks will illustrate the availability of a full FD-SOI technology ecosystem, encompassing IC fabrication, IP availability and design experiences.

Time	Label	Presentation Title Authors
11:30	2.8.1	FD-SOI TECHNOLOGY ROADMAP Panelist: Carlo Reita, CEA-Leti, FR
11:45	2.8.2	FOUNDRY SERVICES FOR FD-SOI Panelist: Gerd Teepe, GLOBALFOUNDRIES, DE
12:00	2.8.3	FD-SOI DESIGN AND IP ECOSYSTEM Panelist: Patrick Blouet, STMicroelectronics, FR
12:15	2.8.4	INDUSTRIAL FD-SOI MPW AND GRENOBLE IC DESIGN CENTER Panelist: Olivier Thomas, CEA-Leti, FR
12:30	2.8.5	DISCUSSION
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

UB02 Session 2

Date: Tuesday 10 March 2015
Time: 12:30 - 15:00
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB02.1	STRNG: A SELF-TIMED RING BASED TRUE RANDOM NUMBER GENERATOR WITH MONITORING AND ENTROPY ASSESSMENT Presenter: Abdelkarim Cherkaoui, TIMA, FR Authors: Laurent Fesquet¹, Viktor Fischer² and Alain Aubert² ¹TIMA, FR; ²LaHC, FR Abstract The Self-timed ring based True Random Number Generator (STRNG) leverages the jitter of events propagating in a self-timed ring to generate provably random binary sequences. Several implementations in FPGAs and in CMOS design flows have shown the feasability of this generator in digital technologies, and also confirmed that it can provide high quality random bit sequences that pass the standard statistical test batteries at rates as high as 200 Mbit/s. Following AIS31 recommandations for the design and evaluation of TRNGs, the security of this generator is based primarily on an entropy assessment obtained by modeling the entropy extraction and measuring the entropy source. Secondly, the generator is protected against active attacks by monitoring its behavior in real-time or on demand. In this demonstration, we illustrate this approach in an Altera Cyclone III implementation of the STRNG. We show how the design is configurated depending on the measurement of the entropy source (the jitter magnitude) in order to guarantee a given minimum entropy rate per output bit. Then, we emulate physical attacks on the generator by willingly manipulating its internal structure in order to demonstrate how the entropy monitoring can detect abnormal behaviors and send the appropriate alarms. More information ...
UB02.2	PARLOMA: A REMOTE COMMUNICATION SYSTEM FOR DEAFBLIND PEOPLE Presenter: Ludovico Orlando Russo, Politecnico di Torino, IT Authors: Giuseppe Airò Farulla¹, Marco Indaco¹, Calogero Maria Oddo², Daniele Pianu³, Paolo Prinetto¹, Stefano Rosa¹ and Ludovico Orlando Russo¹ ¹Politecnico di Torino, IT; ²Scuola Superiore Sant'Anna, The Biorobotics Institute, IT; ³CNR, IEIIT, IT Abstract This work aims at designing a low-cost communication system to allow remote communication among deafblind people, up to now impossible. Due to their lacking of both the auditory and the visive channel, deafblind people can receive feedbacks and mes-sages only resorting on hand-in-hand communication and only from speakers physically located near them. Such limitation ag-gravates their situation and cause deafblind people to live behind a wall of isolation from active society. PARLOMA aims at breaking this wall, developing a tool that can be used by deafblind people to communicate whenever they want and wherever they are. More information ...
UB02.3	BONDCALC: THE BOND CALCULATOR Presenter: Carl Christoph Jung, Reutlingen University, DE Authors: Christian Silber¹ and Juergen Scheible² ¹Robert Bosch GmbH, DE; ²Reutlingen University, DE Abstract The Bond Calculator is a fast and exact tool to help designers to choose a bond wire, which does not fuse. The Bond Calculator is orders of magnitude faster than FEM and Easy-to-use. The Bond Calculator helps designers to estimate the temperature at the bond connection itself, by calculating the time and space dependence of the power delivered from the bond wire to the chip. These temperature changes can affect the durability of the bond connection. The Bond Calculator uses a simplified simulation model to calculate the temperature profile in a bond wire from the induced current profile. This software tool has been validated by FEM and measurement. More information ...
UB02.4	HIPER-NIRGAM: A TOOL CHAIN BASED FRAMEWORK FOR MODELLING THERMAL - AWARE RELIABILITY ESTIMATION IN 2D MESH NOCS Presenter: Ashish Sharma, Malaviya National Institute of Technology, Jaipur, IN Authors: Manoj Singh Gaur¹, Lava Bhargava¹, Vijay Laxmi¹ and Mark Zwolinski² ¹Malaviya National Institute of Technology, Jaipur, IN; ²University of Southampton, GB Abstract Every three years, power density in system-on-chip (SoCs) gets doubled. As the semiconductor technology is scaling, the number of cores and interconnect network connections are increasing. To improve system performance while meeting permissible power limits, Chip-Multi Processors (CMPs) and many-core processors have emerged as an appealing solution. One of the significant aspects of many-core design is an on chip interconnect network that can effectively support intra-core and inter-core communications. This interconnect should be scalable, support high communication bandwidth and multiple concurrent connections among cores. Network-on-chip (NoC) replaces the traditional bus based interconnect architecture as former is scalable, has higher bandwidth, fault tolerance and offers parallelism. Regular NoC topologies improve scalability too. Adaptive NoC routing solutions distribute power densities and delay onset of hotspot creation. With ever-growing demand of computation and communication bandwidth by applications, the system designer need to consider and address resultant power and thermal issues in SoC as well as NoC design. Design tools need to incorporate thermal effects in design and evaluation of prototypes. Abstract--- Regional temperature differential and hotspots are two thermal problems in network-on-chip. On-chip thermal problems have an adverse impact on system performance and reliability. We propose creation of a toolchain based framework for incorporating thermal evaluation of NoC through existing simulation tools. Our proposed framework provides an integration of NoC simulator with power and thermal simulation models for analyzing the thermal hotspots and can be used for thermal-aware reliability estimation. In our framework, reliability estimation is based on life time failure models such as TDDB (Time dependent dielectric breakdown), NBTI (Negative bias temperature instability) and SM (Stress Migration). In our proposed reliability measurement is based on MTTF (Mean time to failure) comparative value. Our tool chain consists NIRGAM as a NoC simulator, NoC configuration parameters such as number of virtual channel, buffer size, routing logic, simulation cycles and application traffic are passed to power models (Orion 2.0 and McPAT). Power models provide the power trace and area of given NoC configuration. The power model results are further used in Hotspot 5.02 [HOTSPOT] thermal simulation model for generating floorplan and temperature trace (steady temperature file). The steady temperature trace used in reliability estimation tool REST [REST_tool] to estimating MTTF vales. Abstract--- We believe that this generic framework can be used by researchers on academia and industry to incorporate thermal-aware reliability estimation in their design exploration. More information ...
UB02.5	REAL-TIME MULTIPROCESSOR COMPILER DEMO: COMPILER FOR REAL-TIME MULTIPROCESSOR SYSTEMS WITH SHARED ACCELERATORS Presenter: Marco Bekooij, University of Twente, NL Authors: Guus Kuiper, Stefan Geuns, Philip Wilmanns, Joost Hausmans and Marco Bekooij, University of Twente, NL Abstract Accelerators are added in real-time multiprocessor systems for power-efficiency improvement and cost reduction. Sharing of these accelerators improves their utilization but without tool support it also complicates programming. This demonstration shows a multiprocessor compiler for a real-time multiprocessor system that contains support for the sharing of hardware accelerators. The capabilities of this compiler are demonstrated by mapping a packet based GMSK receiver application onto this multiprocessor system. The multiprocessor system is implemented on a Xilinx Virtex-6 FPGA to which an RF front-end is connected. This multiprocessor system contains 16 Microblaze processors and 5 accelerators. With this system a real-time digital audio stream is received and demodulated. More information ...
UB02.6	ISIS: CUSTOMIZABLE RUNTIME VERIFICATION OF HARDWARE/SOFTWARE VIRTUAL PLATFORMS Presenter: Laurence Pierre, TIMA, FR Author: Martial Chabot, TIMA, FR Abstract Debugging today's hardware/software embedded systems is a complex process. We have previously described our tool, ISIS, that enables the runtime Assertion-Based Verification (ABV) of temporal requirements for high-level (SystemC TLM) models of such systems. We present here an extended version of the tool, that gives the user the possibility to customize and to optimize the verification process. More information ...
UB02.8	VHDL TO SYSTEMC TRANSLATION AND ABSTRACTION: SYSTEMC MANIPULATION FRAMEWORK: FROM RTL VHDL TO OPTIMIZED TLM SYSTEMC Presenter: Syed Saif Abrar, Tallinn University of Technology, EE Authors: Syed Saif Abrar, Valentin Tihhomirov, Maksim Jenihhin and Jaan Raik, Tallinn University of Technology, EE Abstract We propose a novel framework for SystemC manipulation based on the open-source hardware design and analysis environment zamiaCAD. The framework provides for optimized VHDL-to-SystemC translation and subsequent abstraction to higher-levels. It includes an Eclipse-based front-end and is integrated to a comprehensive environment for RTL VHDL analysis, simulation and debug. More information ...
UB02.9	ID.FIX: AN EDA TOOL FOR FIXED-POINT REFINEMENT OF EMBEDDED SYSTEMS Presenter: Olivier Sentieys, INRIA, FR Authors: Daniel Menard¹ and Nicolas Simon² ¹INSA Rennes, FR; ²INRIA, FR Abstract Most of digital image and signal processing algorithms are implemented into architectures based on fixed-point arithmetic to satisfy cost and power consumption constraints associated with most of embedded and cyber-physical systems. The fixed-point conversion process (or refinement) is crucial for reducing the time-to-market and design tools to automate this phase and to explore the design space are still lacking. The ID.Fix EDA tool, based on the compiler infrastructure GECOS, allows for the conversion of a floating-point C source code into a C code using fixed-point data types. The data word-lengths are optimized by minimizing the implementation cost under accuracy constraint. To achieve low optimization time, an analytical approach is used to evaluate the fixed-point computation accuracy. This approach is valid for systems made-up of any smooth arithmetic operations. Commercial tools can then be used to synthesize the architecture or to perform software compilation from the output fixed-point description of the application. Thus, the goal is to bridge the gap between the floating-point description developed by algorithm designer and the fixed-point description use as input for high-level synthesis or compilation tools. More information ...
15:00	End of session
16:00	Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.0 LUNCH TIME KEYNOTE SESSION: "How micro-electronic will change your life style" sponsored by Mentor Graphics

Date: Tuesday 10 March 2015
Time: 13:20 - 14:20
Location / Room: Salle Oisans

Chair:
Jean-Marie Saint-Paul, Mentor, FR

Microelectronics is opening new vistas in our lives by changing the way we interact with our environment. Speech recognition, Drones and Robots are not only advanced research topics, these are surrounding our daily activities, sharing our lives and may induce a much larger transformation of our life style than we could have imagined 10 years ago. This session will bring together high profile products and vision to show the ongoing transformation.

Time	Label	Presentation Title Authors
13:20	3.0.1	NEW LIFE STYLES BEYOND YOUR DREAMS Speaker: Thierry Collette, CEA-Leti, FR
13:30	3.0.2	DRONES THAT FLY FOR YOU Speaker: Nicolas Besnard, Parrot, FR
13:50	3.0.3	ROBOTS THAT LIVE WITH YOU Speaker: Rodolphe Gelin, Aldebaran, FR
14:20		End of session
16:00		Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.1 Executive Panel - Extending Moore's Law & Heterogeneous Integration

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Salle Oisans

Organiser:
Yervant Zorian, Fellow & Chief Architect, Synopsys, US

Systemic scaling in today's new applications is dramatically impacting the semiconductor industry. As a result, certain applications are moving to new advanced semiconductor nodes and others are adopting heterogeneous integration using multi-die modules. In addition to the technical challenges in each case, these solutions significantly affect the dependency between eco-system players necessitating smooth interdependency between them. The executives in this session will discuss the solutions in the semiconductor industry and their impact on the eco system players.

Panelists:

Antun Domic, Synopsys, US
Rudy Lauwereins, IMEC, BE
Didier Tettart, TSMC Europe, NL

16:00

End of session
Coffee Break in Exhibition Area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00

3.2 Passive Implementation Attacks and Countermeasures

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Belle Etoile

Chair:
Philippe Maurine, CEA-TECH, FR

Co-Chair:
Francesco Regazzoni, AlaRI, CH

Passive implementation attacks are a major security threat for embedded systems. This session focuses on improving side-channel analysis considering reliable key extraction, information leakage of static power consumption, and processor instructions. It also presents a monitor to defeat write-back attacks on caches.

Time	Label	Presentation Title Authors
14:30	3.2.1	RELIABLE INFORMATION EXTRACTION FOR SINGLE TRACE ATTACKS Speakers: Valentina Banciu, Elisabeth Oswald and Carolyn Whitnall, University of Bristol, GB Abstract Side-channel attacks using only a single trace crucially rely on the capability of reliably extracting side-channel information (e.g. Hamming weights of intermediate target values) from traces. In particular, in original versions of simple power analysis (SPA) or algebraic side channel attacks (ASCA) it was assumed that an adversary can correctly extract the Hamming weight values for all the intermediates used in an attack. Recent developments in error tolerant SPA style attacks relax this unrealistic requirement on the information extraction and bring renewed interest to the topic of template building or training suitable machine learning classifiers. In this work we ask which classifiers or methods, if any, are most likely to return the true Hamming weight among their first (say s) ranked outputs. We experiment on two data sets with different leakage characteristics. Our experiments show that the most suitable classifiers to reach the required performance for pragmatic SPA attacks are Gaussian templates, Support Vector Machines and Random Forests, across the two data sets that we considered. We found no configuration that was able to satisfy the requirements of an error tolerant ASCA in case of complex leakage. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	3.2.2	SCANDALEE: A SIDE-CHANNEL-BASED DISASSEMBLER USING LOCAL ELECTROMAGNETIC EMANATIONS Speakers: Daehyun Strobel, Florian Bache, David Oswald, Falk Schellenberg and Christof Paar, Horst Görtz Institute for IT-Security, Ruhr-University Bochum, DE Abstract Side-channel analysis has become a well-established topic in the scientific community and industry over the last one and a half decade. Somewhat surprisingly, the vast majority of work on side-channel analysis has been restricted to the "use case" of attacking cryptographic implementations through the recovery of keys. In this contribution, we show how side-channel analysis can be used for extracting code from embedded systems based on a CPU's electromagnetic emanation. There are many applications within and outside the security community where this is desirable. In cryptography, it can, e.g., be used for recovering proprietary ciphers and security protocols. Another broad application field is general security and reverse engineering, e.g., for detecting IP violations of firmware or for debugging embedded systems when there is no debug interface or it is proprietary. A core feature of our approach is that we take localized electromagnetic measurements that are spatially distributed over the IC being analyzed. Given these multiple inputs, we model code extraction as a classification problem that we solve with supervised learning algorithms. We apply a variant of linear discriminant analysis to distinguish between the multiple classes. In contrast to previous approaches, which reported instruction recognition rates between 40-70%, our approach detects more than 95% of all instructions for test code, and close to 90% for real-world code. The methods are thus very relevant for use in practice. Our method performs dynamic code recognition, which has both advantages (only the program parts that are actually executed are observed) but also limitations (rare code executions are difficult to observe). Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	3.2.3	SIDE-CHANNEL ATTACKS FROM STATIC POWER: WHEN SHOULD WE CARE? Speakers: Santos Merino del Pozo¹, Francois-Xavier Standaert¹, Dina Kamel¹ and Amir Moradi² ¹UCL Crypto Group, BE; ²Ruhr University Bochum, DE Abstract Static power consumption is an increasingly important concern when designing circuits in deep submicron technologies. Besides its impact for low-power implementations, recent research has investigated whether it could lead to exploitable side-channel leakages. Both simulated analyses and measurements from FPGA devices have confirmed that such a static signal can indeed lead to successful key recoveries. In this respect, the main remaining question is whether it can become the target of choice for actual adversaries, especially since it has smaller amplitude than its dynamic counterpart. In this paper, we answer this question based on actual measurements taken from an AES S-box prototype chip implemented in a 65-nanometer CMOS technology. For this purpose, we first provide a fair comparison of the static and dynamic leakages in a univariate setting, based on worst-case information theoretic analysis. This comparison confirms that the static signal is significantly less informative than the dynamic one. Next, we extend our evaluations to a multivariate setting. In this case, we observe that simple averaging strategies can be used to reduce the noise in static leakage traces. As a result, we mainly conclude that (a) if the target chip is working at maximum clock frequency (which prevents the previously mentioned averaging), the static leakage signal remains substantially smaller than the dynamic one, so has limited impact, and (b) if the adversary can reduce the clock frequency, the noise of the static leakage traces can be reduced arbitrarily. Whether the static signal leads to more informative leakages than the dynamic one then depends on the quality of the measurements (as the former one has very small amplitude). But it anyway raises a warning flag for the implementation of algorithmic countermeasures such as masking, that require high noise levels. Download Paper (PDF; Only available from the DATE venue WiFi)
15:45	3.2.4	EXTRAX: SECURITY EXTENSION TO EXTRACT CACHE RESIDENT INFORMATION FOR SNOOP-BASED EXTERNAL MONITORS Speakers: Jinyong Lee¹, Yongje Lee², Hyungon Moon¹, Ingoo Heo¹ and Yunheung Paek¹ ¹Seoul National University, KR; ²Seoul National University, Samsung Electronics Co., Ltd., KR Abstract Advent of rootkits has urged researchers to conduct much research on defending the integrity of OS kernels. Even though recently proposed snoop-based monitors have shown to provide higher performance and security level compared to conventional hypervisor-based monitors, we discovered that the use of write-back caches in a system would seriously undermine the effectiveness of snoop-based monitors. To address the problem, we propose a special hardware unit called Extrax which makes use of existing hardware logic, core debugging interface, to extract necessary information for security monitoring. Being implemented to refine the debug information for security purposes, Extrax assists snoop-based monitors to detect attacks that exploit write-back caches. Experimental results show that our system can detect more advanced attacks, which the state-of-the-art snoop-based hardware monitors cannot capture, with moderate area overhead and power consumption. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.3 Loop Acceleration

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Stendhal

Chair:
Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

Co-Chair:
Benjamin Schafer, Hong Kong Polytechnic University, HK

This session reveals novel loop optimization techniques in high-level synthesis for resolving area overhead and communication bottlenecks in nested loops and/or multidimensional arrays. The first talk leverages loop-array dependencies for loop partitioning to reduce the dimension of the design space in order to ease the design complexity. The second talk quantifies a relationship between loop unrolling and partitioning, based on which area reduction methods are proposed by controlling the degree of loop unrolling. The third talk then resolves communication bottlenecks in embedded accelerators through inter-tile data reuse on loop optimizations.

Time	Label	Presentation Title Authors
14:30	3.3.1	EXPLOITING LOOP-ARRAY DEPENDENCIES TO ACCELERATE THE DESIGN SPACE EXPLORATION WITH HIGH LEVEL SYNTHESIS Speakers: Nam Khanh Pham¹, Amit Kumar Singh², Akash Kumar³ and Mi Mi Aung Khin⁴ ¹ECE Department, National University of Singapore, SG; ²University of York, GB; ³National University of Singapore, SG; ⁴Data Storage Institute (DSI), ASTAR, Singapore., SG Abstract* Recently, the requirement of shortened design cycles has led to rapid development of High Level Synthesis (HLS) tools that convert system level descriptions in a high level language into efficient hardware designs. Due to the high level of abstraction, HLS tools can easily provide multiple hardware designs from the same behavioral description. Therefore, they allow designers to explore various architectural options for different design objectives. However, such exploration has exponential complexity, making it practically impossible to explore the entire design space. The conventional approaches to reduce the design space exploration (DSE) complexity do not analyze the structure of the design space to limit the number of design points. To fill such a gap, we explore the structure of the design space by analyzing the dependencies between loops and arrays. We represent these dependencies as a graph that is used to reduce the dimensions of the design space. Moreover, we also examine the access pattern of the array and utilize it to find the efficient partition of arrays for each loop optimization parameter set. The experimental results show that our approach provides almost the same quality of result as the exhaustive DSE approach while significantly reducing the exploration time with an average of speed-up of 14x. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	3.3.2	INTERPLAY OF LOOP UNROLLING AND MULTIDIMENSIONAL MEMORY PARTITIONING IN HLS Speakers: Alessandro Cilardo and Luca Gallo, University of Naples Federico II, IT Abstract This paper deals with memory partitioning in the context of high-level synthesis for FPGA technologies. In particular, the work focuses on the area overhead caused by partitioning and sheds light on the interplay with a technique commonly used in HLS, i.e., loop unrolling. As a practical outcome, the study proposes a solution to reduce the area overhead by appropriately controlling the degree of loop unrolling. The experimental results confirm the significance of the analysis as well as the effectiveness of the proposed optimization technique. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	3.3.3	INTER-TILE REUSE OPTIMIZATION APPLIED TO BANDWIDTH CONSTRAINED EMBEDDED ACCELERATORS Speakers: Maurice Peemen, Bart Mesman and Henk Corporaal, Eindhoven University of Technology, NL Abstract The adoption of High-Level Synthesis (HLS) tools has significantly reduced accelerator design time. A complex scaling problem that remains is the data transfer bottleneck. To scale-up performance accelerators require huge amounts of data, and are often limited by interconnect resources. In addition, the energy spent by the accelerator is dominated by the transfer of data, either in the form of memory references or data movement on interconnect. In this paper we drastically reduce accelerator communication by exploration of computation reordering and local buffer usage. Consequently, we present a new analytical methodology to optimize nested loops for inter-tile data reuse with loop transformations like interchange and tiling. We focus on embedded accelerators that can be used in a multi-accelerator System on Chip (SoC), so performance, area, and energy are key in this exploration. 1) On three common embedded applications in the image/video processing domain (demosaicing, block matching, object detection), we show that our methodology reduces data movement up to 2.1x compared to the best case of intra-tile optimization. 2) We demonstrate that our small accelerators (1-3% FPGA resources) can boost a simple MicroBlaze soft-core to the performance level of a high-end Intel-i7 processor. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.4 Tackling Memory Walls with Emerging Architectures and Technologies

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Chartreuse

Chair:
Akash Kumar, NUS, SG

Co-Chair:
Cristina Silvano, Politecnico di Milano, IT

This session focuses on various aspects of memory system design including non-volatile reconfigurable cache design, cache directory design, shared DRAM access, and writeback policies for stacked-die caches.

Time	Label	Presentation Title Authors
14:30	3.4.1	SELECTDIRECTORY: A SELECTIVE DIRECTORY FOR CACHE COHERENCE IN MANY-CORE ARCHITECTURES Speakers: Yuan Yao¹, Guanhua Wang², Zhiguo Ge³, Tulika Mitra², Wenzhi Chen¹ and Naxin Zhang³ ¹Zhejiang University, CN; ²National University of Singapore, SG; ³Huawei, SG Abstract As we move into many-core era fueled by Moore's Law, it has become unprecedentedly challenging to provide the shared memory abstraction through directory-based cache coherence. The main difficulty is the high area and power overhead of the directory in tracking the presence of a memory block in all the private caches. Sparse directory offers relatively better design trade-offs by decoupling the coherence meta-data from the last-level cache (LLC); but still suffers from high area/power issues. In this work, we propose a compact directory design by exploiting the observation that a significant fraction of the memory blocks are temporarily exclusive in the cache hierarchy and hence only needs minimal sharer information. Inspired by this observation, we propose to further decouple the tag array from the coherence meta-data array in the sparse directory and allocate a sharer list only for the actively shared blocks. Experimental results reveal that our proposal, called SelectDirectory, can substantially save directory storage area and energy without sacrificing performance. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	3.4.2	DYRECTAPE: A DYNAMICALLY RECONFIGURABLE CACHE USING DOMAIN WALL MEMORY TAPES Speaker: Rangharajan Venkatesan, NVIDIA Corporation, US Authors: Ashish Ranjan¹, Shankar Ganesh Ramasubramanian¹, Rangharajan Venkatesan², Vijay Pai¹, Kaushik Roy¹ and Anand Raghunathan¹ ¹Purdue University, US; ²NVIDIA Corporation, US Abstract Spintronic memories offer superior density, non-volatility and ultra-low standby power compared to CMOS memories, and have consequently attracted great interest in the design of on-chip caches. Domain Wall Memory (DWM) is a spintronic memory technology with unparalleled density arising from a tape-like structure. However, such a structure involves serialized access to the bits stored in each bit-cell, resulting in increased access latency, and thereby degrading performance. Prior efforts address this challenge either by limiting the number of bits per tape, in effect sacrificing the density benefits of DWM, or through cache management policies that can only partly alleviate the shift overhead. We observe that there exists significant heterogeneity in sensitivity to cache capacity and access latency across different applications, and across distinct phases of an application. We also make the key observation that DWM tapes offer a natural mechanism to tradeoff density for access latency by limiting the number of domains of each tape that are actively used to store cache data. Based on this insight, we propose DyReCTape, a dynamically reconfigurable cache that packs maximum bits per tape and leverages the intrinsic capability of DWMs to modulate the active bits per tape with minimal overhead. DyReCTape uses a history-based reconfiguration policy that tracks the number of shift operations incurred and miss rate to appropriately tailor the capacity and access latency of the DWM cache. We further propose two performance optimizations to DyReCTape: (i) a lazy migration policy to mitigate the overheads of reconfiguration, and (ii) re-use of the portion of the cache that is unused (due to reconfiguration) as a victim cache to reduce the number of off-chip accesses. We evaluate DyReCTape using applications from the PARSEC and SPLASH benchmark suites. Our experiments demonstrate that DyReCTape achieves 19.8% performance improvement over an iso-area SRAM cache and 11.7% performance improvement (due to a 3.4X reduction in the number of shifts) over a state-of-the-art DWM cache. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	3.4.3	COOPERATIVELY MANAGING DYNAMIC WRITEBACK AND INSERTION POLICIES IN A LAST-LEVEL DRAM CACHE Speakers: Shouyi YIN¹, Jiakun Li¹, Leibo Liu², Shaojun Wei¹ and Yike Guo³ ¹Tsinghua University, CN; ²Institute of Microelectronics and The National Lab for Information Science and Technology, Tsinghua University, CN; ³Imperial College, London, UK, GB Abstract Stacked-DRAM used as the last-level caches (LLC) in multi-core systems delivers performance enhancement due to its capacity benefit. While the performance of LLC depends heav- ily upon its block replacement policy, conventional replacement policy needs redesigning to exploit the best of DRAM cache while avoiding its drawbacks. Existing DRAM cache insertion policy blindly forwards victim lines to the off-chip memory, regardless of the potential for increased hits by placing a fraction of them in the DRAM cache; nevertheless, a na ̈ıve design that steers all dirty victims to the DRAM cache introduces excessive writeback traffic which aggravates capacity misses and DRAM interference. To leverage insertions in terms of writeback or fill requests, we propose a cooperative writeback and insertion policy that adapts to the distinct access patterns of heterogeneous applications based on runtime misses and writeback efficiency, thereby increasing HMIPC (harmonic instruction per cycle) throughput by 22.2%, 13.7% and 14.5% compared to LRU and two static writeback policies. Download Paper (PDF; Only available from the DATE venue WiFi)
15:45	3.4.4	A GENERIC, SCALABLE AND GLOBALLY ARBITRATED MEMORY TREE FOR SHARED DRAM ACCESS IN REAL-TIME SYSTEMS Speakers: Manil Dev Gomony¹, Jamie Garside², Benny Akesson³, Neil Audsley² and Kees Goossens¹ ¹Eindhoven University of Technology, NL; ²University of York, GB; ³Czech Technical University in Prague, CZ Abstract Predictable arbitration policies, such as Time Division Multiplexing (TDM) and Round-Robin (RR), are used to provide firm real-time guarantees to clients sharing a single memory resource (DRAM) between the multiple memory clients in multi-core real-time systems. Traditional centralized implementations of predictable arbitration policies in a shared memory bus or interconnect are not scalable in terms of the number of clients. On the other hand, existing distributed memory interconnects are either globally arbitrated, which do not offer diverse service according to the heterogeneous client requirements, or locally arbitrated, which suffers from larger area, power and latency overhead. Moreover, selecting the right arbitration policy according to the diverse and dynamic client requirements in reusable platforms requires a generic re-configurable architecture supporting different arbitration policies. The main contributions in this paper are: (1) We propose a novel generic, scalable and globally arbitrated memory tree (GSMT) architecture for distributed implementation of several predictable arbitration policies. (2) We present an RTL-level implementation of Accounting and Priority assignment (APA) logic of GSMT that can be configured with five different arbitration policies typically used for shared memory access in real-time systems. (3) We compare the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process. Our experiments show that with 64 clients GSMT can run up to four times faster than traditional architectures and have over 51% and 37% reduction in area and power consumption, respectively. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.5 Breaking Simulation Boundaries

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Meije

Chair:
Elena Ioana Vatajelu, Politecnico di Torino, IT

Co-Chair:
Florian Letombe, Synopsys, FR

Faster, faster, faster .... that's all you expect when you are simulating your designs. This session takes you through a journey of super fast simulation techniques at different abstraction levels.

Time	Label	Presentation Title Authors
14:30	3.5.1	VARIATION-AWARE EVALUATION OF MPSOC TASK ALLOCATION AND SCHEDULING STRATEGIES USING STATISTICAL MODEL CHECKING Speakers: Mingsong Chen¹, Daian Yue¹, Xiaoke Qin², Xin Fu³ and Prabhat Mishra² ¹East China Normal University, CN; ²University of Florida, US; ³University of Houston, US Abstract To maximize the overall performance yield, variation-aware analysis is becoming a key step in Multiprocessor System-on-Chip (MPSoC) Task Allocation and Scheduling (TAS). Although various approaches have been investigated to improve performance yields, most of them cannot perform quantitative comparison among existing TAS heuristics, which is important for MPSoC designers to make decisions. Based on the statistical model checker UPPAAL-SMC, we propose a framework that can automatically evaluate the performance yield of TAS strategies under time and power constraints with variations. Experimental results show that our approach can not only filter inferior strategies efficiently, but also support the automated tuning of architecture and constraint parameters to achieve the required performance yield. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	3.5.2	A FAST PARALLEL SPARSE SOLVER FOR SPICE-BASED CIRCUIT SIMULATORS Speaker: Hehe Li, Department of Electronic Engineering, Tsinghua University, CN Authors: Xiaoming Chen, Yu Wang and Huazhong Yang, Tsinghua University, CN Abstract The sparse solver is a serious bottleneck in SPICEbased circuit simulators. Although several existing researches have proposed some circuit simulation-oriented parallel solvers, there is still some room to improve the speed and scalability of these solvers. This paper proposes a fast parallel sparse solver based on a pivoting-reduction technique which takes full advantage of features of circuit simulation. Experimental results show that on average, the proposed solver is up to 50% faster than the state-of-the-art solver NICSLU, and up to 3.3X faster than KLU. Real DC simulation reveals that our solver is faster than NICSLU, PARDISO, and commercial solvers. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	3.5.3	MRP: MIX REAL CORES AND PSEUDO CORES FOR FPGA-BASED CHIP-MULTIPROCESSOR SIMULATION Speakers: Xinke Chen¹, Guangfei Zhang², Huandong Wang³, Ruiyang Wu¹, Peng Wu¹ and Longbing Zhang¹ ¹Institute of Computing Technology, CAS, CN; ²Shannon Laboratory, Huawei Technologies Co., Ltd, CN; ³Loongson Technology Corporation Limited, CN Abstract Facing the speed bottleneck of software-based simulators, FPGA-based simulation has been explored more and more. This paper proposes a novel methodology to simulate a chip-multiprocessor (CMP) on the limited FPGA resource. By mixing real cores and pseudo cores together (MRP), we can simulate a multicore system with fewer FPGA resource requirements and achieve a much higher simulation speed. We propose several methods to construct the pseudo cores. We implement our idea on a dual Virtex-6 FPGA board to simulate a general-purpose 4-core high performance CMP processor. Comparison experiments against the corresponding tape-out chip prove the effectiveness of MRP. We also evaluate MRP prototype's performance by running SPEC CPU2006 benchmarks on an unmodified Linux operating system, achieving tens to hundreds speedup compared to two other commonly-used simulators. Download Paper (PDF; Only available from the DATE venue WiFi)
15:45	3.5.4	SOURCE LEVEL PERFORMANCE SIMULATION OF GPU CORES Speakers: Christoph Gerum¹, Oliver Bringmann² and Wolfgang Rosenstiel¹ ¹University of Tuebingen, DE; ²University of Tuebingen / FZI, DE Abstract Graphic processing units (GPUs) contain a lot of complex architectural features, which make performance analysis and simulation of applications using them for general purpose computation very difficult. Especially when trying to do performance simulations at a higher abstraction level than interpreted instruction set simulators these features are not handled accurately by state of the art simulation techniques. This paper proposes a method for source level performance simulation of the microarchitecture of a GPU core that provides high enough simulation speeds to make testing of large application scenarios possible. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP1-14, 595	FAST AND ACCURATE BRANCH PREDICTOR SIMULATION Speakers: Antoine Faravelon, Nicolas Fournel and Frédéric Pétrot, TIMA Laboratory, Université de Grenoble-Alpes/CNRS, FR Abstract Embedded processors complexity has raised dramatically, due to the addition of architectural add-ons which improve performances significantly. High level models used in system simulation usually ignore these additions as the major issue is functional correctness. However, accurate estimates of software execution is sometimes required, therefore we focus in this paper on one of theses architectural features, the branch predictor. Unfortunately, advanced branch predictors use large tables, and a direct implementation of the scheme slows down simulation dramatically. To limit the simulation overhead, we define a modeling approach that we demonstrate on a state-of-the art predictor. We implemented the model in a dynamic binary translation based instruction set simulator and measured an accuracy of prediction of about 95% for a run-time overhead inferior to 5%. Download Paper (PDF; Only available from the DATE venue WiFi)
16:01	IP1-15, 605	COMPARATIVE STUDY OF TEST GENERATION METHODS FOR SIMULATION ACCELERATORS Speakers: Wisam Kadry¹, Dimtry Krestyashyn¹, Arkadiy Morgenshtein¹, Amir Nahir¹, Vitali Sokhin¹, Jae Cheol Son², Wookyeong Jeong², Sung-Boem Park² and Jin Sung Park² ¹IBM Research - Haifa, IL; ²Samsung, KR Abstract Hardware-accelerated simulation platforms are quickly becoming a major vehicle for the functional verification of modern systems and processors. Accelerator platforms provide functional verification with valuable simulation cycles. Yet, the high cost and limited bandwidth of accelerator platforms dictate a requirement for continuous utilization improvement. In this work, we perform a comparative analysis of two approaches of test generation for accelerator platforms. An exerciser tool is used as experimental vehicle for the study. An off-platform test generation methodology is implemented and is compared to on-platform test generation typically used in exercisers. We present experimental results from simulation of latest IBM POWER8 processor on Awan accelerator platform, as well as from simulation of an eight-core ARMv8-based design on Veloce emulation platform. Our results indicate that the utilization of accelerator platforms can be improved by up to ×7 ratio when using off-platform test generation. In addition, increase of up to 24% is observed in test coverage. Off-platform mode features significantly bigger image size, but maintains tolerable build and load times. Download Paper (PDF; Only available from the DATE venue WiFi)
16:02	IP1-16, 171	USING STRUCTURAL RELATIONS FOR CHECKING COMBINATIONALITY OF CYCLIC CIRCUITS Speakers: Wan-Chen Weng¹, Yung-Chih Chen², Jui-Hung Chen¹, Ching-Yi Huang¹ and Chun-Yao Wang¹ ¹National Tsing Hua University, TW; ²Yuan Ze University, TW Abstract Functionality and combinationality are two main issues that have to be dealt with in cyclic combinational circuits, which are combinational circuits containing loops. Cyclic circuits are combinational if nodes within the circuits have definite values under all input assignments. For a cyclified circuit, we have to check whether it is combinational or not. Thus, this paper proposes an efficient two-stage algorithm to verify the combinationality of cyclic circuits. A set of cyclified IWLS 2005 benchmarks are performed to demonstrate the efficiency of the proposed algorithm. Compared to the state-of-the-art algorithm, our approach has a speedup of about 4000 times on average. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.6 Hot Topic - Memristor based Computation-in-Memory Architecture for Data-Intensive Applications

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Bayard

Organisers:
Said Hamdioui, TU Delft, NL
Koen Bertels, TU Delft, NL

In today's data-intensive applications (known as Big Data problems), such as healthcare (e.g., use of genetic information to diagnose and treat diseases), social media, engineering (e.g. large scientific experiments), the primary goal is to increase the understanding of processes in order to extract so much potential and highly useful information hidden in the huge volume of data, which in turn can be used to increase the productivity. As the speed of information growth exceeds Moore's Law at the beginning of this century, excessive data is making great troubles to human beings. At the same time, Big Data arises with many challenges, such as data capture, data storage, data analysis and data visualization. Performing data analysis within economically affordable time and energy is the pillar to solve big data problems, and therefore extract extremely valuable information. The increase of the data size has already surpassed the capabilities of today's computation architectures which suffer from communication bottleneck due to limited bandwidth. For instance, the transfer of 1 petabyes data at a rate of 1000MB/second will cost 12.5 days! Communication and memory access does not only kill the performance, but also energy/power (more than between 70% and 90% such applications). Even the CMOS technology used to implement today's architectures contributes to such power due to the higher leakage; not to mention the limited scalability (as it is becoming very costly), reduced reliability (as it degrades faster), etc. In conclusion, today's CMOS based architecture are not able to provide the computation capability needed for data-intensive applications. New architectures based new technologies are therefore needed. This Hot-Topic Session will address the concept of "Computing-in-memory (CIM)" and discuss a new Memristor Based Architecture Paradigm for Data-Intensive applications, as an alternative architecture. The concept is based on performing the storage and computation in the same crossbar topology (non Von- Neumann architecture) where the key device is the non-volatile resistive switching element (memristor). CIM architecture is able significantly push the "memory wall", while the memristor device is able to reduce the static power to practically zero.

Time	Label	Presentation Title Authors
14:30	3.6.1	DATA-INTENSIVE APPLICATIONS- A MAJOR CHALLENGE AHEAD Speaker: Jan van Lunteren, IBM Research, CH
15:00	3.6.2	CIM ARCHITECTURE- BEYOND VON NEUMANN Speakers: Koen Bertels¹ and Henk Coorporal² ¹Delft University of Technology, NL; ²Eindhoven University of Technology, NL
15:30	3.6.3	MEMRISTIVE DEVICES - THE KEY ENABLER FOR CIM ARCHITECTURE IMPLEMENTATION Speaker: Eike Linn, RWTH Aachen University, DE
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.7 Model-based Analysis and Verification

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Les Bans

Chair:
Saddek Bensalem, Université Joseph Fourier, FR

Co-Chair:
Linh Thi Xuan Phan, University of Pennsylvania, US

This session focuses on the analysis and verification in model-based design of embedded systems. It has four regular papers : The first paper presents a delay analysis method for a general graph based workload model. The second one presents a new formal approach to verifying Interrupt-driven software based on symbolic execution. The third one proposes a method for model-based verification (and arguably implementation) of real-time systems, where the original model is expressed as a network of UPPAAL timed automata (PIM). The fourth one presents a generic method to automatically generate a symbolic executor for a given hardware architecture specified by some Architecture Description Language, which is used to verify program properties regarding the binary code level. In this session we have also an IP paper, which presents a technique for estimating non-functional requirements using a Knowledge Discovery in Databases (KDD) approach.

Time	Label	Presentation Title Authors
14:30	3.7.1	DELAY ANALYSIS OF STRUCTURAL REAL-TIME WORKLOAD Speakers: Nan Guan¹, Yue Tang², Yang Wang² and Wang Yi³ ¹Uppsala University, SE; ²Northeastern University, CN; ³Uppsala University, CN Abstract In many complex embedded systems, real-time workload is generated conforming certain structural constraints. In this paper we study how to analyze the delay of real-time workload of which the generation pattern can be modeled by task graphs. We first show that directly combining path abstraction technique (PAT) in real-time scheduling theory and real-time calculus (RTC) can provide safe delay bounds, but the results are typically over-pessimistic. Then we propose new algorithms to efficiently and precisely solve the delay analysis problem. Experiments with randomly generated task systems are conducted to evaluate the performance of the proposed methods. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	3.7.2	EFFECTIVE VERIFICATION OF LOW-LEVEL SOFTWARE WITH NESTED INTERRUPTS Speakers: Daniel Kroening¹, Lihao Liang¹, Tom Melham¹, Peter Schrammel¹ and Michael Tautschnig² ¹University of Oxford, GB; ²Queen Mary, University of London, GB Abstract Interrupt-driven software is difficult to test and debug, especially when interrupts can be nested and subject to priorities. Interrupts can arrive at arbitrary times, leading to an explosion in the number of cases to be considered. We present a new formal approach to verifying interrupt-driven software based on symbolic execution. The approach leverages recent advances in the encoding of the execution traces of interacting, concurrent threads. We assess the performance of our method on benchmarks drawn from embedded systems code and device drivers, and experimentally compare it to conventional formal approaches that use source-to-source transformations. Our experimental results show that our method significantly outperforms conventional techniques. To the best of our knowledge, our technique is the first to demonstrate effective formal verification of low-level embedded software with nested interrupts. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	3.7.3	PLATFORM-SPECIFIC TIMING VERIFICATION FRAMEWORK IN MODEL-BASED IMPLEMENTATION Speakers: Baek Gyu Kim, Lu Feng, Linh T.X. Phan, Oleg Sokolsky and Insup Lee, University of Pennsylvania, US Abstract In the model-based implementation methodology, the timed behavior of the software is typically modeled independently of the platform-specific timing semantics such as the delay due to scheduling or I/O handling. Although this approach helps to reduce the complexity of the model, it leads to timing gaps between the model and its implementation. This paper proposes a platform-specific timing verification framework that can be used to formally verify the timed behavior of an implementation that has been developed from a platform-independent model. We first describe a way to categorize the interactions among the software, a platform, and the environment in the form of implementation schemes. We then present an algorithm that systematically transforms a platform-independent model into a platform-specific model under a given implementation scheme. This transformation algorithm ensures that the timed behavior of the platform-specific model is close to that of the corresponding implementation. Our case study of an infusion pump system shows that the measured timing delay of the system is bounded by the formally verified bound of its platform-specific model. Download Paper (PDF; Only available from the DATE venue WiFi)
15:45	3.7.4	ARCHITECTURE DESCRIPTION LANGUAGE BASED RETARGETABLE SYMBOLIC EXECUTION Speaker: Andreas Ibing, TU München, DE Abstract This paper presents an approach to retargetable SMT-constrained symbolic execution of machine code. The retargetability is based on an existing open-source processor architecture description language which has been used for processor design and automatic generation of toolchains for dynamic program analysis. The benefit of the presented approach is that with a given architecture description, no manual writing of an instruction set grammar or of a translation of instruction semantics into logics is necessary. The proposed tool architecture relies on language reflection, code generation and dynamic loading to retarget symbolic execution to different machine code syntax. Instruction semantics is translated into SMT bit-vector logic equations by symbolically interpreting the architecture description language. The approach is implemented as plug-in extension to the Eclipse IDE and evaluated by automatically detecting integer overflows in binaries for the ARMv5 and SPARCv8 architectures. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP1-17, 877	NFRS EARLY ESTIMATION THROUGH SOFTWARE METRICS Speakers: Andrws Vieira¹, Pedro Faustini¹, Luigi Carro² and Erika Cota¹ ¹Federal University of Rio Grande do Sul (UFRGS), BR; ²Federal University of Rio Grande do Sul (UFRGS), Abstract We propose the use of regression analysis to generate accurate predictive models for physical metrics using design metrics as input. We validate our approach with 40+ implementations of three systems in two development scenarios: system evolution and first design. Results show maximum prediction errors of 1.66% during system evolution. In a first design scenario, the average error is 15% with the maximum error still below 20% for all physical metrics. This approach provides a fast and accurate strategy to boost embedded software productivity and quality, by estimating Non-Functional Requirements (NFRs) during the first design stages. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

3.8 Hot Topic - Design Methodologies for a Cyber-Physical Systems Approach to Personalized Medicine-on-a-Chip: Challenges and Opportunities

Date: Tuesday 10 March 2015
Time: 14:30 - 16:00
Location / Room: Salle Lesdiguières

Organiser:
Krishnendu Chakrabarty, Duke University, US

Chair:
Paul Pop, Technical University of Denmark, DK

Co-Chair:
Mohammad Abdullah Al Faruque, University of California Irvine, US

Modern stressful and sedentary lifestyles coupled with inadequate, irregular and inappropriate sleep patterns and diet have contributed not only to increased prevalence of chronic diseases but also to increased healthcare costs. To address these emerging clinical and healthcare challenges, in this special session, we advocate for a cross-disciplinary approach to cyber-physical systems design (CPS) aiming at seamlessly and safely integrate sensing, computation, communication, control and actuation for developing new technology for personalized and precise medicine.

Time	Label	Presentation Title Authors
14:30	3.8.1	ERROR RECOVERY IN DIGITAL MICROFLUIDICS FOR PERSONALIZED MEDICINE Speakers: Mohamed Ibrahim and Krishnendu Chakrabarty, Duke University, US Abstract Due to its emergence as an efficient platform for point-of-care clinical diagnostics, design optimization of digital-microfluidic biochips (DMFBs) has received considerable attention in recent years. In particular, error recoverability is of key interest in medical applications due to the need for system reliability. Errors are likely during droplet manipulation due to defects, chip degradation, and the lack of precision inherent in biochemical experiments. We present an illustrative survey on recently proposed techniques for error recovery. The parameters of the error-recovery design space are shown and evaluated for these schemes. Next, we make use of these evaluations to describe how they can guide error recovery in DMFBs. Finally, an experimental case study is presented to demonstrate how an error-recovery scheme can be applied to real-life biochips. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	3.8.2	A CYBER-PHYSICAL SYSTEMS APPROACH TO PERSONALIZED MEDICINE: CHALLENGES AND OPPORTUNITIES FOR NOC-BASED MULTICORE PLATFORMS Speaker: Paul Bogdan, University of Southern California, US Abstract This paper describes a few fundamental challenges concerning the design of Network-on-Chip (NoC) based multicores as the backbone of cyber-physical systems (CPS) for personalized medicine. One fundamental challenge in designing such CPS architectures is the need for a unifying mathematical description of the dynamical interactions between bio-physiological processes and cyber states. Another fundamental challenge is to build a rigorous mathematical optimization framework that allows the CPS to adapt to varying workloads and demands. To enable large-scale parallelism, we need a rigorous understanding of the CPS workloads that can guide the design and optimization of wired and wireless NoCs. We advocate for the development of goal-oriented self-organization algorithms that seek to both optimize specific design cost functions and maximize information about future system state. It is necessary to identify basic local rules of interaction not only for solving large scale optimization problems in a distributed fashion, but also for inducing an overall degree of autonomy and intelligence in the CPS architecture. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	3.8.3	ON-CHIP NETWORK-ENABLED MANY-CORE ARCHITECTURES FOR COMPUTATIONAL BIOLOGY APPLICATIONS Speakers: Turbo Majumder¹, Partha Pande² and Ananth Kalyanaraman² ¹Indian Institute of Technology Delhi, IN; ²Washington State University, US Abstract Computational molecular biology applications are at the heart of the backend processing in cyber-physical systems when applied to domains such as drug discovery, personalized medicine and genetic disease risk assessment. These applications are characterized by the preponderance of data and computational complexity, and yet require reasonably fast processing in order to have any meaningful impact. As such, hardware acceleration for these applications have generated a lot of research interest. In this paper, we discuss the superiority of Network-on-Chip (NoC)-enabled many-core platforms over other conventional platforms in both the quantum of speedup achieved and the amount of energy consumed. We hence posit that research in NoC-enabled platforms for CPS applications will be a major enabler of future scientific and medical breakthroughs. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

UB03 Session 3

Date: Tuesday 10 March 2015
Time: 15:00 - 17:30
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB03.1	4-LOOP: 4-CORE LEON 3 WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM Presenters: Giacomo Valente, Vittoriano Muttillo and Andrea Moro, University of L'Aquila, IT Authors: Vittoriano Muttillo and Fabio Federici, Abstract Multi-processor SoC based on soft-cores are increasing the range of applications that could be implemented by exploiting FPGAs. In this context, this demo presents a symmetric multi-processor system, composed of four Leon 3 cores and a custom Linux kernel, able to execute OpenMP-based applications and enhanced with a hardware profiling system. OpenMP support and the novel profiling system are the results of R&D activities conducted by several students and Professors at University of L'Aquila. More information ...
UB03.2	PARLOMA: A REMOTE COMMUNICATION SYSTEM FOR DEAFBLIND PEOPLE Presenter: Ludovico Orlando Russo, Politecnico di Torino, IT Authors: Giuseppe Airò Farulla¹, Marco Indaco¹, Calogero Maria Oddo², Daniele Pianu³, Paolo Prinetto¹, Stefano Rosa¹ and Ludovico Orlando Russo¹ ¹Politecnico di Torino, IT; ²Scuola Superiore Sant'Anna, The Biorobotics Institute, IT; ³CNR, IEIIT, IT Abstract This work aims at designing a low-cost communication system to allow remote communication among deafblind people, up to now impossible. Due to their lacking of both the auditory and the visive channel, deafblind people can receive feedbacks and mes-sages only resorting on hand-in-hand communication and only from speakers physically located near them. Such limitation ag-gravates their situation and cause deafblind people to live behind a wall of isolation from active society. PARLOMA aims at breaking this wall, developing a tool that can be used by deafblind people to communicate whenever they want and wherever they are. More information ...
UB03.3	FLARE: A RECONFIGURATION AWARE FLOORPLANNER Presenter: Riccardo Cattaneo, Politecnico di Milano, IT Authors: Marco Rabozzi and Marco Santambrogio, Politecnico di Milano, IT Abstract This demonstration presents a floorplanner tool addressing partially-reconfigurable FPGAs. The input of the tool consists of a set of regions described in terms of their heterogeneous resource requirements together with the number of interconnections among regions and the target FPGA of the partial reconfiguration (PR) design. Once the input are specified, the floorplanner allow the designer to manually or automatically perform the floorplan of the regions. More information ...
UB03.4	SMART CELL DEVELOPMENT PLATFORM FOR EMBEDDED BATTERY MANAGEMENT Presenter: Swaminathan Narayanaswamy, TUM CREATE, SG Authors: Matthias Kauer¹, Sebastian Steinhorst¹, Martin Lukasiewycz¹ and Samarjit Chakraborty² ¹TUM CREATE, SG; ²TU Munich, DE Abstract Embedded Battery Management (EBM) [1], in contrast to the existing state-of-the-art centralized Battery Management Systems (BMSs) found in Electric Vehicles (EVs) or stationary Electrical Energy Storage (EES) applications, focuses on monitoring and controlling each individual cell of the battery pack with a dedicated Cell Management Unit (CMU). This novel approach of battery management might offer significant advantages over the centralized BMSs, such as higher modularity, plug-and-play integration and shorter time to market. The combination of a battery cell and a CMU forms the smart cell and the system-level functionalities of the EBM are performed in a decentralized manner by the network of smart cells, with the help of the computational and communication resources of CMUs. We present a development platform for such a smart cell enabled EBM. The development platform consists of two components, the hardware platform and the software platform. The hardware platform of the demonstrator comprises of battery cells and their dedicated CMUs which consist of a smart cell controller board and an active cell balancing board. The software platform provides the smart cell firmware as well as a software tool for verification of active cell balancing architectures and a smart cell simulator for simulating system-level EBM functionalities. More information ...
UB03.5	OSTC: COMBINING HIFSUITE AND SCNSL FOR SMART DEVICE INTEGRATION AND SIMULATION Presenter: Graziano Pravadelli, University of Verona, IT Authors: Alessandro Danese, Franco Fummi, Valerio Guarnieri, Michele Lora, Graziano Pravadelli and Francesco Stefanni, University of Verona, IT Abstract The main design issue of smart devices is their high degree of heterogeneity, due to the simultaneous presence of multiple domains and extra-functional properties, together with the traditional system functionality. This makes design and simulation very challenging, even because heterogeneity implies that the functionality is not the only dimension that must be considered at validation time. Other properties, such as power consumption or thermal dissipation, are critical to ensure correctness of the final product and to correctly estimate its behavior. This makes component integration and simulation key phases in the design and verification process of smart devices. Thus, to efficiently master smart device design, it is fundamental to be aware of design issues and to know how to solve them through innovative tools and methods, which allow integrating all the components of a smart device into an efficient and flexible simulation platform. We addressed such issues by means of the combined use of HIFSuite tools and SCNSL to obtain a homogeneous and fast SystemC/C++ model of a smart device through the compositions of heterogeneous components. An Open Source Test Case (OSTC) has been defined to show the potentiality of the proposed methods and tools. More information ...
UB03.6	IMPLEMENTATIONS OF THE SEMI-GLOBAL MATCHING 3D VISION ALGORITHM FOR AUTOMOTIVE APPLICATIONS Presenter: Affaq Qamar, Politecnico di Torino, IT Author: Luciano Lavagno, Politecnico di Torino, IT Abstract The demo will show our real-time hardware implementations on a Xilinx® ZynqTM System-on-Chip of the Semi-Global Matching (SGM) algorithm, which is frequently used in stereo vision systems, e.g. for automotive applications. We will also compare the quality of results, flexibility and design time that we achieved using both High-Level Synthesis (HLS) and manual RTL design. The use of HLS is particularly promising because the automotive industry is very sensitive to production costs, hence it requires various implementations of the same algorithm, with very different resolutions, costs, and performance levels, for different target market segments. SGM mainly consists of three sequential processing steps which are, (i) cost cube calculation, (ii) path cost computation and (iii) disparity estimation and minimization. The path cost computation further involves processing of pixel wise cost cube data into eight distinct directions. The initial algorithmic "golden" model used very large arrays, which had to be mapped to an external DRAM and brought into the on-chip RAM of the FPGA on demand. This required both adding the memory transfer loops and inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Moreover, the initial single-threaded algorithm had to be parallelized, by converting the top-level sweeps of the image in eight directions into forward and backward passes. Both manual RTL and HLS designs were suitable to achieve the target real-time performance. The design space was thus explored by making several fairly different micro-architectural choices. In the end, it was possible to obtain an implementation which is comparable to the manual RTL design. The authors intend to demonstrate the FPGA based HW implementation of the SGM algorithm (upon permission from the industrial partners) and discuss the HLS flow and comparison strategy. More information ...
UB03.7	RECONFIGURABLE FPGA-BASED NON-INTRUSIVE BERT FOR PRODUCTION TEST Presenter: Sergei Odintsov, Tallinn University of Technology, EE Author: Artjom Jasnetski, Tallinn University of Technology, EE Abstract We introduce an FPGA-based Bit Error Rate (BER) tester solution for high-speed serial links targeting production environment. This solution does not require usage of external T&M equipment or extra DFT. As opposed to intrusive physical probing with external BER tester our approach produces more relevant output because measurement is done using transceivers in their functional mode. Introduced BERT instrument supports fine tuning of link parameters and pattern generation. This solution can replace long lasting BER test by quick evaluation of link quality using eye diagram. More information ...
UB03.8	XTSI: THE 3-D ELECTRO-THERMAL SIMULATOR Presenter: Jürgen Scheible, Reutlingen University, DE Author: Carl Christoph Jung, Reutlingen University, DE Abstract xtSi is a 3D electro-thermal simulation tool for integrated circuits. It uses a computationally efficient algorithm, which allows the simulation of typical ICs in only a few minutes. The temperature distribution is depicted graphically and with temporal resolution in a specially designed graphical user interface. With the help of xtSi designers can exactly identify isotherms and hotspots, thus enabling an optimization of the layout due to temperature effects. xtSi has been verified experimentally for device temperatures exceeding 500 °C up to the onset of thermal runaway. More information ...
UB03.9	ODEN: ASSERTION MINING FOR BEHAVIORAL DESCRIPTIONS Presenter: Alessandro Danese, University of Verona, IT Authors: Alessandro Danese, Tara Ghasempouri and Graziano Pravadelli, University of Verona, IT Abstract Specification mining is an automatic approach for extracting assertions from the implementation of the system under verification (SUV). Its primary goal is to improve the verification and documentation process by making available a matching between a manual definition of the expected functionality and a formalization of the actual implemented functionality. In order to automatically extract assertions, some approaches perform a static analysis of the SUV source code. These solutions, despite of their effectiveness, suffer of scalability problems. To overcome this drawback, dynamic approaches have been also proposed that extract assertions by relying only on the observation of SUV's execution traces. This guarantees a better scalability, even if only "likely true assertions" can be extracted. For this reason a qualification phase is generally implemented in order to discard irrelevant and spurious assertions. In this context, ODEN is a tool for dynamically extracting likely true assertions by combining static and dynamic techniques. ODEN works with both hardware design and software applications. The tools analyses the execution traces of the system under verification and it generates assertions in the form of temporal relationships between arithmetic/logic expressions over the variables of the SUV. With respect to existing tools, ODEN works on a wider range of abstraction levels (e.g., gate-level, RTL, TLM, SW level, ...) and it considers a wider set of temporal patterns to more precisely characterize the behaviours of the SUV. More information ...
17:30	End of session

IP1 Interactive Presentations

Date: Tuesday 10 March 2015
Time: 16:00 - 16:30
Location / Room: Exhibition Area

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session the award 'Best IP of the Day' is given.

Label	Presentation Title Authors
IP1-1	HIGH-RESOLUTION ONLINE POWER MONITORING FOR MODERN MICROPROCESSORS Speakers: Fabian Oboril, Jos Ewert and Mehdi Tahoori, Karlsruhe Institute of Technology, DE Abstract The power consumption of computing systems is nowadays a major design constraint that affects performance and reliability. To co-optimize these aspects, fine-grained adaptation techniques at runtime are of growing importance. However, to use these tools efficiently, fine-grained information about the power consumption of various on-chip components at runtime is required. Therefore, here we propose a novel software-implemented high-resolution (spatial and temporal) power monitoring approach that relies on micro-models to estimate the power consumption of all microarchitectural components inside a processor core. Combined with a self-calibration technique that uses an available on-chip power sensor, our power estimation approach can achieve an accuracy of more than 99 % and provides deep insights about the power dissipation inside a processor core during workload execution. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-2	REDUCING ENERGY CONSUMPTION IN MICROCONTROLLER-BASED PLATFORMS WITH LOW DESIGN MARGIN CO-PROCESSORS Speakers: Andres Gomez¹, Christian Pinto², Andrea Bartolini³, Davide Rossi², Hamed Fatemi⁴, Jose Pineda de Gyvez⁴ and Luca Benini⁵ ¹Swiss Federal Institute of Technology in Zurich (ETHZ), CH; ²Università di Bologna, IT; ³Università di Bologna, IT / ETH Zürich, CH; ⁴NXP Semiconductors, NL; ⁵Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), IT Abstract Advanced energy minimization techniques (i.e. DVFS, Thermal Management, etc) and their high-level HW/SW requirements are well established in high-throughput multi-core systems. These techniques would have an intolerable overhead in low-cost, performance-constrained microcontroller units (MCU's). These devices can further reduce power by operating at a lower voltage, at the cost of increased sensitivity to PVT variation and increased design margins. In this paper, we propose an runtime environment for next-generation dual-core MCU platforms. These platforms complement a single-core with a low area overhead, reduced design margin shadow-processor. The runtime decreases the overall energy consumption by exploiting design corner heterogeneity between the two cores, rather than increasing the throughput. This allows the platform's power envelope to be dynamically adjusted to application-specific requirements. Our simulations show that, depending on the ratio of core to platform energy, total energy savings can be up to 20%. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-3	DE-ELASTISATION: FROM ASYNCHRONOUS DATAFLOWS TO SYNCHRONOUS CIRCUITS Speakers: Mahdi Jelodari Mamaghani, Jim Garside and Doug Edwards, University of Manchester, GB Abstract Whilst asynchronous VLSI programming provides a flexible abstract formalism to realise concurrent systems, the resulting performance is still an issue when adapting the flow in the industrial context. The asynchronous design paradigm provides `elasticity' which enables the system to tolerate delays in communication and computation; the drawback is that it imposes a communication overhead to the system which becomes prohibitively expensive when applied at a fine-grained level. This paper proposes a 'de-elastisation' technique in a CAD flow for asynchronous dataflow networks to improve the circuits' performance and area. To preserve the architectural advantages of asynchronous design (e.g. short cycles) the type of circuits are classified into blocking and non-blocking loops upon which our de-elastisation scheme relies. The technique is incorporated in the Teak CAD flow. Experimental results on several substantial case studies show significant performance and area improvement. This work shows 3x improvement for the first category of circuits, suitable for iterative realisations and DSP-like architectures and 4x for the second category which are suitable for concurrent realisations. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-4	AUTOMATED FEATURE LOCALIZATION FOR DYNAMICALLY GENERATED SYSTEMC DESIGNS Speakers: Jannis Stoppe¹, Robert Wille¹ and Rolf Drechsler² ¹University of Bremen, DE; ²University of Bremen/DFKI GmbH, DE Abstract Due to the large complexity of today's circuits and systems, all components e.g. in a System of Chip (SoC) cannot be designed from scratch anymore. As a consequence, designers frequently work on components which they did not create themselves and, hence, design understanding becomes a critical issue. Approaches for feature localization may help here by pinpointing to distinguished characteristics of a design. However, existing approaches for feature localization of SoCs mainly focused on the Register Transfer Level; existing solutions for the Electronic System Level (using languages such as SystemC) have severe limits. In this work, we propose an approach for advanced feature localization in SystemC designs. By this, we overcome main limitations of previously proposed solutions, in particular the missing support for dynamic descriptions, while keep the proposed solution as non-intrusive as possible. The benefits of the proposed approach are confirmed by means of a case study. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-5	INDUCTOR OPTIMIZATION FOR ACTIVE CELL BALANCING USING GEOMETRIC PROGRAMMING Speakers: Matthias Kauer¹, Swaminathan Narayanaswamy¹, Martin Lukasiewycz¹, Sebastian Steinhorst¹ and Samarjit Chakraborty² ¹TUM CREATE, SG; ²TU Munich, DE Abstract This paper proposes an optimization methodology for inductor components in active cell balancing architectures of electric vehicle battery packs. For this purpose, we introduce a new mathematical model to quantitatively describe the charge transfer of a family of inductor-based circuits. Utilizing worst case assumptions, this model yields a nonlinear program for designing the inductor and selecting the transfer current. In the next step, we transform this problem into a geometric program that can be efficiently solved. The optimized inductor reduces energy dissipation by at least 20% in various scenarios compared to a previous approach which selected an optimal off-the-shelf inductor. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-6	LIGHTWEIGHT AUTHENTICATION FOR SECURE AUTOMOTIVE NETWORKS Speakers: Philipp Mundhenk¹, Sebastian Steinhorst¹, Martin Lukasiewycz¹, Suhaib A. Fahmy² and Samarjit Chakraborty³ ¹TUM CREATE, SG; ²School of Computer Engineering, Nanyang Technological University, SG; ³TU Munich, DE Abstract We propose a framework to bridge the gap between secure authentication in automotive networks and on the internet. Our proposed framework allows runtime key exchanges with minimal overhead for resource-constrained in-vehicle networks. It combines symmetric and asymmetric cryptography to establish secure communication and enable secure updates of keys and software throughout the lifetime of the vehicle. For this purpose, we tailor authentication protocols for devices and authorization protocols for streams to the automotive domain. As a result, our framework natively supports multicast and broadcast communication. We show that our lightweight framework is able to initiate secure message streams over 15 times faster than conventional frameworks, for the first time meeting the real-time requirements of automotive networks. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-7	MINIMIZING THE NUMBER OF PROCESS CORNER SIMULATIONS DURING DESIGN VERIFICATION Speakers: Michael Shoniker, Bruce Cockburn, Jie Han and Witold Pedrycz, University of Alberta, CA Abstract Integrated circuit designs need to be verified in simulation over a large number of process corners that represent the expected range of transistor properties, supply voltages, and die temperatures. Each process corner can require substantial simulation time. Unfortunately, the required number of corners has been growing rapidly in the latest semiconductor technologies. We consider the problem of minimizing the required number of process corner simulations by iteratively learning a model of the output functions in order to confidently estimate key maximum and/or minimum properties of those functions. Depending on the output function, the required number of corner simulations can be reduced by factors of up to 95%. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-8	AN APPROXIMATE VOTING SCHEME FOR RELIABLE COMPUTING Speakers: Ke Chen¹, Jie Han² and Fabrizio Lombardi¹ ¹Northeastern University, US; ²University of Alberta, CA Abstract This paper relies on the principles of inexact computing to alleviate the issues arising in static masking by voting for reliable computing. A scheme that utilizes approximate voting is proposed; it is referred to as inexact double modular redundancy (IDMR). IDMR does not resort to triplication, thus saving overhead due to modular replication; moreover, this scheme is adaptive in its operation, i.e., it allows a threshold to determine the validity of the module outputs. IDMR operates by initially establishing the difference between the values of the outputs of the two modules; only if the difference is below a preset threshold, then the voter calculates the average value of the two module outputs. An extensive analysis of the voting circuits and an application to image processing are presented. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-9	FLINT: LAYOUT-ORIENTED FPGA-BASED METHODOLOGY FOR FAULT TOLERANT ASIC DESIGN Speakers: Rochus Nowosielski, Lukas Gerlach, Stephan Bieband, Guillermo Paya-Vaya and Holger Blume, Leibniz Universität Hannover, Institute of Microelectronic Systems, DE Abstract Research of efficient fault tolerance techniques for digital systems requires insight into the fault propagation mechanism inside the ASIC design. Radiation, high temperature, or charge sharing effects in ultra-deep submicron technologies influence fault generation and propagation dependent on die location. The proposed methodology links efficient fault injection to fault propagation in the floorplan view of a standard cell ASIC. This is achieved by instrumentation of the gate netlist after place&route, emulation in an FPGA system and experiment control via interactive user interface. Further, automated fault injection campaigns allow exhaustive fault tolerance evaluations taking single faults as well as adjacent cell faults into account. The proposed methodology can be used to identify vulnerable cell nodes in the design and allow the classification of placement strategies of fault tolerant ASIC designs. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-10	A UNIFIED HARDWARE/SOFTWARE MPSOC SYSTEM CONSTRUCTION AND RUN-TIME FRAMEWORK Speakers: Sam Skalicky¹, Andrew Schmidt², Matthew French² and Sonia Lopez¹ ¹Rochester Institute of Technology, US; ²USC/ISI, US Abstract With the continual enhancement of heterogeneous resources in FPGA devices, utilizing these resources becomes a challenging burden for developers. Especially with the inclusion of sophisticated multiple processor system-on-chips, the necessary skill set to effectively leverage these resources spans both hardware and software expertise. The maturation of high level synthesis tools and programming languages aim to alleviate these complexities, yet there still exist systematic gaps that must be bridged to provide a more cohesive hardware/software development environment. High level MPSoC design initiatives such as Redsharc have reduced the costs of entry, simplifying application implementation. We propose a unified hardware/software framework for system construction, leveraging Redsharc's APIs, efficient on-chip interconnects, and run-time controllers. We present system level abstractions that enable compilation and implementation tools for hardware and software to be merged into a single configurable system development environment. Finally, we demonstrate our proposed framework with Redsharc, using AES encryption/decryption spanning software implementations on ARM and MicroBlaze processors and hardware kernels. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-11	(AS)^2: ACCELERATOR SYNTHESIS USING ALGORITHMIC SKELETONS FOR RAPID DESIGN SPACE EXPLORATION Speakers: Shakith Fernando¹, Mark Wijtvliet¹, Cedric Nugteren¹, Akash Kumar² and Henk Corporaal³ ¹Eindhoven University of Technology, NL; ²National University of Singapore, SG; ³TU/e (Eindhoven University of Technology), NL Abstract Hardware accelerators in heterogeneous multiprocessor system-on-chips are becoming popular as a means of meeting performance and energy efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Therefore, time consuming manual iterations are required to explore efficient accelerator alternatives: the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)^2: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithms' functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels), enabling FPGA specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs, for the accelerators, can be efficiently and rapidly synthesized. We show that (AS)^2 is a promising direction for accelerator synthesis as it generates a Pareto front of 8 design points in under half an hour, for each of the three image processing kernels. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-12	ASSISTED GENERATION OF FRAME CONDITIONS FOR FORMAL MODELS Speakers: Philipp Niemann, Frank Hilken, Martin Gogolla and Robert Wille, University of Bremen, DE Abstract Modeling languages such as UML or SysML allow for the validation and verification of the structure and the behavior of designs even in the absence of a specific implementation. However, formal models inherit a severe drawback: Most of them hardly provide a comprehensive and determinate description of transitions from one system state to another. This problem can be addressed by additionally specifying so-called frame conditions. However, only naive "workarounds" based on trivial heuristics or completely relying on a manual creation have been proposed for their generation thus far. In this work, we aim for a solution which neither leaves the burden of generating frame conditions entirely on the designer (avoiding the introduction of another time-consuming and expensive design step) nor is completely automatic (which, due to ambiguities, is not possible anyway). For this purpose, a systematic design methodology for the assisted generation of frame conditions is proposed. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-13	TOWARDS A META-LANGUAGE FOR THE CONCURRENCY CONCERN IN DSLS Speakers: Julien Deantoni¹, Papa Issa Diallo², Ciprian Teodorov², Joel Champeau² and Benoit Combemale³ ¹I3S, University of Nice Sophia Antipolis, FR; ²Lab-STICC - ENSTA Bretagne, FR; ³IRISA, Universty of Rennes1, FR Abstract Concurrency is of primary interest in the development of complex software-intensive systems, as well as the deployment on modern platforms. Furthermore, Domain-Specific Languages (DSLs) are increasingly used in industrial processes to separate and abstract the various concerns of complex systems. % However, reifying the definition of the DSL concurrency remains a challenge. This not only prevents leveraging the concurrency concern of a particular domain or platform, but it also hinders: (1) the development of a complete understanding of the DSL semantics; (2) the effectiveness of concurrency-aware analysis techniques; (3) the analysis of the deployment on parallel architectures. % In this paper, we introduce the key ideas leading toward MoCCML, a dedicated meta-language for formally specifying the concurrency concern within the definition of a DSL. The concurrency constraints can reflect the knowledge in a particular domain, but also the constraints of a particular platform. MoCCML comes with a complete language workbench to help a DSL designer in the definition of the concurrency directly within the concepts of the DSL itself, and a generic workbench to simulate and analyze any model conforming to this DSL. % MoCCML is illustrated on the definition of an lightweight extension of SDF (Synchronous Data Flow). Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-14	FAST AND ACCURATE BRANCH PREDICTOR SIMULATION Speakers: Antoine Faravelon, Nicolas Fournel and Frédéric Pétrot, TIMA Laboratory, Université de Grenoble-Alpes/CNRS, FR Abstract Embedded processors complexity has raised dramatically, due to the addition of architectural add-ons which improve performances significantly. High level models used in system simulation usually ignore these additions as the major issue is functional correctness. However, accurate estimates of software execution is sometimes required, therefore we focus in this paper on one of theses architectural features, the branch predictor. Unfortunately, advanced branch predictors use large tables, and a direct implementation of the scheme slows down simulation dramatically. To limit the simulation overhead, we define a modeling approach that we demonstrate on a state-of-the art predictor. We implemented the model in a dynamic binary translation based instruction set simulator and measured an accuracy of prediction of about 95% for a run-time overhead inferior to 5%. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-15	COMPARATIVE STUDY OF TEST GENERATION METHODS FOR SIMULATION ACCELERATORS Speakers: Wisam Kadry¹, Dimtry Krestyashyn¹, Arkadiy Morgenshtein¹, Amir Nahir¹, Vitali Sokhin¹, Jae Cheol Son², Wookyeong Jeong², Sung-Boem Park² and Jin Sung Park² ¹IBM Research - Haifa, IL; ²Samsung, KR Abstract Hardware-accelerated simulation platforms are quickly becoming a major vehicle for the functional verification of modern systems and processors. Accelerator platforms provide functional verification with valuable simulation cycles. Yet, the high cost and limited bandwidth of accelerator platforms dictate a requirement for continuous utilization improvement. In this work, we perform a comparative analysis of two approaches of test generation for accelerator platforms. An exerciser tool is used as experimental vehicle for the study. An off-platform test generation methodology is implemented and is compared to on-platform test generation typically used in exercisers. We present experimental results from simulation of latest IBM POWER8 processor on Awan accelerator platform, as well as from simulation of an eight-core ARMv8-based design on Veloce emulation platform. Our results indicate that the utilization of accelerator platforms can be improved by up to ×7 ratio when using off-platform test generation. In addition, increase of up to 24% is observed in test coverage. Off-platform mode features significantly bigger image size, but maintains tolerable build and load times. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-16	USING STRUCTURAL RELATIONS FOR CHECKING COMBINATIONALITY OF CYCLIC CIRCUITS Speakers: Wan-Chen Weng¹, Yung-Chih Chen², Jui-Hung Chen¹, Ching-Yi Huang¹ and Chun-Yao Wang¹ ¹National Tsing Hua University, TW; ²Yuan Ze University, TW Abstract Functionality and combinationality are two main issues that have to be dealt with in cyclic combinational circuits, which are combinational circuits containing loops. Cyclic circuits are combinational if nodes within the circuits have definite values under all input assignments. For a cyclified circuit, we have to check whether it is combinational or not. Thus, this paper proposes an efficient two-stage algorithm to verify the combinationality of cyclic circuits. A set of cyclified IWLS 2005 benchmarks are performed to demonstrate the efficiency of the proposed algorithm. Compared to the state-of-the-art algorithm, our approach has a speedup of about 4000 times on average. Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-17	NFRS EARLY ESTIMATION THROUGH SOFTWARE METRICS Speakers: Andrws Vieira¹, Pedro Faustini¹, Luigi Carro² and Erika Cota¹ ¹Federal University of Rio Grande do Sul (UFRGS), BR; ²Federal University of Rio Grande do Sul (UFRGS), Abstract We propose the use of regression analysis to generate accurate predictive models for physical metrics using design metrics as input. We validate our approach with 40+ implementations of three systems in two development scenarios: system evolution and first design. Results show maximum prediction errors of 1.66% during system evolution. In a first design scenario, the average error is 15% with the maximum error still below 20% for all physical metrics. This approach provides a fast and accurate strategy to boost embedded software productivity and quality, by estimating Non-Functional Requirements (NFRs) during the first design stages. Download Paper (PDF; Only available from the DATE venue WiFi)

4.1 Executive Panel - Trends and Challenges in Today's Automotive Semiconductors

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Salle Oisans

Organiser:
Yervant Zorian, Fellow & Chief Architect, Synopsys, US

Moderator:
Chris Edwards, Tech Design Forum / E&T, GB

While the new chips in the automotive industry keep growing both in functionality and numbers, the complexity level and robustness requirements remain crucial, as always, given their safety critical application. The speakers in this executive session will address the current trends and challenges in the automotive semiconductor industry.

Panelists:

Andreas Brüning, ZMDI, DE
Maurizio Peri, STMicroelectronics, IT
Jean-Marie Saint-Paul, Mentor, FR
Frank Schirrmeister, Cadence Design Systems, US

18:30	End of session

4.2 Implementation and Verification of Security Components

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Belle Etoile

Chair:
Francesco Regazzoni, AlaRI, CH

Co-Chair:
Georg Becker, RUB, DE

System designers need secure building blocks for robust security devices. This session presents novel implementation and verification strategies for hardware circuits, post-quantum cryptography schemes and true random number generators.

Time	Label	Presentation Title Authors
17:00	4.2.1	PRIVACY-PRESERVING FUNCTIONAL IP VERIFICATION UTILIZING FULLY HOMOMORPHIC ENCRYPTION Speakers: Charalambos Konstantinou¹ and Michail Maniatakos² ¹New York University Polytechnic School of Engineering, US; ²New York University Abu Dhabi, AE Abstract Intellectual Property (IP) verification is a crucial component of System-on-Chip (SoC) design in the modern IC design business model. Given a globalized supply chain and an increasing demand for IP reuse, IP theft has become a major concern for the IC industry. In this paper, we address the trust issues that arise between IP owners and IP users during the functional verification of an IP core. Our proposed scheme ensures the privacy of IP owners and users, by a) generating a privacy-preserving version of the IP, which is functionally equivalent to the original design, and b) employing homomorphically encrypted input vectors. This allows the functional verification to be securely outsourced to a third-party, or to be executed by either parties, while revealing the least possible information regarding the test vectors and the IP core. Experiments on both combinational and sequential benchmark circuits demonstrate up to three orders of magnitude IP verification slowdown, due to the computationally intensive fully homomorphic operations, for different security parameter sizes. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	4.2.2	EFFICIENT SOFTWARE IMPLEMENTATION OF RING-LWE ENCRYPTION Speakers: Ruan de Clercq, Sujoy Sinha Roy, Frederik Vercauteren and Ingrid Verbauwhede, KU Leuven - COSIC, BE Abstract Present-day public-key cryptosystems such as RSA and Elliptic Curve Cryptography (ECC) will become insecure when quantum computers become a reality. This paper presents the new state of the art in efficient software implementations of a post-quantum secure public-key encryption scheme based on the ring-LWE problem. We use a 32-bit ARM Cortex-M4F microcontroller as the target platform. Our contribution includes optimization techniques for fast discrete Gaussian sampling and efficient polynomial multiplication. Our implementation beats all known software implementations of ring-LWE encryption by a factor of at least 7. We further show that our scheme beats ECC-based public-key encryption schemes by at least one order of magnitude. At medium-term security we require 121166 cycles per encryption and 43324 cycles per decryption, while at a long-term security we require 261939 cycles per encryption and 96520 cycles per decryption. Gaussian sampling is done at an average of 28.5 cycles per sample. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	4.2.3	EMBEDDED HW/SW PLATFORM FOR ON-THE-FLY TESTING OF TRUE RANDOM NUMBER GENERATORS Speakers: Bohan Yang¹, Vladimir Rozic¹, Nele Mentens¹, Wim Dehaene² and Ingrid Verbauwhede³ ¹ESAT/COSIC and iMinds, KU Leuven, BE; ²ESAT-MICAS, KU Leuven, BE; ³KU Leuven - COSIC, BE Abstract We present a HW/SW platform for on-the-fly detection of failures and weaknesses in entropy sources. By splitting the operations between hardware and software, we achieve sufficient flexibility to control the level of significance of the tests. This approach also enables sharing resources between different tests thereby reducing the area and power. Statistical tests were selected from the NIST test suite. We propose several versions of hardware co-processors for monitoring random bit sequences, ranging from 52 slices (5 tests) to 552 slices (9 tests) on Spartan-6 FPGA. We are the first to provide implementations of the Serial test and the Approximate entropy test for on-the-fly monitoring. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP2-1, 834	COMPARISON OF MULTI-PURPOSE CORES OF KECCAK AND AES Speakers: Panasayya Yalla, Ekawat Homsirikamol and Jens-Peter Kaps, George Mason University, US Abstract Most widely used security protocols, Internet Protocol Security (IPSec), Secure Socket Layer (SSL), and Transport Layer Security (TLS), provide several cryptographic services which in turn require multiple dedicated cryptographic algorithms. A single cryptographic primitive for all secret key functions utilizing different mode of operations can overcome this constraint. This paper investigates the possibility of using AES and Keccak as the underlying primitives for high-speed and resource constrained applications. Even though a plain AES implementation is typically much smaller and has a better throughput to area ratio than a plain Keccak, adding additional cryptographic services changes the results dramatically. Our multi-purpose Keccak outperforms our multi-purpose AES by a factor of 4 for throughput over area on average. This underlines the flexibility of the Keccak Sponge and Duplex functions. Our multi-purpose Keccak achieves a throughput of 23.2 Gbps in AE-mode (Keyak) on a Xilinx Virtex-7 and 28.7 Gbps on a Altera Stratix-IV. In order to study this further we also implemented two versions of a dedicated Keyak and dedicated AES-GCM. Our dedicated Keyak implementation outperforms our dedicated AES-GCM on average by a factor 6 in terms of throughput over area reaching a throughput of 28.9 Gbps and 4.1 Gbps respectively on a Xilinx Virtex-7. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session

4.3 Multi-/Manycore Scheduling

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Stendhal

Chair:
Luciano Lavagno, Politechnico di Torino, IT

Co-Chair:
Aviral Shrivastava, Arrizona State University, US

This session tackles various issues in realistic, complex task scheduling/assignment methods in 2D and 3D multi/many-core systems. The first talk introduces an intra/inter-cores switching method in multi-core scheduling problem for efficient power saving under throughput constraint. The second talk studies thermal-pattern-aware task assignment for 3D multi-core processors, where hotspot is a critical issue, for improving reliability and lifetime. The third talk effectively combines logic solver and background theory solver to synthesize satisfiability modulo theories (SMT)-based systems.

Time	Label	Presentation Title Authors
17:00	4.3.1	AN ONLINE THERMAL-CONSTRAINED TASK SCHEDULER FOR 3D MULTI-CORE PROCESSORS Speakers: Chien-Hui Liao¹, Hung-Pin Wen¹ and Krishnendu Chakrabarty² ¹National Chiao Tung University, TW; ²Duke University, US Abstract Hotspots occur frequently in 3D multi-core processors (3D-MCPs) and they can adversely impact system reliability and lifetime. Moreover, frequent occurrences of hotspots lead to more dynamic voltage and frequency scaling (DVFS), resulting in degraded throughput. Therefore, a new thermal-constrained task scheduler based on thermal-pattern-aware voltage assignment (TPAVA) is proposed in this paper. By analyzing temperature profiles of different voltage assignments, TPAVA pre-emptively assigns different operating-voltage levels to cores for reducing temperature increase in 3D-MCPs. Moreover, the proposed task scheduler integrates a vertical-grouping voltage scaling (VGVS) strategy that considers thermal correlation in 3D-MCPs. Experimental results show that, compared with two previous methods, the proposed task scheduler can respectively lower hotspot occurrences by 47.13% and 53.91%, and improve throughput by 6.50% and 32.06%. As a result, TPAVA and VGVS are effectively for reducing occurrences of hotspots and optimizing throughput for 3D-MCPs under thermal constraints. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	4.3.2	A SYMBOLIC SYSTEM SYNTHESIS APPROACH FOR HARD REAL-TIME SYSTEMS BASED ON COORDINATED SMT-SOLVING Speakers: Alexander Biewer¹, Benjamin Andres², Jens Gladigau¹, Torsten Schaub² and Christian Haubelt³ ¹Robert Bosch GmbH, DE; ²University of Potsdam, DE; ³University of Rostock, DE Abstract We propose an SMT-based system synthesis approach where the logic solver performs static binding and routing while the background theory solver computes global time-triggered schedules. In contrast to previous work, we assign additional time to the logic solver in order to refine the binding and routing such that the background theory solver is more likely to find a feasible schedule within a reasonable amount of time. We show by experiments that this coordination of the two solvers results in a considerable reduction of the overall synthesis time. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	4.3.3	E-PIPELINE: ELASTIC HARDWARE/SOFTWARE PIPELINES ON A MANY-CORE FABRIC Speakers: Xi Zhang¹, Haris Javaid¹, Muhammad Shafique², Jorgen Peddersen¹, Joerg Henkel² and Sri Parameswaran¹ ¹University of New South Wales, AU; ²Karlsruhe Institute of Technology (KIT), DE Abstract On-chip many-core systems are expected to be in common use in the future. A set of homogeneous processors in a many-core system can be used to implement multiple pipelines which execute simultaneously. Pipelines of processors use varying numbers of cores when their workloads vary at run time. In this paper, we show how such a system executing multiple pipelines with varying workloads can be implemented. We further show how the system can switch cores within a pipeline (intra-elasticity) and between pipelines (inter-elasticity). The method is named E-pipeline, and is implemented and evaluated in a commercial tool suite. Compared to reference design methods with clock gating, E-pipeline achieves the same power savings, maintains the throughput to meet throughput constraints and reduces core usage by an average of 37.7%. The adaptation overhead for switching cores is approximately 2 us. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session

4.4 Exploring Reliability and Efficiency Tradeoffs at the Architectural Level

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Chartreuse

Chair:
Todd Austin, University of Michigan, US

Co-Chair:
Gunar Schirner, Northeastern University, US

This session targets architectural solutions for energy-efficient and reliable memories and processors.

Time	Label	Presentation Title Authors
17:00	4.4.1	SOFT-ERROR RELIABILITY AND POWER CO-OPTIMIZATION FOR GPGPUS REGISTER FILE USING RESISTIVE MEMORY Speakers: Jingweijia Tan¹, Zhi Li² and Xin Fu¹ ¹University of Houston, US; ²University of Kansas, US Abstract The increasing adoption of graphics processing units (GPUs) for high-performance computing raises the reliability challenge, which is generally ignored in traditional GPUs. GPUs usually support thousands of parallel threads and require a sizable register file. Such large register file is highly susceptible to soft errors and power-hungry. Although ECC has been adopted to register file in modern GPUs, it causes considerable power overhead, which further increases the power stress. Thus, an energy-efficient soft-error protection mechanism is more desirable. Besides its extremely low leakage power consumption, resistive memory (e.g. spin-transfer torque RAM) is also immune to the radiation induced soft errors due to its magnetic field based storage. In this paper, we propose to LEverage reSistive memory to enhance the Soft-error robustness and reduce the power consumption (LESS) of registers in the General-Purpose computing on GPUs (GPGPUs). Since resistive memory experiences longer write latency compared to SRAM, we explore the unique characteristics of GPGPU applications to obtain the win-win gains: achieving the near-full soft-error protection for the register file, and meanwhile substantially reducing the energy consumption with negligible performance loss. Our experimental results show that LESS is able to mitigate the registers soft-error vulnerability by 86% and achieve 60% energy savings with negligible (e.g. 4%) performance loss. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	4.4.2	ENERGY-EFFICIENT CACHE DESIGN IN EMERGING MOBILE PLATFORMS: THE IMPLICATIONS AND OPTIMIZATIONS Speakers: Kaige Yan and Xin Fu, University of Houston, US Abstract Mobile devices are quickly becoming the most widely used processors in consumer devices. Since their major power supply is battery, the energy-efficient computing is highly desired. In this paper, we focus on the energy-efficient cache design in emerging mobile platforms. We observe that more than 40% of L2 cache accesses are OS kernel accesses in interactive smartphone applications. Such frequent kernel accesses cause serious interferences between the user and kernel blocks in the L2 cache, leading to the unnecessary block replacements and high L2 cache miss rate. We propose to partition the L2 cache into two separate segments which can only be accessed by the user code and kernel code, respectively. Meanwhile, the overall size of the two segments is shrunk, which greatly reduces the energy consumption by 15% while still maintains the similar cache miss rate. We further find completely different access behaviors between the two separated kernel and user segments in our novel L2 cache design, and explore the multi-retention STT-RAM based user and kernel segments to maximize the cache energy savings. The experimental results show that our techniques significantly reduce the cache energy consumption (e.g. 75%) with only 2% performance loss in emerging smartphones. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	4.4.3	EXPLOITING DYNAMIC TIMING MARGINS IN MICROPROCESSORS FOR FREQUENCY-OVER-SCALING WITH INSTRUCTION-BASED CLOCK ADJUSTMENT Speakers: Jeremy Constantin¹, Lai Wang², Georgios Karakonstantis³, Anupam Chattopadhyay² and Andreas Burg¹ ¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²RWTH Aachen, DE; ³Queen's University, GB Abstract Static timing analysis provides the basis for setting the clock period of a microprocessor core, based on its worst-case critical path. However, depending on the design, this critical path is not always excited and therefore dynamic timing margins exist that can theoretically be exploited for the benefit of better speed or lower power consumption (through voltage scaling). This paper introduces predictive instruction-based dynamic clock adjustment as a technique to trim dynamic timing margins in pipelined microprocessors. To this end, we exploit the different timing requirements for individual instructions during the dynamically varying program execution flow without the need for complex circuit-level measures to detect and correct timing violations. We provide a design flow to extract the dynamic timing information for the design using post-layout dynamic timing analysis and we integrate the results into a custom cycle-accurate simulator. This simulator allows annotation of individual instructions with their impact on timing (in each pipeline stage) and rapidly derives the overall code execution time for complex benchmarks. The design methodology is illustrated at the microarchitecture level, demonstrating the performance and power gains possible on a 6-stage OpenRISC in-order general purpose processor core in a 28 nm CMOS technology. We show that employing instruction-dependent dynamic clock adjustment leads on average to an increase in operating speed by 38% or to a reduction in power consumption by 24%, compared to traditional synchronous clocking, which at all times has to respect the worst-case timing identified through static timing analysis. Download Paper (PDF; Only available from the DATE venue WiFi)
18:15	4.4.4	VARIABILITY-AWARE DARK SILICON MANAGEMENT IN ON-CHIP MANY-CORE SYSTEMS Speakers: Muhammad Shafique¹, Dennis Gnad¹, Siddharth Garg² and Joerg Henkel¹ ¹Karlsruhe Institute of Technology (KIT), DE; ²University of Waterloo, CA Abstract Dark Silicon refers to the constraint that only a fraction of on-chip resources (cores) can be simultaneously powered-on (running at full performance) in order to stay within the allowable power budget and safe temperature limits, while others remain 'dark'. In this paper, we demonstrate how these 'dark cores' can be leveraged to improve the temperature profile at run-time, thus providing opportunities to power-on more cores at the nominal voltage than the number allowed when strictly obeying the conventional Thermal Design Power (TDP) constraint. In this paper, we propose a computationally efficient dark silicon management technique that determines the best set of cores to keep dark and the mapping of threads to cores at run-time, while also accounting for the impact of process variations. We have developed a light-weight temperature prediction mechanism that determines the impact of different candidate solutions on the chip thermal profile. Experimental evaluation of the proposed techniques on a simulated 8×8 many-core processor, and across a range of chips to account for process variations, show that the total instruction throughput is increased by 1.8× on average while keeping the temperature within the safe limits, when compared with state-of-the-art approaches. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session

4.5 Industrial Test and Validation Experiments

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Meije

Chair:
Dan Alexandescu, iRoC, FR

Co-Chair:
Emmanuel Simeu, TIMA, FR

This session introduces test and validation industrial experiments. Each experiment addresses the challenges of system validation and test and shows lessons learned from industry

Time	Label	Presentation Title Authors
17:00	4.5.1	SYSTEMATIC APPLICATION OF THE ISO 26262 ON A SEOOC SUPPORT BY APPLYING A SYSTEMATIC REUSE APPROACH Speakers: Alejandra Ruiz¹, Alberto Melzi² and Tim Kelly³ ¹TECNALIA, ES; ²Centro Ricerche FIAT, IT; ³University of York, GB Abstract Automotive domain is suffering a huge transformation on the sector. The full electric vehicle is playing a role in updating the electronic systems on the car. The Electric parking system is one of those systems that are being evolving. On the other hand the entrance of the ISO 26262 [1] functional safety standard has impacted on the automotive designs. The ISO 26262 does include The Safety Element out of Context (SEooC) on the standard. However it does not mention how to follow a systematic process in order to apply the SEooC. On this paper we present our experience on the application of the SEooC concept from the ISO 26262 to an Electric parking system. We have followed a systematic approach that takes into account the needs for a safe reuse of the elements into the whole vehicle. Download Paper (PDF; Only available from the DATE venue WiFi)
17:15	4.5.2	TIMING ANALYSIS OF AN AVIONICS CASE STUDY ON COMPLEX HARDWARE/SOFTWARE PLATFORMS Speakers: Franck Wartel¹, Leonidas Kosmidis², Adriana Gogonel³, Andrea Baldovin⁴, Zoe Stephenson⁵, Benoit Triquet¹, Eduardo Quinones⁶, Code Lo³, Enrico Mezzetti⁷, Ian Broster⁵, Jaume Abella⁸, Liliana Cucu-Grosjean³, Tullio Vardanega⁴ and Francisco Cazorla⁹ ¹Airbus, FR; ²Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; ³INRIA, FR; ⁴University of Padova, IT; ⁵Rapita Systems, Ltd., GB; ⁶Barcelona Supercomputing Center, ES; ⁷University of Padua, IT; ⁸Barcelona Supercomputing Center (BSC-CNS), ES; ⁹Barcelona Supercomputing Center and IIIA-CSIC, ES Abstract Probabilistic Timing Analysis (PTA) in general and its measurement-based variant called MBPTA in particular have been shown to facilitate the estimation of the worst-case execution time (WCET). MBPTA relies on specific hardware and software support to randomise and/or upper bound a number of sources of execution time variation to drastically reduce the need for user-provided information, thus replacing uncertainty by probabilities. MBPTA has been shown effective in real case studies for specific single-core processor designs. However, some other hardware features and the advent of multicores challenge MBPTA application in industrial-size programs. While solutions to those challenges have been proven on benchmarks, they have not been proven yet on real-world applications, whose timing analysis is far more challenging than that of simple benchmarks. This paper discusses the application of MBPTA to a real avionics system in the context of (1) software-only single-core solutions and (2) hardware-only multicore solutions with an ARINC 653 operating system. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	4.5.3	SILICON PROOF OF THE INTELLIGENT ANALOG IP DESIGN FLOW FOR FLEXIBLE AUTOMOTIVE COMPONENTS Speakers: Torsten Reich, H. D. Benjamin Prautsch, Uwe Eichler and René Buhl, Fraunhofer Institute for Integrated Circuits IIS, Design Automation Division EAS, DE Abstract In this brief paper we present the successful silicon validation of the Intelligent Analog IP (IIP) design flow applied to the design of a SMART sensor IC for automotive requirements. Using a library of reconfigurable and robust analog IP we fast create parameterized cells up to high complexity levels including the corresponding layouts. This allows us (1) to overcome time-consuming handcrafted analog re-design cycles, (2) to include the effects of layout parasitics into the optimization loop, and thus (3) to fast achieve different specifications even for multiple technologies. We show that the IIP design flow leads to a strong improvement of design efficiency, silicon performance, and yield. Download Paper (PDF; Only available from the DATE venue WiFi)
17:45	4.5.4	FAST OPTICAL SIMULATION FROM A REDUCED SET OF IMPULSE RESPONSES USING SYSTEMC AMS Speakers: Fabien Teysseyre¹, David Navarro¹, Ian O'Connor¹, Francesco Cascio², Fabio Cenni² and Olivier Guillaume² ¹École Centrale de Lyon, FR; ²STMicroelectronic, FR Abstract In this paper we propose a methodology to simulate the optical filtering system of a camera module with limited access to proprietary data. The target of the simulation is the virtual prototyping of the overall camera module for a fine tuning of the auto-focus mechanism. For the optical system modeling, the methodology is based on the usage of some point spread functions (PSFs). The use of the full set of PSFs is computationally costly and memory space consuming hence compromising the usability of the optical model in the full system virtual prototyping. To improve the model execution time, PSFs interpolation and free-space propagation techniques are used: they allow reducing the sampling space with minimal impact on the accuracy of the model (sharpness error less than 2%). The total speed-up gain with respect to the standard non-optimized model is provided by two contributors. First, the interpolation technique leads to a speed-up linked to the PSFs number reduction. Second, the caching of computationally intense processes enables speed-up scaling with the number of frames. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	4.5.5	DESIGNER-LEVEL VERIFICATION -- AN INDUSTRIAL EXPERIENCE STORY Speakers: Stephen Bergman¹, Gabor Bobok¹, Walter Kowalski¹, Shlomit Koyfman², Shiri Moran², Ziv Nevo², Avigail Orni², Viresh Paruthi¹, Wolfgang Roesner¹, Gil Shurek² and Vasantha Vuyyuru¹ ¹IBM, US; ²IBM, IL Abstract Designer-level verification (DLV) is now widely accepted as a necessary practice in the hardware industry. More than ever, logic designers are held responsible for the initial validation of modules they develop, before these are released to systematic verification. DLV requires specific tools and methods adapted for designers, who are not full-time verification experts. We present user experience stories and usage statistics, describing how DLV has been practiced in our company, using a dedicated tool developed for this purpose. A typical pattern that emerges is of designers devoting short, fragmented time periods to DLV work, interleaved with other logic development tasks. We observe that the deployed DLV tool supports this mode of work, since it is simple and intuitive. This demonstrates that a suitable tool can help DLV become an integral part of a logic design project. Download Paper (PDF; Only available from the DATE venue WiFi)
18:15	4.5.6	MINIMUM CURRENT CONSUMPTION TRANSITION TIME OPTIMIZATION METHODOLOGY FOR LOW POWER CTS Speaker: Vibhu Sharma, NXP Research, NL Abstract The clock tree network can consume up to 40% of the power budget and is one of the limiting factors for realizing low power designs. This paper presents a novel clock transition time optimization based low power clock tree synthesis, for the non-throughput constraint designs. The proposed methodology quantifies the dependence of short circuit and switching power of the buffers on the input clock transition time, with the newly defined "weighted current strength" parameter. The reduction in the weighted current strength parameter value directly maps into the reduction in the total dynamic power of the clock tree. The proposed methodology determines the transition time constraint values for the clock signals which result in the minimum weighted current strength for the synthesized clock tree network. This technique results in up to 34% reduction in the dynamic power of the clock tree network with the existing clock tree synthesis tools and the clock tree library. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session

4.6 Online Testing and Reliable Memories

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Bayard

Chair:
Mihalis Psarakis, University of Piraeus, GR

Co-Chair:
Cristiana Bolchini, Politecnico di Milano, IT

Temperature- and power-aware solutions are proposed for self- and on-line testing, together with innovative fault detection and reconfiguration schemes for caches and emerging memory technologies

Time	Label	Presentation Title Authors
17:00	4.6.1	A DEFECT-AWARE RECONFIGURABLE CACHE ARCHITECTURE FOR LOW-VCCMIN DVFS-ENABLED SYSTEMS Speakers: Michail Mavropoulos, Georgios Keramidas and Dimitris Nikolos, University of Patras, GR Abstract As process technology continues to shrink, due to manufacturing defects and process variations, a large number of bitcells in on-chip caches is expected to be faulty. The number of defective cells varies from die-to-die, wafer-to-wafer, and in the field of application depends on the run-time operating conditions (e.g., supply voltage and frequency). Those trends necessitate i) to study fault-tolerant (FT) cache mechanisms in a wide spectrum of fault- probabilities and ii) to devise appropriate FT cache techniques that must be able to adapt their fault tolerance capacity to the volume of defective locations of the target faulty caches. It is well known that keeping the cache capacity, block size and the volume of defective cells constant, the average number of misses due to faulty cells in general decreases as the associativity of the cache increases. To this end we propose DARCA, a Defect-Aware Reconfigurable Cache Architecture, which is equipped with the ability of dynamically varying its associativity according to the volume of the defective cells. To keep the hardware overhead very small, as the associativity of the cache is multiplied by a power of two, its block size is divided by the same number. Since almost all contemporary processors use prefetching, we also applied DARCA to prefetch-assisted caches. By performing cycle-accurate simulations for the SPEC2006 benchmark suite and assuming a plethora of fault maps and a wide range of fault-probabilities we showed that DARCA compares favorably against several already known FT cache mechanisms with respect to the performance loss caused by defective cells. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	4.6.2	TEMPERATURE-AWARE SOFTWARE-BASED SELF-TESTING FOR DELAY FAULTS Speakers: Ying Zhang¹, Zebo Peng², Jianhui Jiang³, Huawei Li⁴ and Masahiro Fujita⁵ ¹Tongji University, Shanghai, China, CN; ²Embedded Systems Lab, Linköping University, SE; ³School of Software Engineering, Tongji University, CN; ⁴Institute of Computing Technology, Chinese Academy of Sciences, CN; ⁵VLSI Design and Education Center, University of Tokyo, JP Abstract Delay defects under high temperature have been one of the most critical factors to affect the reliability of computer systems, and the current test methods don't address this problem properly. In this paper, temperature-aware software-based self-testing (SBST) technique is proposed to self-heat the processors within a high temperature range and effectively test delay faults under high temperature. First, it automatically generates high-quality test programs through automatic test instruction generation (ATIG), and avoids over-testing caused by nonfunctional patterns. Second, it exploits two effective power-intensive program transformations to self-heat up the processors internally. Third, it applies a greedy algorithm to search the optimized schedule of the test templates in order to generate the test program while making sure that the temperature of the processor under test is within the specified range. Experimental results show that the generated program is successful to guarantee delay test within the given temperature range, and achieves high test performance with functional patterns. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	4.6.3	OPERATIONAL FAULT DETECTION AND MONITORING OF A MEMRISTOR-BASED LUT Speakers: Nandha kumar Thulasiraman¹, Haider A.F. Almurib¹ and Fabrizio Lombardi² ¹The University of Nottingham, MY; ²Northeastern University, US Abstract This paper presents a method for operational testing of a memristor-based memory look-up table (LUT). In the proposed method, the deterioration of the memristors (as storage elements of a LUT) is modeled based on the reduction of the resistance range as observed in fabricated devices and recently reported in the technical literature. A quiescent current technique is used for testing the memristors when deterioration results in a change of state, thus leading to an erroneous (faulty) operation. An equivalent circuit model of the operational deterioration for a memristor-based LUT is presented. In addition to modeling and testing, the proposed method can be utilized also for continuous monitoring of the LUT in the presence of memristor deterioration in the LUT. The proposed method is assessed using LTSPICE; extensive simulation results are presented with respect to different operational features, such as LUT dimension and range of resistance. These results show that the proposed test method is scalable with LUT dimension and highly efficient for testing and monitoring a LUT in the presence of deteriorating multiple memristors. Download Paper (PDF; Only available from the DATE venue WiFi)
18:15	4.6.4	POWER-AWARE ONLINE TESTING OF MANYCORE SYSTEMS IN THE DARK SILICON ERA Speakers: Mohammad-Hashem Haghbayan¹, Amir-Mohammad Rahmani¹, Mohammad Fattah¹, Pasi Liljeberg¹, Juha Plosila¹, Hannu Tenhunen² and Zainalabedin Navabi³ ¹University of Turku, FI; ²KTH Royal Institute of Technology, SE; ³Worcester Polytechnic Institute, US Abstract Online defect screening techniques to detect run-time faults are becoming a necessity in current and near future technologies. At the same time, due to aggressive technology scaling into the nanometer regime, power consumption is becoming a significant burden. Most of today's chips employ advanced power management features to monitor the power consumption and apply dynamic power budgeting (i.e., capping) accordingly to prevent over-heating of the chip. Given the notable power dissipation of existing testing methods, one needs to efficiently manage the power budget to cover test process of a manycore system in runtime. In this paper, we propose a power-aware online testing method for many-core systems benefiting from advanced power management capabilities. The proposed power-aware method uses non-intrusive online test scheduling strategy to functionally test the cores in their idle period. In addition, we propose a test-aware utilization-oriented runtime mapping technique that considers the utilization of cores and their test criticality in the mapping process. Our extensive experimental results reveal that the proposed power-aware online testing approach can efficiently utilize temporarily free resources and available power budget for the testing purposes, within less than 1% penalty for the 16nm technology. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP2-2, 940	ON-LINE PREDICTION OF NBTI-INDUCED AGING RATES Speakers: Rafal Baranowski¹, Farshad Firouzi², Saman Kiamehr², Chang Liu¹, Hans-Joachim Wunderlich¹ and Mehdi Tahoori² ¹Stuttgart University, DE; ²Karlsruhe Institute of Technology (KIT), DE Abstract Nanoscale technologies are increasingly susceptible to aging processes such as Negative-Bias Temperature Instability (NBTI) which undermine the reliability of VLSI systems. Existing monitoring techniques can detect the violation of safety margins and hence make the prediction of an imminent failure possible. However, since such techniques can only detect measurable degradation effects which appear after a relatively long period of system operation, they are not well suited to early aging prediction and proactive aging alleviation. This work presents a novel method for the monitoring of NBTI-induced degradation rate in digital circuits. It enables the timely adoption of proper mitigation techniques that reduce the impact of aging. The proposed method employs machine learning techniques to find a small set of so called Representative Critical Gates (RCG), the workload of which is correlated with the degradation of the entire circuit. The workload of RCGs is observed in hardware using so called workload monitors. The output of the workload monitors is evaluated on-line to predict system degradation experienced within a configurable (short) period of time, e.g. a fraction of a second. Experimental results show that the proposed monitors predict the degradation rate with an average error of only 3% at less than 2.4% area overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session

4.7 How Resilient Are Emerging Technologies?

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Les Bans

Chair:
Vikas Chandra, ARM, US

Co-Chair:
Mehdi Tahoori, Karlsruhe Institute of Technology, DE

Many new technologies are being proposed as alternatives to conventional CMOS design. Resiliency, including robustness, reliability and fault modeling, will be a key factor in their success. This session includes results on several of these, as well as IP presentations on two others.

Time	Label	Presentation Title Authors
17:00	4.7.1	(Best Paper Award Candidate) DIGITAL CIRCUITS RELIABILITY WITH IN-SITU MONITORS IN 28NM FULLY DEPLETED SOI Speakers: Marine Saliva¹, Florian Cacho¹, Vincent Huard¹, Xavier Federspiel¹, Damien Angot¹, Ahmed Benhassain¹, Alain Bravaix² and Lorena Anghel³ ¹STMicroelectronics, FR; ²IM2NP-ISEN, FR; ³TIMA, FR Abstract Aging induced degradation mechanisms occurring in digital circuits are of a greater importance in the latest technologies. Monotonous degradation such as Bias Temperature Instability (BTI) or Hot Carrier Injection (HCI) but also sudden degradation such as Dielectric Breakdown (DB) are identified as the major sources of reliability hazard. The impact of these phenomena on the digital circuits is usually observed in terms of timing degradations and thus it may result in setup/hold violation. In this paper we will focus on the impact of aging related degradation mechanisms on timing. In-situ monitor is a promising strategy to measure timing slacks and to provide pre-error warning prior timing violation. A dedicated structure has been developed to measure and benchmark the behaviors of different monitors. The technology used for the test structure and in-situ monitors are processed in 28nm Fully Depleted SOI. The designs of monitors are mostly based on delay elements. Three types of delays are proposed in this paper: flip-flop's Master delay, Buffers delay and Passive delay. In addition, we investigate the impact of global and local variations on the accuracy of the measurements by providing complete monitors characterization. Experimental results show a good agreement with SPICE simulation. Finally the proposed in-situ monitors will be compared and their applications to circuit aging prediction will be discussed. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	4.7.2	READ/WRITE ROBUSTNESS ESTIMATION METRICS FOR SPIN TRANSFER TORQUE (STT) MRAM CELL Speakers: Elena Ioana Vatajelu¹, Rosa Rodriguez-Montañés², Marco Indaco¹, Michel Renovell³, Paolo Prinetto¹ and Joan Figueras² ¹Politecnico di Torino, IT; ²Universitat Politecnica de Catalunya, ES; ³LIRMM-CNRS, Abstract — The rapid development of low power, high density, high performance SoCs has pushed the embedded memories to their limits and opened the field to the development of emerging memory technologies. The Spin-Transfer-Torque Magnetic Random Access Memory (STT-MRAM) has emerged as a promising choice for embedded memories due to its reduced read/write latency and high CMOS integration capability. Under today aggressive technology scaling requirements, the STT-MRAM is affected by process variability making robustness evaluation an important concern. In this paper, we provide new metrics for robustness prediction of an STT-MRAM memory cell. Independent Robustness Margin metrics are defined for Read Operation and Write Operation based on the electrical characteristics of the memory cell and the fabrication induced variability. These metrics are used to estimate the extreme parameter variation causing the cell failure, Current Noise Margins and the Failure Probability of the STT-MRAM cell. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	4.7.3	FAULT MODELING IN CONTROLLABLE POLARITY SILICON NANOWIRE CIRCUITS Speakers: Hassan Ghasemzadeh Mohammadi¹, Pierre-Emmanuel Gaillardon² and Giovanni De Micheli¹ ¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²École Polytechnique Fédérale de Lausanne (EPFL), Abstract Controllable polarity silicon nanowire transistors are among the promising candidates to replace current CMOS in the near future owing to their superior electrostatic characteristics and advanced functionalities. From a circuit testing point of view, it is unclear if the current CMOS and FinFET fault models are comprehensive enough to model all defects of controllable polarity nanowires. In this paper, we deal with the above problem using inductive fault analysis on three-independent-gate silicon nanowire FETs. Simulations revealed that the current fault models, i.e. stuck-open faults, are insufficient to cover all modes of operation. The newly introduced test algorithm for stuck open can adequately capture the malfunction behavior of controllable polarity logic gates in the presence of nanowire break and bridge on polarity terminals. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP2-3, 849	RETRAINING BASED TIMING ERROR MITIGATION FOR HARDWARE NEURAL NETWORKS Speakers: Jiachao Deng¹, Yuntan Fang¹, Zidong Du¹, Ying Wang¹, Huawei Li¹, Olivier Temam², Paolo Ienne³, David Novo³, Xiaowei Li¹, Yunji Chen¹ and Chengyong Wu¹ ¹State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China †University of Chinese Academy of Sciences, Beijing, China, CN; ²INRIA Saclay, France, FR; ³École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract Recently, neural network (NN) accelerators are gaining popularity as part of future heterogeneous multi-core architectures due to their broad application scope and excellent energy efficiency. Additionally, since neural networks can be retrained, they are inherently resillient to errors and noises. Prior work has utilized the error tolerance feature to design approximate neural network circuits or tolerate logical faults. However, besides high-level faults or noises, timing errors induced by delay faults, process variations, aging, etc. are dominating the reliability of NN accelerator under nanoscale manufacturing process. In this paper, we leverage the error resiliency of neural network to mitigate timing errors in NN accelerators. Specifically, when timing errors significantly affect the output results, we propose to retrain the accelerators to update their weights, thus circumventing critical timing errors. Experimental results show that timing errors in NN accelerators can be well tamed for different applications. Download Paper (PDF; Only available from the DATE venue WiFi)
18:31	IP2-4, 269	DICTIONARY-BASED SPARSE REPRESENTATION FOR RESOLUTION IMPROVEMENT IN LASER VOLTAGE IMAGING OF CMOS INTEGRATED CIRCUITS Speakers: Tenzile Berkin Cilingiroglu, Mahmoud Zangeneh, Aydan Uyar, W. Clem Karl, Janusz Konrad, Ajay Joshi, Bennett B. Goldberg and M. Selim Unlu, Boston University, US Abstract The rapid decrease in the dimensions of integrated circuits with a simultaneous increase in component density have introduced resolution challenges for optical failure analysis tech- niques. Although optical microscopy efforts continue to increase resolution of optical systems through hardware modifications, signal processing methods are essential to complement these efforts to meet the resolution requirements for the nanoscale integrated circuit technologies. In this work, we focus on laser voltage imaging as the optical failure analysis technique and show how an overcomplete dictionary-based sparse representation can improve resolution and localization accuracy. We describe a reconstruction approach based on this sparse representation and validate its performance on simulated data. We achieve an 80% reduction of the localization error. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session

4.8 Strength by Interdisciplinary Research: The Cadence Academic Network

Date: Tuesday 10 March 2015
Time: 17:00 - 18:30
Location / Room: Salle Lesdiguières

Organiser:
Patrick Haspel, Cadence Academic Network, US

Chair:
Jürgen Haase, edacentrum, DE

The Academic Network was launched by Cadence in 2007. The aim was to promote the proliferation of leading-edge technologies and methodologies at universities renowned for their engineering and design excellence. A knowledge network among selected universities, research institutes, industry advisors and Cadence was established to facilitate the sharing of technology expertise in the areas of verification, design and implementation of microelectronic systems.

Specific examples of research directions in the cadence academic network will be given in three talks.

Time	Label	Presentation Title Authors
17:00	4.8.1	INTRODUCTION TO THE ACADEMIC NETWORK Speaker: Patrick Haspel, Cadence Academic Network, US
17:15	4.8.2	DEPENDABILITY AND DESIGN-FOR-TESTABILITY Speaker: Said Hamdioui, Delft University of Technology, NL
17:40	4.8.3	DIGITAL SYSTEM DESIGN Speaker: Mladen Berekovic, TU Braunschweig, DE
18:05	4.8.4	SYSTEM LEVEL DEVELOPMENT USING VIRTUAL PROTOTYPING Speakers: Michael Hübner and Diana Goehringer, Ruhr-University Bochum, DE
18:30		End of session

UB04 Session 4

Date: Tuesday 10 March 2015
Time: 17:30 - 19:30
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB04.1	4-LOOP: 4-CORE LEON 3 WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM Presenters: Giacomo Valente, Vittoriano Muttillo and Andrea Moro, University of L'Aquila, IT Authors: Vittoriano Muttillo and Fabio Federici, Abstract Multi-processor SoC based on soft-cores are increasing the range of applications that could be implemented by exploiting FPGAs. In this context, this demo presents a symmetric multi-processor system, composed of four Leon 3 cores and a custom Linux kernel, able to execute OpenMP-based applications and enhanced with a hardware profiling system. OpenMP support and the novel profiling system are the results of R&D activities conducted by several students and Professors at University of L'Aquila. More information ...
UB04.2	THE Ψ-CHART DESIGN APPROACH IN TTOOL/DIPLODOCUS: A FRAMEWORK FOR HW/SW CO-DESIGN OF DATA-DOMINATED SYSTEMS-ON-CHIP Presenter: Andrea Enrici, Télécom ParisTech, FR Authors: Ludovic Apvrille, Daniel Camara and Renaud Pacalet, Télécom ParisTech, FR Abstract In the scope of the DATE 2015 University Booth, we present our latest achievements for the system level design of parallel and distributed embedded systems. We propose a demonstration of a novel design approach, the Ψ-chart, in TTool/DIPLODOCUS, a UML/SysML framework for the design, validation and automatic code generation for data-dominated SoCs. The Ψ-chart is a design approach where communication patterns are designed with dedicated models, independently of a pair application-architecture, before mapping phase. It allows for a complete orthogonalization of concerns between the design of computations and communications, thus achieving faster Design Space Exploration, complete design portability as well as reduced design times and costs. The subject of our demonstration is the design of the physical layer (PHY) of the transmitter part of the Zigbee wireless standard (IEEE 802.15.4) mapped onto a MPSoC architecture with shared memory. Our demonstration will illustrate the full design of the Zigbee transmitter, from models to the automatic generation of the emulation code, via simulation and formal verification. We will validate our design by comparing the output samples produced by the emulation code, with a real implementation of the transmitter on a FPGA prototyping board. More information ...
UB04.3	WHERE IS IT? FIND THE CODE YOU ARE INTERESTED IN! Presenter: Jan Malburg, University of Bremen, DE Author: Görschwin Fey, University of Bremen / German Aerospace Center, DE Abstract The demonstration presents our tool for feature localization and debugging of RTL-designs. Feature localization helps a designer to find the code relevant for a certain feature and, thus, helps him to faster understand a design previously unknown to him. The developer can choose between three basic techniques for feature localization. In the area of debugging the tools allows fault localization, reverse debugging based on dynamic data- and control-flow of the design and dynamic slicing. More information ...
UB04.4	HIPER-NIRGAM: A TOOL CHAIN BASED FRAMEWORK FOR MODELLING THERMAL - AWARE RELIABILITY ESTIMATION IN 2D MESH NOCS Presenter: Ashish Sharma, Malaviya National Institute of Technology, Jaipur, IN Authors: Manoj Singh Gaur¹, Lava Bhargava¹, Vijay Laxmi¹ and Mark Zwolinski² ¹Malaviya National Institute of Technology, Jaipur, IN; ²University of Southampton, GB Abstract Every three years, power density in system-on-chip (SoCs) gets doubled. As the semiconductor technology is scaling, the number of cores and interconnect network connections are increasing. To improve system performance while meeting permissible power limits, Chip-Multi Processors (CMPs) and many-core processors have emerged as an appealing solution. One of the significant aspects of many-core design is an on chip interconnect network that can effectively support intra-core and inter-core communications. This interconnect should be scalable, support high communication bandwidth and multiple concurrent connections among cores. Network-on-chip (NoC) replaces the traditional bus based interconnect architecture as former is scalable, has higher bandwidth, fault tolerance and offers parallelism. Regular NoC topologies improve scalability too. Adaptive NoC routing solutions distribute power densities and delay onset of hotspot creation. With ever-growing demand of computation and communication bandwidth by applications, the system designer need to consider and address resultant power and thermal issues in SoC as well as NoC design. Design tools need to incorporate thermal effects in design and evaluation of prototypes. Abstract--- Regional temperature differential and hotspots are two thermal problems in network-on-chip. On-chip thermal problems have an adverse impact on system performance and reliability. We propose creation of a toolchain based framework for incorporating thermal evaluation of NoC through existing simulation tools. Our proposed framework provides an integration of NoC simulator with power and thermal simulation models for analyzing the thermal hotspots and can be used for thermal-aware reliability estimation. In our framework, reliability estimation is based on life time failure models such as TDDB (Time dependent dielectric breakdown), NBTI (Negative bias temperature instability) and SM (Stress Migration). In our proposed reliability measurement is based on MTTF (Mean time to failure) comparative value. Our tool chain consists NIRGAM as a NoC simulator, NoC configuration parameters such as number of virtual channel, buffer size, routing logic, simulation cycles and application traffic are passed to power models (Orion 2.0 and McPAT). Power models provide the power trace and area of given NoC configuration. The power model results are further used in Hotspot 5.02 [HOTSPOT] thermal simulation model for generating floorplan and temperature trace (steady temperature file). The steady temperature trace used in reliability estimation tool REST [REST_tool] to estimating MTTF vales. Abstract--- We believe that this generic framework can be used by researchers on academia and industry to incorporate thermal-aware reliability estimation in their design exploration. More information ...
UB04.5	NUMERICAL METHODS FOR EFFICIENT SIMULATIONS OF CIRCUITS WITH SEPARATED TIME SCALES Presenter: Genie Hsieh, Sandia National Laboratories, US Abstract Circuit simulations can support to analyze and predict the performance, safety, and reliability of nuclear weapons and to certify their functionality. Small circuits with strong, nonlinear oscillations (i.e. circuits have separated fast/slow time scales; hereafter denoted by "fast/slow circuits") can make the computation time of even a single simulation unmanageable. These types of circuits are common in weapon systems. Many numerical methods are proposed to speedup such simulations by utilizing multiple time variables to efficiently represent circuit signals with widely separated rates of variation. However, weapon circuits possess complex behaviors that are shown to be the outstanding challenges in this research field. In this work, we develop novel numerical methods for fast/slow weapon circuit simulations and deliver significant simulation speedups to facilitate efficient weapon assurance. More information ...
UB04.7	AIDASOFT: ANALOG IC DESIGN AUTOMATION Presenter: Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT Authors: Nuno Lourenço¹, Ricardo Martins¹, Ricardo Póvoa¹, António Canelas¹, Ricardo Lourenço² and Pedro Ventura² ¹Instituto de Telecomunicações/Instituto Superior Técnico, PT; ²Instituto de Telecomunicações, PT Abstract This demo presents AIDA an ongoing project at Instituto de Telecomunicações/University of Lisbon, Portugal, which addresses analog IC design automation from circuit-level specifications to layout descriptions in GDS-II. AIDA consists of two main modules AIDA-C and AIDA-L. AIDA-C is demonstrated for layout-aware circuit-level sizing and optimization by generating a family of robust Pareto Optimal solutions. AIDA-L is demonstrated by generating the layout taking into account electrical currents information to mitigate electromigration and IR-drop effects, and also wiring symmetry for multiport multi-terminal signal nets of analog ICs. More information ...
UB04.8	XTSI: THE 3-D ELECTRO-THERMAL SIMULATOR Presenter: Jürgen Scheible, Reutlingen University, DE Author: Carl Christoph Jung, Reutlingen University, DE Abstract xtSi is a 3D electro-thermal simulation tool for integrated circuits. It uses a computationally efficient algorithm, which allows the simulation of typical ICs in only a few minutes. The temperature distribution is depicted graphically and with temporal resolution in a specially designed graphical user interface. With the help of xtSi designers can exactly identify isotherms and hotspots, thus enabling an optimization of the layout due to temperature effects. xtSi has been verified experimentally for device temperatures exceeding 500 °C up to the onset of thermal runaway. More information ...
UB04.9	BONDCALC: THE BOND CALCULATOR Presenter: Carl Christoph Jung, Reutlingen University, DE Authors: Christian Silber¹ and Juergen Scheible² ¹Robert Bosch GmbH, DE; ²Reutlingen University, DE Abstract The Bond Calculator is a fast and exact tool to help designers to choose a bond wire, which does not fuse. The Bond Calculator is orders of magnitude faster than FEM and Easy-to-use. The Bond Calculator helps designers to estimate the temperature at the bond connection itself, by calculating the time and space dependence of the power delivered from the bond wire to the chip. These temperature changes can affect the durability of the bond connection. The Bond Calculator uses a simplified simulation model to calculate the temperature profile in a bond wire from the induced current profile. This software tool has been validated by FEM and measurement. More information ...
19:30	End of session

Exhibition-Reception Exhibition Reception

Date: Tuesday 10 March 2015
Time: 18:30 - 19:30
Location / Room: Several serving points inside the Exhibition Area

The Exhibition Reception will take place on Tuesday, March 10, 2015, from 18:30 - 19:30 in the exhibition area of the congress center, where free drinks for all conference delegates and exhibition visitors will be offered. All exhibitors are welcome to also provide drinks and snacks for the attendees.

Time	Label	Presentation Title Authors
19:30		End of session

5.1 SPECIAL DAY Hot Topic: Applications of IoT

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Salle Oisans

Organisers:
Rolf Drechsler, University of Bremen/DFKI GmbH, DE
Ahmed Jerraya, CEA-Leti, FR

Chair:
Gabriela Nicolescu, Ecole Polytechnique Montreal, CA

Co-Chair:
Ahmed Jerraya, CEA-Leti, FR

Internet of things (IoT) applications is changing the landscape of the whole society and even non-traditional ICT intensive domains. More products in all market segments are emerging every day and is changing the way human and machines are interacting. This represents a great opportunity for innovators in industry and new vistas of research for academia. This session overview several application domains already impacted by this IoT wave.

Time	Label	Presentation Title Authors
08:30	5.1.1	IOT FOR HEALTHCARE Speaker: Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH
08:52	5.1.2	IOT FOR SMART HOME Speaker: Sylvain Paineau, Schneider Electric, FR
09:14	5.1.3	IOT FOR AUTOMOTIVE Speaker: Juergen Hornung, Robert Bosch GmbH, DE
09:36	5.1.4	IOT FOR SMART CITIES Speaker: Levent Gurgen, CEA/LETI, FR
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.2 Hardware Trojan and Active Implementation Attacks

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Belle Etoile

Chair:
Paolo Maistri, TIMA, FR

Co-Chair:
Viktor Fischer, Hubert Curien Laboratory, FR

This session proposes novel techniques to detect hardware Trojans inserted at gate level and presents improvements and novel targets for fault attacks.

Time	Label	Presentation Title Authors
08:30	5.2.1	IMPROVED PRACTICAL DIFFERENTIAL FAULT ANALYSIS OF GRAIN-128 Speakers: Prakash Dey¹, Abhishek Chakraborty², Avishek Adhikari¹ and Debdeep Mukhopadhyay² ¹Department of Pure Mathematics, University of Calcutta, Kolkata-700019, IN; ²Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur-721302, IN Abstract Differential Fault Attacks (DFA) on stream ciphers have been an active field of research. However, their practical realizations have not been reported in the public literature. Hence, the assumptions on the fault models made in the context of DFA for stream ciphers have not been studied. Furthermore, there have been few efforts reported on the popular stream cipher candidate, Grain-128. We consider a simple low-cost fault injection set-up, using clock glitches and show that in stream ciphers the critical path of the circuit affects few bit positions (the feedback bit for the Shift Registers in the stream ciphers). Thus the fault is often localized to single bit position, and because of the absence of required faulty ciphers makes existing theoretical DFAs invalid. In order to create multiple instance of faults, we use clock glitches to induce the fault, and then use the shifting property of the internal registers of Grain to create multiple instances of contiguously located faults. In parallel, we also develop a more relaxed DFA for Grain-128, to show that when the fault is k neighbourhood bits, $k in {1,..,5}$, the attack is successful to retrieve the key without knowing the locations or exact number of bits flipped by the internal fault. We also devise a technique for rejecting the bad faults with high probabilities, i.e., when the faults are not in the contiguous location as required in the attack. Combining the above attacks we demonstrate using a simple set-up via clock glitches that such faults can be practically obtained and analysed using the proposed attack algorithm to retrieve the key. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.2.2	A SCORE-BASED CLASSIFICATION METHOD FOR IDENTIFYING HARDWARE-TROJANS AT GATE-LEVEL NETLISTS Speakers: Masaru Oya, Youhua Shi, Masao Yanagisawa and Nozomu Togawa, Waseda University, JP Abstract Recently, digital ICs are often designed by outside vendors to reduce design costs in semiconductor industry, which may introduce severe risks that malicious attackers implement Hardware Trojans (HTs) on them. Since IC design phase generates only a single design result, an RT-level or gate-level netlist for example, we cannot assume an HT-free netlist or a Golden netlist and then it is too difficult to identify whether a generated netlist is HT-free or HT-inserted. In this paper, we propose a score-based classification method for identifying HT-free or HT-inserted gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but a net included in HTs, which is called Trojan net, instead. Firstly, we observe Trojan nets from several HT-inserted benchmarks and extract several their features. Secondly, we give it scores to extracted Trojan net features and sum up them for each net in benchmarks. Then we can find out a it score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks. Experimental results demonstrate that our method successfully identify all the HT-inserted gate-level benchmarks to be HT-inserted and all the HT-free gate-level benchmarks to be HT-free in approximately three hours for each benchmark. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.2.3	(Best Paper Award Candidate) HARDWARE TROJAN DETECTION FOR GATE-LEVEL ICS USING SIGNAL CORRELATION BASED CLUSTERING Speakers: Burcin Cakir and Sharad Malik, Princeton University, US Abstract Malicious tampering of the internal circuits of ICs can lead to detrimental results. Insertion of Trojan circuits may change system behavior, cause chip failure or send information to a third party. Trojans are hidden cleverly by the adversary to evade detection using typical pre-silicon verification and post-manufacturing testing. Therefore, the validation of chips to detect these has emerged as an important problem, particularly for safety-critical applications. This paper presents an information-theoretic approach for Trojan detection. It estimates the statistical correlation between the signals in a design, and explores how this estimation can be used in a clustering algorithm to detect the Trojan logic. The gate level circuit is modeled as a weighted graph. The edge weights are determined using correlations between the signal transitions at the inputs and outputs of a gate based on simulation data. These weights are used to compute a distance metric in a density-based clustering algorithm. This approach exploits the fact that Trojans have a stealthy nature. The nodes which are nearly unused and hence, have weak correlation with the rest of the circuit, are detected as outliers by this clustering method and flagged as suspicious. Compared with the other algorithms, our tool does not require extensive logic analysis. We neither need the circuit to be brought to the triggering state, nor the effect of the Trojan payload to be propagated and observed at the output. Instead we leverage already available simulation data in this information-theoretic approach. We conducted experiments on the TrustHub benchmarks to validate the practical efficacy of this approach. The results show that our tool can detect Trojan logic with up to 100% coverage with low false positive rates. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP2-5, 46	FAULT-BASED ATTACKS ON THE BEL-T BLOCK CIPHER FAMILY Speakers: Philipp Jovanovic and Ilia Polian, University of Passau, DE Abstract We present the first fault-based attack on the Bel-T block cipher family which has been adopted recently as a national standard of the Republic of Belarus. Our attack successfully recovers the secret key of the 128-bit, 192-bit and 256-bit versions of Bel-T using 4, 7 and 10 fault injections, respectively. We also show the results from our comprehensive simulation-based experiments. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.3 Variability Challenges in Nanoscale Circuits

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Stendhal

Chair:
Pablo Garcia del Valle, École Polytechnique Fédérale de Lausanne (EPFL), CH

Co-Chair:
Muhammad Shafique, Karlsruhe Institute of Technology, DE

This session proposes new techniques to address variability related challenges in nanoscale chips. The topics addressed include retention time variations in DRAM and variations in the power delivery network.

Time	Label	Presentation Title Authors
08:30	5.3.1	EXPLOITING DRAM RESTORE TIME VARIATIONS IN DEEP SUB-MICRON SCALING Speakers: Xianwei Zhang¹, Youtao Zhang¹, Bruce Childers¹ and Jun Yang² ¹Department of Computer Science, University of Pittsburgh, US; ²Electrical and Computer Engineering Department, University of Pittsburgh, US Abstract Recent studies reveal that one of the major challenges in scaling DRAM in deep sub-micron regime is its significant variations on cell restore time, which affects timing constraints such as write recovery time tWR. Adopting traditional approaches results in either low yield rate or large performance degradation. In this paper, we propose schemes to expose the variations to the architectural level. By constructing memory chunks with different accessing speeds and, in particular, exploiting the performance benefits of fast chunks, a variation-aware memory controller can effectively compensate the performance loss due to relaxed timing constraints. Our experimental results show that, comparing to traditional designs such as row sparing and ECC, the proposed schemes help to improve system performance by up to 10.3% and 12.9%, respectively, for 20nm and 14nm tech nodes on a 4-core multiprocessor system. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.3.2	ADAPTIVELY TOLERATE POWER-GATING-INDUCED POWER/GROUND NOISE UNDER PROCESS VARIATIONS Speakers: Zhe Wang, Xuan Wang, Jiang Xu, Xiaowen Wu, Zhehui Wang, Peng Yang, Luan H. K. Duong, Haoran Li, Rafael K. V. Maeda and Zhifei Wang, HKUST, HK Abstract Power gating is one of the most effective techniques to reduce the leakage power in multiprocessor system-on-chips (MPSoCs). However, the power-mode transition during the power gating period of an individual processing unit will introduce serious power/ground (P/G) noise to the neighboring processing units. As technology scales, the P/G noise problem becomes a severe reliability threat to MPSoCs. At the same time, the increasing manufacturing process variations also bring uncertainties to the P/G noise problem and make it difficult to predict and deal with. In order to address this problem, for the first time, this paper analyzes the power-gating-induced P/G noise in the presence of process variations, and proposes a hardware-software collaborated online method to adaptively protect processing units from P/G noise. Sensor network-on-chip (SENoC) is used to gather noise information and coordinate different system components. Meanwhile an online software-based algorithm is developed to effectively decide the noise impact range and arrange protections for affected processing units based on the collected information. We evaluate the proposed method through Monte Carlo simulations on a NoC-based MPSoC platform. The experimental results show that for a set of real applications, our method achieves on average 13.2% overall performance improvement and 13.3% system energy reduction compared with the traditional stop-go method. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.3.3	ENERGY VERSUS DATA INTEGRITY TRADE-OFFS IN EMBEDDED HIGH-DENSITY LOGIC COMPATIBLE DYNAMIC MEMORIES Speakers: Adam Teman¹, Georgios Karakonstantis², Robert Giterman³, Pascal Meinerzhagen⁴ and Andreas Burg¹ ¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²Queen's University, CH; ³Ben-Gurion University, IL; ⁴Intel Labs, US Abstract Current variation aware design methodologies, tuned for worst-case scenarios, are becoming increasingly pessimistic from the perspective of power and performance. A good example of such pessimism is setting the refresh rate of DRAMs according to the worst-case access statistics, thereby resulting in very frequent refresh cycles, which are responsible for the majority of the standby power consumption of these memories. However, such a high refresh rate may not be required, either due to extremely low probability of the actual occurrence of such a worst-case, or due to the inherent error resilient nature of many applications that can tolerate a certain number of potential failures. In this paper, we exploit and quantify the possibilities that exist in dynamic memory design by shifting to the so-called approximate computing paradigm in order to save power and enhance yield at no cost. The statistical characteristics of the retention time in dynamic memories were revealed by studying a fabricated 2kb CMOS compatible embedded DRAM (eDRAM) memory array based on gain-cells. Measurements show that up to 73% of the retention power can be saved by altering the refresh time and setting it such that a small number of failures is allowed. We show that these savings can be further increased by utilizing known circuit techniques, such as body biasing, which can help, not only in extending, but also in preferably shaping the retention time distribution. Our approach is one of the first attempts to access the data integrity and energy trade-offs achieved in eDRAMs for utilizing them in error resilient applications and can prove helpful in the anticipated shift to approximate computing. Download Paper (PDF; Only available from the DATE venue WiFi)
09:45	5.3.4	RETENTION TIME MEASUREMENTS AND MODELLING OF BIT ERROR RATES OF WIDE-I/O DRAM IN MPSOCS Speakers: Christian Weis¹, Matthias Jung¹, Peter Ehses¹, Cristiano Santos², Pascal Vivet³, Sven Goossens⁴, Martijn Koedam⁴ and Norbert Wehn¹ ¹University of Kaiserslautern, DE; ²UFGRS, Porto Alegre and CEA-Leti, France, BR; ³CEA-Leti, FR; ⁴Eindhoven University of Technology, NL Abstract DRAM cells use capacitors as volatile and leaky bit storage elements. The time spent without refreshing them is called retention time. It is well known that the retention time depends inverse exponentially on the temperature. In 3D stacking, the challenges of high power densities and thermal dissipation are exacerbated and have a much stronger impact on the retention time of 3D-stacked WIDE I/O DRAMs that are placed on top of an MPSoC. Consequently, it is very important to study the temperature behaviour of WIDE I/O DRAMs. To the best of our knowledge, no investigations based on real measurements were done for stacked DRAM-on-logic devices. In this paper, we first provide detailed measurements on temperature-dependent retention time and bit error rates of WIDE I/O DRAMs. To obtain the correct temperature distribution of the WIDE-I/O DRAM die we use an advanced thermal modelling tool: the DOCEA AceThermalModeler™ (ATM). The WIDE I/O DRAM retention times and bit error rates are compared to the behaviour of 2D-DRAM chips (DIMMs) with the help of an advanced FPGA-based test system. We observed data pattern dependencies and variable retention times (VRTs). Second, based on this data, we develop and validate a SystemC-TLM2.0 DRAM bit error rate model. Our proposed DRAM bit error model enables early investigations on the temperature vs. retention time trade-off in future 3D-stacked MPSoCs with WIDE I/O DRAMs in SystemC-TLM2.0 environments. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP2-6, 440	ON THE PREMISES AND PROSPECTS OF TIMING SPECULATION Speakers: Rong Ye¹, Feng Yuan², Jie Zhang² and Qiang Xu² ¹Imperial College, GB; ²The Chinese University of Hong Kong, HK Abstract Timing speculation (TS), being able to detect and correct circuit timing errors at runtime, is a promising alternative solution to mitigate the ever-increasing variation effects in nanometer circuits. The potential energy-efficiency improvement, however, is limited by the circuit "timing wall", a critical operating point caused by conventional circuit optimization techniques (e.g., gate sizing). With a given circuit netlist, we study the bound of the potential benefits provided by TS techniques in this work, which facilitate designers to decide whether it worths the effort to implement a timing-speculative circuit. Experimental results on benchmark circuits demonstrate the effectiveness of the proposed methodology. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP2-7, 932	IMPACT OF INTERCONNECT MULTIPLE-PATTERNING VARIABILITY ON SRAMS Speakers: Ioannis Karageorgos¹, Michele Stucchi², Praveen Raghavan², Julien Ryckaert², Zsolt Tokei², Diederik Verkest², Rogier Baert², Sushil Sakhare² and Wim Dehaene³ ¹imec, BE; ²IMEC, BE; ³KU Leuven, imec, BE Abstract The introduction of Multiple Patterning (MP) in sub-32nm technology nodes may pose severe variability problems in wire resistance and capacitance of IC circuits. In this paper we evaluate the impact of this variability on the performance of SRAM cell arrays based on the 10nm technology node, for a relevant range of process variation assumptions. The MP options we consider are the triple Litho-Etch (LE³) and the Self Aligned Double Patterning (SADP), together with Single Patterning Extreme-UV (EUV). In addition to the analysis of the worst-case variability scenario and the impact on SRAM performance, we propose an analytical formula for the estimation of SRAM read time penalty, using the RC variation of the bit line and the array size as input parameters. This formula, verified with SPICE simulations, allows a fast extraction of the statistical distribution of the read time penalty, using the Monte-Carlo method. Results on each patterning option are presented and compared. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.4 Emerging Technologies for NoCs

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Chartreuse

Chair:
Ian O'Connor, University of Lyon, FR

Co-Chair:
Davide Bertozzi, University of Ferrara, IT

Several new technologies are enabling capabilities for NoCs. In this session, we demonstrate how to leverage optical, 3D and wireless methodologies to improve your NoCs. The first paper explores crosstalk mitigation techniques in optical NoCs. The second paper explores TSV minimization through virtual channels and the final paper deals with dynamic calibration in wireless NoCs.

Time	Label	Presentation Title Authors
08:30	5.4.1	COHERENT CROSSTALK NOISE ANALYSES IN RING-BASED OPTICAL INTERCONNECTS Speakers: Luan H.K. Duong¹, Mahdi Nikdast², Jiang Xu¹, Zhehui Wang¹, Yvain Thonnart³, Sébastien Le Beux⁴, Peng Yang¹, Xiaowen Wu¹ and Zhifei Wang¹ ¹The Hong Kong University of Science and Technology, HK; ²École Polytechnique de Montréal, Montréal, CA; ³CEA-Leti, FR; ⁴Lyon Institute of Nanotechnology, Ecole Centrale de Lyon, FR Abstract Recently, optical interconnects have been proposed for ultra-high bandwidth and low latency inter/intra-chip communication in multiprocessor systems-on-chip (MPSoCs). These optical interconnects employ the microresonators (MRs) to direct/detect the optical signal. However, utilized MRs suffer from intrinsic crosstalk noise and power loss, degrading the network efficiency via the signal-to-noise ratio (SNR). In this paper, both coherent and incoherent crosstalk in wavelength-division multiplexing (WDM) networks are discussed and systematically analyzed. We carefully develop our analytical models at the optical-circuit level, and apply them to two ring-based networks: SUOR and Corona ONoCs. The quantitative results have demonstrated that the architectural design of the ONoCs determines the impact of crosstalk on the SNR. Even though SUOR and Corona are both ring-based ONoCs, the worst-case SNR can be differed up to 50dB. Our analyses of the worst-case SNR can be utilized as a platform to compare the realistic performance among different optical interconnection networks via the degradation of BER and data bandwidth. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.4.2	ENABLING VERTICAL WORMHOLE SWITCHING IN 3D NOC-BUS HYBRID SYSTEMS Speakers: Changlin Chen, Marius Enachescu and Sorin Cotofana, Delft University of Technology, NL Abstract In Networks-on-Chip (NoC) systems Wormhole Switching (WS) enables lower packet transmission latency and requires less silicon real estate than the Packet Switching (PS). However, enabling vertical WS in conventional 3D NoC-Bus hybrid systems requires a large amount of TSVs, which have low yield in state of theart 3D staking technology. In this paper, we alleviate this issue by introducing a Bus Virtual Channel (VC) Allocation (BVA) mechanism, which assigns to at most one cross layer packet a free input VC in its target router before injecting it into the bus. In this way, a routing path is reserved by the head flit, and the rest of the packet flits can be WS transmitted through the vertical buses. Given that VC allocation is performed only once per packet per hop BVA can be performed in such a way that it doesn't become a system bottleneck. We evaluated our proposal with both synthetic and real application traffics and the experimental results indicate that when vertical WS is implemented, the bus critical path length is reduced by at least 31% and the average packet transmission latency is reduced by at least 22%, when compared with conventional pipelined bus or TDMA bus based systems. Moreover, the area cost and power consumption of the output buffer incident to the bus are reduced by 47% and 43%, respectively. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.4.3	A CLOSED LOOP TRANSMITTING POWER SELF-CALIBRATION SCHEME FOR ENERGY EFFICIENT WINOC ARCHITECTURES Speaker: Maurizio Palesi, Kore University, IT Authors: Andrea Mineo¹, Mohd Shahrizal Rusli², Maurizio Palesi³, Giuseppe Ascia¹, Vincenzo Catania¹ and M. N. Marsono² ¹University of Catania, IT; ²Universiti Teknologi Malaysia, MY; ³Kore University, IT Abstract In a wireless Network-on-Chip (WiNoC) the radio transceiver accounts for a significant fraction of the total communication energy. Recently, a configurable transceiver architecture able to regulate its transmitting power based on the location of the destination node has been proposed. Unfortunately, the use of such transceiver requires a costly, time consuming and complex characterization phase performed at design time and mainly based on the use of field solver simulators whose accuracy has not yet been proved in the context of integrated on-chip antennas. In this paper we present a closed loop transmitting power self-calibration mechanism which allows to determine on-line the optimal transmitting power for each transmitting and receiving pair in a WiNoC. The proposed mechanism is general and can be applied to any WiNoC architecture with a low overhead in terms of silicon area. Its application to three well known WiNoC architectures shows its effectiveness in drastically reducing the overall communication energy (up to 50%) with a limited impact on performance. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP2-8, 894	COHERENCE BASED MESSAGE PREDICTION FOR OPTICALLY INTERCONNECTED CHIP MULTIPROCESSORS Speakers: Anouk Van Laer¹, Chamath Ellawala¹, Muhammad Ridwan Madarbux¹, Timothy M. Jones² and Philip M. Watts¹ ¹University College London, GB; ²University of Cambridge, GB Abstract Photonic networks on chip have been proposed to reduce latency and power consumption of on-chip communication in chip multiprocessors. However, in switched photonic networks, the path setup latency can create a high overhead, particularly for the short messages generated by shared memory chip multiprocessors (CMP). This has led to proposals for networks which avoid switching using all-to-all or single writermultiple reader (SWMR) networks which dramatically increase optical component counts and hence power consumption. In this work we propose a predictor which uses information from the coherence protocol and previously transmitted messages to predict future messages and hence hide the path setup latency by speculatively setup photonic paths. We show that a directly mapped predictor can achieve prediction hit rates of up to 85% for PARSEC benchmarks in a 16-core x86 system using the MESI coherence protocol whereas a more resource efficient set associative predictor can still achieve prediction rates up to 75% Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.5 Critical Embedded Systems

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Meije

Chair:
Lothar Thiele, Swiss Federal Institute of Technology Zurich, CH

Co-Chair:
Iain Bate, University of York, GB

The papers in this session focus on design concerns for safety-critical embedded systems. Topics include scheduling for engine-control tasks, fault tolerance, real-time communication, and safety and security in embedded systems.

Time	Label	Presentation Title Authors
08:30	5.5.1	(Best Paper Award Candidate) SUFFICIENT RESPONSE TIME ANALYSIS CONSIDERING DEPENDENCIES BETWEEN RATE-DEPENDENT TASKS Speakers: Timo Feld¹ and Frank Slomka² ¹Institute of Embedded Systems / Real-Time Systems Ulm University, DE; ²Ulm University, DE Abstract In automotive embedded real-time systems, such as the engine control unit (ECU), some tasks are activated whenever the engine arrives at a specific angular position. In consequence, the frequency at which this task is activated changes with the speed of the engine i. e. angular velocity. Additionally, these tasks have worst case execution times and deadlines that also depends on the angular velocity. Such tasks exhibit rate-dependent behaviour. In recently published works analytical methods for tasks with this rate-dependent behaviour were introduced. Though those methods do not consider dependencies between tasks. For instance one event might be displaced a certain angular position after an event of another task. In this paper, a sufficient analysis will be introduced, which considers those dependencies to improve the accuracy of existing methods. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.5.2	ENGINE CONTROL: TASK MODELLING AND ANALYSIS Speakers: Alessandro Biondi and Giorgio Buttazzo, Scuola Superiore Sant'Anna, IT Abstract Engine control is characterized by computational activities that are triggered by specific crankshaft rotation angles and are designed to adapt their functionality based on the angular velocity of the engine. Although a few models have been proposed in the literature to handle such tasks, most of them are quite simplistic and do not allow expressing features that are presently used by the automotive industry. This paper proposes a new task model for expressing realistic features of engine control tasks and presents a real-time analysis for applications consisting of multiple engine control tasks and classical periodic tasks. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.5.3	EVALUATION OF DIVERSE COMPILING FOR SOFTWARE-FAULT TOLERANCE Speakers: Andrea Höller¹, Nermin Kajtazovic², Tobias Rauter², Kay Römer² and Christian Kreiner² ¹TU Graz, AT; ²Graz University of Technology, AT Abstract Although software fault prevention techniques improve continually, faults remain in every complex software system. Thus safety-critical embedded systems need mechanisms to tolerate software faults. Typically, these systems use static redundancy to detect hardware faults during operation. However, the reliability of a redundant system not only depends on the reliability of each version, but also on the dissimilarity between them. Thus, researchers have investigated ways to automatically add cost-efficient diversity to software to increase the efficiency of redundancy strategies. One of these automated software diversification methods is diverse compiling, which exploits the diversity introduced by different compilers and different optimization flags. Today, diverse compiling is used to improve the hardware fault tolerance and to avoid common defects from compilers. However, in this paper we show that diverse compiling also enhances the software fault tolerance by increasing the chance of finding defects in the source code of the executed software during runtime. More precisely, the memory is organized differently, when using different compilers and compiler flags. This enhances the chance of detecting memory-related software bugs, such as missing memory initialization, during runtime. Here we experimentally quantify the efficiency of diverse compiling for software fault tolerance and we show that diverse compiling can help to detect up to about 70% of memory-related software bugs. Download Paper (PDF; Only available from the DATE venue WiFi)
09:45	5.5.4	WORST-CASE COMMUNICATION TIME ANALYSIS OF NETWORKS-ON-CHIP WITH SHARED VIRTUAL CHANNELS Speakers: Eberle A Rambo and Rolf Ernst, TU Braunschweig, DE Abstract Network-on-Chip (NoC) based multi- and many-core architectures show high potential for use in real-time applications due to their superior efficiency. In real-time systems, it is necessary to guarantee that the application's timing requirements are met through the analysis of the worst-case behavior. A typical approach to guarantee real-time is the exclusive assignment of virtual channels to tasks or cores. Virtual channels, however, are a limited resource in NoCs. In future systems, there will be more tasks than virtual channels (VCs) in the network. In this paper, we propose a worst-case communication analysis of wormhole-switched best-effort NoCs (no special QoS mechanism) with SLIP arbitration and support to shared VCs. The approach is based on Compositional Performance Analysis, which enables non-symmetrical guarantees for the streams. The analysis is evaluated experimentally and compared with simulation and related work. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP2-9, 778	OPENMP AND TIMING PREDICTABILITY: A POSSIBLE UNION? Speakers: Roberto Vargas¹, Eduardo Quinones² and Andrea Marongiu³ ¹Barcelona Supercomputing Center (BSC) and Technical University of Catalonia (UPC), ES; ²Barcelona Supercomputing Center (BSC), ES; ³Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract Next-generation many-core embedded platforms have the chance of intercepting a converging need for high performance and predictability. Programming methodologies for such platforms will have to promote predictability as a first-class design constraint, along with features for massive parallelism exploitation. OpenMP, increasingly adopted in the embedded systems domain, has recently evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. While tasking is potentially very convenient for coding real-time applications modeled as periodic task graphs, OpenMP adopts an execution model completely agnostic to any timing requirement that the target application may have. In this position paper we reason about the suitability of the current OpenMP v4 specification and execution model to provide timing guarantees in many-cores. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP2-10, 622	(Best Paper Award Candidate) SAHARA: A SECURITY-AWARE HAZARD AND RISK ANALYSIS METHOD Speakers: Georg Macher¹, Harald Sporer¹, Reinhard Berlach¹, Eric Armengaud² and Christian Kreiner¹ ¹Graz University of Technology, AT; ²AVL List GmbH, AT Abstract Safety and Security appear to be two contradicting overall system features, which challenge researchers for decades. Traditionally, these two features have been treated separately, but due to increasing awareness of mutual impacts, cross domain knowledge and fine grasp of commonalities becomes more important. Due to increasing interlacing of systems (such as Car2x in the automotive domain) it is no longer acceptable to assume safety systems immune from security risks and vice versa. Future automotive systems require appropriate systematic approaches to support security aware safety development. Therefore, this paper presents a combined approach of the automotive HARA (hazard analysis and risk assessment) with the security domain STRIDE approach to trace impacts of security issues on safety concepts on system level. We present an approach to classify the probability of security threats to determine the appropriate amount of countermeasures to be considered. Furthermore, we analyze the impact of these security threats on safety analysis of automotive systems. The paper describes how such a method has been developed based on the HARA approach and how a safety-critical contribution of successful security attacks can be quantified and proceeded. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.6 Analyzing and Improving Memories

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Bayard

Chair:
Robert Aitken, ARM, US

Co-Chair:
Panagiota Papavramidou, IMAG, FR

Memories are a driving force behind virtually all IC designs. This session describes methods of improving memory architecture, robustness, and lifetime.

Time	Label	Presentation Title Authors
08:30	5.6.1	ON THE STATISTICAL MEMORY ARCHITECTURE EXPLORATION AND OPTIMIZATION Speakers: Charalampos Antoniadis¹, Georgios Karakonstantis², Nestoras Evmorfopoulos¹, Andreas Burg³ and George Stamoulis¹ ¹University of Thessaly, GR; ²Queen's University, GB; ³École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract The worsening of process variations and the consequent increased spreads in circuit performance and consumed power hinder the satisfaction of the targeted budgets and lead to yield loss. Corner based design and adoption of design guardbands might limit the yield loss. However, in many cases such methods may not be able to capture the real effects which might be way better than the predicted ones leading to increasingly pessimistic designs. The situation is even more severe in memories which consist of substantially different individual building blocks, further complicating the accurate analysis of the impact of variations at the architecture level leaving many potential issues uncovered and opportunities unexploited. In this paper, we develop a framework for capturing non-trivial statistical interactions among all the components of a memory/cache. The developed tool is able to find the optimum memory/cache configuration under various constraints allowing the designers to make the right choices early in the design cycle and consequently improve performance, energy, and especially yield. Our, results indicate that the consideration of the architectural interactions between the memory components allow to relax the pessimistic access times that are predicted by existing techniques. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.6.2	ECRIPSE: AN EFFICIENT METHOD FOR CALCULATING RTN-INDUCED FAILURE PROBABILITY OF AN SRAM CELL Speakers: Hiromitsu Awano, Masayuki Hiromoto and Takashi Sato, Kyoto University, JP Abstract Failure rate degradation of an SRAM cell due to random telegraph noise (RTN) is calculated for the first time. ECRIPSE, an efficient method for calculating the RTN-induced failure probability of an SRAM cell, has been developed to exhaustively cover a large number of possible bias-voltage combinations on which RTN statistics strongly depend. In order to shorten computational time, the Monte Carlo calculation of a single gate-bias condition is accelerated by incorporating two techniques: 1) construction of an optimal importance sampling using particles that move about the ``important'' regions in a variability space, and 2) a classifier that quickly judges whether the random samples are in failure regions or not. We show that the proposed method achieves at least 15.6x speed-up over the state-of-the-art method. We then integrate an RTN model to modulate failure probability. In our experiment, RTN worsens failure probability by six times than that calculated without the effect of RTN. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.6.3	SUBPAGE PROGRAMMING FOR EXTENDING THE LIFETIME OF NAND FLASH MEMORY Speakers: Jung-Hoon Kim¹, Sang-Hoon Kim² and Jin-Soo Kim³ ¹Samsung Electronics Corp., KR; ²KAIST, KR; ³Sungkyunkwan University, KR Abstract During the past decade, the density of NAND flash memory has been increased in many folds. The increase has been driven by storing multiple bits in a cell and scaling down the fabrication process. Such advance in manufacturing technology, however, has been significantly impaired the reliability of the flash memory so that the reliability becomes one of the major concerns in use of the flash memory. Moreover, as the flash memory writes data in the unit of flash page, the trend of the increase in page size worsens the reliability by amplifying a small update to a full flash page programming. In this paper, we propose a new programming method to improve the flash endurance cycle, especially when a small amount of data are written repeatedly. Proposed method so called "subpage programming" partitions a page into smaller subpages. A small amount of data can be programmed to one of the subpages while the other subpages are inhibited from the programming by leveraging the mechanisms of flash cell programming. Thus, the number of flash cells that undergo programming is minimized. We evaluated the effect of the proposed subpage programming on real NAND flash memory chips from three different manufacturers. Our evaluation results show that the subpage programming improves the flash endurance cycle by up to 258%. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.7 Architectures and Design for Cyber-Physical Systems

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Les Bans

Chair:
Rolf Ernst, Technische Universität Braunschweig, DE

Co-Chair:
Paul Pop, Technical University of Denmark, DK

The session covers architectures for non-volatile processors, mixed-criticality, reliable and self-aware systems, and design optimisation issues such as system synthesis for reliability and cost, online scheduling and FPGA acceleration.

Time	Label	Presentation Title Authors
08:30	5.7.1	OPTIMIZED SELECTION OF RELIABLE AND COST-EFFECTIVE CYBER-PHYSICAL SYSTEM ARCHITECTURES Speakers: Nikunj Bajaj¹, Pierluigi Nuzzo¹, Michael Masin² and Alberto Sangiovanni-Vincentelli¹ ¹University of California at Berkeley, US; ²IBM Haifa Research Lab, IL Abstract We address the problem of synthesizing safety-critical cyber-physical system architectures to minimize a cost function while guaranteeing the desired reliability. We cast the problem as an integer linear program on a reconfigurable graph which models the architecture. Since generating symbolic probability constraints by exhaustive enumeration of failure cases on all possible graph configurations takes exponential time, we propose two algorithms to decrease the problem complexity, i.e. Integer-Linear Programming Modulo Reliability (ILP-MR) and Integer-Linear Programming with Approximate Reliability (ILP-AR). We compare the two approaches and demonstrate their effectiveness on the design of aircraft electric power system architectures. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.7.2	(Best Paper Award Candidate) SOFTWARE ASSISTED NON-VOLATILE REGISTER REDUCTION FOR ENERGY HARVESTING BASED CYBER-PHYSICAL SYSTEM Speakers: Mengying Zhao¹, Qingan Li², Mimi Xie³, Yongpan Liu⁴, Jingtong Hu³ and Jason Xue¹ ¹City University of Hong Kong, HK; ²Wuhan University & City University of Hong Kong, CN; ³Oklahoma State University, US; ⁴Tsinghua University, CN Abstract Wearable devices are important components as information collector in many cyber-physical systems. Energy harvesting instead of battery is a better power source for these wearable devices due to many advantages. However, harvested energy is naturally unstable and program execution will be interrupted frequently. Non-volatile processors demonstrate promising advantages to back up volatile state before the system energy is depleted. However, it also introduces non-negligible energy and area overhead. Since the chip size is a vital factor for wearable devices, in this work, we target non-volatile register reduction for application-specific systems. We propose to analyze the application program and determine efficient backup positions, by which the necessary non-volatile register file size can be significantly reduced. The evaluation results deliver an average of 62.9% reduction on non-volatile register file size for stack backup, with negligible storage overheads. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.7.3	A RE-ENTRANT FLOWSHOP HEURISTIC FOR ONLINE SCHEDULING OF THE PAPER PATH IN A LARGE SCALE PRINTER Speakers: Umar Waqas¹, Marc Geilen¹, Jack Kandelaars², Lou Somers³, Twan Basten¹, Sander Stuijk¹, Patrick Vestjens² and Henk Corporaal¹ ¹Eindhoven University of Technology, NL; ²Oce Technologies, NL; ³Oce technologies, NL Abstract A Large Scale Printer (LSP) is a Cyber Physical System (CPS) printing thousands of sheets per day with high quality. The print requests arrive at run-time requiring online scheduling. We capture the LSP scheduling problem as online scheduling of re-entrant flowshops with sequence dependent setup times and relative due dates with makespan minimization as the scheduling criterion. Exhaustive approaches like Mixed Integer Programming can be used, but they are compute intensive and not suited for online use. We present a novel heuristic for scheduling of LSPs that on average requires 0.3 seconds per sheet to find schedules for industrial test cases. We compare the schedules to lower bounds, to schedules generated by the current scheduler and schedules generated by a modified version of the classical NEH (MNEH) heuristic [1], [2]. On average, the proposed heuristic generates schedules that are 40% shorter than the current scheduler, have an average difference of 25% compared to the estimated lower bounds and generates schedules with less than 67% of the makespan of schedules generated by the MNEH heuristic. Download Paper (PDF; Only available from the DATE venue WiFi)
09:45	5.7.4	MPIOV: SCALING HARDWARE-BASED I/O VIRTUALIZATION FOR MIXED-CRITICALITY EMBEDDED REAL-TIME SYSTEMS USING NON TRANSPARENT BRIDGES TO (MULTI-CORE) MULTI-PROCESSOR SYSTEMS Speaker: Benny Akesson, Czech Technical University in Prague, CZ Authors: Daniel Muench¹, Michael Paulitsch¹, Oliver Hanka¹ and Andreas Herkersdorf² ¹Airbus Group Innovation, DE; ²TU Munich, DE Abstract Safety-critical systems consolidating multiple functionalities of different criticality (so-called mixed-criticality systems) require separation between these functionalities to assure safety and security properties. Performance-hungry and safety-critical applications (like a radar processing system steering an autonomous flying aircraft) can have demand for an embedded high-performance computing cluster of more than one (multi-core) processor. This paper presents the Multi-Processor I/O Virtualization (MPIOV)concept to enable hardware-based Input/Output (I/O) virtualization or sharing with separation among multiple (multi-core) processors in (mixed-criticality) embedded real-time systems which usually do not have means for separation like an Input/Output Memory Management Unit (IOMMU). The concept uses a non-Transparent Bridge (NTB) to connect each processing host to the management host while checking the target address and source / origin ID to decide whether to block a transaction or to pass a transaction. It is a standardized, portable and non-proprietary platform-independent spatial separation solution without requiring an IOMMU in the processor. Furthermore, the concept sketches an approach for PCI Express (PCIe)-based systems to enable sharing of up to 2048 (virtual) functions per end-point while still being compatible to the plain PCIe standard. A practical evaluation demonstrates that the impact to performance degradation (transfer time, transfer rate) is neglectable (about 0.01%) compared to a system using no separation. -Keywords: spatial separation, hardware-based I/O virtualization, non-transparent bridge (NTB), real-time embedded systems, mixed-criticality systems, IOMMU, IOMPU, multi-core, multi-processor Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP2-11, 349	CYBERPHYSICAL-SYSTEM-ON-CHIP (CPSOC) : A SELF-AWARE MPSOC PARADIGM WITH CROSS-LAYER VIRTUAL SENSING AND ACTUATION Speakers: Nikil Dutt¹, Puneet Gupta², Nalini Venkatasubramanian³ and Alex Nicolau¹ ¹University of California Irvine, US; ²University of California Los Angeles, US; ³, Abstract Cyber-physical systems (CPSs) are physical and engineered systems whose operations are monitored, coordinated, controlled, and integrated by a computing, control, and communication core. We propose Cyberphysical-System-on-Chips (CPSoC), a new class of sensor and actuator-rich multiprocessor systems-on-chip (MPSoCs), that augment MPSoCs with additional on-chip and cross-layer sensing and actuation capabilities to enable self-awareness within the observe-decide-act (ODA) paradigm. Unlike traditional MPSoC designs, CPSoC differs primarily on the co-design of computing-communication-control (C3) systems that interacts with the physical environment in real-time in order to adapt system behavior so as to dynamically react to environmental changes while achieving overall design goals. We illustrate CPSoC's potential through a virtual sensor network that accurately estimates run-time power for variability affected subsystems using noisy thermal sensors in improving system goals and Quality-of-Service (QoS). Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP2-12, 753	OCCUPANCY DETECTION VIA IBEACON ON ANDROID DEVICES FOR SMART BUILDING MANAGEMENT Speakers: Andrea Corna, Lorenzo Fontana, Alessandro Antonio Nacci and Donatella Sciuto, Politecnico di Milano, IT Abstract Building heating, ventilation, and air conditioning (HVAC) systems are considered to be the main target for energy reduction due to their significant contribution to commercial buildings' energy consumption. Knowing a building's occupancy plays a crucial role in implementing demand-response HVAC. In this paper we propose a new solution based on the iBeacon technology. This solution is different from the previous ones because it leverages on the Bluetooth Low Energy standard, which provides lower power consumption. Moreover, the iBeacon protocol can be used both on iOs systems and Android ones, making this new approach portable. Differently from our previous work based on iOS devices, in this paper we focus on an Android based solution with the aim of increasing the accuracy of the location and the energy efficiency of the entire system. We increased the accuracy by 10% and the energy efficiency by 15%. Download Paper (PDF; Only available from the DATE venue WiFi)
10:02	IP2-13, 100	A NEURAL MACHINE INTERFACE ARCHITECTURE FOR REAL-TIME ARTIFICIAL LOWER LIMB CONTROL Speakers: Jason Kane, Qing Yang, Robert Hernandez, Willard Simoneau and Matthew Seaton, University of Rhode Island, US Abstract This paper presents a novel architecture of a lower limb neural machine interface (NMI) for determination of user intent. Our new design and implementation paves the way for future bionic legs that require high speed real-time deterministic response, high accuracy, easy portability, and low power consumption. A working FPGA-based prototype has been built, and experiments have shown that it achieves average performance gains of around 8x that of the equivalent software algorithm running on an Intel Core i7 2670QM, or 24x that of an Intel Atom Z530 with no perceivable loss in accuracy. Furthermore, our fully pipelined and parallel non-linear support vector machine-based FPGA implementation led to a 6.4x speedup over an equivalent GPU-based design. In this paper, we also characterize our achieved timing margin to show that our design is capable of supporting real-time wireless communications. With additional refinement, such a wireless personal area network (PAN) system will provide improved flexibility on an individual basis for electromyography (EMG) sensor placement. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

5.8 Hot Topic - The Next Generation of Virtual Prototyping: Ultra-fast yet Accurate Simulation of HW/SW Systems

Date: Wednesday 11 March 2015
Time: 08:30 - 10:00
Location / Room: Salle Lesdiguières

Organisers:
Daniel Müller-Gritschneder, Technische Universitaet München, DE
Oliver Bringmann, University of Tübingen, DE

Chair:
Andy D. Pimentel, University of Amsterdam, NL

Co-Chair:
Christian Haubelt, University of Rostock, DE

This session addresses leading-edge solutions in the field of virtual prototyping. Employing techniques such as source-level software simulation, host-compiled firmware, OS and processor modeling, as well as abstract communication and peripheral models, it is possible to reach very high simulation speeds. With intelligent new out-of-order modeling, synchronization and temporal decoupling techniques, such ultra-fast simulation can be achieved while also maintaining a very high accuracy.

Time	Label	Presentation Title Authors
08:30	5.8.1	ULTRA-FAST SOURCE-LEVEL TIMING SIMULATION - HIGH ACCURACY NEEDS EXACT CODE MATCHING Speaker: Oliver Bringmann, University of Tuebingen / FZI, DE
08:52	5.8.2	HOST-COMPILED OPERATING SYSTEM AND PROCESSOR MODELING Speaker: Andreas Gerstlauer, The University of Texas at Austin, US
09:15	5.8.3	ABSTRACT COMMUNICATION MODELS FOR ACCURATE AND FAST SOC SIMULATION Speaker: Daniel Mueller-Gritschneder, Technische Universität München, DE
09:37	5.8.4	INDUSTRIAL PERSPECTIVE ON ULTRA-HIGH SPEED AND TIMING-ACCURATE SOC AND PERIPHERAL MODELS Speaker: Ajay Goyal, Infineon Technologies, IN
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

IP2 Interactive Presentations

Date: Wednesday 11 March 2015
Time: 10:00 - 10:30
Location / Room: Exhibition Area

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the morning. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session the award 'Best IP of the Day' is given.

Label	Presentation Title Authors
IP2-1	COMPARISON OF MULTI-PURPOSE CORES OF KECCAK AND AES Speakers: Panasayya Yalla, Ekawat Homsirikamol and Jens-Peter Kaps, George Mason University, US Abstract Most widely used security protocols, Internet Protocol Security (IPSec), Secure Socket Layer (SSL), and Transport Layer Security (TLS), provide several cryptographic services which in turn require multiple dedicated cryptographic algorithms. A single cryptographic primitive for all secret key functions utilizing different mode of operations can overcome this constraint. This paper investigates the possibility of using AES and Keccak as the underlying primitives for high-speed and resource constrained applications. Even though a plain AES implementation is typically much smaller and has a better throughput to area ratio than a plain Keccak, adding additional cryptographic services changes the results dramatically. Our multi-purpose Keccak outperforms our multi-purpose AES by a factor of 4 for throughput over area on average. This underlines the flexibility of the Keccak Sponge and Duplex functions. Our multi-purpose Keccak achieves a throughput of 23.2 Gbps in AE-mode (Keyak) on a Xilinx Virtex-7 and 28.7 Gbps on a Altera Stratix-IV. In order to study this further we also implemented two versions of a dedicated Keyak and dedicated AES-GCM. Our dedicated Keyak implementation outperforms our dedicated AES-GCM on average by a factor 6 in terms of throughput over area reaching a throughput of 28.9 Gbps and 4.1 Gbps respectively on a Xilinx Virtex-7. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-2	ON-LINE PREDICTION OF NBTI-INDUCED AGING RATES Speakers: Rafal Baranowski¹, Farshad Firouzi², Saman Kiamehr², Chang Liu¹, Hans-Joachim Wunderlich¹ and Mehdi Tahoori² ¹Stuttgart University, DE; ²Karlsruhe Institute of Technology (KIT), DE Abstract Nanoscale technologies are increasingly susceptible to aging processes such as Negative-Bias Temperature Instability (NBTI) which undermine the reliability of VLSI systems. Existing monitoring techniques can detect the violation of safety margins and hence make the prediction of an imminent failure possible. However, since such techniques can only detect measurable degradation effects which appear after a relatively long period of system operation, they are not well suited to early aging prediction and proactive aging alleviation. This work presents a novel method for the monitoring of NBTI-induced degradation rate in digital circuits. It enables the timely adoption of proper mitigation techniques that reduce the impact of aging. The proposed method employs machine learning techniques to find a small set of so called Representative Critical Gates (RCG), the workload of which is correlated with the degradation of the entire circuit. The workload of RCGs is observed in hardware using so called workload monitors. The output of the workload monitors is evaluated on-line to predict system degradation experienced within a configurable (short) period of time, e.g. a fraction of a second. Experimental results show that the proposed monitors predict the degradation rate with an average error of only 3% at less than 2.4% area overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-3	RETRAINING BASED TIMING ERROR MITIGATION FOR HARDWARE NEURAL NETWORKS Speakers: Jiachao Deng¹, Yuntan Fang¹, Zidong Du¹, Ying Wang¹, Huawei Li¹, Olivier Temam², Paolo Ienne³, David Novo³, Xiaowei Li¹, Yunji Chen¹ and Chengyong Wu¹ ¹State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China †University of Chinese Academy of Sciences, Beijing, China, CN; ²INRIA Saclay, France, FR; ³École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract Recently, neural network (NN) accelerators are gaining popularity as part of future heterogeneous multi-core architectures due to their broad application scope and excellent energy efficiency. Additionally, since neural networks can be retrained, they are inherently resillient to errors and noises. Prior work has utilized the error tolerance feature to design approximate neural network circuits or tolerate logical faults. However, besides high-level faults or noises, timing errors induced by delay faults, process variations, aging, etc. are dominating the reliability of NN accelerator under nanoscale manufacturing process. In this paper, we leverage the error resiliency of neural network to mitigate timing errors in NN accelerators. Specifically, when timing errors significantly affect the output results, we propose to retrain the accelerators to update their weights, thus circumventing critical timing errors. Experimental results show that timing errors in NN accelerators can be well tamed for different applications. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-4	DICTIONARY-BASED SPARSE REPRESENTATION FOR RESOLUTION IMPROVEMENT IN LASER VOLTAGE IMAGING OF CMOS INTEGRATED CIRCUITS Speakers: Tenzile Berkin Cilingiroglu, Mahmoud Zangeneh, Aydan Uyar, W. Clem Karl, Janusz Konrad, Ajay Joshi, Bennett B. Goldberg and M. Selim Unlu, Boston University, US Abstract The rapid decrease in the dimensions of integrated circuits with a simultaneous increase in component density have introduced resolution challenges for optical failure analysis tech- niques. Although optical microscopy efforts continue to increase resolution of optical systems through hardware modifications, signal processing methods are essential to complement these efforts to meet the resolution requirements for the nanoscale integrated circuit technologies. In this work, we focus on laser voltage imaging as the optical failure analysis technique and show how an overcomplete dictionary-based sparse representation can improve resolution and localization accuracy. We describe a reconstruction approach based on this sparse representation and validate its performance on simulated data. We achieve an 80% reduction of the localization error. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-5	FAULT-BASED ATTACKS ON THE BEL-T BLOCK CIPHER FAMILY Speakers: Philipp Jovanovic and Ilia Polian, University of Passau, DE Abstract We present the first fault-based attack on the Bel-T block cipher family which has been adopted recently as a national standard of the Republic of Belarus. Our attack successfully recovers the secret key of the 128-bit, 192-bit and 256-bit versions of Bel-T using 4, 7 and 10 fault injections, respectively. We also show the results from our comprehensive simulation-based experiments. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-6	ON THE PREMISES AND PROSPECTS OF TIMING SPECULATION Speakers: Rong Ye¹, Feng Yuan², Jie Zhang² and Qiang Xu² ¹Imperial College, GB; ²The Chinese University of Hong Kong, HK Abstract Timing speculation (TS), being able to detect and correct circuit timing errors at runtime, is a promising alternative solution to mitigate the ever-increasing variation effects in nanometer circuits. The potential energy-efficiency improvement, however, is limited by the circuit "timing wall", a critical operating point caused by conventional circuit optimization techniques (e.g., gate sizing). With a given circuit netlist, we study the bound of the potential benefits provided by TS techniques in this work, which facilitate designers to decide whether it worths the effort to implement a timing-speculative circuit. Experimental results on benchmark circuits demonstrate the effectiveness of the proposed methodology. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-7	IMPACT OF INTERCONNECT MULTIPLE-PATTERNING VARIABILITY ON SRAMS Speakers: Ioannis Karageorgos¹, Michele Stucchi², Praveen Raghavan², Julien Ryckaert², Zsolt Tokei², Diederik Verkest², Rogier Baert², Sushil Sakhare² and Wim Dehaene³ ¹imec, BE; ²IMEC, BE; ³KU Leuven, imec, BE Abstract The introduction of Multiple Patterning (MP) in sub-32nm technology nodes may pose severe variability problems in wire resistance and capacitance of IC circuits. In this paper we evaluate the impact of this variability on the performance of SRAM cell arrays based on the 10nm technology node, for a relevant range of process variation assumptions. The MP options we consider are the triple Litho-Etch (LE³) and the Self Aligned Double Patterning (SADP), together with Single Patterning Extreme-UV (EUV). In addition to the analysis of the worst-case variability scenario and the impact on SRAM performance, we propose an analytical formula for the estimation of SRAM read time penalty, using the RC variation of the bit line and the array size as input parameters. This formula, verified with SPICE simulations, allows a fast extraction of the statistical distribution of the read time penalty, using the Monte-Carlo method. Results on each patterning option are presented and compared. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-8	COHERENCE BASED MESSAGE PREDICTION FOR OPTICALLY INTERCONNECTED CHIP MULTIPROCESSORS Speakers: Anouk Van Laer¹, Chamath Ellawala¹, Muhammad Ridwan Madarbux¹, Timothy M. Jones² and Philip M. Watts¹ ¹University College London, GB; ²University of Cambridge, GB Abstract Photonic networks on chip have been proposed to reduce latency and power consumption of on-chip communication in chip multiprocessors. However, in switched photonic networks, the path setup latency can create a high overhead, particularly for the short messages generated by shared memory chip multiprocessors (CMP). This has led to proposals for networks which avoid switching using all-to-all or single writermultiple reader (SWMR) networks which dramatically increase optical component counts and hence power consumption. In this work we propose a predictor which uses information from the coherence protocol and previously transmitted messages to predict future messages and hence hide the path setup latency by speculatively setup photonic paths. We show that a directly mapped predictor can achieve prediction hit rates of up to 85% for PARSEC benchmarks in a 16-core x86 system using the MESI coherence protocol whereas a more resource efficient set associative predictor can still achieve prediction rates up to 75% Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-9	OPENMP AND TIMING PREDICTABILITY: A POSSIBLE UNION? Speakers: Roberto Vargas¹, Eduardo Quinones² and Andrea Marongiu³ ¹Barcelona Supercomputing Center (BSC) and Technical University of Catalonia (UPC), ES; ²Barcelona Supercomputing Center (BSC), ES; ³Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract Next-generation many-core embedded platforms have the chance of intercepting a converging need for high performance and predictability. Programming methodologies for such platforms will have to promote predictability as a first-class design constraint, along with features for massive parallelism exploitation. OpenMP, increasingly adopted in the embedded systems domain, has recently evolved to deal with the programmability of heterogeneous many-cores, with mature support for fine-grained task parallelism. While tasking is potentially very convenient for coding real-time applications modeled as periodic task graphs, OpenMP adopts an execution model completely agnostic to any timing requirement that the target application may have. In this position paper we reason about the suitability of the current OpenMP v4 specification and execution model to provide timing guarantees in many-cores. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-10	(Best Paper Award Candidate) SAHARA: A SECURITY-AWARE HAZARD AND RISK ANALYSIS METHOD Speakers: Georg Macher¹, Harald Sporer¹, Reinhard Berlach¹, Eric Armengaud² and Christian Kreiner¹ ¹Graz University of Technology, AT; ²AVL List GmbH, AT Abstract Safety and Security appear to be two contradicting overall system features, which challenge researchers for decades. Traditionally, these two features have been treated separately, but due to increasing awareness of mutual impacts, cross domain knowledge and fine grasp of commonalities becomes more important. Due to increasing interlacing of systems (such as Car2x in the automotive domain) it is no longer acceptable to assume safety systems immune from security risks and vice versa. Future automotive systems require appropriate systematic approaches to support security aware safety development. Therefore, this paper presents a combined approach of the automotive HARA (hazard analysis and risk assessment) with the security domain STRIDE approach to trace impacts of security issues on safety concepts on system level. We present an approach to classify the probability of security threats to determine the appropriate amount of countermeasures to be considered. Furthermore, we analyze the impact of these security threats on safety analysis of automotive systems. The paper describes how such a method has been developed based on the HARA approach and how a safety-critical contribution of successful security attacks can be quantified and proceeded. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-11	CYBERPHYSICAL-SYSTEM-ON-CHIP (CPSOC) : A SELF-AWARE MPSOC PARADIGM WITH CROSS-LAYER VIRTUAL SENSING AND ACTUATION Speakers: Nikil Dutt¹, Puneet Gupta², Nalini Venkatasubramanian³ and Alex Nicolau¹ ¹University of California Irvine, US; ²University of California Los Angeles, US; ³, Abstract Cyber-physical systems (CPSs) are physical and engineered systems whose operations are monitored, coordinated, controlled, and integrated by a computing, control, and communication core. We propose Cyberphysical-System-on-Chips (CPSoC), a new class of sensor and actuator-rich multiprocessor systems-on-chip (MPSoCs), that augment MPSoCs with additional on-chip and cross-layer sensing and actuation capabilities to enable self-awareness within the observe-decide-act (ODA) paradigm. Unlike traditional MPSoC designs, CPSoC differs primarily on the co-design of computing-communication-control (C3) systems that interacts with the physical environment in real-time in order to adapt system behavior so as to dynamically react to environmental changes while achieving overall design goals. We illustrate CPSoC's potential through a virtual sensor network that accurately estimates run-time power for variability affected subsystems using noisy thermal sensors in improving system goals and Quality-of-Service (QoS). Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-12	OCCUPANCY DETECTION VIA IBEACON ON ANDROID DEVICES FOR SMART BUILDING MANAGEMENT Speakers: Andrea Corna, Lorenzo Fontana, Alessandro Antonio Nacci and Donatella Sciuto, Politecnico di Milano, IT Abstract Building heating, ventilation, and air conditioning (HVAC) systems are considered to be the main target for energy reduction due to their significant contribution to commercial buildings' energy consumption. Knowing a building's occupancy plays a crucial role in implementing demand-response HVAC. In this paper we propose a new solution based on the iBeacon technology. This solution is different from the previous ones because it leverages on the Bluetooth Low Energy standard, which provides lower power consumption. Moreover, the iBeacon protocol can be used both on iOs systems and Android ones, making this new approach portable. Differently from our previous work based on iOS devices, in this paper we focus on an Android based solution with the aim of increasing the accuracy of the location and the energy efficiency of the entire system. We increased the accuracy by 10% and the energy efficiency by 15%. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-13	A NEURAL MACHINE INTERFACE ARCHITECTURE FOR REAL-TIME ARTIFICIAL LOWER LIMB CONTROL Speakers: Jason Kane, Qing Yang, Robert Hernandez, Willard Simoneau and Matthew Seaton, University of Rhode Island, US Abstract This paper presents a novel architecture of a lower limb neural machine interface (NMI) for determination of user intent. Our new design and implementation paves the way for future bionic legs that require high speed real-time deterministic response, high accuracy, easy portability, and low power consumption. A working FPGA-based prototype has been built, and experiments have shown that it achieves average performance gains of around 8x that of the equivalent software algorithm running on an Intel Core i7 2670QM, or 24x that of an Intel Atom Z530 with no perceivable loss in accuracy. Furthermore, our fully pipelined and parallel non-linear support vector machine-based FPGA implementation led to a 6.4x speedup over an equivalent GPU-based design. In this paper, we also characterize our achieved timing margin to show that our design is capable of supporting real-time wireless communications. With additional refinement, such a wireless personal area network (PAN) system will provide improved flexibility on an individual basis for electromyography (EMG) sensor placement. Download Paper (PDF; Only available from the DATE venue WiFi)

UB05 Session 5

Date: Wednesday 11 March 2015
Time: 10:00 - 12:00
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB05.1	A FRAMEWORK FOR THE EMULATION AND PROTOTYPING OF NANO-PHOTONIC OPTICAL ACCELERATORS Presenter: Alberto Garcia-Ortiz, University of Bremen, DE Authors: Wolfgang Büter¹, A. Ali², S Mahmood², S. Arefin², V. V. Parsi Sreenivas², M. Mike Bülters³ and R.-B. Bergmann³ ¹Institute for Electrodynamics and Microelectronics Systems (ITEM), DE; ²University of Bremen, Physics/Electrical Engineering, DE; ³Bremer Institut für angewandte Strahltechnik GmbH, DE Abstract The recent advances in on-chip optical communication anticipate nano-photonic optical computing as a disruptive new technology. Different architectural solutions and competing physical implementations are currently being investigated. They include a wide spectrum of approaches such as those based on optical analog processing (e.g., nano-photonic optical vector-matrix multiplication), digital optical gates (e.g. reversible nano-photonic gates, BDD-based approaches, etc) or even quantum computing. Since the computing, performance, and error characteristics of these technologies differ substantially from those of standard CMOS technologies, an early co-design framework for nano-photonic accelerators embedded with digital multiprocessor systems is urgently required. It should allow an early investigation about the possible implementation of some kernels using optical accelerators and the effect of optical non-idealities in the overall system. This demo presents a framework for the virtual emulation and prototyping of nano-phonic accelerators for optical analog processing and optical digital gates, currently being developed at the "Institute of Electrodynamics and Microelectronics" (ITEM) and at the "Bremen Institute for Applied Beam Technology" (BIAS). This framework, based on the ideas of rapid prototyping and virtual emulation using FPGA technology, provides two levels of operation. At a first level, it offers a library of models that can be used to construct a virtual prototype of a hybrid multi-processor and nano-photonic system. The parameterizable models emulate several optical non-idealities but are synthesizable at the RTL-level, so that a standard FPGA-emulation of the complete system can be carried out. In a second level, it offers the possibility to plug-in a macroscopic optical accelerator to prototype the nano-photonic one with higher accuracy. In order to illustrate these two levels of operation, the demo at the DATE University-Booth will be twofold. Virtual emulation demo In the first demo, the user can define the functionality of the optical accelerator. Using a reversible-toolchain based on the RevKit tools by the "Computer Architecture and Reliable Embedded Systems" (AGRA), a reversible implementation is created. The structural reversible implementation is transformed into a nano-photonic model which is emulated in a FPGA. The final system, composed by a standard processor communicating with the nano-photonic model of the accelerator, can be programmed in C so that the user can study the impact of the accelerator in its algorithm. Physical prototype-demo This demo focuses on the prototype of a low-cost vector-matrix multiplication core using optical processing. The optical prototype is composed by a sandwich of two LCD structures with orthogonal polarizations sending an image to an integrated camera. The different elements are controlled by a hardware IP and connected to an embedded processor. The user can define several parameters of the optical processor, such as the spot-size, the photo-detector pitch etc. The physical prototype is then configured to that mode, and used as an optical accelerator by a microprocessor embedded in a FPGA. Again the user can program some C algorithms to study the system performance. Additionally a link to Matlab® allows the analysis of the precision achieved by the optical vector-matrix-multiplication process. More information ...
UB05.2	ODEN: ASSERTION MINING FOR BEHAVIORAL DESCRIPTIONS Presenter: Alessandro Danese, University of Verona, IT Authors: Alessandro Danese, Tara Ghasempouri and Graziano Pravadelli, University of Verona, IT Abstract Specification mining is an automatic approach for extracting assertions from the implementation of the system under verification (SUV). Its primary goal is to improve the verification and documentation process by making available a matching between a manual definition of the expected functionality and a formalization of the actual implemented functionality. In order to automatically extract assertions, some approaches perform a static analysis of the SUV source code. These solutions, despite of their effectiveness, suffer of scalability problems. To overcome this drawback, dynamic approaches have been also proposed that extract assertions by relying only on the observation of SUV's execution traces. This guarantees a better scalability, even if only "likely true assertions" can be extracted. For this reason a qualification phase is generally implemented in order to discard irrelevant and spurious assertions. In this context, ODEN is a tool for dynamically extracting likely true assertions by combining static and dynamic techniques. ODEN works with both hardware design and software applications. The tools analyses the execution traces of the system under verification and it generates assertions in the form of temporal relationships between arithmetic/logic expressions over the variables of the SUV. With respect to existing tools, ODEN works on a wider range of abstraction levels (e.g., gate-level, RTL, TLM, SW level, ...) and it considers a wider set of temporal patterns to more precisely characterize the behaviours of the SUV. More information ...
UB05.3	WHERE IS IT? FIND THE CODE YOU ARE INTERESTED IN! Presenter: Jan Malburg, University of Bremen, DE Author: Görschwin Fey, University of Bremen / German Aerospace Center, DE Abstract The demonstration presents our tool for feature localization and debugging of RTL-designs. Feature localization helps a designer to find the code relevant for a certain feature and, thus, helps him to faster understand a design previously unknown to him. The developer can choose between three basic techniques for feature localization. In the area of debugging the tools allows fault localization, reverse debugging based on dynamic data- and control-flow of the design and dynamic slicing. More information ...
UB05.4	STRNG: A SELF-TIMED RING BASED TRUE RANDOM NUMBER GENERATOR WITH MONITORING AND ENTROPY ASSESSMENT Presenter: Abdelkarim Cherkaoui, TIMA, FR Authors: Laurent Fesquet¹, Viktor Fischer² and Alain Aubert² ¹TIMA, FR; ²LaHC, FR Abstract The Self-timed ring based True Random Number Generator (STRNG) leverages the jitter of events propagating in a self-timed ring to generate provably random binary sequences. Several implementations in FPGAs and in CMOS design flows have shown the feasability of this generator in digital technologies, and also confirmed that it can provide high quality random bit sequences that pass the standard statistical test batteries at rates as high as 200 Mbit/s. Following AIS31 recommandations for the design and evaluation of TRNGs, the security of this generator is based primarily on an entropy assessment obtained by modeling the entropy extraction and measuring the entropy source. Secondly, the generator is protected against active attacks by monitoring its behavior in real-time or on demand. In this demonstration, we illustrate this approach in an Altera Cyclone III implementation of the STRNG. We show how the design is configurated depending on the measurement of the entropy source (the jitter magnitude) in order to guarantee a given minimum entropy rate per output bit. Then, we emulate physical attacks on the generator by willingly manipulating its internal structure in order to demonstrate how the entropy monitoring can detect abnormal behaviors and send the appropriate alarms. More information ...
UB05.5	3D-COSTAR: USING 3D-COSTAR FOR 2.5D-/3D-SIC COST ANALYSIS Presenter: Mottaqiallah Taouil, TU Delft, NL Authors: Mottaqiallah Taouil¹, Said Hamdioui¹ and Erik Jan Marinissen² ¹TU Delft, NL; ²IMEC, BE Abstract Selecting an appropriate and efficient test flow for a 2.5D/3D Stacked IC (2.5D-SIC/3D-SIC) is crucial for overall cost optimization. In this demonstration, we present 3D-COSTAR, a tool that considers costs involved in the whole 2.5D/3D-SIC chain, including design, manufacturing, test, packaging and logistics, e.g. related to shipping wafers between a foundry and a test house; and provides the estimated overall cost for 2.5D/3D-SICs and its cost breakdown for a given input parameter set, e.g., test flows, die yield and stack yield. Several case studies will be presented in which the overall cost and product quality (in defective parts per million) are analyzed. More information ...
UB05.6	VDA-ADMF: AN AGILE MIGRATION FRAMEWORK FOR ANALOG LAYOUT DESIGN Presenter: Po-Cheng Pan, National Chiao Tung University, TW Authors: Ching-Yu Chin¹, Hung-Ming Chen¹, Tung-Chieh Chen², Jou-Chun Lin² and Yi-Peng Weng³ ¹National Chiao Tung University, TW; ²Synopsys Co., Ltd., TW; ³Taiwan Semiconductor Manufacturing Company, TW Abstract Layout generation in the late analog CMOS design is challenging by its increasing layout constraints and performance requirements. However, iterative refinement on manual design damages the productivity of analog layout. Therefore, it is more efficient to enroll the know-how from existing design instead of generating a new one. To contend with time-consuming analog layout for more possibilities, this software aims to demonstrate a fast layout prototyping framework for migration purpose into real layout design. In our framework, a reference analog layout design is given to generate potential layout candidates at the objective technology. The demonstration includes the original layout, the extracted topology with placement and routing, the generated layout figures, the dumped layout results and the simulated results. This procedure of migration provides a convincing exhibition of our migration framework. More information ...
UB05.7	OSTC: COMBINING HIFSUITE AND SCNSL FOR SMART DEVICE INTEGRATION AND SIMULATION Presenter: Graziano Pravadelli, University of Verona, IT Authors: Alessandro Danese, Franco Fummi, Valerio Guarnieri, Michele Lora, Graziano Pravadelli and Francesco Stefanni, University of Verona, IT Abstract The main design issue of smart devices is their high degree of heterogeneity, due to the simultaneous presence of multiple domains and extra-functional properties, together with the traditional system functionality. This makes design and simulation very challenging, even because heterogeneity implies that the functionality is not the only dimension that must be considered at validation time. Other properties, such as power consumption or thermal dissipation, are critical to ensure correctness of the final product and to correctly estimate its behavior. This makes component integration and simulation key phases in the design and verification process of smart devices. Thus, to efficiently master smart device design, it is fundamental to be aware of design issues and to know how to solve them through innovative tools and methods, which allow integrating all the components of a smart device into an efficient and flexible simulation platform. We addressed such issues by means of the combined use of HIFSuite tools and SCNSL to obtain a homogeneous and fast SystemC/C++ model of a smart device through the compositions of heterogeneous components. An Open Source Test Case (OSTC) has been defined to show the potentiality of the proposed methods and tools. More information ...
UB05.8	XTSI: THE 3-D ELECTRO-THERMAL SIMULATOR Presenter: Jürgen Scheible, Reutlingen University, DE Author: Carl Christoph Jung, Reutlingen University, DE Abstract xtSi is a 3D electro-thermal simulation tool for integrated circuits. It uses a computationally efficient algorithm, which allows the simulation of typical ICs in only a few minutes. The temperature distribution is depicted graphically and with temporal resolution in a specially designed graphical user interface. With the help of xtSi designers can exactly identify isotherms and hotspots, thus enabling an optimization of the layout due to temperature effects. xtSi has been verified experimentally for device temperatures exceeding 500 °C up to the onset of thermal runaway. More information ...
UB05.9	FUNCTIONAL ECO: AN EFFICIENT REWIRING ENHANCED FUNCTIONAL ECO Presenter: Tak Kei Lam, The Chinese University of Hong Kong, HK Authors: Xing Wei¹, Yi Diao¹, Tak Kei Lam² and Yu-Liang Wu¹ ¹Easy-Logic Technology Limited, HK; ²The Chinese University of Hong Kong, HK Abstract Circuit designs have been much more complex nowadays. Bugs and/or specification changes often happen in late design cycles. Running the whole design cycle again is time consuming and costly. Functional engineering change order (ECO), which is the process that patches an old implementation to accomplish a new specification, is therefore performed instead to save time and cost. In an ECO effort, minimizing the patch size is crucial since it gives a higher chance of successful insertion and minimal perturbation to a near or completely committed EDA outcome (e.g. satisfaction on area and timing constraints). However, an ECO work can be very difficult at this stage as the combinational signals of the old specification may have vanished after iterations of synthesis and optimizations. We implemented a practical prototype for functional ECO. Our result outperforms all results publicized in the ICCAD 2012 Contest. More information ...
12:00	End of session
12:30	Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.1 SPECIAL DAY Hot Topic: Platforms for the IoT

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Salle Oisans

Organisers:
Rolf Drechsler, University of Bremen/DFKI GmbH, DE
Christoph Grimm, University of Kaiserslautern, DE

Chair:
Christoph Grimm, University of Kaiserslautern, DE

Co-Chair:
Marie-Minerve Louerat, University of Paris, FR

The pervasive networking of embedded systems enables the vision of the "Internet of Things". Appliances are built on top of and using hardware, software, and communication platforms. The presentations in this session cover the new and challenging requirements: The first presentation gives a visionary overview of how platforms will be used in an open, dynamic and organic way. To make these visions happen right now, technical challenges are addressed: in the second presentation, energy-awareness electronic platforms are in the focus. In the third presentation, an overview of software architectures for the IoT is given. Last but not least, standardized networking at semantic layer is required to enable machine-to-machine (M2M) communication and intelligent service discovery.

Time	Label	Presentation Title Authors
11:00	6.1.1	THE HUMAN INTRANET: WHERE SWARMS AND HUMANS MEET Speaker: Jan Rabaey, UC Berkeley, US Abstract A Human Intranet is envisioned as an open scalable platform that seamlessly integrates an ever-increasing number of sensor, actuation, computation, storage, communication and energy nodes located on, in, or around the human body acting in symbiosis with the functions provided by the body itself. This may fundamentally alter the ways humans operate, and interact with the physical world around them. It all starts with concepts that find their roots in the Internet of Things (IoT) and swarm technologies. Download Paper (PDF; Only available from the DATE venue WiFi)
11:22	6.1.2	ENERGY EFFICIENT ELECTRONICS FOR THE INTERNET OF THINGS Speaker: Stefan Heinen, RWTH Aachen, DE
11:44	6.1.3	SOFTWARE ARCHITECTURES FOR THE INTERNET OF THINGS Speaker: Mario Trapp, FhG IESE, DE
12:06	6.1.4	ONEM2M : A STANDARD FOR AN OPEN AND INTEROPERABLE M2M PLATFORM, THANKS TO SEMANTIC WEB TOOLS Speaker: Marylin Arndt-Vincent, Orange Labs, FR
12:30		End of session Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.2 Physical Unclonable Functions

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Belle Etoile

Chair:
Ingrid Verbauwhede, KUL, BE

Co-Chair:
Tim Güneysu, Ruhr University Bochum, DE

Physically Unclonable Functions (PUF) have received much attention for fingerprinting and as secret key provider in electronic devices. This session presents novel constructions and attacks on Arbiter, Ring-Oscillator and DRAM PUFs.

Time	Label	Presentation Title Authors
11:00	6.2.1	EFFICIENT ATTACKS ON ROBUST RING OSCILLATOR PUF WITH ENHANCED CHALLENGE-RESPONSE SET Speakers: Phuong Ha Nguyen, Durga Prasad Sahoo, Rajat Subhra Chakraborty and Debdeep Mukhopadhyay, Indian Institute of Technology Kharagpur, IN Abstract Physically Unclonable Function (PUF) circuits are an important class of hardware security primitives that promise a paradigm shift in applied cryptography. Ring Oscillator PUF (ROPUF) is an important PUF variant, but it suffers from hardware overhead limitations, which in turn restricts the size of its challenge space. To overcome this fundamental shortcoming, improved ROPUF variants based on the subset selection concept have been proposed, which significantly ``expand'' the challenge space of a ROPUF at acceptable hardware overhead. In this paper, we develop cryptanalytic attacks on previously proposed low--overhead and robust ROPUF variant. The proposed attacks are practical as they have quadratic time and data complexities in the worst case. We demonstrate the effectiveness of the proposed attack by successfully attacking a public domain dataset acquired from FPGA implementations. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	6.2.2	A ROBUST AUTHENTICATION METHODOLOGY USING PHYSICALLY UNCLONABLE FUNCTIONS IN DRAM ARRAYS Speakers: Maryam S. Hashemian¹, Bhanu Singh¹, Francis Wolff¹, Chris Papachristou¹, Steve Clay² and Daniel Weyer² ¹Case Western Reserve University, US; ²Rockwell Automation, US Abstract The high availability of DRAM in either embedded or stand-alone form make it a target for counterfeit attacks. In this paper, we propose a robust authentication methodology against counterfeiting. The authentication is performed by exploiting the intrinsic process variation in write reliability of DRAM cells. Extensive Monte Carlo simulations performed in HSPICE show that the proposed authentication methodology provides high uniqueness of 50.01% average inter-die Hamming distance and good robustness under temporal fluctuations in supply voltage, temperature, and ageing effect over a 10-year lifetime. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	6.2.3	A NOVEL MODELING ATTACK RESISTANT PUF DESIGN BASED ON NON-LINEAR VOLTAGE TRANSFER CHARACTERISTICS Speakers: Arunkumar Vijayakumar and Sandip Kundu, Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, US Abstract Physical Unclonable Function (PUF) circuits are used for chip authentication. PUF designs rely on manufacturing process variations to produce unique response to input challenges. It has been shown that many PUF designs are vulnerable to machine learning (ML) attacks, where a model can be built to predict PUF response to any input after only a few observations. In this work, we propose a ML attack resistant PUF design based on a circuit block to implement a non-linear voltage transfer function. The proposed circuit is simple, exhibits high uniqueness and randomness. Further improvements are proposed to enhance PUF reliability. The proposed circuit was simulated in a 45nm technology process and the results indicate a significant improvement in ML attack resistance in comparison to traditional PUFs. Results on uniqueness and reliability are also presented. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP3-1, 505	STT MRAM-BASED PUFS Speakers: Elena Ioana Vatajelu¹, Giorgio Di Natale², Marco Indaco¹ and Paolo Prinetto¹ ¹Politecnico di Torino, IT; ²LIRMM, FR Abstract Physical Unclonable Functions (PUFs) are emerging cryptographic primitives used to implement low-cost device authentication and secure secret key generation. Weak PUFs (i.e., devices able to generate a single signature or able to deal with a limited number of challenges) are widely discussed in literature. Nowadays, the most promising solution is based on SRAMs. In this paper we propose an innovative PUF design based on STT-MRAM memory. We exploit the high variability affecting the electrical resistance of the MTJ device in anti-parallel magnetization. We will show that the proposed solution is robust, unclonable and unpredictable. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.3 Emerging Low Power Techniques

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Stendhal

Chair:
Guillermo Payá Vayá, Leibniz Universität Hannover, DE

Co-Chair:
Alberto Garcia-Ortiz, U. Bremen, DE

Technology improvements towards the nanometric era are inducing new challenges which need to be addressed at all the abstration levels. This section focuses on the latest emerging approaches to cope with those challenges, as for example approximate computing, compressed sensing, asymmetric underlapped FinFET, etc.

Time	Label	Presentation Title Authors
11:00	6.3.1	(Best Paper Award Candidate) ASYMMETRIC UNDERLAPPED FINFET BASED ROBUST SRAM DESIGN AT 7NM NODE Speaker: Rangharajan Venkatesan, NVIDIA Corporation, US Authors: Arun Goud Akkala¹, Rangharajan Venkatesan², Anand Raghunathan¹ and Kaushik Roy¹ ¹Purdue University, US; ²NVIDIA Corporation, US Abstract Robust 6T SRAM design in 7nm technology node, at low supply voltage and rising leakage, requires ingenious design of FinFETs capable of providing reasonable Ion/Ioff ratio and acceptable short channel effects even under new leakage mechanisms such as direct source to drain tunneling. In this work, we explore asymmetric underlapped FinFET design with the help of quantum mechanical device simulations considering both the bit-cell and cache design constraints. We show that our optimized FinFET achieves a significant improvement in on-current over conventional symmetrically underlapped FinFETs. Through circuit simulations using compact models, we demonstrate that when such asymmetric underlapped n-FinFETs are used as bit-line access transistors, read/write conflict can be mitigated with simultaneous reduction in 6T SRAM bit-cell leakage. Improvement in write noise margin as well as access time can also be achieved under iso-read stability condition. Based on these technology and bit-cell models, we have developed a CACTI-based simulator for evaluating asymmetric FinFET based SRAM cache at 7nm node. Using this device-circuit-system level framework and optimized asymmetric underlapped FinFETs, we demonstrate significant energy savings and performance improvements for an 8KB L1 cache and a 4MB last-level cache. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	6.3.2	QUALITY CONFIGURABLE REDUCE-AND-RANK FOR ENERGY EFFICIENT APPROXIMATE COMPUTING Speakers: Arnab Raha, Swagath Venkataramani, Vijay Raghunathan and Anand Raghunathan, Purdue University, US Abstract Approximate computing is an emerging design paradigm that exploits the intrinsic ability of applications to produce acceptable outputs even when their computations are executed approximately. In this work, we explore approximate computing for a key computation pattern, Reduce-and-Rank (RnR), which is prevalent in a wide range of workloads including video processing, recognition, search and data mining. An RnR kernel performs a reduction operation (e.g., distance computation, dot product, L1-norm) between an input vector and each of a set of reference vectors, and ranks the reduction outputs to select the top reference vectors for the current input. We propose two complementary approximation strategies for the RnR computation pattern. The first is interleaved reduction-and-ranking, wherein the vector reductions are decomposed into multiple partial reductions and interleaved with the rank computation. Leveraging this transformation, we propose the use of intermediate reduction results and ranks to identify future computations that are likely to have low impact on the output, and can hence be approximated. The second strategy, input similarity based approximation, exploits the spatial or temporal correlation of inputs (e.g., pixels of an image or frames of a video) to identify computations that are amenable to approximation. These strategies address a key challenge in approximate computing - identification of which computations to approximate - and may be used to drive any approximation mechanism such as computation skipping and precision scaling to realize performance or energy improvements. A second key challenge in approximate computing is that the extent to which computations can be approximated varies significantly from application to application, and across inputs for even a single application. Hence, quality configurability, or the ability to automatically modulate the degree of approximation at runtime is essential. To enable quality configurability in RnR kernels, we propose a kernel-level quality metric that correlates well to application-level quality, and identify key parameters that can be used to tune the proposed approximation strategies dynamically. We develop a runtime framework that modulates the identified parameters during execution of RnR kernels to minimize their energy while meeting a given target quality. To evaluate the proposed concepts, we designed quality-configurable hardware implementations of 6 RnR-based applications from the recognition, mining, search and video processing application domains in 45nm technology. Our experiments demonstrate 1.06X-2.18X reduction in energy consumption with virtually no loss in output quality (<0.5%) at the application-level. The energy benefits further improve up to 2.38X and 2.5X when the quality constraints are relaxed to 2.5% and 5% respectively. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	6.3.3	ULTRA-LOW-POWER ECG FRONT-END DESIGN BASED ON COMPRESSED SENSING Speakers: Hossein Mamaghanian¹ and Pierre Vandergheynst² ¹EPFL, CH; ²École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract Ultra-low-power design has been a challenging area for design of the sensor front-ends especially in the area of Wireless Body Sensor Nodes (WBSN), where a limited amount of power budget and hardware resources is available. Since introduction of CS, there has been a huge challenge to design CS readout devices for different applications and among all for biomedical signals. Till now, different proposed realizations of the digital CS prove the suitability of using CS as a compression technique for compressible biomedical signals. However, these works mainly take advantages of only one aspect of the benefits of the CS. In this type of works, CS is usually used as a very low cost and easy to implement compression technique. This means that we should acquire the signal with traditional limitations on the bandwidth (BW) and later compresses it. However, the main power of the CS, which lies on the efficient data acquisition, remains untouched. Building on our previous work [1], where the suitability of the CS is proven for the compression of the ECG signals, and our investigation on ultra-low-power CS-based A2I devices [2] , here in this paper we propose a fully redesigned complete CS-based "Analog-to-information'' (A/I) front-end for ECG signals. Our results show that proposed hybrid design easily outperforms the traditional implementation of CS with more than 11 times fold reduction in power consumption in high compression ratio. Download Paper (PDF; Only available from the DATE venue WiFi)
12:15	6.3.4	GTFUZZ: A NOVEL ALGORITHM FOR ROBUST DYNAMIC POWER OPTIMIZATION VIA GATE SIZING WITH FUZZY GAMES Speakers: Tony Casagrande and Nagarajan Ranganathan, University of South Florida, US Abstract As CMOS technology continues to scale, the effects of variation inject a greater proportion of error and uncertainty into the design process. Ultra-deep submicron circuits require accurate modeling of gate delay in order to meet challenging timing constraints.With the lack of statistical data, designers are faced with a arduous task to optimize a circuit which is greatly affected by variability due to the mechanical and chemical manufacturing process. Discrete gate sizing is a complex problem which requires (1) accurate models that take into account random parametric variation and (2) a fair allocation of resources to maximize the solution in the delay-energy space. The GTFUZZ algorithm is presented which handles both of these tasks. Fuzzy games are used to model the problem of gate sizing as a resource allocation problem. In fuzzy games, delay is considered a fuzzy goal with fuzzy parameters to capture the imprecision of gate delay early in the design phase when empirical data is absent. Dynamic power is normalized as a fuzzy goal without varying coefficients. The fuzzy goals also provide a flexible platform for multimetric optimization. The robust GTFUZZ algorithm is compared against fuzzy linear programming (FLP) and deterministic worst-case FLP (DWCFLP) algorithms. Benchmark circuits are first synthesized, placed, routed, and optimized for performance using the Synopsys University 32/28nm standard cell library and technology files. Operating at the optimized clock frequency, results show an average power reduction of about 20% versus DWCFLP and 9% against variation-aware gate sizing with FLP. Timing and timing yield are verified by both Synopsys PrimeTime and Monte Carlo simulations of the most critical paths using HSPICE. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP3-2, 1051	SPATIAL AND TEMPORAL GRANULARITY LIMITS OF BODY BIASING IN UTBB-FDSOI Speakers: Johannes Maximilian Kühn¹, Dustin Peterson¹, Hideharu Amano², Oliver Bringmann¹ and Wolfgang Rosenstiel¹ ¹Eberhard Karls Universität Tübingen, DE; ²Keio University, JP Abstract Advances in SOI technology such as STMicro's 28nm UTBB-FDSOI enabled a renaissance of body biasing. Body biasing is a fast and efficient technique to change power and performance characteristics. As the electrical task to change the substrate potential is small compared to Dynamic Voltage Scaling, much finer island sizes are conceivable. This however creates new challenges in regard to design partitioning into body bias islands and body bias combinations across such designs. These combinations should be chosen so that energy efficiency improves while maintaining timing constraints. We introduce a combination based analysis tool to find optimized body bias island partitions and body biasing levels. For such partitions, optimized body bias assignments for static, programmable and dynamic body biasing can be computed. The overheads incurred by dynamically switching body biases are estimated to yield actual improvements and to give an upper bound for the power consumption of required additional circuitry. Based on these partitionings and the switching overheads, optimized application specific switching strategies are computed. The effectiveness of this method is demonstrated in a frequency scaling scenario using forward body biasing on a Dynamic Reconfigurable Processor (DRP) design. We show that leakage can be greatly reduced using the proposed methods and that dynamic body biasing can be beneficial even at small time periods. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.4 Bridging the Moore's Law Gap with Application-Specific Architectures

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Chartreuse

Chair:
Cristina Silvano, Politecnico di Milano, IT

Co-Chair:
Akash Kumar, National University of Singapore, SG

This session focuses on approximation, low-power, and high-performance optimization techniques for application-specific architectures, including neural networks, multicores and GPUs.

Time	Label	Presentation Title Authors
11:00	6.4.1	A ULTRA-LOW-ENERGY CONVOLUTION ENGINE FOR FAST BRAIN-INSPIRED VISION IN MULTICORE CLUSTERS Speakers: Francesco Conti¹ and Luca Benini² ¹Università di Bologna, IT; ²Università di Bologna / ETH Zürich, IT Abstract State-of-art brain-inspired computer vision algorithms such as Convolutional Neural Networks (CNNs) are reaching accuracy and performance rivaling that of humans; however, the gap in terms of energy consumption is still many degrees of magnitude wide. Many-core architectures using shared-memory clusters of power-optimized RISC processors have been proposed as a possible solution to help close this gap. In this work, we propose to augment these clusters with Hardware Convolution Engines (HWCEs): ultra-low energy coprocessors for accelerating convolutions, the main building block of many brain-inspired computer vision algorithms. Our synthesis results in ST 28nm FDSOI technology show that the HWCE is capable of performing a convolution in the lowest-energy state spending as little as 35 pJ/pixel on average, with an optimum case of 6.5 pJ/pixel. Furthermore, we show that augmenting a cluster with a HWCE can lead to an average boost of 40x or more in energy efficiency in convolutional workloads. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	6.4.2	ELIMINATING INTRA-WARP CONFLICT MISSES IN GPU Speakers: Bin Wang, Zhuo Liu, Xinning Wang and Weikuan Yu, Auburn University, US Abstract Cache indexing functions play a key role in reducing conflict misses by spreading accesses evenly among all sets of cache blocks. Although various methods have been proposed, no significant effort has been expended on the behavior of conflict misses in GPU where threads are organized into warps and execute in lock-step. When intra-warp accesses could not be coalesced into one or two cache blocks, which is often referred to as memory divergence, a warp incurs up to SIMD-width (e.g., 32) independent cache accesses. Such a burst of divergent accesses not only increases contention on cache capacity, but also incurs intra-warp associativity conflicts when they are pathologically concentrated in a few cache sets. Due to the lock- step execution, the GPU Load/Store units would be stalled when intra-warp concentration exceeds available cache associativity. Through an in-depth analysis of GPU access patterns, we find that column-majored strided accesses are likely to incur high intra-warp concentration. Based on the analysis, we propose a Full Permutation (FUP) based indexing method that adapts to both large and medium strides in this pattern. Across the 10 highly cache-sensitive GPU applications we have evaluated, FUP eliminates intra-warp associativity conflicts and outperforms two state-of-the-art indexing methods by 22% and 15%, respectively. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	6.4.3	RNA: A RECONFIGURABLE ARCHITECTURE FOR HARDWARE NEURAL ACCELERATION Speakers: Fengbin Tu¹, Shouyi YIN¹, Peng Ouyang¹, Leibo Liu² and Shaojun Wei¹ ¹Tsinghua University, CN; ²Institute of Microelectronics and The National Lab for Information Science and Technology, Tsinghua University, CN Abstract As the energy problem has become a big concern in digital system design, one promising solution is combining the core processor with a multi-purpose accelerator targeting high performance applications. Many modern applications can be approximated by multi-layer perceptron (MLP) models, with little quality loss. However, many current MLP accelerators have several drawbacks, such as the unbalance of their performance and flexibility. In this paper, we propose a scheduling framework to guide mapping MLPs onto limited hardware resources with high performance. The framework successfully solves the main constraints of hardware neural acceleration. Furthermore, we implement a reconfigurable neural architecture (RNA) based on this framework, whose computing pattern can be reconfigured for different MLP topologies. The RNA achieves comparable performance with application-specific accelerators and greater flexibility than other hardware MLPs. Download Paper (PDF; Only available from the DATE venue WiFi)
12:15	6.4.4	APPROXANN: AN APPROXIMATE COMPUTING FRAMEWORK FOR ARTIFICIAL NEURAL NETWORK Speakers: Qian Zhang, Ting Wang, Ye Tian, Feng Yuan and Qiang Xu, The Chinese University of Hong Kong, HK Abstract Artificial Neural networks (ANNs) are one of the most well-established machine learning techniques and have a wide range of applications, such as Recognition, Mining and Synthesis (RMS). As many of these applications are inherently error-tolerant, in this work, we propose a novel approximate computing framework for ANN, namely ApproxANN. When compared to existing solutions, ApproxANN not only considers approximation for the computational units, but also approximates memory accesses. To be specific, ApproxANN characterizes the impact of neurons on the output quality in an effective and efficient manner, and judiciously determine how to approximate the computation and memory accesses of certain less critical neurons to achieve the maximum energy efficiency gain under a given quality constraint. Experimental results on various ANN applications with different datasets demonstrate the efficacy of the proposed solution. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP3-3, 377	A HARDWARE IMPLEMENTATION OF A RADIAL BASIS FUNCTION NEURAL NETWORK USING STOCHASTIC LOGIC Speakers: Yuan Ji¹, Feng Ran¹, Cong Ma² and David Lilja² ¹Shanghai University, CN; ²University of Minnesota - Twin Cities, US Abstract Hardware implementations of artificial neural networks typically require significant amounts of hardware resources. This paper proposes a novel radial basis function artificial neural network using stochastic computing elements, which greatly reduces the required hardware. The Gaussian function used for the radial basis function is implemented with a two-dimensional finite state machine. The norm between the input data and the center point is optimized using simple logic gates. Results from two pattern recognition case studies, the standard Iris flower and the MICR font benchmarks, show that the difference of the average mean squared error between the proposed stochastic network and the corresponding traditional deterministic network is only 1.3% when the stochastic stream length is 10kbits. The accuracy of the recognition rate varies depending on the stream length, which gives the designer tremendous flexibility to tradeoff speed, power, and accuracy. From the FPGA implementation results, the hardware resource requirement of the proposed stochastic hidden neuron is only a few percent of the hardware requirement of the corresponding deterministic hidden neuron. The proposed stochastic network can be expanded to larger scale networks for complex tasks with simple hardware architectures. Download Paper (PDF; Only available from the DATE venue WiFi)
12:31	IP3-4, 536	SODA: SOFTWARE DEFINED FPGA BASED ACCELERATORS FOR BIG DATA Speakers: Chao Wang, Xi Li and Xuehai Zhou, University of Science and Technology of China, CN Abstract FPGA has been an emerging field in novel big data architectures and systems, due to its high efficiency and low power consumption. It enables the researchers to deploy massive accelerators within one single chip. In this paper, we present a software defined FPGA based accelerators for big data, named SODA, which could reconstruct and reorganize the acceleration engines according to the requirement of the various data-intensive applications. SODA decomposes large and complex applications into coarse grained single-purpose RTL code libraries that perform specialized tasks in out-of-order hardware. We built a prototyping system with constrained shortest path Finding (CSPF) case studies to evaluate SODA framework. SODA is able to achieve up to 43.75X speedup at 128 node application. Furthermore, hardware cost of the SODA framework demonstrates that it can achieve high speedup with moderate hardware utilization. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.5 Multimedia and Consumer Electronics

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Meije

Chair:
Muhammad Shafique, Karlsruhe Institute of Technology, DE

Co-Chair:
Marcello Coppola, STMicroelectronics, FR

This session presents hardware and software architectures that enable effective implementations of multimedia and consumer electronics systems.

Time	Label	Presentation Title Authors
11:00	6.5.1	DRAM OR NO-DRAM? EXPLORING LINEAR SOLVER ARCHITECTURES FOR IMAGE DOMAIN WARPING IN 28 NM CMOS Speakers: Michael Schaffner¹, Frank K. Gürkaynak¹, Aljoscha Smolic² and Luca Benini³ ¹Swiss Federal Institute of Technology in Zurich (ETHZ), CH; ²Disney Research Zurich, CH; ³Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract Solving large optimization problems within the energy and cost budget of mobile SoCs in real-time is a challenging task and motivates the development of specialized hardware accelerators. We present an evaluation of different linear solvers suitable for least-squares problems emanating from image processing applications such as image domain warping. In particular, we estimate implementation costs in 28 nm CMOS technology, with focus on trading on-chip memory vs. off-chip (DRAM) bandwidth. Our assessment shows large differences in circuit area, throughput and energy consumption and aims at providing a recommendation for selecting a suitable architecture. Our results emphasize that DRAM-free accelerators are an attractive choice in terms of power consumption and overall system complexity, even though they require more logic silicon area when compared to accelerators that make use of external DRAM. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	6.5.2	A SMALL NON-VOLATILE WRITE BUFFER TO REDUCE STORAGE WRITES IN SMARTPHONES Speakers: Mungyu Son¹, Sungkwang Lee¹, Kyungho Kim², Sungjoo Yoo¹ and Sunggu Lee¹ ¹POSTECH, KR; ²Samsung Electronics, KR Abstract Storage write behavior in mobile devices, e.g., smartphones, is characterized by frequent overwrites of small data. In our work, we first demonstrate a small non-volatile write buffer is effective in coalescing such overwrites to reduce storage writes. We also present how to make the best use of write buffer resource the size of which is limited by the requirement of small form factor. We present two new methods, shadow tag and SQLite-aware buffer management both of which aim at identifying hot storage data to keep in the write buffer. We also investigate the storage behavior of multiple mobile applications and show that their interference can reduce the effectiveness of write buffer. In order to resolve this problem, we propose a new dynamic buffer allocation method. We did experiments with real mobile applications running on a smartphone and a Flash memory-based storage system and obtained average 56.2% and 50.2% reduction in storage writes in single and multiple application runs, respectively. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	6.5.3	CLUSTERING-BASED MULTI-TOUCH ALGORITHM FRAMEWORK FOR THE TRACKING PROBLEM WITH A LARGE NUMBER OF POINTS Speakers: Shih-Lun Huang, Sheng-Yi Hung and Chung-Ping Chen, Graduate Institute of Electronics Engineering, National Taiwan University, TW Abstract Microcontrollers (MCUs) are extensively used in consumer devices for specific purposes because they are tiny, cheap, and low-power. Any time-consuming algorithm and any large-size program are not suited for MCUs. Recently, we found that the conventional multi-touch algorithm becomes computationally expensive to handle the applications of large-sized touch panels. Although a more high-end MCU can obtain an improvement on speed, it would increase manufacturing cost and operating power consumption as well. In the whole multi-touch algorithm flow, point tracking is the most computationally expensive part. Fortunately, touch point tracking is similar to the pin-assignment problem in EDA. To accelerate tracking, we employ EDA techniques, such as clustering, to speed up our multi-touch algorithm. Besides, we prove that the tracking problem would be solved in O(n) time for practical cases and without losing its accuracy after clustering. Furthermore, we apply computational geometry techniques to develop an efficient clustering method. Experimental results show that clustering is efficient and effective. For the necessary requirement of large-area touch panels having 20 touch points, we can reduce the runtime by up to 70%. Besides, our multi-touch algorithm may support up to 80 touch points accompanied by a low-cost MCU. Download Paper (PDF; Only available from the DATE venue WiFi)
12:15	6.5.4	A LOW ENERGY 2D ADAPTIVE MEDIAN FILTER HARDWARE Speakers: Ercan Kalali and Ilker Hamzaoglu, Sabanci University, TR Abstract The two-dimensional (2D) spatial median filter is the most commonly used filter for image denoising. Since it is a non-linear sorting based filter, it has high computational complexity. Therefore, in this paper, we propose a novel low complexity 2D adaptive median filter algorithm. The proposed algorithm reduces the computational complexity of 2D median filter by exploiting the pixel correlations in the input image, and it produces higher quality filtered images than 2D median filter. We also designed and implemented a low energy 2D adaptive median filter hardware implementing the proposed 2D adaptive median filter algorithm. The proposed hardware is verified to work correctly on a Xilinx Zynq 7000 FPGA board. It can process 105 full HD (1920x1080) images per second in the worst case on a Xilinx Virtex 6 FPGA, and it has more than 80% less energy consumption than original 2D median filter hardware on the same FPGA. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP3-5, 851	DYNAMIC RECONFIGURABLE PUNCTURING FOR SECURE WIRELESS COMMUNICATION Speakers: Liang Tang¹, Jude Angelo Ambrose², Akash Kumar¹ and Sri Parameswaran² ¹National University of Singapore, SG; ²University of New South Wales, AU Abstract The ubiquity of wireless devices has created security concerns on the information being transferred. It is critical to protect the secret information in every layer of wireless communication to thwart any type of attacks. A dynamic reconfigurable puncturing based security mechanism, named RePunc, is proposed in this paper to provide an extra level of security at the physical layer. RePunc utilizes the puncturing feature of Forward Error Correction (FEC) to insert the secure information in the punctured positions of the standard information encoded data. The punctured patterns are dynamically changed and passed as a secret key from the sender to the receiver. An eavesdropper will not be able to detect the transmission of the secure information since the inserted secure information will be processed as channel noise by the eavesdropper's receiver. However, the rightful receiver will be able to successfully decode the secure packets by knowingly differentiating the secure information and the standard information before the FEC decoding. A case study of RePunc implementation for WiFi communication is presented in this paper, showing the extreme high security complexity with low hardware overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
12:31	IP3-6, 662	QR-DECOMPOSITION ARCHITECTURE BASED ON TWO-VARIABLE NUMERIC FUNCTION APPROXIMATION Speakers: Jochen Rust, Frank Ludwig and Steffen Paul, University of Bremen, DE Abstract This paper presents a new approach for hardware-based QR-decomposition using an efficient computation scheme of the Givens-Rotation. In detail, the angle of rotation and its application to the Givens-Matrix are processed in a direct, straight-forward manner. High-performance signal processing is achieved by piecewise approximation of the arctangent and sine function. In order to identify appropriate function approximations, several designs with varying constraints are automatically generated and analyzed. Physical and logical synthesis is performed in a 130nm CMOS-technology. The application of our proposal in a multi-antenna mobile communication scenario highlights our work to be very efficient in terms of calculation accuracy and computation performance. Download Paper (PDF; Only available from the DATE venue WiFi)
12:32	IP3-7, 1055	IN-PLACE MEMORY MAPPING APPROACH FOR OPTIMIZED PARALLEL HARDWARE INTERLEAVER ARCHITECTURES Speakers: Saeed Ur Rehman¹, Cyrille Chavet², Philippe Coussy² and Awais Sani¹ ¹Lab-STICC / Université de Bretagne Sud, PK; ²Lab-STICC / Université de Bretagne Sud, FR Abstract Due to their impressive error correction performances, turbo-codes or LDPC (Low Density Parity Check) architectures are now widely used in communication system and are one of the most critical parts of decoders. In order to achieve high throughput requirements these decoders are based on parallel architecture, which results in a major problem to be solved: parallel memory access conflicts. To solve these conflicts, different approaches have been proposed in state of the art resulting in a lot of different architectural solutions. In this article, we introduce a new class of memory mapping approach that can solve the conflicts with an optimized architecture based on in-place memory mapping for any application. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.6 Panel - The Future of Electronics, Semiconductor, and Design in Europe

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Bayard

Organiser:
Marco Casale-Rossi, Synopsys, US

Chair:
Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH

For more than a decade, Europe has been the wireless continent; today, wireless has almost completely shifted to the U.S. and Asia. This shift has had a profound impact on the electronic, semiconductor, and design ecosystem: long-time leaders have disappeared, or have abandoned the wireless business/market. Europe needs to re-invent itself once again. Is there a future for electronics, and IC design and manufacturing in Europe? If so, what are the applications, and the technologies that will bring Europe back to the top of the world leadership? This panel session will gather executives from the semiconductor, IP, and R&D sectors to discuss the prospects of our industry in Europe.

Panelists:

Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH
Andrew Repton, -, US
Thierry Collette, LETI, France, FR
Antun Domic, Synopsys, US
Horst Symanzik, Bosch Sensortec, DE
Tony King-Smith, -, US

12:30

End of session
Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00

6.7 Application-Mapping Strategies for Many-Cores

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Les Bans

Chair:
Amit Kumar Singh, University of York, GB

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL

This session deals with application performance. The first paper proposes a performance model to guide run-time mapping. The other two papers optimize performance, one by mapping tasks to match the parallelism of the underlying architecture, and the other by identifying shared memory sections to facilitate parallel execution.

Time	Label	Presentation Title Authors
11:00	6.7.1	ADAPTIVE ON-THE-FLY APPLICATION PERFORMANCE MODELING FOR MANY CORES Speakers: Sebastian Kobbe, Lars Bauer and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE Abstract Resource management for a many-core system entails allocating cores to applications and binding tasks of the applications to particular cores. Accurate on-the-fly estimates of different core allocations w.r.t. application performance are required before binding the tasks to cores for execution efficiency. We propose an adaptive on-the-fly application performance model that largely al-leviates this increasingly important problem. It allows reacting to spontaneous workload variations and it considers topological properties of resources. Extensive evaluations show that the aver-age estimation error is reduced from 14.7% to 4.5%, resulting in high quality of on-the-fly adaptive application mapping. Our work is a first milestone towards optimality of systems that exhibit a high degree of spontaneous workload variations. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	6.7.2	CUSTOMIZATION OF OPENCL APPLICATIONS FOR EFFICIENT TASK MAPPING UNDER HETEROGENEOUS PLATFORM CONSTRAINTS Speakers: Edoardo Paone¹, Francesco Robino², Gianluca Palermo¹, Vittorio Zaccaria¹, Ingo Sander² and Cristina Silvano¹ ¹Politecnico di Milano, IT; ²KTH Royal Institute of Technology, SE Abstract When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes each application kernel for a given platform and 2) a task-mapping phase that maximizes the overall application throughput by exploiting concurrency in the application task graph. The tuning phase is suitable for customizing parameterized OpenCL kernels considering device-specific constraints. Then, the mapping phase improves task-level parallelism for multi-device execution accounting for the overhead of memory transfers, overheads implied by multiple OpenCL contexts for different device vendors. Benefits of the proposed design flow have been assessed on a stereo-matching application targeting two commercial heterogeneous platforms. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	6.7.3	ENABLING MULTI-THREADED APPLICATIONS ON HYBRID SHARED MEMORY MANYCORE ARCHITECTURES Speakers: Tushar Rawat and Aviral Shrivastava, Arizona State University, US Abstract As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. In this paper, we present a source-to-source translator written using CETUS that identifies a conservative superset of all the shared data in a multi-threaded application, and maps it to the off-chip shared memory such that it enables execution on HSM architectures. This improves the performance of our benchmarks by 32x. Following, we identify and map the frequently accessed shared data to the on-chip shared memory. This further improves the performance of our benchmarks by 8x on average. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP3-8, 739	MAXIMIZING COMMON IDLE TIME ON MULTI-CORE PROCESSORS WITH SHARED MEMORY Speakers: Chenchen Fu¹, Yingchao Zhao², Minming Li¹ and Jason Xue³ ¹Department of Computer Science, City University of Hong Kong, HK; ²Department of Computer Science, Caritas Institute of Higher Education, Hong Kong, HK; ³City University of Hong Kong, HK Abstract Reducing energy consumption is a critical problem in most of the computing systems today. This paper focuses on reducing the energy consumption of the shared main memory in multi-core processors by putting it into sleep state when all the cores are idle. Based on this idea, this work presents systematic analysis of different assignment and scheduling models and proposes a series of scheduling schemes to maximize the common idle time of all cores. An optimal scheduling scheme is proposed assuming the number of cores is unbounded. When the number of cores is bounded, an efficient heuristic algorithm is proposed. The experimental results show that the heuristic algorithm works efficiently and can save as much as 25.6% memory energy compared to a conventional multi-core scheduling scheme. Download Paper (PDF; Only available from the DATE venue WiFi)
12:31	IP3-9, 78	MAXIMIZING IO PERFORMANCE VIA CONFLICT REDUCTION FOR FLASH MEMORY STORAGE SYSTEMS Speakers: Qiao Li¹, Liang Shi², Congming Gao¹, Kaijie Wu¹, Jason Chun Xue³, Qingfeng Zhuge¹ and H.-M. Edwin Sha⁴ ¹Chongqing University, CN; ²College of Computer Science, Chongqing University, CN; ³City University of Hong Kong, HK; ⁴Chongqing University and University of Texas at Dallas, CN Abstract Flash memory has been widely deployed during the recent years with the improvement of bit density and technology scaling. However, a significant performance degradation is also introduced with the development trend. The latency of IO requests on flash memory storage systems is composed of access conflict latency, data transfer latency, flash chip access latency and ECC encoding/decoding latency. Studies show that the access conflict latency, which is mainly induced by the slow transfer latency and access latency, has become the dominate part of the IO latency, especially for IO intensive applications. This paper proposes to reduce the flash access conflict latency through the reduction of the transfer and flash access latencies. A latency model is built to construct the relationship among the transfer latency and access latency based on the reliability characteristics of flash memory. Simulation experiments show that the proposed approach achieves significant performance improvement. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

6.8 Panel - The Future of Electronics, Semiconductor, and Design in Europe -> takes place as session 6.6 in Salle Bayard

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Salle Lesdiguières

Session 6.8 is part of the exhibition program open to all exhibition visitors, but takes place as session 6.6 in the larger room Bayard (no presentations in room Les Diguières). Please refer to session 6.6 for the details.

12:30

End of session
Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00

UB06 Session 6

Date: Wednesday 11 March 2015
Time: 12:00 - 14:00
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB06.1	SMART CELL DEVELOPMENT PLATFORM FOR EMBEDDED BATTERY MANAGEMENT Presenter: Swaminathan Narayanaswamy, TUM CREATE, SG Authors: Matthias Kauer¹, Sebastian Steinhorst¹, Martin Lukasiewycz¹ and Samarjit Chakraborty² ¹TUM CREATE, SG; ²TU Munich, DE Abstract Embedded Battery Management (EBM) [1], in contrast to the existing state-of-the-art centralized Battery Management Systems (BMSs) found in Electric Vehicles (EVs) or stationary Electrical Energy Storage (EES) applications, focuses on monitoring and controlling each individual cell of the battery pack with a dedicated Cell Management Unit (CMU). This novel approach of battery management might offer significant advantages over the centralized BMSs, such as higher modularity, plug-and-play integration and shorter time to market. The combination of a battery cell and a CMU forms the smart cell and the system-level functionalities of the EBM are performed in a decentralized manner by the network of smart cells, with the help of the computational and communication resources of CMUs. We present a development platform for such a smart cell enabled EBM. The development platform consists of two components, the hardware platform and the software platform. The hardware platform of the demonstrator comprises of battery cells and their dedicated CMUs which consist of a smart cell controller board and an active cell balancing board. The software platform provides the smart cell firmware as well as a software tool for verification of active cell balancing architectures and a smart cell simulator for simulating system-level EBM functionalities. More information ...
UB06.2	ID.FIX: AN EDA TOOL FOR FIXED-POINT REFINEMENT OF EMBEDDED SYSTEMS Presenter: Olivier Sentieys, INRIA, FR Authors: Daniel Menard¹ and Nicolas Simon² ¹INSA Rennes, FR; ²INRIA, FR Abstract Most of digital image and signal processing algorithms are implemented into architectures based on fixed-point arithmetic to satisfy cost and power consumption constraints associated with most of embedded and cyber-physical systems. The fixed-point conversion process (or refinement) is crucial for reducing the time-to-market and design tools to automate this phase and to explore the design space are still lacking. The ID.Fix EDA tool, based on the compiler infrastructure GECOS, allows for the conversion of a floating-point C source code into a C code using fixed-point data types. The data word-lengths are optimized by minimizing the implementation cost under accuracy constraint. To achieve low optimization time, an analytical approach is used to evaluate the fixed-point computation accuracy. This approach is valid for systems made-up of any smooth arithmetic operations. Commercial tools can then be used to synthesize the architecture or to perform software compilation from the output fixed-point description of the application. Thus, the goal is to bridge the gap between the floating-point description developed by algorithm designer and the fixed-point description use as input for high-level synthesis or compilation tools. More information ...
UB06.3	RSOC FRAMEWORK: FRAMEWORK FOR RAPID PROTOTYPING OF APPLICATIONS ON RECONFIGURABLE SOCS Presenter: Korcek Pavol, Brno University of Technology, CZ Authors: Jan Viktorin, Vlastimil Kosar and Jan Korenek, Brno University of Technology, CZ Abstract Recent chips with ARM based processors and FPGA logic provide potential for many applications. IP cores and operating systems (OS) have been prepared to simplify development. However the integration of IP cores and OS is not covered by any development tool yet. We propose universal Reconfigurable System on Chip (RSoC) Framework to support rapid prototyping of different applications on these chips. Application can run in FPGA and/or in processor and RSoC Framework covers all mutual communication. More information ...
UB06.4	REAL-TIME MULTIPROCESSOR COMPILER DEMO: COMPILER FOR REAL-TIME MULTIPROCESSOR SYSTEMS WITH SHARED ACCELERATORS Presenter: Marco Bekooij, University of Twente, NL Authors: Guus Kuiper, Stefan Geuns, Philip Wilmanns, Joost Hausmans and Marco Bekooij, University of Twente, NL Abstract Accelerators are added in real-time multiprocessor systems for power-efficiency improvement and cost reduction. Sharing of these accelerators improves their utilization but without tool support it also complicates programming. This demonstration shows a multiprocessor compiler for a real-time multiprocessor system that contains support for the sharing of hardware accelerators. The capabilities of this compiler are demonstrated by mapping a packet based GMSK receiver application onto this multiprocessor system. The multiprocessor system is implemented on a Xilinx Virtex-6 FPGA to which an RF front-end is connected. This multiprocessor system contains 16 Microblaze processors and 5 accelerators. With this system a real-time digital audio stream is received and demodulated. More information ...
UB06.5	THE Ψ-CHART DESIGN APPROACH IN TTOOL/DIPLODOCUS: A FRAMEWORK FOR HW/SW CO-DESIGN OF DATA-DOMINATED SYSTEMS-ON-CHIP Presenter: Andrea Enrici, Télécom ParisTech, FR Authors: Ludovic Apvrille, Daniel Camara and Renaud Pacalet, Télécom ParisTech, FR Abstract In the scope of the DATE 2015 University Booth, we present our latest achievements for the system level design of parallel and distributed embedded systems. We propose a demonstration of a novel design approach, the Ψ-chart, in TTool/DIPLODOCUS, a UML/SysML framework for the design, validation and automatic code generation for data-dominated SoCs. The Ψ-chart is a design approach where communication patterns are designed with dedicated models, independently of a pair application-architecture, before mapping phase. It allows for a complete orthogonalization of concerns between the design of computations and communications, thus achieving faster Design Space Exploration, complete design portability as well as reduced design times and costs. The subject of our demonstration is the design of the physical layer (PHY) of the transmitter part of the Zigbee wireless standard (IEEE 802.15.4) mapped onto a MPSoC architecture with shared memory. Our demonstration will illustrate the full design of the Zigbee transmitter, from models to the automatic generation of the emulation code, via simulation and formal verification. We will validate our design by comparing the output samples produced by the emulation code, with a real implementation of the transmitter on a FPGA prototyping board. More information ...
UB06.6	INTERACTIVE VISUALIZATION OF ESL DESIGNS Presenter: Jannis Stoppe, University of Bremen, DE Authors: Robert Wille and Rolf Drechsler, University of Bremen/DFKI GmbH, DE Abstract In this work, we propose an improved visualization tool for SystemC which assists a designer in communicating a system's structure and behavior. Please see the uploaded pdf-file for details. More information ...
UB06.7	AN FPGA LAB-ON-CHIP: AN ANALYSIS TOOL AND FRAMEWORK FOR ADVANCED MEASUREMENTS AND RELIABILITY ASSESSMENTS ON MODERN NANOSCALE FPGAS Presenter: Petr Pfeifer, Technical University of Liberec, CZ Abstract Wide portfolio of new technologies in design and manufacturing of advanced integrated circuits enables higher integration of complex structures at ultra-high nanoscale densities, but also sensitivity to various changes of the internal nanostructures and their parameters, resulting in the requirement of advanced reliability assessments. The developed and presented revolutionary new set of tools enables complex lab-on-chip solutions in nanoscale FPGAs and it allows easy implementation of tasks like completely on-chip internal parameter measurements in FPGAs, actual structure delays with respect to environmental parameters, device and platform identification, validation of selected design parameters, identification of crosstalk path and mutual impacts, as well as various changes in internal parameters. It actively supports design reconfiguration. The set of tools can be used for fast standalone or system built-in post-production device and platform parameter and quality checking and validation, parameter-aware placement and routing of critical design parts and performance optimization of existing designs, device aging identification and measurement, active and online data generation for reliability assessments and design reliability enhancements. It is available for FPGAs from 90nm down and will be demonstrated on advanced 28nm Xilinx FPGAs. More information ...
UB06.8	MAMMA: SPEECH ENHANCEMENT DEMO EXPLOITING MEMS MICROPHONE ARRAY FOR PEOPLE WITH DISABILITIES Presenter: Luca Sarti, University of Pisa, IT Authors: Alessandro Palla¹, Luca Fanucci¹ and Roberto Sannino² ¹University of Pisa, IT; ²STMicroelectronics, IT Abstract Disabled people, especially the ones with motor skill impairments, have difficulties in interaction with electronic devices. Indeed voice recognition could be exploited, but its performance strongly depends by the environmental noise. We propose a wearable speech enhancement system based on MEMS microphone array and an ARM Cortex M4 CPU featuring a beamforming technique and an adaptive acoustic echo cancellation filtering in order to increase SNR of acquired voice stream. An increase by 16.5 dB in the SNR is obtained when noise and voice come from opposite directions. Theoretical analysis and in-system measurements prove the effectiveness of the proposed solution. More information ...
UB06.9	AIDASOFT: ANALOG IC DESIGN AUTOMATION Presenter: Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT Authors: Nuno Lourenço¹, Ricardo Martins¹, Ricardo Póvoa¹, António Canelas¹, Ricardo Lourenço² and Pedro Ventura² ¹Instituto de Telecomunicações/Instituto Superior Técnico, PT; ²Instituto de Telecomunicações, PT Abstract This demo presents AIDA an ongoing project at Instituto de Telecomunicações/University of Lisbon, Portugal, which addresses analog IC design automation from circuit-level specifications to layout descriptions in GDS-II. AIDA consists of two main modules AIDA-C and AIDA-L. AIDA-C is demonstrated for layout-aware circuit-level sizing and optimization by generating a family of robust Pareto Optimal solutions. AIDA-L is demonstrated by generating the layout taking into account electrical currents information to mitigate electromigration and IR-drop effects, and also wiring symmetry for multiport multi-terminal signal nets of analog ICs. More information ...
14:00	End of session
16:00	Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.0 SPECIAL DAY Keynotes

Date: Wednesday 11 March 2015
Time: 12:50 - 14:30
Location / Room: Salle Oisans

Time	Label	Presentation Title Authors
12:50	7.0.1	SPECIAL DAY KEYNOTE: INDUSTRIE 4.0: FROM THE INTERNET OF THINGS TO CYBER-PHYSICAL PRODUCTION SYSTEMS Speaker: Wolfgang Wahlster, German Research Center for Artificial Intelligence (DFKI), DE Abstract The Internet of Things is finding its way into production. Semantic machine-to-machine communication revolutionizes factories by decentralized control. Embedded digital product memories guide the flexible work piece flow through smart factories, so that low-volume, high-mix production is realized in a cost-efficient way. A new generation of industrial assistant systems using augmented reality and multimodal interaction will help factory workers to deal with the complexity of cyber-physical production. INDUSTRIE 4.0 is the German strategic initiative to take up a pioneering role in industrial IT that is currently revolutionizing the manufacturing engineering sector. Semantic product memories will play a key role in the upcoming fourth industrial revolution based on cyber-physical production systems. Low-cost and compact digital storage, sensors and radio modules make it possible to embed a digital memory into a product for recording all relevant events throughout the entire lifecycle of the artifact. By capturing and interpreting ambient conditions and user actions, such computationally enhanced products have a data shadow and are able to perceive and control their environment, to analyze their observations and to communicate with other smart objects and human users about their lifelog data. Cyber-physical systems and the Internet of Things lead to a disruptive change in the production architecture: the workpiece navigates through a highly instrumented smart factory and tries to find the production services that it needs in order to meet its individual product specifications stored on the product memory. We illustrate this revolutionary production architecture with examples from DFKI' Smart Factory.
13:20	7.0.2	SPECIAL DAY KEYNOTE: THE RISE OF IOT, AND THE ROLE OF EDA Speaker: Antun Domic, Synopsys, US Abstract On April 19th, 2015, we will celebrate the 50th anniversary of Moore's law. Process technology went from several microns to a few nanometers, transistors integration capabilities increased millions of times, and volume production grew from the few thousands of units in the early digital computer era to the several billions in the smartphone one. IoT is expected to bring volume production up by one, and perhaps even two orders of magnitude in the next decade. Today, IC volume growth has been anchored on smart phones. Smart everything (cars, homes, cities) may be the next killer application, which would fuel the volume growth. IoT devices and systems will certainly span the entire spectrum, from extremely advanced and complex to "disposable". They will make metrics such as reliability and resilience, be as important as performance, power, and area. But in order for IoT to happen, our industry should dramatically improve its efficiency - all "resources" are scarce, and therefore precious. Flexibility - systems are heterogeneous by nature - and productivity - to deliver the best possible quality-of-results within the allotted turn-around-time - will be critical. As both process technology and system complexity increase, advanced EDA will be a key enabler. Advanced design implementation infrastructure, tools, flows, and methodologies will deliver a competitive advantage, and advanced IP sub-systems, consisting of hardware and software solutions will deliver complete, complex functions, ready for integration, greatly simplifying the IoT "siliconization". These two components show the only viable path towards the trillion units many industry leaders are envisioning.
14:30		End of session
16:00		Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

UB07 Session 7

Date: Wednesday 11 March 2015
Time: 14:00 - 16:00
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB07.1	SMART CELL DEVELOPMENT PLATFORM FOR EMBEDDED BATTERY MANAGEMENT Presenter: Swaminathan Narayanaswamy, TUM CREATE, SG Authors: Matthias Kauer¹, Sebastian Steinhorst¹, Martin Lukasiewycz¹ and Samarjit Chakraborty² ¹TUM CREATE, SG; ²TU Munich, DE Abstract Embedded Battery Management (EBM) [1], in contrast to the existing state-of-the-art centralized Battery Management Systems (BMSs) found in Electric Vehicles (EVs) or stationary Electrical Energy Storage (EES) applications, focuses on monitoring and controlling each individual cell of the battery pack with a dedicated Cell Management Unit (CMU). This novel approach of battery management might offer significant advantages over the centralized BMSs, such as higher modularity, plug-and-play integration and shorter time to market. The combination of a battery cell and a CMU forms the smart cell and the system-level functionalities of the EBM are performed in a decentralized manner by the network of smart cells, with the help of the computational and communication resources of CMUs. We present a development platform for such a smart cell enabled EBM. The development platform consists of two components, the hardware platform and the software platform. The hardware platform of the demonstrator comprises of battery cells and their dedicated CMUs which consist of a smart cell controller board and an active cell balancing board. The software platform provides the smart cell firmware as well as a software tool for verification of active cell balancing architectures and a smart cell simulator for simulating system-level EBM functionalities. More information ...
UB07.2	FLARE: A RECONFIGURATION AWARE FLOORPLANNER Presenter: Riccardo Cattaneo, Politecnico di Milano, IT Authors: Marco Rabozzi and Marco Santambrogio, Politecnico di Milano, IT Abstract This demonstration presents a floorplanner tool addressing partially-reconfigurable FPGAs. The input of the tool consists of a set of regions described in terms of their heterogeneous resource requirements together with the number of interconnections among regions and the target FPGA of the partial reconfiguration (PR) design. Once the input are specified, the floorplanner allow the designer to manually or automatically perform the floorplan of the regions. More information ...
UB07.3	RSOC FRAMEWORK: FRAMEWORK FOR RAPID PROTOTYPING OF APPLICATIONS ON RECONFIGURABLE SOCS Presenter: Korcek Pavol, Brno University of Technology, CZ Authors: Jan Viktorin, Vlastimil Kosar and Jan Korenek, Brno University of Technology, CZ Abstract Recent chips with ARM based processors and FPGA logic provide potential for many applications. IP cores and operating systems (OS) have been prepared to simplify development. However the integration of IP cores and OS is not covered by any development tool yet. We propose universal Reconfigurable System on Chip (RSoC) Framework to support rapid prototyping of different applications on these chips. Application can run in FPGA and/or in processor and RSoC Framework covers all mutual communication. More information ...
UB07.4	STRNG: A SELF-TIMED RING BASED TRUE RANDOM NUMBER GENERATOR WITH MONITORING AND ENTROPY ASSESSMENT Presenter: Abdelkarim Cherkaoui, TIMA, FR Authors: Laurent Fesquet¹, Viktor Fischer² and Alain Aubert² ¹TIMA, FR; ²LaHC, FR Abstract The Self-timed ring based True Random Number Generator (STRNG) leverages the jitter of events propagating in a self-timed ring to generate provably random binary sequences. Several implementations in FPGAs and in CMOS design flows have shown the feasability of this generator in digital technologies, and also confirmed that it can provide high quality random bit sequences that pass the standard statistical test batteries at rates as high as 200 Mbit/s. Following AIS31 recommandations for the design and evaluation of TRNGs, the security of this generator is based primarily on an entropy assessment obtained by modeling the entropy extraction and measuring the entropy source. Secondly, the generator is protected against active attacks by monitoring its behavior in real-time or on demand. In this demonstration, we illustrate this approach in an Altera Cyclone III implementation of the STRNG. We show how the design is configurated depending on the measurement of the entropy source (the jitter magnitude) in order to guarantee a given minimum entropy rate per output bit. Then, we emulate physical attacks on the generator by willingly manipulating its internal structure in order to demonstrate how the entropy monitoring can detect abnormal behaviors and send the appropriate alarms. More information ...
UB07.5	ISIS: CUSTOMIZABLE RUNTIME VERIFICATION OF HARDWARE/SOFTWARE VIRTUAL PLATFORMS Presenter: Laurence Pierre, TIMA, FR Author: Martial Chabot, TIMA, FR Abstract Debugging today's hardware/software embedded systems is a complex process. We have previously described our tool, ISIS, that enables the runtime Assertion-Based Verification (ABV) of temporal requirements for high-level (SystemC TLM) models of such systems. We present here an extended version of the tool, that gives the user the possibility to customize and to optimize the verification process. More information ...
UB07.6	DESIGNING AND EVALUATING RESOURCE MANAGEMENT POLICIES FOR HETEROGENEOUS SYSTEM ARCHITECTURES Presenter: Gianluca Durelli, Politecnico di Milano, IT Authors: Cristiana Bolchini, Antonio Miele, Gabriele Pallotta, Marcello Pogliani and Marco Santambrogio, Politecnico di Milano, IT Abstract Current trends in computing architectures are going in the direction of heterogeneous systems (i.e. constituted by CPUs, GPUs, and FPGAs). The design space to effectively exploit these platforms is huge. Within this context, research is moving towards systems able to adapt themselves to a wide range of workloads to optimize performance/energy trade-offs. We propose a virtual platform (VP) to help designers to develop adaptive policies. The VP allows to perform an high-level evaluation of the policies with the possibility to customize both the architecture and the workload mix. More information ...
UB07.7	RECONFIGURABLE FPGA-BASED NON-INTRUSIVE BERT FOR PRODUCTION TEST Presenter: Sergei Odintsov, Tallinn University of Technology, EE Author: Artjom Jasnetski, Tallinn University of Technology, EE Abstract We introduce an FPGA-based Bit Error Rate (BER) tester solution for high-speed serial links targeting production environment. This solution does not require usage of external T&M equipment or extra DFT. As opposed to intrusive physical probing with external BER tester our approach produces more relevant output because measurement is done using transceivers in their functional mode. Introduced BERT instrument supports fine tuning of link parameters and pattern generation. This solution can replace long lasting BER test by quick evaluation of link quality using eye diagram. More information ...
UB07.8	SYSTEM-LEVEL FPGA PROTOTYPING OF ANALOG/MIXED-SIGNAL SYSTEMS Presenter: Georg Gläser, Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE Authors: Eckhard Hennig¹ and Vojtech Dvorak² ¹Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE; ²Brno University of Technology, CZ Abstract System-level verification of large-scale mixed-signal systems using virtual prototypes is a powerful tool for design and verification. Still, the limited computing power demands for new methods to enhance the simulation performance. For digital hardware development FPGA-based prototyping is commonly used to enable faster verification of hardware and software system-components. We focus on a new approach for embedding system-level analog/mixed-signal models in FPGA-based verification environments. Starting from a system-level SystemC description of an A/MS circuit, we synthesize a model-specific FPGA-based hardware accelerator capable of running A/MS simulations using floating-point data types. This new approach will be demonstrated by prototyping an A/MS pressure-sensor frontend ASIC on an FPGA board. More information ...
UB07.9	WORKCRAFT: FRAMEWORK FOR INTERPRETED GRAPHS Presenter: Danil Sokolov, Newcastle University, GB Abstract Workcraft is a cross-platform framework for capture, simulation, synthesis and verification of graph models. It supports a wide range of popular graph formalisms and provides a plugin-based framework for modelling and analysis of new model types. More information ...
16:00	End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.1 SPECIAL DAY Hot Topic: Design Tools for the IoT

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Salle Oisans

Organisers:
Frank Schirrmeister, Cadence Design Systems, US
Rolf Drechsler, University of Bremen/DFKI GmbH, DE

Chair:
Frank Schirrmeister, Cadence Design Systems, US

This sessions will describe challenges and solutions regarding the development aspects of the internet of things. Based on user challenges described by NXP and Intel, ARM and Cadence will describe IP and tool offerings for development of Embedded Software as well as SoCs in edge node, gateway and cloud devices.

Time	Label	Presentation Title Authors
14:30	7.1.1	IOT CHALLENGES AND OPPORTUNITIES Speaker: Frank Schirrmeister, Cadence Design Systems, US
14:52	7.1.2	IOT DEVELOPMENT FOR A CONNECTED CAR Speaker: Marco Bekooij, NXP Semiconductors, NL
15:14	7.1.3	IF IT'S NOT ON THE INTERNET, IT'S JUST A THING: BUT WHAT ARE THE IOT PROBLEMS TO SOLVE? Speaker: Remy Pottier, ARM, FR
15:36	7.1.4	IOT HARDWARE & MIXED SIGNAL DEVELOPMENT Speaker: Ian Dennison, Cadence Design Systems, GB
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.2 Hot Topic - Trading Accuracy for Efficient Computing

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Belle Etoile

Organisers:
Anand Raghunathan, Purdue University, US
Akash Kumar, National University of Singapore, SG

Chair:
Muhammad Shafique, Karlsruhe Institute of Technology, DE

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL

This session will introduce inexact or approximate computing, a promising direction to improve the efficiency of computing in the face of diminishing benefits from scaling. Speakers will discuss the key challenges in the field and provide a vision for bringing these technologies to the mainstream. The session will cover approximate hardware, system level inexactness, and memory models. An application of the design principles to weather simulation will also be presented.

Time	Label	Presentation Title Authors
14:30	7.2.1	COMPUTING APPROXIMATELY, AND EFFICIENTLY Speakers: Swagath Venkataramani¹, Srimat T. Chakradhar², Kaushik Roy¹ and Anand Raghunathan¹ ¹Purdue University, US; ²NEC Laboratories America, US Abstract Recent years have witnessed significant interest in the area of approximate computing. Much of this interest stems from the quest for new sources of computing efficiency in the face of diminishing benefits from technology scaling. We argue that trends in computing workloads will greatly increase the opportunities for approximate computing, describe the vision and key principles that have guided our work in this area, and outline a range of approximate computing techniques that we have developed at all layers of the computing stack, spanning circuits, architecture, and software. Download Paper (PDF; Only available from the DATE venue WiFi)
14:52	7.2.2	NOVEL INEXACTNESS AWARE ALGORITHM CO-DESIGN FOR ENERGY EFFICIENT COMPUTATION Speakers: Guru Prakash Arumugam¹, Ayush Bhargava¹, Prashanth Srikanthan¹, Sreelatha Yenugula¹, John Augustine¹, Eli Upfal² and Krishna Palem³ ¹Indian Institute of Technology Madras, IN; ²Brown University, US; ³Rice University, US Abstract It is increasingly accepted that energy savings can be achieved by trading the accuracy of a computing system for energy gains—quite often significantly. This approach is referred to as inexact or approximate computing. Given that a significant portion of the energy in a modern general purpose processor is spent on moving data to and from storage, and that increasingly data movement contributes significantly to activity during the execution of applications, it is important to be able to develop techniques and methodologies for inexact computing in this context. To accomplish this to its fullest level, it is important to start with algorithmic specifications and alter their intrinsic design to take advantage of inexactness. This calls for a new approach to inexact memory aware algorithm design (IMAD) or co-design. In this paper, we provide the theoretical foundations which include novel models as well as technical results in the form of upper and lower bounds for IMAD in the context of universally understood and canonical problems: variations of sorting, and string matching. Surprisingly, IMAD allowed us to design entirely error-free algorithms while achieving energy gain factors of 1.5 and 5 in the context of sorting and string matching when compared to their traditional (textbook) algorithms. IMAD is also amenable to theoretical analysis and we present several asymptotic bounds on energy gains. Download Paper (PDF; Only available from the DATE venue WiFi)
15:15	7.2.3	DESIGNING INEXACT SYSTEMS EFFICIENTLY USING ELIMINATION HEURISTICS Speakers: Shyamsundar Venkataraman¹, Akash Kumar¹, Jeremy Schlachter² and Christian Enz² ¹National University of Singapore, SG; ²École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract There are a wide variety of applications that are able to tolerate small errors in the values of the outputs, provided they are within the application-specific thresholds. For such applications, there have been many efforts to study the trade-off involved in the accuracy of the output and the energy/area requirement. However, most of the efforts have been at the level of individual components. In this article, we present a design flow to study the inexactness at the level of system and provide heuristics to quickly explore the design-space under given inexactness and area/energy constraints. The approach is applied to various digital signal processing filters and an ECG application of QRS detection. In both cases, orders of magnitude speed-ups are obtained in the design-flow process. Download Paper (PDF; Only available from the DATE venue WiFi)
15:37	7.2.4	OPPORTUNITIES FOR ENERGY EFFICIENT COMPUTING: A STUDY OF INEXACT GENERAL PURPOSE PROCESSORS FOR HIGH-PERFORMANCE AND BIG-DATA APPLICATIONS Speakers: Peter Duben¹, Jeremy Schlachter², - Parishkrati³, Sreelatha Yenugula³, John Augustine³, Christian Enz², K. Palem⁴ and T. N. Palmer⁵ ¹Oxford University, GB; ²École Polytechnique Fédérale de Lausanne (EPFL), CH; ³Indian Institute of Technology Madras, IN; ⁴Rice University, Houston, US; ⁵University of Oxford, GB Abstract In this paper, we demonstrate that disproportionate gains are possible through a simple devise for injecting inexactness or approximation into the hardware architecture of a computing system with a general purpose template including a complete memory hierarchy. The focus of the study is on energy savings possible through this approach in the context of large and challenging applications. We choose two such from different ends of the computing spectrum—the IGCM model for weather and climate modeling which embodies significant features of a high-performance computing workload, and the ubiquitous PageRank algorithm used in Internet search. In both cases, we are able to show in the affirmative that an inexact system outperforms its exact counterpart in terms of its efficiency quantified through the relative metric of operations per virtual Joule (OPVJ)—a relative metric that is not tied to particular hardware technology. As one example, the IGCM application can be used to achieve savings through inexactness of (almost) a factor of 3 in energy without compromising the quality of the forecast, quantified through the forecast error metric, in a noticeable manner. As another example finding, we show that in the case of PageRank, an inexact system is able to outperform its exact counterpart by close to a factor of 1.5 using the OPVJ metric. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.3 Hot Topic - Advances in Hardware Trojans Detection

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Stendhal

Organiser:
Julien Francq, Airbus Defence & Space CyberSecurity, FR

Chair:
Giorgio Di Natale, LIRMM, CNRS/University of Montpellier, FR

Hardware Trojans (HTs) are malicious modifications of an Integrated Circuit (IC) during its design flow. These added transistors could induce in the infected IC some malicious effects. Complete trust in ICs has been now lost because of the outsourcing of the fabrication of the ICs and the complexity of the IC design flow. This special session will present some advances in HT detection developed in French funded research project HOMERE.

Time	Label	Presentation Title Authors
14:30	7.3.1	INTRODUCTION TO HARDWARE TROJAN DETECTION METHODS Speakers: Julien Francq¹ and Florian Frick² ¹Cassidian, FR; ²University of Stuttgart, DE Abstract Hardware Trojans (HTs) are identified as an emerging threat for the integrity of Integrated Circuits (ICs) and their applications. Attackers attempt to maliciously manipulate the functionality of ICs by inserting HTs, potentially causing disastrous effects (Denial of Service, sensitive information leakage, etc.). Over the last 10 years, various methods have been proposed in literature to circumvent HTs. This article introduces the general context of HTs and summarizes the recent advances in HT detection from a French funded research project named HOMERE. Some of these results will be detailed in the related special session. Download Paper (PDF; Only available from the DATE venue WiFi)
14:45	7.3.2	NEW TESTING PROCEDURE FOR FINDING INSERTION SITES OF STEALTHY HARDWARE TROJANS Speakers: Sophie Dupuis, Papa-Sidy Ba, Marie-Lise Flottes, Giorgio Di Natale and Bruno Rouzeyre, LIRMM, FR Abstract Hardware Trojans (HTs) are malicious alterations to a circuit. These modifications can be inserted either during the design phase or during the fabrication process. Due to the diversity of Hardware Trojans, detecting and/or locating them are challenging tasks. Numerous approaches have been proposed to address this problem. Methods based on logic testing consist in trying to activate potential HTs and detect erroneous outputs during test. However, HTs are stealthy in nature i.e. mostly inactive unless they are triggered by a very rare condition. The activation of a HT is therefore a major challenge. In this paper, we propose a new testing procedure dedicated to identifying where a possible HT may be easily inserted and generating the test patterns that are able to excite these sites. The selection of the sites is based on the assumption that the HT (i) is triggered by signals with low controllability, (ii) combines them using gates in close proximity in the circuit's layout, and (iii) without introducing new gates in critical paths. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.3.3	HARDWARE TROJAN DETECTION BY DELAY AND ELECTROMAGNETIC MEASUREMENTS Speakers: X-T. Ngo¹, I Exurville², S Bhasin¹, Jean-Luc Danger¹, Sylvain Guilley¹, Z Najm¹, Jean Baptiste Rigaud³ and Bruno Robisson² ¹Télécom ParisTech, FR; ²CEA, FR; ³EMSE, FR Abstract Hardware trojan (HT) inserted in integrated circuits have received special attention of researchers. In this paper, we present firstly a novel HT detection technique based on path delays measurements. A delay model is established for a net, which consider intra-die process variations. Secondly, we show how to detect HT using ElectroMagnetic (EM) measurements. We study the HT detection probability according to its size taking into account the inter-die process variations with a set of FPGAs. The results show that: there is a probability superior than 95% with a false negative rate of 5% to detect a HT bigger than 1.7% of the original circuit. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.3.4	A HIGH EFFICIENCY HARDWARE TROJAN DETECTION TECHNIQUE BASED ON FAST SEM IMAGING Speakers: Franck Courbon¹, Philippe Loubet-Moundi², Fournier Jacques³ and Assia Tria³ ¹GEMALTO Security Labs/Ecole des Mines de Saint-Etienne, FR; ²Gemalto, FR; ³CEA Tech Region DPACA/LSAS, FR Abstract In the semiconductor market where more and more companies become fabless, malicious integrated circuits' modifications are seen as possible threats. Those Hardware Trojans can have various effects and can be implemented by different entities with different means. This article includes the integration of an almost automatic Hardware Trojan detection. The latter is based on a visual inspection implemented within the integrated circuit life cycle. The proposed detection methodology is quite efficient regarding tools, user experience and time needed. A single layer of the chip is accessed and then imaged with a Scanning Electron Microscope (SEM). The acquisition of several hundred images at high magnification is automated as does the images registration. Then depending on the reference availability, one can check if any supplementary gates have been inserted in the design using a golden reference or a graphic/text design file. Depending on the reference, either basic image processing is used to compare the chip extracted image with a golden model or some pattern recognition can be used to retrieve the number of occurrences of each standard cell. The depicted methodology aims to detect any gate modification, substitution, removal or addition and so far require an invasive approach and a reference. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.4 Routing Advances for Fault-tolerant and Multicast NoCs

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Chartreuse

Chair:
Fabien Clermidy, CEA, FR

Co-Chair:
Masoud Daneshtalab, University of Turku, FI

NoCs are migrating into large-scale multicore systems which lead to new issues to be solved. In this session, we see how NoCs can tackle both faulty behaviors and performance bottlenecks. The first paper demonstrates low overhead multicast using surface-wave communication. The two other papers deal with low-overhead and low-latency fault-tolerance.

Time	Label	Presentation Title Authors
14:30	7.4.1	MIXED WIRE AND SURFACE-WAVE COMMUNICATION FABRICS FOR DECENTRALIZED ON-CHIP MULTICASTING Speakers: Ammar Karkar¹, Kin-Fai Tong², Terrence Mak³ and Alex Yakovlev¹ ¹School of Electrical and Electronic Engineering, Newcastle University, Newcastle upon Tyne, GB; ²Department of Electrical and Electronic Engineering, UCL, London, GB; ³Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, CN Abstract Network-on-chip (NoC) has emerged to tackle different on-chip challenges and has satisfied different demands in terms of high performance, economical and reliable interconnect implementation. However, a merely metal-based interconnect reaches performance bound with the relentless technology scaling. Especially, it displayed a bottleneck to meet the communication bandwidth demand for multicasting. This paper proposes a novel hybrid architecture, which improves the on-chip communication bandwidth significantly using mixed wires and surface wave interconnects (SWI) fabrics. In particular, the bandwidth of multicasting can be drastically improved. We introduce a decentralized arbitration method to fully utilize the slack-time scheduling with deadlock-free flow control. Evaluation results, based on a cycle-accurate and hardware-based simulation, demonstrate the effectiveness of the proposed architecture and methods. Compared to a wire-based NoC, the mixed fabric approach can achieve an improvement in power reduction and communication speed up to 63% and 12X, respectively. These results are achieved with almost negligible hardware overheads. This new paradigm efficiently addresses the emerged challenges for on-chip communications. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.4.2	D2-LBDR: DISTANCE-DRIVEN ROUTING TO HANDLE PERMANENT FAILURES IN 2D MESH NOCS Speakers: Rimpy Bisnhoi¹, Manoj Gaur¹, Vijay Laxmi¹ and Josè Flich² ¹Malaviya National Institute of Technology, IN; ²Associate Professor, Universitat Politècnica de València, ES Abstract With the advent of deep sub-micron technology, fault-tolerant solutions are needed to keep many-core chips operative. In NoCs, Logic Based Distributed Routing (LBDR) proved to be a flexible routing framework for 2D meshes with link and router faults. However, to provide full coverage, LBDR requires a module named FORKS which replicates some messages. This imposes the use of virtual cut-through switching and a complex router arbiter, increasing excessively the router cost, mainly in buffer area. Also, some failure combinations require the use of a non-trivial dynamic reconfiguration strategy to avoid deadlocks. We propose d2 -LBDR which adds, on every router, a distance register to the closest failure. This enables the support of more failure combinations without an excessive implementation cost. Indeed, we restore the use of wormhole switching, keeping router architecture simple, while achieving the same fault coverage as the best LBDR version, without requiring complex switching strategies nor any dynamic reconfiguration strategy. Results show that a small area overhead (3%) is enough for the implementation of a fully flexible routing method without any limiting support case when compared with LBDR. d2 -LBDR reduces area overhead over the best LBDR approach (300% overhead against 3%) while preserving fault coverage. Results show d2 -LBDR performance equal to LBDR. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.4.3	SYNERGISTIC USE OF MULTIPLE ON-CHIP NETWORKS FOR ULTRA-LOW LATENCY AND SCALABLE DISTRIBUTED ROUTING RECONFIGURATION Speakers: Marco Balboni¹, Josè Flich² and Davide Bertozzi¹ ¹University of Ferrara, IT; ²Associate Professor, Universitat Politècnica de València, ES Abstract Abstract—Extending the principle of partially good die allowance to manycore processors, and testing them over time to detect the onset of permanent faults, are only feasible through proper support in the on-chip interconnection network. In fact, this implies the ability to reconfigure the routing algorithm at runtime to reflect changes in network topologies. Current literature cannot avoid a large hardware and/or software overhead when tackling this challenge. This paper exploits the existence of multiple physical networks in industry-relevant manycore processors in a synergistic way, for the sake of fast and scalable distributed reconfiguration of the routing function at runtime. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP3-10, 944	A HYBRID PACKET/CIRCUIT-SWITCHED ROUTER TO ACCELERATE MEMORY ACCESS IN NOC-BASED CHIP MULTIPROCESSORS Speakers: Yassin Mazloumi and Mehdi Modarressi, University of Tehran, IR Abstract Modern chip multiprocessors will feature a large shared last-level cache (LLC) that is decomposed into smaller slices and physically distributed throughout the chip area. These architectures rely on a network-on-chip (NoC) to handle remote cache access and hence, NoCs play a critical role in optimizing memory access latency and power consumption. Circuit-switching is the most power- and performance-efficient switching mechanism in NoCs, but is not advantageous when the packet transmission time is not long enough compared to the circuit setup time. In this paper, we propose a zero-latency circuit setup scheme to make circuit-switching applicable in transferring individual data packets. The design leverages the fact that in CMPs with distributed LLC (where a considerable portion of the on-chip traffic is composed of remote LLC access requests and data responses), every response packet is sent in reply to a request packet and traverses the same path as its corresponding request, but at the backward direction. The short request packets, then, are responsible to reserve a path for their corresponding response packets. This NoC tries to reduce conflict among circuit paths by considering conflicts in backward direction during request packet routing, backed by a run-time technique to resolve conflicts when circuits are actually set up. Experimental results show that the proposed NoC architecture considerably reduces average packet latency that directly translates to faster memory access Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.5 System Reliability: from Runtime to Design Languages

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Meije

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Diana Goehringer, University of Bochum, DE

Over the last few years, reliability has become an increasingly relevant consideration for electronic systems. This session will address system reliability from design flow to run-time in both digital as well as analog systems.

Time	Label	Presentation Title Authors
14:30	7.5.1	AXILOG: LANGUAGE SUPPORT FOR APPROXIMATE HARDWARE DESIGN Speakers: Amir Yazdanbakhsh¹, Divya Mahajan¹, Bradley Thwaites¹, Jongse Park¹, Anandhavel Nagendrakumar¹, Sindhuja Sethuraman¹, Kartik Ramkrishnan², Nishanthi Ravindran², Rudra Jariwala², Abbas Rahimi³, Hadi Esmailzadeh¹ and Kia Bazargan² ¹Georgia Institute of Technology, US; ²University of Minnesota, US; ³University of San Diego, US Abstract Relaxing the traditional abstraction of "near-perfect" accuracy in hardware design can lead to significant gains in energy efficiency, area, and performance. To exploit this opportunity, there is a need for design abstractions that can systematically incorporate approximation in hardware design. We introduce Axilog, a set of language annotations, that provides the necessary syntax and semantics for approximate hardware design and reuse in Verilog. Axilog enables the designer to relax the accuracy requirements in certain parts of the design, while keeping the critical parts strictly precise. Axilog is coupled with a Relaxability Inference Analysis that automatically infers the relaxable gates and connections from the designer's annotations. The analysis provides formal safety guarantees that approximation will only affect the parts that the designer intended to approximate. Finally, the paper describes a synthesis flow that approximates only the relaxable elements. Axilog enables applying approximation in the synthesis process while abstracting away the details of approximate synthesis from the designer. We evaluate Axilog, its analysis, and the synthesis flow using a diverse set of benchmark designs. The results show that the intuitive nature of the language extensions coupled with the automated analysis enables safe approximation of designs even with thousands of lines of code. Applying our approximate synthesis flow to these designs yields, on average, 54% energy savings and 1.9× area reduction with 10% output quality loss. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.5.2	IMPROVING MPSOC RELIABILITY THROUGH ADAPTING RUNTIME TASK SCHEDULE BASED ON TIME-CORRELATED FAULT BEHAVIOR Speakers: Laura A Rozo Duque¹, Jose Monsalve² and Chengmo Yang¹ ¹University of Delaware, US; ²University of Delaware, CO Abstract The increasing susceptibility of multicore systems to temperature variations, environmental issues and different aging effects has made system reliability a crucial concern. Unpredictability of all these factors makes fault behavior to be diverse in nature, which should be considered by the runtime task scheduler to improve overall system reliability. To achieve this goal, this paper proposes a fault tolerant approach to model core reliability at runtime and tune resource allocation accordingly. Given the variation of fault duration, we propose a reliability model capable of tracking not only faults appeared in each core but also their correlation in time. Taking this model as an input, a scheduling algorithm that allocates critical and vulnerable tasks to reliable cores is also proposed. Experimental results show that the proposed adaptive technique delivers 56% improvement in application execution time compared to other existing techniques. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.5.3	ACSEM: ACCURACY-CONFIGURABLE FAST SOFT ERROR MASKING ANALYSIS IN COMBINATORIAL CIRCUITS Speakers: Florian Kriebel, Semeen Rehman, Duo Sun, Pau Vilimelis Aceituno, Muhammad Shafique and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE Abstract Technology scaling in the nano era allows smaller and faster transistors with improved performance and power efficiency. However, small feature sizes and associated low-operating voltages have led to radiation-induced soft errors as a major source of unreliability in modern circuits. As not all errors propagate to the final output of a combinatorial circuit (e.g., because of logical masking effects), an analysis of the error masking characteristics is required to evaluate and enhance the quality of a reliable processor design. State-of-the-art gate-level soft error masking techniques require a significant amount of analysis time due to their inherent nature of parsing and analyzing the complete processor's netlist, which may take up to several days. In this paper, we present a fast and Accuracy-Configurable Soft Error Masking analysis technique (ACSEM) that performs error probability analysis on parts of netlist within the user-provided masking accuracy range. To enable this, we theoretically derive the maximum number of steps in the netlist graph that has to be processed to reach the required masking accuracy level. This significantly reduces the analysis time by orders of magnitude compared to traditional state-of-the art approaches that process all logic gate paths in a given combinatorial circuit. Download Paper (PDF; Only available from the DATE venue WiFi)
15:45	7.5.4	ENERGY MINIMIZATION FOR FAULT TOLERANT SCHEDULING OF PERIODIC FIXED-PRIORITY APPLICATIONS ON MULTIPROCESSOR PLATFORMS Speakers: Qiushi Han¹, Ming Fan¹, Linwei Niu² and Gang Quan¹ ¹Florida International University, US; ²West Virginia State University, US Abstract While technology scaling enables the mass integration of transistors into a single chip for performance enhancement, it also makes processors less reliable with ever-increasing failure rates. In this paper, we study the problem of energy minimization for scheduling periodic fixed-priority applications on multiprocessor platforms with fault tolerance requirements. We first introduce an efficient method to determine the checkpointing scheme that guarantees the schedulability of an application under the worst-case scenario, i.e. up to K faults occur, on a single processor. Based on this method, we then present a task allocation scheme aiming at minimizing energy consumption while ensuring the fault tolerance requirement of the system. We evaluate the efficiency and effectiveness of our approaches using extensive simulation studies. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP3-11, 674	SEMIAUTOMATIC IMPLEMENTATION OF A BIOINSPIRED RELIABLE ANALOG TASK DISTRIBUTION ARCHITECTURE FOR MULTIPLE ANALOG CORES Speakers: Julius von Rosen¹, Markus Meissner¹ and Lars Hedrich² ¹Goethe Universität Frankfurt, DE; ²Goethe-Universitat Frankfurt a. M., DE Abstract In this paper we present a silicon implementation of a bioinspired analog task distribution system for enabling reliable analog multi-core systems. The increase in reliability is achieved by a dependable task distribution architecture using a hormone based mechanism. The specifications are generated by a feasibility analysis of the algebraic description of the architecture. Starting from the specifications, an automated analog synthesis framework is used to fasten the time-consuming design of the needed analog amplifiers. The complete system with the designed amplifiers has been layouted and fabricated. We present measurements of two different architectures of task distribution system on silicon showing the full functionality of the system and the design methodology. Download Paper (PDF; Only available from the DATE venue WiFi)
16:01	IP3-12, 925	POWER-EFFICIENT ACCELERATOR ALLOCATION IN ADAPTIVE DARK SILICON MANY-CORE SYSTEMS Speakers: Muhammad Usman Karim Khan, Muhammad Shafique and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE Abstract Modern many-core systems in the dark silicon era face the predicament of underutilized resources of the chip due to power constraints. Therefore, hardware accelerators are becoming popular as they can overcome this problem by exercising a part of the program on dedicated custom logic in an energy efficient way. However, efficient accelerator usage poses numerous challenges, like adaptations for accelerator's sharing schedule on the many-core systems under run-time varying scenarios. In this work, we propose a power-efficient accelerator allocation scheme for adaptive many-core systems that maximally utilizes and dynamically allocates a shared accelerator to competing cores, such that deadlines of the executing applications are met and the total power consumption of the overall system is minimized. The experimental results demonstrate power minimization and high accelerator utilization for a many-core system. Download Paper (PDF; Only available from the DATE venue WiFi)
16:02	IP3-13, 806	THERMAL-AWARE FLOORPLANNING FOR PARTIALLY-RECONFIGURABLE FPGA-BASED SYSTEMS Speakers: Davide Pagano, Mikel Vuka, Marco Rabozzi, Riccardo Cattaneo, Donatella Sciuto and Marco D. Santambrogio, Politecnico di Milano, IT Abstract Field Programmable Gate Arrays (FPGAs) systems are being more and more frequent in high performance applications. Temperature affects both reliability and performance, therefore its optimization has become challenging for system designers. In this work we present a novel thermal aware floorplanner based on both Simulated Annealing (SA) and Mixed- Integer Linear Programming (MILP). The proposed method takes into account an accurate description of heterogeneous resources and partially reconfigurable constraints of recent FPGAs. Our major contribution is to provide a high level formulation for the problem, without resorting to low level consideration about FPGAs resources. Within our approach we combine the benefits of SA and MILP to handle both linear and non-linear optimization metrics while providing an effective exploration of the solution space. Experimental results show that, for several designs, it is possible to reduce the peak temperature by taking into account power consumption during the floorplanning stage. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.6 Test Power and 3-D Fault Tolerance

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Bayard

Chair:
Juergen Schloeffel, Mentor, DE

Co-Chair:
Sybille Hellebrand, Universität Paderborn, DE

The section presents low power solutions for scan-based test and a new redundant TSV architecture for 3-D ICs

Time	Label	Presentation Title Authors
14:30	7.6.1	DP-FILL : A DYNAMIC PROGRAMMING APPROACH TO X-FILLING FOR MINIMIZING PEAK TEST POWER IN SCAN TESTS Speakers: Satya A. Trinadh¹, Sobhan Babu Ch.¹, Shiv Govind Singh¹, Seetal Potluri² and Kamakoti V² ¹Indian Institute of Technology Hyderabad, IN; ²Indian Institute of Technology Madras, IN Abstract At-speed testing is crucial to catch small delay defects that occur during the manufacture of high performance digital chips. Launch- Off-Capture (LOC) and Launch-Off-Shift (LOS) are two prevalently used schemes for this purpose. LOS scheme achieves higher fault coverage while consuming lesser test time over LOC scheme, but dissipates higher power during the capture phase of the at-speed test. Excessive IR-drop during capture phase on the power grid causes false delay failures leading to significant yield reduction that is unwarranted. As reported in literature, an intelligent filling of don't care bits (X-filling) in test cubes has yielded significant power reduction. Given that the tests output by ATPG tools for big circuits have large number of don't care bits, the X-filling technique is very effective for them. Assuming that the DFT preserves the state of the combinational logic between capture phases of successive patterns, this paper maps the problem of optimal X-filling for peak power minimization during LOS scheme to a variant of interval coloring problem and proposes a dynamic programming (DP) algorithm for the same along with a theoretical proof for its optimality. The proposed algorithm when experimented on ITC99 benchmarks produced peak power savings of up to 34% over the best known low power X-filling algorithm for LOS testing. Interestingly, it is observed that the power savings increase with the size of the circuit. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.6.2	A SCAN PARTITIONING ALGORITHM FOR REDUCING CAPTURE POWER OF DELAY-FAULT LBIST Speakers: Nan Li¹, Elena Dubrova² and Gunnar Carlsson³ ¹Royal Institute of Technology, SE; ²Ericsson AB/Royal Institute of Technology - KTH, SE; ³Development Unit Radio, Ericsson AB, SE Abstract It is well-known that high power consumption in test mode can cause problems such as overheating and IR-drop which have negative effect on circuit reliability and yield. The problem is particularly hard in the case of at-speed delay-fault testing where it cannot be mitigated by lowering the clock frequency. The difficulty increases even further if pseudo-random rather than ATPG patterns are used for testing. ATPG patterns can be chosen selectively, as well as re-ordered and specified in a power-friendly manner. This is not possible with pseudo-random test patterns. In this paper, we present a scan partitioning algorithm for reducing capture power targeting delay-fault LBIST. The algorithm uses a novel weighted S-graph model in which the weights are determined by signal probability analysis. Our experimental results show that, on average, the presented method reduces average capture power by 50% and peak capture power by 39% with less than 2% loss in the transition fault coverage. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.6.3	ARCHITECTURE OF RING-BASED REDUNDANT TSV FOR CLUSTERED FAULTS Speakers: Wei-Hen Lo, Kang Chi and TingTing Hwang, National Tsing Hua University, TW Abstract Three-dimensional Integrated Circuits (3D-ICs) that employ the Through-Silicon Vias (TSVs) vertically stacking multiple dies provide many benefits, such as high density, high bandwidth, low-power. However, the fabrication and bonding of TSVs may fail because of many factors, such as the winding level of the thinned wafers, the surface roughness and cleaness of silicon dies, and bonding technology. To improve the yield of 3D-ICs, many redundant TSV architectures were proposed to repair 3D-ICs with faulty TSVs. These methods reroute siganls of faulty TSVs to other regular or redundant TSVs. In practice, the faulty TSVs may cluster because of imperfect bonding technology. To resolve the problem of clustered TSV faults, router-based [1] redundant TSV architecture was the first paper proposed to pay attention to this clustering problem. Their method enables faulty TSVs to be repaired by redundant TSVs that are farther apart. However, for some rarely occurring defective patterns, their method consumes too much area. In this paper, we propose a ring-based redundant TSV architecture to utilize the area more efficiently as well as to maintain high yield. Simulation results show that for a given number of TSVs (8×8) and TSV failure rate (1%), our design achieves 55% area reduction of MUXes per signal, while the yield of our ring-based redundant TSV architectures can still maintain 98.47% to 99.00% as compared with router-based desgin [1]. Furthermore, the minimum shifting length of our ring-based redundant TSV architecture is at most 1 which guarantees the minimum timing overhead of each signal. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP3-14, 72	FEEDBACK-BUS OSCILLATION RING: A GENERAL ARCHITECTURE FOR DELAY CHARACTERIZATION AND TEST OF INTERCONNECTS Speakers: Shi-Yu Huang¹, Meng-Ting Tsai¹, Kun-Han Tsai² and Wu-Tung Cheng² ¹National Tsing Hua University, TW; ²Mentor, US Abstract In this paper we propose a flexible delay characterization and test method for arbitrary die-to-die interconnects in a 3D IC. As compared to previous works, it is unique in its ability to streamline the characterization/test operations for a set of arbitrary interconnects with multiple pins sprawling multiple dies. During the Design-for-Testability stage, one common feedback-bus (connected to all dies in the IC under characterization/test) is inserted. Through the feedback-bus, a oscillation ring can be formed dynamically and the Variable-Output-Threshold (VOT) technique can be applied to characterize the delay of a selected interconnect segment at a time. Experimental results indicate that this method is not only flexible and scalable, but requiring only a small area overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.7 Energy-efficient Computing

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Les Bans

Chair:
Damien Querlioz, CNRS-IEF, FR

Co-Chair:
Swaroop Ghosh, University of South Florida, US

The papers in this session are all focused on energy efficient computing. In the first talk, the authors present approaches for accelerating learning algorithms for resistive cross-point arrays. The next paper considers what training schemes are most suitable when RRAM arrays are used to realize spiking neural networks. Finally, the last presentation will discuss how devices that offer the potential for non-volatile state retention can be employed in power gating architectures.

Time	Label	Presentation Title Authors
14:30	7.7.1	TECHNOLOGY-DESIGN CO-OPTIMIZATION OF RESISTIVE CROSS-POINT ARRAY FOR ACCELERATING LEARNING ALGORITHMS ON CHIP Speakers: Pai-Yu Chen, Deepak Kadetotad, Zihan Xu, Abinash Mohanty, Binbin Lin, Jieping Ye, Sarma Vrudhula, Jae-sun Seo, Yu Cao and Shimeng Yu, Arizona State University, US Abstract Technology-design co-optimization methodologies of the resistive cross-point array are proposed for implementing machine learning algorithms on a chip. A novel read and write scheme is designed to accelerate the training process, which realizes fully parallel operations of the weighted sum and the weight update. Furthermore, technology and design parameters of the cross-point array are co-optimized to enhance the array performance in learning tasks, including learning accuracy, latency and energy consumption. In contrast to the conventional memory design, a set of reverse scaling rules is proposed on the resistive cross-point array to achieve high learning accuracy. These include 1) larger wire width to reduce the IR drop on interconnects thereby increasing the learning accuracy; 2) use of multiple cells for each weight element to alleviate the impact of the device variations, at an affordable expense of area, energy and latency. The optimized resistive cross-point array with peripheral circuitry is implemented at the 65 nm node. Its performance is benchmarked for handwritten digit recognition on the MNIST database using gradient-based sparse coding. Compared to state-of-the-art software approach running on CPU, it achieves >1E3 speed-up and >1E6 energy efficiency improvement, enabling real-time image feature extraction and learning. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.7.2	SPIKING NEURAL NETWORK WITH RRAM: CAN WE USE IT FOR REAL-WORLD APPLICATION? Speakers: Tianqi Tang¹, Lixue Xia¹, Boxun Li¹, Rong Luo¹, Yiran Chen², Yu Wang¹ and Huazhong Yang¹ ¹Tsinghua University, CN; ²University of Pittsburgh, US Abstract The spiking neural network (SNN) provides a promising solution to drastically promote the performance and efficiency of computing systems. Previous work of SNN mainly focus on increasing the scalability and level of realism in a neural simulation, while few of them support practical cognitive applications with acceptable performance. At the same time, based on the traditional CMOS technology, the efficiency of SNN systems is also unsatisfactory. In this work, we explore different training algorithms of SNN for real-world applications, and demonstrate that the Neural Sampling method is much more effective than Spiking Time Dependent Plasticity (STDP) and Remote Supervision Method (ReSuMe). We also propose an energy efficient implementation of SNN with the emerging metal-oxide resistive random access memory (RRAM) devices, which includes an RRAM crossbar array works as network synapses, an analog design of the spike neuron, and an input encoding scheme. A parameter mapping algorithm is also introduced to configure the RRAM-based SNN. Simulation results illustrate that the system achieves 91.2\% accuracy on the MNIST dataset with an ultra-low power consumption of 3.5mW. Moreover, the RRAM-based SNN system demonstrates great robustness to 20\% process variation with less than 1\% accuracy decrease, and can tolerate 20\% signal fluctuation with about 2\% accuracy loss. These results reveal that the RRAM-based SNN will be quite easy to be physically realized. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.7.3	COMPARATIVE STUDY OF POWER-GATING ARCHITECTURES FOR NONVOLATILE FINFET-SRAM USING SPINTRONICS-BASED RETENTION TECHNOLOGY Speakers: Yusuke Shuto¹, Shuu'ichirou Yamamoto¹ and Satoshi Sugahara² ¹Tokyo Institute of Technology, JP; ²Tokyo Institute of Technology, Abstract Power-gating (PG) architectures employing nonvolatile state/data retention are expected to be a highly efficient energy reduction technique for high-performance CMOS logic systems. Recently, two types of PG architectures using nonvolatile retention have been proposed: One architecture is nonvolatile PG (NVPG) using nonvolatile bistable circuits such as nonvolatile SRAM (NV-SRAM) and nonvolatile flip-flop (NV-FF), in which nonvolatile retention is not utilized during the normal SRAM/FF operation mode and it is used only when there exist energetically meaningful shutdown periods given by break-even time (BET). In contrast, the other architecture employs nonvolatile retention during the normal SRAM/FF operation mode. In this type of architecture, an even short standby period can be replaced by a shutdown period, and thus this architecture is also called normally-off (NOF) rather than PG. In this paper, these two PG architectures for a FinFET-based high-performance NV-SRAM cell employing spintronics-based nonvolatile retention were systematically analyzed using HSPICE with a magnetoresistive-device macromodel. The NVPG architecture shows effective reduction of energy dissipation without performance degradation, whereas the NOF architecture causes severe performance degradation and the energy efficiency of the NOF architecture cannot be superior to that of the NVPG architecture. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP3-15, 483	ANALOG NEUROMORPHIC COMPUTING ENABLED BY MULTI-GATE PROGRAMMABLE RESISTIVE DEVICES Speakers: Vehbi Calayir, Mohamed Darwish, Jeffrey Weldon and Larry Pileggi, Carnegie Mellon University, US Abstract Analog neural networks represent a massively parallel computing paradigm by mimicking the human brain. Two important functions that are not efficiently built by CMOS technology for their practical hardware implementations are weighting for synapse circuits and summing for neuron circuits. In this paper we propose the use of tunable analog resistances, such as multi-gate graphene devices, to efficiently enable these two functions. We design and demonstrate a complete analog neuromorphic circuitry enabled by such devices. Simulation results based on Verilog-A compact models for graphene devices confirm its functionality. We also provide experimental demonstration of our proposed graphene device along with projected circuit performance based on scaling targets. Our proposed design is suitable not only for the device example shown in this paper, but also for any beyond-CMOS technology that exhibits similar device characteristics. Download Paper (PDF; Only available from the DATE venue WiFi)
16:01	IP3-16, 582	AN ENERGY-EFFICIENT NON-VOLATILE IN-MEMORY ACCELERATOR FOR SPARSE-REPRESENTATION BASED FACE RECOGNITION Speakers: Yuhao Wang¹, Hantao Huang¹, Leibin Ni¹, Hao Yu¹, Mei Yan¹, Chuliang Weng², Wei Yang² and Junfeng Zhao² ¹Nanyang Technological University, SG; ²Shannon Laboratory, Huawei Technologies Co., Ltd, CN Abstract Data analytics such as face recognition involves large volume of image data, and hence leads to grand challenge on mobile platform design with strict power requirement. Emerging non-volatile STT-MRAM has the minimum leakage power and comparable speed to SRAM, and hence is considered as a promising candidate for data-oriented mobile computing. However, there exists significantly higher write-energy for STT-MRAM when compared to the SRAM. Based on the use of STT- MRAM, this paper introduces an energy-efficient non-volatile in-memory accelerator for a sparse-representation based face recognition algorithm. We find that by projecting high-dimension image data to much lower dimension, the current scaling for STT-MRAM write operation can be applied aggressively, which leads to significant power reduction yet maintains quality-of-service for face recognition. Specifically, compared to a baseline with SRAM, leakage power and dynamic power are reduced by 91.4% and 79% respectively with only slight compromise on recognition rate. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

7.8 Critical Research Areas Driven by Industry Transformations

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Salle Lesdiguières

Organiser:
John Zhao, MathWorks, US

Moderator:
Pieter J. Mosterman, MathWorks, US

Panelists:
Koen Bertels, Delft University of Technology, NL
Axel Jantsch, Vienna University of Technology, AT
Ahmed Jerraya, CEA-Leti, FR
Ian O’Connor, Ecole Centrale de Lyon, FR
Joseph Sifakis, École polytechnique fédérale de Lausanne EPFL, CH

Increasing demands of electrification arise from connected vehicles, medical devices, smart-grid and microgrid technologies, and the IoT evolution of devices into smart, interconnected systems. Those systems must meet market requirements for not only more sophisticated functionality, but also improved performance and robustness. As a result, companies need to transform how they design, analyze, implement, and verify their systems. At the same time, embedded-system platforms have become increasingly diverse combinations of digital/analog electronics and software, ranging from FPGA/ARM platforms (e.g,. Xilinx Zynq and Altera SoC) to a diverse range of heterogeneous manycore systems.

To help companies leverage these trends in their product and system development, EDA and embedded-system researchers are called upon to focus their research on new kinds of issues that arise. This panel will explore the key research needs and opportunities that come from the transformations that industries must embrace.

Time

Label

Presentation Title
Authors

14:30

7.8.1

PANEL PRESENTATIONS AND DISCUSSIONS, WITH Q&A

16:00

End of session
Coffee Break in Exhibition Area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00

IP3 Interactive Presentations

Date: Wednesday 11 March 2015
Time: 16:00 - 16:30
Location / Room: Exhibition Area

Label	Presentation Title Authors
IP3-1	STT MRAM-BASED PUFS Speakers: Elena Ioana Vatajelu¹, Giorgio Di Natale², Marco Indaco¹ and Paolo Prinetto¹ ¹Politecnico di Torino, IT; ²LIRMM, FR Abstract Physical Unclonable Functions (PUFs) are emerging cryptographic primitives used to implement low-cost device authentication and secure secret key generation. Weak PUFs (i.e., devices able to generate a single signature or able to deal with a limited number of challenges) are widely discussed in literature. Nowadays, the most promising solution is based on SRAMs. In this paper we propose an innovative PUF design based on STT-MRAM memory. We exploit the high variability affecting the electrical resistance of the MTJ device in anti-parallel magnetization. We will show that the proposed solution is robust, unclonable and unpredictable. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-2	SPATIAL AND TEMPORAL GRANULARITY LIMITS OF BODY BIASING IN UTBB-FDSOI Speakers: Johannes Maximilian Kühn¹, Dustin Peterson¹, Hideharu Amano², Oliver Bringmann¹ and Wolfgang Rosenstiel¹ ¹Eberhard Karls Universität Tübingen, DE; ²Keio University, JP Abstract Advances in SOI technology such as STMicro's 28nm UTBB-FDSOI enabled a renaissance of body biasing. Body biasing is a fast and efficient technique to change power and performance characteristics. As the electrical task to change the substrate potential is small compared to Dynamic Voltage Scaling, much finer island sizes are conceivable. This however creates new challenges in regard to design partitioning into body bias islands and body bias combinations across such designs. These combinations should be chosen so that energy efficiency improves while maintaining timing constraints. We introduce a combination based analysis tool to find optimized body bias island partitions and body biasing levels. For such partitions, optimized body bias assignments for static, programmable and dynamic body biasing can be computed. The overheads incurred by dynamically switching body biases are estimated to yield actual improvements and to give an upper bound for the power consumption of required additional circuitry. Based on these partitionings and the switching overheads, optimized application specific switching strategies are computed. The effectiveness of this method is demonstrated in a frequency scaling scenario using forward body biasing on a Dynamic Reconfigurable Processor (DRP) design. We show that leakage can be greatly reduced using the proposed methods and that dynamic body biasing can be beneficial even at small time periods. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-3	A HARDWARE IMPLEMENTATION OF A RADIAL BASIS FUNCTION NEURAL NETWORK USING STOCHASTIC LOGIC Speakers: Yuan Ji¹, Feng Ran¹, Cong Ma² and David Lilja² ¹Shanghai University, CN; ²University of Minnesota - Twin Cities, US Abstract Hardware implementations of artificial neural networks typically require significant amounts of hardware resources. This paper proposes a novel radial basis function artificial neural network using stochastic computing elements, which greatly reduces the required hardware. The Gaussian function used for the radial basis function is implemented with a two-dimensional finite state machine. The norm between the input data and the center point is optimized using simple logic gates. Results from two pattern recognition case studies, the standard Iris flower and the MICR font benchmarks, show that the difference of the average mean squared error between the proposed stochastic network and the corresponding traditional deterministic network is only 1.3% when the stochastic stream length is 10kbits. The accuracy of the recognition rate varies depending on the stream length, which gives the designer tremendous flexibility to tradeoff speed, power, and accuracy. From the FPGA implementation results, the hardware resource requirement of the proposed stochastic hidden neuron is only a few percent of the hardware requirement of the corresponding deterministic hidden neuron. The proposed stochastic network can be expanded to larger scale networks for complex tasks with simple hardware architectures. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-4	SODA: SOFTWARE DEFINED FPGA BASED ACCELERATORS FOR BIG DATA Speakers: Chao Wang, Xi Li and Xuehai Zhou, University of Science and Technology of China, CN Abstract FPGA has been an emerging field in novel big data architectures and systems, due to its high efficiency and low power consumption. It enables the researchers to deploy massive accelerators within one single chip. In this paper, we present a software defined FPGA based accelerators for big data, named SODA, which could reconstruct and reorganize the acceleration engines according to the requirement of the various data-intensive applications. SODA decomposes large and complex applications into coarse grained single-purpose RTL code libraries that perform specialized tasks in out-of-order hardware. We built a prototyping system with constrained shortest path Finding (CSPF) case studies to evaluate SODA framework. SODA is able to achieve up to 43.75X speedup at 128 node application. Furthermore, hardware cost of the SODA framework demonstrates that it can achieve high speedup with moderate hardware utilization. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-5	DYNAMIC RECONFIGURABLE PUNCTURING FOR SECURE WIRELESS COMMUNICATION Speakers: Liang Tang¹, Jude Angelo Ambrose², Akash Kumar¹ and Sri Parameswaran² ¹National University of Singapore, SG; ²University of New South Wales, AU Abstract The ubiquity of wireless devices has created security concerns on the information being transferred. It is critical to protect the secret information in every layer of wireless communication to thwart any type of attacks. A dynamic reconfigurable puncturing based security mechanism, named RePunc, is proposed in this paper to provide an extra level of security at the physical layer. RePunc utilizes the puncturing feature of Forward Error Correction (FEC) to insert the secure information in the punctured positions of the standard information encoded data. The punctured patterns are dynamically changed and passed as a secret key from the sender to the receiver. An eavesdropper will not be able to detect the transmission of the secure information since the inserted secure information will be processed as channel noise by the eavesdropper's receiver. However, the rightful receiver will be able to successfully decode the secure packets by knowingly differentiating the secure information and the standard information before the FEC decoding. A case study of RePunc implementation for WiFi communication is presented in this paper, showing the extreme high security complexity with low hardware overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-6	QR-DECOMPOSITION ARCHITECTURE BASED ON TWO-VARIABLE NUMERIC FUNCTION APPROXIMATION Speakers: Jochen Rust, Frank Ludwig and Steffen Paul, University of Bremen, DE Abstract This paper presents a new approach for hardware-based QR-decomposition using an efficient computation scheme of the Givens-Rotation. In detail, the angle of rotation and its application to the Givens-Matrix are processed in a direct, straight-forward manner. High-performance signal processing is achieved by piecewise approximation of the arctangent and sine function. In order to identify appropriate function approximations, several designs with varying constraints are automatically generated and analyzed. Physical and logical synthesis is performed in a 130nm CMOS-technology. The application of our proposal in a multi-antenna mobile communication scenario highlights our work to be very efficient in terms of calculation accuracy and computation performance. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-7	IN-PLACE MEMORY MAPPING APPROACH FOR OPTIMIZED PARALLEL HARDWARE INTERLEAVER ARCHITECTURES Speakers: Saeed Ur Rehman¹, Cyrille Chavet², Philippe Coussy² and Awais Sani¹ ¹Lab-STICC / Université de Bretagne Sud, PK; ²Lab-STICC / Université de Bretagne Sud, FR Abstract Due to their impressive error correction performances, turbo-codes or LDPC (Low Density Parity Check) architectures are now widely used in communication system and are one of the most critical parts of decoders. In order to achieve high throughput requirements these decoders are based on parallel architecture, which results in a major problem to be solved: parallel memory access conflicts. To solve these conflicts, different approaches have been proposed in state of the art resulting in a lot of different architectural solutions. In this article, we introduce a new class of memory mapping approach that can solve the conflicts with an optimized architecture based on in-place memory mapping for any application. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-8	MAXIMIZING COMMON IDLE TIME ON MULTI-CORE PROCESSORS WITH SHARED MEMORY Speakers: Chenchen Fu¹, Yingchao Zhao², Minming Li¹ and Jason Xue³ ¹Department of Computer Science, City University of Hong Kong, HK; ²Department of Computer Science, Caritas Institute of Higher Education, Hong Kong, HK; ³City University of Hong Kong, HK Abstract Reducing energy consumption is a critical problem in most of the computing systems today. This paper focuses on reducing the energy consumption of the shared main memory in multi-core processors by putting it into sleep state when all the cores are idle. Based on this idea, this work presents systematic analysis of different assignment and scheduling models and proposes a series of scheduling schemes to maximize the common idle time of all cores. An optimal scheduling scheme is proposed assuming the number of cores is unbounded. When the number of cores is bounded, an efficient heuristic algorithm is proposed. The experimental results show that the heuristic algorithm works efficiently and can save as much as 25.6% memory energy compared to a conventional multi-core scheduling scheme. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-9	MAXIMIZING IO PERFORMANCE VIA CONFLICT REDUCTION FOR FLASH MEMORY STORAGE SYSTEMS Speakers: Qiao Li¹, Liang Shi², Congming Gao¹, Kaijie Wu¹, Jason Chun Xue³, Qingfeng Zhuge¹ and H.-M. Edwin Sha⁴ ¹Chongqing University, CN; ²College of Computer Science, Chongqing University, CN; ³City University of Hong Kong, HK; ⁴Chongqing University and University of Texas at Dallas, CN Abstract Flash memory has been widely deployed during the recent years with the improvement of bit density and technology scaling. However, a significant performance degradation is also introduced with the development trend. The latency of IO requests on flash memory storage systems is composed of access conflict latency, data transfer latency, flash chip access latency and ECC encoding/decoding latency. Studies show that the access conflict latency, which is mainly induced by the slow transfer latency and access latency, has become the dominate part of the IO latency, especially for IO intensive applications. This paper proposes to reduce the flash access conflict latency through the reduction of the transfer and flash access latencies. A latency model is built to construct the relationship among the transfer latency and access latency based on the reliability characteristics of flash memory. Simulation experiments show that the proposed approach achieves significant performance improvement. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-10	A HYBRID PACKET/CIRCUIT-SWITCHED ROUTER TO ACCELERATE MEMORY ACCESS IN NOC-BASED CHIP MULTIPROCESSORS Speakers: Yassin Mazloumi and Mehdi Modarressi, University of Tehran, IR Abstract Modern chip multiprocessors will feature a large shared last-level cache (LLC) that is decomposed into smaller slices and physically distributed throughout the chip area. These architectures rely on a network-on-chip (NoC) to handle remote cache access and hence, NoCs play a critical role in optimizing memory access latency and power consumption. Circuit-switching is the most power- and performance-efficient switching mechanism in NoCs, but is not advantageous when the packet transmission time is not long enough compared to the circuit setup time. In this paper, we propose a zero-latency circuit setup scheme to make circuit-switching applicable in transferring individual data packets. The design leverages the fact that in CMPs with distributed LLC (where a considerable portion of the on-chip traffic is composed of remote LLC access requests and data responses), every response packet is sent in reply to a request packet and traverses the same path as its corresponding request, but at the backward direction. The short request packets, then, are responsible to reserve a path for their corresponding response packets. This NoC tries to reduce conflict among circuit paths by considering conflicts in backward direction during request packet routing, backed by a run-time technique to resolve conflicts when circuits are actually set up. Experimental results show that the proposed NoC architecture considerably reduces average packet latency that directly translates to faster memory access Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-11	SEMIAUTOMATIC IMPLEMENTATION OF A BIOINSPIRED RELIABLE ANALOG TASK DISTRIBUTION ARCHITECTURE FOR MULTIPLE ANALOG CORES Speakers: Julius von Rosen¹, Markus Meissner¹ and Lars Hedrich² ¹Goethe Universität Frankfurt, DE; ²Goethe-Universitat Frankfurt a. M., DE Abstract In this paper we present a silicon implementation of a bioinspired analog task distribution system for enabling reliable analog multi-core systems. The increase in reliability is achieved by a dependable task distribution architecture using a hormone based mechanism. The specifications are generated by a feasibility analysis of the algebraic description of the architecture. Starting from the specifications, an automated analog synthesis framework is used to fasten the time-consuming design of the needed analog amplifiers. The complete system with the designed amplifiers has been layouted and fabricated. We present measurements of two different architectures of task distribution system on silicon showing the full functionality of the system and the design methodology. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-12	POWER-EFFICIENT ACCELERATOR ALLOCATION IN ADAPTIVE DARK SILICON MANY-CORE SYSTEMS Speakers: Muhammad Usman Karim Khan, Muhammad Shafique and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE Abstract Modern many-core systems in the dark silicon era face the predicament of underutilized resources of the chip due to power constraints. Therefore, hardware accelerators are becoming popular as they can overcome this problem by exercising a part of the program on dedicated custom logic in an energy efficient way. However, efficient accelerator usage poses numerous challenges, like adaptations for accelerator's sharing schedule on the many-core systems under run-time varying scenarios. In this work, we propose a power-efficient accelerator allocation scheme for adaptive many-core systems that maximally utilizes and dynamically allocates a shared accelerator to competing cores, such that deadlines of the executing applications are met and the total power consumption of the overall system is minimized. The experimental results demonstrate power minimization and high accelerator utilization for a many-core system. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-13	THERMAL-AWARE FLOORPLANNING FOR PARTIALLY-RECONFIGURABLE FPGA-BASED SYSTEMS Speakers: Davide Pagano, Mikel Vuka, Marco Rabozzi, Riccardo Cattaneo, Donatella Sciuto and Marco D. Santambrogio, Politecnico di Milano, IT Abstract Field Programmable Gate Arrays (FPGAs) systems are being more and more frequent in high performance applications. Temperature affects both reliability and performance, therefore its optimization has become challenging for system designers. In this work we present a novel thermal aware floorplanner based on both Simulated Annealing (SA) and Mixed- Integer Linear Programming (MILP). The proposed method takes into account an accurate description of heterogeneous resources and partially reconfigurable constraints of recent FPGAs. Our major contribution is to provide a high level formulation for the problem, without resorting to low level consideration about FPGAs resources. Within our approach we combine the benefits of SA and MILP to handle both linear and non-linear optimization metrics while providing an effective exploration of the solution space. Experimental results show that, for several designs, it is possible to reduce the peak temperature by taking into account power consumption during the floorplanning stage. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-14	FEEDBACK-BUS OSCILLATION RING: A GENERAL ARCHITECTURE FOR DELAY CHARACTERIZATION AND TEST OF INTERCONNECTS Speakers: Shi-Yu Huang¹, Meng-Ting Tsai¹, Kun-Han Tsai² and Wu-Tung Cheng² ¹National Tsing Hua University, TW; ²Mentor, US Abstract In this paper we propose a flexible delay characterization and test method for arbitrary die-to-die interconnects in a 3D IC. As compared to previous works, it is unique in its ability to streamline the characterization/test operations for a set of arbitrary interconnects with multiple pins sprawling multiple dies. During the Design-for-Testability stage, one common feedback-bus (connected to all dies in the IC under characterization/test) is inserted. Through the feedback-bus, a oscillation ring can be formed dynamically and the Variable-Output-Threshold (VOT) technique can be applied to characterize the delay of a selected interconnect segment at a time. Experimental results indicate that this method is not only flexible and scalable, but requiring only a small area overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-15	ANALOG NEUROMORPHIC COMPUTING ENABLED BY MULTI-GATE PROGRAMMABLE RESISTIVE DEVICES Speakers: Vehbi Calayir, Mohamed Darwish, Jeffrey Weldon and Larry Pileggi, Carnegie Mellon University, US Abstract Analog neural networks represent a massively parallel computing paradigm by mimicking the human brain. Two important functions that are not efficiently built by CMOS technology for their practical hardware implementations are weighting for synapse circuits and summing for neuron circuits. In this paper we propose the use of tunable analog resistances, such as multi-gate graphene devices, to efficiently enable these two functions. We design and demonstrate a complete analog neuromorphic circuitry enabled by such devices. Simulation results based on Verilog-A compact models for graphene devices confirm its functionality. We also provide experimental demonstration of our proposed graphene device along with projected circuit performance based on scaling targets. Our proposed design is suitable not only for the device example shown in this paper, but also for any beyond-CMOS technology that exhibits similar device characteristics. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-16	AN ENERGY-EFFICIENT NON-VOLATILE IN-MEMORY ACCELERATOR FOR SPARSE-REPRESENTATION BASED FACE RECOGNITION Speakers: Yuhao Wang¹, Hantao Huang¹, Leibin Ni¹, Hao Yu¹, Mei Yan¹, Chuliang Weng², Wei Yang² and Junfeng Zhao² ¹Nanyang Technological University, SG; ²Shannon Laboratory, Huawei Technologies Co., Ltd, CN Abstract Data analytics such as face recognition involves large volume of image data, and hence leads to grand challenge on mobile platform design with strict power requirement. Emerging non-volatile STT-MRAM has the minimum leakage power and comparable speed to SRAM, and hence is considered as a promising candidate for data-oriented mobile computing. However, there exists significantly higher write-energy for STT-MRAM when compared to the SRAM. Based on the use of STT- MRAM, this paper introduces an energy-efficient non-volatile in-memory accelerator for a sparse-representation based face recognition algorithm. We find that by projecting high-dimension image data to much lower dimension, the current scaling for STT-MRAM write operation can be applied aggressively, which leads to significant power reduction yet maintains quality-of-service for face recognition. Specifically, compared to a baseline with SRAM, leakage power and dynamic power are reduced by 91.4% and 79% respectively with only slight compromise on recognition rate. Download Paper (PDF; Only available from the DATE venue WiFi)

UB08 Session 8

Date: Wednesday 11 March 2015
Time: 16:00 - 18:00
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB08.1	VDA-ADMF: AN AGILE MIGRATION FRAMEWORK FOR ANALOG LAYOUT DESIGN Presenter: Po-Cheng Pan, National Chiao Tung University, TW Authors: Ching-Yu Chin¹, Hung-Ming Chen¹, Tung-Chieh Chen², Jou-Chun Lin² and Yi-Peng Weng³ ¹National Chiao Tung University, TW; ²Synopsys Co., Ltd., TW; ³Taiwan Semiconductor Manufacturing Company, TW Abstract Layout generation in the late analog CMOS design is challenging by its increasing layout constraints and performance requirements. However, iterative refinement on manual design damages the productivity of analog layout. Therefore, it is more efficient to enroll the know-how from existing design instead of generating a new one. To contend with time-consuming analog layout for more possibilities, this software aims to demonstrate a fast layout prototyping framework for migration purpose into real layout design. In our framework, a reference analog layout design is given to generate potential layout candidates at the objective technology. The demonstration includes the original layout, the extracted topology with placement and routing, the generated layout figures, the dumped layout results and the simulated results. This procedure of migration provides a convincing exhibition of our migration framework. More information ...
UB08.2	AN FPGA LAB-ON-CHIP: AN ANALYSIS TOOL AND FRAMEWORK FOR ADVANCED MEASUREMENTS AND RELIABILITY ASSESSMENTS ON MODERN NANOSCALE FPGAS Presenter: Petr Pfeifer, Technical University of Liberec, CZ Abstract Wide portfolio of new technologies in design and manufacturing of advanced integrated circuits enables higher integration of complex structures at ultra-high nanoscale densities, but also sensitivity to various changes of the internal nanostructures and their parameters, resulting in the requirement of advanced reliability assessments. The developed and presented revolutionary new set of tools enables complex lab-on-chip solutions in nanoscale FPGAs and it allows easy implementation of tasks like completely on-chip internal parameter measurements in FPGAs, actual structure delays with respect to environmental parameters, device and platform identification, validation of selected design parameters, identification of crosstalk path and mutual impacts, as well as various changes in internal parameters. It actively supports design reconfiguration. The set of tools can be used for fast standalone or system built-in post-production device and platform parameter and quality checking and validation, parameter-aware placement and routing of critical design parts and performance optimization of existing designs, device aging identification and measurement, active and online data generation for reliability assessments and design reliability enhancements. It is available for FPGAs from 90nm down and will be demonstrated on advanced 28nm Xilinx FPGAs. More information ...
UB08.3	RSOC FRAMEWORK: FRAMEWORK FOR RAPID PROTOTYPING OF APPLICATIONS ON RECONFIGURABLE SOCS Presenter: Korcek Pavol, Brno University of Technology, CZ Authors: Jan Viktorin, Vlastimil Kosar and Jan Korenek, Brno University of Technology, CZ Abstract Recent chips with ARM based processors and FPGA logic provide potential for many applications. IP cores and operating systems (OS) have been prepared to simplify development. However the integration of IP cores and OS is not covered by any development tool yet. We propose universal Reconfigurable System on Chip (RSoC) Framework to support rapid prototyping of different applications on these chips. Application can run in FPGA and/or in processor and RSoC Framework covers all mutual communication. More information ...
UB08.4	ISP RAS VERIFICATION TOOLS: INTEGRATED APPROACH TO HARDWARE VERIFICATION AT UNIT AND SYSTEM LEVELS BASED ON STATIC AND DYNAMIC METHODS Presenter: Andrei Tatarnikov, Institute for System Programming of the Russian Academy of Sciences (ISP RAS), RU Authors: Mikhail Chupilko, Alexander Kamkin, Artem Kotsynyak and Sergey Smolov, Institute for System Programming of the Russian Academy of Sciences (ISP RAS), RU Abstract Verification has long been recognized as an integral part of the hardware design process. As each hardware design is developed from unit- and core-level point of view, verification process should account this fact and provide means for dealing with both of them. Applied approaches include both static (formal methods, source code analysis) and dynamic (testing) methods. To facilitate verification, it is important to provide a uniform methodology that would allow integrating different approaches. In this work, we present a set of verification tools that takes advantage exactly of combining static and dynamic approaches. This allows knowledge sharing between tools, which helps to build more accurate models of hardware designs to be used in verification activities at different levels of abstraction. Brief descriptions of the tools are given below. MicroTESK is a reconfigurable (retargetable and extendable) model-based test program generator for microprocessors and other programmable devices. Lightweight formal specifications customize the generator for a particular architecture and provide knowledge about situations to be covered by tests. A convenient test template framework allows rapid development of complex verification scenarios. Being retargetable, MicroTESK is able to support various RISC and CISC architectures. C++TESK is an open-source C++ based toolkit intended for automated functional testing of software components (mostly in C/C++) and RTL (HDL) models of digital hardware (in Verilog and VHDL). The main part of the toolkit is a library of C++ classes and macros that define facilities for constructing formal specifications (reference models), adapters of components under test, test scenarios and test coverage metrics. Basing on C++ descriptions provided by a user, a test system is compiled. It allows automatically generating and applying sequences of stimuli to the component under test, checking correctness of its reactions and collecting statistics on test execution. Besides the basic library, the toolkit includes a report generator, means for parallelizing test execution on computer clusters, and Eclipse-based IDE. The toolkit is planned to be integrated into UVM methodology. Retrascope is an extendable toolkit for RTL (HDL) models transformation and functional verification at unit level. Analyzing source HDL-code, it extracts control and data flows, transforms them into Extended Finite State Machines (EFSM), and generates covering test sequences for them. The toolkit supports RTL modules written in VHDL and Verilog. It can be used both from command line and from Eclipse-based IDE. More information ...
UB08.5	HIPER-NIRGAM: A TOOL CHAIN BASED FRAMEWORK FOR MODELLING THERMAL - AWARE RELIABILITY ESTIMATION IN 2D MESH NOCS Presenter: Ashish Sharma, Malaviya National Institute of Technology, Jaipur, IN Authors: Manoj Singh Gaur¹, Lava Bhargava¹, Vijay Laxmi¹ and Mark Zwolinski² ¹Malaviya National Institute of Technology, Jaipur, IN; ²University of Southampton, GB Abstract Every three years, power density in system-on-chip (SoCs) gets doubled. As the semiconductor technology is scaling, the number of cores and interconnect network connections are increasing. To improve system performance while meeting permissible power limits, Chip-Multi Processors (CMPs) and many-core processors have emerged as an appealing solution. One of the significant aspects of many-core design is an on chip interconnect network that can effectively support intra-core and inter-core communications. This interconnect should be scalable, support high communication bandwidth and multiple concurrent connections among cores. Network-on-chip (NoC) replaces the traditional bus based interconnect architecture as former is scalable, has higher bandwidth, fault tolerance and offers parallelism. Regular NoC topologies improve scalability too. Adaptive NoC routing solutions distribute power densities and delay onset of hotspot creation. With ever-growing demand of computation and communication bandwidth by applications, the system designer need to consider and address resultant power and thermal issues in SoC as well as NoC design. Design tools need to incorporate thermal effects in design and evaluation of prototypes. Abstract--- Regional temperature differential and hotspots are two thermal problems in network-on-chip. On-chip thermal problems have an adverse impact on system performance and reliability. We propose creation of a toolchain based framework for incorporating thermal evaluation of NoC through existing simulation tools. Our proposed framework provides an integration of NoC simulator with power and thermal simulation models for analyzing the thermal hotspots and can be used for thermal-aware reliability estimation. In our framework, reliability estimation is based on life time failure models such as TDDB (Time dependent dielectric breakdown), NBTI (Negative bias temperature instability) and SM (Stress Migration). In our proposed reliability measurement is based on MTTF (Mean time to failure) comparative value. Our tool chain consists NIRGAM as a NoC simulator, NoC configuration parameters such as number of virtual channel, buffer size, routing logic, simulation cycles and application traffic are passed to power models (Orion 2.0 and McPAT). Power models provide the power trace and area of given NoC configuration. The power model results are further used in Hotspot 5.02 [HOTSPOT] thermal simulation model for generating floorplan and temperature trace (steady temperature file). The steady temperature trace used in reliability estimation tool REST [REST_tool] to estimating MTTF vales. Abstract--- We believe that this generic framework can be used by researchers on academia and industry to incorporate thermal-aware reliability estimation in their design exploration. More information ...
UB08.7	ID.FIX: AN EDA TOOL FOR FIXED-POINT REFINEMENT OF EMBEDDED SYSTEMS Presenter: Olivier Sentieys, INRIA, FR Authors: Daniel Menard¹ and Nicolas Simon² ¹INSA Rennes, FR; ²INRIA, FR Abstract Most of digital image and signal processing algorithms are implemented into architectures based on fixed-point arithmetic to satisfy cost and power consumption constraints associated with most of embedded and cyber-physical systems. The fixed-point conversion process (or refinement) is crucial for reducing the time-to-market and design tools to automate this phase and to explore the design space are still lacking. The ID.Fix EDA tool, based on the compiler infrastructure GECOS, allows for the conversion of a floating-point C source code into a C code using fixed-point data types. The data word-lengths are optimized by minimizing the implementation cost under accuracy constraint. To achieve low optimization time, an analytical approach is used to evaluate the fixed-point computation accuracy. This approach is valid for systems made-up of any smooth arithmetic operations. Commercial tools can then be used to synthesize the architecture or to perform software compilation from the output fixed-point description of the application. Thus, the goal is to bridge the gap between the floating-point description developed by algorithm designer and the fixed-point description use as input for high-level synthesis or compilation tools. More information ...
UB08.8	ISIS: CUSTOMIZABLE RUNTIME VERIFICATION OF HARDWARE/SOFTWARE VIRTUAL PLATFORMS Presenter: Laurence Pierre, TIMA, FR Author: Martial Chabot, TIMA, FR Abstract Debugging today's hardware/software embedded systems is a complex process. We have previously described our tool, ISIS, that enables the runtime Assertion-Based Verification (ABV) of temporal requirements for high-level (SystemC TLM) models of such systems. We present here an extended version of the tool, that gives the user the possibility to customize and to optimize the verification process. More information ...
UB08.9	INTERACTIVE VISUALIZATION OF ESL DESIGNS Presenter: Jannis Stoppe, University of Bremen, DE Authors: Robert Wille and Rolf Drechsler, University of Bremen/DFKI GmbH, DE Abstract In this work, we propose an improved visualization tool for SystemC which assists a designer in communicating a system's structure and behavior. Please see the uploaded pdf-file for details. More information ...
18:00	End of session
19:30	DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.1 SPECIAL DAY Panel: Security and Verification for the IoT

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Salle Oisans

Organisers:
Rolf Drechsler, University of Bremen/DFKI GmbH, DE
Dominique Borrione, TIMA Lab, UGA, FR

Chair:
Dominique Borrione, TIMA Lab, UGA, FR

Co-Chair:
Guy Gogniat, Lab-STICC, Université de Bretagne-Sud, Lorient, FR

Panelists:

Erdinç Öztürk, Ticaret University, TR
Guido Bertoni, STMicroelectronics, IT
Francois-Xavier Standaert, Université Catholique de Louvain, BE
Christoph Grimm, University of Kaiserslautern, DE
Sandip Kundu, University of Massachusetts Amherst, US

18:30

End of session

19:30

DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France)

Musée de Grenoble

As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s.

During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century).

Please kindly note that it is not a seated dinner.

All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person.

How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.2 Flash Memories & Numerical Approximation

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Belle Etoile

Chair:
Philippe Coussy, Universite de Bretagne-Sud, FR

Co-Chair:
Zili Shao, HongKong Polytechnic University, HK

This session presents papers on flash memories and numerical approximations. The first paper presents a new flash memory management scheme to extend the product lifetime of SSDs. The second paper describes a hardware accelerator approach for solving linear equations. The third paper describes an approach for automatically synthesizing non-linear 2D function approximations without requiring costly, and power-hungry, multipliers. Finally, the session ends with two interactive presentations that address progressive wear-leveling for flash memories, and security threats and countermeasures for solid-state drives.

Time	Label	Presentation Title Authors
17:00	8.2.1	HLC: SOFTWARE-BASED HALF-LEVEL-CELL FLASH MEMORY Speakers: Han-Yi Lin¹ and Jen-Wei Hsieh² ¹National Taiwan University, TW; ²National Taiwan University of Science and Technology, TW Abstract In recent years, flash memory has been widely used in embedded systems, portable devices, and high-performance storage products due to its non-volatility, shock resistance, low power consumption, and high performance natures. To reduce the product cost, multi-level-cell flash memory (MLC) has been proposed; compared with the traditional single-level-cell flash memory (SLC) that only stores one bit of data per cell, each MLC cell can store two or more bits of data. Thus MLC can achieve a larger capacity and reduce the cost per unit. However, MLC also suffers from the degradation in both performance and reliability. In this paper, we try to enhance the reliability and reduce the product cost of flash-memory based storage devices from a totally different perspective. We propose a half-level-cell (HLC) management scheme to manage and reuse the worn-out space in solid-state drives (SSDs); through our management scheme, the system can treat two corrupted pages as a normal page without sacrificing performance and reliability. To the best of our knowledge, this is the first research that reclaims free space by reviving the corrupted pages. The experiment results show that the lifetime of SSD can be extended by 48.54% for the trace of general users applications with our proposed HLC management scheme. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	8.2.2	AHEAD: AUTOMATED FRAMEWORK FOR HARDWARE ACCELERATED ITERATIVE DATA ANALYSIS Speakers: Ebrahim M. Songhori, Xuyang Lu and Farinaz Koushanfar, Rice University, US Abstract This paper introduces AHEAD, a novel domain-specific framework for automated (hardware-based) acceleration of massive data analysis applications with a dense (non-sparse) correlation matrix. Due to non-scalability of matrix inversion, often iterative computation is used for converging to a solution. AHEAD addresses two sets of domain-specific matrix computation challenges. First, the I/O and memory bandwidth constraints which limit the performance of hardware accelerators. Second, the hardness of handling large data because of the complexity of the known matrix transformations and the inseparability of non-sparse correlations. The inseparability problem translates to an increased communication cost with the accelerators. To optimize the performance within these limits, AHEAD learns the dependency structure of the domain data and suggests a scalable matrix transformation. The transformation minimizes the memory access required for matrix computing within an error threshold and thus, optimizes the mapping of domain data to the available (bandwidth constrained) accelerator resources. To facilitate automation, AHEAD also provides an Application Programming Interface (API) so users can customize the framework to an arbitrary iterative analysis algorithm and hardware mapping. Proof-of-concept implementation of AHEAD is performed on the widely used compressive sensing and general l1 regularized least squares solvers. On a massive light field imaging data set with 4.6B non-zeros, AHEAD attains up to 320x iteration speed improvement using reconfigurable hardware accelerators compared with the conventional solver and about 4x improvement compared to our transformed matrix solver on a general purpose processor (without hardware acceleration). Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	8.2.3	DESIGN METHOD FOR MULTIPLIER-LESS TWO-VARIABLE NUMERIC FUNCTION APPROXIMATION Speakers: Jochen Rust and Steffen Paul, University of Bremen, DE Abstract In this paper a novel method for hardware-based realization of two-variable numeric functions is introduced. The main idea is based on the extension of the well-known piecewise linear approximation technique, which is often used for the calculation of one-variable elementary functions. A non-uniform and plane segmentation scheme enables quick segment access at runtime; the use of multiplier-less linear equations causes high performance in terms of throughput. As both the extraction of approximation-related parameters and its mapping to corresponding hardware elements is automated, the design time is also reduced to a minimum. For evaluation, several approximations with varying constraints are generated and compared on the algorithmic level to one another as well as to actual references. In conjunction with the results of logical and physical CMOS synthesis, our work turns out to be highly efficient in terms of throughput, memory requirements and energy consumption. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP4-1, 702	PWL: A PROGRESSIVE WEAR LEVELING TO MINIMIZE DATA MIGRATION OVERHEADS FOR NAND FLASH DEVICES Speakers: Fu-Hsin Chen¹, Ming-Chang Yang², Yuan-Hao Chang³ and Tei-Wei Kuo⁴ ¹Department of Computer Science and Information Engineering, National Taiwan University, TW; ²Graduate Institute of Networking and Multimedia, National Taiwan University, TW; ³Institute of Information Science, Academia Sinica, TW; ⁴Academia Sinica & National Taiwan University, TW Abstract As the endurance of flash memory keeps deteriorating, exploiting wear leveling techniques to improve the lifetime/endurance of flash memory has become a critical issue in the design of flash storage devices. In contrast to existing wear-leveling techniques that aggressively distributes the erases to all flash blocks by a fixed threshold, we propose a progressive wear leveling design to perform wear leveling in a "progressive" way to prevent any block from being worn out prematurely, and thereby to ultimately minimize the performance overheads caused by the unnecessary data migration. The results reveal that, instead of sacrificing the device lifetime, performing wear leveling in such a progressive way can not only minimize the performance overheads but even have potentials to extend the device lifespan. Download Paper (PDF; Only available from the DATE venue WiFi)
18:31	IP4-2, 705	TOWARDS TRUSTABLE STORAGE USING SSDS WITH PROPRIETARY FTL Speakers: Xiaotong Cui¹, Minhui Zou¹, Liang Shi² and Kaijie Wu¹ ¹Chongqing University, CN; ²College of Computer Science, Chongqing University, CN Abstract In recent years, we have seen an increasing deployment of flash-based storage, such as SSD, in mission-critical applications due to its fast read/write speed, small form factor, strong shock resistance, and etc. SSD uses a host interface and a middle layer called flash translation layer (FTL) to maintain the compatibility with the traditional magnetic-based HDD. Unlike the traditional HDD where the host OS has the full control on where to access the data, SSD uses FTL to translate and implement all operations, and OS has no such control. Even worse, FTL, which is considered as one of most important intellectual property of SSD, is often proprietary. This brings up a security concern on design trustworthiness: what if the manufacturer either accidentally or intentionally implement those operations incorrectly or even maliciously? In this paper we analyze the possible threats and propose a simple yet effective countermeasure. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.3 Dynamic Thermal Management for Multi-cores

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Stendhal

Chair:
Georgios Karakonstantis, Queen's University, GB

Co-Chair:
José Luis Ayala, Complutense University of Madrid, ES

This session covers the state-of-the-art in dynamic thermal management techniques for multi-core platforms in the mobile and high-performance computing domains. The issues addressed include thermal stress, thermal modeling and emerging cooling solutions.

Time	Label	Presentation Title Authors
17:00	8.3.1	A THERMAL STRESS-AWARE ALGORITHM FOR POWER AND TEMPERATURE MANAGEMENT OF MPSOCS Speakers: Mehdi Kamal¹, Arman Iranfar¹, Ali Afzali-Kusha¹ and Massoud Pedram² ¹University of Tehran, IR; ²University of Southern California, US Abstract In this work, we propose a thermal stress-aware algorithm for the management of the power and temperature in MPSoCs. The algorithm, which uses a heuristic approach, controls the power consumption, maximum temperature, thermal cycles, and temporal/spatial thermal gradients of MPSoCs. At the top level, the decision on turning the cores on and off is made based on the constraints of peak temperature, maximum spatial thermal gradients, and power consumption. At the next tier,the optimal frequencies (and supply voltages) of the ON cores, formulated in a convex optimization problem, are determined again based on satisfying the constraints of the maximum total power consumption, peak temperature, thermal cycles, and also temporal thermal gradients. The technique may be applied to both the heterogeneous and homogenous MPSoCs. The efficacy of the proposed approach in reducing the thermal cycles as well as temporal thermal gradients is evaluated by comparing its results with a similar previous power and temperature management approach. The evaluation which is performed on 8-core processors under Splash2 benchmarks, demonstrates the ability of the suggested technique in limiting a considerable reduction in the thermal stress parameters. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	8.3.2	PREDICTIVE DYNAMIC THERMAL AND POWER MANAGEMENT FOR HETEROGENEOUS MOBILE PLATFORMS Speakers: Gaurav Singla¹, Gurinderjit Kaur¹, Ali Unver² and Umit Y Ogras¹ ¹Arizona State University, US; ²Intel Corporation, US Abstract Heterogeneous multiprocessor system-on-chip (MPSoCs) powering mobile platforms integrate multiple asymmetric CPU cores, GPU, and many specialized processors. When the MPSoC operates close to its peak performance, power dissipation easily increases the temperature, hence adversely impacts reliability. Since using a fan is not a viable solution for hand-held devices, there is a strong need for dynamic thermal and power management (DTPM) algorithms that can regulate temperature with minimal performance impact. This paper presents a DTPM algorithm based on a practical temperature prediction methodology using system identification. The DTPM algorithm dynamically computes a power budget using the predicted temperature, and controls the types and number of active processors as well as their frequencies. Experiments on an octa-core big.LITTLE processor and common Android apps demonstrate that the proposed technique predicts temperature within 3% accuracy, while the DTPM algorithm provides around 6x reduction in temperature variance, and as large as 16% reduction in total platform power compared to using a fan. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	8.3.3	POWER-EFFICIENT CONTROL OF THERMOELECTRIC COOLERS CONSIDERING DISTRIBUTED HOT SPOTS Speakers: Mohammad Javad Dousti and Massoud Pedram, University of Southern California, US Abstract Thermoelectric coolers are compact devices that can target hot spots on a VLSI die. These devices are connected electrically in series and controlled together, i.e., all are ON or OFF at the same time. However, spatial and temporal distributions of hot spots on a VLSI die are non-uniform, and therefore, activating all of TECs to address one or a few localized hot spots is not economical. This traditional technique indeed leads to a significant power waste. This paper suggests that adjacent hot spots with the same thermal behavior can be grouped and controlled by a cluster of TECs. A bypass switch for each TEC cluster is added in order to allow selectively turning OFF some TEC clusters which are needed. More precisely, a clustering problem is formulated which aims to minimize the power waste due to excessive use of TECs. Due to the large number of variables in problems of interesting sizes, a greedy heuristic method for solving the problem is introduced. It is shown that the proposed heuristic can reduce the wasted power on average by 81% and also decrease the total TEC power consumption on average by 42%. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP4-3, 148	USER-SPECIFIC SKIN TEMPERATURE-AWARE DVFS FOR SMARTPHONES Speakers: Begum Birsen Egilmez¹, Gokhan Memik¹, Seda Ogrenci-Memik¹ and Oğuz Ergin² ¹Northwestern University, US; ²TOBB University of Economics and Technology, TR Abstract Skin temperature of mobile devices intimately affects the user experience. Power management schemes built into smartphones can lead to quickly crossing a user's threshold of tolerable skin temperature. Furthermore, there is a significant variation among users in terms of their sensitivity. Hence, controlling the skin temperature as part of the device's power management scheme is paramount. To achieve this, we first present a method for estimating skin and screen temperature at run-time using a combination of available on-device thermal sensors and performance indicators. In an Android-based smartphone, we achieve 99.05% and 99.14% accuracy in estimations of back cover and screen temperatures, respectively. Leveraging this run-time predictor, we develop User-specific Skin Temperature-Aware (USTA) DVFS mechanism to control the skin temperature. Performance of USTA is tested both with benchmarks and user tests comparing USTA to the standard Android governor. The results show that more users prefer to use USTA as opposed to the default DVFS mechanism. Download Paper (PDF; Only available from the DATE venue WiFi)
18:31	IP4-4, 503	FORMAL PROBABILISTIC ANALYSIS OF DISTRIBUTED DYNAMIC THERMAL MANAGEMENT Speaker: Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE Authors: Shafaq Iqtedar¹, Osman Hasan², Muhammad Shafique³ and Joerg Henkel³ ¹National University of Sciences and Technology (NUST), Islamabad, PK; ²National University of Sciences and Technology (NUST), Islamabad, ; ³Karlsruhe Institute of Technology (KIT), DE Abstract The prevalence of Dynamic Thermal Management (DTM) schemes coupled with demands for high reliability motivate the rigorous verification and testing of these schemes before deployment. Conventionally, these schemes are analyzed using either simulations or by running on real systems. But these traditional analysis techniques cannot exhaustively validate the distributed DTM schemes and thus compromise on the accuracy of the analysis results. Moreover, the randomness due to task assignments, task completion times and re-mappings, is often ignored in the analysis of distributed DTM schemes. We propose to overcome both of these limitations by using probabilistic model checking, which is a formal method for modeling and verifying concurrent systems with randomized behaviors. The paper presents a case study on the formal verification of a state-ofthe- art distributed DTM scheme using the PRISM model checker. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.4 Industrial System Design Opportunities

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Chartreuse

Chair:
Norbert Wehn, University of Kaiserslautern, DE

Co-Chair:
David Raphael, CEA-LIST, FR

This session introduces Innovative experiments from Industry that address the challenges of system design Each experiment presents a demonstrator and shows a substantial measurable economic and or strategic impact.

Time	Label	Presentation Title Authors
17:00	8.4.1	DSP BASED PROGRAMMABLE FHD HEVC DECODER Speakers: Sangjo Lee¹, Joonho Song², Wonchang Lee², Doohyun Kim², Jaehyun Kim² and Shihwa Lee² ¹SAMSUNG Electronics, KR; ²Samsung Electronics, Abstract A programmable video decoding system with multi-core DSP and co-processors is presented. This system is adopted by Digital TV System on Chip (SoC) and is used for FHD High Efficiency Video Coding (HEVC) decoder under 400MHz. Using the DSP based programmable solution, we can reduce commercialization period by one year because we can parallelize algorithm development, software optimization and hardware design. In addition to the HEVC decoding, the proposed system can be used for other application such as other video decoding standard for multi-format decoder or video quality enhancement. Download Paper (PDF; Only available from the DATE venue WiFi)
17:15	8.4.2	ACCELERATING COMPLEX BRAIN-MODEL SIMULATIONS ON GPU PLATFORMS Speakers: HoangAnh DuNguyen¹, Zaid Al-Ars¹, Georgios Smaragdos² and Christos Strydis² ¹Delft University of Technology, NL; ²Erasmus Medical Center, NL Abstract The Inferior Olive (IO) in the brain, in conjunction with the cerebellum, is responsible for crucial sensorimotor-integration functions in humans. In this paper, we simulate a computationally challenging IO neuron model consisting of three compartments per neuron in a network arrangement on GPU platforms. Several GPU platforms of the two latest NVIDIA GPU architectures (Fermi, Kepler) have been used to simulate large-scale IO-neuron networks. These networks have been ported on 4 diverse GPU platforms and implementation has been optimized, scoring 3x speedups compared to its unoptimized version. The effect of GPU L1-cache and thread block size as well as the impact of numerical precision of the application on performance have been evaluated and best configurations have been chosen. In effect, a maximum speedup of 160x has been achieved with respect to a reference CPU platform. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	8.4.3	A PACKET-SWITCHED INTERCONNECT FOR CONTROL APPLICATIONS ON ZYNQ WITH BEST-EFFORT AND REAL-TIME SERVICES Speakers: Runan Ma¹, Axel Jantsch² and Zhida Hui³ ¹FuDan University, CN; ²Vienna University of Technology, AT; ³Memcom.Soc Microelectronics Ltd, CN Abstract A packet-switched interconnect design which supports real-time, best-effort and a data management engine (DME) for industrial control application based on the Zynq7000 FPGA platform is described. This interconnect uses a modified round-robin arbitration and realizes a combination of best-effort and real-time communication with time stamping. The interconnect offers an intuitive API for SW programmers and addresses real-time issues as found in hierarchical control loops of industrial applications such as vision-guided high-speed sorting systems. A comparison of data transfer performance between the proposed interconnect and a bus based structure is made. Download Paper (PDF; Only available from the DATE venue WiFi)
17:45	8.4.4	(Best Paper Award Candidate) REDUCING TRACE SIZE IN MULTIMEDIA APPLICATIONS ENDURANCE TESTS Speakers: Serge Vladimir Emteu Tchagou¹, Alexandre Termier², Jean-François Méhaut³, Brice Videau³, Miguel Santana⁴ and René Quiniou⁵ ¹University of Grenoble Alpes, STMicroelectronics, FR; ²University of Rennes 1, FR; ³University of Grenoble Alpes, FR; ⁴STMicroelectronics, FR; ⁵INRIA Rennes, FR Abstract Proper testing of applications over embedded systems such as set-top boxes requires endurance tests, i.e. running applications for extended periods of times, typically several days. In order to understand bugs or poor performances, execution traces have to be analyzed, however current trace analysis methods are not designed to handle several days of execution traces due to the huge quantity of data generated. Our proposal, designed for regular applications such as multimedia decoding/encoding, is to monitor execution by analysing trace on the fly in order to record trace only in time periods where a suspicious activity is detected. Our experiments show a significant reduction in the trace size compared to recording the whole trace. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	8.4.5	EXPLORATION AND DESIGN OF EMBEDDED SYSTEMS INCLUDING NEURAL ALGORITHMS Speakers: Jean-Marc Philippe¹, Alexandre Carbon¹, Olivier Brousse² and Michel Paindavoine² ¹CEA LIST, FR; ²GlobalSensing Technologies, FR Abstract The current trend in embedded systems is to make them surrounding the users, providing services thanks to a knowledge of their environment. These self-awareness and context-awareness properties are provided by numerous sensors, from different types. Using the provided information causes at least two problems: the fusion of data from different sources, and the noise induced by sensors which are closer from the processing unit than ever. Additionally, the needed applications that use these information are based on different recognition processings, sometimes not easy to formalize with conventional algorithms. Processing chains using neural-based algorithms are promising approaches for solving these kinds of issues. Unfortunately, embedding bio-inspired algorithms in an embedded system is not so easy since there is no exploration environment for this specific task. Moreover, neural networks often need pre- or post-processing of data for optimal operation. This paper presents early results of a collaboration towards the design of such an exploration environment coming from a joint laboratory between an SME and a Research Institute. Download Paper (PDF; Only available from the DATE venue WiFi)
18:15	8.4.6	A NEW DISTRIBUTED FRAMEWORK FOR INTEGRATION OF DISTRICT ENERGY DATA FROM HETEROGENEOUS DEVICES Speakers: Francesco Gavino Brundu¹, Edoardo Patti¹, Andrea Acquaviva¹, Michelangelo Grosso², Gaetano Rasconà², Salvatore Rinaudo³ and Enrico Macii¹ ¹Politecnico di Torino, IT; ²ST-Polito s.c.ar.l., IT; ³STMicroelectronics s.r.l., IT Abstract The introduction of "smart" low-cost sensing (and actuating) devices enabled the recent diffusion of technological products within the "Internet of Things" paradigm. In a city district context, such devices are crucial for visualization and simulation of energy consumption trends, to increase the energy distribution network efficiency and promote user awareness. Nevertheless, to unlock the potential of this technology, many challenges have to be faced at district level due to the current lack of interoperability between heterogeneous data sources. In this work, we introduce an original infrastructure model, which efficiently manage and integrate district energy data. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.5 Hot Topic - Spintronics based Computing

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Meije

Organisers:
Weisheng Zhao, University Paris‐Sud/CNRS, FR
Lionel Torres, LIRMM, CNRS/University of Montpellier, FR

Chair:
Lionel Torres, LIRMM, CNRS/University of Montpellier, FR

Co-Chair:
Weisheng Zhao, University Paris‐Sud/CNRS, FR

Thanks to its non-volatility, fast data access, low power and infinite endurance, Spintronics (Nobel Prize 2007) is considered as one of the major technologies beyond CMOS to overcome the power and speed bottlenecks of advance computing systems. This topic is under intense study from device to system levels by both academics and industries. This session brings together the worldwide leading experts to share their recent results and discuss future challenges.

Time	Label	Presentation Title Authors
17:00	8.5.1	SPINTRONIC DEVICES AS KEY-ELEMENTS FOR ENERGY-EFFICIENT NEUROINSPIRED ARCHITECTURES Speakers: Nicolas Locatelli¹, Adrien F. Vincent², Alice Mizrahi², Joseph S. Friedman², Damir Vodenicarevic², Joo-Von Kim², Jacques-Olivier Klein², Weisheng Zhao³, Julie Grollier⁴ and Damien Querlioz² ¹nstitut d’Electronique Fondamentale, Univ. Paris-Sud, CNRS, FR; ²Institut d’Electronique Fondamentale, Univ. Paris-Sud, CNRS, FR; ³Spintronics Interdisciplinary Center, Beihang University, Beijing, CN; ⁴Unite Mixte de Physique CNRS/Thales and Universite Paris-Sud, FR Abstract Processing the data deluge using current CMOS architectures requires a remendous amount of energy, as the latter has proved to lack efficiency in tasks such as data mining, recognition and synthesis. Alternative models of computation such as neuroinspiration can be more efficient for this kind of tasks, but do not map ideally to traditional CMOS. Spintronics, in contrast, can offer features such as embedded nonvolatile memory, stochastic and memristive behavior, which, when associated with CMOS, can be key enablers of neuroinspired computing. In this paper, we explore different works that go in this direction. First, we illustrate how recent developments of embedded nonvolatile memory based on magnetic tunnel junctions (MTJs) can ideally provide the large amount of nonvolatile memory required in neuro-inspired designs, while avoiding Von Neumann bottleneck. Second, we show that recently developed spintronics memristors can implement artificial synapses for neuromorphic systems. With a more breakthrough design, we show how probabilistic writing of a single MTJ bit can efficiently replace multi-level weighting in some classes of neuroinspired architectures. Finally, we show that a special class of MTJs can exhibit the phenomenon of stochastic resonance, a strategy used in biological systems to detect weak signals. These results suggest that the impact of spintronics may go far beyond the traditional standalone and embedded memory markets. Download Paper (PDF; Only available from the DATE venue WiFi)
17:20	8.5.2	GIANT SPIN HALL EFFECT (GSHE) LOGIC DESIGN FOR LOW POWER APPLICATION Speakers: Yaojun Zhang¹, Bonan Yan¹, Wenqing Wu², Hai Li¹ and Yiran Chen¹ ¹University of Pittsburgh, US; ²Qualcomm Incorporated, US Abstract Conventional CMOS transistors will reach its power wall, a huge leakage power consumption limits the performance growth when technology scales down, especially beyond 45nm technology nodes. Spin based devices are one of the alternative computing technologies that aims to replace the current MOS based circuits by taking the advantage of their attractive characteristics, including non-volatility, high integration density and small cell area. The development of technologies such as spin transfer torque random access memory (STT-RAM) and spin torque majority gate logic has become a story of great success. However, most of these technologies faces problems like, small operation margin, poor fan-out ability, etc. As the latest spin technology, Giant Spin Hall Effect (GSHE) Magnetic Tunneling Junction (MTJ) demonstrates a much better operation speed, switching probability and resistance margin. By leveraging the benefit of greater power efficiency and area density, GSHE MTJ elements become a suitable candidate for spintronic logic gates. Compare with traditional MOS transistors based logic gates, GSHE MTJ based logic can operate as a non-volatile memory and requires a much smaller number of elements to perform same logical operations (i.e., 'AND', 'OR', 'NAND' or 'NOR' gate.). And compare with other spin based logics, GSHE MTJ based logic also provides an better performance, excellent CMOS process compatibility and great fan-out ability. Download Paper (PDF; Only available from the DATE venue WiFi)
17:40	8.5.3	SPINTRONICS-BASED NONVOLATILE LOGIC-IN-MEMORY ARCHITECTURE 　TOWARDS AN ULTRA-LOW-POWER AND HIGHLY RELIABLE VLSI COMPUTING PARADIGM Speakers: Takahiro Hanyu¹, Daisuke Suzuki¹, Naoya Onizawa¹, Shoun Matsunaga², Masanori Natsui¹ and Akira Mochizuki¹ ¹Tohoku University, JP; ²AC Technologies Co., Ltd., JP Abstract Novel logic-LSI architecture, called "spintronics-based nonvolatile logic-in-memory (NV-LIM) architecture," where nonvolatile spintronic storage elements are distributed over a logic-circuit plane, is proposed as a promising candidate to overcome performance wall and power wall due to the present CMOS-only-based logic-LSIs. Some concrete design examples based on the NV-LIM architecture are demonstrated and their usefulness is discussed in comparison with the corresponding CMOS-only-based realization. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	8.5.4	POTENTIAL APPLICATIONS BASED ON NVM EMERGING TECHNOLOGIES Speakers: Sophiane Senni¹, Raphael Martins Brum¹, Lionel Torres², Gilles Sassatelli¹, Abdoulaye Gamatie¹ and Bruno Mussard³ ¹LIRMM, FR; ²LIRMM, CNRS/University of Montpellier, FR; ³Crocus technology, FR Abstract Energy efficiency is a critical figure of merit for battery-powered applications. Today's embedded systems suffer from significant increase of power consumption essentially due to a high leakage current in advanced technology node. A significant portion of the total power consumption is spent into memory systems because of an increasing trend of embedded volatile memory area among the building components in System-on-Chips (SoCs). That is why new Non-Volatile Memory (NVM) technologies are considered as a potential solution to solve the energy efficiency issue. Among these NVM technologies, Magnetic RAM (MRAM) is a promising candidate to replace current memories since it combines non-volatility, high scalability, high density, low latency and low leakage. This paper explores use of MRAM into a memory hierarchy (from cache memory to register) of a processor-based system analyzing both performance and energy consumption. Download Paper (PDF; Only available from the DATE venue WiFi)
18:15	8.5.5	FROM DEVICE TO SYSTEM: CROSS-LAYER DESIGN EXPLORATION OF RACETRACK MEMORY Speakers: Guangyu Sun¹, Chao Zhang¹, Hehe Li², Yue Zhang³, Weiqi Zhang¹, Yizi Gu², Yinan Sun², Jacques-Olivier Klein³, Dafine Ravelosona³, Yongpan Liu², Weisheng Zhao⁴ and Huazhong Yang² ¹Peking University, CN; ²Tsinghua University, CN; ³Univ. Paris-Sud, FR; ⁴Beihang University, CN Abstract Recently, Racetrack Memory (RM) has attracted more and more attention of memory researchers because it has advantages of ultra-high storage density, fast access speed, and non-volatility. Prior research has demonstrated that RM has potential to replace SRAM for large capacity on-chip memory design. At the same time, it also addressed that the design space exploration of RM could be more complicated compared to traditional on-chip memory technologies for several reasons. First, a single RM cell introduces more device level design parameters. Second, considering these device-level design factors, the layout exploration of a RM array demonstrates trade-off among area, performance, and power consumption of RM circuit level design. Third, in the architecture level, the unique ``shift'' operation results in an extra dimension for design exploration. In this paper, we will review all these design issues in different layers and try to reveal the relationship among them. The experimental results demonstrate that cross-layer design exploration is necessary for racetrack memory. In addition, a system level case study of using RM in a sensor node is presented to demonstrate its advantages over SRAM or STT-RAM. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.6 Statistical Answers to Analog/Mixed Signal Design and Test Problems

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Bayard

Chair:
Jacob Abraham, The University of Texas at Austin, US

Co-Chair:
Michel Renovell, LIRMM/CNRS, FR

The session will demonstrate applications of Bayesian model fusion, machine learning classifiers, feature selection, virtual probe, and Quasi Monte Carlo for solving challenging design and test problems for analog and mixed signal circuits.

Time	Label	Presentation Title Authors
17:00	8.6.1	EFFICIENT BIT ERROR RATE ESTIMATION FOR HIGH-SPEED LINK BY BAYESIAN MODEL FUSION Speakers: Chenlei Fang¹, Qicheng Huang¹, Fan Yang¹, Xuan Zeng¹, Xin Li² and Chenjie Gu³ ¹Fudan University, CN; ²Carnegie Mellon University and Fudan University, US; ³Strategic CAD Labs, Intel Corporation, US Abstract High-speed I/O link is an important component in computer systems, and estimating its bit error rate (BER) is a critical task to guarantee its performance. In this paper, we propose an efficient method to estimate BER by Bayesian Model Fusion. Its key idea is to borrow conventional extrapolated BER value as prior knowledge, and combine it with additional measurement data to "calibrate" the BER value. This method can be viewed as an application of Bayesian Model Fusion (BMF) technique. We further propose some novel methodologies to make BMF applicable in the BER estimation case. In this way, we can sufficiently decrease the number of bits needed to estimate BER value. Several experiments demonstrate that our proposed method achieves up to 8x speed-up over direct estimation method. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	8.6.2	FAST DEPLOYMENT OF ALTERNATE ANALOG TEST USING BAYESIAN MODEL FUSION Speakers: John Liaperdos¹, Haralampos Stratigopoulos², Louay Abdallah², Yiorgos Tsiatouhas³, Angela Arapoyanni⁴ and Xin Li⁵ ¹Technological Educational Institute of Peloponnese, GR; ²TIMA Laboratory, Université de Grenoble-Alpes/CNRS, FR; ³University of Ioannina, GR; ⁴National and Kapodistrian University of Athens, GR; ⁵Carnegie Mellon University, US Abstract In this paper, we address the problem of limited training sets for learning the regression functions in alternate analog test. Typically, a large volume of real data needs to be collected from different wafers and lots over a long period of time to be able to train the regression functions with accuracy across the whole design space and apply alternate test with high confidence. To avoid this delay and achieve a fast deployment of alternate test, we propose to use the Bayesian model fusion technique that leverages prior knowledge from simulation data and fuses this information with data from few real circuits to draw accurate regression functions across the whole design space. The technique is demonstrated for an alternate test designed for RF low noise amplifiers. Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	8.6.3	BORDERSEARCH: AN ADAPTIVE IDENTIFICATION OF FAILURE REGIONS Speakers: Markus Dobler¹, Manuel Harrant², Monica Rafaila², Georg Pelz², Wolfgang Rosenstiel¹ and Martin Bogdan³ ¹University of Tübingen, DE; ²Infineon Technologies, DE; ³University of Tübingen, Leipzig University, DE Abstract The reliability and safety of modern analog devices, e.g. in automotives, aircraft or consumer electronics, is influenced by many input parameters like supply voltage, ambient temperature or load resistances. In certain regions of this large parameter space, the device exhibits degraded performance or it fails completely. The validation of such a device has to find the regions of the input parameter space in which the device misbehaves. However, with several parameters, it is a complex task to determine these regions, especially if parameters interact. In this paper, we present the Bordersearch algorithm, which combines adaptive testing with a machine learning classifier to efficiently determine the border between passing and failing regions in the parameter space. Furthermore, this method enables sophisticated post-processing analysis, e.g. better visualizations and automatic ranking of the parameters according to their influence. This algorithm scales well to a high-dimensional parameter space and is robust against outliers and fuzzy borders. We show the effectiveness of this method on an automotive electro-mechanical system with eleven input parameters. Download Paper (PDF; Only available from the DATE venue WiFi)
18:15	8.6.4	A FAST SPATIAL VARIATION MODELING ALGORITHM FOR EFFICIENT TEST COST REDUCTION OF ANALOG/RF CIRCUITS Speakers: Hugo Gonçalves¹, Xin Li², Miguel Correia³, Vitor Tavares³, John Carulli⁴ and Kenneth Butler⁵ ¹CMU/FEUP, PT; ²Carnegie Mellon University, US; ³FEUP, PT; ⁴GLOBALFOUNDRIES, US; ⁵Texas Instruments, US Abstract In this paper, we adopt a novel numerical algorithm, referred to as dual augmented Lagrangian method (DALM), for efficient test cost reduction based on spatial variation modeling. The key idea of DALM is to derive the dual formulation of the L1-regularized least-squares problem posed by Virtual Probe (VP), which can be efficiently solved with substantially lower computational cost than its primal formulation. In addition, a number of unique properties associated with discrete cosine transform (DCT) are exploited to further reduce the computational cost of DALM. Our experimental results of an industrial RF transceiver demonstrate that the proposed DALM solver achieves up to 38x runtime speed-up over the conventional interior-point solver without sacrificing any performance on escape rate and yield loss for test applications. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP4-5, 656	A HYBRID QUASI MONTE CARLO METHOD FOR YIELD AWARE ANALOG CIRCUIT SIZING TOOL Speakers: Engin Afacan, Günhan Dündar, Gonenc Berkol, Ali Emre Pusane and İsmail Faik Baskaya, Bogazici University, TR Abstract Efficient yield estimation methods are required by yield aware automatic sizing tools, where many iterative variability analyses are performed. Quasi Monte Carlo (QMC) is a popular approach, in which samples are generated more homogeneously, hence faster convergence is obtained compared to the conventional MC. However, since QMC is deterministic and has no natural variance, there is no convenient way to obtain estimation error bounds. To determine the confidence interval of the estimated yield, scrambled QMC, in which samples are randomly permuted, is run multiple times to obtain stochastic variance by sacrificing computational cost. To palliate this challenge, this paper proposes a hybrid method, where a single QMC is performed to determine infeasible solutions in terms of yield, which is followed by a few scrambled QMC analyses providing variance and confidence interval of the estimated yield. Yield optimization is performed considering the worst case of the current estimation, thus the optimizer guarantees that the solution will satisfy the confidence interval. Furthermore, a yield ranking mechanism is also developed to enforce the optimizer to search for more robust solutions. Download Paper (PDF; Only available from the DATE venue WiFi)
18:31	IP4-6, 179	FEATURE SELECTION FOR ALTERNATE TEST USING WRAPPERS: APPLICATION TO AN RF LNA CASE STUDY Speakers: Manuel Barragan¹ and Gildas Leger² ¹TIMA Laboratory, FR; ²Instituto de Microelectronica de Sevilla, IMSE-CNM, (CSIC - Universidad de Sevilla), ES Abstract Testing analog, mixed-signal and RF circuits represents the main cost component for testing complex SoCs. A promising solution to alleviate this cost is the Alternate Test strategy. Alternate test is an indirect test approach that replaces costly specification measurements by simpler signatures. Machine learning techniques are then used to map signatures and performances. One key point that still remains as an open problem is the conception of adequate simple measurement candidates. This work presents efficient algorithms for selecting information rich signatures. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.7 Compilers and Tools for Performance

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Les Bans

Chair:
Frank Hannig, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

Co-Chair:
Christian Haubelt, University of Rostock, DE

This session introduces different aspects of optimizing compiler technology in the context of embedded systems. The first paper shows that the performance of Android applications can be significantly improved by ahead-of-time compilation to C. The second paper shows how to generate efficient vector code from domain-specific (linear algebra) specifications. The first interactive presentation presents a technique to speed up the QEMU machine emulator by dynamic binary translation of vector instructions. The second one proposes a trace-based reuse distance profiler to help improving the memory profile of applications.

Time	Label	Presentation Title Authors
17:00	8.7.1	BYTECODE-TO-C AHEAD-OF-TIME COMPILATION FOR ANDROID DALVIK VIRTUAL MACHINE Speakers: Hyeong-Seok Oh, Ji Hwan Yeo and Soo-Mook Moon, Seoul National University, KR Abstract Android employs Java for programming its apps which is executed by its own virtual machine called the Dalvik VM (DVM). One problem of the DVM is its performance. Its just-in-time compiler (JITC) cannot generate high-performance code due to its trace-based compilation with short traces and modest optimizations, compared to JVM's method-based compilation with ample optimziations. This paper proposes a bytecode-to-C ahead-of-time compilation (AOTC) for the DVM to accelerate pre-installed apps. We translated the bytecode of some of the hot methods used by these apps to C code, which is then compiled together with the DVM source code. AOTC-generated code works with the existing Android zygote mechanism, with corrects garbage collection and exception handling. Due to off-line, method-based compilation using existing compiler with full optimizations and Java-specific optimizations, AOTC can generate quality code while obviating runtime compilation overhead. For benchmarks, AOTC can improve the performance by 10% to 500%. When we compare this result with the recently-introduced ART, which also performs ahead-of-time compilation, our AOTC performs better. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30	8.7.2	A BASIC LINEAR ALGEBRA COMPILER FOR EMBEDDED PROCESSORS Speakers: Nikolaos Kyrtatas, Daniele Giuseppe Spampinato and Markus Püschel, Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract Many applications in signal processing, control, and graphics on embedded devices require efficient linear algebra computations. On general-purpose computers, program generators have proven useful to produce such code, or important building blocks, automatically. An example is LGen, a compiler for basic linear algebra computations of fixed size. In this work, we extend LGen towards the embedded domain using as example targets Intel Atom, ARM Cortex-A8, ARM Cortex-A9, and ARM1176 (Raspberry Pi). To efficiently support these processors we introduce support for the NEON vector ISA and a methodology for domain-specific load/store optimizations. Our experimental evaluation shows that the new version of LGen produces code that performs in many cases considerably better than well-established, commercial and non-commercial libraries (Intel MKL and IPP), software generators (Eigen and ATLAS), and compilers (icc, gcc, and clang). Download Paper (PDF; Only available from the DATE venue WiFi)
18:00	8.7.3	VARSHA: VARIATION AND RELIABILITY-AWARE APPLICATION SCHEDULING WITH ADAPTIVE PARALLELISM IN THE DARK-SILICON ERA Speakers: Nishit Kapadia and Sudeep Pasricha, Colorado State University, US Abstract With deeper technology scaling accompanied by a worsening power-wall, an increasing proportion of chip area on a chip multiprocessor (CMP) is expected to be occupied by dark-silicon. At the same time, design challenges due to process variations and soft-errors in integrated circuits are projected to become even more severe. In this work, we propose a novel framework that leverages the knowledge of variations on the chip to perform runtime application mapping and dynamic voltage scaling to optimize system performance and energy, while satisfying dark-silicon power constraints of the chip as well as application-specific performance and reliability constraints. Our experimental results show average savings of 35%-80% in application service-times and 13%-15% in energy consumption, compared to the state-of-the-art. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30	IP4-7, 356	IMPROVING SIMD CODE GENERATION IN QEMU Speakers: Sheng-Yu Fu¹, Jan-Jan Wu² and Wei-Chung Hsu¹ ¹Department of Computer Science National Taiwan University, TW; ²Institute of Information Science Academia Sinica, TW Abstract Modern processors are often enhanced with SIMD instructions. For examples, the MMX, SSE, and AVX instruction set in the x86 architecture, and the Neon instruction set in the ARM architecture are SIMD instructions. Using these SIMD instructions could significantly increase the performance of applications, hence application binaries are likely to have a good fraction of instructions that are SIMD instructions. However, SIMD instruction translation has not attacked much attention in Dynamic Binary Translation (DBT). For example, in the popular QEMU system emulator, guest SIMD instructions are often emulated with a sequence of scalar instructions even when the host machines do have SIMD instructions to support such parallel computation, leaving a large potential for performance enhancement. In this paper, we propose two approaches, one to leverage the existing helper function implementation in QEMU, and the other to use a newly introduced vector IR (Intermediate Representation) to enhance the performance of SIMD instructions translation in DBT of QEMU. The approaches have been implemented in the QEMU to support ARM and IA32 frontend and x86-64 backend. Our preliminary experiments show that adding vector IR can significantly enhance the performance of guest applications containing SIMD instructions for both ARM and IA32 architectures when running with QEMU on the x86-64 platform. Download Paper (PDF; Only available from the DATE venue WiFi)
18:31	IP4-8, 442	REUSE DISTANCE ANALYSIS FOR LOCALITY OPTIMIZATION IN LOOP-DOMINATED APPLICATIONS Speakers: Christakis Lezos, Grigoris Dimitroulakos and Konstantinos Masselos, University of Peloponnese, GR Abstract This paper discusses MemAddIn, a compiler assisted dynamic code analysis tool that analyzes C code and exposes critical parts for memory related optimizations on embedded systems that can heavily affect systems performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several of data reuse distance measurement algorithms have been implemented leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments offering a unified environment for application development and optimization. The novelties of our work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance for a number of different applications. Experimental results prove the effectiveness of the tool through the analysis and optimization of a realistic image processing application. Download Paper (PDF; Only available from the DATE venue WiFi)
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

8.8 Share a Fab - Multi Project Wafers Enable Your Innovations

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Salle Lesdiguières

Moderator:
Andreas Vörg, edacentrum, DE

Today most innovations in the major industries include the use of dedicated chips. However, extremely high IC fabrication costs and foundries accepting only orders with high quantities are substantial obstacles for innovations developed from SMEs, start-ups, universities and research organisations.

The solution is that many of these innovators team up and share a fab run by using the opportunities offered by Multi Project Wavers (MPWs). European service providers like Europractice and CMP provide the access to MPW runs as well as the required know-how and tooling.

In the tutorial part of this session first-time users as well as experienced users will be provided with comprehensive information about the available semiconductor technologies, new design methodologies and possible applications. In the best practice part of the session users of such services will share their experience with the MPW concept with the audience and present projects and products realized by utilizing MPW opportunities.

Time	Label	Presentation Title Authors
17:00	8.8.1	INTRODUCTION Speaker: Andreas Vörg, edacentrum, DE
17:05	8.8.2	THE EUROPRACTICE MPW SERVICE AS AN ENABLER FOR LOW COST ASIC PROTOTYPING FOR R&D AND PRODUCT DEVELOPMENT Speaker: Carl Das, IMEC/Europractice, BE
17:25	8.8.3	CMP MPW SERVICES FOR IC PROTOTYPING AND LOW VOLUME PRODUCTION Speaker: Jean-Christophe Crébier, CMP, FR
17:45	8.8.4	THE ASSOCIATIVE MEMORY ASIC DEVELOPMENT FOR HIGH ENERGY PHYSICS SINCE 2003 TO 2015, IMPORTANCE OF IMEC SUPPORT Speaker: Alberto Annovi, INFN Pisa, IT Author: Paola Giannetti, INFN Pisa, IT
18:00	8.8.5	A FEEDBACK FROM MPW USER Speaker: Paul Hyland, Microelectronic Circuit Centre Ireland (MCCI), IE
18:15	8.8.6	DISCUSSION
18:30		End of session
19:30		DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France) As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s. During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century). Please kindly note that it is not a seated dinner. All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person. How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.

Party DATE Party

Date: Wednesday 11 March 2015
Time: 19:30 - 23:00
Location / Room: Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France)

Musée de Grenoble

Please kindly note that it is not a seated dinner.

Time	Label	Presentation Title Authors
23:00		End of session

9.1 SPECIAL DAY Hot Topic: Game-changing Innovative Technology Platforms for Health Care

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Salle Oisans

Organiser:
Jo De Boeck, IMEC, BE

Chair:
Jo De Boeck, IMEC, BE

After an introductory talk on the future impact of electronics in health care innovation, the panel of experts from industry will discuss the main trends they see and the challenges this presents for technology and care providers.

Time	Label	Presentation Title Authors
08:30	9.1.1	GAME CHANGING INNOVATION IN TECHNOLOGY AND DESIGN FOR EFFECTIVE HEALTH CARE Speaker: Chris Van Hoof, IMEC, BE Abstract Vision and positioning the talks of the day in perspective to the overall challenge.
09:00	9.1.2	ADVANCED SELF-POWERED SYSTEMS OF INTEGRATED SENSORS AND TECHNOLOGIES Speaker: Veena Misra, ASSIST, NC State University, US Abstract Development of nano-enabled energy harvesting, energy storage, nanodevices, and sensors enables innovative battery-free, body-powered, and wearable health monitoring systems. One of challenges addressed in this presentation is developing efficient ways to harness energy from the human body or the environment, convert it to usable forms, and store it in ultra-high-density capacitors. Another focal point of research is low-Power nanoelectronics with the aim to design low-power electronics and antennae.
09:30	9.1.3	HEALTHCARE IN AN INTEGRATED DIGITAL WORLD Speaker: Jean-Paul Linnartz, Philips, NL Abstract Integration (IC) technology, smart systems and large scale analysis of measurements and data are rapidly innovating many aspects of the healthcare system. This includes innovations in imaging such as enhancing the resolution, ability to differentiate between tissues and compensation of artifacts. Many imaging modalities require advanced signal handling, often in massively parallel structures. At the same time, miniaturization of sensors and actuators paves the way for minimally invasive in-body devices for diagnostics and for interventions, which then become part of integrated interventional solutions.
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.2 Hot Topic - Transparent Use of Accelerators in Heterogeneous Computing Systems

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Belle Etoile

Organisers:
Heiner Giefers, IBM Research Zurich, CH
Christian Plessl, University of Paderborn, DE

Chair:
Christian Plessl, University of Paderborn, DE

Co-Chair:
Heiner Giefers, IBM Research Zurich, CH

This hot topic session discusses recent research for transparent compilation and offloading of computational hotspots from CPUs to accelerators, in particular, many-core processors and FPGAs. The overarching objective of these approaches is to make the performance and energy-efficiency benefits of heterogeneous computing available to a broader spectrum of applications and users by reducing or even obviating the effort for porting applications.

Time	Label	Presentation Title Authors
08:30	9.2.1	TRANSPARENT ACCELERATION OF PROGRAM EXECUTION USING RECONFIGURABLE HARDWARE Speakers: Nuno Paulino¹, João Canas Ferreira¹, João Bispo² and João M. P. Cardoso² ¹INESC TEC and Faculty of Engineering, PT; ²University of Porto, PT Abstract The acceleration of applications, running on a general purpose processor (GPP), by mapping parts of their execution to reconfigurable hardware is an approach which does not involve program's source code and still ensures program portability over different target reconfigurable fabrics. However, the problem is very challenging, as suitable sequences of GPP instructions need to be translated/mapped to hardware, possibly at runtime. Thus, all mapping steps, from compiler analysis and optimizations to hardware generation, need to be both efficient and fast. This paper introduces some of the most representative approaches for binary acceleration using reconfigurable hardware, and presents our binary acceleration approach and the latest results. Our approach extends a GPP with a Reconfigurable Processing Unit (RPU), both sharing the data memory. Repeating sequences of GPP instructions are migrated to an RPU composed of functional units and interconnect resources, and able to exploit instruction-level parallelism, e.g., via loop pipelining. Although we envision a fully dynamic system, currently the RPU resources are selected and organized offline using execution trace information. We present implementation prototypes of the system on a Spartan-6 FPGA with a MicroBlaze as GPP and the very encouraging results achieved with a number of benchmarks. Download Paper (PDF; Only available from the DATE venue WiFi)
08:52	9.2.2	ACCELERATING ARITHMETIC KERNELS WITH COHERENT ATTACHED FPGA COPROCESSORS Speakers: Heiner Giefers, Raphael Polig and Christoph Hagleitner, IBM Research Zurich, CH Abstract Abstract—The energy efficiency of computer systems can be increased by migrating computational kernels that are known to under-utilize the CPU to an FPGA based coprocessor. In contrast to traditional I/O-based coprocessors that require explicit data movement, coherently attached accelerators can operate on the same virtual address space than the host CPU. A shared memory organization enables widely accepted programming models and helps to deploy energy efficient accelerators in general purpose computing systems. In this paper we study an FFT accelerator on FPGA attached via the Coherent Accelerator Processor Interface (CAPI) to a POWER8 processor. Our results show that the coherent attached accelerator outperforms device driver based approaches in terms of latency. Hardware acceleration delivers a 5x gain in energy efficiency compared to an optimized parallel software FFT running on a 12-core CPU and improves single thread performance by more than 2x. We conclude that the integration of CAPI into heterogeneous programming frameworks such as OpenCL will facilitate latency critical operations and will further enhance programmability of hybrid systems. Download Paper (PDF; Only available from the DATE venue WiFi)
09:15	9.2.3	TRANSPARENT OFFLOADING OF COMPUTATIONAL HOTSPOTS FROM BINARY CODE TO XEON PHI Speakers: Marvin Damschen¹, Heinrich Riebler², Gavin Vaz² and Christian Plessl² ¹Karlsruhe Institute of Technology (KIT), DE; ²University of Paderborn, DE Abstract In this paper, we study how binary applications can be transparently accelerated with novel heterogeneous computing resources without requiring any manual porting or developer-provided hints. Our work is based on Binary Acceleration At Runtime (BAAR), our previously introduced binary acceleration mechanism that uses the LLVM Compiler Infrastructure. BAAR is designed as a client-server architecture. The client runs the program to be accelerated in an environment, which allows program analysis and profiling and identifies and extracts suitable program parts to be offloaded. The server compiles and optimizes these offloaded program parts for the accelerator and offers access to these functions to the client with a remote procedure call (RPC) interface. Our previous work proved the feasibility of our approach, but also showed that communication time and overheads limit the granularity of functions that can be meaningfully offloaded. In this work, we motivate the importance of a lightweight, high-performance communication between server and client and present a communication mechanism based on the Message Passing Interface (MPI). We evaluate our approach by using an Intel Xeon Phi 5110P as the acceleration target and show that the communication overhead can be reduced from 40% to 10%, thus enabling even small hotspots to benefit from offloading to an accelerator. Download Paper (PDF; Only available from the DATE venue WiFi)
09:37	9.2.4	TRANSPARENT LINKING OF COMPILED SOFTWARE AND SYNTHESIZED HARDWARE Speakers: David Thomas¹, Shane T. Fleming¹, George A. Constantinides¹ and Dan R. Ghica² ¹Imperial College London, GB; ²University of Birmingham, GB Abstract Modern heterogeneous devices contain tightly coupled CPU and FPGA logic, allowing low latency access to accelerators. However, designers of the system need to treat accelerated functions specially, with device specific code for instantiating, configuring, and executing accelerators. We present a system level linker, which allows functions in hardware and software to be linked together to create heterogeneous systems. The linker works with post-compilation and post-synthesis components, allowing the designer to transparently move functions between devices simply by linking in either hardware or software object files. The linker places no special emphasis on the software, allowing computation to be initiated from within hardware, with function calls to software to provide services such as file access. A strong type-system ensures that individual code artifacts can be written using the conventions of that domain (C, HLS, VHDL), while allowing direct and transparent linking. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.3 NoC Optimization

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Stendhal

Chair:
Marcello Coppola, STMicroelectronics, FR

Co-Chair:
José Flich, Universidad Politecnica de Valencia, ES

As NoCs are becoming mature technology, the time is coming to develop sophisticated system-level optimization techniques. In this session, several distinct optimization techniques are addressed. The first paper presents a novel approach to traffic isolation, the second paper deals with power-saving technique exploration and the final paper develops customized NoCs for emerging biomedical applications.

Time	Label	Presentation Title Authors
08:30	9.3.1	(Best Paper Award Candidate) PHASENOC: TDM SCHEDULING AT THE VIRTUAL-CHANNEL LEVEL FOR EFFICIENT NETWORK TRAFFIC ISOLATION Speakers: Anastasios Psarras¹, Ioannis Seitanidis¹, Chrysostomos Nicopoulos² and Giorgos Dimitrakopoulos¹ ¹Democritus University of Thrace, GR; ²University of Cyprus, CY Abstract The efficiency of modern Networks-on-Chip (NoC) is no longer judged solely by their physical scalability, but also by their ability to deliver high performance, quality-of-service, and flow isolation at the minimum possible cost. Although traditional architectures supporting Virtual Channels (VC) offer the resources for flow partitioning and isolation, an adversarial workload can still interfere and degrade the performance of other workloads that are active in a different set of VCs. In this paper, we present PhaseNoC, a truly non-interfering VC-based architecture that adopts Time-Division Multiplexing (TDM) at the VC level. Distinct flows, or application domains, mapped to disjoint sets of VCs are isolated, both inside the router's pipeline and at the network level. Any latency overhead is minimized by appropriate scheduling of flows in separate phases of operation, irrespective of the chosen topology. The resulting design yields significant reductions in the area/delay cost of the network. Experimental results corroborate that - with lower cost than state-of-the-art NoC architectures, and with minimum latency overhead - we remove any flow interference and allow for efficient network traffic isolation. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	9.3.2	RATE-BASED VS DELAY-BASED CONTROL FOR DVFS IN NOC Speakers: Mario R. Casu and Paolo Giaccone, Politecnico di Torino, Department of Electronics and Telecommunications, IT Abstract Minimization of power via DVFS in an NoC is possible, but may result in an intolerable increase of network delay. We examined two DVFS policies, a rate-based policy that scales down frequency and voltage to the minimum value that allows to sustain the injection rate without causing saturation, and a delay-based policy in which a closed-loop control tunes frequency and voltage such that the NoC average delay tracks a target value. We evaluated the power-delay trade-off by means of network simulations and accurate power estimations after synthesis on a 28-nm FDSOI CMOS technology. Our result over synthetic and multimedia traffic patterns show that the first policy largely pays the better saving in power (20%-50% less than the second policy) with a large network delay increase (up to 3x). We then conclude that the delay-based policy offers a better power-delay trade-off. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	9.3.3	NOC-ENABLED MULTICORE ARCHITECTURES FOR STOCHASTIC ANALYSIS OF BIOMOLECULAR REACTIONS Speakers: Turbo Majumder¹, Xian Li², Paul Bogdan³ and Partha Pande² ¹Indian Institute of Technology Delhi, IN; ²Washington State University, US; ³University of Southern California, US Abstract Recent medical challenges such as cancer, drug-resistant microbes or diabetes crucially affect human health. To tackle these, modern medicine must analyze molecular interactions and rely on powerful computational platforms for the design and performance evaluation of medical therapies. Towards this end, we propose a Network-on-Chip (NoC)-based multicore platform enabling the efficient analysis of stochastic molecular interactions among biological entities. Our in-depth analysis of the stochastic interactions among biological components and the characterization of their computational and communication requirements allows us to design a high-performance NoC architecture sustaining a throughput of over 1.36E5 events/ms, while consuming only 15 mJ per 1E5 stochastic events. Our proposed NoC-based multicore can offer a throughput improvement of 23% over a regular mesh-based NoC, while consuming 20% less energy. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP4-9, 1076	TAPP: TEMPERATURE-AWARE APPLICATION MAPPING FOR NOC-BASED MANY-CORE PROCESSORS Speakers: Di Zhu, Lizhong Chen, Timothy Pinkston and Massoud Pedram, University of Southern California, US Abstract Application mapping with its ability to spread out high-power components can potentially be a good approach to mitigate the looming issue of hotspots in many-core processors. However, very few works have explored effective ways of making tradeoff between temperature and network latency. Moreover, on-chip routers, which are of high power density and may lead to hotspots, are not considered in these works. In this paper, we propose TAPP (Temperature-Aware Partitioning and Placement), an efficient application mapping algorithm to reduce on-chip hotspots while sacrificing little network performance. This algorithm "spreads" high-power cores and routers across the chip by performing hierarchical bi-partitioning of the cores and concurrently conducting placement of the cores onto tiles, and achieves high efficiency and superior scalability. Simulation results show that the proposed algorithm reduces the temperature by up to 6.80°C with minimal latency increase compared to the latency-oriented mapping solution. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP4-10, 694	MALLEABLE NOC: DARK SILICON INSPIRED ADAPTABLE NETWORK ON CHIP Speakers: Haseeb Bokhari¹, Haris Javaid², Muhammad Shafique³, Joerg Henkel³ and Sri Parameswaran¹ ¹University of New South Wales, AU; ²Google Inc., ; ³Karlsruhe Institute of Technology (KIT), DE Abstract Network on Chip (NoC) has been envisioned as a scalable fabric for many core chips. However, NoCs can consume a considerable share of chip power. Moreover, diverse applications are executed in these multicore, where each application imposes a unique load on the NoC. To realise a NoC which is Energy and Delay efficient, we propose combining multiple VF optimized routers for each node (in traditional NoCs, we have only a single router per node) for efficient NoC for Dark Silicon chips. We present a generic NoC with routers designed for different VF levels, which are distributed across the chip. At runtime, depending on application profile, we combine these VF optimized routers to form constantly changing energy efficient NoC fabric. We call our architecture Malleable NoC. In this paper, we describe the architectural details of the proposed architecture and the runtime algorithms required to dynamically adapt the NoC resources. We show that for a variety of multi program benchmarks executing on Malleable NoC, Energy Delay product (EDP) can be reduced by up to 46% for widely differing workloads. We further show the effect on EDP savings for differing amounts of dark silicon area budget. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.4 Advanced Trends in Alternative Technologies

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Chartreuse

Chair:
Martin Trefzer, University of York, GB

Co-Chair:
Yvain Thonnart, CEA-Leti, FR

As technology evolves, "information" is no longer limited to charge-based representations of 1s and 0s. In this session, the first presented paper discusses work where qubits are stored as the internal states of an atomic ion. The second presentation considers labs on chip — where reactants are moved through valves and channels to solve problems such as protein analysis, disease diagnosis, etc. Finally, the last presentation discusses the use of optical waveguides to move data from point-to-point, thus eliminating higher latency electrical interconnect.

Time	Label	Presentation Title Authors
08:30	9.4.1	OPTIMIZATION OF QUANTUM COMPUTER ARCHITECTURE USING A RESOURCE-PERFORMANCE SIMULATOR Speakers: Muhammad Ahsan and Jungsang Kim, Duke University, US Abstract The hardware technology characterized by the device parameters often drives the architectural optimization in a novel computer design such as the quantum computer (QC). We highlight the role of these parameters, by quantifying the performance of a fully error-corrected 1024-bit quantum carry look-ahead adder on a modular, re-configurable architecture based on trapped ions. We develop a simulation tool that estimates the performance and resource requirements for running a quantum circuit on various quantum architectures as a function of the underlying device parameters. Using this tool, we found that (1) the latency of the adder circuit execution due to slow entanglement generation process for qubit communication, can be adequately eliminated with a small increase in entangling qubits and (2) the failure probability of the circuit is ultimately determined by the qubit coherence time, which needs to be improved in order to reliably execute the adders comprising core of the Shor's algorithm. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	9.4.2	VOLUME-ORIENTED SAMPLE PREPARATION FOR REACTANT MINIMIZATION ON FLOW-BASED MICROFLUIDIC BIOCHIPS WITH MULTI-SEGMENT MIXERS Speakers: Chi-Mei Huang¹, Chia-Hung Liu² and Juinn-Dar Huang¹ ¹Department of Electronics Engineering, National Chiao Tung University, TW; ²National Chiao Tung University, TW Abstract Sample preparation is one of essential processes in most biochemical reactions. In this process, raw reactants are diluted to achieve given target concentrations. So far, most of existing sample preparation techniques only consider mixing of two source solutions under the (1:1) mixing model. In this paper, we propose the first sample preparation algorithm VOSPA that not only blends several solutions in a dilution operation but also allows various mixing models on flow-based microfluidic biochips with multi-segment mixers. VOSPA is a volume-oriented sample preparation algorithm that enables segment-based intermediate solution reuse for better reactant minimization. Experimental results show that VOSPA can lower the reactant consumption and operation count by 72% and 59% as compared to the baseline bit-scanning method if an 8-segment mixer is used. Moreover, VOSPA outperforms an optimal algorithm, which merely allows the use of (1:1) mixing model; the reactant usage and operation count can be further reduced by 37% and 76%. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	9.4.3	THERMAL AWARE DESIGN METHOD FOR VCSEL-BASED ON-CHIP OPTICAL INTERCONNECT Speakers: Hui Li¹, Alain Fourmigue², Sébastien Le Beux¹, Xavier Letartre¹, Ian O’Connor¹ and Gabriela Nicolescu² ¹Ecole Centrale de Lyon, FR; ²Ecole Polytechnique de Montréal, CA Abstract Optical Network-on-Chip (ONoC) is an emerging technology considered as one of the key solutions for future generation on-chip interconnects. However, silicon photonic devices in ONoC are highly sensitive to temperature variation, which leads to a lower efficiency of Vertical-Cavity Surface-Emitting Lasers (VCSELs), a resonant wavelength shift of Microring Resonators (MR), and results in a lower Signal to Noise Ratio (SNR). In this paper, we propose a methodology enabling thermal-aware design for optical interconnects relying on CMOS-compatible VCSEL. Thermal simulations allow designing ONoC interfaces with low gradient temperature and analytical models allow evaluating the SNR. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP4-11, 170	TOPOLOGY IDENTIFICATION FOR SMART CELLS IN MODULAR BATTERIES Speakers: Sebastian Steinhorst and Martin Lukasiewycz, TUM CREATE, SG Abstract This paper proposes an approach to automatically identifying the topological order of smart cells in modular batteries. Emerging smart cell architectures enable battery management without centralized control by coordination of activities via communication. When connecting smart cells in series to form a battery pack, the topological order of the cells is not known and it cannot be automatically identified using the available communication bus. This order, however, is of particular importance for several battery management functions, including temperature control and active cell balancing which relate properties of the cells and their location. Therefore, this paper presents a methodology to automatically identify a topological order on the smart cells in a battery pack using a hybrid communication approach, involving both the communication and the balancing layer of the smart cell architecture. A prototypic implementation on a development platform shows the feasibility and scalability of the approach. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP4-12, 993	LVS CHECK FOR PHOTONIC INTEGRATED CIRCUIT - CURVILINEAR FEATURE EXTRACTION AND VALIDATION Speakers: Ruping Cao¹, Julien Billoudet¹, John Ferguson¹, Lionel Couder², John Cayo², Alexandre Arriordaz¹ and Ian O'Connor³ ¹Mentor Graphics Corp, FR; ²Mentor Graphics Corp, US; ³Lyon Institute of Nanotechnology, FR Abstract This work is motivated by the demand of an electronic design automation (EDA) approach for the emerging ecosystem of the photonic integrated circuit (PIC) technology. A reliable physical verification flow cannot be achieved without the adaption of the traditional EDA tools to the photonic design verification needs. We analyze how layout versus schematic (LVS) checking is performed differently for photonic designs, and propose an LVS flow that addresses the particular need of curvilinear feature validation (curved path length and bend curvature extraction). We show that it is possible to reuse and extend the current LVS tools to perform such critical but non-traditional checks, which ensures a more reliable photonic layout implementation in term of functionality and circuit yield. Going forward, we propose possible future studies that can further improve the flows. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.5 Modeling and Simulation of Extra-Functional Properties

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Meije

Chair:
Sylvian Kaiser, DOCEA Power, FR

Co-Chair:
Sander Stuijk, Eindhoven University of Technology, NL

This session deals with modeling and simulating extra-functional system properties. The first paper presents a framework to accurately model the power and timing of hardware models without requiring a full micro-architectural simulation of the hardware module to extract signal transitions. The second paper presents a method to extend Design Space Exploration (DSE) of systems with Out-of-Order execution by accurately predicting the performance of CPUs with different cache configurations within reasonable time. The third paper is about an approach that constructs a learning-based thermal model for a given (black-box) multi-core processor running a given application mix.

Time	Label	Presentation Title Authors
08:30	9.5.1	DYNAMIC POWER AND PERFORMANCE BACK-ANNOTATION FOR FAST AND ACCURATE FUNCTIONAL HARDWARE SIMULATION Speakers: Dongwook Lee, Lizy Kurian John and Andreas Gerstlauer, The University of Texas at Austin, US Abstract Virtual platform prototypes are widely used for early design space exploration at the system level. There is, however, a lack of accurate and fast power and performance models of hardware components at such high levels of abstraction. In this paper, we present an approach that extends fast functional hardware models with the ability to produce detailed, cycle-level timing and power estimates. Our approach is based on back-annotating behavioral hardware descriptions with a dynamic power and performance model that allows capturing cycle-accurate and data-dependent activity without a significant loss in simulation speed. By integrating with existing high-level synthesis (HLS) flows, back-annotation is fully automated for custom hardware synthesized by HLS. We further leverage state-of-the-art machine learning techniques to synthesize abstract power models, where we introduce a structural decomposition technique to reduce model complexities and increase estimation accuracy. We have applied our back-annotation approach to several industrial-strength design examples under various architecture configurations. Results show that our models predict average power consumption to within 1% and cycle-by-cycle power dissipation to within 10% of a commercial gate-level power estimation tool, all while running several orders of magnitude faster. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	9.5.2	FAST AND PRECISE CACHE PERFORMANCE ESTIMATION FOR OUT-OF-ORDER EXECUTION Speakers: Roeland Douma, Sebastian Altmeyer and Andy Pimentel, University of Amsterdam, NL Abstract Design space exploration (DSE) is a key ingredient of system-level design, enabling designers to quickly prune the set of possible designs and determine, e.g., the number of the processing cores, the mapping of application tasks to cores, and the core configuration such as the cache organization. High-level performance estimation is a principle component of any system-level DSE: it has to be fast and sufficiently precise. Modern out-of-order architectures with caches pose a significant problem to this performance estimation process, as no simple one-to-one mapping of the number of cache misses and resulting cycle time exists. We present a high-level cache performance-estimation framework for out-of-order processors. Evaluation shows that our prediction method is on average 15 times faster than cycle-accurate simulation, while our estimates only show an average error of below 3.5%, reduce the pessimism of a naive high-level performance estimation by around 66%, and still maintain a high fidelity. Our approach thus enables quick yet accurate performance estimation and extends the applicability of system-level DSE to out-of-order processors with caches. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	9.5.3	A CALIBRATION BASED THERMAL MODELING TECHNIQUE FOR COMPLEX MULTICORE SYSTEMS Speakers: Devendra Rai and Lothar Thiele, Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract A calibration based method to construct fast and accurate thermal models of the state-of-the-art multicore systems is presented. Such models are usually required during Design Space Exploration (DSE) exercises to evaluate various task-to-core mapping, associated scheduling and processor speed-scaling options for their overall impact on the system temperature. Current approaches require modeling the thermal characteristics of the target processor using numerical simulators, which assume accurate information about several critical parameters (e.g., the processor floorplan). Such parameters are not readily available, forcing the system designers to use time and cost intensive, and possibly error-prone techniques such as using heat maps for reverse-engineering such parameters. Additionally, advanced power and temperature management algorithms commonly found in the state-of-the-art processors must also be accurately modeled. This paper proposes a calibration based method for constructing the complete system thermal model of a target processor without requiring any hard-to-get information such as the detailed processor floorplan or system power traces. Taking an example of a sufficiently complex Intel Xeon 8-core processor, we show that our approach yields an accurate thermal model, which is also lightweight both in terms of memory and compute requirements to be practically feasible for DSE over current processors. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP4-13, 938	FP-SCHEDULING FOR MODE-CONTROLLED DATAFLOW: A CASE STUDY Speakers: Alok Lele¹, Orlando Moreira² and Kees van Berkel² ¹Eindhoven University of Technology, NL; ²Ericsson B.V., NL Abstract Dual-Radio Simultaneous Access (DRSA) is an emerging topic in Software Defined Radio (SDR) in which two SDRs are running simultaneously on a shared hardware, typically a heterogeneous Multi-Processor System-on-Chip (MPSoC). Each SDR has a independent hard latency and/or throughput requirement and needs rigorous timing analysis. Moreover, SDRs are often modeled in enriched variants of dataflow to accommodate the growing dynamic execution of SDRs, making it a challenge to perform timing analysis on them. This paper considers the preemptive Fixed Priority Scheduling (FPS) of SDRs modeled in emph{Mode-Controlled Dataflow}. To the best of our knowledge this is the first attempt on static timing analysis of FPS for a (semi-)dynamic variant of synchronous dataflow. We propose a two-phase algorithm to determine the worst-case response time of an actor. We demonstrate our analysis results for a DRSA case study of two 4G-LTE receivers. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.6 Design, Synthesis and Validation of Analog Circuits

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Bayard

Chair:
Marie-Minerve Louerat, LIP6/CNRS, FR

Co-Chair:
Georges Gielen, ESAT - KU Leuven, BE

The session presents new synthesis and validation approaches to analog circuit design. Two design papers shade new light on Tunnel FETs and an ADC.

Time	Label	Presentation Title Authors
08:30	9.6.1	KNOWLEDGE-INTENSIVE, CAUSAL REASONING FOR ANALOG CIRCUIT TOPOLOGY SYNTHESIS IN EMERGENT AND INNOVATIVE APPLICATIONS Speakers: Alex Doboli, Fanshu Jiao and Sergio Montano, State University of New York at Stony Brook, US Abstract Analog circuit topology design has been difficult to automate. Topology synthesis involves searching an open-ended, widely extensible, and strongly discontinuous solution space. Existing algorithms cannot generate topologies beyond a constrained set of structures, or experience difficulties in evolving performance-effective yet minimal circuits. This paper proposes a new topology synthesis method that implements a design knowledge-intensive reasoning process to create novel circuit structures with all their features justified by the problem requirements. Two synthesis experiments demonstrate the capability of the method to generate circuits beyond the capabilities of existing topology synthesis algorithms. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	9.6.2	A CNN-INSPIRED MIXED SIGNAL PROCESSOR BASED ON TUNNEL TRANSISTORS Speakers: Behnam Sedighi, Indranil Palit, Xiaobo Sharon Hu and Michael Niemier, University of Notre Dame, US Abstract Novel devices are under investigation to extend the performance scaling trends that have long been associated with Moore's Law-based device scaling. Among the emerging devices being studied, tunnel FETs (or TFETs) are particularly attractive, especially when targeting low power systems. This paper studies the potential of analog/mixed-signal information processing using TFETs. The design of a highly-parallel processor -- inspired by cellular neural networks -- is presented. Signal processing is performed partially in the time-domain to better leverage the unique properties of TFETs, i.e., (i) steep slopes (high g_m/I_DS) in the subthreshold region, and (ii) high output resistance in the saturation region. Assuming an InAs TFET with feature sizes comparable to the 14 nm technology node, a power efficiency of at least 10,000 GOPS/W is projected. By comparison, state-of-the-art hardware assuming CMOS/FinFET technology promises a power efficiency only close to 1000 GOPS/W. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	9.6.3	LAYOUT-AWARE SIZING OF ANALOG ICS USING FLOORPLAN & ROUTING ESTIMATES FOR PARASITIC EXTRACTION Speakers: Nuno Lourenco, Ricardo Martins and Nuno Horta, Instituto de Telecomunicações, Instituto Superior Técnico – TU Lisbon, PT Abstract The design of analog integrated circuits (ICs) is characterized by time-consuming and non-systematic iterations between electrical and physical design steps in order to achieve successful post-layout designs. This paper presents an innovative methodology for automatic optimization-based sizing of analog ICs that takes into consideration complete layout-related data for both circuit's geometric requirements, which are obtained from the real-time in-loop floorplan packing, and circuits' electrical performance that is evaluated using circuit simulator and considering accurate layout parasitic estimates. In order to boost the parasitic extraction efficiency, the need for expensive detailed layout generation, as found in previous state-of-the-art layout-aware sizing approaches, is here circumvented. However, the interconnect parasitic capacitances that are major contributors to performance degradation and on-die signal integrity problems, must be accurately accounted for. Therefore, an empirical-based parasitic extraction is performed on an early-stage layout obtained from the floorplan, computing the optimal electromigration-aware wiring topology and shortest rectilinear paths in-loop, without the need for detailed routing. Finally, the methodology is demonstrated for the UMC 130nm design process using well-known analog building blocks proving the generality, accuracy and fast execution of the proposed approach. Download Paper (PDF; Only available from the DATE venue WiFi)
09:45	9.6.4	INITIAL TRANSIENT RESPONSE OF OSCILLATORS WITH LONG SETTLING TIME Speakers: Hans-Georg Brachtendorf and Bittner Kai, University of Applied Sciences of Upper Austria, AT Abstract The initial transient response of oscillators with high quality factor Q such as quartz crystal oscillators is orders of magnitudes larger than the period of oscillation. Therefore numerical solution by standard techniques of the underlying system of ordinary differential algebraic equations (DAEs) resulting from Kirchhoff's current and voltage laws is run time inefficient. In this paper numerical techniques for the calculation of the initial transient response and steady state solution are investigated. The efficiency results from reformulating the underlying system of ordinary DAEs by a suitable system of partial DAEs, known as multirate PDE, and from suitable finite difference time domain (FDTD) methods with small numerical dissipation of energy. Unlike Harmonic Balance the waveforms are free of spurious oscillations, caused by the non-compactness of the trigonometric polynomials. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP4-14, 191	AGEING SIMULATION OF ANALOGUE CIRCUITS AND SYSTEMS USING ADAPTIVE TRANSIENT EVALUATION Speakers: Felix Salfelder and Lars Hedrich, Goethe-Universitat Frankfurt a. M., DE Abstract Simulating ageing effects in analogue circuits requires both ageing models and a circuit simulator which is capable of a stress dependent, ageing and recovery aware model evaluation during long term transient simulation. Common approaches on reliability simulation often involve aged models, age precomputation, or lookup tables instead of integrated ageing simulation using memory aware ageing models. Long term transient ageing simulation enhances reliability simulation. This paper presents a framework to model and simulate ageing effects using an adaptive two-times evaluation scheme. This integrates full ageing effect models into behavioural device models. In addition, we introduce semantics for modelling stress levels and ageing parameters in hardware description languages. Our approach is a fully integrated simulation solution, enabling correct and efficient simulation of ageing systems over their lifetimes. We demonstrate how transistor level ageing effects critically affect the operation of a circuit. Our examples incorporate ageing monitors, redundant parts, and self-repair functionality into analogue systems. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP4-15, 62	A TOOL FOR THE ASSISTED DESIGN OF CHARGE REDISTRIBUTION SAR ADCS Speakers: Stefano Brenna¹, Andrea Bonetti², Andrea Bonfanti¹ and Andrea L. Lacaita¹ ¹Politecnico di Milano, IT; ²École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract The optimal design of SAR ADCs requires the accurate estimate of nonlinearity and parasitic effects in the feedback charge-redistribution DAC. Since the effects of both mismatch and stray capacitances depend on the specific array topology, complex calculations, custom modeling and heavy simulations in common circuit design environments are often required. This paper presents a MATLAB-based numerical tool to assist the design of the charge redistribution DACs adopted in SAR ADCs. The tool performs both parametric and statistical simulations taking into account capacitive mismatch and parasitic capacitances thus computing both differential and integral nonlinearity (DNL, INL). SNDR and ENoB degradation due to static non-linear effects is also estimated. An excellent agreement is obtained with the results of circuit simulators (e.g. Cadence Spectre) featuring up to 104 shorter simulation time, allowing statistical simulations which would be otherwise impracticable. Measurements on two fabricated SAR ADCs confirm that the proposed tool can be used as avalid instrument to assist the design of a charge redistribution SAR ADC and predict its static and dynamic metrics. Download Paper (PDF; Only available from the DATE venue WiFi)
10:02	IP4-16, 140	DETECTION OF ASYMMETRIC AGING-CRITICAL VOLTAGE CONDITIONS IN ANALOG POWER-DOWN MODE Speakers: Michael Zwerger and Helmut Graeb, Technische Universitaet Muenchen, DE Abstract In this work, a new verification method for the power-down mode of analog circuit blocks is presented. In power-down mode, matched transistors can be stressed with asymmetric voltages. This will cause time-dependent mismatch due to transistor aging. In order to avoid reliability problems, a new method for automatic detection of asymmetric power-down stress conditions is presented. Therefore, power-down voltage-matching rules are formulated. The method combines structural analysis and voltage propagation. Experimental results demonstrate the efficiency and effectiveness of the approach. Download Paper (PDF; Only available from the DATE venue WiFi)
10:03	IP4-17, 33	(Best Paper Award Candidate) HIGH PERFORMANCE SINGLE SUPPLY CMOS INVERTER LEVEL UP SHIFTER FOR MULTI-SUPPLY VOLTAGES DOMAINS Speakers: José-C. García¹, Juan A. Montiel-Nelson¹, J. Sosa¹ and Saeid Nooshabadi² ¹Institute for Applied Microelectronics, ES; ²Department of Electrical and Computer Engineering of Michigan Technological University, US Abstract A single supply CMOS inverter level shifter (ssqc-ls) for upconverting signals from 0.4V-1V logic level range up to 1.1V power supply domain is introduced. For guaranteing a low energy consumption, the proposed shifter is based on topological modifications of the structure qc-level shifter reported in [1]. For 0.5V input square wave switching at 500MHz, the inverter level shifter ssqc-ls using 1.2V of power supply achieves a 60% of Figure of Merit improvement in comparison against jy-ls [8] with a dual power supply voltage of 0.6V and 1.2V. Post-layout simulation results shown that ssqc-ls reaches a propagation delay of 0.75ns, an energy consumption of only 2.3pJ, and an energy-delay product of 1.73pJns for a capacitive loading condition of 950fF. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.7 Test Generation, Fault Simulation and Diagnosis

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Les Bans

Chair:
Jacob Abraham, The University of Texas at Austin, US

Co-Chair:
Bernd Becker, University Freiburg, DE

Speeding-up the test process is crucial from a technical and economical point of view. Novel methods are presented to accelerate silicon debug, fault simulation, test generation and diagnosis.

Time	Label	Presentation Title Authors
08:30	9.7.1	QUICK ERROR DETECTION TESTS WITH FAST RUNTIMES FOR EFFECTIVE POST-SILICON VALIDATION AND DEBUG Speakers: David Lin¹, Eswaran S², Sharad Kumar², Eric Rentschler³ and Subhasish Mitra¹ ¹Stanford University, US; ²Freescale Semiconductor, IN; ³Advanced Micro Devices, US Abstract Long error detection latency, the time elapsed from the occurrence of an error caused by a bug to its manifestation as an observable failure, severely limits the effectiveness of existing post-silicon validation and debug techniques. Traditional post-silicon validation tests can incur very long error detection latencies of millions or even billions of clock cycles. An earlier technique called Quick Error Detection (QED) shortens error detection latencies to only few hundred (or thousand) clock cycles. However, software-only QED (i.e., QED implemented entirely in software) can result in significantly increased post-silicon validation test runtimes. We present a new technique called Fast QED that overcomes this drawback of software-only QED, while preserving the error detection latency and bug coverage benefits of software-only QED. Simulation results using an OpenSPARC T2-like multi-core SoC and bugs abstracted from multiple commercial multi-core SoCs demonstrate: 1. Fast QED achieves 4 orders of magnitude improvement in test runtime as compared to software-only QED, with only 0.4% increase in chip area; 2. Fast QED improves error detection latencies by up to 5 orders of magnitude compared to non-QED tests, and also achieves improved error detection latencies compared to software-only QED; and, 3. Fast QED improves bug coverage by up to 2-fold compared to non-QED tests (similar to software-only QED). Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	9.7.2	(Best Paper Award Candidate) GPU-ACCELERATED SMALL DELAY FAULT SIMULATION Speakers: Eric Schneider¹, Stefan Holst², Michael Kochte¹, Xiaoqing Wen² and Hans-Joachim Wunderlich¹ ¹University of Stuttgart, DE; ²Kyushu Institute of Technology, JP Abstract The simulation of delay faults is an essential task in design validation and reliability assessment of circuits. Due to the high sensitivity of current nano-scale designs against smallest delay deviations, small delay faults recently became the focus of test research. Because of the subtle delay impact, traditional fault simulation approaches based on abstract timing models are not sufficient for representing small delay faults. Hence, timing accurate simulation approaches have to be utilized, which quickly become inapplicable for larger designs due to high computational requirements. In this work we present a waveform-accurate approach for fast high-throughput small delay fault simulation on Graphics Processing Units (GPUs). By exploiting parallelism from gates, faults and patterns, the proposed approach enables accurate exhaustive small delay fault simulation even for multi-million gate designs without fault dropping for the first time. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	9.7.3	FAULT SIMULATION WITH PARALLEL EXACT CRITICAL PATH TRACING IN MULTIPLE CORE ENVIRONMENT Speakers: Maksim Gorev, Raimund Ubar and Sergei Devadze, Tallinn University of Technology, EE Abstract A novel fault simulation method is proposed, based on exact critical path tracing beyond the Fan-out-Free Regions (FFR) throughout the fully simulated circuit. The method exploits two types of parallelism: bit-level parallelism for multiple pattern reasoning, and distribution the fault reasoning process between different cores in a multi-core processor environment. To increase the speed and accuracy of fault simulation, compared with previous methods, a mixed level fault reasoning approach is developed, were the fan-out re-convergence is handled on the higher FFR network level, and the fault simulation inside of FFRs relies on the gate-level information. To allow a uniform and seamless fault reasoning, Structurally Synthesized BDDs (SSBDD) are used for modeling on both levels. Experimental research demonstrated very promising results in increasing the speed and scalability of the method. Download Paper (PDF; Only available from the DATE venue WiFi)
09:45	9.7.4	ON THE AUTOMATIC GENERATION OF SBST TEST PROGRAMS FOR IN-FIELD TEST Speakers: Andreas Riefert¹, Riccardo Cantoro², Matthias Sauer¹, Matteo Sonza Reorda² and Bernd Becker¹ ¹University of Freiburg, DE; ²Politecnico di Torino, IT Abstract Software-based self-test (SBST) techniques are used to test processors against permanent faults introduced by the manufacturing process (often as a complementary approach with respect to DfT) or to perform in-field test in safety-critical applications. A major obstacle to their adoption is the high cost for developing effective test programs, since there is still a lack of suitable EDA algorithms and tools able to automatically generate SBST test programs. An efficient ATPG algorithm can serve as the foundation for the automatic generation of SBST test programs. In this work we first highlight the additional constraints characterizing SBST test programs wrt functional ones, with special emphasis on their usage for in-field test; then, we describe an ATPG framework targeting stuck-at faults based on Bounded Model Checking. The framework allows the user to flexibly specify the requirements of SBST test programs in the considered scenario. Finally, we demonstrate how a set of properly chosen requirements can be used to generate test programs matching these constraints. In our experiments we evaluate the framework with the miniMIPS microprocessor. The results show that the proposed method is the first able to automatically generate SBST test programs whose fault efficiency is superior to those produced with state-of-the-art manual approaches. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP4-18, 1031	EXPLORING THE IMPACT OF FUNCTIONAL TEST PROGRAMS RE-USED FOR POWER-AWARE TESTING Speakers: Aymen Touati¹, Alberto Bosio², Luigi Dilillo², Patrick Girard², Arnaud Virazel², Paolo Bernardi³ and Mateo Sonza Reorda³ ¹LIRMM, FR; ²LIRMM-UM2/CNRS, FR; ³Politecnico di Torino, IT Abstract Abstract— High power consumption during at-speed delay fault testing may lead to yield loss and premature aging. On the other hand, reducing too much test power might lead to test escape and reliability problems. Thus, to avoid these issues, test power has to map the power consumed during functional mode. Existing works target the generation of functional test programs able to maximize the power consumption in functional mode of microprocessor cores. The obtained power consumption will be used as threshold to tune the power consumed during testing. This paper investigates the impact of re-using such functional test programs for testing purposes. We propose to apply them by exploiting existing DfT architecture to maximize the delay fault coverage. Then, we combine them with the classical at-speed LOC and LOS delay fault testing schemes to further increase the fault coverage. Results show that it is possible to achieve a global test solution able to maximize the delay fault coverage while respecting the functional power budget. Keywords—Power Aware Test; Functional and Structural test; microprocessor test; ATPG. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP4-19, 621	A BREAKPOINT-BASED SILICON DEBUG TECHNIQUE WITH CYCLE-GRANULARITY FOR HANDSHAKE-BASED SOC Speakers: Hsin-Chen Chen¹, Chen-Rong Wu¹, Katherine Shu-Min Li² and Kuen-Jon Lee¹ ¹National Cheng Kung University, TW; ²National Sun Yat-sen University, TW Abstract The breakpoint-based silicon debug approach allows users to stop the normal (system) operations of the circuits under debug (CUDs), extract the internal states of the CUDs for examination, and then resume the normal operations for further debugging. However, most previous work on this approach adopts the transaction-level or handshake-level of granularity, i.e., the CUDs can be stopped only when a transaction or a handshake operation is completed. The granulations at these levels are often too coarse when a transaction or a handshake operation requires a large number of cycles to complete. In this paper, we present a novel debug mechanism, called the Protocol Agency Mechanism (PAM), which allows the breakpoint-based debug technique to be applied at the cycle- level granularity. The PAM can deal with transaction invalidation as well as protocol violation that may occur when a system is stopped and resumed. Experimental results show that the area overhead of the PAM is quite small and the performance impact on the system is negligible. Download Paper (PDF; Only available from the DATE venue WiFi)
10:02	IP4-20, 602	FAULT DIAGNOSIS IN DESIGNS WITH EXTREME LOW PIN TEST DATA COMPRESSORS Speakers: Subhadip Kundu¹, Parthajit Bhattacharya¹ and Rohit Kapur² ¹Synopsys India, IN; ²Synopsys Inc., US Abstract Diagnosis plays an important role to ramp up yield during IC manufacturing process. Limited observability due to test response compaction negatively affects the diagnosis procedure. With modern compressors - targeting very high test data compression, diagnosis becomes even more complicated. In this paper, a complete diagnosis methodology focussing on a novel mapping algorithm has been described. The mapping algorithm maps failures from compressor pins to scan cells with great accuracy (even in presence of don't cares in the responses), so that, normal scan diagnosis can be used to find out the actual defects. Experimental results on different industrial designs have proved that the proposed method almost match scan based diagnosis results. Download Paper (PDF; Only available from the DATE venue WiFi)
10:03	IP4-21, 573	OPTIMIZING DYNAMIC TRACE SIGNAL SELECTION USING MACHINE LEARNING AND LINEAR PROGRAMMING Speakers: Charlie Shucheng Zhu and Sharad Malik, Princeton University, US Abstract The success of post-silicon validation is limited by the low observability of the signals on the chip under debug. Trace buffers are used to enhance visibility of a subset of the internal signals during the chip's operation. These trace signals can be selected statically, i.e. the same trace signals are used through an entire debugging run, or dynamically where a different set of signals can be used in different parts of a debugging run. The focus of this work is on dynamic trace signal selection. Our technique uses machine learning for classification of different groups of inputs that are likely to trigger different faults, and a linear programming based optimization method for selecting the different sets of trace signals for different combinations of inputs and states. In contrast to existing methods, this technique is applicable to both transient and permanent faults. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

9.8 Hot Topic - Monolithic 3D: A Path to Real 3D Integrated Chips

Date: Thursday 12 March 2015
Time: 08:30 - 10:00
Location / Room: Salle Lesdiguières

Organisers:
Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH
Pierre-Emmanuel Gaillardon, École Polytechnique Fédérale de Lausanne (EPFL), CH

Chair:
Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH

Co-Chair:
Ian O’Connor, Institut des Nanotechnologies de Lyon, FR

As compared to standard 3D technologies, 3D Monolithic Integration (3DMI) overcomes the vertical connectivity challenge through the use of nano-scale inter-layer vias, which are orders-of-magnitude smaller than TSVs. In this hot topic session, we cover 3DMI for actual (FDSOI) and emerging (CNFETs and RRAM) technologies, and identify its promises from a design perspective.

Time	Label	Presentation Title Authors
08:30	9.8.1	A COMPREHENSIVE STUDY OF MONOLITHIC 3D CELL ON CELL DESIGN USING COMMERCIAL 2D TOOL Speakers: Olivier Billoint¹, Hossam Sarhan¹, Iyad Rayane², Maud Vinet¹, Perrine Batude¹, Claire Fenouillet-Beranger¹, Olivier Rozeau¹, Gérald Cibrario¹, Fabien Deprat¹, Aurélien Fustier¹, Jean-Eric Michallet¹, Olivier Faynot¹, Ogun Turkyilmaz¹, Jean-Frederic Christmann¹, Sébastien Thuries¹ and Fabien Clermidy¹ ¹CEA LETI, FR; ²Mentor, FR Abstract In this paper we present a methodology allowing an emulated-3D two tiers physical implementation of any design using 2D commercial tools. Place and Route is achieved through similar steps as required by 2D designs: pre clock tree synthesis (including placement), clock tree synthesis and routing; to which we added a folding step in order to emulate the 3D placement. Routing of both tiers in parallel using inter-tier metal layers is made possible by modifying input files of the tools. Our study covers power supply network on both tiers, forbidden inter-tier via on active placement and inter-tier back end flavors in order to refine quality of results. Benchmark results on two tiers 3D Monolithic integration have been done on several IPs (microcontroller, reconfigurable FFT and LDPC) using as reference ST 28nm FDSOI technology and show the correlation between cell density, routing congestion, wire length, operating frequency and power consumption. To our knowledge, this paper is the first one to evaluate monolithic 3D physical implementation using full 3D Back End description and taking into account power supply distribution on both tiers. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	9.8.2	MONOLITHIC 3D INTEGRATION: A PATH FROM CONCEPT TO REALITY Speakers: Max M. Shulaker, Tony F. Wu, Mohamed M. Sabry, Hai Wei, H.-S. Philip Wong and Subhasish Mitra, Stanford University, US Abstract Monolithic three-dimensional (3D) integration enables revolutionary digital system architectures of computation immersed in memory. Vertically-stacked layers of logic circuits and memories, with nano-scale inter-layer vias (with the same pitch and dimensions as tight-pitched metal layer vias), provide massive connectivity between the layers. The nano-scale inter-layer vias are orders of magnitude denser than conventional through silicon vias (TSVs). Such digital system architectures can achieve significant performance and energy efficiency benefits compared to today's designs. The massive vertical connectivity makes such architectures particularly attractive for abundant-data applications that impose stringent requirements with respect to low-latency data processing, high-bandwidth data transfer, and energy-efficient storage of massive amounts of data. We present an overview of our progress toward realizing monolithic 3D ICs, enabled by recent advances in emerging nanotechnologies such as carbon nanotube field-effect transistors and emerging memory technologies such as Resistive RAMs and Spin-Transfer Torque RAMs. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	9.8.3	A ULTRA-LOW-POWER FPGA BASED ON MONOLITHICALLY INTEGRATED RRAMS Speakers: Pierre-Emmanuel Gaillardon, Xifan Tang, Jury Sandrini, Maxime Thammasack, Somayyeh Rahimian Omam, Davide Sacchetto, Yusuf Leblebici and Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract Complex routing architectures, heavily using programmable switches, dominate the area, delay and power of Field Programmable Gate Arrays (FPGAs). With the ability of being monolithically integrated with CMOS chips, Resistive Random Access Memories (RRAMs) enable high-performance routing architectures through the replacement of Static Random Access Memory (SRAM)-based programming switches. Exploiting the very low on-resistance state achievable by RRAMs as well as the improved tolerance to power supply reduction, RRAM-based routing multiplexers can be used to significantly reduce the power consumption of FPGA systems with no performance compromises. By evaluating the opportunities of ultra-low-power RRAM-based FPGAs at the system level, we see an improvement of 12%, 26% and 81% in area, delay and power consumption at a mature technology node. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

IP4 Interactive Presentations

Date: Thursday 12 March 2015
Time: 10:00 - 10:30
Location / Room: Exhibition Area

Label	Presentation Title Authors
IP4-1	PWL: A PROGRESSIVE WEAR LEVELING TO MINIMIZE DATA MIGRATION OVERHEADS FOR NAND FLASH DEVICES Speakers: Fu-Hsin Chen¹, Ming-Chang Yang², Yuan-Hao Chang³ and Tei-Wei Kuo⁴ ¹Department of Computer Science and Information Engineering, National Taiwan University, TW; ²Graduate Institute of Networking and Multimedia, National Taiwan University, TW; ³Institute of Information Science, Academia Sinica, TW; ⁴Academia Sinica & National Taiwan University, TW Abstract As the endurance of flash memory keeps deteriorating, exploiting wear leveling techniques to improve the lifetime/endurance of flash memory has become a critical issue in the design of flash storage devices. In contrast to existing wear-leveling techniques that aggressively distributes the erases to all flash blocks by a fixed threshold, we propose a progressive wear leveling design to perform wear leveling in a "progressive" way to prevent any block from being worn out prematurely, and thereby to ultimately minimize the performance overheads caused by the unnecessary data migration. The results reveal that, instead of sacrificing the device lifetime, performing wear leveling in such a progressive way can not only minimize the performance overheads but even have potentials to extend the device lifespan. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-2	TOWARDS TRUSTABLE STORAGE USING SSDS WITH PROPRIETARY FTL Speakers: Xiaotong Cui¹, Minhui Zou¹, Liang Shi² and Kaijie Wu¹ ¹Chongqing University, CN; ²College of Computer Science, Chongqing University, CN Abstract In recent years, we have seen an increasing deployment of flash-based storage, such as SSD, in mission-critical applications due to its fast read/write speed, small form factor, strong shock resistance, and etc. SSD uses a host interface and a middle layer called flash translation layer (FTL) to maintain the compatibility with the traditional magnetic-based HDD. Unlike the traditional HDD where the host OS has the full control on where to access the data, SSD uses FTL to translate and implement all operations, and OS has no such control. Even worse, FTL, which is considered as one of most important intellectual property of SSD, is often proprietary. This brings up a security concern on design trustworthiness: what if the manufacturer either accidentally or intentionally implement those operations incorrectly or even maliciously? In this paper we analyze the possible threats and propose a simple yet effective countermeasure. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-3	USER-SPECIFIC SKIN TEMPERATURE-AWARE DVFS FOR SMARTPHONES Speakers: Begum Birsen Egilmez¹, Gokhan Memik¹, Seda Ogrenci-Memik¹ and Oğuz Ergin² ¹Northwestern University, US; ²TOBB University of Economics and Technology, TR Abstract Skin temperature of mobile devices intimately affects the user experience. Power management schemes built into smartphones can lead to quickly crossing a user's threshold of tolerable skin temperature. Furthermore, there is a significant variation among users in terms of their sensitivity. Hence, controlling the skin temperature as part of the device's power management scheme is paramount. To achieve this, we first present a method for estimating skin and screen temperature at run-time using a combination of available on-device thermal sensors and performance indicators. In an Android-based smartphone, we achieve 99.05% and 99.14% accuracy in estimations of back cover and screen temperatures, respectively. Leveraging this run-time predictor, we develop User-specific Skin Temperature-Aware (USTA) DVFS mechanism to control the skin temperature. Performance of USTA is tested both with benchmarks and user tests comparing USTA to the standard Android governor. The results show that more users prefer to use USTA as opposed to the default DVFS mechanism. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-4	FORMAL PROBABILISTIC ANALYSIS OF DISTRIBUTED DYNAMIC THERMAL MANAGEMENT Speaker: Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE Authors: Shafaq Iqtedar¹, Osman Hasan², Muhammad Shafique³ and Joerg Henkel³ ¹National University of Sciences and Technology (NUST), Islamabad, PK; ²National University of Sciences and Technology (NUST), Islamabad, ; ³Karlsruhe Institute of Technology (KIT), DE Abstract The prevalence of Dynamic Thermal Management (DTM) schemes coupled with demands for high reliability motivate the rigorous verification and testing of these schemes before deployment. Conventionally, these schemes are analyzed using either simulations or by running on real systems. But these traditional analysis techniques cannot exhaustively validate the distributed DTM schemes and thus compromise on the accuracy of the analysis results. Moreover, the randomness due to task assignments, task completion times and re-mappings, is often ignored in the analysis of distributed DTM schemes. We propose to overcome both of these limitations by using probabilistic model checking, which is a formal method for modeling and verifying concurrent systems with randomized behaviors. The paper presents a case study on the formal verification of a state-ofthe- art distributed DTM scheme using the PRISM model checker. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-5	A HYBRID QUASI MONTE CARLO METHOD FOR YIELD AWARE ANALOG CIRCUIT SIZING TOOL Speakers: Engin Afacan, Günhan Dündar, Gonenc Berkol, Ali Emre Pusane and İsmail Faik Baskaya, Bogazici University, TR Abstract Efficient yield estimation methods are required by yield aware automatic sizing tools, where many iterative variability analyses are performed. Quasi Monte Carlo (QMC) is a popular approach, in which samples are generated more homogeneously, hence faster convergence is obtained compared to the conventional MC. However, since QMC is deterministic and has no natural variance, there is no convenient way to obtain estimation error bounds. To determine the confidence interval of the estimated yield, scrambled QMC, in which samples are randomly permuted, is run multiple times to obtain stochastic variance by sacrificing computational cost. To palliate this challenge, this paper proposes a hybrid method, where a single QMC is performed to determine infeasible solutions in terms of yield, which is followed by a few scrambled QMC analyses providing variance and confidence interval of the estimated yield. Yield optimization is performed considering the worst case of the current estimation, thus the optimizer guarantees that the solution will satisfy the confidence interval. Furthermore, a yield ranking mechanism is also developed to enforce the optimizer to search for more robust solutions. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-6	FEATURE SELECTION FOR ALTERNATE TEST USING WRAPPERS: APPLICATION TO AN RF LNA CASE STUDY Speakers: Manuel Barragan¹ and Gildas Leger² ¹TIMA Laboratory, FR; ²Instituto de Microelectronica de Sevilla, IMSE-CNM, (CSIC - Universidad de Sevilla), ES Abstract Testing analog, mixed-signal and RF circuits represents the main cost component for testing complex SoCs. A promising solution to alleviate this cost is the Alternate Test strategy. Alternate test is an indirect test approach that replaces costly specification measurements by simpler signatures. Machine learning techniques are then used to map signatures and performances. One key point that still remains as an open problem is the conception of adequate simple measurement candidates. This work presents efficient algorithms for selecting information rich signatures. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-7	IMPROVING SIMD CODE GENERATION IN QEMU Speakers: Sheng-Yu Fu¹, Jan-Jan Wu² and Wei-Chung Hsu¹ ¹Department of Computer Science National Taiwan University, TW; ²Institute of Information Science Academia Sinica, TW Abstract Modern processors are often enhanced with SIMD instructions. For examples, the MMX, SSE, and AVX instruction set in the x86 architecture, and the Neon instruction set in the ARM architecture are SIMD instructions. Using these SIMD instructions could significantly increase the performance of applications, hence application binaries are likely to have a good fraction of instructions that are SIMD instructions. However, SIMD instruction translation has not attacked much attention in Dynamic Binary Translation (DBT). For example, in the popular QEMU system emulator, guest SIMD instructions are often emulated with a sequence of scalar instructions even when the host machines do have SIMD instructions to support such parallel computation, leaving a large potential for performance enhancement. In this paper, we propose two approaches, one to leverage the existing helper function implementation in QEMU, and the other to use a newly introduced vector IR (Intermediate Representation) to enhance the performance of SIMD instructions translation in DBT of QEMU. The approaches have been implemented in the QEMU to support ARM and IA32 frontend and x86-64 backend. Our preliminary experiments show that adding vector IR can significantly enhance the performance of guest applications containing SIMD instructions for both ARM and IA32 architectures when running with QEMU on the x86-64 platform. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-8	REUSE DISTANCE ANALYSIS FOR LOCALITY OPTIMIZATION IN LOOP-DOMINATED APPLICATIONS Speakers: Christakis Lezos, Grigoris Dimitroulakos and Konstantinos Masselos, University of Peloponnese, GR Abstract This paper discusses MemAddIn, a compiler assisted dynamic code analysis tool that analyzes C code and exposes critical parts for memory related optimizations on embedded systems that can heavily affect systems performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several of data reuse distance measurement algorithms have been implemented leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments offering a unified environment for application development and optimization. The novelties of our work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance for a number of different applications. Experimental results prove the effectiveness of the tool through the analysis and optimization of a realistic image processing application. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-9	TAPP: TEMPERATURE-AWARE APPLICATION MAPPING FOR NOC-BASED MANY-CORE PROCESSORS Speakers: Di Zhu, Lizhong Chen, Timothy Pinkston and Massoud Pedram, University of Southern California, US Abstract Application mapping with its ability to spread out high-power components can potentially be a good approach to mitigate the looming issue of hotspots in many-core processors. However, very few works have explored effective ways of making tradeoff between temperature and network latency. Moreover, on-chip routers, which are of high power density and may lead to hotspots, are not considered in these works. In this paper, we propose TAPP (Temperature-Aware Partitioning and Placement), an efficient application mapping algorithm to reduce on-chip hotspots while sacrificing little network performance. This algorithm "spreads" high-power cores and routers across the chip by performing hierarchical bi-partitioning of the cores and concurrently conducting placement of the cores onto tiles, and achieves high efficiency and superior scalability. Simulation results show that the proposed algorithm reduces the temperature by up to 6.80°C with minimal latency increase compared to the latency-oriented mapping solution. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-10	MALLEABLE NOC: DARK SILICON INSPIRED ADAPTABLE NETWORK ON CHIP Speakers: Haseeb Bokhari¹, Haris Javaid², Muhammad Shafique³, Joerg Henkel³ and Sri Parameswaran¹ ¹University of New South Wales, AU; ²Google Inc., ; ³Karlsruhe Institute of Technology (KIT), DE Abstract Network on Chip (NoC) has been envisioned as a scalable fabric for many core chips. However, NoCs can consume a considerable share of chip power. Moreover, diverse applications are executed in these multicore, where each application imposes a unique load on the NoC. To realise a NoC which is Energy and Delay efficient, we propose combining multiple VF optimized routers for each node (in traditional NoCs, we have only a single router per node) for efficient NoC for Dark Silicon chips. We present a generic NoC with routers designed for different VF levels, which are distributed across the chip. At runtime, depending on application profile, we combine these VF optimized routers to form constantly changing energy efficient NoC fabric. We call our architecture Malleable NoC. In this paper, we describe the architectural details of the proposed architecture and the runtime algorithms required to dynamically adapt the NoC resources. We show that for a variety of multi program benchmarks executing on Malleable NoC, Energy Delay product (EDP) can be reduced by up to 46% for widely differing workloads. We further show the effect on EDP savings for differing amounts of dark silicon area budget. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-11	TOPOLOGY IDENTIFICATION FOR SMART CELLS IN MODULAR BATTERIES Speakers: Sebastian Steinhorst and Martin Lukasiewycz, TUM CREATE, SG Abstract This paper proposes an approach to automatically identifying the topological order of smart cells in modular batteries. Emerging smart cell architectures enable battery management without centralized control by coordination of activities via communication. When connecting smart cells in series to form a battery pack, the topological order of the cells is not known and it cannot be automatically identified using the available communication bus. This order, however, is of particular importance for several battery management functions, including temperature control and active cell balancing which relate properties of the cells and their location. Therefore, this paper presents a methodology to automatically identify a topological order on the smart cells in a battery pack using a hybrid communication approach, involving both the communication and the balancing layer of the smart cell architecture. A prototypic implementation on a development platform shows the feasibility and scalability of the approach. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-12	LVS CHECK FOR PHOTONIC INTEGRATED CIRCUIT - CURVILINEAR FEATURE EXTRACTION AND VALIDATION Speakers: Ruping Cao¹, Julien Billoudet¹, John Ferguson¹, Lionel Couder², John Cayo², Alexandre Arriordaz¹ and Ian O'Connor³ ¹Mentor Graphics Corp, FR; ²Mentor Graphics Corp, US; ³Lyon Institute of Nanotechnology, FR Abstract This work is motivated by the demand of an electronic design automation (EDA) approach for the emerging ecosystem of the photonic integrated circuit (PIC) technology. A reliable physical verification flow cannot be achieved without the adaption of the traditional EDA tools to the photonic design verification needs. We analyze how layout versus schematic (LVS) checking is performed differently for photonic designs, and propose an LVS flow that addresses the particular need of curvilinear feature validation (curved path length and bend curvature extraction). We show that it is possible to reuse and extend the current LVS tools to perform such critical but non-traditional checks, which ensures a more reliable photonic layout implementation in term of functionality and circuit yield. Going forward, we propose possible future studies that can further improve the flows. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-13	FP-SCHEDULING FOR MODE-CONTROLLED DATAFLOW: A CASE STUDY Speakers: Alok Lele¹, Orlando Moreira² and Kees van Berkel² ¹Eindhoven University of Technology, NL; ²Ericsson B.V., NL Abstract Dual-Radio Simultaneous Access (DRSA) is an emerging topic in Software Defined Radio (SDR) in which two SDRs are running simultaneously on a shared hardware, typically a heterogeneous Multi-Processor System-on-Chip (MPSoC). Each SDR has a independent hard latency and/or throughput requirement and needs rigorous timing analysis. Moreover, SDRs are often modeled in enriched variants of dataflow to accommodate the growing dynamic execution of SDRs, making it a challenge to perform timing analysis on them. This paper considers the preemptive Fixed Priority Scheduling (FPS) of SDRs modeled in emph{Mode-Controlled Dataflow}. To the best of our knowledge this is the first attempt on static timing analysis of FPS for a (semi-)dynamic variant of synchronous dataflow. We propose a two-phase algorithm to determine the worst-case response time of an actor. We demonstrate our analysis results for a DRSA case study of two 4G-LTE receivers. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-14	AGEING SIMULATION OF ANALOGUE CIRCUITS AND SYSTEMS USING ADAPTIVE TRANSIENT EVALUATION Speakers: Felix Salfelder and Lars Hedrich, Goethe-Universitat Frankfurt a. M., DE Abstract Simulating ageing effects in analogue circuits requires both ageing models and a circuit simulator which is capable of a stress dependent, ageing and recovery aware model evaluation during long term transient simulation. Common approaches on reliability simulation often involve aged models, age precomputation, or lookup tables instead of integrated ageing simulation using memory aware ageing models. Long term transient ageing simulation enhances reliability simulation. This paper presents a framework to model and simulate ageing effects using an adaptive two-times evaluation scheme. This integrates full ageing effect models into behavioural device models. In addition, we introduce semantics for modelling stress levels and ageing parameters in hardware description languages. Our approach is a fully integrated simulation solution, enabling correct and efficient simulation of ageing systems over their lifetimes. We demonstrate how transistor level ageing effects critically affect the operation of a circuit. Our examples incorporate ageing monitors, redundant parts, and self-repair functionality into analogue systems. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-15	A TOOL FOR THE ASSISTED DESIGN OF CHARGE REDISTRIBUTION SAR ADCS Speakers: Stefano Brenna¹, Andrea Bonetti², Andrea Bonfanti¹ and Andrea L. Lacaita¹ ¹Politecnico di Milano, IT; ²École Polytechnique Fédérale de Lausanne (EPFL), CH Abstract The optimal design of SAR ADCs requires the accurate estimate of nonlinearity and parasitic effects in the feedback charge-redistribution DAC. Since the effects of both mismatch and stray capacitances depend on the specific array topology, complex calculations, custom modeling and heavy simulations in common circuit design environments are often required. This paper presents a MATLAB-based numerical tool to assist the design of the charge redistribution DACs adopted in SAR ADCs. The tool performs both parametric and statistical simulations taking into account capacitive mismatch and parasitic capacitances thus computing both differential and integral nonlinearity (DNL, INL). SNDR and ENoB degradation due to static non-linear effects is also estimated. An excellent agreement is obtained with the results of circuit simulators (e.g. Cadence Spectre) featuring up to 104 shorter simulation time, allowing statistical simulations which would be otherwise impracticable. Measurements on two fabricated SAR ADCs confirm that the proposed tool can be used as avalid instrument to assist the design of a charge redistribution SAR ADC and predict its static and dynamic metrics. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-16	DETECTION OF ASYMMETRIC AGING-CRITICAL VOLTAGE CONDITIONS IN ANALOG POWER-DOWN MODE Speakers: Michael Zwerger and Helmut Graeb, Technische Universitaet Muenchen, DE Abstract In this work, a new verification method for the power-down mode of analog circuit blocks is presented. In power-down mode, matched transistors can be stressed with asymmetric voltages. This will cause time-dependent mismatch due to transistor aging. In order to avoid reliability problems, a new method for automatic detection of asymmetric power-down stress conditions is presented. Therefore, power-down voltage-matching rules are formulated. The method combines structural analysis and voltage propagation. Experimental results demonstrate the efficiency and effectiveness of the approach. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-17	(Best Paper Award Candidate) HIGH PERFORMANCE SINGLE SUPPLY CMOS INVERTER LEVEL UP SHIFTER FOR MULTI-SUPPLY VOLTAGES DOMAINS Speakers: José-C. García¹, Juan A. Montiel-Nelson¹, J. Sosa¹ and Saeid Nooshabadi² ¹Institute for Applied Microelectronics, ES; ²Department of Electrical and Computer Engineering of Michigan Technological University, US Abstract A single supply CMOS inverter level shifter (ssqc-ls) for upconverting signals from 0.4V-1V logic level range up to 1.1V power supply domain is introduced. For guaranteing a low energy consumption, the proposed shifter is based on topological modifications of the structure qc-level shifter reported in [1]. For 0.5V input square wave switching at 500MHz, the inverter level shifter ssqc-ls using 1.2V of power supply achieves a 60% of Figure of Merit improvement in comparison against jy-ls [8] with a dual power supply voltage of 0.6V and 1.2V. Post-layout simulation results shown that ssqc-ls reaches a propagation delay of 0.75ns, an energy consumption of only 2.3pJ, and an energy-delay product of 1.73pJns for a capacitive loading condition of 950fF. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-18	EXPLORING THE IMPACT OF FUNCTIONAL TEST PROGRAMS RE-USED FOR POWER-AWARE TESTING Speakers: Aymen Touati¹, Alberto Bosio², Luigi Dilillo², Patrick Girard², Arnaud Virazel², Paolo Bernardi³ and Mateo Sonza Reorda³ ¹LIRMM, FR; ²LIRMM-UM2/CNRS, FR; ³Politecnico di Torino, IT Abstract Abstract— High power consumption during at-speed delay fault testing may lead to yield loss and premature aging. On the other hand, reducing too much test power might lead to test escape and reliability problems. Thus, to avoid these issues, test power has to map the power consumed during functional mode. Existing works target the generation of functional test programs able to maximize the power consumption in functional mode of microprocessor cores. The obtained power consumption will be used as threshold to tune the power consumed during testing. This paper investigates the impact of re-using such functional test programs for testing purposes. We propose to apply them by exploiting existing DfT architecture to maximize the delay fault coverage. Then, we combine them with the classical at-speed LOC and LOS delay fault testing schemes to further increase the fault coverage. Results show that it is possible to achieve a global test solution able to maximize the delay fault coverage while respecting the functional power budget. Keywords—Power Aware Test; Functional and Structural test; microprocessor test; ATPG. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-19	A BREAKPOINT-BASED SILICON DEBUG TECHNIQUE WITH CYCLE-GRANULARITY FOR HANDSHAKE-BASED SOC Speakers: Hsin-Chen Chen¹, Chen-Rong Wu¹, Katherine Shu-Min Li² and Kuen-Jon Lee¹ ¹National Cheng Kung University, TW; ²National Sun Yat-sen University, TW Abstract The breakpoint-based silicon debug approach allows users to stop the normal (system) operations of the circuits under debug (CUDs), extract the internal states of the CUDs for examination, and then resume the normal operations for further debugging. However, most previous work on this approach adopts the transaction-level or handshake-level of granularity, i.e., the CUDs can be stopped only when a transaction or a handshake operation is completed. The granulations at these levels are often too coarse when a transaction or a handshake operation requires a large number of cycles to complete. In this paper, we present a novel debug mechanism, called the Protocol Agency Mechanism (PAM), which allows the breakpoint-based debug technique to be applied at the cycle- level granularity. The PAM can deal with transaction invalidation as well as protocol violation that may occur when a system is stopped and resumed. Experimental results show that the area overhead of the PAM is quite small and the performance impact on the system is negligible. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-20	FAULT DIAGNOSIS IN DESIGNS WITH EXTREME LOW PIN TEST DATA COMPRESSORS Speakers: Subhadip Kundu¹, Parthajit Bhattacharya¹ and Rohit Kapur² ¹Synopsys India, IN; ²Synopsys Inc., US Abstract Diagnosis plays an important role to ramp up yield during IC manufacturing process. Limited observability due to test response compaction negatively affects the diagnosis procedure. With modern compressors - targeting very high test data compression, diagnosis becomes even more complicated. In this paper, a complete diagnosis methodology focussing on a novel mapping algorithm has been described. The mapping algorithm maps failures from compressor pins to scan cells with great accuracy (even in presence of don't cares in the responses), so that, normal scan diagnosis can be used to find out the actual defects. Experimental results on different industrial designs have proved that the proposed method almost match scan based diagnosis results. Download Paper (PDF; Only available from the DATE venue WiFi)
IP4-21	OPTIMIZING DYNAMIC TRACE SIGNAL SELECTION USING MACHINE LEARNING AND LINEAR PROGRAMMING Speakers: Charlie Shucheng Zhu and Sharad Malik, Princeton University, US Abstract The success of post-silicon validation is limited by the low observability of the signals on the chip under debug. Trace buffers are used to enhance visibility of a subset of the internal signals during the chip's operation. These trace signals can be selected statically, i.e. the same trace signals are used through an entire debugging run, or dynamically where a different set of signals can be used in different parts of a debugging run. The focus of this work is on dynamic trace signal selection. Our technique uses machine learning for classification of different groups of inputs that are likely to trigger different faults, and a linear programming based optimization method for selecting the different sets of trace signals for different combinations of inputs and states. In contrast to existing methods, this technique is applicable to both transient and permanent faults. Download Paper (PDF; Only available from the DATE venue WiFi)

UB09 Session 9

Date: Thursday 12 March 2015
Time: 10:00 - 12:00
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB09.1	VDA-ADMF: AN AGILE MIGRATION FRAMEWORK FOR ANALOG LAYOUT DESIGN Presenter: Po-Cheng Pan, National Chiao Tung University, TW Authors: Ching-Yu Chin¹, Hung-Ming Chen¹, Tung-Chieh Chen², Jou-Chun Lin² and Yi-Peng Weng³ ¹National Chiao Tung University, TW; ²Synopsys Co., Ltd., TW; ³Taiwan Semiconductor Manufacturing Company, TW Abstract Layout generation in the late analog CMOS design is challenging by its increasing layout constraints and performance requirements. However, iterative refinement on manual design damages the productivity of analog layout. Therefore, it is more efficient to enroll the know-how from existing design instead of generating a new one. To contend with time-consuming analog layout for more possibilities, this software aims to demonstrate a fast layout prototyping framework for migration purpose into real layout design. In our framework, a reference analog layout design is given to generate potential layout candidates at the objective technology. The demonstration includes the original layout, the extracted topology with placement and routing, the generated layout figures, the dumped layout results and the simulated results. This procedure of migration provides a convincing exhibition of our migration framework. More information ...
UB09.2	AN FPGA LAB-ON-CHIP: AN ANALYSIS TOOL AND FRAMEWORK FOR ADVANCED MEASUREMENTS AND RELIABILITY ASSESSMENTS ON MODERN NANOSCALE FPGAS Presenter: Petr Pfeifer, Technical University of Liberec, CZ Abstract Wide portfolio of new technologies in design and manufacturing of advanced integrated circuits enables higher integration of complex structures at ultra-high nanoscale densities, but also sensitivity to various changes of the internal nanostructures and their parameters, resulting in the requirement of advanced reliability assessments. The developed and presented revolutionary new set of tools enables complex lab-on-chip solutions in nanoscale FPGAs and it allows easy implementation of tasks like completely on-chip internal parameter measurements in FPGAs, actual structure delays with respect to environmental parameters, device and platform identification, validation of selected design parameters, identification of crosstalk path and mutual impacts, as well as various changes in internal parameters. It actively supports design reconfiguration. The set of tools can be used for fast standalone or system built-in post-production device and platform parameter and quality checking and validation, parameter-aware placement and routing of critical design parts and performance optimization of existing designs, device aging identification and measurement, active and online data generation for reliability assessments and design reliability enhancements. It is available for FPGAs from 90nm down and will be demonstrated on advanced 28nm Xilinx FPGAs. More information ...
UB09.3	LINUX ON TSAR: PORTING THE LINUX KERNEL TO THE TSAR MANYCORE ARCHITECTURE Presenter: César Fuguet Tortolero, UPMC-LIP6, FR Authors: Joël Porquet and Alain Greiner, UPMC-LIP6, FR Abstract In this demonstration, we explain how we ported a Linux-based Operating System to the TSAR manycore architecture. In the associated poster, we describe the TSAR architecture and enumerate the pieces of software that usually need to be ported for a new processor architecture, and we give further details about our port. We also demonstrate this work by running Linux on a FPGA-based prototype of TSAR. The demo shows the entire boot process, from the powerup to the terminal prompt where the user can type in commands and interact with the hardware system. More information ...
UB09.4	ISP RAS VERIFICATION TOOLS: INTEGRATED APPROACH TO HARDWARE VERIFICATION AT UNIT AND SYSTEM LEVELS BASED ON STATIC AND DYNAMIC METHODS Presenter: Andrei Tatarnikov, Institute for System Programming of the Russian Academy of Sciences (ISP RAS), RU Authors: Mikhail Chupilko, Alexander Kamkin, Artem Kotsynyak and Sergey Smolov, Institute for System Programming of the Russian Academy of Sciences (ISP RAS), RU Abstract Verification has long been recognized as an integral part of the hardware design process. As each hardware design is developed from unit- and core-level point of view, verification process should account this fact and provide means for dealing with both of them. Applied approaches include both static (formal methods, source code analysis) and dynamic (testing) methods. To facilitate verification, it is important to provide a uniform methodology that would allow integrating different approaches. In this work, we present a set of verification tools that takes advantage exactly of combining static and dynamic approaches. This allows knowledge sharing between tools, which helps to build more accurate models of hardware designs to be used in verification activities at different levels of abstraction. Brief descriptions of the tools are given below. MicroTESK is a reconfigurable (retargetable and extendable) model-based test program generator for microprocessors and other programmable devices. Lightweight formal specifications customize the generator for a particular architecture and provide knowledge about situations to be covered by tests. A convenient test template framework allows rapid development of complex verification scenarios. Being retargetable, MicroTESK is able to support various RISC and CISC architectures. C++TESK is an open-source C++ based toolkit intended for automated functional testing of software components (mostly in C/C++) and RTL (HDL) models of digital hardware (in Verilog and VHDL). The main part of the toolkit is a library of C++ classes and macros that define facilities for constructing formal specifications (reference models), adapters of components under test, test scenarios and test coverage metrics. Basing on C++ descriptions provided by a user, a test system is compiled. It allows automatically generating and applying sequences of stimuli to the component under test, checking correctness of its reactions and collecting statistics on test execution. Besides the basic library, the toolkit includes a report generator, means for parallelizing test execution on computer clusters, and Eclipse-based IDE. The toolkit is planned to be integrated into UVM methodology. Retrascope is an extendable toolkit for RTL (HDL) models transformation and functional verification at unit level. Analyzing source HDL-code, it extracts control and data flows, transforms them into Extended Finite State Machines (EFSM), and generates covering test sequences for them. The toolkit supports RTL modules written in VHDL and Verilog. It can be used both from command line and from Eclipse-based IDE. More information ...
UB09.5	REAL-TIME PATTERN DETECTION OF MOVEMENT RELATED POTENTIALS BY SYNCHRONIZED EEG AND EMG Presenter: Valerio Francesco Annese, Politecnico di Bari, IT Author: Daniela De Venuto, Politecnico di Bari, IT Abstract Before the conscious intention to perform any voluntary movement, our brain has already activated the action, 1s before the muscle activity actually starts. The brain processes are necessary to determine the performance of the movement itself. The presence of both the premotor potential (also called "Bereitschaftspotential" or "Readiness Potential" in the 2-5 Hz band) and the Mu-rhythm (in the 7-12 Hz) is particularly interesting for the detection of voluntary movement. Therefore, the detection of these movements' related potentials (MRPs), before the EMG activation, indicates the movement intentionality. Due to the presence of artifacts (blinking, eye movement, swallowing etc.) that spoil EEG signals, the real-time detection of MRPs is particularly challenging. In this proposal, we describe a complete wearable system performing synchronous EEG and EMG monitoring to on-line detect MRPs and prevent unintentional and dangerous movements. The Bereitschaftspotential (BP) and Mu-rhythm detection is carried out through a wavelet analysis on differential signals captured 1-second before the EMG activation. This differential approach allows discerning if the recorded EEG activity is related to the motor cortex or if it is just a common artifact. The EEG/EMG monitoring system can face the strict requirements of ambient assisted living application (AAL), taking care of aged and disable people in a domestic environment. Specifically for this application, the system can be configured as following: data from 12 EEG channels are firstly collected in a central unit that wirelessly communicates with the gateway (24 bit resolution - 500 Hz sampling rate), the gateway receives data also from each of the 8 EMG nodes (12-bit resolution, 500 Hz sampling rate). For a comfortably use, a battery life of - at least - 10 hours, have to be implemented. Moreover, a working range of 10 meters (between nodes and gateway) is considered. Above all, the requirement of wearability is achieved by the transfer printing technology, produced using photolithography and dry etch techniques, that allows the creation of wireless, tiny and lightweight electrodes for both EEG and EMG printed on bio-polymers (Polycaprolactone). Since a huge amount of retrieved data is expected, a data rate of 250 kbps (~31 kBps) is needed: a good compromise in terms of power consumption and data rate is achieved through the standard IEEE- 802.15.1 (Bluetooth low energy -BLE). The gateway unit (a smartphone or a tabled) receives the EEG and EMG sensor data and performs signal analysis to identify possible MRPs patterns through wavelet analysis. In this contribute it will be delineated as case study the possible implementation in fall prevention where not only the unwanted muscle movement is detected but also a bio-feedback is activated to block the muscle and inform an assistive center. Nevertheless, the field of application of the system here presented covers a wide range of AAL applications including fall prevention, rehabilitation (i.e. walk monitoring), artificial limb control and neurodegenerative diseases diagnosis. More information ...
UB09.6	FUNCTIONAL ECO: AN EFFICIENT REWIRING ENHANCED FUNCTIONAL ECO Presenter: Tak Kei Lam, The Chinese University of Hong Kong, HK Authors: Xing Wei¹, Yi Diao¹, Tak Kei Lam² and Yu-Liang Wu¹ ¹Easy-Logic Technology Limited, HK; ²The Chinese University of Hong Kong, HK Abstract Circuit designs have been much more complex nowadays. Bugs and/or specification changes often happen in late design cycles. Running the whole design cycle again is time consuming and costly. Functional engineering change order (ECO), which is the process that patches an old implementation to accomplish a new specification, is therefore performed instead to save time and cost. In an ECO effort, minimizing the patch size is crucial since it gives a higher chance of successful insertion and minimal perturbation to a near or completely committed EDA outcome (e.g. satisfaction on area and timing constraints). However, an ECO work can be very difficult at this stage as the combinational signals of the old specification may have vanished after iterations of synthesis and optimizations. We implemented a practical prototype for functional ECO. Our result outperforms all results publicized in the ICCAD 2012 Contest. More information ...
UB09.7	3D-COSTAR: USING 3D-COSTAR FOR 2.5D-/3D-SIC COST ANALYSIS Presenter: Mottaqiallah Taouil, TU Delft, NL Authors: Mottaqiallah Taouil¹, Said Hamdioui¹ and Erik Jan Marinissen² ¹TU Delft, NL; ²IMEC, BE Abstract Selecting an appropriate and efficient test flow for a 2.5D/3D Stacked IC (2.5D-SIC/3D-SIC) is crucial for overall cost optimization. In this demonstration, we present 3D-COSTAR, a tool that considers costs involved in the whole 2.5D/3D-SIC chain, including design, manufacturing, test, packaging and logistics, e.g. related to shipping wafers between a foundry and a test house; and provides the estimated overall cost for 2.5D/3D-SICs and its cost breakdown for a given input parameter set, e.g., test flows, die yield and stack yield. Several case studies will be presented in which the overall cost and product quality (in defective parts per million) are analyzed. More information ...
UB09.8	WORKCRAFT: FRAMEWORK FOR INTERPRETED GRAPHS Presenter: Danil Sokolov, Newcastle University, GB Abstract Workcraft is a cross-platform framework for capture, simulation, synthesis and verification of graph models. It supports a wide range of popular graph formalisms and provides a plugin-based framework for modelling and analysis of new model types. More information ...
UB09.9	CRYPTOCHIP: DEMONSTRATION OF CRYPTOGRAPHIC ASIC PROTOTYPE Presenter: Xuan Thuy Ngo, Télécom ParisTech, FR Authors: Xuan Thuy Ngo, Jean-Luc Danger, Sylvain Guilley, Tarik Graba, Yves Mathieu and Zakaria Najm, Télécom ParisTech, FR Abstract We want to demonstrate a cryptographic ASIC implemented in ST 65nm technology. It features the following IPs: - Open Loop True Random Number Generator (TRNG). - Loop Physical Unclonable Function (PUF). - SRAM PUF. - Secure Clock. - Digital Sensor. - Advanced Encryption Standard (AES) with Piret-Trojan. - Active Shield. The demo consists in presenting the functionality and the security level of some of those IPs. More information ...
12:00	End of session
12:30	Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.1 SPECIAL DAY Hot Topic: Wearable Medical Applications

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Salle Oisans

Organiser:
Jo De Boeck, IMEC, BE

Chair:
Renzo Dal Molin, Sorin Group, FR

Co-Chair:
Chris Van Hoof, IMEC, BE

Wearable devices are hot. This session will treat opportunities in technology and application for devices that assist in prevention and monitoring in selected cases.

Time	Label	Presentation Title Authors
11:00	10.1.1	MOBILE HEALTH MONITORING: ADOPTION AND SYSTEM CHALLENGES Speaker: David Shanes, BioTelemetry, Inc., US Abstract Biotelemetry, formerly Cardionet, is leader in the advancement of mobile health monitoring by providing innovative products and services to help healthcare professionals track and diagnose patients in a more efficient, accurate, and cost-effective manner. This presentation will indicate the challenges in the adoption of innovative medical device systems including how to excite and capture customers. Also, it will address medical application considerations in electronic systems design where a unique system of requirements needs to be met including factors such as availability, dependability, ruggedness, operational environments, testing and documentation.
11:30	10.1.2	WEARABLE DEVICE FOR PHYSICAL AND EMOTIONAL HEALTH MONITORING Speaker: Srinivasan Murali, SmartCardia, CH Abstract Personal health monitoring systems are emerging as promising solutions to tackle health-care costs and delivery. There is a growing interest within the medical community in developing ultra-small, portable devices that can continuously monitor and process several vital body parameters, such as the Electrocardiogram (ECG) and breathing. In this talk, we will present a wearable device for physical and emotional health monitoring. The device obtains user's key physiological signals: ECG, breathing and skin conductance and derives the user's emotion states as well. We will present case studies on how the technology can improve physical and emotional health of users and the key challenges encountered during the design process.
12:00	10.1.3	GAIT ANALYSIS FOR FALL PREDICTION USING HIERARCHICAL TEXTILE-BASED CAPACITIVE SENSOR ARRAYS Speaker: Rebecca Baldwin, University of Maryland, Baltimore County, US Authors: Rebecca Baldwin, Stanislav Bobovych, Ryan Robucci, Nilanjan Banerjee and Chintan Patel, University of Maryland, Baltimore County, US Abstract Falls are a major cause of injuries in adults above the age of sixty-five. The economic aftermath of falls and their consequent hospitalization can be huge, totally more than 30 billion dollars in 2010 alone. A plausible way of mitigating this problem is accurate prediction of future falls and taking proactive remedial action. Spatio-temporal variation in gait is a reliable indicator of a future fall, however, existing systems focus on gait analysis in clinical settings and are not tuned towards continuous gait analysis. In this paper, we present the design of a novel textile capacitive sensor array-based system built into clothing that can reliably capture spatio-temporal gait attributes in a home setting. A key novel research contribution of our work is a context-aware hierarchical signal processing architecture that breaks down the signal processing algorithm into a hierarchy of processing elements. The lower power processing components perform generic feature extraction using observations derived from the capacitor plates, while the higher-level processors aggregate features to infer gait attributes such as stride speed and inter-leg spacing. The system activates the higher power processing elements only when it detects walking. We have prototyped our system using textile capacitive plates built into an ace-bandage and a custom FPGA-based system and show that our system can accurately detect gait attributes that have high correlation with falls, while consuming minimal energy as estimated for a multi-clock-domain 180-nm IC. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.2 Emerging Memory Architectures

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Belle Etoile

Chair:
Luca Perniola, CEA-Leti, FR

Co-Chair:
Pierre-Emmanuel Gaillardon, École Polytechnique Fédérale de Lausanne (EPFL), CH

Memories are of utmost importance in modern electronic systems. Emerging memory technologies hold a lot of promise to further integration density and performance levels, while reducing energy consumption. In this session, the first two papers introduce innovative solutions for better control of the endurance limitations of novel memories, while the last two papers investigate the gains in performance metrics from a system-level perspective.

Time	Label	Presentation Title Authors
11:00	10.2.1	HRERAM: A HYBRID RECONFIGURABLE RESISTIVE RANDOM-ACCESS MEMORY Speakers: Miguel Angel Lastras-Montaño, Amirali Ghofrani and Kwang-Ting Cheng, UC Santa Barbara, US Abstract Passive crossbar arrays of memristors have been identified as excellent alternatives for future random-access memories. One limitation is their inability of selecting a memory cell without the interference caused by the sneak-path currents from other partially selected cells, as it results not only in unnecessary waste of energy but also in larger current requirements. The complementary resistive switch (CRS), consisting in two anti-serially connected memristors, is considered a potential solution to the sneak-path problem. However, the destructive read operation and reduced endurance of the CRS render it unattractive for the otherwise excellent candidate for next-generation crossbar-based non-volatile memories. In this paper we explore the feasibility and tradeoffs of configuring part of the CRS memory into a memristive mode to mitigate these limitations. The inherent locality of memory accesses for most computer programs offers an opportunity for designing a cache-like adaptive CRS-based crossbar memory with hybrid configurations of CRS and memristive modes, enabling optimization for both endurance and energy consumption. Our simulation results validate that the proposed hybrid system achieves 1.5-7x reduction in energy consumption in comparison with a memristive-only memory system and significantly improves the endurance of the CRS-based memory. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	10.2.2	NCODE: LIMITING HARMFUL WRITES TO EMERGING MOBILE NVRAM THROUGH CODE SWAPPING Speakers: Kan Zhong, Duo Liu, Linbo Long, Xiao Zhu, Weichen Liu, Qingfeng Zhuge and Edwin Sha, Chongqing University, CN Abstract Mobile applications are becoming more and more powerful but also dependent on large main memories, which consume a large portion of system energy. Swapping to byte-addressable, non-volatile memory (NVRAM) is a promising solution to this problem. However, most NVRAMs have limited write endurance. To make it practical, the design of an NVRAM based swapping system must also consider endurance. In this paper, we target at prolonging the lifetime of NVRAM based swap area in mobile devices. Different form traditional wisdom, such as wear leveling and hot/cold data identification, we propose to build a system called nCode, which exploits the fact that code pages are easy to identify, read-only, and therefore a perfect candidate for swapping. Utilizing NVRAM's byte-addressability, we support execute-in-place (XIP) of the code pages in the swap area, without copying them back to DRAM based main memory. Experimental results based on the Google Nexus 5 smartphone show that nCode can effectively prolong the lifetime of NVRAM under various workloads. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	10.2.3	SYSTEM LEVEL EXPLORATION OF A STT-MRAM BASED LEVEL 1 DATA-CACHE Speakers: Manu Komalan¹, Jose Ignacio Gomez², Christian Tenllado², Francisco Tirado Fernandez² and Francky Catthoor³ ¹imec, UCM(Universidad Complutense de Madrid), ES; ²Universidad Complutense de Madrid, ES; ³IMEC, BE Abstract Since Non-Volatile Memory (NVM) technologies are being explored extensively nowadays as viable replacements for SRAM based memories in LLCs and even L2 caches, we try to take stock of their potential as level 1 (L1) data caches. These NVMs like Spin Torque Transfer RAM(STT-MRAM), Resistive-RAM(ReRAM) and Phase Change RAM (PRAM) are not subject to leakage problems with technology scaling. They also show significant area gains and lower dynamic power consumption. A direct drop-in replacement of SRAM by NVMs is, however, still not feasible due to a number of shortcomings, with latency (write or read) and/or endurance/reliability among them being the major issues. STT-MRAM is increasingly becoming the NVM of choice for high performance and general purpose embedded platforms due to characteristics like low access latency, low power and long lifetime. With advancements in cell technology, and taking into account the stringent reliability and performance requirements for advanced technology nodes, the major bottleneck to the use of STT-MRAM in high level caches has become read latency (instead of write latency as previously believed). The main focus of this paper is the exploration of read penalty issues in a NVM based L1 Data cache (D-cache) for an ARM like single core general purpose system. We propose a design method for the STT-MRAM based D-cache in such a platform. This design addresses the adverse effects due to the STT-MRAM read penalty issues by means of micro-architectural modifications along with code transformations. According to our simulations, the proposed modifications can effectively reduce the performance penalty introduced by the NVM (initially ~54%) to extremely tolerable levels (~8%). Download Paper (PDF; Only available from the DATE venue WiFi)
12:15	10.2.4	HIGH PERFORMANCE AXI-4.0 BASED INTERCONNECT FOR EXTENSIBLE SMART MEMORY CUBES Speakers: Erfan Azarkhish¹, Igor Loi¹, Davide Rossi¹ and Luca Benini² ¹Università di Bologna, IT; ²Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract The recent technological breakthrough represented by the Hybrid Memory Cube is on its way to improve bandwidth, power consumption, and density. This is while heterogeneous 3D integration has provided another opportunity for revisiting near memory computation to fill the gap between the processors and memories even further. In this paper, we take the first step towards a "Smart Memory Cube (SMC)", a fully backward compatible and modular extension to the standard HMC, supporting near memory computation on its Logic Base (LoB), through a high performance interconnect designed for this purpose. The main feature of SMC is the high bandwidth, low latency, and AXI-4.0 compatible interconnect. This interconnect is designed to serve the huge bandwidth demand by HMC's serial links, and to provide extra bandwidth to a processor-in-memory (PIM) embedded in the Logic Base (LoB). Our results obtained from cycle accurate simulation demonstrate that the interconnect can easily meet the demands of current and future projections of HMC (Up to 87GB/s READ bandwidth with 4 serial links and 16 memory vaults, and 175GB/s with 8 serial links and 32 memory vaults, for injected random traffic). Moreover, the interference between the PIM traffic and the main links was found to be negligible with execution time increase of less than 5%, and average memory access time increase of less than 15% when 56GB/s bandwidth is requested by the main links and 15GB/s bandwidth is delivered to the PIM port. Moreover, preliminary logic synthesis with Synopsys Design Compiler confirms that our interconnect is implementable and realistic. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP5-1, 673	TOWARDS SYSTEMATIC DESIGN OF 3D PNML LAYOUTS Speakers: Robert Perricone¹, Yining Zhu², Katherine Sanders¹, X. Sharon Hu¹ and Michael Niemier¹ ¹University of Notre Dame, US; ²Zhejiang University, CN Abstract Nanomagnetic logic (NML) is a ``beyond-CMOS'' technology that uses bistable magnets to store, process, and move binary information. Compared to CMOS, NML has several advantages such as non-volatility, lower power consumption, and radiation hardness. Recently, NML devices with perpendicular magnetic anisotropy (pNML) have been experimentally demonstrated to perform logic operations in three dimensions. 3D pNML layouts provide additional benefits such as simplified signal routing and greater integration density. However, designing functional 3D pNML circuits can be challenging as one must consider the effects of fringing magnetic fields in three dimensions. Furthermore, the current process of designing 3D pNML layouts is little more than a trial-and-error-based approach, which is infeasible for larger, more complex designs. In this paper, we propose a systematic approach to designing 3D pNML layouts. Our design process leverages a machine learning-inspired prediction approach that examines the effects of varying individual device parameters (e.g., length, width, etc.) and predicts functional configurations. Download Paper (PDF; Only available from the DATE venue WiFi)
12:31	IP5-2, 733	DESTINY: A TOOL FOR MODELING EMERGING 3D NVM AND EDRAM CACHES Speakers: Matt Poremba¹, Sparsh Mittal², Dong Li², Jeffrey Vetter³ and Yuan Xie⁴ ¹Pennsylvania State University, US; ²Oak Ridge National Lab, US; ³Oak Ridge National Lab and Georgia Institute of Technology, US; ⁴University of California, Santa Barbara, US Abstract The continuous drive for performance has pushed the researchers to explore novel memory technologies (e.g. non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) in the design of caches. However, a comprehensive tool which models both conventional and emerging memory technologies for both 2D and 3D designs has been lacking. We present DESTINY, a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM). DESTINY facilitates design-space exploration across several dimensions, such as optimizing for a target (e.g. latency or area) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e. 2D v/s 3D) for a desired optimization target etc. DESTINY has been validated against industrial cache prototypes. We believe that DESTINY will drive architecture and system-level studies and will be useful for researchers and designers. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.3 Modern Architectures for Real-Time Systems

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Stendhal

Chair:
Benny Akesson, Czech Technical University in Prague, CZ

Co-Chair:
Rodolfo Pellizzoni, University of Waterloo, CA

Introducing modern architectures, such as multicore platforms, in real-time systems is challenging. The papers in this session make a contribution in this direction by discussing new scheduling techniques for parallel real-time tasks, multicore architectures, and mixed-criticality systems.

Time	Label	Presentation Title Authors
11:00	10.3.1	THE FEDERATED SCHEDULING OF CONSTRAINED-DEADLINE SPORADIC DAG TASK SYSTEMS Speaker: Sanjoy Baruah, The University of North Carolina at Chapel Hill, US Abstract In the federated approach to multiprocessor scheduling, a task is either restricted to execute upon a single processor (as in partitioned scheduling), or has exclusive access to any processor upon which it may execute. Earlier studies concerning the federated scheduling of task systems represented using the sporadic DAG model were restricted to implicit-deadline task systems; the research reported here extends this study to the consideration of task systems represented using the more general {em constrained/}-deadline sporadic DAG model. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	10.3.2	RUN AND BE SAFE: MIXED-CRITICALITY SCHEDULING WITH TEMPORARY PROCESSOR SPEEDUP Speakers: Pengcheng Huang, Pratyush Kumar, Georgia Giannopoulou and Lothar Thiele, Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract Mixed-Criticality systems are arising due to the push from several major industries including avionics and automotive, where functionalities with different safety criticality levels are integrated into a modern computing platform to reduce size, weight and energy. The state-of-the-art research has focused on protecting critical tasks under the threat of task overrun, which is achieved by killing less critical tasks or degrading their services to free system resources to guarantee critical tasks. We take in this paper a different approach to protecting critical tasks by embracing rich features of modern computing platforms. In particular, we explore dynamic processor speedup to aid the scheduling of mixed-criticality systems. We show that speedup in situation of overrun can not only help to protect the timeliness of critical tasks, but also to improve the degraded services for less critical tasks. Furthermore, we show that speedup is even more attractive as it can help the system to recover faster to normal operation. Thus, speedup could only be temporarily required and incur low cost. The proposed techniques are validated by both theoretical analysis and experimental results with an industrial flight management system and extensive simulations. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	10.3.3	MULTI-CORE FIXED-PRIORITY SCHEDULING OF REAL-TIME TASKS WITH STATISTICAL DEADLINE GUARANTEE Speakers: Tianyi Wang¹, Linwei Niu², Shaolei Ren¹ and Gang Quan¹ ¹Florida International University, US; ²West Virginia State University, US Abstract The rising performance variance of IC chips and increased resource sharing in multi-core platforms have significantly degraded the predictability of real-time systems. The traditional deterministic approaches can be extremely pessimistic, if not feasible at all. In this paper, we adopt a probabilistic approach for fixed-priority preemptive scheduling of real-time tasks on multi-core platforms with statistical deadline miss ratio guarantee. Rather than a single-valued worst-case execution time (WCET), we formulate the task execution time as a probabilistic distribution. We develop a novel algorithm to partition real-time tasks on multiple homogenous cores, which takes not only task execution time distributions but their period relationships into considerations. Our extensive experimental results show that our proposed methods can greatly improve the schedulability of real-time tasks when compared with the traditional bin packing approaches. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.4 Energy Aware Data Center: Design and Management

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Chartreuse

Chair:
Andrea Bartolini, Università di Bologna, IT / ETH Zürich, CH

Co-Chair:
Andreas Burg, École Polytechnique Fédérale de Lausanne (EPFL), CH

The session covers various topics in improving data center energy efficiency, from hardware acceleration, scheduling to cooling.

Time	Label	Presentation Title Authors
11:00	10.4.1	MEMORY FAST-FORWARD: A LOW COST SPECIAL FUNCTION UNIT TO ENHANCE ENERGY EFFICIENCY IN GPU FOR BIG DATA PROCESSING Speakers: Eunhyeok Park¹, Junwhan Ahn², Sungpack Hong³, Sungjoo Yoo¹ and Sunggu Lee¹ ¹POSTECH, KR; ²SNU, KR; ³Oracle, US Abstract Energy efficiency in big data processing is one of key issues in servers. Big data processing, e.g., graph computation and MapReduce, is characterized by massive parallelism in computation and a large amount of fine-grained random memory accesses often with structural localities due to graph-like data dependency. Recently, GPU is gaining more and more attention for servers due to its capability of parallel computation. However, the current GPU architecture is not well suited to big data workload due to the limited capability of handling a large number of memory requests. In this paper, we present a special function unit, called memory fast-forward (MFF) unit, to address this problem. Our proposed MFF unit provides two key functions. First, it supports pointer chasing which enables computation threads to issue as many memory requests as possible to increase the potential of coalescing memory requests. Second, it coalesces memory requests bound for the same cache block, often due to structural locality, thereby reducing memory traffics. Both pointer chasing and memory request coalescing contribute to reducing memory stall time as well as improving the real utilization of memory bandwidth, by removing duplicate memory traffics, thereby improving performance and energy efficiency. Our experiments with four graph computation algorithms and real graphs show that the proposed MFF unit can improve the energy efficiency of GPU in graph computation by average 54.6% at a negligible area cost. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	10.4.2	POWER MINIMIZATION FOR DATA CENTER WITH GUARANTEED QOS Speakers: Shuo Liu¹, Soamar Homsi¹, Ming Fan¹, Shaolei Ren¹, Gang Quan¹ and Shangping Ren² ¹Florida International University, US; ²Illinois Institute of Technology, US Abstract Data centers have been widely employed to offer reliable and agile on-demand web services. However, the dramatic increase of the operational cost, largely due to the power consumptions, has posed a significant challenge to the service providers as services expand in both scale and scope. In this paper, we study the problem of how to improve resource utilization and minimize power consumption in a data center with guaranteed quality-of-service (QoS). Different from a common approach that separates requests with different QoS levels on different servers, we devise an approach to pack requests of the same service --- even with different QoS requirements --- into the same server to improve resource usage. We also develop a novel method to improve the system utilization without compromising the QoS levels by removing potential failure requests. Experimental results show superiority of our approach over other widely applied approaches. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	10.4.3	ENERGY-AWARE COOLING FOR HOT-WATER COOLED SUPERCOMPUTERS Speakers: Christian Conficoni¹, Andrea Bartolini², Andrea Tilli¹, Gianpietro Tecchiolli³ and Luca Benini⁴ ¹Università di Bologna, IT; ²Università di Bologna, IT / ETH Zürich, CH; ³Eurotech, IT; ⁴Università di Bologna / ETH Zürich, IT Abstract Hot-water liquid cooling is a key technology in future green supercomputers as it maximizes the cooling efficiency and energy reuse. However the cooling system still is responsible for a significant percentage of modern HPC power consumption. Standard design of liquid-cooling control relies on rules based on worst-case scenarios, or on CFD simulation of portion of the entire system, which cannot account for all the real supercomputer working conditions (workload and ambient temperature). In this work we first introduce an analytical model, based on lumped parameters, which can effectively describe the cooling components and dynamics, and can be used for analysis and control purposes. We then use it to design an energy-optimal control strategy which is capable to minimize the pump and chiller power consumption while, meeting the supercomputer cooling requirements. We validate the method with simulation tests, taking data from a real HPC cooling mechanism, and comparing the results with state-of-the-art commercial cooling system control strategies. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP5-3, 786	BIG-DATA STREAMING APPLICATIONS SCHEDULING WITH ONLINE LEARNING AND CONCEPT DRIFT DETECTION Speakers: Karim Kanoun¹ and Mihaela van der Schaar² ¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²University of California, Los Angeles, US Abstract Several techniques have been proposed to adapt Big-Data streaming applications to resource constraints. These techniques are mostly implemented at the application layer and make simplistic assumptions about the system resources and they are often agnostic to the system capabilities. Moreover, they often assume that the data streams characteristics and their processing needs are stationary, which is not true in practice. In fact, data streams are highly dynamic and may also experience concept drift, thereby requiring continuous online adaptation of the throughput and quality to each processing task. Hence, existing solutions for Big-Data streaming applications are often too conservative or too aggressive. To address these limitations, we propose an online energy-efficient scheduler which maximizes the QoS (i.e., throughput and output quality) of Big-Data streaming applications under energy and resources constraints. Our scheduler uses online adaptive reinforcement learning techniques and requires no offline information. Moreover, our scheduler is able to detect concept drifts and to smoothly adapt the scheduling strategy. Our experiments realized on a chain of tasks modeling real-life streaming application demonstrate that our scheduler is able to learn the scheduling policy and to adapt it such that it maximizes the targeted QoS given energy constraint as the Big-Data characteristics are dynamically changing. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.5 Reconfigurable Architectures and Applications

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Meije

Chair:
Christian Plessl, University of Paderborn, DE

Co-Chair:
Enno Lübbers, Intel Labs Europe, DE

Reconfigurable computing has vast potential for enhancing the performance of applications especially when using architectural optimizations. This session has two papers that focus on architectural enhancements while the third demonstrates a hardware accelerated bioinformatics application

Time	Label	Presentation Title Authors
11:00	10.5.1	HYBRID ADAPTIVE CLOCK MANAGEMENT FOR FPGA PROCESSOR ACCELERATION Speakers: Alexandru Gheolbanoiu¹, Lucian Petrica¹ and Sorin Cotofana² ¹University POLITEHNICA of Bucharest, RO; ²Delft University of Technology, NL Abstract As FPGAs speed, power efficiency, and logic capacity are increasing, so does the number of applications which make use of FPGA processors. However, due to placement and routing constraints, FPGA processors instruction delay balancing is a real challenge, especially when the implementation approaches the FPGA resource capacity. Consequently, even though some instructions can operate at high frequencies, the slow instructions determine the processor clock period, resulting in the underutilisation of the processor potential. However, the fast instructions latent performance may be harnessed through Adaptive Clock Management (ACM), i.e., by dynamically adapting the clock frequency such that each instruction gets sufficient time for correct completion. Up to date, ACM augmented FPGA processors have been proposed based on Clock Multiplexing (CM), but they suffer from long clock switching delays, which could nullify most of the ACM potential performance gain. This paper proposes an effective FPGA tailored clock manipulation approach able to leverage the ACM potential. We first evaluate Clock Stretching (CS), i.e., the temporary clock period augmentation, as a CM alternative in FPGA processor designs and introduce an FPGA specific CS circuit implementation. Subsequently, we evaluate the advantages and drawbacks of the two techniques and propose a Hybrid ACM, which monitors the processor instruction stream and determines the optimal adaptive clocking strategy in order to provide the maximum speedup for the executing program. Given that CS has very low latency at the expense of limited accuracy and dynamic range we rely on it when the program requires frequent clock period changes. Otherwise we utilise CM, which is rather slow but enables the FPGA processor operation at the edge of its hardware capabilities. We evaluate our proposal on a vector processor mapped on a Xilinx Zynq FPGA. Our experiments indicate that on Sum of Squared Differences algorithm, Neural network, and FIR filter execution traces the hybrid ACM provides up to $14$\% performance increase over the CM based ACM. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	10.5.2	A SCALABLE AND HIGH-DENSITY FPGA ARCHITECTURE WITH MULTI-LEVEL PHASE CHANGE MEMORY Speakers: Chunan Wei, Ashutosh Dhar and Deming Chen, University of Illinois, Urbana-Champaign, US Abstract As CMOS technology is stretched to its limits it has become imperative to look to alternative solutions for the next generation of FPGAs. In particular, due to the configurable nature of FPGAs, on-chip memory remains to be a major concern for designers. In this work we explore the use of Phase-Change Memory (PCM). We exploit the ability of PCM to exist in multiple intermediate states to store 2 bits per cell and develop a new Look Up Table (LUT) architecture. The new LUT can either store two functions with shared inputs or a single function with an additional input. We also explore the use of PCM in local routing mechanisms and thus propose a new Configurable Logic Block (CLB) composed of CMOS and PCM. The new design promises significant improvements in logic density and performance with area improvements of over 40% for all LUT sizes and delay improvements of 7% to 13% on an average for LUTs of size 10 to 6 . Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	10.5.3	FPGA ACCELERATED DNA ERROR CORRECTION Speaker: Ashutosh Dhar, University of Illinois at Urbana-Champaign, US Authors: Anand Ramachandran, Yun Heo, Wen-mei Hwu, Jian Ma and Deming Chen, University of Illinois at Urbana-Champaign, US Abstract Correcting errors in DNA sequencing data is an important process that can improve the quality of downstream analysis using the data. Even though many error-correction methods have been proposed for Illumina reads, their throughput is not high enough to process data from large genomes. The current paper describes the first FPGA-based error-correction tool, called FPGA Accelerated DNA Error Correction (FADE), which targets to improve the throughput of DNA error correction for Illumina reads. The base algorithm of FADE is BLESS that is highly accurate but slow. A Bloom filter that is the main data structure of BLESS and BLESS' error correction subroutines for different types of errors have been implemented on a FPGA. We compared our design with the software version of BLESS using DNA sequencing data generated from four genomes and we could achieve up to 43 times speedup for the best case, and 36 times speedup on the average. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP5-4, 1007	(Best Paper Award Candidate) DESIGN FLOW AND RUN-TIME MANAGEMENT FOR COMPRESSED FPGA CONFIGURATIONS Speakers: Christophe Huriaux¹, Antoine Courtay¹ and Olivier Sentieys² ¹University of Rennes 1 - IRISA, FR; ²INRIA, FR Abstract The aim of partially and dynamically reconfigurable hardware is to provide an increased flexibility through the load of multiple applications on the same reconfigurable fabric at the same time. However, a configuration bit-stream loaded at runtime should be created offline for each task of the application. Moreover, modern applications use a lot of specialized hardware blocks to perform complex operations, which tends to cancel the "single bit-stream for a single application" paradigm, as the logic content for different locations of the reconfigurable fabric may be different. In this paper we propose a design flow for generating compressed configuration bit-streams abstracted from their final position on the logic fabric. Those configurations will then be decoded and finalized in real-time and at run-time by a dedicated reconfiguration controller to be placed at a given physical location. Our experiments show that densely routed applications gain the most with a compression factor of more than 2× using the finest cluster size, but coarser coding can be implemented to achieve a compression factor up to 10×. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.6 Circuit Design and Test: From Characterization to Measurement

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Bayard

Chair:
Salvador Mir, TIMA/CNRS, FR

Co-Chair:
Christoph Grimm, University of Kaiserslautern, DE

This session covers eye-diagram analysis for high-speed circuits, statistical digital library characterization, analog test ordering in the context of muti-site testing, and estimation of defect detection probability of analog test.

Time	Label	Presentation Title Authors
11:00	10.6.1	(Best Paper Award Candidate) FAST EYE DIAGRAM ANALYSIS FOR HIGH-SPEED CMOS CIRCUITS Speakers: Seyed Nematollah Ahmadyan¹, Chenjie Gu², Suriyaprakash Natarajan², Eli Chiprout² and Shobha Vasudevan¹ ¹University of Illinois at Urbana-Champaign, US; ²Intel, US Abstract We present an efficient technique for analyzing eye diagrams of high speed CMOS circuits in the presence of non-idealities like noise and jitter. Our method involves geometric manipulations of the eye diagram topology to find area within the eye contours. We introduce random tree based simulations as an approach to computing the desired area. We typically show $20X$ speedup in generating the eye diagram as compared to the state-of-the-art Monte Carlo simulation based eye diagram analysis. For the same number of samples, Monte Carlo produces an eye diagram that is $8.51\%$ smaller than the ideal eye diagram. We generate an eye diagram that is $53.52\%$ smaller than the ideal eye, showing a $47\%$ improvement in quality. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	10.6.2	STATISTICAL LIBRARY CHARACTERIZATION USING BELIEF PROPAGATION ACROSS MULTIPLE TECHNOLOGY NODES Speakers: Li Yu¹, Sharad Saxena², Christopher Hess², Ibrahim Elfadel³, Dimitri Antoniadis¹ and Duane Boning¹ ¹Massachusetts Institute of Technology, US; ²PDF Solutions, Inc, US; ³Masdar Institute of Science and Technology, AE Abstract In this paper, we propose a novel flow to enable computationally efficient statistical characterization of standard cell libraries. The distinguishing feature of the proposed method is the usage of a limited combination of output capacitance, input slew rate and supply voltage for the extraction of statistical timing metrics of an individual logic gate. The efficiency of the proposed flow stems from the introduction of a novel, ultra-compact, nonlinear, analytical timing model, having only four universal regression parameters. This novel model facilitates the use of maximum-a-posteriori belief propagation to learn the prior parameter distribution for the parameters of the target technology from past characterizations of library cells belonging to various other technologies, including older ones. The framework then utilizes Bayesian inference to extract the new timing model parameters using an ultra-small set of additional timing measurements from the target technology. The proposed method is validated and benchmarked on several production-level cell libraries including a state-of-the-art 14-nm technology node and a variation-aware, compact transistor model. For the same accuracy as the conventional lookup-table approach, this new method achieves at least 15x reduction in simulation runs. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	10.6.3	COMBINING ADAPTIVE ALTERNATE TEST AND MULTI-SITE Speaker: Gildas Leger, Instituto de Microelectronica de Sevilla, IMSE-CNM, (CSIC - Universidad de Sevilla), ES Abstract Testing analog, mixed-signal and RF circuits represents one of the main cost components for complex SoCs. Multisite Testing is widely accepted as a straightforward technique to reduce the effective test time. This paper shows that an adaptive Alternate Test approach can be compatible with a multisite strategy. The proposed solution consists in ordering offline the signatures acquisition sequence and training incremental regression models for each new feature. These models can be used to diagnose the circuit as good, provided that the estimate of the performance is larger than the specification plus a guard-band related to the model error. If all the sites are diagnosed as good, the test program can be halted before completion. This decision is taken on-line and makes this scheme adaptive. We provide an analytical study of the expected test time reduction and of the test escape penalty that is incurred. Results obtained from post-layout MonteCarlo simulations of an LNA demonstrate the validity of the approach and show that significant test time improvements can be obtained, even for large number of sites, whenever the manufacturing yield is sufficiently high. Download Paper (PDF; Only available from the DATE venue WiFi)
12:15	10.6.4	A METHOD FOR THE ESTIMATION OF DEFECT DETECTION PROBABILITY OF ANALOG/RF DEFECT-ORIENTED TESTS Speakers: John Liaperdos¹, Angela Arapoyanni² and Yiorgos Tsiatouhas³ ¹Technological Educational Institute of Peloponnese, Dept of Computer Engineering, GR; ²National and Kapodistrian University of Athens, Dept. of Informatics and Telecommunications, GR; ³University of Ioannina, GR Abstract A method to realistically estimate the defect detection probability achieved by defect-oriented analog/RF integrated circuit tests at the circuit design level is presented in this paper. The proposed method also provides insight to the efficiency of the various available defect-oriented testing techniques, thus allowing the selection of the most suitable for a specific circuit. The effect of structural defects in the presence of process variations and device mismatches is taken into account, by the exploitation of the defect probability distributions and the statistical models of the used technology. Although the proposed methodology is generally applicable to the entire class of analog circuits, its application to simple RF circuits which consist of a few elements seems to be more practical, due to the affordable computational cost implied by circuits with shorter defect dictionaries. In order to obtain results without a reliability compromise, the number of required statistical simulation runs is reduced through regression. The application of the proposed method on a typical RF mixer, designed in a 0.18um CMOS technology, is also presented. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP5-5, 212	EMPIRICAL MODELLING OF FDSOI CMOS INVERTER FOR SIGNAL/POWER INTEGRITY SIMULATION Speakers: Wael Dghais and Jonathan Rodriguez, Instituto de Telecomunicações, PT Abstract This paper presents a multiport empirical model based on artificial neural network for I/O memory interface (e.g. inverter) designed based on fully depleted silicon on isolator (FDSOI) CMOS 28 nm process for signal and power integrity assessments. The analog mixed-signal identification signals that carry the information about the I/O interface's nonlinear dynamic behavior are recorded from large signal simulation setup. The model's functions are extracted based on a nonlinear optimization algorithm and then implemented in Simulink software. The performance of the resulted model is validated in typical power and ground switching noise scenario. The developed empirical model accurately predicts the timing signal waveforms at the power, ground, and at the output port. Download Paper (PDF; Only available from the DATE venue WiFi)
12:31	IP5-6, 1003	ON-CHIP MEASUREMENT OF BANDGAP REFERENCE VOLTAGE USING A SMALL FORM FACTOR VCO BASED ZOOM-IN ADC Speakers: Osman Erol¹, Sule Ozev¹, Chandra K. H. Suresh², Rubin Parekhji³ and Lakshmanan Balasubramanian³ ¹ASU, US; ²NYU-Abu Dhabi, AE; ³TI, IN Abstract A robust and highly scalable technique for measuring the output voltage of a band-gap reference (BGR) circuit is described. The proposed technique is based on an ADC architecture that uses a voltage controlled oscillator (VCO) for voltage to frequency conversion. During production testing, an external voltage reference is used to approximate the voltage/frequency characteristics of the VCO with 5ms test time. The proposed zoom-in ADC approach is manufactured with 0.5um single well CMOS process. Measurement results indicate that 13 bits of resolution within the measurement range can be achieved with the zoom-in approach. Worst-case INL for the ADC is less than 0.25LSB (50V). Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.7 Expanding the Applicability of Formal Methods

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Les Bans

Chair:
Barbara Jobstmann, École Polytechnique Fédérale de Lausanne (EPFL), CH

Co-Chair:
Christoph Scholl, University Freiburg, DE

The first two papers propose improved solutions for error diagnosis and software bounded model checking. The next two papers are devoted to abstraction and synthesis techniques between RTL and high-level models of on-chip communication networks. Then the first IP expands the applicability of equivalence checkers to asynchronous circuits. The last two IPs present emerging applications of model checking.

Time	Label	Presentation Title Authors
11:00	10.7.1	AUTOMATED RECTIFICATION METHODOLOGIES TO FUNCTIONAL STATE-SPACE UNREACHABILITY Speakers: Ryan Berryhill and Andreas Veneris, University of Toronto, CA Abstract In the modern design cycle significant manual resources are dedicated to fix a design when verification shows that a state is not reachable. As traditional debugging typically involves the use of an error trace that exhibits the problem, there is little automation to aid an engineer understand why a state is not reachable and how to correct this problem. This paper presents a novel methodology that automates this task. In detail, a process that involves intertwined steps of state approximation, reachability analysis and traditional debugging is developed to identify design locations where fixes can be applied so the target state becomes reachable. An initial formulation identifies such error locations when the target state is reachable directly from the reachable set of states. This is later extended for the cases where more than one state transition is required to reach an unreachable state from the existing reachable set. Empirical results on industrial level designs show a performance which is orders of magnitude faster than the state-of-the-art confirming the practicality of the proposed automated methodology. Download Paper (PDF; Only available from the DATE venue WiFi)
11:30	10.7.2	OVER-APPROXIMATING LOOPS TO PROVE PROPERTIES USING BOUNDED MODEL CHECKING Speakers: Priyanka Darke, Bharti Chimdyalwar, Venkatesh R, Ulka Shrotri and Ravindra Metta, TCS, IN Abstract Bounded Model Checkers (BMCs) are widely used to detect violations of program properties up to a bounded execution length of the program. However when it comes to proving the properties, BMCs are unable to provide a sound result for programs with loops of large or unknown bounds. To address this limitation, we developed a new loop over-approximation technique LA. LA replaces a given loop in a program with an abstract loop having a smaller known bound by combining the techniques of output abstraction and a novel abstract acceleration, suitably augmented with a new application of induction. The resulting transformed program can then be fed to any bounded model checker to provide a sound proof of the desired properties. We call this approach, of LA followed by BMC, as LABMC. We evaluated the effectiveness of LABMC on some of the SV-COMP14 loop benchmarks, each with a property encoded into it. Well known BMCs failed to prove most of these properties due to loops of large, infinite or unknown bounds while LABMC obtained promising results. We also performed experiments on a real world automotive application on which the well known BMCs were able to prove only one of the 186 array accesses to be within array bounds. LABMC was able to successfully prove 131 of those array accesses to be within array bounds. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	10.7.3	AUTOMATIC EXTRACTION OF MICRO-ARCHITECTURAL MODELS OF COMMUNICATION FABRICS FROM REGISTER TRANSFER LEVEL DESIGNS Speakers: Sebastiaan Joosten and Julien Schmaltz, Eindhoven University of Technology, NL Abstract Multi-core processors and Systems-on-Chips are composed of a large number of processing and memory elements interconnected by complex communication fabrics. These fabrics are large systems made of many queues and distributed control logic. Recent studies have demonstrated that high levels models of these networks are either tractable for verification or can provide key invariants to improve hardware model checkers. Formally verifying Register Transfer Level (RTL) designs of these networks is an important challenge, yet still open. This paper bridges the gap between high level models and RTL designs. We propose an algorithm that from a Verilog description automatically produces its corresponding micro-architectural model. We prove that the extracted model is transfer equivalent to the original RTL circuit. We illustrate our approach on a typical example of communication fabrics: a scoreboard with credit-flow control. Download Paper (PDF; Only available from the DATE venue WiFi)
12:15	10.7.4	GALS SYNTHESIS AND VERIFICATION FOR XMAS MODELS Speakers: Frank Burns, Danil Sokolov and Alex Yakovlev, Newcastle University, GB Abstract In this paper a novel Globally Asynchronous Locally Synchronous~(GALS) synthesis and verification environment is introduced for xMAS models. xMAS models are a new communication paradigm which can be used to model circuits and networks for the purpose of synthesis, testing and verification. Previous attempts at synthesis and verification of xMAS models have been proposed for synchronous implementations only. This paper provides an extension of xMAS and translation into Circuit Petri net models for GALS synthesis and verification. Synthesis techniques based on Circuit Petri net translation are presented and a new xMAS component is introduced which acts as a wrapper for different GALS styles. Novel verification techniques using unfolding to occurrence nets are then proposed. Our results show that the work presented here provides a suitable platform for integrating xMAS into a GALS environment. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	IP5-7, 299	LOGICAL EQUIVALENCE CHECKING OF ASYNCHRONOUS CIRCUITS USING COMMERCIAL TOOLS Speakers: Arash Saifhashemi¹, Hsin-Ho Huang², Priyanka Bhalerao³ and Peter Beerel² ¹Intel, US; ²University of Southern California, US; ³yahoo, US Abstract We propose a method for logical equivalence check (LEC) of asynchronous circuits using commercial synchronous tools. In particular, we verify the equivalence of asynchronous circuits which are modeled at the CSP-level in SystemVerilog as well as circuits modeled at the micro-architectural level using conditional communication library primitives. Our approach is based on a novel three-valued logic model that abstracts the detailed handshaking protocol and is thus agnostic to different gate-level implementations, making it applicable to a variety of different design styles. Our experimental results with commercial LEC tools on a variety of computational blocks and an asynchronous microprocessor demonstrate the applicability and limitations of the proposed approach. Download Paper (PDF; Only available from the DATE venue WiFi)
12:31	IP5-8, 219	MAY-HAPPEN-IN-PARALLEL ANALYSIS OF ELECTRONIC SYSTEM LEVEL MODELS USING UPPAAL MODEL CHECKING Speakers: Che-Wei Chang and Rainer Doemer, University of California Irvine, US Abstract In this paper, we propose an approach for May- Happen-in-Parallel (MHP) analysis of electronic system level (ESL) design which models parallel discrete event simulation with concurrent automaton processes and formally identify those MHP states. Our MHP analysis utilizes formal verification by use of the UPPAAL model checker. The proposed approach converts the system model in SpecC SLDL into an UPPAAL model and generates a set of queries that automatically and completely finds all possible MHP pairs. The experimental results show our approach can report more precise MHP analysis results compared to other works at the cost of extended analysis run time. Download Paper (PDF; Only available from the DATE venue WiFi)
12:32	IP5-9, 583	VERIFYING SYNCHRONOUS REACTIVE SYSTEMS USING LAZY ABSTRACTION Speakers: Kumar Madhukar¹, Mandayam Srivas², Bjorn Wachter³, Daniel Kroening³ and Ravindra Metta¹ ¹Tata Research Development and Design Center, IN; ²Chennai Mathematical Institute, IN; ³University of Oxford, GB Abstract Embedded software systems are frequently modeled as a set of synchronous reactive processes. The transitions performed by the processes are given as sequential, atomic code blocks. Most existing verifiers flatten such programs into a global transition system, to be able to apply off-the-shelf verification methods. However, this monolithic approach fails to exploit the lock-step execution of the processes, severely limiting scalability. We present a novel formal verification technique that analyses synchronous concurrency explicitly rather than encoding it. We present a variant of Lazy Abstraction with Interpolants (LAWI), a technique successfully used in software verification, and tailor it to synchronous reactive concurrency. We exploit the synchronous communication structure by fixing an execution schedule, circumventing the exponential blow-up of state space caused by simulating synchronous behaviour by means of interleavings. The technique is implemented in SYMPARA, a verification tool for synchronous reactive systems. To evaluate the effectiveness of our technique, we compare SYMPARA with Bounded Model Checking and k-induction, and a LAWI-based verifier for multi-threaded (asynchronous) software. On several realistic examples SYMPARA outperforms the other tools by an order of magnitude. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

10.8 From IP to EDA Tools Enterprise Management: What is so special?

Date: Thursday 12 March 2015
Time: 11:00 - 12:30
Location / Room: Salle Lesdiguières

Moderator:
Gabrièle Saucier, Design and Reuse, FR

Panelists:
Huy-Nam Nguyen, Bull S.A.S., FR
Philippe Quinio, STMicroelectronics International, CH

IPs are today part of any Electronic Systems and it is more and more urgent to trace, monitor and more generally "manage" IPs in systems or products. The key feature of such a management is its multidisciplinary facet implying multiple management views and actors (IP Engineering, IP Sourcing, IP Procurement..).

Today an IP management platform needs to be a next generation web application hosted on an intranet server and receiving data from multiple sources (Design DB, IP Delivery DB, Product Shipment DB, Legal and Financial reports..). It aims at providing a reliable follow up to all of these departments such as IP Entry, IP Delivery, IP tracing in products… It delivers results (fee and royalty calculation for instance) as well as expertise for decision making (planning the future in terms of IP expenses, cost per product..).

The introductory talk will show how such a portal can be configured to fulfill the needs of an enterprise and what are the required "special" technical features missing in management tools presently available on the market.

Specific views namely Engineering view and Legal aspects will be commented by 2 speakers from companies veteran in IP management.

It will also be demonstrated that an amazing and straightforward extension concerns EDA Tool license management and optimization including integrated license monitoring. Such an extension aims at optimizing the tool cost for large enterprises using extensively and at a large scale a variety of development tools and gives an unique corporate global view on IP and Tools.

Time	Label	Presentation Title Authors
11:00	10.8.1	IP AND EDA TOOL NEXT GENERATION MANAGEMENT PLATFORM: WHAT ARE THE FEATURES REQUIRED? Panelist: Gabrièle Saucier, Design and Reuse, FR
11:30	10.8.2	BEST PRACTICE: THE IP QUALIFICATION VIEW Panelist: Huy-Nam Nguyen, Bull S.A.S., FR
11:45	10.8.3	BUSINESS AND LEGAL: THE SOURCING RISK Panelist: Philippe Quinio, STMicroelectronics International, CH
12:05	10.8.4	QUESTIONS AND AUDIENCE COMMENTS
12:30		End of session Lunch Break, Keynote lecture from 1320 - 1350 (Room Oisans) in Les Écrins Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

UB10 Session 10

Date: Thursday 12 March 2015
Time: 12:00 - 14:30
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB10.1	COMBINATION OF WSN AND 1ST ORDER KINETIC MODEL FOR REAL-TIME SHELF-LIFE PREDICTION OF PERISHABLE GOODS Presenter: Valerio Francesco Annese, Politecnico di Bari, IT Author: Daniela De Venuto, Politecnico di Bari, IT Abstract A complete and autonomous multi-sensing platform for perishable goods monitoring and shelf-life prediction, based on the combination of the wireless sensor network (WSN) technology and a further real-time data processing, is presented. The proposed approach offers an effective solution for waste and losses reduction in the supply chain of perishable products and, thus, an improvement of food safety, as well as food organoleptic qualities: in fact, we demonstrate the possibility to predict products shelf-life from the environment parameters such as temperature, relative humidity and light exposition in real-time. Although several models for shelf-life prediction have been already developed, none of them was embedded in a complete system supported by the real-time data availability, offered by an "ad hoc" WSN. In our infrastructure, system integration issues are carefully solved: data collected by the WSN are firstly uploaded on a cloud. An appropriate Java application makes these data available to any kind of elaboration. Then, we developed an algorithm that implements a 1st order kinetic model of the quality decay reaction, employed to evaluate remaining shelf-life of the monitored perishable product. The model takes into account the dependence of the degradation rate from the temperature according to Arrhenius law. To validate the platform we have conducted several case studies. Here we propose an 8-days monitoring of a warehouse of vegetable products (fresh tomatoes): the real-time shelf-life prediction was calculate through data coming from six multi-sensing nodes that were monitoring several environmental conditions in which the products were subjected. The implementation of the algorithm in an application for any kind of portable and non-portable devices (just like an iPad, smartphones, etc.) would result in a widespread diffusion of this technology. It is worth to notice that the complete infrastructure is a suitable low-cost and easy to implement solution for monitoring any perishable product (such as beverages, drugs, vaccines, blood, etc.) stored in any environmental condition (warehouse, transportation, store, etc.). More information ...
UB10.2	NETFPGA SUME: MAKING 100GBPS A COMMODITY Presenter: Noa Zilberman, University of Cambridge, GB Authors: Yury Audzevich, Georgina Kalogeridou and Andrew W. Moore, University of Cambridge, GB Abstract The demand-led growth of datacenter networks has meant that many constituent technologies are beyond the budget of the wider community. In order to make and validate timely and relevant new contributions, the wider community requires accessible evaluation, experimentation and demonstration environments with specification comparable to the subsystems of the most massive datacenter networks. We will demonstrate NetFPGA SUME, an open-source FPGA-based PCIe board with I/O capabilities for 100Gbps operation as NIC, multiport switch, firewall, or test/measurement environment. More information ...
UB10.3	LINUX ON TSAR: PORTING THE LINUX KERNEL TO THE TSAR MANYCORE ARCHITECTURE Presenter: César Fuguet Tortolero, UPMC-LIP6, FR Authors: Joël Porquet and Alain Greiner, UPMC-LIP6, FR Abstract In this demonstration, we explain how we ported a Linux-based Operating System to the TSAR manycore architecture. In the associated poster, we describe the TSAR architecture and enumerate the pieces of software that usually need to be ported for a new processor architecture, and we give further details about our port. We also demonstrate this work by running Linux on a FPGA-based prototype of TSAR. The demo shows the entire boot process, from the powerup to the terminal prompt where the user can type in commands and interact with the hardware system. More information ...
UB10.4	DESIGNING AND EVALUATING RESOURCE MANAGEMENT POLICIES FOR HETEROGENEOUS SYSTEM ARCHITECTURES Presenter: Gianluca Durelli, Politecnico di Milano, IT Authors: Cristiana Bolchini, Antonio Miele, Gabriele Pallotta, Marcello Pogliani and Marco Santambrogio, Politecnico di Milano, IT Abstract Current trends in computing architectures are going in the direction of heterogeneous systems (i.e. constituted by CPUs, GPUs, and FPGAs). The design space to effectively exploit these platforms is huge. Within this context, research is moving towards systems able to adapt themselves to a wide range of workloads to optimize performance/energy trade-offs. We propose a virtual platform (VP) to help designers to develop adaptive policies. The VP allows to perform an high-level evaluation of the policies with the possibility to customize both the architecture and the workload mix. More information ...
UB10.5	REAL-TIME PATTERN DETECTION OF MOVEMENT RELATED POTENTIALS BY SYNCHRONIZED EEG AND EMG Presenter: Valerio Francesco Annese, Politecnico di Bari, IT Author: Daniela De Venuto, Politecnico di Bari, IT Abstract Before the conscious intention to perform any voluntary movement, our brain has already activated the action, 1s before the muscle activity actually starts. The brain processes are necessary to determine the performance of the movement itself. The presence of both the premotor potential (also called "Bereitschaftspotential" or "Readiness Potential" in the 2-5 Hz band) and the Mu-rhythm (in the 7-12 Hz) is particularly interesting for the detection of voluntary movement. Therefore, the detection of these movements' related potentials (MRPs), before the EMG activation, indicates the movement intentionality. Due to the presence of artifacts (blinking, eye movement, swallowing etc.) that spoil EEG signals, the real-time detection of MRPs is particularly challenging. In this proposal, we describe a complete wearable system performing synchronous EEG and EMG monitoring to on-line detect MRPs and prevent unintentional and dangerous movements. The Bereitschaftspotential (BP) and Mu-rhythm detection is carried out through a wavelet analysis on differential signals captured 1-second before the EMG activation. This differential approach allows discerning if the recorded EEG activity is related to the motor cortex or if it is just a common artifact. The EEG/EMG monitoring system can face the strict requirements of ambient assisted living application (AAL), taking care of aged and disable people in a domestic environment. Specifically for this application, the system can be configured as following: data from 12 EEG channels are firstly collected in a central unit that wirelessly communicates with the gateway (24 bit resolution - 500 Hz sampling rate), the gateway receives data also from each of the 8 EMG nodes (12-bit resolution, 500 Hz sampling rate). For a comfortably use, a battery life of - at least - 10 hours, have to be implemented. Moreover, a working range of 10 meters (between nodes and gateway) is considered. Above all, the requirement of wearability is achieved by the transfer printing technology, produced using photolithography and dry etch techniques, that allows the creation of wireless, tiny and lightweight electrodes for both EEG and EMG printed on bio-polymers (Polycaprolactone). Since a huge amount of retrieved data is expected, a data rate of 250 kbps (~31 kBps) is needed: a good compromise in terms of power consumption and data rate is achieved through the standard IEEE- 802.15.1 (Bluetooth low energy -BLE). The gateway unit (a smartphone or a tabled) receives the EEG and EMG sensor data and performs signal analysis to identify possible MRPs patterns through wavelet analysis. In this contribute it will be delineated as case study the possible implementation in fall prevention where not only the unwanted muscle movement is detected but also a bio-feedback is activated to block the muscle and inform an assistive center. Nevertheless, the field of application of the system here presented covers a wide range of AAL applications including fall prevention, rehabilitation (i.e. walk monitoring), artificial limb control and neurodegenerative diseases diagnosis. More information ...
UB10.6	REAL-TIME MULTIPROCESSOR COMPILER DEMO: COMPILER FOR REAL-TIME MULTIPROCESSOR SYSTEMS WITH SHARED ACCELERATORS Presenter: Marco Bekooij, University of Twente, NL Authors: Guus Kuiper, Stefan Geuns, Philip Wilmanns, Joost Hausmans and Marco Bekooij, University of Twente, NL Abstract Accelerators are added in real-time multiprocessor systems for power-efficiency improvement and cost reduction. Sharing of these accelerators improves their utilization but without tool support it also complicates programming. This demonstration shows a multiprocessor compiler for a real-time multiprocessor system that contains support for the sharing of hardware accelerators. The capabilities of this compiler are demonstrated by mapping a packet based GMSK receiver application onto this multiprocessor system. The multiprocessor system is implemented on a Xilinx Virtex-6 FPGA to which an RF front-end is connected. This multiprocessor system contains 16 Microblaze processors and 5 accelerators. With this system a real-time digital audio stream is received and demodulated. More information ...
UB10.7	THE Ψ-CHART DESIGN APPROACH IN TTOOL/DIPLODOCUS: A FRAMEWORK FOR HW/SW CO-DESIGN OF DATA-DOMINATED SYSTEMS-ON-CHIP Presenter: Andrea Enrici, Télécom ParisTech, FR Authors: Ludovic Apvrille, Daniel Camara and Renaud Pacalet, Télécom ParisTech, FR Abstract In the scope of the DATE 2015 University Booth, we present our latest achievements for the system level design of parallel and distributed embedded systems. We propose a demonstration of a novel design approach, the Ψ-chart, in TTool/DIPLODOCUS, a UML/SysML framework for the design, validation and automatic code generation for data-dominated SoCs. The Ψ-chart is a design approach where communication patterns are designed with dedicated models, independently of a pair application-architecture, before mapping phase. It allows for a complete orthogonalization of concerns between the design of computations and communications, thus achieving faster Design Space Exploration, complete design portability as well as reduced design times and costs. The subject of our demonstration is the design of the physical layer (PHY) of the transmitter part of the Zigbee wireless standard (IEEE 802.15.4) mapped onto a MPSoC architecture with shared memory. Our demonstration will illustrate the full design of the Zigbee transmitter, from models to the automatic generation of the emulation code, via simulation and formal verification. We will validate our design by comparing the output samples produced by the emulation code, with a real implementation of the transmitter on a FPGA prototyping board. More information ...
UB10.8	INTERACTIVE VISUALIZATION OF ESL DESIGNS Presenter: Jannis Stoppe, University of Bremen, DE Authors: Robert Wille and Rolf Drechsler, University of Bremen/DFKI GmbH, DE Abstract In this work, we propose an improved visualization tool for SystemC which assists a designer in communicating a system's structure and behavior. Please see the uploaded pdf-file for details. More information ...
UB10.9	MAMMA: SPEECH ENHANCEMENT DEMO EXPLOITING MEMS MICROPHONE ARRAY FOR PEOPLE WITH DISABILITIES Presenter: Luca Sarti, University of Pisa, IT Authors: Alessandro Palla¹, Luca Fanucci¹ and Roberto Sannino² ¹University of Pisa, IT; ²STMicroelectronics, IT Abstract Disabled people, especially the ones with motor skill impairments, have difficulties in interaction with electronic devices. Indeed voice recognition could be exploited, but its performance strongly depends by the environmental noise. We propose a wearable speech enhancement system based on MEMS microphone array and an ARM Cortex M4 CPU featuring a beamforming technique and an adaptive acoustic echo cancellation filtering in order to increase SNR of acquired voice stream. An increase by 16.5 dB in the SNR is obtained when noise and voice come from opposite directions. Theoretical analysis and in-system measurements prove the effectiveness of the proposed solution. More information ...
14:30	End of session
15:30	Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.0 SPECIAL DAY Keynote

Date: Thursday 12 March 2015
Time: 13:15 - 14:00
Location / Room: Salle Oisans

Chair:
Jo De Boeck, IMEC, BE

Co-Chair:
David Atienza, École Polytechnique Fédérale de Lausanne (EPFL), CH

Time	Label	Presentation Title Authors
13:15	11.0.1	BEST IP AWARD PRESENTATION Speaker: Oliver Bringmann, University of Tuebingen / FZI, DE
13:20	11.0.2	BIOELECTRONIC MEDICINES - HERALDING IN A NEW THERAPEUTIC APPROACH Speaker: Kristoffer Famm, GSK, GB Abstract Imagine a day when electrical impulses are a mainstay of medical treatment, a day when your doctor will routinely administer microscopic devices that modulate signals in specific nerves for treatment effect. Every organ in our bodies is wired and controlled by nerves, so bioelectronic medicines may be applicable across a broad range of diseases just like molecular medicines are today. Through bioelectronic medicines, GSK, a leading pharmaceutical company, and its extensive network of research collaborators aim to bring the precision and intelligence of electronics right to the core of future treatments.
14:00		End of session
15:30		Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.1 SPECIAL DAY Hot Topic: Implantable Medical Applications

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Salle Oisans

Organiser:
Jo De Boeck, IMEC, BE

Chair:
Refet Firat Yazicioglu, IMEC, BE

Co-Chair:
Jean-Paul Linnartz, Philips, NL

Implantable devices obviously have stringent technical requirements dictated by the specific functionality in the body. This session brings expert views from industry leaders in the field and insight in the challenges for integrated circuits in emerging biomedical devices.

Time	Label	Presentation Title Authors
14:00	11.1.1	ACTIVE IMPLANTABLE MEDICAL DEVICES Speaker: Renzo Dal Molin, Sorin Group, FR Abstract Saving lives with cutting-edge technology is at the heart of active implantable medical devices. The introduced technology innovations are developed targeting both patients and health economic outcomes, they also contribute to a more efficient healthcare.
14:30	11.1.2	TOWARDS NEXT GENERATION DEEP BRAIN STIMULATION Speaker: Michael Decré, Medtronic Eindhoven Design Center, NL Abstract Deep Brain Stimulation (DBS) should soon reach a profound transformation step with the inclusion of high-resolution probes and active sensing. This will allow very precise brain modulation, enable chronic disease monitoring and finally open up closed-loop stimulation research. Driven by therapeutic improvements, Medtronic is actively pursuing its innovation agenda for the future of DBS.
15:00	11.1.3	INTEGRATED CIRCUITS AND MICROSYSTEMS FOR EMERGING BIOMEDICAL DEVICES Speaker: Minkyu Je, DGIST, Daegu Gyeongbuk Institute of Science and Technology, KR Abstract IC technologies and integrated microsystems enable emerging biomedical devices by providing seamless interface to various sensors and actuators, high-efficiency operation with various energy sources (especially, renewable ones), high-level integration and miniaturization, embedded intelligence, and connectivity.
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.2 Variability and Robustness for Emerging Technologies

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Belle Etoile

Chair:
Edith Beigne, CEA-Leti, FR

Co-Chair:
Andy Tyrrell, University of York, GB

Issues relating to smaller device sizing and novel technologies require more consideration of variability when designing systems and the related robustness of such systems. The first paper in this session considers the modelling of resistive switching random access memory and used in a number of designs to illustrate various properties and characteristics of such devices, including speed-power performance, variability and a neuromorphic computing application. The second paper introduces methods for improving the performance of Spin-Torque Transfer RAM (STTRAM) to reduce worst-case write latency and improve power over more global methods. The third paper proposes a joint optimization of the reliability at device circuit and architecture level. The device level is mainly considered through the energy barrier, circuit level through the transistor controlling the writing current and the architecture level through the error code correction scheme complexity. The final paper in the session compare sub-10nm node TFETs against the projected FinFETs of the same node at 0.25V in both inverter chains and in synthesizing a LEON3 processor.

Time	Label	Presentation Title Authors
14:00	11.2.1	VARIATION-AWARE, RELIABILITY-EMPHASIZED DESIGN AND OPTIMIZATION OF RRAM USING SPICE MODEL Speakers: Haitong Li¹, Zizhen Jiang², Peng Huang³, Yi Wu², Hong-Yu Chen², Bin Gao³, Xiaoyan Liu³, Jinfeng Kang³ and H.-S. Philip Wong² ¹Stanford University & Peking University, ; ²Stanford University, ; ³Peking University, Abstract Resistive switching random access memory (RRAM) is a leading candidate for next-generation nonvolatile and storage-class memories and monolithic integration of logic with memory interleaved in multiple layers. To meet the increasing need of device-circuit-system co-design and optimization for applications from digital memory systems to brain-inspired computing systems, a SPICE model of RRAM that can reproduce essential device physics in a circuit simulation environment is required. In this work, we develop an RRAM SPICE model that can capture all the essential device characteristics such as stochastic switching behaviors, multi-level cell, switching voltage variations, and resistance distributions. The model is verified and calibrated by a variety of electrical measurements on ~10 nm RRAMs. The model is applied to explore a wide range of applications including: 1) variation-aware design; 2) reliability-emphasized design; 3) speed-power assessment; 4) array architecture optimization; and 5) neuromorphic computing. This experimentally verified design tool not only enables system design that includes the complete suite of RRAM device features, but also provides solutions for system optimization that capitalize on device/circuit interaction. Download Paper (PDF; Only available from the DATE venue WiFi)
14:30	11.2.2	IMPACT OF PROCESS-VARIATIONS IN STTRAM AND ADAPTIVE BOOSTING FOR ROBUSTNESS Speakers: Seyedhamidreza Motaman, Swaroop Ghosh and Nitin Rathi, University of South Florida, US Abstract Spin-Torque Transfer Random Access Memory (STTRAM) is a promising technology for high density on-chip cache due to low standby power. Additionally, it offers fast access time, good endurance and retention. However, it suffers from poor write latency and write power. Additionally we observe that process variation can result in large spread in write and read latency variations. The performance of conventionally designed STTRAM cache can degrade as much as 10% due to process variations. We propose a novel and adaptive write current boosting to address this issue. The bits experiencing worst-case write latency are fixed through write current boosting. Simulations show 80% power improvement compared to boosting all bit-cells and 13% performance improvement compared to worst case latency due to process variation over a wide range of PARSEC benchmarks. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	11.2.3	DEVICE/CIRCUIT/ARCHITECTURE CO-DESIGN OF RELIABLE STT-MRAM Speakers: Zoha Pajouhi, Xuanyao Fong and Kaushik Roy, Purdue University, US Abstract Spin transfer torque magnetic random access memory (STT-MRAM), using magnetic tunnel junctions (MTJ) has garnered significant attention in the research community due to its immense potential for on-chip, high-density and non-volatile memory. However, process variations may significantly impact the achievable yield in STT-MRAM. To this end, several yield enhancement techniques that improve STT-MRAM failures at the bit-cell, and at the architecture level of design abstraction have been proposed in the literature. However, these techniques may lead to a suboptimal design because they do not consider the impact of design choices at every level of design abstraction. In this paper, we propose a unified device-circuit-architecture co-design framework to optimize and enhance the yield of STT-MRAM. We studied the interaction between device parameters (viz. energy barrier height) and bit-cell level parameters (viz. transistor width), together with different Error Correcting Codes (ECC) to optimize the robustness and energy efficiency of STT-MRAM cache. The advantages of our proposed approach to STT-MRAM design are explored at the 32nm technology node. We show that for a target yield of 500 Defects Per Million (DPM) for an example array with 64-bit word length, our proposed ap-proach with realistic parameters can save up to 15% and 13% in cell area and total power consumption, respectively, in compari-son with a design that does not use any array level yield en-hancement technique. Download Paper (PDF; Only available from the DATE venue WiFi)
15:15	11.2.4	SUB-10 NM FINFETS AND TUNNEL-FETS: FROM DEVICES TO SYSTEMS Speakers: Ankit Sharma, Arun Goud Akkala and Kaushik Roy, Purdue University, US Abstract In this paper, a detailed device/circuit/system level assessment of sub-10nm GaSb-InAs Tunneling Field Effect Transistors (TFET) versus Silicon FinFETs operating at near-threshold voltages is reported. A source underlapped GaSb-InAs TFET is used to achieve lower subthreshold swings than previously reported TFETs and an analytical justification is provided to explain the observed improvement. Through atomistic, 2D ballistic simulations using self-consistently, coupled Non-equilibrium Green's Function (NEGF)-Poisson approach, GaSb-InAs TFET and Silicon FinFET device characteristics are derived from which compact models are extracted for SPICE simulations. Circuit simulations of a 6-stage inverter chain show that sub-10nm underlapped TFETs are especially suited for near-threshold computing because of their ability to achieve higher throughput while consuming ~100x lower power compared to Si FinFETs. To analyze the suitability of sub-10 nm TFETs for medium-throughput and ultra-low power applications in future very large scale integrated designs, a LEON3 processor is synthesized at VDD=0.25V. The impact of interconnect parasitics on the performance of TFETs is considered by studying the power-performance of the LEON3 under varying wire-load conditions. Under moderate interconnect parasitics, TFETs-based processor is shown to exhibit more than 50% power reduction compared to FinFETs. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	IP5-10, 460	SPINTASTIC: SPIN-BASED STOCHASTIC LOGIC FOR ENERGY-EFFICIENT COMPUTING Speakers: Rangharajan Venkatesan¹, Swagath Venkataramani², Xuanyao Fong², Kaushik Roy² and Anand Raghunathan² ¹NVIDIA Corporation, US; ²Purdue University, US Abstract Spintronics is one of the leading technologies under consideration for the post-CMOS era. While spintronic memories have demonstrated great promise due to their density, non-volatility and low leakage, efforts to realize spintronic logic have been much less fruitful. Recent studies project the performance and energy efficiency of spintronic logic to be considerably inferior to CMOS. In this work, we explore Stochastic Computing (SC) as a new direction for the realization of energy-efficient logic using spintronic devices. We establish the synergy between stochastic computing and spintronics by demonstrating that (i) the peripheral circuits required for SC to convert to/from stochastic domains, which incur significant energy overheads in CMOS, can be efficiently realized by exploiting the characteristics of spintronic devices, and (ii) the low logic complexity and fine-grained parallelism in SC circuits can be leveraged to alleviate the shortcomings of spintronic logic. We propose SPINTASTIC, a new design approach in which all the components of stochastic circuits — stochastic number generators, stochastic arithmetic units, and stochastic-to-binary converters — are realized using spintronic devices. Our experiments on a range of benchmarks from different application domains demonstrate that SPINTASTIC achieves 2.8X improvement in energy over CMOS stochastic implementations and 1.9X over a CMOS binary baseline. Download Paper (PDF; Only available from the DATE venue WiFi)
15:31	IP5-11, 520	LEAKAGE POWER REDUCTION FOR DEEPLY-SCALED FINFET CIRCUITS OPERATING IN MULTIPLE VOLTAGE REGIMES USING FINE-GRAINED GATE-LENGTH BIASING TECHNIQUE Speakers: Ji Li, Qing Xie, Yanzhi Wang, Shahin Nazarian and Massoud Pedram, University of Southern California, US Abstract With the aggressive downscaling of the process technologies and importance of battery-powered systems, reducing leakage power consumption has become one of the most crucial design challenges for IC designers. This paper presents a device-circuit cross-layer framework to utilize fine-grained gate-length biased FinFETs for circuit leakage power reduction in the near- and super-threshold operation regimes. The impacts of Gate-Length Biasing (GLB) on circuit speed and leakage power are first studied using one of the most advanced technology nodes - a 7nm FinFET technology. Then multiple standard cell libraries using different leakage reduction techniques, such as GLB and Dual-VT, are built in multiple operating regimes at this technology node. It is demonstrated that, compared to Dual-VT, GLB is a more suitable technique for the advanced 7nm FinFET technology due to its capability of delivering a finer-grained trade-off between the leakage power and circuit speed, not to mention the lower manufacturing cost. The circuit synthesis results of a variety of ISCAS benchmark circuits using the presented GLB 7nm FinFET cell libraries show up to 70% leakage improvement with zero degradation in circuit speed in the near- and super-threshold regimes, respectively, compared to the standard 7nm FinFET cell library. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.3 Hot Topic - Multi/Many-Core Programming: Where Are We Standing?

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Stendhal

Organisers:
Rainer Leupers, RWTH Aachen, DE
Jeronimo Castrillon, Technische Universität Dresden, DE

Chair:
Norbert Wehn, University of Kaiserslautern, DE

Co-Chair:
Ayse K. Coskun, Boston University, US

Multi-processor systems have been in wide use for about ten years. During this time, several programming models have appeared in different domains. In particular, the academic community has been active in devising methods, often model-driven, to program heterogeneous embedded multi-processor platforms. This session analyzes the current standing from different perspectives, namely, from researchers working in methods in academia, from companies offering solutions and from companies requiring solutions.

Time	Label	Presentation Title Authors
14:00	11.3.1	5% OR 5X? THE PERFORMANCE GAP IN SIMD OPTIMIZATION, AND POSSIBLE SOLUTIONS Speaker: Ben Juurlink, TU Berlin, DE
14:15	11.3.2	MODEL-BASED DESIGN OF REAL-TIME SYSTEMS Speaker: Lothar Thiele, Swiss Federal Institute of Technology in Zurich (ETHZ), CH
14:30	11.3.3	PROGRAMMING ADAPTIVE AND ENERGY-EFFICIENT MANY-CORES Speaker: Jeronimo Castrillon, Technische Universität Dresden, DE
14:45	11.3.4	CONFIDENCE IN THE USE OF SOFTWARE TOOLS ACCORDING TO THE ISO 26262 IN AUTOMOTIVE MULTICORE APPLICATIONS Speaker: Ralph Jessenberger, BeOne Frankfurt GmbH, DE
15:00	11.3.5	AUTOMOTIVE MULTICORE MICROCONTROLLER SIMULATION, DEBUGGING AND ANALYSIS USING VIRTUAL PROTOTYPES Speaker: Victor Reyes, Synopsys Inc., US Abstract As the amount of software in the car continues growing, automotive companies have shifted to multicore microcontroller architectures as a way to contain both the number of ECUs (cost) and their power consumption (mileage). New multicore microcontrollers can deliver more performance and hence allow mapping more functions on it. At the same time, this extra performance is delivered at clock frequencies comparable to previous generations, hence keeping the more and more important power consumption under control. New challenges both from functional and performance angles are introduced when porting an existing software stack to a new multicore microcontroller architecture. From the functional aspect, new bugs such as race conditions, data coherence, etc are introduced. From the performance angle, different functions running on different cores do still share all other hardware resources, such as memories, on-chip busses and peripherals, which may introduce significant jitter on the software execution. Finding the appropriate configuration of such hardware resources that provides sufficient performance in all conditions is a daunting task. This is all very important especially for safety critical applications where "freedom of interference" must be ensured to achieve certification. In this presentation we will discuss, how multicore microcontroller simulation models (a.k.a. virtual prototypes) are used nowadays throughout the Automotive Supply chain, from Semiconductor companies, to Tier1 and to OEMs, to deal with the challenges of debugging and analyzing software on multicore architectures.
15:15	11.3.6	APPLYING MULTICORE COMPILER RESEARCH INTO INDUSTRIAL PRACTICES: AN EARLY EXPERIENCE REPORT Speaker: Weihua Sheng, Silexica Software Solutions GmbH, DE
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.4 Logic Synthesis: the Faithful, the Approximate and the Stochastic

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Chartreuse

Chair:
Alex Yakovlev, University of Newcastle, GB

Co-Chair:
Mohamed Sabry, Stanford University, US

Logic synthesis is evolving from traditional frameworks with fully-defined Boolean functions to account for the flexibilities afforded by observability don't cares, to generate smaller circuits through approximation and improve power-performance tradeoffs by taming stochastic computation.

Time	Label	Presentation Title Authors
14:00	11.4.1	A NEW APPROXIMATE ADDER WITH LOW RELATIVE ERROR AND CORRECT SIGN CALCULATION Speakers: Junjun Hu¹ and Weikang Qian² ¹Shanghai Jiao Tong University, CN; ²Shanghai Jiao Tong University (SJTU), CN Abstract Conventional precise adders need long delay and large power consumption to obtain accurate results. However, in recognition of the error tolerance of some applications such as multimedia processing and machine learning, a few recent works proposed approximate adders that generate inaccurate results occasionally to reduce the delay and power consumption. However, existing approximate adders rarely control the relative error and the potential sign error of the calculation results. In this paper, we propose a novel approximate adder that exploits the generate signals for carry speculation. Furthermore, we introduce a very low-cost error reduction module to effectively control the maximal relative error and a low-overhead sign correction module to fix the sign errors. Compared to the conventional adders, our adder is up to 4.3x faster and saves 47% power for a 32-bit addition. Compared to the existing approximate adders, our adder significantly reduces the maximal relative error and ensures correct sign calculation with comparable area, delay, and power consumption. Download Paper (PDF; Only available from the DATE venue WiFi)
14:30	11.4.2	TOWARDS BINARY CIRCUIT MODELS THAT FAITHFULLY CAPTURE PHYSICAL SOLVABILITY Speakers: Matthias Fuegger¹, Robert Najvirt¹, Thomas Nowak² and Ulrich Schmid¹ ¹TU Wien, AT; ²École Normale Supérieure, FR Abstract In contrast to analog models, binary circuit models are high-level abstractions that play an important role in assessing the correctness and performance characteristics of digital circuit designs: (i) modern circuit design relies on fast digital timing simulation tools and, hence, on binary-valued circuit models that faithfully model signal propagation, even throughout a complex design, and (ii) binary circuit models provide a level of abstraction that is amenable to formal correctness proofs. A mandatory feature of any such model is the ability to trace glitches and other short pulses precisely as they occur in physical circuits, as their presence may affect a circuit's correctness and its performance characteristics. Unfortunately, it was recently proved [Függer et al., ASYNC'13] that none of the existing binary-valued circuit models proposed so far, including the two most commonly used pure and inertial delay channels and any other bounded single-history channel, is realistic in the following sense: For the simple Short-Pulse Filtration (SPF) problem, which is related to a circuit's ability to suppress a single glitch, they showed that every bounded single-history channel either contradicts the unsolvability of SPF in bounded time or the solvability of SPF in unbounded time in physical circuits, i.e., no existing model correctly captures physical solvability with respect to glitch propagation. We propose a binary circuit model, based on so-called involution channels, which do not suffer from this deficiency. In sharp contrast to what is possible with all the existing models, they allow to solve the SPF problem precisely when this is possible in physical circuits. To the best of our knowledge, our involution channel model is hence the very first binary circuit model that realistically models glitch propagation, which makes it a promising candidate for developing more accurate tools for simulation and formal verification of digital circuits. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	11.4.3	A COUPLING AREA REDUCTION TECHNIQUE APPLYING ODC SHIFTING Speakers: Yi Diao¹, Tak-Kei Lam², Xing Wei¹ and Yu-Liang Wu² ¹The Chinese University of Hong Kong, CN; ²The Chinese University of Hong Kong, HK Abstract Circuit size reduction is a basic problem in today's integrated circuit (IC) design. Besides yielding a smaller area, reducing circuit size can also provide advantages in many operations throughout the design flow, including technology mapping, verification and place-and-route. In recent years, some node based logic synthesis algorithms have been proposed for this purpose. Node Addition and Removal (NAR) and Observability Don't Cares (ODCs) based node merging were found to be quite effective in reducing the number of nodes in a netlist. However, both methods do not address the effect of re-distributing ODCs and the results are virtually fixed after one iteration run. We study the implications of redistributing ODCs and propose a node-based and wire-based coupling synthesis scheme that can effectively find better solutions with the application of ODC shifting operations. Experimental results show that this approach can produce area reductions nearly double of the pure node-based algorithms. Download Paper (PDF; Only available from the DATE venue WiFi)
15:15	11.4.4	A GENERAL DESIGN OF STOCHASTIC CIRCUIT AND ITS SYNTHESIS Speakers: Zheng Zhao and Weikang Qian, Shanghai Jiao Tong University (SJTU), CN Abstract Stochastic computing (SC) is an unconventional paradigm to realize arithmetic computation, where real values are encoded as stochastic bit streams. Compared with conventional computation on binary radix encoding, SC can perform arithmetic computation with very simple circuits. It also has strong tolerance to soft errors. In this paper, we introduce a general design of combinational circuit for stochastic computing, together with its analysis. We further show a synthesis method that can implement arbitrary arithmetic functions with the proposed design. The experimental results demonstrated that compared with the previous methods, our approach produces a circuit with much smaller area and delay. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	IP5-12, 647	SUBHUNTER: A HIGH-PERFORMANCE AND SCALABLE SUB-CIRCUIT RECOGNITION METHOD WITH PRüFER-ENCODING Speakers: Hong-Yan Su, Chih-Hao Hsu and Yih-Lang Li, National Chiao Tung University, TW Abstract Sub-circuit recognition (SR) is a problem of recognizing sub-circuits within a given circuit and is a fundamental component in simulation, verification and testing of computer-aided design. The SR problem can be formulated as subgraph isomorphism problem. Performance of previous works is not scalable as the complexities of modern designs increase. In this paper we propose a novel Prüfer-encoding based SR algorithm that performs scalable and high-performance sub-circuit matching. Several techniques including tree structure partition, tree cutting and circuit graph encoding are proposed herein to decompose the SR problem into several small sub-sequence matching problems. A pre-filtering strategy is applied before matching to remove the sub-circuits that are not likely to be matched. A fast branch and bound approach is developed to identify all the sub-circuits within the given circuit. Experimental results show that SubHunter can achieve better performance than SubGemini and detect all the sub-circuits as well. As the circuit size increases, we can also achieve near linear runtime growth that outperforms the exponential growth for SubGemini, showing the scalability of the proposed algorithm. Download Paper (PDF; Only available from the DATE venue WiFi)
15:31	IP5-13, 87	TIMING VERIFICATION FOR ADAPTIVE INTEGRATED CIRCUITS Speakers: Rohit Kumar¹, Bing Li², Yiren Shen¹, Ulf Schlichtmann² and Jiang Hu¹ ¹Texas A&M University, College Station, US; ²Technische Universität München, DE Abstract An adaptive circuit can perform built-in self-detection of timing variations and accordingly adjust itself to avoid timing violations. Compared with conventional over-design approach, adaptive circuit design is conceptually advantageous in terms of power-efficiency. Although the advantage has been witnessed in numerous previous works including test chips, adaptive design is far from being widely used in practice. A key reason is the lack of corresponding timing verification support. We develop new timing analysis techniques to fill this void. A main challenge is the large runtime complexity due to numerous adaptivity configurations. We propose several pruning and reduction techniques and apply them in conjunction with statistical static timing analysis (SSTA). The proposed method is validated on benchmark circuits including the recent ISPD'13 suite, which has circuit as large as 150K gates. The results show that our method can achieve orders of magnitude speedup over Monte Carlo simulation with about the same accuracy. It is also several times faster than an exhaustive application of SSTA. Download Paper (PDF; Only available from the DATE venue WiFi)
15:32	IP5-14, 1045	A ROBUST APPROACH FOR PROCESS VARIATION AWARE MASK OPTIMIZATION Speakers: Jian Kuang, Wing-Kai Chow and Evangeline Young, The Chinese University of Hong Kong, HK Abstract As the minimum feature size continues to shrink, whereas the wavelength of light used for lithography remains constant, Resolution Enhancement Techniques are widely used to optimize mask, so as to improve the subwavelength printability. Besides correcting for error between the printed image and target shape, a mask optimization method also needs to consider process variation. In this paper, a robust mask optimization approach is proposed to optimize the process window as well as the Edge Placement Error (EPE) of the printed image. Experiments results on the public benchmarks are encouraging. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.5 Ultra-low Power Devices for Health and Rehabilitation

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Meije

Chair:
Georgios Karakonstantis, Queen's University, GB

Co-Chair:
José M. Moya, Technical University of Madrid, ES

The session addresses scientific contribution in the field of ultra-low power devices and communication for medical, health and rehabilitation application. The first paper presents an innovative wearable device to assist writing rehabilitation. The next two papers cover different key aspects related to signal processing approaches for wireless compression and low-power coding for future Internet-of-Things (IoT) devices.

Time	Label	Presentation Title Authors
14:00	11.5.1	PAPER, PEN AND INK: AN INNOVATIVE SYSTEM AND SOFTWARE FRAMEWORK TO ASSIST WRITING REHABILITATION Speakers: Leonardo Guardati¹, Filippo Casamassima¹, Elisabetta Farella² and Luca Benini³ ¹Università di Bologna, IT; ²Fondazione Bruno Kessler, IT; ³Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), Abstract Handwriting analysis and rehabilitation is an actively explored area in the diagnosis and treatment of Parkinson's disease, which is usually performed in an ambulatory setting under direct supervision of a clinician. Technology enhanced handwriting is actively explored to reduce the need of physical co-presence of clinician and patient and to enhance diagnostic precision through the computation of non-subjective handwriting quality metrics. This paper introduces an innovative handwriting rehabilitation system for PD patients which ensures a natural writing experience as it is based on pen and paper (as opposed to tablet and stylus). The system is designed for human-in-the loop operation and it can analyze handwriting in real-time and provide vocal feedback to guide the patient during the execution of exercises. We present a detailed comparative characterization of the key components of the system, namely wireless smart pens. In addition, in-field test assessed the system usability regarding its ease of use, calibration precision and vocal feedback effectiveness. Download Paper (PDF; Only available from the DATE venue WiFi)
14:30	11.5.2	AN ALL-DIGITAL SPIKE-BASED ULTRA-LOW-POWER IR-UWB DYNAMIC AVERAGE THRESHOLD CROSSING SCHEME FOR MUSCLE FORCE WIRELESS TRANSMISSION Speakers: Amirhossein Shahshahani¹, Masoud Shahshahani¹, Maurizio Martina¹, Guido Masera¹, Danilo Demarchi², Marco Crepaldi³, Paolo Motto Ros³ and Alberto Bonanno³ ¹Politecnico di Torino, IT; ²Politecnico di Torino / Istituto Italiano di Tecnologia@PoliTo, IT; ³Istituto Italiano di Tecnologia@PoliTo, IT Abstract We introduce an Impulse Radio Ultra-Wide Band (IR-UWB) radio transmission scheme for miniaturized biomedical applications based on a dynamic and adaptive voltage thresholding of surface Electro Myo Graphy (sEMG) signals. The amplified sEMG signal is compared to a DAC-generated threshold computed from the previous 1-bit history by custom digital control logic running at 2kHz clock and implementing an ad-hoc algorithm (Dynamic Average Threshold Crossing, D-ATC). The resulting events and the associated digitized voltage level can be both asynchronously radiated through IR-UWB. Analyzes show that the scheme is robust w.r.t. the sEMG signal variability and correlates by 96% with regard to raw muscle force information after signal is recomputed at the RX. This paper compares DATC with regard to a fixed threshold system and an Average Threshold Crossing (ATC) demonstrating improved robustness, and introduces the thresholding algorithm verified on a dataset of 190 sEMG recorded signals. The applied threshold resolution has been optimized to both minimize the size of transmitted data and to guarantee good correlation performance. The paper concludes with post-synthesis results of the D-ATC compact digital control logic in a 0.18μm CMOS process, demonstrating an extremely low power consumption at very low active area expenses. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	11.5.3	A PULSED-INDEX TECHNIQUE FOR SINGLE-CHANNEL, LOW-POWER, DYNAMIC SIGNALING Speakers: Shahzad Muzaffar, Jerald Yoo, Ayman Shabra and Ibrahim (Abe) Elfadel, Masdar Institute of Science and Technology, AE Abstract The most common operation of an IoT sensor is that of short activity bursts separated by long time intervals in sleep or listen modes. During the data bursts, sensed information has to be reliably communicated in real time without draining the energy resources of the sensor node. One way to save such resources is to efficiently code the data burst, use single-channel communication, and adopt ultra-low-power communication circuit techniques. Clock-data recovery (CDR) circuits are typically significant consumers of energy on traditional single-channel communication protocols. In this paper, we present a novel single-channel protocol that does not require any CDR circuitry. The protocol is based on the novel concept of a pulsed index where data is encoded to minimize the number of ON bits, move them to the LSB end of the packet, and transmit the ON bit indices in the form of a pulse stream. The pulse count is equal to the index of the ON bit. We call this protocol Pulsed Index Communication (PIC). Beside the elimination of CDR, we show that the implementation of PIC is very area-efficient, low-power and highly tolerant of clocking differences between transmitter and receiver. We present both an FPGA and an ASIC implementation of the protocol and use them to illustrate the performance, reliability and power consumption features of PIC signaling. In particular, we show that for an ASIC implementation on 65nm technology, PIC can reduce area by more than 80% and power by more than 70% in comparison with a CDR-based serial bit transfer protocol. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.6 Video Architectures for Multimedia and Communications

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Bayard

Chair:
Frederic Petro, TIMA, FR

Co-Chair:
Marcello Coppola, STMicroelectronics, FR

This session presents innovative work in video architectures and algorithms used in multimedia and communication systems.

Time	Label	Presentation Title Authors
14:00	11.6.1	SAPPHIRE: AN ALWAYS-ON CONTEXT-AWARE COMPUTER VISION SYSTEM FOR PORTABLE DEVICES Speakers: Swagath Venkataramani¹, Victor Bahl², Xian-Sheng Hua², Jie Liu², Jin Li², Matthai Phillipose², Bodhi Priyantha² and Mohammed Shoaib² ¹Purdue University, US; ²Microsoft Research, US Abstract Being aware of objects in the ambient provides a new dimension of context awareness. Towards this goal, we present a system that exploits powerful computer vision algo- rithms in the cloud by collecting data through always-on cameras on portable devices. To reduce comunication-energy costs, our system allows client devices to continually analyze streams of video and distill out frames that contain objects of interest. Through a dedicated image-classification engine SAPPHIRE, we show that if an object is found in 5% of all frames, we end up selecting 30% of them to be able to detect the object 90% of the time: 70% data reduction on the client device at a cost of < 60mW of power (45nm ASIC). By doing so, we demonstrate system-level energy reductions of 2X. Thanks to multiple levels of pipelining and parallel vector-reduction stages, SAPPHIRE consumes only 3.0 mJ/frame and 38 pJ/OP - estimated to be lower by 11.4X than a 45 nm GPU - and a slightly higher level of peak performance (29 vs. 20 GFLOPS). Further, compared to a parallelized sofware implementation on a mobile CPU, it provides a processing speed up of up to 235X (1.81 s vs. 7.7 ms/frame), which is necessary to meet the real-time processing needs of an always-on context-aware system. Download Paper (PDF; Only available from the DATE venue WiFi)
14:30	11.6.2	APPROXIMATE ASSOCIATIVE MEMRISTIVE MEMORY FOR ENERGY-EFFICIENT GPUS Speakers: Abbas Rahimi¹, Amirali Ghofrani², Kwang-Ting Cheng², Luca Benini³ and Rajesh Gupta¹ ¹UC San Diego, US; ²UC Santa Barbara, US; ³Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), IT Abstract Multimedia applications running on thousands of deep and wide pipelines working concurrently in GPUs have been an important target for power minimization both at the architectural and algorithmic levels. At the hardware level, energy-efficiency techniques that employ voltage overscaling face a barrier so-called "path walls": reducing operating voltage beyond a certain point generates massive number of timing errors that are impractical to tolerate. We propose an architectural innovation, called A2M2 module (approximate associative memristive memory) that exhibits few tolerable timing errors suitable for GPU applications under voltage overscaling. A2M2 is integrated with every floating point unit (FPU), and performs partial functionality of the associated FPU by pre-storing high frequency patterns for computational reuse that avoids overhead due to re-execution. Voltage overscaled A2M2 is designed to match an input search pattern with any of the stored patterns within a Hamming distance range of 0-2. This matching behavior under voltage overscaling leads to a controllable approximate computing for multimedia applications. Our experimental results for the AMD Southern Islands GPU show that four image processing kernels tolerate the mismatches during pattern matching resulting in a PSNR > 30dB. The A2M2 module with 8-row enables 28% voltage overscaling in 45nm technology resulting in 32% average energy saving for the kernels, while delivering an acceptable quality of service. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	11.6.3	PLATFORM-AWARE DYNAMIC CONFIGURATION SUPPORT FOR EFFICIENT TEXT PROCESSING ON HETEROGENEOUS SYSTEM Speakers: Mi Sun Park¹, Omesh Tickoo², Vijaykrishnan Narayanan¹, Mary Jane Irwin¹ and Ravi Iyer² ¹The Pennsylvania State University, US; ²Intel Labs, US Abstract Significant efforts have been made in accelerating computer vision and machine learning algorithms by utilizing parallel processors such as multi-core CPUs and GPUs. Although the suitability of GPU is well-known for computer graphics and image processing applications which require massively parallel floating-point computations, recent research movement towards general purpose computing on-GPU (GPGPU) makes it possible to take advantage of parallel processors to accelerate text processing applications as well. However, how to fully leverage different types of parallel processor architectures to obtain optimal performance (especially with text) without making specific efforts to each platform still remains a great challenge. We applied performance and accuracy enhancements to Naive Bayes algorithm to develop a practically sound implementation of text classification. A platform-aware dynamic configuration support automation flow is also proposed to support the seamless execution of our work across platforms. Experiments on various (integrated graphics, dedicated multiple GPUs) platforms demonstrate that our proposed approach improves both accuracy and performance of text classification. Download Paper (PDF; Only available from the DATE venue WiFi)
15:15	11.6.4	A DEBLOCKING FILTER HARDWARE ARCHITECTURE FOR THE HIGH EFFICIENCY VIDEO CODING STANDARD Speakers: Cláudio Diniz¹, Muhammad Shafique², Felipe Dalcin¹, Sergio Bampi¹ and Joerg Henkel² ¹Federal University of Rio Grande do Sul (UFRGS), BR; ²Karlsruhe Institute of Technology (KIT), DE Abstract The new deblocking filter (DF) tool of the next generation High Efficiency Video Coding (HEVC) standard is one of the most time consuming algorithms in video decoding. In order to achieve real-time performance at low-power consumption, we developed a hardware accelerator for this filter. This paper proposes a high throughput hardware architecture for HEVC deblocking filter employing hardware reuse to accelerate filtering decision units with a low area cost. Our architecture achieves either higher or equivalent throughput (4096x2048 @ 60 fps) with 5X-6X lower area compared to state-of-the-art deblocking filter architectures. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	IP5-15, 176	FASTTREE: A HARDWARE KD-TREE CONSTRUCTION ACCELERATION ENGINE FOR REAL-TIME RAY TRACING Speakers: Xingyu Liu, Yangdong Deng, Yufei Ni and Zonghui Li, Institute of Microelectronics, Tsinghua University, CN Abstract The ray tracing algorithm is well-known for its ability to generate photo-realistic rendering effects. Recent years have witnessed a renewed momentum in pushing it to real-time for better user experience. Today the construction of acceleration structures, e.g., kd-tree, has become the bottleneck of ray tracing. A dedicated hardware architecture, FastTree, was proposed for kd-tree construction by adopting a fully parallel construction algorithm. FastTree was validated by an FPGA prototype and evaluated as an ASIC implementation. Experiment result shows FastTree outperforms existing hardware construction engines by a factor of nearly 4X at a similar area and power budget. Download Paper (PDF; Only available from the DATE venue WiFi)
15:31	IP5-16, 688	REVERSE LONGSTAFF-SCHWARTZ AMERICAN OPTION PRICING ON HYBRID CPU/FPGA SYSTEMS Speakers: Christian Brugger, Javier Alejandro Varela, Norbert Wehn, Songyin Tang and Ralf Korn, University of Kaiserslautern, DE Abstract In today's markets, high-speed and energy-efficient computations are mandatory in the financial and insurance industry. At the same time, the gradual convergence of high-performance computing with embedded systems is having a huge impact on the design methodologies, where dedicated accelerators are implemented to increase performance and energy efficiency. This paper follows this trend and presents a novel way to price high-dimensional American options using techniques of the embedded community. The proposed architecture targets heterogeneous CPU/FPGA systems, and it exploits the FPGA reconfiguration to deliver high-throughput. With a bit-true algorithmic transformation based on recomputation, it is possible to eliminate the memory bottleneck and access costs. The result is a pricing system that is 16x faster and 268x more energy-efficient than an optimized Intel CPU implementation. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.7 Exploiting Dark Silicon

Date: Thursday 12 March 2015
Time: 14:00 - 15:30
Location / Room: Les Bans

Chair:
Olivier Heron, CEA LIST, FR

Co-Chair:
Domenik Helms, OFFIS, DE

The advent of the dark silicon area, raises the need for accurately, yet effectively regarding thermal properties of the system. Employing advanced power gating techniques will additionally raise the achievable gain. Both will be presented in this session.

Time	Label	Presentation Title Authors
14:00	11.7.1	MATEX: EFFICIENT TRANSIENT AND PEAK TEMPERATURE COMPUTATION FOR COMPACT THERMAL MODELS Speakers: Santiago Pagani¹, Jian-Jia Chen², Muhammad Shafique¹ and Joerg Henkel¹ ¹Karlsruhe Institute of Technology (KIT), DE; ²TU Dortmund, DE Abstract In many core systems, run-time scheduling decisions, such as task migration, core activations/deactivations, voltage/frequency scaling, etc., are typically used to optimize the resource usages. Such run-time decisions change the power consumption, which can in turn result in transient temperatures much higher than any steady-state scenarios. Therefore, to be thermally safe, it is important to evaluate the transient peaks before making resource management decisions. This paper presents a method for computing these transient peaks in just a few milliseconds, which is suited for run-time usage. This technique works for any compact thermal model consisting in a system of first-order differential equations, for example, RC thermal networks. Instead of using regular numerical methods, our algorithm is based on analytically solving the differential equations using matrix exponentials and linear algebra. This results in a mathematical expression which can easily be analyzed and differentiated to compute the maximum transient temperatures. Moreover, our method can also be used to efficiently compute all transient temperatures for any given time resolution without accuracy losses. We implement our solution as an open-source tool called MatEx. Our experimental evaluations show that the execution time of MatEx for peak temperature computation can be bounded to no more than 2.5 ms for systems with 76 thermal nodes, and to no more than 26.6 ms for systems with 268 thermal nodes, which is three orders of magnitude faster than the state-of-the-art for the same settings. Download Paper (PDF; Only available from the DATE venue WiFi)
14:30	11.7.2	DISTRIBUTED REINFORCEMENT LEARNING FOR POWER LIMITED MANY-CORE SYSTEM PERFORMANCE OPTIMIZATION Speakers: Zhuo Chen and Diana Marculescu, Carnegie Mellon University, US Abstract As power density emerges as the main constraint for many-core systems, controlling power consumption under the Thermal Design Power (TDP) while maximizing the performance becomes increasingly critical. To dynamically save power, Dynamic Voltage Frequency Scaling (DVFS) techniques have proved to be effective and are widely available commercially. In this paper, we present an On-line Distributed Reinforcement Learning (OD-RL) based DVFS control algorithm for many-core system performance improvement under power constraints. At the finer grain, a per-core Reinforcement Learning (RL) method is used to learn the optimal control policy of the Voltage/Frequency (VF) levels in a system model-free manner. At the coarser grain, an efficient global power budget reallocation algorithm is used to maximize the overall performance. The experiments show that compared to the state-of-the-art algorithms: 1) OD-RL produces up to 98% less budget overshoot, 2) up to 44.3x better throughput per over-the-budget energy and up to 23% higher energy efficiency, and 3) two orders of magnitude speedup over state-of-the-art techniques for systems with hundreds of cores. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	11.7.3	AN ENERGY-EFFICIENT VIRTUAL CHANNEL POWER-GATING MECHANISM FOR ON-CHIP NETWORKS Speakers: Amirhossein Mirhosseini¹, Mohammad Sadrosadati¹, Ali Fakhrzadehgan², Mehdi Modarressi³ and Hamid Sarbazi-Azad¹ ¹Sharif University of Technology, IR; ²University of Texas at Austin, US; ³University of Tehran, IR Abstract Power-gating is a promising method for reducing the leakage power of digital systems. In this paper, we propose a novel scheme for power-gating of virtual channels in on-chip networks. Since virtual channels are used to gain higher throughput under high traffic loads, any method that attempts to reduce their power consumption should manage to maintain the performance. Our scheme uses adaptive methods in order to tune itself with the workload so that power reduction does not cause performance degradation. Using this scheme, about 40% reduction in static power consumption can be achieved while the performance overhead is negligible. Download Paper (PDF; Only available from the DATE venue WiFi)
15:15	11.7.4	M-DTM: MIGRATION-BASED DYNAMIC THERMAL MANAGEMENT FOR HETEROGENEOUS MOBILE MULTI-CORE PROCESSORS Speakers: Young Geun Kim, Minyong Kim, Jae Min Kim and Sung Woo Chung, Korea University, KR Abstract Recently, mobile devices have employed heterogeneous multi-core processors which consist of high-performance big cores and low-power small cores. In heterogeneous mobile multi-core processors, the conventional DVFS (Dynamic Voltage and Frequency Scaling) -based DTM (Dynamic Thermal Management) is still adopted; it does not actively utilize the small cores to resolve thermal problem. In this paper, we propose M-DTM (Migration-based DTM) for heterogeneous mobile multi-core processors. In case of thermal emergency of the big cores, M-DTM migrates applications to the small cores instead of lowering the voltage and frequency of the big cores. In this way, M-DTM allows more time for the applications to run at the highest frequency of the big cores by cooling down the big cores more rapidly, compared to the conventional DTM. Through real measurement, we show that M-DTM improves performance by 10.6% and saves system-wide energy (not CPU energy) by 3.6%, on average, compared to the conventional DTM. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	IP5-17, 516	ACCURATE ELECTROTHERMAL MODELING OF THERMOELECTRIC GENERATORS Speakers: Mohammad Javad Dousti¹, Antonio Petraglia² and Massoud Pedram¹ ¹University of Southern California, US; ²Federal University of Rio de Janeiro, BR Abstract Thermoelectric generators (TEGs) provide a unique way for harvesting thermal energy. These devices are compact, durable, inexpensive, and scalable. Unfortunately, the conversion efficiency of TEGs is low. This requires careful design of energy harvesting systems including the interface circuitry between the TEG module and the load, with the purpose of minimizing power losses. In this paper, it is analytically shown that the traditional approach for estimating the internal resistance of TEGs may result in a significant loss of harvested power. This drawback comes from ignoring the dependence of the electrical behavior of TEGs on their thermal behavior. Accordingly, a systematic method for accurately determining the TEG input resistance is presented. Next, through a case study on automotive TEGs, it is shown that compared to prior art, more than 11% of power losses in the interface circuitry that lies between the TEG and the electrical load can be saved by the proposed modeling technique. In addition, it is demonstrated that the traditional approach would have resulted in a deviation from the target regulated voltage by as much as 59%. Download Paper (PDF; Only available from the DATE venue WiFi)
15:31	IP5-18, 926	EFFICIENCY-DRIVEN DESIGN TIME OPTIMIZATION OF A HYBRID ENERGY STORAGE SYSTEM WITH NETWORKED CHARGE TRANSFER INTERCONNECT Speakers: Qing Xie¹, Younghyun Kim², Donkyu Baek³, Yanzhi Wang¹, Massoud Pedram¹ and Naehyuck Chang⁴ ¹University of Southern California, US; ²Purdue University, US; ³Korea Advanced Institute of Science and Technology, KR; ⁴Seoul National University, KR Abstract This paper targets at the state-of-art hybrid energy storage systems (HESSs) with a networked charge transfer interconnect and solves a node placement problem in the HESS, where a node refers to a storage bank, a power source, or a load device, with its distributed power converter. In particular, the node placement problem is formulated as how to place the nodes in a HESS such that the optimal total charge transfer efficiency is achieved, with accurate modelings of all kinds of different components in the HESS. The methodology of FPGA placement problem is adopted to solve the node placement in HESS by properly defining a cost function that strongly relates the charge transfer efficiency to the node placement, properties of HESS components, as well as applications of the HESS. An algorithm that combines a quadratic programming method to generate an initial placement and a simulated annealing method to converge to the optimal placement result is presented in this paper. Experimental results demonstrate the efficacy of the placement algorithm and improvements in the charge transfer efficiency for various problem setups and scales. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00

11.8 Exhibition Keynote - Designing Systems for the Connected Autonomous Future: An Industry Perspective

Date: Thursday 12 March 2015
Time: 14:00 - 15:00
Location / Room: Salle Lesdiguières

Organiser:
John Zhao, MathWorks, US

Chair:
Jürgen Haase, edacentrum, DE

Moderator:
Paul Smith, MathWorks, US

Will I ever travel in an autonomous vehicle? Will my refrigerator really order food automatically from my grocery store? Can the watch I wear in the future warn me about an impending heart attack? Innovations at the SoC and board level are poised to provide the necessary computational power with low cost and high flexibility to make these products. However, designing the systems of the future -- whether an automobile, connected industrial machinery, medical device, consumer electronics, or an aerospace guidance system -- requires advances not only in embedded systems and software, but how they are designed and verified.

In this keynote, an expert from industry will discuss trends and innovations in systems that are incorporating more electronic content than ever before, and describe model-based development approaches that companies are using to create the system functionality that will power our connected autonomous future.

Time

Label

Presentation Title
Authors

14:00

11.8.1

EXHIBITION KEYNOTE MATHWORKS
Speaker:
Stéphane Louvet, Robert Bosch SAS, FR

15:00

End of session

15:30

Coffee Break in Exhibition Area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00

UB11 Session 11

Date: Thursday 12 March 2015
Time: 14:30 - 16:30
Location / Room: University Booth, Booth 4, Exhibition Area

Label	Presentation Title Authors
UB11.1	COMBINATION OF WSN AND 1ST ORDER KINETIC MODEL FOR REAL-TIME SHELF-LIFE PREDICTION OF PERISHABLE GOODS Presenter: Valerio Francesco Annese, Politecnico di Bari, IT Author: Daniela De Venuto, Politecnico di Bari, IT Abstract A complete and autonomous multi-sensing platform for perishable goods monitoring and shelf-life prediction, based on the combination of the wireless sensor network (WSN) technology and a further real-time data processing, is presented. The proposed approach offers an effective solution for waste and losses reduction in the supply chain of perishable products and, thus, an improvement of food safety, as well as food organoleptic qualities: in fact, we demonstrate the possibility to predict products shelf-life from the environment parameters such as temperature, relative humidity and light exposition in real-time. Although several models for shelf-life prediction have been already developed, none of them was embedded in a complete system supported by the real-time data availability, offered by an "ad hoc" WSN. In our infrastructure, system integration issues are carefully solved: data collected by the WSN are firstly uploaded on a cloud. An appropriate Java application makes these data available to any kind of elaboration. Then, we developed an algorithm that implements a 1st order kinetic model of the quality decay reaction, employed to evaluate remaining shelf-life of the monitored perishable product. The model takes into account the dependence of the degradation rate from the temperature according to Arrhenius law. To validate the platform we have conducted several case studies. Here we propose an 8-days monitoring of a warehouse of vegetable products (fresh tomatoes): the real-time shelf-life prediction was calculate through data coming from six multi-sensing nodes that were monitoring several environmental conditions in which the products were subjected. The implementation of the algorithm in an application for any kind of portable and non-portable devices (just like an iPad, smartphones, etc.) would result in a widespread diffusion of this technology. It is worth to notice that the complete infrastructure is a suitable low-cost and easy to implement solution for monitoring any perishable product (such as beverages, drugs, vaccines, blood, etc.) stored in any environmental condition (warehouse, transportation, store, etc.). More information ...
UB11.2	NETFPGA SUME: MAKING 100GBPS A COMMODITY Presenter: Noa Zilberman, University of Cambridge, GB Authors: Yury Audzevich, Georgina Kalogeridou and Andrew W. Moore, University of Cambridge, GB Abstract The demand-led growth of datacenter networks has meant that many constituent technologies are beyond the budget of the wider community. In order to make and validate timely and relevant new contributions, the wider community requires accessible evaluation, experimentation and demonstration environments with specification comparable to the subsystems of the most massive datacenter networks. We will demonstrate NetFPGA SUME, an open-source FPGA-based PCIe board with I/O capabilities for 100Gbps operation as NIC, multiport switch, firewall, or test/measurement environment. More information ...
UB11.3	FLARE: A RECONFIGURATION AWARE FLOORPLANNER Presenter: Riccardo Cattaneo, Politecnico di Milano, IT Authors: Marco Rabozzi and Marco Santambrogio, Politecnico di Milano, IT Abstract This demonstration presents a floorplanner tool addressing partially-reconfigurable FPGAs. The input of the tool consists of a set of regions described in terms of their heterogeneous resource requirements together with the number of interconnections among regions and the target FPGA of the partial reconfiguration (PR) design. Once the input are specified, the floorplanner allow the designer to manually or automatically perform the floorplan of the regions. More information ...
UB11.4	RECONFIGURABLE FPGA-BASED NON-INTRUSIVE BERT FOR PRODUCTION TEST Presenter: Sergei Odintsov, Tallinn University of Technology, EE Author: Artjom Jasnetski, Tallinn University of Technology, EE Abstract We introduce an FPGA-based Bit Error Rate (BER) tester solution for high-speed serial links targeting production environment. This solution does not require usage of external T&M equipment or extra DFT. As opposed to intrusive physical probing with external BER tester our approach produces more relevant output because measurement is done using transceivers in their functional mode. Introduced BERT instrument supports fine tuning of link parameters and pattern generation. This solution can replace long lasting BER test by quick evaluation of link quality using eye diagram. More information ...
UB11.5	REAL-TIME PATTERN DETECTION OF MOVEMENT RELATED POTENTIALS BY SYNCHRONIZED EEG AND EMG Presenter: Valerio Francesco Annese, Politecnico di Bari, IT Author: Daniela De Venuto, Politecnico di Bari, IT Abstract Before the conscious intention to perform any voluntary movement, our brain has already activated the action, 1s before the muscle activity actually starts. The brain processes are necessary to determine the performance of the movement itself. The presence of both the premotor potential (also called "Bereitschaftspotential" or "Readiness Potential" in the 2-5 Hz band) and the Mu-rhythm (in the 7-12 Hz) is particularly interesting for the detection of voluntary movement. Therefore, the detection of these movements' related potentials (MRPs), before the EMG activation, indicates the movement intentionality. Due to the presence of artifacts (blinking, eye movement, swallowing etc.) that spoil EEG signals, the real-time detection of MRPs is particularly challenging. In this proposal, we describe a complete wearable system performing synchronous EEG and EMG monitoring to on-line detect MRPs and prevent unintentional and dangerous movements. The Bereitschaftspotential (BP) and Mu-rhythm detection is carried out through a wavelet analysis on differential signals captured 1-second before the EMG activation. This differential approach allows discerning if the recorded EEG activity is related to the motor cortex or if it is just a common artifact. The EEG/EMG monitoring system can face the strict requirements of ambient assisted living application (AAL), taking care of aged and disable people in a domestic environment. Specifically for this application, the system can be configured as following: data from 12 EEG channels are firstly collected in a central unit that wirelessly communicates with the gateway (24 bit resolution - 500 Hz sampling rate), the gateway receives data also from each of the 8 EMG nodes (12-bit resolution, 500 Hz sampling rate). For a comfortably use, a battery life of - at least - 10 hours, have to be implemented. Moreover, a working range of 10 meters (between nodes and gateway) is considered. Above all, the requirement of wearability is achieved by the transfer printing technology, produced using photolithography and dry etch techniques, that allows the creation of wireless, tiny and lightweight electrodes for both EEG and EMG printed on bio-polymers (Polycaprolactone). Since a huge amount of retrieved data is expected, a data rate of 250 kbps (~31 kBps) is needed: a good compromise in terms of power consumption and data rate is achieved through the standard IEEE- 802.15.1 (Bluetooth low energy -BLE). The gateway unit (a smartphone or a tabled) receives the EEG and EMG sensor data and performs signal analysis to identify possible MRPs patterns through wavelet analysis. In this contribute it will be delineated as case study the possible implementation in fall prevention where not only the unwanted muscle movement is detected but also a bio-feedback is activated to block the muscle and inform an assistive center. Nevertheless, the field of application of the system here presented covers a wide range of AAL applications including fall prevention, rehabilitation (i.e. walk monitoring), artificial limb control and neurodegenerative diseases diagnosis. More information ...
UB11.6	WHERE IS IT? FIND THE CODE YOU ARE INTERESTED IN! Presenter: Jan Malburg, University of Bremen, DE Author: Görschwin Fey, University of Bremen / German Aerospace Center, DE Abstract The demonstration presents our tool for feature localization and debugging of RTL-designs. Feature localization helps a designer to find the code relevant for a certain feature and, thus, helps him to faster understand a design previously unknown to him. The developer can choose between three basic techniques for feature localization. In the area of debugging the tools allows fault localization, reverse debugging based on dynamic data- and control-flow of the design and dynamic slicing. More information ...
UB11.7	CRYPTOCHIP: DEMONSTRATION OF CRYPTOGRAPHIC ASIC PROTOTYPE Presenter: Xuan Thuy Ngo, Télécom ParisTech, FR Authors: Xuan Thuy Ngo, Jean-Luc Danger, Sylvain Guilley, Tarik Graba, Yves Mathieu and Zakaria Najm, Télécom ParisTech, FR Abstract We want to demonstrate a cryptographic ASIC implemented in ST 65nm technology. It features the following IPs: - Open Loop True Random Number Generator (TRNG). - Loop Physical Unclonable Function (PUF). - SRAM PUF. - Secure Clock. - Digital Sensor. - Advanced Encryption Standard (AES) with Piret-Trojan. - Active Shield. The demo consists in presenting the functionality and the security level of some of those IPs. More information ...
UB11.8	3D-COSTAR: USING 3D-COSTAR FOR 2.5D-/3D-SIC COST ANALYSIS Presenter: Mottaqiallah Taouil, TU Delft, NL Authors: Mottaqiallah Taouil¹, Said Hamdioui¹ and Erik Jan Marinissen² ¹TU Delft, NL; ²IMEC, BE Abstract Selecting an appropriate and efficient test flow for a 2.5D/3D Stacked IC (2.5D-SIC/3D-SIC) is crucial for overall cost optimization. In this demonstration, we present 3D-COSTAR, a tool that considers costs involved in the whole 2.5D/3D-SIC chain, including design, manufacturing, test, packaging and logistics, e.g. related to shipping wafers between a foundry and a test house; and provides the estimated overall cost for 2.5D/3D-SICs and its cost breakdown for a given input parameter set, e.g., test flows, die yield and stack yield. Several case studies will be presented in which the overall cost and product quality (in defective parts per million) are analyzed. More information ...
UB11.9	BONDCALC: THE BOND CALCULATOR Presenter: Carl Christoph Jung, Reutlingen University, DE Authors: Christian Silber¹ and Juergen Scheible² ¹Robert Bosch GmbH, DE; ²Reutlingen University, DE Abstract The Bond Calculator is a fast and exact tool to help designers to choose a bond wire, which does not fuse. The Bond Calculator is orders of magnitude faster than FEM and Easy-to-use. The Bond Calculator helps designers to estimate the temperature at the bond connection itself, by calculating the time and space dependence of the power delivered from the bond wire to the chip. These temperature changes can affect the durability of the bond connection. The Bond Calculator uses a simplified simulation model to calculate the temperature profile in a bond wire from the induced current profile. This software tool has been validated by FEM and measurement. More information ...
16:30	End of session

12.8 Tutorial: An Industry Approach to FPGA/ARM System Development and Verification

Date: Thursday 12 March 2015
Time: 15:00 - 17:30
Location / Room: Salle Lesdiguières

Organiser:
John Zhao, MathWorks, US

MATLAB and Simulink provide a rich environment for embedded-system development, with libraries of proven, specialized algorithms ready to use for specific applications. The environment enables a model-based design workflow for fast prototyping and implementation of the algorithms on heterogeneous embedded targets, such as MPSoC. A system-level design approach enables architectural exploration and partitioning, as well as coordination between SW and HW development workflows. Functional verification throughout the design process improves coverage and test-case generation while reducing the time and resources required.

In this set of tutorial sessions, you will learn

How to implement an application that leverages the FPGA and ARM core of a Zynq SOC

The flexibility and diversity of the approach through examples that include prototyping a motor control algorithm and a video-processing algorithm.

A HW/SW co-design workflow that combines system level design and simulation with automatic code generation

Successful use of the HW/SW co-design workflow in commercial development

Functional verification using MATLAB and Simulink in a SystemVerilog workflow illustrated by a detailed example

Time	Label	Presentation Title Authors
15:00	12.8.1	TUTORIAL TOPIC 1: "A HARDWARE / SOFTWARE CO-DESIGN APPROACH FOR MPSOC" Speaker: John Zhao, MathWorks, US
15:30	12.8.2	SESSION BREAK
16:00	12.8.3	TUTORIAL TOPIC 2: "PROTOTYPING MATLAB AND SIMULINK DESIGN ON FPGA" Speaker: John Zhao, MathWorks, US
16:30	12.8.4	TUTORIAL TOPIC 3: "CONNECTING SIMULINK WITH SYSTEMVERILOG FOR FUNCTIONAL VERIFICATION" Speaker: Giorgia Zucchelli, MathWorks, NL
17:30		End of session

IP5 Interactive Presentations

Date: Thursday 12 March 2015
Time: 15:30 - 16:00
Location / Room: Exhibition Area

Label	Presentation Title Authors
IP5-1	TOWARDS SYSTEMATIC DESIGN OF 3D PNML LAYOUTS Speakers: Robert Perricone¹, Yining Zhu², Katherine Sanders¹, X. Sharon Hu¹ and Michael Niemier¹ ¹University of Notre Dame, US; ²Zhejiang University, CN Abstract Nanomagnetic logic (NML) is a ``beyond-CMOS'' technology that uses bistable magnets to store, process, and move binary information. Compared to CMOS, NML has several advantages such as non-volatility, lower power consumption, and radiation hardness. Recently, NML devices with perpendicular magnetic anisotropy (pNML) have been experimentally demonstrated to perform logic operations in three dimensions. 3D pNML layouts provide additional benefits such as simplified signal routing and greater integration density. However, designing functional 3D pNML circuits can be challenging as one must consider the effects of fringing magnetic fields in three dimensions. Furthermore, the current process of designing 3D pNML layouts is little more than a trial-and-error-based approach, which is infeasible for larger, more complex designs. In this paper, we propose a systematic approach to designing 3D pNML layouts. Our design process leverages a machine learning-inspired prediction approach that examines the effects of varying individual device parameters (e.g., length, width, etc.) and predicts functional configurations. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-2	DESTINY: A TOOL FOR MODELING EMERGING 3D NVM AND EDRAM CACHES Speakers: Matt Poremba¹, Sparsh Mittal², Dong Li², Jeffrey Vetter³ and Yuan Xie⁴ ¹Pennsylvania State University, US; ²Oak Ridge National Lab, US; ³Oak Ridge National Lab and Georgia Institute of Technology, US; ⁴University of California, Santa Barbara, US Abstract The continuous drive for performance has pushed the researchers to explore novel memory technologies (e.g. non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) in the design of caches. However, a comprehensive tool which models both conventional and emerging memory technologies for both 2D and 3D designs has been lacking. We present DESTINY, a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM). DESTINY facilitates design-space exploration across several dimensions, such as optimizing for a target (e.g. latency or area) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e. 2D v/s 3D) for a desired optimization target etc. DESTINY has been validated against industrial cache prototypes. We believe that DESTINY will drive architecture and system-level studies and will be useful for researchers and designers. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-3	BIG-DATA STREAMING APPLICATIONS SCHEDULING WITH ONLINE LEARNING AND CONCEPT DRIFT DETECTION Speakers: Karim Kanoun¹ and Mihaela van der Schaar² ¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²University of California, Los Angeles, US Abstract Several techniques have been proposed to adapt Big-Data streaming applications to resource constraints. These techniques are mostly implemented at the application layer and make simplistic assumptions about the system resources and they are often agnostic to the system capabilities. Moreover, they often assume that the data streams characteristics and their processing needs are stationary, which is not true in practice. In fact, data streams are highly dynamic and may also experience concept drift, thereby requiring continuous online adaptation of the throughput and quality to each processing task. Hence, existing solutions for Big-Data streaming applications are often too conservative or too aggressive. To address these limitations, we propose an online energy-efficient scheduler which maximizes the QoS (i.e., throughput and output quality) of Big-Data streaming applications under energy and resources constraints. Our scheduler uses online adaptive reinforcement learning techniques and requires no offline information. Moreover, our scheduler is able to detect concept drifts and to smoothly adapt the scheduling strategy. Our experiments realized on a chain of tasks modeling real-life streaming application demonstrate that our scheduler is able to learn the scheduling policy and to adapt it such that it maximizes the targeted QoS given energy constraint as the Big-Data characteristics are dynamically changing. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-4	(Best Paper Award Candidate) DESIGN FLOW AND RUN-TIME MANAGEMENT FOR COMPRESSED FPGA CONFIGURATIONS Speakers: Christophe Huriaux¹, Antoine Courtay¹ and Olivier Sentieys² ¹University of Rennes 1 - IRISA, FR; ²INRIA, FR Abstract The aim of partially and dynamically reconfigurable hardware is to provide an increased flexibility through the load of multiple applications on the same reconfigurable fabric at the same time. However, a configuration bit-stream loaded at runtime should be created offline for each task of the application. Moreover, modern applications use a lot of specialized hardware blocks to perform complex operations, which tends to cancel the "single bit-stream for a single application" paradigm, as the logic content for different locations of the reconfigurable fabric may be different. In this paper we propose a design flow for generating compressed configuration bit-streams abstracted from their final position on the logic fabric. Those configurations will then be decoded and finalized in real-time and at run-time by a dedicated reconfiguration controller to be placed at a given physical location. Our experiments show that densely routed applications gain the most with a compression factor of more than 2× using the finest cluster size, but coarser coding can be implemented to achieve a compression factor up to 10×. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-5	EMPIRICAL MODELLING OF FDSOI CMOS INVERTER FOR SIGNAL/POWER INTEGRITY SIMULATION Speakers: Wael Dghais and Jonathan Rodriguez, Instituto de Telecomunicações, PT Abstract This paper presents a multiport empirical model based on artificial neural network for I/O memory interface (e.g. inverter) designed based on fully depleted silicon on isolator (FDSOI) CMOS 28 nm process for signal and power integrity assessments. The analog mixed-signal identification signals that carry the information about the I/O interface's nonlinear dynamic behavior are recorded from large signal simulation setup. The model's functions are extracted based on a nonlinear optimization algorithm and then implemented in Simulink software. The performance of the resulted model is validated in typical power and ground switching noise scenario. The developed empirical model accurately predicts the timing signal waveforms at the power, ground, and at the output port. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-6	ON-CHIP MEASUREMENT OF BANDGAP REFERENCE VOLTAGE USING A SMALL FORM FACTOR VCO BASED ZOOM-IN ADC Speakers: Osman Erol¹, Sule Ozev¹, Chandra K. H. Suresh², Rubin Parekhji³ and Lakshmanan Balasubramanian³ ¹ASU, US; ²NYU-Abu Dhabi, AE; ³TI, IN Abstract A robust and highly scalable technique for measuring the output voltage of a band-gap reference (BGR) circuit is described. The proposed technique is based on an ADC architecture that uses a voltage controlled oscillator (VCO) for voltage to frequency conversion. During production testing, an external voltage reference is used to approximate the voltage/frequency characteristics of the VCO with 5ms test time. The proposed zoom-in ADC approach is manufactured with 0.5um single well CMOS process. Measurement results indicate that 13 bits of resolution within the measurement range can be achieved with the zoom-in approach. Worst-case INL for the ADC is less than 0.25LSB (50V). Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-7	LOGICAL EQUIVALENCE CHECKING OF ASYNCHRONOUS CIRCUITS USING COMMERCIAL TOOLS Speakers: Arash Saifhashemi¹, Hsin-Ho Huang², Priyanka Bhalerao³ and Peter Beerel² ¹Intel, US; ²University of Southern California, US; ³yahoo, US Abstract We propose a method for logical equivalence check (LEC) of asynchronous circuits using commercial synchronous tools. In particular, we verify the equivalence of asynchronous circuits which are modeled at the CSP-level in SystemVerilog as well as circuits modeled at the micro-architectural level using conditional communication library primitives. Our approach is based on a novel three-valued logic model that abstracts the detailed handshaking protocol and is thus agnostic to different gate-level implementations, making it applicable to a variety of different design styles. Our experimental results with commercial LEC tools on a variety of computational blocks and an asynchronous microprocessor demonstrate the applicability and limitations of the proposed approach. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-8	MAY-HAPPEN-IN-PARALLEL ANALYSIS OF ELECTRONIC SYSTEM LEVEL MODELS USING UPPAAL MODEL CHECKING Speakers: Che-Wei Chang and Rainer Doemer, University of California Irvine, US Abstract In this paper, we propose an approach for May- Happen-in-Parallel (MHP) analysis of electronic system level (ESL) design which models parallel discrete event simulation with concurrent automaton processes and formally identify those MHP states. Our MHP analysis utilizes formal verification by use of the UPPAAL model checker. The proposed approach converts the system model in SpecC SLDL into an UPPAAL model and generates a set of queries that automatically and completely finds all possible MHP pairs. The experimental results show our approach can report more precise MHP analysis results compared to other works at the cost of extended analysis run time. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-9	VERIFYING SYNCHRONOUS REACTIVE SYSTEMS USING LAZY ABSTRACTION Speakers: Kumar Madhukar¹, Mandayam Srivas², Bjorn Wachter³, Daniel Kroening³ and Ravindra Metta¹ ¹Tata Research Development and Design Center, IN; ²Chennai Mathematical Institute, IN; ³University of Oxford, GB Abstract Embedded software systems are frequently modeled as a set of synchronous reactive processes. The transitions performed by the processes are given as sequential, atomic code blocks. Most existing verifiers flatten such programs into a global transition system, to be able to apply off-the-shelf verification methods. However, this monolithic approach fails to exploit the lock-step execution of the processes, severely limiting scalability. We present a novel formal verification technique that analyses synchronous concurrency explicitly rather than encoding it. We present a variant of Lazy Abstraction with Interpolants (LAWI), a technique successfully used in software verification, and tailor it to synchronous reactive concurrency. We exploit the synchronous communication structure by fixing an execution schedule, circumventing the exponential blow-up of state space caused by simulating synchronous behaviour by means of interleavings. The technique is implemented in SYMPARA, a verification tool for synchronous reactive systems. To evaluate the effectiveness of our technique, we compare SYMPARA with Bounded Model Checking and k-induction, and a LAWI-based verifier for multi-threaded (asynchronous) software. On several realistic examples SYMPARA outperforms the other tools by an order of magnitude. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-10	SPINTASTIC: SPIN-BASED STOCHASTIC LOGIC FOR ENERGY-EFFICIENT COMPUTING Speakers: Rangharajan Venkatesan¹, Swagath Venkataramani², Xuanyao Fong², Kaushik Roy² and Anand Raghunathan² ¹NVIDIA Corporation, US; ²Purdue University, US Abstract Spintronics is one of the leading technologies under consideration for the post-CMOS era. While spintronic memories have demonstrated great promise due to their density, non-volatility and low leakage, efforts to realize spintronic logic have been much less fruitful. Recent studies project the performance and energy efficiency of spintronic logic to be considerably inferior to CMOS. In this work, we explore Stochastic Computing (SC) as a new direction for the realization of energy-efficient logic using spintronic devices. We establish the synergy between stochastic computing and spintronics by demonstrating that (i) the peripheral circuits required for SC to convert to/from stochastic domains, which incur significant energy overheads in CMOS, can be efficiently realized by exploiting the characteristics of spintronic devices, and (ii) the low logic complexity and fine-grained parallelism in SC circuits can be leveraged to alleviate the shortcomings of spintronic logic. We propose SPINTASTIC, a new design approach in which all the components of stochastic circuits — stochastic number generators, stochastic arithmetic units, and stochastic-to-binary converters — are realized using spintronic devices. Our experiments on a range of benchmarks from different application domains demonstrate that SPINTASTIC achieves 2.8X improvement in energy over CMOS stochastic implementations and 1.9X over a CMOS binary baseline. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-11	LEAKAGE POWER REDUCTION FOR DEEPLY-SCALED FINFET CIRCUITS OPERATING IN MULTIPLE VOLTAGE REGIMES USING FINE-GRAINED GATE-LENGTH BIASING TECHNIQUE Speakers: Ji Li, Qing Xie, Yanzhi Wang, Shahin Nazarian and Massoud Pedram, University of Southern California, US Abstract With the aggressive downscaling of the process technologies and importance of battery-powered systems, reducing leakage power consumption has become one of the most crucial design challenges for IC designers. This paper presents a device-circuit cross-layer framework to utilize fine-grained gate-length biased FinFETs for circuit leakage power reduction in the near- and super-threshold operation regimes. The impacts of Gate-Length Biasing (GLB) on circuit speed and leakage power are first studied using one of the most advanced technology nodes - a 7nm FinFET technology. Then multiple standard cell libraries using different leakage reduction techniques, such as GLB and Dual-VT, are built in multiple operating regimes at this technology node. It is demonstrated that, compared to Dual-VT, GLB is a more suitable technique for the advanced 7nm FinFET technology due to its capability of delivering a finer-grained trade-off between the leakage power and circuit speed, not to mention the lower manufacturing cost. The circuit synthesis results of a variety of ISCAS benchmark circuits using the presented GLB 7nm FinFET cell libraries show up to 70% leakage improvement with zero degradation in circuit speed in the near- and super-threshold regimes, respectively, compared to the standard 7nm FinFET cell library. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-12	SUBHUNTER: A HIGH-PERFORMANCE AND SCALABLE SUB-CIRCUIT RECOGNITION METHOD WITH PRüFER-ENCODING Speakers: Hong-Yan Su, Chih-Hao Hsu and Yih-Lang Li, National Chiao Tung University, TW Abstract Sub-circuit recognition (SR) is a problem of recognizing sub-circuits within a given circuit and is a fundamental component in simulation, verification and testing of computer-aided design. The SR problem can be formulated as subgraph isomorphism problem. Performance of previous works is not scalable as the complexities of modern designs increase. In this paper we propose a novel Prüfer-encoding based SR algorithm that performs scalable and high-performance sub-circuit matching. Several techniques including tree structure partition, tree cutting and circuit graph encoding are proposed herein to decompose the SR problem into several small sub-sequence matching problems. A pre-filtering strategy is applied before matching to remove the sub-circuits that are not likely to be matched. A fast branch and bound approach is developed to identify all the sub-circuits within the given circuit. Experimental results show that SubHunter can achieve better performance than SubGemini and detect all the sub-circuits as well. As the circuit size increases, we can also achieve near linear runtime growth that outperforms the exponential growth for SubGemini, showing the scalability of the proposed algorithm. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-13	TIMING VERIFICATION FOR ADAPTIVE INTEGRATED CIRCUITS Speakers: Rohit Kumar¹, Bing Li², Yiren Shen¹, Ulf Schlichtmann² and Jiang Hu¹ ¹Texas A&M University, College Station, US; ²Technische Universität München, DE Abstract An adaptive circuit can perform built-in self-detection of timing variations and accordingly adjust itself to avoid timing violations. Compared with conventional over-design approach, adaptive circuit design is conceptually advantageous in terms of power-efficiency. Although the advantage has been witnessed in numerous previous works including test chips, adaptive design is far from being widely used in practice. A key reason is the lack of corresponding timing verification support. We develop new timing analysis techniques to fill this void. A main challenge is the large runtime complexity due to numerous adaptivity configurations. We propose several pruning and reduction techniques and apply them in conjunction with statistical static timing analysis (SSTA). The proposed method is validated on benchmark circuits including the recent ISPD'13 suite, which has circuit as large as 150K gates. The results show that our method can achieve orders of magnitude speedup over Monte Carlo simulation with about the same accuracy. It is also several times faster than an exhaustive application of SSTA. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-14	A ROBUST APPROACH FOR PROCESS VARIATION AWARE MASK OPTIMIZATION Speakers: Jian Kuang, Wing-Kai Chow and Evangeline Young, The Chinese University of Hong Kong, HK Abstract As the minimum feature size continues to shrink, whereas the wavelength of light used for lithography remains constant, Resolution Enhancement Techniques are widely used to optimize mask, so as to improve the subwavelength printability. Besides correcting for error between the printed image and target shape, a mask optimization method also needs to consider process variation. In this paper, a robust mask optimization approach is proposed to optimize the process window as well as the Edge Placement Error (EPE) of the printed image. Experiments results on the public benchmarks are encouraging. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-15	FASTTREE: A HARDWARE KD-TREE CONSTRUCTION ACCELERATION ENGINE FOR REAL-TIME RAY TRACING Speakers: Xingyu Liu, Yangdong Deng, Yufei Ni and Zonghui Li, Institute of Microelectronics, Tsinghua University, CN Abstract The ray tracing algorithm is well-known for its ability to generate photo-realistic rendering effects. Recent years have witnessed a renewed momentum in pushing it to real-time for better user experience. Today the construction of acceleration structures, e.g., kd-tree, has become the bottleneck of ray tracing. A dedicated hardware architecture, FastTree, was proposed for kd-tree construction by adopting a fully parallel construction algorithm. FastTree was validated by an FPGA prototype and evaluated as an ASIC implementation. Experiment result shows FastTree outperforms existing hardware construction engines by a factor of nearly 4X at a similar area and power budget. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-16	REVERSE LONGSTAFF-SCHWARTZ AMERICAN OPTION PRICING ON HYBRID CPU/FPGA SYSTEMS Speakers: Christian Brugger, Javier Alejandro Varela, Norbert Wehn, Songyin Tang and Ralf Korn, University of Kaiserslautern, DE Abstract In today's markets, high-speed and energy-efficient computations are mandatory in the financial and insurance industry. At the same time, the gradual convergence of high-performance computing with embedded systems is having a huge impact on the design methodologies, where dedicated accelerators are implemented to increase performance and energy efficiency. This paper follows this trend and presents a novel way to price high-dimensional American options using techniques of the embedded community. The proposed architecture targets heterogeneous CPU/FPGA systems, and it exploits the FPGA reconfiguration to deliver high-throughput. With a bit-true algorithmic transformation based on recomputation, it is possible to eliminate the memory bottleneck and access costs. The result is a pricing system that is 16x faster and 268x more energy-efficient than an optimized Intel CPU implementation. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-17	ACCURATE ELECTROTHERMAL MODELING OF THERMOELECTRIC GENERATORS Speakers: Mohammad Javad Dousti¹, Antonio Petraglia² and Massoud Pedram¹ ¹University of Southern California, US; ²Federal University of Rio de Janeiro, BR Abstract Thermoelectric generators (TEGs) provide a unique way for harvesting thermal energy. These devices are compact, durable, inexpensive, and scalable. Unfortunately, the conversion efficiency of TEGs is low. This requires careful design of energy harvesting systems including the interface circuitry between the TEG module and the load, with the purpose of minimizing power losses. In this paper, it is analytically shown that the traditional approach for estimating the internal resistance of TEGs may result in a significant loss of harvested power. This drawback comes from ignoring the dependence of the electrical behavior of TEGs on their thermal behavior. Accordingly, a systematic method for accurately determining the TEG input resistance is presented. Next, through a case study on automotive TEGs, it is shown that compared to prior art, more than 11% of power losses in the interface circuitry that lies between the TEG and the electrical load can be saved by the proposed modeling technique. In addition, it is demonstrated that the traditional approach would have resulted in a deviation from the target regulated voltage by as much as 59%. Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-18	EFFICIENCY-DRIVEN DESIGN TIME OPTIMIZATION OF A HYBRID ENERGY STORAGE SYSTEM WITH NETWORKED CHARGE TRANSFER INTERCONNECT Speakers: Qing Xie¹, Younghyun Kim², Donkyu Baek³, Yanzhi Wang¹, Massoud Pedram¹ and Naehyuck Chang⁴ ¹University of Southern California, US; ²Purdue University, US; ³Korea Advanced Institute of Science and Technology, KR; ⁴Seoul National University, KR Abstract This paper targets at the state-of-art hybrid energy storage systems (HESSs) with a networked charge transfer interconnect and solves a node placement problem in the HESS, where a node refers to a storage bank, a power source, or a load device, with its distributed power converter. In particular, the node placement problem is formulated as how to place the nodes in a HESS such that the optimal total charge transfer efficiency is achieved, with accurate modelings of all kinds of different components in the HESS. The methodology of FPGA placement problem is adopted to solve the node placement in HESS by properly defining a cost function that strongly relates the charge transfer efficiency to the node placement, properties of HESS components, as well as applications of the HESS. An algorithm that combines a quadratic programming method to generate an initial placement and a simulated annealing method to converge to the optimal placement result is presented in this paper. Experimental results demonstrate the efficacy of the placement algorithm and improvements in the charge transfer efficiency for various problem setups and scales. Download Paper (PDF; Only available from the DATE venue WiFi)

12.1 SPECIAL DAY Hot Topic: Technology and Design Platforms for Diagnostics

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Salle Oisans

Organiser:
Jo De Boeck, IMEC, BE

Chair:
Chris Van Hoof, IMEC, BE

Co-Chair:
Minkyu Je, Daegu Gyeongbuk Institute of Science and Technology (DGIST), KR

Key to an efficient and effective treatment is early, fast and precise diagnose. This session showcases some of the recent advances and future potential of technologies that help enable the above mentioned requirements for patient centric care.

Time	Label	Presentation Title Authors
16:00	12.1.1	ULTRAFLEXIBLE INTEGRATED CIRCUITS FOR IMPERCEPTIBLE BIO-SENSORS Speaker: Teppei Araki, University of Tokyo, JP Abstract Flexible formfactor is extremely important in medical applications. This presentation demonstrates a 1 micron thick ultraflexible integration platform using thin film transistor technology, and enabling other device integration like OLED and regular diodes and detectors for medical applications like EEG, EMG. Demonstration of this technology in real medical and wearable applications will be given.
16:30	12.1.2	NANOELECTRONICS FOR DISRUPTIVE DIAGNOSTIC PLATFORMS Speaker: Liesbet Lagae, IMEC, BE Abstract Silicon nano-electronics and integrated nano-photonics technology provides an advanced toolbox for disruptive components and systems that will change the way we do diagnostics and therapy outcome monitoring. This enormous potential will be demonstrated by some of the recent developments.
17:00	12.1.3	(Best Paper Award Candidate) AN ULTRA-LOW POWER DUAL-MODE ECG MONITOR FOR HEALTHCARE AND WELLNESS Speaker: Daniele Bortolotti, Università di Bologna, IT Authors: Daniele Bortolotti¹, Mauro Mangia¹, Andrea Bartolini², Riccardo Rovatti¹, Gianluca Setti³ and Luca Benini⁴ ¹Università di Bologna, IT; ²Swiss Federal Institute of Technology in Zurich (ETHZ), CH; ³University of Ferrara, IT; ⁴Università di Bologna / Swiss Federal Institute of Technology in Zurich (ETHZ), CH Abstract Technology scaling enables today the design of ultra-low cost wireless body sensor networks for wearable biomedical monitors. These devices, according to the application domain, show greatly varying tradeoffs in terms of energy consumption, resources utilization and reconstructed biosignal quality. To achieve minimal energy operation and extend battery life, several aspects must be considered, ranging from signal processing to the technological layers of the architecture. The recently proposed Rakeness-based Compressed Sensing (CS) expands the standard CS paradigm deploying the localization of input signal energy to further increase data compression without sensible RSNR degradation. This improvement can be used either to optimize the usage of a non volatile memory (NVM) to store in the device a record of the biosignal or to minimize the energy consumption for the transmission of the entire signal as well as some of its features. We specialize the sensing stage to achieve signal qualities suitable for both Healthcare (HC) and Wellness (WN), according to an external input (e.g. the patient). In this paper we envision a dual-operation wearable ECG monitor, considering a multi-core DSP for input biosignal compression and different technologies for either transmission or local storage. The experimental results show the effectiveness of the Rakeness approach (up to ≈ 70%) more energy efficient than the baseline) and evaluate the energy gains considering different use case scenarios. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30		End of session

12.2 Solver Advances and Emerging Applications

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Belle Etoile

Chair:
Julien Schmaltz, Eindhoven University of Technology, NL

Co-Chair:
Gianpiero Cabodi, Politecnico di Torino, IT

The first three papers of this session present strong advances to the scalability of Boolean and arithmetic solvers.

Time	Label	Presentation Title Authors
16:00	12.2.1	SOLVING DQBF THROUGH QUANTIFIER ELIMINATION Speakers: Karina Gitina, Ralf Wimmer, Sven Reimer, Matthias Sauer, Christoph Scholl and Bernd Becker, University of Freiburg, DE Abstract We show how to solve dependency quantified Boolean formulas (DQBF) using a quantifier elimination strategy which yields an equivalent QBF that can be decided using any standard QBF solver. The elimination is accompanied by a number of optimizations which help reduce memory consumption and computation time. We apply our solver HQS to problems from the domain of verification of incomplete combinational circuits to demonstrate the effectiveness of the proposed algorithm. The results show enormous improvements both in the number of solved instances and in the computation times compared to existing work on validating DQBF. Download Paper (PDF; Only available from the DATE venue WiFi)
16:30	12.2.2	FORMAL VERIFICATION OF SEQUENTIAL GALOIS FIELD ARITHMETIC CIRCUITS USING ALGEBRAIC GEOMETRY Speakers: Xiaojun Sun¹, Priyank Kalla², Tim Pruss¹ and Florian Enescu³ ¹University of Utah, US; ²University of utah, US; ³Georgia State University, US Abstract Sequential Galois field (F_{2^k}) arithmetic circuits take k-bit inputs and produce a k-bit result, after k-clock cycles of operation. Formal verification of such sequential arithmetic circuits with large datapath size is beyond the capabilities of contemporary verification techniques. To address this problem, this paper describes a verification method based on algebraic geometry that: i) implicitly unrolls the sequential arithmetic circuit over multiple (k) clock-cycles; and ii) represents the function computed by the state-registers of the circuit, canonically, as a multi-variate word-level polynomial over F_{2^k}. Our approach employs the Groebner basis theory over a specific elimination ideal. Moreover, an efficient implementation is described to identify the k-cycle computation performed by the circuit at word-level. We can verify up to 100-bit sequential Galois field multipliers, whereas conventional techniques fail beyond 23-bit circuits. Download Paper (PDF; Only available from the DATE venue WiFi)
17:00	12.2.3	A UNIVERSAL MACRO BLOCK MAPPING SCHEME FOR ARITHMETIC CIRCUITS Speakers: Xing Wei, Yi Diao, Tak-Kei Lam and Yu-Liang Wu, The Chinese University of Hong Kong, HK Abstract A macro block is a functional unit that can be re-used in circuit designs. The problem of general macro block mapping is to identify such embedded parts, whose I/O signals are unknown, from the netlist that may have been optimized in various ways. The mapping results can then be used to ease the functional verification process or for replacement by more advanced intellectual property (IP) macros. In the past literatures, the mapping problem is mostly limited to the identification of a single adder or multiplier with I/O signals given, which is already NP-hard. However, in today's typical arithmetic circuits (like digital signal processing (DSP) applications), it is not unusual to have combinations of arithmetic operators implemented as macro blocks for performance gain. To solve this new practical mapping problem, we propose a flow to identify and build a forest of one-bit-adder trees using structural information and formal verification techniques, followed by algorithms that locate macro boundaries and I/O signal orders. Experimental results show that our algorithm is highly practical and scalable. It is capable of identifying any combinations of arbitrary adders and multipliers such as (a + b) × c and a × b + c × d + e × f , where each operand is a multi-bit constant or variable. Most of the benchmarks in ICCAD 2013 CAD Contest [1] can be well handled by our algorithm. Download Paper (PDF; Only available from the DATE venue WiFi)
17:15	12.2.4	TOWARDS AN ACCURATE RELIABILITY, AVAILABILITY AND MAINTAINABILITY ANALYSIS APPROACH FOR SATELLITE SYSTEMS BASED ON PROBABILISTIC MODEL CHECKING Speakers: Khaza Anuarul Hoque¹, Otmane Ait Mohamed¹ and Yvon Savaria² ¹Concordia University, CA; ²Polytechnique Montreal, CA Abstract From navigation to telecommunication, and from weather forecasting to military, or entertainment services - satellites play a major role in our daily lives. Satellites in the Medium Earth Orbit (MEO) and geostationary orbit have a life span of 10 years or more. Reliability, Availability and Maintainability (RAM) analysis of a satellite system is a crucial part at their design phase to ensure the highest availability and optimized reliability. This paper shows the formal modeling and verification of RAM related properties of a satellite system. In a previously reported approach, time between possible failures and time between repairs are assumed to follow an exponential distribution, which does not represent a realistic scenario. In contrast, in our work, discrete time delays in the classical Continuous Time Markov Chain (CTMC) are approximated using the Erlang distribution. This is done by approximating nonexponential holding time with several intermediate states based on a phase type distribution. The RAM properties are then verified using the PRISM model checker.We present and compare modeling results with those obtained with a previously reported approach that demonstrate an improved modeling accuracy. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30		End of session

12.3 Patterning, Pairing, Placement and Packing

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Stendhal

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Patrick Groeneveld, Synopsys, US

Place-and-route remain at the core of physical design, but must address a variety of important objectives, constraints and concerns. They can be added by standard-cell design to improve routing congestion while keeping area small.

Time	Label	Presentation Title Authors
16:00	12.3.1	AN EFFECTIVE TRIPLE PATTERNING AWARE GRID-BASED DETAILED ROUTING APPROACH Speakers: Zhiqing Liu, Chuangwen Liu and Evangeline Young, The Chinese University of Hong Kong, HK Abstract Triple patterning lithography (TPL) is attracting more and more attention due to further scaling of the critical feature size. How fully the benefits of TPL can be utilized depends very much on both the decomposition and layout steps. However, it is non-trivial to perform detailed routing and layout decomposition simultaneously on a large-scale complicated circuit to achieve decomposability on one hand, and short wirelength, small number of stitches and small number of vias at the same time. Instead, in our approach, routing and coloring are done iteratively but integrated closely to reduce the problem complexity. The routing step is able to detect and avoid native conflicts as much as possible. If any conflicts occur in the coloring step, the router will rip-up and re-route to get rid of them. This technique proves to be effective and efficient in improving the quality of the coloring assignment. Compared with previous works~cite{dac12-ma} on TPL using the simultaneous routing and coloring method, the number of stitches and the number of vias are reduced by 76.8% and 2.1% respectively while our running time is 36.6% less and the wirelength is very comparable. Download Paper (PDF; Only available from the DATE venue WiFi)
16:30	12.3.2	SIMULTANEOUS TRANSISTOR PAIRING AND PLACEMENT FOR CMOS STANDARD CELLS Speakers: Ang Lu, Hsueh-Ju Lu, En-Jang Jang, Yu-Po Lin, Chun-Hsiang Hung, Chun-Chih Chuang and Rung-Bin Lin, Yuan Ze University, TW Abstract This paper presents an integer linear programming approach to transistor placement problem for CMOS standard cells with objectives of minimizing cell width, wiring density, wiring length, diffusion contour roughness, and misalignments of common ploy gates. Our approach considers transistor pairing and transistor placement simultaneously. It can achieve a smaller number of transistor chains than the well-known bipartite approach. About 31% of the 185 cells created by it have smaller widths and no cells whose widths are larger than their handcrafted counterparts. Download Paper (PDF; Only available from the DATE venue WiFi)
17:00	12.3.3	A TSV NOISE-AWARE 3-D PLACER Speakers: Yu-min Lee, Chun Chen, Jia-xing Song and Kuan-te Pan, National Chiao Tung University, TW Abstract In this work, a three-dimensional partitioning-based force-directed placer is developed to minimize coupling noise between through silicon vias (TSVs) in three-dimensional integrated circuits. TSV decoupling force is introduced and determined by the TSV coupling noise to separate TSVs with strong coupling noise. The experimental results indicate that TSV coupling noise can be effectively reduced by 36.3% on average with only 6.0% wirelength overhead. Besides, the developed 3-D placer shows great performance in wirelength that is competitive to a state-of-the-art 3-D placer. Download Paper (PDF; Only available from the DATE venue WiFi)
17:15	12.3.4	IDENTIFYING REDUNDANT INTER-CELL MARGINS AND ITS APPLICATION TO REDUCING ROUTING CONGESTION Speakers: Woohyun Chung, Seongbo Shim and Youngsoo Shin, Korea Advanced Institute of Science and Technology, KR Abstract A modern standard cell is embedded with extra space, called inter-cell margin, on its left and right ends. Margins are sometimes redundant, and so margins between some cell pairs can be removed for the benefit of area. Lithography simulations on whole layout to identify redundant margins take excessive amount of time, and thus are impractical. We propose to determine in advance the redundancy of margins between each cell pair; a few methods of approximation are introduced to accelerate the process, e.g. grouping cell pairs of similar boundary patterns, refining each group with geometry parameters, etc. Experiments indicate that the redundancy of margin is accurately determined in 93.7% of cell pairs; the remaining 6.3%, which are actually redundant, are declared irredundant by our method, so our method is inaccurate for those cell pairs yet is still safe. We take advantage of redundant margins and address the problem of routing congestion reduction. Placement is locally perturbed to identify more redundant margins; the cells in high congestion region are spread out after the margins in low congestion area are removed. The proposed method was evaluated on a few test circuits using 28-nm technology. The number of routing grids with congestion overflow was reduced by 43% with no impact on total wirelength. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30		End of session

12.4 High-Level Specifications and Models

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Chartreuse

Chair:
Marc Geilen, Eindhoven University of Technology, NL

Co-Chair:
Laurence Pierre, TIMA Lab, FR

This session presents different aspects of high-level specifications and models. The first paper introduces a new model of computation, fixed-priority process networks to address the need of determinism for multiprocessor applications. The second paper proposes an approach to detect and resolve potential synchronization problems between a discrete event simulation and timed dataflow simulation. The third paper presents a framework for checking the logical consistency of requirements in specifications written in a subset of natural language.

Time	Label	Presentation Title Authors
16:00	12.4.1	MODELS FOR DETERMINISTIC EXECUTION OF REAL-TIME MULTIPROCESSOR APPLICATIONS Speakers: Peter Poplavko¹, Dario Socci², Paraskevas Bourgos³, Marius Bozga³ and Saddek Bensalem³ ¹Universite Joseph Fourier / Verimag, FR; ²Verimag, ; ³Verimag, FR Abstract With the proliferation of multi-cores in embedded real-time systems, many industrial applications are being (re-)targeted to multiprocessor platforms. However, exactly reproducible data values at the outputs as function of the data and timing of the inputs is less trivial to realize in multiprocessors, while it can be imperative for various practical reasons. Also for parallel platforms it is harder to evaluate the task utilization and ensure schedulability, especially for end-to-end communication timing constraints and aperiodic events. Based upon reactive system extensions of Kahn process networks, we propose a model of computation that employs synchronous events and event priority relations to ensure deterministic execution. For this model, we propose an online scheduling policy and establish a link to a well-developed scheduling theory. We also implement this model in publicly available prototype tools and evaluate them on state-of-the art multi-core hardware, with a streaming benchmark and an avionics case study. Download Paper (PDF; Only available from the DATE venue WiFi)
16:30	12.4.2	PRE-SIMULATION SYMBOLIC ANALYSIS OF SYNCHRONIZATION ISSUES BETWEEN DISCRETE EVENT AND TIMED DATA FLOW MODELS OF COMPUTATION Speakers: Liliana Andrade¹, Torsten Maehne¹, Alain Vachoux², Cédric Ben Aoun¹, François Pecheux¹ and Marie-Minerve Louerat³ ¹Pierre et Marie Curie University, LIP6, FR; ²École Polytechnique Fédérale de Lausanne (EPFL), CH; ³University Pierre et Marie Curie, FR Abstract The SystemC AMS extensions support heterogeneous modeling and make use of several Models of Computation (MoCs) that operate on different time scales in the Discrete Event (DE), Discrete Time (DT), and Continuous Time (CT) domains. The simulation of such heterogeneous models may raise synchronization problems that are hard to diagnose and to fix, especially when considering multi-rate data flow parts. In this paper, we show how to formally analyze the execution of Timed Data Flow (TDF) models including their interaction with the DE domain by converting the synchronization mechanics into a Coloured Petri Net (CPN) equivalent. The developed symbolic execution algorithm for the CPN allows to detect all DE-TDF synchronization issues before simulation and to propose appropriate sample delay settings for the TDF converter ports to make the system schedulable. The presented technique is validated with a case study including a vibration sensor model and its digital front end. Download Paper (PDF; Only available from the DATE venue WiFi)
17:00	12.4.3	FORMAL CONSISTENCY CHECKING OVER SPECIFICATIONS IN NATURAL LANGUAGES Speakers: Rongjie Yan¹, Chih-Hong Cheng² and Yesheng Chai³ ¹Institute of Software, Chinese Academy of Sciences, CN; ²ABB Corporate Research, DE; ³School of Computer Science & Technology, Soochow University, CN Abstract Early stages of system development involve outlining desired features such as functionality, availability, or usability. Specifications are derived from these features that concretize vague ideas presented in natural languages. The challenge for the verification and validation of specifications arises from the syntax and semantic gap between different representations and the need of automatic tools. In this paper, we present a requirement-consistency maintenance framework to produce consistent representations. The first part is the automatic translation from natural languages describing functionalities to formal logic with an abstraction of time. It extends pure syntactic parsing by adding semantic reasoning and the support of partitioning input and output variables. The second part is the use of synthesis techniques to examine if the requirements are consistent in terms of realizability. When the process fails, the formulas that cause the inconsistency are reported to locate the problem. Download Paper (PDF; Only available from the DATE venue WiFi)
17:30		End of session

12.5 New Perspectives in Next-Generation Medical Systems

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Meije

Chair:
Martin Rajman, École Polytechnique Fédérale de Lausanne (EPFL), CH

Co-Chair:
Giovanni De Micheli, École Polytechnique Fédérale de Lausanne (EPFL), CH

Missing Description

Time	Label	Presentation Title Authors
16:00	12.5.1	TACKLING THE BOTTLENECK OF DELAY TABLES IN 3D ULTRASOUND IMAGING Speakers: Aya Ibrahim¹, Pascal Hager², Andrea Bartolini³, Federico Angiolini¹, Marcel Arditi⁴, Luca Benini⁵ and Giovanni De Micheli¹ ¹École Polytechnique Fédérale de Lausanne (EPFL), CH; ²Swiss Federal Institute of Technology in Zurich (ETHZ), CH; ³Università di Bologna, IT / ETH Zürich, CH; ⁴EPFL, CH; ⁵Università di Bologna / ETH Zürich, IT Abstract 3D ultrasound imaging is quickly becoming a reference technique for high-quality, accurate, expressive diagnostic medical imaging. Unfortunately, its computation requirements are huge and, today, demand expensive, power-hungry, bulky processing resources. A key bottleneck is the receive beamforming operation, which requires the application of many permutations of fine-grained delays among the digitized received echoes. To apply these delays in the digital domain, in principle large tables (billions of coefficients) are needed, and the access bandwidth to these tables can reach multiple TB/s, meaning that their storage both on-chip and off-chip is impractical. However, smarter implementations of the delay generation function, including forgoing the tables altogether, are possible. In this paper we explore efficient strategies to compute the delay function that controls the reconstruction of the image, and present a feasibility analysis for an FPGA platform. Download Paper (PDF; Only available from the DATE venue WiFi)
16:30	12.5.2	INTEGRATED CMOS RECEIVER FOR WEARABLE COIL ARRAYS IN MRI APPLICATIONS Speakers: Benjamin Sporrer¹, Luca Bettini², Christian Vogt², Andreas Mehmann², Jonas Reber³, Josip Marjanovic³, Thomas Burger², David Brunner³, Gerhard Tröster², Klaas P. Prüssmann³ and Qiuting Huang² ¹Integrated Systems Laboratory, Swiss Federal Institute of Technology (ETH), CH; ²Swiss Federal Institute of Technology in Zurich (ETHZ), CH; ³Swiss Federal Institute of Technology in Zurich (ETHZ) / University of Zurich (UZH), CH Abstract Surface coil arrays brought in proximity of the human body enhance the performance of an MRI measurement both in speed and signal-to-noise ratio. However, size and cabling of such arrays can deteriorate the performance of the imaging, or put at risk the safety of the patient. An integrated CMOS direct conversion receiver is proposed, to be placed directly onto the receive coil and enhance the usability. The integrated design needs to preserve the high performance (both in silent noise figure and dynamic range) of discrete solutions, which benefit from dedicated technologies for every receiver sub-block. To exploit the full potential of a coil array, the receiver on each module must also minimize the coupling to nearby modules. The PCB carrying the ASIC will be fabricated with flexible substrate materials to further enhance the wearability and comfort for the patient. Such a modular approach together with the transmission of data over optical fibers results in a lightweight system that allows us to achieve fast development times. Download Paper (PDF; Only available from the DATE venue WiFi)
17:00	12.5.3	TACTILE PROSTHETICS IN WISESKIN Speakers: John Farserotu¹, Jean-Dominique Decotignie¹, Vladimir Kopta¹, Daniel Camilo Rojas Quirós¹, Pierre-Nicolas Volpe¹, Jacek Baborowski¹, Christian Enz², Stéphanie Lacour², Hadrien Michaud², Roberto Martuzzi², Volker Koch³, Huaiqi Huang³, Tao Li³ and Christian Antfolk⁴ ¹CSEM, CH; ²École Polytechnique Fédérale de Lausanne (EPFL), CH; ³BFH, CH; ⁴Lundt University, SE Download Paper (PDF; Only available from the DATE venue WiFi)
17:30		End of session

12.6 Medical Design Automation: Is All That Simulation and Model Reduction Getting Into Your "Head"?

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Bayard

Organisers:
Luis Miguel Silveira, INESC-ID, PT
Luca Daniel, MIT, US

Chair:
Luca Daniel, MIT, US

Co-Chair:
Luis Miguel Silveira, INESC-ID, PT

Tools and techniques originally developed by the Electronic Design Automation community for parasitic extraction, model reduction, or circuit simulation are having deep impact in alternative and exiting fields outside of the circuit world. In particular, this session shows several applications of such techniques to analyzing the functionality of the brain and of the nervous system, as well aiding the design of biomedical and medical instrumentation and diagnostics.

Time	Label	Presentation Title Authors
16:00	12.6.1	THE OLD, THE NEW, AND THE RECYCLED - EDA ALGORITHMS IN CONNECTOMIC Speaker: Lou Scheffer, Howard Hughes Medical Institute, US Abstract Connectomics seeks to extract detailed wiring diagrams of circuits of the nervous system. This makes it a combination of reverse engineering (as applied to chips), parasitic extraction, and model reduction. As biologists extract and work with the larger neural circuits that can now be extracted, they are running into many of the same problems that EDA faced long ago. This talk compares and contrasts connectomics with the equivalent processes for chips, notes the differences and similarities, and shows where algorithms developed for EDA can help connectomics.
16:30	12.6.2	COMPUTATIONAL MODELING AND SIMULATION OF SYNCHRONIZED FIRING BEHAVIORS OF THE BRAIN Speaker: Peng Li, Texas A&M, US Abstract Computational simulation is a critical enabler for understanding complex functions and neuronal dynamics of mammalian brains. However, several grant challenges, such as retaining biological realism in computer-based models, obtaining and managing a vast amount of biological data, and tackling high computational complexity, exist. Nevertheless, efficient computational techniques, capable of simulating large neural networks with biophysically accurate neuron models, are highly desirable. Such capability will fundamentally enable the test of hypotheses of neurological disorders and development of therapeutic treatments, as well as stimulate new engineering applications. In this talk, we will show how neuronal models of different complexities (behavioral oscillator models vs. Hodgkin-Huxley models) and global connectivity data may be leveraged to reason about the origins of oscillatory behaviors of the brain. The key focus of the talk will be placed on a large-scale biophysically detailed thalamocortical model and parallel numerical techniques that have been developed to efficiently handle widely spread time scales in the network. Our results suggest that computational techniques may shed light on the causes of absence seizures by associating abnormal brain level oscillation with several key cellular level mechanisms.
17:00	12.6.3	ELECTROMAGNETIC POWER DEPOSITION ANALYSIS TOOL FOR HIGH RESOLUTION MAGNETIC RESONANCE IMAGING BRAIN SCANS Speakers: Jorge F. Villena¹, Athanasios G. Polimeridis¹, Lawrence L. Wald², Elfar Adalsteinsson¹, Jakob K. White¹ and Luca Daniel¹ ¹Massachusetts Institute of Technology, US; ²Massachusetts General Hospital, Harvard Medical School, US Abstract MARIE (MAgnetic Resonance Integral Equation suite) is an open domain numerical software platform for fast electromagnetic (EM) analysis and design of Magnetic Resonance Imagine (MRI) scanners. The tool is based on a combination of surface and volume integral equation formulations. It exploits the characteristics of the different parts of an MRI system (coil array, shield and realistic body model), and it applies sophisticated numerical methods to rapidly perform all the required EM simulations to characterize the MRI design: computing the un-tuned coil port parameters; obtaining the current distribution for the tuned coils, and the corresponding electromagnetic field distribution in the inhomogeneous body for each transmit channel. The software runs on MATLAB and is able to solve a complex scattering problem in ~2-3 min. on a standard single GPU-accelerated windows desktop machine. On the same platform it can perform a frequency sweep of a complex coil in ~3-5 min. per frequency point. Furthermore, it can solve the complete inhomogeneous body and coil system in ~5-10 min. per port, depending on the model resolution and error tolerance required. The software could potentially be employed also on more advanced analyses, such as the generation of ultimate intrinsic Signal to Noise Ration (SNR) and Specific Absorption Rare (SAR) on realistic body models, fast coil design and optimization, and generation of patient specific protocols.
17:30		End of session

12.7 Brain Health and Mental Disorders: new challenges for electronic engineers

Date: Thursday 12 March 2015
Time: 16:00 - 17:30
Location / Room: Les Bans

Organiser:
Jo De Boeck, IMEC, BE

Chair:
Pablo Laguna, CIBER-BBN, ES

Co-Chair:
Josep Maria Haro, Parc Sanitari Sant Joan de Deu, ES

Taking well-known biomarkers of mental disorders together with some other indicators from physiological signals, a multiparametric marker can be elaborated from these constellations of individual data. This session will highlight the relevance of this kind of disorders, the proposed approach and where future opportunities lie for the broad DATE community.

Time	Label	Presentation Title Authors
16:00	12.7.1	TOWARDS A QUANTITATIVE MEASUREMENT OF MENTAL DISORDERS Speaker: Jordi Aguiló, CIBER-BBN, Centro Nacional de Microelectrónica, Universitat Autònoma de Barcelona, ES Abstract The population of Europe is aging at an unprecedented speed as the result of declining reproduction rates and increasing life expectancy. Today, chronic diseases are the main cause of illness in old age but mental disorders such as dementia and late-life depression play also a significant role as well as epilepsy and Alzheimer. Besides, because of the constant pressure the modern way of life imposes on the individuals, stress is also dramatically growing-up to point that the World Health Organization called it a World Wide Epidemic. In particular, the number of patients with dementia is expected to rise sharply as the prevalence of dementia doubles every 5.2 years exponentially between 65 and 85 years of age [3]. Late-life depression is also an important public health problem. Estimated 1-year prevalence rates for depression range from 3% to 10% [6]. For dementia, prevalence rates are estimated between 0.6% and 3.7% for 65 to 69 year olds and 25.2% to 75% for adults 90 years old or older. Additionally, multiple studies have shown high comorbidity between mental disorders and chronic physical illnesses in the elderly. Positive feedback has also been demonstrated between different diseases such as diabetes mellitus, cancer and cardiovascular disease and the occurrence of depression in the elderly. And vice-versa, in older populations there is also a positive feedback between depression and hypertension, diabetes and cardiovascular illnesses. Although the relationship between some biomarkers and mental or acute physical disorders has been known for a long time, due to the complexity of etiology of mental disorders none of them have been considered as gold standard reference. Recently, novel indicators such as the heart rate variability, respiratory abnormalities, neuropeptide Y (NPY) as well as inflammation biomarkers such as interleukin-6 or TNF-a has been proposed. In parallel, computing power has dramatically increased whilst technically and economically viable, non-invasive, reliable and efficient sensors are becoming usable. Taking advantage of these advances, multiparametric markers can be elaborated putting together these constellations of symptoms and indicators. These new multiparametric biomarkers will probably allow a quantitative assessment of the severity of mental disorders. In this session, we will review etiology, the most relevant symptoms, the recent results and a summary of the roadmap for mental health research in Europe; the new trends on using electrophysiological signals to evaluate psychophysiological states as well as the contribution of new nanomaterials in the setup of new Micro-Nano-Bio Systems for diagnosis of mental disorders.
16:15	12.7.2	IMPROVING THE MONITORING AND THE UNDERSTANDING OF MENTAL DISORDERS Speakers: Giovanni de Girolamo¹ and Josep Maria Haro² ¹IRCCS Fatebenefratelli, IT; ²Parc Sanitari Sant Joan de Deu, ES Abstract Classification and diagnosis of mental disorders is nowadays based on a descriptive taxonomy and is still lacking of biological markers. Devices and techniques coming from new technologies will allow real-time assessment of selected neurophysiological patterns will open the way to a new understanding of mental disorders, making possible new strategies on assessment of patients. In this presentation we will critically discuss these developments.
16:40	12.7.3	WORLD ANALYSIS OF NON-INVASIVE CARDIOVASCULAR SIGNALS FOR THE MONITORING OF PSYCHOPHYSIOLOGICAL STATES Speakers: Michele Orini¹ and Pablo Laguna² ¹Institute of Cardiovascular Science, University College London, GB; ²CIBER-BBN, Abstract The recent advances in biomedical electronics are paving the road to a new paradigm in health care. Stress and some mental disorders are related to the cardiovascular function in such a way that ECG signals can be used to continuously monitoring the psychophysiological state of patients. We review these methodologies within the new context.
17:05	12.7.4	HEALTHCARE IN AN INTEGRATED DIGITAL WORLD Speaker: Arben Merkoçi, Catalan Institution for Research and Advanced Studies (ICREA) and Institut Català de Nanociència i Nanotecnologia (ICN2), ES Abstract Example designs of nanomaterials-based biosystems related to various clinical biomarkers including neurodegenerative disease will be shown. The developed devices and strategies are intended to be of low cost while offering high analytical performance in screening diagnostic scenarios.
17:30		End of session

< Return to last page

Visit us at DATE 2015

Booth: 2+3

Booth: 11

Booth: 1

Booth: 5

Booth: 4

Booth: 9

Booth: 8

Booth: 10

Booth: 14

Submissions