Friday Workshops

W2 ES4CPS - Engineering Simulations for Cyber-Physical Systems


 08:30am - 08:45am Opening session
 08:45am - 10:00am Invited talk: Eric Coelingh, Volvo Car Corporation:

"From SARTRE towards Autonomous Driving - An Experience Report and Outlook"

 10:00am - 10:30am Coffee break
10:30am - 12:00pm
Benjamin Vedder, Thomas Arts, Jonny Vinter and Magnus Jonsson: “Combining Fault-Injection with Property-Based Testing”
Krishnan Srinivasarengan, Goutam Y G and Girish Chandra: “Home Energy Simulation for Non-Intrusive Load Monitoring Applications”
Shivam Bhasin, Tarik Graba, Jean-Luc Danger, Yves Mathieu, Daisuke Fujimoto and Makoto Nagata: “Physical Security Evaluation at an Early Design-Phase: A Side-Channel Aware Simulation Methodology”
 12:00pm - 01:00pm Lunch
01:00pm - 02:30pm
Ashur Rafiev, Alexei Iliasov, Alexander Romanovsky, Andrey Mokhov, Fei Xia and Alex Yakovlev: “ArchOn: Architecture-open Resource-driven Cross-layer Modelling Framework”
Peter Kourzanov: “DSL methods for CPS simulation in the cloud”
Md. Abdullah Al Mamun and Jörgen Hansson: “Reducing Simulation Testing Time by Parallel Execution of Loosely-Coupled Segments of a Test Scenario”
 02:30pm - 03:00pm  Coffee break
 03:00pm - 04:00pm
Delf Block, Sönke Heeren, Stefan Kühnel, Andre Leschke, Bernhard Rumpe and Vladislavs Serebro: “Simulations on Consumer Tests: A Perspective for Driver Assistance Systems”
Daniel Cesarini, Luca Cassano, Alessio Fagioli and Marco Avvenuti: “Modeling and Simulation of Energy-Aware Adaptive Policies for Automatic Weather Stations”
 04:00pm - 05:00pm  Final discussion, closing session, and final remarks.

W4 Workshop on Design Automation for Understanding Hardware Designs


8:30 - 8:45 Opening session 
Chairs: Görschwin Fey, Emmanuelle Encrenaz-Tiphène
8:45 - 9:30 Invited talk 
Managing Design Knowledge for IP Cores – State-of-the-art and Open Questions 
Alexander Rath 
Infineon Technologies AG, Munich, Germany
9:30 - 10:30 Technical session: Formal and semi-formal 

Automatic identification of logical relationships among internal signals with small numbers of test vectors
Masahiro Fujita, Takeshi Matsumoto and Satoshi Jo
University of Tokyo, Japan 

Using Natural Language Documentation in the Formal Verification of Hardware Designs
Christopher Harris and Ian Harris
University of California, Irvine, USA 

Understanding Compound Systems from their Components' Properties
Syed-Hussein Syed-Alwi and Emmanuelle Encrenaz
Université Pierre et Marie Curie Paris 6, France 

Design Understanding with Fast Prototyping from Assertions 
Katell Morin-Allory, Fatemeh Javaheri and Dominique Borrione
Univ. Grenoble Alpes, Grenoble, France
10:30 - 11:00 Coffee break & poster presentations

Detecting Concurrency Problems in System Level Designs
Alper Sen and Onder Kalaci
Bogazici University, Istanbul, Turkey
Automatically connecting hardware blocks via light-weight matching techniques
Jan Malburg1, Niklas Krafczyk1 and Goerschwin Fey1,2
1University of Bremen, Germany
2German Aerospace Center, Bremen, Germany
Exact Solution for Trace Signal Selection with Pseudo Boolean Optimization (PBO)
Shridhar Choudhary, Kousuke Oshima, Amir Masoud Gharehbaghi, Takeshi Matsumoto and Masahiro Fujita
The University of Tokyo, Tokyo, Japan
11:00 - 12:00 Invited talk 
Capturing and Validating Design Understanding using Formal Properties 
Raik Brinkmann 
OneSpin Solutions GmbH, Munich, Germany
12:00 - 13:00 Lunch
13:00 - 13:45 Invited talk 
Design Understanding in SOC Development - Recent Advances and New Challenges 
Lyes Benalycherif 
STMicroelectronics, Grenoble, France
13:45 - 14:15 Technical session: System level productivity

DiplodocusDF: Analyzing Hardware/Software Interactions with a Dinosaur
Andrea Enrici, Ludovic Apvrille and Renaud Pacalet
Telecom ParisTech, Biot, France 

Towards a Multi-dimensional and Dynamic Visualization for ESL Designs
Jannis Stoppe1, Marc Michael2, Mathias Soeken1,2, Robert Wille1,2,3 and Rolf Drechsler1,2
1DFKI GmbH, Bremen, Germany 
2University of Bremen, Bremen, Germany 
3Technical University Dresden, Germany 
14:15 - 15:00 Invited talk 
Software Reverse Engineering 
Rainer Koschke 
University of Bremen, Germany 
15:00 - 15:30 Coffee break & poster presentations
15:30 - 16:15 Technical session: Reverse and automatic engineering

Increasing Verilog’s Generative Power
Cherif Salama1 and Walid Taha2
1Ain Shams University, Cairo, Egypt 
2Halmstad University, Halmstad, Sweden 

zamiaCAD: Understand, Develop and Debug Hardware Designs
Maksim Jenihhin1, Valentin Tihhomirov1, Syed Saif Abrar1, Jaan Raik1 and Guenter Bartsch2
1Tallinn University of Technology, Estonia 
2zamiaCAD, Germany 

Mutation based Feature Localization
Jan Malburg1, Emmanuelle Encrenaz-Tiphène2 and Goerschwin Fey1,3
1University of Bremen, Germany 
2Université Pierre et Marie Curie Paris 6, France 
3German Aerospace Center, Bremen, Germany 
16:15 - 17:00 Panel
Design understanding – where do industry and academia team up?
Panelists: Ian Harris, Lyes Benalycherif, Raik Brinkmann

The panel will summarize the results of the day and prioritize topics focusing on three questions:
  • What are the most urgent topics from an industrial perspective?
  • What are the most challenging topics from an academic perspective?
  • Where can we exploit synergies between academia and industry?

W5 3D Integration: Applications, Technology, Architecture, Design, Automation, and Test



Chair: Paul Franzon, North Carolina State University, US


08:15  Welcome Address

Saqib Khursheed, University of Liverpool, UK


08:20 Keynote Presentation: 3D Technology – Key Enabler for 3D Heterogeneous Integration

Jürgen Wolf, Fraunhofer IZM, DE



3D integration is a key technology enabling microelectronics to meet growing demands for more functionality, higher performance, miniaturization, and lower cost, and it is becoming important in application areas such as cyber-physical systems, the Internet of Things, ambient assisted living (AAL), information and communication, security, and health. Interposers with through-silicon vias (TSVs) are becoming a very important element and a key enabler for realizing 3D systems-in-package (SiPs), whose main advantage is the decoupling of front-end and back-end processing for implementing the TSVs and redistribution layers (RDLs) that integrate multiple devices into a wafer-level system in package (WL-SiP). Specific applications call for technical approaches ranging from high-density TSV integration and high-density RDLs for digital applications to interposers for RF applications, MEMS and sensor integration, and optical interconnects. The presentation will highlight results and technical achievements in 3D integration using TSV interposers and will also address the broad spectrum of design, technology, and reliability topics related to 3D systems.



M. Jürgen Wolf studied electrical engineering and joined the Fraunhofer Institute for Reliability and Microintegration (IZM) in 1994, working in the field of wafer-level packaging and system in package (SiP). Since 2011 he has been head of the Wafer Level System Integration department and is also responsible for the management of “ASSID - All Silicon System Integration Dresden”. He is also involved in a number of research projects at the national, European, and international level. Wolf is a European representative in the ITRS technical working group Assembly & Packaging, a board member of EURIPIDES and JISSO, and a member of IEEE/SMTA. Furthermore, he is a representative of the Fraunhofer Cluster 3D Integration.

09:00  Special Session: Reliability and Thermal issues in 3D ICs


09:00  Overview of 3D-Reliability Research in Imec

Kristof Croes – IMEC, BE


09:20  Advanced Failure Analysis Techniques for 3D Packages

Frank Altmann - Fraunhofer IWM Halle, DE


09:40  Research Directions on Thermal Impact of 3D Assembly

Haykel Ben Jaama - CEA-LETI, FR


10:00  SESSION 2: Coffee Break & Posters


10:30  SESSION 3: Invited Talk and Panel


10:30  Invited Talk: Heterogeneous Sensor Integration; Increased Technology Readiness Level

Maaike Visser - SINTEF, Norway


11:00  Panel Session: Are Slow Standardization and CAD-Tool Development Hindering the Progress of 3D IC Design and Integration?


Moderator: Françoise Von Trapp – “Queen of 3D”, 3DInCites, US


Panelists:

Brandon Wang – Cadence Design Systems, US

Juergen Schloeffel – Mentor Graphics, DE

Makoto Nagata – Kobe University, JP

Mustafa Badaroglu – Qualcomm Technologies, BE




13:00 SESSION 4: Technology and Design Challenges for 3D ICs

Chair: Thomas Thärigen, Cascade Microtec GmbH, DE


13:00 Integration of Through-Silicon Vias in a High Performance BiCMOS Technology for RF Grounding and 3D Integration

M. Wietstruck1, M. Kaynak1, S. Marschmeyer1, K. Zoschke2, and B. Tillack1,3

1 IHP, DE; 2 Fraunhofer IZM, DE; 3 Technische Universität Berlin, DE


13:18 2.5D & 3D Technologies require Innovative Lithography Solutions

Klaus Ruhmer, Philippe Cochet, Roger McCleary

Rudolph Technologies, US


13:36 3D Wirebondless IGBT Module for High Power Applications

Z. Y. Gao1, Y. X. Ren1, Y.C. Lee2, H.L. Yiu2, X.Q. Shi1

1 Hong Kong Applied Science & Technology Research Institute (ASTRI), HK; 2Hong Kong Science & Technology Parks Corporation (HKSTP), HK


13:54 Towards Trustworthy NoC-Based 3D-MPSoCs

Johanna Sepúlveda1,2, Guy Gogniat2, Marius Strum1

1 University of São Paulo, BR;  2 LAB-STICC, Lorient, FR


14:12 A TSV-Property-aware Synthesis Method for Application Specific 3D-NoCs

Felix Miller, Thomas Wild, Andreas Herkersdorf, Vladimir Todorov, Daniel Mueller-Gritschneder, Ulf Schlichtmann

Technische Universität München, DE


14:30   SESSION 5: Coffee Break & Posters


15:00   SESSION 6: Test and Thermal Challenges for 3D ICs

Chair: Basel Halak, University of Southampton, UK


15:00   Design, Test Generation, Processing, and Pre- and Post-Bond Measurement Results of a 3D-DfT Demonstrator Chip Stack

Erik Jan Marinissen1, Bart De Wachter1, Stephen O’Loughlin1, Sergej Deutsch2, Christos Papameletis2, Tobias Burgherr2

1IMEC, BE ; 2Cadence Design Systems, DE


15:18    Power and DFT Aware Partitioning for 3D-SOCs

Amit Kumar and Sudhakar M. Reddy

University of Iowa, US


15:36   System Level Thermal Modelling for 3D IC: A Memory-on-Logic 3D Test Case Study

Cristiano Santos1,3, Pascal Vivet1, Denis Dutoit1, Philippe Garrault2, Nicolas Peltier2, Ricardo Reis3



15:54   Thermal Power Plane enabling Dual-Side Electrical Interconnects supporting High-Performance Chip Stacking

Thomas Brunschwiler1, Stefano Oggioni2, Timo Tick1, Gerd Schlottig1, Hubert Harrer3

1IBM Research, Zurich, CH; 2IBM ISC, Milan, IT; 3IBM STG, Böblingen, DE


16:12  Thermal Coupling in TSV-Based 3-D Integrated Circuits

Ioannis Savidis1 and Eby G. Friedman2

1Drexel University, US;  2University of Rochester, US


16:30  CLOSE 


A Novel Low-Power TSV Interconnection Scheme Based On Adiabatic Energy-Recovery Logic

Khaled Salah

Mentor Graphics, Cairo, Egypt


3D IC Test through Power Line Methodology

Alberto Pagani, Alessandro Motta

STMicroelectronics – SPA FMTR&D (Sense, Power & Automotive Front-end Manufacturing & Technology R&D)


2.5D Test Cost Optimization using 3D-COSTAR

Mottaqiallah Taouil1, Said Hamdioui1, Erik Jan Marinissen2 and Sudipta Bhawmik3

1Delft University of Technology, NL; 2IMEC, BE; 3Qualcomm, US


Test Pattern Retargeting in 3D SICs Using an IEEE P1687 based 3DFT architecture

Yassine Fkih1,2, Pascal Vivet1, Bruno Rouzeyre2, Marie-Lise Flottes2, Giorgio Di Natale2, Juergen Schloeffel3

1CEA-Leti, MINATEC Campus, FR, 2LIRMM, Univ Montpellier II/CNRS, FR, 3Mentor Graphics, DE


Impact Analysis of Through-Silicon-Via Variation on Performance and Energy Consumption of 3D Networks-on-Chip Architectures

Michael Opoku Agyeman, Ali Ahmadinia

School of Engineering and Built Environment, Glasgow Caledonian University, Glasgow, UK


Processing and Microstructure of Solid-Liquid Interdiffusion Interconnects for 3D Integration

Iuliana Panchenko1, Juergen Grafe1, Maik Mueller2, Klaus-Juergen Wolter2, M. Juergen Wolf1, Klaus-Dieter Lang1

1Fraunhofer IZM ASSID, Moritzburg, Germany, 2Electronics Packaging Laboratory, TU Dresden, Dresden, Germany



M. Juergen Wolf, K.-D. Lang

Fraunhofer IZM ASSID, Berlin, Dresden, Germany

W8 3PMCES - Performance, Power and Predictability of Many-Core Embedded Systems


08:30 – 08:45

Opening Session
Adam Morawiec, Tapani Ahonen, and Walter Stechele

08:45 – 09:30  

Keynote Presentation

Trustworthy Contract Computing through Seamless System Build and Operation
Tapani Ahonen, Tampere University of Technology, FI

Computing systems are built and operated in a way akin to manufacturing pipelines. In a pipeline organization, all stages need to operate simultaneously without significant disruptions. When one stage fails, the others come to a halt, either immediately or after a short grace period enabled by a buffer queue. The ability to fix emerging issues as quickly as possible is crucial for maintaining system functionality and throughput. Hierarchically organized management is designed to keep all the individual stages operating efficiently. Such management organizations are prone to introducing inefficiencies. Hierarchies become deeper as system complexity grows. At higher levels of management, the area of responsibility is wider while the capability to directly control low-level operations is weaker. Information exchange and instruction chains are usually cumbersome when they span many levels of the organization. This is partly because operations reserve information for internal use only and partly because of restrictive exchange interfaces. Incoherent information across the different operations in the organization often leads to unnecessary functional redundancy and to indirect, wasteful methods of executing joint functions.

What purpose does it serve to encapsulate information in one operation or stage only? In a pure pipeline organization, each stage is supposed to execute a highly specialized sub-system consisting of fully independent operations that no other stage is capable of. Special tools and components for one operation are in the possession of one stage only. There it makes perfect sense to decline internal cooperation in order to ensure local control, as the other stages are not needed and their involvement would not help. However, this is not the case for computing systems. Software functions cannot be executed without supporting hardware, and most application code cannot be executed without supporting operating system functions. Design-time tools cannot produce good results without detailed information about the run-time environment.

09:30 – 10:00  

Session 1:

Predictable Portability of Parallel Code Based on Increased Semantic Information Sharing and Fork-Join Programming
Konstantin Popov, SICS Swedish ICT AB, SE

Heterogeneous parallel computing seems to be the way forward in all market segments where computers are used. Parallelism ranges from a few cores in small embedded systems to hundreds or even thousands in the HPC domain. Heterogeneity ranges from capability-heterogeneous to functionally heterogeneous systems with multiple completely different compute engines, such as GPGPUs or HW accelerators. This poses a tremendous challenge for software developers, who currently need to tailor their software to each individual platform. In this talk, I present an approach to ease the programming burden by means of open semantic information interfaces across software abstraction layers and by using programming models that are amenable to analysis, modeling, and good run-time scheduling decisions.

10:00 – 11:00

Session 2: Managing Execution in Dynamic Environments

Efficient Leader Election for Synchronous Shared-Memory Systems
Vicent Sanz Marco, Raimund Kirner, and Michael Zolda, University of Hertfordshire, UK

Leader election is a frequent problem in systems where it is important to coordinate the activities of a group of actors. It has been studied extensively in the context of networked systems, but with the rise of many-core computer architectures, it has also become important for shared-memory systems. In this paper, we present an efficient leader election technique for synchronous shared-memory systems. In our context, synchronous means that the response time of code sections with relevant communication patterns is bounded. This makes our approach more efficient than leader election methods explored for asynchronous shared-memory systems. Our leader election method is used to help make the LPEL scheduling layer fault tolerant.

A Proposal on Parallel Software Development for Network-on-Chip based Many-Core system (Short Paper)
Guoqing Zhang and Tapani Ahonen, Tampere University of Technology, FI

This paper gives a proposal on parallel software development for Network-on-Chip (NoC) based many-core systems. In the proposal, we recommend using SLURM, open-source software widely used in computer clusters, for hardware mapping and resource management, and the Message Passing Interface (MPI) to handle communication among cores in NoC systems. We suggest using the partition concept in SLURM for hardware mapping, especially for safety-critical systems that require separate hardware mappings for safety-critical and non-safety-critical tasks. We also demonstrate a methodology for porting the Symmetric Multi-Processor (SMP) architecture of Linux onto SLURM-and-MPI-enabled NoC systems by employing the scheduling domain concept of the Linux kernel. In this methodology, the cores of a NoC system are separated into partitions according to the hardware topology, to handle I/O-bound and CPU-bound processes respectively.

A Power-Aware Framework for Executing Streaming Programs on Networks-on-Chip (Short Paper)
Nilesh Karavadara, Simon Folie, Michael Zolda, Nga Nguyen, and Raimund Kirner, University of Hertfordshire, UK

Software developers are discovering that practices which have successfully served single-core platforms for decades no longer work for multi-cores. Stream processing is a parallel execution model that is well suited to architectures with multiple computational elements connected by a network. We propose a power-aware streaming execution layer for network-on-chip architectures that addresses the energy constraints of embedded devices. Our proof-of-concept implementation targets the Intel SCC processor, which connects 48 cores via a network-on-chip. We motivate our design decisions and describe the status of our implementation.

11:00 – 11:30

Coffee Break and Poster Session

11:30 – 12:00  

Invited Talk:

Performance Prediction and Software Development on Many-core Processor Platform
Benoit Dupont de Dinechin, Kalray, FR

Computation-intensive embedded applications are often constrained by end-to-end latency. Classic platforms that support these applications rely on FPGAs or farms of digital signal processors that run under the supervision of microcontrollers. There are significant advantages to hosting those applications on many-core platforms, in particular reducing system size, weight, and power (SWaP) while improving programmability. However, existing many-core platforms based on CPUs or GPUs and their associated software stacks do not allow for predictable or even repeatable processing latencies.

In this keynote, the architecture and programming models of the MPPA-256 processor will be presented. It integrates 256 processing engine (PE) cores and 32 resource management (RM) VLIW cores on a single 28nm CMOS chip. We discuss how this processor and its programming models are especially suited for computation-intensive applications under latency constraints. We also introduce the metalibm library generator for the Kalray VLIW cores, which automatically produces high-performance, correctly rounded implementations of application-specific specializations of the C99 libm and IEEE 754-2008 mathematical functions.

12:00 – 13:00  

Lunch and Poster Session

13:00 – 14:00

Session 3: Reliability and Safety of Multi-Core Platforms

Hardware/Software-based Runtime Control of Multicore Processors-on-Chip for Reliability Management under Aging Constraints
Walter Stechele and Erol Koser, Technical University Munich, DE

Multicore processors-on-chip are gaining interest in safety-critical domains such as aeronautics, automotive, and medical applications. Traditional reliability methods, e.g., triple modular redundancy, might be too expensive; therefore, recent research is turning towards reliability-aware runtime management of multicore processors, including dynamic voltage and frequency scaling and dynamic load distribution. We will present a case study that introduces a reliability layer between the multiprocessor hardware and the middleware to cope with aging-related degradation.

MPSoC performance degradation (due to aging) predictability at high abstraction level and Applications
Olivier Heron, CEA-LIST, FR

The shrinking size of transistors and nano-wires results in increased device density, increased speed, and reduced power consumption (Moore's Law). However, device reliability is reduced due to the non-ideal scaling of the supply voltage. We observe two trends in the semiconductor industry. First, MOS technology scaling is still continuing (and will continue). The failure physics is becoming more complex, and new failure mechanisms that were negligible in older technology nodes are now emerging. The semiconductor industry has solutions to reach the ITRS requirements until the end of next year; after that, only interim solutions are available. Second, Multi-Processor SoCs (MPSoCs) offer high performance, are cheap, consume less power than high-end processors, and can support a large variety of applications. As MPSoCs are now applied in most market segments (automotive, consumer, HPC, etc.), both high performance and reliability become major concerns, even for non-safety-critical applications. Current academic and commercial CAD tools help designers obtain reliability projections of the chip in the very last design cycles before tape-out. However, MPSoC design and verification raise new challenges. They require new methodologies and CAD tools able to capture both architecture design and reliability at higher abstraction levels, such as the transaction level. In the first development cycles, design space exploration is necessary to analyze different MPSoC configurations (memory sizes, processor pipeline depth, and others) and software, and their impact on performance, power consumption, and reliability. In this talk, I will present a methodology we propose in the RELY project to model and predict reliability at a high abstraction level, and I will present some results from an MPSoC case study.

14:00 – 14:30  

Session 4: European Projects Cluster

European Project Cluster on Mixed-Criticality Systems
Salvador Trujillo (IK4-IKERLAN, ES), Roman Obermaisser (University of Siegen, DE), Kim Gruettner (OFFIS – Institute for Information Technology, DE), Francisco J. Cazorla (Barcelona Supercomputing Center and IIIA-CSIC, ES), and Jon Perez (IK4-IKERLAN, ES)

Modern embedded applications already integrate a multitude of functionalities with potentially different criticality levels into a single system and this trend is expected to grow in the near future. Without appropriate preconditions, the integration of mixed-criticality subsystems can lead to a significant and potentially unacceptable increase of engineering and certification costs. There are several ongoing research initiatives studying mixed criticality integration in multicore processors. Key challenges are the combination of software virtualization and hardware segregation and the extension of partitioning mechanisms jointly addressing significant extra-functional requirements (e.g., time, energy and power budgets, adaptivity, reliability, safety, security, volume, weight, etc.) along with development and certification methodology. This paper provides a summary of the challenges to be addressed in the design and development of future mixed criticality systems and the way in which some current European Projects on the topic address those challenges.

14:30 – 15:00  

Coffee Break and Poster Session

15:00 – 16:00  

Session 5: System Design Technologies

Timing Analysis of a Heterogeneous Architecture with Massively Parallel Processor Arrays
Deepak Gangadharan, Alexandru Tanase, Frank Hannig, and Jürgen Teich, University of Erlangen-Nuremberg, DE

In this paper, we present some analytical results from the timing analysis of a heterogeneous architecture with massively parallel processor arrays (MPPA). Specifically, in this work, the MPPA is a tightly-coupled processor array (TCPA). In recent work, the TCPA has been shown to be timing predictable and symbolic loop scheduling has been used to compute predictable schedules for the execution of each application mapped on the TCPA and run in parallel. However, the timing predictability provided by the TCPA can only be ensured if the shared resources on the TCPA tile provide the required input data rates to the TCPA. Towards this, we formulate a condition that needs to be satisfied over the local shared bus for the data transfers from the local memory to the TCPA in order to achieve the required application quality and latency of output data. Further, we also formulate another condition that must be satisfied by DMA data transfers from memory tile to TCPA tile during the arbitration in the memory tile so that the service levels provided by the NoC for the DMA transfers are maximally utilized.

An Accurate Power Estimation Method for MPSoC Based on SystemC Virtual Prototyping
Khouloud Zine Elabidine, LIP6, FR

The paper presents a novel method called DPE (Design Power Estimation), which estimates an SoC’s power consumption efficiently and quickly. Our power modeling consists of defining the functional activities that best characterize each component of the considered platform. In this work, we do not aim at accurate estimations of power consumption; instead, we introduce a method that offers a global power characterization, which helps perform design space exploration at an early stage of the design flow (SystemC virtual prototyping) and find the best trade-off between power and performance.

Parallelization of Object Detection Algorithm through Hardware Threads for MPSoCs
David Watson, Ali Ahmadinia, Gordon Morison, and Tom Buggy, Glasgow Caledonian University, UK

Adapting software applications to multiprocessor systems-on-chip (MPSoCs) typically follows multi-threaded design flows and data dependence analysis to implement concurrency. However, to take advantage of the hardware customizations possible through reconfigurable MPSoCs, hardware threads (HWTs) can be used to increase application concurrency and throughput, while complementing multi-threaded design flows. In this work, we show how applications can be analyzed and tailored to use HWTs to increase their concurrency. We show how task flow graphs (TFGs) can be converted into a Kahn Process Network (KPN), which describes how software tasks can interact with HWTs over FIFOs to maintain memory coherency at the software level. We evaluate our MPSoC designs in terms of performance increase, throughput, and energy efficiency using a data-intensive face detection algorithm, and obtain performance increases of up to 3.6x compared to a software-only implementation, and throughput and energy efficiencies of up to 85.97 MB/s and 11.92 MB/W, respectively.
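The KPN idea in the abstract above can be illustrated with a minimal Python sketch, assuming ordinary threads as stand-ins for both software tasks and hardware threads (HWTs) and unbounded `queue.Queue` objects as the FIFOs. Each process reads with a blocking `get()` and never polls a channel, which is the defining Kahn property that makes the result independent of thread timing. The process names, the end-of-stream token, and the squaring computation are hypothetical placeholders, not the authors' face detection pipeline.

```python
import queue
import threading

# Hypothetical end-of-stream token (an assumption of this sketch).
END = object()


def software_task(out_fifo: queue.Queue, items: list) -> None:
    """Producer process: writes tokens into its output FIFO (writes never block)."""
    for x in items:
        out_fifo.put(x)
    out_fifo.put(END)


def hw_thread_stage(in_fifo: queue.Queue, out_fifo: queue.Queue) -> None:
    """Stand-in for a hardware thread: blocking read, compute, write."""
    while True:
        x = in_fifo.get()  # blocks until a token arrives (Kahn semantics)
        if x is END:
            out_fifo.put(END)
            return
        out_fifo.put(x * x)  # placeholder computation


def collector(in_fifo: queue.Queue, results: list) -> None:
    """Consumer process: drains its input FIFO until end of stream."""
    while True:
        x = in_fifo.get()
        if x is END:
            return
        results.append(x)


def run_kpn(items: list) -> list:
    """Wire three processes together with unbounded FIFOs and run them to completion."""
    fifo_a: queue.Queue = queue.Queue()  # unbounded, so put() never blocks
    fifo_b: queue.Queue = queue.Queue()
    results: list = []
    procs = [
        threading.Thread(target=software_task, args=(fifo_a, items)),
        threading.Thread(target=hw_thread_stage, args=(fifo_a, fifo_b)),
        threading.Thread(target=collector, args=(fifo_b, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

For example, `run_kpn([1, 2, 3])` returns `[1, 4, 9]` on every run, because FIFO order is preserved and no process can observe whether a channel is empty.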

16:00 – 16:45

Panel Session:
What is still needed to have a reliable embedded system development ecosystem in place?

Moderator: Achim Rettberg, OFFIS, Germany

    Sven Karlsson, DTU, DK
    Tapani Ahonen, TUT, FI
    Kim Grüttner, OFFIS, DE
    Benoit Dupont de Dinechin, Kalray, FR
    Walter Stechele, TU Munich, DE

16:45 – 17:00 Closing Session


Adaptive Resource Control in Multi-core Systems
Alexei Iliasov, Ashur Rafiev, Alexander Romanovsky, Andrey Mokhov, Alex Yakovlev, and Fei Xia, Newcastle University, UK

Multi-core systems present a set of unique challenges and opportunities. In this paper we discuss the issues of power-proportional computing in a multi-core environment and argue that a cross-layer approach spanning from hardware to user-facing software is necessary to successfully address this problem.

Criticality-Aware Functionality Allocation for Distributed Multicore Real-Time Systems
Junhe Gan, Paul Pop, and Jan Madsen, Technical University of Denmark, DK

We are interested in the implementation of mixed-criticality hard real-time applications on distributed architectures composed of interconnected multicore processors, where each processing core is called a processing element (PE). The functionality of the mixed-criticality hard real-time applications is captured in the early design stages using functional blocks of different Safety-Integrity Levels (SILs). Before the applications are implemented, the functional blocks have to be decomposed into software tasks with SILs. Then, the software tasks have to be mapped and scheduled on the PEs of the architecture. We consider fixed-priority preemptive scheduling for tasks and non-preemptive scheduling for messages. We would like to determine the function-to-task decomposition, the type of PEs in the architecture, and the mapping of tasks to the PEs, such that the total cost is minimized, the application is schedulable, and the safety and security constraints are satisfied. The total cost captures the development and certification costs and the unit cost of the architecture. We propose a Genetic Algorithm-based approach to solve this two-objective optimization problem and evaluate it using a real-life case study from the automotive industry.

Estimating Video Decoding Energies And Processing Times Utilizing Virtual Hardware
Sebastian Berschneider, Christian Herglotz, Marc Reichenbach, Dietmar Fey, and André Kaup, Friedrich-Alexander-University Erlangen-Nuremberg, DE

The market for embedded devices is growing steadily. In particular, mobile phones and smartphones, which have become important tools for many people, are becoming more and more complex and nowadays serve as portable computers. Energy efficiency is an important problem for these devices. The battery can be discharged within a few hours, especially when a smartphone processes computationally intensive tasks such as video decoding. Therefore, modern devices tend to include power-efficient processors. However, overall power consumption is affected not only by power-efficient hardware; designing algorithms for energy-efficient programming is also an important task. Usually, energy-efficient development is done on real hardware, where programs are executed and their power consumption is measured. This process is highly costly and error-prone, and it requires expensive hardware equipment. Therefore, in this work we present a design methodology that runs the application software on virtual hardware (a virtual CPU) that counts the instructions and memory accesses. By multiplying these counts by previously measured energy and time per instruction, energy and time can be estimated without running the target application on real hardware. As a result, we present a methodology for writing embedded applications with immediate feedback about these non-functional properties.
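The estimation step described above is a weighted sum: for each event class counted by the virtual CPU, the estimated energy is the count times the measured energy per event, summed over all classes, and likewise for time. A minimal Python sketch follows, assuming the counts and per-event costs are already available as dictionaries; the event classes (`alu`, `load`, `store`) and all cost values are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Per-event cost; values used below are hypothetical placeholders."""
    energy_nj: float  # energy per event, in nanojoules
    time_ns: float    # time per event, in nanoseconds


def estimate(counts: dict[str, int], costs: dict[str, Cost]) -> tuple[float, float]:
    """Return (energy in nJ, time in ns) as the sum of count * per-event cost.

    Raises KeyError if a counted event class has no calibrated cost.
    """
    energy = 0.0
    time = 0.0
    for event, n in counts.items():
        c = costs[event]
        energy += n * c.energy_nj
        time += n * c.time_ns
    return energy, time
```

For example, with 1000 ALU instructions at 0.5 nJ / 1.0 ns, 200 loads at 2.0 nJ / 4.0 ns, and 100 stores at 2.5 nJ / 4.0 ns (all hypothetical), `estimate` returns `(1150.0, 2200.0)`, i.e., 1150 nJ and 2200 ns.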

Increased Reliability of Many-Core Platforms through Thermal Feedback Control
Matthias Becker, Kristian Sandström, Moris Behnam, and Thomas Nolte, MRTC / Mälardalen University, SE

In this paper, we present a low-overhead thermal management approach to increase the reliability of many-core embedded real-time systems. Each core is controlled by a feedback controller. We adapt the utilization of the core in order to decrease the dynamic power consumption and thus the corresponding heat generation. Sophisticated control mechanisms allow us to migrate the load in advance, before critical temperature values are reached; thus, migration can be performed safely, with a guarantee that all deadlines are met.
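The abstract does not specify the control law. As an illustration only, the Python sketch below assumes a simple proportional controller per core, with a temperature setpoint placed below the critical value, a gain `kp`, and a minimum utilization `u_min`; all of these are hypothetical parameters. The controller lowers the core's allowed utilization as its temperature rises above the setpoint, and any utilization above that cap is the load to migrate before the critical temperature is reached. It does not model the deadline guarantees the authors describe.

```python
def utilization_cap(temp_c: float, setpoint_c: float, kp: float,
                    u_min: float = 0.1, u_max: float = 1.0) -> float:
    """Proportional control: shrink the allowed utilization as temperature exceeds the setpoint.

    The setpoint is assumed to lie below the critical temperature, so throttling
    (and migration) starts before the critical value is reached.
    """
    error = temp_c - setpoint_c         # positive when the core runs too hot
    cap = u_max - kp * max(error, 0.0)  # no throttling at or below the setpoint
    return min(u_max, max(u_min, cap))  # clamp to [u_min, u_max]


def load_to_migrate(current_util: float, cap: float) -> float:
    """Share of utilization that must move to another core to respect the cap."""
    return max(0.0, current_util - cap)
```

With a setpoint of 70 °C and `kp = 0.05`, a core at 75 °C gets a cap of 0.75; if it is currently 90% utilized, about 0.15 of its load would be migrated.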

Performance Analysis of a Computer Vision Application with the STHORM OpenCL SDK
Vítor Schwambach, Sébastien Cleyet-Merle, Alain Issard, STMicroelectronics, FR and Stéphane Mancini, TIMA lab, FR

Computer vision applications constitute one of the key drivers for embedded many-core architectures. To enable parallel application performance estimation and optimization early in the development flow, the development environment must provide the developer with simulation tools for fast and precise application-level performance analysis. In this work, we port a face detection application onto the STHORM many-core accelerator using the STHORM OpenCL SDK. We compare performance results obtained with the STHORM cycle-approximate simulator and a prototype implementation, and show that there is a large mismatch between them. We identify the key contributors to this mismatch and propose that they be addressed in upcoming versions of the SDK to allow more precise simulation results for early design space exploration.

PSE - Performance Simulation Environment
Jussi Hanhirova and Vesa Hirvisalo, Aalto University, FI

We use a resource-reservation-based simulation environment (PSE) as a research tool to experiment with co-modelling HW/SW schedulers. Our focus is on heterogeneous systems with many-core processors. Task-based processing systems use different load-balancing schemes to make efficient use of resources and to schedule work within real-time constraints. As parallel MPSoCs are constantly evolving, simulation is a viable tool for exploring different configurations.

Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor: Experiments with Algorithm Adaptation.
Mohamed Amine Bergach and Serge Tissot, Kontron, FR, Michel Syska and Robert De Simone, Inria, FR

Recent Intel processors (Ivy Bridge, Haswell) contain an on-chip GPU in addition to the main CPU. In this work we consider how to efficiently map Fast Fourier Transform computation onto such coprocessor units. To achieve this we pursue three goals:

First, we want to study semi-systematic ways to adjust the variant of the FFT algorithm used for a given size so that it best fits the local memory capacity (the registers of a given GPU block) and performs its computations without intermediate accesses to distant memory;

Second, we want to study, by extensive experimentation, whether the remaining data transfers between memories (initial loads and final stores for each FFT computation) can be sustained by the local interconnects at a speed matching the integrated GPU computations or, conversely, whether they degrade performance when computing FFTs on GPUs "at full blast";

Third, we want to record the energy consumption as observed in the previous experiments, and compare it to similar FFT implementations on the CPU side of the chip.

We report our work in this short paper and its companion poster, showing graphical results for a range of experiments. In broad terms, our findings are that GPUs can compute FFTs of typical sizes faster (by a factor of roughly 2) than the internal on-chip interconnects can supply them with data, and that their energy consumption is far lower than on the CPU side.

Smart Scheduling of Streaming Applications via Timed Automata
Waheed Ahmad, Robert de Groote, Philip K.F. Hölzenspies, Mariëlle Stoelinga, and Jaco van de Pol, University of Twente, NL

Streaming applications such as video-in-video and multi-video conferencing place high demands on system performance. On the one hand, they require high system throughput. On the other hand, resource usage must be kept to a minimum in order to save energy. Synchronous dataflow (SDF) graphs are very popular computational models for analysing streaming applications, and they have recently been widely used to analyse such applications both on a single processor and in a multiprocessor context. Smart scheduling techniques are critical for system lifetime, since the maximum throughput should be obtained while using as few resources as possible.

Current methods for calculating the maximum throughput of SDF graphs require either an unbounded number of processors or a static-order schedule of tasks. Other methods involve converting an SDF graph into an equivalent homogeneous SDF (HSDF) graph. This results in a larger graph; in the worst case, the converted HSDF graph can be exponentially larger.
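The growth of the HSDF graph can be illustrated with the SDF repetition vector: in the standard conversion, each SDF actor is expanded into one HSDF actor per firing in an iteration, so the number of HSDF actors is the sum of the repetition vector, which grows with the rate values themselves (and thus exponentially in their encoded size). The Python sketch below solves the balance equations for a connected graph; the function names and example graphs are illustrative assumptions.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def repetition_vector(actors, edges):
    """Smallest positive integer q with q[src]*prod == q[dst]*cons on every edge.

    edges: list of (src, dst, prod, cons) tuples. Assumes the graph is connected.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    for src, dst, prod, cons in edges:  # every balance equation must hold
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph: no finite repetition vector")
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*q.values())
    return {a: n // g for a, n in q.items()}


def hsdf_actor_count(q):
    """Each SDF actor a becomes q[a] actors in the equivalent HSDF graph."""
    return sum(q.values())


if __name__ == "__main__":
    q = repetition_vector(["A", "B"], [("A", "B", 2, 3)])
    print(q, hsdf_actor_count(q))  # {'A': 3, 'B': 2} 5
    q = repetition_vector(["A", "B"], [("A", "B", 1000, 1)])
    print(hsdf_actor_count(q))     # 1001
```

In the second example, a two-actor SDF graph turns into a 1001-actor HSDF graph, which is the blow-up that analysing the SDF graph directly avoids.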

This poster presents an alternative, novel approach to analysing SDF graphs on a given number of processors using a proven formalism for timed systems, Timed Automata (TA). TA are automata in which the passage of time is measured by clock variables. Clock guards indicate the conditions under which a transition can be taken, and invariants specify the conditions under which the system may stay in a given state. Timed automata communicate by hand-shake synchronisation over input and output actions. Output and input actions are denoted by an exclamation mark and a question mark respectively, e.g. fire! and fire?. TA strike a good balance between expressiveness and tractability and are supported by various verification tools, e.g. UPPAAL.

We translate the SDF graph of an application and a given processor architecture into separate timed automata. The two automata synchronise using the actions "req" and "fire". In this way, the timed automaton of the application SDF graph is mapped onto the timed automaton of the architecture model, after which performance can be analysed using different measures of interest.

In particular, the main contributions of this poster are: (1) a compositional translation of SDF graphs into timed automata; (2) exploiting the capabilities of UPPAAL to search the whole state space and find the schedule that fits on the available processors and maximises throughput; (3) finding the maximum throughput on homogeneous and heterogeneous platforms; (4) quantitative model checking. We also demonstrate that deadlock freedom is preserved even if the number of processors varies.

Results show that in some cases the maximum throughput of an SDF graph remains the same even if the number of processors is reduced. Similarly, a trade-off between the given number of processors and the maximum throughput can be obtained efficiently. Moreover, quantitative model checking and verification of user-defined properties can also be carried out using different contemporary model checkers.

Future work includes energy-optimal synthesis and scheduling, translation of SDF graphs into Energy Aware Automata, extension of SDF graphs with energy costs and stochastics, dynamic power management (DPM), and reduction techniques for energy models. To tackle state-space explosion, we also plan to apply multi-core LTL model checking.

System Level Design Framework for Many-core Architectures
Pablo Peñil, Luis Diaz, and Pablo Sanchez, University of Cantabria, ES

Both the complexity and the shipment volume of embedded many-core architectures have been increasing constantly in recent years, as these architectures provide a solution for creating highly optimized complex systems. To deal with the complexity of these many-core architectures, users require new design methodologies that encompass system specification and performance analysis from the initial stages of the design process. Performance analysis frameworks should include co-simulation of the software application and the many-core hardware platform in order to estimate the software execution time and the performance of the platform's HW resources. This paper presents a fully integrated host-compiled simulation framework that provides fast performance estimations for high-level system models. The framework could be integrated into a design exploration methodology that helps choose the optimal specification and software parallelization, facilitating system implementation and minimizing designer effort.



Mats Brorsson - professor of Computer Architecture at KTH, Sweden and a senior researcher at the Swedish Institute of Computer Science (SICS). His current research is in programming models, run-time systems, operating systems and the architecture of parallel computer systems, in particular multi- and many-core systems. Prof. Brorsson has authored and co-authored over 50 scientific papers in international conferences and journals.

Tapani Ahonen is a part-time Senior Scientist at Technoconsult (TC), Denmark and an Assistant Professor at Tampere University of Technology (TUT), Finland. His work is focused on proof-of-concept driven computer systems design with emphasis on many-core processing environments. Ahonen has an MSc in Electrical Engineering and a PhD in Information Technology from TUT. He has an extensive international publication record, including edited books and journals, book chapters and journal articles, invited talks at high-quality conferences, as well as full-length papers and paper abstracts in conference proceedings.

Sven Karlsson - associate professor at DTU Informatics, DTU, Denmark. His research interests are in programming models, compilers, architectures, operating systems and system software for parallel computers. He has published more than 30 papers in these fields.

Walter Stechele - associate professor at Technical University of Munich (TUM), Germany. His research interests include visual computing and robotic vision, with focus on Multi Processor System-on-Chip (MPSoC) architectures and design methodology, low power optimization, dynamic reconfiguration of FPGA devices, and applications in automotive and robotics.

Adam Morawiec - director at ECSI. He holds a PhD from TIMA Lab/INPG in Grenoble and works in the domain of specification and design languages, system design and synthesis. He is the author of several scientific publications and editor of 4 books. He has also chaired scientific conferences (DASIP, S4D, ESLsyn).

W7 Memristor Science & Technology



08:30 W7.1 Opening Session

Fernando Corinto, Politecnico di Torino, IT

Ronald Tetzlaff, Technische Universität Dresden, DE

08:45 W7.2 Invited Talk by Prof. L. O. Chua
08:45 W7.2.1 Memristor: State-of-the-Art
L. O. Chua, University of California, Berkeley, US

This exposition shows that the potassium ion-channels and sodium ion-channels distributed over the entire length of the axons of our neurons are in fact locally-active memristors. In particular, they exhibit all of the fingerprints of memristors, including the characteristic pinched hysteresis Lissajous figures in the voltage-current plane, whose loop areas shrink as the frequency of the periodic excitation signal increases. Moreover, the pinched hysteresis loops of the potassium ion-channel memristor and of the sodium ion-channel memristor from the Hodgkin-Huxley axon circuit model are unique for each periodic excitation signal. An in-depth circuit-theoretic analysis and characterization of these two classic biological memristors is presented via their small-signal memristive equivalent circuits, their frequency response, and their Nyquist plots. Just as the Hodgkin-Huxley circuit model has stood the test of time, its constituent potassium ion-channel and sodium ion-channel
10:00 W7 Coffee Break
10:15 W7.4 Session 1
10:15 W7.4.1 Invited Talk: Resistive Switching - From Basic Switching Mechanism to Device Applications
Thomas Mikolajick1, Hannes Mähne2, H. Wylezich2 and Stefan Slesazeck2
1NaMLab gGmbH and Technische Universität Dresden, DE; 2NaMLab gGmbH, DE

Resistive switching mechanisms have been under intense study for the last 15 years, mainly for applications in next-generation memories. A variety of physical mechanisms exist that lead to different switching characteristics, and this portfolio of device characteristics allows device properties to be adjusted to different application needs. In this talk, progress in tailoring resistive switching characteristics, both from the literature and from the authors' group, will be shown, and conclusions will be drawn on the prospects for semiconductor memories and other applications.
11:15 W7.4.2 The art of SPICE modeling of memristive systems
Dalibor Biolek, University of Defense and Brno University of Technology, CZ

A methodology for accurate and reliable modeling of memristive devices in the SPICE environment is presented. Due to specific features of SPICE-family programs, simulation results can be burdened with errors, either evident or not apparent at first sight, or a solution may not be found at all. These two kinds of problems, called imperfections and non-convergence issues, can be magnified in circuits containing memristive elements with a specific hysteresis behavior. Four key factors influencing accuracy and reliability are discussed: numerical limits in SPICE, rules for building behavioral models, the way the state and port equations are modeled, and the setting of analysis parameters. The recommendations are applicable to a wide class of SPICE-family simulation programs. Demonstrations are given for PSpice and HSPICE.
13:00 W7.5 Session 2
13:00 W7.5.1 Modeling and simulation of memristive devices for memory and logic applications
Stephan Menzel1 and Rainer Waser2
1Forschungszentrum Jülich, DE; 2RWTH Aachen Universität, DE

Redox-based resistive switching devices are a potential candidate for future non-volatile memory and logic applications. Circuit design with memristive devices requires predictive simulation models. In this work, basic requirements that need to be fulfilled to accurately model memristive devices are defined. In addition, a physics-based modeling approach for resistive switching in ECM cells is presented which fulfills the relevant criteria. It is based on the electrochemically driven growth and dissolution of a metallic filament and self-consistently covers the basic experimental characteristics: I-V characteristics, nonlinear switching kinetics, and multilevel switching behavior.
13:45 W7.5.2 Memory Intensive Computing
Shahar Kvatinsky, Technion – Israel Institute of Technology, IL

Over the past years, new memory technologies such as RRAM, STT-MRAM and PCM have emerged. These technologies, located in the metal layers of the chip, are relatively fast, dense and power-efficient, and can be considered memristors. So far, their use has usually been limited to replacing flash, DRAM and SRAM. This talk focuses on other uses of memristors. For example, they enable new memory structures, different from the conventional memory hierarchy, that open the way to a new era in computer architecture: the era of Memory Intensive Computing. Memristors can also be integrated with CMOS in logic circuits. Alternatively, they can be used as stand-alone logic, suited to performing logic within the memory and providing opportunities for new computer architectures that differ from the classical von Neumann architecture.
14:30 W7 Coffee Break
15:00 W7.6 Session 3
15:00 W7.6.1 Ferroelectric Memristors for Neuromorphic Computing
Sören Boyn, CNRS/Thales, FR

Thanks to progress in nanotechnology and materials science, physicists and condensed matter scientists have recently been able to build smart nano-devices with enhanced capabilities. Some of these new devices show functionalities that could be extremely interesting for bio-inspired computing. It has been demonstrated, for example, that some analog and tunable nano-resistors called memristors can mimic synapses on silicon. The industry is already developing dense networks of these nano-devices for classical digital memories. It is therefore no longer a dream to envisage building bio-inspired chips based on large-scale, high-density parallel networks of these advanced devices that take advantage of their full functionalities. Moreover, the inherent qualities of massively parallel architectures (speed, defect tolerance and low power consumption) are increasingly appreciated now that computer processors heat up so much that they cannot be used at all times, and transistors are shrinking so much that they will no longer be reliable. It is becoming a common thesis that bio-inspired chips such as Artificial Neural Networks will soon enter the market as back-ups or accelerators for more traditional computing architectures. In this talk, after a brief introduction to memristive nano-devices and their applications, I will focus on our work: the development of a new generation of memristors based on purely electronic effects, the ferroelectric memristors. I will show that by tuning interface properties and finely engineering the dynamics of ferroelectric polarization, we can control the response of these memristors. Furthermore, I will demonstrate their suitability in terms of endurance and retention.
15:00 W7.6.2 Is memristor the 4th circuit element?
Frank Zhigang Wang, School of Computing, University of Kent, Canterbury, GB

Chua proposed a Basic Circuit Element Quadrangle comprising the three classic elements (resistor, inductor and capacitor) and the memristor, which he formulated and named, as the fourth element. Based on the observation that this quadrangle may not be perfectly symmetric, we propose a Basic Circuit Element Triangle, in which the memristor, the mem-capacitor and the mem-inductor each head one of three basic element classes. An intrinsic mathematical relationship is found to support this new classification. We believe that this triangle is concise, mathematically sound and aesthetically beautiful compared with Chua's quadrangle. Finding the correct circuit element table is as important as Mendeleev's Periodic Table of Chemical Elements was to chemistry. A correct circuit element table would also require rewriting the physics textbooks.
15:00 W7.6.3 NbOx/Nb2O5 memristor modeling based on Chua's Unfolding Principle
Alon Ascoli1, Stefan Slesazeck2, Hannes Mähne2, Ronald Tetzlaff1 and Thomas Mikolajick3
1Technische Universität Dresden, DE; 2NaMLab gGmbH, DE; 3NaMLab gGmbH and Technische Universität Dresden, DE

Prof. Chua has recently introduced a systematic approach to memristor modeling known as the Unfolding Principle. We share Chua's opinion that a general mathematical framework capable of capturing the dynamics of real memristors would boost the ongoing exploration of their full potential in applications that develop new types of circuits, including non-volatile memories, neuromorphic systems, spike-based signal processing machines and sensor systems. In this presentation we introduce an Unfolding Principle-based model for the threshold switching behavior of a NbOx/Nb2O5 memristor fabricated at NaMLab. The accuracy of the proposed mathematical description is demonstrated through a number of case studies. The proposed model is accurate yet simple, and thus suited to time-efficient circuit simulations. The availability of reliable mathematical frameworks, such as the one proposed here, would certainly pave the way towards a more rapid, extensive and intensive introduction of the memristor into the set of circuit elements at the disposal of integrated circuit designers.
15:00 W7.6.4 Pattern Classification and Recognition with Memristive Circuits
Fabien Alibart1 and D. B. Strukov2
1CNRS, FR; 2University of California at Santa Barbara, US

We will discuss recent experimental results on pattern classification and recognition tasks implemented with memristive [1] (ReRAM [2]) neural networks. The Pt/TiO2-x/Pt memristive devices (Fig. 1a, b) used in both demonstrations are fabricated with a nanoscale e-beam-defined protrusion, which localizes the active area during the forming process to a ~(20 nm)³ volume and, as a result, helps improve device yield. In particular, we will first discuss a demonstration of a pattern classification task for 3×3 binary images by a single-layer perceptron network implemented with 10×2 memristive crossbar circuits (Fig. 1c), in which synaptic weights are realized with memristive devices. The perceptron circuit is trained by ex-situ and in-situ methods to perform binary classification of a set of patterns from an original work by Widrow [3]. In the ex-situ case, the synaptic weights are calculated on a precursor software-based network and then imported sequentially into the crossbar circuits using a variation-tolerant programming algorithm [4]. For in-situ training, the weights are adjusted in parallel following the perceptron learning rule, by applying voltage pulses from pre-synaptic and post-synaptic neurons. Both approaches work successfully (Fig. 1d) despite significant variations in the switching behavior of memristive devices, as well as half-select and leakage problems in crossbar circuits [5].
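As a rough illustration of the perceptron learning rule mentioned above, the Python sketch below trains a single-layer perceptron on 3×3 binary images, with 9 pixel inputs plus a constant bias input (we assume this is what the 10 rows of the crossbar correspond to). The weights are plain floats here, whereas in the crossbar they would be device conductances adjusted by voltage pulses; the two "X"-like and two "T"-like patterns are hypothetical, not Widrow's original set.

```python
# Hypothetical 3x3 binary patterns (row-major); labels: +1 for "X"-like, -1 for "T"-like.
PATTERNS = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1],  # X
    [1, 0, 1, 0, 1, 0, 1, 0, 0],  # X with one pixel missing
    [1, 1, 1, 0, 1, 0, 0, 1, 0],  # T
    [0, 1, 1, 0, 1, 0, 0, 1, 0],  # T with one pixel missing
]
LABELS = [1, 1, -1, -1]


def predict(w, x):
    """Sign of the weighted sum of the 9 pixels plus a constant bias input of 1."""
    s = sum(wi * xi for wi, xi in zip(w, x + [1]))
    return 1 if s >= 0 else -1


def train_perceptron(patterns, labels, lr=0.5, max_epochs=100):
    """Perceptron rule: on each error, w += lr * (target - output) * input."""
    w = [0.0] * 10  # 9 pixel weights + 1 bias weight
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(patterns, labels):
            out = predict(w, x)
            if out != y:
                errors += 1
                for i, xi in enumerate(x + [1]):
                    w[i] += lr * (y - out) * xi
        if errors == 0:  # converged: every training pattern is classified correctly
            break
    return w


if __name__ == "__main__":
    w = train_perceptron(PATTERNS, LABELS)
    print([predict(w, x) for x in PATTERNS])  # [1, 1, -1, -1]
```

Since the toy patterns are linearly separable, the rule is guaranteed to converge; the hardware difficulty described above lies in applying these updates to devices with varying switching behavior.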
15:00 W7.6.5 Memristor crossbar array circuits for neuromorphic applications
Kyeong-Sik Min, Kookmin University, KR

A crossbar array is the most suitable architecture for realizing high-density memristor-based synapses. In this presentation, we discuss various crossbar array circuits for mimicking synaptic functions in terms of area, power, etc. In addition, we analyze and discuss how variations in the fabrication process, power supply voltage, etc. can affect the synaptic functions of memristor-based crossbar arrays.
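Why a crossbar suits synapses can be seen from its ideal behaviour: a voltage applied to each row drives a current I = G·V through each device (Ohm's law), and these currents add up on each column (Kirchhoff's current law), so every column computes a weighted sum. The Python sketch below models only this ideal case, ignoring wire resistance, sneak paths and device nonlinearity; representing a signed weight as the difference of two conductances (G_plus minus G_minus) is a common scheme that we assume here, not necessarily the one used in the talk.

```python
def column_currents(voltages, conductances):
    """Ideal crossbar: I_j = sum_i V_i * G[i][j] (row voltages in V, conductances in S)."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances)) for j in range(n_cols)]


def differential_output(voltages, g_plus, g_minus):
    """Signed weighted sums from two crossbars: I_plus - I_minus per output column."""
    i_plus = column_currents(voltages, g_plus)
    i_minus = column_currents(voltages, g_minus)
    return [p - m for p, m in zip(i_plus, i_minus)]


if __name__ == "__main__":
    v = [0.2, 0.1]                   # hypothetical read voltages on two rows (V)
    g = [[1e-4, 2e-4], [3e-4, 0.0]]  # hypothetical device conductances (S)
    print(column_currents(v, g))     # approximately [5e-05, 4e-05] (A)
```

Process and supply-voltage variations of the kind discussed above would show up here as perturbations of the entries of `g` and `v`, and hence of every column sum.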
16:40 W7.7 Closing Session

W6 MEDIAN - Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale



08:30 W6.1 Opening Session

Mehdi Tahoori, Karlsruhe Institute of Technology, DE
Oliver Bringmann, FZI/University of Tuebingen, DE

Maria K Michael, University of Cyprus, CY

Ozcan Ozturk, Bilkent University, TR

Welcoming comments
08:45 W6.2 Keynote Talk
08:45 W6.2.1 Designing Efficient and Reliable Multicore Processors for Networking, Servers, and Beyond
Shubu Mukherjee, Cavium Networks, US

09:45 W6.3 Paper Session I: New Challenges at the System Level
09:45 W6.3.1 Multi-Core Emulation for Dependable and Adaptive Systems Prototyping
Cristiana Bolchini and Matteo Carminati, Politecnico di Milano, IT

09:45 W6.3.2 Fault-tolerant Routing Approach for 3D Stacked Meshes
Masoumeh Ebrahimi, Masoud Daneshtalab and Juha Plosila, University of Turku, FI

10:30 W6 Coffee Break
11:00 W6.4 Paper Session II: Reliability Threats in New Technologies
11:00 W6.4.1 Invited Talk - Steep Slope Devices: Opportunities and Challenges for Processor Design
Vijaykrishnan Narayanan, Penn State, US

11:00 W6.4.2 BTI reliability from Planar to FinFET nodes: Will the next node be more or less reliable?
Halil Kukner1, Pieter Weckx2, Praveen Raghavan1, Ben Kaczer1, Doyoung Jang1, Francky Catthoor3, Liesbet Van der Perre2, Rudy Lauwereins3 and Guido Groeseneken3
1IMEC, BE; 2KU Leuven, BE; 3IMEC, KU Leuven, BE

11:00 W6.4.3 Analysis of Random Dopant Fluctuations and Oxide Thickness on a 16nm L1 Cache Design *)
Cagri Eryilmaz1, Azam Seyedi2, Osman Unsal3 and Adrian Cristal4
1Middle East Technical University, TR and Barcelona Supercomputing Center, ES; 2Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, ES; 3Barcelona Supercomputing Center, ES; 4Barcelona Supercomputing Center, Universitat Politecnica de Catalunya and IIIA-CSIC, ES

12:00 W6 Lunch Break
13:00 W6.5 Paper Session III: Application Specific Solutions
13:00 W6.5.1 FPGA Defect Tolerance based on Equivalent Configurations Generation
Parthasarathy M. B. Rao, Abdulazim Amouri and Mehdi B. Tahoori, Karlsruhe Institute of Technology, DE

13:00 W6.5.2 A Complex Control System for Testing Fault-Tolerance Methodologies *)
Jakub Podivinsky, Marcela Simkova and Zdenek Kotasek, Brno University of Technology, CZ

13:30 W6.6 Panel Session

Said Hamdioui, TU Delft, NL

Matteo Sonza Reorda, Politecnico di Torino, IT

Mehdi Tahoori1, Oliver Bringmann2, Adrian Evans3 and Viacheslav Izosimov4
1Karlsruhe Institute of Technology, DE; 2FZI/University of Tuebingen, DE; 3iROC, FR; 4Semcon, SE
14:30 W6.7 Coffee Break & Poster Session
14:30 W6.7.1 BADR: Boosting Reliability Through Dynamic Redundancy
Ihsen Alouani1, Smail Niar1, Mazen Saghir2 and Fadi Kurdahi3
1University of Valenciennes, FR; 2Texas A&M University, QA; 3University of California at Irvine, US

14:30 W6.7.2 Automatic Detection and Correction of Defective Pixels for Medical and Space Imagers
Eliahu Cohen1, Moriel Shnitser2, Tsvika Avraham2, Ofer Hadar2 and Yocheved Dotan3
1Tel-Aviv University, IL; 2Ben-Gurion University, IL; 3Ruppin Academic Center, IL

14:30 W6.7.3 Implementing Double Error Correction Orthogonal Latin Squares Codes in Xilinx FPGAs
Mustafa Demirci1, Pedro Reviriego2 and Juan Antonio Maestro2
1Alesan, TR; 2Universidad Antonio de Nebrija, ES

14:30 W6.7.4 On Reliability Enhancement Using Adaptive Core Voltage Scaling and Variations on TSMC 28nm LP process FPGAs
Petr Pfeifer and Zdenek Pliva, Technical University of Liberec, CZ

14:30 W6.7.5 Power and Performance Optimization in Long-term Operation
André Romão1, Jorge Semião1, Carlos Leong2, Marcelino Santos3, Isabel Teixeira3 and Paulo Teixeira3
1University of Algarve, PT; 2INESC-ID, PT; 3Technical University of Lisbon, PT

15:00 W6.8 Paper Session IV: Resiliency, Self-Test and Self-Diagnosis
15:00 W6.8.1 Invited Talk - DEEP-ER: Scalable resiliency in Exascale Computing
Michael Kauschke, Intel, DE

15:00 W6.8.2 Improving the Reliability of Skewed Caches through ECC based Hashes
Sercan Yegin1, Burak Karsli1, Oguz Ergin1, Marco Ottavi2, Salvatore Pontarelli2 and Pedro Reviriego3
1TOBB University, TR; 2University of Rome Tor Vergata, IT; 3Universidad Antonio de Nebrija, ES

15:00 W6.8.3 A new Diagnostic method for VLIW Processors *)
Davide Sabena, Luca Sterpone and Matteo Sonza Reorda, Politecnico di Torino, IT

15:00 W6.8.4 Aging Monitoring Methodology for Built-In Self-Test Applications *)
João Coelho1, Jorge Semião1, Carlos Leong2, Marcelino Santos3, Isabel Teixeira3 and Paulo Teixeira3
1University of Algarve, PT; 2INESC-ID, PT; 3Technical University of Lisbon, PT

16:15 W6.9 Closing Session

*) indicates short paper

W3 Electronic System-Level Design towards Heterogeneous Computing



08:30 W3.1 Opening Session
08:45 W3.2 Session 1 - Trends in Heterogeneous Computing: the industrial perspective
08:45 W3.2.1 Heterogeneous Computing in the Cloud: emerging trends from the industry
Steve Hebert, Nimbix

09:15 W3.2.2 Higher Level Programming Abstractions for FPGAs using OpenCL
Bogdan Pasca, Altera European Technology Centre

09:45 W3.3 Panel 1
Koen Bertels1, Steve Hebert2 and Bogdan Pasca3
1Delft University of Technology, NL; 2Nimbix; 3Altera European Technology Centre
10:30 W3 Coffee Break + Poster Session 1
11:00 W3.4 Session 2 - Research challenges in Heterogeneous Computing design flows
11:00 W3.4.1 FPGA based accelerators for Big Data: Polymorphic computing for Big Data
Koen Bertels, Delft University of Technology

11:30 W3.4.2 Mapping applications to heterogeneous accelerators: tool flows and run-time systems
Christian Plessl, University of Paderborn

13:00 W3.5 Session 3 - Compilers and code optimization for hardware-accelerated platforms
13:00 W3.5.1 From Software Code to Hardware: Directions in High-Level Synthesis
Philippe Coussy, Université de Bretagne-Sud, Lab-STICC, FR

13:30 W3.5.2 Polyhedral compilation and code transformations for High-Level Synthesis
Louis-Noel Pouchet, University of California Los Angeles, US

14:00 W3.6 Session 4 - Towards higher-level design approaches
14:00 W3.6.1 CoDesign with Verity: bidirectional control-flow across the FPGA-CPU divide
Eduardo Aguilar Peleaz, Imperial College, GB

14:30 W3.6.2 Borrowing high-level paradigms from parallel computing: an OpenMP-based design flow
Alessandro Cilardo, University of Naples Federico II, IT

15:00 W3.7 Panel 2
Philippe Coussy1, Louis-Noel Pouchet2 and Eduardo Aguilar Peleaz3
1Université de Bretagne-Sud, Lab-STICC, FR; 2University of California Los Angeles, US; 3Imperial College, GB
15:30 W3 Coffee Break + Poster Session 2
16:00 W3.8 Session 5 - Current and emerging heterogeneous computing applications
16:00 W3.8.1 Heterogeneous HPC: combining FPGAs, CPUs, and GPUs for financial analytics
David Thomas, Imperial College, GB

16:30 W3.9 Panel 3
Alessandro Cilardo, University of Naples Federico II, IT
Steve Hebert1 and Bogdan Pasca2
1Nimbix; 2Altera European Technology Centre
16:45 W3.10 Closing Session

W1 International Workshop on Dependable GPU Computing



08:30 W1.1 Opening Session
08:30 W1.1.1 Opening Remarks
Dimitris Gizopoulos1, Hans-Joachim Wunderlich2 and Paolo Prinetto3
1University of Athens, GR; 2University of Stuttgart, DE; 3Politecnico di Torino, IT

08:30 W1.1.2 Keynote 1: GPGPU for dependable systems - a blessing or a curse?
Avi Mendelson, Technion, IL

09:15 W1.2 Invited Talk 1
09:15 W1.2.1 GPGPU Reliability - Challenges and Research Directions
Sudhanva Gurumurthi, AMD, US

09:45 W1.3 Session 1 - "Software Approaches for GPUs Dependability Enhancement"

Murali Annavaram, University of Southern California, Los Angeles, US

Amir Nahir, IBM Research, IL

09:45 W1.3.1 An improved fault mitigation strategy for CUDA Fermi GPUs
Stefano Di Carlo, Giulio Gambardella, Ippazio Martella, Paolo Prinetto, Daniele Rolfo and Pascal Trotta, Politecnico di Torino, IT

10:05 W1.3.2 Software-Based Techniques for Reducing the Vulnerability of GPU Applications
Si Li1, Vilas Sridharan2, Sudhanva Gurumurthi2 and Sudhakar Yalamanchili1
1Georgia Tech., US; 2AMD, US

10:25 W1.3.3 A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs
Claus Braun, Sebastian Halder and Hans-Joachim Wunderlich, University of Stuttgart, DE

10:45 W1 Coffee Break + Posters
11:30 W1.4 Invited Talk 2
11:30 W1.4.1 Reliable Acceleration - Reliability in a World of GPUs & Other Special Purpose Accelerators
Arijit Biswas, Intel, US

13:00 W1.5 Keynote 2
13:00 W1.5.1 GPU Related Errors in Large Scale Systems: A Study of Blue Waters Supercomputer at NCSA-Illinois
Ravishankar K. Iyer, University of Illinois at Urbana-Champaign, US

13:45 W1.6 Session 2 - "Fault Detection and Tolerance in GPUs"

Nathan DeBardeleben, Los Alamos National Laboratory, US

Hans-Joachim Wunderlich, University of Stuttgart, DE

13:45 W1.6.1 Benefits and Countermeasures of Increasing the GPU code Degree of Parallelism
Paolo Rech and Luigi Carro, UFRGS, BR

13:45 W1.6.2 On the Evaluation of Soft-Errors Detection Techniques for GPGPUs
Davide Sabena1, Matteo Sonza Reorda1, Luca Sterpone1, Paolo Rech2 and Luigi Carro2
1Politecnico di Torino, IT; 2UFRGS, BR

13:45 W1.6.3 Tolerating Hard Faults in GPGPUs
Waleed Dweik, Mohammad AbdelMajeed and Murali Annavaram, University of Southern California, US

14:45 W1 Coffee Break
15:15 W1.7 Panel Session

Dimitris Gizopoulos, University of Athens, GR

Sudhakar Yalamanchili1, Ravishankar K. Iyer2, Stefano Di Carlo3, Sudhanva Gurumurthi4, Arijit Biswas5 and Bodo Hoppe6
1Georgia Tech., US; 2University of Illinois at Urbana-Champaign, US; 3Politecnico di Torino, IT; 4AMD, US; 5Intel, US; 6IBM, DE
16:45 W1.8 Closing Session
