Workshop

Friday Workshops

W2 ES4CPS - Engineering Simulations for Cyber-Physical Systems

Agenda

08:30am - 08:45am	Opening session
08:45am - 10:00am	Invited talk: Eric Coelingh, Volvo Car Corporation: "From SARTRE towards Autonomous Driving - An Experience Report and Outlook"
10:00am - 10:30am	Coffee break
10:30am - 12:00pm	Benjamin Vedder, Thomas Arts, Jonny Vinter and Magnus Jonsson: “Combining Fault-Injection with Property-Based Testing” Krishnan Srinivasarengan, Goutam Y G and Girish Chandra: “Home Energy Simulation for Non-Intrusive Load Monitoring Applications” Shivam Bhasin, Tarik Graba, Jean-Luc Danger, Yves Mathieu, Daisuke Fujimoto and Makoto Nagata: “Physical Security Evaluation at an Early Design-Phase: A Side-Channel Aware Simulation Methodology”
12:00pm - 01:00pm	Lunch
01:00pm - 02:30pm	Ashur Rafiev, Alexei Iliasov, Alexander Romanovsky, Andrey Mokhov, Fei Xia and Alex Yakovlev: “ArchOn: Architecture-open Resource-driven Cross-layer Modelling Framework” Peter Kourzanov: “DSL methods for CPS simulation in the cloud” Md. Abdullah Al Mamun and Jörgen Hansson: “Reducing Simulation Testing Time by Parallel Execution of Loosely-Coupled Segments of a Test Scenario”
02:30pm - 03:00pm	Coffee break
03:00pm - 04:00pm	Delf Block, Sönke Heeren, Stefan Kühnel, Andre Leschke, Bernhard Rumpe and Vladislavs Serebro: “Simulations on Consumertests: A Perspective for Driver Assistance Systems” Daniel Cesarini, Luca Cassano, Alessio Fagioli and Marco Avvenuti: “Modeling and Simulation of Energy-Aware Adaptive Policies for Automatic Weather Stations”
04:00pm - 05:00pm	Final discussion, closing session, and final remarks.

W4 Workshop on Design Automation for Understanding Hardware Designs

Agenda

8:30 - 8:45	Opening session Chairs: Görschwin Fey, Emmanuelle Encrenaz-Tiphéne
8:45 - 9:30	Invited talk Managing Design Knowledge for IP Cores – State-of-the-art and Open Questions Alexander Rath Infineon Technologies AG, Munich, Germany
9:30 - 10:30	Technical session: Formal and semi-formal Automatic identification of logical relationships among internal signals with small numbers of test vectors Masahiro Fujita, Takeshi Matsumoto and Satoshi Jo University of Tokyo, Japan Using Natural Language Documentation in the Formal Verification of Hardware Designs Christopher Harris and Ian Harris University of California, Irvine, USA Understanding Compound Systems from their Components' Properties Syed-Hussein Syed-Alwi and Emmanuelle Encrenaz Université Pierre et Marie Curie Paris 6, France Design Understanding with Fast Prototyping from Assertions Katell Morin-Allory, Fatemeh Javaheri and Dominique Borrione Univ. Grenoble Alpes, Grenoble, France
10:30 - 11:00	Coffee break & poster presentations Posters: Detecting Concurrency Problems in System Level Designs Alper Sen and Onder Kalaci Bogazici University, Istanbul, Turkey Automatically connecting hardware blocks via light-weight matching techniques Jan Malburg¹, Niklas Krafczyk¹ and Goerschwin Fey^1,2 ¹University of Bremen, Germany ²German Aerospace Center, Bremen, Germany Exact Solution for Trace Signal Selection with Pseudo Boolean Optimization (PBO) Shridhar Choudhary, Kousuke Oshima, Amir Masoud Gharehbaghi, Takeshi Matsumoto and Masahiro Fujita The University of Tokyo, Tokyo, Japan
11:00 - 12:00	Invited talk Capturing and Validating Design Understanding using Formal Properties Raik Brinkmann OneSpinSolutions GmbH, Munich, Germany
12:00 - 13:00	Lunch
13:00 - 13:45	Invited talk Design Understanding in SOC Development - Recent Advances and New Challenges Lyes Benalycherif ST Microelectronics, Grenoble, France
13:45 - 14:15	Technical session: System level productivity DiplodocusDF: Analyzing Hardware/Software Interactions with a Dinosaur Andrea Enrici, Ludovic Apvrille and Renaud Pacalet Telecom ParisTech, Biot, France Towards a Multi-dimensional and Dynamic Visualization for ESL Designs Jannis Stoppe¹, Marc Michael², Mathias Soeken^1,2, Robert Wille^1,2,3 and Rolf Drechsler^1,2 ¹DFKI GmbH, Bremen, Germany ²University of Bremen, Bremen, Germany ³Technical University Dresden, Germany
14:15 - 15:00	Invited talk Software Reverse Engineering Rainer Koschke University of Bremen, Germany
15:00 - 15:30	Coffee break & poster presentations
15:30 - 16:15	Technical session: Reverse and automatic engineering Increasing Verilog’s Generative Power Cherif Salama¹ and Walid Taha² ¹Ain Shams University, Cairo, Egypt ²Halmstad University, Halmstad, Sweden zamiaCAD: Understand, Develop and Debug Hardware Designs Maksim Jenihhin¹, Valentin Tihhomirov¹, Syed Saif Abrar¹, Jaan Raik¹ and Guenter Bartsch² ¹Tallinn University of Technology, Estonia ²zamiaCAD, Germany Mutation based Feature Localization Jan Malburg¹, Emmanuelle Encrenaz-Tiphene² and Goerschwin Fey^1,3 ¹University of Bremen, Germany ²Université Pierre et Marie Curie Paris 6, France ³German Aerospace Center, Bremen, Germany
16:15 - 17:00	Panel Design understanding – where do industry and academia team up? Panelists: Ian Harris, Lyes Benalycherif, Raik Brinkmann The panel will summarize the results of the day and prioritize topics focusing on three questions: What are the most urgent topics from an industrial perspective? What are the most challenging topics from an academic perspective? Where can we exploit synergies between academia and industry?

W5 3D Integration: Applications, Technology, Architecture, Design, Automation, and Test

Agenda

08:15 SESSION 1: OPENING

Chair: Paul Franzon, North Carolina State University, US

08:15 Welcome Address

Saqib Khursheed, University of Liverpool, UK

08:20 Keynote Presentation: 3D Technology – Key Enabler for 3D Heterogeneous Integration

Jürgen Wolf, IZM Fraunhofer, DE.

Abstract:

3D integration is a key technology for microelectronics to meet the growing demands for more functionality, higher performance, miniaturization and cost reduction, and it is becoming important for application areas such as cyber-physical systems, the internet of things, ambient assisted living (AAL), information & communication, security and health. Interposers with Through Silicon Vias (TSVs) are becoming a very important element and a key enabler for the realization of 3D Systems-in-Packages (SiPs) whose main advantages are the decoupling of front end / back end processing for the implementation of TSVs and redistribution layers to integrate multiple devices into a system in package (WL-SiP). Specific applications result in technical approaches ranging from high density TSV integration and high density RDL for digital applications to interposers for RF applications as well as MEMS and sensor integration and optical interconnects. The presentation will highlight results and technical achievements for 3D integration using TSV interposers and will also address the broad spectrum of topics from design, technology and reliability related to 3D systems.

Bio:

M. Jürgen Wolf studied electrical engineering and joined the Fraunhofer Institute for Reliability and Microintegration (IZM) in 1994, working in the field of wafer level packaging and system in package (SiP). Since 2011 he has been head of the department Wafer Level System Integration and has also been responsible for the management of “ASSID - All Silicon System Integration Dresden”. He is also involved in a number of research projects at national, European and international level. Wolf is a European representative in the technical working group Assembly & Packaging of ITRS, a board member of EURIPIDES, JISSO and a member of IEEE/SMTA. Furthermore, he is a representative of the Fraunhofer Cluster 3D Integration.

09:00 Special Session: Reliability and Thermal issues in 3D ICs

09:00 Overview of 3D-Reliability Research in Imec

Kristof Croes – IMEC, BE

09:20 Advanced Failure Analysis Techniques for 3D Packages

Frank Altmann - Fraunhofer IWM Halle, DE

09:40 Research Directions on Thermal Impact of 3D Assembly

Haykel Ben Jaama - CEA-LETI, FR

10:00 SESSION 2: Coffee Break & Posters

10:30 SESSION 3: Invited Talk and Panel

10:30 Invited Talk: Heterogeneous Sensor Integration; Increased Technology Readiness Level

Maaike Visser - SINTEF, Norway

11:00 Panel Session: Are Slow Standardization and CAD-Tool Development Hindering the Progress of 3D IC Design and Integration?

Moderator: Françoise Von Trapp – “Queen of 3D”, 3DInCites, US

Panelists: Brandon Wang – Cadence Design Systems, US

Juergen Schloeffel – Mentor Graphics, DE

Makoto Nagata – Kobe University, JP

Mustafa Badaroglu – Qualcomm Technologies, BE

12:00 LUNCH BREAK

13:00 SESSION 4: Technology and Design Challenges for 3D ICs

Chair: Thomas Thärigen, Cascade Microtec GmbH, DE

13:00 Integration of Through-Silicon Vias in a High Performance BiCMOS Technology for RF-Grounding and 3D-Integration

M. Wietstruck¹, M. Kaynak¹, S. Marschmeyer¹, K. Zoschke², and B. Tillack^1,3

¹IHP, DE; ²Fraunhofer IZM, DE; ³Technische Universität Berlin, DE

13:18 2.5D & 3D Technologies require Innovative Lithography Solutions

Klaus Ruhmer, Philippe Cochet, Roger McCleary

Rudolph Technologies, US

13:36 3D Wirebondless IGBT Module for High Power Applications

Z. Y. Gao¹, Y. X. Ren¹, Y.C. Lee², H.L. Yiu², X.Q. Shi¹

¹Hong Kong Applied Science & Technology Research Institute (ASTRI), HK; ²Hong Kong Science & Technology Parks Corporation (HKSTP), HK

13:54 Towards Trustworthy NoC-Based 3D-MPSoCs

Johanna Sepúlveda^1,2, Guy Gogniat², Marius Strum¹

¹University of São Paulo, BR; ²LAB-STICC, Lorient, FR

14:12 A TSV-Property-aware Synthesis Method for Application Specific 3D-NoCs

Felix Miller, Thomas Wild, Andreas Herkersdorf, Vladimir Todorov, Daniel Mueller-Gritschneder, Ulf Schlichtmann

Technische Universität, München, DE

14:30 SESSION 5: Coffee Break & Posters

15:00 SESSION 6: Test and Thermal Challenges for 3D ICs

Chair: Basel Halak, U of Southampton, UK

15:00 Design, Test Generation, Processing, and Pre- and Post-Bond Measurement Results of a 3D-DfT Demonstrator Chip Stack

Erik Jan Marinissen¹, Bart De Wachter¹, Stephen O’Loughlin¹, Sergej Deutsch², Christos Papameletis², Tobias Burgherr²

¹IMEC, BE ; ²Cadence Design Systems, DE

15:18 Power and DFT Aware Partitioning for 3D-SOCs

Amit Kumar and Sudhakar M. Reddy

University of Iowa, US

15:36 System Level Thermal Modelling for 3D IC: A Memory-on-Logic 3D Test Case Study

Cristiano Santos^1,3, Pascal Vivet¹, Denis Dutoit¹, Philippe Garrault², Nicolas Peltier², Ricardo Reis³

¹CEA-LETI, FR; ²DOCEA-Power, FR; ³UFRGS, BR

15:54 Thermal Power Plane enabling Dual-Side Electrical Interconnects supporting High-Performance Chip Stacking

Thomas Brunschwiler¹, Stefano Oggioni², Timo Tick¹, Gerd Schlottig¹, Hubert Harrer³

¹IBM Research, Zurich, CH; ²IBM ISC, Milan, IT; ³IBM STG, Böblingen, DE

16:12 Thermal Coupling in TSV-Based 3-D Integrated Circuits

Ioannis Savidis¹ and Eby G. Friedman²

¹Drexel University, US; ²University of Rochester, US

16:30 CLOSE

Posters

A Novel Low-Power TSV Interconnection Scheme Based On Adiabatic Energy-Recovery Logic

Khaled Salah

Mentor Graphics, Cairo, Egypt

3D IC Test through Power Line Methodology

Alberto Pagani, Alessandro Motta

STMicroelectronics – SPA FMTR&D (Sense, Power & Automotive Front-end Manufacturing & Technology R&D)

2.5D Test Cost Optimization using 3D-COSTAR

Mottaqiallah Taouil¹, Said Hamdioui¹, Erik Jan Marinissen² and Sudipta Bhawmik³

¹Delft University of Technology, NL, ²IMEC, BE, ³Qualcomm, US

Test Pattern Retargeting in 3D SICs Using an IEEE P1687 based 3DFT architecture

Yassine Fkih^1,2, Pascal Vivet¹, Bruno Rouzeyre², Marie-Lise Flottes², Giorgio Di Natale², Juergen Schloeffel³

¹CEA-Leti, MINATEC Campus, FR, ²LIRMM, Univ Montpellier II/CNRS, FR, ³Mentor Graphics, DE

Impact Analysis of Through-Silicon-Via Variation on Performance and Energy Consumption of 3D Networks-on-Chip Architectures

Michael Opoku Agyeman, Ali Ahmadinia

School of Engineering and Built Environment Glasgow Caledonian University, Glasgow, UK

Processing and Microstructure of Solid-Liquid Interdiffusion Interconnects for 3D Integration

Iuliana Panchenko¹, Juergen Grafe¹, Maik Mueller², Klaus-Juergen Wolter², M. Juergen Wolf¹, Klaus-Dieter Lang¹

¹Fraunhofer IZM ASSID, Moritzburg, Germany, ²Electronics Packaging Laboratory, TU Dresden, Dresden, Germany

TSV INTERPOSER PLATFORM FOR 3D HETEROGENEOUS INTEGRATION

M. Juergen Wolf, K.-D. Lang

Fraunhofer IZM ASSID, Berlin, Dresden, Germany

W8 3PMCES - Performance, Power and Predictability of Many-Core Embedded Systems

Agenda

08:30 – 08:45	Opening Session Adam Morawiec, Tapani Ahonen, and Walter Stechele
08:45 – 09:30	Keynote Presentation Trustworthy Contract Computing through Seamless System Build and Operation Tapani Ahonen, Tampere University of Technology, FI Computing systems are built and operated in a way akin to manufacturing pipelines. In a pipeline organization all stages need to operate simultaneously without significant disruptions. When one stage fails, the others will come to a halt either immediately or after a short grace period enabled by a buffer queue. The ability to fix any possibly emerging issues as quickly as possible is crucial for maintaining system functionality and throughput. Hierarchically organized management is designed for keeping all the individual stages in efficient operation. Such management organizations are prone to introducing inefficiencies. Hierarchies become deeper with growing system complexity. At higher levels of management the area of responsibility is wider while the capability to directly control low-level operations is weaker. Information exchange and instruction chains are usually cumbersome when they span many levels of organization. This happens in part due to operations reserving information to internal use only as well as in part due to restrictive exchange interfaces. Incoherency of information available for different operations in the organization often leads to unnecessary functional redundancy and indirect, wasteful methods to execute joint functions. What purpose does it serve to encapsulate information in one operation or stage only? In a pure pipeline organization each stage is supposed to execute a highly specialized sub-system consisting of fully independent operations no other stage is capable of. Special tools and components for one operation are in the possession of one stage only. It makes perfect sense to decline internal cooperation for ensuring local control, as other stages will not be needed, nor is their involvement helpful. However, this is not the case for computing systems. 
Software functions cannot be executed without supporting hardware and most application code cannot be executed without supporting operating system functions. Design time tools cannot produce good results without detailed information of the run-time environment.
09:30 – 10:00	Session 1: Predictable Portability of Parallel Code Based on Increased Semantic Information Sharing and Fork-Join Programming Konstantin Popov, SICS Swedish ICT AB, SE Heterogeneous parallel computing seems to be the way forward in all market segments where computers are used. Parallelism ranges from a few cores in small embedded systems to hundreds or even thousands in the HPC domain. Heterogeneity ranges from capability heterogeneous to functional heterogeneous systems with multiple completely different compute engines like GPGPUs or HW accelerators. This poses a tremendous challenge for software developers who currently need to tailor their software for each individual platform. In this talk I present an approach to ease the programming burden by means of open semantic information interfaces across software abstraction layers and using programming models that are amenable for analysis, modeling and good run-time scheduling decisions.
10:00 – 11:00	Session 2: Managing Execution in Dynamic Environment Efficient Leader Election for Synchronous Shared-Memory Systems Vicent Sanz Marco, Raimund Kirner, and Michael Zolda, University of Hertfordshire, UK Leader election is a frequent problem for systems where it is important to coordinate the activities of a group of actors. It has been extensively studied in the context of networked systems. But with the rise of many-core computer architectures, it has also become important for shared-memory systems. In this paper we present an efficient leader election technique for synchronous shared-memory systems. Synchronous in our context means the response time of code sections with relevant communication patterns is bounded. This makes our approach more efficient compared to leader election methods explored for asynchronous shared-memory systems. Our leader election method is used to help make the scheduling layer LPEL fault tolerant. A Proposal on Parallel Software Development for Network-on-Chip based Many-Core system (Short Paper) Guoqing Zhang and Tapani Ahonen, Tampere University of Technology, FI This paper gives a proposal on parallel software development for Network-on-Chip (NoC) based many-core systems. In the proposal, we recommend using SLURM, open-source software that is widely used in computer clusters, for hardware mapping and resource management, and we recommend using the Message Passing Interface (MPI) to handle communications among cores in NoC systems. We suggest using the partition concept in SLURM for hardware mapping, especially for safety-critical systems, which require separate hardware mappings for safety-critical tasks and non-safety-critical tasks. We also demonstrate a methodology for porting the Symmetric Multi-Processor (SMP) architecture of Linux on top of SLURM-and-MPI enabled NoC systems by employing the scheduling domain concept of the Linux kernel.
In the methodology, the cores on a NoC system are separated into partitions according to hardware topology to handle I/O-bound processes and CPU-bound processes respectively. A Power-Aware Framework for Executing Streaming Programs on Networks-on-Chip (Short Paper) Nilesh Karavadara, Simon Folie, Michael Zolda, Nga Nguyen, and Raimund Kirner, University of Hertfordshire, UK Software developers are discovering that practices which have successfully served single-core platforms for decades no longer work for multi-cores. Stream processing is a parallel execution model that is well-suited for architectures with multiple computational elements that are connected by a network. We propose a power-aware streaming execution layer for network-on-chip architectures that addresses the energy constraints of embedded devices. Our proof-of-concept implementation targets the Intel SCC processor, which connects 48 cores via a network-on-chip. We motivate our design decisions and describe the status of our implementation.
11:00 – 11:30	Coffee Break and Poster Session
11:30 – 12:00	Invited Talk: Performance Prediction and Software Development on Many-core Processor Platform Benoit Dupont de Dinechin, Kalray, FR Computation-intensive embedded applications are often constrained by the end-to-end latency. Classic platforms that support these applications rely on FPGAs or farms of digital signal processors that run under the supervision of micro-controllers. There are significant advantages to hosting those applications on many-core platforms, in particular reducing the system size, weight, and power (SWaP), while improving programmability. However, existing many-core platforms based on CPU or GPU and the associated software stacks do not allow for predictable or even repeatable processing latencies. In this keynote the architecture and programming models of the MPPA-256 processor will be presented. It integrates 256 processing engine (PE) cores and 32 resource management (RM) VLIW cores on a single 28nm CMOS chip. We discuss how this processor and its programming models are especially suited for computationally intensive applications under latency constraints. We also introduce the metalibm library generator for the Kalray VLIW cores, which automatically produces high-performance and correctly rounded implementations for application-specific specializations of the C99 libm and IEEE 754-2008 mathematical functions.
12:00 – 13:00	Lunch and Poster Session
13:00 – 14:00	Session 3: Reliability and Safety of Multi-Core Platforms Hardware/Software-based Runtime Control of Multicore Processors-on-Chip for Reliability Management under Aging Constraints Walter Stechele and Erol Koser, Technical University Munich, DE Multicore processors-on-chip are gaining interest in safety-critical applications like aeronautic, automotive, and medical. Traditional methods for reliability, e.g. triple modular redundancy, might be too expensive; therefore, recent research is turning towards reliability-aware runtime management of multicore processors, including dynamic voltage frequency scaling and dynamic load distribution. We will present a case study, introducing a reliability layer between multiprocessor hardware and middleware, to cope with aging-related degradation. MPSoC performance degradation (due to aging) predictability at high abstraction level and Applications Olivier Heron, CEA-LIST, FR The shrinking size of transistors and nano-wires results in increased device density, increased speed and reduced power consumption (Moore's Law). However, device reliability is reduced due to the non-ideal scaling of supply voltage. We observe two trends in the semiconductor industry. Firstly, MOS technology scaling is still continuing (and will continue). The failure physics become more complex and new failures, which were negligible in old technology nodes, now emerge. The semiconductor industry has solutions to reach the ITRS requirements until the end of next year. After that, only interim solutions are available. Secondly, Multi-Processor SoCs (MPSoCs) offer high performance, are cheap, consume less power than high-end processors, and are able to support a large variety of applications. As MPSoCs are now applied in most market segments (automotive, consumer, HPC, etc.), both high performance and reliability become major concerns, even for non-safety-critical applications.
Current academic and commercial CAD tools help the designer obtain reliability projections of the chip in the very last design cycles before chip tape-out. However, MPSoC design and verification raise new challenges. They require new methodologies and CAD tools able to capture both architecture design and reliability at higher abstraction levels, such as transactional level. In the first development cycles, design space exploration is necessary to analyze different MPSoC configurations (memory sizes, processor pipeline depth and others) and SW and their impact on performance, power consumption and reliability. In this talk, I will present a methodology we propose in the RELY project to model and predict reliability at a high abstraction level and I will present some results on an MPSoC case study.
14:00 – 14:30	Session 4: European Projects Cluster European Project Cluster on Mixed-Criticality Systems Salvador Trujillo (IK4-IKERLAN, ES), Roman Obermaisser (University of Siegen, DE), Kim Gruettner (OFFIS – Institute for Information Technology, DE), Francisco J. Cazorla (Barcelona Supercomputing Center and IIIA-CSIC, ES), and Jon Perez (IK4-IKERLAN, ES) Modern embedded applications already integrate a multitude of functionalities with potentially different criticality levels into a single system and this trend is expected to grow in the near future. Without appropriate preconditions, the integration of mixed-criticality subsystems can lead to a significant and potentially unacceptable increase of engineering and certification costs. There are several ongoing research initiatives studying mixed criticality integration in multicore processors. Key challenges are the combination of software virtualization and hardware segregation and the extension of partitioning mechanisms jointly addressing significant extra-functional requirements (e.g., time, energy and power budgets, adaptivity, reliability, safety, security, volume, weight, etc.) along with development and certification methodology. This paper provides a summary of the challenges to be addressed in the design and development of future mixed criticality systems and the way in which some current European Projects on the topic address those challenges.
14:30 – 15:00	Coffee Break and Poster Session
15:00 – 16:00	Session 5: System Design Technologies Timing Analysis of a Heterogeneous Architecture with Massively Parallel Processor Arrays Deepak Gangadharan, Alexandru Tanase, Frank Hannig, and Jürgen Teich, University of Erlangen, Nuremberg, DE In this paper, we present some analytical results from the timing analysis of a heterogeneous architecture with massively parallel processor arrays (MPPA). Specifically, in this work, the MPPA is a tightly-coupled processor array (TCPA). In recent work, the TCPA has been shown to be timing predictable and symbolic loop scheduling has been used to compute predictable schedules for the execution of each application mapped on the TCPA and run in parallel. However, the timing predictability provided by the TCPA can only be ensured if the shared resources on the TCPA tile provide the required input data rates to the TCPA. Towards this, we formulate a condition that needs to be satisfied over the local shared bus for the data transfers from the local memory to the TCPA in order to achieve the required application quality and latency of output data. Further, we also formulate another condition that must be satisfied by DMA data transfers from memory tile to TCPA tile during the arbitration in the memory tile so that the service levels provided by the NoC for the DMA transfers are maximally utilized. An Accurate Power Estimation Method for MPSoC Based on SystemC Virtual Prototyping Khouloud Zine Elabidine, LIP6, FR The paper presents a novel method called DPE (Design Power Estimation) which quickly and efficiently estimates an SoC’s power consumption. Our power modeling consists of defining functional activities that best characterize each component of the platform under consideration.
In this work, we are not aiming for accurate estimates of power consumption; instead, we introduce a method that offers a global power characterization which helps perform design space exploration at an early stage of the design flow (SystemC virtual prototyping) and find the best trade-off between power and performance. Parallelization of Object Detection Algorithm through Hardware Threads for MPSoCs David Watson, Ali Ahmadinia, Gordon Morison, and Tom Buggy, Glasgow Caledonian University, UK Adapting software applications to multiprocessor systems-on-chip (MPSoCs) typically follows multi-threaded design flows and data dependence analysis to implement concurrency. However, to take advantage of the hardware customizations possible through reconfigurable MPSoCs, hardware threads (HWTs) can be used to increase application concurrency and throughput, whilst complementing multi-threaded design flows. In this work, we show how applications can be analyzed and tailored to use HWTs to increase the concurrency of applications. We show how task flow graphs (TFGs) can be changed into a Kahn Process Network (KPN) which describes how software tasks can interact with HWTs over FIFOs to maintain memory coherency at the software level. We evaluate our MPSoC designs based on performance increase, throughput, and energy efficiency with a data-intensive face detection algorithm and obtain performance increases up to 3.6x compared to software-only implementation, and throughput and energy efficiencies of up to 85.97MB/s and 11.92MB/W respectively.
16:00 – 16:45	Panel Session: What is still needed to have a reliable embedded system development ecosystem in place? Moderator: Achim Rettberg, OFFIS, Germany Speakers: Sven Karlsson, DTU, DK Tapani Ahonen, TUT, FI Kim Grüttner, OFFIS, DE Benoit Dupont de Dinechin, Kalray, FR Walter Stechele, TU Munich, DE
16:45 – 17:00	Closing Session


POSTER PRESENTATIONS
	Adaptive Resource Control in Multi-core Systems Alexei Iliasov, Ashur Rafiev, Alexander Romanovsky, Andrey Mokhov, Alex Yakovlev, and Fei Xia, Newcastle University, UK Multi-core systems present a set of unique challenges and opportunities. In this paper we discuss the issues of power-proportional computing in a multi-core environment and argue that a cross-layer approach spanning from hardware to user-facing software is necessary to successfully address this problem. Criticality-Aware Functionality Allocation for Distributed Multicore Real-Time Systems Junhe Gan, Paul Pop, and Jan Madsen, Technical University of Denmark, DK We are interested in the implementation of mixed-criticality hard real-time applications on distributed architectures, composed of interconnected multicore processors, where each processing core is called a processing element (PE). The functionality of the mixed-criticality hard real-time applications is captured in the early design stages using functional blocks of different Safety-Integrity Levels (SILs). Before the applications are implemented, the functional blocks have to be decomposed into software tasks with SILs. Then, the software tasks have to be mapped and scheduled on the PEs of the architecture. We consider fixed-priority preemptive scheduling for tasks and non-preemptive scheduling for messages. We would like to determine the function-to-task decomposition, the type of PEs in the architecture and the mapping of tasks to the PEs, such that the total cost is minimized, the application is schedulable and the safety and security constraints are satisfied. The total costs capture the development and certification costs and the unit cost of the architecture. We propose a Genetic Algorithm-based approach to solve this two-objective optimization problem, and evaluate it using a real-life case-study from the automotive industry. 
Estimating Video Decoding Energies And Processing Times Utilizing Virtual Hardware Sebastian Berschneider, Christian Herglotz, Marc Reichenbach, Dietmar Fey, and André Kaup, Friedrich-Alexander-University Erlangen-Nuremberg, DE The market for embedded devices is growing steadily. Cell phones and smartphones in particular, which are essential tools for many people, are becoming more and more complex and nowadays serve as portable computers. An important problem for these devices is energy efficiency. The battery can be discharged within a few hours, especially when a smartphone processes computationally intensive tasks like video decoding. Therefore, modern devices tend to include power-efficient processors. However, power-efficient hardware is not the only factor that affects the overall power consumption; designing algorithms with energy-efficient programming in mind is also an important task. Usually, energy-efficient development is done using real hardware, where programs are executed and power consumption is measured. This process is highly costly and error prone. Moreover, expensive hardware equipment is necessary. Therefore, in this work we present a design methodology that makes it possible to run the application software on virtual hardware (CPU) that counts the instructions and memory accesses. By multiplying these counts by a previously measured energy and time per instruction, energy and time estimations are possible without having to run the target application on real hardware. As a result, we present a methodology for writing embedded applications with immediate feedback about these non-functional properties. Increased Reliability of Many-Core Platforms through Thermal Feedback Control Matthias Becker, Kristian Sandström, Moris Behnam, and Thomas Nolte, MRTC / Mälardalen University, SE In this paper we present a low overhead thermal management approach to increase reliability of many-core embedded real-time systems. Each core is controlled by a feedback controller.
We adapt the utilization of the core in order to decrease the dynamic power consumption and thus the corresponding heat development. Sophisticated control mechanisms allow us to migrate the load in advance, before reaching critical temperature values, and thus we can migrate in a safe way with a guarantee to meet all deadlines. Performance Analysis of a Computer Vision Application with the STHORM OpenCL SDK Vítor Schwambach, Sébastien Cleyet-Merle, Alain Issard, STMicroelectronics, FR and Stéphane Mancini, TIMA lab, FR Computer vision applications constitute one of the key drivers for embedded many-core architectures. To enable parallel application performance estimation and optimization early in the development flow, the development environment must provide the developer with simulation tools for fast and precise application-level performance analysis. In this work, we port a face detection application onto the STHORM many-core accelerator using the STHORM OpenCL SDK. We compare performance results obtained with the STHORM cycle-approximate simulator and a prototype implementation, and show that a large mismatch is present. We identify the key contributors to this mismatch, and propose that these be addressed in the upcoming versions of the SDK to allow more precise simulation results for early design space exploration. PSE - Performance Simulation Environment Jussi Hanhirova and Vesa Hirvisalo, Aalto University, FI We use a resource reservation based simulation environment (PSE) as a research tool to experiment on how to co-model HW/SW schedulers. Our focus is on heterogeneous many-core systems. Task processing based systems use different load balancing schemes to make efficient use of resources and to schedule work within real-time constraints. As parallel MPSoCs are constantly evolving, simulation is a viable tool to explore different configurations.
Scaling Performance of FFT Computation on an Industrial Integrated GPU Co-processor: Experiments with Algorithm Adaptation. Mohamed Amine Bergach and Serge Tissot, Kontron, FR, Michel Syska and Robert De Simone, Inria, FR Recent Intel processors (IvyBridge, Haswell) contain an embedded on-chip GPU unit, in addition to the main CPU. In this work we consider the issue of efficiently mapping Fast Fourier Transform computation onto such coprocessor units. To achieve this we pursue three goals: First, we want to study semi-systematic ways to adjust the actual variant of the FFT algorithm, for a given size, to best fit the local memory capacity (the registers of a given GPU block) and perform computations without intermediate calls to distant memory; Second, we want to study, by extensive experimentation, whether the remaining data transfers between memories (initial loads and final stores after each FFT computation) can be sustained by local interconnects at a speed matching the integrated GPU computations, or conversely if they have a negative impact on performance when computing FFTs on GPUs “at full blast”; Third, we want to record the energy consumption as observed in the previous experiments, and compare it to similar FFT implementations on the CPU side of the chip. We report our work in this short paper and its companion poster, showing graphical results on a range of experiments. In broad terms, our findings are that GPUs can compute FFTs of a typical size faster than internal on-chip interconnects can provide them with data (by a factor of roughly 2), and that energy consumption is far smaller than on the CPU side. Smart Scheduling of Streaming Applications via Timed Automata Waheed Ahmad, Robert de Groote, Philip K.F. Hölzenspies, Mariëlle Stoelinga, and Jaco van de Pol, University of Twente, NL Streaming applications such as video-in-video and multi-video conferencing impose high demands on system performance.
On one hand, they require high system throughput. On the other hand, usage of the available resources must be kept to a minimum in order to save energy. Synchronous dataflow (SDF) graphs are very popular computational models for analysing streaming applications. Recently, they have been widely used for analysis of streaming applications on a single processor as well as in a multiprocessing context. Smart scheduling techniques are critical for system lifetime so that the maximum throughput is obtained by running as few resources as possible. Current maximum throughput calculation methods for SDF graphs require an unbounded number of processors or static order scheduling of tasks. Other novel methods involve the conversion of an SDF graph to an equivalent Homogeneous SDF graph (HSDF). This approach results in a bigger graph; in the worst case, the size of the converted HSDF graph could be exponentially bigger. This poster presents an alternative, novel approach to analyse SDF graphs on a given number of processors using a proven formalism for timed systems termed Timed Automata (TA). By definition, TA are automata in which the elapse of time is measured by clock variables. The conditions under which a transition can be taken are indicated by clock guards. Furthermore, invariants show the conditions for a system to stay in a certain state. Synchronous communication between the timed automata is carried out by hand-shake synchronisation using input and output actions. Output and input actions are denoted with an exclamation mark and a question mark respectively, e.g. fire! and fire?. TA strike a good balance between expressiveness and tractability and are supported by various verification tools, e.g. UPPAAL. We translate the SDF graph of an application, and a given architecture of computer processors, into separate timed automata. Both automata synchronise using the actions "req" and "fire".
In this way, timed automaton of the application SDF graph is mapped on the timed automata of the architecture model. After that, we can analyse the performance using different measures of interest. In particular, the main contributions of this poster are: (1) Compositional translation of the SDF graphs into timed automata; (2) Exploiting the capabilities of UPPAAL to search the whole state-space and to find the schedule that fits on the available processors and maximises the throughput; (3) Finding the maximum throughput on homogeneous and heterogeneous platforms; (4) Quantitative model-checking. We also demonstrate that the deadlock freedom is preserved even if the number of processors varies. Results show that in some cases, the maximum throughput of an SDF graph remains same even if the number of processors is reduced. Similarly, a trade-off between the given number of processors and the maximum throughput can be obtained efficiently. Moreover, the benefits of quantitative model-checking and verification of the user-defined properties can also be enjoyed using different contemporary model-checkers. Future work includes energy optimal synthesis and scheduling, translation of the SDF graphs to Energy Aware Automata, extension of SDF graphs with energy costs and stochastics, dynamic power management (DPM) and reduction techniques of energy models. In order to tackle state-space explosion, we also plan to apply multi-core LTL model checking. *System Level Design Framework for Many-core Architectures* Pablo Peñil, Luis Diaz, and Pablo Sanchez, University of Cantabria, ES The complexity of the embedded, many-core architectures has been constantly increasing their shipment volume in recent years, providing a solution for creating highly optimized complex systems. 
In order to deal with the complexity of these many-core architectures, users are requiring new design methodologies that encompass system specification and performance analysis from the initial stages of the design process. The performance analysis frameworks should include software application and many-core hardware platform co-simulation in order to obtain estimations of the software execution time and performance of platform HW resources. This paper presents a fully-integrated host-compiled simulation framework which enables obtaining fast performance estimations for high-level system models. This framework could be integrated in a design exploration methodology that enables to choose the optimal specification and software parallelization, facilitating system implementation and minimizing designer effort.
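As an illustration of the counting-based estimation described in the first poster abstract above (Berschneider et al.), the following is a minimal sketch rather than the authors' implementation. It assumes that the virtual hardware reports one count per event class and that an energy and a time per event have been measured beforehand. The event classes ("alu", "mem"), the cost figures, and the names Cost, COSTS and estimate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Previously measured cost of one event (hypothetical units)."""
    energy_nj: float  # energy per event in nanojoules
    time_ns: float    # processing time per event in nanoseconds


# Hypothetical per-event costs; real values would come from measurements
# on the target hardware.
COSTS = {
    "alu": Cost(energy_nj=1.0, time_ns=2.0),
    "mem": Cost(energy_nj=5.0, time_ns=10.0),
}


def estimate(counts, costs=COSTS):
    """Return (energy_nj, time_ns) as the sum of count * per-event cost."""
    energy = sum(n * costs[kind].energy_nj for kind, n in counts.items())
    time = sum(n * costs[kind].time_ns for kind, n in counts.items())
    return energy, time


# Counts as a virtual CPU might report them for one decoded frame.
energy_nj, time_ns = estimate({"alu": 100, "mem": 10})
print(energy_nj, time_ns)
```

A linear model of this kind is only as good as the per-event costs and the choice of event classes; effects such as caching or pipelining would have to be folded into the measured averages.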

Organisation

Mats Brorsson - professor of Computer Architecture at KTH, Sweden and a senior researcher at the Swedish Institute of Computer Science (SICS). His current research interests are in programming models, run-time systems, operating systems and the architecture of parallel computer systems, in particular multi- and many-core systems. Prof. Brorsson has authored and co-authored over 50 scientific papers in international conferences and journals.

Tapani Ahonen is a part-time Senior Scientist at Technoconsult (TC), Denmark and an Assistant Professor at Tampere University of Technology (TUT), Finland. His work is focused on proof-of-concept driven computer systems design with emphasis on many-core processing environments. Ahonen has an MSc in Electrical Engineering and a PhD in Information Technology from TUT. He has an extensive international publication record including edited books and journals, written book chapters and journal articles, invited talks in high-quality conferences, as well as full-length papers and paper abstracts in conference proceedings.

Sven Karlsson - associate professor at DTU Informatics, DTU, Denmark. His research interests are in programming models, compilers, architectures, operating systems and system software for parallel computers. He has published more than 30 papers in these fields.

Walter Stechele - associate professor at Technical University of Munich (TUM), Germany. His research interests include visual computing and robotic vision, with focus on Multi Processor System-on-Chip (MPSoC) architectures and design methodology, low power optimization, dynamic reconfiguration of FPGA devices, and applications in automotive and robotics.

Adam Morawiec - director at ECSI. He holds a PhD from TIMA Lab/INPG in Grenoble and is working in the domain of specification and design languages, system design and synthesis. He is an author of several scientific publications and editor of 4 books. He was also a chair of scientific conferences (DASIP, S4D, ESLsyn).

W7 Memristor Science & Technology

Workshop

Agenda

Time	Label	Session
08:30	W7.1	Opening Session Chair: Fernando Corinto, Politecnico di Torino, IT Co-Chair: Ronald Tetzlaff, Technische Universität Dresden, DE
08:45	W7.2	Invited Talk by Prof. L. O. Chua
08:45	W7.2.1	Memristor: State-of-the-Art L. O. Chua, University of California, Berkeley, US This exposition shows that the potassium ion-channels and the sodium ion-channels that are distributed over the entire length of the axons of our neurons are in fact locally-active memristors. In particular, they exhibit all of the fingerprints of memristors, including the characteristic pinched hysteresis Lissajous figures in the voltage-current plane, whose loop areas shrink as the frequency of the periodic excitation signal increases. Moreover the pinched hysteresis loops for the potassium ion-channel memristor, and the sodium ion-channel memristor, from the Hodgkin-Huxley axon circuit model are unique for each periodic excitation signal. An in-depth circuit-theoretic analysis and characterizations of these two classic biological memristors are presented via their small-signal memristive equivalent circuits, their frequency response, and their Nyquist plots. Just as the Hodgkin-Huxley circuit model has stood the test of time, its constituent potassium ion-channel and sodium ion-channel
10:00	W7	Coffee Break
10:15	W7.4	Session 1
10:15	W7.4.1	Invited Talk: Resistive Switching - From Basic Switching Mechanism to Device Applications Thomas Mikolajick¹, Hannes Mähne², H. Wylezich² and Stefan Slesazeck² ¹NaMLab gGmbH and Technische Universität Dresden, DE; ²NaMLab gGmbH, DE Resistive switching mechanisms have been under intense study for the last 15 years, mainly for applications in next generation memories. A variety of physical mechanisms exist that lead to different switching characteristics. Based on the portfolio of different device characteristics the device properties may be adjusted to different application needs. In this talk the progress in tailoring resistive switching characteristics, both from the literature and from the authors' group, will be shown and conclusions for prospects in semiconductor memories and other applications will be drawn.
11:15	W7.4.2	The art of SPICE modeling of memristive systems Dalibor Biolek, University of Defense and Brno University of Technology, CZ A methodology for accurate and reliable modeling of memristive devices in SPICE environment is presented. Due to specific features of SPICE-family programs, the simulation results can be burdened with errors, either evident or not apparent at first sight, or the solution may not be found at all. The above two kinds of problems, called imperfections and non-convergence issues, can be magnified in circuits containing memristive elements with a specific hysteresis behavior. Four key factors, influencing the accuracy and reliability, are discussed: numerical limits in SPICE, rules of building up behavioral models, the way of modeling the state and port equations, and setting the parameters of the analysis. The recommendations are applicable to a wide class of SPICE-family simulation programs. Demonstrations are given for PSpice and HSPICE.
12:00	W7	Lunch
13:00	W7.5	Session 2
13:00	W7.5.1	Modeling and simulation of memristive devices for memory and logic applications Stephan Menzel¹ and Rainer Waser² ¹Forschungszentrum Jülich, DE; ²RWTH Aachen Universität, DE Redox-based resistive switching devices are a potential candidate for future non-volatile memory and logic applications. To enable circuit design using memristive devices, predictive simulation models are required. In this work, basic requirements are defined that need to be fulfilled to accurately model memristive devices. In addition, a physics-based modeling approach for the resistive switching in ECM cells is presented which fulfills the relevant criteria. It is based on the electrochemically driven growth and dissolution of a metallic filament and covers self-consistently the basic experimental characteristics: I-V characteristics, nonlinear switching kinetics, and multilevel switching behavior.
13:45	W7.5.2	Memory Intensive Computing Shahar Kvatinski, Technion – Israel Institute of Technology, IL Over the past years, new memory technologies such as RRAM, STT-MRAM, PCM etc., have emerged. These technologies, located in the metal layers of the chip, are relatively fast, dense, and power-efficient, and can be considered as memristors. Usually, the use of these devices has been limited to flash, DRAM, and SRAM replacement. This talk focuses on other uses of memristors, for example new memory structures that differ from the conventional memory hierarchy and open the way to a new era in computer architecture - the era of Memory Intensive Computing. Memristors can also be integrated with CMOS in logic circuits. Alternatively, they can be used as stand-alone logic, suitable for performing logic within the memory and providing an opportunity for new computer architectures that differ from the classical von Neumann architecture.
14:30	W7	Coffee Break
15:00	W7.6	Session 3
15:00	W7.6.1	Ferroelectric Memristors for Neuromorphic Computing Sören Boyn, CNRS/Thales, FR Thanks to the progress in Nanotechnologies and Material Science, physicists and condensed matter scientists have recently been able to build smart nano-devices with enhanced capabilities. Some of these new devices show functionalities that could be extremely interesting for bio-inspired computing. It has been demonstrated, for example, that some analog and tunable nano-resistors called Memristors can mimic synapses on silicon. The industry is already developing dense networks of these nano-devices for classical digital memories. It is therefore no longer a dream to envisage building bio-inspired chips based on large-scale, high-density parallel networks of these advanced devices, and taking advantage of their full functionalities. What's more, the inherent qualities of massively parallel architectures (speed, tolerance to defects and low power consumption) are more and more appreciated these days, when computer processors heat up so much that they cannot be used at all times, and when transistors are shrinking so much that they will no longer be reliable. It is becoming a common thesis that bio-inspired chips such as Artificial Neural Networks will soon enter the market as a back-up or accelerator of more traditional computing architectures. In this talk, after a brief introduction to memristor nano-devices and their applications, I will focus on our work: the development of a new generation of memristors based on purely electronic effects, the ferroelectric memristors. I will show that, by tuning interface properties and finely engineering the dynamics of ferroelectric polarization, we can control the response of these memristors. Furthermore, I will demonstrate their suitability in terms of endurance and retention.
15:00	W7.6.2	Is memristor the 4th circuit element? Frank Zhigang Wang, School of Computing University of Kent Canterbury, GB Chua proposed a Basic Circuit Element Quadrangle including the three classic elements (resistor, inductor and capacitor) and the one he formulated, named the memristor, as the fourth element. Based on an observation that this quadrangle may not be perfectly symmetric, we propose a Basic Circuit Element Triangle, in which the memristor, mem-capacitor and mem-inductor lead three basic element classes, respectively. An intrinsic mathematical relationship is found to support this new classification. We believe that this triangle is concise, mathematically sound and aesthetically beautiful, compared with Chua's quadrangle. The importance of finding a correct circuit element table is similar to that of Mendeleev's Periodic Table of Chemical Elements in Chemistry. A correct circuit element table would also require physics textbooks to be rewritten.
15:00	W7.6.3	NbOx/Nb2O5 memristor modeling based on Chua's Unfolding Principle Alon Ascoli¹, Stefan Slesazeck², Hannes Mähne², Ronald Tetzlaff¹ and Thomas Mikolajick³ ¹Technische Universität Dresden, DE; ²NaMLab gGmbH, DE; ³NaMLab gGmbH and Technische Universität Dresden, DE Prof. Chua has recently introduced a systematic approach to the modeling of memristors known as the Unfolding Principle. We share Chua's opinion that the availability of a general mathematical framework capable of capturing the dynamics of real memristors would boost the ongoing exploration of their full potential in various applications, leading to new types of circuits including non-volatile memories, neuromorphic systems, spike-based signal processing machines and sensor systems. In this presentation we introduce an Unfolding Principle-based model for the threshold switching behavior of a NbOx/Nb2O5 memristor fabricated at NaMLab. The accuracy of the proposed mathematical description is demonstrated through a number of case studies. The proposed model is accurate yet simple and thus suited for time-efficient circuit simulations. The availability of reliable mathematical frameworks, such as the one proposed here, would certainly pave the way towards a more rapid, extensive and intensive introduction of the memristor into the realm of circuit elements at the disposal of integrated circuit designers.
15:00	W7.6.4	Pattern Classification and Recognition with Memristive Circuits Fabien Alibart¹ and D. B. Strukov² ¹CNRS, FR; ²University of California at Santa Barbara, US We will discuss recent experimental results on pattern classification and recognition tasks implemented with memristive [1] (ReRAM [2]) neural networks. The Pt/TiO2-x/Pt memristive devices (Fig. 1a, b), which are utilized in both demonstrations, are fabricated with a nanoscale e-beam-defined protrusion which localizes the active area during the forming process to a volume of ~(20 nm)³ and as a result helps to improve device yield. In particular, we will first discuss the demonstration of a pattern classification task for 3×3 binary images by a single-layer perceptron network implemented with 10×2 memristive crossbar circuits (Fig. 1c) in which synaptic weights are realized with memristive devices. The perceptron circuit is trained by ex-situ and in-situ methods to perform binary classification for a set of patterns from an original work by Widrow [3]. In the ex-situ case, the synaptic weights are calculated on the precursor software-based network and then imported sequentially to the crossbar circuits using a variation-tolerant programming algorithm [4]. For the in-situ training, the weights are adjusted in parallel following the perceptron learning rule by applying voltage pulses from pre-synaptic and post-synaptic neurons. Both approaches work successfully (Fig. 1d) despite significant variations in the switching behavior of memristive devices as well as half-select and leakage problems in crossbar circuits [5].
15:00	W7.6.5	Memristor crossbar array circuits for neuromorphic applications Kyeong-Sik Min, Kookmin University, KR Crossbar array architecture is the most suitable for realizing high-density memristor-based synapses. In this presentation, we discuss various crossbar array circuits for mimicking synaptic functions in terms of area, power, etc. In addition, variations in fabrication process, power supply voltage, etc., that can affect the synaptic functions of memristor-based crossbar arrays will be analyzed and discussed.
16:40	W7.7	Closing Session

W6 MEDIAN - Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale

Workshop

Agenda

Time	Label	Session
08:30	W6.1	Opening Session Organisers: Oliver Bringmann, FZI/University of Tuebingen, DE Mehdi Tahoori, Karlsruhe Institute of Technology, DE Chair: Maria K Michael, University of Cyprus, CY Co-Chair: Ozcan Ozturk, Bilkent University, TR Welcoming comments
08:45	W6.2	Keynote Talk
08:45	W6.2.1	Designing Efficient and Reliable Multicore Processors for Networking, Servers, and Beyond Shubu Mukherjee, Cavium Networks, US
09:45	W6.3	Paper Session I: New Challenges at the System Level
09:45	W6.3.1	Multi-Core Emulation for Dependable and Adaptive Systems Prototyping Cristiana Bolchini and Matteo Carminati, Politecnico di Milano, IT
09:45	W6.3.2	Fault-tolerant Routing Approach for 3D Stacked Meshes Masoumeh Ebrahimi, Masoud Daneshtalab and Juha Plosila, University of Turku, FI
10:30	W6	Coffee Break
11:00	W6.4	Paper Session II: Reliability Threats in New Technologies
11:00	W6.4.1	Invited Talk - Steep Slope Devices: Opportunities and Challenges for Processor Design Vijaykrishnan Narayanan, Penn State, US
11:00	W6.4.2	BTI reliability from Planar to FinFET nodes: Will the next node be more or less reliable? Halil Kukner¹, Pieter Weckx², Praveen Raghavan¹, Ben Kaczer¹, Doyoung Jang¹, Francky Catthoor³, Liesbet Van der Perre², Rudy Lauwereins³ and Guido Groeseneken³ ¹IMEC, BE; ²KU Leuven, BE; ³IMEC, KU Leuven, BE
11:00	W6.4.3	Analysis of Random Dopant Fluctuations and Oxide Thickness on a 16nm L1 Cache Design *) Cagri Eryilmaz¹, Azam Seyedi², Ozman Unsal³ and Andrian Cristal⁴ ¹Middle Eastern Technical University, TR and Barcelona Supercomputing Center, ES; ²Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, ES; ³Barcelona Supercomputing Center, ES; ⁴Barcelona Supercomputing Center, Universitat Politecnica de Catalunya and IIIA-CSIC, ES
12:00	W6	Lunch Break
13:00	W6.5	Paper Session III: Application Specific Solutions
13:00	W6.5.1	FPGA Defect Tolerance based on Equivalent Configurations Generation Parthasarathy M. B. Rao, Abdulazim Amouri and Mehdi B. Tahoori, Karlsruhe Institute of Technology, DE
13:00	W6.5.2	A Complex Control System for Testing Fault-Tolerance Methodologies *) Jakub Podivinsky, Marcela Simkova and Zdenek Kotasek, Brno University of Technology, CZ
13:30	W6.6	Panel Session Organiser: Said Hamdioui, TU Delft, NL Chair: Matteo Sonza Reorda, Politecnico di Torino, IT
		Panelists: Mehdi Tahoori¹, Oliver Bringmann², Adrian Evans³ and Viacheslav Izosimov⁴ ¹Karlsruhe Institute of Technology, DE; ²FZI/University of Tuebingen, DE; ³iROC, FR; ⁴Semcon, SE
14:30	W6.7	Coffee Break & Poster Session
14:30	W6.7.1	BADR: Boosting Reliability Through Dynamic Redundancy Ihsen Alouani¹, Smail Niar¹, Mazen Saghir² and Fadi Kurdahi³ ¹University of Valenciennes, FR; ²Texas A&M University, QA; ³University of California at Irvine, US
14:30	W6.7.2	Automatic Detection and Correction of Defective Pixels for Medical and Space Imagers Eliahu Cohen¹, Moriel Shnitser², Tsvika Avraham², Ofer Hadar² and Yocheved Dotan³ ¹Tel-Aviv University, IL; ²Ben-Gurion University, IL; ³Ruppin Academic Center, IL
14:30	W6.7.3	Implementing Double Error Correction Orthogonal Latin Squares Codes in Xilinx FPGAs Mustafa Demirci¹, Pedro Reviriego² and Juan Antonio Maestro² ¹Alesan, TR; ²Universidad Antonio de Nebrija, ES
14:30	W6.7.4	On Reliability Enhancement Using Adaptive Core Voltage Scaling and Variations on TSMC 28nm LP process FPGAs Petr Pfeifer and Zdenek Pliva, Technical University of Liberec, CZ
14:30	W6.7.5	Power and Performance Optimization in Long-term Operation André Romão¹, Jorge Semião¹, Carlos Leong², Marcelino Santos³, Isabel Teixeira³ and Paulo Teixeira³ ¹University of Algarve, PT; ²INESC-ID, PT; ³Technical University of Lisbon, PT
15:00	W6.8	Paper Session IV: Resiliency, Self-Test and Self-Diagnosis
15:00	W6.8.1	Invited Talk - DEEP-ER: Scalable resiliency in Exascale Computing Michael Kauschke, Intel, DE
15:00	W6.8.2	Improving the Reliability of Skewed Caches through ECC based Hashes Sercan Yegin¹, Burak Karsli¹, Oguz Ergin¹, Marco Ottavi², Salvatore Pontarelli² and Pedro Reviriego³ ¹TOBB University, TR; ²University of Rome Tor Vergata, IT; ³Universidad Antonio de Nebrija, ES
15:00	W6.8.3	A new Diagnostic method for VLIW Processors *) Davide Sabena, Luca Sterpone and Matteo Sonza Reorda, Politecnico di Torino, IT
15:00	W6.8.4	Aging Monitoring Methodology for Built-In Self-Test Applications *) João Coelho¹, Jorge Semião¹, Carlos Leong², Marcelino Santos³, Isabel Teixeira³ and Paulo Teixeira³ ¹University of Algarve, PT; ²INESC-ID, PT; ³Technical University of Lisbon, PT
16:15	W6.9	Closing Session

*) indicates short paper

W3 Electronic System-Level Design towards Heterogeneous Computing

Workshop

Agenda

Time	Label	Session
08:30	W3.1	Opening Session
08:45	W3.2	Session 1 - Trends in Heterogeneous Computing: the industrial perspective
08:45	W3.2.1	Heterogeneous Computing in the Cloud: emerging trends from the industry Steve Hebert, Nimbix
09:15	W3.2.2	Higher Level Programming Abstractions for FPGAs using OpenCL Bogdan Pasca, Altera European Technology Centre
09:45	W3.3	Panel 1
		Panelists: Koen Bertels¹, Steve Hebert² and Bogdan Pasca³ ¹Delft University of Technology, NL; ²Nimbix; ³Altera European Technology Centre
10:30	W3	Coffee Break + Poster Session 1
11:00	W3.4	Session 2 - Research challenges in Heterogeneous Computing design flows
11:00	W3.4.1	FPGA based accelerators for Big Data: Polymorphic computing for Big Data Koen Bertels, Delft University of Technology, NL
11:30	W3.4.2	Mapping applications to heterogeneous accelerators: tool flows and run-time systems Christian Plessl, University of Paderborn
12:00	W3	Lunch
13:00	W3.5	Session 3 - Compilers and code optimization for hardware-accelerated platforms
13:00	W3.5.1	From Software Code to Hardware: Directions in High-Level Synthesis Philippe Coussy, Université de Bretagne-Sud, Lab-STICC, FR
13:30	W3.5.2	Polyhedral compilation and code transformations for High-Level Synthesis Louis-Noel Pouchet, University of California Los Angeles, US
14:00	W3.6	Session 4 - Towards higher-level design approaches
14:00	W3.6.1	CoDesign with Verity: bidirectional control-flow across the FPGA-CPU divide Eduardo Aguilar Peleaz, Imperial College, GB
14:30	W3.6.2	Borrowing high-level paradigms from parallel computing: an OpenMP-based design flow Alessandro Cilardo, University of Naples Federico II, IT
15:00	W3.7	Panel 2
		Panelists: Philippe Coussy¹, Louis-Noel Pouchet² and Eduardo Aguilar Peleaz³ ¹Université de Bretagne-Sud, Lab-STICC, FR; ²University of California Los Angeles, US; ³Imperial College, GB
15:30	W3	Coffee Break + Poster Session 2
16:00	W3.8	Session 5 - Current and emerging heterogeneous computing applications
16:00	W3.8.1	Heterogeneous HPC: combining FPGAs, CPUs, and GPUs for financial analytics David Thomas, Imperial College, GB
16:30	W3.9	Panel 3
		Speaker: Alessandro Cilardo, University of Naples Federico II, IT Panelists: Steve Hebert¹ and Bogdan Pasca² ¹Nimbix; ²Altera European Technology Centre
16:45	W3.10	Closing Session

W1 International Workshop on Dependable GPU Computing

Workshop

Agenda

Time	Label	Session
08:30	W1.1	Opening Session
08:30	W1.1.1	Opening Remarks Dimitris Gizopoulos¹, Hans-Joachim Wunderlich² and Paolo Prinetto³ ¹University of Athens, GR; ²University of Stuttgart, DE; ³Politecnico di Torino, IT
08:30	W1.1.2	Keynote 1: GPGPU for dependable systems - a blessing or a curse? Avi Mendelson, Technion, IL
09:15	W1.2	Invited Talk 1
09:15	W1.2.1	GPGPU Reliability - Challenges and Research Directions Sudhanva Gurumurthi, AMD, US
09:45	W1.3	Session 1 - "Software Approaches for GPUs Dependability Enhancement" Chair: Murali Annavaram, University of Southern California, Los Angeles, US Co-Chair: Amir Nahir, IBM Research, IL
09:45	W1.3.1	An improved fault mitigation strategy for CUDA Fermi GPUs Stefano Di Carlo, Giulio Gambardella, Ippazio Martella, Paolo Prinetto, Daniele Rolfo and Pascal Trotta, Politecnico di Torino, IT
10:05	W1.3.2	Software-Based Techniques for Reducing the Vulnerability of GPU Applications Si Li¹, Vilas Sridharan², Sudhanva Gurumurthi² and Sudhakar Yalamanchili¹ ¹Georgia Tech., US; ²AMD, US
10:25	W1.3.3	A-ABFT: Autonomous Algorithm-Based Fault Tolerance on GPUs Claus Braun, Sebastian Halder and Hans-Joachim Wunderlich, University of Stuttgart, DE
10:45	W1	Coffee Break+Posters
11:30	W1.4	Invited Talk 2
11:30	W1.4.1	Reliable Acceleration - Reliability in a World of GPUs & Other Special Purpose Accelerators Arijit Biswas, Intel, US
12:00	W1	Lunch
13:00	W1.5	Keynote 2
13:00	W1.5.1	GPU Related Errors in Large Scale Systems: A Study of Blue Waters Supercomputer at NCSA-Illinois Ravishankar K. Iyer, University of Illinois at Urbana-Champaign, US
13:45	W1.6	Session 2 - "Fault Detection and Tolerance in GPUs" Chair: Nathan DeBardeleben, Los Alamos National Laboratory, US Co-Chair: Hans-Joachim Wunderlich, University of Stuttgart, DE
13:45	W1.6.1	Benefits and Countermeasures of Increasing the GPU code Degree of Parallelism Paolo Rech and Luigi Carro, UFRGS, BR
13:45	W1.6.2	On the Evaluation of Soft-Errors Detection Techniques for GPGPUs Davide Sabena¹, Matteo Sonza Reorda¹, Luca Sterpone¹, Paolo Rech² and Luigi Carro² ¹Politecnico di Torino, IT; ²UFRGS, BR
13:45	W1.6.3	Tolerating Hard Faults in GPGPUs Waleed Dweik, Mohammad AbdelMajeed and Murali Annavaram, University of Southern California, US
14:45	W1	Coffee Break
15:15	W1.7	Panel Session Organiser: Dimitris Gizopoulos, University of Athens, GR
		Panelists: Sudhakar Yalamanchili¹, Ravishankar K. Iyer², Stefano Di Carlo³, Sudhanva Gurumurthi⁴, Arijit Biswas⁵ and Bodo Hoppe⁶ ¹Georgia Tech., US; ²University of Illinois at Urbana-Champaign, US; ³Politecnico di Torino, IT; ⁴AMD, US; ⁵Intel, US; ⁶IBM, DE
16:45	W1.8	Closing Session
