DATE 2020 University Booth                                                                 

The University Booth is organised during DATE and will be located in the exhibition area at booth 11. All demonstrations will take place from Tuesday, March 10 to Thursday, March 12, 2020 during DATE. Universities and public research institutes have been invited to submit hardware or software demonstrators.

The University Booth programme is composed of 42 demonstrations from 11 countries, presenting software and hardware solutions. The programme is organised in 11 sessions of 2 or 2.5 h duration and will cover the topics:

The University Booth at DATE 2020 invites you to find out more about the latest trends in software and hardware from the international research community.

Most demonstrators will be shown more than once, giving visitors more flexibility to come to the booth and find out about the latest innovations.

We are sure that the demonstrators will give an attractive supplement to the DATE conference program and exhibition. We would like to thank all contributors to this programme.

More information is available online at https://www.date-conference.com/exhibition/university-booth. The University Booth programme is included in the conference booklet and available online at https://www.date-conference.com/exhibition/university-booth/programme. The following demonstrators will be presented at the University Booth.

A Binary Translation Framework for Automated Hardware Generation
Authors:
Nuno Paulino and João Canas Ferreira, INESC TEC / University of Porto, PT
Timeslots:

Abstract: Hardware specialization is an efficient solution for maximization of performance and minimization of energy consumption. This work is based on automated detection of workload by analysis of a compiled application, and on the automated generation of specialized hardware modules. We will present the current version of the binary analysis and translation framework. Currently, our implementation is capable of processing ARMv8 and MicroBlaze (32-bit) Executable and Linking Format (ELF) files or instruction traces. The framework can interpret the instructions for these two ISAs, and detect different types of instruction patterns. After detection, segments are converted into CDFG representations exposing the underlying Instruction Level Parallelism which we aim to exploit via automated hardware generation. On-going work is addressing the extraction of cyclical execution traces or static code blocks, more methods of hardware generation

A Digital Microfluidics Bio-Computing Platform
Authors:
Georgi Tanev, Luca Pezzarossa, Winnie Edith Svendsen and Jan Madsen, TU Denmark, DK
Timeslots:

Abstract: Digital microfluidics is a lab-on-a-chip (LOC) technology used to actuate small amounts of liquids on an array of individually addressable electrodes. Microliter sized droplets can be programmatically dispensed, moved, mixed, split, in a controlled environment which combined with miniaturized sensing techniques makes LOC suitable for a broad range of applications in the field of medical diagnostics and synthetic biology. Furthermore, a programmable digital microfluidics platform holds the potential to add a "fluidic subsystem" to the classical computation model thus opening the doors for cyber-physical bio-processors. To facilitate the programming and operation of such bio-fluidic computing, we propose dedicated instruction set architecture and virtual machine. A set of digital microfluidic core instructions as well as classic computing operations are executed on a virtual machine, which decouples the protocol execution from the LOC functionality.

At-Speed DFT Architecture for Bundled-Data Circuits
Authors:
Ricardo Aquino Guazzelli and Laurent Fesquet, Université Grenoble Alpes, FR
Timeslots:

Abstract: At-speed testing for asynchronous circuits is still an open concern in the literature. Due to its timing constraints between control and data paths, Design for Testability (DfT) methodologies must test both control and data paths at the same time in order to guarantee the circuit correctness. As Process Voltage Temperature (PVT) variations significantly impact circuit design in newer CMOS technologies and low-power techniques such as voltage scaling, the timing constraints between control and data paths must be tested after fabrication not only under nominal conditions but through a range of operating conditions. This work explores an at-speed testing approach for bundled data circuits, targetting the micropipeline template. The main target of this test approach focuses on whether the sized delay lines in control paths respect the local timing assumptions of the data paths.

Ateces: Automated Testing the Energy Consumption of Embedded Systems
Authors:
Eduard Enoiu, Mälardalen University, SE
Timeslots:

Abstract: The demostrator will focus on automatically generating test suites by selecting test cases using random test generation and mutation testing is a solution for improving the efficiency and effectiveness of testing. Specifically, we generate and select test cases based on the concept of energy-aware mutants, small syntactic modifications in the system architecture, intended to mimic real energy faults. Test cases that can distinguish a certain behavior from its mutations are sensitive to changes, and hence considered to be good at detecting faults. We applied this method on a brake by wire system and our results suggest that an approach that selects test cases showing diverse energy consumption can increase the fault detection ability. This kind of results should motivate both academia and industry to investigate the use of automatic test generation for energy consumption.

BCFELEAM: Backflow: Backward Edge Control flow Enforcement for Low end Arm Real-Time Systems
Author:
Bresch Cyril1, David Héy1, Roman Lysecky2 and Stephanie Chollet1
1LCIS, FR, 2University of Arizona, US
Timeslots:

Abstract: The C programming language is one of the most popular languages in embedded system programming. Indeed, C is efficient, lightweight and can easily meet high performance and deterministic real-time constraints. However, these assets come at a certain price. Indeed, C does not provide extra features for memory safety. As a result, attackers can easily exploit spatial memory vulnerabilities to hijack the execution flow of an application. The demonstration features a real-time connected infusion pump vulnerable to memory attacks. First, we showcase an exploit that remotely takes control of the pump. Then, we demonstrate the effectiveness of BackFlow, an LLVM-based compiler extension that enforces control-flow integrity in low-end ARM embedded systems.

Brook Sc: High-Level Certification-Friendly Programming for Gpu-Powered Safety Critical Systems
Authors:
Marc Benito, Matina Maria Trompouki and Leonidas Kosmidis, BSC / UPC, ES
Timeslots:

Abstract: Graphics processing units (GPUs) can provide the increased performance required in future critical systems, i.e. automotive and avionics. However, their programming models, e.g. CUDA or OpenCL, cannot be used in such systems as they violate safety critical programming guidelines. Brook SC (https://github.com/lkosmid/brook) was developed in UPC/BSC to allow safety-critical applications to be programmed in a CUDA-like GPU language, Brook, which enables the certification while increasing productivity. In our demo, an avionics application running on a realistic safety critical GPU software stack and hardware is show cased. In this Bachelor's thesis project, which was awarded a 2019 HiPEAC Technology Transfer Award, an Airbus prototype application performing general-purpose computations with a safety-critical graphics API was ported to Brook SC in record time, achieving an order of magnitude reduction in the lines of code to implement the same functionality without performance penalty.

Catanis: CAD Tool for Automatic Network Synthesis
Authors:
Davide Quaglia, Enrico Fraccaroli, Filippo Nevi and Sohail Mushtaq, Università di Verona, IT
Timeslots:

Abstract: The proliferation of communication technologies for embedded systems opened the way for new applications, e.g., Smart Cities and Industry 4.0. In such applications hundreds or thousands of smart devices interact together through different types of channels and protocols. This increasing communication complexity forces computer-aided design methodologies to scale up from embedded systems in isolation to the global inter-connected system. Network Synthesis is the methodology to optimally allocate functionality onto network nodes and define the communication infrastructure among them. This booth will demonstrate the functionality of a graphic tool for automatic network synthesis developed by the Computer Science Department of University of Verona. It allows to graphically specify the communication requirements of a smart space (e.g., its map can be considered) in terms of sensing and computation tasks together with a library of node types and communication protocols to be used.

CSI-Repute: A Low Power Embedded Device Clustering Approach To Genome Read Map-Ping
Authors:
Tousif Rahman1, Sidharth Maheshwari1, Rishad Shafik1, Ian Wilson1, Alex Yakovlev1 and Amit Acharyya2
1Newcastle University, GB, 2IIT Hyderabad, IN
Timeslots:

Abstract: The big data challenge of genomics is rooted in its requirements of extensive computational capability and results in large power and energy consumption. To encourage widespread usage of genome assembly tools there must be a transition from the existing predominantly software-based mapping tools, optimized for homogeneous high-performance systems, to more heterogeneous low power and cost-effective mapping systems. This demonstration will show a cluster system implementation for the REPUTE algorithm, (An OpenCL based Read Mapping Tool for Embedded Genomics) where cluster nodes are composed of low power single board computer (SBC) devices and the algorithm is deployed on each node spreading the genomic workload, we propose a working concept prototype to challenge current conventional high-performance many-core CPU based cluster nodes. This demonstration will highlight the advantage in the power and energy domains of using SBC clusters enabling potential solutions to low-cost genomics.

DEEPSENSE-FPGA: FPGA Acceleration of a Multimodal Neural Network
Authors:
Mehdi Trabelsi Ajili and Yuko Hara-Azumi, Tokyo Institute of Technology, JP
Timeslots:

Abstract: Currently, Internet of Things and Deep Learning (DL) are merging into one domain and creating outstanding technologies for various classification tasks. Such technologies require complex DL networks that are mainly targeting powerful platforms with rich computing resources like servers. Therefore, for resource-constrained embedded systems, new challenges of size, performance and power consumption have to be considered, particularly when edge devices handle multimodal data, i.e., different types of real-time sensing data (voice, video, text, etc.). Our ongoing project is focused on DeepSense, a multimodal DL framework combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to process time-series data, such as accelerometer and gyroscope to detect human activity. We aim at accelerating DeepSense by FPGA (Xilinx Zynq) in a hardware-software co-design manner. Our demo will show the latest achievements through latency and power consumption evaluations.

Design Automation for XBM Automata in Workcraft: Design Automation for Ex-Tended Burst-Mode Automata in Workcraft
Authors:
Alex Chan, Alex Yakovlev, Danil Sokolov and Victor Khomenko, Newcastle University, GB
Timeslots:

Abstract: Asynchronous circuits are known to have high performance, robustness and low power consumption, which are particularly beneficial for the area of so-called "little digital" controllers where low latency is crucial. However, asynchronous design is not widely adopted by industry, partially due to the steep learning curve inherent in the complexity of formal specifications, such as Signal Transition Graphs (STGs). In this demo, we promote a class of the Finite State Machine (FSM) model called Extended Burst-Mode (XBM) automata as a practical way to specify many asynchronous circuits. The XBM specification has been automated in the Workcraft toolkit (https://workcraft.org) with elaborate support for state encoding, conditionals and "don't care" signals. Formal verification and logic synthesis of the XBM automata is implemented via conversion to the established STG model, reusing existing methods and CAD tools. Tool support for the XBM flow will be demonstrated using several case studies.

Distributing Time-Sensitive Applications on Edge Computing Environments
Authors:
Eudald Sabaté Creixell1, Unai Perez Mendizabal1, Elli Kartsakli2, Maria A. Serrano Gracia3 and Eduardo Quiñones Moreno3
1BSC / UPC, ES, 2BSC, GR, 3BSC, ES
Timeslots:

Abstract: The proposed demonstration aims to showcase the capabilities of a task-based distributed programming framework for the execution of real-time applications in edge computing scenarios, in the context of smart cities. Edge computing shifts the computation close to the data source, alleviating the pressure on the cloud and reducing application response times. However, the development and deployment of distributed real-time applications is complex, due to the heterogeneous and dynamic edge environment where resources may not always be available. To address these challenges, our demo employs COMPSs, a highly portable and infrastructure-agnostic programming model, to efficiently distribute time-sensitive applications across the compute continuum. We will exhibit how COMPSs distributes the workload on different edge devices (e.g., NVIDIA GPUs and a Rasberry Pi), and how COMPSs re-adapts this distribution upon the availability (connection or disconnection) of devices.

DL PUF ENAU: Deep Learning Based Physically Unclonable Function Enrollment and Authenntication
Authors:
Amir Alipour1, David Hely2, Vincent Beroulle2 and Giorgio Di Natale3
1Grenoble INP / LCIS, FR, 2Grenoble INP, FR, 3CNRS / Grenoble INP / TIMA, FR
Timeslots:

Abstract: Physically Unclonable Functions (PUFs) have been addressed nowadays as a potential solution to improve the security in authentication and encryption process in Cyber Physical Systems. The research on PUF is actively growing due to its potential of being secure, easily implementable and expandable, using considerably less energy. To use PUF in common, the low level device Hardware Variation is captured per unit for device enrollment into a format called Challenge-Response Pair (CRP), and recaptured after device is deployed, and compared with the original for authentication. These enrollment + comparison functions can vary and be more data demanding for applications that demand robustness, and resilience to noise. In this demonstration, our aim is to show the potential of using Deep Learning for enrollment and authentication of PUF CRPs. Most importantly, during this demonstration, we will show how this method can save time and storage compared to other classical methods.

ECLT FPGA Component: Edge-to-Cloud Location-Transparent FPGA Component
Authors:
Takeshi Ohkawa, Tokai University, JP
Timeslots:

Abstract: To exploit the benefits of FPGA, it is necessary to improve the usability of FPGA from the software system as well as the design productivity of FPGA circuitry itself. Therefore, an FPGA component technology is expected in which software can access FPGA circuitry easily and communicate with other FPGA/software components through the network in the whole edge-to-cloud system using a variety of communication protocols. In this demonstration, a location-transparent FPGA component which is capable of image recognition processing and communicating with ROS (Robot Operating System) protocol are exhibited. The FPGA component works in the ROS system and the component can be in an arbitrary location in the Edge-to-Cloud network system.

EEC: Energy Efficient Computing Via Dynamic Voltage Scaling and In-Network Optical Processing
Authors:
Ryosuke Matsuo1, Jun Shiomi1, Yutaka Masuda2 and Tohru Ishihara2
1Kyoto University, JP, 2Nagoya University, JP
Timeslots:

Abstract: This poster demonstration will show results of our two research projects. The first one is on a project of energy efficient computing. In this project we developed a power management algorithm which keeps the target processor always running at the most energy efficient operating point by appropriately tuning the supply voltage and threshold voltage under a specific performance constraint. This algorithm is applicable to wide variety of processor systems including high-end processors and low-end embedded processors. We will show the results obtained with actual RISC processors designed using a 65nm technology. The second one is on a project of in-network optical computing. We show optical functional units such as parallel multipliers and optical neural networks. Several key techniques for reducing the power consumption of optical circuits will be also presented. Finally, we will show the results of optical circuit simulation, which demonstrate the light speed operation of the circuits.

ELSA: Eigenvalue Based Hybrid Linear System Abstraction: Behavioral Modeling of Transistor-Level Circuits Using Automatic Abstraction to Hybrid Automata
Authors:
Ahmad Tarraf and Lars Hedrich, University of Frankfurt, DE
Timeslots:

Abstract: Model abstraction of transistor-level circuits, while preserving an accurate behavior, is still an open problem. In this demo an approach is presented that automatically generates a hybrid automaton (HA) with linear states from an existing circuit netlist. The approach starts with a netlist at transistor level with full SPICE accuracy and ends at the system level description of the circuit in matlab or in Verilog-A. The resulting hybrid automaton exhibits linear behavior as well as the technology dependent nonlinear e.g. limiting behavior. The accuracy and speed-up of the Verilog-A generated models is evaluated based on several transistor level circuit abstractions of simple operational amplifiers up to a complex filters. Moreover, we verify the equivalence between the generated model and the original circuit. For the generated models in matlab syntax, a reachability analysis is performed using the reachability tool cora.

EUCLID-NIR GPU: An On-Board Processing Gpu-Accelerated Space Case Study Demon-Strator
Authors:
Ivan Rodriguez and Leonidas Kosmidis, BSC / UPC, ES
Timeslots:

Abstract: Embedded Graphics Processing Units (GPUs) are very attractive candidates for on-board payload processing of future space systems, thanks to their high performance and low-power consumption. Although there is significant interest from both academia and industry, there is no open and publicly available case study showing their capabilities, yet. In this master thesis project, which was performed within the GPU4S (GPU for Space) ESA-funded project, we have parallelised and ported the Euclid NIR (Near Infrared) image processing algorithm used in the European Space Agency's (ESA) mission to be launched in 2022, to an automotive GPU platform, the NVIDIA Xavier. In the demo we will present in real-time its significantly higher performance achieved compared to the original sequential implementation. In addition, visitors will have the opportunity to examine the images on which the algorithm operates, as well as to inspect the algorithm parallelisation through profiling and code inspection.

FASTHERMSIM: Fast and Accurate Thermal Simulations from Chiplets to System
Authors:
Yu-Min Lee, Chi-Wen Pan, Li-Rui Ho and Hong-Wen Chiou, National Chiao Tung University, TW
Timeslots:

Abstract: Recently, owing to the scaling down of technology and 2.5D/3D integration, power densities and temperatures of chips have been increasing significantly. Though commercial computational fluid dynamics tools can provide accurate thermal maps, they may lead to inefficiency in thermal-aware design with huge runtime. Thus, we develop the chip/package/system-level thermal analyzer, called FasThermSim, which can assist you to improve your design under thermal constraints in pre/post-silicon stages. In FasThermSim, we consider three heat transfer modes, conduction, convection, and thermal radiation. We convert them to temperature-independent terms by linearization methods and build a compact thermal model (CTM). By applying numerical methods to the CTM, the steady-state and transient thermal profiles can be solved efficiently without loss of accuracy. Finally, an easy-to-use thermal analysis tool is implemented for your design, which is flexible and compatible, with the graphic user interface.

FLETCHER: Transparent Generation of Hardware Interfaces for Accelerating Big Data Applications
Authors:
Zaid Al-Ars, Johan Peltenburg, Jeroen van Straten, Matthijs Brobbel and Joost Hoozemans, TU Delft, NL
Timeslots:

Abstract: This demo created by TUDelft is a software-hardware framework to allow for an efficient integration of FPGA hardware accelerators both on edge devices as well as in the cloud. The framework is called Fletcher, which is used to automatically generate data communication interfaces in hardware based on the widely used big data format Apache Arrow. This provides two distinct advantages. On the one hand, since the accelerators use the same data format as the software, data communication bottlenecks can be reduced. On the other hand, since a standardized data format is used, this allows for easy-to-use interfaces on the accelerator side, thereby reducing the design and development time. The demo shows how to use Fletcher for big data acceleration to decompress Snappy compressed files and perform filtering on the whole Wikipedia body of text. The demo enables 25 GB/s processing throughput.

FPGA-DSP: A Prototype for High Quality Digital Audio Signal Processing Based on an FPGA
Authors:
Bernhard Riess and Christian Epe, University of Applied Sciences Düsseldorf, DE
Timeslots:

Abstract: Our demonstrator presents a prototype of a new digital audio signal processing system which is based on an FPGA. It achieves a performance that up to now has been preserved to costly high-end solutions. Main components of the system are an analog/digital converter, an FPGA to perform the digital signal processing tasks, and a digital/analog converter implemented on a printed circuit board. To demonstrate the quality of the audio signal processing, infinite impulse response, finite impulse response filters and a delay effect were realized in VHDL. More advanced signal processing systems can easily be implemented due to the flexibility of the FPGA. Measured results were compared to state of the art audio signal processing systems with respect to size, performance and cost. Our prototype outperforms systems of the same price in quality, and outperforms systems of the same quality at a maximum of 20% of the price. Examples of the performance of our system can be heard in the demo.

FU: Low Power and Accuracy Configurable Approximate Arithmetic Units
Authors:
Tomoaki Ukezono and Toshinori Sato, Fukuoka University, JP
Timeslots:

Abstract: In this demonstration, we will introduce the approximate arithmetic units such as adder, multiplier, and MAC that are being studied in our system-architecture laboratory. Our approximate arithmetic units can reduce delay and power consumption at the expense of accuracy. Our approximate arithmetic units are intended to be applied to IoT edge devices that can process images, and are suitable for battery-driven and low-cost devices. The biggest feature of our approximate arithmetic units is that the circuit is configured so that the accuracy is dynamically variable, and the trade-off relationship between accuracy and power can be selected according to the usage status of the device. In this demonstration, we show the power consumption according to various accuracy-requirements based on actual data and claim the practicality of the proposed arithmetic units.

Fuzzing Embedded Binaries Leveraging Systemc-Based Virtual Prototypes
Authors:
Vladimir Herdt1, Daniel Grosse2 and Rolf Drechsler2
1DFKI, DE, 2University of Bremen / DFKI GmbH, DE
Timeslots:

Abstract: Verification of embedded Software (SW) binaries is very important. Mainly, simulation-based methods are employed that execute (randomly) generated test-cases on Virtual Prototypes (VPs). However, to enable a comprehensive VP-based verification, sophisticated test-case generation techniques need to be integrated. Our demonstrator combines state-of-the-art fuzzing techniques with SystemC-based VPs to enable a fast and accurate verification of embedded SW binaries. The fuzzing process is guided by the coverage of the embedded SW as well as the SystemC-based peripherals of the VP. The effectiveness of our approach is demonstrated by our experiments, using RISC-V SW binaries as an example.

Generating Asynchronous Circuits from Catapult
Authors:
Yoan Decoudu1, Jean Simatic2, Katell Morin-Allory3 and Laurent Fesquet3
1University Grenoble Alpes, FR, 2HawAI.Tech, FR, 3Université Grenoble Alpes, FR
Timeslots:

Abstract: In order to spread asynchronous circuit design to a large community of designers, High-Level Synthesis (HLS) is probably a good choice because it requires limited design technical skills. HLS usually provides an RTL description, which includes a data-path and a control-path. The desynchronization process is only applied to the control-path, which is a Finite State Machine (FSM). This method is sufficient to make asynchronous the circuit. Indeed, data are processed step by step in the pipeline stages, thanks to the desynchronized FSM. Thus, the data-path computation time is no longer related to the clock period but rather to the average time for processing data into the pipeline. This tends to improve speed when the pipeline stages are not well-balanced. Moreover, our approach helps to quickly designing data-driven circuits while maintaining a reasonable cost, a similar area and a short time-to-market..

INTACT: A 96-Core Processor With 6 Chiplets 3d-Stacked on an Active Interposer and A 16-Core Prototype Running Graphical Operating System
Authors:
Eric Guthmuller1, Pascal Vivet1, César Fuguet1, Yvain Thonnart1, Gaël Pillonnet2 and Fabien Clermidy1
1Université Grenoble Alpes / CEA List, FR, 2Université Grenoble Alpes / CEA-Leti, FR
Timeslots:

Abstract: We built a demonstrator for our 96-cores cache coherent 3D processor and a first prototype featuring 16 cores. The demonstrator consists in our 16-cores processor running commodity operating systems such as Linux and NetBSD on a PC-like motherboard with user-friendly devices such as a HDMI display, keyboard and mouse. A graphical desktop is displayed, and the user will interact with it through the keyboard and mouse. The demonstrator is able to run parallel applications to benchmark its performance in terms of scalability. The main innovation of our processor is its scalable cache coherent architecture based on distributed L2-caches and adaptive L3-caches. Additionally, the energy consumption is also measured and displayed by reading dynamically from the monitors of power-supply devices. Finally we will also show open packages of the 3D processor featuring 6 16-core chiplets (28 nm FDSOI) on an active interposer (65 nm) embedding Network-on-Chips, power management and IO controllers.

JOINTER: Joining Flexible Monitors with Heterogeneous Architectures
Authors:
Giacomo Valente1, Tiziana Fanni2, Carlo Sau3, Claudio Rubattu2, Francesca Palumbo2 and Luigi Poman-te1
1Università degli Studi dell'Aquila, IT, 2Università degli Studi di Sassari, IT, 3Università degli Studi di Cagliari, IT
Timeslots:

Abstract: As embedded systems grow more complex and shift toward heterogeneous architectures, understanding workload performance characteristics becomes increasingly difficult. In this regard, run-time monitoring systems can support on obtaining the desired visibility to characterize a system. This demo presents a framework that allows to develop complex heterogeneous architectures composed of programmable processors and dedicated accelerators on FPGA, together with customizable monitoring systems, keeping under control the introduced overhead. The whole development flow (and related prototypal EDA tools), that starts with the accelerators creation using a dataflow model, in parallel with the monitoring system customization using a library of elements, showing also the final joining, will be shown. Moreover, a comparison among different monitoring systems functionalities on different architectures developed on Zynq7000 SoC will be illustrated.

LAGARTO: First Silicon Risc-V Academic Processor Developed in Spain
Authors:
Guillem Cabo Pitarch1, Cristobal Ramirez Lazo1, Julian Pavon Rivera1, Vatistas Kostalabros1, Carlos Rojas Morales1, Miquel Moreto1, Jaume Abella1, Francisco J. Cazorla1, Adrian Cristal1, Roger Figueras1, Alberto Gonzalez1, Carles Hernandez1, Cesar Hernandez2, Neiel Leyva2, Joan Marimon1, Ricardo Mar-tinez3, Jonnatan Mendoza1, Francesc Moll4, Marco Antonio Ramirez2, Carlos Rojas1, Antonio Rubio4, Abraham Ruiz1, Nehir Sonmez1, Lluis Teres3, Osman Unsal5, Mateo Valero1, Ivan Vargas1 and Luis Villa2
1BSC / UPC, ES, 2CIC-IPN, MX, 3IMB-CNM (CSIC), ES, 4UPC, ES, 5BSC, ES
Timeslots:

Abstract: Open hardware is a possibility that has emerged in recent years and has the potential to be as disruptive as Linux was once, an open source software paradigm. If Linux managed to lessen the dependence of users in large companies providing software and software applications, it is envisioned that hardware based on ISAs open source can do the same in their own field. In the Lagarto tapeout four research institutions were involved: Centro de Investigación en Computación of the Mexican IPN, Centro Nacional de Microelectrónica of the CSIC, Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC). As a result, many bachelor, master and PhD students had the chance to achieve real-world experience with ASIC design and achieve a functional SoC. In the booth, you will find a live demo of the first ASIC and prototypes running on FPGA of the next versions of the SoC and core.

LEARNV: LEARNV: A Risc-V Based Embedded System Design Framework for Education and Research Development
Authors:
Noureddine Ait Said and Mounir Benabdenbi, TIMA Laboratory, FR
Timeslots:

Abstract: Designing a modern System on a Chip is based on the joint design of hardware and software (co-design). However, understanding the tight relationship between hardware and software is not straightforward. Moreover to validate new concepts in SoC design from the idea to the hardware implementation is time-consuming and often slowed by legacy issues (intellectual property of hardware blocks and expensive commercial tools). To overcome these issues we propose to use the open-source Rocket Chip environment for educational purposes, combined with the open-source LowRisc architecture to implement a custom SoC design on an FPGA board. The demonstration will present how students and engineers can take benefit from the environment to deepen their knowledge in HW and SW co-design. Using the LowRisC architecture, an image classification application based on the use of CNNs will serve as a demonstrator of the whole open-source hardware and software flow and will be mapped on a Nexys A7 FPGA board.

MDD-COP: A Preliminary Tool for Model-Driven Development Extended with Layer Diagram for Context-Oriented Programming
Authors:
Harumi Watanabe1, Chinatsu Yamamoto1, Takeshi Ohkawa1, Mikiko Sato1, Nobuhiko Ogura2 and Mana Tabei1
1Tokai University, JP, 2Tokyo City University, JP
Timeslots:

Abstract: This presentation introduces a preliminary tool for Model-Driven development (MDD) to generate programs for Context-Oriented Programming (COP). In modern embedded systems such as IoT and Industry 4.0, their software began to process multiple services by following the changing surrounding environments. COP is helpful for programming such software. In COP, we can consider the surrounding environments and multiple services as contexts and layers. Even though MDD is a powerful technique for developing such modern systems, the works of modeling for COP are limited. There are no works to mention the relation between UML (Unified Modeling Language) and COP. To solve this problem, we provide a COP generation from a layer diagram extended the package diagram of UML by stereotypes. In our approach, users draw a layer diagram and other UML diagrams, then xtUML, which is a major tool of MDD, generates XML code with layer information for COP; finally, our tool generates COP code from XML code.

PA-HLS: High-Level Annotation of Routing Congestion for Xilinx Vivado HLS Designs
Authors:
Osama Bin Tariq1, Junnan Shan1, Luciano Lavagno1, Georgios Floros2, Mihai Teodor Lazarescu1, Christos Sotiriou2 and Mario Roberto Casu1
1Politecnico di Torino, IT, 2University of Thessaly, GR
Timeslots:

Abstract: We will demo a novel high-level backannotation flow that reports routing congestion issues at the C++ source level by analyzing reports from FPGA physical design (Xilinx Vivado) and internal debugging files of the Vivado HLS tool. The flow annotates the C++ source code, identifying likely causes of congestion, e.g., on-chip memories or the DSP units. These shared resources often cause routing problems on FPGAs because they cannot be duplicated by physical design. We demonstrate on realistic large designs how the information provided by our flow can be used to both identify congestion issues at the C++ source level and solve them using HLS directives. The main demo steps are: 1-Extraction of the source-level debugging information from the Vivado HLS database 2-Generation of a list of net names involved in congestion areas and of their relative significance from the Vivado post global-routing database 3-Visualization of the C++ code lines that contribute most to congestion.

PAFUSI: Particle Filter Fusion Asic for Indoor Positioning
Authors:
Christian Schott, Marko Rößler, Daniel Froß, Marcel Putsche and Ulrich Heinkel, TU Chemnitz, DE
Timeslots:

Abstract: The meaning of data acquired from IoT devices is heavily enhanced if global or local position information of their acquirement is known. Infrastructure for indoor positioning as well as the IoT device involve the need of small, energy efficient but powerful devices that provide the location awareness. We propose the PAFUSI, a hardware implementation of an UWB position estimation algorithm that fulfils these requirements. Our design fuses distance measurements to fixed points in an environment to calculate the position in 3D space and is capable of using different positioning technologies like GPS, DecaWave or Nanotron as data source simultaneously. Our design comprises of an estimator which processes the data by means of a Sequential Monte Carlo method and a microcontroller core which configures and controls the measurement unit as well as it analyses the results of the estimator. The PAFUSI is manufactured as a monolithic integrated ASIC in a multi-project wafer in UMC's 65nm process.

Parallel Algorithm for CNN Inference and its Automatic Synthesis
Authors:
Takashi Matsumoto, Yukio Miyasaka, Xinpei Zhang and Masahiro Fujita, University of Tokyo, JP
Timeslots:

Abstract: Recently, Convolutional Neural Network (CNN) has surpassed conventional methods in the field of image processing. This demonstration shows a new algorithm to calculate CNN inference using processing elements arranged and connected based on the topology of the convolution. They are connected in mesh and calculate CNN inference in a systolic way. The algorithm performs the convolution of all elements with the same output feature in parallel. We demonstrate a method to automatically synthesize an algorithm, which simultaneously performs the convolution and the communication of pixels for the computation of the next layer. We show with several sizes of input layers, kernels, and strides and confirmed that the correct algorithms were synthesized. The synthesis method is extended to the sparse kernel. The synthesized algorithm requires fewer cycles than the original algorithm. There were the more chances to reduce the number of cycles with the sparser kernel.

Pre-Impact Fall Detection Architecture Based on Neuromuscular Connectivity Sta-Tistics
Authors:
Giovanni Mezzina, Sardar Mehboob Hussain and Daniela De Venuto, Politecnico di Bari, IT
Timeslots:

Abstract: In this demonstration, we propose an innovative multi-sensor architecture operating in the field of pre-impact fall detection (PIFD). The proposed architecture jointly analyzes cortical and muscular involvement when unexpected slippages occur during steady walking. The EEG and EMG are acquired through wearable and wireless devices. The control unit consists of an STM32L4 microcontroller and a Simulink modeling. The μC implements the EMG computation, while the cortical analysis and the final classification were entrusted to the Simulink model. The EMG computation block translates EMGs into binary signals, which are used both to enable cortical analyses and to extract a score to distinguish "standard" muscular behaviors from anomalous ones. The Simulink model evaluates the cortical responsiveness in five bands of interest and implements the logical-based network classifier. The system, tested on 6 healthy subjects, shows an accuracy of 96.21% and a detection time of ∼371 ms.

Rescued: A Rescue Demonstrator For Interdependent Aspects of Reliability, Security and Quality Towards A Complete EDA Flow
Author:
Nevin George1, Guilherme Cardoso Medeiros2, Junchao Chen3, Josie Esteban Rodriguez Condia4, Thomas Lange5, Aleksa Damljanovic4, Raphael Segabinazzi Ferreira1, Aneesh Balakrishnan5, Xinhui Lai6, Shayesteh Masoumian7, Dmytro Petryk3, Troya Cagil Koylu2, Felipe Augusto da Silva8, Ahmet Cagri Bagbaba8, Cemil Cem Gürsoy6, Said Hamdioui2, Mottaqiallah Taouil2, Milos Krstic3, Peter Langendoerfer3, Zoya Dyka3, Marcelo Brandalero1, Michael Hübner1, Jörg Nolte1, Heinrich Theodor Vierhaus1, Matteo Sonza Reorda4, Giovanni Squillero4, Luca Sterpone4, Jaan Raik6, Dan Alexandrescu5, Maximilien Glorieux5, Georgios Selimis7, Geert-Jan Schrijen7, Anton Klotz8, Christian Sauer8 and Maksim Jenihhin6
1Brandenburg University of Technology Cottbus-Senftenberg, DE, 2TU Delft, NL, 3Leibniz-Institut für innovative Mikroelektronik, DE, 4Politecnico di Torino, IT, 5IROC Technologies, FR, 6Tallinn University of Technology, EE, 7Intrinsic ID, NL, 8Cadence Design Systems GmbH, DE
Timeslots:

Abstract: The demonstrator highlights the various interdependent aspects of Reliability, Security and Quality in nanoelectronics system design within an EDA toolset and a processor architecture setup. The compelling need of attention towards these three aspects of nanoelectronic systems have been ever more pronounced over extreme miniaturization of technologies. Further, such systems have exploded in numbers with IoT devices, heavy and analogous interaction with the external physical world, complex safety-critical applications, and Artificial intelligence applications. RESCUE targets such aspects in the form, Reliability (functional safety, ageing, soft errors), Security (tamper-resistance, PUF technology, intelligent security) and Quality (novel fault models, functional test, FMEA/FMECA, verification/debug) spanning the entire hardware software system stack. The demonstrator is brought together by a group of PhD students under the banner of H2020-MSCA-ITN RESCUE European Union project.

RETINE: A Programmable 3D Stacked Vision Chip Enabling Low Latency Image Analysis
Author:
Stéphane Chevobbe1, Maria Lepecq1 and Laurent Millet2
1CEA LIST, FR, 2CEA-Leti, FR
Timeslots:

Abstract: We have developed and fabricated a 3D stacked imager called RETINE composed with 2 layers based on the replication of a programmable 3D tile in a matrix manner providing a highly parallel programmable architecture. This tile is composed by a 16x16 BSI binned pixels array with associated readout and 16 column ADC on the first layer coupled to an efficient SIMD processor of 16 PE on the second layer. The prototype of RETINE achieves high video rates, from 5500 fps in binned mode to 340 fps in full resolution mode. It operates at 80 MHz with 720 mW power consumption leading to 85 GOPS/W power efficiency. To highlight the capabilities of the RETINE chip we have developed a demonstration platform with an electronic board embedding a RETINE chip that films rotating disks. Three scenarii are available: high speed image capture, slow motion and composed image capture with parallel processing during acquisition.

RUMORE: A Framework For Runtime Monitoring And Trace Analysis For Component-Based Embedded Systems Design Flow
Author:
Vittoriano Muttillo1, Luigi Pomante1, Giacomo Valente1, Hector Posadas2, Javier Merino2 and Eugenio Villar2
1University of L'Aquila, IT, 2University of Cantabria, ES
Timeslots:

Abstract: The purpose of this demonstrator is to introduce runtime monitoring infrastructures and to analyze trace data. The goal is to show the concept among different monitoring requirements by defining a general reference architecture that can be adapted to different scenarios. Starting from design artifacts, generated by a system engineering modeling tool, a custom HW monitoring system infrastructure will be presented. This sub-system will be able to generate runtime artifacts for runtime verification. We will show how the RUMORE framework provides round-trip support in the development chain, injecting monitoring requirements from design models down to code and its execution on the platform and trace data back to the models, where the expected behavior will then compared with the actual behavior. This approach will be used towards optimizing design models for specific properties (e.g, for system performance).

SKELETOR: An Open Source EDA Tool Flow from Hierarchy Specification to HDL Devel-Opment
Author:
Ivan Rodriguez, Guillem Cabo, Javier Barrera, Jeremy Giesen, Alvaro Jover and Leonidas Kosmidis, BSC / UPC, ES
Timeslots:

Abstract: Large hardware design projects have high overhead for project bootstrapping, requiring significant effort for translating hardware specifications to hardware design language (HDL) files and setting up their corresponding development and verification infrastructure. Skeletor (https://github.com/jaquerinte/Skeletor) is an open source EDA tool developed as a student project at UPC/BSC, which simplifies this process, by increasing developer's productivity and reducing typing errors, while at the same time lowers the bar for entry in hardware development. Skeletor uses a C/verilog-like language for the specification of the modules in a hardware project hierarchy and their connections, which is used to generate automatically the require skeleton of source files, their devel-opment and verification testbenches and simulation scripts. Integration with KiCad schematics and support for syntax highlighting in code editors simplifies further its use. This demo is linked with work-shop W05.

SRSN: Secure Reconfigurable Test Network
Author:
Vincent Reynaud1, Emanuele Valea2, Paolo Maistri1, Regis Leveugle1, Marie-Lise Flottes2, Sophie Dupuis2, Bruno Rouzeyre2 and Giorgio Di Natale1
1TIMA Laboratory, FR, 2LIRMM, FR
Timeslots:

Abstract: The critical importance of testability for electronic devices led to the development of IEEE test standards. These methods, if not protected, offer a security backdoor to attackers. This demonstrator illustrates a state-of-the-art solution that prevents unauthorized usage of the test infrastructure based on the IEEE 1687 standard and implemented on an FPGA target.

SUBRISC+: Implementation and Evaluation of an Embedded Processor for Lightweight IOT EHEALTH
Author:
Mingyu Yang and Yuko Hara-Azumi, Tokyo Institute of Technology, JP
Timeslots:

Abstract: Although the rapid growth of Internet of Things (IoT) has enabled new opportunities for eHealth devices, the further development of complex systems is severely constrained by the power and energy supply on the battery-powered embedded systems. To address this issue, this work presents a processor design called "SubRISC+" targeting lightweight IoT eHealth. SubRISC+ is a processor design to achieve low power/energy consumption through its unique and compact architecture. As an example of lightweight eHealth applications on SubRISC+, we are working on the epileptic seizure detection using the dynamic time wrapping algorithm to deploy on wearable IoT eHealth devices. Simulation results show that 22% reduction on dynamic power and 50% reduction on leakage power and core area are achieved compared to Cortex-M0. As an ongoing work, the evaluation on a fabricated chip will be done within the first half of 2020.

SYSTEMC-CT/DE: A Simulator With Fast And Accurate Continuous Time And Discrete Events Interactions On Top Of Systemc.
Author:
Breytner Joseph Fernandez-Mesa, Liliana Andrade and Frédéric Pétrot, Université Grenoble Alpes / CNRS / TIMA Laboratory, FR
Timeslots:

Abstract: We have developed a continuous time (CT) and discrete events (DE) simulator on top of SystemC. Systems that mix both domains are critical and their proper functioning must be verified. Simulation serves to achieve this goal. Our simulator implements direct CT/DE synchronization, which enables a rich set of interactions between the domains: events from the CT models are able to trigger DE processes; events from the DE models are able to modify the CT equations. DE-based interactions are, then, simulated at their precise time by the DE kernel rather than at fixed time steps. We demonstrate our simulator by executing a set of challenging examples: they either require a superdense model of time or include Zeno behavior or are highly sensitive to accuracy errors. Results show that our simulator overcomes these issues, is accurate, and improves simulation speed w.r.t. fixed time steps; all of these advantages open up new possibilities for the design of a wider set of heterogeneous systems.

TAPASCO: The Open-Source Task-Parallel System Composer Framework
Author:
Carsten Heinz, Lukas Sommer, Lukas Weber, Jaco Hofmann and Andreas Koch, TU Darmstadt, DE
Timeslots:

Abstract: Field-programmable gate arrays (FPGA) are an established platform for highly specialized accelerators, but in a heterogeneous setup, the accelerator still needs to be integrated into the overall system. The open-source TaPaSCo (Task-Parallel System Composer) framework was created to serve this purpose: The fast integration of FPGA-based accelerators into compute platforms or systems-on-chip (SoC) and their connection to relevant components on the FPGA board. TaPaSCo can support developers in all steps of the development process: from cores resulting from High-Level Synthesis or cores written in an HDL, a complete FPGA-design can be created. TaPaSCo will automatically connect all processing elements to the memory- and host-interface and generate a complete bitstream. The TaPaSCo Runtime API allows to interface with accelerators from software and supports operations such as transferring data to the FPGA memory, passing values and controlling the execution of the accelerators.

UWB ACKATCK: Hijacking Devices in UWB Indoor Positioning Systems
Author:
Baptiste Pestourie, Vincent Beroulle and Nicolas Fourty, Université Grenoble Alpes, FR
Timeslots:

Abstract: Various radio-based Indoor Positioning Systems (IPS) have been proposed during the last decade as solutions to GPS inconsistency in indoor environments. Among the different radio technologies proposed for this purpose, 802.15.4 Ultra-Wideband (UWB) is by far the most performant, reaching up to 10 cm accuracy with 1000 Hz refresh rates. As a consequence, UWB is a popular technology for applications such as assets tracking in industrial environments or robots/drones indoor navigation. However, some security flaws in 802.15.4 standard expose UWB positioning to attacks. In this demonstration, we show how an attacker can exploit a vulnerability on 802.15.4 acknowledgment frames to hijack a device in a UWB positioning system. We demonsrate that using simply one cheap UWB chip, the attacker can take control over the positioning system and generate fake trajectories from a laptop. The results are observed in real-time in the 3D engine monitoring the positioning system.

Virtual Platforms For Complex Software Stacks
Author:
Lukas Jünger and Rainer Leupers, RWTH Aachen University, DE
Timeslots:

Abstract: This demonstration is going to showcase our "AVP64" Virtual Platform (VP), which models a multi-core ARMv8 (Cortex A72) system including several peripherals, such as an SDHCI and an ethernet controller. For the ARMv8 instruction set simulation a dynamic binary translation based solution is used. As the workload, the Xen hypervisor with two Linux Virtual Machines (VMs) is executed. Both VMs are connected to the simulation hosts' network subsystem via a virtual ethernet controller. One of the VMs executes a NodeJS-based server application offering a REST API via this network connection. An AngularJS client application on the host system can then connect to the server application to obtain and store data via the server's REST API. This data is read and written by the server application to the virtual SD Card connected to the SDHCI. For this, on SD card partition is passed to the VM through Xen's block device virtualization mechanism.

WALLANCE: An Alternative to Blockchain for IOT
Author:
Loic Dalmasso, Florent Bruguier, Pascal Benoit and Achraf Lamlih, Université de Montpellier, FR
Timeslots:

Abstract: Since the expansion of the Internet of Things (IoT), connected devices became smart and autonomous. Their exponentially increasing number and their use in many application domains result in a huge potential of cybersecurity threats. Taking into account the evolution of the IoT, security and interoperability are the main challenges, to ensure the reliability of the information. The blockchain technology provides a new approach to handle the trust in a decentralized network. However, current blockchain implementations cannot be used in IoT domain because of their huge need of computing power and storage utilization. This demonstrator presents a lightweight distributed ledger protocol dedicated to the IoT application, reducing the computing power and storage utilization, handling the scalability and ensuring the reliability of information.

See you at the University Booth!
University Booth Co-Chairs
Frédéric Pétrot, IMAG, FR and
Andreas Vörg, edacentrum, DE
university-booth@date-conference.com