UB02 Session 2

Label	Presentation Title Authors
UB02.1	FLETCHER: TRANSPARENT GENERATION OF HARDWARE INTERFACES FOR ACCELERATING BIG DATA APPLICATIONS Authors: Zaid Al-Ars, Johan Peltenburg, Jeroen van Straten, Matthijs Brobbel and Joost Hoozemans, TU Delft, NL Abstract This demo created by TUDelft is a software-hardware framework to allow for an efficient integration of FPGA hardware accelerators both on edge devices as well as in the cloud. The framework is called Fletcher, which is used to automatically generate data communication interfaces in hardware based on the widely used big data format Apache Arrow. This provides two distinct advantages. On the one hand, since the accelerators use the same data format as the software, data communication bottlenecks can be reduced. On the other hand, since a standardized data format is used, this allows for easy-to-use interfaces on the accelerator side, thereby reducing the design and development time. The demo shows how to use Fletcher for big data acceleration to decompress Snappy compressed files and perform filtering on the whole Wikipedia body of text. The demo enables 25 GB/s processing throughput. More information ...
UB02.2	A DIGITAL MICROFLUIDICS BIO-COMPUTING PLATFORM Authors: Georgi Tanev, Luca Pezzarossa, Winnie Edith Svendsen and Jan Madsen, TU Denmark, DK Abstract Digital microfluidics is a lab-on-a-chip (LOC) technology used to actuate small amounts of liquids on an array of individually addressable electrodes. Microliter sized droplets can be programmatically dispensed, moved, mixed, split, in a controlled environment which combined with miniaturized sensing techniques makes LOC suitable for a broad range of applications in the field of medical diagnostics and synthetic biology. Furthermore, a programmable digital microfluidics platform holds the potential to add a "fluidic subsystem" to the classical computation model thus opening the doors for cyber-physical bio-processors. To facilitate the programming and operation of such bio-fluidic computing, we propose dedicated instruction set architecture and virtual machine. A set of digital microfluidic core instructions as well as classic computing operations are executed on a virtual machine, which decouples the protocol execution from the LOC functionality. More information ...
UB02.3	VIRTUAL PLATFORMS FOR COMPLEX SOFTWARE STACKS Authors: Lukas Jünger and Rainer Leupers, RWTH Aachen University, DE Abstract This demonstration is going to showcase our "AVP64" Virtual Platform (VP), which models a multi-core ARMv8 (Cortex A72) system including several peripherals, such as an SDHCI and an ethernet controller. For the ARMv8 instruction set simulation a dynamic binary translation based solution is used. As the workload, the Xen hypervisor with two Linux Virtual Machines (VMs) is executed. Both VMs are connected to the simulation hosts' network subsystem via a virtual ethernet controller. One of the VMs executes a NodeJS-based server application offering a REST API via this network connection. An AngularJS client application on the host system can then connect to the server application to obtain and store data via the server's REST API. This data is read and written by the server application to the virtual SD Card connected to the SDHCI. For this, one SD card partition is passed to the VM through Xen's block device virtualization mechanism. More information ...
UB02.4	FPGA-DSP: A PROTOTYPE FOR HIGH QUALITY DIGITAL AUDIO SIGNAL PROCESSING BASED ON AN FPGA Authors: Bernhard Riess and Christian Epe, University of Applied Sciences Düsseldorf, DE Abstract Our demonstrator presents a prototype of a new digital audio signal processing system which is based on an FPGA. It achieves a performance that up to now has been preserved to costly high-end solutions. Main components of the system are an analog/digital converter, an FPGA to perform the digital signal processing tasks, and a digital/analog converter implemented on a printed circuit board. To demonstrate the quality of the audio signal processing, infinite impulse response, finite impulse response filters and a delay effect were realized in VHDL. More advanced signal processing systems can easily be implemented due to the flexibility of the FPGA. Measured results were compared to state of the art audio signal processing systems with respect to size, performance and cost. Our prototype outperforms systems of the same price in quality, and outperforms systems of the same quality at a maximum of 20% of the price. Examples of the performance of our system can be heard in the demo. More information ...
UB02.5	AT-SPEED DFT ARCHITECTURE FOR BUNDLED-DATA CIRCUITS Authors: Ricardo Aquino Guazzelli and Laurent Fesquet, Université Grenoble Alpes, FR Abstract At-speed testing for asynchronous circuits is still an open concern in the literature. Due to its timing constraints between control and data paths, Design for Testability (DfT) methodologies must test both control and data paths at the same time in order to guarantee the circuit correctness. As Process Voltage Temperature (PVT) variations significantly impact circuit design in newer CMOS technologies and low-power techniques such as voltage scaling, the timing constraints between control and data paths must be tested after fabrication not only under nominal conditions but through a range of operating conditions. This work explores an at-speed testing approach for bundled data circuits, targetting the micropipeline template. The main target of this test approach focuses on whether the sized delay lines in control paths respect the local timing assumptions of the data paths. More information ...
UB02.6	INTACT: A 96-CORE PROCESSOR WITH 6 CHIPLETS 3D-STACKED ON AN ACTIVE INTERPOSER AND A 16-CORE PROTOTYPE RUNNING GRAPHICAL OPERATING SYSTEM Authors: Eric Guthmuller¹, Pascal Vivet¹, César Fuguet¹, Yvain Thonnart¹, Gaël Pillonnet² and Fabien Clermidy¹ ¹Université Grenoble Alpes / CEA List, FR; ²Université Grenoble Alpes / CEA-Leti, FR Abstract We built a demonstrator for our 96-cores cache coherent 3D processor and a first prototype featuring 16 cores. The demonstrator consists in our 16-cores processor running commodity operating systems such as Linux and NetBSD on a PC-like motherboard with user-friendly devices such as a HDMI display, keyboard and mouse. A graphical desktop is displayed, and the user will interact with it through the keyboard and mouse. The demonstrator is able to run parallel applications to benchmark its performance in terms of scalability. The main innovation of our processor is its scalable cache coherent architecture based on distributed L2-caches and adaptive L3-caches. Additionally, the energy consumption is also measured and displayed by reading dynamically from the monitors of power-supply devices. Finally we will also show open packages of the 3D processor featuring 6 16-core chiplets (28 nm FDSOI) on an active interposer (65 nm) embedding Network-on-Chips, power management and IO controllers. More information ...
UB02.7	GENERATING ASYNCHRONOUS CIRCUITS FROM CATAPULT Authors: Yoan Decoudu¹, Jean Simatic², Katell Morin-Allory³ and Laurent Fesquet³ ¹University Grenoble Alpes, FR; ²HawAI.Tech, FR; ³Université Grenoble Alpes, FR Abstract In order to spread asynchronous circuit design to a large community of designers, High-Level Synthesis (HLS) is probably a good choice because it requires limited design technical skills. HLS usually provides an RTL description, which includes a data-path and a control-path. The desynchronization process is only applied to the control-path, which is a Finite State Machine (FSM). This method is sufficient to make asynchronous the circuit. Indeed, data are processed step by step in the pipeline stages, thanks to the desynchronized FSM. Thus, the data-path computation time is no longer related to the clock period but rather to the average time for processing data into the pipeline. This tends to improve speed when the pipeline stages are not well-balanced. Moreover, our approach helps to quickly designing data-driven circuits while maintaining a reasonable cost, a similar area and a short time-to-market. More information ...
UB02.8	WALLANCE: AN ALTERNATIVE TO BLOCKCHAIN FOR IOT Authors: Loic Dalmasso, Florent Bruguier, Pascal Benoit and Achraf Lamlih, Université de Montpellier, FR Abstract Since the expansion of the Internet of Things (IoT), connected devices became smart and autonomous. Their exponentially increasing number and their use in many application domains result in a huge potential of cybersecurity threats. Taking into account the evolution of the IoT, security and interoperability are the main challenges, to ensure the reliability of the information. The blockchain technology provides a new approach to handle the trust in a decentralized network. However, current blockchain implementations cannot be used in IoT domain because of their huge need of computing power and storage utilization. This demonstrator presents a lightweight distributed ledger protocol dedicated to the IoT application, reducing the computing power and storage utilization, handling the scalability and ensuring the reliability of information. More information ...
UB02.9	PRE-IMPACT FALL DETECTION ARCHITECTURE BASED ON NEUROMUSCULAR CONNECTIVITY STATISTICS Authors: Giovanni Mezzina, Sardar Mehboob Hussain and Daniela De Venuto, Politecnico di Bari, IT Abstract In this demonstration, we propose an innovative multi-sensor architecture operating in the field of pre-impact fall detection (PIFD). The proposed architecture jointly analyzes cortical and muscular involvement when unexpected slippages occur during steady walking. The EEG and EMG are acquired through wearable and wireless devices. The control unit consists of an STM32L4 microcontroller and a Simulink modeling. The C implements the EMG computation, while the cortical analysis and the final classification were entrusted to the Simulink model. The EMG computation block translates EMGs into binary signals, which are used both to enable cortical analyses and to extract a score to distinguish "standard" muscular behaviors from anomalous ones. The Simulink model evaluates the cortical responsiveness in five bands of interest and implements the logical-based network classifier. The system, tested on 6 healthy subjects, shows an accuracy of 96.21% and a detection time of ~371 ms. More information ...
UB02.10	JOINTER: JOINING FLEXIBLE MONITORS WITH HETEROGENEOUS ARCHITECTURES Authors: Giacomo Valente¹, Tiziana Fanni², Carlo Sau³, Claudio Rubattu², Francesca Palumbo² and Luigi Pomante¹ ¹Università degli Studi dell'Aquila, IT; ²Università degli Studi di Sassari, IT; ³Università degli Studi di Cagliari, IT Abstract As embedded systems grow more complex and shift toward heterogeneous architectures, understanding workload performance characteristics becomes increasingly difficult. In this regard, run-time monitoring systems can support on obtaining the desired visibility to characterize a system. This demo presents a framework that allows to develop complex heterogeneous architectures composed of programmable processors and dedicated accelerators on FPGA, together with customizable monitoring systems, keeping under control the introduced overhead. The whole development flow (and related prototypal EDA tools), that starts with the accelerators creation using a dataflow model, in parallel with the monitoring system customization using a library of elements, showing also the final joining, will be shown. Moreover, a comparison among different monitoring systems functionalities on different architectures developed on Zynq7000 SoC will be illustrated. More information ...
15:00	End of session

Label

Presentation Title
Authors

UB02.1

FLETCHER: TRANSPARENT GENERATION OF HARDWARE INTERFACES FOR ACCELERATING BIG DATA APPLICATIONS
Authors:
Zaid Al-Ars, Johan Peltenburg, Jeroen van Straten, Matthijs Brobbel and Joost Hoozemans, TU Delft, NL
Abstract
This demo created by TUDelft is a software-hardware framework to allow for an efficient integration of FPGA hardware accelerators both on edge devices as well as in the cloud. The framework is called Fletcher, which is used to automatically generate data communication interfaces in hardware based on the widely used big data format Apache Arrow. This provides two distinct advantages. On the one hand, since the accelerators use the same data format as the software, data communication bottlenecks can be reduced. On the other hand, since a standardized data format is used, this allows for easy-to-use interfaces on the accelerator side, thereby reducing the design and development time. The demo shows how to use Fletcher for big data acceleration to decompress Snappy compressed files and perform filtering on the whole Wikipedia body of text. The demo enables 25 GB/s processing throughput.
More information ...

UB02.2

A DIGITAL MICROFLUIDICS BIO-COMPUTING PLATFORM
Authors:
Georgi Tanev, Luca Pezzarossa, Winnie Edith Svendsen and Jan Madsen, TU Denmark, DK
Abstract
Digital microfluidics is a lab-on-a-chip (LOC) technology used to actuate small amounts of liquids on an array of individually addressable electrodes. Microliter sized droplets can be programmatically dispensed, moved, mixed, split, in a controlled environment which combined with miniaturized sensing techniques makes LOC suitable for a broad range of applications in the field of medical diagnostics and synthetic biology. Furthermore, a programmable digital microfluidics platform holds the potential to add a "fluidic subsystem" to the classical computation model thus opening the doors for cyber-physical bio-processors. To facilitate the programming and operation of such bio-fluidic computing, we propose dedicated instruction set architecture and virtual machine. A set of digital microfluidic core instructions as well as classic computing operations are executed on a virtual machine, which decouples the protocol execution from the LOC functionality.
More information ...

UB02.3

VIRTUAL PLATFORMS FOR COMPLEX SOFTWARE STACKS
Authors:
Lukas Jünger and Rainer Leupers, RWTH Aachen University, DE
Abstract
This demonstration is going to showcase our "AVP64" Virtual Platform (VP), which models a multi-core ARMv8 (Cortex A72) system including several peripherals, such as an SDHCI and an ethernet controller. For the ARMv8 instruction set simulation a dynamic binary translation based solution is used. As the workload, the Xen hypervisor with two Linux Virtual Machines (VMs) is executed. Both VMs are connected to the simulation hosts' network subsystem via a virtual ethernet controller. One of the VMs executes a NodeJS-based server application offering a REST API via this network connection. An AngularJS client application on the host system can then connect to the server application to obtain and store data via the server's REST API. This data is read and written by the server application to the virtual SD Card connected to the SDHCI. For this, one SD card partition is passed to the VM through Xen's block device virtualization mechanism.
More information ...

UB02.4

FPGA-DSP: A PROTOTYPE FOR HIGH QUALITY DIGITAL AUDIO SIGNAL PROCESSING BASED ON AN FPGA
Authors:
Bernhard Riess and Christian Epe, University of Applied Sciences Düsseldorf, DE
Abstract
Our demonstrator presents a prototype of a new digital audio signal processing system which is based on an FPGA. It achieves a performance that up to now has been preserved to costly high-end solutions. Main components of the system are an analog/digital converter, an FPGA to perform the digital signal processing tasks, and a digital/analog converter implemented on a printed circuit board. To demonstrate the quality of the audio signal processing, infinite impulse response, finite impulse response filters and a delay effect were realized in VHDL. More advanced signal processing systems can easily be implemented due to the flexibility of the FPGA. Measured results were compared to state of the art audio signal processing systems with respect to size, performance and cost. Our prototype outperforms systems of the same price in quality, and outperforms systems of the same quality at a maximum of 20% of the price. Examples of the performance of our system can be heard in the demo.
More information ...

UB02.5

AT-SPEED DFT ARCHITECTURE FOR BUNDLED-DATA CIRCUITS
Authors:
Ricardo Aquino Guazzelli and Laurent Fesquet, Université Grenoble Alpes, FR
Abstract
At-speed testing for asynchronous circuits is still an open concern in the literature. Due to its timing constraints between control and data paths, Design for Testability (DfT) methodologies must test both control and data paths at the same time in order to guarantee the circuit correctness. As Process Voltage Temperature (PVT) variations significantly impact circuit design in newer CMOS technologies and low-power techniques such as voltage scaling, the timing constraints between control and data paths must be tested after fabrication not only under nominal conditions but through a range of operating conditions. This work explores an at-speed testing approach for bundled data circuits, targetting the micropipeline template. The main target of this test approach focuses on whether the sized delay lines in control paths respect the local timing assumptions of the data paths.
More information ...

UB02.6

INTACT: A 96-CORE PROCESSOR WITH 6 CHIPLETS 3D-STACKED ON AN ACTIVE INTERPOSER AND A 16-CORE PROTOTYPE RUNNING GRAPHICAL OPERATING SYSTEM
Authors:
Eric Guthmuller¹, Pascal Vivet¹, César Fuguet¹, Yvain Thonnart¹, Gaël Pillonnet² and Fabien Clermidy¹
¹Université Grenoble Alpes / CEA List, FR; ²Université Grenoble Alpes / CEA-Leti, FR
Abstract
We built a demonstrator for our 96-cores cache coherent 3D processor and a first prototype featuring 16 cores. The demonstrator consists in our 16-cores processor running commodity operating systems such as Linux and NetBSD on a PC-like motherboard with user-friendly devices such as a HDMI display, keyboard and mouse. A graphical desktop is displayed, and the user will interact with it through the keyboard and mouse. The demonstrator is able to run parallel applications to benchmark its performance in terms of scalability. The main innovation of our processor is its scalable cache coherent architecture based on distributed L2-caches and adaptive L3-caches. Additionally, the energy consumption is also measured and displayed by reading dynamically from the monitors of power-supply devices. Finally we will also show open packages of the 3D processor featuring 6 16-core chiplets (28 nm FDSOI) on an active interposer (65 nm) embedding Network-on-Chips, power management and IO controllers.
More information ...

UB02.7

GENERATING ASYNCHRONOUS CIRCUITS FROM CATAPULT
Authors:
Yoan Decoudu¹, Jean Simatic², Katell Morin-Allory³ and Laurent Fesquet³
¹University Grenoble Alpes, FR; ²HawAI.Tech, FR; ³Université Grenoble Alpes, FR
Abstract
In order to spread asynchronous circuit design to a large community of designers, High-Level Synthesis (HLS) is probably a good choice because it requires limited design technical skills. HLS usually provides an RTL description, which includes a data-path and a control-path. The desynchronization process is only applied to the control-path, which is a Finite State Machine (FSM). This method is sufficient to make asynchronous the circuit. Indeed, data are processed step by step in the pipeline stages, thanks to the desynchronized FSM. Thus, the data-path computation time is no longer related to the clock period but rather to the average time for processing data into the pipeline. This tends to improve speed when the pipeline stages are not well-balanced. Moreover, our approach helps to quickly designing data-driven circuits while maintaining a reasonable cost, a similar area and a short time-to-market.
More information ...

UB02.8

WALLANCE: AN ALTERNATIVE TO BLOCKCHAIN FOR IOT
Authors:
Loic Dalmasso, Florent Bruguier, Pascal Benoit and Achraf Lamlih, Université de Montpellier, FR
Abstract
Since the expansion of the Internet of Things (IoT), connected devices became smart and autonomous. Their exponentially increasing number and their use in many application domains result in a huge potential of cybersecurity threats. Taking into account the evolution of the IoT, security and interoperability are the main challenges, to ensure the reliability of the information. The blockchain technology provides a new approach to handle the trust in a decentralized network. However, current blockchain implementations cannot be used in IoT domain because of their huge need of computing power and storage utilization. This demonstrator presents a lightweight distributed ledger protocol dedicated to the IoT application, reducing the computing power and storage utilization, handling the scalability and ensuring the reliability of information.
More information ...

UB02.9

PRE-IMPACT FALL DETECTION ARCHITECTURE BASED ON NEUROMUSCULAR CONNECTIVITY STATISTICS
Authors:
Giovanni Mezzina, Sardar Mehboob Hussain and Daniela De Venuto, Politecnico di Bari, IT
Abstract
In this demonstration, we propose an innovative multi-sensor architecture operating in the field of pre-impact fall detection (PIFD). The proposed architecture jointly analyzes cortical and muscular involvement when unexpected slippages occur during steady walking. The EEG and EMG are acquired through wearable and wireless devices. The control unit consists of an STM32L4 microcontroller and a Simulink modeling. The C implements the EMG computation, while the cortical analysis and the final classification were entrusted to the Simulink model. The EMG computation block translates EMGs into binary signals, which are used both to enable cortical analyses and to extract a score to distinguish "standard" muscular behaviors from anomalous ones. The Simulink model evaluates the cortical responsiveness in five bands of interest and implements the logical-based network classifier. The system, tested on 6 healthy subjects, shows an accuracy of 96.21% and a detection time of ~371 ms.
More information ...

UB02.10

JOINTER: JOINING FLEXIBLE MONITORS WITH HETEROGENEOUS ARCHITECTURES
Authors:
Giacomo Valente¹, Tiziana Fanni², Carlo Sau³, Claudio Rubattu², Francesca Palumbo² and Luigi Pomante¹
¹Università degli Studi dell'Aquila, IT; ²Università degli Studi di Sassari, IT; ³Università degli Studi di Cagliari, IT
Abstract
As embedded systems grow more complex and shift toward heterogeneous architectures, understanding workload performance characteristics becomes increasingly difficult. In this regard, run-time monitoring systems can support on obtaining the desired visibility to characterize a system. This demo presents a framework that allows to develop complex heterogeneous architectures composed of programmable processors and dedicated accelerators on FPGA, together with customizable monitoring systems, keeping under control the introduced overhead. The whole development flow (and related prototypal EDA tools), that starts with the accelerators creation using a dataflow model, in parallel with the monitoring system customization using a library of elements, showing also the final joining, will be shown. Moreover, a comparison among different monitoring systems functionalities on different architectures developed on Zynq7000 SoC will be illustrated.
More information ...

15:00

End of session