UB01 Session 1

Label	Presentation Title Authors
UB01.1	FUZZING EMBEDDED BINARIES LEVERAGING SYSTEMC-BASED VIRTUAL PROTOTYPES Authors: Vladimir Herdt¹, Daniel Grosse² and Rolf Drechsler² ¹DFKI, DE; ²University of Bremen / DFKI GmbH, DE Abstract Verification of embedded Software (SW) binaries is very important. Mainly, simulation-based methods are employed that execute (randomly) generated test-cases on Virtual Prototypes (VPs). However, to enable a comprehensive VP-based verification, sophisticated test-case generation techniques need to be integrated. Our demonstrator combines state-of-the-art fuzzing techniques with SystemC-based VPs to enable a fast and accurate verification of embedded SW binaries. The fuzzing process is guided by the coverage of the embedded SW as well as the SystemC-based peripherals of the VP. The effectiveness of our approach is demonstrated by our experiments, using RISC-V SW binaries as an example. More information ...
UB01.2	SKELETOR: AN OPEN SOURCE EDA TOOL FLOW FROM HIERARCHY SPECIFICATION TO HDL DEVELOPMENT Authors: Ivan Rodriguez, Guillem Cabo, Javier Barrera, Jeremy Giesen, Alvaro Jover and Leonidas Kosmidis, BSC / UPC, ES Abstract Large hardware design projects have high overhead for project bootstrapping, requiring significant effort for translating hardware specifications to hardware design language (HDL) files and setting up their corresponding development and verification infrastructure. Skeletor (https://github.com/jaquerinte/Skeletor) is an open source EDA tool developed as a student project at UPC/BSC, which simplifies this process, by increasing developer's productivity and reducing typing errors, while at the same time lowers the bar for entry in hardware development. Skeletor uses a C/verilog-like language for the specification of the modules in a hardware project hierarchy and their connections, which is used to generate automatically the require skeleton of source files, their development and verification testbenches and simulation scripts. Integration with KiCad schematics and support for syntax highlighting in code editors simplifies further its use. This demo is linked with workshop W05. More information ...
UB01.3	LAGARTO: FIRST SILICON RISC-V ACADEMIC PROCESSOR DEVELOPED IN SPAIN Authors: Guillem Cabo Pitarch¹, Cristobal Ramirez Lazo¹, Julian Pavon Rivera¹, Vatistas Kostalabros¹, Carlos Rojas Morales¹, Miquel Moreto¹, Jaume Abella¹, Francisco J. Cazorla¹, Adrian Cristal¹, Roger Figueras¹, Alberto Gonzalez¹, Carles Hernandez¹, Cesar Hernandez², Neiel Leyva², Joan Marimon¹, Ricardo Martinez³, Jonnatan Mendoza¹, Francesc Moll⁴, Marco Antonio Ramirez², Carlos Rojas¹, Antonio Rubio⁴, Abraham Ruiz¹, Nehir Sonmez¹, Lluis Teres³, Osman Unsal⁵, Mateo Valero¹, Ivan Vargas¹ and Luis Villa² ¹BSC / UPC, ES; ²CIC-IPN, MX; ³IMB-CNM (CSIC), ES; ⁴UPC, ES; ⁵BSC, ES Abstract Open hardware is a possibility that has emerged in recent years and has the potential to be as disruptive as Linux was once, an open source software paradigm. If Linux managed to lessen the dependence of users in large companies providing software and software applications, it is envisioned that hardware based on ISAs open source can do the same in their own field. In the Lagarto tapeout four research institutions were involved: Centro de Investigación en Computación of the Mexican IPN, Centro Nacional de Microelectrónica of the CSIC, Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC). As a result, many bachelor, master and PhD students had the chance to achieve real-world experience with ASIC design and achieve a functional SoC. In the booth, you will find a live demo of the first ASIC and prototypes running on FPGA of the next versions of the SoC and core. More information ...
UB01.4	PARALLEL ALGORITHM FOR CNN INFERENCE AND ITS AUTOMATIC SYNTHESIS Authors: Takashi Matsumoto, Yukio Miyasaka, Xinpei Zhang and Masahiro Fujita, University of Tokyo, JP Abstract Recently, Convolutional Neural Network (CNN) has surpassed conventional methods in the field of image processing. This demonstration shows a new algorithm to calculate CNN inference using processing elements arranged and connected based on the topology of the convolution. They are connected in mesh and calculate CNN inference in a systolic way. The algorithm performs the convolution of all elements with the same output feature in parallel. We demonstrate a method to automatically synthesize an algorithm, which simultaneously performs the convolution and the communication of pixels for the computation of the next layer. We show with several sizes of input layers, kernels, and strides and confirmed that the correct algorithms were synthesized. The synthesis method is extended to the sparse kernel. The synthesized algorithm requires fewer cycles than the original algorithm. There were the more chances to reduce the number of cycles with the sparser kernel. More information ...
UB01.5	FASTHERMSIM: FAST AND ACCURATE THERMAL SIMULATIONS FROM CHIPLETS TO SYSTEM Authors: Yu-Min Lee, Chi-Wen Pan, Li-Rui Ho and Hong-Wen Chiou, National Chiao Tung University, TW Abstract Recently, owing to the scaling down of technology and 2.5D/3D integration, power densities and temperatures of chips have been increasing significantly. Though commercial computational fluid dynamics tools can provide accurate thermal maps, they may lead to inefficiency in thermal-aware design with huge runtime. Thus, we develop the chip/package/system-level thermal analyzer, called FasThermSim, which can assist you to improve your design under thermal constraints in pre/post-silicon stages. In FasThermSim, we consider three heat transfer modes, conduction, convection, and thermal radiation. We convert them to temperature-independent terms by linearization methods and build a compact thermal model (CTM). By applying numerical methods to the CTM, the steady-state and transient thermal profiles can be solved efficiently without loss of accuracy. Finally, an easy-to-use thermal analysis tool is implemented for your design, which is flexible and compatible, with the graphic user interface. More information ...
UB01.6	INTACT: A 96-CORE PROCESSOR WITH 6 CHIPLETS 3D-STACKED ON AN ACTIVE INTERPOSER AND A 16-CORE PROTOTYPE RUNNING GRAPHICAL OPERATING SYSTEM Authors: Eric Guthmuller¹, Pascal Vivet¹, César Fuguet¹, Yvain Thonnart¹, Gaël Pillonnet² and Fabien Clermidy¹ ¹Université Grenoble Alpes / CEA List, FR; ²Université Grenoble Alpes / CEA-Leti, FR Abstract We built a demonstrator for our 96-cores cache coherent 3D processor and a first prototype featuring 16 cores. The demonstrator consists in our 16-cores processor running commodity operating systems such as Linux and NetBSD on a PC-like motherboard with user-friendly devices such as a HDMI display, keyboard and mouse. A graphical desktop is displayed, and the user will interact with it through the keyboard and mouse. The demonstrator is able to run parallel applications to benchmark its performance in terms of scalability. The main innovation of our processor is its scalable cache coherent architecture based on distributed L2-caches and adaptive L3-caches. Additionally, the energy consumption is also measured and displayed by reading dynamically from the monitors of power-supply devices. Finally we will also show open packages of the 3D processor featuring 6 16-core chiplets (28 nm FDSOI) on an active interposer (65 nm) embedding Network-on-Chips, power management and IO controllers. More information ...
UB01.7	EEC: ENERGY EFFICIENT COMPUTING VIA DYNAMIC VOLTAGE SCALING AND IN-NETWORK OPTICAL PROCESSING Authors: Ryosuke Matsuo¹, Jun Shiomi¹, Yutaka Masuda² and Tohru Ishihara² ¹Kyoto University, JP; ²Nagoya University, JP Abstract This poster demonstration will show results of our two research projects. The first one is on a project of energy efficient computing. In this project we developed a power management algorithm which keeps the target processor always running at the most energy efficient operating point by appropriately tuning the supply voltage and threshold voltage under a specific performance constraint. This algorithm is applicable to wide variety of processor systems including high-end processors and low-end embedded processors. We will show the results obtained with actual RISC processors designed using a 65nm technology. The second one is on a project of in-network optical computing. We show optical functional units such as parallel multipliers and optical neural networks. Several key techniques for reducing the power consumption of optical circuits will be also presented. Finally, we will show the results of optical circuit simulation, which demonstrate the light speed operation of the circuits. More information ...
UB01.8	CATANIS: CAD TOOL FOR AUTOMATIC NETWORK SYNTHESIS Authors: Davide Quaglia, Enrico Fraccaroli, Filippo Nevi and Sohail Mushtaq, Università di Verona, IT Abstract The proliferation of communication technologies for embedded systems opened the way for new applications, e.g., Smart Cities and Industry 4.0. In such applications hundreds or thousands of smart devices interact together through different types of channels and protocols. This increasing communication complexity forces computer-aided design methodologies to scale up from embedded systems in isolation to the global inter-connected system. Network Synthesis is the methodology to optimally allocate functionality onto network nodes and define the communication infrastructure among them. This booth will demonstrate the functionality of a graphic tool for automatic network synthesis developed by the Computer Science Department of University of Verona. It allows to graphically specify the communication requirements of a smart space (e.g., its map can be considered) in terms of sensing and computation tasks together with a library of node types and communication protocols to be used. More information ...
UB01.9	PRE-IMPACT FALL DETECTION ARCHITECTURE BASED ON NEUROMUSCULAR CONNECTIVITY STATISTICS Authors: Giovanni Mezzina, Sardar Mehboob Hussain and Daniela De Venuto, Politecnico di Bari, IT Abstract In this demonstration, we propose an innovative multi-sensor architecture operating in the field of pre-impact fall detection (PIFD). The proposed architecture jointly analyzes cortical and muscular involvement when unexpected slippages occur during steady walking. The EEG and EMG are acquired through wearable and wireless devices. The control unit consists of an STM32L4 microcontroller and a Simulink modeling. The C implements the EMG computation, while the cortical analysis and the final classification were entrusted to the Simulink model. The EMG computation block translates EMGs into binary signals, which are used both to enable cortical analyses and to extract a score to distinguish "standard" muscular behaviors from anomalous ones. The Simulink model evaluates the cortical responsiveness in five bands of interest and implements the logical-based network classifier. The system, tested on 6 healthy subjects, shows an accuracy of 96.21% and a detection time of ~371 ms. More information ...
UB01.10	JOINTER: JOINING FLEXIBLE MONITORS WITH HETEROGENEOUS ARCHITECTURES Authors: Giacomo Valente¹, Tiziana Fanni², Carlo Sau³, Claudio Rubattu², Francesca Palumbo² and Luigi Pomante¹ ¹Università degli Studi dell'Aquila, IT; ²Università degli Studi di Sassari, IT; ³Università degli Studi di Cagliari, IT Abstract As embedded systems grow more complex and shift toward heterogeneous architectures, understanding workload performance characteristics becomes increasingly difficult. In this regard, run-time monitoring systems can support on obtaining the desired visibility to characterize a system. This demo presents a framework that allows to develop complex heterogeneous architectures composed of programmable processors and dedicated accelerators on FPGA, together with customizable monitoring systems, keeping under control the introduced overhead. The whole development flow (and related prototypal EDA tools), that starts with the accelerators creation using a dataflow model, in parallel with the monitoring system customization using a library of elements, showing also the final joining, will be shown. Moreover, a comparison among different monitoring systems functionalities on different architectures developed on Zynq7000 SoC will be illustrated. More information ...
12:30	End of session

Label

Presentation Title
Authors

UB01.1

FUZZING EMBEDDED BINARIES LEVERAGING SYSTEMC-BASED VIRTUAL PROTOTYPES
Authors:
Vladimir Herdt¹, Daniel Grosse² and Rolf Drechsler²
¹DFKI, DE; ²University of Bremen / DFKI GmbH, DE
Abstract
Verification of embedded Software (SW) binaries is very important. Mainly, simulation-based methods are employed that execute (randomly) generated test-cases on Virtual Prototypes (VPs). However, to enable a comprehensive VP-based verification, sophisticated test-case generation techniques need to be integrated. Our demonstrator combines state-of-the-art fuzzing techniques with SystemC-based VPs to enable a fast and accurate verification of embedded SW binaries. The fuzzing process is guided by the coverage of the embedded SW as well as the SystemC-based peripherals of the VP. The effectiveness of our approach is demonstrated by our experiments, using RISC-V SW binaries as an example.
More information ...

UB01.2

SKELETOR: AN OPEN SOURCE EDA TOOL FLOW FROM HIERARCHY SPECIFICATION TO HDL DEVELOPMENT
Authors:
Ivan Rodriguez, Guillem Cabo, Javier Barrera, Jeremy Giesen, Alvaro Jover and Leonidas Kosmidis, BSC / UPC, ES
Abstract
Large hardware design projects have high overhead for project bootstrapping, requiring significant effort for translating hardware specifications to hardware design language (HDL) files and setting up their corresponding development and verification infrastructure. Skeletor (https://github.com/jaquerinte/Skeletor) is an open source EDA tool developed as a student project at UPC/BSC, which simplifies this process, by increasing developer's productivity and reducing typing errors, while at the same time lowers the bar for entry in hardware development. Skeletor uses a C/verilog-like language for the specification of the modules in a hardware project hierarchy and their connections, which is used to generate automatically the require skeleton of source files, their development and verification testbenches and simulation scripts. Integration with KiCad schematics and support for syntax highlighting in code editors simplifies further its use. This demo is linked with workshop W05.
More information ...

UB01.3

LAGARTO: FIRST SILICON RISC-V ACADEMIC PROCESSOR DEVELOPED IN SPAIN
Authors:
Guillem Cabo Pitarch¹, Cristobal Ramirez Lazo¹, Julian Pavon Rivera¹, Vatistas Kostalabros¹, Carlos Rojas Morales¹, Miquel Moreto¹, Jaume Abella¹, Francisco J. Cazorla¹, Adrian Cristal¹, Roger Figueras¹, Alberto Gonzalez¹, Carles Hernandez¹, Cesar Hernandez², Neiel Leyva², Joan Marimon¹, Ricardo Martinez³, Jonnatan Mendoza¹, Francesc Moll⁴, Marco Antonio Ramirez², Carlos Rojas¹, Antonio Rubio⁴, Abraham Ruiz¹, Nehir Sonmez¹, Lluis Teres³, Osman Unsal⁵, Mateo Valero¹, Ivan Vargas¹ and Luis Villa²
¹BSC / UPC, ES; ²CIC-IPN, MX; ³IMB-CNM (CSIC), ES; ⁴UPC, ES; ⁵BSC, ES
Abstract
Open hardware is a possibility that has emerged in recent years and has the potential to be as disruptive as Linux was once, an open source software paradigm. If Linux managed to lessen the dependence of users in large companies providing software and software applications, it is envisioned that hardware based on ISAs open source can do the same in their own field. In the Lagarto tapeout four research institutions were involved: Centro de Investigación en Computación of the Mexican IPN, Centro Nacional de Microelectrónica of the CSIC, Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC). As a result, many bachelor, master and PhD students had the chance to achieve real-world experience with ASIC design and achieve a functional SoC. In the booth, you will find a live demo of the first ASIC and prototypes running on FPGA of the next versions of the SoC and core.
More information ...

UB01.4

PARALLEL ALGORITHM FOR CNN INFERENCE AND ITS AUTOMATIC SYNTHESIS
Authors:
Takashi Matsumoto, Yukio Miyasaka, Xinpei Zhang and Masahiro Fujita, University of Tokyo, JP
Abstract
Recently, Convolutional Neural Network (CNN) has surpassed conventional methods in the field of image processing. This demonstration shows a new algorithm to calculate CNN inference using processing elements arranged and connected based on the topology of the convolution. They are connected in mesh and calculate CNN inference in a systolic way. The algorithm performs the convolution of all elements with the same output feature in parallel. We demonstrate a method to automatically synthesize an algorithm, which simultaneously performs the convolution and the communication of pixels for the computation of the next layer. We show with several sizes of input layers, kernels, and strides and confirmed that the correct algorithms were synthesized. The synthesis method is extended to the sparse kernel. The synthesized algorithm requires fewer cycles than the original algorithm. There were the more chances to reduce the number of cycles with the sparser kernel.
More information ...

UB01.5

FASTHERMSIM: FAST AND ACCURATE THERMAL SIMULATIONS FROM CHIPLETS TO SYSTEM
Authors:
Yu-Min Lee, Chi-Wen Pan, Li-Rui Ho and Hong-Wen Chiou, National Chiao Tung University, TW
Abstract
Recently, owing to the scaling down of technology and 2.5D/3D integration, power densities and temperatures of chips have been increasing significantly. Though commercial computational fluid dynamics tools can provide accurate thermal maps, they may lead to inefficiency in thermal-aware design with huge runtime. Thus, we develop the chip/package/system-level thermal analyzer, called FasThermSim, which can assist you to improve your design under thermal constraints in pre/post-silicon stages. In FasThermSim, we consider three heat transfer modes, conduction, convection, and thermal radiation. We convert them to temperature-independent terms by linearization methods and build a compact thermal model (CTM). By applying numerical methods to the CTM, the steady-state and transient thermal profiles can be solved efficiently without loss of accuracy. Finally, an easy-to-use thermal analysis tool is implemented for your design, which is flexible and compatible, with the graphic user interface.
More information ...

UB01.6

INTACT: A 96-CORE PROCESSOR WITH 6 CHIPLETS 3D-STACKED ON AN ACTIVE INTERPOSER AND A 16-CORE PROTOTYPE RUNNING GRAPHICAL OPERATING SYSTEM
Authors:
Eric Guthmuller¹, Pascal Vivet¹, César Fuguet¹, Yvain Thonnart¹, Gaël Pillonnet² and Fabien Clermidy¹
¹Université Grenoble Alpes / CEA List, FR; ²Université Grenoble Alpes / CEA-Leti, FR
Abstract
We built a demonstrator for our 96-cores cache coherent 3D processor and a first prototype featuring 16 cores. The demonstrator consists in our 16-cores processor running commodity operating systems such as Linux and NetBSD on a PC-like motherboard with user-friendly devices such as a HDMI display, keyboard and mouse. A graphical desktop is displayed, and the user will interact with it through the keyboard and mouse. The demonstrator is able to run parallel applications to benchmark its performance in terms of scalability. The main innovation of our processor is its scalable cache coherent architecture based on distributed L2-caches and adaptive L3-caches. Additionally, the energy consumption is also measured and displayed by reading dynamically from the monitors of power-supply devices. Finally we will also show open packages of the 3D processor featuring 6 16-core chiplets (28 nm FDSOI) on an active interposer (65 nm) embedding Network-on-Chips, power management and IO controllers.
More information ...

UB01.7

EEC: ENERGY EFFICIENT COMPUTING VIA DYNAMIC VOLTAGE SCALING AND IN-NETWORK OPTICAL PROCESSING
Authors:
Ryosuke Matsuo¹, Jun Shiomi¹, Yutaka Masuda² and Tohru Ishihara²
¹Kyoto University, JP; ²Nagoya University, JP
Abstract
This poster demonstration will show results of our two research projects. The first one is on a project of energy efficient computing. In this project we developed a power management algorithm which keeps the target processor always running at the most energy efficient operating point by appropriately tuning the supply voltage and threshold voltage under a specific performance constraint. This algorithm is applicable to wide variety of processor systems including high-end processors and low-end embedded processors. We will show the results obtained with actual RISC processors designed using a 65nm technology. The second one is on a project of in-network optical computing. We show optical functional units such as parallel multipliers and optical neural networks. Several key techniques for reducing the power consumption of optical circuits will be also presented. Finally, we will show the results of optical circuit simulation, which demonstrate the light speed operation of the circuits.
More information ...

UB01.8

CATANIS: CAD TOOL FOR AUTOMATIC NETWORK SYNTHESIS
Authors:
Davide Quaglia, Enrico Fraccaroli, Filippo Nevi and Sohail Mushtaq, Università di Verona, IT
Abstract
The proliferation of communication technologies for embedded systems opened the way for new applications, e.g., Smart Cities and Industry 4.0. In such applications hundreds or thousands of smart devices interact together through different types of channels and protocols. This increasing communication complexity forces computer-aided design methodologies to scale up from embedded systems in isolation to the global inter-connected system. Network Synthesis is the methodology to optimally allocate functionality onto network nodes and define the communication infrastructure among them. This booth will demonstrate the functionality of a graphic tool for automatic network synthesis developed by the Computer Science Department of University of Verona. It allows to graphically specify the communication requirements of a smart space (e.g., its map can be considered) in terms of sensing and computation tasks together with a library of node types and communication protocols to be used.
More information ...

UB01.9

PRE-IMPACT FALL DETECTION ARCHITECTURE BASED ON NEUROMUSCULAR CONNECTIVITY STATISTICS
Authors:
Giovanni Mezzina, Sardar Mehboob Hussain and Daniela De Venuto, Politecnico di Bari, IT
Abstract
In this demonstration, we propose an innovative multi-sensor architecture operating in the field of pre-impact fall detection (PIFD). The proposed architecture jointly analyzes cortical and muscular involvement when unexpected slippages occur during steady walking. The EEG and EMG are acquired through wearable and wireless devices. The control unit consists of an STM32L4 microcontroller and a Simulink modeling. The C implements the EMG computation, while the cortical analysis and the final classification were entrusted to the Simulink model. The EMG computation block translates EMGs into binary signals, which are used both to enable cortical analyses and to extract a score to distinguish "standard" muscular behaviors from anomalous ones. The Simulink model evaluates the cortical responsiveness in five bands of interest and implements the logical-based network classifier. The system, tested on 6 healthy subjects, shows an accuracy of 96.21% and a detection time of ~371 ms.
More information ...

UB01.10

JOINTER: JOINING FLEXIBLE MONITORS WITH HETEROGENEOUS ARCHITECTURES
Authors:
Giacomo Valente¹, Tiziana Fanni², Carlo Sau³, Claudio Rubattu², Francesca Palumbo² and Luigi Pomante¹
¹Università degli Studi dell'Aquila, IT; ²Università degli Studi di Sassari, IT; ³Università degli Studi di Cagliari, IT
Abstract
As embedded systems grow more complex and shift toward heterogeneous architectures, understanding workload performance characteristics becomes increasingly difficult. In this regard, run-time monitoring systems can support on obtaining the desired visibility to characterize a system. This demo presents a framework that allows to develop complex heterogeneous architectures composed of programmable processors and dedicated accelerators on FPGA, together with customizable monitoring systems, keeping under control the introduced overhead. The whole development flow (and related prototypal EDA tools), that starts with the accelerators creation using a dataflow model, in parallel with the monitoring system customization using a library of elements, showing also the final joining, will be shown. Moreover, a comparison among different monitoring systems functionalities on different architectures developed on Zynq7000 SoC will be illustrated.
More information ...

12:30

End of session