UB10 Session 10

Label

Presentation Title
Authors

UB10.1

TAPASCO: THE OPEN-SOURCE TASK-PARALLEL SYSTEM COMPOSER FRAMEWORK
Authors:
Carsten Heinz, Lukas Sommer, Lukas Weber, Jaco Hofmann and Andreas Koch, TU Darmstadt, DE
Abstract
Field-programmable gate arrays (FPGAs) are an established platform for highly specialized accelerators, but in a heterogeneous setup, the accelerator still needs to be integrated into the overall system. The open-source TaPaSCo (Task-Parallel System Composer) framework was created to serve this purpose: the fast integration of FPGA-based accelerators into compute platforms or systems-on-chip (SoC) and their connection to relevant components on the FPGA board. TaPaSCo supports developers in all steps of the development process: from cores resulting from High-Level Synthesis or cores written in an HDL, a complete FPGA design can be created. TaPaSCo automatically connects all processing elements to the memory and host interfaces and generates a complete bitstream. The TaPaSCo Runtime API allows software to interface with accelerators and supports operations such as transferring data to the FPGA memory, passing values and controlling the execution of the accelerators.

UB10.2

RESCUED: A RESCUE DEMONSTRATOR FOR INTERDEPENDENT ASPECTS OF RELIABILITY, SECURITY AND QUALITY TOWARDS A COMPLETE EDA FLOW
Authors:
Nevin George¹, Guilherme Cardoso Medeiros², Junchao Chen³, Josie Esteban Rodriguez Condia⁴, Thomas Lange⁵, Aleksa Damljanovic⁴, Raphael Segabinazzi Ferreira¹, Aneesh Balakrishnan⁵, Xinhui Lai⁶, Shayesteh Masoumian⁷, Dmytro Petryk³, Troya Cagil Koylu², Felipe Augusto da Silva⁸, Ahmet Cagri Bagbaba⁸, Cemil Cem Gürsoy⁶, Said Hamdioui², Mottaqiallah Taouil², Milos Krstic³, Peter Langendoerfer³, Zoya Dyka³, Marcelo Brandalero¹, Michael Hübner¹, Jörg Nolte¹, Heinrich Theodor Vierhaus¹, Matteo Sonza Reorda⁴, Giovanni Squillero⁴, Luca Sterpone⁴, Jaan Raik⁶, Dan Alexandrescu⁵, Maximilien Glorieux⁵, Georgios Selimis⁷, Geert-Jan Schrijen⁷, Anton Klotz⁸, Christian Sauer⁸ and Maksim Jenihhin⁶
¹Brandenburg University of Technology Cottbus-Senftenberg, DE; ²TU Delft, NL; ³Leibniz-Institut für innovative Mikroelektronik, DE; ⁴Politecnico di Torino, IT; ⁵IROC Technologies, FR; ⁶Tallinn University of Technology, EE; ⁷Intrinsic ID, NL; ⁸Cadence Design Systems GmbH, DE
Abstract
The demonstrator highlights the various interdependent aspects of Reliability, Security and Quality in nanoelectronic system design within an EDA toolset and a processor architecture setup. The need for attention to these three aspects of nanoelectronic systems has become ever more pronounced with the extreme miniaturization of technologies. Further, such systems have exploded in number with IoT devices, heavy and analogous interaction with the external physical world, complex safety-critical applications, and artificial intelligence applications. RESCUE targets these aspects in the form of Reliability (functional safety, ageing, soft errors), Security (tamper-resistance, PUF technology, intelligent security) and Quality (novel fault models, functional test, FMEA/FMECA, verification/debug), spanning the entire hardware-software system stack. The demonstrator is brought together by a group of PhD students under the banner of the H2020-MSCA-ITN RESCUE European Union project.

UB10.3

RETINE: A PROGRAMMABLE 3D STACKED VISION CHIP ENABLING LOW LATENCY IMAGE ANALYSIS
Authors:
Stéphane Chevobbe¹, Maria Lepecq¹ and Laurent Millet²
¹CEA LIST, FR; ²CEA-Leti, FR
Abstract
We have developed and fabricated a 3D stacked imager called RETINE, composed of two layers and based on the replication of a programmable 3D tile in a matrix, providing a highly parallel programmable architecture. Each tile is composed of a 16x16 BSI binned pixel array with associated readout and 16 column ADCs on the first layer, coupled to an efficient SIMD processor of 16 PEs on the second layer. The prototype of RETINE achieves high video rates, from 5500 fps in binned mode to 340 fps in full resolution mode. It operates at 80 MHz with 720 mW power consumption, leading to a power efficiency of 85 GOPS/W. To highlight the capabilities of the RETINE chip, we have developed a demonstration platform with an electronic board embedding a RETINE chip that films rotating disks. Three scenarios are available: high-speed image capture, slow motion, and composed image capture with parallel processing during acquisition.

UB10.4

DL PUF ENAU: DEEP LEARNING BASED PHYSICALLY UNCLONABLE FUNCTION ENROLLMENT AND AUTHENTICATION
Authors:
Amir Alipour¹, David Hely², Vincent Beroulle² and Giorgio Di Natale³
¹Grenoble INP / LCIS, FR; ²Grenoble INP, FR; ³CNRS / Grenoble INP / TIMA, FR
Abstract
Physically Unclonable Functions (PUFs) have recently been proposed as a potential solution to improve the security of authentication and encryption processes in cyber-physical systems. Research on PUFs is actively growing because they promise to be secure, easily implementable and expandable, while using considerably less energy. In common use, the low-level hardware variation of each device is captured per unit at enrollment in a format called a Challenge-Response Pair (CRP), recaptured after the device is deployed, and compared with the original for authentication. These enrollment and comparison functions can vary, and become more data-demanding for applications that require robustness and resilience to noise. In this demonstration, our aim is to show the potential of using Deep Learning for enrollment and authentication of PUF CRPs. Most importantly, we will show how this method can save time and storage compared to classical methods.
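
As a point of reference for the classical enrollment and comparison flow described above, the following Python sketch stores a table of CRPs at enrollment and authenticates by comparing fresh responses against it. It is not the authors' deep-learning method. The `SimulatedPUF` class, the noise level, the 64-bit response length and the 0.2 distance threshold are all assumptions made for illustration.

```python
import random


def hamming_fraction(a, b):
    """Fraction of bit positions where two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


class SimulatedPUF:
    """Toy stand-in for a hardware PUF (hypothetical, for illustration only)."""

    def __init__(self, seed, response_bits=64, noise=0.05):
        self._seed = seed
        self._bits = response_bits
        self._noise = noise
        self._noise_rng = random.Random(seed * 7919 + 1)

    def respond(self, challenge):
        # Device-specific hardware variation is modeled by a seeded generator.
        ideal_rng = random.Random(f"{self._seed}:{challenge}")
        ideal = [ideal_rng.getrandbits(1) for _ in range(self._bits)]
        # Every read flips a few bits to mimic measurement noise.
        return [bit ^ (self._noise_rng.random() < self._noise) for bit in ideal]


def enroll(puf, challenges):
    """Enrollment: record one reference response per challenge (the CRP table)."""
    return {c: puf.respond(c) for c in challenges}


def authenticate(puf, crp_table, threshold=0.2):
    """Authentication: re-read each enrolled challenge and compare with the reference."""
    distances = [hamming_fraction(puf.respond(c), ref) for c, ref in crp_table.items()]
    return sum(distances) / len(distances) <= threshold


challenges = list(range(32))
genuine = SimulatedPUF(seed=1)
table = enroll(genuine, challenges)
print("genuine device accepted:", authenticate(genuine, table))
print("other device accepted:", authenticate(SimulatedPUF(seed=2), table))
```

In this baseline the stored table grows linearly with the number of enrolled challenges, which is the kind of storage cost the abstract says the demonstrated method aims to reduce.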

UB10.5

LAGARTO: FIRST SILICON RISC-V ACADEMIC PROCESSOR DEVELOPED IN SPAIN
Authors:
Guillem Cabo Pitarch¹, Cristobal Ramirez Lazo¹, Julian Pavon Rivera¹, Vatistas Kostalabros¹, Carlos Rojas Morales¹, Miquel Moreto¹, Jaume Abella¹, Francisco J. Cazorla¹, Adrian Cristal¹, Roger Figueras¹, Alberto Gonzalez¹, Carles Hernandez¹, Cesar Hernandez², Neiel Leyva², Joan Marimon¹, Ricardo Martinez³, Jonnatan Mendoza¹, Francesc Moll⁴, Marco Antonio Ramirez², Carlos Rojas¹, Antonio Rubio⁴, Abraham Ruiz¹, Nehir Sonmez¹, Lluis Teres³, Osman Unsal⁵, Mateo Valero¹, Ivan Vargas¹ and Luis Villa²
¹BSC / UPC, ES; ²CIC-IPN, MX; ³IMB-CNM (CSIC), ES; ⁴UPC, ES; ⁵BSC, ES
Abstract
Open hardware is a possibility that has emerged in recent years and has the potential to be as disruptive as Linux, the open-source software paradigm, once was. If Linux managed to lessen users' dependence on large companies providing software and software applications, it is envisioned that hardware based on open-source ISAs can do the same in its own field. Four research institutions were involved in the Lagarto tapeout: Centro de Investigación en Computación of the Mexican IPN, Centro Nacional de Microelectrónica of the CSIC, Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC). As a result, many bachelor's, master's and PhD students had the chance to gain real-world experience with ASIC design and to deliver a functional SoC. At the booth, you will find a live demo of the first ASIC, along with prototypes of the next versions of the SoC and core running on FPGA.

UB10.6

SRSN: SECURE RECONFIGURABLE TEST NETWORK
Authors:
Vincent Reynaud¹, Emanuele Valea², Paolo Maistri¹, Regis Leveugle¹, Marie-Lise Flottes², Sophie Dupuis², Bruno Rouzeyre² and Giorgio Di Natale¹
¹TIMA Laboratory, FR; ²LIRMM, FR
Abstract
The critical importance of testability for electronic devices led to the development of IEEE test standards. These methods, if not protected, offer a security backdoor to attackers. This demonstrator illustrates a state-of-the-art solution that prevents unauthorized usage of the test infrastructure based on the IEEE 1687 standard and implemented on an FPGA target.

UB10.7

DEEPSENSE-FPGA: FPGA ACCELERATION OF A MULTIMODAL NEURAL NETWORK
Authors:
Mehdi Trabelsi Ajili and Yuko Hara-Azumi, Tokyo Institute of Technology, JP
Abstract
Currently, the Internet of Things and Deep Learning (DL) are merging into one domain and creating outstanding technologies for various classification tasks. Such technologies require complex DL networks that mainly target powerful platforms with rich computing resources, such as servers. Therefore, for resource-constrained embedded systems, new challenges of size, performance and power consumption have to be considered, particularly when edge devices handle multimodal data, i.e., different types of real-time sensing data (voice, video, text, etc.). Our ongoing project is focused on DeepSense, a multimodal DL framework combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to process time-series data, such as accelerometer and gyroscope readings, to detect human activity. We aim to accelerate DeepSense on an FPGA (Xilinx Zynq) in a hardware-software co-design manner. Our demo will show the latest achievements through latency and power consumption evaluations.

UB10.8

GENERATING ASYNCHRONOUS CIRCUITS FROM CATAPULT
Authors:
Yoan Decoudu¹, Jean Simatic², Katell Morin-Allory³ and Laurent Fesquet³
¹University Grenoble Alpes, FR; ²HawAI.Tech, FR; ³Université Grenoble Alpes, FR
Abstract
In order to spread asynchronous circuit design to a large community of designers, High-Level Synthesis (HLS) is probably a good choice because it requires limited technical design skills. HLS usually provides an RTL description, which includes a data-path and a control-path. The desynchronization process is only applied to the control-path, which is a Finite State Machine (FSM). This method is sufficient to make the circuit asynchronous. Indeed, data are processed step by step in the pipeline stages, thanks to the desynchronized FSM. Thus, the data-path computation time is no longer related to the clock period but rather to the average time for processing data in the pipeline. This tends to improve speed when the pipeline stages are not well-balanced. Moreover, our approach helps to quickly design data-driven circuits while maintaining a reasonable cost, a similar area and a short time-to-market.
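
The speed argument for unbalanced pipelines can be illustrated with a deliberately simplified timing model. The Python sketch below assumes that a clocked pipeline spends one clock period per stage, with the period set by the slowest stage, while an idealized desynchronized pipeline spends only each stage's own delay. Handshake overhead is ignored, and the stage delays are invented numbers, not measurements from the demo.

```python
def synchronous_latency(stage_delays_ns):
    """Clocked pipeline: every stage costs one clock period, set by the slowest stage."""
    return len(stage_delays_ns) * max(stage_delays_ns)


def desynchronized_latency(stage_delays_ns):
    """Idealized desynchronized pipeline: each stage costs only its own delay."""
    return sum(stage_delays_ns)


# Hypothetical per-stage delays in nanoseconds.
balanced = [5.0, 5.0, 5.0, 5.0]
unbalanced = [2.0, 9.0, 3.0, 4.0]

for name, delays in (("balanced", balanced), ("unbalanced", unbalanced)):
    sync = synchronous_latency(delays)
    desync = desynchronized_latency(delays)
    print(f"{name}: clocked {sync} ns, desynchronized {desync} ns")
```

Under these assumptions the balanced pipeline gains nothing (20 ns either way), while the unbalanced one drops from 36 ns to 18 ns, because its average stage delay is half of its worst-case stage delay.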

UB10.9

PA-HLS: HIGH-LEVEL ANNOTATION OF ROUTING CONGESTION FOR XILINX VIVADO HLS DESIGNS
Authors:
Osama Bin Tariq¹, Junnan Shan¹, Luciano Lavagno¹, Georgios Floros², Mihai Teodor Lazarescu¹, Christos Sotiriou² and Mario Roberto Casu¹
¹Politecnico di Torino, IT; ²University of Thessaly, GR
Abstract
We will demo a novel high-level back-annotation flow that reports routing congestion issues at the C++ source level by analyzing reports from FPGA physical design (Xilinx Vivado) and internal debugging files of the Vivado HLS tool. The flow annotates the C++ source code, identifying likely causes of congestion, e.g., on-chip memories or DSP units. These shared resources often cause routing problems on FPGAs because they cannot be duplicated by physical design. We demonstrate on realistic large designs how the information provided by our flow can be used both to identify congestion issues at the C++ source level and to solve them using HLS directives. The main demo steps are: (1) extraction of the source-level debugging information from the Vivado HLS database; (2) generation of a list of net names involved in congestion areas, and of their relative significance, from the Vivado post-global-routing database; (3) visualization of the C++ code lines that contribute most to congestion.
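
The last two demo steps amount to joining a list of congested nets with a net-to-source mapping and ranking source lines by accumulated significance. The Python sketch below shows one way this join could look. The data structures, net names, file name and weights are hypothetical and do not reflect the actual Vivado or Vivado HLS database formats.

```python
from collections import defaultdict


def rank_source_lines(net_to_lines, congested_nets):
    """Add each congested net's significance to the source lines it maps to.

    Returns (source_location, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for net, significance in congested_nets.items():
        for location in net_to_lines.get(net, ()):
            scores[location] += significance
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result of step 1: which source lines each net originates from.
net_to_lines = {
    "buf_a_addr": [("kernel.cpp", 42)],
    "buf_a_data": [("kernel.cpp", 42), ("kernel.cpp", 57)],
    "mul_out": [("kernel.cpp", 63)],
}
# Hypothetical result of step 2: congested nets and their relative significance.
congested_nets = {"buf_a_addr": 0.5, "buf_a_data": 0.25, "mul_out": 0.125}

# Step 3: list the source lines that contribute most to congestion.
for (path, line), score in rank_source_lines(net_to_lines, congested_nets):
    print(f"{path}:{line} score={score}")
```

Here line 42 ranks first because two congested nets map to it, which mirrors the abstract's point that shared resources such as on-chip memories tend to concentrate congestion.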

UB10.10

ATECES: AUTOMATED TESTING THE ENERGY CONSUMPTION OF EMBEDDED SYSTEMS
Author:
Eduard Enoiu, Mälardalen University, SE
Abstract
The demonstrator focuses on automatically generating test suites: selecting test cases using random test generation and mutation testing is a solution for improving the efficiency and effectiveness of testing. Specifically, we generate and select test cases based on the concept of energy-aware mutants: small syntactic modifications in the system architecture, intended to mimic real energy faults. Test cases that can distinguish a certain behavior from its mutations are sensitive to changes, and hence considered to be good at detecting faults. We applied this method to a brake-by-wire system, and our results suggest that an approach that selects test cases showing diverse energy consumption can increase the fault detection ability. Results of this kind should motivate both academia and industry to investigate the use of automatic test generation for energy consumption.
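
The selection idea can be sketched in a few lines of Python: generate random inputs, run each against the original model and its mutants, and keep only the inputs that expose a mutant not exposed before. The energy model, the three mutants, the tolerance and the greedy selection strategy below are all assumptions made for illustration, not the tool or system used in the demo.

```python
import random


def original(pedal):
    """Toy energy model (arbitrary units) for a pedal position in [0, 100]."""
    return 2 * pedal if pedal > 10 else 0


# Energy-aware mutants: small syntactic changes to the model above (hypothetical).
mutants = {
    "threshold_10_to_50": lambda pedal: 2 * pedal if pedal > 50 else 0,
    "gain_2_to_3": lambda pedal: 3 * pedal if pedal > 10 else 0,
    "idle_0_to_5": lambda pedal: 2 * pedal if pedal > 10 else 5,
}


def killed_by(test_input, tolerance=1):
    """Names of the mutants whose energy differs from the original on this input."""
    expected = original(test_input)
    return {name for name, mutant in mutants.items()
            if abs(mutant(test_input) - expected) > tolerance}


def select_tests(candidates):
    """Greedy selection: keep a test only if it kills a mutant not killed so far."""
    selected, killed = [], set()
    for test_input in candidates:
        new_kills = killed_by(test_input) - killed
        if new_kills:
            selected.append(test_input)
            killed |= new_kills
    return selected, killed


rng = random.Random(0)
candidates = [rng.randint(0, 100) for _ in range(20)]  # random test generation
suite, killed = select_tests(candidates)
print(f"kept {len(suite)} of {len(candidates)} tests, "
      f"killed {len(killed)}/{len(mutants)} mutants")
```

In this toy model no single input kills all three mutants: the idle mutant is only visible at low pedal positions and the gain mutant only at higher ones, so a suite needs inputs from different energy regimes. That is a small-scale version of the abstract's observation about test cases with diverse energy consumption.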

14:30

End of session