UB05 Session 5

Label	Presentation Title Authors
UB05.1	TAPASCO: THE OPEN-SOURCE TASK-PARALLEL SYSTEM COMPOSER FRAMEWORK Authors: Carsten Heinz, Lukas Sommer, Lukas Weber, Jaco Hofmann and Andreas Koch, TU Darmstadt, DE Abstract Field-programmable gate arrays (FPGA) are an established platform for highly specialized accelerators, but in a heterogeneous setup, the accelerator still needs to be integrated into the overall system. The open-source TaPaSCo (Task-Parallel System Composer) framework was created to serve this purpose: The fast integration of FPGA-based accelerators into compute platforms or systems-on-chip (SoC) and their connection to relevant components on the FPGA board. TaPaSCo can support developers in all steps of the development process: from cores resulting from High-Level Synthesis or cores written in an HDL, a complete FPGA-design can be created. TaPaSCo will automatically connect all processing elements to the memory- and host-interface and generate a complete bitstream. The TaPaSCo Runtime API allows to interface with accelerators from software and supports operations such as transferring data to the FPGA memory, passing values and controlling the execution of the accelerators. More information ...
UB05.2	ELSA: EIGENVALUE BASED HYBRID LINEAR SYSTEM ABSTRACTION: BEHAVIORAL MODELING OF TRANSISTOR-LEVEL CIRCUITS USING AUTOMATIC ABSTRACTION TO HYBRID AUTOMATA Authors: Ahmad Tarraf and Lars Hedrich, University of Frankfurt, DE Abstract Model abstraction of transistor-level circuits, while preserving an accurate behavior, is still an open problem. In this demo an approach is presented that automatically generates a hybrid automaton (HA) with linear states from an existing circuit netlist. The approach starts with a netlist at transistor level with full SPICE accuracy and ends at the system level description of the circuit in matlab or in Verilog-A. The resulting hybrid automaton exhibits linear behavior as well as the technology dependent nonlinear e.g. limiting behavior. The accuracy and speed-up of the Verilog-A generated models is evaluated based on several transistor level circuit abstractions of simple operational amplifiers up to a complex filters. Moreover, we verify the equivalence between the generated model and the original circuit. For the generated models in matlab syntax, a reachability analysis is performed using the reachability tool cora. More information ...
UB05.3	EUCLID-NIR GPU: AN ON-BOARD PROCESSING GPU-ACCELERATED SPACE CASE STUDY DEMONSTRATOR Authors: Ivan Rodriguez and Leonidas Kosmidis, BSC / UPC, ES Abstract Embedded Graphics Processing Units (GPUs) are very attractive candidates for on-board payload processing of future space systems, thanks to their high performance and low-power consumption. Although there is significant interest from both academia and industry, there is no open and publicly available case study showing their capabilities, yet. In this master thesis project, which was performed within the GPU4S (GPU for Space) ESA-funded project, we have parallelised and ported the Euclid NIR (Near Infrared) image processing algorithm used in the European Space Agency's (ESA) mission to be launched in 2022, to an automotive GPU platform, the NVIDIA Xavier. In the demo we will present in real-time its significantly higher performance achieved compared to the original sequential implementation. In addition, visitors will have the opportunity to examine the images on which the algorithm operates, as well as to inspect the algorithm parallelisation through profiling and code inspection. More information ...
UB05.4	BCFELEAM: BACKFLOW: BACKWARD EDGE CONTROL FLOW ENFORCEMENT FOR LOW END ARM REAL-TIME SYSTEMS Authors: Bresch Cyril¹, David Héy¹, Roman Lysecky² and Stephanie Chollet¹ ¹LCIS, FR; ²University of Arizona, US Abstract The C programming language is one of the most popular languages in embedded system programming. Indeed, C is efficient, lightweight and can easily meet high performance and deterministic real-time constraints. However, these assets come at a certain price. Indeed, C does not provide extra features for memory safety. As a result, attackers can easily exploit spatial memory vulnerabilities to hijack the execution flow of an application. The demonstration features a real-time connected infusion pump vulnerable to memory attacks. First, we showcase an exploit that remotely takes control of the pump. Then, we demonstrate the effectiveness of BackFlow, an LLVM-based compiler extension that enforces control-flow integrity in low-end ARM embedded systems. More information ...
UB05.5	UWB ACKATCK: HIJACKING DEVICES IN UWB INDOOR POSITIONING SYSTEMS Authors: Baptiste Pestourie, Vincent Beroulle and Nicolas Fourty, Université Grenoble Alpes, FR Abstract Various radio-based Indoor Positioning Systems (IPS) have been proposed during the last decade as solutions to GPS inconsistency in indoor environments. Among the different radio technologies proposed for this purpose, 802.15.4 Ultra-Wideband (UWB) is by far the most performant, reaching up to 10 cm accuracy with 1000 Hz refresh rates. As a consequence, UWB is a popular technology for applications such as assets tracking in industrial environments or robots/drones indoor navigation. However, some security flaws in 802.15.4 standard expose UWB positioning to attacks. In this demonstration, we show how an attacker can exploit a vulnerability on 802.15.4 acknowledgment frames to hijack a device in a UWB positioning system. We demonsrate that using simply one cheap UWB chip, the attacker can take control over the positioning system and generate fake trajectories from a laptop. The results are observed in real-time in the 3D engine monitoring the positioning system. More information ...
UB05.6	DESIGN AUTOMATION FOR EXTENDED BURST-MODE AUTOMATA IN WORKCRAFT Authors: Alex Chan, Alex Yakovlev, Danil Sokolov and Victor Khomenko, Newcastle University, GB Abstract Asynchronous circuits are known to have high performance, robustness and low power consumption, which are particularly beneficial for the area of so-called "little digital" controllers where low latency is crucial. However, asynchronous design is not widely adopted by industry, partially due to the steep learning curve inherent in the complexity of formal specifications, such as Signal Transition Graphs (STGs). In this demo, we promote a class of the Finite State Machine (FSM) model called Extended Burst-Mode (XBM) automata as a practical way to specify many asynchronous circuits. The XBM specification has been automated in the Workcraft toolkit (https://workcraft.org) with elaborate support for state encoding, conditionals and "don't care" signals. Formal verification and logic synthesis of the XBM automata is implemented via conversion to the established STG model, reusing existing methods and CAD tools. Tool support for the XBM flow will be demonstrated using several case studies. More information ...
UB05.7	AT-SPEED DFT ARCHITECTURE FOR BUNDLED-DATA CIRCUITS Authors: Ricardo Aquino Guazzelli and Laurent Fesquet, Université Grenoble Alpes, FR Abstract At-speed testing for asynchronous circuits is still an open concern in the literature. Due to its timing constraints between control and data paths, Design for Testability (DfT) methodologies must test both control and data paths at the same time in order to guarantee the circuit correctness. As Process Voltage Temperature (PVT) variations significantly impact circuit design in newer CMOS technologies and low-power techniques such as voltage scaling, the timing constraints between control and data paths must be tested after fabrication not only under nominal conditions but through a range of operating conditions. This work explores an at-speed testing approach for bundled data circuits, targetting the micropipeline template. The main target of this test approach focuses on whether the sized delay lines in control paths respect the local timing assumptions of the data paths. More information ...
UB05.8	CATANIS: CAD TOOL FOR AUTOMATIC NETWORK SYNTHESIS Authors: Davide Quaglia, Enrico Fraccaroli, Filippo Nevi and Sohail Mushtaq, Università di Verona, IT Abstract The proliferation of communication technologies for embedded systems opened the way for new applications, e.g., Smart Cities and Industry 4.0. In such applications hundreds or thousands of smart devices interact together through different types of channels and protocols. This increasing communication complexity forces computer-aided design methodologies to scale up from embedded systems in isolation to the global inter-connected system. Network Synthesis is the methodology to optimally allocate functionality onto network nodes and define the communication infrastructure among them. This booth will demonstrate the functionality of a graphic tool for automatic network synthesis developed by the Computer Science Department of University of Verona. It allows to graphically specify the communication requirements of a smart space (e.g., its map can be considered) in terms of sensing and computation tasks together with a library of node types and communication protocols to be used. More information ...
UB05.9	PARALLEL ALGORITHM FOR CNN INFERENCE AND ITS AUTOMATIC SYNTHESIS Authors: Takashi Matsumoto, Yukio Miyasaka, Xinpei Zhang and Masahiro Fujita, University of Tokyo, JP Abstract Recently, Convolutional Neural Network (CNN) has surpassed conventional methods in the field of image processing. This demonstration shows a new algorithm to calculate CNN inference using processing elements arranged and connected based on the topology of the convolution. They are connected in mesh and calculate CNN inference in a systolic way. The algorithm performs the convolution of all elements with the same output feature in parallel. We demonstrate a method to automatically synthesize an algorithm, which simultaneously performs the convolution and the communication of pixels for the computation of the next layer. We show with several sizes of input layers, kernels, and strides and confirmed that the correct algorithms were synthesized. The synthesis method is extended to the sparse kernel. The synthesized algorithm requires fewer cycles than the original algorithm. There were the more chances to reduce the number of cycles with the sparser kernel. More information ...
UB05.10	FU: LOW POWER AND ACCURACY CONFIGURABLE APPROXIMATE ARITHMETIC UNITS Authors: Tomoaki Ukezono and Toshinori Sato, Fukuoka University, JP Abstract In this demonstration, we will introduce the approximate arithmetic units such as adder, multiplier, and MAC that are being studied in our system-architecture laboratory. Our approximate arithmetic units can reduce delay and power consumption at the expense of accuracy. Our approximate arithmetic units are intended to be applied to IoT edge devices that can process images, and are suitable for battery-driven and low-cost devices. The biggest feature of our approximate arithmetic units is that the circuit is configured so that the accuracy is dynamically variable, and the trade-off relationship between accuracy and power can be selected according to the usage status of the device. In this demonstration, we show the power consumption according to various accuracy-requirements based on actual data and claim the practicality of the proposed arithmetic units. More information ...
12:00	End of session

Label

Presentation Title
Authors

UB05.1

TAPASCO: THE OPEN-SOURCE TASK-PARALLEL SYSTEM COMPOSER FRAMEWORK
Authors:
Carsten Heinz, Lukas Sommer, Lukas Weber, Jaco Hofmann and Andreas Koch, TU Darmstadt, DE
Abstract
Field-programmable gate arrays (FPGA) are an established platform for highly specialized accelerators, but in a heterogeneous setup, the accelerator still needs to be integrated into the overall system. The open-source TaPaSCo (Task-Parallel System Composer) framework was created to serve this purpose: The fast integration of FPGA-based accelerators into compute platforms or systems-on-chip (SoC) and their connection to relevant components on the FPGA board. TaPaSCo can support developers in all steps of the development process: from cores resulting from High-Level Synthesis or cores written in an HDL, a complete FPGA-design can be created. TaPaSCo will automatically connect all processing elements to the memory- and host-interface and generate a complete bitstream. The TaPaSCo Runtime API allows to interface with accelerators from software and supports operations such as transferring data to the FPGA memory, passing values and controlling the execution of the accelerators.
More information ...

UB05.2

ELSA: EIGENVALUE BASED HYBRID LINEAR SYSTEM ABSTRACTION: BEHAVIORAL MODELING OF TRANSISTOR-LEVEL CIRCUITS USING AUTOMATIC ABSTRACTION TO HYBRID AUTOMATA
Authors:
Ahmad Tarraf and Lars Hedrich, University of Frankfurt, DE
Abstract
Model abstraction of transistor-level circuits, while preserving an accurate behavior, is still an open problem. In this demo an approach is presented that automatically generates a hybrid automaton (HA) with linear states from an existing circuit netlist. The approach starts with a netlist at transistor level with full SPICE accuracy and ends at the system level description of the circuit in matlab or in Verilog-A. The resulting hybrid automaton exhibits linear behavior as well as the technology dependent nonlinear e.g. limiting behavior. The accuracy and speed-up of the Verilog-A generated models is evaluated based on several transistor level circuit abstractions of simple operational amplifiers up to a complex filters. Moreover, we verify the equivalence between the generated model and the original circuit. For the generated models in matlab syntax, a reachability analysis is performed using the reachability tool cora.
More information ...

UB05.3

EUCLID-NIR GPU: AN ON-BOARD PROCESSING GPU-ACCELERATED SPACE CASE STUDY DEMONSTRATOR
Authors:
Ivan Rodriguez and Leonidas Kosmidis, BSC / UPC, ES
Abstract
Embedded Graphics Processing Units (GPUs) are very attractive candidates for on-board payload processing of future space systems, thanks to their high performance and low-power consumption. Although there is significant interest from both academia and industry, there is no open and publicly available case study showing their capabilities, yet. In this master thesis project, which was performed within the GPU4S (GPU for Space) ESA-funded project, we have parallelised and ported the Euclid NIR (Near Infrared) image processing algorithm used in the European Space Agency's (ESA) mission to be launched in 2022, to an automotive GPU platform, the NVIDIA Xavier. In the demo we will present in real-time its significantly higher performance achieved compared to the original sequential implementation. In addition, visitors will have the opportunity to examine the images on which the algorithm operates, as well as to inspect the algorithm parallelisation through profiling and code inspection.
More information ...

UB05.4

BCFELEAM: BACKFLOW: BACKWARD EDGE CONTROL FLOW ENFORCEMENT FOR LOW END ARM REAL-TIME SYSTEMS
Authors:
Bresch Cyril¹, David Héy¹, Roman Lysecky² and Stephanie Chollet¹
¹LCIS, FR; ²University of Arizona, US
Abstract
The C programming language is one of the most popular languages in embedded system programming. Indeed, C is efficient, lightweight and can easily meet high performance and deterministic real-time constraints. However, these assets come at a certain price. Indeed, C does not provide extra features for memory safety. As a result, attackers can easily exploit spatial memory vulnerabilities to hijack the execution flow of an application. The demonstration features a real-time connected infusion pump vulnerable to memory attacks. First, we showcase an exploit that remotely takes control of the pump. Then, we demonstrate the effectiveness of BackFlow, an LLVM-based compiler extension that enforces control-flow integrity in low-end ARM embedded systems.
More information ...

UB05.5

UWB ACKATCK: HIJACKING DEVICES IN UWB INDOOR POSITIONING SYSTEMS
Authors:
Baptiste Pestourie, Vincent Beroulle and Nicolas Fourty, Université Grenoble Alpes, FR
Abstract
Various radio-based Indoor Positioning Systems (IPS) have been proposed during the last decade as solutions to GPS inconsistency in indoor environments. Among the different radio technologies proposed for this purpose, 802.15.4 Ultra-Wideband (UWB) is by far the most performant, reaching up to 10 cm accuracy with 1000 Hz refresh rates. As a consequence, UWB is a popular technology for applications such as assets tracking in industrial environments or robots/drones indoor navigation. However, some security flaws in 802.15.4 standard expose UWB positioning to attacks. In this demonstration, we show how an attacker can exploit a vulnerability on 802.15.4 acknowledgment frames to hijack a device in a UWB positioning system. We demonsrate that using simply one cheap UWB chip, the attacker can take control over the positioning system and generate fake trajectories from a laptop. The results are observed in real-time in the 3D engine monitoring the positioning system.
More information ...

UB05.6

DESIGN AUTOMATION FOR EXTENDED BURST-MODE AUTOMATA IN WORKCRAFT
Authors:
Alex Chan, Alex Yakovlev, Danil Sokolov and Victor Khomenko, Newcastle University, GB
Abstract
Asynchronous circuits are known to have high performance, robustness and low power consumption, which are particularly beneficial for the area of so-called "little digital" controllers where low latency is crucial. However, asynchronous design is not widely adopted by industry, partially due to the steep learning curve inherent in the complexity of formal specifications, such as Signal Transition Graphs (STGs). In this demo, we promote a class of the Finite State Machine (FSM) model called Extended Burst-Mode (XBM) automata as a practical way to specify many asynchronous circuits. The XBM specification has been automated in the Workcraft toolkit (https://workcraft.org) with elaborate support for state encoding, conditionals and "don't care" signals. Formal verification and logic synthesis of the XBM automata is implemented via conversion to the established STG model, reusing existing methods and CAD tools. Tool support for the XBM flow will be demonstrated using several case studies.
More information ...

UB05.7

AT-SPEED DFT ARCHITECTURE FOR BUNDLED-DATA CIRCUITS
Authors:
Ricardo Aquino Guazzelli and Laurent Fesquet, Université Grenoble Alpes, FR
Abstract
At-speed testing for asynchronous circuits is still an open concern in the literature. Due to its timing constraints between control and data paths, Design for Testability (DfT) methodologies must test both control and data paths at the same time in order to guarantee the circuit correctness. As Process Voltage Temperature (PVT) variations significantly impact circuit design in newer CMOS technologies and low-power techniques such as voltage scaling, the timing constraints between control and data paths must be tested after fabrication not only under nominal conditions but through a range of operating conditions. This work explores an at-speed testing approach for bundled data circuits, targetting the micropipeline template. The main target of this test approach focuses on whether the sized delay lines in control paths respect the local timing assumptions of the data paths.
More information ...

UB05.8

CATANIS: CAD TOOL FOR AUTOMATIC NETWORK SYNTHESIS
Authors:
Davide Quaglia, Enrico Fraccaroli, Filippo Nevi and Sohail Mushtaq, Università di Verona, IT
Abstract
The proliferation of communication technologies for embedded systems opened the way for new applications, e.g., Smart Cities and Industry 4.0. In such applications hundreds or thousands of smart devices interact together through different types of channels and protocols. This increasing communication complexity forces computer-aided design methodologies to scale up from embedded systems in isolation to the global inter-connected system. Network Synthesis is the methodology to optimally allocate functionality onto network nodes and define the communication infrastructure among them. This booth will demonstrate the functionality of a graphic tool for automatic network synthesis developed by the Computer Science Department of University of Verona. It allows to graphically specify the communication requirements of a smart space (e.g., its map can be considered) in terms of sensing and computation tasks together with a library of node types and communication protocols to be used.
More information ...

UB05.9

PARALLEL ALGORITHM FOR CNN INFERENCE AND ITS AUTOMATIC SYNTHESIS
Authors:
Takashi Matsumoto, Yukio Miyasaka, Xinpei Zhang and Masahiro Fujita, University of Tokyo, JP
Abstract
Recently, Convolutional Neural Network (CNN) has surpassed conventional methods in the field of image processing. This demonstration shows a new algorithm to calculate CNN inference using processing elements arranged and connected based on the topology of the convolution. They are connected in mesh and calculate CNN inference in a systolic way. The algorithm performs the convolution of all elements with the same output feature in parallel. We demonstrate a method to automatically synthesize an algorithm, which simultaneously performs the convolution and the communication of pixels for the computation of the next layer. We show with several sizes of input layers, kernels, and strides and confirmed that the correct algorithms were synthesized. The synthesis method is extended to the sparse kernel. The synthesized algorithm requires fewer cycles than the original algorithm. There were the more chances to reduce the number of cycles with the sparser kernel.
More information ...

UB05.10

FU: LOW POWER AND ACCURACY CONFIGURABLE APPROXIMATE ARITHMETIC UNITS
Authors:
Tomoaki Ukezono and Toshinori Sato, Fukuoka University, JP
Abstract
In this demonstration, we will introduce the approximate arithmetic units such as adder, multiplier, and MAC that are being studied in our system-architecture laboratory. Our approximate arithmetic units can reduce delay and power consumption at the expense of accuracy. Our approximate arithmetic units are intended to be applied to IoT edge devices that can process images, and are suitable for battery-driven and low-cost devices. The biggest feature of our approximate arithmetic units is that the circuit is configured so that the accuracy is dynamically variable, and the trade-off relationship between accuracy and power can be selected according to the usage status of the device. In this demonstration, we show the power consumption according to various accuracy-requirements based on actual data and claim the practicality of the proposed arithmetic units.
More information ...

12:00

End of session