UB09 Session 9

Label

Presentation Title
Authors

UB09.1

TAPASCO: THE OPEN-SOURCE TASK-PARALLEL SYSTEM COMPOSER FRAMEWORK
Authors:
Carsten Heinz, Lukas Sommer, Lukas Weber, Jaco Hofmann and Andreas Koch, TU Darmstadt, DE
Abstract
Field-programmable gate arrays (FPGAs) are an established platform for highly specialized accelerators, but in a heterogeneous setup, the accelerator still needs to be integrated into the overall system. The open-source TaPaSCo (Task-Parallel System Composer) framework was created to serve this purpose: the fast integration of FPGA-based accelerators into compute platforms or systems-on-chip (SoCs) and their connection to relevant components on the FPGA board. TaPaSCo supports developers in all steps of the development process: a complete FPGA design can be created from cores produced by High-Level Synthesis or written in an HDL. TaPaSCo automatically connects all processing elements to the memory and host interfaces and generates a complete bitstream. The TaPaSCo Runtime API allows software to interface with the accelerators and supports operations such as transferring data to the FPGA memory, passing values and controlling the execution of the accelerators.

UB09.2

RESCUED: A RESCUE DEMONSTRATOR FOR INTERDEPENDENT ASPECTS OF RELIABILITY, SECURITY AND QUALITY TOWARDS A COMPLETE EDA FLOW
Authors:
Nevin George¹, Guilherme Cardoso Medeiros², Junchao Chen³, Josie Esteban Rodriguez Condia⁴, Thomas Lange⁵, Aleksa Damljanovic⁴, Raphael Segabinazzi Ferreira¹, Aneesh Balakrishnan⁵, Xinhui Lai⁶, Shayesteh Masoumian⁷, Dmytro Petryk³, Troya Cagil Koylu², Felipe Augusto da Silva⁸, Ahmet Cagri Bagbaba⁸, Cemil Cem Gürsoy⁶, Said Hamdioui², Mottaqiallah Taouil², Milos Krstic³, Peter Langendoerfer³, Zoya Dyka³, Marcelo Brandalero¹, Michael Hübner¹, Jörg Nolte¹, Heinrich Theodor Vierhaus¹, Matteo Sonza Reorda⁴, Giovanni Squillero⁴, Luca Sterpone⁴, Jaan Raik⁶, Dan Alexandrescu⁵, Maximilien Glorieux⁵, Georgios Selimis⁷, Geert-Jan Schrijen⁷, Anton Klotz⁸, Christian Sauer⁸ and Maksim Jenihhin⁶
¹Brandenburg University of Technology Cottbus-Senftenberg, DE; ²TU Delft, NL; ³Leibniz-Institut für innovative Mikroelektronik, DE; ⁴Politecnico di Torino, IT; ⁵IROC Technologies, FR; ⁶Tallinn University of Technology, EE; ⁷Intrinsic ID, NL; ⁸Cadence Design Systems GmbH, DE
Abstract
The demonstrator highlights the various interdependent aspects of Reliability, Security and Quality in nanoelectronic system design within an EDA toolset and a processor architecture setup. The need for attention to these three aspects of nanoelectronic systems has become ever more pronounced with the extreme miniaturization of technologies. Further, such systems have exploded in number with IoT devices, heavy and analogous interaction with the external physical world, complex safety-critical applications, and artificial intelligence applications. RESCUE targets these aspects in the form of Reliability (functional safety, ageing, soft errors), Security (tamper-resistance, PUF technology, intelligent security) and Quality (novel fault models, functional test, FMEA/FMECA, verification/debug), spanning the entire hardware/software system stack. The demonstrator is brought together by a group of PhD students under the banner of the H2020-MSCA-ITN RESCUE European Union project.

UB09.3

PAFUSI: PARTICLE FILTER FUSION ASIC FOR INDOOR POSITIONING
Authors:
Christian Schott, Marko Rößler, Daniel Froß, Marcel Putsche and Ulrich Heinkel, TU Chemnitz, DE
Abstract
Data acquired from IoT devices is far more meaningful if the global or local position at which it was acquired is known. Both the infrastructure for indoor positioning and the IoT devices themselves need small, energy-efficient but powerful components that provide location awareness. We propose PAFUSI, a hardware implementation of a UWB position estimation algorithm that fulfils these requirements. Our design fuses distance measurements to fixed points in an environment to calculate the position in 3D space and can use different positioning technologies such as GPS, DecaWave or Nanotron as data sources simultaneously. It comprises an estimator, which processes the data by means of a Sequential Monte Carlo method, and a microcontroller core, which configures and controls the measurement unit and analyses the results of the estimator. PAFUSI is manufactured as a monolithically integrated ASIC on a multi-project wafer in UMC's 65 nm process.
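The Sequential Monte Carlo (particle filter) fusion of range measurements described above can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the PAFUSI hardware design: the anchor coordinates, the noise levels (`sigma`, `jitter`), the particle count and all function names are hypothetical choices made for illustration.

```python
import math
import random


def particle_filter_step(particles, anchors, ranges, sigma, jitter, rng):
    """One SMC update: weight by range likelihood, resample, then jitter."""
    log_w = []
    for p in particles:
        # Squared mismatch between predicted and measured anchor distances.
        err2 = sum((math.dist(p, a) - r) ** 2 for a, r in zip(anchors, ranges))
        log_w.append(-err2 / (2.0 * sigma ** 2))
    top = max(log_w)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(lw - top) for lw in log_w]
    # Multinomial resampling proportional to the weights.
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    # Small random jitter keeps the particle cloud from collapsing.
    return [tuple(c + rng.gauss(0.0, jitter) for c in p) for p in resampled]


def estimate(particles):
    """Position estimate: mean of the (equally weighted) resampled particles."""
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(3))


def demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical anchors at the corners of a 10 m x 10 m x 3 m room.
    anchors = [(0.0, 0.0, 0.0), (10.0, 0.0, 3.0),
               (10.0, 10.0, 0.0), (0.0, 10.0, 3.0)]
    truth = (4.0, 6.0, 1.0)
    particles = [(rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 3))
                 for _ in range(3000)]
    for _ in range(30):
        # Simulated noisy distance measurements to each anchor.
        ranges = [math.dist(truth, a) + rng.gauss(0.0, 0.1) for a in anchors]
        particles = particle_filter_step(particles, anchors, ranges,
                                         sigma=0.5, jitter=0.05, rng=rng)
    return truth, estimate(particles)
```

Calling `demo()` returns the simulated true position together with the filter's estimate, so the two can be compared directly.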

UB09.4

SKELETOR: AN OPEN SOURCE EDA TOOL FLOW FROM HIERARCHY SPECIFICATION TO HDL DEVELOPMENT
Authors:
Ivan Rodriguez, Guillem Cabo, Javier Barrera, Jeremy Giesen, Alvaro Jover and Leonidas Kosmidis, BSC / UPC, ES
Abstract
Large hardware design projects have a high overhead for project bootstrapping, requiring significant effort to translate hardware specifications into hardware description language (HDL) files and to set up the corresponding development and verification infrastructure. Skeletor (https://github.com/jaquerinte/Skeletor) is an open source EDA tool developed as a student project at UPC/BSC that simplifies this process by increasing developer productivity and reducing typing errors, while at the same time lowering the bar for entry into hardware development. Skeletor uses a C/Verilog-like language to specify the modules in a hardware project hierarchy and their connections, from which it automatically generates the required skeleton of source files, their development and verification testbenches, and simulation scripts. Integration with KiCad schematics and support for syntax highlighting in code editors further simplify its use. This demo is linked with workshop W05.

UB09.5

SYSTEMC-CT/DE: A SIMULATOR WITH FAST AND ACCURATE CONTINUOUS TIME AND DISCRETE EVENTS INTERACTIONS ON TOP OF SYSTEMC
Authors:
Breytner Joseph Fernandez-Mesa, Liliana Andrade and Frédéric Pétrot, Université Grenoble Alpes / CNRS / TIMA Laboratory, FR
Abstract
We have developed a continuous-time (CT) and discrete-event (DE) simulator on top of SystemC. Systems that mix both domains are critical and their proper functioning must be verified; simulation serves this goal. Our simulator implements direct CT/DE synchronization, which enables a rich set of interactions between the domains: events from the CT models can trigger DE processes, and events from the DE models can modify the CT equations. DE-based interactions are therefore simulated at their precise time by the DE kernel rather than at fixed time steps. We demonstrate our simulator by executing a set of challenging examples that either require a superdense model of time, include Zeno behavior, or are highly sensitive to accuracy errors. Results show that our simulator overcomes these issues, is accurate, and improves simulation speed compared with fixed time steps. These advantages open up new possibilities for the design of a wider set of heterogeneous systems.

UB09.6

PARALLEL ALGORITHM FOR CNN INFERENCE AND ITS AUTOMATIC SYNTHESIS
Authors:
Takashi Matsumoto, Yukio Miyasaka, Xinpei Zhang and Masahiro Fujita, University of Tokyo, JP
Abstract
Recently, Convolutional Neural Networks (CNNs) have surpassed conventional methods in the field of image processing. This demonstration shows a new algorithm that calculates CNN inference using processing elements arranged and connected according to the topology of the convolution. The elements are connected in a mesh and calculate CNN inference in a systolic way. The algorithm performs the convolution of all elements with the same output feature in parallel. We demonstrate a method to automatically synthesize an algorithm that simultaneously performs the convolution and the communication of pixels for the computation of the next layer. We tested several sizes of input layers, kernels, and strides and confirmed that the correct algorithms were synthesized. The synthesis method is also extended to sparse kernels, for which the synthesized algorithm requires fewer cycles than the original algorithm. The sparser the kernel, the more opportunities there were to reduce the number of cycles.
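To make the role of kernel size, stride and sparsity concrete, the sketch below computes a plain sequential 2D convolution and counts multiply-accumulate operations while skipping zero kernel entries. It is a reference computation only, not the parallel systolic algorithm or the synthesis method of the demonstration; the function name `conv2d_sparse` and the operation count are illustrative assumptions and do not correspond to the cycle counts reported by the authors.

```python
def conv2d_sparse(image, kernel, stride=1):
    """Valid 2D convolution (cross-correlation form, as is usual for CNNs).

    Returns the output feature map and the number of multiply-accumulates,
    skipping kernel entries that are zero.
    """
    kh, kw = len(kernel), len(kernel[0])
    # Keep only the non-zero kernel taps: a sparser kernel leaves fewer taps.
    taps = [(i, j, kernel[i][j])
            for i in range(kh) for j in range(kw) if kernel[i][j] != 0]
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            for i, j, w in taps:
                acc += w * image[r * stride + i][c * stride + j]
                macs += 1
            out[r][c] = acc
    return out, macs
```

With a 4x4 input, a 2x2 kernel and stride 2, a kernel with two zero entries needs half as many multiply-accumulates as a dense one in this sequential model.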

UB09.7

EEC: ENERGY EFFICIENT COMPUTING VIA DYNAMIC VOLTAGE SCALING AND IN-NETWORK OPTICAL PROCESSING
Authors:
Ryosuke Matsuo¹, Jun Shiomi¹, Yutaka Masuda² and Tohru Ishihara²
¹Kyoto University, JP; ²Nagoya University, JP
Abstract
This poster demonstration will show results from two of our research projects. The first is a project on energy-efficient computing, in which we developed a power management algorithm that keeps the target processor running at its most energy-efficient operating point at all times by appropriately tuning the supply voltage and threshold voltage under a specific performance constraint. This algorithm is applicable to a wide variety of processor systems, including high-end processors and low-end embedded processors. We will show results obtained with actual RISC processors designed in a 65 nm technology. The second is a project on in-network optical computing. We show optical functional units such as parallel multipliers and optical neural networks. Several key techniques for reducing the power consumption of optical circuits will also be presented. Finally, we will show the results of optical circuit simulation, which demonstrate the light-speed operation of the circuits.

UB09.8

SUBRISC+: IMPLEMENTATION AND EVALUATION OF AN EMBEDDED PROCESSOR FOR LIGHTWEIGHT IOT EHEALTH
Authors:
Mingyu Yang and Yuko Hara-Azumi, Tokyo Institute of Technology, JP
Abstract
Although the rapid growth of the Internet of Things (IoT) has enabled new opportunities for eHealth devices, the further development of complex systems is severely constrained by the power and energy supply of battery-powered embedded systems. To address this issue, this work presents a processor design called "SubRISC+" targeting lightweight IoT eHealth. SubRISC+ achieves low power/energy consumption through its unique and compact architecture. As an example of a lightweight eHealth application on SubRISC+, we are working on epileptic seizure detection using the dynamic time warping algorithm for deployment on wearable IoT eHealth devices. Simulation results show a 22% reduction in dynamic power and a 50% reduction in leakage power and core area compared to Cortex-M0. As ongoing work, evaluation on a fabricated chip will be done within the first half of 2020.
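For readers unfamiliar with dynamic time warping, the following minimal sketch shows the textbook form of the algorithm with an absolute-difference cost. The abstract does not say which variant or cost function runs on SubRISC+, so the function name `dtw_distance` and the choice of cost are assumptions made purely for illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance with |x - y| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Because the alignment may stretch either sequence in time, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the two sequences differ in length.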

UB09.9

PA-HLS: HIGH-LEVEL ANNOTATION OF ROUTING CONGESTION FOR XILINX VIVADO HLS DESIGNS
Authors:
Osama Bin Tariq¹, Junnan Shan¹, Luciano Lavagno¹, Georgios Floros², Mihai Teodor Lazarescu¹, Christos Sotiriou² and Mario Roberto Casu¹
¹Politecnico di Torino, IT; ²University of Thessaly, GR
Abstract
We will demo a novel high-level back-annotation flow that reports routing congestion issues at the C++ source level by analyzing reports from FPGA physical design (Xilinx Vivado) and internal debugging files of the Vivado HLS tool. The flow annotates the C++ source code, identifying likely causes of congestion, e.g., on-chip memories or DSP units. These shared resources often cause routing problems on FPGAs because they cannot be duplicated by physical design. We demonstrate on realistic large designs how the information provided by our flow can be used both to identify congestion issues at the C++ source level and to solve them using HLS directives. The main demo steps are:
1. Extraction of the source-level debugging information from the Vivado HLS database
2. Generation of a list of net names involved in congested areas, and of their relative significance, from the Vivado post-global-routing database
3. Visualization of the C++ code lines that contribute most to congestion

UB09.10

FU: LOW POWER AND ACCURACY CONFIGURABLE APPROXIMATE ARITHMETIC UNITS
Authors:
Tomoaki Ukezono and Toshinori Sato, Fukuoka University, JP
Abstract
In this demonstration, we introduce the approximate arithmetic units, such as adders, multipliers, and MACs, that are being studied in our system-architecture laboratory. These units reduce delay and power consumption at the expense of accuracy. They are intended for IoT edge devices that process images, and are suitable for battery-driven and low-cost devices. Their key feature is that the circuit is configured so that the accuracy is dynamically variable, and the trade-off between accuracy and power can be selected according to the usage status of the device. We show the power consumption under various accuracy requirements based on actual data and argue for the practicality of the proposed arithmetic units.
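The idea of an accuracy-configurable adder can be sketched in software with a simplified lower-part OR approximation, a well-known textbook scheme in which the low bits are combined with a bitwise OR instead of a carry chain. This is not claimed to be the authors' circuit; the function name `approx_add`, the `approx_bits` parameter and the 16-bit default width are hypothetical.

```python
def approx_add(x, y, approx_bits, width=16):
    """Approximate unsigned addition of two values that fit in `width` bits.

    The lowest `approx_bits` bits are OR-ed (no carries are generated there);
    the remaining upper bits are added exactly. approx_bits=0 is an exact add.
    """
    mask_all = (1 << width) - 1
    low_mask = (1 << approx_bits) - 1
    # Approximate lower part: OR instead of add, so no carry into the upper part.
    low = (x | y) & low_mask
    # Exact upper part.
    high = ((x >> approx_bits) + (y >> approx_bits)) << approx_bits
    return (high | low) & mask_all
```

Raising `approx_bits` at run time trades accuracy for a shorter carry chain: for example, `approx_add(5, 3, 0)` gives the exact result 8, while `approx_add(5, 3, 2)` gives 7.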

12:00

End of session