8.6 Designing reliable embedded architectures under uncertainty

Time	Label	Presentation Title Authors
17:00	8.6.1	SUPPORTING RUNTIME RECONFIGURABLE VLIWS CORES THROUGH DYNAMIC BINARY TRANSLATION Speaker: Simon Rokicki, Univ Rennes, INRIA, CNRS, IRISA, FR Authors: Simon Rokicki¹, Erven Rohou² and Steven Derrien³ ¹Irisa, FR; ²Inria, FR; ³University of Rennes 1/IRISA, FR Abstract Single-ISA heterogeneous multi-cores such as the ARM big.LITTLE have proven to be an attractive solution to explore different energy/performance trade-offs. Such architectures combine out-of-order cores with smaller in-order ones to offer different power/energy profiles. However, they do not really exploit the characteristics of workloads (compute-intensive vs. control-dominated). In this work, we propose to enrich these architectures with runtime configurable VLIW cores, which are very efficient at compute-intensive kernels. To preserve the single-ISA programming model, we resort to Dynamic Binary Translation (DBT), and use this technique to enable dynamic code specialization for runtime reconfigurable VLIW cores. Our proposed DBT framework targets the RISC-V ISA, for which both OoO and in-order implementations exist. Our experimental results show that our approach can lead to best-case performance and energy efficiency when compared against static VLIW configurations.
17:30	8.6.2	USFI: ULTRA-LIGHTWEIGHT SOFTWARE FAULT ISOLATION FOR IOT-CLASS DEVICES Speaker: Zelalem Aweke, University of Michigan, US Authors: Zelalem Birhanu Aweke and Todd Austin, University of Michigan, US Abstract Embedded device security is a particularly difficult challenge, as the quantity of devices makes them attractive targets, while their cost-sensitive design leads to less-than-desirable security implementations. Most current low-end embedded devices do not include any form of security or only include simple memory protection support. One line of research in crafting low-cost security for low-end embedded devices has focused on sandboxing trusted code from untrusted code using both hardware and software techniques. These previous attempts suffer from large trusted code bases (e.g., including the entire kernel), high runtime overheads (e.g., due to code instrumentation), partial protection (e.g., only provide write protection), or heavyweight hardware modifications. In this work, we leverage the rudimentary memory protection support found in modern IoT-class microcontrollers to build a low-profile, low-overhead, flexible sandboxing mechanism that can provide isolation between tightly-coupled software modules. With our approach, named uSFI, only the trust management code must be trusted. Through the use of a static verifier and monitored inter-module transitions, module code at all privilege levels (including the kernel) is able to run uninstrumented and untrusted code. We implemented uSFI on an ARMv7-M based processor, both bare metal and running the FreeRTOS kernel, and analyzed the performance using the MiBench embedded benchmark suite and two additional highly detailed applications. We found that performance overheads were minimal, with at most 1.1% slowdown, and code size overheads were also low, at a maximum of 10%. In addition, our trusted code base is trivially small at only 150 lines of code.
18:00	8.6.3	CONVERGING SAFETY AND HIGH-PERFORMANCE DOMAINS: INTEGRATING OPENMP INTO ADA Speaker: Sara Royuela, Barcelona Supercomputing Center, ES Authors: Sara Royuela¹, Eduardo Quinones¹ and Luis Miguel Pinho² ¹Barcelona Supercomputing Center, ES; ²Polytechnic Institute of Porto, PT Abstract The use of parallel heterogeneous embedded architectures is needed to implement the level of performance required in advanced safety-critical systems. Hence, there is a demand for high-level parallel programming models capable of efficiently exploiting the performance opportunities. In this paper, we evaluate the incorporation of OpenMP, a parallel programming model used in HPC, into Ada, a language widespread in safety-critical domains. We demonstrate that the execution model of OpenMP is compatible with the recently proposed Ada tasklet model, meant to exploit fine-grain structured parallelism. Moreover, we show the compatibility of the OpenMP and tasklet models, enabling the use of OpenMP directives in Ada to further exploit unstructured parallelism and heterogeneous computation. Finally, we state the safety properties of OpenMP and analyze the interoperability between the OpenMP and Ada runtimes. Overall, we conclude that OpenMP can be effectively incorporated into Ada without jeopardizing its safety properties.
18:15	8.6.4	COMPILER-DRIVEN ERROR ANALYSIS FOR DESIGNING APPROXIMATE ACCELERATORS Speaker: Jorge Castro-Godínez, Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), DE Authors: Jorge Castro-Godínez¹, Sven Esser¹, Muhammad Shafique², Santiago Pagani¹ and Joerg Henkel¹ ¹Karlsruhe Institute of Technology, DE; ²TU Wien, AT Abstract Approximate Computing has emerged as a design paradigm suitable for applications with inherent error resilience. This paradigm aims to reduce the associated computing costs (such as execution time, area, or energy) of exact calculations by reducing the quality of their results. Several approximate arithmetic circuits have been proposed, which can be used to implement hardware blocks such as approximate accelerators. However, to satisfy quality constraints in these accelerators, it is imperative to assess how the errors introduced by approximate circuits propagate through other exact and approximate computations, and finally accumulate at the output. This is particularly crucial to enable high-level synthesis of approximate accelerators. This work proposes a compiler-driven error analysis methodology to evaluate the behavior of errors generated from approximate adders in the design of approximate accelerators. We present CEDA, a tool to perform a static analysis of the error propagation. This tool uses #pragma-based annotated C/C++ source code as input. With these annotations, exact additions are replaced by approximate ones during the code analysis to estimate the error at the output. The error estimations produced by our tool are comparable to those obtained through simulations.
18:31	IP4-2, 105	PARALLEL CODE GENERATION OF SYNCHRONOUS PROGRAMS FOR A MANY-CORE ARCHITECTURE Speaker: Amaury Graillat, Verimag - Univ. Grenoble Alpes, FR Authors: Amaury Graillat¹, Matthieu Moy², Pascal Raymond³ and Benoît Dupont de Dinechin⁴ ¹Verimag - Univ. Grenoble Alpes, FR; ²Univ. Grenoble Alpes, Verimag, FR; ³VERIMAG/CNRS, FR; ⁴Kalray, FR Abstract Embedded systems tend to require more and more computational power. Many-core architectures are good candidates since they offer power and are considered more time-predictable than classical multi-cores. Data-flow synchronous languages such as Lustre or Scade are widely used for avionic critical software. Programs are described by networks of computational nodes. Implementation of such programs on a many-core architecture must ensure a bounded response time and preserve the functional behavior by taking interference into account. We consider the top-level node of a Lustre application as a software architecture description where each sub-node corresponds to a potential parallel task. Given a mapping (tasks to cores), we automatically generate code suitable for the targeted many-core architecture. This minimizes memory interference and allows usage of a framework to compute the Worst-Case Response Time.
18:32	IP4-3, 272	SOCRATES - A SEAMLESS ONLINE COMPILER AND SYSTEM RUNTIME AUTOTUNING FRAMEWORK FOR ENERGY-AWARE APPLICATIONS Speaker: Gianluca Palermo, Politecnico di Milano, IT Authors: Davide Gadioli¹, Ricardo Nobre², Pedro Pinto³, Emanuele Vitali¹, Amir H. Ashouri⁴, Gianluca Palermo¹, Cristina Silvano¹ and João M. P. Cardoso⁵ ¹Politecnico di Milano, IT; ²University of Porto / INESC TEC, PT; ³Faculty of Engineering, University of Porto, PT; ⁴University of Toronto, Canada, CA; ⁵University of Porto, PT Abstract Configuring program parallelism and selecting optimal compiler options according to the underlying platform architecture is a difficult task if left entirely to the programmer or done by using a default one-fits-all policy generated by the compiler or runtime system. Given the dynamics of the problem, a runtime selection of the best configuration is the desirable solution. However, implementing this solution in the application requires the insertion of a lot of glue code for profiling and runtime selection. This represents a programming wall that hinders making it feasible in practice. This paper presents a structured approach called SOCRATES, based on a Domain Specific Language (LARA) and a runtime autotuner (mARGOt), to alleviate this effort. LARA has been used to hide the glue code insertion, thus separating the pure functional application description from extra-functional requirements. mARGOt has been used for the automatic selection of the best configuration according to the runtime evolution of the application. To demonstrate the effectiveness of the proposed approach, we evaluated SOCRATES by varying the application workloads, hardware resources and energy efficiency requirements for 12 OpenMP Polybench/C benchmarks with respect to a standard one-fits-all solution.
18:33	IP4-4, 377	NON-INTRUSIVE PROGRAM TRACING OF NON-PREEMPTIVE MULTITASKING SYSTEMS USING POWER CONSUMPTION Speaker: Kamal Lamichhane, University of Waterloo, CA Authors: Kamal Lamichhane, Carlos Moreno and Sebastian Fischmeister, University of Waterloo, CA Abstract System tracing, runtime monitoring, and execution reconstruction are useful techniques for protecting the safety and integrity of systems. Furthermore, with time-aware or overhead-aware techniques being available, these techniques can also be used to monitor and secure production systems. As operating systems gain in popularity, even in deeply embedded systems, these techniques face the challenge of supporting multitasking. In this paper, we propose a novel non-intrusive technique that efficiently reconstructs the execution trace of non-preemptive multitasking systems by observing power consumption characteristics. Our technique uses the control-flow graph (CFG) of the application program to identify the most likely block of code that the system is executing at any given point in time. For the purpose of the experimental evaluation, we first instrument the source code to obtain power consumption information for each basic block, which is used as the training data for our Dynamic Time Warping and k-Nearest Neighbours (k-NN) classifier. Once the system is trained, this technique is used to identify live code-block execution (LCBE). We show that the technique can reconstruct the execution flow of programs in a multitasking environment with high accuracy.
18:30		End of session
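
Note on IP4-4: the abstract names a Dynamic Time Warping and k-NN classifier trained on per-basic-block power traces. As a rough illustration of that general idea only (not the authors' implementation), the sketch below classifies a power trace by DTW distance to labelled training traces. The function names, the sample traces, the labels "block_A" and "block_B", and the choice of k are all hypothetical.

```python
# Illustrative sketch only: traces, labels, and function names are
# hypothetical and are not taken from the paper.

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(trace, training, k=1):
    """Majority label among the k training traces nearest to `trace` under DTW."""
    ranked = sorted(training, key=lambda item: dtw_distance(trace, item[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical training data: (power samples, basic-block label).
TRAINING = [
    ([1.0, 1.0, 3.0, 3.0, 1.0], "block_A"),
    ([1.0, 3.0, 3.0, 1.0, 1.0], "block_A"),
    ([2.0, 2.0, 2.0, 0.5, 0.5], "block_B"),
    ([2.0, 2.0, 0.5, 0.5, 0.5], "block_B"),
]

# A time-stretched trace with the same shape as the block_A examples.
observed = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 1.0]
label = classify(observed, TRAINING, k=3)
print(label)
```

DTW is used here instead of a pointwise distance because it tolerates traces of different lengths and local time stretching, which is plausible for power measurements of the same code block taken at different moments.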
