8.6 Designing reliable embedded architectures under uncertainty


Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 4

Chair:
Oliver Bringmann, Universität Tübingen, DE

Co-Chair:
Amit Singh, University of Essex, GB

Reliability can be achieved in several ways, for instance by using fault-tolerant architectures or by exploiting adaptable architectures. The session presents original contributions in both directions. In the first paper, reconfigurable VLIW processors are targeted by means of dynamic binary translation to explore a performance-energy trade-off. The following two papers propose solutions for fault prevention, detection and isolation without compromising performance. In the last paper, the potential use of approximate, low-power functional units is explored while remaining within the overall error budget of an application.

Time  Label  Presentation Title
Authors
17:00  8.6.1  SUPPORTING RUNTIME RECONFIGURABLE VLIWS CORES THROUGH DYNAMIC BINARY TRANSLATION
Speaker:
Simon Rokicki, Univ Rennes, INRIA, CNRS, IRISA, FR
Authors:
Simon Rokicki1, Erven Rohou2 and Steven Derrien3
1Irisa, FR; 2Inria, FR; 3University of Rennes 1/IRISA, FR
Abstract
Single ISA Heterogeneous multi-cores such as the ARM big.LITTLE have proven to be an attractive solution to explore different energy/performance trade-offs. Such architectures combine Out of Order cores with smaller in-order ones to offer different power/energy profiles. They however do not really exploit the characteristics of workloads (compute intensive vs control dominated). In this work, we propose to enrich these architectures with runtime configurable VLIW cores, which are very efficient at compute intensive kernels. To preserve the single ISA programming model, we resort to Dynamic Binary Translation, and use this technique to enable dynamic code specialization for runtime reconfigurable VLIW cores. Our proposed DBT framework targets the RISC-V ISA, for which both OoO and in-order implementations exist. Our experimental results show that our approach can lead to best-case performance and energy efficiency when compared against static VLIW configurations.

17:30  8.6.2  USFI: ULTRA-LIGHTWEIGHT SOFTWARE FAULT ISOLATION FOR IOT-CLASS DEVICES
Speaker:
Zelalem Aweke, University of Michigan, US
Authors:
Zelalem Birhanu Aweke and Todd Austin, University of Michigan, US
Abstract
Embedded device security is a particularly difficult challenge, as the quantity of devices makes them attractive targets, while their cost-sensitive design leads to less-than-desirable security implementations. Most current low-end embedded devices do not include any form of security or only include simple memory protection support. One line of research in crafting low-cost security for low-end embedded devices has focused on sandboxing trusted code from untrusted code using both hardware and software techniques. These previous attempts suffer from large trusted code bases (e.g., including the entire kernel), high runtime overheads (e.g., due to code instrumentation), partial protection (e.g., only provide write protection), or heavyweight hardware modifications. In this work, we leverage the rudimentary memory protection support found in modern IoT-class microcontrollers to build a low-profile, low-overhead, flexible sandboxing mechanism that can provide isolation between tightly-coupled software modules. With our approach, named uSFI, only the trust management code must be trusted. Through the use of a static verifier and monitored inter-module transitions, module code at all privilege levels (including the kernel) is able to run uninstrumented and untrusted code. We implemented uSFI on an ARMv7-M based processor, both bare metal and running the FreeRTOS kernel, and analyzed the performance using the MiBench embedded benchmark suite and two additional highly detailed applications. We found that performance overheads were minimal, with at most 1.1% slowdown, and code size overheads were also low, at a maximum of 10%. In addition, our trusted code base is trivially small at only 150 lines of code.

18:00  8.6.3  CONVERGING SAFETY AND HIGH-PERFORMANCE DOMAINS: INTEGRATING OPENMP INTO ADA
Speaker:
Sara Royuela, Barcelona Supercomputing Center, ES
Authors:
Sara Royuela1, Eduardo Quinones1 and Luis Miguel Pinho2
1Barcelona Supercomputing Center, ES; 2Polytechnic Institute of Porto, PT
Abstract
The use of parallel heterogeneous embedded architectures is needed to reach the level of performance required in advanced safety-critical systems. Hence, there is a demand for high-level parallel programming models capable of efficiently exploiting the performance opportunities. In this paper, we evaluate the incorporation of OpenMP, a parallel programming model used in HPC, into Ada, a language widely used in safety-critical domains. We demonstrate that the execution model of OpenMP is compatible with the recently proposed Ada tasklet model, meant to exploit fine-grain structured parallelism. Moreover, we show the compatibility of the OpenMP and tasklet models, enabling the use of OpenMP directives in Ada to further exploit unstructured parallelism and heterogeneous computation. Finally, we state the safety properties of OpenMP and analyze the interoperability between the OpenMP and Ada runtimes. Overall, we conclude that OpenMP can be effectively incorporated into Ada without jeopardizing its safety properties.

18:15  8.6.4  COMPILER-DRIVEN ERROR ANALYSIS FOR DESIGNING APPROXIMATE ACCELERATORS
Speaker:
Jorge Castro-Godínez, Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), DE
Authors:
Jorge Castro-Godínez1, Sven Esser1, Muhammad Shafique2, Santiago Pagani1 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2TU Wien, AT
Abstract
Approximate Computing has emerged as a design paradigm suitable to applications with inherent error resilience. This paradigm aims to reduce the associated computing costs (such as execution time, area, or energy) of exact calculations by reducing the quality of their results. Several approximate arithmetic circuits have been proposed, which can be used to implement hardware blocks such as approximate accelerators. However, to satisfy quality constraints in these accelerators, it is imperative to assess how the errors introduced by approximate circuits propagate through other exact and approximate computations, and finally accumulate at the output. This is, in particular, crucial to enable high-level synthesis of approximate accelerators. This work proposes a compiler-driven error analysis methodology to evaluate the behavior of errors generated from approximate adders in the design of approximate accelerators. We present CEDA, a tool to perform a static analysis of the error propagation. This tool uses #pragma-based annotated C/C++ source code as input. With these annotations, exact additions are replaced by approximate ones during the code analysis to estimate the error at the output. The error estimations produced by our tool are comparable to those obtained through simulations.
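To make the notion of error propagation and accumulation concrete, the following is a minimal sketch of the simulation-style reference that such error estimates are compared against; it is not the CEDA tool or its static analysis. The abstract does not name a specific adder, so the lower-part-OR design used here, along with the names `approx_add`, `error_stats` and `accumulate`, is an assumption chosen for illustration.

```python
from itertools import product


def approx_add(a: int, b: int, k: int) -> int:
    """Hypothetical approximate adder (assumed lower-part-OR style).

    The low k bits are OR-ed instead of added, and no carry is passed
    from the low part into the exactly added high part.
    """
    mask = (1 << k) - 1
    low = (a | b) & mask
    high = ((a >> k) + (b >> k)) << k
    return high | low


def error_stats(width: int, k: int):
    """Exhaustively compare approximate and exact sums of unsigned operands.

    Returns (mean absolute error, worst-case absolute error).
    """
    total = 0
    worst = 0
    count = 0
    for a, b in product(range(1 << width), repeat=2):
        err = abs((a + b) - approx_add(a, b, k))
        total += err
        worst = max(worst, err)
        count += 1
    return total / count, worst


def accumulate(values, k: int) -> int:
    """Chain approximate additions so that errors accumulate at the output."""
    acc = 0
    for v in values:
        acc = approx_add(acc, v, k)
    return acc


if __name__ == "__main__":
    mean_err, worst_err = error_stats(width=4, k=2)
    print("mean error:", mean_err, "worst error:", worst_err)
    data = [3, 3, 3]
    print("exact:", sum(data), "approximate:", accumulate(data, k=2))
```

In this sketch the error of a single addition equals the bitwise AND of the two low parts, so chaining additions, as in `accumulate`, can let the output error grow well beyond the single-adder worst case. That growth is the effect an error analysis has to bound.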

18:31  IP4-2, 105  PARALLEL CODE GENERATION OF SYNCHRONOUS PROGRAMS FOR A MANY-CORE ARCHITECTURE
Speaker:
Amaury Graillat, Verimag - Univ. Grenoble Alpes, FR
Authors:
Amaury Graillat1, Matthieu Moy2, Pascal Raymond3 and Benoît Dupont de Dinechin4
1Verimag - Univ. Grenoble Alpes, FR; 2Univ. Grenoble Alpes, Verimag, FR; 3VERIMAG/CNRS, FR; 4Kalray, FR
Abstract
Embedded systems tend to require more and more computational power. Many-core architectures are good candidates since they offer power and are considered more time predictable than classical multi-cores. Data-flow Synchronous languages such as Lustre or Scade are widely used for avionic critical software. Programs are described by networks of computational nodes. Implementation of such programs on a many-core architecture must ensure a bounded response time and preserve the functional behavior by taking interference into account. We consider the top-level node of a Lustre application as a software architecture description where each sub-node corresponds to a potential parallel task. Given a mapping (tasks to cores), we automatically generate code suitable for the targeted many-core architecture. This minimizes memory interferences and allows usage of a framework to compute the Worst-Case Response Time.

18:32  IP4-3, 272  SOCRATES - A SEAMLESS ONLINE COMPILER AND SYSTEM RUNTIME AUTOTUNING FRAMEWORK FOR ENERGY-AWARE APPLICATIONS
Speaker:
Gianluca Palermo, Politecnico di Milano, IT
Authors:
Davide Gadioli1, Ricardo Nobre2, Pedro Pinto3, Emanuele Vitali1, Amir H. Ashouri4, Gianluca Palermo1, Cristina Silvano1 and João M. P. Cardoso5
1Politecnico di Milano, IT; 2University of Porto / INESC TEC, PT; 3Faculty of Engineering, University of Porto, PT; 4University of Toronto, Canada, CA; 5University of Porto, PT
Abstract
Configuring program parallelism and selecting optimal compiler options according to the underlying platform architecture is a difficult task if left entirely to the programmer or done by using a default one-fits-all policy generated by the compiler or runtime system. Given the dynamics of the problem, a runtime selection of the best configuration is the desirable solution. However, implementing this solution in the application requires the insertion of a lot of glue code for profiling and runtime selection, which is a programming wall that stands in the way of making it feasible. This paper presents a structured approach called SOCRATES, based on a Domain Specific Language (LARA) and a runtime autotuner (mARGOt), to alleviate this effort. LARA has been used to hide the glue code insertion, thus separating the pure functional application description from extra-functional requirements. mARGOt has been used for the automatic selection of the best configuration according to the runtime evolution of the application. To demonstrate the effectiveness of the proposed approach, we evaluated SOCRATES by varying the application workloads, hardware resources and energy efficiency requirements for 12 OpenMP Polybench/C benchmarks with respect to a standard one-fits-all solution.

18:33  IP4-4, 377  NON-INTRUSIVE PROGRAM TRACING OF NON-PREEMPTIVE MULTITASKING SYSTEMS USING POWER CONSUMPTION
Speaker:
Kamal Lamichhane, University of Waterloo, CA
Authors:
Kamal Lamichhane, Carlos Moreno and Sebastian Fischmeister, University of Waterloo, CA
Abstract
System tracing, runtime monitoring, and execution reconstruction are useful techniques for protecting the safety and integrity of systems. Furthermore, with time-aware or overhead-aware techniques being available, these techniques can also be used to monitor and secure production systems. As operating systems gain in popularity, even in deeply embedded systems, these techniques face the challenge of supporting multitasking. In this paper, we propose a novel non-intrusive technique, which efficiently reconstructs the execution trace of a non-preemptive multitasking system by observing power consumption characteristics. Our technique uses the control-flow graph (CFG) of the application program to identify the most likely block of code that the system is executing at any given point in time. For the purpose of the experimental evaluation, we first instrument the source code to obtain power consumption information for each basic block, which is used as the training data for our Dynamic Time Warping and k-Nearest Neighbours (k-NN) classifier. Once the system is trained, this technique is used to identify live code-block execution (LCBE). We show that the technique can reconstruct the execution flow of programs in a multi-tasking environment with high accuracy.
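As a rough illustration of how Dynamic Time Warping and k-NN can be combined to label a power trace, the sketch below classifies a trace by its DTW distance to labelled training traces. It is a generic textbook formulation under stated assumptions, not the authors' implementation: the function names, the block labels and the sample values are hypothetical, and the CFG-based narrowing of candidate blocks described in the abstract is omitted.

```python
import math
from collections import Counter


def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def knn_classify(trace, training, k=1):
    """Return the majority label among the k training traces nearest in DTW distance.

    `training` is a list of (label, samples) pairs.
    """
    ranked = sorted(training, key=lambda item: dtw_distance(trace, item[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical power samples per basic block (arbitrary units).
    training = [
        ("block_loop", [1, 5, 1, 5]),
        ("block_idle", [1, 1, 1, 1]),
    ]
    observed = [1, 1, 5, 1, 5]  # a time-stretched variant of block_loop
    print(knn_classify(observed, training))
```

DTW tolerates traces that are stretched or compressed in time, which is why it is a natural distance for comparing power measurements of the same code block taken at slightly different speeds.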

18:30  End of session