2.7 Compilation and Code Transformations for Reconfigurable Computing

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Les Bans

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Marco Platzner, University of Paderborn, DE

This session presents techniques for efficient compilation to CGRAs and a code transformation approach to enhance embedded system security.

Time	Label	Presentation Title / Authors
11:30	2.7.1	JOINT AFFINE TRANSFORMATION AND LOOP PIPELINING FOR MAPPING NESTED LOOP ON CGRA Speakers: Shouyi Yin¹, Dajiang Liu¹, Leibo Liu², Shaojun Wei¹ and Yike Guo³ ¹Tsinghua University, CN; ²Institute of Microelectronics and The National Lab for Information Science and Technology, Tsinghua University, CN; ³Imperial College London, GB Abstract Coarse-Grained Reconfigurable Architectures (CGRAs) are promising architectures that offer high performance, high power efficiency and flexibility. The computation-intensive portions of an application, i.e. loops, are often implemented on CGRAs for acceleration. Loop pipelining techniques are usually used to exploit the parallelism of loops. However, for nested loops, existing loop pipelining methods often result in poor hardware utilization and low execution performance. To tackle this problem, this paper makes two contributions: 1) a pipelining-beneficial affine transformation method which can optimize the initiation interval (II) of a nested loop and enable the merging of multiple loop pipelines; 2) a multi-pipeline merging method which can further improve hardware utilization. The experimental results show that our approach can improve the performance of nested loops by up to 56% on average, compared to state-of-the-art techniques.
12:00	2.7.2	PATH SELECTION BASED ACCELERATION OF CONDITIONALS IN CGRAS Speakers: Shri Hari Rajendran Radhika, Aviral Shrivastava and Mahdi Hamzeh, Arizona State University, US Abstract Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achieving high performance at low power consumption. While CGRAs can efficiently accelerate loop kernels, accelerating loops with control flow (loops with if-then-else structures) is quite challenging. Existing techniques handle control flow with predication: they execute operations from both paths but commit only the results of operations from the path taken by the branch at run time. However, this results in increased resource usage and therefore poor mapping and lower acceleration. The state-of-the-art dual-issue scheme fetches instructions from both paths but executes only the ones from the correct path; this scheme, however, incurs an overhead in instruction fetch bandwidth. In this paper, we propose a solution in which, after resolving the branching condition, we fetch and execute instructions only from the path taken by the branch. Experimental results show that our solution achieves 34.6% better performance and 52.1% lower energy consumption on average compared to the state-of-the-art dual-issue scheme, without imposing any overhead in instruction fetch bandwidth.
12:30	2.7.3	HARDWARE-ASSISTED CODE OBFUSCATION FOR FPGA SOFT MICROPROCESSORS Speakers: Meha Kainth, Lekshmi Krishnan, Chaitra Narayana, Sandesh Virupaksha and Russell Tessier, University of Massachusetts, US Abstract Soft microprocessors are vital components of many embedded FPGA systems. As the application domain for FPGAs expands, the security of the software used by soft processors increases in importance. Although software confidentiality approaches (e.g. encryption) are effective, code obfuscation is known to be an effective enhancement that further deters attackers from understanding the code. The availability of specialization in FPGAs provides a unique opportunity for code obfuscation on a per-application basis with minimal hardware overhead. In this paper, we describe a new technique to obfuscate soft microprocessor code which is located outside the FPGA chip in an unprotected area. Our approach provides customizable, data-dependent control flow modification to make it difficult for attackers to understand program behavior. The application of the approach to three benchmarks illustrates a control flow cyclomatic complexity increase of about 7x with a modest logic overhead for the soft processor.
13:00	IP1-10, 97	A UNIFIED HARDWARE/SOFTWARE MPSOC SYSTEM CONSTRUCTION AND RUN-TIME FRAMEWORK Speakers: Sam Skalicky¹, Andrew Schmidt², Matthew French² and Sonia Lopez¹ ¹Rochester Institute of Technology, US; ²USC/ISI, US Abstract With the continual enhancement of heterogeneous resources in FPGA devices, utilizing these resources becomes a challenging burden for developers. Especially with the inclusion of sophisticated multiple processor system-on-chips, the necessary skill set to effectively leverage these resources spans both hardware and software expertise. The maturation of high-level synthesis tools and programming languages aims to alleviate these complexities, yet there still exist systematic gaps that must be bridged to provide a more cohesive hardware/software development environment. High-level MPSoC design initiatives such as Redsharc have reduced the costs of entry, simplifying application implementation. We propose a unified hardware/software framework for system construction, leveraging Redsharc's APIs, efficient on-chip interconnects, and run-time controllers. We present system-level abstractions that enable compilation and implementation tools for hardware and software to be merged into a single configurable system development environment. Finally, we demonstrate our proposed framework with Redsharc, using AES encryption/decryption spanning software implementations on ARM and MicroBlaze processors and hardware kernels.
13:01	IP1-11, 15	(AS)^2: ACCELERATOR SYNTHESIS USING ALGORITHMIC SKELETONS FOR RAPID DESIGN SPACE EXPLORATION Speakers: Shakith Fernando¹, Mark Wijtvliet¹, Cedric Nugteren¹, Akash Kumar² and Henk Corporaal³ ¹Eindhoven University of Technology, NL; ²National University of Singapore, SG; ³TU/e (Eindhoven University of Technology), NL Abstract Hardware accelerators in heterogeneous multiprocessor system-on-chips are becoming popular as a means of meeting the performance and energy efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Therefore, time-consuming manual iterations are required to explore efficient accelerator alternatives: the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)^2: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithm's functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels), enabling FPGA-specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs for the accelerators can be efficiently and rapidly synthesized. We show that (AS)^2 is a promising direction for accelerator synthesis, as it generates a Pareto front of 8 design points in under half an hour for each of the three image processing kernels.
13:02	IP1-12, 646	ASSISTED GENERATION OF FRAME CONDITIONS FOR FORMAL MODELS Speakers: Philipp Niemann, Frank Hilken, Martin Gogolla and Robert Wille, University of Bremen, DE Abstract Modeling languages such as UML or SysML allow for the validation and verification of the structure and the behavior of designs even in the absence of a specific implementation. However, formal models inherit a severe drawback: Most of them hardly provide a comprehensive and determinate description of transitions from one system state to another. This problem can be addressed by additionally specifying so-called frame conditions. However, only naive "workarounds" based on trivial heuristics or completely relying on a manual creation have been proposed for their generation thus far. In this work, we aim for a solution which neither leaves the burden of generating frame conditions entirely on the designer (avoiding the introduction of another time-consuming and expensive design step) nor is completely automatic (which, due to ambiguities, is not possible anyway). For this purpose, a systematic design methodology for the assisted generation of frame conditions is proposed.
13:03	IP1-13, 1052	TOWARDS A META-LANGUAGE FOR THE CONCURRENCY CONCERN IN DSLS Speakers: Julien Deantoni¹, Papa Issa Diallo², Ciprian Teodorov², Joel Champeau² and Benoit Combemale³ ¹I3S, University of Nice Sophia Antipolis, FR; ²Lab-STICC - ENSTA Bretagne, FR; ³IRISA, University of Rennes 1, FR Abstract Concurrency is of primary interest in the development of complex software-intensive systems, as well as in their deployment on modern platforms. Furthermore, Domain-Specific Languages (DSLs) are increasingly used in industrial processes to separate and abstract the various concerns of complex systems. However, reifying the definition of the DSL concurrency remains a challenge. This not only prevents leveraging the concurrency concern of a particular domain or platform, but it also hinders: (1) the development of a complete understanding of the DSL semantics; (2) the effectiveness of concurrency-aware analysis techniques; (3) the analysis of the deployment on parallel architectures. In this paper, we introduce the key ideas leading toward MoCCML, a dedicated meta-language for formally specifying the concurrency concern within the definition of a DSL. The concurrency constraints can reflect the knowledge in a particular domain, but also the constraints of a particular platform. MoCCML comes with a complete language workbench to help a DSL designer define the concurrency directly within the concepts of the DSL itself, and a generic workbench to simulate and analyze any model conforming to this DSL. MoCCML is illustrated on the definition of a lightweight extension of SDF (Synchronous Data Flow).
13:00		End of session
		Lunch Break in front of the session room Salle Oisans and in the exhibition area; Keynote session from 13:20 - 14:20 (Room Oisans), sponsored by Mentor Graphics

Coffee Break in Exhibition Area
On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break
On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015
Coffee Break 10:30 - 11:30
Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans), sponsored by Mentor Graphics
Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015
Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:30; Keynote lectures from 12:50 - 14:20 (Room Oisans)
Coffee Break 16:00 - 17:00

Thursday, March 12, 2015
Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:00; Keynote lecture from 13:20 - 13:50
Coffee Break 15:30 - 16:00