2.7 Compilation and Code Transformations for Reconfigurable Computing

Printer-friendly version PDF version

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Les Bans

Chair:
Dirk Stroobandt, Ghent University, BE

Co-Chair:
Marco Platzner, University of Paderborn, DE

This session presents techniques for efficient compilation to CGRAs and a code transformation approach to enhance embedded system security.

TimeLabelPresentation Title
Authors
11:302.7.1JOINT AFFINE TRANSFORMATION AND LOOP PIPELINING FOR MAPPING NESTED LOOP ON CGRA
Speakers:
Shouyi YIN1, Dajiang Liu1, Leibo Liu2, Shaojun Wei1 and Yike Guo3
1Tsinghua University, CN; 2Institute of Microelectronics and The National Lab for Information Science and Technology, Tsinghua University, CN; 3Imperial College, London, UK, GB
Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) are the promising architectures with high performance, high power- efficiency and attractions of flexibility. The computation-intensive portions of application, i.e. loops, are often implemented on CGRAs for acceleration. The loop pipelining techniques are usually used to exploit the parallelism of loops. However, for nested loops, the existing loop pipelining methods often result in poor hardware utilization and low execution performance. To tackle this problem, this paper makes two contributions: 1) a pipelining-beneficial affine transformation method which can optimize the initiation interval (II) of nested loop and enable multiple loop pipelines merging; 2) a multi-pipeline merging method which can improve hardware utilization further. The experimental results show that our approach can improve the performance of nested loop by up to 56% on average, as compared to the state-of-the-art techniques.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:002.7.2PATH SELECTION BASED ACCELERATION OF CONDITIONALS IN CGRAS
Speakers:
Shri Hari Rajendran Radhika, Aviral Shrivastava and Mahdi Hamzeh, Arizona State University, US
Abstract
Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achieving high performance at low power consumption. While CGRAs can efficiently accelerate loop kernels, accelerating loops with control flow (loops with if-then-else structures) is quite challenging. Existing techniques use predication to handle control flow execution - in which they execute operations from both the paths, but commit only the result of operations from the path taken by branch at run time. However, this results in increased resource usage and therefore poor mapping and lower acceleration. The state-of-the- art dual issue scheme fetches instructions from both the paths, but executes only the ones from the correct path but this scheme has an overhead in instruction fetch bandwidth. In this paper, we propose a solution in which after resolving the branching condition, we fetch and execute instructions only from the path taken by branch. Experimental results show that our solution achieves 34.6% better performance and 52.1% lower energy consumption on an average compared to state of the art dual issue scheme without imposing any overhead in instruction fetch bandwidth.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:302.7.3HARDWARE-ASSISTED CODE OBFUSCATION FOR FPGA SOFT MICROPROCESSORS
Speakers:
Meha Kainth, Lekshmi Krishnan, Chaitra Narayana, Sandesh Virupaksha and Russell Tessier, University of Massachusetts, US
Abstract
Soft microprocessors are vital components of many embedded FPGA systems. As the application domain for FPGAs expands, the security of the software used by soft processors increases in importance. Although software confidentiality approaches (e.g. encryption) are effective, code obfuscation is known to be an effective enhancement that further deters code understanding for attackers. The availability of specialization in FPGAs provides a unique opportunity for code obfuscation on a per-application basis with minimal hardware overhead. In this paper we describe a new technique to obfuscate soft microprocessor code which is located outside the FPGA chip in an unprotected area. Our approach provides customizable, data-dependent control flow modification to make it difficult for attackers to easily understand program behavior. The application of the approach to three benchmarks illustrates a control flow cyclomatic complexity increase of about 7x with a modest logic overhead for the soft processor.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00IP1-10, 97A UNIFIED HARDWARE/SOFTWARE MPSOC SYSTEM CONSTRUCTION AND RUN-TIME FRAMEWORK
Speakers:
Sam Skalicky1, Andrew Schmidt2, Matthew French2 and Sonia Lopez1
1Rochester Institute of Technology, US; 2USC/ISI, US
Abstract
With the continual enhancement of heterogeneous resources in FPGA devices, utilizing these resources becomes a challenging burden for developers. Especially with the inclusion of sophisticated multiple processor system-on-chips, the necessary skill set to effectively leverage these resources spans both hardware and software expertise. The maturation of high level synthesis tools and programming languages aim to alleviate these complexities, yet there still exist systematic gaps that must be bridged to provide a more cohesive hardware/software development environment. High level MPSoC design initiatives such as Redsharc have reduced the costs of entry, simplifying application implementation. We propose a unified hardware/software framework for system construction, leveraging Redsharc's APIs, efficient on-chip interconnects, and run-time controllers. We present system level abstractions that enable compilation and implementation tools for hardware and software to be merged into a single configurable system development environment. Finally, we demonstrate our proposed framework with Redsharc, using AES encryption/decryption spanning software implementations on ARM and MicroBlaze processors and hardware kernels.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:01IP1-11, 15(AS)^2: ACCELERATOR SYNTHESIS USING ALGORITHMIC SKELETONS FOR RAPID DESIGN SPACE EXPLORATION
Speakers:
Shakith Fernando1, Mark Wijtvliet1, Cedric Nugteren1, Akash Kumar2 and Henk Corporaal3
1Eindhoven University of Technology, NL; 2National University of Singapore, SG; 3TU/e (Eindhoven University of Technology), NL
Abstract
Hardware accelerators in heterogeneous multiprocessor system-on-chips are becoming popular as a means of meeting performance and energy efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Therefore, time consuming manual iterations are required to explore efficient accelerator alternatives: the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)^2: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithms' functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels), enabling FPGA specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs, for the accelerators, can be efficiently and rapidly synthesized. We show that (AS)^2 is a promising direction for accelerator synthesis as it generates a Pareto front of 8 design points in under half an hour, for each of the three image processing kernels.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:02IP1-12, 646ASSISTED GENERATION OF FRAME CONDITIONS FOR FORMAL MODELS
Speakers:
Philipp Niemann, Frank Hilken, Martin Gogolla and Robert Wille, University of Bremen, DE
Abstract
Modeling languages such as UML or SysML allow for the validation and verification of the structure and the behavior of designs even in the absence of a specific implementation. However, formal models inherit a severe drawback: Most of them hardly provide a comprehensive and determinate description of transitions from one system state to another. This problem can be addressed by additionally specifying so-called frame conditions. However, only naive "workarounds" based on trivial heuristics or completely relying on a manual creation have been proposed for their generation thus far. In this work, we aim for a solution which neither leaves the burden of generating frame conditions entirely on the designer (avoiding the introduction of another time-consuming and expensive design step) nor is completely automatic (which, due to ambiguities, is not possible anyway). For this purpose, a systematic design methodology for the assisted generation of frame conditions is proposed.

Download Paper (PDF; Only available from the DATE venue WiFi)
13:03IP1-13, 1052TOWARDS A META-LANGUAGE FOR THE CONCURRENCY CONCERN IN DSLS
Speakers:
Julien Deantoni1, Papa Issa Diallo2, Ciprian Teodorov2, Joel Champeau2 and Benoit Combemale3
1I3S, University of Nice Sophia Antipolis, FR; 2Lab-STICC - ENSTA Bretagne, FR; 3IRISA, Universty of Rennes1, FR
Abstract
Concurrency is of primary interest in the development of complex software-intensive systems, as well as the deployment on modern platforms. Furthermore, Domain-Specific Languages (DSLs) are increasingly used in industrial processes to separate and abstract the various concerns of complex systems. % However, reifying the definition of the DSL concurrency remains a challenge. This not only prevents leveraging the concurrency concern of a particular domain or platform, but it also hinders: (1) the development of a complete understanding of the DSL semantics; (2) the effectiveness of concurrency-aware analysis techniques; (3) the analysis of the deployment on parallel architectures. % In this paper, we introduce the key ideas leading toward MoCCML, a dedicated meta-language for formally specifying the concurrency concern within the definition of a DSL. The concurrency constraints can reflect the knowledge in a particular domain, but also the constraints of a particular platform. MoCCML comes with a complete language workbench to help a DSL designer in the definition of the concurrency directly within the concepts of the DSL itself, and a generic workbench to simulate and analyze any model conforming to this DSL. % MoCCML is illustrated on the definition of an lightweight extension of SDF (Synchronous Data Flow).

Download Paper (PDF; Only available from the DATE venue WiFi)
13:00End of session
Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00