2.3 System Level Design Methods

Date: Tuesday 10 March 2015
Time: 11:30 - 13:00
Location / Room: Stendhal

Chair:
Yuichi Nakamura, NEC, JP

Co-Chair:
Andreas Herkersdorf, TU München, DE

This session tackles complex system-level design problems in state-of-the-art FPGA-based designs and schedulability-critical systems. The first talk proposes a runtime system assigning multi-clock domains in FPGA-based designs for minimizing the makespan of multiple tasks. The second talk studies novel multi-cycling optimization in high-level synthesis which is driven by software profiling. The third talk presents useful schedulability analysis and formulation on execution time bound for integrated modular avionic systems. Finally two IP talks propose an automated design flow for asynchronous dataflow networks to achieve better performance and area as well as feature localization for SystemC designs.

Time	Label	Presentation Title Authors
11:30	2.3.1	ONLINE BINDING OF APPLICATIONS TO MULTIPLE CLOCK DOMAINS IN SHARED FPGA-BASED SYSTEMS Speakers: Farzad Samie¹, Lars Bauer¹, Chih-Ming Hsieh² and Joerg Henkel¹ ¹Karlsruhe Institute of Technology (KIT), DE; ²Karlsruhe Institute of Technology, DE Abstract Modern FPGA-based platforms provide multiple clock domains and their frequencies can be changed at runtime by using PLLs and clock multiplexers. This is especially beneficial for platforms that run several applications simultaneously (e.g. modern wireless sensor nodes that are shared by multiple users), as different processing modules may be fed by different clock frequencies at different time windows. However, since the number of clock domains on a platform is limited, several processing modules need to share the same clock domain. In this paper, we study the problem of binding multiple applications to multiple clock domains, such that the latest finishing time of any application (i.e. the makespan) is minimized. We present an Integer Linear Programming (ILP) formulation and then propose a novel algorithm that (i) quickly identifies those applications that are dominated by others (and thus can be ignored without losing optimality) and that (ii) uses the ascending property of the optimal binding to reduce the search space. The experimental results show up to 17% makespan reduction compared to state-of-theart. The overhead when executing on a low-power SmartFusion2 SoC equipped with an ARM Cortex-M3 core is on average 8.9 ms, i.e. our algorithm is suitable for runtime decisions. Download Paper (PDF; Only available from the DATE venue WiFi)
12:00	2.3.2	PROFILING-DRIVEN MULTI-CYCLING IN FPGA HIGH-LEVEL SYNTHESIS Speakers: Stefan Hadjis¹, Andrew Canis¹, Ryoya Sobue², Yuko Hara-Azumi³, Hiroyuki Tomiyama² and Jason Anderson¹ ¹University of Toronto, CA; ²Ritsumeikan University, JP; ³Tokyo Institute of Technology, JP Abstract Multi-cycling is a well-known strategy to improve performance in digital design, wherein the required time for selected combinational paths is lengthened to multiple clock cycles (rather than just one). The approach can be applied to paths associated with computations whose results are not needed immediately -- such paths are allowed multiple clock cycles to "complete" reducing the opportunity for them to form the critical path of the circuit. In this paper, we consider multi-cycling in the high-level synthesis context (HLS) and use software profiling to guide multi-cycling optimizations. Specifically, prior to HLS, we execute the program in software with typical datasets to gather data on the number of times each code segment executes. During HLS, we then extend the schedule for infrequently executed code segments and apply multi-cycling to the dilated schedules, which exhibit greater opportunities for multi-cycling. In essence, our approach ensures that non-frequently executed code segments will not form the critical path of the HLS-generated circuit. In an experimental study targeting the Altera Stratix IV FPGA, we evaluate the impact on speed performance and area for both traditional multi-cycling, as well as the proposed software profiling-driven multi-cycling, and show that profiling-driven multi-cycling leads to a geomean speedup of over 10% across 13 benchmark circuits, with some circuit speedups in excess of 30%. Circuit area is reduced by 11%, yielding a mean 20% improvement in area-delay product. Download Paper (PDF; Only available from the DATE venue WiFi)
12:30	2.3.3	SCHEDULABILITY BOUND FOR INTEGRATED MODULAR AVIONICS PARTITIONS Speakers: Jung-Eun Kim¹, Tarek Abdelzaher² and Lui Sha¹ ¹Department of Computer Science, University of Illinois at Urbana-Champaign, US; ²University of Illinois, US Abstract In the avionics industry, as a hierarchical scheduling architecture Integrated Modular Avionics System has been widely adopted for its isolating capability. In practice, in an early development phase, a system developer does not know much about task execution times, but only task periods and IMA partition information. In such a case the schedulability bound for a task in a given partition tells a developer how much of the execution time the task can have to be schedulable. Once the developer knows the bound, then the developer can deal with any combination of execution times under the bound, which is safe in terms of schedulability. We formulate the problem as linear programming that is commonly used in the avionics industry for schedulability analysis, and compare the bound with other existing ones which are obtained with no period information. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00	IP1-3, 759	DE-ELASTISATION: FROM ASYNCHRONOUS DATAFLOWS TO SYNCHRONOUS CIRCUITS Speakers: Mahdi Jelodari Mamaghani, Jim Garside and Doug Edwards, University of Manchester, GB Abstract Whilst asynchronous VLSI programming provides a flexible abstract formalism to realise concurrent systems, the resulting performance is still an issue when adapting the flow in the industrial context. The asynchronous design paradigm provides `elasticity' which enables the system to tolerate delays in communication and computation; the drawback is that it imposes a communication overhead to the system which becomes prohibitively expensive when applied at a fine-grained level. This paper proposes a 'de-elastisation' technique in a CAD flow for asynchronous dataflow networks to improve the circuits' performance and area. To preserve the architectural advantages of asynchronous design (e.g. short cycles) the type of circuits are classified into blocking and non-blocking loops upon which our de-elastisation scheme relies. The technique is incorporated in the Teak CAD flow. Experimental results on several substantial case studies show significant performance and area improvement. This work shows 3x improvement for the first category of circuits, suitable for iterative realisations and DSP-like architectures and 4x for the second category which are suitable for concurrent realisations. Download Paper (PDF; Only available from the DATE venue WiFi)
13:01	IP1-4, 677	AUTOMATED FEATURE LOCALIZATION FOR DYNAMICALLY GENERATED SYSTEMC DESIGNS Speakers: Jannis Stoppe¹, Robert Wille¹ and Rolf Drechsler² ¹University of Bremen, DE; ²University of Bremen/DFKI GmbH, DE Abstract Due to the large complexity of today's circuits and systems, all components e.g. in a System of Chip (SoC) cannot be designed from scratch anymore. As a consequence, designers frequently work on components which they did not create themselves and, hence, design understanding becomes a critical issue. Approaches for feature localization may help here by pinpointing to distinguished characteristics of a design. However, existing approaches for feature localization of SoCs mainly focused on the Register Transfer Level; existing solutions for the Electronic System Level (using languages such as SystemC) have severe limits. In this work, we propose an approach for advanced feature localization in SystemC designs. By this, we overcome main limitations of previously proposed solutions, in particular the missing support for dynamic descriptions, while keep the proposed solution as non-intrusive as possible. The benefits of the proposed approach are confirmed by means of a case study. Download Paper (PDF; Only available from the DATE venue WiFi)
13:00		End of session Lunch Break, Keynote session from 1320 - 1420 (Room Oisans) sponsored by Mentor Graphics in front of the session room Salle Oisans and in the Exhibition area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00