6.7 Application-Mapping Strategies for Many-Cores


Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Les Bans

Chair:
Amit Kumar Singh, University of York, GB

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL

This session deals with application performance. The first paper proposes a performance model to guide run-time mapping. The other two papers optimize performance, one by mapping tasks to match the parallelism of the underlying architecture, and the other by identifying shared memory sections to facilitate parallel execution.

Time  Label  Presentation Title / Authors
11:00  6.7.1  ADAPTIVE ON-THE-FLY APPLICATION PERFORMANCE MODELING FOR MANY CORES
Speakers:
Sebastian Kobbe, Lars Bauer and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE
Abstract
Resource management for a many-core system entails allocating cores to applications and binding the applications' tasks to particular cores. Accurate on-the-fly estimates of the application performance resulting from different core allocations are required before binding tasks to cores for efficient execution. We propose an adaptive on-the-fly application performance model that largely alleviates this increasingly important problem. It allows reacting to spontaneous workload variations and it considers topological properties of the resources. Extensive evaluations show that the average estimation error is reduced from 14.7% to 4.5%, resulting in high-quality on-the-fly adaptive application mapping. Our work is a first milestone towards optimality of systems that exhibit a high degree of spontaneous workload variations.

Download Paper (PDF; Only available from the DATE venue WiFi)
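
As a rough, hypothetical illustration of the kind of estimate such a model has to provide (this is not the authors' formulation), the C++ sketch below fits an Amdahl-style speedup curve online from observed speedups and applies a simple penalty for the topological distance between allocated cores; the class name, the serial-fraction parameter, and the per-hop penalty are all assumptions made for the example.

```cpp
// Hypothetical sketch: online performance estimator for a core allocation.
// Not the model from the paper; an Amdahl-style curve whose serial fraction
// is re-fitted from observed speedups, plus a crude topology penalty.
#include <algorithm>
#include <cstdio>

class PerfEstimator {
public:
    // Estimated speedup for n cores whose average pairwise hop distance is d.
    double estimate(int n, double avgHopDistance) const {
        double ideal = 1.0 / (serialFraction_ + (1.0 - serialFraction_) / n);
        return ideal / (1.0 + commPenalty_ * avgHopDistance);
    }

    // Adapt the serial fraction from a measured speedup (reacts to workload shifts).
    void observe(int n, double measuredSpeedup) {
        if (n <= 1 || measuredSpeedup <= 0.0) return;
        // Invert Amdahl's law to get the serial fraction implied by the sample.
        double f = (n / measuredSpeedup - 1.0) / (n - 1.0);
        f = std::clamp(f, 0.0, 1.0);
        serialFraction_ = (1.0 - alpha_) * serialFraction_ + alpha_ * f;  // EMA update
    }

private:
    double serialFraction_ = 0.1;  // initial guess, refined at run time
    double commPenalty_    = 0.05; // assumed per-hop communication penalty
    double alpha_          = 0.3;  // smoothing factor of the moving average
};

int main() {
    PerfEstimator m;
    m.observe(8, 4.2);                       // a run-time measurement feeds the model
    std::printf("est. speedup on 16 cores: %.2f\n", m.estimate(16, 2.0));
}
```

The moving-average update is what lets such an estimate track spontaneous workload variations without keeping a full history of measurements.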
11:30  6.7.2  CUSTOMIZATION OF OPENCL APPLICATIONS FOR EFFICIENT TASK MAPPING UNDER HETEROGENEOUS PLATFORM CONSTRAINTS
Speakers:
Edoardo Paone1, Francesco Robino2, Gianluca Palermo1, Vittorio Zaccaria1, Ingo Sander2 and Cristina Silvano1
1Politecnico di Milano, IT; 2KTH Royal Institute of Technology, SE
Abstract
When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes each application kernel for a given platform, and 2) a task-mapping phase that maximizes the overall application throughput by exploiting concurrency in the application task graph. The tuning phase is suitable for customizing parameterized OpenCL kernels under device-specific constraints. The mapping phase then improves task-level parallelism for multi-device execution, accounting for the overhead of memory transfers and the overheads implied by maintaining multiple OpenCL contexts for devices from different vendors. The benefits of the proposed design flow have been assessed on a stereo-matching application targeting two commercial heterogeneous platforms.

Download Paper (PDF; Only available from the DATE venue WiFi)
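
As background for the multi-device setting described above (a minimal sketch using the standard OpenCL 1.2 host API, not the authors' design flow), the snippet below enumerates the available platforms and creates one context and command queue per platform; devices from different vendors sit on different platforms and cannot share a context, which is precisely the source of the context and transfer overhead the mapping phase must account for.

```cpp
// Minimal OpenCL host setup on a heterogeneous machine: one context (and
// queue per device) per platform. Error handling is omitted for brevity.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    clGetPlatformIDs(numPlatforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        cl_uint numDevices = 0;
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &numDevices);
        if (numDevices == 0) continue;
        std::vector<cl_device_id> devices(numDevices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, numDevices, devices.data(), nullptr);

        // Separate context per vendor platform: crossing this boundary implies
        // host-side copies, which a task-mapping step must weigh against the
        // parallelism gained by using devices of both vendors.
        cl_context_properties props[] = {CL_CONTEXT_PLATFORM,
                                         (cl_context_properties)p, 0};
        cl_int err;
        cl_context ctx = clCreateContext(props, numDevices, devices.data(),
                                         nullptr, nullptr, &err);
        for (cl_device_id d : devices) {
            cl_command_queue q = clCreateCommandQueue(ctx, d, 0, &err);
            // ... build the tuned kernel variant for this device and enqueue work ...
            clReleaseCommandQueue(q);
        }
        clReleaseContext(ctx);
        std::printf("platform with %u device(s) initialised\n", numDevices);
    }
}
```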
12:00  6.7.3  ENABLING MULTI-THREADED APPLICATIONS ON HYBRID SHARED MEMORY MANYCORE ARCHITECTURES
Speakers:
Tushar Rawat and Aviral Shrivastava, Arizona State University, US
Abstract
As the number of cores per chip increases, maintaining cache coherence becomes prohibitive in both power and performance. Non-Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but become difficult to program. Some existing architectures provide a middle ground by offering some shared memory in hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To execute a multi-threaded program correctly on an HSM architecture, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. In this paper, we present a source-to-source translator, written using CETUS, that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the off-chip shared memory, thereby enabling execution on HSM architectures. This improves the performance of our benchmarks by 32x. We then identify and map the frequently accessed shared data to the on-chip shared memory, which further improves the performance of our benchmarks by 8x on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
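
The placement decision described above can be pictured with a small, purely illustrative sketch (the paper's tool is a CETUS-based source-to-source translator, not a runtime library; the allocator functions below are hypothetical placeholders): objects accessed by more than one thread must live in shared memory, and the most frequently accessed of them are worth the scarce on-chip SRAM.

```cpp
// Illustrative sketch only: rank shared objects by access frequency and place
// the hottest ones in the small on-chip shared SRAM, the rest in off-chip
// shared DRAM. alloc_onchip_shared / alloc_offchip_shared are hypothetical
// placeholders for platform-specific shared-memory allocators.
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct SharedObject {
    std::size_t   size;
    unsigned long accessCount;   // profile- or compiler-estimated accesses
    void*         addr = nullptr;
};

void* alloc_onchip_shared(std::size_t n)  { return std::malloc(n); }  // placeholder
void* alloc_offchip_shared(std::size_t n) { return std::malloc(n); }  // placeholder

void place(std::vector<SharedObject>& objs, std::size_t onchipBytes) {
    // Hottest objects (accesses per byte) first: greedy fill of the on-chip budget.
    std::sort(objs.begin(), objs.end(),
              [](const SharedObject& a, const SharedObject& b) {
                  return a.accessCount * b.size > b.accessCount * a.size;
              });
    std::size_t used = 0;
    for (SharedObject& o : objs) {
        if (used + o.size <= onchipBytes) {
            o.addr = alloc_onchip_shared(o.size);   // frequently accessed shared data
            used += o.size;
        } else {
            o.addr = alloc_offchip_shared(o.size);  // remaining shared data
        }
    }
}
```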
12:30  IP3-8, 739  MAXIMIZING COMMON IDLE TIME ON MULTI-CORE PROCESSORS WITH SHARED MEMORY
Speakers:
Chenchen Fu1, Yingchao Zhao2, Minming Li1 and Jason Xue3
1Department of Computer Science, City University of Hong Kong, HK; 2Department of Computer Science, Caritas Institute of Higher Education, Hong Kong, HK; 3City University of Hong Kong, HK
Abstract
Reducing energy consumption is a critical problem in most computing systems today. This paper focuses on reducing the energy consumption of the shared main memory in multi-core processors by putting it into a sleep state when all the cores are idle. Based on this idea, this work presents a systematic analysis of different assignment and scheduling models and proposes a series of scheduling schemes to maximize the common idle time of all cores. An optimal scheduling scheme is proposed for the case where the number of cores is unbounded; when the number of cores is bounded, an efficient heuristic algorithm is proposed. The experimental results show that the heuristic algorithm works efficiently and can save as much as 25.6% of memory energy compared to a conventional multi-core scheduling scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
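
The scheduling objective can be illustrated with a toy example (not the paper's algorithm): if every core starts its work at the beginning of a common frame, the shared memory may sleep from the latest finishing time until the end of the frame, so maximizing the common idle time amounts to minimizing the makespan, approximated below with a longest-processing-time-first heuristic.

```cpp
// Toy illustration: pack tasks so that all cores are busy at the start of a
// frame and idle together afterwards; the memory can then sleep for
// frameLength - makespan. LPT heuristic, not the scheme from the paper.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

double commonIdleTime(std::vector<double> tasks, int cores, double frameLength) {
    std::sort(tasks.begin(), tasks.end(), std::greater<double>());  // LPT order
    std::vector<double> load(cores, 0.0);
    for (double t : tasks) {
        // Put each task on the currently least-loaded core.
        auto it = std::min_element(load.begin(), load.end());
        *it += t;
    }
    double makespan = *std::max_element(load.begin(), load.end());
    return std::max(0.0, frameLength - makespan);  // window in which memory may sleep
}

int main() {
    std::vector<double> tasks = {3.0, 2.0, 2.0, 1.0, 1.0};  // execution times
    std::printf("common idle time: %.1f\n", commonIdleTime(tasks, 2, 10.0));
}
```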
12:31  IP3-9, 78  MAXIMIZING IO PERFORMANCE VIA CONFLICT REDUCTION FOR FLASH MEMORY STORAGE SYSTEMS
Speakers:
Qiao Li1, Liang Shi2, Congming Gao1, Kaijie Wu1, Jason Chun Xue3, Qingfeng Zhuge1 and H.-M. Edwin Sha4
1Chongqing University, CN; 2College of Computer Science, Chongqing University, CN; 3City University of Hong Kong, HK; 4Chongqing University and University of Texas at Dallas, CN
Abstract
Flash memory has been widely deployed in recent years thanks to improvements in bit density and technology scaling. However, this development trend also introduces significant performance degradation. The latency of IO requests on flash memory storage systems is composed of access conflict latency, data transfer latency, flash chip access latency, and ECC encoding/decoding latency. Studies show that the access conflict latency, which is mainly induced by the slow transfer and access latencies, has become the dominant part of the IO latency, especially for IO-intensive applications. This paper proposes to reduce the access conflict latency through the reduction of the transfer and flash access latencies. A latency model is built to capture the relationship between the transfer latency and the access latency based on the reliability characteristics of flash memory. Simulation experiments show that the proposed approach achieves significant performance improvement.

Download Paper (PDF; Only available from the DATE venue WiFi)
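
To make the latency decomposition above concrete, here is a small sketch with assumed latency values (not the paper's model): the service time of a request is transfer + flash access + ECC, and the conflict latency is simply the time a request waits because the target chip is still busy with earlier requests.

```cpp
// Illustrative decomposition of flash IO latency (assumed numbers, not the
// paper's model): conflict latency = waiting time at a busy chip, followed by
// data transfer + flash access + ECC decoding.
#include <algorithm>
#include <cstdio>

struct Request { double arrival; };  // arrival time at the chip queue (us)

int main() {
    const double transfer = 40.0, access = 75.0, ecc = 10.0;  // assumed latencies (us)
    const double service  = transfer + access + ecc;

    Request reqs[] = {{0.0}, {20.0}, {30.0}};  // three requests to the same chip
    double chipFreeAt = 0.0;
    for (const Request& r : reqs) {
        double start    = std::max(r.arrival, chipFreeAt);
        double conflict = start - r.arrival;   // queuing behind earlier IOs
        double total    = conflict + service;  // end-to-end IO latency
        chipFreeAt      = start + service;
        std::printf("arrival %.0f us: conflict %.0f us, total %.0f us\n",
                    r.arrival, conflict, total);
    }
}
```

Because the waiting time grows with the service time of the requests ahead in the queue, shortening the transfer and access latencies also shrinks the conflict latency, which is the lever the paper exploits.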
12:30  End of session
Lunch Break; Keynote lectures from 12:50 - 14:20 (Room Oisans). Lunch is served in front of the session room Salle Oisans and in the Exhibition area.
