6.7 Application-Mapping Strategies for Many-Cores


Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Les Bans

Chair:
Amit Kumar Singh, University of York, GB

Co-Chair:
Marc Geilen, Eindhoven University of Technology, NL

This session deals with application performance. The first paper proposes a performance model to guide run-time mapping. The other two papers optimize performance, one by mapping tasks to match the parallelism of the underlying architecture, and the other by identifying shared memory sections to facilitate parallel execution.

Time  Label  Presentation Title / Authors
11:00  6.7.1  ADAPTIVE ON-THE-FLY APPLICATION PERFORMANCE MODELING FOR MANY CORES
Speakers:
Sebastian Kobbe, Lars Bauer and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE
Abstract
Resource management for a many-core system entails allocating cores to applications and binding the applications' tasks to particular cores. Accurate on-the-fly estimates of the application performance resulting from different core allocations are required before binding tasks to cores for efficient execution. We propose an adaptive on-the-fly application performance model that largely alleviates this increasingly important problem. It allows reacting to spontaneous workload variations and it considers topological properties of the resources. Extensive evaluations show that the average estimation error is reduced from 14.7% to 4.5%, resulting in high-quality on-the-fly adaptive application mapping. Our work is a first milestone towards optimality of systems that exhibit a high degree of spontaneous workload variations.

Download Paper (PDF; Only available from the DATE venue WiFi)
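
As a rough, hypothetical illustration of the kind of estimate such a model has to provide (this is not the authors' formulation), the C++ sketch below fits an Amdahl-style speedup curve online from observed speedups and applies a simple penalty for the topological distance between allocated cores; the class name, the serial-fraction parameter, and the per-hop penalty are all assumptions made for the example.

```cpp
// Hypothetical sketch: online performance estimator for a core allocation.
// Not the model from the paper; an Amdahl-style curve whose serial fraction
// is re-fitted from observed speedups, plus a crude topology penalty.
#include <algorithm>
#include <cstdio>

class PerfEstimator {
public:
    // Estimated speedup for n cores whose average pairwise hop distance is d.
    double estimate(int n, double avgHopDistance) const {
        double ideal = 1.0 / (serialFraction_ + (1.0 - serialFraction_) / n);
        return ideal / (1.0 + commPenalty_ * avgHopDistance);
    }

    // Adapt the serial fraction from a measured speedup (reacts to workload shifts).
    void observe(int n, double measuredSpeedup) {
        if (n <= 1 || measuredSpeedup <= 0.0) return;
        // Invert Amdahl's law to get the serial fraction implied by the sample.
        double f = (n / measuredSpeedup - 1.0) / (n - 1.0);
        f = std::clamp(f, 0.0, 1.0);
        serialFraction_ = (1.0 - alpha_) * serialFraction_ + alpha_ * f;  // EMA update
    }

private:
    double serialFraction_ = 0.1;  // initial guess, refined at run time
    double commPenalty_    = 0.05; // assumed per-hop communication penalty
    double alpha_          = 0.3;  // smoothing factor of the moving average
};

int main() {
    PerfEstimator m;
    m.observe(8, 4.2);                       // a run-time measurement feeds the model
    std::printf("est. speedup on 16 cores: %.2f\n", m.estimate(16, 2.0));
}
```

The moving-average update is what lets such an estimate track spontaneous workload variations without keeping a full history of measurements.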
11:30  6.7.2  CUSTOMIZATION OF OPENCL APPLICATIONS FOR EFFICIENT TASK MAPPING UNDER HETEROGENEOUS PLATFORM CONSTRAINTS
Speakers:
Edoardo Paone1, Francesco Robino2, Gianluca Palermo1, Vittorio Zaccaria1, Ingo Sander2 and Cristina Silvano1
1Politecnico di Milano, IT; 2KTH Royal Institute of Technology, SE
Abstract
When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes each application kernel for a given platform, and 2) a task-mapping phase that maximizes the overall application throughput by exploiting concurrency in the application task graph. The tuning phase is suitable for customizing parameterized OpenCL kernels under device-specific constraints. The mapping phase then improves task-level parallelism for multi-device execution, accounting for the overhead of memory transfers and the overheads implied by maintaining multiple OpenCL contexts for devices from different vendors. The benefits of the proposed design flow have been assessed on a stereo-matching application targeting two commercial heterogeneous platforms.

Download Paper (PDF; Only available from the DATE venue WiFi)
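
As background for the multi-device setting described above (a minimal sketch using the standard OpenCL 1.2 host API, not the authors' design flow), the snippet below enumerates the available platforms and creates one context and command queue per platform; devices from different vendors sit on different platforms and cannot share a context, which is precisely the source of the context and transfer overhead the mapping phase must account for.

```cpp
// Minimal OpenCL host setup on a heterogeneous machine: one context (and
// queue per device) per platform. Error handling is omitted for brevity.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    clGetPlatformIDs(numPlatforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        cl_uint numDevices = 0;
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &numDevices);
        if (numDevices == 0) continue;
        std::vector<cl_device_id> devices(numDevices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, numDevices, devices.data(), nullptr);

        // Separate context per vendor platform: crossing this boundary implies
        // host-side copies, which a task-mapping step must weigh against the
        // parallelism gained by using devices of both vendors.
        cl_context_properties props[] = {CL_CONTEXT_PLATFORM,
                                         (cl_context_properties)p, 0};
        cl_int err;
        cl_context ctx = clCreateContext(props, numDevices, devices.data(),
                                         nullptr, nullptr, &err);
        for (cl_device_id d : devices) {
            cl_command_queue q = clCreateCommandQueue(ctx, d, 0, &err);
            // ... build the tuned kernel variant for this device and enqueue work ...
            clReleaseCommandQueue(q);
        }
        clReleaseContext(ctx);
        std::printf("platform with %u device(s) initialised\n", numDevices);
    }
}
```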
12:00  6.7.3  ENABLING MULTI-THREADED APPLICATIONS ON HYBRID SHARED MEMORY MANYCORE ARCHITECTURES
Speakers:
Tushar Rawat and Aviral Shrivastava, Arizona State University, US
Abstract
As the number of cores per chip increases, maintaining cache coherence becomes prohibitive in both power and performance. Non-Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but become difficult to program. Some existing architectures provide a middle ground by offering some shared memory in hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To execute a multi-threaded program correctly on an HSM architecture, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. In this paper, we present a source-to-source translator, written using CETUS, that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the off-chip shared memory, thereby enabling execution on HSM architectures. This improves the performance of our benchmarks by 32x. We then identify and map the frequently accessed shared data to the on-chip shared memory, which further improves the performance of our benchmarks by 8x on average.

Download Paper (PDF; Only available from the DATE venue WiFi)
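
The placement decision described above can be pictured with a small, purely illustrative sketch (the paper's tool is a CETUS-based source-to-source translator, not a runtime library; the allocator functions below are hypothetical placeholders): objects accessed by more than one thread must live in shared memory, and the most frequently accessed of them are worth the scarce on-chip SRAM.

```cpp
// Illustrative sketch only: rank shared objects by access frequency and place
// the hottest ones in the small on-chip shared SRAM, the rest in off-chip
// shared DRAM. alloc_onchip_shared / alloc_offchip_shared are hypothetical
// placeholders for platform-specific shared-memory allocators.
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct SharedObject {
    std::size_t   size;
    unsigned long accessCount;   // profile- or compiler-estimated accesses
    void*         addr = nullptr;
};

void* alloc_onchip_shared(std::size_t n)  { return std::malloc(n); }  // placeholder
void* alloc_offchip_shared(std::size_t n) { return std::malloc(n); }  // placeholder

void place(std::vector<SharedObject>& objs, std::size_t onchipBytes) {
    // Hottest objects (accesses per byte) first: greedy fill of the on-chip budget.
    std::sort(objs.begin(), objs.end(),
              [](const SharedObject& a, const SharedObject& b) {
                  return a.accessCount * b.size > b.accessCount * a.size;
              });
    std::size_t used = 0;
    for (SharedObject& o : objs) {
        if (used + o.size <= onchipBytes) {
            o.addr = alloc_onchip_shared(o.size);   // frequently accessed shared data
            used += o.size;
        } else {
            o.addr = alloc_offchip_shared(o.size);  // remaining shared data
        }
    }
}
```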
12:30  IP3-8, 739  MAXIMIZING COMMON IDLE TIME ON MULTI-CORE PROCESSORS WITH SHARED MEMORY
Speakers:
Chenchen Fu1, Yingchao Zhao2, Minming Li1 and Jason Xue3
1Department of Computer Science, City University of Hong Kong, HK; 2Department of Computer Science, Caritas Institute of Higher Education, Hong Kong, HK; 3City University of Hong Kong, HK
Abstract
Reducing energy consumption is a critical problem in most computing systems today. This paper focuses on reducing the energy consumption of the shared main memory in multi-core processors by putting it into a sleep state when all the cores are idle. Based on this idea, this work presents a systematic analysis of different assignment and scheduling models and proposes a series of scheduling schemes to maximize the common idle time of all cores. An optimal scheduling scheme is proposed for the case where the number of cores is unbounded; when the number of cores is bounded, an efficient heuristic algorithm is proposed. The experimental results show that the heuristic algorithm works efficiently and can save as much as 25.6% of memory energy compared to a conventional multi-core scheduling scheme.

Download Paper (PDF; Only available from the DATE venue WiFi)
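
The scheduling objective can be illustrated with a toy example (not the paper's algorithm): if every core starts its work at the beginning of a common frame, the shared memory may sleep from the latest finishing time until the end of the frame, so maximizing the common idle time amounts to minimizing the makespan, approximated below with a longest-processing-time-first heuristic.

```cpp
// Toy illustration: pack tasks so that all cores are busy at the start of a
// frame and idle together afterwards; the memory can then sleep for
// frameLength - makespan. LPT heuristic, not the scheme from the paper.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

double commonIdleTime(std::vector<double> tasks, int cores, double frameLength) {
    std::sort(tasks.begin(), tasks.end(), std::greater<double>());  // LPT order
    std::vector<double> load(cores, 0.0);
    for (double t : tasks) {
        // Put each task on the currently least-loaded core.
        auto it = std::min_element(load.begin(), load.end());
        *it += t;
    }
    double makespan = *std::max_element(load.begin(), load.end());
    return std::max(0.0, frameLength - makespan);  // window in which memory may sleep
}

int main() {
    std::vector<double> tasks = {3.0, 2.0, 2.0, 1.0, 1.0};  // execution times
    std::printf("common idle time: %.1f\n", commonIdleTime(tasks, 2, 10.0));
}
```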
12:31  IP3-9, 78  MAXIMIZING IO PERFORMANCE VIA CONFLICT REDUCTION FOR FLASH MEMORY STORAGE SYSTEMS
Speakers:
Qiao Li1, Liang Shi2, Congming Gao1, Kaijie Wu1, Jason Chun Xue3, Qingfeng Zhuge1 and H.-M. Edwin Sha4
1Chongqing University, CN; 2College of Computer Science, Chongqing University, CN; 3City University of Hong Kong, HK; 4Chongqing University and University of Texas at Dallas, CN
Abstract
Flash memory has been widely deployed in recent years thanks to improvements in bit density and technology scaling. However, this development trend also introduces significant performance degradation. The latency of IO requests on flash memory storage systems is composed of access conflict latency, data transfer latency, flash chip access latency, and ECC encoding/decoding latency. Studies show that the access conflict latency, which is mainly induced by the slow transfer and access latencies, has become the dominant part of the IO latency, especially for IO-intensive applications. This paper proposes to reduce the access conflict latency through the reduction of the transfer and flash access latencies. A latency model is built to capture the relationship between the transfer latency and the access latency based on the reliability characteristics of flash memory. Simulation experiments show that the proposed approach achieves significant performance improvement.

Download Paper (PDF; Only available from the DATE venue WiFi)
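
To make the latency decomposition above concrete, here is a small sketch with assumed latency values (not the paper's model): the service time of a request is transfer + flash access + ECC, and the conflict latency is simply the time a request waits because the target chip is still busy with earlier requests.

```cpp
// Illustrative decomposition of flash IO latency (assumed numbers, not the
// paper's model): conflict latency = waiting time at a busy chip, followed by
// data transfer + flash access + ECC decoding.
#include <algorithm>
#include <cstdio>

struct Request { double arrival; };  // arrival time at the chip queue (us)

int main() {
    const double transfer = 40.0, access = 75.0, ecc = 10.0;  // assumed latencies (us)
    const double service  = transfer + access + ecc;

    Request reqs[] = {{0.0}, {20.0}, {30.0}};  // three requests to the same chip
    double chipFreeAt = 0.0;
    for (const Request& r : reqs) {
        double start    = std::max(r.arrival, chipFreeAt);
        double conflict = start - r.arrival;   // queuing behind earlier IOs
        double total    = conflict + service;  // end-to-end IO latency
        chipFreeAt      = start + service;
        std::printf("arrival %.0f us: conflict %.0f us, total %.0f us\n",
                    r.arrival, conflict, total);
    }
}
```

Because the waiting time grows with the service time of the requests ahead in the queue, shortening the transfer and access latencies also shrinks the conflict latency, which is the lever the paper exploits.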
12:30  End of session
Lunch Break; Keynote lectures from 12:50 - 14:20 (Room Oisans). Lunch is served in front of the session room Salle Oisans and in the Exhibition area.
