Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints
Edoardo Paone1,a, Francesco Robino2,e, Gianluca Palermo1,b, Vittorio Zaccaria1,c,
Ingo Sander2,f and Cristina Silvano1,d
1Politecnico di Milano, Italy.
2KTH Royal Institute of Technology, Sweden.
When an OpenCL application targets a platform with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes each application kernel for a given platform and 2) a task-mapping phase that maximizes the overall application throughput by exploiting concurrency in the application task graph. The tuning phase customizes parameterized OpenCL kernels under device-specific constraints. The mapping phase then improves task-level parallelism for multi-device execution while accounting for the overhead of memory transfers implied by the use of multiple OpenCL contexts for different device vendors. The benefits of the proposed design flow have been assessed on a stereo-matching application targeting two commercial heterogeneous platforms.
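The two phases can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the exhaustive parameter search, and the greedy earliest-finish-time heuristic are all assumptions chosen for brevity, and the sketch minimizes the completion time of a single task-graph execution as a simple stand-in for the throughput objective described above. Execution times and the transfer cost are assumed to come from profiling.

```python
from itertools import product


def tune_kernel(param_space, device, cost_fn, is_feasible):
    """Phase 1 (hypothetical): exhaustively search a kernel's parameter space
    and return the cheapest configuration that satisfies the device constraints."""
    names = sorted(param_space)
    best = None
    for values in product(*(param_space[n] for n in names)):
        cfg = dict(zip(names, values))
        if not is_feasible(cfg, device):
            continue  # violates a device-specific constraint
        cost = cost_fn(cfg, device)
        if best is None or cost < best[1]:
            best = (cfg, cost)
    return best  # None if no configuration is feasible


def map_tasks(tasks, deps, exec_time, transfer_cost, devices):
    """Phase 2 (hypothetical): greedy mapping of a task graph onto devices.

    `tasks` must be in topological order. Each task goes to the device on
    which it finishes earliest; a fixed `transfer_cost` is charged whenever
    a predecessor ran on a different device (assumed cross-context copy).
    """
    device_free = {d: 0.0 for d in devices}
    finish, placement = {}, {}
    for t in tasks:
        best = None
        for d in devices:
            # Input data is ready once every predecessor has finished and,
            # if needed, its output has been transferred to device d.
            ready = max(
                (finish[p] + (transfer_cost if placement[p] != d else 0.0)
                 for p in deps.get(t, [])),
                default=0.0,
            )
            end = max(ready, device_free[d]) + exec_time[t][d]
            if best is None or end < best[0]:
                best = (end, d)
        finish[t], placement[t] = best
        device_free[best[1]] = best[0]
    return placement, max(finish.values())
```

In this sketch the tuning result for each kernel and device would supply the entries of `exec_time`, so the mapping step only sees the best feasible configuration per device.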