Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications

Arian Maghazeh1, Sudipta Chattopadhyay2, Petru Eles1 and Zebo Peng1
1Department of Computer and Information Science, Linköping University, Sweden
2Singapore University of Technology and Design, Singapore

ABSTRACT


We present a software approach to address the data latency issue for certain GPU applications. Each application is modeled as a kernel graph, in which the nodes represent individual GPU kernels and the edges capture data dependencies. Our technique exploits the GPU L2 cache to accelerate parameter passing between kernels. The key idea is that, instead of having each kernel process the entire input in one invocation, we subdivide the input into fragments (which fit in the cache) and, ideally, process each fragment in one continuous sequence of kernel invocations. Our technique is oblivious to kernel functionality and requires minimal source-code modification. We demonstrate it on a full-fledged image-processing application and improve performance by an average of 30% across various settings.
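A minimal CUDA sketch of the key idea, under assumptions of our own: the kernel pair (scaleKernel, offsetKernel), the tiled driver processTiled, and the tile size are hypothetical illustrations, not the authors' implementation. It shows how a chain of dependent kernels can be launched back-to-back on one cache-sized fragment at a time, so that the intermediate data is likely still resident in the L2 cache when the next kernel consumes it.

#include <cuda_runtime.h>
#include <algorithm>

// Hypothetical kernel pair: the first produces an intermediate buffer,
// the second consumes it. Bodies are illustrative placeholders.
__global__ void scaleKernel(const float *in, float *tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = 2.0f * in[i];
}

__global__ void offsetKernel(const float *tmp, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + 1.0f;
}

// Tiled driver: instead of running each kernel once over the whole input,
// split the input into fragments sized to fit in the L2 cache and run the
// whole kernel chain on one fragment before moving to the next, so the
// intermediate buffer is (ideally) still cache-resident when it is reused.
void processTiled(const float *d_in, float *d_tmp, float *d_out,
                  int total, int tileElems /* chosen to fit in L2 */) {
    const int threads = 256;
    for (int off = 0; off < total; off += tileElems) {
        int n = std::min(tileElems, total - off);
        int blocks = (n + threads - 1) / threads;
        // Back-to-back launches on the same fragment.
        scaleKernel<<<blocks, threads>>>(d_in + off, d_tmp + off, n);
        offsetKernel<<<blocks, threads>>>(d_tmp + off, d_out + off, n);
    }
}

In the untiled baseline, scaleKernel would stream the entire intermediate buffer through L2 before offsetKernel reads it back from DRAM; with fragments small enough to stay cached, the second kernel's reads are served from L2 instead.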


