DATE 2017

Effective Cache Bank Placement for GPUs

Mohammad Sadrosadati^1,a, Amirhossein Mirhosseini^2, Shahin Roozkhosh^1,b, Hazhir Bakhishi^1,c and Hamid Sarbazi-Azad^1,3,d,e
¹Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
^a sadrosadati@ce.sharif.edu
^b roozkhosh@ce.sharif.edu
^c bakhishi@ce.sharif.edu
^d azad@sharif.edu
^e azad@ipm.ir
²Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, USA.
miramir@umich.edu
³Computer Science School, Institute for Research in Fundamental Sciences, Tehran, Iran.

ABSTRACT

The placement of last-level cache (LLC) banks in the GPU on-chip network can significantly affect the performance of memory-intensive workloads. In this paper, we propose a placement methodology for the LLC banks that maximizes the performance of the on-chip network connecting them to the streaming multiprocessors. We argue that an efficient placement must be derived from a metric that accounts for the GPU's ability to hide latency through thread-level parallelism. To this end, we propose a novel throughput-aware metric called Effective Latency Impact (ELI). We then formulate the placement approach mathematically as an optimization problem based on the ELI metric. Because this problem is NP-hard, we solve it with a heuristic. Experimental results show that our placement approach improves performance by up to 15.7% compared to the state-of-the-art placement.
