Effective Cache Bank Placement for GPUs

Mohammad Sadrosadati1,a, Amirhossein Mirhosseini2, Shahin Roozkhosh1,b, Hazhir Bakhishi1,c and Hamid Sarbazi-Azad1,3,d,e
1Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
a: sadrosadati@ce.sharif.edu
b: roozkhosh@ce.sharif.edu
c: bakhishi@ce.sharif.edu
d: azad@sharif.edu
e: azad@ipm.ir
2Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, USA.
miramir@umich.edu
3Computer Science School, Institute for Research in Fundamental Sciences, Tehran, Iran.

ABSTRACT


The placement of Last-Level Cache (LLC) banks in the GPU on-chip network can significantly affect the performance of memory-intensive workloads. In this paper, we propose a placement methodology for the LLC banks that maximizes the performance of the on-chip network connecting the LLC banks to the streaming multiprocessors in GPUs. We argue that an efficient placement must be derived from a novel metric that accounts for the latency-hiding capability of GPUs through thread-level parallelism. To this end, we propose a throughput-aware metric called Effective Latency Impact (ELI). We then formulate our placement approach mathematically as an optimization problem based on the ELI metric. Because this optimization problem is NP-hard, we solve it with a heuristic. Experimental results show that our placement approach improves performance by up to 15.7% compared to the state-of-the-art placement.


