Taming Data Caches for Predictable Execution on GPU-based SoCs

Björn Forsberg1,a, Luca Benini1,2,b and Andrea Marongiu1,2,c
1Swiss Federal Institute of Technology Zürich
2University of Bologna
abjoernf@iis.ee.ethz.ch, blbenini@iis.ee.ethz.ch, ca.marongiu@iis.ee.ethz.ch

ABSTRACT


Heterogeneous SoCs (HeSoCs) typically share a single DRAM between the CPU and the GPU, making workloads susceptible to memory interference and predictable execution troublesome. State-of-the-art predictable execution models (PREM) for HeSoCs prefetch data to the GPU scratchpad memory (SPM), so that computation is insensitive to CPU-generated DRAM traffic. However, the amount of work that the small SPM sizes allow is typically insufficient to absorb CPU/GPU synchronization costs. On-chip caches are larger and would solve this issue, but have been argued to be too unpredictable due to self-evictions. We show how self-eviction can be minimized in GPU caches through careful management of prefetches, thus lowering the performance cost while retaining timing predictability.


