Taming Data Caches for Predictable Execution on GPU-based SoCs

Björn Forsberg1,a, Luca Benini1,2,b and Andrea Marongiu1,2,c
1Swiss Federal Institute of Technology Zürich
2University of Bologna
abjoernf@iis.ee.ethz.ch, blbenini@iis.ee.ethz.ch, ca.marongiu@iis.ee.ethz.ch

ABSTRACT


Heterogeneous SoCs (HeSoCs) typically share a single DRAM between the CPU and the GPU, making workloads susceptible to memory interference and predictable execution troublesome. State-of-the-art predictable execution models (PREM) for HeSoCs prefetch data to the GPU scratchpad memory (SPM), so that computation is insensitive to CPU-generated DRAM traffic. However, the amount of work that the small SPM sizes allow is typically insufficient to absorb CPU/GPU synchronization costs. On-chip caches are larger and would solve this issue, but have been argued to be too unpredictable due to self-evictions. We show how self-eviction can be minimized in GPU caches through careful management of prefetches, thus lowering the performance cost while retaining timing predictability.


