CAMP: Accurate Modeling of Core and Memory Locality for Proxy Generation of Big-data Applications

Reena Panda (1, a), Xinnian Zheng (2), Andreas Gerstlauer (1, c) and Lizy Kurian John (1, b)
(1) The University of Texas at Austin
(a) reena.panda@utexas.edu
(b) ljohn@ece.utexas.edu
(c) gerstl@ece.utexas.edu
(2) NVIDIA
xzheng1@utexas.edu

ABSTRACT


Fast and accurate design-space exploration is a critical requirement for enabling future hardware designs. However, big-data applications are often difficult targets to evaluate on early performance models (e.g., simulators or RTL models) owing to their complex software stacks, significantly long run times, system dependencies, and the limited speed of performance models. To overcome the challenges of benchmarking complex big-data applications, in this paper we propose CAMP, a proxy generation methodology that generates miniature proxy benchmarks, which are representative of the performance of big-data applications yet converge to results quickly without needing any complex software-stack support. Prior system-level proxy generation techniques model core locality features in detail but abstract out memory locality using simple stride-based models, which results in poor cloning accuracy for most applications. CAMP accurately models both core performance and memory locality, along with the feedback loop between the two. CAMP replicates core performance by modeling the dependencies between instructions, instruction types, control-flow behavior, etc. CAMP also adds a memory locality profiling approach that captures the spatial and temporal locality of applications. Finally, we propose a novel proxy replay methodology that integrates the core and memory locality models to create accurate system-level proxy benchmarks. We demonstrate that CAMP proxies can mimic the original application's performance behavior and that they capture the performance feedback loop well. For a variety of real-world big-data applications, we show that CAMP achieves an average cloning accuracy of 89%. We believe this is a new capability that can facilitate overall system (core and memory subsystem) design exploration.
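The abstract contrasts stride-based memory models with profiling that also captures temporal locality. As a minimal sketch, assuming a memory trace given as a list of integer addresses, the Python below computes a simplified global stride histogram and an LRU reuse (stack) distance histogram, a standard temporal-locality metric. The function names are hypothetical, and this is an illustration of the general idea, not CAMP's actual profiler or the exact stride model used by prior proxy generators.

```python
from collections import Counter, OrderedDict


def stride_histogram(trace):
    """Simplified stride view (hypothetical helper): histogram of address
    deltas between consecutive accesses in the trace."""
    return Counter(b - a for a, b in zip(trace, trace[1:]))


def reuse_distance_histogram(trace):
    """Temporal view (hypothetical helper): histogram of LRU stack distances.

    The stack distance of an access is the number of distinct addresses
    touched since the previous access to the same address. First-time
    (cold) accesses are recorded under the key None.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    hist = Counter()
    for addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            # Distinct addresses used more recently than addr sit after it.
            distance = len(keys) - 1 - keys.index(addr)
            hist[distance] += 1
            stack.move_to_end(addr)  # addr becomes most recently used
        else:
            hist[None] += 1
            stack[addr] = True
    return hist
```

For example, the traces `[0, 64, 128, 64]` and `[0, 64, 0, 64]` have identical stride histograms (`{64: 2, -64: 1}`), yet the first has three cold accesses and one reuse at distance 1, while the second has two cold accesses and two reuses at distance 1. A stride-only model cannot distinguish them; a reuse-distance profile can. The linear-time list scan keeps the sketch short; a real profiler would use a tree-based structure to handle long traces.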


