Accurate Private/Shared Classification of Memory Accesses: A Run-time Analysis System for the LEON3 Multi-core Processor

Nam Ho (a), Ishraq Ibne Ashraf, Paul Kaufmann (b) and Marco Platzner (c)
Department of Computer Science, University of Paderborn, Germany.
(a) namh@mail.upb.de
(b) paul.kaufmann@gmail.com
(c) platzner@upb.de

ABSTRACT


Related work has presented simulation-based experiments that classify data accesses in a shared-memory multi-core processor as private or shared. This information can be used to selectively turn cache coherency mechanisms on or off for individual data blocks, which can save memory bus bandwidth, reduce energy consumption, and shorten application runtimes. In this paper we present an implementation of a private/shared classification mechanism on a LEON3 SPARC multi-core processor running the Linux 2.6 kernel. Our mechanism is page-based and classifies and counts data accesses at run-time. Compared to previous work, our system provides more accurate, i.e., more realistic, data because it includes a real multi-core architecture and an OS. Additionally, our prototype allows us to quantitatively evaluate the overhead of the classification mechanism. We test our system with sequential and parallel benchmarks from the MiBench, ParaMibench, PARSEC, and SPLASH2 application suites. The results show that parallel benchmarks are promising targets for selectively controlling coherency mechanisms and that the run-time overheads induced by our mechanism are rather small.
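
To illustrate what a page-based private/shared classification can look like, the following is a minimal software sketch, not the mechanism implemented in the paper. It assumes a first-touch policy (a page is private to the first core that accesses it and becomes permanently shared once a different core accesses it) and a 4 KiB page size. The class name `PageClassifier` and all of its fields are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


@dataclass
class PageClassifier:
    """Hypothetical first-touch, page-granular private/shared classifier."""
    owner: dict = field(default_factory=dict)  # page number -> first core to touch it
    shared: set = field(default_factory=set)   # pages touched by more than one core
    private_accesses: int = 0
    shared_accesses: int = 0

    def access(self, core: int, address: int) -> str:
        """Classify one data access and update the access counters."""
        page = address // PAGE_SIZE
        if page not in self.owner:
            # First touch: the page is private to this core.
            self.owner[page] = core
        elif self.owner[page] != core:
            # A second core touched the page: it is shared from now on.
            self.shared.add(page)
        if page in self.shared:
            self.shared_accesses += 1
            return "shared"
        self.private_accesses += 1
        return "private"


classifier = PageClassifier()
classifier.access(core=0, address=0x1000)  # first touch by core 0
classifier.access(core=1, address=0x1008)  # same page, different core
```

In this sketch, only accesses to pages in the `shared` set would need coherency actions; accesses counted in `private_accesses` are the candidates for which coherency could be switched off.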


