AFEC: An Analytical Framework for Evaluating Cache Performance in Out-of-Order Processors

Kecheng Ji1, Ming Ling1, Qin Wang1, Longxing Shi1 and Jianping Pan2
1National ASIC System Engineering Technology Research Center, Southeast University, Nanjing 210096, China.
jikecheng@seu.edu.cn
trio@seu.edu.cn
220153670@seu.edu.cn
lxshi@seu.edu.cn
2Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada.
pan@uvic.ca

ABSTRACT


Evaluating cache performance is becoming critically important for predicting the overall performance of out-of-order processors. Non-blocking caches, which are common in out-of-order CPUs, reduce the average cache miss penalty by overlapping multiple outstanding memory requests and by merging cache misses to the same cacheline address into a single memory request. Memory-level parallelism (MLP) is normally used as a metric to describe the concurrency of memory accesses. Unfortunately, due to the highly dynamic dependences among a program's memory references, it is very difficult to quantify MLP without time-consuming simulations. Moreover, the merging of multiple cache misses, which makes the average cache miss service time shorter than the physical DDR access latency, is seldom considered in existing research. In this paper, we propose a cache performance evaluation framework based on program trace analysis and analytical models that quickly estimates MLP and the effective cache miss service time without simulations. Compared with Gem5 simulation results for MobyBench 2.0, MiBench 1.0 and MediaBench II, the average accuracy of the modeled MLP and of the average cache miss service time is higher than 91% and 92%, respectively. Combined with the cache miss counts calculated by stack distance theory, the average absolute error of the estimated CPU stall time (due to cache misses) is lower than 10%, while the evaluation is sped up by a factor of 35 relative to full Gem5 simulations.
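For readers unfamiliar with the stack distance model referenced above, the following is a minimal illustrative sketch (not the framework proposed in the paper): it counts misses for a hypothetical fully associative LRU cache by checking whether each reference's reuse distance, measured in distinct cachelines, fits within the cache capacity. The function name, parameters, and 64-byte line size are assumptions made only for this example.

```python
from collections import OrderedDict

def count_misses_by_stack_distance(addresses, num_lines, line_size=64):
    """Count misses of a fully associative LRU cache using stack distances.

    A reference hits iff the stack distance of its cacheline (the number of
    distinct cachelines touched since the previous access to the same line)
    is smaller than the cache capacity in lines.
    """
    stack = OrderedDict()            # LRU stack: most recently used line last
    misses = 0
    for addr in addresses:
        line = addr // line_size     # map byte address to cacheline address
        if line in stack:
            # Stack distance = number of distinct lines accessed more recently.
            depth = len(stack) - 1 - list(stack).index(line)
            stack.pop(line)
            if depth >= num_lines:   # reuse is too far away: capacity miss
                misses += 1
        else:
            misses += 1              # first touch: compulsory (cold) miss
        stack[line] = None           # move the line to the MRU position
    return misses

# Example: sweep 16 cachelines twice through an 8-line cache.
# Every reference misses (16 cold misses + 16 capacity misses = 32).
trace = list(range(0, 64 * 16, 64)) * 2
print(count_misses_by_stack_distance(trace, num_lines=8))
```

In the framework described by the abstract, such trace-derived miss counts are combined with the analytically modeled MLP and effective miss service time to estimate the CPU stall time attributable to cache misses.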
