AFEC: An Analytical Framework for Evaluating Cache Performance in Out-of-Order Processors
Kecheng Ji1, Ming Ling1, Qin Wang1, Longxing Shi1 and Jianping Pan2
1National ASIC System Engineering Technology Research Center, Southeast University, Nanjing 210096, China.
jikecheng@seu.edu.cn
trio@seu.edu.cn
220153670@seu.edu.cn
lxshi@seu.edu.cn
2Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada.
pan@uvic.ca
ABSTRACT
Evaluating cache performance is becoming critically important for predicting the overall performance of out-of-order processors. Non-blocking caches, which are common in out-of-order CPUs, reduce the average cache miss penalty by overlapping multiple outstanding memory requests and by merging cache misses to the same cacheline address into a single memory request. Memory-level parallelism (MLP) is the metric normally used to describe this concurrency of memory accesses. Unfortunately, due to the highly dynamic dependences among a program's memory references, it is very difficult to quantify MLP without time-consuming simulations. Moreover, the merging of multiple cache misses, which makes the average cache miss service time shorter than the physical DDR access latency, is seldom considered in existing research. In this paper, we propose a cache performance evaluation framework based on program trace analysis and analytical models that quickly estimates MLP and the effective cache miss service time without simulation. Compared with Gem5 simulation results on MobyBench 2.0, MiBench 1.0 and MediaBench II, the average accuracies of the modeled MLP and the modeled average cache miss service time are higher than 91% and 92%, respectively. Combined with the cache miss counts computed by stack distance theory, the average absolute error of the estimated CPU stall time due to cache misses is lower than 10%, while the evaluation is about 35 times faster than full Gem5 simulations.
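As a rough illustration of the trace-driven approach the abstract describes, the sketch below counts cache misses from a memory trace using LRU stack distances and then folds an MLP factor into an aggregate stall-time estimate. This is a minimal sketch, not the AFEC implementation: the function names, the toy trace, the cache size, the miss latency and the MLP value are all assumed for illustration only.

# Minimal sketch (assumed, not the authors' code): stack-distance miss
# counting on a cacheline-address trace, plus an MLP-adjusted stall estimate.

def stack_distances(trace):
    """Yield the LRU stack distance of each access in the trace.

    The distance is the number of distinct cachelines touched since the
    previous access to the same line (inf for cold accesses).
    """
    stack = []  # most recently used cacheline kept at the end
    for line in trace:
        if line in stack:
            depth = len(stack) - 1 - stack.index(line)
            stack.remove(line)
            yield depth
        else:
            yield float("inf")  # compulsory (cold) miss
        stack.append(line)

def estimate_stall_cycles(trace, num_lines, miss_latency, mlp):
    """Count misses via stack distance and fold in an MLP factor.

    For a fully associative LRU cache with `num_lines` lines, an access
    misses iff its stack distance is >= num_lines.  Overlap among misses
    is approximated by dividing the aggregate miss latency by the MLP.
    """
    misses = sum(1 for d in stack_distances(trace) if d >= num_lines)
    return misses, misses * miss_latency / mlp

if __name__ == "__main__":
    # Toy cacheline-address trace and parameters, chosen only to exercise the code.
    trace = [0x10, 0x20, 0x30, 0x10, 0x40, 0x50, 0x20, 0x60, 0x10]
    misses, stall = estimate_stall_cycles(trace, num_lines=4,
                                          miss_latency=100, mlp=1.5)
    print(f"misses={misses}, estimated stall cycles={stall:.1f}")

In this toy setup, dividing the summed miss latency by an MLP value stands in for the overlapping of outstanding misses in a non-blocking cache; the paper's framework derives the MLP and the effective miss service time analytically instead of assuming them.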