Energy-Efficient Two-level Instruction Cache Design for an Ultra-Low-Power Multi-core Cluster
Chen Jie1, Igor Loi2, Luca Benini3,a and Davide Rossi3,b
1DEI University of Bologna Grenoble, France
jie.chen@greenwaves-technologies.com
2Research and Development GreenWaves Technologies Grenoble, France
igor.loi@greenwaves-technologies.com
3DEI University of Bologna Bologna, Italy
a luca.benini@unibo.it
b davide.rossi@unibo.it
ABSTRACT
High energy efficiency and high performance are the key requirements for Internet of Things (IoT) edge devices. Exploiting a cluster of multiple programmable processors has recently emerged as a suitable solution to address this challenge. However, one of the main power bottlenecks for multi-core architectures is the instruction cache memory. We propose a two-level structure based on Standard Cell Memories (SCMs) that combines a private per-core instruction cache (L1) with a low-latency (single-cycle) shared instruction cache (L1.5). We present a detailed comparison of performance and energy efficiency for different instruction cache architectures. Our system-level analysis shows that the proposed design improves upon both state-of-the-art private and shared cache architectures and balances performance well with energy efficiency. On average, when executing a set of real-life IoT applications, our multi-level cache improves both performance and energy efficiency by 10% with respect to the private instruction cache system, and improves energy efficiency by 15% and 7% with a performance loss of only 2% with respect to the shared instruction cache. In addition, its relaxed timing makes the two-level instruction cache an attractive choice for aggressive implementations, leaving more slack for convergence in physical design.
Keywords: Instruction Cache, Parallel Architecture, Energy Efficiency, Relaxed-Timing.