doi: 10.3850/978-3-9815370-4-8_0303
Exploiting Dynamic Timing Margins in Microprocessors for Frequency-Over-Scaling with Instruction-Based Clock Adjustment
Jeremy Constantin1,a, Lai Wang2, Georgios Karakonstantis1,b, Anupam Chattopadhyay3 and Andreas Burg1,c
1Telecommunications Circuits Laboratory, Institute of Electrical Engineering, EPFL, Switzerland.
ajeremy.constantin@epfl.ch
bgeorgios.karakonstantis@epfl.ch
candreas.burgg@epfl.ch
2MPSoC Architectures Research Group, UMIC, RWTH Aachen University, Germany
3School of Computer Engineering, NTU, Singapore
ABSTRACT
Static timing analysis provides the basis for setting the clock period of a microprocessor core, based on its worst-case critical path. However, depending on the design, this critical path is not always excited, so dynamic timing margins exist that can in principle be exploited for higher speed or lower power consumption (through voltage scaling). This paper introduces predictive instruction-based dynamic clock adjustment as a technique to trim dynamic timing margins in pipelined microprocessors. To this end, we exploit the different timing requirements of individual instructions during the dynamically varying program execution flow, without the need for complex circuit-level measures to detect and correct timing violations. We provide a design flow that extracts the dynamic timing information of the design using post-layout dynamic timing analysis, and we integrate the results into a custom cycle-accurate simulator. This simulator allows annotation of individual instructions with their impact on timing (in each pipeline stage) and rapidly derives the overall code execution time for complex benchmarks. The design methodology is illustrated at the microarchitecture level, demonstrating the performance and power gains possible on a 6-stage OpenRISC in-order general-purpose processor core in a 28 nm CMOS technology. We show that instruction-dependent dynamic clock adjustment increases operating speed by 38% on average, or alternatively reduces power consumption by 24% on average, compared to traditional synchronous clocking, which must respect at all times the worst-case timing identified through static timing analysis.
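The idea of annotating each instruction with its per-stage timing and deriving the total execution time can be illustrated with a minimal sketch. It is not the paper's simulator: the delay table, the instruction names, and the helper functions `cycle_periods` and `execution_times` are hypothetical, the delay values are made up for illustration, and the pipeline is assumed to be an ideal in-order pipeline without stalls or flushes. In each cycle the clock period is set to the largest delay among the instructions currently occupying the stages, and the result is compared with a fixed clock at the worst-case period.

```python
from typing import Dict, List, Tuple

STAGES = 6  # pipeline depth, matching the 6-stage core described in the abstract

# Hypothetical per-instruction, per-stage delays in ns (illustrative values only,
# not taken from the paper's post-layout timing analysis).
DELAYS: Dict[str, List[float]] = {
    "add": [1.0, 1.0, 1.1, 1.4, 1.0, 1.0],
    "mul": [1.0, 1.0, 1.1, 2.0, 1.0, 1.0],
    "lw":  [1.0, 1.0, 1.2, 1.3, 1.8, 1.0],
    "nop": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

# A static clock must cover the worst delay of any instruction in any stage.
STATIC_PERIOD = max(max(stage_delays) for stage_delays in DELAYS.values())


def cycle_periods(program: List[str]) -> List[float]:
    """Return the required clock period for every cycle of the program.

    Assumes an ideal in-order pipeline: in cycle c, stage s holds
    instruction c - s (if that index exists). Empty stages impose no
    timing constraint in this simplified model.
    """
    n = len(program)
    periods = []
    for c in range(n + STAGES - 1):
        worst = 0.0
        for s in range(STAGES):
            i = c - s
            if 0 <= i < n:
                worst = max(worst, DELAYS[program[i]][s])
        periods.append(worst)
    return periods


def execution_times(program: List[str]) -> Tuple[float, float]:
    """Return (dynamic, static) total execution time in ns."""
    periods = cycle_periods(program)
    dynamic = sum(periods)                  # clock adjusted every cycle
    static = STATIC_PERIOD * len(periods)   # fixed worst-case clock
    return dynamic, static


if __name__ == "__main__":
    dyn, sta = execution_times(["add", "lw", "add", "mul", "nop", "add"])
    print(f"dynamic: {dyn:.1f} ns, static: {sta:.1f} ns")
```

In this toy model the dynamic total can never exceed the static one, because every per-cycle period is bounded by the worst-case period. A real flow would also have to account for clock-generation granularity and for pipeline stalls, which are left out here.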