A Mechanism for Energy-Efficient Reuse of Decoding and Scheduling of x86 Instruction Streams

Marcelo Brandalero and Antonio Carlos S. Beck
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil.


Current superscalar x86 processors decompose each CISC instruction (variable-length, with multiple addressing modes) into several RISC-like μops at runtime so that they can be pipelined and scheduled for concurrent execution. This challenging and power-hungry process is usually repeated many times on the same instruction sequence, each time producing the very same decoded and scheduled μops. We therefore propose a transparent mechanism that saves the result of decoding and scheduling for later reuse, so that the next time the same instruction sequence is encountered it automatically bypasses the costly pipeline stages involved. We use a coarse-grained reconfigurable array to store this transformation, since its structure allows μops to be recovered already allocated in time and space, and it also exploits more instruction-level parallelism (ILP) than a superscalar processor. The technique reduces the energy consumption of a powerful 8-issue superscalar processor by 31.4% at a low area cost, while also improving performance by 32.6%.
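
The reuse idea can be illustrated with a minimal software sketch. It is not the hardware mechanism itself: the names `ReuseCache` and `toy_decode_and_schedule`, the use of the sequence's start address as the lookup key, and the modelling of instructions as `(dest, sources)` pairs are all assumptions made for illustration. The toy scheduler places each micro-operation in the first array level after its producers (only read-after-write dependences are modelled), and the cache returns the saved placement on a repeated sequence instead of recomputing it.

```python
from dataclasses import dataclass, field


def toy_decode_and_schedule(instructions):
    """Stand-in for the costly front end (hypothetical).

    Each instruction is a (dest, sources) pair. A micro-op is placed one
    level after the latest level that produces any of its sources, so
    independent micro-ops share a level and dependent ones are ordered.
    """
    ready_level = {}  # register -> level at which its value is produced
    levels = []       # levels[i] holds the micro-ops placed in level i
    for dest, srcs in instructions:
        level = max((ready_level.get(s, -1) for s in srcs), default=-1) + 1
        while len(levels) <= level:
            levels.append([])
        levels[level].append((dest, srcs))
        ready_level[dest] = level
    return levels


@dataclass
class ReuseCache:
    """Hypothetical table: sequence start address -> saved placement."""
    entries: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def fetch(self, start_pc, instructions):
        config = self.entries.get(start_pc)
        if config is not None:
            self.hits += 1    # reuse: decoding and scheduling are skipped
            return config
        self.misses += 1      # first encounter: pay the full cost once
        config = toy_decode_and_schedule(instructions)
        self.entries[start_pc] = config
        return config


# Example: the same three-instruction sequence is encountered twice.
sequence = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r0",))]
cache = ReuseCache()
first = cache.fetch(0x400000, sequence)
second = cache.fetch(0x400000, sequence)
```

In this sketch the second `fetch` returns the stored placement without calling the scheduler again, which mirrors how a saved configuration lets a repeated sequence bypass the decode and schedule stages.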

Keywords: x86, Superscalar, Dynamic optimization, Instruction-level parallelism.
