Low-Power Variation-Aware Cores based on Dynamic Data-Dependent Bitwidth Truncation

Ioannis Tsiokanos (a), Lev Mukhanov (b) and Georgios Karakonstantis (c)
Institute of Electronics, Communications and Information Technology (ECIT), School of EEECS, Queen’s University Belfast
(a) itsiokanos01@qub.ac.uk
(b) l.mukhanov@qub.ac.uk
(c) g.karakonstantis@qub.ac.uk

ABSTRACT


Increasing variability of transistor parameters in the nanoscale era renders modern circuits prone to timing failures. To address such failures, designers adopt pessimistic timing/voltage guardbands, which are estimated under rare worst-case conditions and thus lead to power and performance overheads. Recent approximation schemes based on precision reduction may help limit these overheads, but they reduce the precision statically in all operations. This results in unnecessary quality loss, since these schemes neglect the fact that only a few long latency paths (LLPs) may be prone to failures, and such paths may be activated rarely. In this paper, we propose a variation-aware framework that minimizes quality loss by dynamically truncating the bitwidth only for operands that trigger the LLPs. This is achieved by predicting at runtime the excitation of the LLPs based on the processed operands. The applied truncation, which we implement by setting a number of least-significant bits to a constant value of zero, can effectively reduce the delay of the excited LLPs, providing sufficient timing slack to avoid failures without using conservative guardbands. To facilitate the adoption of such a scheme within pipelined cores and limit the incurred overheads, we also shape the path distribution so that the LLPs are isolated in a single pipeline stage. Additionally, to evaluate the efficacy of our framework, we perform post-layout dynamic timing analysis based on real operands that we extract from a variety of applications. When applied to the implementation of an IEEE-754 compatible double-precision floating-point unit (FPU) in a 45nm technology, our approach eliminates timing failures under 8% delay variations with no performance loss. Our design comes at a cost of up to 4.48% power and 0.34% area overheads, while the occasional operand truncation incurs minimal quality loss, with a relative error of up to 4.1 × 10⁻⁶. Finally, when compared to an FPU with pessimistic margins, our technique can save up to 44.3% power.
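
As a rough software illustration of the truncation described above, the following sketch zeroes the k least-significant fraction bits of an IEEE-754 double and applies that truncation only when a predicate flags the operands. It is a minimal sketch under stated assumptions, not the hardware design: the names `truncate_lsbs`, `guarded_multiply` and `excites_llp` are hypothetical, the predicate merely stands in for the runtime LLP predictor (whose logic is not given here), and the value k = 20 in the usage note is arbitrary.

```python
import struct

FRACTION_BITS = 52  # explicit fraction bits in an IEEE-754 double


def truncate_lsbs(x: float, k: int) -> float:
    """Return x with its k least-significant fraction bits set to zero."""
    if not 0 <= k <= FRACTION_BITS:
        raise ValueError("k must be between 0 and 52")
    # Reinterpret the double as a 64-bit unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Clear the low k bits; sign and exponent fields are left untouched.
    mask = 0xFFFFFFFFFFFFFFFF & ~((1 << k) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]


def relative_error(exact: float, approx: float) -> float:
    """Relative error of approx with respect to a non-zero exact value."""
    return abs(exact - approx) / abs(exact)


def guarded_multiply(a: float, b: float, k: int, excites_llp) -> float:
    """Multiply a and b, truncating both operands only when the
    (hypothetical) predicate excites_llp(a, b) returns True."""
    if excites_llp(a, b):
        a, b = truncate_lsbs(a, k), truncate_lsbs(b, k)
    return a * b
```

For a normal (non-subnormal) operand, zeroing k fraction bits changes the value by a relative amount below 2^(k-52), so `relative_error(x, truncate_lsbs(x, 20))` stays below 2^-32. Operands for which the predicate returns False pass through unchanged, which is why the quality loss is confined to the rarely flagged operations.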
