Multi-Armed Bandits for Efficient Lifetime Estimation in MPSoC design

Calvin Maa, Aditya Mahajanb and Brett H. Meyerc
Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada.


Reliability in integrated circuits is becoming a critical issue with the miniaturization of electronics. Smaller process technologies have led to higher power densities, resulting in higher temperatures and earlier device wear-out. One way to mitigate failure is by over-provisioning resources and remapping tasks from failed components to components with spare capacity, or slack. Since the slack allocation design space is large, finding the optimal is difficult, as brute-force approaches are impractical. During design space exploration, device lifetimes are typically evaluated using Monte-Carlo Simulation (MCS) by sampling each design equally; this method is inefficient since poor designs are evaluated as accurately as good designs. A better method will focus sampling time on the designs that are difficult to distinguish, reducing the time required to evaluate a set of designs; this can be accomplished using Multi-armed Bandit (MAB) Algorithms. This work demonstrates that MAB achieve the same level of accuracy as MCS in 1.45 to 5.26 times fewer samples.

Full Text (PDF)