Online Monitoring and Adaptive Routing for Aging Mitigation in NoCs

Zana Ghaderi1,a, Ayed Alqahtani2 and Nader Bagherzadeh1,2
1Computer Science Department, University of California, Irvine, CA, USA.
2Department of Electrical Engineering and Computer Science, University of California, Irvine, CA, USA.


Scalability of Network-on-Chip (NoC) as a promising solution for many-core systems can be jeopardized due to reliability challenges such as aging in advanced silicon technology. Previous mitigation techniques to protect NoC are either offline, while aging is strictly influenced by runtime operating conditions, or impose significant overheads to the system. This paper presents an online monitoring method through a Centralized Aging Table (CAT) for routers in NoCs. Router's capacity in flits, which are the main stimuli in routers, is predictable and limited for a given period of time. Consequently, stress rate and temperature, which are the major sources of aging mechanisms such as Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI), will be in the predictable ranges, as well. Hence, our methodology uses CAT which is populated by values that represent aging degradation for each different pairs of stress and temperature ranges during a given period of time. Furthermore, utilizing CAT, we propose an online adaptive aging-aware routing algorithm in order to avoid highly aged routers which eventually leads to age balancing between routers. Additionally, our proposed routing algorithm reduces maximum age of routers by changing the shortest paths between source-destination pairs adaptively, considering routers' ages across them in each given period of time. Extensive experimental analysis using gem5 simulator demonstrates that our online routing algorithm and monitoring methodology, CAT, improves delay degradation of maximum aged router and aging imbalance on average by 39% and 52% compared to XY routing, respectively. The impact of our proposed methodology on network latency, Energy-Delay-Product (EDP) and link utilization is negligible.

Keywords: Aging, NoCs, Monitoring, Delay degradation, Adaptive routing.

Full Text (PDF)