# An MILP-Based Aging-Aware Routing Algorithm for NoCs

Kshitij Bhardwaj, Koushik Chakraborty, and Sanghamitra Roy

Electrical and Computer Engineering, Utah State University kshitij.bhardwaj@aggiemail.usu.edu, {koushik.chakraborty, sanghamitra.roy}@usu.edu

Abstract-Network-on-Chip (NoC) architectures have emerged as a better replacement of the traditional bus-based communication in the many-core era. However, continuous technology scaling has made aging mechanisms such as Negative Bias Temperature Instability (NBTI) and electromigration primary concerns in NoC design. In this paper<sup>1</sup>, we propose a novel system-level aging model to model the effects of asymmetric aging in NoCs. We observe a critical need of a holistic aging analysis, which when combined with power-performance optimization, poses a multi-objective design challenge. To solve this problem, we propose a Mixed Integer Linear Programming (MILP)based aging-aware routing algorithm that optimizes the various design constraints using a multi-objective formulation. After an extensive experimental analysis using real workloads, we observe a 62.7%, 46% average overhead reduction in network latency and Energy-Delay-Product-Per-Flit (EDPPF) and a 41% improvement in Instructions Per Cycle (IPC) using our agingaware routing algorithm.

# I. INTRODUCTION

Network-on-Chip (NoC) architectures have emerged as a scalable and reliable alternative to the traditional bus-based communication paradigms [1]. However, with continuous scaling of CMOS technologies, reliability has become a primary concern in NoC designs. Various aging mechanisms such as *Negative Bias Temperature Instability (NBTI)* and *Electromigration* play a major role in limiting the device lifetime. As the NoC performance directly dictates the system level performance, it is critical to address reliability challenges in the NoC efficiently [2].

An NoC architecture comprises two major components: router and link. A pipelined NoC router consists of both combinational logic structures (e.g. virtual channel allocation logic) and storage-cell structures (e.g. virtual channels). Due to the presence of these structures, NBTI is the major aging mechanism associated with NoC routers [3]. NoC links, on the other hand, are implemented using repeated copper interconnects [4]. Therefore, NBTI (repeaters) and electromigration (copper interconnects) are the two primary aging problems associated with NoC links.

Recent literature comprises a few circuit/architecture techniques to alleviate the effects of NBTI on NoCs. Kodi et al. have proposed adaptive inter-router energy and area-efficient links for NoCs [5]. Fu et al. proposed enhancements of circuits, router microarchitecture, and inter-router reliability in the presence of NBTI [3]. However, these techniques have two major limitations: (a) they omit NoC links in their reliability analysis, focusing on routers only; (b) they fail to inspect the impact of NoC aging on application performance. We make the following contributions in this paper:

- We perform a holistic aging analysis for NoCs, where we model and show the impact of NBTI and electromigration on the performance of an NoC-based multicore system. Our work integrates SPICE level process variation and aging analysis, circuit-level statistical timing analysis, and full system architectural simulation (Section IV).
- We formulate a comprehensive system-level aging model for NoC routers and links. This model considers the effects of asymmetric aging in routers and links during program execution (Section III).
- We propose a Mixed Integer Linear Programming (MILP) based aging-aware oblivious routing algorithm (Section VI). This algorithm combines aging-aware constraints with other power-performance constraints and optimizes them using a multi-objective formalized approach. *To the best of our knowledge, ours is the first work on comprehensive aging-aware routing algorithms for NoCs.*
- An extensive experimental analysis using the GARNET NoC simulator [6] and real workloads (PARSEC benchmarks [7]) indicates an average of 62.7% and 46% overhead reduction in the network latency and Energy-Delay-Product-Per-Flit (EDPPF) [8]. We also obtain an average improvement of 41% in Instructions Per Cycle (IPC) using our proposed algorithm (Section VII).

## II. MOTIVATION

In this section, we show the importance of considering the aging effects, especially both NBTI and electromigration in links, for realistic robustness estimation of NoCs.

To evaluate the aging effects on links, we perform an experiment using a simple NoC architecture that comprises two routers connected by a link. We analyze the network latency under four different schemes with moderately heavy utilization:

TABLE I: Different Schemes

| Scheme    | Degradation in Routers | Degradation in Link       |
|-----------|------------------------|---------------------------|
| r(N)      | NBTI                   | NONE                      |
| l(N)r(N)  | NBTI                   | NBTI                      |
| l(E)r(N)  | NBTI                   | Electromigration          |
| l(NE)r(N) | NBTI                   | NBTI and Electromigration |

## A. Analyzing Network Latency

Figure 1(a) shows the variation of network latency with time. As is evident, l(NE)r(N) estimates the highest network latency. The consideration of both NBTI and electromigration in l(NE)r(N) degrades the link more, thereby causing this substantial increase. Network latency is least affected in r(N) as there is no link degradation and the small increase is only due to NBTI degradation in routers.



Fig. 1: NoC Reliability: Impact of NBTI and Electromigration; Buffer utilization (flits/cycle) in  $4 \times 4$  NoC Mesh.

## B. Effect on Fault-Tolerance of NoC

Due to the continuous degradation in network latency, the NoC will be rendered faulty after a point in time. We assume that a network becomes faulty when the network latency exceeds a pre-defined threshold (40% in our study). Figure 1(b) shows the time taken for the network to become faulty under different scenarios. For example, under l(NE)r(N), a network can be rendered faulty in 3 years. However, using r(N) grossly over-estimates the time to failure (more than eight years). In reality, the copper wires are likely to substantially degrade in performance by that time.

# III. SYSTEM-LEVEL AGING MODEL FOR NOC

In an NoC, every router and link are utilized in different amounts, some more than others. This router/link utilization depends on the amount of communication between different processing cores. For example, Figure 1(c) shows the variation in buffer utilization (flits/cycle) in a  $4 \times 4$  NoC mesh. These utilization were obtained using the GARNET NoC simulator when traffic is generated by the *canneal* benchmark run.

In this section, we derive a system-level model to find the relationship between router/link utilization and the amount of stress experienced during NBTI/electromigration degradation. For this purpose, we introduce a novel metric called *Traffic Acceptance Capacity (TAC), defined as the fraction of the nominal traffic<sup>2</sup> that a stressed router/link should accept.* 

Based on the model in this section, a heavily utilized router/link that experiences maximum stress must accept the least traffic, i.e. its TAC value must also be low. If the limit imposed by TAC is exceeded in a router undergoing maximum degradation, it is rendered faulty. We now model this metric and also show its significance through an example in the next section.

## A. Modeling Effects of NBTI on NoC Routers

Due to the presence of both combinational and storage circuitry in NoC routers, the effects of NBTI on the performance of these routers cannot be ignored [3]. In this section, we model these effects using the long-term NBTI model [9]:

$$\Delta V_{th} \approx \left(\frac{n^2 K_v^2 \alpha C t_1 t}{\xi^2 t_{ox}^2 (1-\alpha)}\right)^n \tag{1}$$

<sup>2</sup>Nominal traffic is the traffic across routers/links when they are unstressed.

where  $(\Delta V_{th})$  denotes the threshold voltage change in a PMOS, and other details of the parameters are:

• 
$$K_v: \left(\frac{qt_{ox}}{\epsilon_{ox}}\right)^3 K^2 C_{ox} (V_{gs} - V_{th}) \sqrt{C} exp\left(\frac{2E_{ox}}{E_o}\right)$$

- Cox: oxide capacitance per unit area
- q: electron charge; t: Aging period (in seconds)
- $T_{clk}$ : 0.33ns; C:  $T_o^{-1}.exp(-E_a/kT)$
- $E_a(eV)$ : 0.49; T: Temperature; n: 0.166
- $\alpha$ : duty cycle;  $t_1$ :  $10^{-4}$ ;  $\xi$ : 0.95
- $t_{ox}$ : 1.75E-9;  $E_o(V/nm)$ : 0.335

To find the TAC of a stressed router, we use a similar analysis as in [10] that estimates the workloads across the stressed cores by considering delay variations.

1) Analyzing Delay Variation in a Stressed Router: In an NoC system, different routers can experience a wide variation in performance degradation due to the combined effect of process variation and NBTI aging. Fundamentally, the TAC of a stressed router is estimated by comparing its performance degradation, measured as the delay variation, with that of the router experiencing the maximum performance degradation.

To estimate the delay variation in a stressed router, we extend the gate delay model in [11] to the critical path delay model. After perturbing  $V_{th}$  as  $V_{th} = V_{th0} + \Delta V_{th}$ , the *i*-th critical path delay can be written as:

$$delay_i = delay_i(V_{th0}, L_{eff}) + \left(\frac{\delta delay_i}{\delta V_{th}}\right) \Delta V_{th}$$
 (2)

where delay  $delay_i(V_{th0}, L_{eff})$  is modeled as a Gaussian distribution with  $V_{th0}$  and  $L_{eff}$  as the nominal threshold voltage and channel length. As there can be many critical paths in a single router, the critical path with the biggest variation is used in the calculation of TAC. We analyze all the routers in the system and estimate their biggest critical path variation  $(\Delta delay_r)$ . The TAC of a given router is then estimated by comparing its delay variation with that of the router with the worst variation. However, in case the worst router experiences more than  $3\sigma_{delay}$  variation, which statistically covers 99.7% of all delay variation for TAC estimation. Section IV describes our experimental setup in more detail.

$$\Delta delay_r = min(max_i((\delta delay_i/\delta V_{th})\Delta V_{th}), 3\sigma_{delay})[10]$$
<sup>(3)</sup>

To relate the delay variation with TAC, we use the percentage model proposed in [10]. In this model, when the delay has

zero variation, the value of TAC is 100% and when the delay variation is maximum (at  $3\sigma_{delay}$ ), the router must not accept any traffic (TAC = 0). Hence,

$$TAC_r = 1 - \left(\frac{\Delta delay_r}{3\sigma_{delay}}\right) \tag{4}$$

where  $TAC_r$  is the Traffic Acceptance Capacity of the stressed router due to delay variation. We use a similar approach to model Traffic Acceptance Capacity of NoC links next.

## **B.** Modeling NBTI and electromigration in NoC Links

NoC links are modeled as repeated copper interconnects and therefore suffer from two different types of stresses: a) NBTI stress that increases the repeater resistance [13] and b) electromigration stress due to the use of barrier layers in copper interconnects that increases the wire resistance [14].

1) Analyzing Delay Variation in Stressed Links: To model the propagation delay of a repeated interconnect in the presence of NBTI and electromigration stress, we include the increase in wire resistance due to electromigration in the NBTIaware delay model given in [13]. Therefore, the propagation delay of the link under both NBTI and electromigration is:

$$delay_{link_{stress}} = kT_d + p(0.69) \left(C_d + \frac{C_w}{k} + C_g\right) \Delta R_o$$
$$+ p(0.69) \left(\frac{C_g}{k}\right) \Delta R_w + p(0.38) \left(\frac{C_w}{k^2}\right) \Delta R_w$$
(5)

where k is the number of repeaters,  $R_o$  is the repeater resistance,  $C_d$  is the output drain diffusion capacitance of the repeater,  $C_g$  is the input gate capacitance of the repeater,  $R_w$  is the wire resistance,  $C_w$  is the wire capacitance, p is the number of stressed repeaters,  $T_d$  is the original unstressed delay,  $\Delta R_o$ is the increase in repeater resistance due to NBTI and  $\Delta R_w$ is the increase in wire resistance due to electromigration.

The variability of repeater resistance with the threshold voltage ( $\Delta V_{th}$ ) due to NBTI is given as [13]:

$$\frac{\delta R_o}{\delta V_{th}} = g \left[ \frac{2 + (V_{GS} - |V_{th}| - \Delta V_{th})(\frac{\mu}{2.v_{sat}.L} + \theta)}{\frac{1}{2}\mu C_{ox}\frac{W}{L}(V_{GS} - |V_{th}| - \Delta V_{th})^3} \right]$$
(6)

where

$$g = \frac{3}{4} V_{dd} \left( 1 - \frac{7}{9} \lambda V_{dd} \right) \tag{7}$$

 $\Delta V_{th}$  is given by Equation 1 and rest of the symbols are similar to those used in [13].

The variability of the wire resistance due to electromigration stress is modeled as [14]:

$$\Delta R_w = \frac{2R_w \frac{\gamma}{A_0} D_0^{\frac{1}{2}} t^{\frac{1}{2}} e^{\frac{-Q_a}{2RT_a}}}{1 - 2\frac{\gamma}{A_0} D_0^{\frac{1}{2}} t^{\frac{1}{2}} e^{\frac{-Q_a}{2RT_a}}}$$
(8)

where t is the stressed duration,  $T_a$  is the absolute temperature of the wire,  $D_0$  is the frequency factor in copper oxide (6.5E-7  $m^2/s$ ), R is the gas constant (8.31 J/mole K)),  $A_0$  is the initial height of bare copper wire (400E-9 m)) and  $Q_a$  is the activation energy in copper oxide (1.64E5 J/mole).

We find the effective delay variation by comparing the delay variation with its  $3\sigma_{delay_{link}}$  value.

$$\Delta delay_l = min(delay_{link_{stress}} - T_d, 3\sigma_{delay_{link}}) \quad (9)$$

After calculation of the effective delay variation, we use the percentage model to evaluate the  $TAC_l$  for the stressed link:

$$TAC_l = 1 - \left(\frac{\Delta delay_l}{3\sigma_{delay_{link}}}\right) \tag{10}$$

Next, we use our system-level model to determine the effect of aging on the delay experienced by a flit<sup>3</sup> due to stressed routers and stressed links.

#### C. Effect of Aging on Flit Delay

The performance of a system is directly related to the delay in the system. Therefore, we now model the effects of asymmetric aging in routers and links on the delay experienced by the flits. This modeling is used in formulating our MILP-based aging-aware routing algorithm (Section VI) and in the end-to-end analysis of aging impact on the performance of NoC-based multicore systems (Section VII).

1) Total Delay Calculation due to NoC Routers and Links: The delay seen by a flit, from the time when it is buffered into the input buffers of an NoC Router till it is allocated an output link can be described as the delay due to an unstressed NoC router or  $dr_{us}$ . Similarly we define the unstressed delay due to an NoC link as the time taken by a flit to travel the link to reach the router at the other end or  $dl_{us}$ . Now if a router/link is under aging stress, the same flit will experience a higher delay as signified by Equations 2 and 5. Therefore, the delay due to a stressed router and link can be formulated as:

$$dr_s = dr_{us} + \Delta d_r; \quad dl_s = dl_{us} + \Delta d_l$$

where  $\Delta d_r$  is given by  $\Delta delay_r$  in Equation 3 and  $\Delta d_l$  is given by  $\Delta delay_l$  in Equation 9.  $\Delta d_r$  and  $\Delta d_l$  for different routers and links will be different based on their utilization.

## IV. EXPERIMENTAL METHODOLOGY

Our experimental setup combines SPICE level analysis for process variation and NBTI aging, statistical timing analysis using synthesized Verilog for NoC routers, and full system architectural simulation. The effect of process variation and NBTI aging in basic logic gates are performed through Synopsys *HSPICE*, using Predictive Technology Models (PTM) [15], and long term degradation due to NBTI [9]. On each of these gates, we run 10K Monte Carlo simulations to obtain respective statistical distribution of their performance characteristics. Using these gates at 45nm technology, we synthesize the NoC router RTL obtained from Stanford University's opensource NoC router resources [16]. Subsequently, we perform a statistical timing analysis to find various critical paths in the router, and their delay distributions under the combined effect of process variation and NBTI aging.

For architectural simulation, we use the GARNET NoC simulator, embedded inside GEMS [17]. GARNET uses the ORION power model [18] to calculate power consumptions of the routers and the links. In our experiment, we consider a system with 16 processors in a  $4 \times 4$  mesh topology. Each

<sup>&</sup>lt;sup>3</sup>The flit stands for Flow Control Unit and is the smallest unit that is transmitted over an NoC link in a pipelined fashion.

processor has a dual issue 32 entry out-of-order issue window and a private L1 cache (2-way, 32 KB, response latency: 3 cycles) and a shared L2 cache (4-way, 2 MB, response latency: 15 cycles). Workloads:For traffic generation purposes, we use PARSEC benchmarks with 16 threads pinned to cores.

# V. TAC: A CASE STUDY

In this section, we present a motivational example to show that the stressed NoC routers and links have different TAC values depending on their utilization.

# A. Estimating TAC

In order to calculate different TAC values, we first gather router and link utilization by running different benchmarks on the GEMS-GARNET setup. Once the utilization is known, we consider the top 8 routers and links (Figure 1(c)) that have the highest utilization. As these selected routers and links are the most heavily utilized, they have higher power consumptions and temperatures and therefore will be more affected by aging as compared to others. Also, as these selected routers and links have different utilization among themselves, they will be under differential stress and will consequently have different TACvalues. These TAC values are calculated using our systemlevel aging model (Section III).

# B. Results

We ran four PARSEC benchmarks to generate traffic in the GARNET simulator. Tables II and III show the different  $TAC_r$  and  $TAC_l$  for different benchmark traffic. As is evident from both the tables, TAC values for one stressed router/link differs from the other. For example, In Table II, when traffic is generated by the *Canneal* benchmark run, router R7 (least utilized among stressed routers) has the maximum  $TAC_r$  and R1 (most utilized among stressed routers) has the minimum among the 8 stressed routers.  $TAC_r$  for other stressed routers lie between these two extreme cases which are separated by the range, calculated as the percentage of average TAC values. This experiment provides the following observations:

• Each stressed router/link has different *TAC* values.

• A stressed router/link which is utilized heavily should accept less traffic as compared to some other less utilized router/link. Therefore, *TAC* values can be used as a measure of the amount of stress on NoC routers and links.

| T/ | ۱B | LE | II: 7 | $CAC_r$ | for | different | benc | hmark | (s ( | %) | ) |
|----|----|----|-------|---------|-----|-----------|------|-------|------|----|---|
|----|----|----|-------|---------|-----|-----------|------|-------|------|----|---|

|           | ,           |             |              | ,     |
|-----------|-------------|-------------|--------------|-------|
| Benchmark | Max $TAC_r$ | Min $TAC_r$ | Avg. $TAC_r$ | Range |
| Canneal   | 82.3        | 68.5        | 73.4         | 18.80 |
| Dedup     | 73.13       | 43.75       | 53.6         | 54.81 |
| Facesim   | 63.5        | 29.24       | 39.67        | 86.36 |
| Ferret    | 65.06       | 31.35       | 40.24        | 83.77 |
|           |             |             |              |       |

# TABLE III: $TAC_l$ for different benchmarks (%)

| Benchmark | Max $TAC_l$ | Min $TAC_l$ | Avg. $TAC_l$ | Range |
|-----------|-------------|-------------|--------------|-------|
| Canneal   | 85.22       | 69.9        | 78.4         | 19.54 |
| Dedup     | 80.60       | 49.98       | 65.44        | 46.79 |
| Facesim   | 81.28       | 59.79       | 69.10        | 31.09 |
| Ferret    | 81.95       | 51.22       | 66.50        | 46.21 |

# VI. MILP-BASED AGING-AWARE ROUTING

In this section, we outline a routing algorithm that mitigates aging effects on NoC-based multicore system performance with minimal overhead. Simply meeting *TAC* limits can substantially improve NoC reliability, but comes at a high cost of other design constraints. To effectively perform a multiobjective design space exploration, we use an optimization framework based on MILP to formulate an aging-aware oblivious routing.

In our algorithm, we use router and link utilization from the traffic profiling. An application's communication characteristics (e.g. link and router utilization) might change at runtime. To mitigate this problem, it is possible to create several routing schemes using our optimization framework for different traffic scenarios. At runtime, the system can periodically monitor the traffic pattern and choose the routing scheme that best matches the observed network traffic characteristics.

## A. Traffic Threshold Calculation

In order to control traffic across a stressed router, traffic flowing through the links that input to the router must be controlled. For example, if router R5 in Figure 1(c) is stressed, then the traffic across the input links (L15, L21, L22, L24) must be controlled such that R5 meets its *TAC limit*.

Therefore, the input links to a stressed router, whether stressed or unstressed, must have an upper bound on the traffic that they can accept. In the case of stressed input links, this bound should be less than or equal to their  $TAC_l$  limits.

# **B.** Variable Definitions

These are the various variables used in the MILP:

- 1) Variable to indicate the flow of flits.  $F^k$ : flow between some (source, destination) pair (s, d).
- Variable to indicate the amount of flits flowing through a link due to some flow.
   FL<sub>j</sub><sup>k</sup>: Amount of flits flowing through a link j due to flow F<sup>k</sup>, measured in flits.
   P<sub>j</sub><sup>k</sup>: Amount of flits flowing through a link j due to flow F<sup>k</sup>, measured in flits/cycle (link utilization).
- 3) Variable to show if a link j is utilized for flow  $F^k$ .

$$U_j^k = \begin{cases} 1 & \text{if } FL_j^k > 0\\ 0 & \text{if } FL_j^k = 0 \end{cases}$$

- 4) Variables to indicate the set of input links (I(R)) and the set of output links (O(R)) for a router R.
- 5) Variables to indicate the total number of flits that comprise a flow  $F^k$  (Capacity of flow),  $C_k$  and total number of flits across all flows, TC.
- 6) Variable to indicate the total number of hops for  $F_k$ .  $hp^k = \sum_{j \in L_T} U_j^k - 1$
- Variable to formulate total delay across all the links. We use the delay-per-flit across the stressed and unstressed links, as described in Section III-C1.

$$TLD = \sum_{k \in T_f} \sum_{j \in L_T} FL_j^k * dl_j$$

where  $T_f$  represents all flows in the network and  $dl_j$  is the delay-per-flit of link j:

$$dl_j = \begin{cases} dl_{usj} & \text{if } j \in L_T - L_{stress} \\ dl_{sj} & \text{if } j \in L_{stress} \end{cases}$$

8) Variable to show if a router i is utilized for a flow  $F_k$ 

γ

$$_{i}^{k} = \begin{cases} 1 & \text{if } \sum_{j \in O(i)} U_{j}^{k} = 1 \\ 0 & \text{if } \sum_{j \in O(i)} U_{j}^{k} = 0 \end{cases}$$

 Variable for delay across all the routers. Here also we use delay-per-flit across the stressed and unstressed routers as in Section III-C1.

$$TRD = \sum_{k \in T_f} C_k * \left(\sum_{i \in R_T} dr_i * r_i^k\right)$$

where  $dr_i$  is the delay-per-flit of each router *i*:

$$dr_i = \begin{cases} dr_{us} & \text{if } i \in R_T - R_{stress} \\ dr_s & \text{if } i \in R_{stress} \end{cases}$$

10) Variable to indicate total delay.

$$TD = TLD + TRD$$

- 11) Link utilization for a flow  $F^k$  in flits per cycle:  $P^k_j = F L^k_j / T D \label{eq:prod}$
- 12) Variable to indicate energy-per-flit for a flow  $F^k$   $(E_{flit}^k)$ . Assuming each flit is composed of *n* bits, we use the energy model given in [19] to model  $E_{flit}^k$ :

$$E_{flit}^{k} = n * hp^{k} * E_{sbit} + n * (hp^{k} - 1) * E_{lbit}$$

where  $E_{sbit}$  and  $E_{lbit}$  are the energy consumed by the router switch and energy consumed by the links when 1 bit of data is transported through the router. Therefore, the total energy consumed due to all the flows is:

$$TE = TC \sum_{k \in T_f} E_{flit}^k$$

13) Variable to indicate EDPPF, defined as the product of total energy and total delay-per-flit.

$$EDPPF = (TE * TD)/TC$$

14) In order to avoid deadlock or live-lock, we use a turn prohibition model [20]. In a two-dimension NoC topology, a flit can follow eight different turns. According to the directions of the input and output links, turns can be categorized as: west-north (WN), north-east (NE), eastsouth (ES), south-west (SW) in the clockwise direction and west-south (WS), south-east (SE), east-north (EN), north-west (NW) in the counter-clockwise direction.

## C. Constraints

We now discuss the different constraints:

1) Every link that is utilized for a flow  $F^k$  must operate under its threshold value  $(T_j)$ . This threshold is calculated using the *TAC* values as described in Section VI-A.

$$\sum_{k \in T_f} P_j^k \le T_j \quad \forall j \in L_T$$

2) The number of hop counts for a flow  $F^k$  must be less than a maximum limit  $H_k$ .

$$hp^k \leq H_k \quad \forall k \in T_f$$

3) There should be a single path between a (s, d) pair. Conservation of flow is also required. Therefore  $\forall k \in T_f$ , for a source router  $R_s$  and destination router  $R_d$ ,

$$\sum_{j \in I(R_s)} U_j^k = 0 \sum_{j \in O(R_s)} U_j^k = 1 \sum_{j \in I(R_d)} U_j^k = 1 \sum_{j \in O(R_d)} U_j^k = 0$$

For an intermediate router  $R_i$ ,

$$\sum_{j \in I(R_i)} U_j^k = \sum_{j \in O(R_i)} U_j^k$$

4) In order to avoid deadlocks, some of the turns discussed in the previous section need to be prohibited. Therefore, we also include the constraints due to the prohibition turn model described in [20] in our MILP formulation.

TABLE IV: Cost Functions to Minimize

| Description                                  | Variables Involved                            |
|----------------------------------------------|-----------------------------------------------|
| Number of links utilized by a flow           | $\sum_{j \in L_T} U_j^k$                      |
| Total number of links utilized for all flows | $\sum_{k \in T_f} \sum_{j \in L_T}^{r} U_j^k$ |
| Total delay due to all routers               | TRD                                           |
| Total delay due to all links                 | TLD                                           |
| Total energy across all flows                | TE                                            |
| Energy-Delay-Product-Per-Flit                | EDPPF                                         |
|                                              |                                               |

## **D.** Cost Functions

Different objective functions for the routing algorithm that need to be minimized are shown in Table IV.

We use CPLEX [21] to solve this multi-objective MILP and GARNET to obtain the network parameters.

# VII. EXPERIMENTAL RESULTS

In this section, we study the system level (multicore) overheads of mitigating aging degradation in an NoC.

## A. Comparative Schemes

We use three different schemes to show the importance of an aging aware NoC design and the critical need for a formalized approach in this pursuit.

- NO-AGE: We model a system without aging awareness in NoC routing to: (a) quantify its resultant impact on system robustness; and (b) establish a baseline to estimate powerperformance overheads for aging aware routing algorithms in NoC (described next). This scheme employs deterministic XY routing algorithm [22].
- **TAC-AGE**: This scheme models asymmetric aging effects using the system-level aging model described in Section III. This scheme obeys *TAC* limits, but are not optimized for other design constraints. Here, also XY routing is used.
- MIP-ROUT: This scheme extends TAC-AGE and uses our MILP-based aging aware routing algorithm to mitigate the aging effects on performance and energy.

## **B.** Robustness Evaluation

In order to show the robustness degradation in a  $4 \times 4$  NoC mesh that does not consider aging aware-routing (NO-AGE), we compare the reliability of this scheme with the MIP-ROUT scheme that models the aging-aware routing. Table V shows the router and link utilization for both the schemes for *Canneal* benchmark run. As evident from the table, under NO-AGE, we notice several routers and links exceeding their TAC limit (TAC-LIM), which will render them faulty after a while. MIP-ROUT adjusts these utilizations, and therefore improves NoC robustness. In the case of TAC-AGE, since stressed routers and links meet their *TAC* limits, its robustness is better than that of NO-AGE but lower than that of MIP-ROUT.

In Figure 2(a), we show the reliabilities of the schemes for an aging period of 7 years using the reliability's exponential dependence on failure rate [22]. As expected, NO-AGE shows higher failure rate compared to MIP-ROUT, as its design does not adapt to the wear-out degradation of NoC components. As stressed routers and links are well below their TAC-LIM in MIP-ROUT, its reliability is better than TAC-AGE also.

## C. Overhead Analysis

1) Network Latency: Figure 2(b) shows the overhead estimated in network latency for TAC-AGE and MIP-ROUT, relative to the NO-AGE scheme. As TAC-AGE does not optimize design constraints like the total router and link



Fig. 2: NoC Reliability, and overhead comparisons between aging aware NoC designs.

TABLE V: Stressed router and link utilization

| Stressed Routers |       |       |       |           |       |       |       |       |
|------------------|-------|-------|-------|-----------|-------|-------|-------|-------|
|                  | R1    | R0    | R2    | R5        | R4    | R6    | R3    | R7    |
| NO-              | 0.54  | 0.33  | 0.28  | 0.22      | 0.17  | 0.15  | 0.14  | 0.09  |
| AGE              |       |       |       |           |       |       |       |       |
| TAC-             | 0.36  | 0.22  | 0.19  | 0.15      | 0.12  | 0.10  | 0.10  | 0.07  |
| LIM              |       |       |       |           |       |       |       |       |
| MIP-             | 0.32  | 0.20  | 0.16  | 0.14      | 0.10  | 0.09  | 0.08  | 0.03  |
| ROUT             |       |       |       |           |       |       |       |       |
|                  |       |       | Sti   | essed Lir | ıks   |       |       |       |
|                  | L9    | L12   | L20   | L13       | LO    | L23   | L15   | L1    |
| NO-              | 0.165 | 0.126 | 0.100 | 0.098     | 0.085 | 0.080 | 0.060 | 0.056 |
| AGE              |       |       |       |           |       |       |       |       |
| TAC-             | 0.111 | 0.086 | 0.069 | 0.072     | 0.065 | 0.066 | 0.050 | 0.048 |
| LIM              |       |       |       |           |       |       |       |       |
| MID              | 0.077 | 0.007 | 0.000 | 0.005     | 0.000 | 0.040 | 0.020 | 0.000 |
| IVIIP-           | 0.066 | 0.037 | 0.029 | 0.025     | 0.022 | 0.040 | 0.030 | 0.029 |

latency, it incurs higher overhead as compared to MIP-ROUT. MIP-ROUT reduces the overhead by an average of 62.7% for different benchmarks (labels same as in Table VI).

2) Energy-Delay-Product-Per-Flit: Figure 2(c) shows the EDPPF for different benchmarks across these schemes. Both TAC-AGE and MIP-ROUT show overheads relative to NO-AGE, as they adapt to offer greater robustness. Compared to TAC-AGE, MIP-ROUT reduces this overhead by an average of 46% across different benchmarks.

*3) IPC:* Table VI shows the loss in IPC due to TAC-AGE and MIP-ROUT, relative to NO-AGE. Across different benchmarks, MIP-ROUT shows 41% system performance improvement over TAC-AGE, hence showing the effectiveness of our approach.

TABLE VI: Percentage IPC Loss (Lower is Better)

|              |       |         | ·        |
|--------------|-------|---------|----------|
| Benchmark    | Label | TAC-AGE | MIP-ROUT |
| Canneal      | can   | 29.5    | 22.9     |
| Dedup        | ded   | 29.6    | 15.4     |
| Facesim      | fac   | 13.2    | 6.8      |
| Ferret       | fer   | 28.8    | 14.8     |
| Fluidanimate | flu   | 19.5    | 8.5      |
| Freqmine     | fre   | 25.2    | 16.8     |
| Raystone     | ray   | 16.9    | 11.3     |

## VIII. CONCLUSION

In this paper, we analyze the impact of aging on NoC architecture, considering both NoC routers and links. We observe a critical need to optimize NoC power-performance metrics while considering wear-out degradation resulting from asymmetric utilization of NoC components. To efficiently tackle this multi-objective design challenge, we proposed an MILP based aging aware routing algorithm. Extensive experimental analysis demonstrate 62.7%, 46% improvements in network latencies and EDPPF for our algorithm, respectively. At the system level, our algorithm shows 41% performance improvement for real workloads.

# Acknowledgements

This work was supported in part by National Science Foundation grant CNS-1117425.

## REFERENCES

- R. Marculescu, Y. Ogras, L.-S. Peh, N. D. E. Jerger, and Y. V. Hoskote, "Outstanding research problems in noc design: System, microarchitecture, and circuit perspectives," *TCAD*, vol. 28, no. 1, pp. 3–21, 2009.
- ture, and circuit perspectives," *TCAD*, vol. 28, no. 1, pp. 3–21, 2009.
  J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W. Keckler, and L.-S. Peh, "Research challenges for on-chip interconnection networks," *IEEE Micro*, vol. 27, no. 5, pp. 96–108, 2007.
- [3] X. Fu, T. Li, and J. A. B. Fortes, "Architecting reliable multi-core network-on-chip for small scale processing technology," in *Proc. of DSN*, 2010, pp. 111–120.
- [4] C. Hernandez, F. Silla, and J. Duato, "A methodology for the characterization of process variation in noc links," in *Proc. of DATE*, 2010, pp. 685–690.
- [5] A. K. Kodi, A. Sarathy, A. Louri, and J. M. Wang, "Adaptive interrouter links for low-power, area-efficient and reliable network-on-chip (noc) architectures," in *Proc. of ASP-DAC*, 2009, pp. 1–6.
- [6] N. Agarwal, T. Krishna, L.-S. Peh, and N. K. Jha, "Garnet: A detailed on-chip network model inside a full-system simulator," 2009, pp. 33–42.
- [7] PARSEC benchmarks, http://parsec.cs.princeton.edu/.
- [8] B. Li, L.-S. Peh, and P. Patra, "Impact of process and temperature variations on network-on-chip design exploration," in *NOCS*, 2008, pp. 117–126.
- [9] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula, "Predictive modeling of the nbti effect for reliable design," in *IEEE Custom Integrated Circuits Conference*, sept. 2006, pp. 189–192.
- [10] J. Sun, A. K. Kodi, A. Louri, and J. M. Wang, "Nbti aware workload balancing in multi-core systems," in *Proc. of ISQED*, 2009, pp. 833–838.
  [11] H. Chang and S. S. Sapatnekar, "Statistical timing analysis considering
- [11] H. Chang and S. S. Sapatnekar, "Statistical timing analysis considering spatial correlations using a single pert-like traversal," in *ICCAD*, 2003, pp. 621–626.
- [12] W. Navidi, *Statistics for Engineers and Scientist*, 3rd ed. Mc Graw Hill, 2010.
- [13] B. Datta and W. Burleson, "Analysis and mitigation of nbti-impact on pvt variability in repeated global interconnect performance," in *Proc. of GLSVLSI*, 2010, pp. 341–346.
- [14] M. Sun, M. G. Pecht, and D. Barbe, "Lifetime rc time delay of on-chip copper interconnect," in *IEEE Tran. on Semiconductor Manufacturing*, vol. 15, 2002, pp. 253–259.
- [15] Predictive Technology Model, http://ptm.asu.edu/.
- [16] Open Source NoC Router RTL, https://nocs.stanford.edu/cgi-bin/trac.cgi/ wiki/Resources/Router.
- [17] M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood, "Multifacets general execution-driven multiprocessor simulator (gems) toolset," *SIGARCH Comput. Archit. News*, vol. 33, 2005.
- [18] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion 2.0: A fast and accurate noc power and area model for early-stage design space exploration," in *DATE*, 2009, pp. 423–428.
- [19] J. Hu and R. Marculescu, "Energy- and performance-aware mapping for regular noc architectures," *TCAD*, vol. 24, no. 4, pp. 551–562, 2005.
- [20] N. Nikitin, S. Chatterjee, J. Cortadella, M. Kishinevsky, and Ü. Y. Ogras, "Physical-aware link allocation and route assignment for chip multiprocessing," in NOCS, 2010, pp. 125–134.
- [21] CPLEX, http://www.ilog.com/products/cplex/.
- [22] W. J. Dally and B. Towles, Principles and practices of interconnection networks. Morgan Kaufmann, 2004.