## Impact of Adaptive Voltage Scaling on Aging-Aware Signoff

Tuck-Boon Chan<sup>†</sup>, Wei-Ting Jonas Chan<sup>†</sup> and Andrew B. Kahng<sup>†‡</sup>

<sup>†</sup>ECE and <sup>‡</sup>CSE Departments, UC San Diego, La Jolla, CA 92093

{tbchan, wechan, abk}@ucsd.edu

Abstract: Transistor aging due to bias temperature instability (BTI) is a major reliability concern in sub-32nm technology. Aging decreases performance of digital circuits over the entire IC lifetime. To compensate for aging, designs now typically apply adaptive voltage scaling (AVS) to mitigate performance degradation by elevating supply voltage. Varying the supply voltage of a circuit using AVS also causes the BTI degradation to vary over lifetime. This presents a new challenge for margin reduction in conventional signoff methodology, which characterizes timing libraries based on transistor models with pre-calculated BTI degradations for a given IC lifetime. Many works have separately addressed predictive models of BTI and the analysis of AVS, but there is no published work that considers BTI-aware signoff that accounts for the use of AVS during IC lifetime. This motivates us to study how the presence of AVS should affect aging-aware signoff. In this paper, we first simulate and analyze circuit performance degradation due to BTI in the presence of AVS. Based on our observations, we propose a rule-of-thumb for chip designers to characterize an aging-derated standard-cell timing library that accounts for the impact of AVS. According to our experimental results, this aging-aware signoff approach avoids both overestimation and underestimation of aging (either of which results in power or area penalty) in AVS-enabled systems.

## I. INTRODUCTION

Bias temperature instability (BTI) is a major aging mechanism in sub-32nm CMOS technology. The BTI effect increases the threshold voltage ( $|V_t|$ ) of a MOS transistor, resulting in a time-dependent timing degradation in very large scale integrated (VLSI) circuits [8] [7]. It is mandatory to consider the BTI effect in modern timing signoff recipes – via 10-year timing libraries, flat  $V_{DD}$  margin, etc. – to ensure that circuits will operate correctly over lifetime.

Adaptive voltage scaling (AVS) is a low-power design technique which adjusts the supply voltage ( $V_{DD}$ ) of a circuit adaptively to meet the timing performance requirement with the minimum voltage and power. AVS can be used to mitigate BTI-induced timing degradation by increasing  $V_{DD}$  as long as the BTI degradation is captured by the performance sensor in AVS [2] [10] [11]. However, the use of AVS during IC lifetime to compensate for BTI degradation causes a fundamental inconsistency among the voltages in signoff (library characterization) and circuit operation, as illustrated in Figure 1. Resolving this inconsistency is the subject of our present investigation.



Fig. 1. The upper part of this figure illustrates a signoff flow using a derated library. The lower part of this figure illustrates that AVS increases the voltage of the circuit to compensate for BTI degradation. As a result, the circuit ends up with a voltage at the end of lifetime ( $V_{final}$ ) which does not match the voltages ( $V_{lib}$ ,  $V_{BTI}$ ) used for library characterization. Such inconsistency among the voltages leads to design overheads.

978-3-9815370-0-0/DATE13/©2013 EDAA

The upper part of Figure 1 shows a typical signoff flow, in which a derated library is characterized so that circuit designers can use the library for circuit design and signoff. The signoff flow consists of three major steps:

- (1) The magnitude of BTI degradation  $(|\Delta V_t|)$  is estimated using an aging model. Note that the voltage applied in the aging model, which we denote by  $V_{BTI}$  ( $V_{BTI}$  is used to calculate the  $|\Delta V_t|$  for derated library characterization), significantly influences the  $|\Delta V_t|$  that results from BTI degradation [16]. Therefore, the selection of  $V_{BTI}$  affects the derated library.
- (2) The extracted  $|\Delta V_t|$  is used in transistor models to characterize an aging-derated library which accounts for BTI degradation. During the library characterization, transistors and standard cells are simulated at a possibly different voltage level, which we denote by  $V_{lib}$ .
- (3) With the derated library, circuit designers can implement and sign off a circuit.

During runtime (lower part of Figure 1), AVS increases the  $V_{DD}$  of the circuit to compensate for BTI degradation. This will lead to a higher  $V_{DD}$  at the end of circuit lifetime ( $V_{final}$ ). Note that  $V_{lib}, V_{BTI}$  and  $V_{final}$  could be different from each other. For instance,  $V_{final}$  is a result of AVS to compensate for BTI degradation which varies depending on circuit implementation. Also, guardbanding for the operating worst-case during library characterization will lead to different  $V_{lib}$  and  $V_{BTI}$ . This is because the worst-case BTI degradation happens when  $V_{BTI}$  is high but the worst-case gate delays happen when  $V_{lib}$  is low. Moreover, circuit designers do not know  $V_{final}$  before the circuit is implemented. Hence, there is no obvious guideline for how to define  $V_{lib}$  and  $V_{BTI}$  during library characterization when  $V_{final}$  remains an unknown. Such inconsistency among  $V_{final}$ ,  $V_{lib}$  and  $V_{BTI}$  leads to the following questions, which we address in this paper:

- (1) What is the design overhead when timing libraries are not properly characterized (e.g., due to poor selection of  $V_{lib}$  and  $V_{BTI}$ ) to account for the aging effect in an AVS circuit?
- (2) What are guidelines to define BTI- and AVS-aware signoff corners that guarantee timing correctness with little design overhead?

Although there have been many analyses of the interactions between aging and AVS [2] [5] [10] [11], none of them discusses the questions mentioned above. Generally, previous literature assumes that a circuit is signed off with timing libraries without BTI effect. Hence, it is possible that the circuit fails to meet performance requirements due to BTI degradation. Although a BTI-aware timing analysis can be applied after signoff, this may require multiple iterations of signoff and resizing or other ECOs before the circuit implementation converges.

Our contributions are as follows.

- (1) To answer the first question, we sign off benchmark circuits using different derated libraries and compare metrics (e.g., area and power) of the resulting circuit implementations. Our experimental results show that circuits signed off using different derated libraries have up to 35% area or 20% runtime power overheads for the same frequency requirements.

- (2) To answer the second question, we analyze the impact of BTI degradation and AVS on  $V_{final}$ ,  $V_{lib}$  and  $V_{BTI}$ , and propose guidelines for the selection of  $V_{lib}$  and  $V_{BTI}$ . Based on these guidelines, we propose a methodology to obtain  $V_{lib}$  and  $V_{BTI}$  for the characterization of derated libraries that account for aging effect in a circuit with AVS.
- (3) We show that circuit implementations signed off with derated libraries obtained by our method achieve superior circuit area and power tradeoffs compared to implementations obtained using alternative derated libraries.

The organization of the rest of this paper is as follows. In Section II we discuss the signoff for aging circuits that have AVS-based adaptivity. In Section III, we propose a heuristic approach to estimate the proper voltage corner at which to characterize derated timing libraries for aging-aware signoff. We describe our aging model and experiment setup in Section IV, and present experimental results in Section V. Finally, we conclude this paper in Section VI.

## II. AGING-AWARE SIGNOFF

## A. Signoff with Derated Library

In a typical timing signoff methodology, meeting timing constraints with pre-defined corner libraries implies that the circuit will work correctly at the target specification. This is because the corner libraries are characterized at worst-case operating conditions. Thus, to characterize a BTI aging library for signoff, traditional methodology considers the worst-case transistor degradation due to the BTI effect. Our present work focuses on library characterization for signoff of setup-time checks, since the main effect of BTI aging is to increase delay in data paths.

Characterization of an aging library is commonly performed in two steps. First, transistor aging is estimated at a worst-case scenario defined by the total time of BTI stress, the temperature, and the voltage  $(V_{BTI})$  being applied to the transistors. Note that this BTI degradation estimation is pessimistic for an AVS circuit because  $V_{BTI}$  is defined as a constant for the entire lifetime, whereas the voltage of an AVS circuit is initially smaller and gradually increases during circuit lifetime. Second, the transistor aging  $(\Delta |V_t|)$  calculated from the first step is included in transistor models for library characterization. During timing library characterization, we must also fix the operating voltage  $(V_{lib})$  of the transistors and standard cells. The values of  $V_{BTI}$  and  $V_{lib}$  could be different because the worst-case corner for  $V_{BTI}$  is at the maximum allowed voltage (higher voltage increases  $\Delta |V_t|$ ), while the worst-case corner for  $V_{lib}$  is at the minimum allowed voltage (lower voltage increases gate delay). As we will show in Section V, this subtle difference between the selection of  $V_{lib}$  and the selection of  $V_{BTI}$  has a significant impact on circuit area and runtime power.

## B. Worst-Case BTI Degradation

Note that the BTI-induced timing degradation is affected by the total stress time (i.e., total time when transistors are on), which varies depending on circuit activity. The actual circuit activity is very difficult to capture because it is determined by circuit usage. Since it is impractical for any known AVS monitor to capture the detailed circuit activity of each transistor in a circuit, we assume that designers must consider a worst-case scenario at signoff.

Velamala et al. in [17] show that worst-case timing degradation occurs when critical paths experience a long *DC BTI stress* (i.e., transistors are always under BTI stress). However, assuming a DC BTI stress may be too pessimistic: a typical CMOS circuit usually switches during operation, and exhibits an *AC BTI stress* (i.e., transistors experience alternating BTI stress and recovery phases). The measurement results in [6] and [7] show that the amount of BTI degradation is not sensitive to *stress duty cycle* (i.e., the ratio of total stress time to total operating time) when the duty cycle ranges from 20% to 80%. This means that we can approximate the BTI degradation in a typical CMOS circuit by assuming an AC BTI stress with 50% duty cycle. In the studies reported below, we consider both DC and AC aging scenarios with  $125^{\circ}C$  operating temperature.<sup>1</sup>

## C. Adaptive Voltage Scaling (AVS)

To study BTI degradation of a circuit with AVS, we assume that the circuit monitors its maximum frequency ( $F_{max}$ ) in a discrete-time manner. Whenever the  $F_{max}$  of the circuit is lower than a pre-defined target frequency ( $F_{target}$ ), the  $V_{DD}$  will be increased by a  $V_{step}$  (where  $V_{step}$  is an attribute of the voltage regulator). After the  $V_{DD}$  adjustment, the AVS circuitry will evaluate  $F_{max}$  and continue to increase  $V_{DD}$  until  $F_{max} \ge F_{target}$ . The AVS mechanism is illustrated in Figure 2. In our discussion, we use  $t$  to denote time,  $\Delta t$  to denote the time interval between successive AVS calibrations, and  $t_{final}$  to denote the end of circuit lifetime. The  $V_{DD}$  of the circuit at the beginning of its lifetime (i.e., the minimum voltage needed to meet the frequency requirement at  $t = 0$ ) is denoted by  $V_{init}$ .

Fig. 2. Experiment flow to emulate AVS mechanism.
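The AVS mechanism described above can be sketched as a simple discrete-time loop. This is a minimal illustration, not the flow of Figure 2: the function names `run_avs` and `toy_fmax`, and all coefficients of the toy frequency model, are assumptions made for the example. In particular, the toy model makes aging depend only on elapsed time, whereas the actual flow derives $F_{max}$ from the accumulated $\Delta |V_t|$ and updated libraries.

```python
def run_avs(fmax_of, f_target, v_init, v_step, v_max, n_steps):
    """Return the VDD chosen at each calibration step of a discrete-time AVS loop.

    fmax_of(vdd, step) is a caller-supplied (hypothetical) model of the
    circuit's maximum frequency at a given supply voltage and calibration step.
    """
    vdd = v_init
    trace = []
    for step in range(n_steps):
        # Raise VDD one V_step at a time until the target is met (or V_max is hit).
        while fmax_of(vdd, step) < f_target and vdd + v_step <= v_max + 1e-9:
            vdd = round(vdd + v_step, 6)  # rounding avoids float drift on the 10 mV grid
        trace.append(vdd)
    return trace


def toy_fmax(vdd, step, n_steps=1217):
    """Toy model (assumption): Fmax rises linearly with VDD and falls with a
    front-loaded aging term. The coefficients are illustrative only."""
    aging = 0.05 * (step / (n_steps - 1)) ** 0.15
    return 1.0e9 * (1.0 + 1.5 * (vdd - 0.90)) * (1.0 - aging)


# Delta-t = 3 days over roughly 10 years gives about 1217 calibration steps.
trace = run_avs(toy_fmax, f_target=1.0e9, v_init=0.90, v_step=0.01,
                v_max=1.05, n_steps=1217)
print(trace[0], trace[-1])
```

The last entry of `trace` plays the role of $V_{final}$ for this toy circuit; the loop never lowers $V_{DD}$, matching the one-directional adjustment assumed in the text.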

The library-update step in Figure 2 would be very slow if we characterized a new library whenever  $V_{lib}$  or  $\Delta |V_t|$  changed. To reduce simulation runtime, we pre-characterize a set of libraries with different  $V_{lib}$  and  $\Delta |V_t|$ . To obtain the  $F_{max}$  of a circuit at a specific  $V_{lib}$  and  $\Delta |V_t|$ , we simulate the circuit with all the pre-characterized libraries and estimate the  $F_{max}$  value by interpolation with spline polynomial functions. Circuit leakage and runtime power are estimated similarly. The lifetime leakage and power are obtained by averaging over all timesteps. Figure 3 shows that the interpolated delay, leakage power and dynamic power have average errors of only 1.35%, 2.03% and 0.28%, respectively, compared to values obtained by characterizing libraries at the sampled points.
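The lookup over pre-characterized libraries can be illustrated as follows. This sketch deliberately swaps the spline interpolation used in the paper for plain bilinear interpolation so that it needs no external packages; the grid axes, the table values, and the name `interp_grid` are all hypothetical.

```python
from bisect import bisect_right


def interp_grid(v_axis, dvt_axis, table, v, dvt):
    """Bilinear interpolation of table[i][j], sampled at (v_axis[i], dvt_axis[j])."""
    def bracket(axis, x):
        # Index of the lower grid point and the fractional position above it.
        i = bisect_right(axis, x) - 1
        i = max(0, min(i, len(axis) - 2))
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, wv = bracket(v_axis, v)
    j, wd = bracket(dvt_axis, dvt)
    low = table[i][j] * (1 - wd) + table[i][j + 1] * wd
    high = table[i + 1][j] * (1 - wd) + table[i + 1][j + 1] * wd
    return low * (1 - wv) + high * wv


# Hypothetical pre-characterized grid: V_lib in volts, Delta|Vt| in volts.
V_AXIS = [0.90, 0.95, 1.00, 1.05]
DVT_AXIS = [0.00, 0.02, 0.04, 0.06]
# Placeholder Fmax values (GHz) from a made-up planar model, not measured data.
FMAX_TABLE = [[1.0 + 2.0 * (v - 0.90) - 3.0 * d for d in DVT_AXIS] for v in V_AXIS]

print(interp_grid(V_AXIS, DVT_AXIS, FMAX_TABLE, 0.93, 0.025))
```

Leakage and runtime power could be looked up the same way with their own tables; a spline would give smoother estimates between grid points than this piecewise-linear stand-in.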

## III. GUIDELINES FOR CHARACTERIZATION OF DERATED LIBRARIES

## A. Observation: $V_{lib} = V_{BTI} \approx V_{final}$

To study the relationship between  $V_{BTI}$  and  $V_{final}$ , we implement a given circuit using a library characterized at the nominal voltage of the process technology ( $V_{lib} = V_{nom}$ ), with the assumption that there is no BTI degradation. We then use the flow in Figure 2 to obtain the  $V_{final}$  of the circuit (lifetime = 10 years, DC BTI degradation). Figure 4 shows the  $\Delta |V_t|$  with AVS compared to the case where  $V_{final}$  is applied to the same circuit throughout circuit lifetime. During the early lifetime, the BTI degradation ( $\Delta |V_t|$ ) for the adaptive  $V_{DD}$  case (AVS) is less than that for the fixed  $V_{final}$  case. This is because the adaptive  $V_{DD}$  case has a smaller  $V_{DD}$  value at early lifetime, and BTI degradation increases with  $V_{DD}$ . However, due to the front-loaded nature of BTI degradation [5], the  $\Delta |V_t|$  difference between the fixed  $V_{final}$  and the AVS cases quickly converges.

<sup>1</sup>Although temperature profile is spatially non-uniform across a chip, we use the highest operating temperature  $(125^{\circ}C)$  in our analysis to estimate the worst-case BTI degradation.

Fig. 3. To evaluate the accuracy of the interpolation approach, we obtain the actual delay, leakage and runtime power by characterizing additional libraries at the  $V_{lib}$  and  $\Delta |V_t|$  used in the interpolation. The average error between the actual and the interpolated delay, leakage, and power values at sampled points is 1.35%, 2.03%, and 0.28%, respectively.

Fig. 4.  $|\Delta V_t|$  of PBTI and NBTI of a circuit (MPEG2) with a flat  $V_{BTI} = V_{final}$  or AVS over circuit lifetime. The results show that the difference between a flat  $V_{DD}$  and AVS is less than 10mV, and that this difference becomes smaller toward the end of circuit lifetime.

The simulation results in Figure 4 show that we can estimate the degradation of an AVS circuit by assuming a constant  $V_{final}$  throughout circuit lifetime. This approximation slightly overestimates the  $\Delta |V_t|$ , but the overestimation is very small. In other words, we can characterize a derated library using  $V_{final}$  for signoff (i.e.,  $V_{BTI} = V_{final}$ ).

Note that the assumption of a constant  $V_{final}$  throughout circuit lifetime implies that  $V_{lib} = V_{final} = V_{BTI}$ . To determine the appropriate setting for  $V_{lib}$ , we analyze the implications of  $V_{lib} \neq V_{BTI}$ . When  $V_{lib} > V_{BTI}$ , the library characterization is optimistic because we assume the operating voltage is higher than the voltage that defines BTI degradation. This violates the principle of having a derated library that defines the worst-case condition. Thus, we should not use a  $V_{lib}$  that is greater than  $V_{BTI}$ . On the other hand, having  $V_{lib} < V_{BTI}$  means that the library characterization is pessimistic. However, there is no reason to add further pessimism because the degradation obtained from  $V_{BTI}$  is already slightly pessimistic. We conclude that having  $V_{lib} = V_{final}$  is a reasonable option to avoid being optimistic or overly pessimistic in library characterization.



Fig. 5. The relationship between  $V_{final}$  and  $\alpha$  for different cells.  $\alpha$  is the delay margin at signoff. The curves vary with different gate complexity and topology.

## B. Estimation of V<sub>final</sub> at Early Design Stage

Of course, the main obstacle to library characterization with  $V_{lib} = V_{BTI} = V_{final}$  is that this requires knowledge of the  $V_{final}$  of an AVS circuit, which is not available in the early design stages when the actual circuit is not fully implemented. Indeed, to obtain the  $V_{final}$ , we need to implement a circuit with a library, which requires  $V_{lib}$  and  $V_{BTI}$ .

To overcome this "chicken and egg" problem, we analyze the factors that determine  $V_{final}$  of a circuit by synthesizing cell chains consisting of different standard cells. In our experiment, we construct the cell chains such that they meet the frequency requirement at t = 0 with  $V_{lib} = 0.9$ V (nominal voltage of the technology), when there is no BTI degradation assumed in the library.

Results in Figure 5 show that  $V_{final}$  is related to both gate complexity and gate topology. For instance, we observe that a complex gate such as AOI requires smaller  $V_{DD}$  to compensate for BTI degradation, which leads to a smaller  $V_{final}$ . At the same time, different gate topologies (e.g., NAND3 and NOR3) cause the gate delay to be dominated by NMOS or PMOS devices. Since different devices have different BTI degradation, the gate topologies also affect  $V_{final}$ .<sup>2</sup> Another subtle factor that affects  $V_{final}$  is the *delay margin* of the circuit. Delay margin (denoted by  $\alpha$ ) is defined as the difference (normalized to the signed-off circuit delay) between the target delay and the delay of the signed-off circuit at  $t = 0$  (denoted by  $D_{t=0}$ ). That is,

$$\alpha = \frac{D_{target} - D_{t=0}}{D_{target}}, \qquad D_{target} = \frac{1}{F_{target}} \tag{1}$$

To estimate the  $V_{final}$  versus  $\alpha$  curve of a circuit (before the circuit is implemented), we assume that the critical path of the circuit is composed of a mix of different cell types. Thus, we model the  $V_{final}$  versus  $\alpha$  curve by averaging the curves from various cell types. We choose gates from the following categories to increase the gate diversity: (1) complex and simple gates, (2) pass gates, (3) PMOS-dominated gates, and (4) NMOS-dominated gates. Our simulation results in Figure 5 show that the maximum error of  $V_{final}$  among different circuits and cell chains is about one  $V_{step}$  (10mV) for different  $\alpha$ .
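As a small illustration of the two steps above, the sketch below computes the delay margin of (1) and averages per-cell $V_{final}$ versus $\alpha$ curves into a single estimate. The cell names, the sampled $\alpha$ points, and every voltage value are placeholders invented for the example; they are not results from Figure 5.

```python
def delay_margin(f_target_hz, d_t0_s):
    """Delay margin alpha from Eq. (1): (D_target - D_t0) / D_target."""
    d_target = 1.0 / f_target_hz
    return (d_target - d_t0_s) / d_target


def average_curve(curves):
    """Average several V_final-vs-alpha curves sampled at the same alpha points."""
    rows = list(curves.values())
    return [sum(column) / len(rows) for column in zip(*rows)]


# Placeholder data (assumption): V_final in volts at alpha = 0, 0.01, 0.02, 0.03.
ALPHAS = [0.00, 0.01, 0.02, 0.03]
CELL_CURVES = {
    "INV":   [0.97, 0.96, 0.96, 0.95],
    "NAND3": [0.98, 0.97, 0.96, 0.95],
    "NOR3":  [0.96, 0.96, 0.95, 0.94],
    "AOI21": [0.96, 0.95, 0.94, 0.94],
}

estimate = average_curve(CELL_CURVES)
alpha = delay_margin(1.0e9, 0.97e-9)  # 1 GHz target, 0.97 ns signed-off delay
print(alpha, dict(zip(ALPHAS, estimate)))
```

A designer would read the averaged curve at the $\alpha$ expected at signoff to obtain the voltage used for both $V_{lib}$ and $V_{BTI}$.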

In summary, we can characterize an aging library for an AVS circuit if the following AVS-related information is available: (1)  $V_{init}$ , (2)  $V_{step}$ , (3)  $\Delta t$  and (4)  $F_{target}$  (relative to circuit  $F_{max}$  at  $t_0$ ).

<sup>2</sup>We have also simulated different sizes of gates, but the results show that size has smaller influence than gate complexity or topology.

TABLE I PARAMETERS OF THE BTI AGING MODEL.

| Parameter     | PBTI  | NBTI      |
|---------------|-------|-----------|
| $n$           | 3.3   | 2.5       |
| $A$           | 4.52  | $2e^{-3}$ |
| $\beta$       | 0.85  | 0.85      |
| $E_0$ (MV/cm) | 0.15  | 0.15      |
| $E_a$ (eV)    | 0.13  |           |
| $t_{ox}$ (nm) | 1.15  | 1.20      |
| $V_t$ (V)     | 0.492 | 0.494     |

TABLE II REFERENCE VOLTAGES USED IN OUR EXPERIMENTS.

|                  | Voltage (V) |
|------------------|-------------|
| $V_{max}$        | 1.05        |
| $V_{init}$       | 0.90        |
| $V_{heur1}$ (DC) | 0.97        |
| $V_{heur2}$ (DC) | 0.95        |
| $V_{heur1}$ (AC) | 0.95        |
| $V_{heur2}$ (AC) | 0.93        |

## IV. EXPERIMENT SETUP

## A. Aging Model

To predict the impact of BTI on design performance, we use the analytical model from [16]. The  $|V_t|$  degradation of a MOS transistor is given as

$$\begin{aligned} |\Delta V_t| &= \sqrt{K_v^2 \cdot (t - t_0)^{\frac{1}{n}}} \\ K_v &= A \cdot t_{ox} \cdot \sqrt{C_{ox}(V_{gs} - V_t)} \cdot \left[1 - \frac{V_{ds}}{\beta(V_{gs} - V_t)}\right] \\ &\quad \times \exp\left(\frac{V_{gs}}{E_0 t_{ox}}\right) \cdot \exp\left(\frac{-E_a}{kT}\right) \end{aligned} \tag{2}$$

where *t* is the total stress time of a transistor,  $t_0$  is the time when a circuit is turned on for the first time, *k* is the *Boltzmann* constant,  $t_{ox}$  is transistor oxide thickness, *T* is temperature,  $V_{gs}$  is gate-to-source voltage, and  $V_{ds}$  is drain-to-source voltage. In this paper, we assume both  $V_{gs}$  and  $V_{ds}$  are the same as  $V_{BTI}$ .  $\beta$ , *n* and *A* are fitting parameters with values as listed in Table I.<sup>3</sup>

To explore circuit-level performance degradation, we use the aforementioned calibrated transistor degradation model along with the 32nm PTM transistor model [13] to characterize the FreePDK library (i.e., the original 45nm BSIM model of FreePDK library is replaced by the 32nm BSIM model). Since the original PTM transistor model only has a typical corner, we characterize the worst-case corner of the PTM transistor model by perturbing the process parameters in the PTM model. We assume that the relative process variation between worst-case and typical corners in the PTM model is similar to that in a 32nm MOSIS design kit. We also scale all interconnect resistances and capacitances from 45nm to 32nm by a factor of 0.7 using a commercial place and route tool [4]. We obtain timing and power of the circuits using [15]. To model BTI degradation with varying  $V_{DD}$  we use the technique in [2], [17].<sup>4</sup>
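The effective-stress-time bookkeeping summarized in footnote 4 can be sketched as below. This is a simplified illustration under stated assumptions: the full expression (2) is replaced by a power law $|\Delta V_t| = K(V_{DD}) \cdot t^{1/(2n)}$, the prefactor `k_v` and its constants are hypothetical (arbitrary units), and the voltage schedule `avs_schedule` is invented rather than taken from any experiment in this paper.

```python
import math

N_PBTI = 3.3                    # exponent parameter n for PBTI (Table I)
M = 1.0 / (2.0 * N_PBTI)        # time exponent implied by sqrt(Kv^2 * t^(1/n))
DT_DAYS = 3.0                   # AVS calibration interval (Delta t)


def k_v(vdd, k0=1.0e-3, v0=0.1):
    """Hypothetical voltage prefactor; a stand-in for K_v in (2), not the fitted model."""
    return k0 * math.exp(vdd / v0)


def dvt_fixed(vdd, t_days):
    """Closed-form degradation under a constant supply voltage."""
    return k_v(vdd) * t_days ** M


def dvt_stepped(schedule, n_steps, dt_days=DT_DAYS):
    """Accumulate degradation interval by interval under a time-varying VDD."""
    acc = 0.0
    for i in range(n_steps):
        k = k_v(schedule(i * dt_days))
        t_eff = (acc / k) ** (1.0 / M)       # effective stress time at the current voltage
        acc = k * (t_eff + dt_days) ** M     # advance one interval starting from t_eff
    return acc


def avs_schedule(t_days):
    """Hypothetical AVS trace: front-loaded 10 mV steps ending at 0.95 V."""
    steps = [(2000, 0.95), (1000, 0.94), (400, 0.93), (120, 0.92), (30, 0.91)]
    for start, vdd in steps:
        if t_days >= start:
            return vdd
    return 0.90


N_STEPS = 1217                                  # about 10 years of 3-day intervals
avs = dvt_stepped(avs_schedule, N_STEPS)
flat = dvt_fixed(0.95, N_STEPS * DT_DAYS)
print(f"AVS / flat V_final degradation ratio: {avs / flat:.3f}")
```

Under this toy model the stepped-voltage result stays below, but close to, the flat-$V_{final}$ result, which is the qualitative behavior reported for Figure 4.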

## B. Circuit Implementation

To evaluate the impact of AVS on aging-aware signoff, we compare the area and power of circuits that are signed off with different derated libraries. We set up experiments by implementing four benchmark circuits: c5315, c7552 [3], AES, and MPEG2 [12]. Library characterizations are carried out based on a 32nm PTM BSIM model with SS corner setting. The circuits are obtained through the following steps:

<sup>3</sup>We fit the parameters *A*, *E*<sub>0</sub>, and  $\beta$  based on a set of BTI data in [19]. Then, we extract the values of *n* for PBTI and NBTI from their corresponding measurement plots in [19]. The value of *E<sub>a</sub>* is obtained from [16].

<sup>4</sup>This technique can be summarized as follows. Whenever the  $V_{DD}$  is changed at time  $t_i$ , we record the accumulated  $\Delta |V_t|$  as  $\Delta V_{t_i}^{acc}$ . Based on the  $\Delta V_{t_i}^{acc}$ , we calculate the *effective stress time*  $t'_i$  using the relationship between  $\Delta V_t$  and t, which can be obtained from the aging model (2) with  $V_{ds}=V_{gs}=V_{DD}+V_{step}$ . After that, the  $\Delta |V_t|$  for the  $i^{th}$  time interval ( $\Delta |V_{t_i}|$ ) can be obtained by calculating the difference between  $\Delta |V_t|$  at  $t'_i$  and  $t'_i + \Delta t$ . Finally, the accumulated  $|V_t|$  degradation is given as

$$|\Delta V_{t_i+\Delta t}^{acc}| = (|\Delta V_{t_i}^{acc}|^{\frac{1}{n}} + |\Delta V_{t_i}|^{\frac{1}{n}})^n$$

- (1) Define  $V_{init} = 0.9V$ ,  $\Delta t = 3$  days,  $V_{step} = 0.01V$  and  $F_{target}$  for each benchmark circuit. The clock constraints of the four designs are 1.38GHz, 1.25GHz, 893MHz, and 1.05GHz, respectively.
- (2) Implement (synthesis, place and route) each circuit using a library characterized with  $V_{lib}$ =0.9V,  $\Delta |V_t| = 0$ .
- (3) Mitigate EDA tool "noise" by making 10 separate "synthesis, placement and route" runs for each benchmark circuit with {-4, -3, ..., +4, +5}ps perturbation of the clock constraint, and generate a circuit [9]. Then, report metrics for the circuit with minimum area-power product among the 10 candidate circuits thus produced.
- (4) Run the flow in Figure 2 to ensure that the circuit does not violate timing constraints until the end-of-lifetime. Store the circuit (#5 in Table III) and  $V_{final}$ .
- (5) Sign off the same benchmark circuits using different derated libraries characterized with the four combinations: (1) (V<sub>init</sub>, V<sub>init</sub>), (2) (V<sub>init</sub>, V<sub>max</sub>), (3) (V<sub>max</sub>, V<sub>max</sub>), and (4) (V<sub>init</sub>, V<sub>final</sub> obtained from Step (4)). This step generates Columns #1~#4 in Table III.
- (6) Repeat Step (5) using derated libraries with  $V_{lib} = V_{BTI} = V_{heur1}$  and  $V_{lib} = V_{BTI} = V_{heur2}$ , where  $V_{heur1}$  and  $V_{heur2}$  are the predicted  $V_{final}$  values obtained with our proposed  $V_{final}$  estimation method.  $V_{heur1}$  and  $V_{heur2}$  are defined by  $\alpha = 0$  and  $\alpha = 0.03$ , respectively, to evaluate the results with different  $\alpha$ , since designers may keep some slack at signoff. This step generates circuits #6 and #7 in Table III.
- (7) Calculate runtime power of all circuits with AVS (i.e., the AVS mechanism in Figure 2).
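Step (3) of the procedure above can be sketched as a small selection loop. The function `toy_flow` is a hypothetical stand-in for a full synthesis, placement and routing run, and its area and power numbers are invented; only the perturbation set {-4, ..., +5} ps and the minimum area-power-product criterion come from the text.

```python
def pick_best_run(implement, target_period_ps, offsets_ps=range(-4, 6)):
    """Run the flow once per perturbed clock constraint and keep the
    candidate with the smallest area-power product."""
    candidates = []
    for offset in offsets_ps:
        area, power = implement(target_period_ps + offset)
        candidates.append((area * power, offset, area, power))
    return min(candidates)


def toy_flow(period_ps):
    """Hypothetical implementation flow: returns (area, power) for a clock period.
    The numbers mimic tool 'noise' and carry no physical meaning."""
    area = 1000.0 + 3.0 * abs(period_ps - 726.0)
    power = 2.0
    return area, power


best = pick_best_run(toy_flow, target_period_ps=725.0)
print(best)
```

In a real flow, `implement` would launch the EDA tools and parse their reports; the ten candidates differ only in the clock constraint handed to the tools.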

## V. EXPERIMENTAL RESULTS

To study potential implications of signoff choices on circuit area and power, we implement circuits with different derated libraries, as well as a reference circuit signed off with  $V_{lib} = V_{init}$  and no BTI degradation. The  $V_{lib}$  and  $V_{BTI}$  of the derated libraries are given in Table III. In Column #1, both  $V_{lib}$  and  $V_{BTI}$  are set to  $V_{init}$ . This setup represents the scenario where the effect of AVS is not considered during library characterization. In Column #2, we set  $V_{lib} = V_{init}$  but let  $V_{BTI} = V_{max}$  to model the worst-case scenario of a derated library. In Column #3, both  $V_{lib}$  and  $V_{BTI}$  are set to  $V_{max}$ . This represents another extreme scenario for the derated library, where the supply voltage of a circuit is assumed to increase to  $V_{max}$  to compensate for BTI degradation. The setup in Column #4 is similar to that in #2 but the  $V_{BTI}$  is defined by the  $V_{final}$  of the reference circuit. Note that this is an artificial setup because of the dependency between the  $V_{BTI}$  and the reference circuit. However, we use this setup to study the impact of ignoring the fact that  $V_{DD}$  varies due to AVS, even given that we have a reasonable estimation for BTI degradation. Column #5 in Table III represents the reference setup, which does not have a specific  $V_{lib}$  and  $V_{BTI}$  because both voltage values vary over time. Columns #6 and #7 are for the heuristic methods with  $\alpha = 0$  and 0.03, respectively. The values of  $V_{lib}$  and  $V_{BTI}$  are given in Table II.

Figure 6 plots the power and area tradeoff for all circuits, where we assume that each circuit increases supply voltage adaptively to compensate for DC BTI degradation. The results show that circuits implemented with different derated libraries have significant differences in power and area. For instance, circuits signed off with the setup in Column #2 of Table III have up to 35% larger area compared to other circuits. This is because the derated library is characterized with a worst-case BTI degradation, which leads to pessimistic circuit timing estimation. The results in Table III show that the  $V_{DD}$  of the circuits in Column #2 remains at  $V_{init}$  (0.9V) at the end of circuit lifetime. This means that AVS is not triggered to compensate for BTI degradation, due to the large timing margin resulting from a pessimistic signoff setup. The results also show that some benchmark circuits (c5315, c7552, AES) implemented with the setup in Column #2 have about 5% more power compared to the reference circuits. This is because the total numbers of instances for the circuits in Column #2 are much larger than for the reference circuits.

TABLE III

Implementation results with different derated libraries. Circuit lifetime = 10 years. Circuit area and power values are normalized to those of the reference circuits in Col. #5.

| Circuit #                              |                |       | 1          | 2          | 3         | 4                 | 5    | 6                          | 7                             |
|----------------------------------------|----------------|-------|------------|------------|-----------|-------------------|------|----------------------------|-------------------------------|
| $V_{lib}$                              |                |       | $V_{init}$ | $V_{init}$ | $V_{max}$ | $V_{init}$        | N/A  | $V_{heur1}$ ($\alpha = 0$) | $V_{heur2}$ ($\alpha = 0.03$) |
| $V_{BTI}$                              |                |       | $V_{init}$ | $V_{max}$  | $V_{max}$ | $V_{final}$ of #5 | N/A  | $V_{heur1}$                | $V_{heur2}$                   |
| $V_{DD}$ (V) at 10-year lifetime point | DC degradation | c5315 | 0.90       | 0.90       | 0.98      | 0.90              | 0.93 | 0.91                       | 0.90                          |
|                                        |                | c7552 | 0.90       | 0.90       | 0.98      | 0.90              | 0.94 | 0.94                       | 0.92                          |
|                                        |                | AES   | 0.90       | 0.90       | 1.00      | 0.90              | 0.96 | 0.95                       | 0.94                          |
|                                        |                | MPEG2 | 0.90       | 0.90       | 1.00      | 0.90              | 0.95 | 0.94                       | 0.95                          |
|                                        | AC degradation | c5315 | 0.90       | 0.90       | 1.00      | 0.90              | 0.91 | 0.90                       | 0.90                          |
|                                        |                | c7552 | 0.90       | 0.90       | 1.01      | 0.90              | 0.92 | 0.93                       | 0.90                          |
|                                        |                | AES   | 0.90       | 0.90       | 1.01      | 0.90              | 0.93 | 0.94                       | 0.93                          |
|                                        |                | MPEG2 | 0.90       | 0.90       | 1.02      | 0.90              | 0.93 | 0.94                       | 0.91                          |

Fig. 6. Power-to-area tradeoff among all circuit implementations of each of the four designs, under DC degradation. In each plot, we show the average runtime power and area of the  $\#1\sim\#7$  implementations for a given design. The (blue) circles of #3 tend to have higher power consumption because of the underestimation of degradation. The (red) squares of #1, #2, and #4 tend to have higher area because of the overestimation. The (black) diamonds of other circuits tend to be more balanced between the two extremes.



Fig. 7.  $V_{DD}$  and  $F_{max}$  of three MPEG2 circuit implementations obtained with different derated libraries. The voltage of circuit #2 is fixed at  $V_{init}$  because it has large margin for degradation. This is due to the signoff corner for circuit #2 being too pessimistic. By contrast,  $V_{DD}$  of circuit #3 rises higher than that of circuit #5 soon after manufacturing, as a result of the signoff corner for circuit #3 being too optimistic.

Figure 6 shows that when more accurate BTI degradation information is available (i.e., setup #4), the derated library is less pessimistic, which leads to smaller area overheads. However, the circuit areas are 1% to 13% larger compared to the reference circuits because the derated library does not consider that supply voltage will be higher than  $V_{init}$  due to AVS. Since the derated library is pessimistic, the  $V_{DD}$  of the circuits in Column #4 remain at  $V_{init}$  (0.9V) at the 10-year lifetime point (see Table III). Therefore, the circuits in Column #4 have 4% to 9% lower power compared to the reference circuits.

In the case where the BTI degradation is underestimated and potential  $V_{DD}$  increment is ignored (i.e., setup #1), the inaccurate estimations compensate each other. Therefore, the area and power of the circuits implemented with such a derated library will have only small differences (< 8%) from the corresponding values for the reference circuit. However, the quality of results (QoR) of circuits implemented with this derating setup is unpredictable, as the outcomes depend on the magnitude of BTI degradation and the sensitivity of circuit performance to AVS.

On the other hand, Figure 6 shows that circuits in Column #3 always have 10% more power compared to the reference circuit. Table III shows that the  $V_{DD}$  of the circuits at 10-year lifetime point is much larger than that of the reference circuit. This indicates that the derated library is optimistic. Therefore, circuits signed off using this derated library will require higher supply voltages to compensate for performance degradation. This shows that an optimistic derated library can cause significant power overhead.

Figure 7 shows the  $V_{DD}$  and the corresponding  $F_{max}$  of the MPEG2 benchmark circuit over 10 years. When the signoff corner is too optimistic (#3), the implemented circuits fail to meet timing constraints due to BTI degradation. Therefore, the  $V_{DD}$  of the circuit is increased to a higher level than for the reference circuit (#5). On the other hand, the circuits in Column #2 have too much timing margin (no  $V_{DD}$  increment over lifetime even with aging) because the signoff corner is too pessimistic.
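The behavior in Figure 7 can be reproduced qualitatively with a toy closed-loop AVS emulation. The sketch below is not the simulation flow used in this paper: it assumes an alpha-power-law delay model, a power-law BTI model with exponential voltage acceleration, and an equivalent-stress-time update when $V_{DD}$ changes. All constants (`ALPHA`, `VT0`, `V_STEP`, `V_MAX`, the BTI prefactor) and the `design_margin` parameter, which stands in for how pessimistic the signoff corner was, are hypothetical.

```python
import math

ALPHA = 1.3       # alpha-power-law exponent (assumed)
VT0 = 0.40        # fresh threshold voltage in volts (assumed)
N_EXP = 1.0 / 6   # BTI time exponent (assumed)
V_INIT = 0.90     # initial supply voltage in volts
V_STEP = 0.01     # AVS voltage increment in volts (assumed)
V_MAX = 1.10      # upper limit of the AVS range in volts (assumed)


def fmax(vdd, dvt):
    """Normalized maximum frequency; equals 1.0 for a fresh device at V_INIT."""
    def speed(v, vt):
        return (v - vt) ** ALPHA / v
    return speed(vdd, VT0 + dvt) / speed(V_INIT, VT0)


def bti_prefactor(vdd):
    """Toy voltage acceleration of BTI; the constants are illustrative only."""
    return 0.03 * math.exp(3.0 * (vdd - V_INIT))


def age(dvt, vdd, dt_years):
    """Advance the threshold shift by dt_years of stress at supply vdd.

    The accumulated shift is first converted to an equivalent stress time
    at the current voltage, then extended by dt_years.
    """
    k = bti_prefactor(vdd)
    t_eq = (dvt / k) ** (1.0 / N_EXP)
    return k * (t_eq + dt_years) ** N_EXP


def emulate_avs(design_margin, years=10.0, steps=120):
    """Return the final VDD of a circuit whose fresh Fmax is
    (1 + design_margin) times the target frequency."""
    vdd, dvt = V_INIT, 0.0
    dt = years / steps
    for _ in range(steps):
        dvt = age(dvt, vdd, dt)
        # AVS raises the supply until the timing target is met again.
        while fmax(vdd, dvt) * (1.0 + design_margin) < 1.0 and vdd < V_MAX:
            vdd = round(vdd + V_STEP, 4)
    return vdd


if __name__ == "__main__":
    for label, margin in [("pessimistic", 0.20), ("moderate", 0.05), ("optimistic", 0.0)]:
        print(f"{label:12s} signoff: final VDD = {emulate_avs(margin):.2f} V")
```

Under these assumed constants, a circuit signed off with a large margin stays at $V_{init}$ for the whole lifetime, while a circuit with no margin is pushed to a higher supply voltage early on and keeps rising, which mirrors the contrast between circuits #2 and #3.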

In Figure 6, we can see that circuits #6 and #7, which are implemented using derated libraries obtained from our heuristic approach, have less than 1% area and less than 5% power difference compared to the reference circuit. This shows that the derated library characterized based on our method can simultaneously capture the effects of the BTI degradation and the varying of  $V_{DD}$  due to AVS. Moreover, the circuits can be obtained through a single signoff step, unlike the reference circuits, which require multiple timing analysis and signoff iterations. We also note that the results of #6 and #7 are similar even though the derated libraries have a 3% target slack difference. This suggests that our method is not sensitive to small changes in target slack.

Figure 8 shows the results of the same experiment setup, but with AC BTI degradation. We see that the results are qualitatively similar to those obtained with DC degradation. Since the AC BTI degradation is about 60% of that in the DC condition, the power/area differences between the circuits are reduced.

Area differences among different MPEG2 circuit implementations are relatively smaller than those observed for the other three designs, in both AC and DC cases. This is because the ratio of sequential cells (registers) to total cells in the MPEG2 testcase ( $\sim$ 50%) is larger than in the other testcases (e.g.,  $\sim$ 20% for AES circuit implementations). In the FreePDK cell library [21], there is only one size option for the sequential cells. Therefore, about half of the cells in MPEG2 cannot be resized even if the timing margins are different across the derated libraries. This explains the smaller area differences for MPEG2 across different derated libraries.
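As a rough illustration of this argument, let $s$ be the fraction of total area occupied by sequential cells and let $\delta_{comb}$ be the relative area change of the resizable combinational cells between two derated libraries. Both symbols are introduced here for illustration only, and using the cell-count ratios above as a proxy for area fractions is an assumption. If the sequential cells cannot be resized, the relative change in total area is approximately

$$\frac{\Delta A}{A} \approx (1 - s)\,\delta_{comb}.$$

With $s \approx 0.5$ the total area moves by about half of $\delta_{comb}$, whereas with $s \approx 0.2$ it moves by about $0.8\,\delta_{comb}$, so the same change in timing margin produces a visibly smaller area spread for MPEG2.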

The results in Figures 6 and 8 show that characterizing a derated library with our proposed method can accurately estimate the effect of BTI aging of a circuit with AVS. The improved estimation can reduce design effort. For example, circuits implemented using the



Fig. 8. Power-to-area tradeoff among all circuit implementations of each of the four designs, under AC degradation. The (blue) circles of #3 tend to have higher power consumption because of the underestimation of degradation. The (red) squares of #1, #2, and #4 tend to have higher area because of the overestimation of degradation. The (black) diamonds of other circuits tend to be more balanced between the two extremes.

derated libraries #1, #2, #3 and #4 will incur area or power penalty due to inaccurate estimation of BTI aging. Moreover, designers can only discover the inaccuracy after circuit implementation and AVS emulation. Hence, the circuits implemented using an inaccurate derated library may require additional design effort (e.g., sizing, AVS emulation and signoff) to reduce power and circuit area.

We observe that with the widespread adoption of AVS, an alternative signoff methodology emerges: namely, to sign off circuits using a library characterized with an *un-aged* device model. Such a methodology would leverage the presence of AVS by assuming that AVS would compensate for any BTI degradation during lifetime. And, if the library does not include any margin for BTI degradation, such a methodology will potentially save circuit area. On the other hand, such an approach does not verify the design at  $V_{final}$  during signoff. Therefore, it is possible that the implemented circuit does not meet design requirements (including performance through lifetime) and requires another signoff iteration. For example, the circuit signed off at  $V_{init}$  may have new EM violations at  $V_{final}$  which cannot be identified during the signoff. Investigation of such an alternative signoff methodology, and its implications, is a subject for future work.

#### VI. CONCLUSION

In this paper, we study a fundamental discrepancy concerning the voltages that are applied for aging-derated library characterization, and the voltage through lifetime of a circuit with AVS, namely  $V_{lib}$,  $V_{BTI}$  and  $V_{final}$. Because of the inconsistency among these voltages, the derated library can be either optimistic or pessimistic with respect to the impact of BTI degradation and AVS, depending on the values of  $V_{lib}$  and  $V_{BTI}$. Our experimental results show that circuit implementations using different derated libraries can have up to 35% difference in circuit area and up to 20% difference in runtime power.

To avoid the design overhead that potentially arises from poor selection of  $V_{lib}$  and  $V_{BTI}$  during library characterization, we propose a library characterization guideline, which suggests that  $V_{lib} = V_{BTI} \approx V_{final}$  is the best strategy for aging-derated library characterization. We also point out that the inconsistency among  $V_{lib}$ ,  $V_{BTI}$  and  $V_{final}$  is a "chicken and egg" problem, in that  $V_{final}$  is required for library characterization but is not available before the circuit is implemented with the derated library. We solve this problem by estimating the  $V_{final}$  from simple replica circuits and AVS parameters available early in the design process. Our experimental results show that the circuits implemented using derated libraries obtained from our methodology have less than 2% area and 5% power differences compared to a reference circuit. This suggests that the derated library obtained using our methodology accurately captures the combined impact of BTI degradation and AVS.
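To make the guideline concrete, the following sketch classifies a characterization corner ($V_{lib}$, $V_{BTI}$) as pessimistic or optimistic relative to a self-consistent $V_{final}$. It is a toy model, not our characterization flow: the alpha-power-law delay, the BTI expression in `dvt_after`, the worst-case assumption that the circuit is stressed at $V_{final}$ for its entire lifetime, and every numeric constant are assumptions made for illustration. The bisection in `solve_v_final` only pictures the "chicken and egg" dependency and is not the replica-based estimation described above.

```python
import math

ALPHA = 1.3        # alpha-power-law exponent (assumed)
VT0 = 0.40         # fresh threshold voltage in volts (assumed)
V_INIT = 0.90      # initial supply voltage in volts
LIFETIME = 10.0    # years


def dvt_after(v_stress, years=LIFETIME):
    """Toy BTI threshold shift after `years` of stress at v_stress (assumed model)."""
    return 0.03 * math.exp(3.0 * (v_stress - V_INIT)) * years ** (1.0 / 6)


def delay(v_lib, v_bti):
    """Relative gate delay seen by a library characterized at supply v_lib
    with a device model aged under stress voltage v_bti."""
    vt = VT0 + dvt_after(v_bti)
    return v_lib / (v_lib - vt) ** ALPHA


def solve_v_final(lo=V_INIT, hi=1.2, tol=1e-4):
    """Bisection for the supply V at which an aged gate (stressed at V,
    operated at V) is as fast as a fresh gate at V_INIT."""
    target = V_INIT / (V_INIT - VT0) ** ALPHA
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay(mid, mid) > target:
            lo = mid   # still too slow, so a higher supply is needed
        else:
            hi = mid
    return hi


def classify(v_lib, v_bti, v_final, tol=0.005):
    """Compare the library's delay estimate with the delay at end of life."""
    ratio = delay(v_lib, v_bti) / delay(v_final, v_final)
    if ratio > 1.0 + tol:
        return "pessimistic"
    if ratio < 1.0 - tol:
        return "optimistic"
    return "consistent"


if __name__ == "__main__":
    v_final = solve_v_final()
    print(f"estimated V_final = {v_final:.3f} V")
    for v_lib, v_bti in [(V_INIT, V_INIT), (v_final, V_INIT), (v_final, v_final)]:
        print(f"V_lib={v_lib:.3f} V_BTI={v_bti:.3f} -> {classify(v_lib, v_bti, v_final)}")
```

In this toy setting, characterizing both timing and aging at $V_{init}$ overestimates delay, characterizing timing at $V_{final}$ with aging taken from $V_{init}$ underestimates it, and only $V_{lib} = V_{BTI} = V_{final}$ matches the end-of-life delay.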

Our ongoing work pursues (1) a comprehensive aging- and AVS-aware library characterization for PVT corners; (2) signoff of hold time violation considering degradation of the clock distribution network; and (3) extension of our methodology for aging-aware library characterization to contexts where the actual circuit consists of devices with different BTI characteristics.

#### REFERENCES

- [1] M. A. Alam, K. Roy and C. Augustine, "Reliability- and Process-Variation Aware Design of Integrated Circuits - A Broader Perspective", *Proc. IEEE Intl. Reliability Physics Symposium*, 2011, pp. 4A.1.1-4A.1.11.
- [2] M. Basoglu, M. Orshansky and M. Erez, "NBTI-Aware DVFS: A New Approach to Saving Energy and Increasing Processor Lifetime", *Proc. ISLPED*, 2010, pp. 253-258.
- [3] F. Brglez and H. Fujiwara, "A Neutral Netlist of 10 Combinational Benchmark Circuits and A Target Translator in FORTRAN", *Proc. ISCAS*, 1985, pp. 677-692.
- [4] Cadence Design Systems, SoC Encounter, http://www.cadence.com/rl/Resources/datasheets/soc\_encounter\_ds.pdf.
- [5] T.-B. Chan, J. Sartori, P. Gupta and R. Kumar, "On the Efficacy of NBTI Mitigation Techniques", *Proc. DATE*, 2011, pp. 1-6.
- [6] T. Grasser, B. Kaczer, W. Goes, H. Reisinger, T. Aichinger, P. Hehenberger, P. J. Wagner, F. Schanovsky, J. Franco, M. T. Luque and M. Nelhiebel, "The Paradigm Shift in Understanding the Bias Temperature Instability: From Reaction-Diffusion to Switching Oxide Traps", *IEEE Trans. on Electron Devices* 58(11) (2011), pp. 3652-3666.
- [7] V. Huard, N. Ruiz, F. Cacho and E. Pion, "A Bottom-Up Approach for System-On-Chip Reliability", *Microelectronics Reliability* 51(9-11) (2011), pp. 1425-1439.
- [8] B. Kaczer, S. Mahato, V. V. de A. Camargo, M. T. Luque, Ph. J. Roussell, T. Grasser, F. Catthoor, P. Dobrovolny, P. Zuber, G. Wirth and G. Groeseneken, "Atomistic Approach to Variability of Bias-Temperature Instability in Circuit Simulations", *Proc. IEEE Intl. Reliability Physics Symposium*, 2011, pp. XT.3.1-XT.3.5.
- [9] K. Jeong and A. B. Kahng, "Methodology from Chaos in IC Implementation", *Proc. ISQED*, 2010, pp. 885-892.
- [10] S. V. Kumar, C. H. Kim and S. S. Sapatnekar, "Adaptive Techniques for Overcoming Performance Degradation due to Aging in CMOS Circuits", *IEEE Trans. on VLSI Systems* 19(4) (2011), pp. 603-614.
- [11] E. Mintarno, J. Skaf, R. Zheng, J. B. Velamala, Y. Cao, S. Boyd, R. W. Dutton and S. Mitra, "Self-Tuning for Maximized Lifetime Energy-Efficiency in the Presence of Circuit Aging", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 30(5) (2011), pp. 760-773.
- [12] OpenCores, http://opencores.org/.
- [13] PTM Model, http://ptm.asu.edu/.
- [14] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, "The Impact of Technology Scaling on Lifetime Reliability", *Proc. IEEE Intl. Conf. on Dependable Systems and Networks*, 2004, pp. 177-186.
- [15] Synopsys PrimeTime, http://www.synopsys.com/Tools/Implementation/SignOff/Pages/PrimeTime.aspx
- [16] R. Vattikonda, W. Wang and Y. Cao, "Modeling and Minimization of PMOS NBTI Effect for Robust Nanometer Design", *Proc. DAC*, 2006, pp. 1047-1052.
- [17] J. B. Velamala, V. Ravi and Y. Cao, "Failure Diagnosis of Asymmetric Aging Under NBTI", *Proc. ICCAD*, 2011, pp. 428-433.
- [18] W. Wang, S. Yang and Y. Cao, "Node Criticality Computation for Circuit Timing Analysis and Optimization Under NBTI Effect", *Proc. ISQED*, 2009, pp. 763-768.
- [19] S. Zafar, Y. H. Kim, V. Narayanan, C. Cabral Jr., V. Paruchuri, B. Doris, J. Stathis, A. Callegari and M. Chudzik, "A Comparative Study of NBTI and PBTI (Charge Trapping) in SiO2/HfO2 Stacks with FUSI, TiN, Re Gates", *Proc. IEEE Symp. on VLSI Technology*, 2006, pp. 23-25.
- [20] L. Zhang and R. P. Dick, "Scheduled Voltage Scaling for Increasing Lifetime in the Presence of NBTI", *Proc. ASP-DAC*, 2009, pp. 492-497.
- [21] 45nm FreePDK process design kit, www.eda.ncsu.edu/wiki/FreePDK.