# A Novel Delay Calibration Method Considering Interaction between Cells and Wires

Leilei Jin<sup>1</sup>, Jiajie Xu<sup>1</sup>, Wenjie Fu<sup>1</sup>, Hao Yan<sup>1</sup>, Xiao Shi<sup>2</sup>, Ming Ling<sup>1</sup>, Longxing Shi<sup>1</sup>

<sup>1</sup>The National ASIC System Engineering Technology Research Center, Southeast University

<sup>2</sup>School of Computer Science and Engineering, Southeast University

Nanjing, China

{jinleilei, jiajiex, wenjfu, yanhao, xshi, trio, lxshi}@seu.edu.cn

Abstract—In advanced technology nodes, the accuracy of cell and wire delay modeling is the key metric for timing analysis. However, when the supply voltage decreases to the near-threshold regime, complicated process variation effects make the cell delay and the wire delay hard to model. Most researchers study cell or wire delay separately, ignoring the interaction between them. In this paper, we propose an N-sigma delay model by characterizing different sigma levels ($-3\sigma$ to $+3\sigma$) of the cell and wire delay distributions. The N-sigma cell delay model is represented by the first four moments and calibrated by the operating conditions (input slew, output load). Meanwhile, based on the Elmore model, the wire delay variability is calculated by considering the effect of the drive and load cells. The delay models are verified through the ISCAS85 benchmarks and the functional units of the PULPino processor with TSMC 28 nm technology. Compared to the SPICE results, the average errors for estimating the $+/-3\sigma$ cell delay are 2.1% and 2.7% and those of the wire delay are 2.4% and 1.6%, respectively. The errors of path delay analysis stay below 6.6% and the speed is 103X over SPICE MC simulations.

Keywords—process variations, cell delay, wire delay, timing analysis

## I. INTRODUCTION

Static timing analysis is one of the main procedures in the circuit design flow [1]. It provides both the basis for physical design optimizations and the metric for timing sign-off [2]. During chip fabrication, manufacturing restrictions cause various variations, which lead to unpredictable deterioration of the electrical characteristics of transistors and interconnects in an advanced technology [3]. Although device-level simulators can capture golden results through Monte Carlo (MC) sampling, the simulation is impractically slow. Statistical timing analysis, in contrast, is widely studied to quantify the impact of process variations by introducing the probability density function (PDF) of the delay distribution. For statistical analysis, the industry has explored several methods for handling process variation effects, such as the Liberty Variation Format (LVF) [4]. It calculates delay variation by indexing the input slew and the output load to determine both the mean and the variation of cells and wires. The effective capacitance is added to the output load of cells, representing the effect of connected wires. For the wire delay, Elmore is the most popular metric [14], using the first moment of the interconnect delay. A drawback of Elmore is that it neglects the wire delay variability caused by connected cells and process variations.

As revealed in [6], the interconnect delay is affected by several parameters (e.g., input slew, wire length). Moon et al. [5] adjust the endpoint slacks by additive calibration factors referring to PrimeTime [7] reports. Since this mechanism requires frequent calibrations to keep the average errors low, a simpler method that assigns a correction factor to every RC tree is proposed in [8]. However, computing the corresponding correction factors is also not straightforward. Recently, Cheng et al. used a machine-learning-based method to ensure accuracy, but it is overcomplicated [9]. Overall, the existing calibration methods depend purely on the accuracy of the referenced timing tool, without circuit-level design insights into the origin of the dominant interconnect variability contributions.

In this paper, the proposed *N*-sigma model is built using skewness and kurtosis to quantify the $n\sigma$ quantiles of the cell delay, considering the effect of operating conditions. Meanwhile, the $n\sigma$ quantiles of the wire delay are accurately modeled under the influence of drive/load cells. Given the interaction between cells and wires, the parameters of the cell and wire models are calibrated considering the effect of each other. Finally, the $n\sigma$ quantiles of the model estimate the $+/-3\sigma$ delay precisely. As illustrated in Fig. 1, the path delay is calculated through timing propagation based on the cell delay (in blue) and the wire delay (in purple).

The main contributions can be summarized as follows:

1) To describe the cell delay distribution, we propose an *N*-sigma model using skewness and kurtosis. The coefficients of  $n\sigma$  quantiles are calculated through linear regression with the cell delay moments, which are calibrated considering the operating condition effects. For example, the operating condition of *Cell*<sub>FI</sub> in Fig. 1 is the input slew of pin A1 and the output load *C*<sub>FI</sub>.

2) Considering the effect of process variations, the proposed *N*-sigma wire delay model is derived using the wire variability and the Elmore model. The wire variability is calibrated by cell-specific coefficients determined by the drive/load cells. For instance, the wire delay of the RC tree in Fig. 1 is calibrated by the drive cell $Cell_{FI}$ and the load cell $Cell_{FO}$.

3) The correlation between cells and wires is revealed in this work. Based on the input slew and the output load, the moments of cell delay are calibrated through the interpolation method. For the wire delay model, its variability can be accounted for by the driver/load cell strengths and the number of stacked transistors.

Fig. 1. A brief overview diagram of the path delay calculation flow.

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62274034 and Grant 61974024.

The rest of this paper is structured as follows: Section II introduces the related works. Section III describes the cell delay modeling process. Section IV gives the details of the wire delay model. The accuracy of the models is discussed in Section V.

## II. RELATED WORKS

#### A. Cell Delay Models

Cell delay models can be divided into two categories: those using analytical expressions and those using empirical models. Analytical approaches provide relationships between cell delays and drain currents. The detailed drain current model is studied by considering process variations and the load capacitance of the cell [10]. However, at near-threshold voltages, it is hard to build an accurate delay expression because of the intricate effect of process variations. Therefore, recently proposed analytical expressions are based on simplified scenarios, for example, only taking the threshold voltage variation into consideration [10]. On the other hand, empirical models directly estimate the PDF of the delay distribution by introducing a Gaussian random variable. As the delay distribution becomes asymmetric at near-threshold voltages, log-skew-normal based models have been proposed that take the logarithm of the delay and then fit it to a skew-normal density function [11] [12]. The data for building the model comes from Monte Carlo simulations. These empirical models treat the cell as a black box and construct the relationship between inputs and outputs through fitting or regression. Unlike the analytical methods, which construct the relationship between cell delay and process variations with complicated expressions [13], the empirical models directly fit the shape of the PDF and show higher accuracy [12].

#### B. Wire Delay Models

With increasing interconnect resistances and aggressive metal pitch scaling, the soaring RC delay overshadows the benefit of advanced device architectures and results in severe design issues [2]. For interconnect delay analysis, many metrics are widely used, from the explicit Elmore delay (delay with the first moment) expression to D2M (delay with two moments) [14][9]. Existing metrics characterize the dependence of the wire delay on the parasitic files regardless of the topology of the connected cells, leading to large errors compared to SPICE simulations. In addition, model order reduction (MOR) approaches have recently been researched extensively [15]. However, all established MOR methodologies result in dense system matrices that render their simulation impractical, since the simulation cost can easily overshadow the benefits obtained from dimension reduction. By referring to the results of the sign-off timers, additive calibration factors are used to adjust the endpoint slacks [5]. Some machine-learning-based methods with a sophisticated process are also proposed for wire delay analysis [9]. In addition, a leaf-specific delay correction method is proposed to calibrate the Elmore results and improve their accuracy [8]. Nevertheless, the mechanism for building the delay correction factor is ambiguous and open to various physical interpretations. As revealed in [16], the wire delay is dominated not only by the interconnect structure, through which it increases quadratically with the wire length, but also by the driver/load cells. Thus, a detailed wire delay model is needed to reveal the origin of the dominant interconnect variability contributions.

## III. STATISTICAL CELL DELAY MODELING

#### A. Cell Delay Modeling

Estimating the full shape of the delay distribution is non-trivial. To ensure the timing yield during chip sign-off, the most important information for the designer is the 99.86% quantile, taken as the worst-case delay [1]. Under the traditional assumption of a Gaussian distribution, the 99.86% quantile is equal to $\mu + 3\sigma$ (mean $\mu$ and standard deviation $\sigma$). Inspired by this, this paper denotes the 0.14%, 2.28%, 15.87%, 50%, 84.13%, 97.72%, and 99.86% quantiles of a Gaussian-like delay distribution as the sigma levels ($-3\sigma$, $-2\sigma$, $-\sigma$, $0\sigma$, $+\sigma$, $+2\sigma$, $+3\sigma$). Fig. 2 shows the probability density functions (PDFs) of an inverter delay for supply voltages from 0.5V to 0.8V. $T_c$ represents the cell delay in this paper. If the shaded samples account for 15.87% of the total MC samples, the delay value, $T_c(-\sigma) = 12.15ps$, is taken as the $-\sigma$ sigma level, i.e., the 15.87% quantile.
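The mapping between sigma levels and quantiles can be checked with a short Python sketch. It assumes a standard Gaussian for the level-to-probability conversion and a nearest-rank rule for reading a quantile from MC samples; the function names are illustrative and not taken from the paper.

```python
import math


def sigma_level_to_probability(n: float) -> float:
    """Cumulative probability of a standard Gaussian at n standard deviations."""
    return 0.5 * (1.0 + math.erf(n / math.sqrt(2.0)))


def empirical_quantile(samples, probability: float) -> float:
    """Nearest-rank quantile of a list of Monte Carlo delay samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Rank of the smallest sample whose cumulative share reaches `probability`.
    rank = max(1, math.ceil(probability * len(ordered)))
    return ordered[rank - 1]


def delay_at_sigma_level(samples, n: float) -> float:
    """Delay value that plays the role of T_c(n*sigma) for the given samples."""
    return empirical_quantile(samples, sigma_level_to_probability(n))


# Print the probabilities associated with the seven sigma levels.
for level in range(-3, 4):
    print(level, round(100.0 * sigma_level_to_probability(level), 2))
```

With 10k samples, `delay_at_sigma_level(samples, 3)` returns the sample that sits at roughly the 99.86% position of the sorted list.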

The delay distribution in the near-threshold regime becomes asymmetric, with a longer tail. In this case, the $\pm 3\sigma$ quantiles no longer equal $\mu \pm 3\sigma$ as the traditional Gaussian assumption would predict. The third moment, skewness $\gamma$, can be used to describe asymmetry in a non-Gaussian distribution [12]. Furthermore, kurtosis $\kappa$, the fourth moment, is introduced to describe the thickness of a distribution's tail [17]. Fig. 3 shows the PDFs with different values of skewness and kurtosis. Compared with a Gaussian distribution (skewness = 0, kurtosis = 3), positive skewness in Fig. 3(a) pushes the bulk of the delay distribution to the left and lengthens its right tail, shifting the $n\sigma$ quantiles to the left. Kurtosis (>0) in Fig. 3(b) results in a taller peak, moving the $n\sigma$ quantiles away from their original positions.
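As a minimal sketch of how the four moments could be extracted from MC delay samples, the Python function below uses population (biased) estimators and the non-excess definition of kurtosis, so that a Gaussian gives 3 as stated in the text. The choice of estimator and the function name are assumptions; the paper does not specify them.

```python
import math


def delay_moments(samples):
    """Return (mean, std, skewness, kurtosis) of a list of delay samples.

    Population estimators are used; kurtosis is the plain fourth standardized
    moment (Gaussian = 3), not the excess kurtosis.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    m2 = sum(d ** 2 for d in centered) / n  # variance
    m3 = sum(d ** 3 for d in centered) / n  # third central moment
    m4 = sum(d ** 4 for d in centered) / n  # fourth central moment
    if m2 == 0.0:
        raise ValueError("samples have zero variance")
    sigma = math.sqrt(m2)
    skewness = m3 / sigma ** 3
    kurtosis = m4 / m2 ** 2
    return mean, sigma, skewness, kurtosis
```

A sample set with a long right tail, such as near-threshold delays, yields a positive skewness from this function.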

Based on the first four moments $[\mu, \sigma, \gamma, \kappa]$, we propose an *N*-sigma cell delay model to estimate those quantiles. The input sets for the regressions are the moments $[\mu, \sigma, \gamma, \kappa]$ of the cell delay, obtained through MC simulations. The output set for the regressions is the quantiles representing the sigma levels ($-3\sigma$ to $+3\sigma$), also captured by MC. As can be seen in Fig. 3(a), the skewness affects the sigma points between $-2\sigma$ and $+2\sigma$ more strongly than the $\pm 3\sigma$ points. So, $T_{-\sigma}$, $T_{+\sigma}$, $T_{-2\sigma}$, $T_{+2\sigma}$, and $T_{0\sigma}$ are built considering the skewness effect with the term $\sigma\gamma$. Considering the kurtosis effect in Fig. 3(b), the $\pm 3\sigma$ and $\pm 2\sigma$ points show larger divergences. Hence, $T_{-2\sigma}$, $T_{+2\sigma}$, $T_{-3\sigma}$, and $T_{+3\sigma}$ in Table I are built with the term $\sigma\kappa$ correspondingly. In addition, the $n\sigma$ quantiles are affected by the skewness and the kurtosis at the same time, so the cross term $\gamma\kappa$ must also be considered. $A_{ni}$ and $B_{nj}$ ($0 \le i, j \le 2$) in Table I are the regression coefficients between moments and quantiles, obtained through MATLAB. Ultimately, the cell delay in a logic circuit can be expressed using the *N*-sigma quantile model. In more stringent situations, the sigma level can be extended to $\pm 6\sigma$ to maintain stability and avoid timing failure.

Fig. 2. The delay distribution of an inverter under different voltages (25°C).

Fig. 3. The effect of skewness and kurtosis on the characteristic of the quantiles ($-3\sigma$ to $+3\sigma$).
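The regression between moments and quantiles can be illustrated for a single sigma level. The sketch below fits the two coefficients of the $+3\sigma$ expression in Table I by ordinary least squares. The paper performs the regression in MATLAB and does not state the solver, so the Python translation, the normal-equation solution, and the function name are assumptions.

```python
def fit_plus3sigma_coefficients(moments, quantiles):
    """Least-squares fit of (A30, A31) in
    T(+3s) = mu + 3*sigma + A30*sigma*kappa + A31*gamma*kappa.

    moments   -- list of (mu, sigma, gamma, kappa) tuples, one per MC run set
    quantiles -- list of the corresponding +3 sigma delays read from MC
    """
    if len(moments) != len(quantiles) or len(moments) < 2:
        raise ValueError("need at least two matching observations")
    s11 = s12 = s22 = b1 = b2 = 0.0
    for (mu, sigma, gamma, kappa), t in zip(moments, quantiles):
        x1 = sigma * kappa            # kurtosis term
        x2 = gamma * kappa            # cross term
        y = t - (mu + 3.0 * sigma)    # residual after the Gaussian part
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("regressors are collinear")
    # Solve the 2x2 normal equations with Cramer's rule.
    a30 = (b1 * s22 - s12 * b2) / det
    a31 = (s11 * b2 - s12 * b1) / det
    return a30, a31
```

The other sigma levels follow the same pattern with the regressors listed in their rows of Table I.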

For a single cell, the moments are constant under given operating conditions (input slew and output load). When the cell is placed in a path, the impact of the output load capacitance is closely related to the fanout interconnects (i.e., wires) [16]. The cell delay is also affected by the drive current of the previous cell, which can be quantified by the input slew [21]. Hence, the effect of topological connections (driver/load cells or wires) on the current cell is reflected by the input slew *S* and the output load *C*, causing variability of the moments [21]. To ensure the generality of the *N*-sigma cell delay model, the moments need to be calibrated to accurately reflect the delay distribution of a cell in a path.

#### B. Cell Moments Calibration

Given a standard cell library, the propagation delays of the cells are analyzed for different operating conditions. For each cell type and input pin, the moments of the cell delay are calculated based on the samples extracted from a 10k MC analysis. Fig. 4 shows the effect of the operating conditions on the moments of an INV delay distribution. The purple curves reflect the moment changes as the input slew increases in equal steps (10 ps, 20 ps, ..., 300 ps) with a constant output load (0.4 fF). Similarly, the blue curves reflect the moment changes as the output load increases in equal steps (0.1 fF, 0.2 fF, ..., 6.0 fF) with a constant input slew (10 ps). In Fig. 4, the mean and standard deviation of the cell delay increase approximately linearly with the input slew and the output load. In contrast, the skewness and kurtosis change in a more complicated way as the input slew and the output load increase. They need a higher-order regression, closer to a cubic function.

TABLE I: THE N-SIGMA CELL DELAY QUANTILE EXPRESSIONS.

| Sigma level     | Percent defective | Quantile expression                                                                           |
|-----------------|-------------------|-----------------------------------------------------------------------------------------------|
| $T_c(-3\sigma)$ | 0.14%             | $\mu - 3 * \sigma + B_{30} * \sigma \kappa + B_{31} * \gamma \kappa$                          |
| $T_c(-2\sigma)$ | 2.28%             | $\mu - 2 * \sigma + B_{20} * \sigma \gamma + B_{21} * \sigma \kappa + B_{22} * \gamma \kappa$ |
| $T_c(-\sigma)$  | 15.87%            | $\mu - \sigma + B_{10} * \sigma \gamma + B_{11} * \gamma \kappa$                              |
| $T_c(0\sigma)$  | 50.00%            | $\mu + A_{00} * \sigma \gamma + A_{01} * \gamma \kappa$                                       |
| $T_c(+\sigma)$  | 84.13%            | $\mu + \sigma + A_{10} * \sigma \gamma + A_{11} * \gamma \kappa$                              |
| $T_c(+2\sigma)$ | 97.72%            | $\mu + 2 * \sigma + A_{20} * \sigma \gamma + A_{21} * \sigma \kappa + A_{22} * \gamma \kappa$ |
| $T_c(+3\sigma)$ | 99.86%            | $\mu + 3 * \sigma + A_{30} * \sigma \kappa + A_{31} * \gamma \kappa$                          |

Fig. 4. The first four moments of the INVx1 delay distribution under different operating conditions.

Fig. 5. The N-sigma cell delay model construction flow.

The *N*-sigma cell delay calculation process is given in Fig. 5. To account for the operating condition effects, a calibrated model of the cell delay moments is constructed, in which the change of the moments is modeled through interpolation based on SPICE MC simulations. First, the moments of the standard cell under a reference operating condition ($S_{ref} = 10ps$, $C_{ref} = 0.4fF$) are marked as the reference moments, $M_{ref} = [\mu_0, \sigma_0, \gamma_0, \kappa_0]$. Calculating $M_{ref}$ helps to characterize the effect of each operating condition separately. $\Delta S$ represents the difference between the input slew *S* and the reference slew $S_{ref}$, and $\Delta C$ is defined similarly for the output load:

$$\Delta S = S - S_{ref}; \ \Delta C = C - C_{ref} \tag{1}$$

Since $\mu$ and $\sigma$ vary approximately linearly with the operating conditions, the calibrated $\mu'$ and $\sigma'$ are calculated by the bilinear interpolation in (2). In addition, the cubic interpolation in (3) is adopted to calculate $\gamma'$ and $\kappa'$, accounting for the complicated variations caused by $\Delta S$ and $\Delta C$. The cross term $\Delta S \cdot \Delta C$ is considered in both (2) and (3) to ensure the accuracy of the interpolation method. Hence, the moments $M_{cell} = [\mu', \sigma', \gamma', \kappa']$ under the operating condition deviations {$\Delta S$, $\Delta C$} can be calculated through interpolation, where **P**, **Q**, **R**, and *K* are the coefficients of the operating condition deviations.

$$[\mu', \sigma'] = [\mu_0, \sigma_0] + \mathbf{P} \cdot [\Delta S, \Delta C] + K \cdot \Delta S \cdot \Delta C \tag{2}$$

$$[\gamma', \kappa'] = [\gamma_0, \kappa_0] + \mathbf{P} \cdot [\Delta S, \Delta C] + \mathbf{Q} \cdot [\Delta S^2, \Delta C^2] + \mathbf{R} \cdot [\Delta S^3, \Delta C^3] + K \cdot \Delta S \cdot \Delta C \tag{3}$$
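Equations (1) to (3) can be read per moment as a polynomial in $\Delta S$ and $\Delta C$. The Python sketch below evaluates that polynomial for one moment at a time; setting the quadratic and cubic coefficients to zero gives the form of (2), and keeping them gives the form of (3). The `MomentCoefficients` container, its field layout, and any numbers passed to it are hypothetical, since the paper only states that the coefficients are stored in a look-up table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MomentCoefficients:
    """Interpolation coefficients of one moment (hypothetical layout).

    Each pair holds the coefficient for the slew term and the load term.
    """
    p: Tuple[float, float] = (0.0, 0.0)  # linear terms
    q: Tuple[float, float] = (0.0, 0.0)  # quadratic terms, zero for mu and sigma
    r: Tuple[float, float] = (0.0, 0.0)  # cubic terms, zero for mu and sigma
    k: float = 0.0                       # cross term dS * dC


def calibrate_moment(m0: float, coef: MomentCoefficients,
                     slew_ps: float, load_ff: float,
                     s_ref_ps: float = 10.0, c_ref_ff: float = 0.4) -> float:
    """Calibrated moment for a given input slew (ps) and output load (fF)."""
    ds = slew_ps - s_ref_ps  # eq. (1)
    dc = load_ff - c_ref_ff  # eq. (1)
    return (m0
            + coef.p[0] * ds + coef.p[1] * dc
            + coef.q[0] * ds ** 2 + coef.q[1] * dc ** 2
            + coef.r[0] * ds ** 3 + coef.r[1] * dc ** 3
            + coef.k * ds * dc)
```

At the reference condition both deviations are zero, so the function returns the reference moment unchanged.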

The $n\sigma$ quantiles shown in Table I are then evaluated with the calibrated moments $[\mu', \sigma', \gamma', \kappa']$ instead of the reference moments $[\mu_0, \sigma_0, \gamma_0, \kappa_0]$. The coefficients $A_{ni}$ and $B_{nj}$ are fixed and still apply when the operating condition changes. All the coefficients **P**, **Q**, **R**, *K*, $A_{ni}$, and $B_{nj}$ mentioned above are calculated and stored as a coefficient file in the look-up table form shown in Fig. 5. By applying the proposed model, each cell's quantiles can be quantified for an arbitrary circuit netlist under any input slew and output load.
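The expressions of Table I translate directly into code once the moments and regression coefficients are known. The following Python sketch evaluates all seven sigma levels; the dictionary layout used for $A_{ni}$ and $B_{nj}$ is an assumption made for illustration, and no real coefficient values are implied.

```python
def n_sigma_cell_quantiles(mu, sigma, gamma, kappa, A, B):
    """Evaluate the Table I expressions for the levels -3 .. +3.

    A -- {0: (A00, A01), 1: (A10, A11), 2: (A20, A21, A22), 3: (A30, A31)}
    B -- {1: (B10, B11), 2: (B20, B21, B22), 3: (B30, B31)}
    (hypothetical layout of the coefficient look-up table)
    """
    sg = sigma * gamma  # skewness term
    sk = sigma * kappa  # kurtosis term
    gk = gamma * kappa  # cross term
    return {
        -3: mu - 3 * sigma + B[3][0] * sk + B[3][1] * gk,
        -2: mu - 2 * sigma + B[2][0] * sg + B[2][1] * sk + B[2][2] * gk,
        -1: mu - sigma + B[1][0] * sg + B[1][1] * gk,
        0: mu + A[0][0] * sg + A[0][1] * gk,
        1: mu + sigma + A[1][0] * sg + A[1][1] * gk,
        2: mu + 2 * sigma + A[2][0] * sg + A[2][1] * sk + A[2][2] * gk,
        3: mu + 3 * sigma + A[3][0] * sk + A[3][1] * gk,
    }
```

With all coefficients set to zero the result collapses to the Gaussian values $\mu + n\sigma$, which is a convenient sanity check before loading fitted coefficients.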

## IV. STATISTICAL WIRE DELAY MODELING

#### A. Wire Delay Modeling

Due to the simplicity of its computation, Elmore is the most popular metric, using the first moment of the wire delay [9]. The Elmore delay from node $p_0$ to node $p_N$ in Fig. 6 is given by (4). When process uncertainties increase and serious metal resistance shielding effects emerge as technology keeps shrinking, Elmore and other metrics diverge from SPICE simulation results [6]. The wire delay distribution becomes asymmetric, as shown in Fig. 7. In this paper, $T_w$ represents the wire delay, and its mean and standard deviation are $\mu_w$ and $\sigma_w$, respectively. For the RC network in Fig. 7, the 99.86% quantile of $T_w$ is 31.65 ps, whereas the Elmore delay is 22.19 ps, demonstrating a non-negligible error of Elmore.

$$T_{Elmore} = \mu_w = \sum_{k=1}^N R_{pk} \times C_{pk} \tag{4}$$
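A minimal Python sketch of (4) for an unbranched RC chain is given below. It assumes that $R_{pk}$ denotes the total resistance between the source $p_0$ and node $p_k$ and that $C_{pk}$ is the capacitance at node $p_k$, which is the usual reading of the Elmore delay to the far end of a chain; branched RC trees would need the shared-path resistance instead.

```python
def elmore_delay_chain(segment_resistances, node_capacitances):
    """Elmore delay from the source to the last node of an RC chain.

    segment_resistances[k] -- resistance between node k and node k+1 (ohm)
    node_capacitances[k]   -- capacitance at node k+1 (farad)
    Returns the delay in seconds.
    """
    if len(segment_resistances) != len(node_capacitances):
        raise ValueError("need one capacitance per resistance segment")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(segment_resistances, node_capacitances):
        upstream_resistance += r        # resistance from source to this node
        delay += upstream_resistance * c
    return delay
```

For two 100 ohm segments loaded with 1 fF each, the function accumulates 100 ohm times 1 fF plus 200 ohm times 1 fF.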

In general, the delay variability of an RC tree at a given technology and supply voltage depends on several factors, namely wire length, input slew, driver/load cell strength, and cell topology [16]. The delay correction method in [8] expresses the delay variability of logic cells and paths with the cell strength and the number of stacked transistors, without physical wires. Inspired by [8], we propose a novel calibration method to calculate the wire delay variability ($\sigma_w/\mu_w$), represented by $X_w$. Experimental results from place-and-route netlists show that $X_w$ is proportional to the delay variability of the drive/load cells. A more detailed process of modeling $X_w$ is explored in the next subsection.



Fig. 7. Comparison of Elmore delay and the SPICE simulation results.



Fig. 8. Comparison of the wire delay distribution with driver/load INV cells for different strengths of 1, 2, and 4.

#### B. Wire Coefficients Calibration

Fig. 8 shows an example of the delay distribution of the same RC tree with different driver/load inverters of strengths 1, 2, and 4. Based on the observation of Fig. 8, the mean of the wire delay is proportional to the load/driver cell strengths. The standard deviation is proportional to the load cell strength and inversely proportional to the driver cell strength. Additionally, the wire delay variability $(\sigma_w/\mu_w)$ is proportional to the load cell strength and inversely proportional to the driver cell strength.

To reflect the effect of drive/load cells, we propose a novel calibration method using the wire delay variability $(\sigma_w/\mu_w)$, inspired by [8]. According to Pelgrom's law [18], the wire delay variability is determined by the driver/load cell strengths ($\sqrt{FI_{strength}}$, $\sqrt{FO_{strength}}$) and the number of stacked transistors ($n$) [19]. As a result of the averaging effect of variations across the transistor channel, $\sigma_w/\mu_w$ decreases with the square root of the number of stacked transistors and of the cell strength, as given in (5) for specific drive/load cells. $\sigma_{FI}/\mu_{FI}$ is the ratio of the standard deviation to the mean of the delay of the driver cell (*Cell<sub>FI</sub>*), and $\sigma_{FO}/\mu_{FO}$ is defined similarly for the load cell. Since the number of stacked transistors ($n$) of the driver/load cell (e.g., NAND) is an integer multiple of that in an inverter, the FO4 cell (INVx4) shown in Fig. 7 can be taken as a baseline. The ratio $\sigma_{FI}/\mu_{FI}$ ($\sigma_{FO}/\mu_{FO}$) of an arbitrary driver (load) cell is proportional to the ratio $\sigma_{FO4}/\mu_{FO4}$ of an INVx4. As a result, expression (5) can be converted to (6), and the cell-specific coefficients $X_{FI}$ and $X_{FO}$ can be used to represent the wire delay variability caused by the driver cell and the load cell, respectively. Eventually, the wire delay variability can be modeled as a linear combination of the driver/load cell-specific coefficients in (7).

$$\begin{cases} \frac{\sigma_{FI}}{\mu_{FI}} \propto \frac{1}{\sqrt{n_{FI}}} \cdot \frac{1}{\sqrt{FI_{strength}}} \\ \frac{\sigma_{FO}}{\mu_{FO}} \propto \frac{1}{\sqrt{n_{FO}}} \cdot \frac{1}{\sqrt{FO_{strength}}} \end{cases} \tag{5}$$

$$\begin{cases} \frac{\sigma_{FI}}{\mu_{FI}} = X_{FI} \cdot \frac{\sigma_{FO4}}{\mu_{FO4}} \\ \frac{\sigma_{FO}}{\mu_{FO}} = X_{FO} \cdot \frac{\sigma_{FO4}}{\mu_{FO4}} \end{cases} \tag{6}$$

$$X_w = \frac{\sigma_w}{\mu_w} = X_{FI} \cdot \frac{\sigma_{FI}}{\mu_{FI}} + X_{FO} \cdot \frac{\sigma_{FO}}{\mu_{FO}} \tag{7}$$
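The chain from (5) to (7) can be sketched in Python as follows. Two assumptions are made that go beyond the text: the proportionality constant in (5) is taken to be the same for every cell, so a coefficient relative to the INVx4 baseline follows from the stack count and strength alone, and the baseline inverter is given a stack count of 1. In the paper, $X_{FI}$ and $X_{FO}$ are obtained by fitting MC simulations, so the first helper is only a rough starting estimate. All function names are illustrative.

```python
import math


def pelgrom_relative_coefficient(n_stack: int, strength: float,
                                 n_ref: int = 1, strength_ref: float = 4.0) -> float:
    """Cell variability relative to the INVx4 baseline, from the scaling in (5).

    Assumes one shared proportionality constant (not stated in the paper).
    """
    if n_stack <= 0 or strength <= 0:
        raise ValueError("stack count and strength must be positive")
    return math.sqrt((n_ref * strength_ref) / (n_stack * strength))


def wire_delay_variability(x_fi: float, x_fo: float, fo4_variability: float) -> float:
    """X_w from (6) and (7), given sigma_FO4 / mu_FO4 of the baseline cell."""
    fi_variability = x_fi * fo4_variability  # eq. (6), driver cell
    fo_variability = x_fo * fo4_variability  # eq. (6), load cell
    return x_fi * fi_variability + x_fo * fo_variability  # eq. (7)
```

For example, a hypothetical baseline variability of 0.1 with both coefficients equal to 1 gives `wire_delay_variability(1.0, 1.0, 0.1)`, the sum of the two cell contributions.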

The standard deviation $\sigma_w$ of the wire delay is given in (8), where $X_w$ represents the calibrated coefficient. By covering the asymmetry of the distribution with the wire delay variability $\sigma_w/\mu_w$, the $n\sigma$ quantiles can be characterized as in (9).

$$\sigma_w = \mu_w \cdot X_w = T_{Elmore} \cdot X_w \tag{8}$$

$$T_w(n\sigma) = (1 + n \cdot X_w) \cdot T_{Elmore} \tag{9}$$
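Equations (8) and (9) amount to scaling the Elmore delay, as the short Python sketch below shows. The value of $X_w$ passed in is assumed to come from the calibration in (7); the function names are illustrative.

```python
def wire_delay_sigma(t_elmore: float, x_w: float) -> float:
    """Standard deviation of the wire delay, eq. (8)."""
    return t_elmore * x_w


def wire_delay_quantile(t_elmore: float, x_w: float, n: float) -> float:
    """n-sigma quantile of the wire delay, eq. (9)."""
    return (1.0 + n * x_w) * t_elmore


def wire_delay_quantiles(t_elmore: float, x_w: float):
    """All seven sigma levels (-3 .. +3) for one wire."""
    return {n: wire_delay_quantile(t_elmore, x_w, n) for n in range(-3, 4)}
```

The $0\sigma$ entry of the returned dictionary equals the Elmore delay itself, since (4) identifies $T_{Elmore}$ with $\mu_w$.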

In a specific circuit, a path can be represented by a set of primary inputs, a set of primary outputs, a set G of standard cells, and a set N of nets representing the interconnections between these elements. As shown in (10), the  $n\sigma$  quantiles of the path delay arrival time  $T_{path}$  are composed of cell delay and wire delay, denoted as  $T_c$  and  $T_w$ , ultimately.

$$T_{path}(n\sigma) = \sum_{cells} T_c(n\sigma) + \sum_{wires} T_w(n\sigma) \tag{10}$$
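The accumulation in (10) can be sketched in Python as a sum over per-stage quantile tables. It assumes that every cell and wire on the path is described by a dictionary mapping a sigma level to a delay in the same unit; this data layout and the function name are illustrative rather than taken from the paper.

```python
def path_delay_quantile(cell_quantiles, wire_quantiles, n: int) -> float:
    """n-sigma path delay as the sum of same-level cell and wire quantiles.

    cell_quantiles -- list of {sigma_level: delay} dicts, one per cell on the path
    wire_quantiles -- list of {sigma_level: delay} dicts, one per wire on the path
    """
    cell_part = sum(stage[n] for stage in cell_quantiles)
    wire_part = sum(stage[n] for stage in wire_quantiles)
    return cell_part + wire_part
```

A two-stage path is evaluated by passing two cell dictionaries and the dictionaries of the wires that connect them, once for each sigma level of interest.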
  
## V. EVALUATION

#### A. Experimental Setup

The supply voltage is set to 0.6V in the near-threshold region and the temperature is 25°C. Based on TSMC 28 nm PDK, the accuracy of delay models is verified by comparing the data obtained through SPICE simulations with 10k MC samples under global and local variations. The accuracy of path delay analysis is verified using the ISCAS85 benchmark suite and the functional units of PULPino [20], an open-source RISC-V microprocessor. All experiments are run on a 4.2 GHz Intel i9-12900k processor.

#### B. Accuracy of Cell Delay Model

The parameters of the *N*-sigma cell delay distribution are modeled as shown in Table I and are verified under the FO4 constraint. Table II shows the errors of the estimated $+/-3\sigma$ delay using the LSN [12], Burr [13], and the proposed *N*-sigma models compared to SPICE simulation results. The Burr-based model cannot be used for estimating the $+3\sigma$ delay in the near-threshold voltage region. In contrast, the average errors of the LSN-based delay model are 5.50% ($-3\sigma$) and 7.67% ($+3\sigma$), and those of the *N*-sigma model stay below 3% at 0.6V. The *N*-sigma model maintains a stable prediction accuracy for both complex logic cells such as AOI and simple logic cells such as NOR.

#### C. Accuracy of Wire Delay Model

To illustrate the accuracy and efficiency of the proposed method, five examples of RC interconnect circuits are provided for comparison studies. Each resistor and capacitor is randomly chosen from the parasitic files. Each RC network has a driver cell and a load cell at the two ends of the wire. The cell strength constraints of the driver/load cells are set to FO1, FO2, FO4, and FO8. The errors in estimating $X_{FI}$ and $X_{FO}$ by fitting MC simulations are about 1.92% and 3.31%, respectively, as shown in Fig. 9. Based on $X_{FI}$ and $X_{FO}$, the average errors for the $-3\sigma$ and $+3\sigma$ wire delay estimations are 1.61% and 2.39%, as shown in Fig. 10. In addition, a comparison of the $+3\sigma$ delay of each wire on the critical path of C432 is shown in Fig. 11. The Elmore model deviates more from the MC simulation results than the proposed *N*-sigma wire model. Hence, the proposed *N*-sigma model is effective in estimating the wire delay.

TABLE II: ACCURACY OF ESTIMATING THE $\pm 3\sigma$ CELL DELAY COMPARED TO SPICE SIMULATION RESULTS (ERRORS OF CELL MODEL, %).

| Std cell | LSN [12] $-3\sigma$ | LSN [12] $+3\sigma$ | Burr [13] $-3\sigma$ | Burr [13] $+3\sigma$ | Ours $-3\sigma$ | Ours $+3\sigma$ |
|----------|---------------------|---------------------|----------------------|----------------------|-----------------|-----------------|
| NOR2x1   | 5.04                | 7.89                | 11.66                | 10.67                | 3.57            | 4.81            |
| NOR2x2   | 4.78                | 6.31                | 14.56                | 9.45                 | 3.17            | 2.56            |
| NOR2x4   | 5.23                | 7.82                | 16.79                | 12.56                | 3.09            | 3.67            |
| NOR2x8   | 6.48                | 8.97                | 10.20                | 10.45                | 2.67            | 3.78            |
| NAND2x1  | 3.44                | 4.78                | 11.25                | 6.98                 | 2.31            | 1.79            |
| NAND2x2  | 5.87                | 5.98                | 15.68                | 6.76                 | 2.71            | 2.97            |
| NAND2x4  | 5.67                | 7.34                | 10.65                | 12.57                | 1.01            | 1.95            |
| NAND2x8  | 4.18                | 8.45                | 11.77                | 10.67                | 1.04            | 1.67            |
| AOI2x1   | 5.72                | 6.79                | 8.46                 | 13.78                | 3.31            | 3.97            |
| AOI2x2   | 9.97                | 11.89               | 11.78                | 9.76                 | 2.78            | 3.75            |
| AOI2x4   | 7.83                | 10.46               | 12.56                | 10.35                | 2.67            | 2.89            |
| AOI2x8   | 10.26               | 13.31               | 13.68                | 12.56                | 2.66            | 2.67            |
| Avg.     | 5.50                | 7.67                | 12.42                | 10.55                | 2.03            | 2.73            |

#### D. Accuracy of Path Delay Analysis

The netlists of the benchmark circuits are generated through Design Compiler. The parasitic information defined by the SPEF files is obtained through IC Compiler. To verify the validity of the path delay analysis, Table III compares the MC simulation results, the PrimeTime-based results [7], the ML-based method [9], the correction-based method [8], and our method. The path delay in the ML-based method consists of the LUT-based cell delay and the ML-based wire delay. The network that calculates the wire delay is trained using the first and second moments and many other features. The correction-based method calibrates the Elmore delay with the help of the PrimeTime report. Compared with the results from 5000 MC simulations, the $+3\sigma$ error of our method is about 3.67% and its $-3\sigma$ error is about 5.62%. In contrast, the ML-based method and the correction-based method show errors of 18.27% and 11.73% compared to the $+3\sigma$ MC simulation results, respectively. In addition, the analysis result based on our method is more



Fig. 11. Comparison of the prediction errors of each wire on the critical path of C432.

TABLE III: THE PATH ANALYSIS RESULTS BASED ON ISCAS85 BENCHMARKS AND THE FUNCTIONAL UNITS OF PULPINO PROCESSOR

Delay: critical path delay (ns). Error: error of path delay (%). Runtime in seconds.

| Path | #Nets | #Cells | Delay MC $-3\sigma$ | Delay MC $+3\sigma$ | Delay PT [7] | Delay ML [9] | Delay Correction [8] | Delay Ours $-3\sigma$ | Delay Ours $+3\sigma$ | Error PT [7] | Error ML [9] | Error Correction [8] | Error Ours $-3\sigma$ | Error Ours $+3\sigma$ | Runtime MC | Runtime PT [7] | Runtime ML [9] | Runtime Correction [8] | Runtime Ours |
|------|-------|--------|------|------|------|------|------|------|------|------|------|------|------|------|--------|-----|-----|-----|------|
| C432 | 734 | 655 | 584 | 1015 | 1359 | 1267 | 1156 | 635 | 1075 | 33.9 | 24.9 | 13.9 | 8.7 | 5.9 | 1196.5 | 2.0 | 0.5 | 3.4 | 1.1 |
| C1355 | 1091 | 977 | 523 | 921 | 1297 | 1190 | 1036 | 559 | 943 | 40.8 | 29.2 | 12.4 | 6.9 | 2.4 | 1211.5 | 1.9 | 0.9 | 2.7 | 1.5 |
| C1908 | 1184 | 1093 | 727 | 1272 | 1698 | 1467 | 1396 | 758 | 1296 | 33.5 | 15.4 | 9.7 | 4.3 | 1.8 | 1173.2 | 2.3 | 1.1 | 2.9 | 1.7 |
| C2670 | 2415 | 1810 | 686 | 1177 | 1589 | 1274 | 1287 | 717 | 1225 | 34.9 | 8.2 | 9.3 | 4.5 | 4.1 | 1351.5 | 2.4 | 2 | 3.5 | 2.8 |
| C3540 | 2290 | 2168 | 252 | 462 | 605 | 589 | 530 | 267 | 470 | 30.9 | 27.4 | 14.6 | 5.9 | 1.7 | 1651.1 | 2.2 | 2.1 | 4.1 | 3.1 |
| C6288 | 3725 | 3246 | 520 | 890 | 1221 | 1017 | 990 | 541 | 910 | 37.2 | 14.3 | 11.2 | 4.1 | 2.3 | 1303.4 | 3.1 | 2.8 | 5.3 | 5.1 |
| C5315 | 5371 | 5275 | 879 | 1581 | 1972 | 1774 | 1690 | 905 | 1599 | 24.7 | 12.2 | 6.8 | 2.9 | 1.1 | 1656.2 | 2.5 | 2.7 | 3.9 | 8.1 |
| C7552 | 4536 | 4041 | 766 | 1368 | 1697 | 1597 | 1516 | 796 | 1377 | 24.1 | 16.8 | 10.8 | 3.8 | 0.7 | 2001.3 | 2.9 | 2.4 | 3.7 | 6.6 |
| ADD of PULPino | 2531 | 4088 | 784 | 1867 | 2670 | 2677 | 2356 | 834 | 1999 | 42.9 | 30.2 | 15.4 | 6.3 | 7.1 | 1841.5 | 2.3 | 2.1 | 4.5 | 6.3 |
| SUB of PULPino | 2576 | 3066 | 856 | 1903 | 2549 | 2699 | 2245 | 902 | 1970 | 33.9 | 15.5 | 17.9 | 5.3 | 3.5 | 1996.2 | 2.6 | 1.9 | 4.1 | 6.47 |
| MUL of PULPino | 62967 | 49570 | 4908 | 6856 | 8492 | 8566 | 7436 | 5238 | 7315 | 23.9 | 17.6 | 11.3 | 6.7 | 6.7 | 2438.2 | 3.7 | 3.3 | 5.2 | 73.6 |
| DIV of PULPino | 91932 | 51654 | 5178 | 7099 | 7692 | 7730 | 7590 | 5578 | 7568 | 16.8 | 7.5 | 6.9 | 7.7 | 6.6 | 2346.3 | 3.9 | 3.1 | 5.6 | 78.9 |
| Avg. | - | - | - | - | - | - | - | - | - | 31.4 | 18.3 | 11.7 | 5.6 | 3.6 | 1807.3 | 2.7 | 2.1 | 4.1 | 16.3 |

accurate than the correction-based method depending on PrimeTime.

Since $X_{FI}$ and $X_{FO}$ must be calculated to quantify the effect of each driver/load cell, and this calculation dominates the whole timing analysis, the runtime of the proposed method is directly proportional to the number of cells. The ratios of runtime to cell count are 1.1/655 ≈ 0.0017 and 78.9/51654 ≈ 0.0015 for C432 and DIV, respectively. For large designs, the proposed path analysis flow takes longer than the other methods but remains acceptable compared to the MC simulation method. It is worth noting that the ML-based method requires a time-consuming training phase and high memory storage. In summary, by modeling the wire variability factors, the $\pm 3\sigma$ path delay can be calculated with sufficient precision, even though the runtime on the largest designs is longer than that of the three other methods.
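The arithmetic behind these ratios, and behind the error columns of Table III, can be reproduced with a short sketch. It uses only values copied from the C432 and DIV rows of the table; the dictionary layout and function names are illustrative assumptions, not part of the proposed flow.

```python
# Values copied from the C432 and DIV rows of Table III.
# Field and function names are hypothetical, chosen for illustration.
rows = {
    "C432": {"cells": 655, "mc_p3": 1015, "ours_p3": 1075,
             "t_mc": 1196.5, "t_ours": 1.1},
    "DIV": {"cells": 51654, "mc_p3": 7099, "ours_p3": 7568,
            "t_mc": 2346.3, "t_ours": 78.9},
}


def runtime_per_cell(row):
    """Runtime of the proposed method divided by the cell count (s/cell)."""
    return row["t_ours"] / row["cells"]


def rel_error_pct(estimate, reference):
    """Relative error of an estimate against the MC reference, in percent."""
    return abs(estimate - reference) / reference * 100.0


def speedup_over_mc(row):
    """Ratio of MC runtime to the runtime of the proposed method."""
    return row["t_mc"] / row["t_ours"]


for name, row in rows.items():
    print(
        f"{name}: {runtime_per_cell(row):.4f} s/cell, "
        f"+3σ error {rel_error_pct(row['ours_p3'], row['mc_p3']):.1f} %, "
        f"speedup {speedup_over_mc(row):.0f}x"
    )
```

The per-cell runtime stays nearly constant between the smallest and the largest design, which is what a linear dependence on cell count predicts, while the speedup over MC shrinks as the design grows.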

#### VI. CONCLUSION

To minimize the divergence between chip timing results and those of a signoff tool, an accurate method for modeling cell and wire delay is necessary. In this paper, the *N-sigma* model is built to quantify the cell delay and wire delay under near-threshold voltage. The cell delay model is calibrated by considering the effects of the operating conditions. The wire delay is accurately modeled with cell-specific coefficients that represent the influence of driver/load cells. After calibration, the cell and wire delay models demonstrate higher precision than other models. In future work, the runtime of the proposed model can be reduced with methods such as GPU acceleration to bring it close to that of other models.

#### REFERENCES

- [1] D. Blaauw, K. Chopra, et al., "Statistical timing analysis: From basic principles to state of the art," *IEEE T. Comput. Aid D.*, vol. 27, no. 4, pp. 589–607, 2008.
- [2] W. Fu, L. Jin, et al., "A cross-layer power and timing evaluation method for wide voltage scaling," in *Proc. DAC*, 2020, pp. 1–6.
- [3] T. Huynh-Bao, J. Ryckaert, et al., "Statistical timing analysis considering device and interconnect variability for BEOL requirements in the 5-nm node and beyond," *IEEE T. VLSI Systems*, vol. 25, no. 5, pp. 1669–1680, 2017.
- [4] A. B. Kahng, "New game, new goal posts: A recent history of timing closure," in *Proc. DAC*, 2015, pp. 1–6.
- [5] C. W. Moon et al., "Method of designing a digital circuit by correlating different static timing analyzers," U.S. Patent 7 823 098, Oct. 26, 2010.
- [6] K. Han, A. B. Kahng, H. Lee, et al., "Performance- and energy-aware optimization of BEOL interconnect stack geometry in advanced technology nodes," in *Proc. ISQED*, 2017, pp. 104–110.
- [7] *PrimeTime-PX User Guide*, Version 2008.12, Synopsys, Mountain View, CA, USA, 2008.
- [8] A. Sharma, D. Chinnery, T. Reimann, et al., "Fast Lagrangian relaxation-based multithreaded gate sizing using simple timing calibrations," *IEEE T. Comput. Aid D.*, vol. 39, no. 7, pp. 1456–1469, 2019.
- [9] H. H. Cheng, et al., "Fast and accurate wire timing estimation on tree and non-tree net structures," in *Proc. DAC*, 2020, pp. 1–6.
- [10] P. Cao, Z. Liu, J. Guo, and J. Wu, "An analytical gate delay model in near/subthreshold domain considering process variation," *IEEE Access*, vol. 7, pp. 171515–171524, 2019.
- [11] S. Ramprasath, et al., "A skew-normal canonical model for statistical static timing analysis," *IEEE T. VLSI Systems*, vol. 24, no. 6, pp. 2359–2368, 2015.
- [12] H. A. Balef, et al., "All-region statistical model for delay variation based on log-skew-normal distribution," *IEEE T. Comput. Aid D.*, vol. 35, no. 9, pp. 1503–1508, 2015.
- [13] A. Moshrefi, et al., "Statistical estimation of delay in nano-scale CMOS circuits using Burr distribution," *Microelectronics Journal*, vol. 79, pp. 30–37, 2018.
- [14] W. C. Elmore, "The transient response of damped linear networks with particular regard to wideband amplifiers," *Journal of Applied Physics*, vol. 19, no. 1, pp. 55–63, 1948.
- [15] C. Antoniadis, N. Evmorfopoulos, and G. Stamoulis, "Efficient sparsification of dense circuit matrices in model order reduction," in *Proc. ASPDAC*, 2019, pp. 255–260.
- [16] I. Ciofi, A. Contino, et al., "Impact of wire geometry on interconnect RC and circuit delay," *IEEE Transactions on Electron Devices*, vol. 63, no. 6, pp. 2488–2496, 2016.
- [17] L. Jin, et al., "A statistical cell delay model for estimating the 3σ delay by matching kurtosis," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 69, no. 6, pp. 2932–2936, 2022.
- [18] M. J. M. Pelgrom, et al., "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
- [19] M. Alioto, et al., "A novel framework to estimate the path delay variability on the back of an envelope via the fan-out-of-4 metric," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 8, pp. 2073–2085, 2017.
- [20] A. Traber, F. Zaruba, S. Stucki, et al., "PULPino: A small single-core RISC-V SoC," in *3rd RISCV Workshop*, 2016.
- [21] L. Yu, S. Saxena, C. Hess, et al., "Statistical library characterization using belief propagation across multiple technology nodes," in *Proc. DATE*, 2015, pp. 1383–1388.
- [22] Schneider, et al., "GPU-accelerated time simulation of systems with adaptive voltage and frequency scaling," in *Proc. DATE*, 2020, pp. 879–884.