# An Efficient Decoupling Capacitance Optimization Using Piecewise Polynomial Models

Xiaoyi Wang<sup>\*</sup>, Yici Cai<sup>\*</sup>, Sheldon X.-D. Tan<sup>†</sup>, Xianlong Hong<sup>\*</sup>, Jacob Relles<sup>†</sup>

\*Department of Computer Science and Technology, TsingHua University, Beijing, China, 100084

<sup>†</sup>Department of Electrical Engineering, University of California, Riverside, CA 92521

Abstract—This paper proposes an efficient decoupling (decaps) capacitance optimization algorithm to reduce the voltage noise of on-chip power grid networks. The new method is based on the efficient charge formulation of the decap allocation problem. But different from the existing work [12], the new method applies the more accurate piecewise polynomial micromodels to estimate the voltage noises during the linear programming process. The resulting method overcomes the over-estimation problem, which plagues the existing method. The proposed method has the best of two worlds: it has the efficiency of the charge-based methods and the accuracy of the sensitivity-based methods. Experimental results demonstrate that the proposed method leads to the decap values similar to that of the sensitivity-based methods, which give the best reported results and are much better than the existing charge-based method, and at the same time, it enjoys the similar efficiency of the charge-based method.

#### I. INTRODUCTION

With increasing integration density and roaring clock frequency, reliable on-chip power supply becomes a challenging task for VLSI chip design. Excessive voltage variations, including voltage drop and ground bounce, would have an adverse impact on chip performance and reliability, since they not only degrade the already tight noise margin in today's VLSI circuits, but also increase gate delay, cause false logic switching, and sometimes even lead to logic failure. In practice, most designs now require voltage drop to be confined to certain percentage of the nominal supply voltage [1]. Adding decoupling capacitance (decaps) has been proved to be the most efficient way to restrict the voltage drops in on-chip power grid networks. However excessive decaps will cause more leakage currents and reduce the reliability of the chips. As a result, decap should be added in an economical way to optimally balance its noise reduction functions and adverse effects.

Budgeting decap in an area-efficient way, however, is a difficult task due to prohibitive analysis costs of power/grid (P/G) networks with millions of nodes and extracted on-chip and off-chip RLC components in modern VLSI design. Mathematically, optimal decap allocation is a nonlinear optimization problem [9]. Existing approaches can be roughly classified into two categories: the sensitivity based methods and the charge-based methods.

Sensitivity-based approaches [2], [4]–[6], [8], [10] use sensitivity information to guide the optimization processes. To

This work is funded in part by National Natural Science Foundation of China (NSFC) grants under No. 60828008 and No. 60776026 and in part by Natioanl Science Foundation (NSF) grant under No. OISE-0623038.

compute the sensitivity, transient simulations of the whole P/G networks have to be carried out at every optimization step. As a result, those methods are very expense and difficult to scale for large circuits. To mitigate the problem, multigrid concept was proposed in [10], where the sensitivity-based optimization is performed on a size-reduced mesh. The method, however, is limited to the regular-mesh or regular-mesh-like P/G structures. In [6], the whole circuit is partitioned into a number of sub-circuits based on the local effect of adding decap to reduce IR drops, and each of them is optimized individually. In [2], leakage current impacts of decaps were explicitly considered in sensitvity-based framework. Recently the partitioning is done more efficiently using random walk concept [5]. Sensitivity based method has the advantage over charge-based methods in that they can deliver more accurate and smaller decap values as they directly monitor the voltage drops during the optimization processes.

The second methods are operating on the charges or the transformed voltage drops (the voltage-time integral), which can lead to more simple problem formulations [3], [11], [12]. Those methods perform the approximate integration on the differential equations. As a result, no transient simulations are required during the optimization and the methods are much more efficient than the sensitivity-based methods. However, they generally suffer over-estimation issue as the approximation can be quite rough as we will demonstrate in this paper.

In this paper, we propose an efficient charge-based decap optimization method, which has the best of two worlds: it has the efficiency of the charge-based methods and the accuracy of the sensitivity-based methods. The new method is based on more accurate modeling of the voltage-time integral constraints, which is critical for the accurate estimation of decap during the optimization. We propose a more accurate piecewise polynomial model to approximate the voltage-time integral instead of the simple linear model as in the existing method [12]. We show that the existing approach can lead to significant over estimation of the decap values. Experimental results show that the new approach delivers the much smaller decap values, similar to the sensitivity-based method, than the latest charge-based methods.

## **II. PROBLEM FORMULATION**

The purpose of decap optimization is to use minimum decap to avoid the excessive voltage drop. At the same time, the maximum single capacitance is bounded due to the physical constraint. Then we can formulate the decap optimization problem as following:

$$\min_{\substack{\text{subject to}}} \sum_{i} c_i \tag{1}$$

$$V \ge V_{\text{thre}} \tag{2}$$

$$c_{int,i} \leq c_i \leq c_{\max,i}$$

where  $c_i$  is the capacitance, which includes both intrinsic and decoupling capacitances and  $c_{int,i}$  is the intrinsic capacitance at node *i*.  $C = diag(c_1, \ldots, c_n)$ , V is the transient voltage waveform vector,  $V_{\text{thre}}$  is the threshold voltage,  $c_{\max,i}$  is the maximum capacitance allowed at node *i*, *G* is the conductance matrix obtained by modified nodal analysis (MNA) and J is the current source vector. In our formulation, we ignore the inductances as inductances are still less important at the chip level [7].

### III. REVIEW OF SLP-BASED DECAP BUDGET METHOD

There are four steps in the successive linear programming (SLP) based decap optimization method [12] as described in algorithm 1.

| L | Run the transient simulation of power/ground | l network | once | with | the |
|---|----------------------------------------------|-----------|------|------|-----|
|   | intrinsic capacitances and placed decap;     |           |      |      |     |

- 2 Find out the violation regions, (i.e. the regions where the voltage drop constraints are violated) and the violation time windows (i.e. the time windows during which violations occur). Also Decide the sampling nodes, i.e. the nodes to which the decap will be attached. If no violation is found, the optimization ends;
- **3** For each violation region, determine the optimal amount of decap by linear programming or sequential linear programming ;
- Distribute the decap of the sampling nodes into the respective sampling regions. Go back to step 1 until no violation is found;
- Algorithm 1: The flow of decap optimization in [12].

In the critical third step of algorithm 1, the decap optimization problem (1) is formulated as a linear programming problem.

The essential idea of this formulation is to transform the MNA equations and voltage drop constraints (2) to linear constraints by integration and approximation of voltage waveforms. First, the MNA equations (2) are transformed to "charge transfer equations" by integration and macromodeling. Suppose we have already found out the violation regions and sampling nodes by transient simulation and the violation time window is determined as  $[t_0, t_1]$  (please refer to [12] for more details,). Then the MNA equation (2) is partitioned as follows:

$$\begin{pmatrix} G_{11} & G_{12} \\ G_{12}^T & G_{22} \end{pmatrix} \cdot \begin{pmatrix} \int_{t_0}^{t_1} \mathbf{V}_1 \, \mathrm{dt} \\ \int_{t_0}^{t_1} \mathbf{V}_2 \, \mathrm{dt} \end{pmatrix} = \begin{pmatrix} \int_{t_0}^{t_1} \mathbf{J}_1 \, \mathrm{dt} \\ \int_{t_0}^{t_1} \mathbf{J}_2 \, \mathrm{dt} + \int_{t_0}^{t_1} \mathbf{I} \, \mathrm{dt} \end{pmatrix}$$

where  $V_2$  is the voltages of the sampled violating nodes and I is the current supplied by decap. Then the charge transfer equations are formulated by macromodeling :

$$\boldsymbol{Q} = \boldsymbol{A} \cdot \boldsymbol{W} + \boldsymbol{B} \tag{3}$$

where  $Q = \int_{t_0}^{t_1} I \, dt$  is the charge transferred from decap,  $A = G_{22} - G_{12}^T G_{11}^{-1} G_{12}$  is the admittance matrix, W



Fig. 1. Voltage Waveforms without & with decap

is  $\int_{t_0}^{t_1} V_2 dt$  and  $B = (G_{12}^T G_{11}^{-1} \int_{t_0}^{t_1} J_1 dt - \int_{t_0}^{t_1} J_2 dt)$  is the equivalent charge drawn by current sources. Second, the voltage constraints (2) are also transformed to linear constraints on voltage-time integral W. As shown in Figure 1, the voltage waveform should be above the  $V_{\text{thre}}$  after adding decap. Correspondingly, the integral of voltage waveform (W) should be no less than the shaded area. However, the shaded area is unknown until the decap is added, [12] suggests using the linear approximation of this area. Hence, the transient voltage constraint (2) is re-formulated as:

$$\boldsymbol{W} \ge \frac{1}{2} (\boldsymbol{V}(t_0) + V_{\text{thre}})(t_1 - t_0)$$

Since all the constraints in problem (1) have been transferred to linear constraints, we end up with a linear programming problem:

min 
$$\sum_{i \in \mathbf{s}} c_i$$
 (4)

$$M' \circ C \ge A \cdot W + B$$

$$W \ge L$$
 (5)

 $c_{int,i} \leqslant c_i \leqslant c_{\max,i}$ 

where  $\boldsymbol{C} = \begin{pmatrix} c_1 & c_2 & \dots & c_m \end{pmatrix}^T$ ,

s.t

$$\boldsymbol{M}' = \begin{pmatrix} \boldsymbol{V}_{1}(t_{0}) - \boldsymbol{V}_{1}(t_{1}) \\ \boldsymbol{V}_{2}(t_{0}) - \boldsymbol{V}_{2}(t_{1}) \\ \vdots \\ \boldsymbol{V}_{m}(t_{0}) - \boldsymbol{V}_{m}(t_{1}) \end{pmatrix}, \boldsymbol{L} = \begin{pmatrix} (V_{\text{thre}} + \boldsymbol{V}_{1}(t_{0})) \cdot T \\ (V_{\text{thre}} + \boldsymbol{V}_{2}(t_{0})) \cdot T \\ \vdots \\ (V_{\text{thre}} + \boldsymbol{V}_{m}(t_{0})) \cdot T \end{pmatrix},$$
(6)

S is the set of sampling nodes, m = |S|,  $T = (t_1 - t_0)/2$ and the  $\circ$  operation is the entry-wise product.

There are, however, two inaccurate variables in such linear programming formulation. The first one is  $V(t_1)$ . Ideally, Vshould reach the threshold voltage at  $t_1$ , i.e.  $V(t_1) = V_{\text{thre}}$ . However,  $V(t_1)$  could also be above  $V_{\text{thre}}$  practically. As a result, [12] proposed a successive linear programming (SLP) to mitigate this problem. The second non-accurate variable is L, the lower bound of the voltage-time integral. As illustrated in Figure 1, the voltage waveform after adding decap is approximated by a straight line. So L is the area of trapezoid,



Fig. 2. Remaining violations after adding decap determined by SLP based method.

which can be quite different from  $\int_{t_0}^{t_1} V \, dt$  after adding decap. Our paper is focusing on improve the second inaccuracy by using more accurate piecewise polynomial models.

## IV. VIOLATION RESIDUE AND VIOLATION TIME WINDOW DRIFT

In this section, we analyze the inaccuracy in the modeling of voltage-time integrals and its impacts on the decap optimization.

## A. Violation Residue

One important observation we have for the existing SLP problem 4 is that it usually fails to eliminate all violations completely. Figure 2 shows one of the failing cases. In this figure, we plot the voltage waveforms of 3 sampling nodes. The left sub-figure shows the voltage drop improvement with decap budget by sensitivity based non-linear optimization [4]. And the right sub-figure shows the voltage drop improvement with decap by SLP based method which shows the SLP based method is unable to remove the violations completely.

Table I shows the overall voltage violations before adding any decap and after adding decap by sensitivity based method and by SLP method, where the voltage violation  $Vio = \sum_{i \in vio} \int V dt$ , #Vio is the number of violating nodes. As we can see, the violations still present after SLP while sensitivity based method eliminates the violation completely.

|                                                          | No Decap                                 |    |           | itivity | Based Decap Budget | SLP De   |      |         |  |  |  |
|----------------------------------------------------------|------------------------------------------|----|-----------|---------|--------------------|----------|------|---------|--|--|--|
|                                                          | Vio #Vio                                 |    | Vio # Vio |         | budget             | Vio      | #Vio | budget  |  |  |  |
|                                                          | 0.0041                                   | 77 | 0.0       | 0       | 82.4 pF            | 0.000507 | 26   | 80.3 pF |  |  |  |
| TABLE I                                                  |                                          |    |           |         |                    |          |      |         |  |  |  |
| VOLTAGE DROP VIOLATIONS BEFORE AND AFTER ADDING DECAP BY |                                          |    |           |         |                    |          |      |         |  |  |  |
|                                                          | SENSITIVITY BASED METHOD AND SLP METHOD. |    |           |         |                    |          |      |         |  |  |  |

Further analysis shows the violation residue is caused by the inaccurate linear approximation of L in equation (5). This fact becomes obvious if we analyze the voltage waveforms shown in Figure 3. There are three curves in Figure 3 : (1) The curve V is the original voltage waveform when no decap is added; (2) The curve  $V_{opt}$  is the voltage waveform when optimal decap  $C_{opt}$  is added, which is the desired curve; (3)



Fig. 3. Analysis of voltage waveforms with different decap values.

And the curve  $V_{\rm slp}$  is the voltage waveform after adding decap  $C_{\rm slp}$  by SLP method. We also have three areas after integrating the voltage waveforms: (1) L' is  $\int_{t_0}^{t_1} V_{\rm opt} dt$ ; (2) W' is  $\int_{t_0}^{t_1} V_{\rm slp} dt$ ; (3) L is the linear approximation of L', as defined in equation (6). The original voltage constraint (2) is equivalent to  $W' \ge L'$ . Because the L' is not known beforehand, the linear approximation L is used in SLP problem (4). Consequently,  $W' \ge L$  is satisfied. However, as we can see in Figure 3, area L' is much larger than area L. And it happens in this case  $L' > W' \ge L$ . So the curve  $V_{\rm slp}$ violates the voltage drop constraints (2).

#### B. Violation Time Window Drift

It's obvious that the violation residue problem shown in section IV-A has already been noticed in [12]. In order to cope with this problem, [12] suggests to apply another outer iteration as stated in algorithm 1 until the voltage constraints (2) are satisfied.

However, simply repeating the SLP method may place excessive decap. Figure 4 shows waveforms after the first and second iterations, where  $V^{(0)}$  is the initial voltage waveform without any capacitance (even the intrinsic capacitance is ignored),  $V^{(1)}$  is the voltage waveform after applying SLP method once,  $V^{(2)}$  is the voltage waveform obtained by SLP method using the  $V^{(1)}$  as input waveform, as stated in algorithm 1 and the dashed curve  $V_{opt}$  is the voltage waveform after adding optimal decap by sensitivity based method. As we can see, the voltage violation does disappear after these 2 iterations. However, we also can see the resulting voltage waveform  $V^{(2)}$  is high above the waveform  $V_{opt}$ , which is the desired waveform.

The reason for this decap over-design is that the "violation time window" has drifted from  $[t_0, t_1]$  to  $[t_0, t_1']$  during the iterations. Because the SLP method is supposed to move the maximum voltage drop time point (i.e. minimum voltage time point) to the end of violation time window (otherwise there could be other violating voltage out of violation time window), so the minimum voltage occurs at time point  $t_1'$  after the second iteration, as demonstrated in Figure 4. However, this voltage has to be much larger than the threshold voltage,



Fig. 4. Simple iteration of the SLP method leads to decap over-estimation.

i.e.  $\mathbf{V}^{(2)}(t'_1) > V_{\text{thre}}$ , which is also shown in Figure 4. Actually, we can see this fact from the MNA equation  $C \cdot \frac{d\mathbf{V}^{(2)}}{dt} + G \cdot \mathbf{V}^{(2)} = \mathbf{J}$ . Since  $\frac{d\mathbf{V}^{(2)}}{dt}(t'_1) \approx 0$  (because  $t'_1$  is minimum voltage time point as mentioned before), so we know  $G \cdot \mathbf{V}^{(2)}(t'_1) \approx \mathbf{I}(t'_1)$ . While we also know  $G \cdot \mathbf{V}^{(0)} = \mathbf{I}$  (the MNA with C = 0), we conclude that  $\mathbf{V}^{(2)}(t'_1) \approx \mathbf{V}^{(0)}(t'_1) > \mathbf{V}^{(0)}(t_1) = V_{\text{thre}}$ . This deduction is also proved by the waveforms in Figure 4.

Therefore, the simple iteration of SLP pulls the voltage much higher than necessary and consequently costs more decap than necessary. Table II shows the overall decap amount during the iteration of SLP.  $C_{opt}$  is the decap amount obtained by sensitivity based decap optimization and C is the decap budget of SLP based method. This table shows the negative effect of violation time window drift to decap optimization.

| Copt     | # Iter | Vio      | #Vio | C        | $\Delta C = (C - C_{\rm opt})/C_{\rm opt}$ |  |  |  |  |  |  |
|----------|--------|----------|------|----------|--------------------------------------------|--|--|--|--|--|--|
| 82.4 pF  | 1      | 0.000507 | 26   | 80.3 pF  | -2.5%                                      |  |  |  |  |  |  |
|          | 2      | 0.0      | 0    | 113.1 pF | +38.82%                                    |  |  |  |  |  |  |
| TABLE II |        |          |      |          |                                            |  |  |  |  |  |  |

THE VIOLATION AND DECAP AMOUNT IN EACH ITERATIONS.

#### V. NEW METHOD BY PIECEWISE POLYNOMIAL MODELS

In this section, we present the new decap budgeting method by using more accurate piecewise polynomial method.

#### A. Voltage Waveform Prediction by Piecewise Polynomial

As we know that the violation residue is mainly caused by inaccuracy of L, we propose a more accurate piecewise polynomial based method to improve the prediction of voltage constraint L. The basic idea is to approximate the voltage waveform  $V_{opt}$  by piecewise polynomial instead of a simple straight line. However, this approximation is difficult because we don't know the curve  $V_{opt}$  until the optimal decap is added. Nevertheless, we still can observe some properties of curve  $V_{opt}$ , which will be helpful for our approximation :

1)  $\frac{\mathrm{d} \boldsymbol{V}_{\mathrm{opt}}}{\mathrm{d} t}(t) \approx 0$  when  $t \in [t_0, t_0 + \delta t]$ ; where  $\delta t$  is a small time;

2) 
$$\frac{\mathrm{d}\mathbf{v}_{\mathrm{opt}}}{\mathrm{d}t}(t_1) \approx 0;$$

3)  $\boldsymbol{V}_{opt}^{a}(t_1) \approx V_{thre}$ .



Fig. 5. Piecewise polynomial modeling of voltage waveform integration.

Also we obtain another property  $V_{opt}(t) \approx V(t_0)$  when  $t \in [t_0, t_0 + \delta t]$  by integrating the equation in the first property.

After examining above properties, we find it's natural to approximate the curve  $V_{opt}$  by piecewise polynomial interpolation in two time regions :  $[t_0, t_0 + \delta t]$  and  $[t_0 + \delta t, t_1]$ . In time region  $[t_0, t_0 + \delta t]$ , the voltage is approximated by the constant value because of property (1). In time region  $[t + \delta t, t_1]$ , the voltage curve is approximated by a cubic polynomial determined by following interpolation conditions :  $\frac{dV_{opt}}{dt}(t_0 + \delta t) = \frac{dV_{opt}}{dt}(t_1) = 0$ ,  $V_{opt}(t_0 + \delta t) = V(t_0)$  and  $V_{opt}(t_1) = V_{thre}$ . So the approximating polynomial is defined as following:

$$P_{0}(t) = V(t_{0}) \quad \text{if} \quad t \in [t_{0}, t_{0} + \delta t] \quad (7)$$
$$P_{1}(t) = at^{3} + bt^{2} + ct + d \quad \text{if} \quad t \in [t_{0} + \delta t, t_{1}]$$

where a, b, c, d are the interpolating coefficients.

Notice that variable  $\delta t$  remains unknown. Because this variable is critical to determine the shape of voltage curve, we estimate it through the voltage waveform we have already know. Suppose curve  $V_{k-1}$  is the waveform obtained in last optimizing iteration, as shown in Figure 5. We apply the same piecewise polynomial formula (7) to curve  $V_{k-1}$  but using different time window  $[t_0, t_p]$ , where  $t_p = \operatorname{argmin} V_{k-1}$ . Similarly, a piecewise polynomial is obtained as following :

$$\boldsymbol{P}_{0}(t) = \boldsymbol{V}(t_{0}) \quad \text{if} \quad t \in [t_{0}, t_{0} + \delta t] \\ \boldsymbol{P}_{1}^{'}(t) = a^{'}t^{3} + b^{'}t^{2} + c^{'}t + d^{'} \quad \text{if} \quad t \in [t_{0} + \delta t^{'}, t_{p}]$$

In order to decide the variable  $\delta t'$ , we require the "distance" between voltage curve and piecewise polynomial approximation to be zero :

$$D = \int_{t_0}^{t_0 + \delta t'} (\boldsymbol{P}'_0 - \boldsymbol{V}_{k-1}) \, \mathrm{dt} + \int_{t_0 + \delta t'}^{t_1} (\boldsymbol{P}'_1 - \boldsymbol{V}_{k-1}) \, \mathrm{dt} = \boldsymbol{0}$$
(8)

Solve equation (8) and obtain  $\delta t'$ :

$$\delta t^{'} = \frac{2 \int_{t_0}^{t_p} \boldsymbol{V}_{k-1} \, \mathrm{dt} - (\boldsymbol{V}(t_0) + \boldsymbol{V}_{k-1}(t_p)) \cdot (t_p - t_0)}{\boldsymbol{V}(t_0) - \boldsymbol{V}_{k-1}(t_p)}$$

Then we assume  $\delta t \approx \delta t'$  to generate the approximation of voltage waveform  $V_k$  by formula (7).

Determine the sampling nodes S Determine the violation time window  $[t_0, t_1]$ Build macro-model by equation (3) Set intrinsic capacitance  $C_0 \leftarrow C_{intri}$  $\boldsymbol{L}_{k}^{\prime} \leftarrow \frac{1}{2} (\boldsymbol{V}(t_{0}) + V_{\text{thre}}) \cdot (t_{1} - t_{0})$ while  $\tilde{True}$  do  $\begin{array}{l} \boldsymbol{L} \leftarrow \boldsymbol{L}_{k}^{'} \\ \boldsymbol{V}(t_{1}) \leftarrow \boldsymbol{V}_{p} \leftarrow V_{\mathrm{thre}} \end{array}$ while True do Obtain  $C_k$  by solving problem (4) Update  $V_p$  by solving following equations :  $(\mathbf{V}(t_0) - \mathbf{V}_p) \cdot C_k = A \cdot (\mathbf{V}(t_0) + \mathbf{V}_p)T + \mathbf{B}$ where T is  $(t_1 - t_0)/2$ if  $|V_p - V(t_1)| \leq \epsilon$  then Break end if  $V_p < V_{\text{thre}}$  then  $V(t_1) = V_{\text{thre}}$ else if  $V_p \leq V(t_1) - \delta$  then  $V(t_1) = V(t_1) - \delta$ else if  $V_p \ge V(t_1) + \delta$  then  $V(t_1) = V(t_1) + \delta$ else  $V(t_1) = V_p$ end  $C \leftarrow C_k + C_0$ Get the voltage  $V_{k-1}$  by solving equation  $C \cdot \frac{\mathrm{d} \boldsymbol{V}_{k-1}}{\mathrm{d} \mathbf{t}} = A \cdot \boldsymbol{V}_{k-1} + (\boldsymbol{G}_{12}^T \cdot \boldsymbol{G}_{11}^{-1} \cdot \boldsymbol{J}_1 - \boldsymbol{J}_2)$ Update  $L_{k}^{'}$  according to formula (9) if  $|L'_{k} - L| \leq \epsilon$  then Breakend end

Algorithm 2: New Iterative SLP Based Algorithm

Finally, we can integrate piecewise polynomial approximation (7) to get voltage constraint  $L'_k$ :

$$\mathbf{L}'_{k} \approx \int_{t_{0}}^{t_{0}+\delta t} \mathbf{P}_{0} \,\mathrm{dt} + \int_{t_{0}+\delta t}^{t_{1}} \mathbf{P}_{1} \,\mathrm{dt} \\
= \frac{1}{2} (\mathbf{V}(t_{0}) + V_{\mathrm{thre}}) \cdot (t_{1} - t_{0}) + \frac{1}{2} (\mathbf{V}(t_{0}) - V_{\mathrm{thre}}) \delta t \\
= \frac{1}{2} (\mathbf{V}(t_{0}) + V_{\mathrm{thre}}) \cdot (t_{1} - t_{0}) \qquad (9) \\
+ 2 \int_{t_{0}}^{t_{p}} \mathbf{V}_{k-1} \,\mathrm{dt} - (\mathbf{V}(t_{0}) + \mathbf{V}_{k-1}(t_{p})) \cdot (t_{p} - t_{0})$$

$$+\frac{2J_{t_0}\mathbf{v}_{k-1}\mathbf{u}\mathbf{v}(\mathbf{v}_{(0)})+\mathbf{v}_{k-1}(t_p)}{2(\mathbf{V}(t_0)-\mathbf{V}_{k-1}(t_p))/(\mathbf{V}(t_0)-V_{\text{thre}})}$$

Notice that  $L'_{k}$  becomes  $\int_{t_0}^{t_1} V_{opt} dt$  as voltage waveform  $V_{k-1} \rightarrow V_{opt}$  because of  $t_p \rightarrow t_1$  and  $V_{k-1}(t_p) \rightarrow V_{thre.}$ So the voltage waveform  $V_{opt}$  is a "fix point" of equation (9) because using  $L' = \int_{t_0}^{t_1} V_{opt}$  in SLP constraint (5) also leads to voltage waveform  $V_{opt}$ . Therefore, the new voltage constraint  $L'_{k}$  equation (9) is convergent when  $k \rightarrow \infty$ .

## B. The Flow of the New Decap Optimization Method

Plugging the new voltage constraint  $L'_k$  to linear programming problem (4), we could obtain a more accurate decap budget. After adding decap, the voltage waveforms are simulated to generate a better voltage constraint according to equation (9). The new iterative SLP-based decap optimization is shown in Algorithm 2. We remark that the new iterative optimization flow is different from [12] in the following respects: 1. Sampling nodes, violation time window and macromodeling are determined once at the beginning, while algorithm 1 builds them in every iteration. 2. Violation time window is determined ignoring any capacitance to avoid violation time window drift while algorithm 1 considers decap already placed in the previous iterations, which causes violation time window drift. 3. In addition to voltages  $V(t_1)$ , voltage constraints L are improved iteratively.

These differences are essential to eliminate the violation residue and avoid the violation time window drift.

## VI. EXPERIMENTAL RESULTS

We implemented all three decap optimization algorithms in Matlab: sensitivity based decap optimization [4], SLP based algorithm 1 [12] and the new iterative SLP based algorithm 2 in Matlab. The built-in non-linear interior-point optimization function ('FMINCON") is used in the sensitivity based algorithm. And another built-in linear programming function ("LINPROG") is utilized in both SLP based algorithm 1 and the new iterative SLP algorithm. All the experiments were carried out on a Linux machine with 2.3GHz CPU and 8GB memory. However, due to the memory limit of 32-bit Matlab, only 2GB-3GB memory is available for our algorithm.

All test circuits are generated with realistic parameters for R,C and current sources based on industry designs. As mentioned in [12], the inefficiency of sensitivity based algorithm make it impossible to use very huge circuits. So we apply three algorithms to smaller test circuits to demonstrate the characteristic of each algorithms. We stress that the proposed method can easily improved by using partitioning scheme.

Table III shows the comparison between three algorithms, where the first column is the circuit name, the second column is the node number in the circuit, the third column is the violating node number. For each algorithm, iteration number, decap budget and runtime are also presented. Also the seventh column and eleventh column present the decap budget difference with respect to the decap values obtained by the proposed algorithm. All three algorithms remove all the voltage drop violation after adding decap. So the violating node number after adding decap is omitted.

As shown in the table, the proposed algorithm obtains almost the same (sometimes slightly less) decap values as the sensitivity based method. In contrast, the SLP based algorithm gives much larger decap values for most of cases.

We observe that all these decap budget over-estimations occur when the second iteration was carried out in the SLP based algorithm 1, which confirms our observation that overestimation is caused by the violation time window drift in the second iteration. Usually, the violation time window drift after the first iteration is proportional to the decap added in the first iteration. Therefore, if violation is larger, the decap added in the first iteration will be larger and consequently lead to larger violation time window drift and larger decap overestimation.

|         | Sensitivity Based Algorithm |      |       |          | SLP Based Algorithm |          |       |           |        | Iterative SLP Alg |       |          |        |
|---------|-----------------------------|------|-------|----------|---------------------|----------|-------|-----------|--------|-------------------|-------|----------|--------|
| Circuit | #Node                       | #Vio | #Iter | Budget   | Δ                   | Time     | #Iter | Budget    | Δ      | Time              | #Iter | Budget   | Time   |
| ctk1    | 1049                        | 77   | 31    | 82.4 pF  | 1.19%               | 158.8s   | 2     | 113.1 pF  | 38.82% | 3.1s              | 2     | 81.4pF   | 3.1s   |
| ctk2    | 2032                        | 455  | 26    | 482.6 pF | -36.2%              | 127.1s   | 1     | 757.0 pF  | 0.00%  | 9.3s              | 1     | 757.0 pF | 9.3s   |
| ctk3    | 3532                        | 1118 | 49    | 788.4 pF | -17.2%              | 302.5s   | 2     | 1305.7 pF | 37.1%  | 23.1s             | 2     | 952.6 pF | 43.2s  |
| ctk4    | 4875                        | 1413 | 51    | 1.043 nF | -10.0%              | 1020.8 s | 2     | 2.089 nF  | 80.2%  | 39.4s             | 2     | 1.159 nF | 81.6s  |
| ctk5    | 6033                        | 1733 | 18    | 1.198 nF | -0.80%              | 173.2 s  | 1     | 1.208 nF  | 0%     | 10.5s             | 1     | 1.208 nF | 10.5s  |
| ctk6    | 7550                        | 2429 | 33    | 2.341 nF | -9.51%              | 904.7 s  | 2     | 4.720 nF  | 82.48% | 26.4s             | 2     | 2.586 nF | 45.5s  |
| ctk7    | 8879                        | 3250 | 44    | 2.965 nF | -11.62%             | 952.1 s  | 2     | 4.201 nF  | 25.2%  | 50.4s             | 2     | 3.355 nF | 92.3s  |
| ctk8    | 10251                       | 4841 | 40    | 3.890 nF | 4.62%               | 2165.9 s | 2     | 5.836 nF  | 56.9%  | 113.8s            | 2     | 3.718 nF | 208.7s |
| ctk9    | 20123                       | 7098 | NA    | NA       | NA                  | NA       | 1     | 8.134 nF  | 0%     | 65.5s             | 1     | 8.134 nF | 47.9s  |
| ctk10   | 40178                       | 9741 | NA    | NA       | NA                  | NA       | 2     | 15.973 nF | 79.43% | 64.3s             | 2     | 8.902 nF | 97.4s  |
|         |                             |      |       |          |                     |          |       |           |        |                   |       |          |        |

TABLE III

Comparison between the sensitivity based algorithm, the SLP based algorithm and the proposed algorithm

|   |        |      |          |      |          | -   |                                                               |          | -   |      |          |                |
|---|--------|------|----------|------|----------|-----|---------------------------------------------------------------|----------|-----|------|----------|----------------|
|   |        |      | Iter 1   |      |          |     | er 2 in SLP Based Algorithm Iter 2 in Iterative SLP Algorithm |          |     |      |          |                |
|   | Vio    | #Vio | Vio      | #Vio | Budget   | Vio | #Vio                                                          | Budget   | Vio | #Vio | Budget   | overestimation |
| ĺ | 0.0041 | 77   | 0.000507 | 26   | 80.3 pF  | 0.0 | 0                                                             | 113.1 pF | 0.0 | 0    | 81.4 pF  | 38.9%          |
| ĺ | 0.0160 | 203  | 0.0015   | 55   | 170.2 pF | 0.0 | 0                                                             | 241.8 pF | 0.0 | 0    | 180.1 pF | 34.3%          |
|   | 0.0502 | 386  | 0.0039   | 104  | 226.2 pF | 0.0 | 0                                                             | 375.3 pF | 0.0 | 0    | 246.4 pF | 52.3%          |
| [ | 0.1183 | 546  | 0.0066   | 138  | 284.4 pF | 0.0 | 0                                                             | 556.1 pF | 0.0 | 0    | 320.8 pF | 73.3%          |

TABLE IV

VIOLATION AND DECAP BUDGET IN EACH ITERATIONS

Table IV shows the violation and the decap in each iterations of both the SLP based algorithm and the proposed algorithm. The test circuit "ctk1" is used in this experiment and the current source is scaled up to increase the violation. The column "Vio" is the total violation, i.e.  $Vio = \sum_{i \in vio} \int V dt$ , where t in violation time window. And "#Vio" is the violating node number. Although the violation in practice is not as large as shown in table IV, table IV shows the larger the violation is, the larger the decap overestimation is.

Although the scale of these test circuits are limited due to the memory limit of 32-bit MATLAB, The table still shows a good comparison of CPU runtime between the proposed and the SLP based algorithm, both are significantly less than the sensitivity based algorithm. We also can see the the proposed algorithm is sometimes slower than the SLP based algorithm despite they both complete the optimization in 2 iterations. This is because the LP problem in the 2nd iteration remains as large as in the first iteration for our algorithm but the 2nd iteration of the SLP based algorithm has a reduced LP problem to solve.

#### VII. CONCLUSION

In this paper, we have proposed an efficient decoupling (decaps) capacitance optimization algorithm. The new method improves the existing method [12], by applying the more accurate piecewise polynomial models to estimate the voltage noises during the linear programming processes. The proposed method has the efficiency of the charge-based methods and the accuracy of the sensitivity-based methods. Experimental results have demonstrated that the proposed method leads to the much smaller decap values, which are similar to the best-known sensitivity-based method, than the existing charge based method, and while it maintains the same efficiency of the charge-based method.

#### REFERENCES

- [1] H. B. Bakoglu, *Circuits, Interconnections, and Packaging for VLSI.* Addison-Wesley Publishing Company, 1990.
- [2] Y. Cai, J. Fu, X. Hong, S. X.-D. Tan, and Y. Luo, "Power/ground network optimization considering decap leakage currents," *IEEE Trans.* on Circuits and Systems II: Analog and Digital Signal Processing, vol. 53, no. 10, pp. 1012–1016, Oct. 2006.
- [3] H. H. Chen and D. D. Ling, "Power supply noise analysis methodology for deep-submicron VLSI chip design," in *Proc. Design Automation Conf. (DAC)*, 1997.
- [4] J. Fu, Z. Luo, X. Hong, Y. Cai, S. X.-D. Tan, and Z. Pan, "A fast decoupling capacitor budgeting algorithm for robust on-chip power delivery," *IEICE Trans. on Fundamentals of Electronics, Communications* and Computer Science(IEICE), vol. E87-A, no. 12, pp. 3273–3280, Dec. 2004.
- [5] L. Kang, Y. Cai, Y. Zou, J. Shi, X. Hong, and S. X.-D. Tan, "Fast decoupling capacitor budgeting for power/ground network using random walk approach," ASP-DAC '07: Proceedings of the 2007 conference on Asia South Pacific design automation, Jan 2007.
- [6] H. Li, J. Fan, Z. Qi, S. X.-D. Tan, L. Wu, Y. Cai, and X. Hong, "Partitioning-based approach to fast on-chip decoupling capacitor budgeting and minimization," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, no. 11, pp. 2402–2412, Nov. 2006.
- [7] S. Pant and E. Chiprout, "Power grid physics and implications for cad," in *Proceedings of 2006 Design Automation Conference*, 2006, pp. 199– 204.
- [8] H. Su, S. S. Sapatnekar, and S. R. Nassif, "Optimal decoupling capacitor sizing and placement for standard cell layout designs," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 22, pp. 428–436, Apr 2003.
- [9] H. Su, S. Sapatnekar, and S. Nassif, "An algorithm for optimal decoupling capacitor sizing and placement for standard cell layouts," *ISPD* '02: Proceedings of the 2002 international symposium on Physical design, Apr 2002.
- [10] K. Wang and M. Marek-Sadowska, "On-chip power supply network optimization using multigrid-based technique," in *Proc. Design Automation Conf. (DAC)*, 2003, pp. 113–118.
- [11] C. K. S. Zhao and K. Roy, "Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 21, pp. 81–92, Jan 2002.
- [12] M. Zhao, R. Panda, S. Sundareswaran, S. Yan, and Y. Fu;, "A fast onchip decoupling capacitance budgeting algorithm using macromodeling and linear programming," *Design Automation Conference*, 2006 43rd ACM/IEEE, pp. 217 – 222, Jun 2006.