# Analysis of Power Consumption and BER of Flip-flop Based Interconnect Pipelining

Jingye Xu, Abinash Roy and Masud H. Chowdhury, ECE, University of Illinois at Chicago, Chicago, IL 60607; jxu6@uic.edu, aroy5@uic.edu, masud@ece.uic.edu

## Abstract

This paper addresses interconnect pipelining from the viewpoints of both power consumption and bit error rate (BER), and seeks the optimal solution for a given wire pipelining scheme in nanometer-scale very large scale integration (VLSI) technologies. A detailed analysis is performed of how power consumption and BER depend on the number of inserted flip-flops and on the repeater size. To obtain the best tradeoff between wire delay, BER and power consumption, a methodology is developed that optimizes the repeater size and the number of inserted flip-flops by maximizing a user-specified figure of merit. The methodology is then applied to calculate the optimal solutions for several International Technology Roadmap for Semiconductors (ITRS) technology nodes.

## **1** Introduction

The delay of global interconnects increases with technology scaling because global interconnect length does not scale down. In fact, as the feature size of CMOS devices continues to decrease and more functionality is integrated on a chip, the length and number of global interconnects tend to increase [7]. Consequently, in future nanometer designs it will be impossible to carry a signal across the chip within a single clock cycle, and multi-cycle cross-chip communication becomes necessary. Cross-chip interconnect is then removed from the timing constraints, and the chip speed is determined by the most critical intra-block (local) combinational path, which allows clock frequencies to keep rising [4],[5]. Insertion of sequential elements in interconnect lines, a concept that has become known as *interconnect pipelining*, is one feasible solution for modern nanometer technologies. The idea is to divide a wire whose delay is longer than one clock cycle into several segments by inserting sequential elements that store the signal values as they take multiple clock cycles to travel along the global wire. Two types of sequential elements can be used for this purpose, and hence interconnect pipelining can be divided into two types: (i) flip-flop based, and (ii) latch based wire pipelining.

The issues of interconnect pipelining can be addressed from three aspects: development of CAD tools that take interconnect pipelining into consideration, computer architecture level design, and circuit level analysis. Currently, most of the research on interconnect pipelining lies in the first two aspects. The numerous CAD-related challenges of using wire pipelining are described in [3], which also lists several changes that must be made to current CAD tools so that this technique can be widely used. A considerable body of work exists at the architecture level. A detailed study of how wire pipelining alters the function or cycle behavior of a circuit is given in [4]. Several approaches have been proposed to solve this problem, such as wire retiming [8], an algorithm working at the gate level [9], and a latency-insensitive technique [10]. In [11], a floor-planning methodology is described that considers interconnect pipelining and its impact on performance using IPC sensitivity models. The authors of [5] explored the possibility of sharing interconnect pipelining to reduce wiring overhead, and [6] provides two techniques to deal with the short-path constraint of latch based wire pipelining. In [2], an analytical model is presented that determines the number, position and feasible region of flip-flops in flip-flop based wire pipelining. A method of estimating the interconnect power at the chip level considering concurrent repeater and flip-flop insertion was given in [12]. Compared with the first two aspects, research at the circuit level is insufficient.

As system delay becomes dominated by interconnect delay, an increasing number of repeaters and flip-flops (FFs) are used to reduce the interconnect delay. Consequently, the power consumed by interconnects, including repeaters and FFs, accounts for a growing share of the total system power [13]. Many papers discuss optimization techniques for global interconnect with respect to latency, bandwidth and power dissipation [16]-[19], but none of them takes wire pipelining into consideration. In [1], a study of the bit error rate in interconnect pipelining is presented using statistical timing analysis, but it does not take many circuit level issues into consideration.

This paper studies the circuit level issues of interconnect pipelining and proposes an optimization technique for flip-flop inserted global wires. The dependency of the BER and power consumption on the number of inserted flip-flops and the size of the repeaters is established. We also present a new methodology for determining the number of inserted flip-flops and the repeater size that maximize a user-defined figure of merit for a given technology. Section 2 examines the power dissipation of the wire pipelining system. Section 3 gives a detailed discussion of the BER of a wire pipelining scheme. Section 4 presents the methodology for optimizing a wire pipelining scheme with respect to delay, power consumption and BER, together with simulation results. Section 5 concludes the paper and outlines future work.

## **2** Power Estimation of Interconnect Pipelining

A typical D flip-flop (DFF) based interconnect pipelining stage is shown in Figure 1, from which we can see that two kinds of components are used in this interconnect: DFFs and repeaters. Because of this structure, it is convenient to divide the total power dissipation into two parts: the power consumed by the flip-flops and the power consumed by the repeaters.



Figure 1. DFF Pipelined Interconnect

First, let us consider the DFF power consumption. Usually, power consumption is composed of three parts: dynamic power, leakage power and short circuit power. According to [17], however, short circuit power becomes a minor component as technology scales into the nanometer regime. Therefore, we consider only the first two components. Let $f_{clk}$ denote the clock frequency, let $\alpha_i$ and $C_i$ be the switching probability and the total capacitance of node *i* respectively, and let $k_i$ be the swing range coefficient of node *i*. According to [15], the dynamic power consumption of a single DFF, summed over its $N$ internal nodes, can be expressed as

$$P_{dF} = f_{clk} C_{eff} V_{DD}^2, \quad \text{where } C_{eff} = \sum_{i=1}^{N} \alpha_i k_i C_i \tag{1}$$

And, the leakage power is

$$P_{lF} = V_{DD} I_{off} s_F \tag{2}$$

where $I_{off}$ is the unit leakage current and $s_F$ is the total gate size of one FF. Therefore, the total power consumption of a DFF, $P_{FF}$, can be estimated as $P_{FF}=P_{dF}+P_{lF}$.
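As a minimal sketch of how (1) and (2) combine into $P_{FF}$, the Python snippet below evaluates both terms for one flip-flop. The function name `dff_power` and every numerical value are illustrative assumptions, not figures from this paper.

```python
def dff_power(f_clk, v_dd, nodes, i_off, s_f):
    """Total power of one DFF, P_FF = P_dF + P_lF, following (1) and (2).

    nodes: iterable of (alpha_i, k_i, C_i) tuples, one per internal node.
    i_off: leakage current per unit gate size; s_f: total gate size of the FF.
    """
    c_eff = sum(alpha * k * c for alpha, k, c in nodes)  # effective capacitance in (1)
    p_dynamic = f_clk * c_eff * v_dd ** 2                # equation (1)
    p_leakage = v_dd * i_off * s_f                       # equation (2)
    return p_dynamic + p_leakage


# Hypothetical node data (alpha_i, k_i, C_i in farads), NOT taken from the paper
nodes = [(0.5, 1.0, 2e-15), (0.5, 1.0, 3e-15), (1.0, 1.0, 1e-15)]
p_ff = dff_power(f_clk=1e9, v_dd=1.0, nodes=nodes, i_off=1e-7, s_f=20)
print(f"P_FF = {p_ff * 1e6:.2f} uW")
```

Keeping the node list explicit makes it easy to see which internal nodes dominate $C_{eff}$ for a given flip-flop topology.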

Now let us compare the power consumption of different kinds of DFFs. Figure 2 compares the power dissipation of two kinds of flip-flops, a dynamic flip-flop and a static flip-flop, for different technology nodes; their schematics are shown in Figure 3 [14]. The results were obtained with the Spectre circuit simulator, using a switching probability of 0.5 and a clock frequency of 1 GHz. The parameters used in this simulation are listed in Table 1 and are taken from [18], [19]. The comparison shows that, for all technology nodes, the power dissipation of the dynamic flip-flop is smaller than that of the static one.



Figure 2. Comparison of the power consumption of the two kinds of flip-flop

Table 1. Technology and equivalent circuit model parameters for different technology nodes

| Tech. node<br>(nm)        | 130   | 90    | 65    | 45    |
|---------------------------|-------|-------|-------|-------|
| Width(nm)                 | 335   | 230   | 145   | 103   |
| Thickness(nm)             | 670   | 482   | 319   | 236   |
| $r$ ($\Omega/\mu$m)       | 0.098 | 0.198 | 0.475 | 0.905 |
| $c_a$ (fF/mm)             | 207   | 181   | 165   | 143   |
| $c_b$ (fF/$\mu$m$^2$)     | 0.057 | 0.071 | 0.103 | 0.116 |
| $c$ (fF/$\mu$m)           | 0.226 | 0.197 | 0.180 | 0.155 |
| $V_{\rm DD}$ (V)          | 1.1   | 1     | 0.7   | 0.6   |



Figure 3. Dynamic DFF and static DFF

Now, let us consider the power consumption of the repeaters. We assume that a minimum sized repeater has input capacitance $c_0$, output parasitic capacitance $c_p$ and output resistance $r_s$. In a wire pipelining scheme, each repeater must drive a whole wire segment (Figure 4), so the repeaters are usually large. If the repeater size is denoted by *s*, the output resistance is $R_{tr}=r_s/s$, the output parasitic capacitance is $C_p=c_ps$ and the input capacitance is $C_L=c_0s$. We consider a uniform interconnect with resistance *r* per unit length and capacitance *c* per unit length. Methods for estimating the power consumed by a global wire with repeater insertion are given in [16]-[19].



Figure 4. A long wire driven by a repeater

If *l* is the wire length and  $\alpha$  is the switching factor, the switching power of the repeater is given by [17]:

$$P_{dR} = \alpha \left(s(c_{p} + c_{0}) + lc\right) V_{DD}^{2} f_{clk} \tag{3}$$

And, the average leakage power of a repeater can be expressed as [17]:

$$P_{lR} = \frac{1}{2} V_{DD} (I_{offn} W_{n\min} + I_{offp} W_{p\min}) s \tag{4}$$

Here, $W_{nmin}$ and $W_{pmin}$ are the widths of the NMOS and PMOS transistors in a minimum sized inverter respectively. In this paper we assume that $I_{offn}=I_{offp}=I_{off}$ and $W_{pmin}=3W_{nmin}$. Then (4) can be written as

$$P_{lR} = 2V_{DD}I_{off}W_{n\min}s \tag{5}$$

As for the flip-flops, we consider only the dynamic and leakage power consumption. The total power consumption of one repeater is therefore

$$P_{repeater} = P_{dR} + P_{lR} \tag{6}$$

Consider a global interconnect of length *L*. If we insert *N*-1 flip-flops, the whole wire is divided into *N* wire segments and there are *N*+1 flip-flops in total in the wire pipelining scheme. The total power consumption of the whole wire pipelining system can then be written as:

$$P_{total} = (N+1)P_{FF} + NP_{repeater} \tag{7}$$

Substituting (3), (5) and (6) into (7), we can express the power consumption in terms of the number of wire segments and the size of the inserted repeaters:

$$P_{total} = (N+1)P_{FF} + k_1Ns + k_2 \tag{8}$$

where

$$k_1 = \alpha (c_p + c_0) V_{DD}^2 f_{clk} + 2V_{DD} I_{off} W_{n\min}, \qquad k_2 = \alpha Lc V_{DD}^2 f_{clk}$$

Here, we have used the relation $L=Nl$. From (8), we see that the power consumption of the whole wire pipelining system increases with both the size of the inserted repeaters and the number of flip-flops inserted.
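The equivalence of (7) and (8) can be checked numerically. The sketch below evaluates the total power both ways; the function names and the parameter set `TECH` are assumptions made for illustration (only $c$ and $V_{DD}$ follow the 65nm column of Table 1, converted to SI units), so the absolute numbers carry no significance.

```python
def repeater_power(s, l, alpha, c_p, c_0, c, v_dd, f_clk, i_off, w_nmin):
    """Power of one repeater of size s driving a segment of length l, per (3), (5), (6)."""
    p_dynamic = alpha * (s * (c_p + c_0) + l * c) * v_dd ** 2 * f_clk
    p_leakage = 2 * v_dd * i_off * w_nmin * s
    return p_dynamic + p_leakage


def total_power(n, s, length, p_ff, **tech):
    """Equation (7): N+1 flip-flops plus N repeaters, each driving l = L/N."""
    return (n + 1) * p_ff + n * repeater_power(s, length / n, **tech)


def total_power_k(n, s, length, p_ff, alpha, c_p, c_0, c, v_dd, f_clk, i_off, w_nmin):
    """Equation (8): the same total expressed through k1 and k2."""
    k1 = alpha * (c_p + c_0) * v_dd ** 2 * f_clk + 2 * v_dd * i_off * w_nmin
    k2 = alpha * length * c * v_dd ** 2 * f_clk
    return (n + 1) * p_ff + k1 * n * s + k2


# Hypothetical parameters in SI units (F, F/m, V, Hz, A/m, m); not from the paper
TECH = dict(alpha=0.5, c_p=1e-15, c_0=1e-15, c=0.180e-9, v_dd=0.7,
            f_clk=1e9, i_off=0.1, w_nmin=1e-7)

p7 = total_power(n=4, s=10, length=5e-3, p_ff=10e-6, **TECH)
p8 = total_power_k(n=4, s=10, length=5e-3, p_ff=10e-6, **TECH)
print(f"(7): {p7 * 1e6:.1f} uW, (8): {p8 * 1e6:.1f} uW")
```

Both forms agree up to floating-point rounding, and raising either `n` or `s` raises the total, which is the trend stated after (8).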

Figure 5 compares the power consumed by the inserted flip-flops and by the repeaters. For this comparison, we implemented the 4-stage wire pipelining scheme of Figure 1 using dynamic DFFs, with all repeaters 10 times the minimum size. The power was measured with the Spectre circuit simulator. The comparison shows that the power consumed by the repeaters is much higher than that consumed by the DFFs at all technology nodes; typically, in a global wire, the repeaters consume more than 10 times the power of the flip-flops. For example, in 90nm technology the repeaters consume 415uW, whereas the flip-flops consume only 39.7uW.



Figure 5. Comparison of the power consumed by flip-flops and repeaters in a wire pipelining scheme.

## **3** Bit Error Rate Analysis

A detailed study of flip-flop based wire pipelining is given in [2], which derives the minimum number of flip-flops to be inserted and the central position and feasible region of each inserted flip-flop. However, that work is at the architecture level and does not take into account many circuit level issues, including repeater sizing, process and parameter variations, and clock signal variation. In real circuits, there are many non-ideal effects, such as temporal and spatial clock signal variation (clock skew and jitter), wire delay uncertainty, and timing parameter variations of the sequential elements. These variations and uncertainties can greatly increase the BER of a wire pipelining scheme.

The BER is the probability of error when a single data bit is transmitted through a pipelined global interconnect. To estimate the BER of flip-flop based wire pipelining, we use the statistical timing analysis method of [1], which is briefly reviewed here. A typical DFF-based interconnect pipeline is shown in Figure 1. We denote by $T_{setup}$ the setup time of a DFF, by $T_{prop}$ the propagation delay from D to Q after the positive clock edge, by $T_{clk}$ the clock period, and by $t^i_{wire}$ the propagation delay from the output of the DFF at the (*i*-1)-th stage to the input D of the DFF at the *i*-th stage. For the DFF at the *i*-th stage to latch a data bit properly, the propagation delay

$$d_i = T_{prop} + t_{wire}^i \tag{9}$$

must satisfy a timing constraint

$$0 \le d_i \le T_{clk} - T_{setup} \tag{10}$$

If we define a variable $\delta_i = T_{prop} + t^i_{wire} + T_{setup} - T_{clk}$ with probability density function (p.d.f.) $p(\delta_i)$, then the probability of correct data transmission between the (*i*-1)-th and *i*-th stages can be expressed as:

$$q_i = \Pr(T_{setup} - T_{clk} \le \delta_i \le 0) = \int_{T_{setup} - T_{clk}}^0 p(\delta_i) \, d\delta_i \tag{11}$$

Since  $d_i = T_{prop} + t^i_{wire}$  is definitely greater than zero, the probability of the event  $\delta_i < T_{setup} - T_{clk}$  is zero. Therefore the above equation can be written as

$$q_i = \int_{-\infty}^0 p(\delta_i) d\delta_i \tag{12}$$

where the lower bound of integration is extended from  $T_{setup} - T_{clk}$  to  $-\infty$ . Due to the presence of a DFF, the probability of correct data transmission at each stage is independent of each other. Hence, for an N-stage flip-flop based wire pipelining the BER is given by (13)

$$BER = 1 - \prod_{i=1}^{N} q_i \tag{13}$$

In practice, because the process parameters are normally distributed, it is reasonable to assume that the timing variables $T_{prop}$, $t^{i}_{wire}$, $T_{setup}$ and $T_{clk}$ are also normally distributed (and, for the variance in (15), independent). Then $\delta_i$ also has a normal p.d.f., with mean and variance given by (14) and (15)

$$\mu_{\delta_i} = \mu_{T_{prop}} + \mu_{t^i_{wire}} + \mu_{T_{setup}} - \mu_{T_{clk}} \tag{14}$$

$$\sigma_{\delta_i}^{2} = \sigma_{T_{prop}}^{2} + \sigma_{t^i_{wire}}^{2} + \sigma_{T_{setup}}^{2} + \sigma_{T_{clk}}^{2} \tag{15}$$

Hence

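$$q_i = P(\delta_i \le 0) = \frac{1}{2} + \operatorname{erf}\left(-\frac{\mu_{\delta_i}}{\sigma_{\delta_i}}\right) \tag{16}$$

where $\operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_{0}^{x} \exp\left(-\frac{t^{2}}{2}\right) dt$. With this definition, $\frac{1}{2} + \operatorname{erf}(x)$ is the standard normal cumulative distribution function, so $\operatorname{erf}$ here differs from the conventional error function.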
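A small numerical sketch of (14)-(16) follows. It assumes that every timing variable has a standard deviation equal to a fixed fraction `rel_sigma` of its mean; the function name and the example timing values are hypothetical. Because Python's `math.erf` is the conventional error function, the standard normal distribution function is written as $\frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$.

```python
import math


def stage_success_probability(mu_prop, mu_wire, mu_setup, mu_clk, rel_sigma):
    """q_i = P(delta_i <= 0) for independent, normally distributed timing variables.

    rel_sigma: standard deviation of each variable as a fraction of its mean
    (an assumption of this sketch).
    """
    means = (mu_prop, mu_wire, mu_setup, mu_clk)
    mu = mu_prop + mu_wire + mu_setup - mu_clk        # equation (14)
    var = sum((rel_sigma * m) ** 2 for m in means)    # equation (15)
    z = -mu / math.sqrt(var)
    # Standard normal CDF evaluated at z, i.e. equation (16)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical timing values in seconds, 10% relative standard deviation
q = stage_success_probability(50e-12, 300e-12, 30e-12, 500e-12, rel_sigma=0.10)
print(f"q_i = {q:.4f}")
```

When the mean slack is zero, the function returns 0.5, which matches the intuition that a stage with no timing margin fails half the time.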

If we define $\delta' = T_{prop} + T_{setup} - T_{clk}$, so that $\delta_i = \delta' + t_{wire}$, equation (12) can be written as

$$q_{i} = \int_{-\infty}^{0} p(T_{prop} + t_{wire} + T_{setup} - T_{clk}) \, d\delta_{i} = \int_{-\infty}^{-t_{wire}} p(\delta') \, d\delta' \tag{17}$$

and the BER of the whole wire pipelining is

$$BER = 1 - \left(\int_{-\infty}^{-t_{wire}} p(\delta') \ d\delta'\right)^N \tag{18}$$

In the above equation, we have assumed that all the flip-flops are evenly distributed along the global interconnect, so all the wire segments have the same delay  $t_{wire}$ .
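Equations (13) and (18) translate directly into code. The sketch below uses hypothetical function names and an arbitrary per-stage success probability of 0.99.

```python
import math


def ber(stage_probabilities):
    """Equation (13): BER = 1 - product of the per-stage success probabilities q_i."""
    return 1.0 - math.prod(stage_probabilities)


def ber_uniform(q, n):
    """Equation (18): N identical segments, each with success probability q."""
    return 1.0 - q ** n


for n in (1, 2, 4, 8):
    print(n, ber_uniform(0.99, n))
```

For a fixed per-stage probability, the BER grows with the number of stages. Pipelining can therefore lower the BER only because shorter segments have a smaller $t_{wire}$ and hence a larger per-stage probability.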

From (18), we see that the BER of the wire pipelining scheme is affected by the wire segment delay and the number of flip-flops inserted. The wire segment delay, in turn, depends on the number of flip-flops inserted and on the repeater size, as can be seen from the expression for the wire segment delay given in [20]

$$t_{wire} = \left(r_s(c_0 + c_p) + \frac{r_s}{s}cl + rlsc_0 + \frac{1}{2}rcl^2\right)\ln 2 \tag{19}$$

Here, l is the length of the wire segment and l=L/N. Using the equations given above, we may calculate the optimal number of inserted flip-flops, which can be acquired through solving (20).

$$\frac{\partial BER}{\partial N} = 0 \tag{20}$$

Calculation shows that the BER-optimal number of flip-flops is impractically large. For example, for a 20mm global interconnect, if the standard deviations of all the parameters are 10% of their nominal values, the optimal number of inserted flip-flops is 147 for 130nm technology and 135 for 65nm technology. Figure 6 plots the relationship between the BER and the number of flip-flops inserted. In a real circuit, inserting so many flip-flops into a global interconnect is impractical because of the resulting latency and power consumption. A tradeoff must therefore be made between the BER and the total delay.
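Since *N* is an integer, condition (20) can be approximated by a direct search over candidate segment counts. The sketch below combines (19) with (14)-(18) under several assumptions that are not taken from this paper: the driver parameters `r_s`, `c_0`, `c_p`, the flip-flop timing values and the repeater size are hypothetical, the relative standard deviation is applied to the timing variables rather than to the underlying process parameters, and the per-stage failure probability is computed with `math.erfc` purely for numerical accuracy.

```python
import math

LN2 = math.log(2.0)


def wire_delay(l, s, r, c, r_s, c_0, c_p):
    """Equation (19): delay of one segment of length l driven by a size-s repeater."""
    return (r_s * (c_0 + c_p) + (r_s / s) * c * l + r * l * s * c_0
            + 0.5 * r * c * l * l) * LN2


def pipeline_ber(n, s, length, t_prop, t_setup, t_clk, rel_sigma, **wire):
    """BER of N equal segments, following (14)-(18)."""
    t_wire = wire_delay(length / n, s, **wire)
    mu = t_prop + t_wire + t_setup - t_clk
    sigma = rel_sigma * math.sqrt(t_prop ** 2 + t_wire ** 2 + t_setup ** 2 + t_clk ** 2)
    # Per-stage failure probability 1 - q_i; erfc keeps very small values accurate
    p_fail = 0.5 * math.erfc(-mu / (sigma * math.sqrt(2.0)))
    if p_fail >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p_fail))  # 1 - (1 - p_fail)^N


def best_segment_count(n_max, **kwargs):
    """Integer search standing in for dBER/dN = 0."""
    return min(range(1, n_max + 1), key=lambda n: pipeline_ber(n, **kwargs))


# Hypothetical driver and timing values; r and c follow the 65nm column of Table 1
PARAMS = dict(s=20, length=20000.0,                 # repeater size, wire length in um
              t_prop=50e-12, t_setup=30e-12, t_clk=1e-9, rel_sigma=0.10,
              r=0.475, c=0.180e-15,                 # ohm/um, F/um
              r_s=10e3, c_0=0.5e-15, c_p=0.5e-15)   # ohm, F, F

n_best = best_segment_count(200, **PARAMS)
print(n_best, pipeline_ber(n_best, **PARAMS))
```

The search returns whichever *N* up to `n_max` gives the lowest modelled BER; with other parameter values the location of the minimum will differ.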



Figure 6. BER vs. Number of DFFs

Spectre simulation leads to the same conclusion. We implemented the wire pipelining scheme in 65nm technology, with a distance of 3.2mm between the driver and the receiver. Figure 7 shows the results. When *N* equals 3, a bit error occurs, and increasing *N* solves this problem. According to the output waveforms, it is unnecessary to insert more than 5 DFFs into this global interconnect.

Figure 7. Output waveform for different number of inserted DFFs

Now, let us examine the relationship between BER and buffer sizing. We consider a 0.5mm long global wire in 65nm technology driven by a buffer of size *s*; the relationship between the wire delay and the repeater size is shown in Figure 8. From this calculation, we see that the minimum delay is achieved when the repeater size is 65. This optimal repeater size can be calculated through (21) [20]

$$s_{opt} = \sqrt{\frac{r_s c}{rc_0}} \tag{21}$$

In practice, however, the repeater size is usually much smaller than the optimal repeater size because of the high power consumption and area cost. Moreover, if a repeater is too large, driving it also becomes a problem.
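For reference, (21) follows directly from (19). Only the second and third terms of the segment delay depend on *s*, so setting the derivative with respect to *s* to zero gives

$$\frac{\partial t_{wire}}{\partial s} = \left(-\frac{r_s c l}{s^2} + r l c_0\right)\ln 2 = 0 \quad\Longrightarrow\quad s_{opt} = \sqrt{\frac{r_s c}{r c_0}}$$

The result does not depend on the segment length *l*, and the second derivative $2 r_s c l \ln 2 / s^3$ is positive, so this stationary point is a minimum of the delay.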



Figure 8. Delay vs. Repeater size

Here, we consider a 3-stage wire pipelining scheme in 65nm technology using the same DFF as in the previous experiments. This time, the distance between the driver and the receiver is 5mm and the inserted flip-flops are evenly distributed along the global wire. We then performed Spectre simulations.

Figure 9 (a) plots the relationship between the total delay of one wire segment and the repeater size. Using the simulation data, we can calculate the BER for different repeater sizes. The result is given in Figure 9 (b), which shows that the BER is greater than 50% if the repeater size is less than 12.5 times the minimum size. In this calculation, the standard deviation of every parameter is 3% of its nominal value. The output waveforms are shown in Figure 10, from which we see that it is nearly impossible to transmit a signal through this wire pipeline if the repeater size is less than 12 times the minimum size. The experimental results agree closely with the calculated results.

Although increasing the repeater size lowers the BER, we know from Section 2 that the power consumption increases as well. A compromise is therefore also required between BER and power consumption, which is discussed in the next section.



Figure 9. (a) Repeater sizing vs. delay (b) BER vs. repeater size



Figure 10. Output waveform for different repeater size

## **4** Optimization Methodology

The purpose of optimizing global interconnects is to simultaneously achieve small delay, low power consumption and high reliability (low BER). A lower BER can be obtained either by increasing the repeater size, as long as the repeater size is below a certain threshold, or by increasing the number of inserted flip-flops, as long as that number is small. Unfortunately, either measure increases the power consumption. Moreover, if the number of inserted flip-flops is increased, the latency of the whole interconnect in clock cycles, which equals the number of wire segments, also increases, which is undesirable. Therefore, to find the optimal solution for a particular wire pipelining scheme, a compromise must be made between power consumption, BER and the number of delay cycles. Here, we combine the BER, the power consumption *P* and the number of delay cycles *N* into a figure of merit, defined in (22).

$$F = \frac{\left(1 - BER\right)^{i}}{P^{j} \cdot N^{k}} \tag{22}$$

where *i*, *j* and *k* are weights that indicate which design objective is valued more highly. The BER ranges from 0 to 1, and the number of delay cycles *N* is an integer greater than or equal to 1. The power consumption varies relatively little between different implementations of a particular wire pipelining scheme. In view of the ranges of these three quantities, we used the weights 3, 3 and 1/2 for *i*, *j* and *k* respectively. The number of wire segments and the repeater size that maximize the figure of merit can be determined by simultaneously setting the derivatives of (22) with respect to *N* and *s* to zero

$$\frac{\partial F}{\partial N} = 0 \quad \text{and} \quad \frac{\partial F}{\partial s} = 0 \tag{23}$$

The methodology outlined above is used to optimize the number of flip-flops inserted and the size of the repeaters for wire pipelining at the ITRS technology nodes of 130nm and 65nm. We consider a global wire 5mm in length and a clock frequency of 2GHz. We implemented these circuits using Cadence tools and simulated them using the Spectre circuit simulator. When calculating the BER, we assume that the standard deviation of every timing parameter is 3% of its nominal value.

Table 2 shows the simulation results for 130nm technology, and the data for 65nm technology are given in Table 3. The results show that, for a given number of wire segments, the BER decreases as the repeater size grows, while the power consumption increases. According to the figure of merit defined in (22), the optimal number of wire segments and repeater size for 130nm technology are 1 and 15 respectively. That is, we do not need to insert any sequential element into this global interconnect in 130nm technology. For 65nm technology, however, the optimum is 6 wire segments, that is, 5 inserted flip-flops, with a repeater size of 6.

Table 2. BER and power consumption of 130nm technology.

| N | s  | BER      | Power<br>(mW) | FOM       |
|---|----|----------|---------------|-----------|
| 4 | 3  | 1        | 1.0200        | 0         |
|   | 4  | 0.9671   | 1.2147        | 9.9346E-6 |
|   | 5  | 0.0050   | 1.3660        | 0.19324   |
|   | 6  | 6.28E-08 | 1.4611        | 0.1603    |
|   | 7  | 0        | 1.5424        | 0.13626   |
| 3 | 5  | 0.9997   | 1.1306        | 1.078E-11 |
|   | 6  | 0.2317   | 1.2512        | 0.13368   |
|   | 7  | 0.00107  | 1.3543        | 0.23168   |
|   | 8  | 1.86E-09 | 1.4276        | 0.19844   |
|   | 9  | 2.27E-13 | 1.4769        | 0.17922   |
| 2 | 7  | 0.9999   | 1.0244        | 6.577E-13 |
|   | 8  | 0.8606   | 1.1123        | 0.0013919 |
|   | 9  | 0.1711   | 1.1842        | 0.2425    |
|   | 10 | 0.0064   | 1.2457        | 0.35882   |
|   | 11 | 6.28E-09 | 1.2776        | 0.33908   |
| 1 | 13 | 0.1168   | 0.7948        | 1.3722    |
|   | 15 | 3.06E-04 | 0.8550        | 1.5985    |
|   | 16 | 6.22E-06 | 0.8789        | 1.4729    |
|   | 17 | 2.84E-07 | 0.9016        | 1.3645    |
|   | 19 | 1.12E-10 | 0.9421        | 1.1959    |

Table 3. BER and power consumption of 65nm technology.

| N | s  | BER      | Power<br>(mW) | FOM    |
|---|----|----------|---------------|--------|
| 6 | 4  | 1        | 0.2167        | 0      |
|   | 5  | 9.69E-02 | 0.3006        | 11.067 |
|   | 6  | 2.25E-05 | 0.3315        | 11.21  |
|   | 7  | 2.08E-09 | 0.3591        | 8.8161 |
|   | 8  | 3.15E-13 | 0.3711        | 7.9889 |
| 5 | 6  | 0.183    | 0.3382        | 6.3002 |
|   | 7  | 9.96E-04 | 0.3710        | 8.7302 |
|   | 8  | 5.40E-07 | 0.3884        | 7.6327 |
|   | 9  | 1.05E-09 | 0.4076        | 6.6036 |
|   | 10 | 2.01E-12 | 0.4191        | 6.0761 |
| 4 | 8  | 0.159    | 0.3413        | 7.4808 |
|   | 9  | 0.0039   | 0.3622        | 10.4   |
|   | 10 | 1.15E-04 | 0.3792        | 9.1667 |
|   | 12 | 1.54E-08 | 0.4014        | 7.731  |
|   | 15 | 1.41E-12 | 0.4172        | 6.8855 |
| 3 | 12 | 0.2547   | 0.3488        | 5.6325 |
|   | 15 | 0.003098 | 0.3783        | 10.565 |
|   | 17 | 8.92E-05 | 0.3958        | 9.3088 |
|   | 20 | 1.71E-06 | 0.4129        | 8.2017 |
|   | 22 | 3.71E-07 | 0.4196        | 7.8151 |
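The FOM columns of Tables 2 and 3 can be reproduced from (22) with the weights 3, 3 and 1/2 and the power expressed in mW. The sketch below copies the BER and power columns from the two tables, reading a blank *N* cell as repeating the value above it (an assumption about the table layout); the function and variable names are illustrative.

```python
def figure_of_merit(ber, power_mw, n, i=3, j=3, k=0.5):
    """Equation (22) with the weights used in the text (i = 3, j = 3, k = 1/2)."""
    return (1.0 - ber) ** i / (power_mw ** j * n ** k)


# Rows are (N, s, BER, power in mW), copied from Table 2 (130nm)
TABLE_2 = [
    (4, 3, 1, 1.0200), (4, 4, 0.9671, 1.2147), (4, 5, 0.0050, 1.3660),
    (4, 6, 6.28e-08, 1.4611), (4, 7, 0, 1.5424),
    (3, 5, 0.9997, 1.1306), (3, 6, 0.2317, 1.2512), (3, 7, 0.00107, 1.3543),
    (3, 8, 1.86e-09, 1.4276), (3, 9, 2.27e-13, 1.4769),
    (2, 7, 0.9999, 1.0244), (2, 8, 0.8606, 1.1123), (2, 9, 0.1711, 1.1842),
    (2, 10, 0.0064, 1.2457), (2, 11, 6.28e-09, 1.2776),
    (1, 13, 0.1168, 0.7948), (1, 15, 3.06e-04, 0.8550), (1, 16, 6.22e-06, 0.8789),
    (1, 17, 2.84e-07, 0.9016), (1, 19, 1.12e-10, 0.9421),
]

# Rows copied from Table 3 (65nm)
TABLE_3 = [
    (6, 4, 1, 0.2167), (6, 5, 9.69e-02, 0.3006), (6, 6, 2.25e-05, 0.3315),
    (6, 7, 2.08e-09, 0.3591), (6, 8, 3.15e-13, 0.3711),
    (5, 6, 0.183, 0.3382), (5, 7, 9.96e-04, 0.3710), (5, 8, 5.40e-07, 0.3884),
    (5, 9, 1.05e-09, 0.4076), (5, 10, 2.01e-12, 0.4191),
    (4, 8, 0.159, 0.3413), (4, 9, 0.0039, 0.3622), (4, 10, 1.15e-04, 0.3792),
    (4, 12, 1.54e-08, 0.4014), (4, 15, 1.41e-12, 0.4172),
    (3, 12, 0.2547, 0.3488), (3, 15, 0.003098, 0.3783), (3, 17, 8.92e-05, 0.3958),
    (3, 20, 1.71e-06, 0.4129), (3, 22, 3.71e-07, 0.4196),
]


def best_row(rows):
    """Return the (N, s, BER, power) row with the largest figure of merit."""
    return max(rows, key=lambda row: figure_of_merit(row[2], row[3], row[0]))


for name, rows in (("130nm", TABLE_2), ("65nm", TABLE_3)):
    n, s, ber_value, power = best_row(rows)
    print(f"{name}: N = {n}, s = {s}, FOM = {figure_of_merit(ber_value, power, n):.4f}")
```

Because the candidates form a small discrete grid, taking the maximum over the tabulated rows plays the role of the stationarity conditions (23). The search selects *N* = 1, *s* = 15 for 130nm and *N* = 6, *s* = 6 for 65nm, and the computed values match the tabulated FOM entries up to rounding.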

## **5** Conclusion and Future Work

This paper studies circuit level issues of interconnect pipelining and finds that increasing the number of inserted flip-flops and enlarging the buffer size lower the BER at the cost of higher power consumption. A tradeoff must therefore be made between the reliability of a wire pipelining scheme and its power consumption. We have developed a methodology based on a user-defined figure of merit to find the optimal solution for an interconnect pipelining scheme from the viewpoints of both BER and power consumption. This solution gives the optimal number of inserted flip-flops and the optimal repeater size. Our ongoing work aims to take area cost into consideration and to find the best solution for a wire pipelining scheme considering more circuit level issues. Similar work can be done for latch based wire pipelining. Other circuit level issues, such as the variability and unpredictability of capacitive and inductive coupling, may also be incorporated in this work.

## References

- [1] L. Zhang, Y. Hu, "Statistical Timing Analysis in Sequential Circuit for On Chip Global Interconnect Pipelining," *Design Automation Conference*, pp.904-907, 2004.
- [2] R. Lu, G. Zhong, K. Cheng, K. Chao, "Flip-Flop and Repeater Insertion for Early Interconnect Planning," *Design Automation and Test in Europe Conference and Exhibition*, pp.690-695, March 2002.
- [3] Lou Scheffer, "Methodologies and Tools for Pipelined On-Chip Interconnect," *IEEE International Conference on Computer Design: VLSI in Computers and Processors*, pp.152-157, September 2002.

- [4] V. Nookala, S. S. Sapatnekar, "Designing optimized pipelined global interconnects: Algorithms and methodology impact," *IEEE International Symposium on Circuits and Systems*, Vol.1, pp.608-611, May 2005.
- [5] J. Cong, Y. Fan, Z. Zhang, "Architecture-Level Synthesis for Automatic Interconnect Pipelining," *Design Automation Conference*, pp.602-607, 2004.
- [6] V. Seth, M. Zhao, J. Hu, "Exploiting Level Sensitive Latches in Wire Pipelining," *International Conference on Computer Aided Design*, pp.283-290, November 2004.
- [7] International Technology Roadmap for Semiconductors, Semiconductor Research Corporation, 2004.
- [8] H. Zhou, C. Lin, "Retiming for Wire Pipelining in System-On-Chip," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol.23, Issue 9, pp.1338-1345, September 2004.
- [9] V. Nookala and S. S. Sapatnekar, "A method for correcting the functionality of a wire-pipelined circuit," *Design Automation Conference*, pp.570-575, 2004.
- [10] M. R. Casu, L. Macchiarulo, "A new approach to latency insensitive design," *Design Automation Conference*, pp. 576-581, June 2004
- [11] A. Jagannathan, H. H. Yang, K. Konigsfeld, D. Milliron, M. Mohan, M. Romesis, G. Reinman, J. Cong, "Microarchitecture Evaluation With Floorplanning And Interconnect Pipelining," *Design Automation Conference*, Vol.1, pp.I/8 - 115, January 2005
- [12] W. Liao, L. He, "Full-chip Interconnect Power Estimation and Simulation Considering Concurrent Repeater and Flip-flop Insertion," *International Conference on Computer Aided Design*, pp.574-580 November 2003
- [13] W. Liao and L. He, "Full-chip Interconnect Power Estimation and Simulation Considering Concurrent Repeater and Flip-flop Insertion", *ICCAD03*, pp.574-580, Nov. 2003.
- [14] Antonio G. M. Strollo, Ettore Napoli, and Carlo Cimino, "Analysis of Power Dissipation in Double Edge Triggered Flip-flops", *IEEE Transactions on Very Large Scale Integration Systems*, Vol.8, No.5, Oct. 2000.
- [15] Vladimir Stojanovic and Vojin G. Oklobdzija, "Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems", *IEEE Journal of Solid-State Circuits*, Vol.34, No.4, Apr. 1999.
- [16] Victor Adler, Eby G. Friedman, "Repeater Design to Reduce Delay and Power in Resistive Interconnect", *IEEE Transactions on Circuits and Systems*, Vol.45, No.5, May 1998.
- [17] K. Banerjee and A. Mehrotra, "A Power-Optimal Repeater Insertion Methodology for Global Interconnects in Nanometer Designs", *IEEE Transactions on Electron Devices*, Vol.49, No.11, Nov. 2002.
- [18] X. Li, J. Mao, H. Huang and Y. Yu, "Global Interconnect Width and Spacing Optimization for Latency, Bandwidth and Power Dissipation", *IEEE Transactions on Electron Devices*, Vol.52, No.10, Oct. 2005.
- [19] M. L. Mui, K. Banerjee, A. Mehrotra, "A Global Interconnect Optimization Scheme for Nanometer Scale VLSI With Implications for Latency, Bandwidth and Power Dissipation", *IEEE Transactions on Electron Devices*, Vol.51, No.2, Feb. 2004.
- [20] H. B. Bakoglu, "Circuits, Interconnections and Packaging for VLSI", Reading, MA: Addison-Wesley, 1990.