# A Bus Delay Reduction Technique Considering Crosstalk

Kei Hirose and Hiroto Yasuura

Department of Computer Science and Communication Engineering, Kyushu University, Kasuga Koen 6-1, Kasuga, Fukuoka 816-8580, JAPAN. E-mail: {khirose, yasuura}@c.csce.kyushu-u.ac.jp

# Abstract

As CMOS technology scales down, the horizontal coupling capacitance between adjacent wires comes to dominate the wire load, and crosstalk interference becomes a serious problem in VLSI design. We focus on the delay increase caused by crosstalk. On-chip bus delay is maximized by crosstalk when adjacent wires switch simultaneously in opposite directions. This paper proposes a bus delay reduction technique that intentionally skews the signal transition timing of adjacent wires. An approximate equation for the bus delay shows that the technique is effective for repeater-inserted buses. SPICE simulation results show that the total bus delay can be reduced by 5% to 20%.

# 1. Introduction

As CMOS technology scales down into the deep sub-micron region, the horizontal coupling capacitance between adjacent wires comes to dominate the wire load[1]. The increase in inter-wire coupling capacitance makes crosstalk interference a serious problem for VLSI circuits[3, 7]. Crosstalk causes logical malfunctions and delay faults. For an on-chip bus in particular, crosstalk noise is a serious problem for modern and future VLSI design[8, 4]. Since the lines of a bus run in parallel over a long distance, the inter-wire coupling capacitance between adjacent bus wires is larger than that of other interconnects. Besides this increase in physical capacitance, simultaneous switching of adjacent wires in opposite directions doubles the effective inter-wire coupling capacitance.

Repeater insertion techniques are widely used to reduce the capacitance and resistance of long interconnections and thereby decrease wire delay[2, 5], and they are also effective in reducing crosstalk. However, since simultaneous transitions in opposite directions still occur, the worst-case bus delay is increased by crosstalk.

Figure 1: Coupling capacitance increases in scaled-down VLSI due to the progress of horizontal shrink.

The basic idea of our delay reduction technique is to prevent simultaneous opposite transitions by skewing the signal transition timing of adjacent wires. Even though the transitions of some wires are delayed, the total bus delay can be reduced if the crosstalk effect on bus delay decreases.

In the next section, we describe how crosstalk increases bus delay. Section 3 describes our delay reduction technique. Section 4 presents experimental results and an analysis of the underlying principle, and Section 5 concludes the paper.

# 2. On-chip bus crosstalk

The delay of an interconnect is proportional to the product of the resistance and capacitance of the wire. The capacitance of an interconnect consists of $C_0$, the capacitance between the wire and the substrate, and $C_m$, the capacitance between adjacent wires (see figure 1). In conventional LSI, the vertical capacitance is usually larger than the horizontal capacitance because of the large spacing between adjacent wires and the low aspect ratio of the metal wire cross-section. In present and future VLSI, however, the inter-wire coupling capacitance $C_m$ becomes dominant: horizontal dimensions shrink while the vertical dimension is left largely unscaled to avoid increasing resistance, so the aspect ratio increases.

The larger horizontal capacitance makes crosstalk interference between adjacent wires a serious problem in VLSI design. Crosstalk is noise caused by the inter-wire coupling capacitance between adjacent wires, and it causes logic malfunctions and delay faults. Since the degree of crosstalk interference is determined by various factors, such as the physical inter-wire coupling capacitance, the length over which wires run in parallel, the switching speed of signals, and the transition timing of adjacent signals, it is very difficult for VLSI designers to estimate the influence of crosstalk with sufficient accuracy.

Figure 2: The delay with three types of simultaneous transition on a three-line directional bus. (a) Same direction. (b) Solitary transition. (c) Opposite direction.

On-chip buses are widely used in VLSI. In a bus structure, crosstalk immunity is especially important because long interconnect wires often run together in parallel. Repeater insertion is widely used to decrease bus delay. Since repeaters divide a long interconnection into several segments, the capacitance and resistance of each wire segment decrease, and the total bus delay is reduced.

Here, we consider how crosstalk affects bus delay. Figure 2 shows a three-line directional bus and the input and output voltage waveforms for three types of simultaneous transition. The effective capacitance of the center wire can be expressed as follows.

$$C_{eff} = C_0 + C_m \left| \frac{\Delta V_2 - \Delta V_1}{E} \right| + C_m \left| \frac{\Delta V_2 - \Delta V_3}{E} \right|,\tag{1}$$

where $\Delta V_2$ is the voltage variation of the center wire, and $\Delta V_1$ and $\Delta V_3$ are the voltage variations of its neighbors. $E$ is the power supply voltage (equal to the rail-to-rail signal voltage in CMOS circuits). This equation shows that $C_{eff}$ varies according to the simultaneous switching directions of the neighbors.

When the center wire switches alone and both neighbors are quiet (figure 2 (b)), $C_{eff}$ is $C_0 + 2C_m$, which equals the physical capacitance of the center wire. When the three wires switch simultaneously in the same direction (figure 2 (a)), $C_{eff}$ reduces to $C_0$, so the total delay is smaller than in the case of solitary transition. When the center wire and both neighboring wires switch simultaneously in opposite directions (figure 2 (c)), $C_{eff}$ reaches its maximum value $C_0 + 4C_m$, and the total bus delay is maximized. This worst-case delay determines the bus clock cycle time and hence the bus performance.
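As a quick numerical check of eq. (1), the following minimal sketch evaluates $C_{eff}$ for the three cases of figure 2. The function name `effective_capacitance`, the sign convention (positive for a rising transition), and the example values are assumptions made for illustration only.

```python
def effective_capacitance(c0, cm, dv1, dv2, dv3, e):
    """Effective capacitance of the center wire according to eq. (1).

    dv1, dv3: signed voltage variations of the two neighbors
    dv2:      signed voltage variation of the center wire
    e:        power supply voltage
    """
    return c0 + cm * abs((dv2 - dv1) / e) + cm * abs((dv2 - dv3) / e)


# Illustrative values (assumed): capacitances in fF, voltages in V.
C0, CM, E = 500.0, 1000.0, 3.3

same = effective_capacitance(C0, CM, +E, +E, +E, E)        # figure 2 (a)
solitary = effective_capacitance(C0, CM, 0.0, +E, 0.0, E)  # figure 2 (b)
opposite = effective_capacitance(C0, CM, -E, +E, -E, E)    # figure 2 (c)

print(same, solitary, opposite)
```

The three results correspond to $C_0$, $C_0 + 2C_m$, and $C_0 + 4C_m$ respectively.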

# 3. Total bus delay reduction technique

#### 3.1. Basic idea and assumptions

Simultaneous transitions occur often on a bus because the bus drivers are usually synchronized to a bus clock. Simultaneous opposite transitions on adjacent wires increase the bus delay. The basic idea of our delay reduction technique is to skew the transition timing intentionally so that adjacent wires do not make simultaneous opposite transitions. We call this intentional skew a shift. Normal-timing wires and shifted-timing wires are placed alternately so that no two adjacent wires switch at the same time. The shifting time can be generated by a delay line made of an inverter chain or by a two-phase clock. This technique can be combined with various repeater insertion techniques.

We make the following assumptions in evaluating the effectiveness of our technique.

- Figure 3 shows the simplified bus model used throughout the evaluation. Three directional wires with repeaters run in parallel, and each wire has the same neighbors from the driver to the receiver. Each wire segment is treated as a distributed RC line.
- $C_0$ is the total capacitance between a wire, from its driver to its receiver, and the substrate. $C_m$ is the total capacitance between two adjacent wires. $R$ is the total resistance of a wire from its driver to its receiver. Repeaters are assumed to divide each wire into segments of equal length.
- $\Delta T$ and $T_d$ are defined as shown in figure 4. $\Delta T$ is the shifting time, i.e. the interval between the two kinds of input transition timing. $T_d$ is the total delay, measured from the earliest 50% transition of the driver input voltages to the latest 50% transition of the receiver output voltages. $T_d$ corresponds to the worst-case delay of the bus.
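The definition of $T_d$ above can be sketched on sampled waveforms as follows. This is only an illustration under stated assumptions: the helper names `crossing_time` and `total_delay`, the use of linear interpolation between samples, and the piecewise-linear example waveforms are all hypothetical and not taken from the paper.

```python
def crossing_time(times, volts, level):
    """First time at which a sampled waveform crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 != v1 and min(v0, v1) <= level <= max(v0, v1):
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def total_delay(inputs, outputs, vdd):
    """T_d: earliest 50% input crossing to latest 50% output crossing."""
    half = 0.5 * vdd
    t_first_in = min(crossing_time(t, v, half) for t, v in inputs)
    t_last_out = max(crossing_time(t, v, half) for t, v in outputs)
    return t_last_out - t_first_in


VDD = 3.3
# Hypothetical waveforms as (times [ns], volts [V]) pairs.
inputs = [
    ([0.0, 1.0, 2.0], [0.0, VDD, VDD]),            # normal-timing wire, rising
    ([0.0, 1.0, 2.0, 3.0], [VDD, VDD, 0.0, 0.0]),  # shifted wire, falling 1 ns later
]
outputs = [
    ([0.0, 4.0, 5.0], [0.0, 0.0, VDD]),
    ([0.0, 5.0, 6.0], [VDD, VDD, 0.0]),
]
t_d = total_delay(inputs, outputs, VDD)
print(f"T_d = {t_d:.2f} ns")
```

In this example the two inputs cross 50% at 0.5 ns and 1.5 ns, so the shifting time is 1 ns, and $T_d$ is measured from the earlier input to the later output.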

#### 3.2. Rough estimation of our technique

Figure 3: A model of the on-chip bus, consisting of drivers, repeaters, receivers, and distributed RC lines.

Figure 4: The definition of $\Delta T$ and $T_d$.

To estimate how effectively the timing shift technique reduces the bus delay maximized by crosstalk, we introduce an approximate equation for the worst-case bus delay. Consider the case where the center wire switches in the direction opposite to both neighbors. Let the input voltage variation of both neighbors have reached $v = |\Delta V_1|/E = |\Delta V_3|/E$ when the center wire begins to switch, a time $\Delta T$ after both neighbors began to switch. The effective capacitance of the center wire then follows from eq. (1) as

$$C_{eff} = C_0 + (4 - 2v)C_m.\tag{2}$$
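One way to read eq. (2) from eq. (1) is sketched below. It rests on an assumption that is not stated explicitly in the text: only the part of each neighbor's swing that is still to come when the center wire starts switching, namely $-(1 - v)E$, overlaps with the center wire's own swing $\Delta V_2 = E$. Under that reading, each coupling term of eq. (1) becomes

$$\left| \frac{E - \bigl(-(1 - v)E\bigr)}{E} \right| = 2 - v, \qquad C_{eff} = C_0 + 2(2 - v)C_m = C_0 + (4 - 2v)C_m.$$

With $v = 0$ (the neighbors have not moved yet) this gives the worst case $C_0 + 4C_m$, and with $v = 1$ (the neighbors have fully settled) it gives the solitary-transition value $C_0 + 2C_m$.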

From the equation for the 50% delay of a bus with $k - 1$ repeaters[2], whose on-resistance and input capacitance are denoted $R_t$ and $C_t$, the total bus delay with a timing shift on the center wire can be expressed as follows.

$$T_d = \Delta T + \left(0.7R_t + 0.4\frac{R}{k}\right) \left[C_0 + (4-2v)C_m\right] + 0.7(kR_t + R)C_t.\tag{3}$$

From [7], v can be approximated as

$$v = \left| \frac{V_{1,3}(\Delta T) - V_{1,3}(0)}{E} \right| = 1 + K_1 \exp\left(-\frac{k^2 \sigma_1}{RC} \Delta T\right),\tag{4}$$

where

$$C = C_0 + 2C_m,$$

$$K_{1} = -1.01 \frac{kR_{t}C + kC_{t}R + RC}{kR_{t}C + kC_{t}R + \frac{\pi}{4}RC},$$

$$\sigma_{1} = \frac{1.04RC}{k^{2}R_{t}C_{t} + kR_{t}C + kC_{t}R + (\frac{2}{\pi})^{2}RC}.$$

Figure 5: The relation between shifting time $\Delta T$ and total bus delay $T_d$ calculated from eq. (5). ($C_0 = 500$ fF, $C_m = 1000$ fF, $R = 500\,\Omega$, $R_t = 2\,k\Omega$, $C_t = 100$ fF)

Substituting eq. (4) into eq. (3), the total bus delay with an input timing shift can be estimated by the following equation.

$$\begin{aligned} T_{d} = {} & \Delta T + \left(0.7R_{t} + 0.4\frac{R}{k}\right)\left(C_{0} + 4C_{m}\right) + 0.7(kR_{t} + R)C_{t} \\ & - \left(1.4R_{t} + 0.8\frac{R}{k}\right)C_{m}\left[1 + K_{1}\exp\left(-\frac{k^{2}\sigma_{1}}{RC}\Delta T\right)\right]. \end{aligned}\tag{5}$$

Figure 5 shows the bus delay estimated with our timing shift technique. The horizontal axis represents the shifting time $\Delta T$ and the vertical axis the total bus delay $T_d$. For a repeater-inserted bus ($k = 2, 4, 6$), the total bus delay can be reduced by shifting the input timing of the center wire. When no repeater is inserted ($k = 1$), however, our technique gives no delay reduction. By solving eq. (5) for $\Delta T$, the $\Delta T$ that minimizes $T_d$ can also be estimated.
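The minimizing $\Delta T$ can be sketched in closed form. The shorthand $A = \left(1.4R_t + 0.8\frac{R}{k}\right)C_m$ and $a = \frac{k^2\sigma_1}{RC}$ is introduced here for convenience and does not appear in the original derivation. Setting the derivative of eq. (5) with respect to $\Delta T$ to zero gives

$$\frac{dT_d}{d\Delta T} = 1 + A K_1 a \exp(-a\Delta T) = 0 \quad\Longrightarrow\quad \Delta T_{opt} = \frac{1}{a}\ln\left(-A K_1 a\right).$$

Since $K_1 < 0$, the argument of the logarithm is positive, and the second derivative $-A K_1 a^2 \exp(-a\Delta T)$ is positive, so this is a minimum. A positive $\Delta T_{opt}$ exists only when $-A K_1 a > 1$; otherwise $T_d$ increases monotonically with $\Delta T$ and no shift is best. Evaluating these expressions with the parameters in the caption of figure 5 gives $-A K_1 a \approx 0.61$ for $k = 1$ (no benefit) and $-A K_1 a \approx 1.13$ for $k = 2$, i.e. $\Delta T_{opt} \approx 0.34$ ns.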

From eq. (5), the effectiveness of our technique depends on $C_m$, $R_t$, $R$, and $k$. In particular, $C_m$ is the dominant factor in the total delay reduction: the greater $C_m$ becomes, the more effective our technique is. Our technique is therefore expected to be more effective in deep sub-micron VLSI.
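The dependence on $C_m$ can be explored numerically with a small sketch of eq. (5). The function names, the grid search, and the choice of parameters are assumptions: $R_t$ and $C_t$ are taken from the caption of figure 5 and combined with the $C_m$ values of table 1, so the absolute numbers are not meant to reproduce the SPICE results.

```python
import math


def bus_delay(delta_t, k, c0, cm, r, rt, ct):
    """T_d from eq. (3) with v from eq. (4), i.e. eq. (5); SI units.

    k is the number of wire segments (k - 1 repeaters).
    """
    c = c0 + 2.0 * cm
    common = k * rt * c + k * ct * r
    k1 = -1.01 * (common + r * c) / (common + math.pi / 4.0 * r * c)
    sigma1 = 1.04 * r * c / (k**2 * rt * ct + common + (2.0 / math.pi) ** 2 * r * c)
    v = 1.0 + k1 * math.exp(-(k**2) * sigma1 / (r * c) * delta_t)
    wire = 0.7 * rt + 0.4 * r / k
    return delta_t + wire * (c0 + (4.0 - 2.0 * v) * cm) + 0.7 * (k * rt + r) * ct


def best_shift(k, c0, cm, r, rt, ct, step=0.01e-9, stop=5e-9):
    """Grid search for the shift that minimizes eq. (5); returns (shift, delay)."""
    n = int(round(stop / step))
    candidates = [i * step for i in range(n + 1)]
    best = min(candidates, key=lambda dt: bus_delay(dt, k, c0, cm, r, rt, ct))
    return best, bus_delay(best, k, c0, cm, r, rt, ct)


# Assumed parameters: R_t, C_t from the caption of figure 5; buses from table 1.
RT, CT, R, C0 = 2e3, 100e-15, 500.0, 500e-15
savings = {}
for name, cm in [("bus I", 500e-15), ("bus II", 1000e-15), ("bus III", 1500e-15)]:
    base = bus_delay(0.0, 4, C0, cm, R, RT, CT)  # k = 4, i.e. 3 repeaters
    shift, best = best_shift(4, C0, cm, R, RT, CT)
    savings[name] = 100.0 * (base - best) / base
    print(f"{name}: shift = {shift * 1e9:.2f} ns, reduction = {savings[name]:.1f} %")
```

Under these assumed parameters the estimated reduction grows from bus I to bus III, in line with the statement that a larger $C_m$ makes the technique more effective.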

Table 1: Capacitances and resistance of the assumed buses

|         | $C_0$ [fF] | $C_m$ [fF] | $R$ [$\Omega$] |
|---------|------------|------------|----------------|
| bus I   | 500        | 500        | 500            |
| bus II  | 500        | 1000       | 500            |
| bus III | 500        | 1500       | 500            |



Figure 6: The relation between $\Delta T$ and $T_d$ for bus II with 3 repeaters.

# 4. Experimental evaluation

#### 4.1. SPICE simulation and results

$T_d$ was evaluated using the Star-HSPICE simulator. A 10-step $\pi$ RC ladder circuit is used to approximate the distributed RC bus lines[6]. Each repeater, driver, and receiver consists of two series-connected inverters with $W_n/L_n = 6$ and $W_p/L_p = 12$. The $0.5\,\mu m$ CMOS process and device parameters of the VDEC$^1$ chip fabrication service are used. The power supply voltage is 3.3 V. Three types of bus line are simulated, with the capacitances $C_0$ and $C_m$ and the resistance $R$ shown in table 1. Only $C_m$ is varied because $C_m$ is the dominant factor in delay reduction. These capacitance and resistance values are estimated from [4, 8] for a bus length of 10 mm. The number of repeaters, inserted at regular intervals on all lines, is varied from 0 to 7. For each number of repeaters, $\Delta T$ is increased from 0 ns to 5 ns in steps of 0.2 ns. $T_d$ is evaluated by measuring the time from the 50% input transition of the preceding drivers to the 50% transition of the receiver output of the center line, using the input pattern that produces the worst-case bus delay.
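To make the $\pi$ RC ladder approximation concrete, the following sketch generates SPICE-style resistor and capacitor lines for coupled wire segments. It is not the netlist used in the experiments: the function `pi_ladder`, the node and element naming, the application of the 10 steps per wire segment, and the equal treatment of ground and coupling capacitance are all assumptions made for illustration.

```python
def pi_ladder(n_wires, n_steps, r_seg, c0_seg, cm_seg, prefix="w"):
    """Return SPICE-style element lines for coupled wire segments.

    Each wire is a chain of n_steps series resistors. The ground capacitance
    c0_seg of a wire and the coupling capacitance cm_seg of each adjacent
    pair are spread over the n_steps + 1 nodes, with half weight on the
    two end nodes (cascaded pi sections).
    """
    lines = []
    weights = [0.5 if j in (0, n_steps) else 1.0 for j in range(n_steps + 1)]
    for w in range(n_wires):
        for j in range(n_steps):
            lines.append(
                f"R{prefix}{w}_{j} {prefix}{w}_{j} {prefix}{w}_{j + 1} {r_seg / n_steps:.6g}"
            )
        for j, wt in enumerate(weights):
            lines.append(f"Cg{prefix}{w}_{j} {prefix}{w}_{j} 0 {wt * c0_seg / n_steps:.6g}")
    for w in range(n_wires - 1):
        for j, wt in enumerate(weights):
            lines.append(
                f"Cm{prefix}{w}_{j} {prefix}{w}_{j} {prefix}{w + 1}_{j} {wt * cm_seg / n_steps:.6g}"
            )
    return lines


# Example: one segment of bus II with 3 repeaters (4 segments), values from table 1.
netlist = pi_ladder(n_wires=3, n_steps=10, r_seg=500 / 4,
                    c0_seg=500e-15 / 4, cm_seg=1000e-15 / 4)
print("\n".join(netlist[:3]))
```

The half-weight end capacitors keep the total resistance and capacitance of each segment equal to the values being modeled, which is the property the ladder needs in order to approximate a distributed line.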

Figure 6 shows the experimental result and the delay estimated by eq. (5) for bus II with 3 repeaters. In the experimental result, $T_d$ takes its minimum value of 8.40 ns at $\Delta T = 1.4$ ns. Since $T_d$ at $\Delta T = 0$ ns is 9.76 ns, $T_d$ is reduced by 13.88% with a 1.4 ns input timing shift. The results show that, for almost all buses, $T_d$ as a function of $\Delta T$ reaches a minimum at a certain $\Delta T$. The estimated delay appears to be a good approximation of the bus delay with our technique.

Figure 7: The waveforms of the input voltages of each repeater and receiver on two adjacent wires among three parallel lines with three repeaters per line. (a) $\Delta T = 0$ ns, (b) $\Delta T = 1$ ns.

Tables 2, 3, and 4 show the experimental results for all combinations of buses and numbers of repeaters. The minimum $T_d$, the $\Delta T$ that minimizes $T_d$, and the delay reduction ratio relative to the normal $T_d$ are listed. In almost all cases of a repeater-inserted bus, $T_d$ can be reduced by shifting the driver transition timing. As the number of repeaters increases, the minimum $T_d$ and its $\Delta T$ tend to decrease. The larger the wire capacitance (here, $C_m$), the greater the advantage of our method.
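The reduction ratio can be recomputed from the tabulated delays, assuming it is defined as $(T_d(\Delta T = 0) - T_{d,min}) / T_d(\Delta T = 0)$. The sketch below does this for the bus II values of table 3. The recomputed figures differ from the reported ones by less than 0.1 percentage points, presumably because the reported ratios were derived from unrounded delays.

```python
# Values copied from table 3 (bus II); list index = number of repeaters.
td_normal = [9.95, 10.08, 9.90, 9.76, 9.70, 9.72, 9.77, 9.82]      # T_d at dT = 0 [ns]
td_min = [9.95, 9.78, 8.84, 8.40, 8.21, 8.13, 8.14, 8.19]          # minimum T_d [ns]
reported = [0.00, 3.05, 10.74, 13.88, 15.39, 16.34, 16.67, 16.60]  # reduction ratio [%]


def reduction_ratio(normal, minimum):
    """Delay reduction relative to the unshifted delay, in percent (assumed definition)."""
    return 100.0 * (normal - minimum) / normal


recomputed = [reduction_ratio(n, m) for n, m in zip(td_normal, td_min)]
for repeaters, (calc, rep) in enumerate(zip(recomputed, reported)):
    print(f"{repeaters} repeaters: recomputed {calc:5.2f} %, reported {rep:5.2f} %")
```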

The power consumption becomes slightly smaller than in the normal case. Our technique also seems effective in decreasing dI/dt noise.

#### 4.2. Qualitative consideration

We now consider why the total delay can be reduced by shifting the input transition timing.

Figure 7 shows the propagation of signals from the inputs to the ends of the final wire segments. Each waveform represents the voltage at the input of a repeater or receiver on two adjacent wires. The simulated bus is bus II with three repeaters. The labels e1, e2, E1, and E2 correspond to the points shown in figure 3.

Figure 7 (a) shows the propagation waveforms for simultaneous opposite transitions ($\Delta T = 0$ ns), and figure 7 (b) shows the propagation waveforms for opposite transitions with a timing shift ($\Delta T = 1$ ns). In figure 7 (a), since simultaneous switching increases the effective capacitance, the maximum crosstalk affects every wire segment, which increases the total delay. In figure 7 (b), on the other hand, the voltage at e1, the end of the first wire segment, exceeds the threshold of the repeater before the delayed wire switches, so the preceding signal propagates to the next wire segment before the neighbor's transition reaches that segment. On the delayed wire, the voltage variation of the neighbors is smaller than in the case of simultaneous opposite switching, so the crosstalk effect on the delayed wire is smaller, which reduces the delay.

$^1$ VLSI Design and Education Center, The University of Tokyo.

Table 2: $T_d$ at $\Delta T = 0$ ns, the minimum $T_d$, the $\Delta T$ that minimizes $T_d$, and the reduction ratio for bus I.

| Number of repeaters               | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    |
|-----------------------------------|------|------|------|------|------|------|------|------|
| $T_d$ at $\Delta T = 0$ ns [ns]   | 5.65 | 5.90 | 6.02 | 6.09 | 6.17 | 6.26 | 6.35 | 6.45 |
| $\Delta T$ at minimum $T_d$ [ns]  | 0.0  | 0.0  | 1.0  | 0.8  | 0.8  | 0.6  | 0.6  | 0.6  |
| Minimum $T_d$ [ns]                | 5.65 | 5.90 | 5.83 | 5.68 | 5.66 | 5.70 | 5.79 | 5.90 |
| Reduction ratio [%]               | 0.00 | 0.00 | 3.19 | 6.76 | 8.22 | 8.87 | 8.88 | 8.42 |

Table 3: $T_d$ at $\Delta T = 0$ ns, the minimum $T_d$, the $\Delta T$ that minimizes $T_d$, and the reduction ratio for bus II.

| Number of repeaters               | 0    | 1     | 2     | 3     | 4     | 5     | 6     | 7     |
|-----------------------------------|------|-------|-------|-------|-------|-------|-------|-------|
| $T_d$ at $\Delta T = 0$ ns [ns]   | 9.95 | 10.08 | 9.90  | 9.76  | 9.70  | 9.72  | 9.77  | 9.82  |
| $\Delta T$ at minimum $T_d$ [ns]  | 0.0  | 2.2   | 1.6   | 1.4   | 1.2   | 1.2   | 1.0   | 1.0   |
| Minimum $T_d$ [ns]                | 9.95 | 9.78  | 8.84  | 8.40  | 8.21  | 8.13  | 8.14  | 8.19  |
| Reduction ratio [%]               | 0.00 | 3.05  | 10.74 | 13.88 | 15.39 | 16.34 | 16.67 | 16.60 |

Table 4: $T_d$ at $\Delta T = 0$ ns, the minimum $T_d$, the $\Delta T$ that minimizes $T_d$, and the reduction ratio for bus III.

| Number of repeaters               | 0     | 1     | 2     | 3     | 4     | 5     | 6     | 7     |
|-----------------------------------|-------|-------|-------|-------|-------|-------|-------|-------|
| $T_d$ at $\Delta T = 0$ ns [ns]   | 14.21 | 14.16 | 13.66 | 13.33 | 13.16 | 13.11 | 13.13 | 13.20 |
| $\Delta T$ at minimum $T_d$ [ns]  | 0.0   | 3.0   | 2.6   | 2.2   | 2.0   | 1.8   | 1.6   | 1.6   |
| Minimum $T_d$ [ns]                | 14.21 | 13.22 | 11.77 | 11.05 | 10.68 | 10.48 | 10.40 | 10.38 |
| Reduction ratio [%]               | 0.00  | 6.60  | 13.86 | 17.09 | 18.85 | 20.10 | 20.82 | 21.30 |

In summary, the bus delay is reduced by the transition timing shift because a small shifting time between adjacent wires lets the preceding signals propagate through the repeaters to the next wire segment without being affected by their neighbors, while the delayed signals arrive at the end of the bus under a smaller influence from their neighbors.

# 5. Conclusion

In this paper, we proposed a technique for reducing the maximum bus delay caused by crosstalk. An approximate equation for the bus delay shows that our technique is effective for repeater-inserted buses. SPICE simulation results show that the total bus delay can be reduced by 5% to 20%. An analysis of the overhead of generating the shifting time is left as future work.

# References

- [1] Semiconductor Industry Association. *National Technology Roadmap for Semiconductors*. SEMATECH, 1997.
- [2] H.B. Bakoglu. *Circuits, Interconnections, and Packaging for VLSI*. Addison-Wesley Publishing Company, Massachusetts, 1990.
- [3] L. Gal. On-chip cross talk - the new signal integrity challenge. *Proc. of CICC '95*, pages 251–254, 1995.
- [4] A.B. Kahng, S. Muddu, E. Sarto, and R. Sharma. Interconnect tuning strategies for high-performance ICs. *Proc. of DATE98*, pages 471–478, 1998.
- [5] D. Li, A. Pua, P. Srivastava, and U. Ko. A repeater optimization methodology for deep sub-micron, high-performance processors. *Proc. of ICCD'97*, pages 726–731, 1997.
- [6] T. Sakurai. Approximation of wiring delay in MOSFET LSI. *IEEE J. of Solid-State Circuits*, SC-18(4):418–426, August 1983.
- [7] T. Sakurai. Closed-form expressions for interconnection delay, coupling, and crosstalk in VLSI's. *IEEE Trans. on Electron Devices*, 40(1):118–124, January 1993.
- [8] J.S. Yim and C.M. Kyung. Reducing cross-coupling among interconnect wires in deep-submicron datapath design. *Proc. of 36th DAC*, pages 485–490, June 1999.