# Thermally Robust Clocking Schemes for 3D Integrated Circuits

Mosin Mondal<sup>+</sup> Andrew J. Ricketts<sup>†</sup> Sami Kirolos<sup>+</sup> Tamer Ragheb<sup>+</sup> Greg Link<sup>‡</sup> N. Vijaykrishnan<sup>†</sup> Yehia Massoud<sup>+</sup>

<sup>+</sup>Rice University, Houston, TX. {mosin,kirolos,ragheb,massoud}@rice.edu
<sup>†</sup>Pennsylvania State University, University Park, PA. {ricketts,vijay}@cse.psu.edu
<sup>‡</sup>York College of Pennsylvania, York, PA. glink@ycp.edu

# ABSTRACT

3D integration of multiple active layers into a single chip is a viable technique that greatly reduces the length of global wires by providing vertical connections between layers. However, dissipating the heat generated in 3D chips poses a major challenge to the success of the technology and is the subject of active research. Since the generated heat degrades chip performance, thermally insensitive or adaptive circuit design techniques are required for better overall system performance. In this paper, we propose a thermally adaptive 3D clocking scheme that dynamically adjusts the driving strengths of the clock buffers to reduce the clock skew between terminals. We also investigate the relative merits and drawbacks of two alternative clock tree topologies. Simulation results demonstrate that our adaptive technique reduces the skew by 61.65% on average for the better-performing of the two topologies, leading to much improved clock synchronization and design performance in the 3D realm.

# 1. INTRODUCTION

3D integration of chips is a promising approach to integrating large systems on a single chip, since it drastically reduces the average global wire length and thereby alleviates the problems associated with conventional interconnects [1, 2]. In 3D integration technology, the overall two-dimensional system is divided into a number of smaller blocks that are fabricated on different layers, stacked on top of each other, and connected through vertical interlayer vias [3,4]. 3D integration also offers the possibility of integrating heterogeneous technologies in the same system for high-performance SoCs, as shown in Figure 1, by fabricating optical, RF, analog and digital circuits on different layers of the three-dimensional chip. This offers much improved noise performance in mixed-signal chips, due to reduced electromagnetic interference between the digital and analog parts implemented on different layers, as well as lower substrate noise. Additionally, 3D integration allows a greater number of gates to be realized, leading to larger system integration and the integration of traditionally off-chip components on a single 3D chip. For example, 3D integration makes it possible to fabricate bigger caches or to integrate the processor and memory on a single chip. Overall, the main benefits of 3D integration are improved global interconnect performance and the prospect of integrating heterogeneous active layers to enable new system architectures.

The realization of 3D integrated circuits faces complex challenges. Although integrating different active layers on a single 3D chip raises several issues, a major problem in 3D chips is inefficient heat dissipation, which leads to thermally induced performance degradation and can reduce the lifetime of the fabricated chips [5]. In conventional 2D chips, the generated heat is dissipated through an external heat sink. In 3D chips, all the layers contribute to heat generation, and this heat has to be dissipated through one or two heat sinks. Spreading heat away from layers that are far from the heat sink is more challenging because the dielectric materials used to isolate the layers have low thermal conductivity. Temperature differences within a single layer and across layers arise from structural differences and from the diversity of computational activity. The profiles also differ from layer to layer because some layers dissipate heat more easily than others. For example, large caches next to relatively small, frequently accessed arithmetic units can cause significant temperature differences within a single layer, while layers that are thermally more insulated than others lead to interlayer temperature variation.

Figure 1: Example of a 3D Integrated Circuit.

The temperature difference across a single layer can exceed $50^{\circ}$C [6], affecting the performance of the different layers of the 3D chip, since temperature variation impacts the performance of both the interconnects and the active devices. Increasing the temperature changes the resistance of interconnect wires according to the relation $R = R_0 [1 + \alpha (T - T_0)]$, where $R_0$ is the interconnect resistance at the nominal temperature $T_0$, and $\alpha$ is the temperature coefficient of resistance of the interconnect material. For common interconnect materials (copper and aluminum), $\alpha > 0$, so the resistance increases with temperature. MOSFET devices, on the other hand, respond to temperature differently because of two competing effects on the drain current: both the carrier mobility $(\mu)$ and the threshold voltage $(V_T)$ decrease as the temperature rises. The square-law expression for the saturation drain current of a MOSFET, $I_{DS} = \frac{1}{2}\mu C_{OX} \left(\frac{W}{L}\right) (V_{GS} - V_T)^2$, shows that a decrease in mobility reduces the current, whereas a decrease in threshold voltage increases it. The net current-temperature relation therefore depends on the operating point of the transistor: the threshold voltage variation dominates at relatively low bias voltages, whereas the mobility variation dominates at higher bias voltages. There exists a bias point, known as the *zero temperature coefficient* (ZTC) point, where the drain current of the transistor is insensitive to temperature variations. As discussed above, the combined effect of spatial and temporal temperature variation on interconnect and device performance is complex, and it becomes even more complicated in a 3D chip because temperature varies unevenly across layers. Advances in cooling and packaging technology alone are not sufficient to fully manage thermal effects in 3D chips; therefore, ensuring performance under thermal effects is key to the success of 3D integration technology.

<sup>*</sup>This work was supported by the grants NSF CAREER 0448558, NSF 0093085 and MARCO/GSRC.

Figure 2: H-tree with node labels.

Table 1: Temperature effect on rising-edge skew between buffers T3\_4 and T4\_2 (see Figure 2).

| $\Delta T$ (°C) | 0 | 20   | 40   | 60    | 80    |
|-----------------|---|------|------|-------|-------|
| Skew (ps)       | 0 | 26.5 | 57.7 | 102.0 | 160.0 |
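To make the two temperature mechanisms described above concrete, the following Python sketch evaluates the wire-resistance relation $R = R_0[1+\alpha(T-T_0)]$ and a square-law transistor whose mobility and threshold voltage both fall with temperature, then locates the bias at which the two effects cancel (the ZTC point). The copper coefficient, the mobility exponent, the threshold slope and $V_T$ are assumed, illustrative values, not 45nm predictive-model parameters, so the resulting ZTC voltage is not expected to match the 0.37V figure quoted for the 45nm node.

```python
import math

# --- Interconnect: R = R0 * [1 + alpha * (T - T0)] ---------------------------
ALPHA_CU = 0.0039  # 1/degC, assumed (typical bulk-copper value)


def wire_resistance(r0, t, t0=25.0, alpha=ALPHA_CU):
    """Resistance at temperature t (degC) given r0 at the nominal t0."""
    return r0 * (1.0 + alpha * (t - t0))


# --- Device: square-law current with temperature-dependent mu and V_T ----------
# Hypothetical parameters chosen only to illustrate the trend; they are not
# 45nm predictive-model values and do not reproduce the 0.37V ZTC point.
T_REF = 300.0  # K
VT_REF = 0.40  # V, threshold voltage at T_REF
K_VT = 1.0e-3  # V/K, threshold decrease per kelvin
MU_EXP = 1.5   # mobility scales as (T / T_REF) ** -MU_EXP


def threshold(t):
    return VT_REF - K_VT * (t - T_REF)


def mobility_ratio(t):
    return (t / T_REF) ** (-MU_EXP)


def drain_current(vgs, t, beta=1.0):
    """Saturation current 0.5 * beta * (mu(T)/mu(T_REF)) * (V_GS - V_T(T))^2."""
    vov = max(vgs - threshold(t), 0.0)
    return 0.5 * beta * mobility_ratio(t) * vov ** 2


def ztc_bias(t_lo, t_hi):
    """V_GS at which the square-law currents at t_lo and t_hi coincide.

    Solves sqrt(mu_hi) * (V - VT_hi) = sqrt(mu_lo) * (V - VT_lo) for V.
    """
    s = math.sqrt(mobility_ratio(t_hi) / mobility_ratio(t_lo))
    return (threshold(t_lo) - s * threshold(t_hi)) / (1.0 - s)


v_ztc = ztc_bias(300.0, 380.0)
for vgs in (v_ztc - 0.1, v_ztc, v_ztc + 0.1):
    i_cold, i_hot = drain_current(vgs, 300.0), drain_current(vgs, 380.0)
    if math.isclose(i_hot, i_cold, rel_tol=1e-9):
        trend = "is flat"
    else:
        trend = "rises" if i_hot > i_cold else "falls"
    print(f"V_GS = {vgs:.3f} V: current {trend} with temperature")
```

Below the computed bias the threshold-voltage effect wins and the current rises with temperature; above it the mobility effect wins and the current falls, which is the behavior the text describes.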

The focus of this paper is the integrity of the clock signal in 3D integrated circuits. For the synchronous part of a 3D chip, which may be distributed across layers, a skew-free clock signal is of utmost importance for the correct operation and speed of a design. Since the clock network spans most of the chip and is therefore exposed to the full range of temperatures that occurs across it, the effect of temperature is very pronounced in clock trees. Figure 2 shows an H-tree for a single layer of a 3D chip; several physically close terminals receive clock signals that traverse entirely different temperature zones, which can lead to significant skew between these terminals. As shown in Table 1 for the H-tree mapped to the 45nm technology node, the clock skew grows with the temperature difference between different parts of the chip, reaching 160ps at an 80°C difference, which calls for intelligent solutions to mitigate the effect of temperature on clock skew. In this paper, we present a novel solution for reducing the variation of clock skew with temperature in 3D integrated circuits.

For conventional 2D circuits, existing techniques can be applied to reduce the variation of clock skew with temperature. However, those methods are either inefficient or not general enough, even for 2D chips. The technique proposed by Shakeri and Meindl [7] uses a temperature variable supply voltage (TVS) scheme in which the supply voltage is adjusted around the ZTC point to nullify the thermally induced delay variation in the interconnects. The ZTC point occurs at a voltage of around 0.37V for the 45nm node, so operating near it would greatly reduce the speed of the circuit. Moreover, in the context of the clock tree, this method requires fine-tuning of the supply voltage for multiple sections of the clock tree and level converters between these sections, which increases circuit complexity. Recent temperature-invariant design solutions [8,9] assume fixed, known temperature profiles, which can be highly inaccurate, particularly for a processor that runs many different applications with varied characteristics and therefore produces application-specific temperature profiles [10]. Serious problems may arise when a change of application makes the cold locations of the assumed profile hot and the hot locations cold.

With the aim of reducing the skew between the clock tree terminals in a 3D chip, we propose in this paper a new circuit technique that dynamically adjusts the driving strengths of the clock buffers according to the spatial and temporal temperature profile of the chip. To the best of our knowledge, this is the first attempt to dynamically reduce clock skew in 3D chips. As opposed to the previously proposed 2D methods, our generalized method uses only modified clock buffers, which reduces design complexity, and it does not assume any fixed temperature profile. For clock distribution in 3D chips, we investigate and compare two different clock tree topologies, as discussed in Section 2. We demonstrate that our technique can reduce the thermally induced skew by 61.65% on average.

The rest of the paper is organized as follows. Section 2 presents the two alternative clock tree topologies. In Section 3, we present details of the temperature-dependent RLC model for the shielded clock tree used in this work, together with the two candidate clocking schemes for 3D chips. Section 4 presents our method for dynamically adjusting the driving strength of the clock buffers. Results and a comparative study of the alternative schemes are presented in Section 5. Finally, conclusions are drawn in Section 6.

# 2. PROPOSED CLOCK TREE TOPOLOGIES

We consider two alternative clocking schemes for 3D chips in this paper. In the first scheme, shown in Figure 3, the input clock signal at the center of the clock tree is fed to each layer through interlayer vias, and each layer has its own clock tree with associated clock buffers implemented in the corresponding active layer. We refer to this scheme as the "replica topology". The second scheme, depicted in Figure 4, implements the clock tree and its clock buffers on a single layer, and interlayer vias carry the clock signals from the terminals of that tree to all other layers. We call this scheme the "via topology". In the replica topology, the clock tree of each layer can be customized according to the temperature profile of that layer. For example, the average temperature of a layer far from the heat sink is likely to be higher than that of the layer attached to the heat sink, which the first scheme can accommodate. However, the obvious disadvantage of this scheme is the design overhead, both in the resources and in the design effort required for layer-wise customization. Moreover, because the layers are customized separately, the skew between terminals in different layers may be high even if the skew within each layer is low. The via topology, on the other hand, provides uniform skew compensation across layers, since the same terminal clock signals are transmitted to every layer, and it has lower design overhead. In Section 5, we compare these two schemes; in particular, we compare the clock skew improvement achieved by each for a number of temperature profiles, as well as the corresponding power requirements. The replica topology requires approximately N times the area and wiring resources of the via topology, where N is the number of layers in the 3D clock tree.



Figure 3: 3D H-tree repeated in each layer, fed by a common source.



Figure 4: 3D H-tree with via.

# 3. CLOCK TREE MODELING FOR INCLUDING THERMAL EFFECTS

For synchronous modules spread across multiple layers in a 3D chip, the clock signal should ideally arrive at all communicating clock sinks with zero skew for the chip to function properly. The clock distribution network therefore needs to be carefully designed so that the clock signal arrives at each terminal at (ideally) the same time. Placing the clock sinks at equal distances from the clock source is one way of distributing the clock with low skew, as in the H-trees commonly used in regular array-based designs [11]. To prevent attenuation of the clock signal as it traverses long distances, clock buffers are inserted at different locations of the clock tree; by sizing these buffers appropriately, the rise and fall times of the clock signal can also be adjusted. Because of variations in operating conditions as well as manufacturing variations, significant clock skew can be observed at the terminals of a clock tree. On-chip temperature variation is a major source of clock skew in a 3D chip, since uneven spatial variation of temperature leads to unequal changes in wire resistance and clock buffer characteristics along different paths. In this work, we consider two different schemes for clocking 3D synchronous modules and develop SPICE models for analyzing the clock skew under temperature variation. The interconnect wires are modeled by their RLC parasitic components and the devices are modeled using the BSIM4 model. The details of the dynamically adaptive buffers are discussed in Section 4. Without loss of generality, we use simulation models of 3D H-trees in this work, but any other balanced clock tree can be handled in the same manner.

Figure 5: Dimensions of the wires and vias in the clock trees of Figures 3 and 4.

In this work, we consider a shielded H-tree in a four-layer 3D chip designed in the 45nm technology node with a die length of 2cm. To illustrate the effects of temperature variation on the clock tree, we consider a two-level H-tree, as depicted in Figure 2, where the first-level H has sides of length 10mm and the second-level H's have sides of length 5mm. The dimensions of the wires are shown in Figure 5. Clock buffers whose sizes decrease gradually from the clock source toward the clock sinks have been inserted in the clock tree, as depicted in Figure 2.
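As a sanity check on the geometry just described, the following Python sketch builds the two-level H-tree recursively and confirms that every sink lies at the same wire distance from the root, which is what makes an H-tree nominally zero-skew before temperature enters the picture. It assumes that "sides of length 10mm" means each of the three bars of the first-level H is 10mm long (5mm for the second level), that the tree is centered on the die, and it ignores buffers and via heights. It also tallies wiring for the two topologies of Section 2 under a hypothetical counting rule: the replica topology repeats the planar tree on every layer, while the via topology uses one planar tree plus one via per sink per layer interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Axis-aligned wire segment; coordinates in mm."""
    x0: float
    y0: float
    x1: float
    y1: float

    def length(self) -> float:
        return abs(self.x1 - self.x0) + abs(self.y1 - self.y0)


def h_tree(cx, cy, side, levels, dist=0.0):
    """Build an H-tree centred at (cx, cy).

    Returns (segments, sinks), where each sink is (x, y, wire length from the
    root). Every H is assumed to have a horizontal bar and two vertical bars of
    length `side`; each next-level H sits on a bar end with half the side.
    """
    h = side / 2.0
    segs = [
        Segment(cx - h, cy, cx + h, cy),          # horizontal bar
        Segment(cx - h, cy - h, cx - h, cy + h),  # left vertical bar
        Segment(cx + h, cy - h, cx + h, cy + h),  # right vertical bar
    ]
    ends = [(cx - h, cy - h), (cx - h, cy + h), (cx + h, cy - h), (cx + h, cy + h)]
    reach = dist + 2.0 * h  # centre -> end of horizontal bar -> end of vertical bar
    if levels == 1:
        return segs, [(x, y, reach) for x, y in ends]
    sinks = []
    for ex, ey in ends:
        sub_segs, sub_sinks = h_tree(ex, ey, side / 2.0, levels - 1, reach)
        segs += sub_segs
        sinks += sub_sinks
    return segs, sinks


# Two-level tree from the text: 10mm first-level H, 5mm second-level H's.
segments, sinks = h_tree(0.0, 0.0, side=10.0, levels=2)
wire_per_layer = sum(seg.length() for seg in segments)  # mm of planar wire


def replica_wire_mm(n_layers):
    """Replica topology: one planar H-tree per layer (root vias ignored)."""
    return n_layers * wire_per_layer


def via_count(n_layers):
    """Via topology, hypothetical rule: one via per sink per layer interface."""
    return len(sinks) * (n_layers - 1)


print(len(sinks), "sinks, root-to-sink lengths (mm):", sorted({d for _, _, d in sinks}))
print("4 layers: replica", replica_wire_mm(4), "mm; via", wire_per_layer, "mm +", via_count(4), "vias")
# 16 sinks, root-to-sink lengths (mm): [15.0]
# 4 layers: replica 360.0 mm; via 90.0 mm + 48 vias
```

Under this reading, the 16 sinks land at the centers of a 4×4 array of 5mm tiles on the 2cm die, and every root-to-sink path is 15mm long.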

A distributed RLC representation of the interconnect wires is crucial for accurate analysis of the clock tree, since the clock lines, which extend over long distances in the global metal layers, experience significant inductive effects. We use a partial element equivalent circuit (PEEC) based interconnect model, which is readily usable by circuit simulators such as SPICE [12]. The interconnect parasitics are extracted with different parasitic extractors and then combined: the resistance and inductance values are extracted by FastHenry [13], whereas the capacitance is extracted using FastCap [14]. For the inductance extraction, the terminals of each segment are represented as a port, so the extracted inductance values are partial inductance terms, which depend solely on the geometry of the wires [15]. Because of the PEEC-based modeling, the circuit simulator automatically determines the effective values of the loop inductances [16]. Note that, from the interconnect perspective, temperature mainly affects the resistance of the wires, since the resistivity of the interconnect material increases with temperature. The effect of temperature on capacitance can safely be ignored, since the effect of temperature on the dielectric $(SiO_2)$ is negligible in the chip's operating temperature range. Temperature has no direct impact on the geometry-dependent values of partial inductance. However, the effective loop inductance can change because uneven, temperature-induced changes in resistance redistribute the return currents. In essence, if proper temperature-dependent values are assigned to the resistances, the circuit simulator automatically accounts for the effect of temperature on the effective inductance. Similarly, ambient temperature values need to be annotated to the clock buffers. To simulate the effect of temperature on clock skew, the four-layer clock tree is mapped onto a $32 \times 32 \times 4$ grid in which each grid cell has its own temperature, which is applied to the circuit elements located in that cell.

Figure 6: Thermally adaptive buffer schematic.
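The paper analyzes the tree with SPICE on PEEC-based RLC models. As a much cruder, swapped-in stand-in, the following Python sketch uses the Elmore delay of an RC ladder (no inductance, no buffers) to show how per-cell temperature annotation turns into skew between two geometrically identical paths. Each segment occupies one grid cell, its resistance follows $R = R_0[1+\alpha(T-T_0)]$ at that cell's temperature, and capacitance is held fixed, as argued above. The per-millimetre R and C, the load capacitance and the copper $\alpha$ are hypothetical placeholders, not the extracted parasitics of Figure 5, and the 20mm die edge is an assumption.

```python
ALPHA_CU = 0.0039   # 1/degC, assumed temperature coefficient of Cu
T_NOM = 25.0        # degC, nominal temperature for the per-mm values below
R_PER_MM = 20.0     # ohm/mm at T_NOM, hypothetical
C_PER_MM = 0.2e-12  # F/mm, hypothetical; held temperature independent
C_LOAD = 50e-15     # F, hypothetical sink load

DIE_MM = 20.0           # assumed 2 cm die edge
CELL_MM = DIE_MM / 32   # 32 x 32 grid per layer -> 0.625 mm cells


def elmore_delay(cell_temps, seg_mm=CELL_MM, c_load=C_LOAD):
    """Elmore delay (s) of an RC ladder with one segment per grid cell.

    Segment k has resistance R(T_k) followed by its capacitance; the delay is
    sum_k R_k * (total capacitance downstream of R_k, including the load).
    """
    rs = [R_PER_MM * seg_mm * (1.0 + ALPHA_CU * (t - T_NOM)) for t in cell_temps]
    cs = [C_PER_MM * seg_mm] * len(cell_temps)
    downstream = c_load + sum(cs)
    delay = 0.0
    for r, c in zip(rs, cs):
        delay += r * downstream
        downstream -= c  # capacitance of this segment is now upstream
    return delay


# Two equal 15 mm root-to-sink paths (24 cells each) crossing different zones.
n_cells = round(15.0 / CELL_MM)
hot_path = [100.0] * n_cells
cold_path = [40.0] * n_cells
skew_ps = (elmore_delay(hot_path) - elmore_delay(cold_path)) * 1e12
print(f"RC-only skew between the two paths: {skew_ps:.1f} ps")
```

Because only resistance is temperature dependent here, the skew scales directly with the resistance ratio of the two paths; the buffer and inductive effects that the full SPICE model captures are deliberately left out.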

# 4. DESIGN OF ADAPTIVE CLOCK BUFFERS

We employ an adaptive circuit scheme, originally proposed for 2D thermal issues in [17], to reduce the variation of clock skew with temperature gradients in the 3D design. As shown in the schematic in Figure 6, temperature sensors placed close to the buffers sense the ambient temperature and convert it to voltages, which are processed by wave-shaping circuitry and then used to dynamically change the driving strengths of the clock buffers to reduce the overall skew. In this section, we discuss the design and integration of small, low-cost temperature sensors and the adaptive buffer circuits.

#### 4.1 Temperature sensor design

A number of on-chip temperature sensors are required to implement the temperature-dependent adaptive-strength clock buffers. Very high accuracy $(\pm 0.1^{\circ}\text{C})$ CMOS temperature sensors have been reported in recent work [18], but the precision comes at the expense of increased area and power dissipation. For instance, the sensor proposed in [18] occupies a chip area of 4.5mm<sup>2</sup> in a $0.7\mu m$ CMOS technology. To be able to distribute thermal sensors at multiple locations across the chip, our design uses a moderate-accuracy temperature sensor with reduced area and power. The architecture of the temperature sensor used in our design was proposed in [19]. The schematic of the circuit is shown in Figure 7, where transistors M5 and M6 provide the bias for the current-mirror transistors M3 and M4. The difference between the voltages of the two diode-connected transistors (M1 and M2) is computed and amplified by the op-amp subtraction circuit. Although simple, the sensor has the advantage of good linearity over a wide temperature range. The circuit has been designed and simulated using the 45nm BSIM4 predictive technology models [20]. The output waveforms are shown in Figure 8; they demonstrate the linearity of the output over the temperature range of interest.
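The principle behind subtracting the voltages of two diode-connected transistors can be sketched as follows. This sketch assumes, as is common for such proportional-to-absolute-temperature (PTAT) sensors but not stated in the text, that M1 and M2 operate in weak inversion with a current-density ratio N, so that $\Delta V = n\,(kT/q)\ln N$ grows linearly with absolute temperature. The ratio, the slope factor $n$ and the subtractor gain are hypothetical, and the op-amp is treated as an ideal gain stage.

```python
import math

K_B = 1.380649e-23     # J/K, Boltzmann constant (exact SI value)
Q_E = 1.602176634e-19  # C, elementary charge (exact SI value)


def thermal_voltage(t_kelvin):
    """kT/q in volts."""
    return K_B * t_kelvin / Q_E


def ptat_output(t_celsius, ratio=8.0, slope_n=1.3, gain=10.0):
    """Ideal amplified PTAT voltage: gain * n * (kT/q) * ln(ratio).

    ratio (current-density ratio), slope_n (subthreshold slope factor) and
    gain (subtractor gain) are hypothetical values.
    """
    t_k = t_celsius + 273.15
    return gain * slope_n * thermal_voltage(t_k) * math.log(ratio)


def temperature_from_output(v_out, ratio=8.0, slope_n=1.3, gain=10.0):
    """Invert the ideal PTAT relation back to degrees Celsius."""
    t_k = v_out * Q_E / (K_B * gain * slope_n * math.log(ratio))
    return t_k - 273.15


# A constant output step per 20 degC indicates a linear response.
steps = [ptat_output(t + 20.0) - ptat_output(t) for t in (20.0, 40.0, 60.0, 80.0)]
print([round(s * 1e3, 3) for s in steps], "mV per 20 degC")
```

In this ideal model the response is exactly linear; a real sensor's deviation from linearity is what Figure 8 characterizes.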

#### 4.2 Adaptive buffers

Adaptive variation of the buffer strength is used to compensate for the performance variation caused by temperature changes. The adaptive buffer combines two techniques to compensate for the temperature effect: buffer current control and body bias control.



Figure 7: Temperature sensor schematic.



Figure 8: Temperature Sensor Output voltage levels as functions of temperature.

The first technique, the adaptive-current buffer, is illustrated by the inverter shown in Figure 6, which is formed of two parallel pull-up transistor branches (MP<sub>1</sub> and MP<sub>2</sub>) and two parallel pull-down branches (MN<sub>1</sub> and MN<sub>2</sub>). The effective transistor currents in the pull-up and pull-down sections are changed dynamically by the series-connected switches (SW<sub>N</sub> and SW<sub>P</sub>) to control the rise and fall times of the output. The switch bias voltage levels, determined by the wave-shaping circuitry, are shown in Figure 9. The wave-shaping circuits amplify the signal coming from the temperature sensor and adjust its level between zero and $V_{DD}$ to provide a suitable switch bias over the temperature range of interest.

In addition to the adaptive-current method, a dynamic body-biasing technique is applied to tune the threshold voltage of the devices in the buffer. The wave-shaping circuitry also provides the control voltages required for body biasing. For the NMOS transistor, whose bulk is normally tied to ground, we apply a positive bulk voltage ranging from 0V to 0.5V ($V_{DD}$ = 1V); this forward body bias lowers its threshold voltage. A complementary voltage, varying from $V_{DD}$ = 1V down to 0.5V, is applied to the PMOS transistor bulk. The waveforms of the body-bias voltages for the NMOS and PMOS transistors are shown in Figure 9. Together, the two techniques can efficiently adjust the driving strengths of the clock buffers.
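A minimal sketch of how a sensed temperature might be turned into the four control voltages described above, followed by the standard long-channel body-effect equation showing why a positive NMOS bulk voltage strengthens the buffer. The linear shape of the mapping, the 20°C to 100°C compensation range and the direction (hotter buffers driven harder, to offset the slowdown at bias points above ZTC) are assumptions; Figure 9 shows the actual waveforms. The body-effect parameters $\gamma$, $2\phi_F$ and $V_{T0}$ are hypothetical.

```python
import math

VDD = 1.0
T_MIN, T_MAX = 20.0, 100.0  # degC, assumed compensation range


def _fraction(t):
    """Position of temperature t within [T_MIN, T_MAX], clamped to [0, 1]."""
    return min(max((t - T_MIN) / (T_MAX - T_MIN), 0.0), 1.0)


def control_voltages(t):
    """Hypothetical linear wave-shaper outputs for a buffer at t degC."""
    f = _fraction(t)
    return {
        "sw_n_gate": f * VDD,          # NMOS switch driven harder when hot
        "sw_p_gate": (1.0 - f) * VDD,  # PMOS switch gate pulled lower when hot
        "v_body_n": 0.5 * f,           # NMOS bulk: 0 V -> 0.5 V (forward bias)
        "v_body_p": VDD - 0.5 * f,     # PMOS bulk: 1 V -> 0.5 V (forward bias)
    }


# Long-channel body effect with hypothetical parameters.
GAMMA = 0.2       # V^0.5
TWO_PHI_F = 0.85  # V
VT0_N = 0.35      # V, NMOS threshold at zero body bias


def nmos_threshold(v_bs):
    """V_T = V_T0 + gamma * (sqrt(2*phi_F - V_BS) - sqrt(2*phi_F)).

    A positive V_BS (forward body bias) lowers V_T, strengthening the buffer.
    Valid only while V_BS stays well below 2*phi_F.
    """
    return VT0_N + GAMMA * (math.sqrt(TWO_PHI_F - v_bs) - math.sqrt(TWO_PHI_F))


for temp in (20.0, 60.0, 100.0):
    cv = control_voltages(temp)
    # The NMOS source is at ground, so V_BS equals the applied bulk voltage.
    print(temp, cv, round(nmos_threshold(cv["v_body_n"]), 3))
```

Clamping keeps the control voltages inside the 0 to $V_{DD}$ and 0 to 0.5V windows quoted in the text even when the sensed temperature leaves the assumed range.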

#### 4.3 Wave-shaping Circuitry

The wave-shaping circuit, depicted in Figure 10, generates the temperature-dependent voltages that are used as the bias voltages of the switches $(SW_N \text{ and } SW_P)$ and as the body bias of the transistors in the clock buffers. The wave-shaping circuit is composed of common-source and common-drain amplifiers. The current-source loads of the common-source amplifiers are biased using diode-connected transistor pairs (M1-M2, M5-M6, M9-M10, M15-M16, M19-M20 and M23-M24). The common-source amplifiers provide good linearity; however, the headroom voltage of the current-source transistor limits the output swing, which stays within one overdrive voltage $(V_{GS} - V_T)$ of each supply rail. For the body-bias outputs, source-follower (common-drain) amplifiers are used to provide the output current required under forward body bias. The switch control voltages do not draw significant current and can therefore be driven directly by the common-source stages.

Figure 9: Control waveforms coming from the wave-shaping circuits.

# 5. RESULTS

The performance of the proposed thermally robust clock trees was evaluated by SPICE simulations using the BSIM4 predictive technology models for the 45nm CMOS technology node [20]. The simulations were performed for both topologies presented in Section 2. To demonstrate the efficacy of our method, we measure the skew between two physically close terminals, T1\_3 in layer 1 and T2\_1 in layer 4, of the three-dimensional clock tree under different temperature profiles. According to [21], for up to five layers the 3D temperature profile is dominated by the heating of the first layer, and the temperature rises almost linearly with the number of layers. In our experiments, we assume that the temperature increases by 5°C for each layer away from the heat sink.

Figure 10: Wave-shaping circuit.

Figure 11: Skew improvement for the split profile using thermally adaptive buffers.

We start with the split thermal profile shown in Figure 2, where the left and right halves of layer $i$ are fixed at temperatures $T_{L,i}$ and $T_{R,i}$, respectively. The skew variation with the temperature difference $(T_{L,i} - T_{R,i})$ is illustrated in Figure 11. Note that the skew in the via topology is lower than in the replica topology, since the vertical temperature gradient increases the skew in the replica topology. As Figure 11 shows, the adaptive buffers reduce the temperature dependence of the skew. For the replica topology, the maximum compensated skew was 55ps, as opposed to an original maximum skew of 188ps. For the via topology, the compensated skew stayed within $\pm$15ps over the $80^{\circ}$C temperature variation, while the original maximum skew was 155ps over the same range. Localizing the clock tree in one layer makes the compensation more effective in the via topology than in the replica topology.
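The split profile above, together with the linear and exponential profiles defined in Table 2, can be sampled onto the $32 \times 32 \times 4$ simulation grid as in the following Python sketch. It assumes a 20mm die edge playing the role of $L$ in Table 2, that layer 0 is the layer attached to the heat sink, that each farther layer is uniformly 5°C hotter, and that $x$ runs along grid columns and $y$ along rows; the function names are illustrative only.

```python
import math

N_GRID = 32        # cells per side of each layer
N_LAYERS = 4
DIE_MM = 20.0      # assumed 2 cm die edge, playing the role of L in Table 2
LAYER_STEP = 5.0   # degC added per layer away from the heat sink


def split_profile(t_left, t_right):
    """Left half of the die at t_left, right half at t_right."""
    return lambda x, y: t_left if x < DIE_MM / 2 else t_right


def linear_profile(t_high, t_low):
    """T(x, y) = a(x + y) + b with a = (T_H - T_L) / 2L and b = T_L."""
    a = (t_high - t_low) / (2 * DIE_MM)
    return lambda x, y: a * (x + y) + t_low


def exponential_profile(t_high, t_low):
    """T(x, y) = T_H * exp(-b(x + y)) with b = ln(T_H / T_L) / 2L."""
    b = math.log(t_high / t_low) / (2 * DIE_MM)
    return lambda x, y: t_high * math.exp(-b * (x + y))


def temperature_grid(profile):
    """Return grid[layer][row][col] in degC, sampled at cell centres.

    Layer 0 is assumed to touch the heat sink; each farther layer is
    LAYER_STEP degrees hotter everywhere.
    """
    pitch = DIE_MM / N_GRID
    return [
        [
            [profile((col + 0.5) * pitch, (row + 0.5) * pitch) + LAYER_STEP * layer
             for col in range(N_GRID)]
            for row in range(N_GRID)
        ]
        for layer in range(N_LAYERS)
    ]


grid = temperature_grid(split_profile(80.0, 20.0))
print(grid[0][0][0], grid[0][0][-1], grid[3][0][0])  # 80.0 20.0 95.0
```

Each grid value would then be annotated onto the wire resistances and buffers falling inside that cell, as described in Section 3.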

We performed additional simulations for two more generic thermal profiles, obtained by extending the 2D generic cases modeled in [22] to the 3D case. The first profile models a linear temperature fall-off across each layer of the 3D chip, whereas the second models an exponential fall-off across each layer. As for the split thermal profile, the temperature difference between adjacent layers is $5^{\circ}$C. The simulation results for the two proposed topologies are shown in Table 2. Note that the skew values reported in the table represent the maximum skew between any pair of terminals in the 3D clock tree that lie within $\sqrt{2}$ times the distance between the nearest terminals, since clock skew matters most for physically close terminals. From the table, the average unmodified skew is 2.17 times higher in the replica topology than in the via topology. On average, our skew reduction technique reduces the skew by 55.09% for the replica topology and by 61.65% for the via topology. Moreover, the average compensated skew is approximately 2.5 times lower in the via topology than in the replica topology. We also compared the power consumption of the two clock tree topologies. Intuitively, the replica topology should consume approximately N times more power and area than the via topology, where N is the number of layers in the 3D clock tree. However, since the single-layer clock tree of the via topology drives a load equal to the combined load driven by all the layers of the replica topology, the ratio of the power consumed by the replica topology to that of the via topology is less than N. The average power consumed by the replica topology for the temperature profiles in Table 2 was 0.394W, whereas the via topology consumed only 0.168W for our four-layer 3D clock tree. The replica topology therefore consumes 2.35 times more power, making the via topology the clear choice for designing 3D clock trees.

Table 2: Results of skew improvement after using the thermally adaptive buffers.

| Thermal profile | $T_H$ (°C) | $T_L$ (°C) | Replica: unmodified skew (ps) | Replica: compensated skew (ps) | Replica: improvement (%) | Via: unmodified skew (ps) | Via: compensated skew (ps) | Via: improvement (%) |
|-------------|-----|----|--------|-------|-------|-------|-------|-------|
| Linear      | 100 | 20 | 103.50 | 29.26 | 71.73 | 61.08 | 23.83 | 60.99 |
| Linear      | 100 | 40 | 92.32  | 37.10 | 59.81 | 47.58 | 11.06 | 76.75 |
| Linear      | 100 | 60 | 81.15  | 43.44 | 46.47 | 33.62 | 11.62 | 65.44 |
| Linear      | 100 | 80 | 68.99  | 42.52 | 38.37 | 18.22 | 9.18  | 49.62 |
| Exponential | 100 | 20 | 94.22  | 26.99 | 71.35 | 54.72 | 24.49 | 55.24 |
| Exponential | 100 | 40 | 88.09  | 29.88 | 66.08 | 45.18 | 15.77 | 65.10 |
| Exponential | 100 | 60 | 79.51  | 40.92 | 48.53 | 32.82 | 9.87  | 69.93 |
| Exponential | 100 | 80 | 68.63  | 42.28 | 38.39 | 18.07 | 9.01  | 50.14 |

Linear profile: $T(x,y) = a(x+y) + b$, with $a = \frac{T_H - T_L}{2L}$ and $b = T_L$. Exponential profile: $T(x,y) = a\,e^{-b(x+y)}$, with $a = T_H$ and $b = \frac{1}{2L}\ln\left(\frac{T_H}{T_L}\right)$.
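The averages and ratios quoted above can be re-derived from Table 2 with a few lines of Python. The 55.09% and 61.65% figures are means of the per-row percentage improvements, which is what this sketch computes; the percentage reduction of the mean skew differs somewhat. The power values are the two averages stated in the text.

```python
# (unmodified, compensated) skew in ps, copied row by row from Table 2.
REPLICA = [(103.50, 29.26), (92.32, 37.10), (81.15, 43.44), (68.99, 42.52),
           (94.22, 26.99), (88.09, 29.88), (79.51, 40.92), (68.63, 42.28)]
VIA = [(61.08, 23.83), (47.58, 11.06), (33.62, 11.62), (18.22, 9.18),
       (54.72, 24.49), (45.18, 15.77), (32.82, 9.87), (18.07, 9.01)]
POWER_W = {"replica": 0.394, "via": 0.168}  # average power from the text


def mean(xs):
    return sum(xs) / len(xs)


def summarize(rows):
    """Mean unmodified skew, mean compensated skew, and mean per-row improvement (%)."""
    return {
        "unmodified": mean([u for u, _ in rows]),
        "compensated": mean([c for _, c in rows]),
        "improvement": mean([100.0 * (u - c) / u for u, c in rows]),
    }


rep, via = summarize(REPLICA), summarize(VIA)
print(f"mean improvement: replica {rep['improvement']:.2f}%, via {via['improvement']:.2f}%")
print(f"unmodified skew ratio (replica/via): {rep['unmodified'] / via['unmodified']:.2f}")
print(f"compensated skew ratio (replica/via): {rep['compensated'] / via['compensated']:.2f}")
print(f"power ratio (replica/via): {POWER_W['replica'] / POWER_W['via']:.2f}")
# Expected: 55.09% and 61.65%; ratios 2.17, 2.55 and 2.35.
```

The compensated-skew ratio of about 2.55 is the value the text rounds to "approximately 2.5".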

# 6. CONCLUSIONS

In this paper, we investigated clocking schemes for designing thermally robust clock trees for multilayered synchronous 3D integrated circuits. We proposed two different clock tree topologies and developed RLC simulation models for the clock trees in the 45nm technology node. We presented a thermally adaptive 3D clocking scheme that senses the ambient temperature and dynamically adjusts the driving strengths of the clock buffers to reduce the clock skew between terminals. Based on the simulation results, we compared the two alternative topologies and found the via topology superior to the replica topology in both compensated skew and power consumption. Simulation results demonstrated that the dynamically adaptive design technique reduces the skew by 61.65% on average for the via topology, leading to thermally robust clock tree designs for 3D integrated circuits.

# 7. REFERENCES

- [1] H. Kurino et al., "Intelligent Image Sensor Chip with Three Dimensional Structure," in *Proceedings of the International Electron Devices Meeting (IEDM)*, pp. 879–882, 1999.
- [2] Y.-F. Tsai, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, "Three-Dimensional Cache Design Exploration Using 3DCacti," in *Proceedings of the IEEE International Conference on Computer Design*, 2005, pp. 519–524.
- [3] A. Rahman, D. Antoniadis, and A. Agarwal, "System Level Performance Evaluation of Three-Dimensional Integrated Circuits," in *IEEE Transactions on VLSI Systems*, vol. 8, pp. 671–678, December 2000.
- [4] R. H. Havemann and J. A. Hutchby, "High-Performance Interconnects: An Integration Overview," in *Proceedings of IEEE*, vol. 89, pp. 586–601, May 2001.
- [5] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration," in *Proceedings of the IEEE*, vol. 89, pp. 602–633, May 2001.
- [6] S. Borkar et al., "Parameter variation and impact on circuits and microarchitecture," in *Proceedings of the Design Automation Conference*, 2003.
- [7] K. Shakeri and J. Meindl, "Temperature Variable Supply Voltage for Power Reduction," in *IEEE Computer Society Annual Symposium on VLSI*, 2002, pp. 71–74.
- [8] A. H. Ajami, K. Banerjee, and M. Pedram, "Modeling and Analysis of Non-Uniform Substrate Temperature Effects in High Performance VLSI," in *IEEE Transactions on Computer Aided Design*, vol. 24, no. 6, pp. 849–861, 2001.
- [9] M. Cho, S. Ahmed, and D. Z. Pan, "TACO: Temperature Aware Clock-Tree Optimization," in *Proceedings of the International Conference on Computer Aided Design*, 2005, pp. 582–587.
- [10] G. M. Link and N. Vijaykrishnan, "Thermal Trends in Emerging Technologies," in *Proceedings of the International Symposium on Quality Electronic Design*, March 2006.
- [11] J. M. Rabaey, Digital Integrated Circuits: a Design Perspective. Prentice-Hall, Inc., 1996.
- [12] H. Heeb and A. Ruehli, "Three-Dimensional Interconnect Analysis Using Partial Element Equivalent Circuits," in *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, pp. 974–982, November 1992.
- [13] M. Kamon, M. J. Tsuk, and J. White, "Fasthenry: A Multipole-Accelerated 3-D Inductance Extraction Program," in *IEEE Transactions on Microwave Theory and Techniques*, pp. 1750–1758, September 1994.
- [14] K. Nabors and J. White, "Fastcap: A Multipole Accelerated 3-D Capacitance Extraction Program," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 10, pp. 1447–1459, November 1991.
- [15] A. Ruehli, "Inductance Calculations in a Complex Integrated Circuit Environment," in *IBM J. Res. Dev.*, vol. 16, no. 5, pp. 470–481, 1972.
- [16] Y. Massoud and Y. Ismail, "Grasping the Impact of On-chip Inductance," in *IEEE Circuits and Devices Magazine*, vol. 17, no. 4, pp. 14–21, July 2001.
- [17] M. Mondal, A. Ricketts, S. Kirolos, T. Ragheb, G. Link, V. Narayanan, and Y. Massoud, "Mitigating Thermal Effects on Clock Skew with Dynamically Adaptive Drivers," in *Proceedings of the International Symposium on Quality Electronic Design*, 2007.
- [18] M. Pertijs, K. Makinwa, and J. Huijsing, "A CMOS Smart Temperature Sensor with a 3σ Inaccuracy of ±0.1°C from -55°C to 125°C," in *IEEE Journal of Solid-State Circuits*, vol. 40, December 2005.
- [19] Q. Chen, M. Meterelliyoz, and K. Roy, "A CMOS Thermal Sensor and Its Applications in Temperature Adaptive Design," in *Proceedings of the International Symposium on Quality Electronic Design*, 2006.
- [20] W. Zhao and Y. Cao, "New Generation of Predictive Technology Model for sub-45nm Design Exploration," in *Proceedings of the International Symposium on Quality Electronic Design*, 2006, pp. 585–590. [Online]. Available: http://www.eas.asu.edu/~ptm
- [21] S. Im and K. Banerjee, "Full Chip Thermal Analysis of Planar (2-D) and Vertically Integrated (3-D) High Performance ICs," in *Technical Digest of IEDM*, pp. 727–730, 2000.
- [22] A. Ajami, K. Banerjee, and M. Pedram, "Modeling and Analysis of Nonuniform Substrate Temperature Effects on Global ULSI Interconnects," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 24, pp. 849–861, 2005.