# A Network-Flow Based Algorithm For Power Density Mitigation at Post-Placement Stage

Sean Shih-Ying Liu\*, Ren-Guo Luo\* and Hung-Ming Chen

Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan

Email: sniealu.ee96g@g2.nctu.edu.tw, xmax5151@gmail.com, hmchen@mail.nctu.edu.tw

Abstract—In this paper, we propose a power density mitigation algorithm at post-placement stage. Our proposed framework first identifies cluster of bins with high temperature, then propagates power density away from high temperature region by balancing regional power density. The problem of balancing regional power density is modeled as a supplydemand problem and solution is obtained with minimal displacement of cells. An analytical temperature profiling algorithm is tightly integrated within the framework to constantly update the temperature profile in response to incremental perturbation to placement. Our proposed approach can effectively reduce maximum temperature compared to previous works on temperature mitigation.

## I. INTRODUCTION

As the scale of a single transistor decreases, the increase in power density brings more attention in terms of chip's reliability and performance. The uneven distribution of power density creates hot spots or regions with unusual high temperature. These hot spots may induce undesirable effect which will only become more serious as power density increases.

According to past researches [1, 2], a  $10^{\circ}$ C increase in on-chip temperature will lead to 4% decrease in current drive capacity and 5% increase in interconnect delay. The deterioration in performance may also lead to timing failure [3]. In this regard, effective temperature alleviation techniques and accurate models are essential prior to tape-out.

Previous works on temperature alleviation can be classified to four categories: Matrix Synthesis [4-6], Simulated Annealing [7, 8], Partition Based Placement [9, 10] and Force Directed Placement [11-14]. Matrix Synthesis method partitions placement into bins, bins are sorted according to its temperature, and redistributes temperature to evenly distribute temperature. Simulated Annealing approach adds thermal penalty to the objective function and discourages moves that violates power distribution constraints. Power dissipation is incrementally updated and temperature profile is obtained based on power dissipation matrix. Partition based approach imposes thermal constraints during partitioning to determine whether a movement of cell is legal or not. Layout is recursively divided into regions such that the difference in temperature is minimized among each region. Force directed approach adds a thermal repulsive force in addition to the repulsive force when solving the Hessian Matrix. In contrast to temperature mitigation at global placement stage, relatively few discussed temperature reduction at post-placement stage. Liu et. al. [1] proposed the concept of empty row insertion and hot wrapper to reduce maximum temperature at expense of chip area increase.

## A. Our Contributions

To maintain a competitive placement while mitigating temperature, we seek an incremental technique that can be applied at postplacement stage. We formulate the problem of balancing regional power density as a network flow problem and solve it using Max-Flow and Cycle Canceling Algorithm. Similar techniques have being applied in legalization [15]. Our framework tightly integrates analytical temperature profiling to constantly update temperature profile in response to change in power density distribution.



Fig. 1. Flow chart of the proposed temperature mitigation framework.

## B. Overview of the Algorithm

Fig. 1 is a flow chart of the proposed methodology. The given input is a legalized placement. Temperature profiling is initiated to identify a maximum temperature bin. A high temperature region can be identified by clustering bins around the maximum temperature bin. Power density is iteratively propagated away from the hot region. Temperature profile is iteratively updated to identify next maximum temperature bin. The process iteratively identifies a maximum temperature bin and reduces regional power density around the bin.

### II. PROBLEM FORMULATION

Routability is the primary objective in placement. Any other objectives are generally considered as secondary. To this end, we seek to perturb the given placement as little as possible while reducing maximum temperature.

**The Post-Placement Temperature Mitigation Problem.** Given a legalized design with known power density for each cell, minimize the maximum on chip temperature with minimal increase in total displacement.

# III. BIN CLUSTERING AND PROPAGATION ORDER DETERMINATION

Temperature profiling can be quickly obtained using analytical approach [14, 16, 17]. The temperature on each bin is known. Since temperature for any single bin is affected by every single power density on the chip. Simply reduce power density on a single bin cannot effectively reduce maximum temperature. Thus, to effectively reduce maximum temperature for a specific bin, we seek to propagate regional power density at broader scale.

Fig. 2 is an illustration of hot region clustering and heat propagation. Using maximum temperature bin as center, the cluster of bins progressively expands until the percentage in temperature difference between selected bin and maximum temperature bin is below  $\gamma$ . In our experiment,  $\gamma$  is set from 75% to 90%. The identified cluster of bins colored in red in Fig. 2(b) represents the clustered region with extraordinary high temperature. To effectively reduce power density within the hot region, an iterative process is proposed using the concept of wave propagation to propagate power density away from hot region.

This work is partially supported by the National Science Council of Taiwan ROC under grant No. NSC 101-2220-E-009-045. Sean Shih-Ying Liu and Ren-Guo Luo contributed equally to this work.

<sup>978-3-9815370-0-0/</sup>DATE13/©2013 EDAA.



Fig. 2. Propagation of power density away from hot region to neighboring bins. (a) Illustration of the temperature profile for a given placement. (b)-(f) Illustration on the procedure for power density propagation.



Fig. 3. Illustration of bin temperature and its graph representation. (a) An example of a 3x3 bin temperature. (b) Supply value on power density for each bin. (c) Graph representation of node supply value. (d) Sub-optimal solution in power density propagation. (e) An optimal solution fors power density propagation.

## IV. BALANCING REGIONAL POWER DENSITY USING NETWORK OPTIMIZATION ALGORITHM

In this section, the problem of balancing regional power density is modeled as a network optimization problem. The goal is to achieve an even distribution of power density with minimum total displacement of cells. Fig. 3(d) and Fig. 3(e) illustrates two approaches to balance regional power density. Fig. 3(e) is an optimal solution by moving cells across fewer bins while Fig. 3(d) provides a sub-optimal solution.

### A. Balancing Regional Power Density Problem

The power density on a given bin  $b_k$  is denoted as  $p_k$ . A region is defined as  $n \times n$  number of bins as illustrated in Fig. 3(a). The corresponding graph representation is illustrated in Fig. 3(c). Each bin  $b_k$  is represented by a node  $n_k$  and an edge e(u, v) exists between every two adjacent nodes. Exchange of power density between two nodes on edge e(u, v) is represented as flow f(u, v) in graph. The cost and capacity on edge e(u, v) is denoted as a(u, v) and c(u, v)respectively.

The average power density over a region of  $n \times n$  bins is denoted as  $p_{avg}$ . In Eq. (1), the *supply value*  $l_k$  of a bin is defined as bin's power density subtracted by average power density. A bin with positive value

of supply  $(l_k > 0)$  is defined as *supply bin* while bin with negative value of supply  $(l_k < 0)$  is defined as *demand bin*.

$$l_k = p_k - p_{avg} \tag{1}$$

Given the previous definitions, the problem of balancing regional power density using minimal propagation cost can be formulated.

**The Regional Power Density Balancing Problem.** Given a graph G(V, E), x supply nodes  $S = \{s_1, s_2, ..., s_x\}$ , y demand nodes  $D = \{d_1, d_2, ..., d_y\}$ , minimize total flow on all edge such that every unit of positive supply on supply nodes are directed to even out negative supply on every demand node.

The mathematical expression of the problem is defined in Eq. (2). The first constraint states the conservation of flow of a given node and second constraint states that net flow of a demand node should equal to its negative supply value. Thus, the problem of balancing regional power density is formulated as a supply-demand problem.

minimize 
$$\sum_{(u,v)\in E} a(u,v) \cdot f(u,v)$$
  
subject to 
$$\sum_{w\in V} f(u,w) = 0, \text{ for all } u \notin \{S,D\}$$
$$\sum_{w\in V} f(w,u_k) = -l_k, \text{ for all } u_k \in \{D\}$$
(2)

#### B. Obtaining a Min-Cost Max-Flow Solution

The problem of multiple supplies and multiple demands with hard constraints on demands can be optimally solved using the combination of Ford-Fulkerson Algorithm and Cycle Canceling Algorithm. A super supply node  $S^*$  and a super demand node  $D^*$  are created to connect to all supply nodes and demand nodes respectively.

While Ford-Fulkerson returns a maximum flow solution, it does not guarantee the flow is minimum cost. Flow with less cost implies less total displacement of cells. Thus, Klein's Algorithm or Cycle Canceling Algorithm can be applied to refine the given flow to a minimum cost solution. Bellman Ford Algorithm is applied to identify negative cost cycle and iteratively saturates every identified negative cost cycle until none can be found.

Eq. (3) determines the capacity value on all edges. To satisfy the demand constraints, the capacity for edge connecting to super supply node is set equal to supply value of the connected node and the capacity for edge connecting to super demand node is set equal to negative supply value of the connected node. Rest of the edges has infinite capacity.

The cost for each edge is determined using Eq. (4). Residual edge has a cost of negative 1 cost while original edge has a positive 1 cost.

$$c(u,v) = \begin{cases} \infty & \text{if } u \neq S^* \text{ and } v \neq D^* \\ l_k & \text{if } u = S^* \\ -l_r & \text{if } v = D^* \end{cases}$$
(3)

$$a(u,v) = \begin{cases} 1 & \text{for all original edges} \\ -1 & \text{for all residual edges} \end{cases}$$
(4)

Fig. 4 is an example of obtaining a solution with minimum cost that satisfies all hard constraints on demands. The seven subfigures illustrate a step-by-step procedure of Ford-Fulkerson Algorithm and Cycle Canceling Algorithm using Bellman Ford. In each subfigure, the graph on top maintains the relative position of the bins with each directed edge representing flow of power density. The graph at the bottom is a re-ordered graph in which each edge represents capacity of edge(except (e) and (f)). The dotted edge represents a residual edge with a negative cost. Rest of the edges not connecting to super node  $S^*$  or  $D^*$  are bi-directional with infinite capacity going both ways with a positive cost.

Fig. 4(a)-(d) illustrates the procedure of Ford-Fulkerson obtaining a maximum flow solution, however, with a sub-optimal cost. Fig. 4(e)-(h) illustrates the procedure of Cycle Canceling Algorithm to obtain a minimum cost solution by saturating identified negative cost cycle. In Fig. 4(e)-(f), the number on each edge denotes the cost per unit flow. In Fig. 4(f), a negative cost cycle is identified through nodes  $D_1 \rightarrow A \rightarrow B \rightarrow S_2 \rightarrow D_2 \rightarrow S_1 \rightarrow D_1$ . The minimum



Fig. 4. Illustration of step-by-step procedure in obtaining a min-cost max-flow solution. (a)-(d) Obtaining a max-flow solution using Ford-Fulkerson Algorithm. (e)-(h) Use Cycle Canceling Algorithm to obtain a min-cost solution.



Fig. 5. Illustration of bin resizing and cell shifting. (a) Original power density in two bins. (b) The new boundary between two bins and reassigns cells to achieve target power density. (c) Boundary is shifted to the calculated position. (d) Boundary is shifted back to the original position and simultaneously shifts the cells to preserve relative order.

capacity on this cycle is 5. Hence, a 5-unit flow is sent along this cycle to saturate the negative cost cycle. After the negative cost cycle is saturated, there is no more unsaturated negative cost cycle and a min-cost max-flow solution is obtained in Fig. 4(h).

### V. PROPAGATION OF POWER DENSITY AND RELOCATION OF CELLS

The concept of iterative local refinement [18] is applied to determine location of cells. Fig. 5 is an example illustrating the procedure of propagating power density between two bins. Fig. 5(a) is an illustration of two bins with power density before power density propagation. After flow of power density between two bins is derived, target power density can be calculated using Eq. (5).

Given the target power density for both bins, cells in both bins are first sorted based on their bottom left x-coordinate in non-decreasing order. Starting from the beginning of the list, a sweep line sweeps across the list and accumulates power density along the way. Sweep line stops when accumulation of power density exceeds target density.

The boundary of two bins is then shifted to the x-coordinate of the next cell in list. Cells located to the left of boundary are assigned to left bin while rest of cells are assigned to right bin. In Fig. 5(c), sweep lines stops at cell L and boundary of bins is shifted to x-coordinate of cell C.

After cells are reassigned, cells are relocated by compressing/expanding bins to return to its original dimension. For cells within compressed bin, cells are more spread apart when bin expands to its original dimension. Vice versa for cells within expanded bin, cells are more compact when bin compresses to its original dimension. Fig. 5(d) illustrates the relocation of cells after bin return to its original dimension.

### VI. EXPERIMENTAL RESULT

 
 TABLE I.
 Comparison with Prior Works [1, 4, 14] on Total HPWL After Legalization

|     |       | Original | [4]     | [1]     | [14]   | Our    |          |
|-----|-------|----------|---------|---------|--------|--------|----------|
| #   | Cell# | HPWL     | HPWL    | HPWL    | HPWL   | HPWL   | Time(s.) |
| al  | 211K  | 87.69    | LG Fail | 108.21  | 90.76  | 104.00 | 5.90     |
| a2  | 255K  | 98.22    | LG Fail | 108.68  | 105.76 | 104.98 | 45.22    |
| a3  | 452K  | 248.00   | LG Fail | 290.60  | 258.81 | 261.42 | 125.68   |
| a4  | 496K  | 212.09   | LG Fail | 291.92  | 219.12 | 220.86 | 56.25    |
| b1  | 278K  | 107.46   | LG Fail | 121.31  | 118.61 | 122.23 | 83.88    |
| b2  | 558K  | 155.53   | LG Fail | 178.86  | 163.46 | 165.87 | 119.57   |
| b3  | 1110K | 378.09   | LG Fail | 472.95  | 439.27 | 416.73 | 280.75   |
| b4  | 2180K | 897.69   | LG Fail | 1093.94 | 909.50 | 916.11 | 124.80   |
| Nm. | -     | 0.922    | -       | 1.109   | 0.982  | 1.000  | -        |

In this section, we present our experimental result by comparing with some of the prior works on thermal aware placement. Parameters from *Zhan and Sapatenkar* [16] is used during implementation of thermal model.

The entire framework is implemented with standard C++ language and experimented on Intel Xeon E5530 Quad Core machine operating at 2.4GHz. ISPD 2005 Placement benchmarks are used as input. Since

TABLE II.Comparison with Prior Works [1, 4, 14] on Temperature Mitigation Techniques on Maximum Temperature ( $T_{max}$ ) and<br/>Maximum Temperature Difference Between Adjacent Bins( $\delta T_S$ )

|            |       | Original  |              | Chu et. al. [4] |              | Liu et. al. [1] |              | Aroonsantidecha et. al. [14] |              | Our       |              |
|------------|-------|-----------|--------------|-----------------|--------------|-----------------|--------------|------------------------------|--------------|-----------|--------------|
| GP/DP      |       | -         |              | DP              |              | DP              |              | GP                           |              | DP        |              |
| benchmarks | Cell# | $T_{max}$ | $\delta T_S$ | $T_{max}$       | $\delta T_S$ | $T_{max}$       | $\delta T_S$ | $T_{max}$                    | $\delta T_S$ | $T_{max}$ | $\delta T_S$ |
| adaptec1   | 211K  | 63.95     | 1.02         | 47.79           | 0.05         | 62.87           | 1.03         | 60.21                        | 0.98         | 56.76     | 0.85         |
| adaptec2   | 255K  | 50.44     | 1.04         | 35.92           | 0.09         | 49.88           | 1.01         | 50.45                        | 1.01         | 48.56     | 0.91         |
| adaptec3   | 452K  | 46.14     | 1.25         | 33.11           | 0.16         | 45.59           | 1.25         | 44.01                        | 1.34         | 43.51     | 1.26         |
| adaptec4   | 496K  | 46.57     | 1.25         | 33.91           | 0.15         | 46.24           | 1.24         | 45.18                        | 1.22         | 45.29     | 1.25         |
| bigblue1   | 278K  | 73.54     | 1.12         | 51.23           | 0.06         | 72.05           | 1.11         | 70.77                        | 1.09         | 67.14     | 0.95         |
| bigblue2   | 558K  | 52.09     | 1.05         | 41.04           | 0.11         | 51.79           | 1.05         | 51.03                        | 1.02         | 50.61     | 1.00         |
| bigblue3   | 1110K | 73.79     | 2.92         | 39.98           | 0.42         | 71.86           | 2.95         | 65.46                        | 2.84         | 64.93     | 2.50         |
| bigblue4   | 2180K | 75.35     | 3.01         | 45.41           | 0.30         | 75.05           | 3.01         | 72.37                        | 2.98         | 66.59     | 2.82         |
| Norm.      | -     | 1.076     | 1.097        | 0.742           | 0.101        | 1.066           | 1.094        | 1.032                        | 1.081        | 1.000     | 1.000        |



(a) Before Temperature Mitigation

(b) After Temperature Mitigation

Fig. 6. Temperature distribution on input benchmark adaptec1 before and after temperature mitigation.

ISPD Placement benchmarks do not contain power information of cell, a new definition file is created to define the power consumption of each cell. The power density for each cell ranges from 0 to  $2 \times 10^6$   $W/m^2$ . FastDP 3.1[18] is used for final legalization.

Experiment is performed using a legalized placement. Post-Placement temperature mitigation techniques based on matrix synthesis proposed in [4] and empty row insertion proposed in [1] are re-implemented to the very best of our understanding. Table I presents the HPWL of a legalized placement after temperature mitigation<sup>1</sup>.

In comparison with [1], the work done in [1] has 6.6% higher maximum temperature with with 10.9% more HPWL. Techniques proposed in [1] may not be suitable with designs in presence of many macro blocks.

Table II compares result of proposed technique with prior works on thermal aware placement.  $T_{max}$  denotes maximum bin temperature and  $\delta T_S$  denotes maximum temperature difference between any two adjacent bins. In Table II, our proposed network based technique can effectively reduce temperature difference  $\delta T_S$ . Fig. 6 is the temperature distribution before and after our technique is conducted. Techniques proposed in [4] achieves best temperature reduction by redistributing power across entire chip, however, such greedy approach also significantly increase HPWL and leads to failure during legalization on all 8 testcases. Nevertheless, result from [4] offers an indication to the upper bound of temperature reduction.

## VII. CONCLUSION

In this paper, a framework that tightly integrates analytical temperature profiling with network flow based power density mitigation technique is proposed. By modeling regional power density balancing problem as supply-demand problem, temperature profile can be effectively smoothed out with minimal increase to total displacement.

#### REFERENCES

 W. Liu, A. Nannarelli, A. Calimera, E. Macii, and M. Poncino, "Post-Placement Temperature Reduction Techniques," in *Design Automation Test in Europe Conference*, 2010.

- [2] S. Chaudhury, "A Tutorial and Survey on Thermal-Aware VLSI Design: Tools and Techniques," 2009.
- [3] J.-L. Tsai, C.-P. Chen, G. Chen, B. Goplen, H. Qian, Y. Zhan, S.-M. Kang, M. Wong, and S. Sapatnekar, "Temperature-Aware Placement for SOCs," *Proceedings of the IEEE*, vol. 94, pp. 1502–1518, Aug. 2006.
- [4] C. C. N. Chu and M. D. F. Wong, "A Matrix Synthesis Approach to Thermal Placement," in *International Symposium on Physical Design*, pp. 163–168, 1997.
- [5] P. Ghosal and P. Dasgupta, "Thermal Aware Placement in 3D ICs," in Advances in Recent Technologies in Communication and Computing, 2010.
- [6] H. R. Prasun Ghosal and P. Dasgupta, "Minimizing Thermal Disparities During Placement in 3D ICs," in *Computational Science and Engineering*, 2010.
- [7] C. H. Tsai and S. M. Kang, "Cell-Level Placement for Improving Substrate Thermal Distribution," in *Transaction on Computer Aided Design of Integrated Circuits and Systems*, vol. 19, pp. 253–266, Feb. 2000.
- [8] J. Cong, J. Wei, and Y. Zhang, "A Thermal-Driven Floorplanning Algorithm for 3D ICs," in *International Conference on Computer Aided Design*, pp. 306–313, 2004.
- [9] G. Chen and S. Sapatnekar, "Partition-Driven Standard Cell Thermal Placement," in *International Symposium on Physical Design*, pp. 75–80, 2003.
- [10] J. Li and H. Miyashita, "Thermal-Aware Placement Based on FM Partition Scheme and Force-Directed Heuristic," *Transaction on Fundamental Electronic Communication Computeter Science*, pp. 989–995, April 2006.
- [11] B. Obermeier and F. M. Johannes, "Temperature-Aware Global Placement," in Asia and South Pacific Design Automation Conference, pp. 143–148, 2004.
- [12] A. B. Kahng, S. mo Kang, W. Li, and B. Liu, "Analytical Thermal Placement for VLSI Lifetime Improvement and Minimum Performance Variation," in *International Conference on Computer Design*, pp. 71–77, 2007.
- [13] B. Goplen and S. Sapatnekar, "Efficient Thermal Placement of Standard Cells in 3D ICs Using a Force Directed Approach," in *International Conference on Computer Aided Design*, pp. 86–89, 2003.
- [14] S. Aroonsantidecha, S. S.-Y. Liu, C.-Y. Chin, and H.-M. Chen, "A Fast Thermal Aware Placement with Accurate Thermal Analysis Based on Green Function," in *Asia and South Pacific Design Automation Conference*, 2012.
- [15] U. Brenner, "Vlsi Legalization with Minimum Perturbation by Iterative Augmentation," in *Design, Automation Test in Europe Conference*, pp. 1385–1390, march 2012.
- [16] Y. Zhan and S. S. Sapatnekar, "High Efficiency Green Function-Based Thermal Simulation Algorithms," in *Transaction on Computer Aided Design of Integrated Circuits and Systems*, vol. 26, pp. 1661–1675, Sept. 2007.
  [17] P.-Y. Huang and Y.-M. Lee, "Full-Chip Thermal Analysis for
- [17] P.-Y. Huang and Y.-M. Lee, "Full-Chip Thermal Analysis for the Early Design Stage via Generalized Integral Transforms," in *Transaction on Very Large Scale Integration System*, vol. 17, pp. 613–626, May 2009.
- [18] M. Pan, N. Viswanathan, and C. C. N. Chu, "An Efficient and Effective Detailed Placement Algorithm," in *International Conference on Computer Aided Design*, pp. 48–55, 2005.

<sup>&</sup>lt;sup>1</sup>The size of region(nxn bins) is adjusted for each testcase. The size of region range from 3x3 bins to 13x13 bins.