# 3D Reconfigurable Power Switch Network for Demand-supply Matching between Multi-output Power Converters and Many-core Microprocessors

Kanwen Wang, Hao Yu\*, Benfei Wang and Chun Zhang School of Electrical and Electronic Engineering Nanyang Technological University, Singapore 639798

*Abstract*—A 3D reconfigurable power switch network is introduced to provide optimal demand-supply matching between on-chip multi-output power converters and many-core microprocessors. For effective DVFS power management of many cores by area-efficient on-chip power converters, the reconfigurable power switch network supports space- and time-multiplexed access between power converters and cores. An integer linear programming (ILP) formulation is deployed to find one configuration of space-time multiplexing that matches supply and demand with balanced utilization. The overall power management system is verified in SystemC-AMS based models. Experiment results show that the proposed design achieves 35.36% power saving on average when compared to the one without the proposed power management.

## I. INTRODUCTION

The development of exa-flop-scale high-performance data centers for cloud computing has created the need for tera-flop-scale high-performance computing systems with hundreds of microprocessor cores integrated on a single chip [1], [2]. 3D integration is one promising approach for integrating many-core microprocessors [3]. However, such high-density integration can introduce severe power and thermal issues, which may significantly affect system performance and reliability. As such, there is a need not only to handle current surges but also to supply the voltage-levels required by the various demands of many-core microprocessors [4], [5], [6], [7], [8], [9], [10].

Dynamic voltage and frequency scaling (DVFS) is generally deployed for power management, which requires power converters that can adjust and deliver the desired voltage-level [11]. Off-chip power converters may not scale to the current surges of 3D many-core microprocessors because of large delivery loss and severe power-delivery integrity problems [12]. Integrated on-chip power converters [5], [6], [7] respond more promptly to the energy demand than off-chip power converters. However, the primary limitation of on-chip power converters is the area of the buck inductor. For example, in [6] the active area is $1.3mm^2$ with $10\mu H$ inductance. As such, providing one on-chip power converter for each core is infeasible due to the large area overhead. A recent solution is to generate multiple voltage-levels with single-inductor-multiple-output (SIMO) power converters [8], [9], [10]. In SIMO, each core is periodically allocated a certain time-slot in which one voltage-level is supplied. By switching sufficiently fast, the SIMO power converter is able to drive multiple cores with different voltage-levels in a time-multiplexed manner. Note that the capability of SIMO is still limited for many-core microprocessors with hundreds of cores. Moreover, with hundreds of cores integrated on one die, the remaining area is too limited to accommodate on-chip power converters with buck inductors.

The 3D integration of logic and memory also makes room for on-chip power converters. The work in [13] has demonstrated the possibility of designing the power converter on one die and a 64-tile network-on-chip on another die, integrated by through-silicon interposer and via (TSI/TSV). To further explore, at large scale, the matching between the demand from many-core microprocessors and the supply from on-chip power converters, we propose a 3D reconfigurable power switch network in this paper. The many-core microprocessors are on one die, the on-chip power converters are on the other die, and TSVs are configured by the power switch network to connect microprocessors and power converters under space-time multiplexing. Multiple power converters can be shared by cores in space, and each power converter can be further shared by cores in time. With the use of integer linear programming (ILP), the best-matched configuration of supplied voltage-levels from power converters is found to meet the demand from many cores. The overall power management system is verified by a system-level behavior model implemented in SystemC-AMS for up to 32 microprocessor cores. The experiment results show that the proposed scheme achieves a power saving of 35.36% on average when compared to the one without the proposed power management.

The rest of this paper is organized as follows. In Section II, we present the 3D system architecture with the power switch network. In Section III, we formulate the space-time multiplexing problem and solve it by ILP. We present the power management circuit system model and the experiment results in Section IV, followed by the conclusion in Section V.

## II. 3D RECONFIGURABLE POWER SWITCH NETWORK

In this section, we describe a 3D many-core system architecture with reconfigurable power switch network. Table I summarizes the necessary notations.

As shown in Fig. 1, the proposed system can be described by the three parts below:

- *Power Demand*: a set of cores $C$ with set-size $N_c$. Each core $c_i$ has a demanded voltage-level $v_d(c_i)$ to meet the deadline of its running task. In addition, $v_a(c_i)$ is the voltage-level allocated to $c_i$ after power management.
- *Power Supply*: a set of power converters $R$ with set-size $N_r$. Each power converter outputs the voltage-level $v(r_i) \in V$ to supply the cores, where $V$ is the set of available voltage-levels with set-size $N_v$ before power management.
- *Power Switch Network*: a set of reconfigurable switch boxes $S$ with set-size $N_s$ to connect $R$ and $C$ for demand-supply matching.

Fig. 1: 3D reconfigurable power switch network for demand-supply matching between on-chip multi-output power converters and many-core microprocessors

Fig. 2: (a) Timing diagram of power management controller; (b) Functional units of power management circuit.

978-3-9815370-0-0/DATE13/©2013 EDAA

<sup>\*</sup> Please address comments to haoyu@ntu.edu.sg. This work is sponsored by Singapore MOE TIER-2 fund MOE2010-T2-2-037 (ARC 5/11) and A\*STAR SERC-PSF fund 11201202015.

The overall system architecture is composed of microprocessor cores at the top tier die and power management circuit, including the power switch network and power converters, at the bottom tier die. The two tiers are connected by TSVs with reconfigurable connections enabled by power switch network. The power converters considered here generate multiple output voltage-levels and can be shared with multiple space and time access.

Fig. 2(a) presents the timing diagram of the power management controller, while Fig. 2(b) shows the functional units of the power management circuit on the bottom tier. In every control cycle the system works as follows. First, the voltage and current sensors sample voltage and current values from the cores as the power profile. By tracking the power profiles of the cores, the demanded voltage-levels for the next control period are predicted based on a pre-stored lookup table obtained from training. The predicted data are then sent to the DVFS unit. Next, the DVFS unit decides the best space-time multiplexing that meets the demands from the cores, such that the cores are connected to a set of power converters with the minimum voltage-levels and the most balanced utilization. Specifically, this matching is formulated as an integer linear programming (ILP) problem and solved on one of the microprocessor cores. Finally, the power switch network is configured accordingly to set up the TSV connections between power converters and cores.

Note that traditional island-based [4] and SIMO-based [8], [9], [10] power management assumes fixed connections between power converters and cores. As such, the power converter is usually specified pessimistically so that it can supply the maximum surge current. With a reconfigurable power switch network, a space-time multiplexing problem can be formulated to match demand and supply, so that power converters can be extensively shared by cores in both space and time.

| Notation                       | Definition                                 |
|--------------------------------|--------------------------------------------|
| $V = \{v_1, \ldots, v_{N_v}\}$ | Set of voltage-levels                      |
| $I = \{i_1, \ldots, i_{N_v}\}$ | Set of core load currents                  |
| $R = \{r_1, \ldots, r_{N_r}\}$ | Set of power converters                    |
| $C = \{c_1, \ldots, c_{N_c}\}$ | Set of cores                               |
| $S = \{s_1, \ldots, s_{N_s}\}$ | Set of switch boxes                        |
| $T = \{t_1, \ldots, t_{N_r}\}$ | Set of power stations                      |
| $p(c_i)$                       | Power consumption of core $c_i$            |
| $v_d(c_i) \in V$               | Demanded voltage-level from core $c_i$     |
| $v_a(c_i) \in V$               | Supplied voltage-level to core $c_i$       |
| $v(r_i) \in V$                 | Output voltage-level of converter $r_i$    |
| $\sigma(t_i)$                  | Boolean variable of power station validity |
| $I_c$                          | Maximum converter inductance current       |
| $I_{max}$                      | Maximum load current                       |
| $C_L$                          | Load capacitance                           |
| $\Delta V$                     | Maximum core supply-voltage drop           |
| $H$                            | Time slot for time-multiplexing            |

TABLE I: Notations for 3D reconfigurable power switch network by space-time multiplexing

## III. SPACE-TIME MULTIPLEXING BY ILP OPTIMIZATION

In this section, we first present the problem of space-time multiplexing for the proposed 3D reconfigurable power switch network. Then, we discuss how to apply integer linear programming (ILP) to find the optimal supply-demand matching. Upon the obtained solution, the state transition in power management controller is performed and the power switch network is reconfigured accordingly.

#### A. Space-Time Multiplexing Problem

A number of relevant definitions are first presented below.

Definition 1: the space-time multiplexing problem is defined as follows: there are  $N_r$  power converters shared spatially by  $N_c$  different cores through  $N_s$  power switches, while each power converter can switch among  $N_v$  different voltage-levels at a fixed time slot H to supply multiple voltage-levels simultaneously.

Definition 2: the power station  $t_i$  is defined as the vector of  $[R(t_i), C(t_i), S(t_i), \sigma(t_i)]$ , where  $R(t_i)$  is the supplying power converter,  $C(t_i)$  is the set of cores driven by  $R(t_i)$ , and  $S(t_i)$  is the set of switch box configurations to connect  $C(t_i)$  with  $R(t_i)$ . The Boolean variable  $\sigma(t_i)$  indicates the validity of power station  $t_i$ 

$$\sigma(t_i) = \begin{cases} 1 & t_i \text{ is a valid power station} \\ 0 & t_i \text{ is an invalid power station} \end{cases} \tag{1}$$

where each valid power station  $t_i$  corresponds to one feasible matched solution with pruning of unnecessary solutions. As such, one can further define state space of power station as follows.

Definition 3: the state $w$ is *valid* if $\sigma(t_i) = 1$, $\forall t_i \in T$, and the state $w$ is *invalid* if $\exists t_i \in T$ such that $\sigma(t_i) = 0$. The validity of each power station $t_i$ is determined by the driving ability for the solution of $t_i$.

Due to the physical circuit constraints, $\sigma(t_i) = 1$ if and only if the following conditions are satisfied: (i) the maximal power converter inductance current does not exceed a specified value $I_c$; and (ii) the maximal core voltage-drop stays within a specified value $\Delta V$ during multiplexing.

Based on the aforementioned definitions, space-time multiplexing is performed as follows. At each control cycle, the voltage-level of each core $c_i$ is tracked by sensors with auto-regression prediction. The demanded voltage-level $v_d(c_i)$ is provided by the latest prediction. By finding the optimal matched space-time multiplexing such that $v_a(c_i) \ge v_d(c_i)$ is satisfied for all cores, the controller sets the state transition, which represents the change of the reconfigurable power switch network to connect a different set of TSVs between power converters and cores.
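Definitions 2 and 3 together with the two validity conditions can be pictured with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `PowerStation`, its fields, and the idea that the worst-case inductor current and voltage drop are already known for each station are assumptions made for illustration, and the switch box configuration $S(t_i)$ is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PowerStation:
    """Illustrative container for one power station t_i (Definition 2)."""
    converter: int                # index of the supplying converter R(t_i)
    cores: List[int]              # indices of the cores C(t_i) it drives
    peak_inductor_current: float  # assumed known worst-case inductor current, A
    peak_voltage_drop: float      # assumed known worst-case core voltage drop, V


def sigma(t: PowerStation, i_c: float, delta_v: float) -> int:
    """Return 1 if t meets conditions (i) and (ii), otherwise 0."""
    ok = t.peak_inductor_current <= i_c and t.peak_voltage_drop <= delta_v
    return 1 if ok else 0


def state_is_valid(stations: List[PowerStation], i_c: float, delta_v: float) -> bool:
    """Definition 3: a state is valid only if every power station is valid."""
    return all(sigma(t, i_c, delta_v) == 1 for t in stations)
```

A single station that violates either limit is enough to make the whole state invalid, which is what triggers the search for a new state.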

The cost of one state transition is described by the *distance* from an invalid state $w'$ to a valid state $w$:

$$d(w',w) = \sum_{i=1}^{N_c} (v_a(c_i) - v_d(c_i))^2. \tag{2}$$

Intuitively, (2) sums up the squared mismatch between supplied and demanded voltage-levels over all cores, which corresponds to the mismatched power waste at the state $w$. The smaller the distance $d$ is, the less mismatched power is wasted, and thus the lower the power consumption. As such, the objective of the state transition is to find the valid state $w$ with the minimal distance to $w'$: $tr(w') = \arg\min_w (d(w', w))$. Therefore, space-time multiplexing becomes the problem of finding the matched $tr(w')$, which is solved by the ILP formulation in Section III-B.
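Before turning to the solution method, the distance in (2) and the transition rule $tr(w')$ can be illustrated on a handful of candidate states. This is only a sketch under simplifying assumptions: each candidate state is represented directly by its tuple of allocated voltage-levels, the candidates are assumed to be already valid in the sense of Definition 3, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def distance(v_alloc: Sequence[float], v_demand: Sequence[float]) -> float:
    """Equation (2): sum of squared supply-demand mismatches over all cores."""
    return sum((va - vd) ** 2 for va, vd in zip(v_alloc, v_demand))


def transition(v_demand: Sequence[float],
               candidates: Sequence[Tuple[float, ...]]) -> Optional[Tuple[float, ...]]:
    """tr(w'): the candidate that meets every demand with minimal distance."""
    # Keep only candidates with v_a(c_i) >= v_d(c_i) for all cores.
    feasible = [w for w in candidates
                if all(va >= vd for va, vd in zip(w, v_demand))]
    return min(feasible, key=lambda w: distance(w, v_demand), default=None)


# Three cores with demanded levels 0.8 V, 1.2 V and 0.6 V (illustrative values).
demand = (0.8, 1.2, 0.6)
states = [(1.0, 1.2, 0.8), (0.8, 1.2, 1.2), (0.8, 1.0, 0.6)]
chosen = transition(demand, states)
```

In this example the last candidate undersupplies the second core and is discarded, and of the remaining two the first has the smaller distance.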

#### B. ILP Matched Solution

By configuring the power switch network and supplying different voltage-levels to cores, the problem is to find the power station settings that minimize the supply voltage-levels allocated to the cores while satisfying each core's demanded voltage-level. As discussed above, this is the minimization problem of $tr(w')$. Although $tr(w')$ can be found by enumerating the entire state space, brute-force searching suffers from $O(N_c N_r)$ complexity. In addition, it is non-trivial to balance the load of power converters.

In this paper, we formulate the searching of tr(w') as one 0-1 integer linear programming (ILP) problem below.

$$\begin{array}{lll} \text{min:} & & \sum_{i=1}^{N_c} \sum_{j=1}^{N_r} \sum_{v=1}^{N_v} v_v \cdot x_{ij}^v \\ \text{s.t.:} & \text{(i)} & \sum_{j=1}^{N_r} \sum_{v=1}^{N_v} x_{ij}^v = 1, \, \forall 1 \leq i \leq N_c \\ & \text{(ii)} & \sum_{j=1}^{N_r} \sum_{v=1}^{N_v} v_v \cdot x_{ij}^v \geq v_d(c_i), \, \forall 1 \leq i \leq N_c \\ & \text{(iii)} & \sum_{i=1}^{N_c} i_v \cdot x_{ij}^v \leq I_c, \, \forall 1 \leq j \leq N_r, \, 1 \leq v \leq N_v \\ & \text{(iv)} & \sum_{i=1}^{N_c} \sum_{v=1}^{N_v} x_{ij}^v \leq 1 + \frac{\Delta V \cdot C_L}{I_{max} H}, \, \forall 1 \leq j \leq N_r \\ & \text{(v)} & N_{min} \leq \sum_{i=1}^{N_c} \sum_{v=1}^{N_v} x_{ij}^v \leq N_{max}, \, \forall 1 \leq j \leq N_r. \end{array} \tag{3}$$

In (3), the Boolean variable  $x_{ij}^v$  equals 1 if and only if the core  $c_i \in C$  is supplied by the power converter  $r_j \in R$  at the voltage-level  $v_v \in V$ , as explained in (4)

$$x_{ij}^{v} = \begin{cases} 1 & c_i \text{ supplied by } r_j \text{ at voltage-level } v \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Note that the objective of (3) is to minimize the total allocated voltage-levels under three types of constraints. The first type includes two constraints that ensure each core is correctly supplied: (i) each core is supplied by exactly one power converter at exactly one voltage-level; and (ii) the voltage-level allocated to each core must satisfy its demand. The second type includes two constraints that guarantee the valid state of power converters under the design specifications: (iii) the maximal power converter inductance current does not exceed $I_c$; and (iv) the maximal core voltage-drop does not exceed $\Delta V$. Here $C_L$ is the core load capacitance, $I_{max}$ is the maximum load current and $H$ is the switching time slot of the power converter under time-multiplexing mode [9]. The third type includes one constraint that ensures load balance: (v) each power converter drives at least $N_{min}$ and at most $N_{max}$ cores, such that no power converter is under- or over-utilized.
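Constraint (iv) can be read as a bound on the supply droop of a time-multiplexed core. The short derivation below is a sketch under an assumption that is not spelled out in the text: a core is recharged by its converter during one slot of length $H$ per round and, in the remaining slots, draws at most $I_{max}$ from its load capacitance $C_L$ alone. Writing $n_j = \sum_{i=1}^{N_c} \sum_{v=1}^{N_v} x_{ij}^v$ for the number of cores sharing converter $r_j$ (the symbol $n_j$ is introduced here only for illustration), a core waits $(n_j - 1)H$ between two of its slots, so

$$\frac{I_{max} \cdot (n_j - 1) \cdot H}{C_L} \leq \Delta V \quad \Longleftrightarrow \quad n_j \leq 1 + \frac{\Delta V \cdot C_L}{I_{max} H},$$

which has the same form as the right-hand side of (iv).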

The ILP problem (3) can be solved by one microprocessor core in the scale of milliseconds, which is faster compared to the off-chip power converter based DVFS management in the scale of seconds. The space-time multiplexing is then configured by the searched optimal solution of power station for the matched supply and demand. Accordingly, the controller can configure the power switch network to connect TSVs between power converters and cores.
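To make the structure of (3) concrete, the sketch below solves a tiny instance by exhaustive enumeration. It is not the ILP solver used in this work and is only practical for a few cores. The function names, the argument `n_share` (standing for the precomputed right-hand side of (iv)) and the values of `i_c`, `n_share`, `n_min` and `n_max` are assumptions for illustration; the voltage-levels and load currents are those listed in Table II.

```python
from itertools import product


def _converter_ok(j, assign, i_levels, i_c, n_share, n_min, n_max):
    """Check constraints (iii), (iv) and (v) for converter j."""
    levels = [v for (jj, v) in assign if jj == j]
    # (iii): per voltage-level, the summed load current stays within I_c.
    if any(i_levels[v] * levels.count(v) > i_c for v in set(levels)):
        return False
    # (iv): at most n_share cores; (v): between n_min and n_max cores.
    return len(levels) <= n_share and n_min <= len(levels) <= n_max


def solve_matching(v_levels, i_levels, v_demand, n_conv, i_c, n_share, n_min, n_max):
    """Return (objective, assignment) or None if no assignment is feasible.

    assignment[i] = (j, v): core i is supplied by converter j at level index v.
    """
    choices = [(j, v) for j in range(n_conv) for v in range(len(v_levels))]
    best = None
    # (i) holds by construction: every core gets exactly one (j, v) pair.
    for assign in product(choices, repeat=len(v_demand)):
        # (ii): the allocated level must meet each core's demand.
        if any(v_levels[v] < v_demand[i] for i, (_, v) in enumerate(assign)):
            continue
        if not all(_converter_ok(j, assign, i_levels, i_c, n_share, n_min, n_max)
                   for j in range(n_conv)):
            continue
        cost = sum(v_levels[v] for _, v in assign)  # objective of (3)
        if best is None or cost < best[0]:
            best = (cost, list(assign))
    return best


V = [0.6, 0.8, 1.0, 1.2]        # output voltage-levels, V (Table II)
I = [0.12, 0.15, 0.22, 0.35]    # load currents per level, A (Table II)
result = solve_matching(V, I, [0.6, 1.2, 0.8, 1.0], n_conv=2,
                        i_c=0.5, n_share=3, n_min=1, n_max=3)
```

With these illustrative limits every core can be given exactly its demanded level. Lowering `i_c` below 0.35 A makes the 1.2V demand unsatisfiable, and the function returns `None`.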

## IV. SIMULATION RESULTS

#### A. System Modeling and Settings

The proposed system is validated by system-level models built in SystemC-AMS. Table II summarizes the system design specifications. All units are scaled or modeled at 130nm technology. The specification of a low-power MIPS microprocessor core [14] is taken as the core model. Each core has a nominal frequency of 250MHz and a maximal power consumption of 0.4W. Benchmarks from SPEC-2000 [15] are simulated by Wattch [16] to generate power profiles, which are randomly applied to different cores as workload tasks. The ILP problem of the formulated space-time multiplexing matching is solved by lp\_solve [17], with a typical solving time in the scale of milliseconds.

Fig. 3 shows the circuit diagram of the reconfigurable power switch network, which is composed of power converters, 3D TSVs and switch boxes. A 2-phase multi-output power converter [18] is used to generate 4 different voltage-levels with settings similar to [19]. To supply the maximum current of cores, the inductance value in the power converter is set to 1nH per phase. Such an inductor requires an area of $0.25mm^2$, which occupies 30% of the power converter area. The design of the on-chip power converter, which is placed on another tier, thereby needs to consider the limitation of inductor area. Moreover, a vertical TSV [20] with the size of $500\mu m^2$ connects cores and power converters. According to the model in [21], it has a dc resistance of $20m\Omega$. Considering the maximum load current of 350mA, the IR-drop is around 7mV, which is quite small. In addition, as the capacitance of the TSV is in the fF scale, it does not influence the load capacitance. For each TSV channel, one switch box with $N_r$ power switches is assigned to support the core-converter connection. The switch box offers a compact reconfigurable unit driven by the controller. Each power switch inside a switch box is designed with the size of $520\mu m^2$ to deliver the maximum core current with a switching time of 300ns. For a fully-connected power network, $N_r \cdot N_c$ power switches are needed. In a 16-core example, 4 power converters and 64 power switches are used, with an implementation area of $6.73mm^2$ at the bottom tier excluding the controller. Though an additional layer is used for the power switch network, it may provide fully flexible DVFS controllability to maintain reasonable power density for the 3D many-core system.

Fig. 3: Circuit diagram of reconfigurable power switch network
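The small figures quoted for the TSV and the switch network follow from simple arithmetic on the values in the text and Table II. The snippet below is a quick check, with variable names chosen here for illustration.

```python
# Values quoted in the text and in Table II.
i_load_max = 350e-3   # maximum load current, A
r_tsv = 20e-3         # TSV dc resistance, ohm
n_r, n_c = 4, 16      # power converters and cores in the 16-core example
w_switch = 4e-3       # power switch width, m
l_switch = 130e-9     # power switch length, m

ir_drop_mv = i_load_max * r_tsv * 1e3         # IR-drop across one TSV, mV
n_switches = n_r * n_c                        # fully-connected network
switch_area_um2 = w_switch * l_switch * 1e12  # area of one power switch, um^2

print(f"IR-drop: {ir_drop_mv:.1f} mV, switches: {n_switches}, "
      f"switch area: {switch_area_um2:.0f} um^2")
```

The results agree with the 7mV IR-drop, the 64 power switches and the $520\mu m^2$ switch size stated above.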

#### B. System Model Verification

1) Runtime Load Power Tracking and Prediction: To decide the demanded voltage-level $v_d(c_i)$ of core $c_i$ under the space-time multiplexing power management, the core power $p(c_i)$ needs to be tracked and predicted. In this paper, the power tracking and prediction is based on an auto-regression (AR) algorithm [22]. At each sampling time $t$, the latest sampled load power $p_{t-1}$ can be read from the current and voltage sensors. Along with the previously recorded load power information $p_{t-2}, p_{t-3}, \ldots$, the transient power $p_t$ needed for the next time interval can be predicted by

$$p_t = \sum_{i=1}^{K} a_i \cdot p_{t-i} + \epsilon \tag{5}$$

where $K$ is the order of the model, $a_i$ are the auto-regression coefficients, and $\epsilon$ is the prediction error. $K$ is set to 8 to guarantee the precision of prediction while limiting the complexity of calculation. The coefficients can be calculated by a core with the least-squares method and updated in every prediction cycle. Based on the predicted power consumption, the required voltage-level is looked up in a pre-defined table, which will be explained in the following paragraph. As such, the demanded voltage-levels can be provided as input in a runtime fashion.

| Item            | Description         | Symbol     | Value                      | Size           |
|-----------------|---------------------|------------|----------------------------|----------------|
| Microprocessor  | Performance         | N.A.       | 410 DMIPS                  | $1.5mm^2$      |
|                 | Frequency           | $f_c$      | 250MHz                     |                |
|                 | Power Consumption   | $P_c$      | 0.4W                       |                |
| Power Converter | Input Voltage       | $V_{in}$   | 2.4V                       | $1.6mm^2$      |
|                 | Output Voltage      | $V_{out}$  | 0.6V, 0.8V, 1.0V, 1.2V     |                |
|                 | Load Current        | $I_{load}$ | 120mA, 150mA, 220mA, 350mA |                |
|                 | Flying Capacitance  | $C_{fly}$  | 18nF                       |                |
|                 | Number of Phases    |            | 2                          |                |
|                 | Inductor per Phase  |            | 1nH                        |                |
|                 | Switching Frequency |            | 50-200MHz                  |                |
|                 | Peak Efficiency     | N.A.       | 77%                        |                |
| TSV             | Length              | $l$        | $25\mu m$                  | $500\mu m^2$   |
|                 | Diameter            | $W$        | $5\mu m$                   |                |
|                 | Isolation Film      | $r$        | 120nm                      |                |
|                 | Resistance          |            | $20m\Omega$                |                |
|                 | Capacitance         | $C_{TSV}$  | 37fF                       |                |
| Power Switch    | Width               | $W_s$      | 4mm                        | $520\mu m^2$   |
|                 | Length              |            | 130nm                      |                |
|                 | Switching Time      | N.A.       | 300ns                      |                |

TABLE II: System settings of 3D many-core microprocessors, power converters, TSVs and reconfigurable power switch network

TABLE III: Space-time multiplexing: average power consumption and controller runtime

| Core Number | Benchmarks                               | Power per Core, Space-Time (mW) | Power per Core, Non-DVFS (mW) | Power Saving (%) | Controller Runtime (ms) |
|-------------|------------------------------------------|---------------------------------|-------------------------------|------------------|-------------------------|
| 4           | Group 1: art, eon, lucas, wupwise        | 279.50                          | 393.71                        | 29.01            | 7.30                    |
|             | Group 2: apsi, gcc, gzip, mcf            | 168.32                          | 349.34                        | 51.82            | 9.50                    |
|             | Group 3: facerec, galgel, twolf, crafty  | 224.95                          | 366.14                        | 38.56            | 7.20                    |
|             | Group 4: vortex, parser, mgrid, sixtrack | 240.06                          | 385.85                        | 37.78            | 10.70                   |
| 8           | Group 1 + Group 2                        | 223.17                          | 371.53                        | 39.93            | 25.00                   |
|             | Group 1 + Group 3                        | 252.24                          | 379.93                        | 33.61            | 27.10                   |
|             | Group 1 + Group 4                        | 260.04                          | 389.78                        | 33.29            | 37.00                   |
|             | Group 2 + Group 3                        | 195.34                          | 357.74                        | 45.40            | 21.70                   |
|             | Group 2 + Group 4                        | 202.65                          | 367.60                        | 44.87            | 30.40                   |
|             | Group 3 + Group 4                        | 231.71                          | 376.00                        | 38.38            | 29.80                   |
| 16          | All Groups                               | 309.38                          | 373.36                        | 17.22            | 50.80                   |
| 32          | All Groups                               | 319.93                          | 374.14                        | 14.49            | 336.30                  |
|             | Average                                  | 242.27                          | 373.79                        | 35.36            | N.A.                    |
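The auto-regression prediction in (5) with least-squares coefficients can be sketched in a few lines. This is an illustrative pure-Python version, not the code run on the cores: the function names are hypothetical, the normal equations are solved by plain Gaussian elimination, and the error term $\epsilon$ is dropped when predicting. The order `k` is a parameter, and the text uses $K = 8$.

```python
from typing import List, Sequence


def fit_ar(history: Sequence[float], k: int) -> List[float]:
    """Least-squares fit of a_1..a_k in p_t = sum_i a_i * p_{t-i}."""
    rows = [[history[t - i] for i in range(1, k + 1)]
            for t in range(k, len(history))]
    targets = [history[t] for t in range(k, len(history))]
    # Normal equations (X^T X) a = X^T y.
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    b = [sum(r[p] * y for r, y in zip(rows, targets)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("history does not determine the coefficients")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * k
    for r in reversed(range(k)):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


def predict_next(history: Sequence[float], coeffs: Sequence[float]) -> float:
    """Equation (5) without the error term: p_t = sum_i a_i * p_{t-i}."""
    return sum(a_i * history[-i] for i, a_i in enumerate(coeffs, start=1))
```

In use, `fit_ar` would be called on the recent power samples of a core once per prediction cycle, and `predict_next` would supply the power value that is then mapped to a voltage-level.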

Fig. 4 shows the power tracking and prediction results for benchmark *gcc*. Fig. 4(a) shows that the predicted power profile (red) closely matches the actual one (blue). In addition, Fig. 4(b) illustrates that the predicted power consumption can be successfully used to track the demanded voltage-level of one core. Since the power consumption increases with the supplied voltage-level, larger power consumption requires a higher supply voltage-level. As such, one lookup table is built with each voltage-level $v_i$ corresponding to a certain range of power consumption, and the predicted power consumption is used to look up the demanded voltage-level as shown in Fig. 4(b). The four power ranges and their corresponding voltage-levels in the lookup table are: $(<0.17W, 0.6V)$, $(0.17W \sim 0.19W, 0.8V)$, $(0.19W \sim 0.21W, 1.0V)$ and $(>0.21W, 1.2V)$. For example, when the power demand exceeds 0.22W, the voltage-level is set to 1.2V to meet the power demand.
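The lookup from predicted power to demanded voltage-level can be written as a short function. The thresholds are the ones listed above. How the exact boundary values (0.17W, 0.19W, 0.21W) are treated is not stated, so assigning them to the higher voltage-level is an assumption of this sketch, as are the names used.

```python
# (upper power bound in W, voltage-level in V) for the three lower ranges.
LOOKUP = [(0.17, 0.6), (0.19, 0.8), (0.21, 1.0)]
V_TOP = 1.2  # voltage-level for the highest power range


def demanded_voltage(p_pred: float) -> float:
    """Map a predicted core power (W) to the demanded voltage-level (V)."""
    for upper, level in LOOKUP:
        # Assumption: a value exactly on a boundary goes to the higher level.
        if p_pred < upper:
            return level
    return V_TOP
```

For instance, `demanded_voltage(0.22)` returns 1.2, which matches the example in the text.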

Fig. 4: Runtime power tracking and prediction for benchmark *gcc*: (a) power prediction; (b) voltage-level transition

2) Space-time Multiplexing State Transition: To verify the correctness of the proposed space-time multiplexing, we take the 16-core microprocessor as an example. Fig. 5 illustrates one typical state transition during power management, in which different filling-shapes represent different power-station settings. At the beginning of the control cycle, the demanded voltage-level of each core is tracked and predicted, which makes the current system state (i.e., top-left) invalid. As such, the ILP search is triggered to find the optimal state to transit to, which is shown in the top-right of the figure. The corresponding voltage-level transitions of *core 4*, *core 6*, *core 7* and *core 12* (i.e., one core from each power station) are extracted and plotted in the bottom-right of Fig. 5, which shows the correct transition for each core to match its demanded voltage-level. For example, at the beginning *core 4* is connected to power converter A with 1.0V output. When the power management controller decides its next-cycle voltage-level to be 1.2V, the connection of *core 4* is adjusted so that it is connected to power converter B with 1.2V output. Then *core 4* stays in a stable state until the next power management control cycle.

Fig. 5: Space-time multiplexing with voltage-level transition

#### C. Power Saving Comparison

We further show the advantage of the proposed space-time multiplexing power management. To eliminate any benchmark induced bias, we randomly generate four groups of benchmarks selected from SPEC-2000 benchmark, and combine these groups for different cores to run.

Table III compares the average power consumption per core between the proposed space-time multiplexing power management (i.e., the third column) and the case without using power management (i.e., the fourth column). For example, in the first group of benchmark with the core number of 4, the space-time multiplexing power management will achieve 29.01% power saving over that without using power management. As shown in Table III, 35.36% of power saving is achieved on average by utilizing the proposed space-time multiplexing.
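The saving figures in Table III can be reproduced from the two power columns. The check below is a small sketch with an illustrative function name. It recomputes the first row and the average of the listed per-row savings.

```python
def power_saving(p_managed_mw: float, p_baseline_mw: float) -> float:
    """Relative power saving in percent: (1 - P_managed / P_baseline) * 100."""
    return (1.0 - p_managed_mw / p_baseline_mw) * 100.0


# Per-row power savings as listed in Table III, in percent.
savings = [29.01, 51.82, 38.56, 37.78, 39.93, 33.61,
           33.29, 45.40, 44.87, 38.38, 17.22, 14.49]

group1 = power_saving(279.50, 393.71)   # 4 cores, Group 1
average = sum(savings) / len(savings)   # mean over the 12 configurations
print(f"Group 1: {group1:.2f}%, average: {average:.2f}%")
```

The average reported in the table is therefore the mean of the per-configuration savings.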

In addition, the average runtime of the controller, including ILP solving, is listed in the last column and is typically in the millisecond scale. The results indicate the potential advantages of the proposed space-time multiplexing based power management for 3D many-core microprocessors and on-chip power converters.

## V. CONCLUSION

With the introduction of a 3D reconfigurable power switch network, this paper explores space-time multiplexing power management for demand-supply matching between on-chip power converters and many-core microprocessors. The power switch network is configured to perform space-time multiplexing between power converters and cores through vertical TSVs. Integer linear programming is applied to find the optimal matched solution. The proposed approach can maximally utilize power converters to supply the voltage-levels demanded by cores. As verified by a system-level behavior model implemented in SystemC-AMS, experiment results show that space-time multiplexing can reduce power by 35.36% on average when compared to the one without the proposed power management.

## REFERENCES

- [1] S. Vangal et al., "An 80-Tile 1.28TFLOPS network-on-chip in 65 nm CMOS," in *IEEE Intl. Solid-State Circuits Conf.*, 2007, pp. 98–108.
- [2] S. Bell et al., "TILE64<sup>TM</sup> processor: a 64-core SoC with mesh interconnect," in *IEEE Intl. Solid-State Circuits Conf.*, 2008, pp. 88–98.
- [3] M. Healy et al., "Design and analysis of 3D-MAPS: a many-core 3D processor with stacked memory," in *IEEE Custom Integrated Circuits Conf.*, 2010, pp. 1–4.
- [4] S. Garg et al., "Technology-driven limits on DVFS controllability of multiple voltage-frequency island designs: a system-level perspective," in *ACM/IEEE Design Automation Conf.*, 2009, pp. 818–821.
- [5] W. Kim et al., "System level analysis of fast, per-core DVFS using on-chip switching regulators," in *IEEE Int. Symp. on High Perf. Computer Arch.*, 2008, pp. 123–134.
- [6] F. Luo and D. Ma, "Integrated adaptive step-up/down switching DC-DC converter with tri-band tri-mode digital control for dynamic voltage scaling," in *IEEE Int. Symposium on Industrial Electronics*, 2008, pp. 142–147.
- [7] J. Wibben and R. Harjani, "A high-efficiency DC-DC converter using 2 nH integrated inductors," *IEEE J. of Solid-State Circuits*, vol. 43, no. 4, pp. 844–854, 2008.
- [8] R. Bondade and D. Ma, "Hardware-software codesign of an embedded multiple-supply power management unit for multicore SoCs using an adaptive global/local power allocation and processing scheme," *ACM Trans. on Design Auto. of Electronic Syst.*, vol. 16, no. 3, p. 31, 2011.
- [9] D. Ma et al., "Single-inductor multiple-output switching converters with time-multiplexing control in discontinuous conduction mode," *IEEE J. of Solid-State Circuits*, vol. 38, no. 1, pp. 89–100, 2003.
- [10] M. Huang et al., "Single-inductor multi-output (SIMO) DC-DC converters with high light-load efficiency and minimized cross-regulation for portable devices," *IEEE J. of Solid-State Circuits*, vol. 44, no. 4, pp. 1099–1111, 2009.
- [11] T. Burd et al., "A dynamic voltage scaled microprocessor system," *IEEE J. of Solid-State Circuits*, vol. 35, no. 11, pp. 1571–1580, 2000.
- [12] Y. Panov and M. Jovanovic, "Design considerations for 12-V/1.5-V, 50-A voltage regulator modules," *IEEE Trans. on Power Electronics*, vol. 16, no. 6, pp. 776–783, 2001.
- [13] N. Sturcken et al., "A 2.5D integrated voltage regulator using coupled-magnetic-core inductors on silicon interposer delivering 10.8A/mm<sup>2</sup>," in *IEEE Intl. Solid-State Circuits Conf.*, 2012, pp. 400–402.
- [14] "MIPS processor cores," http://www.mips.com/products/processor-cores/.
- [15] "SPEC 2000 CPU benchmark suites," http://www.spec.org/cpu/.
- [16] D. Brooks et al., "Wattch: a framework for architectural-level power analysis and optimizations," in *ACM Int. Symposium on Computer Architecture*, 2000, pp. 83–94.
- [17] "ILP solver 5.5," http://lpsolve.sourceforge.net/5.5/.
- [18] W. Kim et al., "A fully-integrated 3-level DC/DC converter for nanosecond-scale DVFS," *IEEE J. of Solid-State Circuits*, vol. 47, no. 1, pp. 206–219, 2012.
- [19] S. Majzoub et al., "Energy optimization for many-core platform: communication and PVT aware voltage-island formation and voltage selection algorithm," *IEEE Trans. on Computer Aided Design*, vol. 29, no. 5, pp. 816–829, 2010.
- [20] V. der Plas et al., "Design issues and considerations for low-cost 3D TSV IC technology," in *IEEE Intl. Solid-State Circuits Conf.*, 2010, pp. 148–149.
- [21] G. Katti et al., "Electrical modeling and characterization of through silicon via for three-dimensional ICs," *IEEE Trans. on Electron Devices*, vol. 57, no. 1, pp. 256–262, 2010.
- [22] "AutoRegression Analysis," http://paulbourke.net/miscellaneous/ar/.