# Adaptive Multi-Voltage Scaling in Wireless NoC for High Performance Low Power Applications

Hemanta Kumar Mondal, Sri Harsha Gade, Raghav Kishore, Sujay Deb

Department of Electronics and Communication Engineering, Indraprastha Institute of Information Technology, Delhi, India. Email: {hemantam, harshag, raghav1467, sdeb}@iiitd.ac.in

Abstract—Networks-on-Chip (NoCs) have garnered significant interest as the communication backbone for multicore processors used across a wide range of fields that demand higher computation capability. Wireless NoCs (WNoCs), which augment wired interconnects with single-hop, long-range wireless links, offer the most promising solution to reduce multi-hop long-distance communication bottlenecks and open up numerous possibilities for topological innovations that are not possible otherwise. However, energy consumption in routers and Wireless Interface (WI) components still remains considerably high. Specifically, for large systems with many nodes in the network, a significant amount of energy is consumed by the communication infrastructure (routers, links, WIs). The usage of routers and WIs is application dependent, and in most cases performance requirements can be met without operating the whole communication infrastructure at its maximum limit. Dynamically reconfigurable systems that can switch between high performance and low power modes can cater to a wide range of applications. In this paper, we propose a novel design methodology for energy-efficient WNoCs using Adaptive Multi-voltage Scaling (AMS) to reduce dynamic power consumption, along with power gating to prevent static power dissipation in routers and WIs. We evaluate our proposed design with real and synthetic traffic patterns. This approach saves up to 62.50% of WI static power with less than 1% area overhead. Across different traffic scenarios, the proposed WNoC reduces overall packet energy dissipation by about 35% on average compared to a regular WNoC, without significant performance degradation. Design considerations for augmenting existing WNoCs with these routers and the corresponding overheads are also presented.

Keywords— wireless network-on-chip; energy efficient on-chip communication; adaptive multi-voltage scaling; power-gating; high performance; low power design

# I. INTRODUCTION

Chip MultiProcessors (CMPs) are gaining significant interest for a wide range of applications: consumer electronics, single-chip cloud computers, supercomputers, defense applications, etc. These CMPs contain a large number of processing elements to meet application needs and performance requirements. Network-on-Chip (NoC) architectures are fast becoming the preferred infrastructure for on-chip communication in these CMP platforms. NoCs are highly scalable and allow a high level of integration in CMPs. A generic CMP-NoC platform that can be reconfigured to meet the requirements of different applications (throughput, power, operations per second, etc.) can make system design easier and help meet time-to-market constraints. Today's processing cores are efficient enough to meet the performance requirements of any application within a moderate power budget. NoCs offer an efficient communication backbone, and their capabilities are further enhanced by augmenting them with single-hop, long-range wireless links to build Wireless NoC (WNoC) architectures [1-5]. With efficient application mapping and interconnection of different cores using a WNoC, a reconfigurable CMP can be designed to switch between different types of applications, both low power and high performance. A major challenge for such reconfigurable systems is the high power consumption of the required communication infrastructure, specifically routers and WIs.

Fig. 1 shows a WNoC architecture with the proposed modifications and the associated router design along with the WI. A Hybrid Router (HR) consists of WI components, viz., serializer/deserializer buffers, a Low Noise Amplifier (LNA), a Power Amplifier (PA), antennas, etc., along with a standard Base Router (BR). The NoC consumes a significant portion of total chip power [6][7], and the addition of WIs further increases power consumption in NoC routers. The problem becomes more acute in systems with a large number of nodes. To make NoCs energy efficient while still meeting the communication requirements of multicore systems, power consumption in BR components and WIs must be reduced without hindering their performance. As the number of routers increases, the utilization of any single router in a given time frame can vary from 0% to 100% depending on the application workload. For example, Fig. 2 shows router utilization in a 16-core system for the PARSEC [8] and SPLASH-2 [9] benchmarks. For these benchmarks, very few routers are highly utilized, and two-thirds of the routers have utilization below 30%. A similar trend is observed across different applications, with only a few NoC routers having very high utilization and many routers having low to moderate utilization (0% to 30%). This can be exploited to implement aggressive power-saving techniques that make NoC architectures more energy efficient without affecting overall performance.

Dynamic voltage scaling schemes have traditionally been employed to save dynamic power in processing elements, since dynamic power scales quadratically with supply voltage. In this work, we present an Adaptive Multi-voltage Scaling (AMS) technique to reduce dynamic power dissipation in the BR by exploiting low-utilization phases of each router. We further reduce power consumption by applying power gating to minimize static (leakage) power in router and WI components without significant overall performance degradation. A distributed power management controller implements the AMS and power-gating techniques, yielding fine-grained energy savings at each router. By adaptively scaling the voltage based on router utilization, we save a significant portion of dynamic power. When router utilization is extremely low, we turn the router off completely by power gating it. To maximize energy savings, we compute router utilization at both the global and the local level. Global utilization is pre-computed with knowledge of the specific application that will run on the system. Local utilization is computed dynamically using current and predicted future usage trends. For WIs, since most existing WNoC architectures [1-5] use them for any-to-any communication, only one wireless link can be active at any given time. We exploit this to turn off all WIs except the pair actively involved in wireless transmission. Because the proposed method uses a distributed approach, control signals would be required to wake up a WI whenever data needs to be transmitted to it. To avoid these control signals, we implement a receiver-end control mechanism in the HR that detects a signal at the WI antenna as soon as it is received and then wakes up the LNA, as illustrated in Fig. 1. This avoids exchanging control signals between far-apart HRs.

Fig. 1. Proposed WNoC architecture with base router (BR) and hybrid router (HR) along with receiver-end control unit

The use of a distributed AMS scheme with global and local router utilization allows us to reconfigure a system for a range of applications with varying performance requirements. For applications that do not require very high performance, the system can be configured to reduce the number of active routers to save power, and vice versa for high-performance applications. The main contributions of this paper are:

- 1) A novel distributed AMS scheme to reduce dynamic and static power consumption in base routers based on router utilization.
- 2) A reconfigurable CMP with WNoC that can be switched between a range of applications with different performance requirements.
- 3) Power gating of WIs through the AMS controller (AMSC) to ensure power-efficient use of wireless communication.

The paper is organized as follows. Section II briefly describes related work. Section III explains the wireless NoC architecture and the hardware implementation of the AMS controller. Section IV presents experimental results, and Section V concludes this work.

# II. RELATED WORK

Dynamic voltage/frequency scaling (DVFS) and power gating techniques have been widely used to reduce dynamic and static power dissipation, respectively, in the processing elements of multicore systems. These techniques have recently been extended to NoCs to make them more energy efficient. Dynamic scaling schemes reduce the system's operating voltage, frequency, or both during idle phases of execution to reduce energy consumption without affecting performance. Different algorithms have been employed to identify low-activity phases of NoC routers. DVFS pruning for WNoCs is explored in [10] and improves the energy-delay product under various real traffic patterns. A DVFS-enabled sustainable WNoC architecture has been proposed to facilitate the design of energy- and thermally-efficient sustainable multicore chips [11]. A history-based DVS policy driven by link utilization is discussed in [12] to reduce network energy consumption. All these methods save dynamic power in the NoC without affecting performance by appropriately reducing the voltage/frequency of routers.

Power gating schemes, on the other hand, cut off the power supply to a router when it is not active, reducing static leakage power. The authors of [13] proposed a centralized power management controller and evaluated a sleep-transistor-based power-gated transceiver for low-power on-chip wireless links. A centralized controller that implements DVFS for processing cores and power gating in hybrid wireless routers is proposed in [14]. Centralized controllers add less area and power overhead than distributed controllers, but because of their increased complexity they incur long delays in transmitting control signals to different cores/clusters. Distributed power-gating controllers operating at various granularity levels have been introduced to save static leakage power [15-21]. FlexiBuffer [16] applies fine-grained power gating to buffers to reduce leakage power and can operate with minimal changes to flow control. NoRD (Node-Router Decoupling) [17], a power-gating bypass technique, decouples a node's ability to transfer packets from the power state of its associated router. Panthre [18], a power-aware routing and topology reconfiguration scheme, uses power gating to provide selected units with long intervals of uninterrupted sleep.

In this work, we aim to design a router that implements both a dynamic scaling scheme to save switching power and a power-gating technique to save static power, reducing the power/energy consumption of WNoC architectures. We implement a distributed AMSC for routers to save power in both WIs and base router components. The proposed method promises to significantly increase the power efficiency of on-chip wireless communication infrastructure. We present a detailed performance evaluation of the WNoC architecture with the AMS method and explore the performance overhead and associated trade-offs of realizing the proposed WNoC architecture.

Fig. 2. Utilization of routers under different applications for a 16 core system

# III. ENERGY-EFFICIENT WNOC ARCHITECTURE

In this section, we discuss the design of the energy-efficient WNoC architecture, the AMS control strategy for router components, and the power-gating scheme for routers and WIs.

## A. Utilization-based AMS

The AMS control technique for router components uses both application-level global utilization and router-level dynamic local utilization. For a given application, based on global utilization over the entire application runtime, we categorize the routers into three zones: i) High Utilization Zone (HUZ), ii) Low Utilization Zone (LUZ), and iii) Rare Utilization Zone (RUZ). HUZ routers are actively involved in communication for most of the application runtime, whereas RUZ routers carry few or no packets. LUZ routers fall between HUZ and RUZ: they are used regularly but go through idle, low-utilization phases. The threshold values for categorizing routers into these three zones are application dependent and change with the type of application, so the number of routers and the actual router IDs falling into each category vary with the thresholds set for the application. The threshold values and number of routers falling into each category for the PARSEC and SPLASH-2 benchmarks are shown in Fig. 2. We set the threshold utilization values to 5% for RUZ and 75% for HUZ; as explained earlier, these values are user defined and set according to the application. Since routers in the RUZ have very low utilization, they are put in a permanently inactive state by setting their supply voltage to zero. This is a one-time process and remains unchanged for a given application. On the rare occasions when packets are directed to these power-gated routers, they are bypassed using express virtual channels [22] to avoid significant performance degradation. This allows us to reconfigure a large multicore system for low-power applications by appropriately turning off routers, or to configure it for very high performance applications by operating most of the routers in HUZ. The proposed technique thus allows switching between different application categories according to their performance requirements while optimizing power consumption. Adaptive voltage scaling is applied to the remaining active routers based on the dynamic utilization at each router.
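The zone assignment above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of strict versus inclusive comparisons at the 5% and 75% boundaries, and the representation of global utilization as a fraction of runtime are our assumptions.

```python
from enum import Enum


class Zone(Enum):
    RUZ = "rare"  # power gated (0 V) for the whole application run
    LUZ = "low"   # active; voltage adapted epoch by epoch
    HUZ = "high"  # active for most of the run; voltage adapted epoch by epoch


# Thresholds used for PARSEC/SPLASH-2 in the text; user-defined per application.
RUZ_THRESHOLD = 0.05
HUZ_THRESHOLD = 0.75


def classify(utilization, ruz_th=RUZ_THRESHOLD, huz_th=HUZ_THRESHOLD):
    """Map a router's global utilization (fraction of runtime active) to a zone."""
    if utilization < ruz_th:
        return Zone.RUZ
    if utilization >= huz_th:
        return Zone.HUZ
    return Zone.LUZ


def initial_configuration(profile):
    """One-time setup from a {router_id: global utilization} profile.

    Returns {router_id: (zone, powered)}; RUZ routers get 0 V and are
    bypassed (e.g. via express virtual channels) for the whole run.
    """
    config = {}
    for router_id, utilization in profile.items():
        zone = classify(utilization)
        config[router_id] = (zone, zone is not Zone.RUZ)
    return config


if __name__ == "__main__":
    demo = {0: 0.02, 1: 0.40, 2: 0.90}  # hypothetical global utilizations
    for rid, (zone, powered) in initial_configuration(demo).items():
        print(rid, zone.name, "on" if powered else "gated")
```

Because the thresholds are plain parameters, reconfiguring the system for a different application amounts to rerunning `initial_configuration` with that application's profile and thresholds.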

To save dynamic power in routers without affecting performance, we adaptively change the operating voltage of each router based on its estimated utilization. We estimate utilization on an epoch-to-epoch basis, where an epoch is a fixed- or variable-duration phase of the total simulation period. At the end of each epoch, the utilization of each router for the next epoch is computed (as described later) from the current load of the router and the distribution of utilization observed at that router during the past n epochs. Considering a larger number of past epochs may improve accuracy, but at the cost of additional memory overhead. We used data from the past 16 epochs in our simulations, which gave results as accurate as those obtained with a longer history. The distribution of past utilization relates the actual router utilization in an epoch to the load estimated at the start of that epoch. We set the operating voltage for the next epoch accordingly: if the estimated utilization for the next epoch is lower, we scale the voltage down, and vice versa. If the estimated utilization is very low, the router can be power gated for the next epoch. The epoch duration is application and user dependent and affects the savings obtained; a large epoch might miss variations in utilization, whereas too small an epoch results in excessive overhead from shifting voltages. Based on our observations, an epoch of 1k cycles gave significant energy savings even after accounting for the transient energy of switching between modes. Epochs shorter than 1k cycles result in frequent switching between states and increase transient energy consumption. We considered four voltage levels, viz., 0 V, 0.8 V, 1.0 V and 1.1 V. The hybrid switched inductor-capacitor regulator circuit from [10] is adopted to generate the required voltage levels. Using global-level utilization and AMS at the router level, the architecture can be reconfigured to minimize power (dynamic and static) dissipation for any given application.
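As a rough, illustrative check on the voltage levels above, the sketch below evaluates the standard CMOS dynamic-power relation $P_{dyn} = \alpha C V^2 f$ at each level. It assumes, hypothetically, that switching activity, capacitance and the 2.5 GHz clock of Table I stay fixed (voltage-only scaling) and that 1.0 V is the nominal reference; it also converts the 1k-cycle epoch into wall-clock time.

```python
VOLTAGE_LEVELS = (0.0, 0.8, 1.0, 1.1)  # supply levels from the text (volts)
NOMINAL_V = 1.0       # assumed reference (normal-state) voltage
CLOCK_HZ = 2.5e9      # clock frequency from Table I
EPOCH_CYCLES = 1_000  # epoch length chosen in the text


def relative_dynamic_power(v, v_nom=NOMINAL_V):
    """P_dyn = alpha * C * V^2 * f; with alpha, C and f fixed, P scales as (V / V_nom)^2."""
    return (v / v_nom) ** 2


def epoch_duration_ns(cycles=EPOCH_CYCLES, clock_hz=CLOCK_HZ):
    """Wall-clock length of one epoch in nanoseconds."""
    return cycles / clock_hz * 1e9


if __name__ == "__main__":
    for v in VOLTAGE_LEVELS:
        print(f"{v:.1f} V: {relative_dynamic_power(v):.2f}x nominal dynamic power")
    print(f"epoch = {epoch_duration_ns():.0f} ns")
```

Under these assumptions the 0.8 V level draws 0.64 of nominal dynamic power (a 36% reduction), 1.1 V draws 1.21 of nominal, and one 1k-cycle epoch lasts 400 ns.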

## B. Dynamic Utilization Computation

AMS is applied on the basis of the local runtime utilization of routers in each epoch. At each router, total utilization depends on the number of packets in its input buffers and on the packets that might be routed to it from all upstream routers one hop away. For all packets currently in the input buffers, the router under consideration needs to be active to route them to their destinations. At each upstream router, we determine the packets that will traverse the router under consideration; since that router will process these packets in subsequent stages, they add to its overall utilization in the next epoch. To compute this estimate at an upstream router, i) the header decoder (HD) first decodes the header flits of packets in all VCs, ii) the routing computation (RC) unit determines their routes from the HD output, and iii) the utilization computing unit (UCU) estimates the utilization of all downstream routers within one hop. This process is performed at every router, so the router under consideration receives estimated utilization from each of its upstream routers. Once estimates from all upstream routers are obtained, the total current load is computed from input buffer occupancy and the upstream utilization data. Using this load and the utilization distribution function from previous epochs, we compute the utilization for the next epoch. The AMS controller strategy is shown in Fig. 3. We divide router utilization into multiple levels; for our simulations, we considered four levels. The thresholds for each level are set by the user based on the applications run on the system. When computing downstream router utilization, current load, or total utilization, the UCU represents it as 2-bit data indicating the level into which the utilization falls rather than the actual value. The distribution function likewise maps a 2-bit initial estimate to a 2-bit actual utilization, so storing data for the past *n* epochs requires *4n* bits of memory at each router. The UCU at each router computes the probabilities of all levels and chooses the one with the highest probability. RUZ routers are excluded from AMS operation.

| Algorithm: AMS Controller mechanism on routers |
|------------------------------------------------|
| Initial: Pre-computed global utilization: HUZ, LUZ, RUZ |
| Supply_RoutersRUZ $\leftarrow$ 0V; set router states and utilization levels |
| AMS Control: for each epoch |
| Upstream Load $\leftarrow$ {North, South, East, West ports} |
| Input Load $\leftarrow$ Input buffers occupied |
| Total Load (TL) $\leftarrow$ Upstream Load + Input Load |
| Level Probabilities (LP) $\leftarrow f$ (Past Distribution, TL) |
| Utilization Estimate (UE) $\leftarrow$ Utilization (max(LP)) |
| if (UE > Current Utilization): Voltage $\uparrow$; |
| elseif (UE < Current Utilization): Voltage $\downarrow$; |
| else No Change |
| WI control: |
| Initial: All PA and LNA are kept in sleep mode. |
| **Tx:** while (Arbiter_Grant_WI == 1) |
| Supply_PA $\leftarrow$ Vdd; |
| Continue until whole message transferred |
| if (packet transfer complete) |
| Supply_PA $\leftarrow$ 0V; Arbiter_Grant_WI = 0; |
| **Rx:** while (Rx power > noise threshold) |
| Supply_LNA $\leftarrow$ Vdd; |
| Decode WI Address |
| if (Received Address != WI Address): Supply_LNA $\leftarrow$ 0V; |
| if (packet transfer complete): Supply_LNA $\leftarrow$ 0V; |

Fig. 3. AMS controller strategy for routers and WIs
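The level-probability step of Fig. 3 can be sketched as follows, assuming a lookup over the stored (initial estimate, actual level) pairs. How f(Past Distribution, TL) is realized is not specified beyond the description above, so the conditional-count rule, the tie-breaking toward the higher level, the fallback to the current load level when no history matches, and all names here are hypothetical. Counts are kept as integers, in the spirit of avoiding floating-point arithmetic, and the level boundaries are the 0-5/5-25/25-75/above-75% ranges given in Section III-D.

```python
from collections import deque

LEVELS = (1, 2, 3, 4)              # L1..L4, sent between routers as 2-bit codes
LEVEL_BOUNDS = (0.05, 0.25, 0.75)  # 0-5% | 5-25% | 25-75% | above 75%
HISTORY_EPOCHS = 16                # past epochs kept, as in the text


def quantize(utilization):
    """Map a utilization fraction to its level L1..L4."""
    level = 1
    for bound in LEVEL_BOUNDS:
        if utilization >= bound:
            level += 1
    return level


class UtilizationEstimator:
    """Hypothetical per-router UCU model keeping (initial level, actual level) pairs."""

    def __init__(self, history_epochs=HISTORY_EPOCHS):
        self.history = deque(maxlen=history_epochs)
        self._initial_level = None

    def estimate(self, total_load):
        """Return the most probable level for the next epoch given total load TL."""
        tl = quantize(total_load)
        self._initial_level = tl
        # Integer counts of actual levels seen after epochs that started at the same TL level.
        counts = {level: 0 for level in LEVELS}
        for initial, actual in self.history:
            if initial == tl:
                counts[actual] += 1
        if not any(counts.values()):
            return tl  # no matching history yet: trust the current load
        # Highest count wins; ties go to the higher level to protect performance.
        return max(LEVELS, key=lambda level: (counts[level], level))

    def record(self, actual_utilization):
        """At the end of the epoch, store the pair used by future estimates."""
        if self._initial_level is None:
            raise RuntimeError("estimate() must be called before record()")
        self.history.append((self._initial_level, quantize(actual_utilization)))
        self._initial_level = None

    def memory_bits(self):
        """Two 2-bit levels per stored epoch, i.e. 4n bits."""
        return 4 * self.history.maxlen
```

Because `deque(maxlen=16)` discards the oldest pair automatically, the stored history never exceeds the 64 bits implied by the 4n-bit figure for n = 16.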

## C. Wireless Interface Control

To reduce static power dissipation in WIs, power gating is implemented for WI components, and the control modules for power gating are built into the AMSC. To keep the WI overhead on the NoC infrastructure to a minimum, a noncoherent on-off keying (OOK) modulation scheme is used. Serializer/deserializer buffers act as data interfaces between the WI and the remaining router components. The PA and LNA amplify the transmitted and received signals, respectively. Of all WI components, the PA and LNA are the most power hungry, accounting for more than 60% of the WI's static energy dissipation, so we apply power gating only to these components. The antennas considered here can establish communication with any other antenna on the chip. As a result, this shared wireless medium can be accessed by only one pair of WIs at a time; multiple WIs need to be active only when broadcast or multicast messages are transmitted over the wireless medium.

The power-gating algorithm for WIs is shown in Fig. 3. Initially, all WI components are kept in sleep mode. When the routing strategy at an HR decides to use the WI and the associated WI has permission to transmit, the AMSC sends an active-high wakeup signal to the PA. At all other WIs, as soon as a signal is detected at the antenna, the AMSC sends a wakeup signal to the LNA. To differentiate between noise and an actual received signal, a comparator whose threshold is set to the noise floor of the wireless channel is connected to the antenna. All WIs, including the intended receiver, receive the signal, and as a result the LNAs of all WIs turn on even if the data is not meant for them. At each of these WIs, the header must be decoded to determine whether the data is intended for that WI before the LNA can be turned off, which leads to unwanted energy consumption. To reduce this, we assign a unique address to each WI and prepend this address to the data to be transmitted. At the receiving antenna, the signal is decoded for this WI address using a simple decoder circuit. If the address matches, the remaining data is sent into the router; if there is a mismatch, the AMSC immediately puts the LNA into sleep mode. As the number of WIs is significantly smaller than the total number of routers, the address needs only a few bits and decoding completes quickly, thereby reducing the active period of the LNA. All components in both the transmitting and receiving WIs are kept active during data transmission. When the tail flit is detected at the transmitter and receiver, the AMSC puts both WIs in sleep mode and wakes up the next transmitter in line.

Fig. 4. State machine diagram of AMS controller

Fig. 5. Proposed router with AMS controller
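The receiver-end WI control described above can be modeled with the small behavioral simulation below. It is a sketch under stated assumptions, not RTL: the `WirelessInterface` class and `wireless_transfer` function are hypothetical, a single received-power value stands in for the antenna comparator input, and the payload transfer itself is abstracted away.

```python
from dataclasses import dataclass


@dataclass
class WirelessInterface:
    """Hypothetical behavioral model of one WI's power-gated amplifiers."""
    address: int
    pa_on: bool = False    # Power Amplifier (transmit path)
    lna_on: bool = False   # Low Noise Amplifier (receive path)
    lna_wakeups: int = 0   # how often the comparator woke the LNA


def wireless_transfer(wis, src, dst, rx_power, noise_floor, arbiter_grant=True):
    """Send one packet from WI `src` to WI address `dst` over the shared medium.

    Returns the addresses whose LNA stayed awake for the payload. Every PA and
    LNA is back in sleep mode when the function returns.
    """
    if not arbiter_grant:               # no permission to use the medium
        return []
    sender = wis[src]
    sender.pa_on = True                 # Tx: Supply_PA <- Vdd
    receivers = []
    for wi in wis.values():
        if wi is sender:
            continue
        if rx_power > noise_floor:      # antenna comparator fires
            wi.lna_on = True            # Rx: Supply_LNA <- Vdd
            wi.lna_wakeups += 1
            if wi.address != dst:       # prepended address does not match
                wi.lna_on = False       # sleep right after the address bits
            else:
                receivers.append(wi.address)
    # ... payload flits cross the wireless link here until the tail flit ...
    sender.pa_on = False                # Supply_PA <- 0 V
    for wi in wis.values():
        wi.lna_on = False               # Supply_LNA <- 0 V
    return receivers
```

Every listening WI still pays for a short LNA wake-up (counted in `lna_wakeups`), which is why a short address field matters: it bounds how long the non-addressed LNAs stay on.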

## D. Adaptive Multi-Voltage Control

The proposed adaptive multi-voltage control relies only on router utilization information to dynamically change the operating voltage and reduce power consumption without affecting performance. It mainly comprises the UCU, a state machine controller and a voltage regulator. Router utilization is represented as the number of cycles for which the router is active in an epoch. The probability of each utilization level is computed as an integer with the required precision to avoid floating-point calculations. The actual utilization and initial estimated load levels of past epochs are stored in a table to determine the distribution function required for calculating level probabilities. In the current implementation, we consider four operating voltage states: low power ($S_{LP}$), normal ($S_N$) and high performance ($S_{HP}$), along with a power-gated state ($S_{PG}$). The local router utilization is divided into four levels: 0-5% ($L_1$), 5-25% ($L_2$), 25-75% ($L_3$) and above 75% ($L_4$). The number of states and utilization levels, and their ranges, are application dependent and can be chosen to meet application needs. All utilization information is represented and communicated as a 2-bit signal indicating one of the four levels. Using a 2-bit utilization level instead of the actual utilization value reduces the data to be communicated, thereby reducing interconnect overhead: only 2-bit-wide additional wires are required to communicate utilization information between routers.

The state machine diagram in Fig. 4 shows the possible transitions and corresponding conditions based on router utilization. When the system is powered on, all routers except RUZ routers start in the normal state; RUZ routers are power gated using the pre-computed global utilization. As the system runs, utilization estimates are made for each epoch, and the controller scales the voltage up or down according to a high or low utilization estimate. A light utilization estimate ($L_2$) immediately results in the low-power state being assigned for the next epoch, an $L_3$ estimate keeps the router in the same state, and an $L_4$ estimate assigns the high-performance state for the next epoch. For example, if a router is in state $S_N$ and the computed utilization is $L_2$, the assigned state for the next epoch will be $S_{LP}$. The voltage regulator circuit is adopted from [10]. For the power-gating circuit, a PMOS switch is used to minimize leakage power.
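The transition rules above can be written as a small next-state function. Two behaviors are our assumptions rather than statements from the text: an $L_1$ estimate power gates the router for the next epoch (following the remark in Section III-A that a very low estimated utilization can trigger power gating), and a power-gated, non-RUZ router that receives an $L_3$ estimate wakes into $S_N$. The mapping of $S_{PG}$, $S_{LP}$, $S_N$ and $S_{HP}$ onto 0 V, 0.8 V, 1.0 V and 1.1 V is likewise assumed.

```python
from enum import IntEnum


class State(IntEnum):
    PG = 0  # power gated
    LP = 1  # low power
    N = 2   # normal
    HP = 3  # high performance


# Assumed mapping of states onto the four supply levels listed in the text.
SUPPLY_V = {State.PG: 0.0, State.LP: 0.8, State.N: 1.0, State.HP: 1.1}


def next_state(current, level, is_ruz=False):
    """Next-epoch state from the current state and the estimated level (1..4 for L1..L4)."""
    if is_ruz:
        return State.PG   # RUZ routers stay gated and skip AMS
    if level == 1:
        return State.PG   # assumption: very low estimate -> gate for one epoch
    if level == 2:
        return State.LP   # light load -> low-power state immediately
    if level == 3:
        # keep the current state; a gated router wakes into S_N (assumption)
        return State.N if current == State.PG else current
    if level == 4:
        return State.HP   # heavy load -> high-performance state
    raise ValueError(f"unknown utilization level: {level}")
```

The example from the text, a router in $S_N$ with an $L_2$ estimate, maps to `next_state(State.N, 2)`, which returns `State.LP` and hence a 0.8 V supply under the assumed mapping.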

## E. Energy Efficient Hybrid Router Design

As shown in Fig. 1, the WNoC architecture consists of base routers (BRs) and hybrid routers (HRs) enhanced with WIs for efficient on-chip communication. The proposed router architecture with the AMSC is shown in Fig. 5. The AMSC receives 2-bit inputs carrying the estimated utilization from all upstream routers within one hop. Based on the utilization estimate from the UCU, it sends a control signal, CNTRL\_MV, to the voltage regulator circuit to set the appropriate voltage for the next epoch. Power-gating control of the PA and LNA depends on input signals from the BR arbiter, the comparator and the WI address decoder circuit. Based on the RC decision and permission for WI usage from the arbiter, the AMSC provides the appropriate PG\_PA control signal. If the comparator output is high, the PG\_LNA signal of the AMSC goes high to wake up the LNA at the receiver. As soon as the WI address decoder completes its operation, the AMSC disables the LNA if the decoder output is low.

# IV. PERFORMANCE EVALUATION

In this section, we discuss the detailed implementation, overheads and performance benefits of the proposed architecture and compare it with recently proposed WNoC architectures and power-saving methods. We characterize the proposed architecture using the cycle-accurate Noxim simulator [24]. Application-level traffic is collected from the GEM5 [25] full-system simulator running the PARSEC [8] and SPLASH-2 [9] benchmarks. A 16-core system is used for all evaluations, and the system specifications are presented in Table I. The width of all wired links equals the flit size, and wormhole routing is used. The network switches and the AMSC are synthesized with Synopsys Design Compiler in a 28 nm technology. We implemented the voltage regulator circuit using Cadence tools. We adopt an SA-based optimization technique for WI placement to maximize benefits, and each wireless link can sustain a data rate of 16 Gbps [1].

## A. Router Implementation with Overheads

To implement AMS, we add two components to the router architecture: the AMSC and a voltage regulator. Together, the AMSC and regulator occupy 761.01 $\mu$m<sup>2</sup> per router. The total area of the modified BR (including control units, buffers, crossbar, arbiter, RC, and VCs) is $9.72 \times 10^{-3}$ mm<sup>2</sup>. The areas of the transceiver circuits for T<sub>x</sub> and R<sub>x</sub> are 0.09 and 0.07 mm<sup>2</sup>, respectively [23]. The power-gating component for the WI occupies 100.18 $\mu$m<sup>2</sup>. Therefore, the total area of an HR with control unit and power gating is 0.161 mm<sup>2</sup>. The AMSC and regulator circuits add less than 1% silicon area overhead to the HR. Every router also incurs the interconnect overhead of 8-bit UGS signals.
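The area figures above can be cross-checked with a few lines of arithmetic. The text does not state exactly which parts the 0.161 mm² HR total includes, so the sketch computes the AMSC-plus-regulator overhead against two hypothetical totals, one without and one with the $9.72 \times 10^{-3}$ mm² base router; under either reading the overhead stays below 1%.

```python
UM2_PER_MM2 = 1e6

# Area figures quoted in the text.
AMSC_REGULATOR_UM2 = 761.01  # AMSC + voltage regulator, per router
WI_PG_UM2 = 100.18           # WI power-gating component
TX_MM2, RX_MM2 = 0.09, 0.07  # transceiver areas from [23]
BASE_ROUTER_MM2 = 9.72e-3    # modified BR


def hr_area_mm2(include_base_router):
    """Hypothetical HR total: transceiver + AMSC/regulator + WI power gating (+ BR)."""
    area = TX_MM2 + RX_MM2 + (AMSC_REGULATOR_UM2 + WI_PG_UM2) / UM2_PER_MM2
    return area + BASE_ROUTER_MM2 if include_base_router else area


def amsc_overhead_percent(total_mm2):
    """AMSC + regulator area as a percentage of a given HR total."""
    return 100 * (AMSC_REGULATOR_UM2 / UM2_PER_MM2) / total_mm2


if __name__ == "__main__":
    for with_br in (False, True):
        total = hr_area_mm2(with_br)
        print(f"BR included={with_br}: {total:.3f} mm^2, overhead {amsc_overhead_percent(total):.2f}%")
```

Excluding the BR reproduces the quoted 0.161 mm² with a 0.47% overhead; including it gives 0.171 mm² and 0.45%.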

## B. Scalability and Impacts

The proposed router design integrates additional components, viz., an on-chip regulator and an AMSC, with every router. The AMSC unit in a router requires data only from the same router and its one-hop neighboring routers. Since the additional components interact only with neighboring (single-hop) routers, the implementation remains unchanged as the system size grows, so the AMSC design approach can be extended to any number of cores. The proposed method achieves significant energy savings with little area overhead and is scalable to any system size.

The major impacts of power gating at a WI are wake-up latency and transient energy consumption. When a circuit changes state between sleep and wake-up modes, 114.59 pJ of transient energy is spent in the implemented power-gating switch, which is very small compared to the energy saved. The wake-up latency of a WI due to power gating is 0.14 ns, which is comparable with Panthre [18].

TABLE I. SIMULATION SETUP

| Parameter       | Value                                          |
|-----------------|------------------------------------------------|
| Topology        | 4×4 Mesh, 4×4 WNoC                             |
| Routing         | XY for baseline, NorthLast for wireless links  |
| Pipeline        | 3 stages                                       |
| Flit size       | 32 bits                                        |
| Packet size     | 64 flits                                       |
| Clock frequency | 2.5 GHz                                        |
| Workload        | Synthetic and real                             |

Fig. 6. Packet energy saving in percentage with AMS over non-AMS architecture under application-specific and synthetic traffic
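To put the 114.59 pJ transient energy in perspective, the sketch below computes the break-even sleep time, i.e., how long a WI must stay gated before the static power saved exceeds the energy spent on the transition. It assumes, as an illustration, that gating removes the 20 mW PA and LNA static power reported for [23] (a 65 nm figure, while synthesis here is at 28 nm) and that 114.59 pJ is spent once per sleep/wake-up transition.

```python
import math

TRANSIENT_ENERGY_J = 114.59e-12  # per sleep/wake-up transition (assumed)
GATED_STATIC_POWER_W = 20e-3     # PA + LNA static power from [23], 65 nm
CLOCK_HZ = 2.5e9                 # Table I


def break_even_time_s(transient_j=TRANSIENT_ENERGY_J, saved_w=GATED_STATIC_POWER_W):
    """Minimum sleep time for which gating saves more energy than it costs."""
    return transient_j / saved_w


def net_saving_j(sleep_s, transient_j=TRANSIENT_ENERGY_J, saved_w=GATED_STATIC_POWER_W):
    """Energy saved by one gated interval of length sleep_s (negative if too short)."""
    return saved_w * sleep_s - transient_j


if __name__ == "__main__":
    t_be = break_even_time_s()
    print(f"break-even: {t_be * 1e9:.2f} ns = {math.ceil(t_be * CLOCK_HZ)} cycles")
```

Under these assumptions the break-even point is about 5.73 ns, roughly 15 cycles at 2.5 GHz, so a WI that stays idle for longer than this saves net energy.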

## C. Energy Saving with Synthetic and Real Traffic

Fig. 6 presents the overall packet energy savings achieved by the proposed AMS technique for different benchmarks on a 16-core system. To capture the characteristics of both computation- and communication-intensive traffic, we consider benchmarks from PARSEC and SPLASH-2 along with synthetic traffic patterns. The percentages shown are the energy saved by AMS in a mesh topology relative to a mesh without AMS, and by AMS in a WNoC relative to a WNoC without AMS. On average, the proposed technique achieves around 35% energy savings across all benchmarks in both mesh and WNoC topologies.

We further reduce power dissipation by power gating WIs to cut their static power. The LNA and PA of a WI consume 20 mW of the 32 mW total transceiver power in 65-nm technology [23]. By power gating these components, the proposed technique saves up to 62.50% of the static power of WIs. Power consumption during different phases of HR operation is shown in Fig. 7. As the number of WIs increases, the static power saved also increases, since there are more opportunities to power gate WIs.
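A simple first-order model shows why the savings grow with the number of WIs. It is an illustration, not the paper's measurement: it assumes each WI draws 32 mW, of which the 20 mW PA and LNA share is gated whenever the WI is not part of the active pair, and it ignores the brief LNA wake-ups needed for address decoding as well as the transition energy.

```python
GATEABLE_MW = 20.0  # PA + LNA static power per WI [23]
TOTAL_MW = 32.0     # total transceiver power per WI [23]


def wi_static_saving(num_wis, active_wis=2):
    """Fraction of total WI power saved when all but `active_wis` WIs gate PA and LNA."""
    if not 0 <= active_wis <= num_wis:
        raise ValueError("active_wis must lie between 0 and num_wis")
    baseline_mw = num_wis * TOTAL_MW
    saved_mw = (num_wis - active_wis) * GATEABLE_MW
    return saved_mw / baseline_mw


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(f"{n:2d} WIs, one active pair: {wi_static_saving(n):.1%} saved")
    print(f"medium idle: {wi_static_saving(8, active_wis=0):.1%} saved")
```

The saving approaches the per-WI bound of 20/32 = 62.5% as more WIs sit idle, consistent with the trend described above.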

The router design with AMSC saves a significant amount of dynamic and static energy with less than 1% area overhead over the HR. Table II compares our technique with recently proposed DVFS/DVS and power-gating designs for NoCs. As the table shows, the proposed technique performs as well as or better than the dynamic scaling techniques in [10] and [12] in terms of dynamic energy savings. The sustainable DVFS method in [11] scales both frequency and voltage, which yields roughly cubic savings in dynamic power. Our technique, with voltage scaling alone, achieves comparable results and additionally reduces static power in WIs. The power-gating technique performs significantly better on WI static power, with 62.50% savings over the baseline WNoC.



Fig. 7. Power consumption of WI in different operating modes

TABLE II. Comparison with existing energy-efficient NoC architectures

| Ref.      | Approach                             | Power/energy saving                                                  | Penalty                            |
|-----------|--------------------------------------|----------------------------------------------------------------------|------------------------------------|
| [10]      | Pruning DVFS in WNoC                 | Energy-delay: 24.80%                                                 | 10% area overhead with WI          |
| [11]      | DVFS-enabled sustainable WNoC        | Overall energy: 60%                                                  |                                    |
| [12]      | DVS policy based on link utilization | Energy-delay: 36%                                                    | Negligible performance degradation |
| [20]      | Power Punch                          | Router static energy: 83%                                            | 0.4% execution time                |
| [18]      | Panthre                              | Overall network power: 14.5%                                         | 1.8% performance degradation       |
| [16]      | FlexiBuffer                          | Overall router power: 39%                                            | 3% throughput degradation          |
| [17]      | NoRD                                 | Static energy per router: 29.9%                                      | 3% area overhead                   |
| [21]      | Virtual channel power-gating         | Overall static power: 40%                                            | 0.3% throughput degradation        |
| This work | AMS-based WNoC                       | Overall packet energy: 20% to 62.43%; static power per WI: 62.50%    | Less than 1% area overhead         |

#### D. Performance Evaluation with Synthetic and Real Traffic

We evaluate the performance of the 16-core system with the proposed AMSC router architecture. To do so, we compare the throughput of mesh and WNoC topologies with and without the AMS scheme. Fig. 8 plots the network throughput as a function of packet injection rate. On average, the wireless links yield a higher global average throughput than the baseline wired architectures. Enabling AMS reduces throughput only marginally in all traffic scenarios, for both wired and wireless topologies. Hence, the proposed AMSC achieves significant energy savings without a considerable performance penalty.
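The "small reduction in throughput" claim above can be quantified by comparing peak throughput from injection-rate sweeps with and without AMS. The following minimal sketch assumes that each sweep maps a packet injection rate to the accepted throughput reported by the simulator. The sweep values and the flits/cycle/core unit in the usage lines are hypothetical, not results from Fig. 8, and the helper names are illustrative.

```python
def peak_throughput(sweep: dict[float, float]) -> float:
    """sweep maps packet injection rate -> accepted throughput; return the maximum."""
    if not sweep:
        raise ValueError("empty sweep")
    return max(sweep.values())


def throughput_reduction_pct(baseline: dict[float, float],
                             ams: dict[float, float]) -> float:
    """Percentage drop in peak throughput when AMS is enabled."""
    base_peak = peak_throughput(baseline)
    return 100.0 * (base_peak - peak_throughput(ams)) / base_peak


if __name__ == "__main__":
    # Hypothetical sweeps (injection rate -> flits/cycle/core), not measured data.
    wnoc = {0.01: 0.50, 0.02: 0.90, 0.03: 1.00}
    wnoc_ams = {0.01: 0.50, 0.02: 0.88, 0.03: 0.98}
    print(throughput_reduction_pct(wnoc, wnoc_ams))  # roughly 2 percent
```

Using the peak of the sweep, rather than throughput at a single injection rate, makes the comparison insensitive to where each configuration saturates.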

## ACKNOWLEDGMENT

This work is partially supported by the DST INSPIRE Faculty Fellowship granted by the Department of Science and Technology, Govt. of India.

## V. CONCLUSION

In this work, we propose an energy-efficient WNoC architecture for multicore CMPs that uses a novel AMS controller to reduce both dynamic and static power consumption. The proposed controller can also be used effectively in wired NoCs, where only the AMS technique is applied to the base router components. The utilization-based approach reduces the static power of the WIs in the hybrid router by up to 62.50%. It also reduces overall packet energy by 35% on average, with less than 1% silicon area overhead over the hybrid router.

## REFERENCES

- [1] S. Deb et al., "Design of an Energy-Efficient CMOS-Compatible NoC Architecture with Millimeter-Wave Wireless Interconnects," IEEE Transactions on Computers, vol. 62, no. 12, pp. 2382-2396, Dec. 2013.
- [2] DiTomaso et al., "iWISE: Inter-router Wireless Scalable Express Channels for Network-on-Chip (NoCs) Architectures," IEEE Symposium on High Performance Interconnects, pp. 11-18, Aug. 2011.
- [3] C. Wang et al., "A Wireless Network-on-Chip Design for Multicore Platforms," 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2011.
- [4] S. B. Lee et al., "A Scalable Micro Wireless Interconnect Structure for CMPs," MobiCom '09, pp. 217-228, Sept. 2009.
- [5] S. Deb et al., "CMOS Compatible Many-Core NoC Architectures with Multi-Channel Millimeter-Wave Wireless Links," ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 165-170, 2012.
- [6] Y. Hoskote et al., "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Micro, vol. 27, no. 5, pp. 51-61, Sept.-Oct. 2007.

Fig. 8. Peak global throughput for proposed and baseline architectures (Mesh, Mesh AMS, WNoC, WNoC AMS)

- [7] M. B. Taylor et al., "Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams," Proc. 31st Annual International Symposium on Computer Architecture (ISCA), pp. 2-13, June 2004.
- [8] C. Bienia, "Benchmarking Modern Multiprocessors," Ph.D. dissertation, Princeton University, Jan. 2011.
- [9] S. C. Woo et al., "The SPLASH-2 Programs: Characterization and Methodological Considerations," Proc. 22nd Annual International Symposium on Computer Architecture (ISCA), pp. 24-36, June 1995.
- [10] J. Murray et al., "DVFS Pruning for Wireless NoC Architectures," IEEE Design & Test, vol. 32, no. 2, pp. 29-38, Apr. 2015.
- [11] J. Murray et al., "DVFS-Enabled Sustainable Wireless NoC Architecture," Proc. IEEE International SOC Conference (SOCC), pp. 301-306, Sept. 2012.
- [12] R. Abbas et al., "Low-Energy GALS NoC with FIFO-Monitoring Dynamic Voltage Scaling," Microelectronics Journal, vol. 42, no. 6, pp. 889-896, 2011.
- [13] H. K. Mondal and S. Deb, "An Energy Efficient Wireless Network-on-Chip Using Power-Gated Transceivers," Proc. 27th IEEE International System-on-Chip Conference (SOCC), pp. 243-248, Sept. 2014.
- [14] H. K. Mondal et al., "An Efficient Hardware Implementation of DVFS in Multi-core System with Wireless Network-on-Chip," Proc. IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 184-189, July 2014.
- [15] H. Matsutani et al., "Run-Time Power Gating of On-Chip Routers Using Look-Ahead Routing," Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 55-60, Mar. 2008.
- [16] G. Kim et al., "FlexiBuffer: Reducing Leakage Power in On-Chip Network Routers," Proc. 48th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 936-941, June 2011.
- [17] L. Chen and T. M. Pinkston, "NoRD: Node-Router Decoupling for Effective Power-Gating of On-Chip Routers," Proc. 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 270-281, Dec. 2012.
- [18] R. Parikh, R. Das, and V. Bertacco, "Power-Aware NoCs through Routing and Topology Reconfiguration," Proc. 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6, June 2014.
- [19] S. T. Muhammad et al., "Traffic-Based Virtual Channel Activation for Low-Power NoC," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. PP, no. 99, pp. 1-1.
- [20] L. Chen, D. Zhu, M. Pedram, and T. M. Pinkston, "Power Punch: Towards Non-Blocking Power-Gating of NoC Routers," Proc. 21st IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 378-389, Feb. 2015.
- [21] A. Mirhosseini et al., "An Energy-Efficient Virtual Channel Power-Gating Mechanism for On-Chip Networks," Proc. Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1527-1532, Mar. 2015.
- [22] A. Kumar et al., "Express Virtual Channels: Towards the Ideal Interconnection Fabric," SIGARCH Computer Architecture News, vol. 35, no. 2, pp. 150-161, June 2007.
- [23] X. Yu et al., "Architecture and Design of Multichannel Millimeter-Wave Wireless NoC," IEEE Design & Test, vol. 31, no. 6, pp. 19-28, Dec. 2014.
- [24] F. Fazzino, M. Palesi, and D. Patti, "Noxim: Network-on-Chip Simulator," http://noxim.sourceforge.net.
- [25] N. Binkert et al., "The gem5 Simulator," SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1-7, Aug. 2011.