5.4 Emerging technologies for better NoCs

Time	Label	Presentation Title Authors
08:30	5.4.1	SIPTERPOSER: A FAULT-TOLERANT SUBSTRATE FOR FLEXIBLE SYSTEM-IN-PACKAGE DESIGN Speaker: Pete Ehrett, University of Michigan, US Authors: Pete Ehrett, Todd Austin and Valeria Bertacco, University of Michigan, US Abstract As Moore's Law scaling slows down, specialized heterogeneous designs are needed to sustain computing performance improvements. Unfortunately, the non-recurring engineering (NRE) costs of chip design - designing interconnects, creating masks, etc. - are often prohibitive. Chiplet-based disintegrated design solutions could address these economic issues, but current technologies lack the flexibility to express a rich variety of designs without redesigning the communication substrate. Moreover, as the number of chiplets increases, yield suffers due to 2.5D assembly defects. This work addresses these problems by presenting a flexible communication fabric that supports construction of arbitrary network topologies and provides robust fault-tolerance, demonstrating near-100% chip assembly yield at typical bonding defect rates. We achieve these goals with less than 3% additional power and zero exposed latency overhead for various real-world applications running on an example SiP. Download Paper (PDF; Only available from the DATE venue WiFi)
09:00	5.4.2	WAVES: WAVELENGTH SELECTION FOR POWER-EFFICIENT 2.5D-INTEGRATED PHOTONIC NOCS Speaker: Aditya Narayan, Boston University, US Authors: Aditya Narayan¹, Yvain Thonnart², Pascal Vivet², César Fuguet Tortolero² and Ayse Kivilcim Coskun¹ ¹Boston University, US; ²Univ. Grenoble Alpes, CEA-Leti, FR Abstract Photonic Network-on-Chips (PNoCs) offer promising benefits over Electrical Network-on-Chips (ENoCs) in manycore systems owing to their lower latencies, higher bandwidth, and lower energy-per-bit communication with negligible data-dependent power. These benefits, however, are limited by a number of challenges. Microring resonators (MRRs) that are used for photonic communication have high sensitivity to process variations and on-chip thermal variations, giving rise to possible resonant wavelength mismatches. State-of-the-art microheaters, which are used to tune the resonant wavelength of MRRs, have poor efficiency resulting in high thermal tuning power. In addition, laser power and high static power consumption of drivers, serializers, comparators, and arbitration logic partially negate the benefits of the sub-pJ operating regime that can be obtained with PNoCs. To reduce PNoC power consumption, this paper introduces WAVES, a wavelength selection technique to identify and activate the minimum number of laser wavelengths needed, depending on an application's bandwidth requirement. Our results on a simulated 2.5D manycore system with PNoC demonstrate an average of 23% (resp. 38%) reduction in PNoC power with only <1% (resp. <5%) loss in system performance. Download Paper (PDF; Only available from the DATE venue WiFi)
09:30	5.4.3	REGENT: A HETEROGENEOUS RERAM/GPU-BASED ARCHITECTURE ENABLED BY NOC FOR TRAINING CNNS Speaker: Biresh Joardar, Washington State University, US Authors: Biresh Joardar¹, Bing Li², Jana Doppa¹, Hai (Helen) Li², Partha Pratim Pande¹ and Krishnendu Chakrabarty² ¹Washington State University, US; ²Duke University, US Abstract The growing popularity of Convolutional Neural Networks (CNNs) has led to the search for efficient computational platforms to enable these algorithms. Resistive random-access memory (ReRAM)-based architectures offer a promising alternative to commonly used GPU-based platforms in this regard. However, backpropagation in CNNs is susceptible to the limited precision of ReRAMs. As a result, training CNNs on ReRAMs affects final accuracy of learned model. In this work, we propose REGENT, a heterogeneous architecture that combines ReRAM arrays with GPU cores using 3D integration and a high-throughput yet energy efficient Network-on-Chip (NoC) for high precision training. We also propose a bin-packing based framework that maps CNN layers and then optimizes the placement of computing elements to meet the targeted design objectives. Experimental evaluations indicate that the designed NoC can improve performance by 13.5% on average compared to another state-of-the-art counterpart. Also, REGENT improves full-system EDP on an average by 55.7% compared to conventional GPU-only platforms for training CNN workloads. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00	IP2-13, 134	ACDC: AN ACCURACY- AND CONGESTION-AWARE DYNAMIC TRAFFIC CONTROL METHOD FOR NETWORKS-ON-CHIP Speaker: Siyuan Xiao, South China University of Technology, CN Authors: Siyuan Xiao¹, Xiaohang Wang¹, Maurizio Palesi², Amit Kumar Singh³ and Terrence Mak⁴ ¹South China University of Technology, CN; ²University of Catania, IT; ³University of Essex, GB; ⁴University of Southampton, GB Abstract Many applications exhibit error forgiving features. For these applications, approximate computing provides the opportunity of accelerating the execution time or reducing power consumption, by mitigating computation effort to get an approximate result. Among the components on a chip, network-on-chip (NoC) contributes a large portion to system power and performance. In this paper, we exploit the opportunity of aggressively reducing network congestion and latency by selectively dropping data. Essentially, the importance of the dropped data is measured based on a quality model. An optimization problem is formulated to minimize the network congestion with constraint of the result quality. A lightweight online algorithm is proposed to solve this problem. Experiments show that on average, our proposed method can reduce the execution time by as much as 12.87% and energy consumption by 12.42% under strict quality requirement, speedup execution by 19.59% and reduce energy consumption by 21.20% under relaxed requirement, compared to a recent work on approximate computing approach for NoCs. Download Paper (PDF; Only available from the DATE venue WiFi)
10:01	IP2-14, 300	POWER AND PERFORMANCE OPTIMAL NOC DESIGN FOR CPU-GPU ARCHITECTURE USING FORMAL MODELS Speaker: Nader Bagherzadeh, University of California Irvine, US Authors: Lulwah Alhubail and Nader Bagherzadeh, University of California - Irvine, US Abstract Heterogeneous computing architectures that fuse both CPU and GPU on the same chip are common nowa-days. Using homogeneous interconnect for such heterogeneous processors each with different network demands can result in performance degradation. In this paper, we focused on designing a heterogeneous mesh-style network-on-chip (NoC) to connect heterogeneous CPU-GPU processors. We tackled three problems at once; mapping Processing Elements (PEs) to the routers of the mesh, assigning the number of virtual channels (VC), and assigning the buffer size (BS) for each port of each router in the NoC. By relying on formal models, we developed a method based on Strength Pareto Evolutionary Algorithm2 (SPEA2) to obtain the Pareto optimal set that optimizes communication performance and power consumption of the NoC. By validating our method on a full-system simulator, results show that the NoC performance can be improved by 17% while minimizing the power consumption by at least 2.3x and maintaining the overall system performance. Download Paper (PDF; Only available from the DATE venue WiFi)
10:00		End of session Coffee Break in Exhibition Area Coffee Breaks in the Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Breaks (Lunch Area) On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area. Tuesday, March 26, 2019 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30 Keynote Lecture "Leonardo da Vinci, Humanism and Engineering between Florence and Milan" by Claudio Giorgione in room 1 13:50 - 14:20 Coffee Break 16:00 - 17:00 Wednesday, March 27, 2019 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30 Keynote Lecture "Heterogeneous, High Scale Computing in the Era of Intelligent, Cloud-Connected" by David Pellerin, Amazon, US in room 1 13:50 - 14:20 Coffee Break 16:00 - 17:00 Thursday, March 28, 2019 Coffee Break 10:00 - 11:00 University Booth Best Demo Award Presentation at the University Booth 10:30 Lunch Break 12:30 - 14:00 Keynote Lecture "A Fundamental Look at Models and Intelligence" by Edward A. Lee, University of California, Berkeley, US in room 1 13:20 - 13:50 Coffee Break 15:30 - 16:00

Time

Label

Presentation Title
Authors

08:30

5.4.1

SIPTERPOSER: A FAULT-TOLERANT SUBSTRATE FOR FLEXIBLE SYSTEM-IN-PACKAGE DESIGN
Speaker:
Pete Ehrett, University of Michigan, US
Authors:
Pete Ehrett, Todd Austin and Valeria Bertacco, University of Michigan, US
Abstract
As Moore's Law scaling slows down, specialized heterogeneous designs are needed to sustain computing performance improvements. Unfortunately, the non-recurring engineering (NRE) costs of chip design - designing interconnects, creating masks, etc. - are often prohibitive. Chiplet-based disintegrated design solutions could address these economic issues, but current technologies lack the flexibility to express a rich variety of designs without redesigning the communication substrate. Moreover, as the number of chiplets increases, yield suffers due to 2.5D assembly defects. This work addresses these problems by presenting a flexible communication fabric that supports construction of arbitrary network topologies and provides robust fault-tolerance, demonstrating near-100% chip assembly yield at typical bonding defect rates. We achieve these goals with less than 3% additional power and zero exposed latency overhead for various real-world applications running on an example SiP.
Download Paper (PDF; Only available from the DATE venue WiFi)

09:00

5.4.2

WAVES: WAVELENGTH SELECTION FOR POWER-EFFICIENT 2.5D-INTEGRATED PHOTONIC NOCS
Speaker:
Aditya Narayan, Boston University, US
Authors:
Aditya Narayan¹, Yvain Thonnart², Pascal Vivet², César Fuguet Tortolero² and Ayse Kivilcim Coskun¹
¹Boston University, US; ²Univ. Grenoble Alpes, CEA-Leti, FR
Abstract
Photonic Network-on-Chips (PNoCs) offer promising benefits over Electrical Network-on-Chips (ENoCs) in manycore systems owing to their lower latencies, higher bandwidth, and lower energy-per-bit communication with negligible data-dependent power. These benefits, however, are limited by a number of challenges. Microring resonators (MRRs) that are used for photonic communication have high sensitivity to process variations and on-chip thermal variations, giving rise to possible resonant wavelength mismatches. State-of-the-art microheaters, which are used to tune the resonant wavelength of MRRs, have poor efficiency resulting in high thermal tuning power. In addition, laser power and high static power consumption of drivers, serializers, comparators, and arbitration logic partially negate the benefits of the sub-pJ operating regime that can be obtained with PNoCs. To reduce PNoC power consumption, this paper introduces WAVES, a wavelength selection technique to identify and activate the minimum number of laser wavelengths needed, depending on an application's bandwidth requirement. Our results on a simulated 2.5D manycore system with PNoC demonstrate an average of 23% (resp. 38%) reduction in PNoC power with only <1% (resp. <5%) loss in system performance.
Download Paper (PDF; Only available from the DATE venue WiFi)

09:30

5.4.3

REGENT: A HETEROGENEOUS RERAM/GPU-BASED ARCHITECTURE ENABLED BY NOC FOR TRAINING CNNS
Speaker:
Biresh Joardar, Washington State University, US
Authors:
Biresh Joardar¹, Bing Li², Jana Doppa¹, Hai (Helen) Li², Partha Pratim Pande¹ and Krishnendu Chakrabarty²
¹Washington State University, US; ²Duke University, US
Abstract
The growing popularity of Convolutional Neural Networks (CNNs) has led to the search for efficient computational platforms to enable these algorithms. Resistive random-access memory (ReRAM)-based architectures offer a promising alternative to commonly used GPU-based platforms in this regard. However, backpropagation in CNNs is susceptible to the limited precision of ReRAMs. As a result, training CNNs on ReRAMs affects final accuracy of learned model. In this work, we propose REGENT, a heterogeneous architecture that combines ReRAM arrays with GPU cores using 3D integration and a high-throughput yet energy efficient Network-on-Chip (NoC) for high precision training. We also propose a bin-packing based framework that maps CNN layers and then optimizes the placement of computing elements to meet the targeted design objectives. Experimental evaluations indicate that the designed NoC can improve performance by 13.5% on average compared to another state-of-the-art counterpart. Also, REGENT improves full-system EDP on an average by 55.7% compared to conventional GPU-only platforms for training CNN workloads.
Download Paper (PDF; Only available from the DATE venue WiFi)

10:00

IP2-13, 134

ACDC: AN ACCURACY- AND CONGESTION-AWARE DYNAMIC TRAFFIC CONTROL METHOD FOR NETWORKS-ON-CHIP
Speaker:
Siyuan Xiao, South China University of Technology, CN
Authors:
Siyuan Xiao¹, Xiaohang Wang¹, Maurizio Palesi², Amit Kumar Singh³ and Terrence Mak⁴
¹South China University of Technology, CN; ²University of Catania, IT; ³University of Essex, GB; ⁴University of Southampton, GB
Abstract
Many applications exhibit error forgiving features. For these applications, approximate computing provides the opportunity of accelerating the execution time or reducing power consumption, by mitigating computation effort to get an approximate result. Among the components on a chip, network-on-chip (NoC) contributes a large portion to system power and performance. In this paper, we exploit the opportunity of aggressively reducing network congestion and latency by selectively dropping data. Essentially, the importance of the dropped data is measured based on a quality model. An optimization problem is formulated to minimize the network congestion with constraint of the result quality. A lightweight online algorithm is proposed to solve this problem. Experiments show that on average, our proposed method can reduce the execution time by as much as 12.87% and energy consumption by 12.42% under strict quality requirement, speedup execution by 19.59% and reduce energy consumption by 21.20% under relaxed requirement, compared to a recent work on approximate computing approach for NoCs.
Download Paper (PDF; Only available from the DATE venue WiFi)

10:01

IP2-14, 300

POWER AND PERFORMANCE OPTIMAL NOC DESIGN FOR CPU-GPU ARCHITECTURE USING FORMAL MODELS
Speaker:
Nader Bagherzadeh, University of California Irvine, US
Authors:
Lulwah Alhubail and Nader Bagherzadeh, University of California - Irvine, US
Abstract
Heterogeneous computing architectures that fuse both CPU and GPU on the same chip are common nowa-days. Using homogeneous interconnect for such heterogeneous processors each with different network demands can result in performance degradation. In this paper, we focused on designing a heterogeneous mesh-style network-on-chip (NoC) to connect heterogeneous CPU-GPU processors. We tackled three problems at once; mapping Processing Elements (PEs) to the routers of the mesh, assigning the number of virtual channels (VC), and assigning the buffer size (BS) for each port of each router in the NoC. By relying on formal models, we developed a method based on Strength Pareto Evolutionary Algorithm2 (SPEA2) to obtain the Pareto optimal set that optimizes communication performance and power consumption of the NoC. By validating our method on a full-system simulator, results show that the NoC performance can be improved by 17% while minimizing the power consumption by at least 2.3x and maintaining the overall system performance.
Download Paper (PDF; Only available from the DATE venue WiFi)

10:00

End of session
Coffee Break in Exhibition Area

Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Coffee Break 10:30 - 11:30
Lunch Break 13:00 - 14:30
Keynote Lecture "Leonardo da Vinci, Humanism and Engineering between Florence and Milan" by Claudio Giorgione in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Wednesday, March 27, 2019

Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:30
Keynote Lecture "Heterogeneous, High Scale Computing in the Era of Intelligent, Cloud-Connected" by David Pellerin, Amazon, US in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Thursday, March 28, 2019

Coffee Break 10:00 - 11:00
University Booth Best Demo Award Presentation at the University Booth 10:30
Lunch Break 12:30 - 14:00
Keynote Lecture "A Fundamental Look at Models and Intelligence" by Edward A. Lee, University of California, Berkeley, US in room 1 13:20 - 13:50
Coffee Break 15:30 - 16:00