2.5 Reliability and Energy-Efficiency: Two Pillars of NoC Design


Date: Tuesday 28 March 2017
Time: 11:30 - 13:00
Location / Room: 3C

Chair:
Sebastien Le Beux, Ecole Centrale de Lyon, FR

Co-Chair:
Tushar Krishna, Georgia Institute of Technology, US

This session addresses challenges related to the energy efficiency and reliability of NoCs. The first paper proposes an analytical approach to evaluate the reliability of adaptive routing algorithms. The second paper proposes an online monitoring and routing approach to address aging-related degradation in electrical NoCs. Finally, the third paper shows how to use network traffic-aware spatial parallelism to improve the energy efficiency of the Epiphany SoC.

Time Label Presentation Title / Authors
11:30 2.5.1 (Best Paper Award Candidate)
RELIABILITY ASSESSMENT OF FAULT TOLERANT ROUTING ALGORITHMS IN NETWORKS-ON-CHIP: AN ANALYTIC APPROACH
Speaker:
Sadia Moriam, Technische Universität Dresden, DE
Authors:
Sadia Moriam and Gerhard Fettweis, Technische Universität Dresden, DE
Abstract
Rapid scaling of transistor gate sizes has significantly increased the density of on-chip integration and paved the way for many-core systems-on-chip with highly improved performance. The design of the interconnection network of these complex systems is critical, and the network-on-chip is now the accepted efficient interconnect for such large core arrays. An unfortunate adverse effect of technology scaling is the increased susceptibility to failures, resulting in failing links and routers in the network-on-chip. To keep the network connected, efficient fault-adaptive routing algorithms are necessary to route around faults. To design and evaluate the fault resiliency of such adaptive routing algorithms, fast, accurate and flexible analytic models are required, especially in large networks for which simulations are extremely time-consuming. In this paper, we present an analytic approach to evaluate the reliability of adaptive routing algorithms based on algebraic manipulations of the channel dependency matrix. It also allows evaluation of the number of alternate paths between source-destination pairs in the presence of any number of permanent faults in the network. The analytic model is general and can be adapted to evaluate network reliability for any network topology and with any adaptive routing algorithm based on the turn model. We present cycle-accurate simulations to assess the accuracy of the model for the 2-D mesh and the hexagonal networks. The model is able to estimate the network fault resilience with an accuracy of about 1% and more than 70 times faster than the cycle-accurate simulation.

12:00 2.5.2 ONLINE MONITORING AND ADAPTIVE ROUTING FOR AGING MITIGATION IN NOCS
Speaker:
Nader Bagherzadeh, University of California, Irvine, US
Authors:
Zana Ghaderi, Ayed Alqahtani and Nader Bagherzadeh, University of California, Irvine, US
Abstract
Scalability of Network-on-Chip (NoC) as a promising solution for many-core systems can be jeopardized by reliability challenges such as aging in advanced silicon technology. Previous mitigation techniques to protect NoCs are either offline, while aging is strictly influenced by runtime operating conditions, or impose significant overheads on the system. This paper presents an online monitoring method through a Centralized Aging Table (CAT) for routers in NoCs. A router's capacity in flits, which are the main stimuli in routers, is predictable and limited for a given period of time. Consequently, stress rate and temperature, which are the major sources of aging mechanisms such as Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI), will be in predictable ranges as well. Hence, our methodology uses a CAT that is populated by values that represent aging degradation for each pair of stress and temperature ranges during a given period of time. Furthermore, utilizing the CAT, we propose an online adaptive aging-aware routing algorithm that avoids highly aged routers, which eventually leads to age balancing between routers. Additionally, our proposed routing algorithm reduces the maximum age of routers by adaptively changing the shortest paths between source-destination pairs, considering the ages of the routers along them in each given period of time. Extensive experimental analysis using the gem5 simulator demonstrates that our online routing algorithm and monitoring methodology, CAT, improve the delay degradation of the maximum-aged router and the aging imbalance on average by 39% and 52%, respectively, compared to XY routing. The impact of our proposed methodology on network latency, Energy-Delay-Product (EDP) and link utilization is negligible.

12:30 2.5.3 EBSP: MANAGING NOC TRAFFIC FOR BSP WORKLOADS ON THE 16-CORE ADAPTEVA EPIPHANY-III PROCESSOR
Authors:
Siddhartha¹ and Nachiket Kapre²
¹Nanyang Technological University, SG; ²University of Waterloo, CA
Abstract
We can deliver high performance and energy efficient operation on the multi-core NoC-based Adapteva Epiphany-III SoC for bulk-synchronous workloads using our proposed eBSP communication API. We characterize and automate performance tuning of spatial parallelism for supporting (1) random access load-store style traffic suitable for irregular sparse computations, as well as (2) variable, data-dependent traffic patterns in neural networks or PageRank-style workloads in a manner tailored for the Epiphany NoC. We aggressively optimize traffic by exposing spatial communication structure to the fabric through offline pre-computation of destination addresses, unrolling of message-passing loops, selective squelching of messages, and careful ordering of communication and compute. Using our approach, across a range of applications and datasets such as Sparse Matrix-Vector multiplication (Matrix Market datasets), PageRank (BerkStan SNAP dataset), and Izhikevich spiking neural evaluation, we deliver speedups of 6.5-10× while lowering power use by 2× over optimized ARM-based mappings. When compared to optimized OpenMP x86 mappings, we observe an 11-31× improvement in energy efficiency (GFLOP/s/W) for the Epiphany SoC. Epiphany is also able to beat state-of-the-art spatial FPGA (ZC706) and embedded GPU (Jetson TK1) mappings due to our communication optimizations. Our library is open-source and available at github.com/sidmontu/ebsp.git.

13:00 End of session
Lunch Break in Garden Foyer

Keynote Lecture session 3.0 in "Garden Foyer", 13:50 - 14:20

Lunch Break in the Garden Foyer
On all conference days (Tuesday to Thursday), a buffet lunch will be offered in the Garden Foyer, in front of the session rooms. Kindly note that this is restricted to conference delegates holding a lunch voucher. When entering the lunch break area, delegates will be asked to present the lunch voucher for that day. Once a delegate has left the lunch area, re-entry is not permitted for that lunch.