3.6 NoC in the age of neural networks and approximate computing


Date: Tuesday 10 March 2020
Time: 14:30 - 16:00
Location / Room: Lesdiguières

Chair:
Romain Lemaire, CEA-Leti, FR

To support innovative applications, new paradigms such as neural networks and approximate computing have been introduced. This session presents NoC-based architectures that support these computing approaches. In these advanced architectures, the NoC is no longer only a communication infrastructure but also part of the computing system: mechanisms introduced at the network level support the application and thus enhance performance and power efficiency. New NoC-based architectures must therefore respond to highly demanding applications, such as image segmentation and classification, by taking advantage of new topologies (multiple layers, 3D…) and new technologies, such as ReRAM.

Time  Label  Presentation Title
Authors
14:30  3.6.1  GRAMARCH: A GPU-RERAM BASED HETEROGENEOUS ARCHITECTURE FOR NEURAL IMAGE SEGMENTATION
Speaker:
Biresh Joardar, Washington State University, US
Authors:
Biresh Kumar Joardar1, Nitthilan Kannappan Jayakodi1, Jana Doppa1, Partha Pratim Pande1, Hai (Helen) Li2 and Krishnendu Chakrabarty3
1Washington State University, US; 2Duke University, US / TU Munich, DE; 3Duke University, US
Abstract
Deep Neural Networks (DNNs) employed for image segmentation are computationally more expensive and complex than those used for classification. However, manycore architectures for accelerating the training of these DNNs remain relatively unexplored. Resistive random-access memory (ReRAM)-based architectures offer a promising alternative to the commonly used GPU-based platforms for training DNNs. However, due to their low-precision storage capability, they cannot support all DNN layers and suffer from accuracy loss in the learned models. To address these challenges, we propose GRAMARCH, a heterogeneous architecture that combines the benefits of ReRAM and GPUs by using a high-throughput 3D Network-on-Chip. Experimental results indicate that by suitably mapping DNN layers to GRAMARCH, it is possible to achieve up to 33.4X better performance than conventional GPUs.
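
The heterogeneous mapping idea in this abstract can be sketched as a simple precision-driven assignment: layers whose required precision fits the ReRAM crossbar's low-precision storage go to ReRAM, the rest to GPU tiles. This is a minimal illustrative sketch; the function name, the layer list, and the 8-bit threshold are assumptions for the example, not details from the paper.

```python
def map_layers(layers, reram_precision_bits=8):
    """Assign each (name, required_bits) layer to 'ReRAM' if its precision
    requirement fits the crossbar storage precision, otherwise to 'GPU'."""
    mapping = {}
    for name, required_bits in layers:
        mapping[name] = "ReRAM" if required_bits <= reram_precision_bits else "GPU"
    return mapping

# Hypothetical workload: convolutions tolerate low precision, normalization
# and softmax layers do not, so they fall back to the GPU tiles.
layers = [("conv1", 8), ("batchnorm1", 16), ("conv2", 8), ("softmax", 32)]
print(map_layers(layers))
```

In an actual design the assignment would also weigh 3D-NoC traffic between the two tile types, which this sketch ignores.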

15:00  3.6.2  AN APPROXIMATE MULTIPLANE NETWORK-ON-CHIP
Speaker:
Xiaohang Wang, South China University of Technology, CN
Authors:
Ling Wang1, Xiaohang Wang2 and Yadong Wang1
1Harbin Institute of Technology, CN; 2South China University of Technology, CN
Abstract
The increasing communication demands of chip multiprocessors (CMPs), together with the prevalence of error-tolerant applications, are driving approximate network-on-chip (NoC) designs for power-efficient packet delivery. However, current approximate NoC designs achieve improvements in network performance or dynamic power savings at the cost of additional circuitry and increased area overhead. In this paper, we propose a novel approximate multiplane NoC (AMNoC) that provides low-latency transfer for latency-sensitive packets and minimizes the power consumed by approximable packets through a lossy bufferless subnetwork. The AMNoC also includes a regular buffered subnetwork to guarantee lossless delivery of nonapproximable packets. Evaluations show that, compared with a single-plane buffered NoC, the AMNoC reduces average latency by 41.9%. In addition, it achieves 48.6% and 53.4% savings in power consumption and area overhead, respectively.
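
The two-plane split described above can be illustrated with a toy dispatcher: approximable packets take the lossy bufferless plane, where delivery may fail under congestion, while everything else takes the buffered plane with guaranteed delivery. All names and the congestion model here are assumptions for illustration, not the AMNoC's actual mechanism.

```python
import random

def route_packet(packet, congestion=0.0, rng=random.random):
    """Pick a plane for a packet and report (plane, delivered).

    Approximable packets use the lossy bufferless plane, where a packet is
    dropped with probability `congestion`; all other packets use the
    buffered plane, which is lossless by construction."""
    if packet.get("approximable"):
        delivered = rng() >= congestion   # bufferless: drop possible
        return ("bufferless", delivered)
    return ("buffered", True)             # buffered: always delivered

# A latency-sensitive cache line must survive; an approximable pixel may not.
print(route_packet({"approximable": False}))
print(route_packet({"approximable": True}, congestion=0.5))
```

The payoff in the real design is that the bufferless plane needs no buffers or retransmission logic, which is where the power and area savings come from.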

15:30  3.6.3  SHENJING: A LOW POWER RECONFIGURABLE NEUROMORPHIC ACCELERATOR WITH PARTIAL-SUM AND SPIKE NETWORKS-ON-CHIP
Speaker:
Bo Wang, National University of Singapore, SG
Authors:
Bo Wang, Jun Zhou, Weng-Fai Wong and Li-Shiuan Peh, National University of Singapore, SG
Abstract
The next wave of on-device AI will likely require energy-efficient deep neural networks. Brain-inspired spiking neural networks (SNN) has been identified to be a promising candidate. Doing away with the need for multipliers significantly reduces energy. For on-device applications, besides computation, communication also incurs a significant amount of energy and time. In this paper, we propose Shenjing, a configurable SNN architecture which fully exposes all on-chip communications to software, enabling software mapping of SNN models with high accuracy at low power. Unlike prior SNN architectures like TrueNorth, Shenjing does not require any model modification and retraining for the mapping. We show that conventional artificial neural networks (ANN) such as multilayer perceptron, convolutional neural networks, as well as the latest residual neural networks can be mapped successfully onto Shenjing, realizing ANNs with SNN's energy efficiency. For the MNIST inference problem using a multilayer perceptron, we were able to achieve an accuracy of 96% while consuming just 1.26 mW using 10 Shenjing cores.
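
The "doing away with multipliers" point can be made concrete with a single integrate-and-fire neuron update: because inputs are binary spikes, the usual multiply-accumulate collapses to conditional additions. This is a generic textbook sketch, assuming a simple non-leaky neuron; it is not Shenjing's actual core logic, and the parameter values are invented for the example.

```python
def if_neuron_step(membrane, weights, input_spikes, threshold=1.0):
    """One integrate-and-fire step using additions only.

    Each input is a 0/1 spike, so 'w * s' reduces to adding w when s == 1;
    no multiplier is needed. Returns (new_membrane, fired)."""
    for w, s in zip(weights, input_spikes):
        if s:
            membrane += w        # add the weight only when a spike arrives
    fired = membrane >= threshold
    if fired:
        membrane -= threshold    # reset by subtraction, again multiplier-free
    return membrane, fired

# Two of three inputs spike; the membrane crosses threshold and fires.
print(if_neuron_step(0.0, [0.6, 0.5, 0.3], [1, 1, 0]))
```

In a chip like those discussed here, the remaining cost is moving the spikes (and, in Shenjing's case, partial sums) between cores, which is exactly why the NoC becomes part of the compute fabric.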

16:00  IP1-17, 139  LIGHTWEIGHT ANONYMOUS ROUTING IN NOC-BASED SOCS
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Subodha Charles, Megan Logan and Prabhat Mishra, University of Florida, US
Abstract
The System-on-Chip (SoC) supply chain is widely acknowledged as a major source of security vulnerabilities. Potentially malicious third-party IPs integrated on the same Network-on-Chip (NoC) as trusted components can lead to security and trust concerns. While secure communication is a well-studied problem in the computer networking domain, those solutions are not feasible on resource-constrained SoCs. In this paper, we present a lightweight anonymous routing protocol for communication between IP cores in NoC-based SoCs. Our method eliminates the major overhead associated with traditional anonymous routing protocols while ensuring that the desired security goals are met. Experimental results demonstrate that existing security solutions for NoCs can introduce significant (1.5X) performance degradation, whereas our approach provides the same security features with minor (4%) impact on performance.
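
One classic way to get anonymity cheaply, sketched below, is virtual-circuit-style per-hop header translation: each router stores only a local mapping from an incoming flow ID to an outgoing port and a fresh ID, so no single hop observes both source and destination. This is a generic illustration of that idea, assuming pre-installed tables; it is not the protocol proposed in this paper, and all names are invented.

```python
class Router:
    """A router that forwards by local ID translation only.

    The table maps an incoming flow ID to (output port, outgoing flow ID),
    so the flow ID changes at every hop and cannot be traced end to end
    by any single router."""

    def __init__(self):
        self.table = {}  # in_id -> (out_port, out_id)

    def install(self, in_id, out_port, out_id):
        self.table[in_id] = (out_port, out_id)

    def forward(self, in_id, payload):
        out_port, out_id = self.table[in_id]
        return out_port, out_id, payload  # header rewritten per hop

# Two-hop path: r1 relabels flow 7 as flow 42 toward r2, which delivers it.
r1, r2 = Router(), Router()
r1.install(7, "east", 42)
r2.install(42, "local", 0)
print(r1.forward(7, "hello"))
```

A real protocol must also protect how the tables are installed (that setup phase is where traditional anonymous routing pays most of its overhead), which this sketch leaves out.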

16:00  End of session