eBSP: Managing NoC Traffic for BSP Workloads on the 16-Core Adapteva Epiphany-III Processor

Siddhartha1 and Nachiket Kapre2
1Nanyang Technological University, 50 Nanyang Avenue, Singapore.
2University of Waterloo, 200 University Ave W, Waterloo, Canada.


We can deliver high performance and energy efficient operation on the multi-core NoC-based Adapteva Epiphany-III SoC for bulk-synchronous workloads using our proposed eBSP communication API. We characterize and automate performance tuning of spatial parallelism for supporting (1) random access load-store style traffic suitable for irregular sparse computations, as well as (2) variable, data-dependent traffic patterns in neural networks or PageRank-style workloads in a manner tailored for the Epiphany NoC. We aggressively optimize traffic by exposing spatial communication structure to the fabric through offline pre-computation of destination addresses, unrolling of message-passing loops, selective squelching of messages, and careful ordering of communication and compute. Using our approach, across a range of applications and datasets such as Sparse Matrix-Vector multiplication (Matrix Market datasets), PageRank (BerkStan SNAP dataset), and Izhikevich spiking neural evaluation, we deliver speedups of 6.5-10 x; while lowering power use by 2x;over optimized ARM-based mappings. When compared to optimized OpenMP x86 mappings, we observe a 11-31x; improvement in energy efficiency (GFLOP/s/W) for the Epiphany SoC. Epiphany is also able to beat state-of-the-art spatial FPGA (ZC706) and embedded GPU (Jetson TK1) mappings due to our communication optimizations. Our library is open-source and available at github.com/sidmontu/ebsp.git.

Full Text (PDF)