12.7 Power-efficient multi-core embedded architectures


Date: Thursday 12 March 2020
Time: 16:00 - 17:30
Location / Room: Berlioz

Chair:
Andreas Burg, EPFL, CH

Co-Chair:
Semeen Rehman, TU Wien, AT

This session presents power-efficiency solutions for multi-core embedded architectures. The techniques discussed range from architectural measures to machine-learning-based control of voltage-frequency settings driven by user interaction.

Time | Label | Presentation Title / Authors
16:00 | 12.7.1 | TUNING THE ISA FOR INCREASED HETEROGENEOUS COMPUTATION IN MPSOCS
Authors:
Pedro Henrique Exenberger Becker, Jeckson Dellagostin Souza and Antonio Carlos Schneider Beck, Universidade Federal do Rio Grande do Sul, BR
Abstract
Heterogeneous MPSoCs are crucial to meeting energy-efficiency and performance requirements, given their combination of cores and accelerators. In this work, we propose a novel technique for MPSoC design that increases specialization and task parallelism within a given area and power budget. By removing the microarchitectural support for costly ISA extensions (e.g., FP, SIMD, crypto) from a few cores (transforming them into partial-ISA cores), we make room to add extra (full and simpler) in-order cores and hardware accelerators. While applications must migrate away from partial-ISA cores when they need the removed ISA support, they also execute at lower power during their ISA-extension-free phases, since partial cores have much simpler datapaths than their full-ISA counterparts. On top of that, the additional cores and accelerators increase task-level parallelism and make the MPSoC more suitable for application-specific scenarios. We show the effectiveness of our approach by composing different MPSoCs for distinct execution scenarios, using FP instructions and the RISC-V ISA as a case study. To support our system, we also propose two scheduling policies, performance- and energy-oriented, to coordinate the execution of this novel design. For the former policy, we achieve a 2.8x speedup for a neural-network road-sign detection application, a 1.53x speedup for a video-streaming app, and a 1.2x speedup for a task-parallel scenario, while consuming 68%, 75%, and 33% less energy, respectively. For the energy-oriented policy, the partial-ISA design reduces energy consumption by 29% over a highly efficient baseline, with increased performance.
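
As an illustration of the scheduling idea described in the abstract, the following is a minimal C sketch of the migration decision: a task running on a partial-ISA core traps on an unsupported (e.g. FP) instruction and the scheduler picks a full-ISA target core according to the active policy. All structures, fields, and numbers are hypothetical assumptions and are not taken from the paper.

/* Hypothetical sketch of the migration decision for a partial-ISA MPSoC.
 * Assumption: a task on a partial-ISA core raises an illegal-instruction
 * trap on its first FP instruction and must move to a full-ISA core. */
#include <stdio.h>

typedef enum { POLICY_PERFORMANCE, POLICY_ENERGY } policy_t;

typedef struct {
    int id;
    int full_isa;      /* 1 = supports FP/SIMD extensions, 0 = partial-ISA */
    int busy;          /* 1 = currently running a task                     */
    double perf;       /* relative single-thread performance               */
    double power_w;    /* active power estimate in watts                   */
} core_t;

/* Pick a full-ISA core for the trapping task. Performance policy: fastest
 * idle full core. Energy policy: lowest-power idle full core. -1 if none. */
static int pick_migration_target(const core_t *cores, int n, policy_t policy)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!cores[i].full_isa || cores[i].busy)
            continue;
        if (best < 0) { best = i; continue; }
        if (policy == POLICY_PERFORMANCE && cores[i].perf > cores[best].perf)
            best = i;
        if (policy == POLICY_ENERGY && cores[i].power_w < cores[best].power_w)
            best = i;
    }
    return best;
}

int main(void)
{
    core_t cores[] = {
        { 0, 1, 1, 2.0, 0.90 },   /* full-ISA, busy                 */
        { 1, 1, 0, 2.0, 0.90 },   /* full-ISA, idle                 */
        { 2, 0, 0, 1.0, 0.30 },   /* partial-ISA (no FP), idle      */
        { 3, 0, 1, 1.0, 0.30 },   /* partial-ISA, task just trapped */
    };
    int target = pick_migration_target(cores, 4, POLICY_ENERGY);
    printf("migrate task from core 3 to core %d\n", target);
    return 0;
}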

16:30 | 12.7.2 | USER INTERACTION AWARE REINFORCEMENT LEARNING FOR POWER AND THERMAL EFFICIENCY OF CPU-GPU MOBILE MPSOCS
Speaker:
Somdip Dey, University of Essex, GB
Authors:
Somdip Dey1, Amit Kumar Singh1, Xiaohang Wang2 and Klaus McDonald-Maier1
1University of Essex, GB; 2South China University of Technology, CN
Abstract
A mobile user's usage behaviour changes throughout the day, and the desirable Quality of Service (QoS) can thus change for each session. In this paper, we propose a QoS-aware agent that monitors the mobile user's usage behaviour to find the target frame rate satisfying the user's desired QoS, and applies reinforcement-learning-based DVFS on a CPU-GPU MPSoC to meet that frame-rate requirement. An experimental study on a real Exynos hardware platform shows that our proposed agent achieves up to 50% power savings and a 29% reduction in peak temperature compared to stock Android's power-saving scheme. It also outperforms the existing state-of-the-art power and thermal management scheme by 41% and 19%, respectively.
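
As a rough illustration of the kind of control loop the abstract describes, below is a minimal C sketch of one tabular Q-learning DVFS step, where the state is a bucketed frame-rate error and the action is a discrete voltage-frequency level. The state buckets, reward shape, and table sizes are illustrative assumptions, not the authors' actual agent.

/* Hypothetical sketch of a tabular Q-learning DVFS step. State = bucketed
 * frame-rate error, action = voltage-frequency level; all constants are
 * illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define N_STATES  5   /* frame-rate error buckets: far below .. far above target */
#define N_ACTIONS 4   /* discrete voltage-frequency levels                       */

static double Q[N_STATES][N_ACTIONS];

/* Map measured FPS vs. the user-derived target FPS to a discrete state. */
static int fps_to_state(double fps, double target)
{
    double err = fps - target;
    if (err < -10.0) return 0;
    if (err <  -2.0) return 1;
    if (err <   2.0) return 2;
    if (err <  10.0) return 3;
    return 4;
}

/* Greedy action selection with a small exploration probability. */
static int choose_action(int state, double epsilon)
{
    if ((double)rand() / RAND_MAX < epsilon)
        return rand() % N_ACTIONS;
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[state][a] > Q[state][best]) best = a;
    return best;
}

/* One Q-learning update: reward meeting the target FPS at low power. */
static void update(int s, int a, int s_next, double fps, double target,
                   double power_w, double alpha, double gamma)
{
    double reward = -(fps < target ? (target - fps) : 0.0) - 0.5 * power_w;
    double best_next = Q[s_next][0];
    for (int an = 1; an < N_ACTIONS; an++)
        if (Q[s_next][an] > best_next) best_next = Q[s_next][an];
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a]);
}

int main(void)
{
    double target_fps = 30.0, fps = 24.0;
    int s = fps_to_state(fps, target_fps);
    int a = choose_action(s, 0.1);
    /* ...apply frequency level `a`, run one control period, re-measure... */
    double new_fps = 29.0, new_power = 2.4;
    int s_next = fps_to_state(new_fps, target_fps);
    update(s, a, s_next, new_fps, target_fps, new_power, 0.2, 0.9);
    printf("state %d -> action %d, Q=%.3f\n", s, a, Q[s][a]);
    return 0;
}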

17:00 | 12.7.3 | ENERGY-EFFICIENT TWO-LEVEL INSTRUCTION CACHE DESIGN FOR AN ULTRA-LOW-POWER MULTI-CORE CLUSTER
Speaker:
Jie Chen, Università di Bologna, IT
Authors:
Jie Chen1, Igor Loi2, Luca Benini3 and Davide Rossi3
1Università di Bologna, IT; 2GreenWaves Technologies, FR; 3Università di Bologna, IT
Abstract
High energy efficiency and high performance are the key requirements for Internet of Things (IoT) edge devices. Exploiting clusters of multiple programmable processors has recently emerged as a suitable solution to address this challenge. However, one of the main power bottlenecks for multi-core architectures is the instruction cache memory. We propose a two-level structure based on Standard Cell Memories (SCMs) which combines a private per-core instruction cache (L1) with a low-latency (one-cycle) shared instruction cache (L1.5). We present a detailed comparison of performance and energy efficiency for different instruction cache architectures. Our system-level analysis shows that the proposed design improves upon both state-of-the-art private and shared cache architectures and balances performance well against energy efficiency. On average, when executing a set of real-life IoT applications, our multi-level cache improves both performance and energy efficiency by 10% with respect to the private instruction cache system, and improves energy efficiency by 15% and 7% with a performance loss of only 2% with respect to the shared instruction cache. Moreover, the relaxed timing makes the two-level instruction cache an attractive choice for aggressive implementations, leaving more slack for timing convergence in physical design.
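
To make the fetch path concrete, the following is a minimal C sketch of a two-level instruction-cache lookup: the per-core private L1 is probed first, then the one-cycle shared L1.5, then main memory. The cache sizes, mapping, and latencies are illustrative assumptions only, not the actual design parameters.

/* Hypothetical sketch of the fetch path in a two-level instruction cache:
 * private L1 first, then the shared L1.5, then main memory. Direct-mapped
 * caches with made-up sizes and latencies, for illustration only. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define L1_LINES   32   /* per-core private I$ lines */
#define L15_LINES 256   /* shared L1.5 I$ lines      */

typedef struct { uint32_t tag; int valid; } line_t;

static line_t l1[4][L1_LINES];    /* 4 cores, each with a private L1 */
static line_t l15[L15_LINES];     /* shared between all cores        */

/* Returns the latency (in cycles) of fetching one instruction line. */
static int fetch(int core, uint32_t addr)
{
    uint32_t i1  = (addr / 64) % L1_LINES;
    uint32_t i15 = (addr / 64) % L15_LINES;
    uint32_t tag = addr / 64;

    if (l1[core][i1].valid && l1[core][i1].tag == tag)
        return 1;                                   /* private L1 hit      */

    if (l15[i15].valid && l15[i15].tag == tag) {    /* shared L1.5 hit     */
        l1[core][i1] = l15[i15];                    /* refill private L1   */
        return 2;                                   /* one extra cycle     */
    }

    l15[i15].tag = tag; l15[i15].valid = 1;         /* refill from memory  */
    l1[core][i1] = l15[i15];
    return 20;                                      /* illustrative miss cost */
}

int main(void)
{
    memset(l1, 0, sizeof l1);
    memset(l15, 0, sizeof l15);
    printf("core0 first fetch : %d cycles\n", fetch(0, 0x1000));
    printf("core1 same line   : %d cycles\n", fetch(1, 0x1000)); /* L1.5 hit */
    printf("core0 re-fetch    : %d cycles\n", fetch(0, 0x1000)); /* L1 hit   */
    return 0;
}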

17:30 | End of session