12.7 Power-efficient multi-core embedded architectures


Date: Thursday 12 March 2020
Time: 16:00 - 17:30
Location / Room: Berlioz

Chair:
Andreas Burg, EPFL, CH

Co-Chair:
Semeen Rehman, TU Wien, AT

This session presents power-efficiency solutions for multi-core embedded architectures. The techniques discussed range from architectural measures to machine-learning-based control of voltage-frequency settings driven by user interaction.

Time | Label | Presentation Title / Authors
16:00 | 12.7.1 | TUNING THE ISA FOR INCREASED HETEROGENEOUS COMPUTATION IN MPSOCS
Authors:
Pedro Henrique Exenberger Becker, Jeckson Dellagostin Souza and Antonio Carlos Schneider Beck, Universidade Federal do Rio Grande do Sul, BR
Abstract
Heterogeneous MPSoCs are crucial to meeting energy-efficiency and performance requirements, given their combination of cores and accelerators. In this work, we propose a novel technique for MPSoC design that increases specialization and task parallelism within a given area and power budget. By removing the microarchitectural support for costly ISA extensions (e.g., FP, SIMD, crypto) from a few cores (transforming them into partial-ISA cores), we make room to add extra (full and simpler) in-order cores and hardware accelerators. While applications must migrate away from partial-ISA cores when they need the removed ISA support, they also execute at lower power during their ISA-extension-free phases, since partial cores have much simpler datapaths than their full-ISA counterparts. On top of that, the additional cores and accelerators increase task-level parallelism and make the MPSoC more suitable for application-specific scenarios. We show the effectiveness of our approach by composing different MPSoCs for distinct execution scenarios, using FP instructions and the RISC-V ISA as a case study. To support our system, we also propose two scheduling policies, performance- and energy-oriented, to coordinate the execution of this novel design. For the former policy, we achieve a 2.8x speedup for a neural-network road-sign detection application, a 1.53x speedup for a video-streaming app, and a 1.2x speedup for a task-parallel scenario, while consuming 68%, 75%, and 33% less energy, respectively. For the energy-oriented policy, the partial-ISA design reduces energy consumption by 29% over a highly efficient baseline, with increased performance.
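
As an illustration of the scheduling idea described in the abstract, the following is a minimal C sketch of the migration decision: a task running on a partial-ISA core traps on an unsupported (e.g. FP) instruction and the scheduler picks a full-ISA target core according to the active policy. All structures, fields, and numbers are hypothetical assumptions and are not taken from the paper.

/* Hypothetical sketch of the migration decision for a partial-ISA MPSoC.
 * Assumption: a task on a partial-ISA core raises an illegal-instruction
 * trap on its first FP instruction and must move to a full-ISA core. */
#include <stdio.h>

typedef enum { POLICY_PERFORMANCE, POLICY_ENERGY } policy_t;

typedef struct {
    int id;
    int full_isa;      /* 1 = supports FP/SIMD extensions, 0 = partial-ISA */
    int busy;          /* 1 = currently running a task                     */
    double perf;       /* relative single-thread performance               */
    double power_w;    /* active power estimate in watts                   */
} core_t;

/* Pick a full-ISA core for the trapping task. Performance policy: fastest
 * idle full core. Energy policy: lowest-power idle full core. -1 if none. */
static int pick_migration_target(const core_t *cores, int n, policy_t policy)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!cores[i].full_isa || cores[i].busy)
            continue;
        if (best < 0) { best = i; continue; }
        if (policy == POLICY_PERFORMANCE && cores[i].perf > cores[best].perf)
            best = i;
        if (policy == POLICY_ENERGY && cores[i].power_w < cores[best].power_w)
            best = i;
    }
    return best;
}

int main(void)
{
    core_t cores[] = {
        { 0, 1, 1, 2.0, 0.90 },   /* full-ISA, busy                 */
        { 1, 1, 0, 2.0, 0.90 },   /* full-ISA, idle                 */
        { 2, 0, 0, 1.0, 0.30 },   /* partial-ISA (no FP), idle      */
        { 3, 0, 1, 1.0, 0.30 },   /* partial-ISA, task just trapped */
    };
    int target = pick_migration_target(cores, 4, POLICY_ENERGY);
    printf("migrate task from core 3 to core %d\n", target);
    return 0;
}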

16:30 | 12.7.2 | USER INTERACTION AWARE REINFORCEMENT LEARNING FOR POWER AND THERMAL EFFICIENCY OF CPU-GPU MOBILE MPSOCS
Speaker:
Somdip Dey, University of Essex, GB
Authors:
Somdip Dey1, Amit Kumar Singh1, Xiaohang Wang2 and Klaus McDonald-Maier1
1University of Essex, GB; 2South China University of Technology, CN
Abstract
A mobile user's usage behaviour changes throughout the day, and the desirable Quality of Service (QoS) can thus change for each session. In this paper, we propose a QoS-aware agent that monitors the mobile user's usage behaviour to find the target frame rate satisfying the user's desired QoS, and applies reinforcement-learning-based DVFS on a CPU-GPU MPSoC to meet that frame-rate requirement. An experimental study on a real Exynos hardware platform shows that our proposed agent achieves up to 50% power savings and a 29% reduction in peak temperature compared to stock Android's power-saving scheme. It also outperforms the existing state-of-the-art power and thermal management scheme by 41% and 19%, respectively.
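
As a rough illustration of the kind of control loop the abstract describes, below is a minimal C sketch of one tabular Q-learning DVFS step, where the state is a bucketed frame-rate error and the action is a discrete voltage-frequency level. The state buckets, reward shape, and table sizes are illustrative assumptions, not the authors' actual agent.

/* Hypothetical sketch of a tabular Q-learning DVFS step. State = bucketed
 * frame-rate error, action = voltage-frequency level; all constants are
 * illustrative only. */
#include <stdio.h>
#include <stdlib.h>

#define N_STATES  5   /* frame-rate error buckets: far below .. far above target */
#define N_ACTIONS 4   /* discrete voltage-frequency levels                       */

static double Q[N_STATES][N_ACTIONS];

/* Map measured FPS vs. the user-derived target FPS to a discrete state. */
static int fps_to_state(double fps, double target)
{
    double err = fps - target;
    if (err < -10.0) return 0;
    if (err <  -2.0) return 1;
    if (err <   2.0) return 2;
    if (err <  10.0) return 3;
    return 4;
}

/* Greedy action selection with a small exploration probability. */
static int choose_action(int state, double epsilon)
{
    if ((double)rand() / RAND_MAX < epsilon)
        return rand() % N_ACTIONS;
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[state][a] > Q[state][best]) best = a;
    return best;
}

/* One Q-learning update: reward meeting the target FPS at low power. */
static void update(int s, int a, int s_next, double fps, double target,
                   double power_w, double alpha, double gamma)
{
    double reward = -(fps < target ? (target - fps) : 0.0) - 0.5 * power_w;
    double best_next = Q[s_next][0];
    for (int an = 1; an < N_ACTIONS; an++)
        if (Q[s_next][an] > best_next) best_next = Q[s_next][an];
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a]);
}

int main(void)
{
    double target_fps = 30.0, fps = 24.0;
    int s = fps_to_state(fps, target_fps);
    int a = choose_action(s, 0.1);
    /* ...apply frequency level `a`, run one control period, re-measure... */
    double new_fps = 29.0, new_power = 2.4;
    int s_next = fps_to_state(new_fps, target_fps);
    update(s, a, s_next, new_fps, target_fps, new_power, 0.2, 0.9);
    printf("state %d -> action %d, Q=%.3f\n", s, a, Q[s][a]);
    return 0;
}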

17:00 | 12.7.3 | ENERGY-EFFICIENT TWO-LEVEL INSTRUCTION CACHE DESIGN FOR AN ULTRA-LOW-POWER MULTI-CORE CLUSTER
Speaker:
Jie Chen, Università di Bologna, IT
Authors:
Jie Chen1, Igor Loi2, Luca Benini3 and Davide Rossi3
1Università di Bologna, IT; 2GreenWaves Technologies, FR; 3Università di Bologna, IT
Abstract
High energy efficiency and high performance are the key requirements for Internet of Things (IoT) edge devices. Exploiting clusters of multiple programmable processors has recently emerged as a suitable solution to address this challenge. However, one of the main power bottlenecks for multi-core architectures is the instruction cache memory. We propose a two-level structure based on Standard Cell Memories (SCMs) which combines a private per-core instruction cache (L1) with a low-latency (one-cycle) shared instruction cache (L1.5). We present a detailed comparison of performance and energy efficiency for different instruction cache architectures. Our system-level analysis shows that the proposed design improves upon both state-of-the-art private and shared cache architectures and balances performance well against energy efficiency. On average, when executing a set of real-life IoT applications, our multi-level cache improves both performance and energy efficiency by 10% with respect to the private instruction cache system, and improves energy efficiency by 15% and 7% with a performance loss of only 2% with respect to the shared instruction cache. Moreover, the relaxed timing makes the two-level instruction cache an attractive choice for aggressive implementations, leaving more slack for timing convergence in physical design.
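
To make the fetch path concrete, the following is a minimal C sketch of a two-level instruction-cache lookup: the per-core private L1 is probed first, then the one-cycle shared L1.5, then main memory. The cache sizes, mapping, and latencies are illustrative assumptions only, not the actual design parameters.

/* Hypothetical sketch of the fetch path in a two-level instruction cache:
 * private L1 first, then the shared L1.5, then main memory. Direct-mapped
 * caches with made-up sizes and latencies, for illustration only. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define L1_LINES   32   /* per-core private I$ lines */
#define L15_LINES 256   /* shared L1.5 I$ lines      */

typedef struct { uint32_t tag; int valid; } line_t;

static line_t l1[4][L1_LINES];    /* 4 cores, each with a private L1 */
static line_t l15[L15_LINES];     /* shared between all cores        */

/* Returns the latency (in cycles) of fetching one instruction line. */
static int fetch(int core, uint32_t addr)
{
    uint32_t i1  = (addr / 64) % L1_LINES;
    uint32_t i15 = (addr / 64) % L15_LINES;
    uint32_t tag = addr / 64;

    if (l1[core][i1].valid && l1[core][i1].tag == tag)
        return 1;                                   /* private L1 hit      */

    if (l15[i15].valid && l15[i15].tag == tag) {    /* shared L1.5 hit     */
        l1[core][i1] = l15[i15];                    /* refill private L1   */
        return 2;                                   /* one extra cycle     */
    }

    l15[i15].tag = tag; l15[i15].valid = 1;         /* refill from memory  */
    l1[core][i1] = l15[i15];
    return 20;                                      /* illustrative miss cost */
}

int main(void)
{
    memset(l1, 0, sizeof l1);
    memset(l15, 0, sizeof l15);
    printf("core0 first fetch : %d cycles\n", fetch(0, 0x1000));
    printf("core1 same line   : %d cycles\n", fetch(1, 0x1000)); /* L1.5 hit */
    printf("core0 re-fetch    : %d cycles\n", fetch(0, 0x1000)); /* L1 hit   */
    return 0;
}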

17:30 | End of session