5.6 Energy efficiency in IoT - Edge to Cloud

Time

Label

Presentation Title
Authors

08:30

5.6.1

FLEXICHECK: AN ADAPTIVE CHECKPOINTING ARCHITECTURE FOR ENERGY HARVESTING DEVICES
Speaker:
Priyanka Singla, IIT Delhi, IN
Authors:
Priyanka Singla, Shubhankar Suman Singh and Smruti R. Sarangi, IIT Delhi, IN
Abstract
With the advent of 5G and M2M architectures, energy harvesting devices are expected to become far more prevalent. Such devices harvest energy from ambient sources such as solar energy or vibration energy (from machines) and use it for sensing environmental parameters and processing them further. Given that the rate of energy consumption exceeds the rate of energy production, it is necessary to frequently halt the processor and accumulate energy from the environment. During this period it is mandatory to take a checkpoint to avoid the loss of data. State-of-the-art algorithms use software-based methods that rely extensively on compiler analyses. In this paper, we provide the first formal model for such systems, and show that we can arrive at an optimal checkpointing schedule using a quadratically constrained linear program (QCLP) solver. Using this as a baseline, we show that existing algorithms for checkpointing significantly underperform. Furthermore, we prove and demonstrate that when we have a relatively constant energy source, a greedy algorithm provides an optimal solution. To model more complex situations where the energy varies, we create a novel checkpointing algorithm that adapts itself according to the ambient energy. We obtain a speedup of 2-5× over the nearest competing approach, and we are within 3-8% of the optimal solution in the general case where the ambient energy exhibits variations.

09:00

5.6.2

HARDWARE-ACCELERATED ENERGY-EFFICIENT SYNCHRONIZATION AND COMMUNICATION FOR ULTRA-LOW-POWER TIGHTLY COUPLED CLUSTERS
Speaker:
Florian Glaser, ETH Zürich, CH
Authors:
Florian Glaser¹, Germain Haugou¹, Davide Rossi², Qiuting Huang¹ and Luca Benini¹
¹ETH Zürich, CH; ²University of Bologna, IT
Abstract
Parallel ultra-low-power computing is emerging as an enabler to meet the growing performance and energy efficiency demands in deeply embedded systems such as the end-nodes of the internet-of-things (IoT). The parallel nature of these systems, however, adds a significant degree of complexity, as processing elements (PEs) need to communicate in various ways to organize and synchronize execution. Naive implementations of these central and non-trivial mechanisms can quickly jeopardize overall system performance and limit the achievable speedup and energy efficiency. To avoid this bottleneck, we present an event-based solution centered around a technology-independent, light-weight and scalable (up to 16 cores) synchronization and communication unit (SCU) and its integration into a shared-memory multicore cluster. Careful design and tight coupling of the SCU to the data interfaces of the cores allow common synchronization procedures to be executed with a single instruction. Furthermore, we present hardware support for the common barrier and lock synchronization primitives with a barrier latency of only eleven cycles, independent of the number of involved cores. We demonstrate the efficiency of the solution based on experiments with a post-layout implementation of the multicore cluster in a 22 nm CMOS process where the SCU constitutes less than 2% of area overhead. Our solution supports parallel sections as small as 100 or 72 cycles with a synchronization overhead of just 10%, an improvement of up to 14 or 30 times with respect to cycle count or energy, respectively, compared to a test-and-set based implementation.

09:30

5.6.3

MAMUT: MULTI-AGENT REINFORCEMENT LEARNING FOR EFFICIENT REAL-TIME MULTI-USER VIDEO TRANSCODING
Speaker:
Luis Costero, Universidad Complutense de Madrid, ES
Authors:
Luis Costero¹, Arman Iranfar², Marina Zapater², Francisco D. Igual¹, Katzalin Olcoz³ and David Atienza²
¹Dpto. de Arquitectura de computadores y Automática. Universidad Complutense de Madrid, ES; ²Embedded Systems Laboratory (ESL), Swiss Federal Institute of Technology Lausanne (EPFL), CH; ³Dpto. de Arquitectura de Computadores y Automática. Universidad Complutense de Madrid, ES
Abstract
Video transcoding has recently emerged as a valid alternative to address the ever-increasing demand for video content in server infrastructures in current multi-user environments, as it enhances user experience by providing the adequate video configuration, reduces pressure on the network, and minimizes inefficient and costly video storage. The advent of next-generation video coding standards makes efficient transcoding feasible. However, the computational complexity of HEVC, together with its myriad configuration parameters, raises challenges for power management, throughput control, and Quality of Service (QoS) satisfaction. This is particularly challenging in multi-user environments where multiple users with different resolution demands and bandwidth constraints need to be served simultaneously. In this work, we present MAMUT, a multi-agent machine learning approach to tackle these challenges. Our proposal breaks the design space composed of runtime adaptation of the transcoder and system parameters into smaller sub-spaces that can be explored in a reasonable time by individual agents. While working cooperatively, each agent is in charge of learning and dynamically applying the optimal values for internal HEVC parameters and system-wide parameters such as number of threads per video and operating frequency, targeting throughput and video quality as objectives, and compression and power consumption as constraints. We implement MAMUT on an enterprise multicore server and compare equivalent scenarios to state-of-the-art alternative approaches. The obtained results reveal that MAMUT consistently attains up to 8× improvement in terms of FPS violations (and thus Quality of Service), 24% power reduction, and faster and more accurate adaptation both to the video contents and the available resources.

10:00

IP2-17, 271

SOFTWARE-HARDWARE CO-DESIGN OF MULTI-STANDARD DIGITAL BASEBAND PROCESSOR FOR IOT
Speaker:
Carolynn Bernier, CEA-Leti, FR
Authors:
Hela Belhadj Amor and Carolynn Bernier, CEA, LETI, FR
Abstract
This work demonstrates an ultra-low power, software-defined wireless transceiver designed for IoT applications using an open-source 32-bit RISC-V core. The key driver behind this success is an optimized hardware/software partitioning of the receiver's digital signal processing operators. We benchmarked our architecture on an algorithm for the detection of FSK-modulated frames using a RISC-V compatible core and ARM Cortex-M series processors. We use only standard compilation tools and no assembly-level optimizations. Our results show that Bluetooth LE frames can be detected with an estimated peak core power consumption of 1.6 mW on a 28 nm FDSOI technology, falling to less than 0.6 mW (on average) during symbol demodulation. This is achieved at nominal voltage. Compared to the state of the art, our work offers a power-efficient alternative to the design of dedicated baseband processors for ultra-low power software-defined radios with a low software complexity.

10:00

End of session
Coffee Break in Exhibition Area

Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Coffee Break 10:30 - 11:30
Lunch Break 13:00 - 14:30
Keynote Lecture "Leonardo da Vinci, Humanism and Engineering between Florence and Milan" by Claudio Giorgione in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Wednesday, March 27, 2019

Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:30
Keynote Lecture "Heterogeneous, High Scale Computing in the Era of Intelligent, Cloud-Connected" by David Pellerin, Amazon, US in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Thursday, March 28, 2019

Coffee Break 10:00 - 11:00
University Booth Best Demo Award Presentation at the University Booth 10:30
Lunch Break 12:30 - 14:00
Keynote Lecture "A Fundamental Look at Models and Intelligence" by Edward A. Lee, University of California, Berkeley, US in room 1 13:20 - 13:50
Coffee Break 15:30 - 16:00