8.3 Real time intelligent methods for energy-efficient approaches in CNN and biomedical applications


Date: Wednesday 21 March 2018
Time: 17:00 - 18:30
Location / Room: Konf. 1

Chair:
Theocharis Theocharides, University of Cyprus, CY

Co-Chair:
Jose L. Ayala, Dpto Arquitectura de Computadores - UCM, ES

Mobile devices and wearables increasingly integrate technology for real-time applications, particularly in health and transport. This makes it possible to implement machine-learning techniques directly on board. The session first presents applications that detect and predict pathological health conditions, and then examines real-time applications on UAVs.

Time  Label  Presentation Title
Authors
17:00  8.3.1  ONLINE EFFICIENT BIO-MEDICAL VIDEO TRANSCODING ON MPSOCS THROUGH CONTENT-AWARE WORKLOAD ALLOCATION
Speaker:
Arman Iranfar, Embedded Systems Lab (ESL), EPFL, CH
Authors:
Arman Iranfar1, Ali Pahlevan1, Marina Zapater1, Martin Žagar2, Mario Kovač2 and David Atienza1
1Embedded Systems Lab (ESL), EPFL, CH; 2University of Zagreb, HR
Abstract
Bio-medical image processing in telemedicine, and in particular the design of systems that support collaborative, distributed medical diagnostics, is experiencing undeniable growth. Because of the high quality of bio-medical videos and the large volumes of data they generate, enabling medical diagnosis on the go requires efficiently transcoding and streaming the stored videos in real time, without quality loss. However, online video transcoding is a highly demanding, computationally intensive task, and managing it efficiently on Multiprocessor Systems-on-Chip (MPSoCs) poses an important challenge. In this work we propose an efficient motion- and texture-aware frame-level parallelization approach to enable online medical imaging transcoding on MPSoCs for next-generation video encoders. By exploiting the unique characteristics of bio-medical videos and of the medical procedures that enable diagnosis, we split frames into tiles based on their motion and texture and decide the most adequate level of parallelization. We then employ the available encoding parameters to satisfy the required video quality and compression. Moreover, we propose a new fast motion search algorithm for bio-medical videos that drastically reduces the computational complexity of the encoder, thus achieving the frame rates required for online transcoding. Finally, we heuristically allocate the threads to the most appropriate available resources and set the operating frequency of each. We evaluate our work on an enterprise multi-core server, achieving online medical imaging with 1.6x higher throughput and 44% less power consumption compared to state-of-the-art techniques.
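To make the motion- and texture-aware tiling idea concrete, here is a minimal Python sketch; it is not from the paper, and all function names, thresholds, and the two-level thread assignment are hypothetical simplifications of the approach described above.

```python
import numpy as np

def tile_stats(frame, prev_frame, tiles=4):
    """Split a frame into horizontal tiles and estimate per-tile motion
    (mean absolute frame difference) and texture (pixel variance)."""
    h = frame.shape[0] // tiles
    stats = []
    for i in range(tiles):
        cur = frame[i * h:(i + 1) * h].astype(float)
        prev = prev_frame[i * h:(i + 1) * h].astype(float)
        motion = float(np.mean(np.abs(cur - prev)))
        texture = float(np.var(cur))
        stats.append((motion, texture))
    return stats

def parallelization_level(stats, motion_thr=5.0, texture_thr=100.0):
    """Assign more encoding threads to tiles with high motion or rich
    texture; static, flat tiles get a single thread."""
    return [2 if m > motion_thr or t > texture_thr else 1 for m, t in stats]
```

A real encoder would feed these per-tile decisions into its thread allocation and per-core frequency settings, as the abstract describes.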

17:30  8.3.2  HIGHLY EFFICIENT AND ACCURATE SEIZURE PREDICTION ON CONSTRAINED IOT DEVICES
Speaker:
Farzad Samie, Karlsruhe Institute of Technology (KIT), DE
Authors:
Farzad Samie, Sebastian Paul, Lars Bauer and Joerg Henkel, Karlsruhe Institute of Technology, DE
Abstract
In this paper we present an efficient and accurate algorithm for epileptic seizure prediction on low-power, portable IoT devices. State-of-the-art algorithms suffer from two issues: computationally intensive features and large internal memory requirements, which make them inapplicable to constrained devices. We reduce the memory requirement of our algorithm by reducing the size of the data segments (i.e., the window of input stream data on which processing is performed) and the number of required EEG channels. To respect the limited processing capability, we reduce the complexity of the exploited features by considering only simple features, which also contributes to reducing the memory requirements. We then provide new relevant features to compensate for the information loss due to these simplifications (i.e., fewer channels, simpler features, shorter segments, etc.). We measured the energy consumption (12.41 mJ) and execution time (565 ms) for processing each segment (i.e., 5.12 seconds of EEG data) on a low-power MSP432 device. Even though the state of the art does not fit constrained IoT devices, we evaluate the classification performance and show that our algorithm achieves the highest AUC score (0.79) on the held-out data and outperforms the state of the art.
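As an illustration of the "simple features on short segments" idea, a minimal sketch follows; the specific features (mean absolute amplitude, variance, zero-crossing count) are common low-cost EEG features chosen here for illustration, not necessarily the ones used in the paper.

```python
import numpy as np

def segment_features(segment):
    """Compute low-cost per-channel features on one EEG segment.

    segment: array of shape (channels, samples), one short window of
    the input stream. Returns a flat feature list: for each channel,
    mean absolute amplitude, variance, and zero-crossing count.
    """
    feats = []
    for ch in segment:
        feats.append(float(np.mean(np.abs(ch))))          # amplitude
        feats.append(float(np.var(ch)))                   # energy spread
        feats.append(int(np.sum(np.diff(np.sign(ch)) != 0)))  # zero crossings
    return feats
```

Each feature costs a single pass over the samples, so both memory and compute scale linearly with the segment length and channel count being reduced.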

18:00  8.3.3  A WEARABLE LONG-TERM SINGLE-LEAD ECG PROCESSOR FOR EARLY DETECTION OF CARDIAC ARRHYTHMIA
Speaker:
Muhammad Awais Bin Altaf, Lahore University of Management Sciences (LUMS), PK
Authors:
Syed Muhammad Abubakar, Wala Saadeh and Muhammad Awais Bin Altaf, Lahore University of Management Sciences (LUMS), PK
Abstract
Cardiac arrhythmia (CA) is one of the most serious heart diseases and leads to a very large number of casualties around the world every year. Traditional electrocardiography (ECG) devices usually fail to capture arrhythmia symptoms during patients' hospital visits because of their recurrent nature. This paper presents a wearable long-term single-lead ECG processor for CA detection at an early stage. To achieve on-sensor integration and long-term continuous monitoring, an ultra-low-complexity feature extraction engine using a reduced feature set of four (RFS4) is proposed. It reduces the area by >25% compared to conventional QRS-complex detection algorithms without compromising accuracy. Moreover, RFS4 eliminates the need for complex machine-learning decision logic for the detection of premature ventricular contraction (PVC) and non-sustained ventricular tachycardia (NVT). To ensure correct functional verification, the proposed system is implemented on an FPGA and tested using the MIT-BIH arrhythmia database. It achieves a sensitivity of 94.64% and a specificity of 99.41%. The proposed processor is also synthesized in 0.18um CMOS technology with an overall energy efficiency of 139 nJ/detection.
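For context, a deliberately simple R-peak detector is sketched below: a fixed-ratio amplitude threshold plus a refractory period. This is a generic textbook-style baseline of the kind RFS4 improves upon, not the paper's RFS4 engine; the threshold ratio and refractory period are illustrative.

```python
def detect_beats(ecg, fs, thr_ratio=0.6, refractory_s=0.25):
    """Naive R-peak detector on a single-lead ECG trace.

    ecg: sequence of samples; fs: sampling rate in Hz.
    A sample counts as a beat if it exceeds thr_ratio * max amplitude
    and at least refractory_s seconds have passed since the last beat.
    Returns the sample indices of detected beats.
    """
    thr = thr_ratio * max(ecg)
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i, v in enumerate(ecg):
        if v > thr and i - last >= refractory:
            peaks.append(i)
            last = i
    return peaks
```

Inter-beat (RR) intervals derived from such peak indices are the usual starting point for flagging premature beats like PVCs.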

18:15  8.3.4  DRONET: EFFICIENT CONVOLUTIONAL NEURAL NETWORK DETECTOR FOR REAL-TIME UAV APPLICATIONS
Speaker:
Christos Kyrkou, University of Cyprus, KIOS CoE, CY
Authors:
Christos Kyrkou1, George Plastiras1, Stylianos Venieris2, Theocharis Theocharides1 and Christos Bouganis2
1University of Cyprus, CY; 2Imperial College London, GB
Abstract
Unmanned Aerial Vehicles (drones) are emerging as a promising technology for both environmental and infrastructure monitoring, with broad use in a plethora of applications. Many such applications require computer vision algorithms to analyse the information captured by an on-board camera, for example detecting vehicles for emergency response and traffic monitoring. This paper therefore explores the trade-offs involved in developing a single-shot object detector based on deep convolutional neural networks (CNNs) that enables vehicle detection in a resource-constrained environment such as a UAV. The paper presents a holistic approach to designing such systems: the data collection and training stages, the CNN architecture, and the optimizations necessary to efficiently map such a CNN onto a lightweight embedded processing platform suitable for deployment on UAVs. Through this analysis we propose a CNN architecture that is capable of detecting vehicles in aerial UAV images and operates at 5-18 frames per second on a variety of platforms with an overall accuracy of ~95%. Overall, the proposed architecture is suitable for UAV applications using low-power embedded processors that can be deployed on commercial UAVs.
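To show what "single-shot" means in practice, here is a minimal sketch of the post-processing step such detectors share: decoding a grid of per-cell predictions into boxes in one pass. The (confidence, cx, cy, w, h) layout and grid size are generic assumptions, not details of the DroNet architecture.

```python
import numpy as np

def decode_grid(pred, conf_thr=0.5, img_size=256):
    """Decode a single-shot detector output grid of shape (S, S, 5),
    where each cell predicts (confidence, cx, cy, w, h), all in [0, 1]
    and (cx, cy) relative to the cell. Returns boxes in pixels."""
    S = pred.shape[0]
    boxes = []
    for row in range(S):
        for col in range(S):
            conf, cx, cy, w, h = pred[row, col]
            if conf >= conf_thr:
                # cell-relative center -> absolute pixel coordinates
                x = (col + cx) / S * img_size
                y = (row + cy) / S * img_size
                boxes.append((float(conf), x, y, w * img_size, h * img_size))
    return boxes
```

Because every cell is decoded in a single forward pass, the detector's latency is dominated by the CNN itself, which is what the paper's embedded-mapping optimizations target.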

18:30  IP3-15, 889  ERROR RESILIENCE ANALYSIS FOR SYSTEMATICALLY EMPLOYING APPROXIMATE COMPUTING IN CONVOLUTIONAL NEURAL NETWORKS
Speaker:
Muhammad Abdullah Hanif, Vienna University of Technology, Vienna, AT
Authors:
Muhammad Abdullah Hanif1, Rehan Hafiz2 and Muhammad Shafique1
1TU Wien, AT; 2ITU, PK
Abstract
Approximate computing is an emerging paradigm for error-resilient applications, as it trades accuracy loss for improvements in the power, energy, area, and/or performance of an application. The spectrum of error-resilient applications includes image and video processing, artificial intelligence (AI) and machine learning (ML), data analytics, and other Recognition, Mining, and Synthesis (RMS) applications. In this work, we address one of the most challenging questions: how to systematically employ approximate computing in Convolutional Neural Networks (CNNs), which are among the most compute-intensive and pivotal parts of AI. Towards this, we propose a methodology to systematically analyze the error resilience of deep CNNs and identify parameters that can be exploited to improve the performance/efficiency of these networks for inference. We also present a case study on significance-driven classification of filters in different convolutional layers, and propose to prune those with the least significance, thereby enabling accuracy vs. efficiency trade-offs by exploiting their resilience characteristics in a systematic way.
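Significance-driven filter pruning can be sketched in a few lines; here the L1 norm of each filter's weights is used as the significance proxy, which is a common choice assumed for illustration and not necessarily the metric the paper uses.

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.75):
    """Rank convolutional filters by L1 norm (a simple significance
    proxy) and keep only the top fraction.

    weights: array of shape (out_filters, in_channels, kh, kw).
    Returns the sorted indices of filters to keep; the rest would be
    pruned, trading accuracy for compute/memory savings.
    """
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    k = max(1, int(round(keep_ratio * len(norms))))
    keep = np.argsort(norms)[-k:]
    return np.sort(keep)
```

Sweeping `keep_ratio` per layer is one way to expose the accuracy-vs-efficiency trade-off the abstract describes.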

18:30  End of session