GENIE: QoS-guided Dynamic Scheduling for CNN-based Tasks on SME Clusters

Zhaoyun Chen(1,2), Lei Luo(1,a), Haoduo Yang(1,2), Jie Yu(1), Mei Wen(1,2) and Chunyuan Zhang(1,2)
(1) College of Computer, National University of Defense Technology, Changsha, China
(2) National Key Laboratory for Parallel and Distributed Processing, Changsha, China
(a) l.luo@nudt.edu.cn

ABSTRACT


Convolutional Neural Networks (CNNs) have driven dramatic progress in emerging Machine Learning (ML) services. In contrast to online ML services, offline ML services with diverse CNN workloads are common in small and medium-sized enterprises (SMEs), research institutes, and universities. Efficiently scheduling and processing multiple CNN-based tasks on SME clusters is both important and challenging, because existing schedulers cannot predict the resource requirements of CNN-based tasks. In this paper, we propose GENIE, a QoS-guided dynamic scheduling framework for SME clusters that guarantees users' QoS while achieving high system utilization. Building on a prediction model derived from lightweight profiling, GENIE uses a QoS-guided scheduling strategy to identify the best placements for CNN-based tasks. We implement GENIE as a TensorFlow plugin and evaluate it on real SME clusters and in large-scale simulations. Experimental results show that the QoS-guided strategy outperforms baseline schedulers by up to 67.4% in QoS-guarantee percentage and up to 28.2% in makespan.
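The abstract does not specify GENIE's prediction model or placement rule, so the following is only an illustrative sketch, not the authors' algorithm. It assumes a hypothetical linear runtime model (seconds per iteration measured in a short profiling run, scaled by a per-node speed factor) and a greedy earliest-deadline-first placement that prefers nodes predicted to meet each task's QoS deadline, choosing the earliest predicted finish among them. It then computes the two reported metrics: QoS-guarantee percentage (share of tasks finishing by their deadline) and makespan (latest completion time, assuming all tasks are submitted at time 0). All class, function, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    iterations: int            # iterations the job must run
    profiled_iter_time: float  # seconds/iteration from a short profiling run (assumed)
    deadline: float            # QoS target: latest acceptable finish time (seconds)


@dataclass
class Node:
    name: str
    speed: float           # relative throughput vs. the profiling node (hypothetical)
    free_at: float = 0.0   # time at which this node becomes idle


def predict_runtime(task: Task, node: Node) -> float:
    # Hypothetical prediction model: profiled cost scaled by node speed.
    return task.iterations * task.profiled_iter_time / node.speed


def schedule(tasks: list[Task], nodes: list[Node]) -> dict[str, tuple[str, float]]:
    placements: dict[str, tuple[str, float]] = {}
    # Consider the most urgent QoS targets first (earliest deadline first).
    for task in sorted(tasks, key=lambda t: t.deadline):
        best = None
        for node in nodes:
            finish = node.free_at + predict_runtime(task, node)
            # Sort key: nodes that meet the deadline (False < True) come first,
            # then the earliest predicted finish time.
            key = (finish > task.deadline, finish)
            if best is None or key < best[0]:
                best = (key, node, finish)
        _, node, finish = best
        node.free_at = finish  # the task occupies the node until it finishes
        placements[task.name] = (node.name, finish)
    return placements


def qos_guarantee_pct(tasks: list[Task], placements: dict[str, tuple[str, float]]) -> float:
    met = sum(1 for t in tasks if placements[t.name][1] <= t.deadline)
    return 100.0 * met / len(tasks)


def makespan(placements: dict[str, tuple[str, float]]) -> float:
    # Assumes every task was submitted at time 0.
    return max(finish for _, finish in placements.values())


tasks = [
    Task("resnet", 100, 0.5, deadline=60.0),
    Task("vgg", 240, 0.25, deadline=40.0),
    Task("lenet", 80, 0.125, deadline=100.0),
]
nodes = [Node("n1", speed=1.0), Node("n2", speed=2.0)]
placements = schedule(tasks, nodes)
print(placements)  # {'vgg': ('n2', 30.0), 'resnet': ('n1', 50.0), 'lenet': ('n2', 35.0)}
print(qos_guarantee_pct(tasks, placements), makespan(placements))  # 100.0 50.0
```

In this toy run, the tight-deadline "vgg" task goes to the faster node because the slower one would miss its QoS target. A real scheduler would also account for memory, co-location interference, and task arrivals over time, which this sketch ignores.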

Keywords: QoS-guided, SME Cluster, Scheduling, CNN, Multi-task.
