A Machine Learning Based Write Policy for SSD Cache in Cloud Block Storage

Yu Zhang1, Ke Zhou1, Ping Huang1, Hua Wang1,a, Jianying Hu2, Yangtao Wang1, Yongguang Ji2 and Bin Cheng2
1Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of data storage systems and Technology, Ministry of Education of China, School of Computer Science & Technology, Huazhong University of Science & Technology
a hwang@hust.edu.cn
2Tencent Technology (Shenzhen) Co., Ltd.

ABSTRACT


SSD caches play an important role in cloud storage systems. The associated write policy, which controls the admission of data into the cache, has a significant impact on cache performance and on the amount of write traffic to the SSD cache. Our analysis of a typical cloud block storage system shows that approximately 47.09% of writes are write-only, i.e., writes to blocks that have not been read within a certain time window. Naively writing write-only data to the SSD cache introduces a large number of harmful writes that contribute nothing to cache performance. However, identifying and filtering out write-only data in real time is challenging, especially in a cloud environment running diverse and changing workloads.
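As an illustration of the write-only definition above, the following minimal sketch computes the fraction of write-only writes in a trace. It is not the authors' analysis tool: the trace format `(timestamp, op, block_id)`, the function name `write_only_fraction`, and the choice to measure the window forward from each write are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict


def write_only_fraction(trace, window):
    """Return the fraction of writes whose block is not read within `window`.

    trace: iterable of (timestamp, op, block_id), with op in {"R", "W"}.
    Assumption: the window is counted forward from the time of each write.
    """
    trace = sorted(trace)  # order by timestamp
    reads = defaultdict(list)  # block_id -> sorted read timestamps
    for ts, op, block in trace:
        if op == "R":
            reads[block].append(ts)

    writes = [(ts, block) for ts, op, block in trace if op == "W"]
    if not writes:
        return 0.0

    write_only = 0
    for ts, block in writes:
        block_reads = reads.get(block, [])
        i = bisect_right(block_reads, ts)  # index of first read after the write
        if i == len(block_reads) or block_reads[i] > ts + window:
            write_only += 1
    return write_only / len(writes)
```

For example, with a trace in which one of three written blocks is read shortly after its write, the function reports that two thirds of the writes are write-only for a sufficiently small window.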
In this paper, to address this problem, we propose ML-WP, a Machine Learning Based Write Policy, which reduces write traffic to SSDs by avoiding caching write-only data. The main challenge is identifying write-only data in real time. To achieve accurate identification, ML-WP uses machine learning methods to classify data into two groups (write-only and normal data). Based on this classification, write-only data is written directly to backend storage without being cached. Experimental results show that, compared with the write-back policy widely deployed in industry, ML-WP decreases write traffic to the SSD cache by 41.52%, while improving the hit ratio by 2.61% and reducing the average read latency by 37.52%.
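The admission decision described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the ML-WP implementation: the class name `MLWritePolicySketch`, the `is_write_only` callable standing in for a trained classifier, the `features` argument, and the in-memory dictionaries standing in for the SSD cache and backend storage are all hypothetical.

```python
class MLWritePolicySketch:
    """Toy write path: bypass the cache for writes predicted to be write-only."""

    def __init__(self, is_write_only):
        # Hypothetical classifier: maps a feature dict to True (write-only)
        # or False (normal data).
        self.is_write_only = is_write_only
        self.cache = {}      # stands in for the SSD cache
        self.backend = {}    # stands in for backend storage
        self.ssd_writes = 0  # counts writes that reach the SSD cache

    def write(self, block, data, features):
        if self.is_write_only(features):
            # Assumption: drop any stale cached copy to keep reads consistent.
            self.cache.pop(block, None)
            self.backend[block] = data  # written directly, not cached
        else:
            self.cache[block] = data    # normal data is admitted to the cache
            self.ssd_writes += 1

    def read(self, block):
        # Serve from the cache when possible, otherwise from the backend.
        if block in self.cache:
            return self.cache[block]
        return self.backend.get(block)
```

A real policy would also flush dirty cached blocks to the backend and fill the cache on read misses; both are omitted here to keep the bypass decision visible.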

Keywords: Cache Write Policy, Cloud Block Storage, Machine Learning, SSD.


