Evaluating Impact of Human Errors on the Availability of Data Storage Systems

Mostafa Kishani, Reza Eftekhari and Hossein Asadi
Data Storage, Networks, & Processing (DSN) Lab, Department of Computer Engineering, Sharif University of Technology

ABSTRACT


In this paper, we investigate the effect of incorrect disk replacement service on the availability of data storage systems. To this end, we first conduct Monte Carlo simulations to evaluate the availability of the disk subsystem, considering both disk failures and incorrect disk replacement service. We also propose a Markov model that corroborates the Monte Carlo simulation results. We further extend the proposed model to consider the effect of an automatic disk fail-over policy. The results obtained by the proposed model show that overlooking the impact of incorrect disk replacement can lead to underestimating unavailability by up to three orders of magnitude. Moreover, this study suggests that, once the effect of human errors is considered, the conventional beliefs about the dependability of different RAID mechanisms should be revised. The results show that in the presence of human errors, RAID1 can result in lower availability than RAID5.


