Improving MPSoC Reliability through Adapting Runtime Task Schedule based on Time-Correlated Fault Behavior
Laura A. Rozo Duque, Jose M. Monsalve Diaz and Chengmo Yang
Electrical and Computer Engineering, University of Delaware, 140 Evans Hall, Newark, DE 19716, USA.
The increasing susceptibility of multicore systems to temperature variations, environmental conditions and various aging effects has made system reliability a crucial concern. Because these factors are unpredictable, fault behavior is diverse in nature, and the runtime task scheduler should take this diversity into account to improve overall system reliability. To achieve this goal, this paper proposes a fault-tolerant approach that models core reliability at runtime and tunes resource allocation accordingly. Because faults vary in duration, we propose a reliability model that tracks not only the faults that appear in each core but also their correlation in time. Taking this model as input, we also propose a runtime scheduling algorithm that allocates critical and vulnerable tasks to reliable cores. Experimental results show that the proposed adaptive technique delivers up to a 56% improvement in application execution time compared to other techniques.
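The abstract does not give the reliability model's equations or the scheduler's rules, so the following is only a hypothetical illustration of the two ideas it names, not the authors' method. It assumes that each core keeps a fault score that decays exponentially over time (the `decay` factor is an invented parameter), so that faults clustered close together in time raise a core's risk more than the same number of widely spaced faults. A greedy mapper then places the most critical tasks on the lowest-risk cores. All names (`CoreReliability`, `record_fault`, `risk`, `schedule`) are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CoreReliability:
    """Hypothetical per-core fault tracker (an assumption, not the paper's model).

    Each fault adds 1.0 to a score that decays by `decay` per time unit, so
    time-correlated bursts of faults yield a higher score than the same
    number of faults spread far apart in time.
    """
    decay: float = 0.5        # assumed per-time-unit decay factor
    score: float = 0.0
    last_update: float = 0.0

    def _decay_to(self, t: float) -> None:
        # Time must be non-decreasing, otherwise the score would grow.
        if t < self.last_update:
            raise ValueError("time must be non-decreasing")
        self.score *= self.decay ** (t - self.last_update)
        self.last_update = t

    def record_fault(self, t: float) -> None:
        """Register a fault observed on this core at time t."""
        self._decay_to(t)
        self.score += 1.0

    def risk(self, t: float) -> float:
        """Current (decayed) fault score; higher means less reliable."""
        self._decay_to(t)
        return self.score


def schedule(tasks: Dict[str, int],
             cores: Dict[str, CoreReliability],
             t: float) -> Dict[str, str]:
    """Greedy mapping: most critical task goes to the lowest-risk core.

    `tasks` maps task name -> criticality (higher = more critical).
    If there are more tasks than cores, the assignment wraps around.
    """
    ranked_cores = sorted(cores, key=lambda c: cores[c].risk(t))
    ranked_tasks = sorted(tasks, key=lambda name: tasks[name], reverse=True)
    return {task: ranked_cores[i % len(ranked_cores)]
            for i, task in enumerate(ranked_tasks)}
```

For example, a core that suffered two faults at t=9 and t=10 is ranked as riskier at t=10 than a core whose two faults happened at t=0 and t=1, so a highly critical task would be kept away from it. A real scheduler would also weigh load, deadlines and migration cost, which this sketch deliberately ignores.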