SafeDM: a Hardware Diversity Monitor for Redundant Execution on Non-lockstepped Cores

Francisco Bas1,2, Pedro Benedicte1, Sergi Alcaide1, Guillem Cabo1, Fabio Mazzocchetti1 and Jaume Abella1
1Barcelona Supercomputing Center (BSC)
2Universitat Polit`ecnica de Catalunya (UPC)

ABSTRACT


Computing systems in the safety domain, such as those in avionics or space, require specific safety measures related to the criticality of the deployment. A problem these systems face is that of transient failures in hardware. A solution commonly used to tackle potential failures is to introduce redundancy in these systems, for example 2 cores that execute the same program at the same time. However, redundancy does not solve all potential failures, such as Common Cause Failures (CCF), where a single fault affects both cores identically (e.g. a voltage droop). If both redundant cores have identical state when the fault occurs, then there may be a CCF since the fault can affect both cores in the same way. To avoid CCF it is critical to know that there is diversity in the execution amongst the redundant cores.

In this paper we introduce SafeDM, a hardware Diversity Monitor that quantifies the diversity of each redundant processor to guarantee that CCF will not go unnoticed, and without needing to deploy lockstepped cores. SafeDM computes data and instruction diversity separately, using different techniques appropriate for each case. We integrate SafeDM in a RISC-V FPGA space MPSoC from Cobham Gaisler where SafeDM is proven effective with a large benchmark suite, incurring low area and power overheads. Overall, SafeDM is an effective hardware solution to quantify diversity in cores performing redundant execution.



Full Text (PDF)