Identifying the Most Reliable Collaborative Workload Distribution in Heterogeneous Devices

Gabriel Piscoya Dávila (a), Daniel Oliveira (b), Philippe Navaux (c) and Paolo Rech (d)
Institute of Informatics, UFRGS, Porto Alegre, Brazil
(a) gpdavila@inf.ufrgs.br
(b) dagoliveira@inf.ufrgs.br
(c) navaux@inf.ufrgs.br
(d) prech@inf.ufrgs.br

ABSTRACT


The constant need for higher performance and reduced power consumption has led vendors to design heterogeneous devices that embed a traditional CPU and an accelerator, such as a GPU or an FPGA. When the CPU and the accelerator are used collaboratively, the device reaches its peak computational performance. However, the larger amount of resources employed for computation has the potential side effect of increasing the soft error rate. In this paper we evaluate the reliability of AMD Kaveri Accelerated Processing Units executing a set of heterogeneous applications. We distribute the workload between the CPU and the GPU and evaluate which configuration provides the lowest error rate or allows the largest amount of data to be computed before a failure occurs. We show that, in most cases, the most reliable workload distribution is the one that delivers the highest performance. Our experiments show that choosing the correct workload distribution can increase device reliability by up to 9x.
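The two criteria named above (lowest error rate versus largest amount of data computed before a failure) can point to different workload distributions. The following is a minimal sketch of that comparison, not the authors' methodology: the `Split` type, the function names, and every number in `SPLITS` are hypothetical placeholders chosen for illustration and are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """One CPU/GPU workload distribution (all fields are illustrative)."""
    cpu_share: float        # fraction of the workload assigned to the CPU
    errors_per_hour: float  # assumed observed error rate
    data_per_hour: float    # assumed throughput, in arbitrary data units


def data_between_failures(split: Split) -> float:
    """Expected amount of data computed before one failure occurs."""
    return split.data_per_hour / split.errors_per_hour


def lowest_error_rate(splits: list[Split]) -> Split:
    """Criterion 1: the split with the fewest errors per unit of time."""
    return min(splits, key=lambda s: s.errors_per_hour)


def most_data_before_failure(splits: list[Split]) -> Split:
    """Criterion 2: the split that computes the most data per failure."""
    return max(splits, key=data_between_failures)


# Hypothetical placeholder values, NOT results from the paper.
SPLITS = [
    Split(cpu_share=0.0, errors_per_hour=2.0, data_per_hour=100.0),
    Split(cpu_share=0.5, errors_per_hour=3.0, data_per_hour=240.0),
    Split(cpu_share=1.0, errors_per_hour=1.5, data_per_hour=60.0),
]

if __name__ == "__main__":
    print("lowest error rate at CPU share:",
          lowest_error_rate(SPLITS).cpu_share)
    print("most data before failure at CPU share:",
          most_data_before_failure(SPLITS).cpu_share)
```

With these placeholder inputs the collaborative split has the highest error rate per hour yet still computes the most data per failure, because its throughput grows faster than its error rate. This is the kind of trade-off the second criterion is meant to capture.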


