Reliability of Google's Tensor Processing Units for Embedded Applications

Rubens Luiz Rech Junior1 and Paolo Rech1,2
1Institute of Informatics, UFRGS, Porto Alegre, Brazil
2DAUIN, Politecnico di Torino, and DII, Università di Trento, Italy

ABSTRACT


Convolutional Neural Networks (CNNs) have become the most widely used and efficient way to identify and classify objects in a scene. Today CNNs are fundamental not only for autonomous vehicles, but also for the Internet of Things (IoT), smart cities, and smart homes. Vendors are developing low-power, efficient, and low-cost dedicated accelerators to allow the execution of computationally demanding CNNs even in embedded applications with strict power and cost budgets. Google's Coral Tensor Processing Unit (TPU) is one of the latest low-power accelerators for CNNs. In this paper we investigate the reliability of TPUs exposed to atmospheric neutrons, reporting experimental data equivalent to more than 30 million years of natural irradiation. We analyze the behavior of TPUs executing atomic operations (standard or depthwise convolutions) with increasing input sizes, as well as eight CNN designs typical of embedded applications, including transfer learning and reduced data-set configurations. We found that, despite the high error rate, most neutron-induced errors only slightly modify the convolution output and do not change the CNN's detection or classification. By reporting details about the fault model and error rate, we provide valuable information on how to evaluate and improve the reliability of CNNs executed on a TPU.


