Fault-Criticality Assessment for AI Accelerators using Graph Convolutional Networks

Arjun Chaudhuri¹, Jonti Talukdar¹, Jinwook Jung², Gi-Joon Nam², and Krishnendu Chakrabarty¹
¹ Department of Electrical and Computer Engineering, Duke University, Durham, NC
² IBM Thomas J. Watson Research Center, Yorktown Heights, NY

ABSTRACT


Owing to the inherent fault tolerance of deep neural networks (DNNs), many structural faults in DNN accelerators tend to be functionally benign. To identify the functionally critical faults, we analyze the functional impact of stuck-at faults in the processing elements of a 128×128 systolic-array accelerator that performs inference on the MNIST dataset. We present a two-tier machine-learning framework that leverages graph convolutional networks (GCNs) to quickly assess the functional criticality of structural faults. We describe a computationally efficient methodology for data sampling and feature engineering to train the GCN-based framework. The proposed framework achieves up to 90% classification accuracy with negligible misclassification of critical faults.
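
To illustrate what a graph convolution does on a netlist-like graph, the following is a minimal sketch of a single GCN layer in plain Python. It is not the framework described above: the abstract does not specify the GCN variant, the node features, or the two-tier structure, so the sketch assumes the commonly used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W). The three-node graph, the two-element feature vectors, and the weight values are hypothetical and chosen only for illustration.

```python
import math


def gcn_layer(adj, feats, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj:     n x n adjacency matrix (0/1 entries, undirected)
    feats:   n x f node-feature matrix X
    weights: f x h weight matrix W (would be learned in practice)
    """
    n = len(adj)
    n_feats = len(feats[0])
    n_hidden = len(weights[0])

    # Add self-loops so every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]

    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]

    # Symmetric normalization: entry (i, j) is scaled by 1/sqrt(d_i * d_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_feats)]
           for i in range(n)]

    # Linear transform followed by ReLU: max(0, agg @ weights).
    return [[max(0.0, sum(agg[i][f] * weights[f][h] for f in range(n_feats)))
             for h in range(n_hidden)]
            for i in range(n)]


# Hypothetical 3-node chain (node 0 - node 1 - node 2), e.g. three connected
# gates in a processing element; all values below are illustrative only.
ADJ = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
FEATS = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
WEIGHTS = [[0.5, -0.5],
           [0.25, 0.75]]

EMBEDDINGS = gcn_layer(ADJ, FEATS, WEIGHTS)
```

In a fault-criticality setting, per-node embeddings like `EMBEDDINGS` would typically feed a classifier that labels each fault site as critical or benign; how the proposed framework does this is not stated in the abstract.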


