Memory Trojan Attack on Neural Network Accelerators
Yang Zhao1,a, Xing Hu1,b, Shuangchen Li1,c, Jing Ye2,3, Lei Deng1,d, Yu Ji1,4,f, Jianyu Xu1,4,g, Dong Wu1,4,h and Yuan Xie1,e
1University of California, Santa Barbara
a: yang_zhao@ece.ucsb.edu
b: huxing@ece.ucsb.edu
c: shuangchenli@ece.ucsb.edu
d: leideng@ece.ucsb.edu
e: yuanxie@ece.ucsb.edu
2Institute of Computing Technology, Chinese Academy of Sciences
3University of Chinese Academy of Sciences
yejing@ict.ac.cn
4Tsinghua University
f: jiy15@mails.tsinghua.edu.cn
g: xu-jy15@mails.tsinghua.edu.cn
h: dongwu@tsinghua.edu.cn
ABSTRACT
Neural network accelerators are widely deployed in application systems for computer vision, speech recognition, and machine translation. The ubiquity of these systems gives adversaries a strong incentive to attack such artificial intelligence (AI) systems. The Trojan is one of the most important attack models in the hardware security domain. Hardware Trojans are malicious modifications that adversaries insert into original ICs and that cause the system to malfunction once triggered. The globalization of the semiconductor industry gives adversaries the opportunity to mount hardware Trojan attacks.
Previous works design neural network (NN) Trojans under the assumption that the adversary has access to the model, toolchain, and hardware platform. Such a threat model is impractical, which hinders the real-world adoption of these attacks. In this work, we propose a memory Trojan methodology that requires neither toolchain manipulation nor knowledge of the model parameters. We first leverage memory access patterns to identify the input image data. We then propose a Trojan triggering method based on a dedicated input image rather than on circuit events, which offers better controllability. The triggering mechanism works well even in the presence of environmental noise and preprocessing applied to the original images. Finally, we implement the accuracy degradation attack and verify its effectiveness.
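To make the idea of an image-based trigger that tolerates noise and preprocessing more concrete, the following is a minimal behavioral sketch in Python. It is not the triggering circuit described in this paper: the coarse block signature, the Hamming-distance threshold, and all names (`block_signature`, `is_triggered`, `max_mismatch`) are illustrative assumptions. It also assumes that the input image has already been located in the memory traffic, a step the sketch does not model.

```python
import random

def block_signature(pixels, width, height, grid=4):
    """Hypothetical coarse signature: one bit per block, set when the
    block mean is above the mean of all block means."""
    bh, bw = height // grid, width // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            total = 0
            for y in range(by * bh, (by + 1) * bh):
                for x in range(bx * bw, (bx + 1) * bw):
                    total += pixels[y * width + x]
            means.append(total / (bh * bw))
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def is_triggered(pixels, width, height, trigger_sig, max_mismatch=1):
    """Assumed trigger rule: fire when the signature of the observed
    image is within max_mismatch bits of the stored trigger signature."""
    sig = block_signature(pixels, width, height)
    return hamming(sig, trigger_sig) <= max_mismatch

W = H = 16

# Illustrative trigger image: a 4x4 checkerboard of bright and dark blocks.
trigger = [200 if ((x // 4) + (y // 4)) % 2 == 0 else 50
           for y in range(H) for x in range(W)]
trigger_sig = block_signature(trigger, W, H)

# Same image after simulated sensor noise and a brightness rescale.
rng = random.Random(0)
noisy = [min(255, max(0, int(0.8 * p) + rng.randint(-10, 10)))
         for p in trigger]

# A benign image: a horizontal gradient.
gradient = [x * 16 for y in range(H) for x in range(W)]

print(is_triggered(noisy, W, H, trigger_sig))
print(is_triggered(gradient, W, H, trigger_sig))
```

Because the signature depends only on whether each block is brighter or darker than the average block, small per-pixel noise and uniform brightness scaling leave it unchanged, while an unrelated image such as the gradient differs in many bits. A hardware realization would need a much cheaper comparison than this software model suggests; the sketch is only meant to illustrate why a data-based trigger can be made tolerant to such perturbations.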