DATE 2019

Memory Trojan Attack on Neural Network Accelerators

Yang Zhao^1,a, Xing Hu^1,b, Shuangchen Li^1,c, Jing Ye^2,3, Lei Deng^1,d, Yu Ji^1,4,f, Jianyu Xu^1,4,g, Dong Wu^1,4,h and Yuan Xie^1,e
¹University of California, Santa Barbara
^a yang_zhao@ece.ucsb.edu
^b huxing@ece.ucsb.edu
^c shuangchenli@ece.ucsb.edu
^d leideng@ece.ucsb.edu
^e yuanxie@ece.ucsb.edu
²Institute of Computing Technology, Chinese Academy of Sciences
³University of Chinese Academy of Sciences
yejing@ict.ac.cn
⁴Tsinghua University
^f jiy15@mails.tsinghua.edu.cn
^g xu-jy15@mails.tsinghua.edu.cn
^h dongwu@tsinghua.edu.cn

ABSTRACT

Neural network accelerators are widely deployed in application systems for computer vision, speech recognition, and machine translation. The ubiquitous deployment of these systems gives adversaries a strong incentive to attack such artificial intelligence (AI) systems. The Trojan is one of the most important attack models in the hardware security domain. Hardware Trojans are malicious modifications that adversaries insert into original ICs and that cause the system to malfunction once triggered. The globalization of the semiconductor industry gives adversaries an opportunity to conduct hardware Trojan attacks.

Previous works design neural network (NN) Trojans assuming access to the model, toolchain, and hardware platform. However, this threat model is impractical, which hinders the real-world adoption of such attacks. In this work, we propose a memory Trojan methodology that requires neither toolchain manipulation nor model parameter information. We first leverage memory access patterns to identify the input image data. We then propose a Trojan triggering method based on a dedicated input image rather than on circuit events, which offers better controllability. The triggering mechanism works well even in the presence of environmental noise and preprocessing applied to the original images. Finally, we implement an accuracy degradation attack and verify its effectiveness.
