Towards Cross-Platform Inference on Edge Devices with Emerging Neuromorphic Architecture
Shangyu Wu (1,a), Yi Wang (1,b), Amelie Chi Zhou (1,c), Rui Mao (1,d), Zili Shao (2), and Tao Li (3)
1 The National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
  a shangyuwu1006@gmail.com
  b yiwang@szu.edu.cn
  c chi.zhou@szu.edu.cn
  d mao@szu.edu.cn
2 The Chinese University of Hong Kong, Hong Kong, China
  shao@cse.cuhk.edu.hk
3 University of Florida, Gainesville, FL, USA
  taoli@ece.ufl.edu
ABSTRACT
Deep convolutional neural networks have become the mainstream solution for many artificial intelligence applications. However, they are still rarely deployed on mobile or edge devices because of the cost of moving substantial amounts of data among limited resources. The emerging processing-in-memory neuromorphic architecture offers a promising direction for accelerating the inference process. The key issue is how to effectively allocate inference processing between the computing and storage resources of an edge device.
This paper presents Mobile-I, a resource allocation scheme that accelerates the Inference process on Mobile or edge devices. Mobile-I targets the emerging 3D neuromorphic architecture to reduce the processing latency across computing resources and to fully utilize the limited on-chip storage resources. We formulate the target problem as a resource allocation problem and adopt a software-based solution that enables cross-platform deployment across multiple mobile or edge devices. We conduct a set of experiments using realistic workloads generated from the Intel Movidius Neural Compute Stick. Experimental results show that, compared with representative schemes, Mobile-I effectively reduces processing latency and improves the utilization of computing resources with negligible overhead.
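To make the flavor of such a resource allocation problem concrete, the Python sketch below shows one simple way network layers could be greedily mapped onto processing elements under a per-element on-chip memory budget, trading off latency balance against limited storage. This is only an illustrative assumption for exposition; the class names, numbers, and greedy policy are hypothetical and do not reproduce Mobile-I's actual algorithm.

```python
# Hypothetical sketch: greedily map layers onto processing elements (PEs)
# so that the per-PE compute load stays balanced while each PE's limited
# on-chip memory budget is respected. Illustrative only, not Mobile-I.
from dataclasses import dataclass, field

@dataclass
class PE:
    name: str
    mem_capacity: float           # on-chip storage budget (e.g., KB)
    load: float = 0.0             # accumulated compute latency (e.g., ms)
    mem_used: float = 0.0
    layers: list = field(default_factory=list)

def allocate(layers, pes):
    """Assign each (layer, latency, memory) tuple to the feasible PE
    whose resulting load would be smallest."""
    for name, latency, mem in layers:
        feasible = [p for p in pes if p.mem_used + mem <= p.mem_capacity]
        if not feasible:
            raise RuntimeError(f"no PE can hold layer {name}")
        target = min(feasible, key=lambda p: p.load + latency)
        target.load += latency
        target.mem_used += mem
        target.layers.append(name)
    return pes

if __name__ == "__main__":
    # Layer tuples: (name, latency in ms, on-chip memory footprint in KB).
    layers = [("conv1", 2.0, 64), ("conv2", 3.5, 128), ("fc1", 1.0, 128)]
    pes = [PE("pe0", 256), PE("pe1", 256)]
    for pe in allocate(layers, pes):
        print(pe.name, pe.layers, f"load={pe.load}ms mem={pe.mem_used}KB")
```

A greedy heuristic like this is merely a baseline; the formulation in the paper could equally be solved with integer programming or a latency-aware scheduler, and a software-based solution of this kind is what allows the same allocation logic to be redeployed across different edge devices.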
Keywords: Edge computing, Memory management, Scheduling, Neuromorphic architecture, Inference.