M2M-Routing: Environmental Adaptive Multi-agent Reinforcement Learning based Multi-hop Routing Policy for Self-Powered IoT Systems

Wen Zhang 1,a, Jeff Zhang 2, Mimi Xie 3, Tao Liu 4, Wenlu Wang 1,b, and Chen Pan 1,c
1 Department of Computer Science, Texas A&M University–Corpus Christi, Corpus Christi, USA
a wzhang3@islander.tamucc.edu
b wenlu.wang@tamucc.edu
c chen.pan@tamucc.edu
2 Department of Electrical Engineering, Harvard University, Cambridge, USA
jeffzhang@seas.harvard.edu
3 Department of Computer Science, University of Texas at San Antonio, San Antonio, USA
mimi.xie@utsa.edu
4 Department of Math and Computer Science, Lawrence Technological University, Southfield, USA
tliu3@ltu.edu

ABSTRACT


Energy harvesting (EH) technologies support the ongoing proliferation of IoT devices with sustainable power supplies. However, the intrinsically weak and unstable nature of EH results in frequent and unpredictable power interruptions in EH IoT devices, which in turn cause packet loss and reconnection failures in IoT networks. Conventional routing and energy allocation methods are therefore inefficient in EH environments, and the complexity of those environments is a major obstacle to intelligent routing and energy allocation. To address these problems, this work proposes M2M-Routing, an environment-adaptive Deep Reinforcement Learning (DRL)-based multi-hop routing policy that jointly optimizes energy allocation and routing and mitigates these challenges by leveraging offline computation resources. Multiple DRL models are prepared offline for the complex energy harvesting environment. At runtime, the historical power trace most similar to the query trace is retrieved to identify a model ID, and the corresponding pre-trained DRL model is selected to manage energy allocation and routing on the query power trace. Simulation results indicate that M2M-Routing improves the amount of delivered data by ∼3× to ∼4× compared with the baselines.
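
As a minimal illustration of the model-selection step described above (not the authors' implementation; the Euclidean distance metric, fixed trace length, and all names here are assumptions for exposition), the sketch below matches a query power trace to its nearest historical trace and returns the associated pre-trained model ID:

```python
import numpy as np

def select_model_id(query_trace, historical_traces):
    """Return the ID of the pre-trained DRL model whose representative
    historical power trace is closest to the query trace.

    query_trace: 1-D array of harvested-power samples.
    historical_traces: dict mapping model_id -> 1-D array of the
        same length as query_trace.
    """
    query = np.asarray(query_trace, dtype=float)
    best_id, best_dist = None, float("inf")
    for model_id, trace in historical_traces.items():
        # Euclidean distance as a simple trace-similarity measure
        # (an assumption; the paper does not specify the metric here).
        dist = np.linalg.norm(query - np.asarray(trace, dtype=float))
        if dist < best_dist:
            best_id, best_dist = model_id, dist
    return best_id

# Toy usage: three hypothetical offline-trained models keyed by
# representative harvesting conditions.
historical = {
    "sunny":  np.array([0.90, 0.80, 0.85, 0.90]),
    "cloudy": np.array([0.40, 0.35, 0.50, 0.45]),
    "night":  np.array([0.05, 0.00, 0.00, 0.05]),
}
print(select_model_id([0.42, 0.40, 0.48, 0.50], historical))  # -> "cloudy"
```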
