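Reinforcement learning
Flocking (texture)
Computer science
Markov decision process
Collision
Distributed computing
Artificial intelligence
Scalability
Markov process
Mathematics
Computer security
Database
Statistics
Composite material
Materials science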
Authors
Chao Yan,Chang Wang,Xiaojia Xiang,Zhen Lan,Yuna Jiang
Source
Journal: IEEE Transactions on Industrial Informatics
[Institute of Electrical and Electronics Engineers]
Date: 2021-07-01
Volume (issue): 18 (2): 1260-1270
Citations: 55
Identifier
DOI:10.1109/tii.2021.3094207
Abstract
The evolution of artificial intelligence and the Internet of Things (IoT) envisions a highly integrated artificial IoT (AIoT) network. Flocking and cooperation with multiple unmanned aerial vehicles (UAVs) are expected to play a vital role in industrial AIoT networks. In this article, we formulate the collision-free flocking problem of fixed-wing UAVs as a Markov decision process and solve it in the deep reinforcement learning (DRL) framework. Our method can deal with a variable number of followers by encoding the dynamic environmental state into a fixed-length embedding tensor. Specifically, each follower constructs a fixed-size local situation map that describes the collision risks with other followers nearby. The local situation maps are used by a proposed DRL algorithm to learn the collision-free flocking behavior. To further improve the learning efficiency, we design a reference-point-based action selection strategy and an adaptive mechanism. We compare the proposed MA2D3QN algorithm with several benchmark DRL algorithms through numerical simulation, and we verify its advantages in learning efficiency and performance. Finally, we demonstrate the scalability and adaptability of MA2D3QN in a semiphysical simulation experiment.
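The idea of encoding a variable number of nearby followers into a fixed-size local map can be illustrated with a minimal Python sketch. This is not the construction used in the article: the function name `local_situation_map`, the grid size, the sensing range, and the linear distance-based risk value are all assumptions chosen for illustration only. It shows only that the output shape stays the same regardless of how many neighbors are present.

```python
import math

def local_situation_map(own_xy, neighbors_xy, size=8, sensing_range=40.0):
    """Rasterize nearby followers into a size x size grid centered on own_xy.

    Hypothetical encoding: each cell stores a risk value in [0, 1] that grows
    as the closest neighbor falling in that cell gets nearer. The grid shape
    does not depend on the number of neighbors.
    """
    grid = [[0.0] * size for _ in range(size)]
    cell = 2.0 * sensing_range / size  # side length of one grid cell
    for nx, ny in neighbors_xy:
        dx, dy = nx - own_xy[0], ny - own_xy[1]
        if abs(dx) >= sensing_range or abs(dy) >= sensing_range:
            continue  # neighbor lies outside the local window
        # Clamp indices to guard against floating-point rounding at the edge.
        col = min(size - 1, int((dx + sensing_range) / cell))
        row = min(size - 1, int((dy + sensing_range) / cell))
        dist = math.hypot(dx, dy)
        risk = max(0.0, 1.0 - dist / sensing_range)  # assumed linear risk
        grid[row][col] = max(grid[row][col], risk)
    return grid

# One neighbor and three neighbors yield maps of identical shape.
map_one = local_situation_map((0.0, 0.0), [(5.0, 0.0)])
map_three = local_situation_map(
    (0.0, 0.0), [(5.0, 0.0), (-20.0, 10.0), (500.0, 500.0)]
)
```

In a DRL setting, a fixed-shape map of this kind could be fed to a convolutional or fully connected network whose input size never changes as followers join or leave.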