Reinforcement learning
Computer science
Artificial intelligence
Machine learning
Convergence (economics)
Tracking (education)
Field (mathematics)
Obstacle
Psychology
Pedagogy
Mathematics
Political science
Pure mathematics
Law
Economics
Economic growth
Authors
Jiahua Wang, Ping Zhang, Yan Wang
Identifier
DOI:10.1016/j.asoc.2023.110604
Abstract
In recent years, deep reinforcement learning (DRL) has developed rapidly and has been applied to multi-UAV target tracking (MTT) research. However, DRL still faces challenges in data utilization and learning speed. To address these problems, a novel two-stage DRL-based multi-UAV decision-making method is proposed in this paper. Specifically, a sample generator combining an artificial potential field with a proportional–integral–derivative controller is used to produce expert experience data. On this basis, a two-stage reinforcement learning training method is introduced. In the first stage, the policy network and critic network are pre-trained on the expert data using a behavior cloning loss combined with an additional Q-value loss, which reduces ineffective exploration and speeds up learning. In the second stage, the average return of the most recent k excellent episodes is calculated to screen out high-quality experience generated by the agent itself, which is then used to guide the policy network toward high-reward actions, thus improving the efficiency of data utilization. Extensive simulation experiments show that our method not only enables multiple UAVs to continuously track the target in obstacle environments but also significantly improves the learning speed and convergence.
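The second-stage screening step described in the abstract can be sketched in code. The following is a minimal, hypothetical Python illustration, not the paper's implementation: an episode is kept as guidance data when its return meets or exceeds the running average return of the last k episodes that previously qualified as excellent. The class and method names, the ">=" acceptance rule, and the initial threshold are all assumptions for illustration.

```python
from collections import deque


class ExcellentEpisodeFilter:
    """Screen self-generated episodes against the average return of the
    last k 'excellent' episodes (hypothetical sketch of the paper's
    second-stage experience screening; exact rule may differ)."""

    def __init__(self, k=5, initial_threshold=0.0):
        self.recent_returns = deque(maxlen=k)  # last k excellent returns
        self.initial_threshold = initial_threshold

    @property
    def threshold(self):
        # Before any episode qualifies, fall back to the initial threshold.
        if not self.recent_returns:
            return self.initial_threshold
        return sum(self.recent_returns) / len(self.recent_returns)

    def consider(self, episode_return):
        """Return True if the episode should be stored as guidance data,
        and if so, record its return in the rolling excellent set."""
        if episode_return >= self.threshold:
            self.recent_returns.append(episode_return)
            return True
        return False
```

As episodes improve, the moving-average threshold rises, so only experience that outperforms the agent's own recent best is fed back to guide the policy network.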