Reinforcement Learning
Decision Support Systems
Psychology
Computer Science
Artificial Intelligence
Social Psychology
Source
Journal: Defence Science Journal
[Defence Scientific Information and Documentation Centre]
Date: 2024-02-26
Volume/Issue: 74 (3): 389-398
Identifier
DOI: 10.14429/dsj.74.18864
Abstract
While recent advanced military operational concepts require intelligent support of command and control, Reinforcement Learning (RL) has not been actively studied in the military domain. This study points out the limitations of RL for military applications identified in a literature review and aims to improve the understanding of RL for military decision support under these limitations. Above all, the black-box characteristic of Deep RL, together with complex simulation tools, makes the internal process difficult to understand. A scalable weapon-selection RL framework is built that can be solved either in a tabular form or in a neural-network form. Transferring the Deep Q-Network (DQN) solution into the tabular form makes it easier to compare the result with the Q-learning solution. Furthermore, rather than using one or two RL models selectively as in prior work, the RL models are categorized as actors and critics and compared systematically. A random agent, Q-learning and DQN agents as critics, a Policy Gradient (PG) agent as an actor, and Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) agents as actor-critic approaches are designed, trained, and tested. The performance results show that the trained DQN and PPO agents are the best decision-support candidates for the weapon-selection RL framework.
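To make the tabular side of such a framework concrete, below is a minimal sketch of a Q-learning agent on a toy weapon-selection task. The environment (threat types, per-weapon kill probabilities, reward) is a hypothetical stand-in rather than the paper's simulation; only the epsilon-greedy action selection and the Q-learning update mirror the standard critic baseline the abstract refers to.

```python
# Minimal, self-contained Q-learning sketch for a toy weapon-selection task.
# The environment below is a hypothetical stand-in, not the paper's simulator.
import numpy as np

rng = np.random.default_rng(0)

N_THREAT_TYPES = 4   # hypothetical discrete state: type of incoming threat
N_WEAPONS = 3        # hypothetical discrete action: which weapon to assign

# Hypothetical engagement success probabilities (weapon effectiveness per threat)
P_KILL = rng.uniform(0.2, 0.9, size=(N_THREAT_TYPES, N_WEAPONS))

def step(state, action):
    """One engagement: reward 1 on a successful intercept, 0 otherwise."""
    reward = float(rng.random() < P_KILL[state, action])
    next_state = int(rng.integers(N_THREAT_TYPES))  # next threat arrives at random
    return next_state, reward

# Tabular Q-learning: the critic baseline that a DQN solution can be compared against
Q = np.zeros((N_THREAT_TYPES, N_WEAPONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = int(rng.integers(N_THREAT_TYPES))
for _ in range(50_000):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        action = int(rng.integers(N_WEAPONS))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # standard Q-learning temporal-difference update
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("Learned policy (best weapon per threat type):", Q.argmax(axis=1))
print("True best weapon per threat type:            ", P_KILL.argmax(axis=1))
```

Under this kind of setup, a trained DQN could be placed side by side with the table by evaluating its Q-network on every discrete state and writing the outputs into an array of the same shape, which is the sort of tabular transfer the abstract describes for comparing the DQN and Q-learning solutions.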