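Inference
Artificial intelligence
Bayesian inference
Bayesian probability
Opponent
Computer science
Tracking (education)
Machine learning
Psychology
Computer security
Pedagogy
Authors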
Kuei-Tso Lee,Yen-Yun Huang,Je-Ruei Yang,Sheng-Jyh Wang
Source
Journal: IEEE Transactions on Games
[Institute of Electrical and Electronics Engineers]
Date: 2023-06-12
Volume (issue): 16 (2): 419-430
Citations: 1
Identifier
DOI:10.1109/tg.2023.3285031
Abstract
In a multi-agent competitive environment, it is important for an agent to detect the opponent's policy and adopt a suitable policy to exploit the opponent. Conventionally, most methods, e.g., Bayesian Policy Reuse (BPR) variants, assume the opponent adopts a fixed policy or a randomly changing policy. In this paper, we make a more realistic and reasonable assumption that the opponent may select its policy based on the previous observation. Here, we define the term "strategy" as the mapping from the previous observation to the opponent's selected policy, and we propose the Bayesian Strategy Inference (BSI) framework to infer the opponent's strategy. Furthermore, to deal with opponents who may randomly select their policies, the BSI framework is combined with an intra-episode policy tracking mechanism to construct the Bayesian Strategy Inference plus Policy Tracking (BSI-PT) algorithm. In our experiments, we design an extended batter vs. pitcher game (EBvPG) for the evaluation of the proposed BSI-PT framework. The experimental results demonstrate that BSI-PT obtains higher policy prediction accuracy and winning percentage than three other BPR variants against the opponents with a specific policy selection strategy, with a random selection strategy, or with a partially random strategy.
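The abstract defines a "strategy" as a mapping from the previous observation to the opponent's selected policy, and describes inferring that strategy in a Bayesian way. The following is a minimal sketch of that idea only, not the paper's BSI or BSI-PT algorithm: it keeps a belief over a small set of candidate strategies and applies one Bayes update per episode. The strategy names, policy names, observation labels, the noise parameter `eps`, and the assumption that the opponent's policy in a finished episode has already been identified are all hypothetical choices made for illustration. The intra-episode policy tracking mechanism is not modeled.

```python
# Illustrative sketch: Bayesian belief update over candidate opponent strategies.
# A "strategy" here is a dict mapping the previous observation to a policy name.
# All names and the noise model are assumptions, not taken from the paper.

def update_strategy_belief(belief, strategies, policies, prev_obs, observed_policy, eps=0.1):
    """Return the posterior over strategies after one observed opponent policy.

    Likelihood model (assumed): a strategy's predicted policy is observed with
    probability 1 - eps; the remaining eps is spread evenly over the other policies.
    """
    posterior = {}
    for name, prior in belief.items():
        predicted = strategies[name][prev_obs]
        if predicted == observed_policy:
            likelihood = 1.0 - eps
        else:
            likelihood = eps / (len(policies) - 1)
        posterior[name] = prior * likelihood
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}


def predict_policy(belief, strategies, prev_obs):
    """Predict the opponent's next policy by summing belief mass per policy."""
    scores = {}
    for name, p in belief.items():
        policy = strategies[name][prev_obs]
        scores[policy] = scores.get(policy, 0.0) + p
    return max(scores, key=scores.get)


# Hypothetical setup: two pitcher policies and two candidate strategies.
policies = ["fastball", "curveball"]
strategies = {
    "switch_after_hit": {"hit": "curveball", "miss": "fastball"},
    "always_fastball": {"hit": "fastball", "miss": "fastball"},
}
belief = {name: 1.0 / len(strategies) for name in strategies}  # uniform prior

# The previous episode ended in a hit, and the opponent then used a curveball.
belief = update_strategy_belief(belief, strategies, policies, "hit", "curveball")
next_policy = predict_policy(belief, strategies, "hit")
```

With a uniform prior, a single observation that matches only one candidate strategy moves most of the belief onto it, and the predicted next policy follows that strategy's mapping. An agent would then pick its own response policy against the predicted one.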