Reinforcement learning
Markov decision process
Probabilistic logic
Computer science
Markov chain
Simplicity (philosophy)
Partially observable Markov decision process
Function (biology)
Artificial intelligence
Mathematical optimization
Machine learning
Markov process
Markov model
Mathematics
Biology
Evolutionary biology
Statistics
Epistemology
Philosophy
Source
Journal: Elsevier eBooks
[Elsevier]
Date: 1994-01-01
Pages: 157-163
Cited by: 2349
Identifier
DOI:10.1016/b978-1-55860-335-6.50027-1
Abstract
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
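The core step that distinguishes the paper's Q-learning-like algorithm from ordinary Q-learning is evaluating each state by solving the zero-sum matrix game formed by the two agents' joint actions, which is why the resulting optimal policy can be probabilistic. As an illustrative sketch (not the paper's own code): in general this inner step is a linear program over mixed strategies, but for a 2x2 game it has a closed form, shown here on matching pennies. The function name `solve_2x2` and the example game are assumptions for illustration.

```python
def solve_2x2(A):
    """Solve the 2x2 zero-sum matrix game A (row player maximizes).

    Returns (p, v): the row player's optimal probability of playing
    action 0, and the value of the game.
    """
    (a, b), (c, d) = A
    # If a pure-strategy saddle point exists, use it.
    maximin = max(min(a, b), min(c, d))  # best guaranteed row payoff
    minimax = min(max(a, c), max(b, d))  # best guaranteed column cap
    if maximin == minimax:
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        return p, float(maximin)
    # Otherwise the optimal policy is a mixed strategy (closed form for 2x2).
    denom = a - b - c + d
    p = (d - c) / denom
    v = (a * d - b * c) / denom
    return p, v

# Matching pennies: no pure saddle point, so the optimal policy is
# probabilistic -- mix 50/50 for a game value of 0.
pennies = [[1, -1], [-1, 1]]
p, v = solve_2x2(pennies)
print(p, v)  # → 0.5 0.0
```

In the full algorithm this value replaces the `max` over actions in the standard Q-learning backup; for games with more than two actions per agent the inner step would be solved with a linear program instead of the closed form above.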