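Reinforcement learning
Convergence (economics)
Computer science
Markov decision process
Markov chain
Set (abstract data type)
Function (biology)
Bellman equation
Value (mathematics)
Artificial intelligence
Markov process
Reinforcement
Mathematical optimization
Machine learning
Mathematics
Engineering
Statistics
Structural engineering
Evolutionary biology
Programming language
Economics
Biology
Economic growth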
Identifier
DOI:10.1016/s1389-0417(01)00015-8
Abstract
Markov games are a model of multiagent environments that is convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. Its main contribution is to present the convergence theorems in a way that makes it easy to reason about the behavior of simultaneous learners in a shared environment.