
Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach

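Reinforcement learning; Mathematics; Field (mathematics); Mathematical optimization; Artificial intelligence; Reinforcement (steel); Multi-agent system; Computer science; Engineering; Structural engineering; Pure mathematics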
Authors
Haotian Gu,Xin Guo,Xiaoli Wei,Renyuan Xu
Source
Journal: Mathematics of Operations Research [Institute for Operations Research and the Management Sciences]
Citations: 1
Identifier
DOI:10.1287/moor.2022.0055
Abstract

One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information of the entire system. Whereas exciting progress has been made to analyze decentralized MARL with the network of agents for social networks and team video games, little is known theoretically for decentralized MARL with the network of states for modeling self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with the network of states. Localized training means that agents only need to collect local information in their neighboring states during the training phase; decentralized execution implies that agents can execute afterward the learned decentralized policies, which depend only on agents’ current states. The theoretical analysis consists of three key components: the first is the reformulation of the MARL system as a networked Markov decision process with teams of agents, enabling updating the associated team Q-function in a localized fashion; the second is the Bellman equation for the value function and the appropriate Q-function on the probability measure space; and the third is the exponential decay property of the team Q-function, facilitating its approximation with efficient sample efficiency and controllable error. The theoretical analysis paves the way for a new algorithm LTDE-Neural-AC, in which the actor–critic approach with overparameterized neural networks is proposed. The convergence and sample complexity are established and shown to be scalable with respect to the sizes of both agents and states. To the best of our knowledge, this is the first neural network–based MARL algorithm with network structure and provable convergence guarantee.

Funding: X. Wei is partially supported by NSFC no. 12201343. R. Xu is partially supported by the NSF CAREER award DMS-2339240.
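
The abstract's notion of localized training, where agents collect information only from neighboring states, can be illustrated with a small sketch. This is not the paper's LTDE-Neural-AC algorithm: the graph `ADJ`, the hop radius `k`, and the function names `k_hop_neighborhood` and `local_observation` are all hypothetical, chosen only to show what restricting to a neighborhood on a network of states might look like.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, k):
    """Return the set of states within k hops of `source`, including it."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        state, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond the hop radius
        for nxt in adjacency.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def local_observation(agent_states, adjacency, source, k):
    """Count agents in each state of the k-hop neighborhood of `source`.

    Agents located outside the neighborhood are ignored, mimicking
    (as an assumption) the local information used during training.
    """
    hood = k_hop_neighborhood(adjacency, source, k)
    counts = {s: 0 for s in hood}
    for s in agent_states:
        if s in counts:
            counts[s] += 1
    return counts


# Hypothetical network of five states arranged in a line: 0-1-2-3-4.
ADJ = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

With five agents located at states `[0, 1, 1, 2, 4]`, calling `local_observation([0, 1, 1, 2, 4], ADJ, 2, 1)` looks only at states 1, 2 and 3 and ignores the agents at states 0 and 4. The exponential decay property mentioned in the abstract is what would justify such a truncation; the sketch itself does not establish it.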