Reinforcement learning
Computer science
Directed graph
Convergence (economics)
Distributed algorithm
Network topology
Stochastic approximation
Multi-agent system
Strongly connected component
Function (biology)
Mathematical optimization
Algorithm
Mathematics
Artificial intelligence
Distributed computing
Computer security
Evolutionary biology
Key (lock)
Economics
Biology
Economic growth
Operating system
Authors
Pengcheng Dai,Maolong Lv,He Wang,Simone Baldi
Identifier
DOI: 10.1109/TNNLS.2021.3139138
Abstract
Actor-critic (AC) cooperative multiagent reinforcement learning (MARL) over directed graphs is studied in this article. The goal of the agents in MARL is to maximize the globally averaged return in a distributed way, i.e., each agent can exchange information only with its neighboring agents. AC methods proposed in the literature require the communication graphs to be undirected and the weight matrices to be doubly stochastic (more precisely, the weight matrices are row stochastic and their expectations are column stochastic). Unlike these methods, we propose a distributed AC algorithm for MARL over directed graphs with fixed topology that requires only the weight matrix to be row stochastic. We then also study MARL over directed graphs (possibly not connected) with changing topologies, proposing a different distributed AC algorithm based on the push-sum protocol that requires only the weight matrices to be column stochastic. Convergence of the proposed algorithms is proven for linear function approximation of the action-value function. Simulations are presented to demonstrate the effectiveness of the proposed algorithms.
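The second algorithm in the abstract builds on the push-sum protocol, which achieves average consensus over directed graphs using only column-stochastic weight matrices. The following is a minimal generic sketch of plain push-sum averaging (an illustration of the underlying protocol, not the authors' AC algorithm); the matrix `A` and the `push_sum` helper are hypothetical examples:

```python
# Generic push-sum average consensus over a directed graph.
# Each agent i keeps a value x_i and a weight w_i; both are mixed at every
# step with a COLUMN-stochastic matrix A (each column sums to 1), and the
# ratio x_i / w_i converges to the average of the initial values, even
# though the rows of A need not sum to 1.

def push_sum(A, x0, steps=100):
    """A: column-stochastic mixing matrix (list of rows); x0: initial values."""
    n = len(x0)
    x = list(x0)
    w = [1.0] * n  # push-sum weights, all initialized to 1
    for _ in range(steps):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return [xi / wi for xi, wi in zip(x, w)]  # ratios approach the average

# Directed ring of 3 agents with self-loops: every COLUMN of A sums to 1,
# but the rows do not, so A is column stochastic without being doubly
# stochastic (the regime the push-sum-based algorithm is designed for).
A = [[0.6, 0.0, 0.5],
     [0.4, 0.7, 0.0],
     [0.0, 0.3, 0.5]]
vals = push_sum(A, [3.0, 6.0, 9.0])  # each entry approaches the average, 6.0
```

The key point of the ratio trick is that the weights `w` absorb the imbalance a merely column-stochastic matrix introduces, which is why the changing-topology algorithm can drop the doubly stochastic requirement.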