Reinforcement learning
Minimax
Generalization
Computer science
Adversarial system
Randomness
Artificial intelligence
Action (physics)
Adversary
Function (biology)
Mathematical optimization
Bellman equation
Machine learning
Computer security
Mathematics
Evolutionary biology
Biology
Quantum mechanics
Statistics
Physics
Mathematical analysis
Authors
Yangang Ren, Jingliang Duan, Shengbo Eben Li, Yang Guan, Qi Sun
Identifier
DOI: 10.1109/itsc45102.2020.9294300
Abstract
Reinforcement learning (RL) has achieved remarkable performance in numerous sequential decision-making and control tasks. However, a common problem is that the learned near-optimal policy tends to overfit to the training environment and may not generalize to situations never encountered during training. In practical applications, the randomness of the environment can lead to devastating events, which should be the focus of safety-critical systems such as autonomous driving. In this paper, we introduce the minimax formulation and the distributional framework to improve the generalization ability of RL algorithms, and develop the Minimax Distributional Soft Actor-Critic (Minimax DSAC) algorithm. The minimax formulation seeks an optimal policy under the most severe environmental variations, in which the protagonist policy maximizes the action-value function while the adversary policy tries to minimize it. The distributional framework learns a state-action return distribution, from which the risk of different returns can be modeled explicitly, thereby yielding a risk-averse protagonist policy and a risk-seeking adversarial policy. We implement our method on decision-making tasks for autonomous vehicles at intersections and test the trained policy in distinct environments. Results demonstrate that our method can greatly improve the generalization ability of the protagonist agent to different environmental variations.
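As a rough sketch of the two ingredients described in the abstract (illustrative only; the adversarial action u, the risk weight \lambda, and the variance-based risk measure below are assumptions made for concreteness, not necessarily the paper's exact formulation), the minimax objective can be written as

\[
\max_{\pi}\,\min_{\mu}\; \mathbb{E}_{a_t \sim \pi,\; u_t \sim \mu}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t, u_t)\right],
\]

where the protagonist policy \pi chooses the control action a_t and the adversary policy \mu chooses the environmental disturbance u_t. With a learned state-action return distribution Z^{\pi,\mu}(s,a,u) whose expectation is the action-value function, one way to obtain a risk-averse protagonist and a risk-seeking adversary is to let both policies act on a variance-penalized value,

\[
\pi:\ \max_{\pi}\; \mathbb{E}\!\left[Z\right] - \lambda\,\mathrm{Std}\!\left[Z\right],
\qquad
\mu:\ \min_{\mu}\; \mathbb{E}\!\left[Z\right] - \lambda\,\mathrm{Std}\!\left[Z\right],
\]

so that the protagonist shies away from high-variance returns while the adversary is drawn toward them.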