Reinforcement learning
Bellman equation
Embedding
Computer science
Function (biology)
Variance (accounting)
Mathematical optimization
Value (mathematics)
Mathematics
Artificial intelligence
Machine learning
Economics
Evolutionary biology
Biology
Accounting
Authors
Jingliang Duan, Yang Guan, Shengbo Eben Li, Yangang Ren, Qi Sun, Bo Cheng
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2021-06-09
Volume/Issue: 33 (11): 6584-6598
Citations: 119
Identifier
DOI: 10.1109/tnnls.2021.3082568
Abstract
In reinforcement learning (RL), function approximation errors are known to easily lead to Q-value overestimation, which greatly reduces policy performance. This article presents a distributional soft actor-critic (DSAC) algorithm, an off-policy RL method for continuous control settings, that improves policy performance by mitigating Q-value overestimation. We first show in theory that learning a distribution function of state-action returns can effectively mitigate Q-value overestimation because it adaptively adjusts the update step size of the Q-value function. Then, a distributional soft policy iteration (DSPI) framework is developed by embedding the return distribution function into maximum entropy RL. Finally, we present a deep off-policy actor-critic variant of DSPI, called DSAC, which directly learns a continuous return distribution while keeping the variance of the state-action returns within a reasonable range to address exploding and vanishing gradient problems. We evaluate DSAC on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
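To make the abstract's idea concrete, below is a minimal sketch of a Gaussian return-distribution critic and its negative-log-likelihood loss, with the predicted standard deviation and the training target clamped so the variance stays within a reasonable range, as the abstract describes. The class name GaussianCritic, the clamp bounds, and the three-sigma target clipping are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of a distributional (Gaussian) critic; details are assumptions.
import torch
import torch.nn as nn


class GaussianCritic(nn.Module):
    """Maps (state, action) to the mean and log-std of the return distribution."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # outputs [mean, log_std]
        )

    def forward(self, state, action):
        mean, log_std = self.net(torch.cat([state, action], dim=-1)).chunk(2, dim=-1)
        # Bound the predicted log-std so gradients neither explode nor vanish
        # (the abstract's "reasonable range"; the bounds here are assumed).
        log_std = torch.clamp(log_std, min=-5.0, max=2.0)
        return mean, log_std


def critic_loss(critic, state, action, target_return, clip_sigmas: float = 3.0):
    """Negative log-likelihood of a clipped target return under the predicted
    Gaussian return distribution; the clipping radius is an assumption."""
    mean, log_std = critic(state, action)
    std = log_std.exp()
    # Clip the target so a single outlier return cannot blow up the variance.
    target = torch.clamp(
        target_return,
        min=(mean - clip_sigmas * std).detach(),
        max=(mean + clip_sigmas * std).detach(),
    )
    dist = torch.distributions.Normal(mean, std)
    return -dist.log_prob(target).mean()


if __name__ == "__main__":
    critic = GaussianCritic(state_dim=3, action_dim=1)
    s, a = torch.randn(32, 3), torch.randn(32, 1)
    y = torch.randn(32, 1)  # stand-in soft-return targets
    loss = critic_loss(critic, s, a, y)
    loss.backward()
    print(float(loss))
```

In a full maximum-entropy setup, the stand-in targets above would be replaced by soft-return targets (reward plus the discounted, entropy-regularized value of the next state-action pair); the sketch only isolates how learning a return distribution, rather than a point Q-value, lets the effective update step adapt to the predicted uncertainty.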