A goal-conditioned policy search method with multi-timescale value function tuning

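Reinforcement learning, Bellman equation, Computer science, Generalization, Function (biology), Mathematical optimization, Value (mathematics), Artificial intelligence, Constraint (computer-aided design), Robot, Representation (politics), Machine learning, Mathematics, Evolutionary biology, Biology, Mathematical analysis, Geometry, Politics, Political science, Law
Authors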
Zhihong Jiang,Jiachen Hu,Yan Zhao,Xiao Huang,Hui Li
Identifier
DOI:10.1108/ria-11-2023-0167
Abstract

Purpose
Current reinforcement learning (RL) algorithms suffer from low learning efficiency and poor generalization, which significantly limits their practical application on real robots. This paper adopts a hybrid model-based and model-free policy search method with multi-timescale value function tuning, so that robots can learn complex motion planning skills in multi-goal, multi-constraint environments with only a few interactions.

Design/methodology/approach
A goal-conditioned policy search method that combines model-based and model-free learning with multi-timescale value function tuning is proposed. First, the authors construct a multi-goal, multi-constraint policy optimization approach that fuses model-based policy optimization with goal-conditioned, model-free learning. Soft constraints on states and controls are applied to ensure fast and stable policy iteration. Second, an uncertainty-aware multi-timescale value function learning method is proposed: it constructs a multi-timescale value function network and adaptively chooses the planning timescale of the value function according to the value prediction uncertainty. This implicitly reduces the complexity of the value representation and improves the generalization performance of the policy.

Findings
The algorithm enables physical robots to learn generalized skills in real-world environments within a handful of trials. Simulation and experimental results show that it outperforms other relevant model-based and model-free RL algorithms.

Originality/value
This paper combines goal-conditioned RL and the model predictive path integral (MPPI) method into a unified model-based policy search framework, which improves the learning efficiency and policy optimality of motor skill learning in multi-goal, multi-constraint environments. An uncertainty-aware multi-timescale value function learning and selection method is proposed to overcome long-horizon problems and improve the resolution of the optimal policy, thereby enhancing the generalization ability of goal-conditioned RL.
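The abstract describes the method only at a high level, so the following Python sketch is not the authors' implementation. It illustrates two named ingredients under stated assumptions. The first is a simplified MPPI update for a hypothetical 1-D point mass, with soft (quadratic-penalty) state and control constraints and a terminal cost taken from a goal-conditioned value function. The second is a timescale selector that, as an assumption, measures value prediction uncertainty as the spread of an ensemble and picks the longest discount factor whose spread stays below a threshold. The functions `toy_value`, `select_timescale`, `mppi_step` and `run_demo`, the value V_γ(x, g) = -(x - g)²/(1 - γ) (the discounted cost of staying at x forever), and all numbers are hypothetical stand-ins for the learned multi-timescale network.

```python
import math
import random
from statistics import pstdev


def toy_value(x, goal, gamma):
    # Hypothetical stand-in for a learned goal-conditioned value V_gamma(x, g):
    # the discounted stage cost of staying at x forever, sum_t gamma^t (x - g)^2.
    return -((x - goal) ** 2) / (1.0 - gamma)


def select_timescale(value_ensembles, gammas, max_std):
    """Return the index of the longest timescale whose value prediction is trusted.

    value_ensembles[i] holds ensemble predictions of V_{gammas[i]}(s, g).
    Assumption: uncertainty = population std across ensemble members.
    If no timescale passes max_std, fall back to the least uncertain one.
    """
    stds = [pstdev(preds) for preds in value_ensembles]
    for i in sorted(range(len(gammas)), key=lambda j: gammas[j], reverse=True):
        if stds[i] <= max_std:
            return i
    return min(range(len(gammas)), key=lambda j: stds[j])


def mppi_step(x0, goal, u_nom, gamma, rng, n_samples=256, sigma=0.5,
              lam=1.0, dt=0.1, x_max=2.0, u_max=1.0, w_soft=100.0):
    """One simplified MPPI update for a hypothetical 1-D point mass x' = x + u*dt."""
    noises, costs = [], []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, sigma) for _ in u_nom]
        x, cost = x0, 0.0
        for u_bar, e in zip(u_nom, eps):
            u = u_bar + e
            cost += (x - goal) ** 2 + 0.01 * u ** 2
            # Soft constraints: quadratic penalties instead of hard clipping.
            cost += w_soft * max(0.0, abs(u) - u_max) ** 2
            x = x + u * dt
            cost += w_soft * max(0.0, abs(x) - x_max) ** 2
        # Terminal cost-to-go from the value function at the chosen timescale.
        cost -= toy_value(x, goal, gamma)
        noises.append(eps)
        costs.append(cost)
    # Path-integral weights: exp(-cost / lambda), shifted by the min for stability.
    beta = min(costs)
    weights = [math.exp(-(c - beta) / lam) for c in costs]
    total = sum(weights)
    return [u_bar + sum(w * n[t] for w, n in zip(weights, noises)) / total
            for t, u_bar in enumerate(u_nom)]


def run_demo(steps=40, horizon=15, dt=0.1):
    gammas = [0.9, 0.99]
    # Made-up ensemble predictions, for illustration only.
    ensembles = [[-1.0, -1.1, -0.9], [-10.0, -14.0, -6.0]]
    k = select_timescale(ensembles, gammas, max_std=1.0)
    rng = random.Random(42)
    x, goal, u_nom = 0.0, 1.0, [0.0] * horizon
    for _ in range(steps):
        u_nom = mppi_step(x, goal, u_nom, gammas[k], rng, dt=dt)
        x += u_nom[0] * dt               # apply only the first planned control
        u_nom = u_nom[1:] + [u_nom[-1]]  # warm-start by shifting the plan
    return k, x


if __name__ == "__main__":
    print(run_demo())
```

In this toy setup the γ = 0.99 ensemble disagrees strongly (its spread exceeds `max_std`), so the selector falls back to γ = 0.9, whose terminal cost 10(x_T - g)² then shapes every sampled rollout. Penalizing constraint violations instead of clipping keeps the rollout cost smooth in the sampled controls, which is one plausible reading of the soft state and control constraints described in the abstract. The update also omits the control-noise cross term of the full MPPI cost for brevity.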