A Survey on Vision-Language-Action Models for Embodied AI

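Embodied cognition; Action (physics); Computer science; Cognitive science; Artificial intelligence; Natural language processing; Psychology; Physics; Quantum mechanics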
Authors
Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, Irwin King
Identifier
DOI:10.48550/arxiv.2405.14093
Abstract

Embodied AI is widely recognized as a cornerstone of artificial general intelligence because it involves controlling embodied agents to perform tasks in the physical world. Building on the success of large language models and vision-language models, a new category of multimodal models -- referred to as vision-language-action models (VLAs) -- has emerged to address language-conditioned robotic tasks in embodied AI by leveraging their distinct ability to generate actions. The recent proliferation of VLAs necessitates a comprehensive survey to capture the rapidly evolving landscape. To this end, we present the first survey on VLAs for embodied AI. This work provides a detailed taxonomy of VLAs, organized into three major lines of research. The first line focuses on individual components of VLAs. The second line is dedicated to developing VLA-based control policies adept at predicting low-level actions. The third line comprises high-level task planners capable of decomposing long-horizon tasks into a sequence of subtasks, thereby guiding VLAs to follow more general user instructions. Furthermore, we provide an extensive summary of relevant resources, including datasets, simulators, and benchmarks. Finally, we discuss the challenges facing VLAs and outline promising future directions in embodied AI. A curated repository associated with this survey is available at: https://github.com/yueen-ma/Awesome-VLA.