Agent AI: Surveying the Horizons of Multimodal Interaction

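Embodied cognition, Embodied agent, Computer science, Human-computer interaction, Leverage (statistics), Autonomous agent, Process (computing), Context (archaeology), Perspective, Artificial intelligence, Visual arts, Operating system, Biology, Paleontology, Art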
Authors
Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao
Source
Journal: Cornell University - arXiv. Citations: 7
Identifier
DOI: 10.48550/arxiv.2401.03568
Abstract

Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.