Computer science
Robot
Human–computer interaction
Optics (focus)
Artificial intelligence
Perception
Action (physics)
Encoding
Task (project management)
Systems engineering
Quantum mechanics
Biochemistry
Chemistry
Physics
Neuroscience
Gene
Optics
Biology
Engineering
Authors
Jianfeng Liao, Haoyang Zhang, Haofu Qian, Qiwei Meng, Yinan Sun, Yao Sun, Wei Song, Shiqiang Zhu, Jason Gu
Identifier
DOI:10.1007/978-981-99-6495-6_36
Abstract
Recent advances in large language models have highlighted their potential to encode massive amounts of semantic knowledge for long-term autonomous decision-making, positioning them as a promising solution for powering the cognitive capabilities of future home-assistant robots. However, while large language models can provide high-level decisions, there is still no unified paradigm for integrating them with robots' perception and low-level actions. In this paper, we propose a framework centered around a large language model, integrated with visual perception and motion planning modules, to investigate the robotic grasping task. Unlike traditional methods that focus only on generating stable grasps, our approach can handle personalized user instructions and perform tasks more effectively in home scenarios. It integrates existing state-of-the-art models in a simple and effective way, without requiring any fine-tuning, which makes it low-cost and easy to deploy. Experiments on a physical robot system demonstrate the feasibility of our approach.
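The abstract describes an LLM-centered pipeline in which the language model makes the high-level decision, a visual perception module detects candidate objects, and a motion planner produces the low-level grasp action. The paper does not publish code, so the sketch below is only an illustrative assumption of how such a pipeline could be wired together; the `llm`, `perception`, and `planner` interfaces and the prompt wording are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of an LLM-centered grasping pipeline (not the paper's code).
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str          # object category reported by the perception module
    grasp_pose: tuple   # (x, y, z, roll, pitch, yaw) in the robot base frame


def choose_target(llm, instruction: str, detections: List[Detection]) -> Detection:
    """Ask the language model which detected object satisfies the user instruction."""
    labels = ", ".join(d.label for d in detections)
    prompt = (
        f"Objects on the table: {labels}.\n"
        f"User instruction: '{instruction}'.\n"
        "Reply with the single object label that should be grasped."
    )
    answer = llm(prompt).strip().lower()   # llm is assumed to be a text-in/text-out callable
    for d in detections:
        if d.label.lower() == answer:
            return d
    return detections[0]                   # fall back to the first detection if no label matches


def grasp_by_instruction(llm, perception, planner, instruction: str) -> bool:
    """High-level decision (LLM) -> perception -> low-level action (motion planner)."""
    detections = perception.detect()                       # assumed object detector with grasp poses
    if not detections:
        return False
    target = choose_target(llm, instruction, detections)
    trajectory = planner.plan_grasp(target.grasp_pose)     # assumed motion-planning interface
    return planner.execute(trajectory)
```

Because the language model only selects among detections and never outputs geometry directly, each off-the-shelf module can be swapped without fine-tuning, which matches the low-cost, easy-to-deploy claim in the abstract.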