Artificial intelligence
Computer vision
Computer science
Perception
Robotics
Object (grammar)
Robot
Pose
Psychology
Neuroscience
Authors
Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis A. Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph D. Ortiz, Mustafa Mukadam
Source
Journal: Science Robotics
[American Association for the Advancement of Science (AAAS)]
Date: 2024-11-13
Volume (issue): 9 (96)
Identifier
DOI:10.1126/scirobotics.adl0628
Abstract
To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object’s pose and shape. The status quo for in-hand perception primarily uses vision and is restricted to tracking a priori known objects. Moreover, visual occlusion of objects in hand is imminent during manipulation, preventing current systems from pushing beyond tasks without occlusion. We combined vision and touch sensing on a multifingered hand to estimate an object’s pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We studied multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments showed final reconstruction F scores of 81% and average pose drifts of 4.7 millimeters, which was further reduced to 2.3 millimeters with known object models. In addition, we observed that, under heavy visual occlusion, we could achieve improvements in tracking up to 94% compared with vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step toward benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone toward advancing robot dexterity.