Keywords: Modality (human–computer interaction), Computer science, Object (grammar), Object detection, Modal verb, Encoder, Artificial intelligence, Margin (machine learning), Position (finance), Computer vision, Machine learning, Pattern recognition (psychology), Chemistry, Finance, Polymer chemistry, Economics, Operating system
Authors
Zeyu Yang, Jiaqi Chen, Zhenwei Miao, Wei Li, Xiatian Zhu, Li Zhang
Source
Journal: Cornell University - arXiv
Date: 2022-01-01
Citations: 47
Identifier
DOI: 10.48550/arxiv.2208.11112
Abstract
Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. However, this design is fundamentally restricted because it overlooks modality-specific useful information, ultimately hampering model performance. To address this limitation, in this work we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, enabling their unique characteristics to be exploited during object detection. To realize this strategy, we design a DeepInteraction architecture characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that our proposed method surpasses all prior arts, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.
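The core idea in the abstract — keeping a separate representation per modality and letting the streams refine each other, instead of collapsing them into one fused tensor — can be illustrated with a minimal sketch. This is not the paper's implementation; the single-head cross-attention, token counts, and layer structure below are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, context):
    """Toy single-head cross-attention: tokens of one modality query the other."""
    scores = query @ context.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ context

def interaction_encoder_layer(img_tokens, lidar_tokens):
    # Each modality keeps its own stream and is only *refined* by the
    # other via a residual cross-attention update, rather than being
    # merged into a single fused representation.
    img_out = img_tokens + cross_attend(img_tokens, lidar_tokens)
    lidar_out = lidar_tokens + cross_attend(lidar_tokens, img_tokens)
    return img_out, lidar_out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 16))     # 8 image tokens, feature dim 16
lidar = rng.standard_normal((10, 16))  # 10 LiDAR tokens, feature dim 16
img2, lidar2 = interaction_encoder_layer(img, lidar)
print(img2.shape, lidar2.shape)  # (8, 16) (10, 16)
```

Note that both outputs retain their original shapes: the two modality streams survive the layer intact, which is what allows modality-specific information to persist through the encoder.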