Fuse (electrical)
Computer science
Artificial intelligence
Convolution (computer science)
Feature (linguistics)
Generalizability
Computer vision
Pattern recognition (psychology)
Segmentation
Object detection
Feature extraction
Monocular
Object (grammar)
Artificial neural network
Linguistics
Philosophy
Electrical engineering
Engineering
Psychology
Psychotherapist
Authors
Wencheng Han, Xingping Dong, Yiyuan Zhang, David Crandall, Cheng-Zhong Xu, Jianbing Shen
Identifier
DOI:10.1109/tpami.2024.3400873
Abstract
Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusion methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, distinguished from parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks: visual object tracking, referring video object segmentation, and monocular 3D object detection. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency.
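The "mathematically equivalent method for replacing costly convolutions on concatenated features" mentioned in the abstract rests on a standard linearity identity: convolving a channel-wise concatenation of two feature maps is the same as splitting the kernel along its input-channel axis, convolving each input separately, and summing the results. The sketch below (in NumPy, not the authors' actual ACM code; all variable names are illustrative) verifies this identity for a 1x1 convolution, where the convolution reduces to a per-pixel matrix multiply over channels:

```python
import numpy as np

rng = np.random.default_rng(0)

c1, c2, c_out, h, w = 3, 5, 4, 8, 8
x1 = rng.standard_normal((c1, h, w))       # feature map from source 1
x2 = rng.standard_normal((c2, h, w))       # feature map from source 2
W = rng.standard_normal((c_out, c1 + c2))  # 1x1 kernel over concatenated channels
b = rng.standard_normal(c_out)             # bias

def conv1x1(x, W, b=None):
    """1x1 convolution: a (c_out, c_in) matmul applied at every spatial site."""
    y = np.einsum('oc,chw->ohw', W, x)
    if b is not None:
        y = y + b[:, None, None]
    return y

# Costly form: concatenate first, then convolve the stacked features.
y_concat = conv1x1(np.concatenate([x1, x2], axis=0), W, b)

# Equivalent form: split the kernel, convolve each branch, sum (bias added once).
y_split = conv1x1(x1, W[:, :c1]) + conv1x1(x2, W[:, c1:], b)

assert np.allclose(y_concat, y_split)
print("max abs diff:", np.abs(y_concat - y_split).max())
```

The split form never materializes the concatenated tensor, which is what makes the trick useful when the inputs have different shapes or come from different sources: each branch can be convolved (and, if needed, resampled) independently before the sum. The same identity holds for larger kernels, since convolution is linear in its input channels.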