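Artificial intelligence
RGB color model
Computer science
Computer vision
Transformer
Edge detection
Image processing
Engineering
Image (mathematics)
Voltage
Electrical engineering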
Authors
Zhengyi Liu,Yacheng Tan,Qian He,Yun Xiao
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2021-11-10
Volume (issue): 32 (7): 4486-4497
Citations: 204
Identifier
DOI: 10.1109/tcsvt.2021.3127149
Abstract
Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependencies. By combining the advantages of transformers with the merits of CNNs, Swin Transformer shows strong feature representation ability. Based on it, we propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection. It is driven by Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contours of salient objects. Specifically, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module optimizes intra-level cross-modality features. To clarify fuzzy boundaries, an edge-guided decoder performs inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task.
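The abstract names a spatial alignment and channel re-calibration step but does not give its formulation. As a rough illustration only, the general idea of attention-based cross-modality fusion can be sketched as below. This is not the authors' module: the function names (`spatial_alignment`, `channel_recalibration`, `fuse`) are hypothetical, the gates have no learned weights, and plain NumPy arrays stand in for encoder features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_alignment(feat_a, feat_b):
    """Weight both feature maps by one shared spatial attention map.

    feat_a, feat_b: arrays of shape (C, H, W) from the two modalities.
    Assumption: the shared map is derived from their element-wise product.
    """
    common = feat_a * feat_b                             # strong where both modalities respond
    attn = sigmoid(common.mean(axis=0, keepdims=True))   # shape (1, H, W)
    return feat_a * attn, feat_b * attn

def channel_recalibration(feat):
    """Rescale each channel by a gate computed from its global max response."""
    gate = sigmoid(feat.max(axis=(1, 2), keepdims=True))  # shape (C, 1, 1)
    return feat * gate

def fuse(feat_rgb, feat_aux):
    """Align spatially, re-calibrate channels, then sum the two streams."""
    a, b = spatial_alignment(feat_rgb, feat_aux)
    return channel_recalibration(a) + channel_recalibration(b)

# Toy features standing in for one encoder level (8 channels, 4x4 map).
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))
depth = rng.standard_normal((8, 4, 4))
fused = fuse(rgb, depth)
print(fused.shape)  # (8, 4, 4)
```

In a trained model the two gates would typically be produced by learned layers; here they are fixed functions so the sketch stays self-contained.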