Keywords
Salience, computer science, artificial intelligence, object detection, computer vision, benchmark, feature, context, context model, object, pattern recognition
Authors
Jun Chen, Heye Zhang, Mingming Gong, Zhifan Gao
Identifier
DOI: 10.1016/j.patcog.2024.110600
Abstract
Salient object detection (SOD) is of high significance for various computer vision applications, but it remains challenging due to the complicated scenes in real-world images. Most state-of-the-art SOD methods build long-range dependencies to improve global contrast modeling in such scenes. However, most of them rest on the prior assumption that image patches alone should serve as the visual tokens for building long-range dependencies. Because patch tokenization discards object structure information, this assumption leads to salient regions being localized with uncertain boundaries. In this paper, to address this issue, we reconstruct the prior assumption so that both patches and superpixels serve as visual tokens, exploiting the respective properties of superpixels in preserving detailed structure-aware information and of patches in preserving local context information. Based on the reconstructed prior assumption, we propose a Collaborative Compensative Transformer Network (CCTNet) for the SOD task. CCTNet first alternates the attention computation within tokens of the same kind and across tokens of different kinds to build their dependencies. In this way, the relationship between the multi-level global context and the detailed structure representation can be explicitly modeled for a consistent understanding of semantics and object structure. CCTNet then performs joint feature decoding for SOD, fusing the complementary global context and detailed structure to locate objects with well-defined boundaries. Extensive experiments validate the effectiveness of the proposed modules, and results on ten benchmark datasets demonstrate the state-of-the-art performance of CCTNet on both RGB and RGB-D SOD.
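The abstract describes CCTNet's mechanism only at a high level: patch tokens carry local context, superpixel tokens carry detailed object structure, attention alternates within each token set and across the two sets, and the two streams are finally fused for decoding. The PyTorch sketch below illustrates one plausible reading of that scheme; it is not the authors' implementation. The SLIC-based superpixel pooling, the names superpixel_tokens and AlternatingBlock, and all dimensions are assumptions made for illustration.

```python
# Minimal sketch of patch/superpixel dual tokenization with alternating
# intra- and cross-token attention. Illustrative only; not the CCTNet code.
import numpy as np
import torch
import torch.nn as nn
from skimage.segmentation import slic


def superpixel_tokens(feat, image, n_segments=196):
    """Pool a feature map into one token per SLIC superpixel (assumed scheme).

    feat:  (C, H, W) backbone feature map at the image resolution.
    image: (H, W, 3) uint8 array used by SLIC to compute superpixels.
    Returns (K, C) tokens, one mean-pooled feature per superpixel.
    """
    labels = torch.from_numpy(
        slic(image, n_segments=n_segments, start_label=0)
    ).view(-1)                                     # (H*W,) superpixel id per pixel
    c, h, w = feat.shape
    flat = feat.reshape(c, -1).t()                 # (H*W, C) per-pixel features
    k = int(labels.max()) + 1
    tokens = torch.zeros(k, c).index_add_(0, labels, flat)
    counts = torch.zeros(k).index_add_(0, labels, torch.ones(h * w))
    return tokens / counts.unsqueeze(1)            # average within each superpixel


class AlternatingBlock(nn.Module):
    """One round of intra-token self-attention, then cross-token attention."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.patch_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sp_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.patch_x = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sp_x = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, p, s):
        # Dependencies within the same kind of tokens.
        p = p + self.patch_self(p, p, p)[0]
        s = s + self.sp_self(s, s, s)[0]
        # Dependencies across different kinds of tokens, so global context
        # (patches) and detailed structure (superpixels) compensate each other.
        p, s = p + self.patch_x(p, s, s)[0], s + self.sp_x(s, p, p)[0]
        return p, s


if __name__ == "__main__":
    img = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
    feat = torch.randn(64, 224, 224)               # stand-in backbone features
    s = superpixel_tokens(feat, img).unsqueeze(0)  # (1, ~196, 64) superpixel tokens
    p = torch.randn(1, 196, 64)                    # stand-in patch tokens
    p, s = AlternatingBlock()(p, s)
    # Joint-decoding stand-in: fuse the pooled streams into one saliency logit
    # (a real decoder would upsample to a full-resolution saliency map).
    logit = nn.Linear(128, 1)(torch.cat([p.mean(1), s.mean(1)], dim=-1))
    print(logit.shape)                             # torch.Size([1, 1])
```

Alternating the two steps lets each token stream refine itself before exchanging information with the other, which matches the abstract's description of computing within tokens of the same kind and then across tokens of different kinds.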