Computer science
Artificial intelligence
Natural language processing
Semantic compression
Semantics (computer science)
Feature (linguistics)
Semantic computing
Semantic matching
Semantic feature
Pattern recognition (psychology)
Matching (statistics)
Semantic technology
Linguistics
Semantic Web
Mathematics
Programming language
Philosophy
Statistics
Authors
Chao Shang, Hongliang Li, Heqian Qiu, Qingbo Wu, Fanman Meng, Taijin Zhao, King Ngi Ngan
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2022-12-23
Volume/Issue: 33 (7): 3229-3242
Citations: 5
Identifiers
DOI:10.1109/tcsvt.2022.3231964
Abstract
Referring image segmentation aims to segment the target object from an image according to a natural-language expression. Because of the diversity of language expressions, word sequences in different orders often convey different semantic information. Previous methods focus on matching individual words to different visual regions in the image separately, ignoring the global semantic understanding of the expression that arises from its sequence structure. To address this problem, we redesign a new recurrent network structure for referring image segmentation, called the Cross-Modal Recurrent Semantic Comprehension Network (CRSCNet), which obtains a more comprehensive global semantic understanding through iterative cross-modal semantic reasoning. Specifically, in each iteration we first propose a Dynamic SepConv to extract relevant visual features guided by language, and further propose Language Attentional Feature Modulation to improve feature discriminability; we then propose a Cross-Modal Semantic Reasoning module that performs global semantic reasoning by capturing both linguistic and visual information, and finally update and correct the visual features of the predicted object based on this semantic information. Moreover, we propose a Cross-Modal ASPP to capture, from larger receptive fields, richer visual information referred to by the global semantics of the language expression. Extensive experiments demonstrate that our proposed network significantly outperforms previous state-of-the-art methods on multiple datasets.
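The iterative loop the abstract describes — language-generated dynamic convolution, language-conditioned feature modulation, then a global semantic-state update — can be sketched in plain numpy. This is only an illustrative toy, not the paper's implementation: all dimensions, the FiLM-style modulation, the tanh state update, and the 1x1 prediction head are assumptions standing in for the learned modules.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, D, K, T = 8, 16, 16, 32, 3, 3  # channels, spatial size, lang dim, kernel, iterations

# Hypothetical projection weights (stand-ins for learned parameters)
W_kernel = rng.standard_normal((C * K * K, D)) * 0.1  # lang state -> per-channel kernels
W_gamma  = rng.standard_normal((C, D)) * 0.1          # lang state -> channel scales
W_beta   = rng.standard_normal((C, D)) * 0.1          # lang state -> channel shifts
W_state  = rng.standard_normal((D, D + C)) * 0.1      # semantic-state update

def depthwise_conv(x, kernels, k=K):
    """Per-channel (separable) 2D convolution with 'same' zero padding."""
    c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for ci in range(c):
        for i in range(h):
            for j in range(w):
                out[ci, i, j] = np.sum(xp[ci, i:i + k, j:j + k] * kernels[ci])
    return out

vis = rng.standard_normal((C, H, W))  # visual feature map
lang = rng.standard_normal(D)         # pooled language feature
state = lang.copy()                   # global semantic state, refined each iteration

for t in range(T):
    # 1) Dynamic SepConv: depthwise kernels generated from the semantic state
    kernels = np.tanh(W_kernel @ state).reshape(C, K, K) / (K * K)
    vis = depthwise_conv(vis, kernels)
    # 2) Language-conditioned feature modulation (FiLM-style assumption)
    gamma = 1.0 + W_gamma @ state
    beta = W_beta @ state
    vis = gamma[:, None, None] * vis + beta[:, None, None]
    # 3) Cross-modal semantic reasoning: fold pooled visual evidence back
    #    into the semantic state, so the next iteration is better guided
    pooled = vis.mean(axis=(1, 2))
    state = np.tanh(W_state @ np.concatenate([state, pooled]))

# 4) Illustrative 1x1 prediction head producing a per-pixel mask
w_head = rng.standard_normal(C) * 0.1
mask = 1.0 / (1.0 + np.exp(-np.tensordot(w_head, vis, axes=1)))
print(mask.shape)  # (16, 16)
```

The key structural point the sketch captures is that language is not matched to pixels once: the semantic state is re-estimated after every pass over the visual features, which is the recurrent comprehension idea named in the abstract.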