Mask Grounding for Referring Image Segmentation

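Computer science; Task (project management); Grounding; Segmentation; Artificial intelligence; Visual reasoning; Natural language processing; Sentence; Matching (statistics); Common ground; Sign language; Key (lock); Image (mathematics); Modality (human-computer interaction); Computer vision; Pattern recognition (psychology); Linguistics; Engineering; Communication; Electrical engineering; Statistics; Philosophy; Mathematics; Systems engineering; Computer security; Sociology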
Authors
Yong Xien Chng,Henry Zheng,Yizeng Han,Xuchong Qiu,Gao Huang
Source
Journal: Cornell University - arXiv  Cited by: 1
Identifier
DOI:10.48550/arxiv.2312.12198
Abstract

Referring Image Segmentation (RIS) is a challenging task that requires an algorithm to segment objects referred to by free-form language expressions. Despite significant progress in recent years, most state-of-the-art (SOTA) methods still suffer from a considerable language-image modality gap at the pixel and word level. These methods generally 1) rely on sentence-level language features for language-image alignment and 2) lack explicit training supervision for fine-grained visual grounding. Consequently, they exhibit weak object-level correspondence between visual and language features. Without well-grounded features, prior methods struggle to understand complex expressions that require strong reasoning over relationships among multiple objects, especially when dealing with rarely used or ambiguous clauses. To tackle this challenge, we introduce a novel Mask Grounding auxiliary task that significantly improves visual grounding within language features by explicitly teaching the model to learn fine-grained correspondence between masked textual tokens and their matching visual objects. Mask Grounding can be applied directly to prior RIS methods and consistently brings improvements. Furthermore, to holistically address the modality gap, we also design a cross-modal alignment loss and an accompanying alignment module. These additions work synergistically with Mask Grounding. Combining all these techniques, our approach culminates in MagNet (Mask-grounded Network), an architecture that significantly outperforms prior art on three key benchmarks (RefCOCO, RefCOCO+ and G-Ref), demonstrating our method's effectiveness in addressing current limitations of RIS algorithms. Our code and pre-trained weights will be released.
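The abstract describes Mask Grounding only at a high level: some textual tokens are masked, and the model is trained to recover them from their matching visual objects. The following is a minimal sketch of that training signal, not the authors' implementation. It assumes the auxiliary loss is a cross-entropy computed over the masked positions only; the names `mask_tokens`, `mask_grounding_loss`, the `[MASK]` symbol, and the mask ratio are hypothetical. In a real model the per-position logits would come from a predictor that sees the masked expression together with visual features; here they are filled in by hand to keep the example self-contained.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def mask_tokens(tokens, mask_ratio=0.15, rng=None):
    """Replace a random subset of tokens with MASK.

    Returns the masked token list and the sorted masked positions.
    """
    rng = rng or random.Random(0)
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    chosen = set(positions)
    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    return masked, positions


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_grounding_loss(logits_per_position, target_ids, positions):
    """Mean cross-entropy between predicted and true tokens,
    averaged over the masked positions only (assumed loss form)."""
    total = 0.0
    for pos in positions:
        probs = softmax(logits_per_position[pos])
        total += -math.log(probs[target_ids[pos]])
    return total / len(positions)


# Toy referring expression and vocabulary.
tokens = "the man left of the red car".split()
vocab = sorted(set(tokens))
target_ids = [vocab.index(t) for t in tokens]

masked, positions = mask_tokens(tokens, mask_ratio=0.3, rng=random.Random(0))

# Stand-in predictions: an uninformed predictor (uniform logits) versus
# a well-grounded one (high logit on the true token at every position).
uniform_logits = [[0.0] * len(vocab) for _ in tokens]
grounded_logits = [
    [5.0 if j == target_ids[i] else 0.0 for j in range(len(vocab))]
    for i in range(len(tokens))
]

loss_uniform = mask_grounding_loss(uniform_logits, target_ids, positions)
loss_grounded = mask_grounding_loss(grounded_logits, target_ids, positions)
print(masked, positions, loss_uniform, loss_grounded)
```

Under these assumptions, a predictor that cannot use the image to identify a masked word such as "red" or "car" stays near the uniform loss, while one whose language features are tied to the referred object drives the loss down. That gap is the supervision signal the auxiliary task is meant to provide.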
