Computer science
Inference
Segmentation
Artificial intelligence
Encoding (set theory)
Image (mathematics)
Object (grammar)
Expression (computer science)
Image segmentation
Binary number
Source code
Modal
Computer vision
Pattern recognition (psychology)
Programming language
Arithmetic
Mathematics
Set (abstract data type)
Chemistry
Polymer chemistry
Authors
Jiangquan Li, Shimin Shan, Yu Liu, Kaiping Xu, Xiwen Hu, Mingcheng Xue
Identifier
DOI: 10.1145/3577190.3614176
Abstract
Referring image segmentation aims to segment a target object from an image given a natural language expression. While recent methods have made remarkable advances, few have designed effective deep fusion processes for cross-modal features or focused on fine visual details. In this paper, we propose AIUnet, an asymptotic inference method based on U2-Net. The core of AIUnet is a Cross-modal U2-Net (CMU) module, which integrates a Text guide vision (TGV) module into U2-Net, achieving efficient interaction of cross-modal information at different scales. CMU focuses more on location information in high-level features and learns finer detail information in low-level features. Additionally, we propose a Features Enhance Decoder (FED) module to improve the recognition of fine details and decode cross-modal features into binary masks. The FED module leverages a simple CNN-based approach to enhance multi-modal features. Our experiments show that AIUnet achieves competitive results on three standard datasets. Code is available at https://github.com/LJQbiu/AIUnet.
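The abstract describes fusing text features into visual features at multiple encoder scales. As a rough illustration only, the sketch below shows one common way such a text-guide-vision fusion step can be written in PyTorch. The class name TGVBlock, the dimensions, and the cross-attention design are assumptions for illustration, not the authors' implementation (which is available at the linked repository).

```python
# A minimal sketch, assuming a cross-attention fusion design; this is NOT the
# official AIUnet code (see https://github.com/LJQbiu/AIUnet for that).
import torch
import torch.nn as nn

class TGVBlock(nn.Module):
    """Fuses sentence token embeddings into a visual feature map via cross-attention."""

    def __init__(self, vis_dim: int, txt_dim: int, num_heads: int = 8):
        super().__init__()
        self.txt_proj = nn.Linear(txt_dim, vis_dim)  # align text width to visual width
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, C, H, W) visual features from one encoder stage
        # txt: (B, L, txt_dim) word-level text embeddings
        b, c, h, w = vis.shape
        q = vis.flatten(2).transpose(1, 2)   # (B, H*W, C): visual tokens as queries
        kv = self.txt_proj(txt)              # (B, L, C): text tokens as keys/values
        fused, _ = self.attn(q, kv, kv)      # text-conditioned visual features
        fused = self.norm(q + fused)         # residual connection + normalization
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage: fuse token features (e.g. BERT-style, 768-dim) into a 256-channel stage.
vis = torch.randn(2, 256, 32, 32)
txt = torch.randn(2, 20, 768)
out = TGVBlock(vis_dim=256, txt_dim=768)(vis, txt)
print(out.shape)  # torch.Size([2, 256, 32, 32])
```

Cross-attention with visual queries and textual keys/values is one standard choice for this kind of multi-scale fusion; the actual TGV module inside CMU may be structured differently.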