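Modality (human-computer interaction)
Computer science
Benchmark (surveying)
Artificial intelligence
Routing (electronic design automation)
Flexibility (engineering)
Router
Information retrieval
Machine learning
Mathematics
Statistics
Computer network
Geodesy
Geography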
Authors
Leigang Qu, Meng Li, Jian Wu, Zan Gao, Liqiang Nie
Source
Venue: International ACM SIGIR Conference on Research and Development in Information Retrieval
Date: 2021-07-11
Citations: 69
Identifier
DOI:10.1145/3404835.3462829
Abstract
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although much progress has been made in bridging vision and language, it remains challenging because of the difficult intra-modal reasoning and cross-modal alignment. Existing modality interaction methods have achieved impressive results on public datasets. However, they heavily rely on expert experience and empirical feedback towards the design of interaction patterns, therefore, lacking flexibility. To address these issues, we develop a novel modality interaction modeling network based upon the routing mechanism, which is the first unified and dynamic multimodal interaction framework towards image-text retrieval. In particular, we first design four types of cells as basic units to explore different levels of modality interactions, and then connect them in a dense strategy to construct a routing space. To endow the model with the capability of path decision, we integrate a dynamic router in each cell for pattern exploration. As the routers are conditioned on inputs, our model can dynamically learn different activated paths for different data. Extensive experiments on two benchmark datasets, i.e., Flickr30K and MS-COCO, verify the superiority of our model compared with several state-of-the-art baselines.
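The routing idea in the abstract (cells connected densely, each with a router conditioned on its input) can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not name the four cell types or the router's form, so the four element-wise operations in `OPS`, the linear-plus-softmax router, and every name below (`Cell`, `build_routing_space`, `forward`) are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# ASSUMPTION: four placeholder operations standing in for the paper's
# four cell types, which the abstract does not specify.
OPS = [
    lambda v: list(v),                           # pass-through
    lambda v: [max(0.0, a) for a in v],          # rectification
    lambda v: [math.tanh(a) for a in v],         # squashing
    lambda v: [a - sum(v) / len(v) for a in v],  # mean-centering
]


class Cell:
    """Hypothetical cell: applies an op, then scores each outgoing path."""

    def __init__(self, op, dim, n_next, rng):
        self.op = op
        # ASSUMPTION: the router is one linear scorer per outgoing path.
        self.router = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                       for _ in range(n_next)]

    def forward(self, x):
        y = self.op(x)
        scores = [sum(w * a for w, a in zip(row, y)) for row in self.router]
        return y, softmax(scores)  # path weights depend on the input


def build_routing_space(dim, n_layers, rng):
    """Stack layers of cells; every cell feeds every cell in the next layer."""
    layers = []
    for i in range(n_layers):
        # The last layer routes to a single aggregated output.
        n_next = len(OPS) if i < n_layers - 1 else 1
        layers.append([Cell(op, dim, n_next, rng) for op in OPS])
    return layers


def forward(layers, x):
    """Return the final feature and the per-layer path weights."""
    inputs = [x] * len(layers[0])
    all_probs = []
    for layer in layers:
        n_next = len(layer[0].router)
        nxt = [[0.0] * len(x) for _ in range(n_next)]
        layer_probs = []
        for cell, inp in zip(layer, inputs):
            y, probs = cell.forward(inp)
            layer_probs.append(probs)
            # Each next-layer input is a path-weighted sum of cell outputs.
            for j, p in enumerate(probs):
                for k, a in enumerate(y):
                    nxt[j][k] += p * a
        inputs = nxt
        all_probs.append(layer_probs)
    return inputs[0], all_probs
```

Calling `forward(build_routing_space(4, 3, random.Random(0)), x)` with two different vectors `x` yields different path weights, which is the sense in which input-conditioned routers activate different paths for different data.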