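Bottleneck
Relation (database)
Relation extraction
Information bottleneck method
Extraction (chemistry)
Computer science
Information extraction
Artificial intelligence
Information retrieval
Natural language processing
Data mining
Chemistry
Mutual information
Chromatography
Embedded system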
Authors
Shiyao Cui, Jiangxia Cao, Xin Cong, Jiawei Sheng, Quangang Li, Tingwen Liu, Jinqiao Shi
Source
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
[Institute of Electrical and Electronics Engineers]
Date: 2024-01-01
Volume/Issue: 1-12
Citations: 1
Identifier
DOI: 10.1109/taslp.2023.3345146
Abstract
This paper studies multimodal named entity recognition (MNER) and multimodal relation extraction (MRE), which are important for content analysis and various applications. The core of MNER and MRE lies in incorporating evident visual information to enhance textual semantics, which raises two issues that demand investigation. The first is modality noise: task-irrelevant information in each modality may act as noise that misleads the task prediction. The second is the modality gap: representations from different modalities are inconsistent, which prevents building a semantic alignment between text and image. To address these issues, we propose a novel method for MNER and MRE, MultiModal representation learning with Information Bottleneck (MMIB). For the first issue, a refinement-regularizer draws on the information-bottleneck principle to balance predictive evidence against noisy information, yielding expressive representations for prediction. For the second issue, an alignment-regularizer is proposed, in which a mutual-information-based term works in a contrastive manner to encourage consistent text-image representations. To the best of our knowledge, we are the first to explore variational IB estimation for MNER and MRE. Experiments show that MMIB achieves state-of-the-art performance on three public benchmarks.
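The abstract names two regularizers but gives no formulas. As a rough illustration only, the sketch below assumes a diagonal Gaussian encoder with a standard normal prior (a common variational IB setup) and an InfoNCE-style contrastive term for text-image alignment. The function names, the weights `beta` and `gamma`, and the temperature are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    In a variational IB setup this term upper-bounds the information the
    representation keeps about its input (assumed form, not the paper's).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(text_vecs, image_vecs, temperature=0.1):
    """InfoNCE-style loss: the i-th text should score highest with the i-th image."""
    total = 0.0
    for i, t in enumerate(text_vecs):
        logits = [dot(t, v) / temperature for v in image_vecs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / len(text_vecs)


def total_loss(task_loss, mu, log_var, text_vecs, image_vecs,
               beta=0.01, gamma=0.1):
    """Task loss plus the two hypothetical regularizers, weighted by beta and gamma."""
    return (task_loss
            + beta * gaussian_kl(mu, log_var)
            + gamma * info_nce(text_vecs, image_vecs))
```

In this sketch a larger `beta` compresses the representation harder (discarding more potentially noisy detail), while a larger `gamma` pushes paired text and image vectors closer together relative to unpaired ones.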