Authors
Xuxin Cheng, Zhihong Zhu, Y. Li, Hongxiang Li, Yuexian Zou
Identifier
DOI: 10.1145/3583780.3614832
Abstract
Multimodal machine translation (MMT) aims to exploit visual information to improve neural machine translation (NMT). It has been demonstrated that image captioning and object detection can further improve MMT. In this paper, to leverage image captioning and object detection more effectively, we propose a Dual-level ASymmetric Contrastive Learning (DAS-CL) framework. Specifically, we leverage image captioning and object detection to generate additional pairs of visual and textual inputs. At the utterance level, we introduce an image captioning model to generate more coarse-grained pairs. At the word level, we introduce an object detection model to generate more fine-grained pairs. To mitigate the negative impact of noise in the generated pairs, we apply asymmetric contrastive learning at these two levels. Experiments on three translation directions of the Multi30K dataset demonstrate that DAS-CL significantly outperforms existing MMT frameworks and achieves new state-of-the-art performance. Further analysis also shows that DAS-CL is more robust to irrelevant visual information.
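The abstract does not spell out the loss formulation, but the core idea of treating generated (noisier) pairs differently from original image-text pairs can be illustrated with an InfoNCE-style sketch. The code below is a minimal, hypothetical illustration under stated assumptions, not the paper's actual method: it assumes "asymmetric" means that caption- and detection-generated pairs contribute only in the text-to-image direction and are down-weighted, while original pairs contribute in both directions. The function name `asymmetric_contrastive_loss` and the parameters `tau` and `gen_weight` are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def asymmetric_contrastive_loss(text_emb, img_emb, is_generated,
                                tau=0.07, gen_weight=0.5):
    """Hypothetical asymmetric InfoNCE sketch (not the DAS-CL formulation).

    text_emb, img_emb: (N, d) embeddings of N aligned text/image pairs.
    is_generated: (N,) bool mask, True for pairs produced by the captioning
        or object detection model (assumed to be noisier than originals).
    """
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(img_emb, dim=-1)
    logits = t @ v.T / tau                      # (N, N) similarity matrix
    targets = torch.arange(len(t), device=t.device)

    # Text -> image direction: every pair participates, but generated
    # pairs are down-weighted (assumed noise-mitigation strategy).
    loss_t2v = F.cross_entropy(logits, targets, reduction='none')
    weight = torch.where(is_generated,
                         torch.full_like(loss_t2v, gen_weight),
                         torch.ones_like(loss_t2v))
    loss_t2v = (weight * loss_t2v).mean()

    # Image -> text direction: only original (clean) pairs participate,
    # which is where the asymmetry between the two directions comes from.
    loss_v2t = F.cross_entropy(logits.T, targets, reduction='none')
    clean = (~is_generated).float()
    loss_v2t = (loss_v2t * clean).sum() / clean.sum().clamp(min=1.0)

    return loss_t2v + loss_v2t

# Toy usage with random embeddings: 4 original pairs + 4 generated pairs.
text = torch.randn(8, 256)
image = torch.randn(8, 256)
gen_mask = torch.tensor([False] * 4 + [True] * 4)
print(asymmetric_contrastive_loss(text, image, gen_mask))
```

In this sketch the same masking scheme could be applied once with utterance-level (caption) pairs and once with word-level (detected-object) pairs, mirroring the two levels the abstract describes; how DAS-CL actually combines the two levels is not specified here.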