Computer science
Pattern
Embedding
Modality (human-computer interaction)
Natural language processing
Artificial intelligence
Information retrieval
Inference
Graph
Image (mathematics)
Headline
Machine learning
Theoretical computer science
Linguistics
Philosophy
Sociology
Social science
Authors
Victor Machado Gonzaga,Nils Murrugarra-Llerena,Solange Oliveira Rezende
Identifier
DOI:10.1145/3470482.3479636
Abstract
Determining the author's intent in a social media post is a challenging multimodal task and requires identifying complex relationships between image and text in the post. For example, the post image can represent an object, person, product, or company, while the text can be an ironic message about the image content. Similarly, a text can be a news headline, while the image represents a provocation, meme, or satire about the news. Existing approaches propose intent classification techniques combining both modalities. However, some posts may have missing textual annotations. Hence, we investigate a graph-based approach that propagates available text embedding data from complete multimodal posts to incomplete ones. This paper presents a text embedding propagation method, which transfers embeddings from BERT neural language models to image-only posts (i.e., posts with incomplete modality) considering the topology of a graph constructed from both visual and textual modalities available during the training step. By using this inference approach, our method provides competitive results when textual modality is available at different completeness levels, even compared to reference methods that require complete modalities.
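The abstract describes propagating BERT text embeddings over a graph so that image-only posts inherit embeddings from their neighbors. The paper's actual graph construction and propagation rule are not given here, so the following is only a minimal sketch of the general idea: unknown nodes repeatedly average their neighbors' embeddings while nodes with real text are clamped to their BERT vectors. All names (`propagate_embeddings`, the adjacency-list format) are illustrative assumptions, not the authors' implementation.

```python
def propagate_embeddings(adj, emb, known, iters=50):
    """Sketch of embedding propagation to incomplete-modality nodes.

    adj:   adjacency list {node: [neighbor nodes]}, e.g. built from
           visual similarity between post images (assumed structure)
    emb:   {node: embedding vector} for posts whose text is available
    known: set of node ids that have the textual modality
    Returns a dict mapping every node to an embedding.
    """
    dim = len(next(iter(emb.values())))
    # Image-only posts start at the zero vector
    out = {n: list(emb[n]) if n in known else [0.0] * dim for n in adj}
    for _ in range(iters):
        new = {}
        for n, nbrs in adj.items():
            if n in known or not nbrs:
                new[n] = out[n]  # clamp posts that have real text
            else:
                # Diffusion step: average the neighbors' embeddings
                new[n] = [sum(out[m][k] for m in nbrs) / len(nbrs)
                          for k in range(dim)]
        out = new
    return out
```

On a path graph where the middle post lacks text, the propagated embedding converges to the average of its two annotated neighbors; the downstream intent classifier can then consume these inferred embeddings alongside the visual features.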