Authors
Yan Ling, Jianfei Yu, Rui Xia
Identifier
DOI: 10.18653/v1/2022.acl-long.152
Abstract
As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore the cross-modal alignment, or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate to identify fine-grained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLP-MABSA), which is a unified multimodal encoder-decoder architecture for all the pre-training and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pre-training task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.
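The abstract describes the architecture only at a high level: one multimodal encoder-decoder shared by every pre-training and downstream task, with outputs formulated as text generation. The sketch below is a minimal, self-contained illustration of that idea, not the authors' released implementation; the class name, feature dimensions, and the choice of a plain PyTorch Transformer with image-region features projected into the token embedding space are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the VLP-MABSA code): pre-extracted image-region
# features are projected into the text embedding space, concatenated with token
# embeddings, and fed to a single encoder-decoder that generates target text
# (e.g., aspect-opinion-sentiment tuples) for any task.
import torch
import torch.nn as nn


class UnifiedMultimodalSeq2Seq(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, img_feat_dim=2048,
                 nhead=8, num_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Map image-region features into the same space as token embeddings.
        self.img_proj = nn.Linear(img_feat_dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, img_feats, src_ids, tgt_ids):
        # Concatenate projected visual regions with token embeddings so the
        # encoder attends across both modalities in one sequence.
        src = torch.cat([self.img_proj(img_feats), self.tok_emb(src_ids)], dim=1)
        tgt = self.tok_emb(tgt_ids)
        hidden = self.transformer(src, tgt)
        return self.lm_head(hidden)  # logits over the vocabulary per target step


if __name__ == "__main__":
    model = UnifiedMultimodalSeq2Seq()
    img_feats = torch.randn(2, 36, 2048)        # 36 region features per image (assumed)
    src_ids = torch.randint(0, 30522, (2, 20))  # input sentence tokens
    tgt_ids = torch.randint(0, 30522, (2, 12))  # target sequence tokens
    logits = model(img_feats, src_ids, tgt_ids)
    print(logits.shape)  # torch.Size([2, 12, 30522])
```

Because every task shares this generation interface, the same weights can be reused across the language, vision, and multimodal pre-training objectives and the downstream MABSA subtasks; only the target sequences differ.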