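Computer science
Leverage (statistics)
Convolutional neural network
Emotion classification
Artificial intelligence
Task (project management)
Margin (machine learning)
Image (mathematics)
Semantics (computer science)
Contextual image classification
Machine learning
Natural language processing
Pattern recognition (psychology)
Economics
Management
Programming language
Authors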
Sinuo Deng, Ge Shi, Lifang Wu, Lehao Xing, Wenjin Hu, Heng Zhang, Ye Xiang
Identifier
DOI: 10.1007/978-3-031-00129-1_15
Abstract
Image emotion classification is an important computer vision task that aims to extract emotions from images. State-of-the-art methods for image emotion classification primarily design new architectures on top of pre-trained Convolutional Neural Networks (CNNs) and fine-tune them. Recently, learning transferable visual models from natural language supervision, as in CLIP, has shown great success in zero-shot settings, owing to easily accessible web-scale training data. In this paper, we present SimEmotion, a conceptually simple yet empirically powerful framework for supervised image emotion classification that effectively leverages the rich image and text semantics entailed in CLIP. Specifically, we propose a prompt-based fine-tuning strategy that learns task-specific representations while preserving the knowledge contained in CLIP. Because image emotion classification tasks lack text descriptions, we introduce sentiment-level concepts and entity-level information to enrich the text semantics. This forms knowledgeable prompts, avoids the considerable bias introduced by fixed, manually designed prompts, and further improves the model's ability to distinguish emotion categories. Evaluations on four widely used affective datasets, namely Flickr and Instagram (FI), EmotionROI, Twitter I, and Twitter II, demonstrate that the proposed algorithm outperforms state-of-the-art methods on image emotion classification by a large margin (e.g., a 5.27% absolute accuracy gain on FI).
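The abstract does not give SimEmotion's prompt wording or training details, so the following is only a minimal, hypothetical Python sketch of the general CLIP-style mechanism the method builds on. Each emotion class is turned into a text prompt (here enriched with an entity word, loosely mirroring the "knowledgeable prompts" idea). The image is scored against every prompt by scaled cosine similarity, and prompt-based fine-tuning would minimize the cross-entropy of those scores. The template `TEMPLATE`, the entity word, the fixed `logit_scale` of 100, and the toy 3-D vectors standing in for CLIP encoder outputs are all assumptions; no CLIP library is called.

```python
import math
from typing import List, Sequence

# HYPOTHETICAL prompt template: the abstract does not give SimEmotion's wording.
# The {entity} slot loosely mirrors the "entity-level information" idea.
TEMPLATE = "a photo of {entity} that expresses {emotion}"


def build_prompts(emotions: Sequence[str], entity: str) -> List[str]:
    """Return one text prompt per emotion class for a given entity word."""
    return [TEMPLATE.format(entity=entity, emotion=e) for e in emotions]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_logits(image_emb: Sequence[float],
                text_embs: Sequence[Sequence[float]],
                logit_scale: float = 100.0) -> List[float]:
    """CLIP-style class scores: scaled cosine similarity of the image to each prompt."""
    return [logit_scale * cosine(image_emb, t) for t in text_embs]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Per-example loss (via log-sum-exp) that a prompt-based fine-tuning loop
    would minimize with respect to the image and text encoder weights (omitted here)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    emotions = ["amusement", "fear", "sadness"]
    prompts = build_prompts(emotions, "a dog")  # would be fed to a text encoder
    # Toy stand-ins for encoder outputs: one 3-D embedding per prompt, plus an image.
    text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_emb = [0.9, 0.1, 0.0]
    logits = clip_logits(image_emb, text_embs)
    probs = softmax(logits)
    print(prompts[0])                         # prints "a photo of a dog that expresses amusement"
    print(emotions[probs.index(max(probs))])  # prints "amusement"
```

In a real setup the toy vectors would come from CLIP's image and text encoders, and the cross-entropy would be back-propagated into them. One common rationale for reusing CLIP's own similarity-based head, rather than attaching a new randomly initialized classifier, is that training then starts from CLIP's zero-shot behaviour instead of from scratch.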