Authors
Xiangfei Sheng, Leida Li, Pengfei Chen, Jinjian Wu, Weisheng Dong, Yuzhe Yang, Liwu Xu, Yaqian Li, Guangming Shi
Identifier
DOI: 10.1145/3581783.3611969
Abstract
Image aesthetics assessment (IAA) aims at predicting the aesthetic quality of images. Recently, large pre-trained vision-language models, like CLIP, have shown impressive performance on various visual tasks. When it comes to IAA, a straightforward approach is to fine-tune the CLIP image encoder using aesthetic images. However, this achieves only limited success without considering the uniqueness of multimodal data in the aesthetics domain. People usually assess image aesthetics according to fine-grained visual attributes, e.g., color, light, and composition. However, how to learn aesthetics-aware attributes from the CLIP-based semantic space has not been addressed before. With this motivation, this paper presents a CLIP-based multi-attribute contrastive learning framework for IAA, dubbed AesCLIP. Specifically, AesCLIP consists of two major components, i.e., aesthetic attribute-based comment classification and attribute-aware learning. The former classifies aesthetic comments into different attribute categories. The latter then learns an aesthetic attribute-aware representation by contrastive learning, aiming to mitigate the domain shift from the general visual domain to the aesthetics domain. Extensive experiments have been conducted using the pre-trained AesCLIP on four popular IAA databases, and the results demonstrate the advantage of AesCLIP over state-of-the-art methods. The source code will be made publicly available at https://github.com/OPPOMKLab/AesCLIP.
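To illustrate the kind of contrastive objective the abstract describes, below is a minimal sketch of a symmetric image-text contrastive (InfoNCE-style) loss, where each image embedding is pulled toward the embedding of its matched attribute comment and pushed away from the other comments in the batch. This is an illustrative sketch only, not the authors' implementation: the embedding sources, batch construction, and temperature value are assumptions, and AesCLIP's actual attribute-aware loss may differ.

```python
import numpy as np

def attribute_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, comment) pairs.

    image_emb, text_emb: arrays of shape (N, D), e.g. CLIP image features
    and features of the matched attribute comments. Row i of each array is
    a positive pair; all other rows in the batch serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    n = logits.shape[0]

    def cross_entropy_on_diagonal(l):
        # Numerically stable log-softmax; matched pairs sit on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions
    return (cross_entropy_on_diagonal(logits)
            + cross_entropy_on_diagonal(logits.T)) / 2
```

Minimizing this loss encourages image representations to align with aesthetics-specific comment semantics rather than generic visual semantics, which is one plausible way to realize the "attribute-aware learning" component described above.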