Keywords: Image editing, Computer science, Projectile, Image (mathematics), Field (mathematical analysis), Artificial intelligence, Human-computer interaction, Computer graphics (images), Mathematics, Materials science, Mathematical analysis, Metallurgy
Authors
Zhipeng Lin, Wenjing Yang, Long Lan, Mingyang Geng, Haotian Wang, Haoang Chi, Xueqiong Li, Ji Wang
Identifier
DOI:10.1109/icassp48485.2024.10447785
Abstract
Standing out as one of the most widely used tools in Cross-Domain Few-Shot Learning (CDFSL), data augmentation forms the bedrock of numerous recent advances. However, current augmentations in CDFSL are limited in their ability to modify high-level semantic attributes, resulting in a lack of diversity along key semantic dimensions. One of the most promising tools for editing images with respect to key semantic attributes, e.g. backgrounds, is image-to-image generation via large multimodal models (LMMs). Given the promising image-editing results of recent LMMs, we explore leveraging LMMs to augment data diversity for CDFSL. We propose a novel method named Multimodal Few-shot Image Editing (MFIE), which uses LMMs to automatically translate class-specific images into class-agnostic natural language descriptions of key semantic attributes in target domains, and then edits the original images based on these class-agnostic descriptions. To filter out corrupted data that disturbs the class-specific information, we apply semantic filtering using image-language similarity. Experiments on Meta-Dataset show that MFIE surpasses state-of-the-art CDFSL algorithms.
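The semantic-filtering step described in the abstract (discarding edited images whose image-language similarity to the class description has dropped) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed_image` callback and the `threshold` value are hypothetical stand-ins for whatever vision-language encoder (e.g. a CLIP-style model) and cutoff the method actually uses.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_edited_images(edited_images, class_text_embedding, embed_image,
                         threshold=0.5):
    """Keep only edited images whose embedding stays close to the
    class-description embedding; edits that corrupted the class-specific
    content fall below the (hypothetical) similarity threshold.

    edited_images: list of images produced by the LMM editor
    class_text_embedding: embedding of the class's text description
    embed_image: callable mapping an image to an embedding vector
    """
    kept = []
    for img in edited_images:
        sim = cosine_similarity(embed_image(img), class_text_embedding)
        if sim >= threshold:
            kept.append(img)
    return kept

# Toy usage: treat 2-D vectors as pre-embedded "images" (identity encoder).
class_emb = np.array([1.0, 0.0])
candidates = [np.array([1.0, 0.1]),   # close to the class description -> kept
              np.array([0.0, 1.0])]   # orthogonal (corrupted edit) -> dropped
kept = filter_edited_images(candidates, class_emb, embed_image=lambda x: x)
```

In practice the threshold would be tuned per target domain, since acceptable similarity scores differ across encoders and datasets.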