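Computer science
Synonym (taxonomy)
Leverage (statistics)
Robustness (evolution)
Artificial intelligence
Comprehension
Natural language processing
Information retrieval
Biochemistry
Plant
Biology
Gene
Genus
Chemistry
Programming language
Authors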
Jianwei Sun, Yang An, Xinyu Jiang, Qian Li, Yulong Liu, Yongshun Gong
Identifier
DOI:10.1109/icassp48485.2024.10448339
Abstract
Document AI, or Document Intelligence, refers to the technology used for document comprehension and analysis. Because documents are inherently multimodal, multimodal learning is central to document intelligence research. Multimodal data augmentation, a crucial aspect of multimodal learning, aims to improve the efficiency and accuracy of models. This paper proposes a novel multimodal data augmentation approach called DADA, designed specifically for document augmentation. The objective is to improve the expressive capabilities of multimodal data and enhance model robustness. To this end, we leverage synonym conversion and text-to-image generation techniques to augment text and image representations, respectively. Our proposed method is built upon a state-of-the-art document pre-training model. Experimental results on two small document datasets demonstrate that our approach can effectively enhance the performance of document comprehension and analysis models.
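The abstract does not describe how synonym conversion is implemented, so the following is only a minimal illustrative sketch of the general idea, not the DADA method itself. The function name `synonym_augment`, the toy `SYNONYMS` table, and the `prob` and `seed` parameters are all hypothetical. The sketch assumes one-for-one token replacement, so that any per-token layout information (such as bounding boxes) would stay aligned with the augmented text.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# a real system would draw synonyms from a lexical resource.
SYNONYMS = {
    "invoice": ["bill"],
    "total": ["sum"],
    "purchase": ["order"],
}


def synonym_augment(tokens, synonyms=SYNONYMS, prob=0.5, seed=None):
    """Return a copy of `tokens` with some words swapped for synonyms.

    Each token that has an entry in `synonyms` is replaced with
    probability `prob`. Replacement is one-for-one, so the output has
    the same length as the input.
    """
    rng = random.Random(seed)
    augmented = []
    for token in tokens:
        candidates = synonyms.get(token.lower())
        if candidates and rng.random() < prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(token)
    return augmented


if __name__ == "__main__":
    words = ["Invoice", "total", "due"]
    print(synonym_augment(words, prob=1.0, seed=0))
```

Keeping the token count fixed is a design choice of this sketch, not something stated in the abstract; it simply avoids having to recompute word positions after augmentation.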