Computer science
Sentence
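Natural language processing
Semantics (computer science)
Artificial intelligence
Glyph (data visualization)
Domain (mathematical analysis)
Visualization
Programming language
Mathematics
Mathematical analysis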
Authors
Jie Hong, Tingting He, Jie Mei, Ming Dong, Zheming Zhang, Xinhui Tu
Identifier
DOI:10.1109/bigdata59044.2023.10386747
Abstract
Ancient Chinese poetry (ACP) is a vital component of traditional Chinese culture. Improving performance on related downstream tasks requires high-quality pre-trained language models (PLMs) dedicated to ACP. Notably, the semantics of ACP differ significantly from those of modern Chinese. Existing PLMs have limited knowledge of ACP and are inadequately aligned with the semantic space of modern Chinese, which constrains their utility for tasks related to ACP. In this paper, we propose a fine-tuning strategy to establish a precise alignment between ACP and modern Chinese semantics at the sentence level. This strategy includes the corresponding modern Chinese translations alongside the original ancient poems, creating a hybrid corpus. This corpus facilitates a more effective transfer of knowledge from existing PLMs to the domain of ACP. Furthermore, we employ a training strategy based on a glyph-based foundational PLM, enabling meticulous fine-tuning. Consequently, we develop a specialized PLM named CP-ChineseBERT. To evaluate the effectiveness of our proposed strategies, we conducted experiments on two real-world datasets, focusing on ACP sentiment classification and ACP title prediction. The experimental results demonstrate significant performance improvements from the proposed strategies.