Topics
Computer science; Security token; Artificial intelligence; Natural language processing; Character (mathematics); Word (group theory); Language model; Focus (optics); Natural language understanding; Natural language; Linguistics; Physics; Philosophy; Geometry; Optics; Computer security; Mathematics
Authors
Lifan Han, Xin Wang, Meng Wang, Li Zhao, Heyi Zhang, Zirui Chen, Xiaowang Zhang
Identifier
DOI: 10.1007/978-3-031-44696-2_59
Abstract
In recent years, Chinese pre-trained language models have achieved significant improvements in fields such as natural language understanding (NLU) and text generation. However, most existing pre-trained language models focus on modern Chinese and ignore the rich semantic information embedded in Chinese characters, especially radical information. To this end, we present RAC-BERT, a language-specific BERT model for ancient Chinese. Specifically, we propose two new radical-based pre-training tasks: (1) replacing masked tokens with random words that share the same radical, which mitigates the gap between the pre-training and fine-tuning stages; and (2) predicting the radical of the masked token rather than the original word, which reduces the computational effort. Extensive experiments were conducted on two ancient Chinese NLP datasets. The results show that our model significantly outperforms state-of-the-art models on most tasks, and ablation experiments further demonstrate the effectiveness of our approach. The pre-trained model is publicly available at https://github.com/CubeHan/RAC-BERT
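The abstract describes the two radical-based objectives only at a high level. The following is a minimal Python sketch of one plausible reading of that masking scheme; the radical lookup table (RADICAL_OF), the radical_mask helper, and the 15% masking probability are illustrative assumptions and are not taken from the paper or the released repository.

```python
# Minimal sketch of the two radical-based pre-training tasks described in the
# abstract. The radical dictionary, tokenization, and sampling scheme here are
# illustrative assumptions, not the authors' released implementation.
import random
from collections import defaultdict

# Hypothetical character-to-radical table (a real system would use a full dictionary).
RADICAL_OF = {"江": "氵", "河": "氵", "湖": "氵", "松": "木", "柏": "木", "林": "木"}

# Group characters by radical so we can sample same-radical replacement candidates.
CHARS_BY_RADICAL = defaultdict(list)
for char, rad in RADICAL_OF.items():
    CHARS_BY_RADICAL[rad].append(char)

def radical_mask(tokens, mask_prob=0.15):
    """Return corrupted tokens plus radical labels for the masked positions.

    Task 1: a selected token is replaced by a random character with the same
            radical (instead of a [MASK] symbol), narrowing the gap between
            pre-training and fine-tuning inputs.
    Task 2: the training target at that position is the radical of the
            original character, not the character itself.
    """
    corrupted, radical_labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        rad = RADICAL_OF.get(tok)
        if rad is None or random.random() > mask_prob:
            continue  # token not selected for corruption
        candidates = [c for c in CHARS_BY_RADICAL[rad] if c != tok]
        if candidates:
            corrupted[i] = random.choice(candidates)  # same-radical replacement
        radical_labels[i] = rad                       # radical prediction target
    return corrupted, radical_labels

# Example: corrupt a short character sequence and inspect the radical labels.
print(radical_mask(list("江湖林松")))
```

In this reading, the model never sees an artificial [MASK] token, and the output head only has to rank a few hundred radicals instead of the full character vocabulary, which is consistent with the reduced computational effort the abstract claims.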