Computer science
Natural language processing
Series (stratigraphy)
Entity linking
Artificial intelligence
Knowledge base
Paleontology
Biology
Authors
Xiaotong Wang, X M Liu, Shuai Zhong, X. R. Chen, Bin Wu
Identifier
DOI:10.1145/3627673.3679917
Abstract
In the field of ancient Chinese text, extracting and analysing temporal and geographic information is crucial for understanding the personal experiences of historical figures, the development of historical events, and the overall historical background. Currently, named entity recognition (NER) strategies such as BERT+CRF are used to extract temporal and geographic information from ancient Chinese text. However, ancient Chinese text covers a vast time span, and temporal and geographic entities evolve continually, making them difficult to extract. This paper proposes a temporal and geographic extraction model for ancient Chinese text, enhanced by a time-series external knowledge base. The extraction of proper nouns and the extraction of general structures are handled by two independent networks. An external database is applied to enhance the extraction of proper nouns and to reduce noise in general structure inference. We constructed address trees and chronological tables containing commonly used place names and time-related keywords from different periods, and collected 12,000 texts spanning 3,000 years for extensive training. Overall, our research highlights the importance of external knowledge bases for ancient Chinese NER and provides new ideas for research in related fields.