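Computer science
Conditional random field
Artificial intelligence
Named-entity recognition
Natural language processing
Construct (Python library)
Annotation
Information extraction
Feature engineering
Deep learning
Domain (mathematical analysis)
Natural language
Information retrieval
Data science
Mathematical analysis
Mathematics
Management
Programming language
Economics
Task (project management)
Authors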
Qinjun Qiu,Miao Tian,Zhen Huang,Zhong Xie,Kai Ma,Liufeng Tao,Dexin Xu
Identifier
DOI:10.1016/j.eswa.2023.121925
Abstract
The engineering geology report serves as a comprehensive portrayal of the geological conditions and information within a surveyed region, making it highly valuable for extracting and mining engineering geology-related knowledge. Geological Named Entity Recognition (GNER), as a pivotal technology for information extraction and knowledge discovery, aims to identify geological objects that convey significant meanings within textual data. While general NER tools and existing approaches are commonly employed for recognizing generic entities, their effectiveness is constrained by the diverse language irregularities inherent in natural language texts, including nested entities, lengthy entities, and a scarcity of domain-specific annotated corpora. Adhering to established standards and principles governing engineering geology reports, we undertake an analysis of text structures and characteristics, as well as the linguistic descriptions and data attributes. By employing an Easy Data Augmentation (EDA) enhancement method in conjunction with manual annotation, we construct an engineering GNER dataset. To address these linguistic irregularities, we propose a novel deep learning model that combines both the geological pre-training model (GeoBERT) and multiple features (pinyin, radical, and position vectors) to generate representations from byte sequences. These representations are subsequently fused and passed through a BiLSTM-Attention model for training. Finally, entity classification results are obtained using conditional random fields (CRF). Experimental evaluation demonstrates that the proposed model achieves an impressive F1 value of 79.60% on the constructed datasets, outperforming ten baseline models analyzed in this study.
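The final step named in the abstract, obtaining entity labels with a CRF, can be illustrated with a minimal sketch of Viterbi decoding over a linear-chain CRF. This is not the authors' implementation: the function name `viterbi_decode`, the tag set (`O`, `B-ROCK`, `I-ROCK`), and all scores below are hypothetical, and the emission scores stand in for whatever the upstream BiLSTM-Attention encoder would produce.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag] is the per-token score for `tag` at position t;
    transitions[(prev, cur)] is the score for moving from `prev` to `cur`.
    Missing entries default to 0.0. (Illustrative helper, not from the paper.)
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0].get(tag, 0.0) for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag]: best predecessor of `tag` at t

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical BIO tag set and scores for a three-token sentence.
TAGS = ["O", "B-ROCK", "I-ROCK"]
EMISSIONS = [
    {"O": 2.0, "B-ROCK": 0.5, "I-ROCK": 0.1},
    {"O": 0.2, "B-ROCK": 1.0, "I-ROCK": 1.5},
    {"O": 0.3, "B-ROCK": 0.2, "I-ROCK": 2.0},
]
# A strong penalty discourages the invalid BIO transition O -> I-ROCK.
TRANSITIONS = {("O", "I-ROCK"): -10.0}

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

With the penalty in place, the decoder prefers `O, B-ROCK, I-ROCK` over the locally higher-scoring but invalid `O, I-ROCK, I-ROCK`, which is the usual motivation for placing a CRF layer on top of per-token scores.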