Computer science
Information extraction
Conditional random field
Sentence
Domain (mathematics)
Semantics (computer science)
Artificial intelligence
Natural language processing
Data mining
Information retrieval
Authors
Zhichen Hu, Xiangben Hu, Lianyong Qi, Shengjun Xue, Xiaolong Xu
Identifier
DOI:10.1109/dasc-picom-cbdcom-cyberscitech52372.2021.00085
Abstract
Because of the high demand for text information, semantic networks are under tremendous pressure to improve semantic accuracy. Domain data sets expand the range of entity types so that entities can be analyzed directly across contexts, which saves time in information search and improves quality control of key data. However, literature data in the field of geological sedimentology still relies mainly on manual annotation, which is time-consuming. Reducing human error and dynamically expanding entity classification remain major problems. To address them, this paper proposes a batch document information extraction method based on sentence part-of-speech rules (ESM). Technically, ESM identifies specific sentence components using NLTK (Natural Language Toolkit) extended with sedimentology-specific semantic rules, together with Bi-LSTM + CRF (a bidirectional long short-term memory network and a conditional random field). The paper reports an experimental evaluation demonstrating the efficiency of ESM.
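The abstract names Bi-LSTM + CRF but gives no implementation details. As a minimal sketch, assuming the standard linear-chain CRF formulation, the Bi-LSTM would supply a per-token emission score for each tag, the CRF layer adds tag-to-tag transition scores, and Viterbi decoding picks the highest-scoring tag sequence. The function name `viterbi_decode`, the BIO tag set (`O`, `B-ROCK`, `I-ROCK`), the tokens, and every score below are hypothetical hand-set values for illustration, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path of a linear-chain CRF and its score.

    emissions[t][j]   : score of tag j at token t (in a Bi-LSTM + CRF, the Bi-LSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of starting the sequence with tag j
    """
    n_tags = len(start)
    # score[j] = best total score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        best_prev: List[int] = []
        for j in range(n_tags):
            # choose the previous tag i that maximizes score[i] + transitions[i][j]
            i_best = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[i_best] + transitions[i_best][j] + emissions[t][j])
            best_prev.append(i_best)
        score = new_score
        backpointers.append(best_prev)
    # start from the best final tag and follow the back-pointers to the first token
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical example: all labels and scores are illustrative, not from the paper.
TAGS = ["O", "B-ROCK", "I-ROCK"]
tokens = ["red", "sandstone", "layer"]
start = [0.0, 0.0, -10.0]          # a sequence should not start inside an entity
transitions = [
    [0.0, 0.0, -10.0],             # from O: O -> I-ROCK is invalid in BIO
    [0.0, 0.0, 1.0],               # from B-ROCK: continuing the entity is encouraged
    [0.0, 0.0, 0.5],               # from I-ROCK
]
emissions = [
    [1.0, 0.8, 0.2],               # "red"
    [0.1, 0.5, 1.5],               # "sandstone"
    [1.2, 0.3, 0.4],               # "layer"
]

path, best = viterbi_decode(emissions, transitions, start)
print([TAGS[k] for k in path])     # ['B-ROCK', 'I-ROCK', 'O']
```

Choosing the best tag per token from emissions alone would label "red" as `O` and "sandstone" as `I-ROCK`, an invalid BIO sequence; the large negative `O -> I-ROCK` transition score is what lets the CRF layer reject that path in favour of `B-ROCK I-ROCK O`.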