Authors
Qifeng Lou, Shutong Wang, Jiahao Chen, Dongmei Mu, Ying Wang, Lili Huang
Abstract
Named entity recognition (NER) on Traditional Chinese Medicine (TCM) case records plays an important role in TCM text mining. In this study, a RoBERTa-BiLSTM-CRF model is constructed to perform NER on TCM case text. With RoBERTa as the pre-trained model, BiLSTM as the feature extractor, and CRF as the sequence labeling layer, the model is trained on a manually annotated corpus to recognize six entity types: symptom, tongue diagnosis, pulse diagnosis, prescription, dialectic, and Chinese medicine. After iterative training, the overall results were 96.24% accuracy, 83.51% precision, 88.39% recall, and an F-value of 85.88%. By entity type, the accuracy was 79.16% for symptom, 64.59% for tongue diagnosis, 61.83% for pulse diagnosis, 90.35% for prescription, 77.94% for dialectic, and 98.02% for Chinese medicine. NER with the RoBERTa-BiLSTM-CRF model provides effective support for TCM knowledge discovery and for the construction of knowledge graphs in the TCM field, and it helps physicians make more effective use of the potential value of TCM case records.
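The reported overall precision, recall, and F-value can be checked against one another. The abstract does not define the F-value, so the short sketch below assumes it is the balanced F1 score, the harmonic mean of precision and recall. The function name `f_value` is illustrative and is not taken from the paper.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F1 score: the harmonic mean of precision and recall.

    Assumption: the abstract's "F-value" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract, in percent.
reported_precision = 83.51
reported_recall = 88.39

f1 = f_value(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 85.88
```

Under this assumption the computed value agrees with the reported 85.88%, so the three overall figures are mutually consistent.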