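Artificial intelligence
Preprocessor
Computer science
Feature engineering
Deep learning
Classifier (UML)
Natural language processing
Traditional Chinese medicine
F1 score
Benchmark (surveying)
Encoder
Data preprocessing
Transformer
Machine learning
Pattern recognition (psychology)
Medicine
Alternative medicine
Geography
Physics
Voltage
Pathology
Operating system
Quantum mechanics
Geodesy
Authors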
Liang Yao,Zhe Jin,Chengsheng Mao,Yin Zhang,Yuan Luo
Abstract
Traditional Chinese Medicine (TCM) has been developed over several thousand years and plays a significant role in health care for Chinese people. This paper studies the problem of classifying TCM clinical records into five main TCM disease categories. We explored a number of state-of-the-art deep learning models and found that the recent Bidirectional Encoder Representations from Transformers (BERT) achieves better results than other deep learning models and other state-of-the-art methods. We further used an unlabeled clinical corpus to fine-tune the BERT language model before training the text classifier. The method takes only the Chinese characters in the clinical text as input, with no preprocessing or feature engineering. We evaluated deep learning models and traditional text classifiers on a benchmark data set. Our method achieves a state-of-the-art accuracy of 89.39% ± 0.35%, a Macro F1 score of 88.64% ± 0.40%, and a Micro F1 score of 89.39% ± 0.35%. We also visualized the attention weights in our method, which can reveal indicative characters in clinical text.
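The reported accuracy and Micro F1 score are identical (89.39% ± 0.35%), which is expected when each record receives exactly one of the five labels: every misclassification counts once as a false positive and once as a false negative, so micro-averaged F1 reduces to accuracy. Macro F1 averages the per-class F1 scores instead, so it can differ. The following is a minimal sketch of the three metrics, not the authors' evaluation code; the function name `f1_scores` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, micro_f1) for single-label multi-class data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro F1: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: F1 computed from counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, macro, micro


# Hypothetical toy data: five categories encoded as 0..4 (not from the paper).
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 0, 2, 2, 3, 4, 4, 4]

accuracy, macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} micro_f1={micro_f1:.4f}")
```

On this toy data, accuracy and Micro F1 coincide while Macro F1 is lower, because the two classes with missed records pull the unweighted average down.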