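Electronic medical record
Artificial intelligence
Medical record
Premise
Support vector machine
Machine learning
Center (category theory)
Computer science
Identification (biology)
Natural language processing
Information retrieval
Medicine
Chemistry
Philosophy
Radiology
Botany
Biology
Internet privacy
Linguistics
Crystallography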
Authors
Meng Jin,Kai Zhang,Yunhaonan Yang,Shuanglian Xie,Kai Song,Yonghua Hu,Xiaoyuan Bao
Identifier
DOI:10.1109/icbk.2019.00023
Abstract
The premise for making full use of unstructured electronic medical records is that patients' information privacy remains fully protected. Before electronic medical record data are processed, any information that could be used to identify a patient must be identified and removed, and this de-identification task is currently a research hotspot. Very few de-identification methods exist for Chinese electronic medical records, and those that do perform poorly across centers. We therefore developed a hybrid de-identification method that combines rule-based and machine learning methods. The method was tested on 700 electronic medical records from six hospitals. Five-fold cross-validation was used to evaluate C5.0, Random Forest, SVM, and XGBoost, and leave-one-out testing was used to evaluate CRF. With machine learning, the F1 measure reached 91.18% for PHI_Names, 98.21% for PHI_MEDICALID, 95.74% for PHI_OTHERNFC, 97.14% for PHI_GEO, 89.19% for PHI_DATES, and 91.49% for PHI_TEL. With the rule-based methods, the F1 measure reached 93.00% for PHI_Names, 97.00% for PHI_MEDICALID, 97.00% for PHI_OTHERNFC, 97.00% for PHI_GEO, 96.00% for PHI_DATES, and 89.00% for PHI_TEL.
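To make the rule-based half of such a pipeline and the F1 measure concrete, here is a minimal sketch. It is not the authors' method: the abstract does not publish their rules, so the two regular expressions below (an 11-digit Chinese mobile number or an area-code landline for PHI_TEL, and a Chinese or ISO-style date for PHI_DATES) are hypothetical illustrations. The sketch also assumes strict span-level matching (same start, end, and label) when computing F1, which the abstract does not specify. The helper names `rule_based_tag`, `mask`, and `span_f1` are likewise hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# A PHI span: (start offset, end offset, PHI label).
Span = Tuple[int, int, str]

# Hypothetical patterns for illustration only; not the rules used in the paper.
RULES: Dict[str, "re.Pattern[str]"] = {
    # 11-digit mobile number starting 13-19, or area-code landline like 010-12345678.
    "PHI_TEL": re.compile(r"1[3-9]\d{9}|0\d{2,3}-\d{7,8}"),
    # Dates written as 2019年3月15日 or 2019-03-15.
    "PHI_DATES": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{1,2}-\d{1,2}"),
}


def rule_based_tag(text: str) -> List[Span]:
    """Return all regex matches as labeled spans, sorted by position."""
    spans: List[Span] = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with its label; assumes spans do not overlap."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Strict span-level F1: a prediction counts only if it matches exactly."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    record = "张三于2019年3月15日入院，联系电话13812345678。"
    predicted = rule_based_tag(record)
    print(mask(record, predicted))  # → 张三于[PHI_DATES]入院，联系电话[PHI_TEL]。
    gold = {(0, 2, "PHI_Names"), (3, 13, "PHI_DATES"), (20, 31, "PHI_TEL")}
    print(f"{span_f1(gold, set(predicted)):.2f}")  # → 0.80
```

In this toy record the two patterns catch the date and phone number but leave the name untouched, so recall drops to 2/3 and F1 to 0.80. Categories without a fixed surface form are where a learned tagger (the abstract evaluates CRF, among others) or additional rules would be needed.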