Developing a computable phenotype for glioblastoma

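Keywords: Chart, Diagnosis code, Computer science, Electronic health record, Medical record, Health records, Glioblastoma, F1 score, Population, Unstructured data, Medicine, Data mining, Natural language processing, Artificial intelligence, Health care, Internal medicine, Mathematics, Statistics, Big data, Environmental health, Cancer research, Economics, Economic growth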

Authors

Sandra C. Yan, Kaitlyn Melnick, Xing He, Tianchen Lyu, Rachel Moor, Megan Still, Duane A. Mitchell, Elizabeth Shenkman, Han Wang, Yi Guo, Jiang Bian, Ashley Ghiaseddin

Source

Journal: Neuro-Oncology [Oxford University Press]
Date: 2023-12-20; Volume 26, Issue 6, pp. 1163-1170

Links

nih.gov, doi.org

Identifiers

DOI: 10.1093/neuonc/noad249

Abstract

Background: Glioblastoma is the most common primary malignant brain tumor, so identifying patients with this diagnosis is important for population studies. This can be challenging, however, because diagnostic codes are nonspecific. The aim of this study was to create a computable phenotype (CP) for glioblastoma multiforme (GBM) from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR).

Methods: We used the University of Florida (UF) Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. Through multiple iterations of manual chart review of patient data, we refined the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords. We then evaluated the performance of candidate CPs constructed from the relevant codes and keywords.

Results: We performed six rounds of manual chart review to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Using both structured and unstructured data, the CP rule "the patient has at least 1 relevant diagnosis code and at least 1 relevant keyword" achieved the highest F1-score and was therefore selected as the best-performing CP rule.

Conclusions: We developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance that minimizes possible bias from misclassification errors.
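The selection step described in the abstract, scoring candidate CP rules against chart-review labels and keeping the one with the highest F1-score, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Patient` record, the example code list (ICD-10-CM C71.9 and ICD-9-CM 191.9, both "malignant neoplasm of brain, unspecified"), the keyword list, the rule names, and the toy patients are all hypothetical, and keyword matching is assumed to have already been run over the clinical notes. The study's procedure and medication codes are omitted. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple


@dataclass
class Patient:
    """Hypothetical per-patient summary of EHR evidence."""
    patient_id: str
    dx_codes: Set[str]  # diagnosis codes from structured data
    keywords: Set[str]  # keywords assumed already extracted from clinical notes
    is_gbm: bool        # gold-standard label from manual chart review


# Illustrative lists only; NOT the study's refined code and keyword sets.
GBM_DX_CODES = {"C71.9", "191.9"}  # ICD-10-CM / ICD-9-CM: malignant neoplasm of brain, unspecified
GBM_KEYWORDS = {"glioblastoma", "gbm"}


def n_dx(p: Patient) -> int:
    """Number of distinct relevant diagnosis codes for the patient."""
    return len(p.dx_codes & GBM_DX_CODES)


def n_kw(p: Patient) -> int:
    """Number of distinct relevant keywords found for the patient."""
    return len(p.keywords & GBM_KEYWORDS)


# Hypothetical candidate CP rules; each maps a patient to True (has GBM) or False.
RULES: Dict[str, Callable[[Patient], bool]] = {
    ">=1 dx code": lambda p: n_dx(p) >= 1,
    ">=1 keyword": lambda p: n_kw(p) >= 1,
    ">=1 dx code AND >=1 keyword": lambda p: n_dx(p) >= 1 and n_kw(p) >= 1,
    ">=1 dx code OR >=1 keyword": lambda p: n_dx(p) >= 1 or n_kw(p) >= 1,
}


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_rule(patients: List[Patient]) -> Tuple[str, Dict[str, float]]:
    """Score every rule against chart-review labels; return the best by F1."""
    actual = [p.is_gbm for p in patients]
    scores = {
        name: f1_score([rule(p) for p in patients], actual)
        for name, rule in RULES.items()
    }
    return max(scores, key=scores.get), scores


# Toy cohort invented for illustration.
TOY_PATIENTS = [
    Patient("A", {"C71.9"}, {"glioblastoma"}, True),
    Patient("B", {"C71.9"}, set(), False),  # brain-tumor code, but a different tumor
    Patient("C", set(), {"gbm"}, False),    # keyword mentioned, but no GBM
    Patient("D", {"191.9"}, {"gbm"}, True),
    Patient("E", set(), set(), False),
]

if __name__ == "__main__":
    best, scores = best_rule(TOY_PATIENTS)
    for name, score in scores.items():
        print(f"{name:30s} F1 = {score:.3f}")
    print("selected:", best)
```

On this toy cohort the AND rule scores highest because requiring both a code and a keyword excludes patient B (code only) and patient C (keyword only), each of which the single-source rules count as false positives. Real scores depend entirely on the refined lists and the reviewed cohort; the study reports an F1-score of 0.817 for its selected rule.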
