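Artificial intelligence
Transformer
Medicine
Computer science
Natural language processing
Introduction
Language model
Migraine
Generative model
Context (archaeology)
Confidence interval
Machine learning
Generative grammar
Family medicine
Internal medicine
Physics
Paleontology
Biology
Voltage
Quantum mechanics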
Authors
Chia‐Chun Chiang,Man Luo,Gina Dumkrieger,Shubham Trivedi,Yi‐Chieh Chen,Chieh‐Ju Chao,Todd J. Schwedt,Abeed Sarker,Imon Banerjee
Source
Journal: Headache
[Wiley]
Date: 2024-03-25
Volume (issue): 64 (4): 400-409
Citations: 6
Abstract
Objective: To develop a natural language processing (NLP) algorithm that can accurately extract headache frequency from free‐text clinical notes.

Background: Headache frequency, defined as the number of days with any headache in a month (or 4 weeks), remains a key parameter in the evaluation of treatment response to migraine preventive medications. However, due to the variations and inconsistencies in documentation by clinicians, significant challenges exist to accurately extract headache frequency from the electronic health record (EHR) by traditional NLP algorithms.

Methods: This was a retrospective cross‐sectional study with patients identified from two tertiary headache referral centers, Mayo Clinic Arizona and Mayo Clinic Rochester. All neurology consultation notes written by 15 specialized clinicians (11 headache specialists and 4 nurse practitioners) between 2012 and 2022 were extracted and 1915 notes were used for model fine‐tuning (90%) and testing (10%). We employed four different NLP frameworks: (1) ClinicalBERT (Bidirectional Encoder Representations from Transformers) regression model, (2) Generative Pre‐Trained Transformer‐2 (GPT‐2) Question Answering (QA) model zero‐shot, (3) GPT‐2 QA model few‐shot training fine‐tuned on clinical notes, and (4) GPT‐2 generative model few‐shot training fine‐tuned on clinical notes to generate the answer by considering the context of included text.

Results: The mean (standard deviation) headache frequency of our training and testing datasets were 13.4 (10.9) and 14.4 (11.2), respectively. The GPT‐2 generative model was the best‐performing model with an accuracy of 0.92 (0.91, 0.93, 95% confidence interval [CI]) and R² score of 0.89 (0.87, 0.90, 95% CI), and all GPT‐2–based models outperformed the ClinicalBERT model in terms of exact matching accuracy. Although the ClinicalBERT regression model had the lowest accuracy of 0.27 (0.26, 0.28), it demonstrated a high R² score of 0.88 (0.85, 0.89), suggesting the ClinicalBERT model can reasonably predict the headache frequency within a range of ≤ ± 3 days, and the R² score was higher than the GPT‐2 QA zero‐shot model or GPT‐2 QA model few‐shot training fine‐tuned model.

Conclusion: We developed a robust information extraction model based on a state‐of‐the‐art large language model, a GPT‐2 generative model that can extract headache frequency from EHR free‐text clinical notes with high accuracy and R² score. It overcame several challenges related to different ways clinicians document headache frequency that were not easily achieved by traditional NLP models. We also showed that GPT‐2–based frameworks outperformed ClinicalBERT in terms of accuracy in extracting headache frequency from clinical notes. To facilitate research in the field, we released the GPT‐2 generative model and inference code with open‐source license of community use in GitHub. Additional fine‐tuning of the algorithm might be required when applied to different health‐care systems for various clinical use cases.
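
The abstract reports two metrics that can diverge: exact matching accuracy and the R² score. The Python sketch below illustrates how a regression-style predictor can score low on exact matches while still achieving a high R² and staying within ± 3 days of the true value. It is a minimal illustration, not the authors' evaluation code: the function names, the tolerance helper, and the five example values are all hypothetical, and the abstract does not state how predictions were rounded or compared.

```python
from typing import Sequence


def exact_match_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true headache days per month."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def within_tolerance(y_true: Sequence[float], y_pred: Sequence[float],
                     tol: float = 3.0) -> float:
    """Fraction of predictions within +/- tol days of the true value."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: true headache days per month and continuous
# regression outputs that are close to, but never exactly, the truth.
true_days = [4, 15, 30, 8, 20]
regression_output = [5.2, 13.9, 28.4, 9.1, 21.3]
rounded_output = [round(p) for p in regression_output]

accuracy = exact_match_accuracy(true_days, rounded_output)
r2 = r2_score(true_days, regression_output)
close_enough = within_tolerance(true_days, regression_output)

print(f"exact-match accuracy: {accuracy:.2f}")
print(f"R^2: {r2:.2f}")
print(f"within +/- 3 days: {close_enough:.2f}")
```

In this made-up example every prediction misses the exact value by about one day, so exact-match accuracy is low even though the predictions track the true values closely, which mirrors the pattern the abstract describes for the ClinicalBERT regression model.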