
A large language model–based generative natural language processing framework fine‐tuned on clinical notes accurately extracts headache frequency from electronic health records

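Artificial intelligence; Transformer; Medicine; Computer science; Natural language processing; Introduction; Language model; Migraine; Generative model; Background (archaeology); Confidence interval; Machine learning; Generative grammar; Family medicine; Internal medicine; Physics; Paleontology; Biology; Voltage; Quantum mechanics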
Authors
Chia‐Chun Chiang, Man Luo, Gina Dumkrieger, Shubham Trivedi, Yi‐Chieh Chen, Chieh‐Ju Chao, Todd J. Schwedt, Abeed Sarker, Imon Banerjee
Source
Journal: Headache [Wiley]
Volume (issue): 64 (4): 400-409. Citations: 6
Identifier
DOI: 10.1111/head.14702
Abstract

Objective: To develop a natural language processing (NLP) algorithm that can accurately extract headache frequency from free-text clinical notes.

Background: Headache frequency, defined as the number of days with any headache in a month (or 4 weeks), remains a key parameter in evaluating treatment response to migraine preventive medications. However, because clinicians document it in varied and inconsistent ways, traditional NLP algorithms face significant challenges in accurately extracting headache frequency from the electronic health record (EHR).

Methods: This was a retrospective cross-sectional study with patients identified from two tertiary headache referral centers, Mayo Clinic Arizona and Mayo Clinic Rochester. All neurology consultation notes written by 15 specialized clinicians (11 headache specialists and 4 nurse practitioners) between 2012 and 2022 were extracted, and 1915 notes were used for model fine-tuning (90%) and testing (10%). We employed four NLP frameworks: (1) a ClinicalBERT (Bidirectional Encoder Representations from Transformers) regression model, (2) a zero-shot Generative Pre-Trained Transformer-2 (GPT-2) Question Answering (QA) model, (3) a GPT-2 QA model with few-shot training, fine-tuned on clinical notes, and (4) a GPT-2 generative model with few-shot training, fine-tuned on clinical notes, which generates the answer by considering the context of the included text.

Results: The mean (standard deviation) headache frequencies of our training and testing datasets were 13.4 (10.9) and 14.4 (11.2), respectively. The GPT-2 generative model was the best-performing model, with an accuracy of 0.92 (95% confidence interval [CI]: 0.91, 0.93) and an R² score of 0.89 (95% CI: 0.87, 0.90), and all GPT-2-based models outperformed the ClinicalBERT model in terms of exact matching accuracy. Although the ClinicalBERT regression model had the lowest accuracy, 0.27 (0.26, 0.28), it demonstrated a high R² score of 0.88 (0.85, 0.89), suggesting that it can reasonably predict headache frequency within a range of ≤ ± 3 days. Its R² score was higher than that of the zero-shot GPT-2 QA model or the few-shot fine-tuned GPT-2 QA model.

Conclusion: We developed a robust information extraction model based on a state-of-the-art large language model, a GPT-2 generative model that can extract headache frequency from EHR free-text clinical notes with high accuracy and a high R² score. It overcame several challenges arising from the different ways clinicians document headache frequency, which traditional NLP models do not handle easily. We also showed that GPT-2-based frameworks outperformed ClinicalBERT in terms of accuracy in extracting headache frequency from clinical notes. To facilitate research in the field, we released the GPT-2 generative model and inference code on GitHub under an open-source license for community use. Additional fine-tuning of the algorithm might be required when it is applied to different health-care systems for various clinical use cases.
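
Note: The Results contrast exact matching accuracy with the R² score, and a regression model can score poorly on the first while doing well on the second. The sketch below illustrates that with plain Python. The data are invented for illustration and are not taken from the study, and the function names are hypothetical, not part of the authors' released code.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the reference value exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def within_tolerance(y_true, y_pred, tol=3):
    """Fraction of predictions within +/- tol days of the reference."""
    close = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return close / len(y_true)


# Invented headache frequencies (days per month); NOT study data.
reference = [2, 4, 8, 10, 15, 20, 25, 30]
# A regression-style output: always near the reference, never exactly equal.
regression_like = [3, 5, 7, 11, 14, 22, 24, 28]

acc = exact_match_accuracy(reference, regression_like)
r2 = r2_score(reference, regression_like)
near = within_tolerance(reference, regression_like, tol=3)

print(f"exact match accuracy: {acc:.2f}")
print(f"R^2 score:            {r2:.2f}")
print(f"within +/- 3 days:    {near:.2f}")
```

In this toy case every prediction misses the exact value, so exact matching accuracy is zero, yet all predictions fall within 3 days of the reference and R² stays close to 1. This is the same kind of pattern the abstract reports for the ClinicalBERT regression model.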