Extracting longitudinal anticancer treatments at scale using deep natural language processing and temporal reasoning.

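Keywords: Medicine, Artificial intelligence, Pipeline (software), Natural language processing, Conditional random field, Lung cancer, Cohort, Cancer, Timeline, Machine learning, Computer science, Oncology, Internal medicine, History, Archaeology, Programming language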
Authors
Meng Ma, Kyeryoung Lee, Yun Mai, Christopher Gilman, Zongzhi Liu, Mingwei Zhang, Minghao Li, Arielle Redfern, Tommy Mullaney, Tony Prentice, Paul McDonagh, Qi Pan, Rong Chen, Eric E. Schadt, Xiaoyan Wang
Source
Journal: Journal of Clinical Oncology [American Society of Clinical Oncology]
Volume (issue): 39 (15_suppl): e18747-e18747
Identifier
DOI: 10.1200/jco.2021.39.15_suppl.e18747
Abstract

e18747 Background: Accurate longitudinal cancer treatment data are vital for establishing primary endpoints, such as outcomes, and for investigating adverse events. However, many longitudinal therapeutic regimens are not well captured in structured electronic health records (EHRs). Recognizing them in unstructured data such as clinical notes is therefore critical to obtaining an accurate description of the real-world patient treatment journey. Here, we demonstrate a scalable approach to extracting high-quality longitudinal cancer treatments from lung cancer patients' clinical notes using a natural language processing (NLP) pipeline based on Bidirectional Long Short-Term Memory (BiLSTM) networks and Conditional Random Fields (CRF).

Methods: A lung cancer (LC) cohort of 4,698 patients was curated from the Mount Sinai healthcare system (2003-2020). Two domain experts developed a structured framework of entities and semantics that captured treatments and their temporality. The framework included therapy type (chemotherapy, targeted therapy, immunotherapy, etc.), status (on, off, hold, planned, etc.), and temporal reasoning entities and relations (admin_date, duration, etc.). We pre-annotated 149 FDA-approved cancer drugs and longitudinal treatment timelines in the training corpus. An NLP pipeline was implemented with BiLSTM-CRF-based deep learning models, which were trained and then applied to the clinical notes of the LC cohort. A postprocessor was developed to subsequently post-coordinate and refine the output. We performed both cross-evaluation and independent evaluation to assess pipeline performance.

Results: We applied the NLP pipeline to 853,755 clinical notes and identified 1,155 distinct entities for 194 generic cancer drugs, including 74 chemotherapy drugs, 21 immunotherapy drugs, and 99 targeted therapy drugs. From the clinical notes, we identified chemotherapy, immunotherapy, or targeted therapy data for 3,509 patients in the LC cohort. Whereas only 2,395 patients had cancer treatments recorded in the structured EHR, the pipeline identified cancer treatments from notes for an additional 2,303 patients who had no cancer treatment data in the structured EHR. Our evaluation indicates that the longitudinal cancer drug recognition pipeline performs strongly (named entity recognition for drugs and temporal entities: F1 = 95%; drug-temporal relation recognition: F1 = 90%).

Conclusions: We developed a high-performance BiLSTM-CRF-based NLP pipeline to recognize longitudinal cancer treatments. By adding 2,303 patients to the 2,395 with structured treatment data, the pipeline roughly doubles the number of patients with encoded cancer treatments compared with the structured EHR alone. Our study indicates that deep NLP with temporal reasoning could substantially accelerate the extraction of treatment profiles at scale. The pipeline is adjustable and can be applied to other cancer types.
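
The abstract does not describe the model internals. As a minimal sketch of how a BiLSTM-CRF tagger turns per-token scores into entity spans, the Python below implements Viterbi decoding for a linear-chain CRF and then groups BIO tags into (label, text) spans. Everything concrete here is a hypothetical illustration: the tag set (O, B-DRUG, I-DRUG, B-DATE, I-DATE), the example note, and the hand-written emission scores. In a trained model the emissions come from the BiLSTM and the transition scores are learned; here the transitions only forbid ill-formed BIO sequences.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical BIO tag set; the study's full label inventory is not given in the abstract.
TAGS = ["O", "B-DRUG", "I-DRUG", "B-DATE", "I-DATE"]
NEG = -1e4  # large penalty that effectively forbids a transition


def build_bio_constraints(tags: Sequence[str]) -> Tuple[List[float], List[List[float]]]:
    """Start and transition scores that rule out ill-formed BIO paths such as O -> I-DRUG.

    A trained CRF learns these scores; here every legal transition simply scores 0.
    """
    start = [NEG if t.startswith("I-") else 0.0 for t in tags]
    trans = [[0.0] * len(tags) for _ in tags]
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            # I-X may only follow B-X or I-X ("O"[2:] is "", so O -> I-X is penalized too).
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[i][j] = NEG
    return start, trans


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Highest-scoring tag-index path through a linear-chain CRF.

    emissions[t][j] is the score of tag j at token t (from the BiLSTM in a BiLSTM-CRF);
    transitions[i][j] is the score of tag i being followed by tag j.
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for a path that ends in tag j at token t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag, then reverse.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = (tag[2:], [token]) if starts_new else (None, [])
        else:
            words.append(token)
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Hypothetical example sentence and hand-written emission scores (columns follow TAGS).
TOKENS = ["Started", "carboplatin", "on", "March", "2019"]
EMISSIONS = [
    [2.0, 0.1, 0.0, 0.0, 0.0],
    [0.2, 3.0, 0.5, 0.0, 0.0],
    [2.0, 0.0, 0.3, 0.0, 0.0],
    [0.3, 0.0, 0.0, 2.5, 0.4],
    [0.5, 0.0, 0.0, 0.6, 1.5],
]

if __name__ == "__main__":
    start, trans = build_bio_constraints(TAGS)
    tags = [TAGS[i] for i in viterbi_decode(EMISSIONS, trans, start)]
    print(tags)  # -> ['O', 'B-DRUG', 'O', 'B-DATE', 'I-DATE']
    print(bio_to_spans(TOKENS, tags))  # -> [('DRUG', 'carboplatin'), ('DATE', 'March 2019')]
```

The transition constraints are what distinguish CRF decoding from taking the best tag at each token independently: a path such as O followed by I-DATE is never selected, however high its emission score.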

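The post-coordination and evaluation steps are also described only at a high level. The sketch below assumes that relation extraction has already linked each drug mention to a status and an administration date. It shows one possible way to normalize, deduplicate, and order those events into per-patient timelines, and how an exact-match F1 of the kind reported above can be computed over extracted items. The drug lexicon, the `DrugEvent` fields, and the function names are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical drug-to-therapy-type lexicon; the study's 194-drug vocabulary is not reproduced here.
THERAPY_TYPE: Dict[str, str] = {
    "carboplatin": "chemotherapy",
    "pemetrexed": "chemotherapy",
    "pembrolizumab": "immunotherapy",
    "osimertinib": "targeted therapy",
}

Entry = Tuple[date, str, str, str]  # (admin_date, drug, therapy type, status)


@dataclass(frozen=True)
class DrugEvent:
    """One drug mention already linked (by relation extraction) to a status and a date."""
    patient_id: str
    drug: str
    status: str  # e.g. "on", "off", "hold", "planned"
    admin_date: date


def build_timelines(events: Iterable[DrugEvent]) -> Dict[str, List[Entry]]:
    """Deduplicate events across notes and order them chronologically per patient."""
    timelines: Dict[str, set] = {}
    for e in events:
        drug = e.drug.strip().lower()  # normalize surface forms such as "Carboplatin "
        entry = (e.admin_date, drug, THERAPY_TYPE.get(drug, "unknown"), e.status)
        timelines.setdefault(e.patient_id, set()).add(entry)
    # Tuples sort by date first, then by drug name for events on the same day.
    return {pid: sorted(entries) for pid, entries in timelines.items()}


def f1_score(predicted: set, gold: set) -> float:
    """Exact-match F1 over sets of extracted items (entities or drug-date pairs)."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    events = [
        DrugEvent("p1", "Carboplatin", "on", date(2019, 3, 1)),
        DrugEvent("p1", "pemetrexed", "on", date(2019, 3, 1)),
        DrugEvent("p1", "carboplatin", "on", date(2019, 3, 1)),  # same event from another note
        DrugEvent("p1", "pembrolizumab", "planned", date(2019, 9, 15)),
    ]
    for entry in build_timelines(events)["p1"]:
        print(entry)

    # Hypothetical drug-date relation pairs: predicted vs. gold annotation.
    predicted = {("carboplatin", "2019-03-01"), ("pemetrexed", "2019-03-01"),
                 ("pembrolizumab", "2019-03-01")}
    gold = {("carboplatin", "2019-03-01"), ("pemetrexed", "2019-03-01"),
            ("pembrolizumab", "2019-09-15")}
    print(round(f1_score(predicted, gold), 3))  # 2 of 3 pairs match on each side
```

Scoring drug-date pairs rather than single mentions mirrors the distinction between the entity-level and relation-level F1 values in the Results. Exact matching is the strictest criterion; the abstract does not state which matching criterion was used.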