
Automated real-world data integration improves cancer outcome prediction

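Cancer, Outcome (game theory), Computer science, Computational biology, Artificial intelligence, Internal medicine, Medicine, Biology, Mathematics, Mathematical economics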
Authors
Justin Jee, Christopher J. Fong, Karl Pichotta, Thinh Ngoc Tran, Anisha Luthra, Michele Waters, Chenlian Fu, Mirella L. Altoé, Siyang Liu, Steven B. Maron, Mehnaj Ahmed, Susie Kim, Mono Pirun, Walid K. Chatila, Ino de Bruijn, Arfath Pasha, Ritika Kundra, Benjamin Groß, Brooke Mastrogiacomo, Tyler Aprati
Source
Journal: Nature [Nature Portfolio]
Volume (issue): 636 (8043): 728-736 Citations: 141
Identifier
DOI: 10.1038/s41586-024-08167-5
Abstract

The digitization of health records and growing availability of tumour DNA sequencing provide an opportunity to study the determinants of cancer outcomes with unprecedented richness. Patient data are often stored in unstructured text and siloed datasets. Here we combine natural language processing annotations (refs. 1, 2) with structured medication, patient-reported demographic, tumour registry and tumour genomic data from 24,950 patients at Memorial Sloan Kettering Cancer Center to generate a clinicogenomic, harmonized oncologic real-world dataset (MSK-CHORD). MSK-CHORD includes data for non-small-cell lung (n = 7,809), breast (n = 5,368), colorectal (n = 5,543), prostate (n = 3,211) and pancreatic (n = 3,109) cancers and enables discovery of clinicogenomic relationships not apparent in smaller datasets. Leveraging MSK-CHORD to train machine learning models to predict overall survival, we find that models including features derived from natural language processing, such as sites of disease, outperform those based on genomic data or stage alone as tested by cross-validation and an external, multi-institution dataset. By annotating 705,241 radiology reports, MSK-CHORD also uncovers predictors of metastasis to specific organ sites, including a relationship between SETD2 mutation and lower metastatic potential in immunotherapy-treated lung adenocarcinoma corroborated in independent datasets. We demonstrate the feasibility of automated annotation from unstructured notes and its utility in predicting patient outcomes. The resulting data are provided as a public resource for real-world oncologic research.

A study generates a clinicogenomics dataset resource, MSK-CHORD, that combines natural language processing-derived clinical annotations with patient medical data from various sources to improve models of cancer outcome.