Radiomics feature analysis and model research for predicting histopathological subtypes of non‐small cell lung cancer on CT images: A multi‐dataset study

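Radiomics; Artificial intelligence; Medical imaging; Feature (linguistics); Computer science; Lung cancer; Computed tomography; Radiology; Medical physics; Cancer; Medicine; Pathology; Internal medicine; Philosophy; Linguistics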
Authors
Fan Song, Xiao Song, Youdan Feng, Guangda Fan, Yangyang Sun, Peng Zhang, Jinkai Li, Fei Liu, Guanglei Zhang
Source
Journal: Medical Physics [Wiley]
Volume (issue): 50 (7): 4351-4365; Cited by: 19
Identifier
DOI:10.1002/mp.16233
Abstract

Classifying the subtypes of non-small cell lung cancer (NSCLC) is essential for selecting optimal treatment strategies and improving clinical outcomes, but at present the histological subtypes are confirmed by invasive biopsy or post-operative examination. Based on multi-center data, this study aimed to analyze the importance of extracted CT radiomics features and to develop a model with good generalization performance for precisely distinguishing the major NSCLC subtypes: adenocarcinoma (ADC) and squamous cell carcinoma (SCC).

We collected a multi-center CT dataset of 868 patients from eight international databases on The Cancer Imaging Archive (TCIA). Patients from five databases were mixed and split into training and test sets (560:140). The remaining three databases were used as independent test sets: the TCGA set (n = 97) and the lung3 set (n = 71). A total of 1409 features containing shape, intensity, and texture information were extracted from the tumor volume of interest (VOI); ℓ2,1-norm minimization was then used for feature selection, and the importance of the selected features was analyzed. Next, the prediction and generalization performance of 130 radiomics models (10 common algorithms and 120 heterogeneous ensemble combinations) was compared by the average AUC value on the three test sets. Finally, the predictive results of the optimal model were shown.

After feature selection, 401 features were retained. Intensity features and GLCM, GLRLM, and GLSZM texture features had higher classification weight coefficients than the other features (shape features and GLDM and NGTDM texture features), and filtered-image features were significantly more important than original-image features (p-value = 0.0210). Moreover, the five ensemble learning algorithms (Bagging, AdaBoost, RF, XGBoost, GBDT) had better generalization performance (p-value = 0.00418) than the non-ensemble algorithms (MLP, LR, GNB, SVM, KNN). The Bagging-AdaBoost-SVM model had the highest average AUC value (0.815 ± 0.010) across the three test sets, with AUC values of 0.819, 0.823, and 0.804 on the test set, TCGA set, and lung3 set, respectively.

Our multi-dataset study showed that intensity features, texture features (GLCM, GLRLM, and GLSZM), and filtered-image features were more important for distinguishing ADCs from SCCs. Ensemble learning can improve prediction and generalization performance on complicated multi-center data. The Bagging-AdaBoost-SVM model had the strongest generalization performance and showed promising clinical value for non-invasively predicting the histopathological subtypes of NSCLC.
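
The figures quoted in the abstract can be cross-checked with a short Python sketch. It is illustrative only and is not the authors' code. It rests on two assumptions the abstract does not confirm: that each of the 120 heterogeneous ensembles combines three distinct algorithms out of the ten named (the Bagging-AdaBoost-SVM name suggests this), and that the reported "± 0.010" is the sample standard deviation of the three per-set AUC values. All variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

# Cohort sizes as reported in the abstract.
train, test, tcga, lung3 = 560, 140, 97, 71
total_patients = train + test + tcga + lung3

# The ten base algorithms named in the abstract.
algorithms = ["Bagging", "AdaBoost", "RF", "XGBoost", "GBDT",
              "MLP", "LR", "GNB", "SVM", "KNN"]

# ASSUMPTION: a heterogeneous ensemble is an unordered triple of
# distinct base algorithms (not stated explicitly in the abstract).
triples = list(combinations(algorithms, 3))
n_models = len(algorithms) + len(triples)

# AUC of the Bagging-AdaBoost-SVM model on the test, TCGA, and lung3 sets.
aucs = [0.819, 0.823, 0.804]
auc_mean = mean(aucs)
# ASSUMPTION: "±" denotes the sample standard deviation (n - 1 denominator).
auc_sd = stdev(aucs)

print(total_patients, len(triples), n_models)
print(f"{auc_mean:.3f} ± {auc_sd:.3f}")
```

Under these assumptions the script prints `868 120 130` and `0.815 ± 0.010`, matching the reported cohort size, model count, and summary AUC. The population standard deviation (`statistics.pstdev`) would round to 0.008 instead, which is why the sample form is assumed here.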