A feature-fusion framework of clinical, genomics, and histopathological data for METABRIC breast cancer subtype classification

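Subtype; Support vector machine; Breast cancer; Jaccard index; Artificial intelligence; Random forest; Population; Pattern recognition (psychology); Machine learning; Computer science; Medicine; Cancer; Internal medicine; Environmental health; Programming language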

Authors

Ala’a El-Nabawy,Nashwa El-Bendary,Nahla A. Belal

Source

Journal: Applied Soft Computing [Elsevier]
Date: 2020-06-01 Volume/Issue: 91: 106238-106238 Citations: 22

Identifier

DOI: 10.1016/j.asoc.2020.106238

Abstract

Breast cancer is the most common cancer type affecting women worldwide. It has been phenotypically classified into five subtypes, each with unique characteristics that demonstrate the heterogeneity present within breast cancer tumours. In 2012, the American Association for Cancer Research provided population-based molecular integrative clusters for the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) dataset, resulting in ten subtypes. Previous work on the METABRIC dataset used only gene expression data to identify the effective genes for each subtype, without integrating the other data sources. The objective of this paper is to present a breast cancer subtype classification model that applies feature fusion to the METABRIC datasets, namely clinical, gene expression, Copy Number Aberrations (CNA), Copy Number Variations (CNV), and histopathological images. State-of-the-art machine learning classifiers were applied to different data profiles, including Linear-SVM, Radial-SVM, Random Forests (RF), Ensemble SVM (E-SVM), and Boosting. The highest accuracy achieved for IntClust subtyping was 88.36% using Linear-SVM, applied to the data profile with features fused from the clinical, gene expression, CNA, and CNV datasets, with Jaccard and Dice scores of 0.802 and 0.8835, respectively. For Pam50 subtyping, an accuracy of 97.1% was achieved, with a Jaccard score ranging from 0.9439 to 0.9472 and a Dice score of 0.971, using the Linear-SVM and E-SVM classifiers on several data profiles that include features from histopathological images. In conclusion, the study validates that feature fusion across the various METABRIC datasets improves breast cancer subtype classification performance. Moreover, histopathological images give promising results on Pam50 subtypes, and they are expected to improve the accuracy of IntClust subtyping when applied to a larger population.
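
To make the terms in the abstract concrete, the following is a minimal sketch of early feature fusion (concatenating each patient's feature vectors across data profiles) and of the per-class Jaccard and Dice scores. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how features were fused or how the scores were averaged across subtypes. The function names `fuse_features` and `jaccard_and_dice`, the patient IDs, the feature values, and the toy labels are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_features(*profiles: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Early fusion: concatenate each patient's vectors across all profiles.

    Assumption: only patients present in every profile are kept.
    """
    shared = set(profiles[0])
    for profile in profiles[1:]:
        shared &= set(profile)
    return {
        pid: [value for profile in profiles for value in profile[pid]]
        for pid in sorted(shared)
    }


def jaccard_and_dice(
    y_true: Sequence[str], y_pred: Sequence[str], label: str
) -> Tuple[float, float]:
    """Jaccard and Dice scores for one class, treated as a one-vs-rest problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp + fp + fn == 0:
        # Class absent from both label lists; scoring it 1.0 is a convention choice.
        return 1.0, 1.0
    jaccard = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return jaccard, dice


# Hypothetical toy profiles keyed by patient ID (values are made up).
clinical = {"P1": [52.0, 1.0], "P2": [61.0, 0.0], "P3": [47.0, 1.0]}
expression = {"P1": [0.3, -1.2], "P2": [1.1, 0.4]}
fused = fuse_features(clinical, expression)  # P3 is dropped: no expression data

# Hypothetical true and predicted subtype labels for five samples.
y_true = ["LumA", "LumA", "Basal", "Her2", "LumA"]
y_pred = ["LumA", "Basal", "Basal", "Her2", "LumA"]
luma_jaccard, luma_dice = jaccard_and_dice(y_true, y_pred, "LumA")
```

For a single class the two scores are tied by J = D / (2 - D), so the Dice score is never lower than the Jaccard score. In the toy example, the LumA class has two true positives and one false negative, giving a Jaccard score of 2/3 and a Dice score of 0.8.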
