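Feature selection
Gene
Dimensionality reduction
Mutual information
Computer science
Feature (linguistics)
Genome
Gene selection
Curse of dimensionality
Selection (genetic algorithm)
Computational biology
Expression (computer science)
Dimension (graph theory)
Pattern recognition (psychology)
Data mining
Artificial intelligence
Gene expression
Biology
Mathematics
Genetics
Microarray analysis techniques
Philosophy
Linguistics
Programming language
Pure mathematics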
Authors
Nimrita Koul,Sunilkumar S. Manvi
Identifier
DOI:10.1109/icaecc50550.2020.9339518
Abstract
The availability of high-throughput gene expression data has enabled its computational analysis for the early diagnosis of diseases such as cancer. These data contain expression values for thousands of genes in the genome of an organism. However, gene expression data are very high dimensional, with one dimension for each gene in the genome, and very few of these genes are associated with a given disease. At the same time, the number of samples or observations available is very small compared with the number of features, and the data also suffer from class imbalance. Selecting the genes that are relevant to the disease being studied is therefore an important task and is widely researched in the computational sciences. In this paper, we propose a randomized ensemble method for feature selection from cancer gene expression data using a combination of mutual information and recursive feature elimination. The approach has been applied to the Leukemia gene expression dataset. We obtained a classification accuracy of 99% with a gene subset of size 316, and an accuracy of 95% with a subset of size 4. Thus we achieved a dimensionality reduction of 98.5% with 99% accuracy. Comparison with standard methods shows that the proposed method performs better.
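The abstract names the ingredients (random ensembling, mutual information, recursive feature elimination) but not how they are combined. The following is a minimal sketch of one plausible arrangement, not the authors' procedure. It assumes scikit-learn and NumPy, runs on synthetic data rather than the leukemia dataset, and every name and parameter in it (`randomized_mi_rfe`, `n_rounds`, `subset_size`, `mi_keep`, `rfe_keep`, the logistic regression estimator, the vote-counting step) is a hypothetical choice made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression


def randomized_mi_rfe(X, y, n_rounds=10, subset_size=50,
                      mi_keep=20, rfe_keep=5, seed=0):
    """Count how often each feature survives an MI filter followed by RFE.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        # Randomization: draw a random subset of candidate features.
        candidates = rng.choice(X.shape[1], size=subset_size, replace=False)
        # Filter step: keep the candidates with the highest estimated
        # mutual information with the class label.
        mi = mutual_info_classif(X[:, candidates], y, random_state=seed)
        top = candidates[np.argsort(mi)[-mi_keep:]]
        # Wrapper step: recursive feature elimination on the survivors,
        # using a linear model (an assumed choice of estimator).
        rfe = RFE(LogisticRegression(max_iter=1000),
                  n_features_to_select=rfe_keep)
        rfe.fit(X[:, top], y)
        # Ensemble step: each round votes for the features it retained.
        votes[top[rfe.support_]] += 1
    return votes


# Synthetic stand-in: few samples, many features, imbalanced classes.
X, y = make_classification(n_samples=72, n_features=200, n_informative=5,
                           n_redundant=0, weights=[0.65, 0.35],
                           shuffle=False, random_state=0)
votes = randomized_mi_rfe(X, y)
selected = np.argsort(votes)[::-1][:5]  # five most frequently retained features
print("selected feature indices:", sorted(selected.tolist()))
```

Counting votes across rounds is only one way to aggregate an ensemble of selectors; ranking by mean mutual information or by mean RFE rank would be equally reasonable under the same assumptions.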