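Random forest
Feature selection
Computer science
Redundancy (engineering)
Benchmark (surveying)
Data mining
Block (permutation group theory)
Curse of dimensionality
Artificial intelligence
Feature extraction
Feature (linguistics)
Pattern recognition (psychology)
Mathematics
Philosophy
Geography
Geometry
Operating system
Linguistics
Geodesy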
Authors
Pakizah Saqib, Usman Qamar, Reda Ayesha Khan, Andleeb Aslam
Identifiers
DOI: 10.23919/icact48636.2020.9061234
Abstract
DNA microarray technology is a valuable advance in the medical field, but it raises several challenges, such as the curse of dimensionality and high storage and computational requirements. In this paper, we propose MF-GARF, a hybrid approach that combines multiple filters with a Genetic Algorithm (GA) wrapper and uses Random Forest to evaluate the fitness of feature subsets. MF-GARF consists of three phases. The first, the Relevancy Block, contains the information-theory-based filters Information Gain, Gain Ratio and Gini Index, which ensure relevance and remove irrelevant and noisy features. The second, the Redundancy Block, uses Pearson correlation to remove redundancy among features. The final phase, the Optimization Block, contains a GA wrapper with Random Forest as the fitness evaluator and generates an optimal feature subset with high predictive power. The classification accuracy of the selected feature subset is computed with Random Forest under 10-fold cross-validation. Experiments were carried out on 7 publicly available benchmark microarray cancer datasets, and the proposed algorithm achieved good accuracy with a minimal number of selected features on all of them. A comparison with other state-of-the-art hybrid techniques validates the effectiveness of the proposed approach.
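The three phases described in the abstract can be sketched in plain Python. This is a minimal, non-authoritative illustration, not the authors' MF-GARF implementation, and several details are assumptions the abstract does not state: features are taken as already discretized (the Information Gain, Gain Ratio and Gini filters here score discrete values); the relevancy block keeps the union of the top `top_k` features under each filter; a feature counts as redundant when its absolute Pearson correlation with an already-kept feature reaches a hypothetical `threshold` of 0.9; and the GA uses tournament selection, single-point crossover, bit-flip mutation and elitism with made-up defaults. All function names (`relevancy_block`, `redundancy_block`, `ga_wrapper`, and so on) are hypothetical. The GA takes a pluggable `fitness(subset)` callback so the sketch runs with the standard library alone; the toy fitness in the demo merely stands in for Random Forest accuracy.

```python
import math
import random
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gini(values):
    """Gini impurity of a sequence of discrete values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())

def filter_scores(column, y):
    """(information gain, gain ratio, Gini reduction) of one discrete feature."""
    n = len(y)
    groups = {}
    for value, label in zip(column, y):
        groups.setdefault(value, []).append(label)
    ig = entropy(y) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(column)  # intrinsic information of the split
    gain_ratio = ig / split_info if split_info > 0 else 0.0
    gini_gain = gini(y) - sum(len(g) / n * gini(g) for g in groups.values())
    return ig, gain_ratio, gini_gain

def relevancy_block(X, y, top_k):
    """Phase 1: union of the top_k features under each filter (assumed rule)."""
    cols = list(zip(*X))
    scores = [filter_scores(col, y) for col in cols]
    keep = set()
    for m in range(3):  # 0 = information gain, 1 = gain ratio, 2 = Gini
        ranked = sorted(range(len(cols)), key=lambda j: scores[j][m], reverse=True)
        keep.update(ranked[:top_k])
    # Order survivors by information gain so phase 2 keeps the stronger of a pair.
    return sorted(keep, key=lambda j: scores[j][0], reverse=True)

def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def redundancy_block(X, features, threshold=0.9):
    """Phase 2: drop a feature if |r| >= threshold with an already-kept one."""
    cols = list(zip(*X))
    kept = []
    for j in features:
        if all(abs(pearson(cols[j], cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def ga_wrapper(features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Phase 3: GA over bit masks; fitness(subset) stands in for RF CV accuracy."""
    rng = random.Random(seed)
    n = len(features)
    cache = {}
    def score(mask):
        key = tuple(mask)
        if key not in cache:  # memoize: real RF fitness calls are expensive
            subset = [f for f, bit in zip(features, mask) if bit]
            cache[key] = fitness(subset) if subset else 0.0
        return cache[key]
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(ranked, 3), key=score)  # tournament selection
            b = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return [f for f, bit in zip(features, best) if bit]

if __name__ == "__main__":
    # Toy discretized data: col 0 = label, col 1 = copy of col 0,
    # col 2 = partly informative, col 3 = pure noise.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    partial = [0, 0, 0, 1, 1, 1, 1, 0]
    X = [[y[i], y[i], partial[i], i % 2] for i in range(len(y))]
    relevant = relevancy_block(X, y, top_k=3)
    non_redundant = redundancy_block(X, relevant)
    # Toy fitness in place of Random Forest 10-fold CV accuracy:
    # reward keeping feature 0, lightly penalize subset size.
    best = ga_wrapper(non_redundant, lambda s: (0 in s) - 0.01 * len(s))
    print(relevant, non_redundant, best)
```

To mirror the abstract more closely, the fitness callback could instead return Random Forest 10-fold cross-validation accuracy, for example with scikit-learn: `lambda s: cross_val_score(RandomForestClassifier(random_state=0), X_np[:, s], y, cv=10).mean()`, where `X_np` is assumed to be the expression matrix as a NumPy array. Caching scores per mask matters in that setting, because every evaluation trains ten forests.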