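Random forest
Simple random sample
Feature (linguistics)
Computer science
Stratified sampling
Sampling (signal processing)
Node (physics)
Data mining
Statistics
Pattern recognition (psychology)
Artificial intelligence
Mathematics
Engineering
Filter (signal processing)
Demography
Philosophy
Computer vision
Structural engineering
Population
Sociology
Linguistics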
Authors
Debopriya Ghosh, Javier Cabrera
Identifier
DOI:10.1109/tcbb.2021.3089417
Abstract
Ensemble methods such as random forest work well on high-dimensional datasets. However, when the number of features is extremely large compared with the number of samples and the proportion of truly informative features is very small, the performance of traditional random forest declines significantly. To address this, we develop a novel approach that improves traditional random forest by reducing the contribution of trees whose nodes are populated with less informative features. At each node, the proposed method selects the eligible subset of candidate features by weighted random sampling, instead of the simple random sampling used in traditional random forest. We refer to this modified random forest algorithm as "Enriched Random Forest". Using several high-dimensional microarray datasets, we evaluate the performance of our approach in both regression and classification settings. We also demonstrate that balanced leave-one-out cross-validation is effective for reducing the computational load and the required sample size when computing feature weights. Overall, the results indicate that enriched random forest improves the prediction accuracy of traditional random forest, especially when relevant features are very few.
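The core difference described in the abstract is how the candidate features are drawn at each node: uniformly in a standard random forest, and with per-feature weights in Enriched Random Forest. The sketch below isolates only that sampling step, using NumPy's `Generator.choice`. It is an illustration under stated assumptions, not the authors' implementation: the weights are made-up toy values (the paper's weighting scheme and its balanced leave-one-out procedure are not reproduced), and the helper names `sample_candidate_features` and `informative_hit_rate` are hypothetical. It estimates how often a node's candidate set contains at least one informative feature when only 10 of 1000 features matter.

```python
import numpy as np


def sample_candidate_features(weights, mtry, rng, weighted=True):
    """Return `mtry` distinct feature indices to try at one tree node (hypothetical helper).

    weighted=True  -> draw with probability proportional to `weights`
                      (sketch of the weighted-sampling idea, toy weights assumed).
    weighted=False -> uniform draw, i.e. simple random sampling as in a
                      standard random forest.
    """
    w = np.asarray(weights, dtype=float)
    p = w / w.sum() if weighted else None
    # Sample without replacement so no feature is proposed twice at the same node.
    return rng.choice(w.size, size=mtry, replace=False, p=p)


def informative_hit_rate(weights, informative, mtry, n_nodes, weighted, seed=0):
    """Fraction of simulated nodes whose candidate set contains >= 1 informative feature."""
    rng = np.random.default_rng(seed)
    informative = set(informative)
    hits = 0
    for _ in range(n_nodes):
        cand = sample_candidate_features(weights, mtry, rng, weighted)
        hits += any(int(j) in informative for j in cand)
    return hits / n_nodes


# Toy setup (made-up numbers): 1000 features, only the first 10 are informative.
n_features, mtry = 1000, 31          # mtry ~ sqrt(n_features), a common default
informative = range(10)
weights = np.ones(n_features)
weights[:10] = 50.0                  # hypothetical weights favouring the informative features

uniform_rate = informative_hit_rate(weights, informative, mtry, 2000, weighted=False)
weighted_rate = informative_hit_rate(weights, informative, mtry, 2000, weighted=True)
print(f"simple random sampling:   {uniform_rate:.2f}")
print(f"weighted random sampling: {weighted_rate:.2f}")
```

With uniform sampling, most nodes in this toy setup see no informative feature at all, so their splits are driven by noise; shifting sampling probability toward high-weight features makes informative candidates far more common. In a full forest this function would be called inside the tree-growing loop at every split, and the quality of the result would depend entirely on how well the weights reflect true feature relevance.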