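Random forest
Computer science
Visualization
Pairwise comparison
Outlier
Data mining
Imputation (statistics)
Artificial intelligence
Pattern recognition (psychology)
Regression
Mathematics
Statistics
Machine learning
Missing data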
Authors
Jake S. Rhodes,Adele Cutler,Kevin R. Moon
Identifier
DOI:10.1109/tpami.2023.3263774
Abstract
Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities can be computed from a trained random forest and measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications including the identification of variable importance, data imputation, outlier detection, and data visualization. However, existing definitions of random forest proximities do not accurately reflect the data geometry learned by the random forest. In this paper, we introduce a novel definition of random forest proximities called Random Forest-Geometry- and Accuracy-Preserving proximities (RF-GAP). We prove that the proximity-weighted sum (regression) or majority vote (classification) using RF-GAP exactly matches the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest. We empirically show that this improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry.
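For context, here is a minimal sketch of the traditional leaf co-occurrence proximity that the abstract contrasts with RF-GAP: the proximity of two points is the fraction of trees in which they fall in the same terminal node. This is not the RF-GAP definition, which the abstract does not spell out. The function name `leaf_cooccurrence_proximity` and the toy `leaves` matrix are hypothetical; in practice a matrix of this shape could come from a call such as scikit-learn's `RandomForestClassifier.apply(X)`.

```python
def leaf_cooccurrence_proximity(leaves):
    """Classical random forest proximity (illustrative sketch, not RF-GAP).

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    Returns an n x n matrix whose (i, j) entry is the fraction of trees
    in which samples i and j land in the same leaf.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            shared = sum(
                1 for t in range(n_trees) if leaves[i][t] == leaves[j][t]
            )
            prox[i][j] = shared / n_trees
    return prox


# Hypothetical leaf assignments for 4 samples across 3 trees.
leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 2, 3],
    [1, 0, 0],
]

prox = leaf_cooccurrence_proximity(leaves)
for row in prox:
    print([round(p, 3) for p in row])
```

This version treats every tree and every sample alike, whether or not the sample was in-bag for that tree. The abstract's claim is that RF-GAP instead defines proximities so that the proximity-weighted prediction matches the out-of-bag random forest prediction.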