An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data

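Artificial neural network; Computer science; Variable (mathematics); Similarity (geometry); Artificial intelligence; Machine learning; Black box; Raw data; Data mining; Ecology; Mathematics; Biology; Image (mathematics); Mathematical analysis; Programming language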

Authors

Julian D. Olden, Michael K. Joy, Russell G. Death

Source

Journal: Ecological Modelling [Elsevier BV]
Date: 2004-11-01 Volume (issue): 178 (3-4): 389-397 Citations: 846

Identifier

DOI: 10.1016/j.ecolmodel.2004.03.013

Abstract

Artificial neural networks (ANNs) are receiving greater attention in the ecological sciences as a powerful statistical modeling technique; however, they have also been labeled a “black box” because they are believed to provide little explanatory insight into the contributions of the independent variables in the prediction process. A recent paper published in Ecological Modelling [Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecol. Model. 160 (2003) 249–264] addressed this concern by providing a comprehensive comparison of eight different methodologies for estimating variable importance in neural networks that are commonly used in ecology. Unfortunately, comparisons of the different methodologies were based on an empirical dataset, which precludes the ability to establish generalizations regarding the true accuracy and precision of the different approaches because the true importance of the variables is unknown. Here, we provide a more appropriate comparison of the different methodologies by using Monte Carlo simulations with data exhibiting defined (and consequently known) numeric relationships. Our results show that a Connection Weight Approach that uses raw input-hidden and hidden-output connection weights in the neural network provides the best methodology for accurately quantifying variable importance and should be favored over the other approaches commonly used in the ecological literature. Average similarity between true and estimated ranked variable importance using this approach was 0.92, whereas, similarity coefficients ranged between 0.28 and 0.74 for the other approaches. Furthermore, the Connection Weight Approach was the only method that consistently identified the correct ranked importance of all predictor variables, whereas, the other methods either only identified the first few important variables in the network or no variables at all. 
The most notable result was that Garson’s Algorithm was the poorest performing approach, yet it is the most commonly used in the ecological literature. In conclusion, this study provides a robust comparison of different methodologies for assessing variable importance in neural networks that can be generalized to other data and from which valid recommendations can be made for future studies.
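
The abstract names the two methods but does not give their formulas, so the following is only a minimal sketch under stated assumptions. It assumes a network with one hidden layer and a single output. The Connection Weight Approach is taken to be the sum, over hidden neurons, of the raw input-hidden weight times the raw hidden-output weight. Garson's Algorithm is written in one commonly cited form that uses absolute weights. The function names and the toy weights are hypothetical and are not taken from the paper.

```python
def connection_weight_importance(w_ih, w_ho):
    """Sum of raw weight products for each input (assumed formulation).

    w_ih[i][h] is the weight from input i to hidden neuron h;
    w_ho[h] is the weight from hidden neuron h to the single output.
    Signs are kept, so opposing pathways can cancel each other.
    """
    return [sum(w * v for w, v in zip(row, w_ho)) for row in w_ih]


def garson_importance(w_ih, w_ho):
    """Relative importance from absolute weights (one common form of Garson's Algorithm)."""
    n_in, n_hid = len(w_ih), len(w_ho)
    totals = [0.0] * n_in
    for h in range(n_hid):
        # Share of hidden neuron h attributed to each input, ignoring sign.
        column_sum = sum(abs(w_ih[i][h]) for i in range(n_in))
        if column_sum == 0.0:
            continue  # a hidden neuron with no incoming weight contributes nothing
        for i in range(n_in):
            totals[i] += abs(w_ih[i][h]) / column_sum * abs(w_ho[h])
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Hypothetical toy weights: input 0 reaches the output through two
# pathways of opposite sign; input 1 uses a single positive pathway.
w_ih = [[1.0, 1.0],
        [1.0, 0.0]]
w_ho = [1.0, -1.0]

cw = connection_weight_importance(w_ih, w_ho)
garson = garson_importance(w_ih, w_ho)
print("connection weights:", cw)
print("garson:", garson)
```

With these toy weights the two pathways of input 0 cancel under the Connection Weight sketch, while the absolute-value form ranks input 0 highest. This illustrates one way the two methods can disagree; it does not reproduce the paper's simulations.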
