Keywords
Computer science
Artificial intelligence
Artificial neural network
Net (polyhedron)
MNIST database
Decision tree
Machine learning
Class (philosophy)
Set (abstract data type)
Tree (set theory)
Black box
Focus (optics)
Deep learning
Oblique case
Pattern recognition (psychology)
Mathematics
Mathematical analysis
Linguistics
Philosophy
Physics
Geometry
Optics
Programming language
Authors
Suryabhan Singh Hada, Miguel Á. Carreira-Perpiñán, Arman Zharmagambetov
Identifier
DOI: 10.1007/s10618-022-00892-7
Abstract
The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods make predictions. Much work has focused on understanding which part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which of the internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the neural net with an oblique decision tree having sparse weight vectors at the decision nodes. Using the recently proposed Tree Alternating Optimization (TAO) algorithm, we are able to learn trees that are both highly accurate and interpretable. Such trees can faithfully mimic the part of the neural net they replaced, and hence they can provide insights into the deep net black box. Further, we show we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. These insights and manipulations apply globally to the entire training and test set, not just at a local (single-instance) level. We demonstrate this robustly on the MNIST and ImageNet datasets with LeNet5 and VGG networks.
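Below is a minimal Python sketch of the mimicking pipeline the abstract describes: tap the features at an internal layer of a LeNet5-style net, then fit a decision-tree surrogate to the net's own predictions on those features and measure its fidelity. This is a hedged illustration, not the paper's implementation: TAO sparse oblique trees are not available in standard libraries, so scikit-learn's axis-aligned CART stands in for the tree learner, the net is untrained with random stand-in inputs, and the names `feature_extractor` and `head` are illustrative.

```python
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeClassifier

# A small LeNet5-style net (untrained here; in practice, load a trained model).
net = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.ReLU(),  # internal features tapped here
    nn.Linear(120, 10),
)
feature_extractor = net[:9]  # everything up to and including the 120-unit ReLU
head = net[9:]               # the part of the net the surrogate tree replaces

x = torch.randn(512, 1, 28, 28)  # stand-in for a batch of MNIST images
with torch.no_grad():
    feats = feature_extractor(x)      # internal features F(x)
    net_pred = head(feats).argmax(1)  # the net's own class predictions

# Fit the surrogate on (features, net predictions); accuracy on this task is
# the tree's fidelity, i.e., how faithfully it mimics the replaced subnet.
tree = DecisionTreeClassifier(max_depth=8).fit(feats.numpy(), net_pred.numpy())
fidelity = tree.score(feats.numpy(), net_pred.numpy())
print(f"surrogate fidelity to the net: {fidelity:.3f}")
```

With the paper's sparse oblique nodes, each decision node would use only a handful of the 120 features, which is what makes it possible to read off, and then manipulate, the features responsible for a given class, as the abstract describes.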