Interpretability
Computer science
Wire
Normalization (sociology)
Artificial neural network
Layer (electronics)
Artificial intelligence
Computation
Space (punctuation)
Encoding
Pattern recognition (psychology)
Machine learning
Algorithm
Sociology
Operating system
Anthropology
Organic chemistry
Chemistry
Geography
Gene
Biochemistry
Geodesy
Authors
Zhi Chen, Yijie Bei, Cynthia Rudin
Identifier
DOI: 10.1038/s42256-020-00265-z
Abstract
What does a neural network encode about a concept as we traverse through the layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading, unusable, or rely on the latent space to possess properties that it may not have. In this work, rather than attempting to analyze a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network to allow us to better understand the computation leading up to that layer. When a concept whitening module is added to a CNN, the axes of the latent space are aligned with known concepts of interest. By experiment, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance.
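The abstract describes CW as an alternative to batch normalization that both normalizes and decorrelates (whitens) the latent activations, after which the latent axes are aligned with known concepts. The snippet below is a minimal NumPy sketch of only the whitening step (ZCA whitening of a batch of latent vectors); it is an illustrative assumption, not the authors' released implementation, and the function name `zca_whiten`, the shapes, and the `eps` parameter are hypothetical choices for this sketch.

```python
# Minimal sketch (assumption, not the paper's code): ZCA whitening of a batch
# of latent vectors, i.e. the normalize-and-decorrelate step that CW performs
# in place of batch normalization. The concept-alignment rotation that CW also
# learns is not shown here.
import numpy as np

def zca_whiten(z, eps=1e-5):
    """Whiten activations z of shape (batch, features): subtract the batch mean
    and multiply by the inverse square root of the feature covariance, so the
    output features are zero-mean with (approximately) identity covariance."""
    z_centered = z - z.mean(axis=0, keepdims=True)
    cov = z_centered.T @ z_centered / z.shape[0]               # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T  # ZCA: cov^{-1/2}
    return z_centered @ inv_sqrt

# Usage: whitened activations end up (approximately) decorrelated.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 16)) @ rng.normal(size=(16, 16))    # correlated features
w = zca_whiten(z)
print(np.allclose(np.cov(w, rowvar=False), np.eye(16), atol=1e-1))  # True
```

In the full CW module described in the abstract, the whitened activations would additionally be rotated so that individual latent axes line up with predefined concepts of interest.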