Artificial neural network
Computer science
Gaussian distribution
Nonlinear system
Class (philosophy)
Perspective (graphical)
Signal (programming language)
Artificial intelligence
Obstacle
Deep neural network
Algorithm
Mathematics
Physics
Quantum mechanics
Programming language
Law
Political science
Authors
Yao Lu, Stephen Gould, Thalaiyasingam Ajanthan
Identifier
DOI:10.1016/j.neunet.2023.08.017
Abstract
The problem of vanishing and exploding gradients has been a long-standing obstacle that hinders the effective training of neural networks. Although various tricks and techniques have been employed to alleviate the problem in practice, satisfactory theories or provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing that, under mild conditions, the vanishing/exploding gradients problem disappears with high probability if the neural networks have sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian–Poincaré normalized functions, and orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.
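The two ingredients named in the abstract, Gaussian–Poincaré normalized activation functions and orthogonal weight matrices, can be illustrated with a minimal numerical sketch. The construction below is an assumption for illustration, not the authors' reference implementation: it takes the normalization conditions to be E[psi(Z)^2] = 1 and E[psi'(Z)^2] = 1 for Z ~ N(0, 1), enforces them by an affine rescaling psi(x) = a*phi(x) + b estimated by Monte Carlo, and samples orthogonal weights via QR decomposition. The function names (gp_normalize, orthogonal_matrix), the tanh base activation, and the width/depth settings are hypothetical choices.

import numpy as np

rng = np.random.default_rng(0)

def gaussian_moments(phi, dphi, n_samples=1_000_000):
    # Monte Carlo estimates of E[phi(Z)], E[phi(Z)^2], E[phi'(Z)^2] for Z ~ N(0, 1).
    z = rng.standard_normal(n_samples)
    f, g = phi(z), dphi(z)
    return f.mean(), (f ** 2).mean(), (g ** 2).mean()

def gp_normalize(phi, dphi):
    # Affine rescaling psi(x) = a * phi(x) + b chosen so that, for Z ~ N(0, 1),
    # E[psi(Z)^2] = 1 and E[psi'(Z)^2] = 1 (the normalization conditions assumed here).
    # The Gaussian Poincare inequality Var[phi(Z)] <= E[phi'(Z)^2] keeps the sqrt real.
    m, m2, d2 = gaussian_moments(phi, dphi)
    a = 1.0 / np.sqrt(d2)                                  # enforces E[(a * phi'(Z))^2] = 1
    b = -a * m + np.sqrt(max(1.0 - a ** 2 * (m2 - m ** 2), 0.0))  # enforces E[psi(Z)^2] = 1

    def psi(x):
        return a * phi(x) + b

    def dpsi(x):
        return a * dphi(x)

    return psi, dpsi

def orthogonal_matrix(n):
    # Random orthogonal weight matrix from the QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))                         # sign fix for a uniform sample

# Sanity check: forward activations and backpropagated gradients keep roughly
# constant norm in a wide, deep fully connected net built from these ingredients.
width, depth = 512, 100
psi, dpsi = gp_normalize(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)

h = rng.standard_normal(width)
h *= np.sqrt(width) / np.linalg.norm(h)                    # input with ||h||^2 = width
cache = []
for _ in range(depth):
    W = orthogonal_matrix(width)
    z = W @ h
    cache.append((W, z))
    h = psi(z)
print("forward  ||h||^2 / width :", h @ h / width)         # should stay close to 1

g = rng.standard_normal(width)                             # surrogate output gradient
g *= np.sqrt(width) / np.linalg.norm(g)
for W, z in reversed(cache):
    g = W.T @ (dpsi(z) * g)                                # through activation, then weights
print("backward ||g||^2 / width :", g @ g / width)         # should stay close to 1

Under these assumptions, the per-layer squared norms of both the forward activations and the backpropagated gradients should remain close to their input values even at depth 100, which is the kind of bidirectional norm preservation the abstract's high-probability result describes for sufficiently wide networks.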