Robustness (evolution)
Gaussian distribution
Interpolation (computer graphics)
Mathematics
Applied mathematics
Generalization
Class (philosophy)
Computer science
Artificial intelligence
Mathematical analysis
Biochemistry
Quantum mechanics
Motion (physics)
Gene
Physics
Chemistry
Authors
Sébastien Bubeck,Mark Sellke
Source
Journal: Journal of the ACM [Association for Computing Machinery]
Date: 2023-03-21
Volume/Issue: 70 (2): 1-18
Citations: 5
Abstract
Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than what this classical theory would suggest. We propose a partial theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely, we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights, and any covariate distribution verifying isoperimetry (or a mixture thereof). In the case of two-layer neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li, and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.
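The abstract's opening claim, that interpolation is possible once the parameter count reaches the number of constraints, can be illustrated with a minimal numerical sketch. The random-features model below (the matrices `F`, `X` and the point counts are illustrative assumptions, not the paper's construction) fits n random labels exactly with p = n parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5                      # n data points in ambient dimension d
X = rng.standard_normal((n, d))   # covariates
y = rng.standard_normal(n)        # arbitrary labels to interpolate

# Hypothetical model f(x) = w @ tanh(F x) with p random features:
# the n interpolation equations Phi w = y are generically solvable
# as soon as p >= n, matching the classical parameter-counting bound.
p = n
F = rng.standard_normal((p, d))
Phi = np.tanh(X @ F.T)            # n x p feature matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

residual = np.max(np.abs(Phi @ w - y))
print(residual)                   # numerically zero: all n points interpolated
```

The paper's point is that such a bare interpolant need not be smooth: achieving a small Lipschitz constant as well is what forces the parameter count up by the extra factor of d.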