Adaptive energy-based gradient methods for large-scale optimization and data-driven discovery of dynamical systems via neural networks

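Artificial neural network; Computer science; Dynamical systems theory; Artificial intelligence; Optimal control; Deep learning; Machine learning; Convergence (economics); Stochastic optimization; Stability (learning theory); Benchmark (surveying); Mathematical optimization; Mathematics; Physics; Quantum mechanics; Geodesy; Geography; Economics; Economic growth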

Author

Xuping Tian

Links

iastate.edu, doi.org

Identifier

DOI: 10.31274/td-20240617-16

Abstract

Machine learning and data science have revolutionized numerous scientific and engineering domains, promising a renaissance in complex data analysis and understanding. This thesis addresses two critical challenges at the forefront of these fields: (1) developing efficient optimization methods for training large-scale machine learning models, and (2) discovering dynamical systems from observational data. To tackle the first challenge, we introduce a new family of gradient-based optimization methods. These methods employ an adaptive energy-based strategy that ensures unconditional energy stability regardless of the step size (learning rate). We provide convergence analyses for both deterministic and stochastic settings, with particular emphasis on the SGEM (Stochastic Gradient with Energy and Momentum) method, which incorporates momentum acceleration. Experimental results on benchmark deep learning problems demonstrate SGEM's rapid convergence and superior generalization capabilities. Furthermore, we investigate the dynamic behavior of a deterministic variant of SGEM through the lens of limiting ordinary differential equations (ODEs). Our results illuminate the impact of momentum and step size on the stability and convergence of the discrete schemes. To address the second challenge, we propose a data-driven optimal control approach for learning system parameters. This approach is then extended to learn the entire governing function by incorporating neural network approximation into the framework. Specifically, we illustrate the data-driven optimal control approach by learning the parameters of the Susceptible-Exposed-Infectious-Recovered (SEIR) model from reported COVID-19 data. The Optimal Control Neural Networks (OCN) framework is demonstrated through its application to a gradient flow system. The training process of the neural networks is designed using the adjoint method alongside symplectic ODE solvers. Numerical experiments on several canonical systems validate the OCN framework. In summary, this research contributes to the advancement of both the theoretical understanding and practical applications of large-scale optimization in machine learning, as well as the data-driven discovery of dynamical systems.
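The abstract does not give the update rule, so the following is only a minimal sketch of what an energy-adaptive momentum step might look like. It assumes the objective satisfies f + c > 0 for some constant c, that an auxiliary energy variable r is kept per parameter, and that the step takes the form shown in the comments. The function names `sgem_like_step` and `run`, the constant `c`, the default hyperparameters, and the quadratic test objective are all hypothetical illustrations, not taken from the thesis. The point of the sketch is the property the abstract highlights: under this assumed form the energy r cannot increase, whatever the step size.

```python
import numpy as np


def sgem_like_step(theta, r, m, f_val, grad, t, eta=0.1, beta=0.9, c=1.0):
    """One energy-adaptive momentum step (assumed form, not the thesis's exact scheme).

    theta, r, m and grad are arrays of the same shape; t is the step index (t >= 1).
    """
    m = beta * m + (1.0 - beta) * grad       # running average of gradients
    m_hat = m / (1.0 - beta ** t)            # bias correction for the average
    v = m_hat / (2.0 * np.sqrt(f_val + c))   # direction scaled by sqrt(f + c)
    r = r / (1.0 + 2.0 * eta * v * v)        # divisor >= 1, so r never increases
    theta = theta - 2.0 * eta * r * v        # parameter update uses the new energy
    return theta, r, m


def run(eta, steps=20, c=1.0):
    """Apply the step to a toy quadratic and record the energy after each step."""
    a = np.array([1.0, 10.0])                # curvatures of the toy objective

    def f(x):
        return 0.5 * np.sum(a * x * x)       # f(x) >= 0, so f + c > 0

    def grad_f(x):
        return a * x

    theta = np.array([1.0, 1.0])
    r = np.full_like(theta, np.sqrt(f(theta) + c))  # initial energy sqrt(f + c)
    m = np.zeros_like(theta)
    energies = [r.copy()]
    for t in range(1, steps + 1):
        theta, r, m = sgem_like_step(
            theta, r, m, f(theta), grad_f(theta), t, eta=eta, c=c
        )
        energies.append(r.copy())
    return theta, energies
```

Calling `run(0.01)`, `run(1.0)`, and `run(100.0)` and comparing consecutive entries of the returned energy list shows the energy decaying monotonically in each case. This says nothing about how fast the iterates converge; it only illustrates the stability property under the assumed update.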
