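Artificial neural network
Computer science
Dynamical systems theory
Artificial intelligence
Optimal control
Deep learning
Machine learning
Convergence (economics)
Stochastic optimization
Theory (learning stability)
Benchmark (surveying)
Mathematical optimization
Mathematics
Physics
Quantum mechanics
Geodesy
Geography
Economics
Economic growth
Identifier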
DOI:10.31274/td-20240617-16
Abstract
Machine learning and data science have revolutionized numerous scientific and engineering domains, promising a renaissance in complex data analysis and understanding. This thesis addresses two critical challenges at the forefront of these fields: (1) developing efficient optimization methods for training large-scale machine learning models, and (2) discovering dynamical systems from observational data. To tackle the first challenge, we introduce a new family of gradient-based optimization methods. These methods employ an adaptive energy-based strategy, ensuring unconditional energy stability regardless of the step size (learning rate). We provide convergence analyses for both deterministic and stochastic settings, with particular emphasis on the SGEM (Stochastic Gradient with Energy and Momentum) method, notable for its incorporation of momentum acceleration. Experimental results on benchmark deep learning problems demonstrate SGEM's rapid convergence and superior generalization capabilities. Furthermore, we investigate the dynamic behavior of a deterministic variant of SGEM through the lens of limiting Ordinary Differential Equations (ODEs). Our results illuminate the impact of momentum and step size on the stability and convergence of discrete schemes. Addressing the second challenge, we propose a data-driven optimal control approach for learning system parameters. This approach is subsequently extended to learning the entire governing function by incorporating neural network approximation into the framework. Specifically, we exemplify the data-driven optimal control approach by learning the parameters of the Susceptible-Exposed-Infectious-Recovered (SEIR) model from reported COVID-19 data. The Optimal Control Neural Networks (OCN) framework is demonstrated through its application to a gradient flow system. The training process of the neural networks is designed using the adjoint method alongside symplectic ODE solvers. Numerical experiments on several canonical systems validate the OCN framework. In summary, this research contributes to the advancement of both the theoretical understanding and practical applications of large-scale optimization in machine learning, as well as the data-driven discovery of dynamical systems.
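The abstract does not state the update rule behind "unconditional energy stability", so the following is only a minimal sketch under assumptions. It uses an AEGD-style energy-adaptive gradient step as described in earlier published work on energy-based gradient descent, applied to a hypothetical scalar quadratic objective. It has no momentum and is therefore not SGEM itself. The names `f`, `grad_f`, `energy_gd`, and the constant `c` are illustrative choices, not taken from the thesis. The point it shows is that the energy variable `r` cannot increase, whatever the step size `eta`.

```python
import math


def f(x):
    # Hypothetical test objective (assumption): a quadratic bounded below by 0.
    return 0.5 * x * x


def grad_f(x):
    # Gradient of the quadratic above.
    return x


def energy_gd(x0, eta, c=1.0, steps=50):
    """Scalar energy-adaptive gradient descent (assumed AEGD-style update).

    c is a constant chosen so that f(x) + c > 0 for all x.
    Returns the final iterate and the history of the energy variable r.
    """
    x = x0
    r = math.sqrt(f(x0) + c)  # energy variable, initialised to sqrt(f + c)
    energies = [r]
    for _ in range(steps):
        # Gradient of the transformed objective sqrt(f + c).
        v = grad_f(x) / (2.0 * math.sqrt(f(x) + c))
        # Dividing by a factor >= 1 means r never increases, for any eta > 0.
        r = r / (1.0 + 2.0 * eta * v * v)
        # Parameter step scaled by the updated energy.
        x = x - 2.0 * eta * r * v
        energies.append(r)
    return x, energies


if __name__ == "__main__":
    for eta in (0.01, 1.0, 100.0):
        x_final, energies = energy_gd(3.0, eta)
        monotone = all(b <= a for a, b in zip(energies, energies[1:]))
        print(f"eta={eta}: energy non-increasing = {monotone}")
```

The monotonicity here follows directly from the form of the update to `r`, which is why it holds even for very large `eta`. How fast the iterates converge still depends on the step size, which is the question the thesis's convergence analyses address.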