Convergence Analysis for the Adam Optimizer

Date: 
Friday, 1 August, 2025 - 15:00 - 16:00
Venue: 
LSB 222
Seminar Type: 
Seminar
Speaker Name: 
Prof. Arnulf JENTZEN
Affiliation: 
The Chinese University of Hong Kong, Shenzhen (CUHK-SZ) / University of Münster
Abstract: 

Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, the employed optimization scheme is usually not the plain vanilla SGD method but rather a suitably accelerated and adaptive SGD optimization method such as the famous Adam optimizer. In this work we establish optimal convergence rates for the Adam optimizer covering the situation of strongly convex stochastic optimization problems (SOPs). The key ingredient of our convergence analysis is a new vector field function which we propose to refer to as the Adam vector field. This Adam vector field accurately describes the macroscopic behaviour of the Adam optimization process but differs from the negative gradient of the objective function (the function we intend to minimize) of the considered stochastic optimization problem. In particular, for a class of simple quadratic SOPs we disprove that Adam converges to critical points of the objective function (zeros of the gradient of the objective function) of the considered optimization problem. We also establish higher order convergence rates and advanced stability properties for Adam. The talk is based on joint works with Steffen Dereich, Thang Do, Robin Graeber, and Adrian Riekert.
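For orientation, the following LaTeX display sketches the commonly used formulation of the Adam recursion; the precise parametrization and assumptions analysed in the references may differ. Given a learning rate \(\alpha > 0\), momentum parameters \(\beta_1, \beta_2 \in [0,1)\), a regularization parameter \(\varepsilon > 0\), and stochastic gradients \(g_n\) of the objective evaluated at \(\theta_{n-1}\), Adam iterates (with \(m_0 = v_0 = 0\) and all operations acting componentwise)

\[
\begin{aligned}
m_n &= \beta_1\, m_{n-1} + (1-\beta_1)\, g_n, \\
v_n &= \beta_2\, v_{n-1} + (1-\beta_2)\, g_n \odot g_n, \\
\hat m_n &= \frac{m_n}{1-\beta_1^{\,n}}, \qquad
\hat v_n \;=\; \frac{v_n}{1-\beta_2^{\,n}}, \\
\theta_n &= \theta_{n-1} - \alpha\, \frac{\hat m_n}{\sqrt{\hat v_n} + \varepsilon}.
\end{aligned}
\]

The Adam vector field referred to in the abstract describes the macroscopic drift of this recursion, which in general differs from the negative gradient of the objective function.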

References:

[1] S. Dereich & A. Jentzen, Convergence rates for the Adam optimizer, arXiv:2407.21078 (2024), 43 pages.

[2] S. Dereich, R. Graeber, & A. Jentzen, Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates, arXiv:2407.08100 (2024), 54 pages.

[3] S. Dereich, A. Jentzen, & A. Riekert, Sharp higher order convergence rates for the Adam optimizer, arXiv:2504.19426 (2025), 27 pages.

[4] T. Do, A. Jentzen, & A. Riekert, Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks, arXiv:2503.01660 (2025), 42 pages.

[5] A. Jentzen & A. Riekert, Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks, arXiv:2402.05155 (2024), 36 pages, to appear in SIAM/ASA J. Uncertain. Quantif.