Gradient Flow Through Diagram Expansions: Learning Regimes and Explicit Solutions
Gradient descent is a fundamental algorithm in machine learning, and the dynamics of gradient flows in large-scale problems are of significant interest. We aim to develop a new method for studying these dynamics, based on expanding the loss function as a power series in time. Under standard Gaussian initialization of the model, the coefficients of this expansion can be described using Wick's theorem in terms of certain diagrams analogous to Feynman diagrams. Passing to the large-model-size limit then yields various formal limits of this expansion, depending on the relative scaling of the problem parameters. These limits can be connected to different qualitative regimes of learning, such as NTK, mean-field, underparameterized learning, and free evolution. Moreover, in some cases the resulting limiting expansions admit formal summation, yielding an explicit analytical formula for the dynamics. To obtain it, we write the recurrence relations among the coefficients as a partial differential equation and, when that equation is first order, solve it using the method of characteristics. In the problem of order-4 tensor factorization, this integration yields an explicit analytical function defined for negative times, that is, for "gradient ascent." The solution shows that there are two distinct ascent regimes, convergent and divergent, and provides a concrete quantitative criterion separating them. Overall, the theory in its current form raises many mathematical questions, but the results obtained agree well with numerical experiments. This work was carried out jointly with E. Golikov and Y. Gusev.
Bio: Dmitry Yarotsky is a mathematician working at the interface of approximation theory, machine learning, and mathematical physics. He is affiliated with the Steklov Institute of Mathematics in Moscow. In learning theory, he is especially known for sharp results on the expressive power of deep ReLU networks, including error bounds, lower bounds, and depth-separation phenomena. His influential works include "Error bounds for approximations with deep ReLU networks" and "Optimal approximation of continuous functions by very deep ReLU networks," which study how network depth and size affect approximation rates for smooth and continuous functions.

