Acceleration through Spectral Density Estimation
Gradient-based methods like Nesterov acceleration achieve an optimal rate of convergence through knowledge of the Hessian's largest and smallest singular value. This has the drawback that it requires access to these constants which can be costly to estimate, and at the same time does not use other statistics of the spectrum which are much cheaper to compute, like the mean. We derive new methods that achieve acceleration through a model of the Hessian's spectral density. Through a model for the spectral density based on random matrix theory, we provide a cheap approximation to the spectral density that is then used to derive a practical accelerated method. The resulting method can be seen as an instance of momentum gradient descent with iterate averaging. It achieves an empirical rate of convergence that in many cases significantly improves upon Nesterov acceleration and momentum.