Computer Science and Engineering

Theory Seminar

Adaptive gradient descent methods for constrained optimization

Alina EneBoston University
SHARE:

Abstract: Adaptive gradient descent methods, such as the celebrated Adagrad algorithm (Duchi, Hazan, and Singer; McMahan and Streeter) and ADAM algorithm (Kingma and Ba), are some of the most popular and influential iterative algorithms for optimizing modern machine learning models. Algorithms in the Adagrad family use past gradients to set their step sizes and are remarkable due to their ability to automatically adapt to unknown problem structures such as (local or global) smoothness and convexity. However, these methods achieve suboptimal convergence guarantees even in the standard setting of minimizing a smooth convex function, and it has been a long-standing open problem to develop an accelerated analogue of Adagrad in the constrained setting.

In this talk, we present one such accelerated adaptive algorithm for constrained convex optimization that simultaneously achieves optimal convergence in the smooth, non-smooth, and stochastic setting, without any knowledge of the smoothness parameters or the variance of the stochastic gradients.

The talk is based on joint work with Huy Nguyen (Northeastern University) and Adrian Vladu (CNRS & IRIF, University de Paris), available here: https://arxiv.org/abs/2007.08840.

Organizer

Greg Bodwin

Organizer

Euiwoong Lee