Course Description
This course covers complexity results for first-order and related optimization algorithms. Such algorithms are widely applied to problems in machine learning, signal processing, communications, and related fields, and have seen a surge in popularity since the advent of Big Data around 2005. Wherever possible, we will follow a generic template for the proofs, thereby unifying a large number of results in the literature.
Course Content
- Introduction: convex functions, black box model
- Basics: dual norm, smoothness and strong convexity
- Gradient descent for smooth functions
- Projected gradient descent
- Subgradient descent
- Frank-Wolfe (conditional gradient) descent
- Mirror descent
- Proximal gradient descent
- Accelerated gradient descent for smooth functions
- Dual descent, saddle-point methods
- Augmented Lagrangian method of multipliers
- Stochastic gradient descent (SGD), mini-batch SGD
- Variance-reduced SGD
- Other topics as time permits
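To give a flavor of the first topics on the list, here is a minimal sketch of gradient descent with the fixed step size 1/L used in the classic analysis for L-smooth convex functions; the quadratic objective, the function names, and the iteration count are illustrative choices, not part of the course materials.

```python
import numpy as np

def gradient_descent(grad, x0, L, n_iters=100):
    """Minimize an L-smooth convex function by fixed-step gradient descent.

    grad    : callable returning the gradient at a point
    x0      : starting point (numpy array)
    L       : smoothness constant; the step size 1/L yields the
              standard O(1/k) suboptimality bound in the smooth convex case
    n_iters : number of iterations
    """
    x = x0.copy()
    for _ in range(n_iters):
        # x_{k+1} = x_k - (1/L) * grad f(x_k)
        x = x - (1.0 / L) * grad(x)
    return x

# Illustrative objective: f(x) = 0.5 * ||A x - b||^2,
# which is L-smooth with L = ||A^T A||_2 (largest singular value of A^T A).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)

x_star = gradient_descent(grad, np.zeros(2), L, n_iters=200)
```

For this well-conditioned quadratic the iterates converge geometrically to the least-squares solution of A x = b; the later topics in the list (projection, proximal steps, acceleration, stochastic gradients) modify exactly the update line inside the loop.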