Zoom (password, if needed: MichiganAI)
Randomized algorithms play a central role in analyzing high-dimensional data and training machine learning models. This includes stochastic optimization algorithms, as well as randomized sketching and sampling methods for dimensionality reduction. While providing remarkable computational gains, algorithmic randomness has another important consequence: it transforms complex datasets into data distributions which, in a sense, smooth out their higher-order structure. This effect, which we refer to as Algorithmic Gaussianization since its prime example is the classical Gaussian embedding, provides both benefits and challenges in utilizing randomized algorithms. On the one hand, it unlocks a wealth of theoretical insights into the performance of those algorithms, such as convergence rates and bias-variance trade-offs, which help practitioners design an effective machine learning pipeline. On the other hand, higher-order properties, such as a clustering structure, can be crucial to understanding complex data and models, and this information may not be preserved by randomized algorithms that Gaussianize the data. In this talk, I will introduce the concept of Algorithmic Gaussianization, focusing on the settings of stochastic optimization and dimensionality reduction as motivating examples. Along the way, I will highlight two families of randomized algorithms with Gaussianizing properties: (1) a sketching method called Leverage Score Sparsified (LESS) embeddings; and (2) an importance sampling method based on Determinantal Point Processes (DPPs).
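As a minimal illustration of the classical Gaussian embedding mentioned in the abstract (a sketch under assumed dimensions, not code from the talk), the snippet below compresses a tall data matrix A into a much shorter sketch SA whose Gram matrix approximates that of A; the matrix sizes and random seed are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall data matrix: n samples in d dimensions; sketch size k << n.
n, d, k = 2000, 50, 400
A = rng.standard_normal((n, d))

# Classical Gaussian embedding: S has i.i.d. N(0, 1/k) entries, so
# E[(S A)^T (S A)] = A^T A, and the k x d sketch S A serves as a small
# surrogate for A that approximately preserves its geometry.
S = rng.standard_normal((k, n)) / np.sqrt(k)
SA = S @ A

# Relative spectral-norm error between the Gram matrices; it shrinks
# (roughly like sqrt(d/k)) as the sketch size k grows.
err = np.linalg.norm(SA.T @ SA - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2)
print(SA.shape, err < 1.0)
```

Sparse alternatives such as LESS embeddings aim to retain this Gaussian-like behavior while making the multiplication S @ A much cheaper than with a dense Gaussian S.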
Michał Dereziński is an Assistant Professor at U-M CSE. Previously, he was a postdoctoral fellow in the Department of Statistics at the University of California, Berkeley. Before that, he was a research fellow at the Simons Institute for the Theory of Computing (Fall 2018, Foundations of Data Science program). He obtained his Ph.D. in Computer Science at the University of California, Santa Cruz, advised by Professor Manfred Warmuth, where he received the Best Dissertation Award for his work on sampling methods in statistical learning. Michał's current research is focused on developing scalable randomized algorithms with robust statistical guarantees for machine learning, data science, and optimization. His work on reducing the cost of interpretability in dimensionality reduction received the Best Paper Award at the Thirty-fourth Conference on Neural Information Processing Systems. More information is available at: https://users.soe.ucsc.edu/~mderezin/.