Faculty Candidate Seminar
Opening the Black Box: Towards Theoretical Understanding of Deep Learning
This event is free and open to the publicAdd to Google Calendar
Abstract: Despite the phenomenal empirical successes of deep learning in many application domains, its underlying mathematical mechanisms remain poorly understood. Mysteriously, deep neural networks in practice can often fit training data almost perfectly and generalize remarkably well to unseen test data, despite highly non-convex optimization landscapes and significant over-parameterization. A solid theory not only can help us understand such mysteries, but also will be the key to improving the practice of deep learning and making it more principled, reliable, and easy-to-use.
In this talk, I will present our recent progress on building the theoretical foundations of deep learning, by opening the black box of the interactions among data, model architecture, and training algorithm. First, I will show that gradient descent on deep linear neural networks induces an implicit bias towards low-rank solutions, which leads to an improved method for the classical low-rank matrix completion problem. Next, turning to nonlinear deep neural networks, I will talk about a line of studies on wide neural networks, where by drawing a connection to the neural tangent kernels, we can answer various questions such as how training loss is minimized, why trained network can generalize well, and why certain component in the network architecture is useful; we also use theoretical insights to design a new simple and effective method for training on noisily labeled datasets. In closing, I will discuss key questions going forward towards building practically relevant theoretical foundations of modern machine learning.
Bio: Wei Hu is a PhD candidate in the Department of Computer Science at Princeton University, advised by Sanjeev Arora. Previously, he obtained his bachelor’s degree in Computer Science from Tsinghua University. He has also spent time as a research intern at research labs of Google and Microsoft. His current research interest is broadly in the theoretical foundations of modern machine learning. In particular, his main focus is on obtaining solid theoretical understanding of deep learning, as well as using theoretical insights to design practical and principled machine learning methods. He is a recipient of the Siebel Scholarship Class of 2021.