Faculty Candidate Seminar
Efficient Deep Learning with Sparsity: Algorithms, Systems, and Applications
This event is free and open to the public.
Zoom link for remote attendees: password 123123
Abstract: Deep learning is used across a broad spectrum of applications. However, behind its remarkable performance lies an increasing gap between the demand for and supply of computation. On the demand side, the computational costs of deep learning models have surged dramatically, driven by ever-larger input and model sizes. On the supply side, as Moore’s Law slows down, hardware no longer delivers increasing performance within the same power budget.
In this talk, I will discuss my research efforts to bridge this demand-supply gap through the lens of sparsity. I will begin with my research on input sparsity. First, I will introduce algorithms that systematically eliminate the least important patches/tokens from dense input data, such as images, enabling up to 60% sparsity without any loss in accuracy. Then, I will present the system library that we have developed to effectively translate the theoretical savings from sparsity into practical speedups on hardware. Our system is up to 3 times faster than the leading industry solution from NVIDIA. Following this, I will touch on my research on model sparsity, highlighting a family of automated, hardware-aware model compression frameworks that surpass manual solutions in accuracy and reduce the design cycle from weeks of human effort to mere hours of GPU computation. Finally, I will demonstrate the use of sparsity to accelerate a wide range of computation-intensive AI applications, such as autonomous driving, language modeling, and high-energy physics. I will conclude this talk with my vision for building more efficient and accessible AI.