Faculty Candidate Seminar
Diversity and Fairness in Data Summarization Algorithms
This event is free and open to the public.
Abstract: Searching and summarization are two of the most fundamental tasks in massive data analysis. In this talk, I will focus on these two tasks from the perspective of diversity and fairness.
Search is often formalized as the (approximate) nearest neighbor problem. Despite extensive research on this topic, its basic formulation is insufficient for many applications. In this talk, I will describe such applications and our approaches to addressing them. For example, we show how to incorporate diversity or fairness into the results of a search query.
A prominent approach to summarizing data is to compute a small “core-set”: a subset of the data that is sufficient for approximating the solution of a given task. We introduce the notion of “composable core-sets” as core-sets with the composability property: the union of multiple core-sets should form a good summary for the union of the original data sets. This composability property enables efficient solutions to a wide variety of massive data processing applications, including distributed computation (e.g., the Map-Reduce model), streaming algorithms, and similarity search. We show how to produce such efficient summaries of the data while preserving the diversity in the data set. I will describe several metrics for capturing the notion of diversity, and present efficient algorithms for constructing composable core-sets with respect to those metrics.
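To make the composability property concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the construction presented in the talk: it assumes diversity is measured as the minimum pairwise Euclidean distance among the k selected points, and it uses the well-known greedy farthest-point heuristic to build each core-set. The function names (greedy_farthest_points, summarize_partitions, min_pairwise_distance) are hypothetical and chosen only for this example.

```python
import math
from itertools import combinations


def greedy_farthest_points(points, k):
    """Pick up to k points greedily: each new point is the one farthest
    from the points chosen so far (assumed heuristic, for illustration)."""
    if not points or k <= 0:
        return []
    chosen = [points[0]]  # arbitrary starting point
    # dist_to_chosen[i] = distance from points[i] to its nearest chosen point
    dist_to_chosen = [math.dist(p, chosen[0]) for p in points]
    while len(chosen) < min(k, len(points)):
        i = max(range(len(points)), key=dist_to_chosen.__getitem__)
        chosen.append(points[i])
        dist_to_chosen = [
            min(d, math.dist(p, points[i]))
            for d, p in zip(dist_to_chosen, points)
        ]
    return chosen


def min_pairwise_distance(points):
    """Diversity score assumed here: the smallest distance between any two points."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def summarize_partitions(partitions, k):
    """Build one core-set per partition, then solve the task on their union."""
    union_of_coresets = [
        p for part in partitions for p in greedy_farthest_points(part, k)
    ]
    return greedy_farthest_points(union_of_coresets, k)


# Two partitions, as if the data were stored on two machines.
partitions = [
    [(0, 0), (1, 0), (10, 0)],
    [(0, 10), (1, 10), (10, 10)],
]
summary = summarize_partitions(partitions, k=2)
print(summary, min_pairwise_distance(summary))
```

Each partition is summarized independently, so the per-partition step could run on separate machines or over separate chunks of a stream; only the small core-sets need to be combined at the end.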
Bio: Sepideh Mahabadi is a research assistant professor at the Toyota Technological Institute at Chicago (TTIC). She received her PhD from MIT, where she was advised by Piotr Indyk. For a year, she was a postdoctoral research scientist at the Simons Collaboration on Algorithms and Geometry, based at Columbia University. Her research focuses on Theoretical Foundations of Massive Data, including High Dimensional Computational Geometry, Streaming Algorithms, and Data Summarization, as well as Social Aspects of Algorithms for Massive Data, including Diversity Maximization and Algorithmic Fairness.