CSE Seminar

Data Science with Provable Privacy Guarantees

Ashwin Machanavajjhala, Assistant Professor, Duke University

Data scientists in a number of fields, including medicine, IoT, and social science, routinely gather and analyze individual-level data. These data span every aspect of our lives and thus could reveal medical diagnoses, sexual orientation, race, and other sensitive properties about us. Recent research has shown that ad hoc practices for privacy preservation, such as publishing coarse aggregate statistics, are inadequate for preventing the disclosure of sensitive properties. Differential privacy is a principled approach to data analysis with provable guarantees of privacy for individuals. However, its adoption in real-world data science workflows is limited because it represents a paradigm shift for data scientists. To ensure differential privacy, data scientists may access the data only through noisy aggregate queries, never "seeing" the raw data, and how often they can query the data is restricted by a strict privacy budget. Moreover, naively adding general-purpose differentially private algorithms to existing data science workflows results in an inordinate loss of utility. This leads to the question: what tools do data scientists need to author workflows that preserve data utility while ensuring provable guarantees of privacy?

In this talk, I will describe three aspects of my research on enabling provably private data science: (1) novel frameworks for specifying the privacy guarantees required by applications; (2) tools and programming abstractions that we have built to help author safe and accurate programs for differentially private data analysis; and (3) new methods to ensure end-to-end privacy for real-world data science workflows. I will situate the work in the context of our ongoing efforts to modernize the data publication algorithms at the US Census Bureau and to release IoT sensor data with provable privacy guarantees.

Ashwin Machanavajjhala is an Assistant Professor in the Department of Computer Science at Duke University and an Associate Director at the Information Initiative@Duke (iiD). Previously, he was a Senior Research Scientist in the Knowledge Management group at Yahoo! Research. His primary research interests lie in algorithms for ensuring privacy in statistical databases and augmented reality applications. He is a recipient of a 2017 IEEE Influential paper award for the invention of L-diversity in 2006, the National Science Foundation Faculty Early CAREER award in 2013, and the 2008 ACM SIGMOD Jim Gray Dissertation Award Honorable Mention. In collaboration with the US Census Bureau, he is credited with developing the first differentially private deployment in a real-world system. Ashwin received a Ph.D. from the Department of Computer Science at Cornell University and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Madras.

Sponsored by

CSE