Privacy, Information and Generalization
Consider an agency holding a large database of sensitive personal information — medical records, census survey answers, web search records, or genetic data, for example. The agency would like to discover and publicly release global characteristics of the data while protecting the privacy of individuals' records.
I will begin by discussing what makes this problem difficult and exhibiting issues that plague simple attempts at anonymization. Motivated by this, I will present differential privacy, a rigorous definition of privacy in statistical databases that is now widely studied and increasingly used to analyze and design deployed systems.
Finally, I will explain how differential privacy is connected to a seemingly different problem: understanding statistical validity in "adaptive data analysis", the practice by which insights gathered from data are used to inform further analysis of the same data set. I'll show how limiting the information revealed about a data set during analysis allows one to control bias, and why differential privacy provides a particularly useful tool for limiting revealed information.
Adam Smith is a professor of Computer Science and Engineering at Penn State. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He received his Ph.D. from MIT in 2004 and has held visiting positions at the Weizmann Institute of Science, UCLA, Boston University and Harvard. In 2009, he received a Presidential Early Career Award for Scientists and Engineers (PECASE). In 2016, he received the Theory of Cryptography Test of Time award, jointly with C. Dwork, F. McSherry and K. Nissim.