Faculty Candidate Seminar
Anonymization Techniques for Published Data
Dr. LeFevre is from the University of Wisconsin, Madison.
Many organizations publish and distribute non-aggregate personal data for purposes including medical, demographic, and public health research. For legal and ethical reasons, it is important that these organizations take steps to protect the identities of individuals, as well as their sensitive personal information. At the same time, concern for privacy must be balanced with the need to provide useful, high-quality data.
In this talk, I will first give a brief overview of the anonymity problem in data publishing. Then I will describe a new multidimensional generalization approach (also commonly called "recoding") and a greedy algorithmic framework.
The contributions of this work span two key dimensions. First, there are a seemingly infinite number of ways to measure data quality. I will take a very direct evaluation approach, based on a target workload of queries and data mining tasks, and I will describe some ways to directly incorporate knowledge of a workload into the anonymization process. Second, as more and more personal information is collected, it is important to develop algorithms that are both efficient and scalable. In the latter part of the talk, I will describe techniques for incorporating scalability into our algorithmic framework.