Privacy, Information and Generalization

Adam Smith - School of Electrical Engineering and Computer Science, Pennsylvania State University

March 20, 2017, 2 p.m. - 3 p.m.

MC 321


 Consider an agency holding a large database of sensitive personal information -- medical records, census survey answers, web search records, or genetic data, for example. The agency would like to discover and publicly release global characteristics of the data while protecting the privacy of individuals' records. I will begin by discussing what makes this problem difficult, illustrating some challenges via recent work. Motivated by this, I will present differential privacy, a rigorous definition of privacy in statistical databases that is now widely studied, and increasingly used to analyze and design deployed systems.

 Finally, I will explain how differential privacy is connected to a seemingly different problem: understanding statistical validity in "adaptive data analysis", the practice by which insights gathered from data are used to inform further analysis of the same data set. I'll show how limiting the information revealed about a data set during analysis allows one to control bias, and why differential privacy provides a particularly useful tool for limiting revealed information.

Speaker Bio: Adam Smith is a professor of Computer Science and Engineering at Penn State. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He received his Ph.D. from MIT in 2004 and has held visiting positions at the Weizmann Institute of Science, UCLA, Boston University and Harvard. In 2009, he received a Presidential Early Career Award for Scientists and Engineers (PECASE). In 2016, he received the Theory of Cryptography Test of Time award, jointly with C. Dwork, F. McSherry and K. Nissim.