February 13, 2006

Title: Statistical Confidentiality: Is Synthetic Data the Answer?
Presented by: George T. Duncan, Statistics, Heinz School of Public Policy and Management, Carnegie Mellon University

IDRE Lecture Video
Podcast
PowerPoint Presentation

Abstract

Contemporary social science research calls for large statistical databases that provide accurate, comprehensive, and typically longitudinal and geographically specific information about individuals, households, and organizations. Juxtaposed with this increased demand is a crisis of supply: many of these data are sensitive and are collected by information organizations, such as statistical agencies, from respondents who are increasingly concerned about privacy and confidentiality. A controversial response, with origins in both computer science and statistics, has emerged to this tension between the individual’s desire to protect personal information and the community’s desire to learn more about itself. This response calls for using the original data to estimate a probability distribution and then using that distribution to simulate data (call it synthetic data, virtual data, or data by probability distribution) that can be released for analysis. This presentation examines whether inference-valid synthetic data are possible and to what extent they in fact protect confidentiality. In this context, it argues that the dual dicta, that the original data are the gold standard and that Occam’s Razor should guide model building, are not generally compelling.
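
The two-step idea described above (estimate a distribution from the original data, then simulate releasable records from it) can be sketched in a few lines. This is only an illustration under strong assumptions: the bivariate normal model, the function names fit_bivariate_normal and simulate, and the six toy (age, income) records are all invented here and are not taken from the lecture. The sketch also makes no claim about whether such a release would actually protect confidentiality.

```python
import math
import random
import statistics


def fit_bivariate_normal(rows):
    """Step 1: estimate means and covariances from two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def simulate(params, n, rng):
    """Step 2: draw n synthetic records from the fitted distribution."""
    (mx, my), (sxx, sxy, syy) = params
    # Cholesky factor of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    a = math.sqrt(sxx)
    b = sxy / a
    c = math.sqrt(max(syy - b * b, 0.0))
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        records.append((mx + a * z1, my + b * z1 + c * z2))
    return records


# Hypothetical "original" records: (age in years, income in thousands).
original = [(34, 41.0), (51, 62.5), (29, 38.2),
            (45, 55.1), (62, 70.3), (38, 47.9)]

params = fit_bivariate_normal(original)
synthetic = simulate(params, 5000, random.Random(2006))

# Refit on the synthetic file to see how well the summaries carry over.
refit = fit_bivariate_normal(synthetic)
print("original means: ", params[0])
print("synthetic means:", refit[0])
```

Under these assumptions, an analyst who receives only the synthetic records can recover means and covariances close to those of the original file, which is one narrow sense of inference validity. Anything the chosen model does not capture, such as skewness or subgroup structure, is lost in the release.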

Biography

George Duncan is Professor of Statistics in the H. John Heinz III School of Public Policy and Management and the Department of Statistics at Carnegie Mellon University. He is an internationally renowned statistician, scholar, and policy advisor. He chaired the National Academy of Sciences Panel on Confidentiality and Data Access (1989-1993), whose work resulted in the book Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. He also chaired the American Statistical Association's Committee on Privacy and Confidentiality.
