Secure Data Laboratories: The U.S. Census Bureau Model

Presentation Transcript


  1. Secure Data Laboratories: The U.S. Census Bureau Model Steven Ruggles University of Minnesota

  2. Why are secure data laboratories needed? • Greater geographic detail is needed for multi-level modeling, spatial analysis, and studies of spatial segregation • Very large samples (over 10% coverage) and complete-count microdata offer new research opportunities • Adding geographic detail and increasing sample sizes raise new confidentiality concerns

  3. Existing Models: • German Research Data Centres • Statistics Canada Research Data Centers • Census Bureau Research Data Centers Key limitation: each holds data for only one country, making comparative research impossible

  4. Emerging standards: • Data Sharing for Demographic Research Project, Inter-university Consortium for Political and Social Research • Eurostat initiative: all statistical agencies are mandated to develop secure data laboratories

  5. Census Bureau Research Data Centers • U.S. Census Bureau made census microdata available to researchers in 1964 through the anonymized Public Use Samples • It was impossible to anonymize the census of business • Original RDC established in 1982 by the Census Bureau Center for Economic Studies to provide access to microdata on firms

  6. The RDC Concept • An office with multiple computers • Staffed by a Census Bureau employee • Computer-driven remote data access • Meets physical and computer security requirements for restricted access • Researchers must undergo a background check and obtain Special Sworn Status to use restricted data • Researchers may not remove anything from the RDC until it passes a disclosure avoidance review

  7. Census RDC Remote Branches • Boston (NBER) 1994 • Carnegie Mellon 1996-2004 • UC Berkeley 1999 • UCLA 1999 • Research Triangle (Duke, North Carolina) 2000 • Michigan 2002 • Chicago 2002 • New York Cornell 2004 • New York Baruch 2006 • Minnesota 2009

  8. Census RDCs Coming soon: Minneapolis

  9. Census Bureau and RDC partners: • Establish physically secure offices and secure computer systems • Choose projects that use the data appropriately, benefit Census Bureau programs, and present low disclosure risks • Impart to researchers at the RDC the Census Bureau’s “culture of confidentiality” • Establish policies and procedures that protect confidentiality in the RDC office • Release only research output that does not reveal confidential information

  10. Each RDC has a security plan. • Locked office with badges, key cards, keypads, etc. • Access limited to researchers with Special Sworn Status (SSS) carrying out active, approved projects at the RDC, who must: • Sign written active project agreements • Obtain security clearance • Sign the Census Bureau’s standard sworn agreement to preserve the confidentiality of the data • Receive awareness training

  11. A Census Bureau employee (the RDC administrator) is stationed at each RDC and: • Instills the Census Bureau’s “culture of confidentiality” in the researchers • Trains the researchers in the security and confidentiality restrictions • Carries out disclosure analysis on any research output a researcher wishes to remove from the secure facility

  12. Thin-client computing environment • Data are stored on secure Unix servers at the Census Bureau computer center in Bowie, MD; no confidential data are stored at the RDCs. • RDCs are connected to the servers via dedicated T-1 lines. • Researchers use X-terminals (“thin clients,” with no local data storage) to access the data authorized for their projects. • Researchers are held accountable for their computer use through passwords and system logs.

  13. The rules: researchers • May not upload anything to or download anything from the thin-client servers (there is no physical way to do so) • Have no access to any non-Census Bureau network (including the Internet) from within the RDC facility • May not bring laptop computers or other portable mass storage devices into the RDC facility

  14. Demographic and Health Data in the RDCs • Historical focus on “economic” data • Requests for “demographic” data • Higher geographic resolution • Denser samples and complete-count microdata • Permission to provide access to demographic data in RDCs obtained in 1997 • IPUMS is working with the Census Bureau to reconstruct complete-count (100%) census microdata for 1960-2000 and later for use in RDCs • RDCs will soon include major collections of U.S. health data as well

  15. The importance of high-density census microdata with fine geographic detail • This is a completely new source with the potential to provide unprecedented insight into residential segregation and the influence of local conditions on behavior. • Analysts of small areas have never had access to microdata and have been forced to use crude aggregate tabulations that are often incompatible across time and across national boundaries. • As a new kind of data, complete-count microdata will stimulate entirely new methods of analysis.

  16. Limitations of the Data Laboratory Model • Access is highly restricted, cumbersome, and expensive • The U.S. experience: just a dozen research projects have used censuses in RDCs, whereas over 10,000 projects have used public-use census microdata, the most widely used data source in the social sciences • Analysis across national boundaries is essential, and the RDCs currently operated by the Census Bureau and the statistical agencies of Germany and Canada cannot meet this need • The Data Sharing for Demographic Research (DSDR) program at the ICPSR has been charged with developing a set of standards for data enclaves

  17. Conclusion • Restricted data enclaves cannot replace public-use data, since they are inaccessible to most researchers. • This strategy does, however, allow researchers with compelling needs to gain access to highly confidential data with virtually no risk of disclosure. • To allow analyses that cross national boundaries, we must develop secure data laboratories that are not tied to specific national statistical agencies but instead provide access to data from many countries. • Existing RDCs provide a valuable model
