
Treasure Trove of Data:  Conducting Research Using Federal Statistical Surveys

xarles

Presentation Transcript


  1. Treasure Trove of Data:  Conducting Research Using Federal Statistical Surveys

  2. So many unanswered research questions…

  3. Census Publications

  4. The World of Printed Reports: Statistical Abstract, 1902, 580 pages

  5. Cost of Living Measurement

  6. … but seriously folks… There is a Hierarchy of Federal Data
     • Published aggregates, dating back over a century, but also mostly available electronically
       • Some predetermined geography and categories
       • The thinner the data “slice,” the more confidentiality protection, i.e., the data’s not there anymore
     • Public Use file
       • A sub-sample of the data, only feasible for large samples
       • …but also with confidentiality protection (see above)
     • Synthetic Data (new approach)
     • Restricted Use Micro Data
       • Proposals for research required
       • Special access arrangements, terms of use, etc.

  7. Public use data

  8. Census Research Data Centers

  9. Demographic Data
     • 1970, 1980, 1990 and 2000 Decennial Long Form (back to 1940 soon)
     • American Community Survey (effectively replacing the long form)
     • March CPS Earnings Supplements
     • Survey of Income and Program Participation
     • American Housing Survey

  10. Economic Data Sets
      • Annual Survey of Manufactures
      • Census of Construction
      • Census of Finance and Insurance
      • Census of Manufactures
      • Census of Mining
      • Census of Real Estate
      • Census of Retail
      • Census of Services
      • Census of Transportation
      • Census of Wholesale
      • Characteristics of Business Owners Survey
      • Commodity Flow Survey
      • Auxiliary Establishment Survey
      • Longitudinal Business Database
      • Longitudinal Research Database
      • Manufacturing Energy Consumption Survey
      • Medical Expenditure Panel Survey, Insurance Component
      • National Employer Survey
      • Pollution Abatement Costs and Expenditures
      • Quarterly Financial Reports
      • Research and Development Survey
      • Survey of Manufacturing Technology
      • Worker Establishment Characteristics Database
      • R&D and Innovation Survey

  11. Read the Forms!

  12. Linked Household / Business data
      • Longitudinal Employer-Household Dynamics (LEHD)
        • Links households to place of employment
        • Based on unemployment insurance administrative records
        • Covers most states
        • Quarterly starting in 1990
      • “Tracks” a person based on their place of employment
        • Establishment (i.e., the place of work) is exact for single-plant companies
        • Establishment is assigned for all others (using geography and industry to improve matches)
      • Google “LEHD on the map”…

  13. How to Apply
      • Preliminary Proposal Must Meet Basic Requirements
        • Need for Non-Public data
        • Maintains Confidentiality
        • Feasibility
        • Describes Census Benefits (LEGAL REQUIREMENT)
        • Scientific Merit
      • Work with Census Administrator to Craft Final Proposal

  14. Restricted use Health data

  15. Why is there health data at the Census RDCs?
      • This data is collected by:
        • National Center for Health Statistics (NCHS)
        • Agency for Healthcare Research and Quality (AHRQ)
      • Dual mission: to provide broad access to health data and statistics, while protecting the privacy of respondents
      • Most research uses the Public Use file
      • NCHS and AHRQ RDCs created to provide access to restricted use files
      • Now available at all Census RDCs

  16. What type of data is it? NCHS Data
      National Health Status Surveys
      • National Health and Nutrition Examination Survey (NHANES) I, II, and III
      • National Health Interview Survey (NHIS)
      • Longitudinal Study on Aging I and II (LSOA)
      • National Survey of Family Growth
      • National Survey of Children's Health
      • National Survey of Early Childhood Health
      • National Survey of Children with Special Health Care Needs
      • National Asthma Survey
      National Health Care Surveys
      • National Ambulatory Medical Care Survey
      • National Hospital Ambulatory Medical Care Survey
      • National Survey of Ambulatory Surgery
      • National Hospital Discharge Survey
      • National Nursing Home Survey (NNHS)
      • National Home and Hospice Care Survey
      • National Employer Health Insurance Survey
      • National Health Provider Inventory
      • National Immunization Survey
      Vital Statistics
      • Mortality and Multiple Mortality
      • Birth
      • Fetal Death
      • National Death Index
      • Marriage and Divorce
      Linked Data Sets
      • Linked mortality data: NHIS, NHANES, LSOA II, NNHS
      • Linked Medicare Enrollment and Claims data: NHIS, NHANES, LSOA II
      • Linked Social Security Administration Data: NHIS, NHANES, LSOA II, NNHS
      • Linked EPA data

  17. What is restricted in the public use files but available in the RDC?
      • Every survey has at least some data that is restricted for confidentiality
      • Data can be restricted in a number of ways:
        • Individual variables:
          • Removed
          • Top-coded, bottom-coded, coarsened or masked
          • Artificial information is substituted
        • Pieces of datasets are restricted
        • Whole datasets are unavailable (particularly linked files)
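
The variable-level restrictions listed above can be illustrated with a minimal Python sketch. The thresholds, bin width, function names, and values are all hypothetical and chosen only for illustration; they are not the rules NCHS, AHRQ, or the Census Bureau actually apply.

```python
def top_code(value, ceiling):
    """Replace any value above `ceiling` with the ceiling itself."""
    return min(value, ceiling)


def bottom_code(value, floor):
    """Replace any value below `floor` with the floor itself."""
    return max(value, floor)


def coarsen(value, width):
    """Collapse a value into the lower bound of its `width`-sized bin."""
    return (value // width) * width


# Made-up respondent incomes and ages.
incomes = [12_000, 48_500, 310_000]
ages = [3, 37, 94]

# Hypothetical rules: incomes limited to the range 15,000 to 150,000;
# ages grouped into 5-year bins and then capped at 85.
public_incomes = [bottom_code(top_code(x, 150_000), 15_000) for x in incomes]
public_ages = [top_code(coarsen(a, 5), 85) for a in ages]

print(public_incomes)  # [15000, 48500, 150000]
print(public_ages)     # [0, 35, 85]
```

In this sketch the extreme income and the oldest age can no longer be recovered from the public values, which is the sense in which the detail is only available in the restricted file.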

  18. What’s restricted? Variables
      Examples of restricted variables:
      • Geographic variables (state, county, or metropolitan area)
      • Most dates (date of interview, date of death, date of birth)
      • Income and employment data (industry codes)
      • Specific diagnoses (ICD-9 codes are generally coarsened)
      • Details about facilities (accreditation, payments, number of employees)
      • Some information about children and adolescents (e.g., height and weight, depression, behavior problems, and drug use)
      • Some information about race, ethnicity, and country of origin
      • Contextual data (nearest hospital, % of population with diploma)
      • Sample design variables (necessary for estimating variances)
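
The last item above, sample design variables, matters because the variance of an estimate depends on how the sample was drawn. A minimal sketch, assuming a simple stratified design with known stratum population shares and ignoring finite population corrections, clustering, and survey weights; the NCHS surveys use more complex designs, and every number below is made up.

```python
from statistics import mean, variance


def stratified_mean_and_variance(strata):
    """Estimate a population mean and its variance from a stratified sample.

    `strata` is a list of (population_share, sample_values) pairs, where the
    shares sum to 1. No finite population correction is applied.
    """
    estimate = sum(share * mean(values) for share, values in strata)
    est_var = sum(
        share ** 2 * variance(values) / len(values) for share, values in strata
    )
    return estimate, est_var


# Two hypothetical strata with equal population shares.
strata = [
    (0.5, [10.0, 12.0, 14.0]),
    (0.5, [30.0, 32.0, 34.0]),
]
est, var = stratified_mean_and_variance(strata)

# Without the stratum identifiers, an analyst could only treat the data as a
# simple random sample, which gives a different variance estimate.
pooled = [v for _, values in strata for v in values]
naive_var = variance(pooled) / len(pooled)

print(est, var, naive_var)
```

Here the variance computed with the stratum identifiers is much smaller than the pooled one, so an analysis that lacks the design variables would misstate the precision of the estimate.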

  19. What’s restricted? Pieces of datasets
      Examples
      • Contextual data: data can be linked to information about area (e.g., number of hospitals, education in county, MEPS Area Resource File)
      • Medical Expenditure Panel Survey: Provider, Insurance, and Nursing Home Components
      • NHANES III: Youth Conduct Disorder Datasets, Los Angeles Demographic Dataset, Diagnostic Interview Schedule for Children
      • National Survey of Family Growth: self-report data and interviewer comments

  20. What’s restricted? Datasets
      • Linked data sets:
        • Mortality files linked to NHANES, NHIS, LSOA
        • EPA emissions data linked to NHDS, NHIS, NHANES
        • Social Security linked to NHANES, NHIS, LSOA
        • Medicare files linked to NHANES, NHIS, LSOA
      • Other datasets unavailable:
        • National Employer Health Insurance Survey
        • National Death Index

  21. How can I access it?
      • Submit a proposal to NCHS or AHRQ
      • NCHS/AHRQ evaluates for feasibility, availability of computing resources, and likelihood of disclosure of confidential info (NOT for scientific merit)
      • If approved, researcher sends public use data and code
      • NCHS/AHRQ staff merges public use data with restricted data to create a file for use by researcher
      • Files are only created by NCHS/AHRQ staff
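
The merge step described above can be pictured as a keyed join. A minimal sketch, assuming a hypothetical respondent identifier `resp_id` and made-up variables; the slides do not describe the actual keys or file layouts, and in practice only NCHS/AHRQ staff perform this step.

```python
def merge_on_key(public_rows, restricted_rows, key="resp_id"):
    """Left-join restricted variables onto public-use rows by `key`.

    Every public-use row is kept; rows without a restricted match simply
    gain no extra columns.
    """
    restricted_by_key = {row[key]: row for row in restricted_rows}
    merged = []
    for row in public_rows:
        extra = restricted_by_key.get(row[key], {})
        # Public-use values win if a column name appears in both files.
        merged.append({**extra, **row})
    return merged


# Made-up public-use records supplied by the researcher.
public_rows = [
    {"resp_id": 1, "age_group": "35-39"},
    {"resp_id": 2, "age_group": "85+"},
]
# Made-up restricted records held by the agency.
restricted_rows = [
    {"resp_id": 1, "county": "hypothetical-county-A"},
]

analysis_file = merge_on_key(public_rows, restricted_rows)
print(analysis_file)
```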

  22. How can I access it?
      • Proposal must include
        • Full research proposal
        • Explanation of why public-use files are insufficient
        • Data dictionary, which must identify files and years, target sample, and variables
        • Sample code, examples of desired output, and software requirements
        • Resumes of researchers, sources of funding, and proposed dates when analysis will take place

  23. How can I access it? (Working through NCHS/AHRQ)
      • Working at NCHS or AHRQ RDCs (both in Hyattsville, MD)
        • RDC analyst prepares data prior to researcher’s arrival
        • Researchers cannot merge their own data sets or work with more than one data set at a time
        • All output and notes must be reviewed before removal; data files cannot be removed
        • Support is available from RDC staff
      • Working with NCHS remotely
        • Researchers send code via email and receive output back via email
        • Only certain SAS/SUDAAN procedures permitted; no access to micro data
      • Working with AHRQ remotely
        • AHRQ has no remote server
        • Possibility of writing task order for AHRQ
