Download
introduction to secondary data analysis n.
Skip this Video
Loading SlideShow in 5 Seconds..
Introduction to Secondary Data Analysis PowerPoint Presentation
Download Presentation
Introduction to Secondary Data Analysis

Introduction to Secondary Data Analysis

577 Vues Download Presentation
Télécharger la présentation

Introduction to Secondary Data Analysis

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Introduction to Secondary Data Analysis Young Ik Cho, PhD Research Associate Professor Survey Research Laboratory University of Illinois at Chicago Fall, 2009

  2. Data collected by a person or organization other than the users of the data What is secondary data? 2 of 20

  3. Unobtrusive Fast & inexpensive Avoid data collection problems Provide bases for comparison Advantages of Secondary Data 3 of 20

  4. Data availability Level of observation Quality of documentation Data quality control Outdated data Disadvantages of Secondary Data 4 of 20

  5. Inter-university Consortium for Political and Social Research (ICPSR) http://www.icpsr.umich.edu/icpsrweb/ICPSR/ National Center for Health Statistics (NCHS) http://www.cdc.gov/nchs/surveys.htm Center for Medicare and Medicaid Services (CMS)http://www.cms.hhs.gov/home/rsds.asp US Census Bureau http://www.census.gov/main/www/access.html Data Sources 5 of 20

  6. Examples of Directly Downloadable Data from NCHS: National Health and Nutrition Examination Survey(NHANES) National Ambulatory Medical Care Survey (NAMCS) National Hospital Ambulatory Medical Care Survey (NHAMCS) National Hospital Discharge Survey (NHDS) National Home and Hospice Care Survey (NHHCS) National Nursing Home Survey (NNHS) National Survey of Ambulatory Surgery (NSAS) National Employer Health Insurance Survey (NEHIS) National Vital Statistics System (NVSS) National Health Interview Survey (NHIS) Data Sources (cont.) 6 of 20

  7. Data Sources (cont.) • Data Available for Use with Survey Documentation and Analysis (SDA): • http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/sda.jsp • Aging Data • National Archive of Computerized Data on Aging (NACDA) • http://www.icpsr.umich.edu/NACDA/ • Holding about 160 survey data including: • Longitudinal Study of Aging, 70 Years and Older, 1984-1990 • National Survey of Self-Care and Aging: Follow-Up, 1994 • National Health and Nutrition Examination Survey II: Mortality Study, 1992 • National Hospital Discharge Survey, 1994-1997 • National Health Interview Survey, 1994, Second Supplement on Aging 7 of 20

  8. Data Sources (cont.) • SDA (continued): • Substance Abuse Data • Substance Abuse and Mental Health Data Archive (http://www.icpsr.umich.edu/SAMHDA/) • Drug Abuse Warning Network • Monitoring the Future • National Household Survey on Drug Abuse • National Pregnancy and Health Survey • National Treatment Improvement Evaluation Study • Treatment Episode Data Set • Uniform Facility Data Set 8 of 20

  9. Data Sources (cont.) • SDA (continued): • Criminal Justice Data • National Archive of Criminal Justice Data (NACJD) (http://www.icpsr.umich.edu/NACJD/) • International Crime Data • Homicide Data • National Crime Victimization Survey Data • Corrections Data 9 of 20

  10. Purpose of the study Sponsor/collector of the data Mode of data collection Sampling procedures Consistency of data with other sources Evaluation of Data Sources 10 of 20

  11. Documentation Number of observations Number of variables Coding scheme Summary statistics Evaluation of Data Sources (cont.) 11 of 20

  12. Simple Random Sampling Systematic Sampling Complex sample designs stratified designs cluster designs mixed mode designs Types of Survey Sample Design 12 of 20

  13. Simple Random Sampling Each member of the population has an equal and known chance of being selected Simple Random Sample With Replacement (SRSWR) Simple Random Sample Without Replacement (SRSWOR) Types of Survey Sample Design 13 of 20

  14. Systematic Random Sampling the selection of every kth element from a sampling frame with the sampling interval k (=N/n). Types of Survey Sample Design 14 of 20

  15. Stratified sample The population is first divided into non-overlapping subpopulations: strata such as gender, race or SES. Sample from each strata. Works most effectively when the variance is smaller within the strata than in the sample as a whole. Types of Survey Sample Design 15 of 20

  16. Cluster sample Elements are selected in groups or clusters PSU: Primary Sampling Unit.  This is the first unit that is sampled in the design.  For example, school districts from Chicago may be sampled and then schools within districts may be sampled. Homogeneity within cluster: Intracluster Correlation Coefficient (ICC) Types of Survey Sample Design 16 of 20

  17. Increased efficiency Decreased costs Why complex survey design? 17 of 20

  18. Selection weight: Used to adjust for differing probabilities of selection (=N/n). In theory, simple random samples are self-weighted In practice, simple random samples are likely to also require adjustments for non-response Sample Weights 18 of 20

  19. Post-stratification weights: designed to bring the sample proportions in demographic subgroups into agreement with the population proportion in the subgroups. Types of Sample Weights 19 of 20

  20. Non-response weights: designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics. Types of Sample Weights (cont.) 20 of 20

  21. “Blow-up” (expansion) weights: provide estimates for the total population of interest Types of Sample Weights (cont.) 21 of 20

  22. Replicate weights: A series of weight variables that are used instead of PSUs and strata in an effort to protect the respondents' identity.Selection weight and the replicate weights must be used for the correct calculation of the point estimate and its standard error. Types of Sample Weights (cont.) 22 of 20

  23. Complex designs with clustering and unequal selection probabilities generally increase the sampling variance. Not accounting for the impact of complex sample design can lead to biased estimates. Complex Survey Design Effect 23 of 20

  24. The ratio of the design-based standard error to the SRS standard error of a variable: Deff=SE(des)/SE(srs) Deff= 1 + ρ (n – 1) where the ρ is the interclass correlation and n is the number of elements in the cluster. Complex Survey Design Effect 24 of 20

  25. How can we adjust for the design effects? • Find variables identifying the primary sampling units (psu), the strata, and the weight(s). • Use appropriate software to adjust for the design effect. 25 of 20

  26. SAS proc surveyreg data=nhanes; strata strata; cluster psu; class sex race; model fatintk = age sex race; weight finalwt STATA svyset strata strata svyset psu psu svyset pweight finalwt svyreg fatitk age male black hispanic Syntax Examples of Design-based Analysis in SAS, STATA & SUDAAN 26 of 20

  27. SUDAAN proc regress data=”c:\nhanes.sav” filetype=spss desgn=wr; nest strata psu; weight finalwt subpgroup sex race; levels 2 3; model fatintk = age sex race; Syntax Examples of Design-based Analysis in STATA, SUDAAN & SAS 27 of 20