Continuous Surveys: Statistical Challenges and Opportunities
Carl Schmertmann
Center for Demography & Population Health
Florida State University
schmertmann@fsu.edu
Outline
• CHALLENGES (long)
  • Increased Temporal Complexity
  • Increased Sampling Error
  • New Weighting Problems
• OPPORTUNITIES (brief, but important)
Sample Size Comparison
• US CENSUS LONG FORM: 17% / decade
• ACS ROLLING SURVEY:
  • 2 per 1000 Households / month
  • 24 per 1000 Households / year
  • 240 per 1000 Households / decade = 24% / decade
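The arithmetic behind this comparison can be sketched in a few lines. The rates are the ones quoted on the slide; the variable names are illustrative only, and the sketch assumes no household is sampled more than once in a decade.

```python
# Sampling rates quoted on the slide
acs_per_month = 2 / 1000           # ACS: 2 per 1000 households each month
long_form_per_decade = 0.17        # census long form: 17% of households per decade

# Assumption: monthly samples do not overlap, so the rates simply add up
acs_per_year = 12 * acs_per_month
acs_per_decade = 10 * acs_per_year

# How much more data the ACS collects over a full decade
ratio = acs_per_decade / long_form_per_decade

print(f"ACS per year:   {acs_per_year:.1%}")
print(f"ACS per decade: {acs_per_decade:.1%}")
print(f"ACS / long form over a decade: {ratio:.2f}x")
```

Over a decade the rolling survey collects more data in total, but any single year holds only a small slice of it.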
1. Temporal Complexity
What is the Population?
• 1-Day Census
  • Population membership is binary: {0,1}
  • Each individual is IN or OUT
• Continuous Survey
  • Population membership is fuzzy: anywhere on the interval from 0 to 1
  • Individuals can be MORE IN (more person-days of residence) or MORE OUT (fewer)
[Figure: Residents (in 000s)]
• Census Population = 12 000 (83% Type A)
• An ACS ‘Data Sandwich’ includes samples from all months
• ACS samples from 184 000 person-months; Avg Population: 15 333 (65% Type A)
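The two population figures can be reproduced from the numbers on the slides. This is a minimal sketch, assuming that Type A residents are a constant 10 000 (roughly 83% of 12 000) all year and that only Type B residents vary by season; that split is an inference from the percentages, not something the slides state.

```python
MONTHS = 12
census_population = 12_000   # thousands, residents on census day (from the slide)
person_months = 184_000      # thousands of person-months in a year (from the slide)

# Assumption: Type A residents are constant at 10 000 all year
type_a = 10_000

# The 'population' a continuous survey samples from is the monthly average
average_population = person_months / MONTHS

# Type B residents: on census day versus averaged over the year
type_b_census_day = census_population - type_a
type_b_average = average_population - type_a

share_a_census = type_a / census_population
share_a_average = (type_a * MONTHS) / person_months

print(f"Average population: {average_population:,.0f}")
print(f"Type A share on census day: {share_a_census:.0%}")
print(f"Type A share of person-months: {share_a_average:.0%}")
```

Under this assumption the seasonal Type B residents number about 2 000 on census day but more than 5 000 on average, which is why the two definitions of 'population' diverge.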
Characteristics change over the Sampling Period
• Persons
  • Age
  • Marital Status
  • Employment
  • Education
• Housing Units
  • Vacancy
  • Number of Occupants
  • $ Value
Rolling ‘Population’
The population formed by sandwiching monthly samples is the average frame of a film, not a snapshot.
Individuals and housing units with changing characteristics are sampled and caught ‘in motion’.
Reference Period Problems
Many ‘long-form’ questions refer to retrospective periods:
• Income in last 12 months
• Place of residence 1 year ago
• Child born in last 12 months?
• Etc.
Time Reference Example
• ‘2004’ data from 12 monthly samples taken in Jan04…Dec04
• Question on fertility in the 12 months prior to the survey, so there are 12 overlapping periods in ‘2004’ data
  • ‘Jan04’ question covers Jan03-Jan04
  • ‘Feb04’ question covers Feb03-Feb04
  • etc.
[Figure: Reference Periods for ‘Last 12 Month’ Questions in 1-year ACS Datasets. Timeline from Jan 03 to Jan 05 with one row per monthly sample (Jan 2004 to Dec 2004): twelve x’s mark the months covered by that sample’s question and ● marks the survey month. A row of counts from 1 up to 12 and back down to 1 shows how many samples cover each calendar month.]
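The overlap pattern can be reconstructed with a short count. This is a sketch under one stated assumption: each monthly sample's ‘last 12 months’ question covers the 12 complete calendar months before the survey month. The helper names `month_index` and `label` are illustrative.

```python
from collections import Counter

def month_index(year, month):
    """Map a (year, month) pair to a running month number."""
    return year * 12 + (month - 1)

def label(index):
    """Turn a running month number back into a 'YYYY-MM' string."""
    year, month0 = divmod(index, 12)
    return f"{year}-{month0 + 1:02d}"

# Count how many of the 12 monthly samples of 2004 cover each calendar month
coverage = Counter()
for survey_month in range(1, 13):
    survey_index = month_index(2004, survey_month)
    # Assumption: the question covers the 12 full months before the survey month
    for months_back in range(1, 13):
        coverage[survey_index - months_back] += 1

for index in sorted(coverage):
    print(label(index), "#" * coverage[index])
```

Under this assumption a ‘2004’ dataset draws on 23 different calendar months, and only December 2003 falls inside every sample's reference period.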
Temporal Issues Summarized
‘Data Sandwiches’ contain:
• New meaning of ‘population’
• Units that change over sampling period (moving targets)
• Multiple reference periods for retrospective questions
2. Sampling Error
Small Samples
More overall data from continuous sampling, but…
1-, 3-, or 5-Year Sandwiches have smaller samples than the single, decennial long form survey
→ more sampling error in published data
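A rough sense of the extra sampling error comes from the usual rule that standard error shrinks with the square root of the sample size. The sketch below assumes simple random sampling with identical design effects and no finite-population correction, which is a simplification of the real ACS design. The sampling fractions are those quoted earlier in the talk; `relative_standard_error` is an illustrative name.

```python
import math

LONG_FORM_FRACTION = 0.17    # long form: 17% of households per decade
ACS_ANNUAL_FRACTION = 0.024  # ACS: 24 per 1000 households per year

def relative_standard_error(years):
    """Standard error of a pooled ACS estimate relative to the long form.

    Assumption: SE is proportional to 1 / sqrt(sample size).
    """
    acs_fraction = ACS_ANNUAL_FRACTION * years
    return math.sqrt(LONG_FORM_FRACTION / acs_fraction)

for years in (1, 3, 5):
    print(f"{years}-year sandwich: SE about "
          f"{relative_standard_error(years):.2f}x the long form")
```

Even the 5-year sandwich stays above 1 under these assumptions, because 12% of households is still less than the long form's 17%.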
Small Samples
The problem is especially acute for
• small areas
• narrow age groups
• rare subpopulations
e.g., How many unmarried teen births per year in Sevier County, Tennessee? ACS 2006-2008 says 0 ± 161
C24020. SEX BY OCCUPATION – Key West, Florida
Data Set: 2006-2008 American Community Survey 3-Year Estimates (http://tinyurl.com/acs-alap)
…etc
Temporal Instability
Teenage Birth Rate in a County
Unfortunate Result
Aggregating over 1+ years of surveys produces datasets that are often
• Unfamiliar and difficult to understand
• Still too noisy to be useful for planners and researchers
3. Weighting for Non-Response 3. Weighting Problems
Weighting
Weighting from Respondents → Total Population requires Population Control Totals:
(Place × Age × Sex × Race × Ethnicity × …)
Decennial Long Form Sample
• Control Totals
  • Measured from a simultaneous enumeration of the population (Sample & Census on same day)
  • Only 1 set needed
  • Sample and Population defined identically (residents on Census Day)
Continuous Survey
• Control Totals
  • Must be estimated (no simultaneous census)
  • Many sets needed (2006, 2007, 2006-8, 2007-9, 2008-12, …)
  • Sample and Population defined differently
ACS Control Totals (Persons)
• ACS responses are weighted to match official intercensal estimates by
  • Year (1 July midpoint snapshot)
  • County (sometimes city)
  • Age
  • Race
  • Sex
  • Hispanic Origin (yes/no)
ACS Control Totals (Persons): Potential Errors
• Estimates are wrong:
  • Unanticipated internal migration
  • Unanticipated international migration
  • etc.
• Population definitions don’t match:
  • Seasonal fluctuations
  • Different race/ethnic categories
Census Pop = 12 000 (83% Type A)
Average Pop = 15 333 (65% Type A)
If every year looks like this… Intercensal Estimate = 12 000 (83% Type A)
Weighting Error Example
ACS weighting to intercensal estimates produces:
• Population too small (Census < Avg Pop)
• Population too “A” (seasonal Bs missed)
• Overestimates of variables positively correlated with A (e.g., % with college education)
• Underestimates of variables negatively correlated with A (e.g., % single-parent families)
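The direction of the bias can be checked with a toy calculation. The population counts follow the earlier example (Type A constant at about 10 000; Type B about 2 000 on census day but about 5 333 on average). The college-education rates of 40% for A and 20% for B are purely hypothetical, chosen only to make the trait positively correlated with A.

```python
def share_with_trait(pop_a, pop_b, rate_a, rate_b):
    """Overall share with a trait, given group sizes and group-specific rates."""
    return (pop_a * rate_a + pop_b * rate_b) / (pop_a + pop_b)

# Hypothetical rates: the trait is more common among Type A residents
RATE_A, RATE_B = 0.40, 0.20

# Populations in thousands
TYPE_A = 10_000
TYPE_B_CONTROL = 2_000        # Type B count implied by the census-day control total
TYPE_B_AVERAGE = 184_000 / 12 - TYPE_A   # Type B count averaged over the year

# Estimate after weighting to the control total, versus the year-round average
weighted = share_with_trait(TYPE_A, TYPE_B_CONTROL, RATE_A, RATE_B)
actual = share_with_trait(TYPE_A, TYPE_B_AVERAGE, RATE_A, RATE_B)

print(f"Weighted to control totals: {weighted:.1%}")
print(f"Year-round average:         {actual:.1%}")
print(f"Overestimate:               {weighted - actual:+.1%}")
```

Swapping the two rates makes the trait negatively correlated with A and flips the sign of the error, matching the last bullet.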
4. Opportunities
Opportunities
ACS table cells = millions of “seemingly unrelated” maximum likelihood estimates
Statistical models that exploit likely cell relationships (over times, ages, sexes, places, variables …) could, in principle
• Retain frequency & recency
• Reduce variance of estimates
• Recover familiar measures
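As a toy illustration of the variance-reduction idea, and not the specific models the talk has in mind, consider inverse-variance pooling of related cells. The sketch assumes the cells are independent, unbiased estimates of the same underlying quantity (for example, a rate that is stable across adjacent years). All numbers are hypothetical.

```python
def pool(estimates, variances):
    """Inverse-variance weighted average of independent, unbiased estimates.

    Returns the pooled estimate and its variance.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled_estimate = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_variance = 1.0 / total_weight
    return pooled_estimate, pooled_variance

# Hypothetical single-year rates for one small area, with equal sampling variances
estimates = [0.030, 0.050, 0.040]
variances = [0.0004, 0.0004, 0.0004]

pooled_estimate, pooled_variance = pool(estimates, variances)

print(f"Pooled estimate: {pooled_estimate:.3f}")
print(f"Single-cell variance: {variances[0]:.6f}")
print(f"Pooled variance:      {pooled_variance:.6f}")
```

Plain pooling gives up recency, since it is just another sandwich. A model would aim to borrow strength across cells in this way while still producing estimates for each period.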
5. Conclusion
CONTINUOUS SURVEYS like ACS create
• Big Problems for producers and users
  • Unfamiliar, temporally complex data
  • Potentially high sampling error
  • Technical problems with weighting
• Big Opportunities, IF we can develop appropriate statistical models and practices
Thanks! ¡Gracias! Obrigado!