Overview of the National Longitudinal Survey of Children and Youth (NLSCY). Goal of the presentation: to answer the following questions: What is NLSCY? What does a researcher need to know when using NLSCY data (analytical issues)? What are the tools available for NLSCY users?
Overview of the National Longitudinal Survey of Children and Youth (NLSCY)
Goal of the Presentation • To answer the following questions: • What is NLSCY? • What does a researcher need to know when using NLSCY data (analytical issues)? • What are the tools available for NLSCY users?
What is NLSCY? • Longitudinal survey conducted in partnership by HRDC and Statistics Canada. • Follows the development and well-being of children, from birth to adulthood. • Began in 1994; data collection takes place every 2 years, with the 6th cycle of data collection beginning in September 2004.
The Analytical Issues: being able to make inferences • The data: strategy / concepts; issues (partial non-response, inconsistencies) • Sampling: complex design; impact on estimation, precision and analysis • Type of analysis: longitudinal, cross-sectional, repeated
NLSCY - Overview • Complex data structure • the lives of children are complex • dual child/household structure • new content in each cycle • some changes in old content • Other constraints • limit on quantity of information • limited resources
NLSCY Content • Child (depending on age): socio-demographic; health; perinatal information; development (motor, social and physical); temperament; academic performance; education; literacy; extracurricular activities; work experience; socialization; relationship with parents; family history and legal custody of children; child care; behaviour; self-esteem; cigarettes, alcohol, drugs; vocabulary assessment; math test; reading comprehension test; sexual activity and loving relationship • Parents: socio-demographic; education/literacy; labour market; income; health; social support; parental involvement at school; parents’ aspirations for child’s education • Family: demography of members; relationships between members of household; family functioning; household; neighbourhood • School: number of students; discipline problems; school atmosphere; resources; characteristics • Teachers: teaching practices; demography; qualifications • Principal: demography; qualifications • Note: Minor changes are made in the content from one cycle to the next.
Unit of Analysis • The child • Sources of information • Person most knowledgeable about the child (PMK) • Teacher • School principal • Child himself/herself • cognitive measures • self-administered
Unit of Analysis: Caution • Other types of analysis • Weights are designed for the child • Concepts like family are characteristics of the child, not a domain for estimation • Statements like . . . • Valid: "The NLSCY estimates the number of children whose families have characteristic …" • Not valid: "The NLSCY estimates the number of families with characteristic …"
Classic Trade-off of Quantity vs Quality • In surveys, it is normal to clean up the data. • In the NLSCY, you will find that the data are less processed.
Investment for the NLSCY • Focus on derived variables • scales, cognitive measures • transition measures • Non-response adjustment • total non-response • Processing of financial data • family income • personal income • Dissemination within reasonable time
Partial Non-response (item or component) • Respondent units are those which answered the key questions, not necessarily all the questions. • Some variables will include non-response, identified by: • not stated • don’t know • refusal • Note: These are different from "not applicable".
How important is it? Hopefully not! • Maybe non-response is random • Maybe it’s negligible • Maybe it can be explained away • Maybe I can get away with it
What are your options? • Report missing data as a value • Ignore missing data (limit your analysis to reported data only) • Correct for the missing data • By re-weighting • With imputation • Model non-response information
Get to know your non-respondents • When you have significant non-response, you need to assess it • It becomes your first variable of interest • It’s an analysis like any other analysis you will do, so you do twice the analysis (2X) • Otherwise it casts doubt over every finding
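To make the distinction between non-response and "not applicable" concrete, here is a minimal Python sketch. The reserved codes (96 to 99) are hypothetical placeholders chosen for illustration; the real values must be taken from the NLSCY codebook for each variable.

```python
# Hypothetical reserved codes; the actual codes come from the codebook.
NOT_APPLICABLE = 96
NON_RESPONSE = {97: "don't know", 98: "refusal", 99: "not stated"}

def classify(value):
    """Return 'valid', 'not_applicable' or 'non_response' for one coded answer."""
    if value == NOT_APPLICABLE:
        return "not_applicable"
    if value in NON_RESPONSE:
        return "non_response"
    return "valid"

def item_nonresponse_rate(values):
    """Share of eligible units (valid + non-response) that gave no answer.

    'Not applicable' units are excluded: they were never asked the question.
    """
    kinds = [classify(v) for v in values]
    eligible = [k for k in kinds if k != "not_applicable"]
    if not eligible:
        return 0.0
    return eligible.count("non_response") / len(eligible)

answers = [3, 5, 96, 99, 2, 97, 96, 4]
rate = item_nonresponse_rate(answers)
```

Keeping "not applicable" out of the denominator avoids understating the non-response rate among children who were actually asked the question.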
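One simple way to start assessing non-response is to compare non-response rates across subgroups. The sketch below is illustrative only: the grouping variable (urban/rural) and the records are invented, not NLSCY data.

```python
from collections import defaultdict

def nonresponse_by_group(records):
    """records: iterable of (group, answered) pairs.

    Returns {group: non-response rate}. Large differences between groups
    suggest that non-response is not random.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [non-respondents, total]
    for group, answered in records:
        counts[group][1] += 1
        if not answered:
            counts[group][0] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}

# Invented example records
records = [
    ("urban", True), ("urban", True), ("urban", False), ("urban", True),
    ("rural", False), ("rural", False), ("rural", True), ("rural", True),
]
rates = nonresponse_by_group(records)
```

In this made-up example the rural rate is twice the urban rate, which would be a reason to examine non-response more closely before interpreting any finding.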
NLSCY Data Collection Strategy (ages covered in each cycle) • Cycle 1 (1994-95): ages 0-11 • Cycle 2 (1996-97): ages 0-1 (Early Years) and 2-13 • Cycle 3 (1998-99): ages 0-1, 2-3 and 4-15 • Cycle 4 (2000-01): ages 0-1, 2-3, 4-5 and 6-17; released summer 2003 • Cycle 5: ages 0-1 (Early Years), 2-3, 4-5 and 8-19; release expected December 2004
Issues • CROSS-SECTIONAL ANALYSIS
Issues • CROSS-SECTIONAL ANALYSIS • Limitations due to the age of the sample • part of the sample was not selected for cross-sectional estimates • inherent complexity in the sample design to meet divergent needs • coverage problems • no update of the sample to reflect changes in the population (e.g., immigration); only the sampling weights have been adjusted to reflect changes • the older the cohort gets, the more difficult it is to adjust the sampling weights properly
Issues • CROSS-SECTIONAL ANALYSIS • Limitations due to the nature of the survey • Problems with sample erosion • Conditioning bias • Changes in the definition of age of the child • Interpretation of the results • Impact on the effectiveness of estimation methods • Making inferences • Greater potential with the supplementary samples that have been added
Issues • LONGITUDINAL ANALYSIS
One Survey but actually many datasets: a longitudinal file • Cycle 1 (1994-95): ages 0-11 • Cycle 2 (1996-97): ages 2-13 • Cycle 3 (1998-99): ages 4-15 • Cycle 4 (2000-01): ages 6-17 • Intended for cohort analysis of two-year age groups, e.g., 0-1, 2-3, 4-5, 6-7, 8-9, 10-11
Issues • LONGITUDINAL ANALYSIS • Limitations due to sample erosion • sample shrinkage problems • representation (coverage) problems • Swiss cheese problems • Conditioning bias • Interpretation of results • impact on effectiveness of estimation • inferences • New definition for the age of the child
Dissecting NLSCY Data: cross-sectional data, repeated surveys • Cycle 1 (1994-95): ages 0-11 • Cycle 2 (1996-97): ages 0-1 (Early Years) and 2-13 • Cycle 3 (1998-99): ages 0-1, 2-3 and 4-15 • The sample size is very different from one cycle to the next, from one cohort to the next. • 3 data cycles for children aged 0 to 11 • 2 data cycles for children aged 12 and 13
Dissecting NLSCY Data: cross-sectional data, repeated surveys (same figure) • NOTE: Sample units followed from one cycle to the next are not independent of one another, whereas the units marked as independent in the figure are.
Issues • CROSS-SECTIONAL ANALYSIS (REPEATED) • Same limitations as noted earlier • The sample overlaps from one cycle to the next. • Independence or interdependence of samples • There is sample interdependence when the sample is made up of the same respondents • Involves a covariance factor • Sample independence is possible only for certain domains (e.g., children aged 0-1)
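The role of the covariance factor can be illustrated with the standard formula for the variance of a difference, Var(b - a) = Var(a) + Var(b) - 2 Cov(a, b). The Python sketch below uses invented standard errors and an assumed correlation; it is not an NLSCY variance estimator.

```python
import math

def se_of_change(se1, se2, correlation):
    """Standard error of (estimate2 - estimate1).

    correlation is the assumed correlation between the two estimates;
    it is 0 when the two samples are independent.
    """
    covariance = correlation * se1 * se2
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * covariance)

# Invented standard errors for two cycles
se_independent = se_of_change(3.0, 4.0, 0.0)  # independent samples
se_overlapping = se_of_change(3.0, 4.0, 0.5)  # same respondents in both cycles
```

Treating overlapping samples as independent drops the covariance term, so the standard error of the change is misstated (overstated when the correlation is positive, as in this example).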
The basics • The basic idea of sampling • The reason behind complicating a good idea • The implication when modelling data
How Sampling Works. Imagine a well-known picture. Since a picture is made up of points of colour (pixels), we will sample the points of colour at different rates. Now let’s assume that we had some idea about the picture we wanted to see, and we decide to stratify the sample. In this case we decide to sample different areas of the picture at different rates: the background, the dress, the face, the hands, etc. [Figure panels: 1% random (systematic); 3% random; 5% random; 10% random; 2.5% stratified]
How Sampling Works. [Figure: the same picture sampled at 1%, 3%, 5% and 10%, and at 2.5% stratified]
How does this affect modelling or analysis? • The sample is no longer simply random • We purposefully biased the sample to gain efficiencies to meet other goals • This bias is corrected when we apply the design weights.
Framework • Each part can actually be treated as a survey with a simpler design. • If you were to analyse each stratum separately, there would still be some difficulty associated with the correction for non-response and the final calibration (post-stratification). • The sampling frame or design allows you to keep all these parts together in a cohesive way for analysis.
How to interpret sampling • The way we sample is reflected and corrected by how we weight the data in the end. • If you looked only at the parts we sampled, you wouldn’t get an accurate picture. • All the parts would be there but not in the right proportions. • The design weights compensate for the known distortions. • The final weights include estimated distortions.
On what would you base the fundamental multivariate relationships in your model or analysis?
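A small numerical sketch of "all the parts are there but not in the right proportions". The two strata, their sampling rates and the counts are invented for illustration: stratum A (1,000 units) is sampled at 10% and stratum B (9,000 units) at 1%.

```python
def weighted_proportion(sample):
    """sample: list of (has_characteristic, weight) pairs."""
    total_weight = sum(w for _, w in sample)
    return sum(w for y, w in sample if y) / total_weight

# Stratum A: 100 sampled units, weight 10 each, 50 have the characteristic.
# Stratum B: 90 sampled units, weight 100 each, 9 have the characteristic.
sample = (
    [(True, 10.0)] * 50 + [(False, 10.0)] * 50
    + [(True, 100.0)] * 9 + [(False, 100.0)] * 81
)

unweighted = sum(1 for y, _ in sample if y) / len(sample)
weighted = weighted_proportion(sample)
```

The unweighted proportion is dominated by the over-sampled stratum A; applying the weights restores each stratum to its share of the population.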
Steps to calculate the weights – Basic overview • At the survey design stage, some factors are used to determine the sample size required • Probability of selection calculated • First series of adjustments for non-response • Post-stratification
Factors to determine the sample size • Characteristics to be estimated (small proportions) • Required precision of the estimates (targeted CV) • Variability of the data • Expected non-response rate • Size of the population
Original design weight • Once the sample is selected in each stratum, calculate the original weight: • Nh/nh, where h is the stratum, Nh is the number of units in the population of stratum h and nh is the number of units sampled from it • Since the sample is selected from the Labour Force Survey (LFS), the original weight is taken from the LFS. • Adjustments for the number of available children.
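How these factors interact can be sketched with the textbook sample-size formula for a proportion under simple random sampling. This is an assumption made for illustration, not the NLSCY's actual calculation; the function name, the `design_effect` parameter and the numbers are hypothetical.

```python
import math

def required_sample_size(p, target_cv, population_size,
                         expected_response_rate, design_effect=1.0):
    """Approximate sample size needed to estimate a proportion p.

    Uses CV^2 = (1 - p) / (n * p) under simple random sampling, then applies
    a finite population correction and inflates for expected non-response.
    """
    n0 = design_effect * (1 - p) / (p * target_cv ** 2)
    n = n0 / (1 + n0 / population_size)          # finite population correction
    return math.ceil(n / expected_response_rate)  # units to select

# Invented inputs: 10% proportion, 10% target CV, 80% expected response rate
n_needed = required_sample_size(0.10, 0.10, 100_000, 0.80)
n_rare = required_sample_size(0.05, 0.10, 100_000, 0.80)
```

The comparison of the two calls shows why small proportions drive the sample size: halving the proportion roughly doubles the number of units required for the same CV.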
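A minimal sketch of the Nh/nh calculation for a plain stratified sample. The stratum names and counts are invented, and the sketch ignores the fact that NLSCY weights are actually derived from the LFS weights and adjusted for the number of available children.

```python
def design_weights(population_counts, sample_counts):
    """Return {stratum: N_h / n_h}, the inverse of the sampling fraction."""
    return {h: population_counts[h] / sample_counts[h] for h in sample_counts}

# Invented counts: a small stratum sampled at 10%, a large one at 1%
N = {"small province": 2_000, "large province": 150_000}
n = {"small province": 200, "large province": 1_500}

w = design_weights(N, n)

# Each sampled unit "represents" w[h] units, so the weights add up to N_h
estimated_population = sum(w[h] * n[h] for h in n)
```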
Non-response adjustment • Adjustments must be made to take into account the total non-response • Characteristics of respondents vs non-respondents are analyzed: • Province, income, level of education of parents, depression scale of PMK, urban/rural, etc.
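A common way to adjust for total non-response is to inflate respondent weights within weighting classes. The sketch below assumes that approach; the class labels and units are invented, and the classes used in the NLSCY are built from characteristics such as those listed above.

```python
from collections import defaultdict

def nonresponse_adjustment(units):
    """units: list of dicts with keys 'cls', 'weight' and 'responded'.

    Returns {cls: factor}, where factor = weighted count of all sampled units
    divided by weighted count of respondents in that class. Every class must
    contain at least one respondent.
    """
    total = defaultdict(float)
    responded = defaultdict(float)
    for u in units:
        total[u["cls"]] += u["weight"]
        if u["responded"]:
            responded[u["cls"]] += u["weight"]
    return {c: total[c] / responded[c] for c in total}

# Invented sample: 2 of 4 respond in one class, 4 of 5 in the other
units = (
    [{"cls": "low income", "weight": 10.0, "responded": r}
     for r in (True, True, False, False)]
    + [{"cls": "high income", "weight": 10.0, "responded": r}
       for r in (True, True, True, True, False)]
)
factors = nonresponse_adjustment(units)
```

Respondents in the class with more non-response receive a larger factor, so that they also stand in for the non-respondents who resemble them.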
Post-stratification • Adjustment factor calculated in order to post-stratify the sample to known population counts, by: • Province, age, gender
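Post-stratification can be sketched as a ratio of known population counts to the weighted sample counts in each post-stratum. The post-stratum labels, weights and population counts below are invented for illustration.

```python
from collections import defaultdict

def poststratification_factors(respondents, known_counts):
    """respondents: list of (post_stratum, weight) pairs.
    known_counts: {post_stratum: known population count}.

    Returns {post_stratum: known count / sum of weights in the post-stratum}.
    """
    weighted = defaultdict(float)
    for stratum, weight in respondents:
        weighted[stratum] += weight
    return {s: known_counts[s] / weighted[s] for s in weighted}

# Invented respondents, identified by province, age group and gender
respondents = [
    ("NS, 0-1, girls", 400.0), ("NS, 0-1, girls", 400.0),
    ("NS, 0-1, boys", 500.0), ("NS, 0-1, boys", 750.0),
]
known = {"NS, 0-1, girls": 1_000, "NS, 0-1, boys": 1_000}

factors = poststratification_factors(respondents, known)
```

After multiplying each weight by its factor, the weights in every post-stratum add up to the known population count.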
Final weight • Wf = Wi × Adj1 × Adj2 • Where • Wf: final weight • Wi: initial weight • Adj1: non-response adjustment • Adj2: post-stratification adjustment
Link between analysis and the sample design (weight) • [Diagram labels: child’s ability; intelligence; grade level; social environment; teachers; school; materials; subject; curriculum; province] • The proportion of children in the sample being taught the PEI curriculum is much larger than what is found in the population. • Province is a stratum.
Link between analysis and the sample design • There are very few things in a child’s life that are not related to where they live. • In the city versus in a small village • In a small province versus a large one • what social/educational programs are offered • what social support and services are offered • regional cultural differences • to name a few…
Weights for cycle 4 • Cross-sectional weights • Longitudinal weights, including the converted respondents. • Longitudinal weights for children introduced in Cycle 1 who responded in all cycles (new). • Not to mention the bootstrap weights, which are used for an entirely different purpose.
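A worked example of the final-weight formula, with invented numbers: a child selected with an initial weight of 100, in a class where the non-response adjustment is 1.25, and in a post-stratum whose adjustment is 0.96.

```python
def final_weight(initial_weight, nonresponse_adj, poststrat_adj):
    """Wf = Wi x Adj1 x Adj2."""
    return initial_weight * nonresponse_adj * poststrat_adj

# 100 x 1.25 = 125 after the non-response adjustment,
# then 125 x 0.96 = 120 after post-stratification
wf = final_weight(100.0, 1.25, 0.96)
```

In this example the child ends up representing about 120 children in the population rather than the 100 implied by the design alone.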
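Bootstrap weights are typically used for variance estimation: the estimate is recomputed with each set of bootstrap weights, and the spread of those replicate estimates gives the variance. The sketch below assumes the common convention of averaging squared deviations from the full-sample estimate; the data and the four bootstrap weight sets are invented (real files carry many more sets).

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_se(values, final_weights, bootstrap_weight_sets):
    """Return (estimate, standard error).

    The estimate uses the final weights; the standard error is the root mean
    squared deviation of the replicate estimates from that estimate.
    """
    theta = weighted_mean(values, final_weights)
    replicates = [weighted_mean(values, bw) for bw in bootstrap_weight_sets]
    variance = sum((r - theta) ** 2 for r in replicates) / len(replicates)
    return theta, math.sqrt(variance)

# Invented data: 4 children, 4 sets of bootstrap weights
values = [1.0, 2.0, 3.0, 4.0]
final_w = [10.0, 10.0, 10.0, 10.0]
boot_w = [
    [20.0, 0.0, 10.0, 10.0],
    [0.0, 20.0, 10.0, 10.0],
    [10.0, 10.0, 20.0, 0.0],
    [10.0, 10.0, 0.0, 20.0],
]
estimate, se = bootstrap_se(values, final_w, boot_w)
```

The point estimate always comes from the cross-sectional or longitudinal weight; the bootstrap weights only describe how much that estimate would vary under the complex design.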
Cross-sectional Weights • Available for all cycles, up to Cycle 4. • When are they used? • Cycle 4 cross-sectional weights: • to represent the population aged 0-17 in 2000-01. • … • Cycle 1 weights: • to represent the population aged 0-11 in 1994-95.
Cross-sectional Weights - Cycle 4 - Warning • In Cycle 4, children with a cross-sectional weight come from 4 different cohorts (introduced in 1994, 1996, 1998 and 2000). • By 2000, the 1994 cohort has been around for 6 years: • cross-sectional representativity decreases over time because of sample erosion and population change (immigration).
Cross-sectional Weights - Cycle 5 • For Cycle 5 (2002-2003), there are no children aged 6 and 7 in the sample. • In addition, the 1994 cohort’s cross-sectional representativity has declined even further (erosion and immigration). • As a result, cross-sectional weights will be calculated only for children aged 0-5.
Cross-sectional weights in a nutshell • Cross-sectional weights must be used when the analysis concerns a specific year, that is, when you want a snapshot of the situation at a specific point in time.
Longitudinal Weights • Longitudinal weights represent the population of children at the time they were brought into the survey. • Children introduced in Cycle 1: longitudinal weights represent the population of children aged 0-11 in 1994-95.
Longitudinal Weights (continued) • Children introduced in Cycle 2: longitudinal weights represent the population of children aged 0-1 in 1996-97. • Children introduced in Cycle 3: longitudinal weights represent the population of children aged 0-1 in 1998-99. • Children introduced in Cycle 4: longitudinal weights represent the population of children aged 0-1 in 2000-01.
When are longitudinal weights used? • When you want to track a cohort of children introduced in a particular cycle and see how they’ve developed over time.
Longitudinal Weights - Cycle 4 • Something new in Cycle 4: • 2 sets of longitudinal weights: • Set 1: Weights for children who responded in their first cycle and in Cycle 4 (possible non-response in Cycle 2 or 3) • Set 2 (new): Weights for those introduced in Cycle 1 who responded in every cycle.
Longitudinal Weights - Cycle 4 • Difference between the 2 sets of longitudinal weights • To avoid total non-response in Cycle 2 or 3, the set of weights for those who responded throughout can be used. • If you’re only interested in the changes between Cycle 1 and Cycle 4 directly, the longitudinal weights including converted respondents can be used.