
Multilevel modelling: general ideas and uses


Presentation Transcript


  1. Multilevel modelling: general ideas and uses 30.5.2017 Kari Nissinen Finnish Institute for Educational Research

  2. Hierarchical data • Data in question is organized in a hierarchical / multilevel manner • Units at the lower level (1-5) are arranged into higher-level units (A, B)

  3. Hierarchical data • Examples • Students within classes within schools • Employees within workplaces • Partners in couples • Residents within neighbourhoods • Nestlings within broods within populations… • Repeated measures within individuals

  4. Hierarchical data • The key issue is clustering • Lower-level units within an upper-level unit tend to be more homogeneous than two arbitrary lower-level units • E.g. students within a class: intra-cluster correlation ICC (positive) • Repeated measures: autocorrelation (usually positive)

  5. Hierarchical data • Clustering => lower-level units are not independent • In cross-sectional studies this is a problem • Two correlated observations provide less information than two independent observations (partial ’overlap’) • Effective sample size smaller than nominal sample size => statistical inference falsely powerful

  6. Clustering in cross-sectional studies • Basic statistical methods do not recognize the dependence of observations • Standard errors (variances) underestimated => confidence intervals too short, statistical tests too significant • Special methodology needed for correct variances… • Design-based approaches (variance estimation in a cluster sampling framework) • Model-based approaches: multilevel models
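
As a small illustration of the design-based route, a cluster-robust covariance can be requested when fitting an ordinary regression in statsmodels; this is only a sketch, and the DataFrame df with columns y, x and school is a hypothetical example, not part of the slides.

```python
import statsmodels.formula.api as smf

# Design-based correction: fit OLS as usual, but ask for standard errors
# that account for the clustering of students within schools.
# Assumes a hypothetical pandas DataFrame df with columns y, x, school.
ols_fit = smf.ols("y ~ x", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school"]}
)
print(ols_fit.bse)  # cluster-robust standard errors
```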

  7. Clustering in cross-sectional studies • Measure of ’inference error’ due to clustering: design effect (DEFF) = ratio of the correct variance to the underestimated variance (no clustering assumed) • A function of the ratio of nominal sample size to effective sample size and/or the homogeneity within clusters (ICC)
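
As a worked illustration, the design effect is often approximated by the Kish formula DEFF = 1 + (m - 1)·ICC for clusters of equal size m; the numbers below are purely illustrative.

```python
# Design effect and effective sample size under the Kish approximation,
# assuming equal cluster sizes.
def design_effect(cluster_size: float, icc: float) -> float:
    return 1.0 + (cluster_size - 1.0) * icc

def effective_sample_size(n_nominal: int, cluster_size: float, icc: float) -> float:
    return n_nominal / design_effect(cluster_size, icc)

# Example: 100 classes of 20 students each, ICC = 0.15
print(design_effect(20, 0.15))                # 3.85
print(effective_sample_size(2000, 20, 0.15))  # about 519.5
```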

  8. Hierarchical data • Hierarchy is a property of the population, which can carry over into the sample data • Cluster sampling: hierarchy is explicitly present in the data collection => the data possess exactly the same hierarchy (and possible clustering) • Simple random sampling (etc.): clustering may or may not appear in the data • It is present but hidden, may be difficult to identify • Effect may be negligible

  9. Hierarchical data • Hierarchy does not always lead to clustering: units within a cluster can be uncorrelated • The other side of the coin is heterogeneity between upper-level units: if there is no heterogeneity, then there is no homogeneity among lower-level units • Zero ICC => no need for special methodology • Clustering can affect some target variables, but not others

  10. Longitudinal data • Clustering = measurements on an individual are not independent • When analyzing change this is a benefit • Each unit serves as its own ’control unit’ (’block design’) => ’true’ change • Autocorrelation ’carries’ this link from one time point to another • Appropriate methods utilize this correlation => powerful statistical inference

  11. Mixed models • An approach for handling hierarchical / clustered / correlated data • Typically regression or ANOVA models containing effects of explanatory variables, which can be (i) fixed, (ii) random or (iii) both • Linear mixed models: error distribution normal (Gaussian) • Generalized linear mixed models: error distribution binomial, Poisson, gamma, etc.

  12. Mixed models • Variance component models • Random coefficient regression models • Multilevel models • Hierarchical (generalized) linear models • All these are special cases of mixed models • Similar estimation procedures (maximum likelihood & its variants), etc.

  13. Fixed vs random effects • 1-way ANOVA fixed effects model Y(ij) = μ + α(i) + e(ij) • μ = fixed intercept, grand mean • α(i) = fixed effect of group i • e(ij) = random error (’random effect’) of unit ij • random, because it is drawn from a population • it has a probability distribution (often N(0,σ²))
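
As a minimal sketch of this fixed effects model, it can be fitted by ordinary least squares with dummy-coded groups; the toy data below are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: observations y in three groups A, B, C.
df = pd.DataFrame({
    "group": list("AAABBBCCC"),
    "y":     [5.1, 4.8, 5.4, 6.2, 6.0, 6.5, 4.1, 3.9, 4.3],
})

# Fixed effects model Y(ij) = mu + alpha(i) + e(ij):
# C(group) gives each group its own fixed mean shift (dummy coding).
fixed_fit = smf.ols("y ~ C(group)", data=df).fit()
print(fixed_fit.params)  # intercept (mu + alpha_A) and group contrasts
```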

  14. Fixed vs random effects • Fixed effects determine the means of observations: E(Y(ij)) = μ + α(i), since E(e(ij)) = 0 • Random effects determine the variances (& covariances/correlations) of observations: Var(Y(ij)) = Var(e(ij)) = σ²

  15. Fixed vs random effects • 1-way ANOVA random effects model Y(ij) = μ + u(i) + e(ij) • μ = fixed intercept, grand mean • u(i) = random effect of group i • random when the group is drawn from a population of groups • has a probability distribution N(0,σ(u)²) • e(ij) = random error (’random effect’) of unit ij

  16. Fixed vs random effects • Now the mean of observations is just E(Y(ij)) = μ • Variance is Var(Y(ij)) = Var(u(i) + e(ij)) = σ(u)² + σ² • Sum of two variance components => variance component model

  17. Random effects and clustering • Random group => units ij and ik within group i are correlated: Cov(Y(ij),Y(ik)) = Cov(u(i) + e(ij), u(i) + e(ik)) = Cov(u(i), u(i)) = σ(u)² • Positive intra-cluster correlation ICC = Cov(Y(ij),Y(ik)) / Var(Y(ij)) = σ(u)² / (σ(u)² + σ²)
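
A minimal sketch of fitting this variance component model and reading off the ICC, using statsmodels' MixedLM on simulated data; the data and parameter values are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate two-level data: 30 groups of 20 units each,
# Y(ij) = mu + u(i) + e(ij) with Var(u) = 4, Var(e) = 12 (true ICC = 0.25).
rng = np.random.default_rng(1)
n_groups, n_per = 30, 20
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0.0, 2.0, n_groups)                    # random group effects
e = rng.normal(0.0, np.sqrt(12.0), n_groups * n_per)  # residual errors
df = pd.DataFrame({"group": group, "y": 10.0 + u[group] + e})

# Random intercept (variance component) model.
fit = smf.mixedlm("y ~ 1", data=df, groups=df["group"]).fit()
var_u = float(fit.cov_re.iloc[0, 0])  # between-group variance sigma_u^2
var_e = fit.scale                     # residual variance sigma^2
print("ICC =", var_u / (var_u + var_e))
```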

  18. Mixed model • Contains both fixed and random effects, e.g. Y(ij) = μ + βX(ij) + u(i) + e(ij) • i = school, j = student • μ = fixed intercept • β = fixed regression coefficient • u(i) = random school effect (’school intercept’) • e(ij) = random error of student j in school i

  19. Mixed model Y(ij) = μ + βX(ij) + u(i) + e(ij) • The mean of Y is modelled as a function of the explanatory variable X through the fixed parameters μ and β • The variance of Y and the within-cluster covariance (ICC) are modelled through the random effects u (’level 2’) and e (’level 1’) • This is the general idea; it extends in versatile ways

  20. Regression lines in variance component model: high ICC

  21. Regression lines in variance component model: low ICC

  22. An extension: random coefficient regression Y(ij) = μ + βX(ij) + u(i) + v(i)X(ij) + e(ij) • v(i) = random school slope • The regression coefficient of X varies between schools: β + v(i) • A ’side effect’: the variance of Y varies along with X • one possible way to model unequal variances (as a function of X)
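
A minimal sketch of a random coefficient regression in statsmodels: the random slope for X is added through the re_formula argument; the DataFrame df with columns y, x and school is hypothetical.

```python
import statsmodels.formula.api as smf

# Random intercept u(i) and random slope v(i) for x within each school.
# Assumes a hypothetical pandas DataFrame df with columns y, x, school.
rc_fit = smf.mixedlm("y ~ x", data=df, groups=df["school"],
                     re_formula="~x").fit()
print(rc_fit.cov_re)  # 2x2 covariance matrix of random intercept and slope
```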

  23. Random coefficient regression

  24. Regression for repeated measures data Y(it) = μ(t) + βX(it) + e(it) • t = time, μ(t) = intercept at time t • i = individual • The errors e(it) of individual i are correlated: different (auto)correlation structures (e.g. AR(1)) can be fitted, as well as different variance structures (unequal variances)
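
A hedged sketch of one way to allow an AR(1)-type correlation between repeated measures in Python: statsmodels' GEE accepts an autoregressive working correlation structure (a marginal-model alternative rather than a mixed model); the long-format DataFrame df_long and its columns are hypothetical.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Y(it) = mu(t) + beta*X(it) + e(it): C(time) gives a separate intercept
# per time point; the working correlation of e(it) within an individual
# is autoregressive. Assumes a hypothetical long-format DataFrame df_long
# with columns y, x, time, id.
gee_fit = smf.gee("y ~ C(time) + x", groups="id", data=df_long,
                  time=df_long["time"],
                  cov_struct=sm.cov_struct.Autoregressive()).fit()
print(gee_fit.summary())
```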

  25. Thanks!
