## POLS 606

**POLS 606**
Hierarchical Models

**Intro**
• Who are you?
  • Fields
  • Substantive interests?
• I am Dave
  • American politics
  • Campaigns and elections

**Logistics**
• Book: Gelman and Hill
• Snijders and Bosker is recommended.
• Chapter 2 of G&H is up to you to master
• G&H don’t rely on matrix algebra to teach
  • Probability and simulation
  • Bayesian
  • R
• Not easier or harder, just different.

**Lectures**
• There will be a mix
  • Math (no PowerPoint)
  • There will be lectures using R
• Will tend to alternate

**R**
• Very powerful/flexible
• Wave of the future
• Need to understand stuff more
• I have never used it before so we will learn it together

**Grades**
• Homework
  • There will be a bunch and they will be a mix of practical and theory
• Final
  • Questions I would write for the methods exam
• Paper
  • Original research using HLM. Won’t be due until start of Fall Semester

**What is a multilevel model?**
• Theory tells you that concepts at more than one level of aggregation are related
• Usually thought of as geographic
  • Countries
  • States
  • Schools

**What is a multilevel model?**
• Doesn’t have to be
  • Time
  • Experimental Condition
  • Institutions
  • Regime
  • Bureaucracies
  • Individuals (panel data)

**Theory is key!**
• Two types of relationships
  • Random intercepts
    • Mean value of DV depends on aggregate unit
  • Random slopes
    • Effect of IV depends on aggregate unit
• Can have both

**So you have multilevel data**
• Choice 1: Aggregate
  • Combine data to the highest level of aggregation
  • Create “average” value of variables for each higher unit
• Advantage
  • Easy!
  • Can easily weight based on N
  • Straightforward

**Aggregate**
• Disadvantages
  • Shifts meaning
  • Variables are macro level. Theory is (presumably) micro level.
  • Ecological fallacy
    • Classic example: Race and literacy (Robinson 1950)

**Why different?**
• The key is the within region correlations

**Both individual and ecological correlation depend on this, but in different ways**
• Individual depends on the internal cells of the region table
  • Weighted average of the correlations within the regions
• Ecological depends on the marginals
  • Only the marginals: no use of info in the cells.

**So?**
• The things that go into the calculation of the ecological correlation do not tell us anything about what we are interested in.

**Some Math**
• Assume
  • Total group of N persons
  • Two variables x & y
  • N people divided into m groups.
  • X & Y are % of x & y in each of the m groups

**Three correlations**
• 1) Total individual correlation ($r$)
  • Correlation ignoring the grouping
• 2) Ecological correlation ($r_e$)
  • Correlation between the m pairs of X & Y (weighted by the n of each group)
• 3) Within area correlation ($r_w$)
  • This is the weighted average of the correlations within the m groups

**Two correlation ratios**
• $\eta_{XA}$ & $\eta_{YA}$
• Measure the degree of clustering of X & Y by area
• High $\eta_{XA}$ means wide variation in X across regions

**Math**
• Can write the relationship between the correlations as (Robinson 1950):

$$r_e = \frac{1}{\eta_{XA}\,\eta_{YA}}\, r \;-\; \frac{\sqrt{1-\eta_{XA}^2}\,\sqrt{1-\eta_{YA}^2}}{\eta_{XA}\,\eta_{YA}}\, r_w$$

**So?**
• $r_e$, then, is the weighted difference between the individual correlation (thing we care about) and the average of the m within area correlations, where the weights depend on clustering
• Bias is not innocuous. Correlations are inflated. $r_e$ will be larger in magnitude than $r$.
• Cannot infer across levels. Don’t do it. Won’t get away with it.

**Disaggregate**
• You could ignore the higher level of aggregation and pretend everything is observed at the individual level
• Advantage: Easy and generous
• Disadvantage
  • You are lying.
  • Overstate power
  • Ignores correlations in the errors (not iid)

**Dependence of errors**
• The problem is a function of the intraclass correlation
• Simple model: $Y_{ij} = \mu + U_j + R_{ij}$
  • $Y_{ij}$ is the DV
  • $\mu$ = Grand Mean
  • $U_j$ = Macro effects (errors)
  • $R_{ij}$ = Unit specific errors

**Intraclass correlation**
• Errors all mean 0
• Expected value of macro unit j is: $\mu + U_j$

**Intraclass correlation**
• It is the proportion of the variance in Y explained by the macro effects: $\rho = \mathrm{var}(U_j) / (\mathrm{var}(U_j) + \mathrm{var}(R_{ij}))$
• The key concept in HLM
• It is the degree of similarity of observations within the groups

**Intraclass Correlation**
• Note that it changes the error variance
• OLS assumes that errors are uncorrelated across observations.
• This says they aren’t.
  • Inflates power
  • Shrinks standard errors
• Macro variables will try to account for this

**Other solutions to multilevel data**
• Dummy variables
  • Doesn’t fix standard errors
  • Can’t specify interesting effects
• Clustering
  • Fixes errors but not all other problems.
  • Ignores any systematic problems and the theories associated

**Real Solution? HLM**
• Effects may vary (random slopes)
• Use all of the info available and use it accurately
• Better predictions
• Account for structure in data
• Efficiency
• Accurate standard errors

**How HLM? R!**
• R is a different kind of stats package
  • It is a language, not a program
  • Open source
  • http://cran.r-project.org/
• Problem is that it is not obviously user-friendly
  • No point and click front end embedded.
  • This can be addressed: R is adaptable

**R**
• The computer staff tells me it is installed and they will install it on your office machines
• Update by adding packages
  • Rcmdr: GUI front end
  • arm
  • BRugs
  • R2WinBUGS
  • car
  • foreign
  • DAAG
  • Matrix and lme4 if not installed automatically

**Packages**
• Packages are commands or sets of programs to do things.
• sessionInfo() tells you which are currently attached
• library("name")

**R**
• Need to load packages each time
• The basic starting place for R is the command prompt (>)
• R will take anything you type at this line as a command and will respond
• Load packages as library(arm)
• Can (and probably should) write a script to do it all

**R**
• If you just start typing stuff, R assumes you are telling it to evaluate a statement
  • 2+2
  • pi
  • Any math equation.
• R wants you to define “objects”
• Everything needs to be an object

**Commands**
• Basic format
  • object <- command(definition, option, option)
• Example: open data
  • kidiq <- read.dta(file="c:/R/kidiq.dta")
  • Reads the children’s IQ score data used in Chapter 2
  • “kidiq” names the object kidiq
  • “<-” tells R that you are going to give it a definition
  • “read.dta” is the command to read Stata data (from the foreign package)
  • “(file="c:/R/kidiq.dta")” tells it which data. Note / not \

**R**
• Random things about objects
  • Case sensitive
  • Can (and often do) have . in the name
  • Will remember that they are there
  • Can see objects by ls() command
  • <- defines (equivalent to =)
• Look at example 1

**R working directory and workspace**
• Each session has a working directory
  • Where R looks for files
  • If launched from the Windows icon, can define under properties (right click)
• getwd()
• ls()
• q()
  • Save workspace image?
  • Saves all objects in a .RData file

**Help!**
• help.start()
• help("name")

**Script**
• Can do line by line commands, but those are slow, temporary and error prone
• Better to use the script editor:
  • File -> New script
  • control+N
• Can save and re-load

**Missing data**
• R handles missing data
• Uses “NA”
• Will read in data and convert just fine

**Reading in data**
• kidiq <- read.dta(file="c:/R/kidiq.dta")
  • We have seen this before
• Attaching:
  • In commands you need to tell R which data you are using (in fact, you can have lots of data sets loaded at once).
  • fit <- lm(kid.score~mom.hs, data=kidiq)
  • To skip data=, the command is attach
  • attach(kidiq)
  • fit <- lm(kid.score~mom.hs)
  • detach(kidiq)

**Attach**
• R looks for things in a particular order
  • search()
• Attach moves stuff around in the order
• Order matters a lot: names of objects versus names of variables

**Rcmdr**
• Handy front end, point and click
• library(Rcmdr)
• Has a script window
• Nice, but don’t lean on it too hard
• Thinks it is smarter than you

**JGR (“Jaguar”)**
• Need to download and install it
• Probably need computer staff for machines
• Launches separate from R
• Package manager is very nice
• Runs separate version of R

**Graphics**
• R has wonderful graphics if you can get them to do what you want.
• demo(graphics)
• Starting point is plot()
  • plot(y~x)
  • plot(x, y)
• graphs.R

**Regression**
• You should know the basics and this should be review
• Data being used: kidiq (same as before)

**Regression**
• kid.score = a + b(mom.hs) + error

    lm(formula = kid.score ~ mom.hs)
                coef.est coef.se
    (Intercept)    77.55    2.06
    mom.hs         11.77    2.32
    n = 434, k = 2
    residual sd = 19.85, R-Squared = 0.06

• Interpret
  • 78 = E(kid.score) if mom.hs=0
  • 12 = expected difference in kid.score when mom.hs = 1 rather than 0

**Regression**
• kid.score = a + b(mom.iq) + error

    lm(formula = kid.score ~ mom.iq)
                coef.est coef.se
    (Intercept)    25.80    5.92
    mom.iq          0.61    0.06
    n = 434, k = 2
    residual sd = 18.27, R-Squared = 0.20

• Interpret
  • 26 = E(kid.score) when mom.iq=0
  • 0.61 = expected change in kid.score for every IQ point of mom

**Regression**
• Both predictors

    lm(formula = kid.score ~ mom.hs + mom.iq)
                coef.est coef.se
    (Intercept)    25.73    5.88
    mom.hs          5.95    2.21
    mom.iq          0.56    0.06
    n = 434, k = 3
    residual sd = 18.14, R-Squared = 0.21

• Interpret?

**Interactions**
• Remember, sometimes the effect of a variable is conditional on another variable
• In Stata you need to create the interaction; in R you can do it on the fly
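
The "on the fly" point can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data frame `sim` is simulated as a stand-in for kidiq (its coefficients, seed, and the helper column `hs.x.iq` are made up for the example), so the estimates it prints are not the ones on the slides. It compares a hand-built product term, as you would create in Stata, with the formula operators `:` and `*`.

```r
# Simulated stand-in for the kidiq data (hypothetical values, NOT the real data set)
set.seed(606)
n <- 434
sim <- data.frame(
  mom.hs = rbinom(n, 1, 0.8),              # 1 = mother finished high school
  mom.iq = rnorm(n, mean = 100, sd = 15)   # mother's IQ score
)
sim$kid.score <- 20 + 40 * sim$mom.hs + 0.9 * sim$mom.iq -
  0.4 * sim$mom.hs * sim$mom.iq + rnorm(n, sd = 18)

# Stata-style: build the product term by hand, then include it as a predictor
sim$hs.x.iq <- sim$mom.hs * sim$mom.iq
fit.manual <- lm(kid.score ~ mom.hs + mom.iq + hs.x.iq, data = sim)

# R-style: the formula creates the interaction on the fly with ":"
fit.fly <- lm(kid.score ~ mom.hs + mom.iq + mom.hs:mom.iq, data = sim)

# Shorthand: a*b in a formula expands to a + b + a:b
fit.star <- lm(kid.score ~ mom.hs * mom.iq, data = sim)

# All three fits estimate the same model; only the term label differs
print(coef(fit.manual))
print(coef(fit.fly))
print(coef(fit.star))
```

With the interaction in the model, the mom.iq coefficient is the slope for kids whose mothers did not finish high school (mom.hs = 0), and the mom.hs:mom.iq coefficient is how much that slope differs when mom.hs = 1.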