Statistics and Quantitative Analysis U4320 Lecture 11: Path Diagrams Prof. Sharyn O’Halloran
Key Points • Slope Coefficient as a Multiplication Factor • Path Diagram and Causal Models • Direct and Indirect Effects
Regression Coefficients as Multiplication Factors • I. Regression Coefficients as Multiplication Factors • A. Simple Regression • 1. Basic Equation • Remember that our basic one-variable regression equation is: Y = a + bX • b is the slope of the regression line. It represents the change in Y corresponding to a unit change in X.
Regression Coefficients as Multiplication Factors (cont.) • 2. Multiplication Factor • We can also think of b as a multiplication factor. • 3. Example • Take the first fertilizer equation: • Say we add 5 more pounds of fertilizer. Then the change in yield according to this equation will be:
Regression Coefficients as Multiplication Factors (cont.) • B. Multiple Regression: "Other Things Being Equal" • Now consider the multiple regression equation: • We can still think of the slopes as multiplication factors. • But now they are multiplication factors if we change only one variable and keep all others constant.
Regression Coefficients as Multiplication Factors (cont.) • Say we change X1 to (X1 + ΔX1) • Then we can write: • If X1 changes while all others remain constant, then change in Y = b1(change in X1)
Regression Coefficients as Multiplication Factors (cont.) • C. Examples • Let's try an example. • Say we have the following single and multiple regression equations:
Regression Coefficients as Multiplication Factors (cont.) • 1. What will be the change in yield if a farmer adds another 100 pounds of fertilizer? • Answer: Only the fertilizer will change, not the rain. So use the multiple regression equation: ΔY = b1ΔX1 = (.038)(100) = 3.8 bushels
Regression Coefficients as Multiplication Factors (cont.) • 2. What will be the change in yield if a farmer irrigates his fields with 3 inches of water? • Answer: Only the amount of water will change, not the fertilizer. So use the multiple regression equation: ΔY = b2ΔX2 = (.83)(3) ≈ 2.5 bushels
Regression Coefficients as Multiplication Factors (cont.) • 3. Say the farmer adds both 100 pounds of fertilizer and 3 inches of irrigation. Now what will the difference in yield be? • Answer: The change in yield will reflect the changes in both independent variables: ΔY = b1ΔX1 + b2ΔX2 = (.038)(100) + (.83)(3) = 3.8 + 2.5 = 6.3 bushels
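The three answers above can be checked with a few lines of Python. This is only a sketch: the two slopes (.038 and .83) are the ones used in the worked answers, while the function and variable names are made up for illustration.

```python
# Slopes from the multiple regression used in the examples above.
B_FERTILIZER = 0.038  # bushels per extra pound of fertilizer, rain held constant
B_RAIN = 0.83         # bushels per extra inch of water, fertilizer held constant


def change_in_yield(d_fertilizer=0.0, d_rain=0.0):
    """Predicted change in yield: each slope times the change in its own variable."""
    return B_FERTILIZER * d_fertilizer + B_RAIN * d_rain


fertilizer_only = change_in_yield(d_fertilizer=100)
irrigation_only = change_in_yield(d_rain=3)
both = change_in_yield(d_fertilizer=100, d_rain=3)

print(round(fertilizer_only, 2), round(irrigation_only, 2), round(both, 2))
```

The unrounded irrigation effect is 2.49 bushels, which the slides round to 2.5; the combined effect is 6.29, rounded to 6.3.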
Regression Coefficients as Multiplication Factors (cont.) • 4. Now say that we know the rainfall has increased 3 inches and we know that fertilizer is not necessarily held constant. Now what would your best guess be as to the difference in yield?
Regression Coefficients as Multiplication Factors (cont.) • Answer: Since fertilizer is not held constant, we should use the simple regression equation: ΔY = bΔX = (1.5)(3) = 4.5 bushels. • What we want to do is develop a technique that allows us to disaggregate the effects caused directly by the increase in rainfall and indirectly by other factors.
Path Analysis • II. Path Analysis • A. Fiji Women • Say we have data on 4700 women from Fiji. • 1. Basic Model • We know for each woman: • Age • Years of education, and • Number of children
Path Analysis (cont.) • a. Path Diagram • We might think that a woman's age and education correlate with how many children she has. • We can write a causal model that looks like this:
Path Analysis (cont.) • b. Estimates • When we estimate these relationships, we get the results: CHILDREN = 3.4 + .059 AGE - .16 EDUC • We can represent these results as follows:
Path Analysis (cont.) • 2. Additional Effects within the Model: Direct and Indirect Now, let's say we think there might also be a relationship between a woman's age and education. • a. Estimated Equation • If we estimate this regression, we get the result: EDUC = 7.6 - .032 AGE. • Older women have less education than younger women.
Path Analysis (cont.) • b. Path Diagram • We now add this new information into the causal model:
Path Analysis (cont.) • Question: • 1. What is the change in the expected number of children due to 1 extra year of age, holding education constant? • 2. What is the change in the years of education from this same 1 extra year of age?
Path Analysis (cont.) • 3. Direct and Indirect Effects Question: • What's the change in number of children from one extra year of age, letting education change too? • The change in age has two effects: a direct and an indirect effect.
Path Analysis (cont.) • a) Direct Effect (Multiple regression coefficient) • The direct effect is captured in the coefficient leading from AGE to CHILDREN. • This is the multiple regression coefficient (.059), and it represents the expected extra number of children from one extra year of age, holding education constant.
Path Analysis (cont.) • b) Indirect Effect • We know that an extra year corresponds with -.032 years of school. • Each extra year of school corresponds with -.16 extra children. • We get the indirect effect by multiplying along the arrows leading from AGE to CHILDREN through EDUC: (-.032) * (-.16) = + .005.
Path Analysis (cont.) • c) Total Effect • So the total effect of AGE on CHILDREN letting EDUC vary too is the sum of the direct and indirect effects. • That is, .059 + .005 = .064. • Question: • What do you think would have happened if we ran a simple regression of CHILDREN on AGE? What would the coefficient have been?
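The direct, indirect and total effects for the Fiji model can be reproduced as a short calculation. The three coefficients come from the two estimated equations above; the variable names are illustrative only.

```python
# Coefficients from the estimated equations:
#   CHILDREN = 3.4 + .059 AGE - .16 EDUC
#   EDUC     = 7.6 - .032 AGE
age_to_children = 0.059   # direct arrow AGE -> CHILDREN
age_to_educ = -0.032      # arrow AGE -> EDUC
educ_to_children = -0.16  # arrow EDUC -> CHILDREN

direct = age_to_children
indirect = age_to_educ * educ_to_children  # multiply along AGE -> EDUC -> CHILDREN
total = direct + indirect                  # add up the separate paths

print(f"direct={direct:.3f} indirect={indirect:.5f} total={total:.3f}")
```

Coefficients are multiplied along a path and the separate paths are then added; the two negative coefficients on the indirect path give a positive product.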
Path Analysis (cont.) • Summary • A path diagram gives us some insight as to the relationship between simple and multiple regression. • Multiple regression gives us the partial effects of the independent variables on the dependent variable holding all else constant. • Simple regression gives us the total effect, which is the sum of the direct and the indirect effects.
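The link between simple and multiple regression in this summary can be checked numerically. The sketch below uses synthetic data, not the Fiji survey: the generating coefficients are borrowed from the slides, while the sample size, age range, noise levels and random seed are arbitrary assumptions. It uses the closed-form least-squares formulas for one and two regressors so that it needs only the standard library.

```python
import random


def cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Synthetic data (assumption): NOT the real Fiji sample.
random.seed(0)
n = 500
age = [random.uniform(15, 50) for _ in range(n)]
educ = [7.6 - 0.032 * a + random.gauss(0, 2) for a in age]
children = [3.4 + 0.059 * a - 0.16 * e + random.gauss(0, 1)
            for a, e in zip(age, educ)]

s_aa, s_ee, s_ae = cross(age, age), cross(educ, educ), cross(age, educ)
s_ac, s_ec = cross(age, children), cross(educ, children)

# Multiple regression of CHILDREN on AGE and EDUC (two-regressor formulas).
det = s_aa * s_ee - s_ae ** 2
b_age = (s_ee * s_ac - s_ae * s_ec) / det   # direct effect of AGE
b_educ = (s_aa * s_ec - s_ae * s_ac) / det  # effect of EDUC

# Simple regressions: EDUC on AGE, and CHILDREN on AGE.
educ_on_age = s_ae / s_aa
simple_slope = s_ac / s_aa

total_effect = b_age + educ_on_age * b_educ  # direct + indirect
print(round(simple_slope, 6), round(total_effect, 6))
```

For least-squares estimates computed on the same sample, the two printed numbers agree up to floating-point error: the simple regression slope is the direct effect plus the indirect effect.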
Path Analysis (cont.) • B. Brady, Cooper and Hurley • 1. Defining Unity • Party unity scores are calculated as: (% voting in the majority - % voting in the minority) • 2. Building the Causal Model • Two components to party unity: internal and external factors. • So we can write a causal model like this:
Path Analysis (cont.) • External factors define how homogeneous the party's constituent base is. • Internal factors have to do with the strength of party leadership.
Path Analysis (cont.) • However, it is also thought that external factors influence internal factors. • That is, when legislators from a party are united on the issues, they are more likely to give their leaders power to get things done. • Thus we add another line to our model:
Path Analysis (cont.) • 3. Results • When this model was estimated, the results were: PARTY STRENGTH = .61 INTERNAL + .58 EXTERNAL; INTERNAL = .66 EXTERNAL. • 4. Question • What is the effect of External factors on Party Unity? • Direct Effect = 0.58 • Indirect Effect = (.66)*(.61) = 0.40 • Total Effect = .58 + .40 = .98
Path Analysis (cont.) • C. Commie Model from Shapiro • What determines people's attitudes towards whether communists should be allowed to teach college? • 1. How to Build a Causal Model • First of all, what constitutes a valid causal model? • For now, the answer is: no cycles. • That is, you shouldn't be able to start at a point and follow arrows and end up back at the same point.
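The "no cycles" rule can be checked mechanically. The sketch below is one possible way to do it, using a depth-first search over a list of (cause, effect) arrows; the function name is hypothetical, and the arrow from CHILDREN back to AGE in the second example is invented purely to show a violation.

```python
def has_cycle(arrows):
    """Return True if following arrows can lead from a variable back to itself."""
    children = {}
    for cause, effect in arrows:
        children.setdefault(cause, []).append(effect)
        children.setdefault(effect, [])

    state = {}  # variable -> "visiting" (on the current walk) or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True   # we came back to a variable already on this walk
        if state.get(node) == "done":
            return False  # already fully explored, no cycle through here
        state[node] = "visiting"
        found = any(visit(nxt) for nxt in children[node])
        state[node] = "done"
        return found

    return any(visit(node) for node in list(children))


# The Fiji model from earlier in the lecture: a valid causal model.
fiji = [("AGE", "EDUC"), ("AGE", "CHILDREN"), ("EDUC", "CHILDREN")]

# Hypothetical extra arrow that would make the model invalid.
broken = fiji + [("CHILDREN", "AGE")]

print(has_cycle(fiji), has_cycle(broken))
```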
Path Analysis (cont.) How to build a causal model? (cont.) • Second, once you have a causal model, how do you know which regressions to run? • For each variable, see what arrows are going into it. Then run a regression with those variables as the independent variables.
Path Analysis (cont.) • 2. Variables and the Causal Model • Our hypothesis is that attitudes towards teaching depend on attitudes towards communism in general, party ID, education, and age. • The full causal model can be written like this:
Path Analysis (cont.) • Variables: • Attitudes towards teaching are determined by all the other variables. • Attitude towards the communist system depends on party ID, education, and age. • Party ID depends on education and age. • Finally, education depends on age.
Path Analysis (cont.) • 3. Defining the Variables • First we make our own copies of all the variables. • 1. TeachCom is a dichotomous variable, coded 1 if the respondent thought it was OK for communists to teach college. • 2. Smarts is years of education. • 3. PartyOn is the respondent's party ID. 0 stands for strongly Democrat, up to 6 for strongly Republican.
Path Analysis (cont.) Variables (cont.) • 4. ComPhile is how you think about communism as a system of government. Higher values mean that it's a good system. • 5. Finally, Years is your age.
Path Analysis (cont.) • 4. Estimating the Model • a) Regression commands • How we specify our causal model determines what regression we run. • For instance, TeachCom has arrows going into it from all other variables, so we run the regression with all the variables. • Then we take ComPhile, and regress it on Years, Smarts and PartyOn. • And so on down the line.
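The rule "for each variable, regress it on the variables with arrows going into it" can be written out as a small script. The arrows below follow the description of the causal model given earlier; the function name and the printed formula notation are assumptions for illustration, not the regression commands used in the lecture.

```python
# Arrows (cause, effect) of the causal model as described in the slides.
arrows = [
    ("Years", "Smarts"),
    ("Years", "PartyOn"), ("Smarts", "PartyOn"),
    ("Years", "ComPhile"), ("Smarts", "ComPhile"), ("PartyOn", "ComPhile"),
    ("Years", "TeachCom"), ("Smarts", "TeachCom"),
    ("PartyOn", "TeachCom"), ("ComPhile", "TeachCom"),
]


def regressions_to_run(arrows):
    """Map each dependent variable to the variables whose arrows point into it."""
    incoming = {}
    for cause, effect in arrows:
        incoming.setdefault(effect, []).append(cause)
    return incoming


plan = regressions_to_run(arrows)
for dependent, independents in plan.items():
    print(f"{dependent} ~ {' + '.join(independents)}")
```

Years has no arrows going into it, so it never appears as a dependent variable; the model calls for four regressions in all.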
Path Analysis (cont.) • b) Descriptive Statistics We then report our descriptive statistics that we'll use: • "Means" gives the mean of each variable. • "Stddev" gives their standard deviation. • N gives the number of valid observations. • "Corr" gives the correlations between variables. • "Sig" tells us the significance of each correlation.
Path Analysis (cont.) • 5. Results • a) Means Table • Look at the means table.
Path Analysis (cont.) • b) Correlation Matrix • Next is the correlation matrix. • Smarts is negatively correlated with Years. That means that older people tend to have had fewer years of schooling. • PartyOn is negatively correlated with Years. Since higher PartyOn values are more Republican, older people tend to be more Democratic.
Path Analysis (cont.) • b) Correlation Matrix (cont.) • ComPhile and TeachCom are also negatively related to Years. • Older people tend to have more negative attitudes toward the communist system and to be against communists teaching college. • One-tailed p-values are reported beneath each correlation coefficient.
Path Analysis (cont.) • c) Regression Results
Path Analysis (cont.) • d) Question: • What is the effect of Years on TeachCom? • 1. Direct Effect = -.003 • 2. Indirect Effect via ComPhile: (-.006)(.16) = -.00096 • 3. Indirect Effect via PartyOn • PartyOn alone: (-.0047)(-.015) = .0000705 • PartyOn and ComPhile: (-.0047)(-.029)(.16) = .0000218
Path Analysis (cont.) • 4. Indirect Effect via Smarts • Smarts alone: (-.044)(.035) = -.00154 • Smarts and PartyOn: (-.044)(.053)(-.015) = .000035 • Smarts and ComPhile: (-.044)(.032)(.16) = -.000225 • Smarts, PartyOn and ComPhile: (-.044)(.053)(-.029)(.16) = .000011 • Total Indirect Effects = -.00259 • Total Effects = Direct + Indirect = -.003 - .0026 = -.0056
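With this many paths it is easy to miss one, so the bookkeeping can be automated. The sketch below enumerates every directed path from Years to TeachCom, multiplies the coefficients along each path and adds the products. The coefficients are read off the products in the worked answer, taking the ComPhile to TeachCom coefficient as .16 throughout; the function and variable names are hypothetical.

```python
# Path coefficients (cause, effect) -> slope, read off the worked answer.
coef = {
    ("Years", "TeachCom"): -0.003,
    ("Years", "ComPhile"): -0.006,
    ("Years", "PartyOn"): -0.0047,
    ("Years", "Smarts"): -0.044,
    ("Smarts", "TeachCom"): 0.035,
    ("Smarts", "PartyOn"): 0.053,
    ("Smarts", "ComPhile"): 0.032,
    ("PartyOn", "TeachCom"): -0.015,
    ("PartyOn", "ComPhile"): -0.029,
    ("ComPhile", "TeachCom"): 0.16,
}


def path_products(start, end, coef):
    """Yield (path, product of coefficients) for every directed path start -> end."""
    def walk(node, path, product):
        if node == end:
            yield path, product
            return
        for (cause, effect), b in coef.items():
            if cause == node:
                yield from walk(effect, path + [effect], product * b)

    yield from walk(start, [start], 1.0)


paths = list(path_products("Years", "TeachCom", coef))
direct = sum(p for path, p in paths if len(path) == 2)
indirect = sum(p for path, p in paths if len(path) > 2)
total = direct + indirect

for path, p in paths:
    print(" -> ".join(path), f"{p:.7f}")
print(f"direct={direct:.4f} indirect={indirect:.5f} total={total:.4f}")
```

The recursion terminates only because the model has no cycles; on a diagram with a cycle it would loop forever, which is one practical reason for the "no cycles" rule.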
Homework • III. Homework • A. Recap • Your homework assignment is to write a Path Diagram. • B. Issues in the Article • There is a dispute between American and European researchers on the effectiveness of AZT. Americans say that it works, and Europeans say that there's not enough evidence.
Homework • 1. The U.S. View • The U.S. allowed AZT to be distributed to HIV-positive individuals on the basis of a study completed in 1989. • Usually the FDA requires that, to release a drug, the experimenters show: DRUG --------------> HEALTH • Instead of a direct link, researchers showed an indirect link: AZT --------------> MARKERS ---------------> HEALTH • If both of these correlations are positive, then so should be the total effect from AZT to health.
Homework • 2. The European View • The European researchers said that although it's true that AZT raised the level of CD-4 markers, these markers didn't indicate any long-term improvement in health. • So they say that the model looks like this: AZT --------------> MARKERS ---------------> HEALTH • If there's no link between CD-4 and health, then the overall link between AZT and health is also 0 on the basis of the information presented so far.
Homework • 3. How To Resolve the Dispute • What kind of evidence would they need to resolve this dispute? • First, they could do studies to show that AZT has a direct effect on health. These studies take longer, but their conclusions are more reliable since they show a direct link. • Or, they could find another marker. That is, another intermediate substance that AZT affects and that affects health.