Statistics and Quantitative Analysis U4320

Statistics and Quantitative Analysis U4320. Lecture 11 : Path Diagrams Prof. Sharyn O’Halloran. Key Points. Slope Coefficient as a Multiplication Factor Path Diagram and Causal Models Direct and Indirect Effects. Regression Coefficients as Multiplication Factors.

Presentation Transcript


  1. Statistics and Quantitative Analysis U4320 Lecture 11: Path Diagrams Prof. Sharyn O’Halloran

  2. Key Points • Slope Coefficient as a Multiplication Factor • Path Diagram and Causal Models • Direct and Indirect Effects

  3. Regression Coefficients as Multiplication Factors • I. Regression Coefficients as Multiplication Factors • A. Simple Regression • 1. Basic Equation • Remember our basic one variable regression equation is: • b is the slope of the regression line. It represents the change in Y corresponding to a unit change in X.

  4. Regression Coefficients as Multiplication Factors (cont.) • 2. Multiplication Factor • We can also think of b as a multiplication factor. • 3. Example • Take the first fertilizer equation: • Say we add 5 more pounds of fertilizer. Then the change in yield according to this equation will be:

  5. Regression Coefficients as Multiplication Factors (cont.) • B. Multiple Regression: "Other Things Being Equal" • Now consider the multiple regression equation: • We can still think of the slopes as multiplication factors. • But now they are multiplication factors if we change only one variable and keep all others constant.

  6. Regression Coefficients as Multiplication Factors (cont.) • Say we change X1 to (X1 + ΔX1) • Then we can write: • If X1 changes while all others remain constant, then change in Y = b1(change in X1)

  7. Regression Coefficients as Multiplication Factors (cont.) • C. Examples • Let's try an example. • Say we have the following single and multiple regression equations:

  8. Regression Coefficients as Multiplication Factors (cont.) • 1. What will be the change in yield if a farmer adds another 100 pounds of fertilizer? • Answer: Only the fertilizer will change, not the rain, so use the multiple regression equation: ΔY = b1 ΔX1 = (.038)(100) = 3.8 bushels

  9. Regression Coefficients as Multiplication Factors (cont.) • 2. What will be the change in yield if a farmer irrigates his fields with 3 inches of water? • Answer: Only the amount of water will change, not the fertilizer, so use the multiple regression equation: ΔY = b2 ΔX2 = (.83)(3) = 2.49, or about 2.5 bushels

  10. Regression Coefficients as Multiplication Factors (cont.) • 3. Say the farmer adds both 100 pounds of fertilizer and 3 inches of irrigation. Now what will the difference in yield be? • Answer: The change in yield will reflect the changes in both independent variables: ΔY = b1 ΔX1 + b2 ΔX2 = (.038)(100) + (.83)(3) = 3.8 + 2.5 = 6.3 bushels
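The three fertilizer and irrigation answers above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `change_in_y` and the dictionary layout are illustrative choices, not part of the lecture. The slopes .038 and .83 are the multiple regression coefficients quoted in the answers.

```python
# Multiple regression slopes quoted in the answers above
# (bushels per pound of fertilizer, bushels per inch of water).
SLOPES = {"fertilizer": 0.038, "rain": 0.83}


def change_in_y(changes, slopes=SLOPES):
    """Predicted change in yield: sum of slope * change over the variables that move.

    Variables left out of `changes` are treated as held constant.
    """
    return sum(slopes[name] * delta for name, delta in changes.items())


only_fertilizer = change_in_y({"fertilizer": 100})
only_rain = change_in_y({"rain": 3})
both = change_in_y({"fertilizer": 100, "rain": 3})

print(round(only_fertilizer, 2))  # 3.8
print(round(only_rain, 2))        # 2.49, which the slide rounds to 2.5
print(round(both, 2))             # 6.29, about 6.3
```

Leaving a variable out of the `changes` dictionary is the code equivalent of "other things being equal": its contribution to the change in Y is zero.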

  11. Regression Coefficients as Multiplication Factors (cont.) • 4. Now say that we know the rainfall has increased 3 inches and we know that fertilizer is not necessarily held constant. Now what would your best guess be as to the difference in yield?

  12. Regression Coefficients as Multiplication Factors (cont.) • Answer: Since fertilizer is not held constant, we should use the simple regression equation: ΔY = b ΔX = (1.5)(3) = 4.5 bushels. • What we want to do is develop a technique that allows us to disaggregate the effects caused directly by the increase in rainfall and indirectly by other factors.

  13. Path Analysis • II. Path Analysis • A. Fiji Women • Say we have data on 4700 women from Fiji. • 1. Basic Model • We know for each woman: • Age • Years of education, and • Number of children

  14. Path Analysis (cont.) • a. Path Diagram • We might think that a woman's age and education correlate with how many children she has. • We can write a causal model that looks like this:

  15. Path Analysis (cont.) • b. Estimates • When we estimate these relationships, we get the results: CHILDREN = 3.4 + .059 AGE - .16 EDUC • We can represent these results as follows:

  16. Path Analysis (cont.) • 2. Additional Effects within the Model: Direct and Indirect • Now, let's say we think there might also be a relationship between a woman's age and education. • a. Estimated Equation • If we estimate this regression, we get the result: EDUC = 7.6 - .032 AGE. • Older women have less education than younger women.

  17. Path Analysis (cont.) • b. Path Diagram • We now add this new information into the causal model:

  18. Path Analysis (cont.) • Question: • 1. What is the change in the expected number of children due to 1 extra year of age, holding education constant? • 2. What is the change in the years of education from this same 1 extra year of age?

  19. Path Analysis (cont.) • 3. Direct and Indirect Effects Question: • What's the change in number of children from one extra year of age, letting education change too? • The change in age has two effects: a direct and an indirect effect.

  20. Path Analysis (cont.) • a) Direct Effect (Multiple regression coefficient) • The direct effect is captured in the coefficient leading from AGE to CHILDREN. • This is the multiple regression coefficient, and it represents the expected extra number of children from one extra year of age, holding education constant.

  21. Path Analysis (cont.) • b) Indirect Effect • We know that an extra year of age corresponds with -.032 years of school, that is, .032 fewer years. • Each extra year of school corresponds with -.16 children, that is, .16 fewer children. • We get the indirect effect by multiplying along the arrows leading from AGE to CHILDREN through EDUC: (-.032) * (-.16) = +.005.

  22. Path Analysis (cont.) • c) Total Effect • So the total effect of AGE on CHILDREN letting EDUC vary too is the sum of the direct and indirect effects. • That is, .059 + .005 = .064. • Question: • What do you think would have happened if we ran a simple regression of CHILDREN on AGE? What would the coefficient have been?
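The direct, indirect and total effects of AGE on CHILDREN can be checked numerically. The sketch below uses only the coefficients from the two estimated equations in the slides; the variable names are illustrative.

```python
# Coefficients from the two estimated equations in the slides:
#   CHILDREN = 3.4 + .059 AGE - .16 EDUC
#   EDUC     = 7.6 - .032 AGE
age_to_children = 0.059   # direct path AGE -> CHILDREN
educ_to_children = -0.16  # path EDUC -> CHILDREN
age_to_educ = -0.032      # path AGE -> EDUC

direct = age_to_children
indirect = age_to_educ * educ_to_children  # multiply along AGE -> EDUC -> CHILDREN
total = direct + indirect

print(f"direct   = {direct:.3f}")    # 0.059
print(f"indirect = {indirect:.3f}")  # 0.005
print(f"total    = {total:.3f}")     # 0.064
```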

  23. Path Analysis (cont.) • Summary • A path diagram gives us some insight as to the relationship between simple and multiple regression. • Multiple regression gives us the partial effects of the independent variables on the dependent variable holding all else constant. • Simple regression gives us the total effect, which is the sum of the direct and the indirect effects.
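The claim that simple regression gives the total effect can be verified on any data set, because for least squares it is an algebraic identity: the simple slope of Y on X1 equals the direct effect plus the indirect effect through X2. The sketch below uses simulated data (not the Fiji survey) with made-up coefficients, and the helper `centered_cross` is a hypothetical name introduced only for this illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Simulated data (NOT the Fiji survey): x1 influences x2, and both influence y.
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
y = [2.0 * a - 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]


def centered_cross(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)

# Multiple regression of y on x1 and x2 (two-regressor normal equations).
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det   # direct effect of x1, holding x2 constant
b2 = (s11 * s2y - s12 * s1y) / det   # effect of x2, holding x1 constant
d = s12 / s11                        # simple regression slope of x2 on x1

simple_slope = s1y / s11             # simple regression slope of y on x1
total_effect = b1 + d * b2           # direct + indirect

print(simple_slope, total_effect)    # the two agree up to rounding error
```

Changing the seed or the simulated coefficients changes both numbers, but they stay equal to each other.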

  24. Path Analysis (cont.) • B. Brady, Cooper and Hurley • 1. Defining Unity • Party unity scores are calculated as: (% voting in the majority - % voting in the minority) • 2. Building the Causal Model • Two components to party unity: internal and external factors. • So we can write a causal model like this:

  25. Path Analysis (cont.) • External factors define how homogeneous the party's constituent base is. • Internal factors have to do with the strength of party leadership.

  26. Path Analysis (cont.) • However, it is also thought that external factors influence internal factors. • That is, when legislators from a party are united on the issues, they are more likely to give their leaders power to get things done. • Thus we add another line to our model:

  27. Path Analysis (cont.) • 3. Results • When this model was estimated, the results were: PARTY UNITY = .61 INTERNAL + .58 EXTERNAL; INTERNAL = .66 EXTERNAL. • 4. Question • What is the effect of External factors on Party Unity? • Direct Effect = .58 • Indirect Effect = (.66)*(.61) = .40 • Total Effect = .58 + .40 = .98

  28. Path Analysis (cont.) • C. Commie Model from Shapiro What determines people's attitudes towards whether communists should be allowed to teach college? • 1. How to Build a Causal Model • First of all, what constitutes a valid causal model? • For now, the answer is: no cycles. • That is, you shouldn't be able to start at a point and follow arrows and end up back at the same point.

  29. Path Analysis (cont.) How to build a causal model? (cont.) • Second, once you have a causal model, how do you know which regressions to run? • For each variable, see what arrows are going into it. Then run a regression with those variables as the independent variables.

  30. Path Analysis (cont.) • 2. Variables and the Causal Model • Our hypothesis is that attitudes towards teaching depend on attitudes towards communism in general, party ID, education, and age. • The full causal model can be written like this:

  31. Path Analysis (cont.) • Variables: • Attitudes towards teaching are determined by all the other variables. • Attitude towards the communist system depends on party ID, education, and age. • Party ID depends on education and age. • Finally, education depends on age.

  32. Path Analysis (cont.) • 3. Defining the Variables • First we make our own copies of all the variables. • 1. TeachCom is a dichotomous variable, coded 1 if the respondent thought it was OK for communists to teach college. • 2. Smarts is years of education. • 3. PartyOn is the respondent's party ID. 0 stands for strongly Democrat, up to 6 for strongly Republican.

  33. Path Analysis (cont.) Variables (cont.) • 4. ComPhile is how you think about communism as a system of government. Higher values mean that it's a good system. • 5. Finally, Years is your age.

  34. Path Analysis (cont.) • 4. Estimating the Model • a) Regression commands • How we specify our causal model determines what regression we run. • For instance, TeachCom has arrows going into it from all other variables, so we run the regression with all the variables. • Then we take ComPhile, and regress it on Years, Smarts and PartyOn. • And so on down the line.
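The two rules above (no cycles, and one regression per variable on whatever has arrows going into it) can be sketched in Python. The `parents` dictionary encodes the dependencies listed in the slides; the function name `regressions` is a hypothetical helper, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError

# For each variable, the variables with arrows pointing INTO it,
# following the dependencies listed in the slides.
parents = {
    "Years": [],
    "Smarts": ["Years"],
    "PartyOn": ["Years", "Smarts"],
    "ComPhile": ["Years", "Smarts", "PartyOn"],
    "TeachCom": ["Years", "Smarts", "PartyOn", "ComPhile"],
}


def regressions(parents):
    """Return (dependent, [independents]) pairs, refusing models that contain a cycle."""
    try:
        order = list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"not a valid causal model, cycle found: {err.args[1]}")
    # Variables with no incoming arrows are exogenous and need no regression.
    return [(var, parents[var]) for var in order if parents[var]]


plan = regressions(parents)
for dependent, independents in plan:
    print(f"regress {dependent} on {', '.join(independents)}")
```

Years has no arrows going into it, so it never appears as a dependent variable; the sketch lists four regressions, ending with TeachCom on all other variables.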

  35. Path Analysis (cont.) • b) Descriptive Statistics We then report our descriptive statistics that we'll use: • "Means" gives the mean of each variable. • "Stddev" gives their standard deviation. • N gives the number of valid observations. • "Corr" gives the correlations between variables. • "Sig" tells us the significance of each correlation.

  36. Path Analysis (cont.) • 5. Results • a) Means Table • Look at the means table.

  37. Path Analysis (cont.) • b) Correlation Matrix • Next is the correlation matrix. • Smarts is negatively correlated with Years. That means that older people tend to have had fewer years of schooling. • PartyOn is negatively correlated with Years. Since PartyOn runs from 0 (strongly Democrat) to 6 (strongly Republican), older people tend to be more Democratic.

  38. Path Analysis (cont.) • b) Correlation Matrix (cont.) • Comphile and Teachcom are also negatively related to Years. • Older people tend to have more negative attitudes toward the communist system and to be against communists teaching college. • One-tailed p-values are reported beneath each correlation coefficient.

  39. Path Analysis (cont.) • c) Regression Results

  40. Path Analysis (cont.) • d) Question: • What is the effect of Years on Teachcom? • 1. Direct Effect = -.003 • 2. Indirect Effect via Comphile: (-.006)(.16) = -.00096 • 3. Indirect Effect via Partyon • Partyon alone: (-.0047)(-.015) = .0000705 • Partyon and Comphile: (-.0047)(-.029)(.16) = .0000218

  41. Path Analysis (cont.) • 4. Indirect Effect via Smarts • Smarts alone: (-.044)(.035) = -.00154 • Smarts & Partyon: (-.044)(.053)(-.015) = .000035 • Smarts & Comphile: (-.044)(.032)(.16) = -.000225 • Smarts, Partyon & Comphile: (-.044)(.053)(-.029)(.16) = .000011 • Total Indirect Effects = -.00259 • Total Effects = Direct + Indirect = -.003 - .0026 = -.0056
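Multiplying along every path by hand is error-prone once a model has more than a few arrows. The sketch below enumerates all directed paths from Years to TeachCom and sums the products. The coefficients are the ones quoted in the calculations above; the `edges` layout and the function `path_effects` are illustrative assumptions, not the lecture's own code.

```python
# Path coefficients as quoted in the effect calculations above:
# edges[source][target] = regression coefficient on that arrow.
edges = {
    "Years": {"TeachCom": -0.003, "ComPhile": -0.006, "PartyOn": -0.0047, "Smarts": -0.044},
    "Smarts": {"TeachCom": 0.035, "PartyOn": 0.053, "ComPhile": 0.032},
    "PartyOn": {"TeachCom": -0.015, "ComPhile": -0.029},
    "ComPhile": {"TeachCom": 0.16},
}


def path_effects(edges, source, target):
    """Yield (path, product of coefficients) for every directed path source -> target.

    Assumes the model has no cycles, as required of a valid causal model.
    """
    def walk(node, path, product):
        if node == target:
            yield path, product
            return
        for nxt, coef in edges.get(node, {}).items():
            yield from walk(nxt, path + [nxt], product * coef)

    yield from walk(source, [source], 1.0)


effects = list(path_effects(edges, "Years", "TeachCom"))
direct = sum(p for path, p in effects if len(path) == 2)    # the single one-arrow path
indirect = sum(p for path, p in effects if len(path) > 2)   # every path through a mediator

for path, p in effects:
    print(" -> ".join(path), f"{p:+.6f}")
print(f"direct {direct:.4f}, indirect {indirect:.5f}, total {direct + indirect:.4f}")
```

The sketch finds eight paths (one direct, seven indirect) and reproduces the totals on the slide: indirect about -.00259 and total about -.0056.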

  42. Homework • III. Homework • A. Recap • Your homework assignment is to write a Path Diagram. • B. Issues in the Article • There is a dispute between American and European researchers on the effectiveness of AZT. Americans say that it works, and Europeans say that there's not enough evidence.

  43. Homework • 1. The U.S. View • The U.S. allowed AZT to be distributed to HIV-positive individuals on the basis of a study completed in 1989. • Usually, before releasing a drug, the FDA requires the experimenters to show: DRUG --------------> HEALTH • Instead of a direct link, researchers showed an indirect link. AZT --------------> MARKERS ---------------> HEALTH • If both of these links are positive, then so should be the total effect from AZT to health.

  44. Homework • 2. The European View • The European researchers said that although it's true that AZT raised the level of CD-4 markers, these markers didn't indicate any long-term improvement in health. • So they say that the model looks like this: AZT --------------> MARKERS ----- (no link) ----- HEALTH • If there's no link between CD-4 and health, then the overall link between AZT and health is also 0 on the basis of the information presented so far.

  45. Homework • 3. How To Resolve the Dispute • What kind of evidence would they need to resolve this dispute? • First, they could do studies to show that AZT has a direct effect on health. These studies take longer, but their conclusions are more reliable since they show a direct link. • Or, they could find another marker. That is, another intermediate substance that AZT affects and that affects health.
