
Regression/Correlation

POLS 300 (Butz). Bivariate statistics: relationships between interval/ratio-level variables, the correlation coefficient (r), and regression analysis.



Presentation Transcript


  1. Regression/Correlation POLS 300 Butz

  2. Bivariate Statistics

  3. Bivariate Relationships Between Interval/Ratio Level Variables • Correlation Coefficient (r) • Regression Analysis

  4. Regression • Regression analysis is the standard procedure for exploring relationships and testing hypotheses with interval or ratio-level dependent and independent variables. • The Null and Research hypotheses are the same as we have discussed before.

  5. Regression • Bivariate Regression: one independent variable • Multiple Regression: two or more independent variables

  6. Regression • In general, the goal of linear regression is to find the line that best predicts Y from X. • Linear regression does this by estimating the line that minimizes the sum of squared errors from the line • That is, minimizing the squared vertical distances of the data points from the line.
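As a rough illustration (not from the original slides), the following Python sketch fits such a least-squares line to a small set of made-up data points; NumPy's degree-1 polyfit returns the slope and intercept that minimize the sum of squared vertical distances.

```python
import numpy as np

# Hypothetical data: five observations of X and Y (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Degree-1 polyfit returns the least-squares slope and intercept,
# i.e., the line minimizing the sum of squared vertical distances.
slope, intercept = np.polyfit(x, y, 1)
print(f"Best-fitting line: Y = {intercept:.2f} + {slope:.2f}X")
```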

  7. Regression • The purpose of regression analysis is to determine exactly what the line is (i.e. to estimate the equation for the line) • The regression line represents predicted values of Y based on the values of X

  8. Scatterplot • A scatterplot is a graphical display of the relationship between two quantitative variables. • Generally used to examine ratio- or interval-level variables

  9. Scatterplots • Allow you to see the relationship between two variables more clearly. • X-axis (horizontal line) represents the independent variable (IV). • Y-axis (vertical line) represents the dependent variable (DV). • Gives us an initial view of the direction and strength of the relationship. • Initial view of the “line of best fit”
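A minimal sketch (not part of the slides) of producing such a scatterplot in Python with matplotlib, using made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical IV and DV values (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable on the X-axis
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable on the Y-axis

plt.scatter(x, y)                          # one point per observation
plt.xlabel("X (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.show()
```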

  10. Scatterplots – to examine a relationship between X and Y [scatterplot with Y on the vertical axis and X on the horizontal axis]

  11. Scatterplots [scatterplot showing a single observation plotted as the point (Xi, Yi)]

  12. Positive Relationship [scatterplot illustrating a positive relationship between X and Y]

  13. Negative Relationship [scatterplot illustrating a negative relationship between X and Y]

  14. No Relationship (Independence) [scatterplot illustrating no relationship between X and Y]

  15. Regression Analysis • Regression is concerned with the dependence of one variable (the DV, measured at the interval/ratio level) on one or more other variables (IVs, measured at the interval, ratio, ordinal, or nominal levels). • Bivariate vs. multivariate regression analysis • Y is used as the dependent variable and X as the independent variable.

  16. Equation for a Line (Linear Relationship): Yi = a + BXi • a = Intercept, or Constant = the value of Y when X = 0 • B = Slope coefficient = the change (+ or −) in Y given a one-unit increase in X
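A worked example with hypothetical values: if a = 10 and B = 2, then an observation with X = 5 is predicted to have Y = 10 + 2(5) = 20, and each additional one-unit increase in X raises the predicted Y by 2.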

  17. Estimating the Regression Coefficients • Using statistical calculations, for any relationship between X and Y, we can determine the “best-fitting” line for the relationship • This means finding specific values for a and B for the regression equation E(Yi) = a + BXi

  18. Slope • Yi = a + BXi • B = Slope coefficient • If B is positive, you have a positive relationship; if it is negative, you have a negative relationship. • The larger the (absolute) value of B, the steeper the slope of the line: a greater (more dramatic) change in Y for a one-unit increase in X. • General interpretation: for a one-unit change in X, we expect a change of B in Y.

  19. Slope • The formula assumes a linear relationship. • We know not all relationships are linear. • This is why you need a scatterplot: to show whether the relationship is linear. • The relationship could be curvilinear (e.g., age and voting).

  20. Linear Equation for a Regression Model (with error): Yi = a + BXi + ei • Residual (ei) – for every observation, the difference between the observed value of Y and the regression line (predicted Y) • X will not predict Y perfectly every time; there will be some error, and the equation takes this error into account. • Linear regression assumes the errors are random and normally distributed, and thus “cancel out.”
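A minimal sketch (with made-up intercept, slope, and data, not taken from the slides) of how each residual is computed as observed Y minus the Y predicted by the line:

```python
import numpy as np

# Hypothetical intercept, slope, and data (illustration only)
a, b = 10.0, 2.0
x = np.array([1.0, 2.0, 3.0])
y_observed = np.array([12.5, 13.0, 17.2])

y_predicted = a + b * x                # values on the regression line
residuals = y_observed - y_predicted   # e_i = observed Y minus predicted Y
print(residuals)                       # 0.5, -1.0, 1.2
```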

  21. Regression • The most popular estimator among researchers doing empirical work. • Easy to use. • Easy to interpret. • The basic foundation for the more advanced estimators used in empirical work.

  22. Estimating the Regression Coefficients • Using statistical calculations, for any relationship between X and Y, we can determine the best-fitting line for the relationship • This means finding specific values for a and B in the regression equation Yi = a + BXi + ei

  23. Estimating the Regression Coefficients • Regression analysis finds the line that minimizes the sum of squared residuals • Minimizes the sum of squared errors Yi = a + bXi + ei
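For the bivariate case, the standard closed-form least-squares solution (textbook OLS, not reproduced from the slides) is b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and a = Ȳ − bX̄. A sketch in Python, with made-up data:

```python
import numpy as np

def ols_bivariate(x, y):
    # b = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    # a = y_bar - b * x_bar
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a, b = ols_bivariate(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
```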

  24. Calculating Predicted Values • We can calculate a predicted value for the dependent variable (Y) for any value of X by using the regression equation for the regression line: Yi = a + bXi

  25. Testing the “Threat” Hypothesis • Do states with a greater presence of a minority population (% African American) have less support for redistributive welfare policy, and lower monthly welfare benefit levels? • Negative relationship • Unit of analysis – states • N = 50

  26. Calculating Predicted Values for Y from a Regression Equation: “Threat Hypothesis” • The estimated regression equation is: E(welfare benefit 1995) = 422.7879 + [(-6.292) * %black 1995]

  Number of obs = 50    F(1, 64) = 76.651    Prob > F = 0.001    R-squared = 0.3361

  welfare1995    |     Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
  ---------------+--------------------------------------------------------------
  Black1995 (b)  |  -6.29211    .771099    -8.16    0.001     -8.1173    -4.0746
  _cons (a)      |  422.7879   12.63348    25.55    0.001   317.90407   336.6615

  27. Regression Example: “Threat Hypothesis” • To generate a predicted value for various % of African Americans in 1995, we simply plug in the appropriate X values and solve for Y.
  10%: E(welfare benefit 1995) = 422.7879 + [(-6.292) * 10] = $359.87
  20%: E(welfare benefit 1995) = 422.7879 + [(-6.292) * 20] = $296.95
  30%: E(welfare benefit 1995) = 422.7879 + [(-6.292) * 30] = $234.03
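These predicted values can be reproduced with a short Python sketch (the function name here is ours, for illustration):

```python
# Predicted 1995 welfare benefit from the estimated equation on the slide:
# E(benefit) = 422.7879 + (-6.292) * percent_black
def predicted_benefit(percent_black):
    return 422.7879 - 6.292 * percent_black

for pct in (10, 20, 30):
    print(f"{pct}% -> ${predicted_benefit(pct):.2f}")
# 10% -> $359.87, 20% -> $296.95, 30% -> $234.03
```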

  28. Regression Analysis and Statistical Significance • Testing for statistical significance of the slope • The p-value: the probability of observing a sample slope at least as large (in absolute value) as the one we estimated IF THE NULL HYPOTHESIS HOLDS TRUE • P-values closer to 0 suggest the null hypothesis is less likely to be true (.05 is usually the threshold for statistical significance)
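A sketch of where the slope's p-value comes from, using the welfare example's numbers and assuming the usual bivariate setup (t = slope divided by its standard error, with N − 2 degrees of freedom):

```python
from scipy import stats

t_stat = -6.29211 / 0.771099                # roughly -8.16
df = 50 - 2                                 # N = 50 states, two estimated coefficients
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
print(t_stat, p_value)                      # p is far below the .05 threshold
```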

  29. The Fit of the Regression Line • The R-squared = the proportion of variation in the dependent variable (Y) explained by the independent variable (X). • In bivariate regression analysis it is simply the square of the correlation coefficient (r)
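A quick check (with made-up data, not from the slides) that in the bivariate case R-squared is simply the square of Pearson's r:

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
print(r ** 2)                 # equals the bivariate regression's R-squared
```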

  30. Summary of Regression Statistics • Intercept (a) • Slope (B) • Predicted values of Y • Residuals • P-value for the slope • R-squared

  31. Covariance • The correlation coefficient is based on the covariance. • For a sample, the covariance is calculated as: Covariance(X,Y) = Σ(Xi − X̄)(Yi − Ȳ) / (N − 1) • Interpretation: Covariance tells us how variation in one variable “goes with” variation in another variable (“covary”)
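A minimal sketch (made-up data) of the sample covariance formula above:

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sum of products of deviations from the means, divided by N - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(cov_xy)                 # matches np.cov(x, y)[0, 1]
```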

  32. Covariance • Two variables are statistically independent (perfectly unrelated) when their covariance = 0 (when r = 0). • Positive relationships are indicated by a positive (+) value, negative relationships by a negative (−) value.

  33. Correlation • Correlation Coefficient (Pearson’s r) • A way of standardizing the covariance. • Interpretation: measures the strength of a linear relationship. • The statistic ranges from -1 to 1 • X and Y are perfectly unrelated (independent, uncorrelated) if rxy = 0 • Perfect positive relationship if r = 1 • Perfect negative relationship if r = -1
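A sketch (made-up data) of Pearson's r as the covariance standardized by the two standard deviations:

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # standardize by the sample SDs
print(r)                                       # same as np.corrcoef(x, y)[0, 1]
```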

  34. Degrees of Strength • ≤ .25 – very weak relationship • .25–.34 – weak relationship • .35–.39 – moderate relationship • ≥ .40 – strong relationship

  35. Regression vs. Correlation • The correlation coefficient measures the strength and direction of a linear relationship between two variables measured at the Interval/Ratio level • In a scatterplot – the degree to which the points in the plot cluster around a “best-fitting” line

  36. Regression vs. Correlation • The purpose of regression analysis is to determine exactly what that line is (i.e. to estimate the equation for the line) • Correlation shows the strength and direction of covariance between an IV and DV.
