POLS 570 week 9: Introduction to statistical correlation, crosstabs and causation
The test statistic and its distribution
• Recall the thought experiment in which you estimated the average height of the people in ESU through random sampling
• Your result was a frequency distribution, which had a mean and a standard deviation
• Test statistics, similarly, have probability distributions, which tell us how likely the observed relationship between the variables would be if it arose by chance alone
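The thought experiment can be sketched as a small simulation. The population below is made up for illustration (it is not real ESU data), and the sample size and number of samples are arbitrary assumptions; the point is only that repeated sample means form a distribution with its own mean and standard deviation.

```python
import random
import statistics

random.seed(570)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 heights in cm (invented numbers, not ESU data)
population = [random.gauss(170, 10) for _ in range(10_000)]


def sample_means(pop, sample_size, n_samples):
    """Draw n_samples random samples and return the mean of each one."""
    return [
        statistics.mean(random.sample(pop, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=50, n_samples=1_000)

# The collection of sample means is itself a frequency distribution
print("mean of the sample means:", round(statistics.mean(means), 2))
print("std. dev. of the sample means:", round(statistics.stdev(means), 2))
```

The sample means cluster around the population mean, and they vary much less than individual heights do.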
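Cross tab Chi-square statistic
• In the case of cross-tabulation, the test statistic is called “Chi-squared”; when the null hypothesis is true it follows the Chi-squared distribution. The statistic is defined as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed count in a cell of the table and E is the count expected in that cell if there were no relationship
• In other words, the Chi-squared statistic is the sum, over all cells, of the squared deviations from what would be expected assuming that no differences existed between men and women (for the variable of interest), each divided by the expected number of observations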
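A minimal sketch of that calculation, assuming a small cross-tab held as a list of rows. The counts are hypothetical and the function name `chi_squared` is only illustrative; expected counts are computed the usual way, as row total times column total divided by the grand total.

```python
def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over every cell of a cross-tab."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are men and women, columns are favor and oppose
table = [[30, 20],
         [20, 30]]

print(chi_squared(table))  # 4.0
```

Every expected count here is 25, so each cell contributes 25/25 = 1 and the statistic is 4.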
Hypothesis testing using cross-tabulation
• Since the Chi-squared distribution is known, we can calculate the probability with which a statistic at least this large would have been generated if the null hypothesis were true
• When that probability is very small (in this class the default is <.05), the data are unlikely under the null hypothesis (no relationship between the variables), so we say that there is a significant relationship
• and thus we reject the null hypothesis
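As a rough illustration of the decision rule, the sketch below covers only the special case of 1 degree of freedom (a 2×2 table), where the Chi-squared tail probability can be written with the standard library's complementary error function. The statistic values are hypothetical, and the function names are illustrative.

```python
import math


def chi2_p_value_df1(stat):
    """P(Chi-squared with 1 degree of freedom >= stat).

    With 1 df the statistic is a squared standard normal, so the tail
    probability equals P(|Z| >= sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


def is_significant(stat, alpha=0.05):
    """Reject the null hypothesis when the p-value falls below alpha."""
    return chi2_p_value_df1(stat) < alpha


# Hypothetical statistics from two different 2x2 tables
for stat in (4.0, 1.0):
    p = chi2_p_value_df1(stat)
    print(f"chi-squared = {stat}: p = {p:.4f}, reject null: {is_significant(stat)}")
```

For tables with more degrees of freedom, a statistics package or a Chi-squared table would be needed instead of this shortcut.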
Interpreting the output
• SPSS gives us several statistics related to the ordinary Chi-squared statistic, which is the one we are interested in
• It tells us
  • the value of the statistic,
  • the number of “degrees of freedom”
  • the “significance” of the statistic
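The slides do not spell out how the degrees of freedom are obtained. For a cross-tab, the standard rule is (rows − 1) × (columns − 1), which the following sketch computes; the function name and the example tables are hypothetical.

```python
def degrees_of_freedom(table):
    """Degrees of freedom of a cross-tab: (rows - 1) * (columns - 1)."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical tables: gender by a yes/no item, and a 3-row by 4-column table
two_by_two = [[30, 20], [20, 30]]
three_by_four = [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

print(degrees_of_freedom(two_by_two))     # 1
print(degrees_of_freedom(three_by_four))  # 6
```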
Experimenting with other variables we find…
• a significant relationship between gender and
  • attitude toward prayer in school
  • presidential voting in 1992
• a significant relationship between race and
  • attitude toward prayer in school
  • presidential voting in 1992
  • attitude toward homosexuality
• Contingency coefficient, Somers’ d
• PRE: proportional reduction in error; these are different indicators
Correlation: a measure of association for interval- and ratio-level data
• Correlation (like cross-tabulation) can be used as a descriptive or an inferential technique
• Correlation answers the question, “Is there a linear association between two variables?”
  • Are high values of one variable associated with high values of another?
  • Are low values of one variable associated with high values of another?
Correlation
• The result of a correlation analysis is the correlation coefficient, called r (the “Pearson” correlation coefficient)
• It varies from -1 to 1
  • r = 1 indicates perfect positive correlation
  • r = -1 indicates perfect negative correlation
  • r = 0 indicates no correlation
• Use with quantitative variables and a linear relationship
• As with cross-tabs, correlation does not show causation
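To make the three reference values concrete, here is a minimal sketch of the Pearson coefficient written out by hand from the textbook formula (covariation divided by the product of the two spreads). The data are invented so that r comes out at the extremes and at zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How the two variables move together around their means
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))    # perfect positive correlation
print(pearson_r(x, [8, 6, 4, 2]))    # perfect negative correlation
print(pearson_r(x, [1, -1, -1, 1]))  # no linear correlation
```

The last example shows why the linearity caveat matters: the y values follow a clear U-shaped pattern, yet r is zero because the association is not linear.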
To find the correlation between Respondent’s Highest Degree and Respondent’s Income
• Go to the “Analyze” menu and choose “Correlate,” then “Bivariate”
• The correlation coefficient in the output is the correlation between degree and rincome
• The significance level tells us: if there were zero correlation, with what probability would we observe a correlation coefficient at least this large (positive or negative) in our sample?
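The question the significance level answers can be illustrated with a permutation test. This is a swapped-in simulation approach, not the calculation SPSS performs: shuffling one variable breaks any real association, so the share of shuffles that produce a correlation at least as large as the observed one approximates the probability described above. The data are hypothetical codes, not actual degree and rincome values, and the function names are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical education and income codes for ten respondents
education = [0, 1, 1, 2, 2, 3, 3, 4, 4, 4]
income = [3, 5, 4, 6, 8, 7, 9, 10, 12, 11]

print("r =", round(pearson_r(education, income), 3))
print("approximate p-value =", permutation_p_value(education, income))
```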
Scatterplots
• Scatterplots show the relationship between two variables
• For each observation (e.g., each year or each individual), the values of two variables are recorded as a point on a two-dimensional graph; the plot by itself does not show the level of the relationship
• Example: the relationship between unemployment and gross domestic product
To reiterate and preview: Recall the normal distribution…
• The normal distribution is important because we can assume normality for the sampling distribution of the mean of certain variables whose own distributions are unknown
• There are also important distributions that can be derived as functions of the normal distribution, such as
  • the t distribution
  • the Chi-squared distribution
  • the F distribution
• These are important in statistical inference; more next week as we return to chapter 5 in Polack
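One of these derivations is easy to check by simulation: a Chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal values, so its mean should be close to k and its variance close to 2k. The sketch below assumes k = 3 and 20,000 draws, both arbitrary choices.

```python
import random
import statistics

rng = random.Random(9)  # fixed seed so the sketch is reproducible


def chi_squared_draw(df, rng):
    """One Chi-squared value built as a sum of df squared standard normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))


df = 3
draws = [chi_squared_draw(df, rng) for _ in range(20_000)]

# Theory says the mean is df and the variance is 2 * df
print("simulated mean:", round(statistics.mean(draws), 2))
print("simulated variance:", round(statistics.variance(draws), 2))
```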