Analysis of Variance (ANOVA)
Agenda • Lab Stuff • Questions about Chi-Square? • Intro to Analysis of Variance (ANOVA)
This Thursday: Lab 4 • Final lab will be distributed on Thursday • Very similar to lab 3, but with different data • You will be expected to find appropriate variables for three major tests (correlation, t-test, chi-square test of independence) • You will be expected to interpret the findings from each test (one short paragraph per test). • We will use the first 15 minutes of class to return lab 3 and discuss common issues and questions
Example Crosstab: Gender × Student [table of observed vs. expected counts not reproduced here]
Analysis of Variance • In its simplest form, it is used to compare means across three or more categories. • Example: • Income (metric) by Marital Status (several categories) • Relies on the F-distribution • Just like the t and chi-square distributions, the F-distribution is a family of sampling distributions, one for each pair of degrees of freedom (numerator and denominator).
What is ANOVA? • If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 pairwise t-tests. • One problem is that the 3 tests would not be independent of each other (any two of the comparisons largely determine the third). • Another is that as the number of comparisons grows, some "significant" differences are expected by chance alone and do not necessarily indicate an overall difference. • A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error), as in the sketch below
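A minimal sketch of that contrast, using SciPy and three made-up income samples (the group names and values are purely illustrative, not class data): three separate pairwise t-tests versus one overall one-way ANOVA.

# Sketch: three pairwise t-tests vs. a single one-way ANOVA (illustrative data).
from scipy import stats

# Hypothetical income samples for three marital-status groups (values are made up).
married  = [42, 55, 48, 61, 50, 58, 47, 53]
single   = [38, 44, 41, 36, 47, 40, 43, 39]
divorced = [45, 40, 49, 52, 44, 46, 50, 42]

# The piecewise approach: three separate t-tests, each with its own Type I error risk.
for name, (a, b) in {"married vs single":   (married, single),
                     "married vs divorced": (married, divorced),
                     "single vs divorced":  (single, divorced)}.items():
    t, p = stats.ttest_ind(a, b)
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")

# The ANOVA approach: one overall test of whether any group means differ.
F, p = stats.f_oneway(married, single, divorced)
print(f"one-way ANOVA: F = {F:.2f}, p = {p:.3f}")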
The F-ratio • F = MS_bg / MS_wg • MS = mean square (a sum of squares divided by its df) • bg = between groups • wg = within groups • The numerator and denominator have their own degrees of freedom • Numerator df = # of categories – 1 (k – 1) • Denominator df = # of cases – # of categories (N – k)
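A hand computation of these quantities, reusing the made-up groups from the previous sketch and checking the result against scipy.stats.f_oneway.

# Sketch: compute SS, MS, df, and F "by hand" for the illustrative data above.
import numpy as np
from scipy import stats

groups = [np.array([42, 55, 48, 61, 50, 58, 47, 53], float),
          np.array([38, 44, 41, 36, 47, 40, 43, 39], float),
          np.array([45, 40, 49, 52, 44, 46, 50, 42], float)]

k = len(groups)                          # number of categories
N = sum(len(g) for g in groups)          # total number of cases
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean.
ss_bg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of cases around their own group mean.
ss_wg = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_bg = ss_bg / (k - 1)                  # numerator df = k - 1
ms_wg = ss_wg / (N - k)                  # denominator df = N - k
F = ms_bg / ms_wg

print(f"F = {F:.3f}")
print(stats.f_oneway(*groups))           # should agree with the hand computation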
Interpreting the F-ratio • Generally, the F-ratio measures how different the group means are relative to the variability within each sample • Larger values mean a greater likelihood that the differences between means are not due to chance alone
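To turn an observed F-ratio into a decision, it is compared against the F-distribution with the matching degrees of freedom. A small sketch using scipy.stats.f; the F value and dfs below are placeholders, not results from any real analysis.

# Sketch: compare an observed F against the F-distribution (placeholder values).
from scipy import stats

F_obs, df_num, df_den = 4.5, 2, 21               # hypothetical F, k-1, N-k

p_value  = stats.f.sf(F_obs, df_num, df_den)     # upper-tail area beyond F_obs
critical = stats.f.ppf(0.95, df_num, df_den)     # cutoff for alpha = .05

print(f"p = {p_value:.4f}, critical F(.05) = {critical:.2f}")
print("reject H0" if F_obs > critical else "fail to reject H0")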
Null Hypothesis in ANOVA • H0: all of the group means are equal • If there is no difference between the means, the between-group mean square and the within-group mean square both estimate the same (error) variance, so the F-ratio should be close to 1.
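A quick simulation of that claim, assuming three groups drawn from the same population so that the null is true by construction: the simulated F-ratios cluster around 1.

# Sketch: when the null is true, F-ratios cluster around 1 (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f_values = []
for _ in range(5000):
    # Three groups of 10 drawn from the SAME normal population (null is true).
    a, b, c = (rng.normal(50, 10, 10) for _ in range(3))
    f_values.append(stats.f_oneway(a, b, c).statistic)

# With df = (2, 27), the theoretical mean of F is 27/25 ≈ 1.08, i.e. close to 1.
print(f"average simulated F: {np.mean(f_values):.2f}")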
F-distribution • A right-skewed distribution • It is the distribution of a ratio of two chi-square variables, each divided by its degrees of freedom
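That definition can be checked by simulation; the degrees of freedom below (2 and 27) are arbitrary choices for illustration.

# Sketch: build F-distributed values from two chi-square variables (df are arbitrary).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d1, d2 = 2, 27

chi1 = rng.chisquare(d1, 100_000)
chi2 = rng.chisquare(d2, 100_000)
ratio = (chi1 / d1) / (chi2 / d2)        # definition of an F(d1, d2) variable

# Compare the simulated mean with the theoretical F mean, d2 / (d2 - 2).
print(f"simulated mean:   {ratio.mean():.3f}")
print(f"theoretical mean: {stats.f.mean(d1, d2):.3f}")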
F-distribution • The F-test for ANOVA is a one-tailed (upper-tail) test: only large values of F count as evidence against the null.
Visual ANOVA and the F-ratio: http://tinyurl.com/271ANOVA
ANOVA and t-test • How do we know where the differences lie once we know that there is an overall difference between groups? • t-tests become important after an ANOVA so that we can find out which pairs are significantly different (post-hoc tests). • 'Corrections' can be applied to such post-hoc t-tests to account for multiple comparisons (e.g., the Bonferroni correction, which divides the significance level α by the number of comparisons being made, or equivalently multiplies each p-value by that number) • There are many means-comparison tests available (Tukey, Sidak, Bonferroni, etc.); all are essentially pairwise comparisons adjusted for multiple testing (see the sketch below)
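A sketch of post-hoc pairwise t-tests with a Bonferroni adjustment, reusing the made-up groups from the earlier sketches; the adjustment itself is done with statsmodels' multipletests helper.

# Sketch: post-hoc pairwise t-tests with a Bonferroni adjustment (illustrative data).
from itertools import combinations
from scipy import stats
from statsmodels.stats.multitest import multipletests

groups = {"married":  [42, 55, 48, 61, 50, 58, 47, 53],
          "single":   [38, 44, 41, 36, 47, 40, 43, 39],
          "divorced": [45, 40, 49, 52, 44, 46, 50, 42]}

pairs, raw_p = [], []
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    pairs.append(f"{name_a} vs {name_b}")
    raw_p.append(stats.ttest_ind(a, b).pvalue)

# Bonferroni: each raw p-value is multiplied by the number of comparisons (capped at 1).
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
for pair, p, padj, r in zip(pairs, raw_p, adj_p, reject):
    print(f"{pair}: raw p = {p:.3f}, Bonferroni p = {padj:.3f}, reject H0: {r}")

statsmodels also provides pairwise_tukeyhsd (in statsmodels.stats.multicomp) for the Tukey procedure, which is an alternative to the Bonferroni-adjusted t-tests shown here.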
Logic of the ANOVA • Conceptual Intro to ANOVA • Class Example: • anova.do • GSS96_small.dta
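The class example itself is a Stata do-file (anova.do) run against GSS96_small.dta. As a rough Python analogue of the same workflow, here is a sketch using pandas and statsmodels; the column names income and marital are hypothetical placeholders, not the actual variable names in that file.

# Sketch: a Python analogue of the class ANOVA workflow (column names are hypothetical).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assume a small extract with a metric outcome and a categorical factor.
df = pd.DataFrame({
    "income":  [42, 55, 48, 61, 50, 58, 38, 44, 41, 36, 47, 40, 45, 40, 49, 52, 44, 46],
    "marital": ["married"] * 6 + ["single"] * 6 + ["divorced"] * 6,
})

# One-way ANOVA via an OLS model with a categorical predictor.
model = smf.ols("income ~ C(marital)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # ANOVA table: SS, df, F, p for the factor

If the actual Stata file were at hand, pandas could read it directly with pd.read_stata("GSS96_small.dta") and the same kind of model could be fit to the real variables.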