ANALYSIS OF VARIANCE (ANOVA)
STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS?
• Examine Strength and Direction of Relationships
  • Bivariate (e.g., Pearson Correlation, r)
    • Between one variable and another: rxy or Y = a + b1 x1
  • Multivariate (e.g., Multiple Regression Analysis)
    • Between one dep. var. and each of several indep. variables, while holding all other indep. variables constant:
    • Y = a + b1 x1 + b2 x2 + b3 x3 + . . . + bk xk
• Compare Groups
  • Compare Proportions (e.g., Chi-Square Test, χ²)
    • H0: P1 = P2 = P3 = … = Pk
  • Compare Means (e.g., Analysis of Variance)
    • H0: µ1 = µ2 = µ3 = … = µk
ONE-WAY ANOVA
When Do You Use ANOVA?
• To compare the mean values of a certain characteristic among two or more groups.
• To see whether two or more groups are equal (or different) on a given metric characteristic.
• To examine whether a metric dependent variable is a function of a categorical independent variable.
ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a British statistician and geneticist/evolutionary biologist.
Remember: Level of measurement determines choice of statistical method.
Statistical Techniques and Levels of Measurement:

                      INDEPENDENT VARIABLE
DEPENDENT VARIABLE    NOMINAL/CATEGORICAL        METRIC (ORDERED METRIC or HIGHER)
NOMINAL               Chi-Square                 Discriminant Analysis
                      Fisher’s Exact Prob.       Logit Regression
METRIC                T-Test                     Correlation Analysis
                      Analysis of Variance       Regression Analysis

(An Example?)
ONE-WAY ANOVA
H0 in ANOVA?
• H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal):
• H0: µ1 = µ2 = µ3 = … = µk
Ha (Conclusion if H0 rejected)?
• Not all group means are equal (i.e., at least one group mean is different from the rest).
ONE-WAY ANOVA
So, the number of steps involved in ANOVA depends on whether we are comparing 2 groups or more than 2 groups:
Scenario 1. When comparing 2 groups (A, B), it is a one-step test:
• Step 1: Check to see if the two groups are different or not, and if so, how.
Scenario 2. When comparing more than 2 groups (A, B, C), it is a two-step test if H0 is rejected:
• Step 1: Overall test that examines if all groups are equal or not. And, if not all are equal (H0 rejected), then:
• Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e., among which groups) the differences exist, and how.
The typical solution presented in statistics classes requires:
• Constructing an ANOVA TABLE
• Computing a Test Statistic
Let’s see the intuitive logic…
ONE-WAY ANOVA
EXAMPLE: Whether or not average earnings per share (EPS) for commercial banks, retailing operations, & utility companies (variable Industry) was the same last year.
Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.

Table 1. Earnings Per Share (EPS, in $) of Sample Firms in the Three Industries
Banking    Retailing    Utility
6.42       3.52         3.55
2.83       4.21         2.13
8.94       4.36         3.24
6.80       2.67         6.47
5.70       3.49         3.06
4.65       4.68         1.80
6.20       3.30         5.29
2.71       2.68         2.96
8.34       7.25         2.90
-----      0.16         1.73
nB = 9     nR = 10      nU = 10      n = 29

H0: There were no differences in average EPS of Banks, Utilities, and Retailers.
First logical thing you do? Compute the group means and the grand mean:
x̄B = 5.84    x̄R = 3.63    x̄U = 3.31    X̄ (grand mean) = 4.21
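The "first logical thing" can be sketched in a few lines of Python. This is only an illustration using the Table 1 values; the variable names (`eps`, `group_means`, `grand_mean`) are assumptions made for this sketch and are not part of the original slides, which use SPSS.

```python
from statistics import mean

# Table 1 EPS values by industry (Banking has 9 firms, the others 10).
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Mean EPS within each industry.
group_means = {name: mean(values) for name, values in eps.items()}

# Grand mean: the mean over all 29 firms pooled together.
all_values = [x for values in eps.values() for x in values]
grand_mean = mean(all_values)

for name, m in group_means.items():
    print(f"{name:<10} n = {len(eps[name]):>2}  mean = {m:.2f}")
print(f"{'All firms':<10} n = {len(all_values):>2}  mean = {grand_mean:.2f}")
```

Note that the grand mean is taken over the pooled observations; because the groups are of unequal size, it is not the simple average of the three group means.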
ONE-WAY ANOVA
Why is it called ANOVA?
Differences in EPS (Dep. Var.) among all 29 firms have two components: differences among the groups and differences within the groups. That is,
• There are some differences in EPS among the three groups of firms (Banks vs. Retailers vs. Utilities), and
• There are also some differences/variations in EPS of the firms within each of these groups (among banks themselves, among retailers themselves, and among utilities themselves).
• ANOVA will partition/analyze the variance of the dependent variable (i.e., the differences in EPS) and trace it to its two components/sources, i.e., to differences between groups vs. differences within groups.
• WHY?
ONE-WAY ANOVA
The underlying intuitive logic in ANOVA:
• If the groups that are being compared come from the same population (i.e., if groups are alike/equal):
  • They should exhibit similar differences (have equal variability).
  • Hence, the differences among these groups should be no more than the differences within them (i.e., among members within the same groups).
• That is, groups that are alike/similar are expected to have about as much variability between them as they have within them.
ONE-WAY ANOVA
On the other hand…
If the groups being compared are divergent/dissimilar/unequal:
• They would exhibit more difference between them than they show within them (i.e., among members within the same groups).
• That is, they will have greater similarity/commonality internally than they have externally (with members of the other groups).
ONE-WAY ANOVA
• CRITERION USED BY ANOVA: Groups can be considered different if there exist larger differences among these groups than there are among members within them.
• QUESTION: Given the above, what would one have to do to conduct ANOVA? That is, what do you have to do to judge whether or not two or more groups can be considered different/equal (with respect to a given characteristic)?
  • Compute the differences that exist among these groups, and
  • Compare them with the differences that exist within these groups.
  • And that is exactly what ANOVA does.
• QUESTION: How do we usually measure differences?
ONE-WAY ANOVA
QUESTION: How do we usually measure differences/variations?
VARIANCE: A useful index of differences/variations/dispersion among a set of values/scores. It is an estimate of the average (i.e., per observation) squared difference from the mean.
Computation?
$$S^2 = \frac{\text{Sum of squared deviations from the mean}}{\text{Sample Size} - 1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
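As a small worked instance of the variance formula, the sketch below computes S² for the nine banks in Table 1 both by hand and with Python's standard `statistics.variance`, which also divides by n - 1. The variable names are illustrative assumptions.

```python
from statistics import mean, variance

# EPS of the 9 sample banks from Table 1.
banking = [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34]

x_bar = mean(banking)

# Numerator: sum of squared deviations from the group mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in banking)

# Denominator: sample size minus 1.
s2_by_hand = sum_sq_dev / (len(banking) - 1)

# The library function uses the same n - 1 denominator.
s2_library = variance(banking)

print(f"sum of squared deviations = {sum_sq_dev:.3f}")
print(f"S^2 by hand = {s2_by_hand:.3f}, statistics.variance = {s2_library:.3f}")
```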
ONE-WAY ANOVA
So, steps in performing ANOVA:
• Compute the BETWEEN-GROUP VARIANCE for the characteristic under study (i.e., the dependent variable),
• Compute the WITHIN-GROUP VARIANCE for the same characteristic/variable, and then
• COMPARE the two (i.e., check to see if Between-Group Var. > Within-Group Var.)
NOTE: In ANOVA the term “MEAN SQUARE,” rather than variance, is utilized.
ONE-WAY ANOVA
Using the Table 1 data (nB = 9, nR = 10, nU = 10, n = 29; x̄B = 5.84, x̄R = 3.63, x̄U = 3.31, X̄ = 4.21):
Total WITHIN Group Variance (or Mean Square WITHIN)?
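ONE-WAY ANOVA
Mean Square WITHIN Groups (MSW):
Let’s see what we just did: we summed the squared deviations of every firm's EPS from its own group mean, across all three groups, and divided by the degrees of freedom.
The generic mathematical formula for MSW:
$$MSW = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2}{n - k}$$
The denominator is called “Degrees of Freedom” = (nB-1)+(nR-1)+(nU-1) = 8 + 9 + 9 = 26 = n - k.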
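The within-group computation can be sketched as follows. This is a minimal illustration on the Table 1 data, not the SPSS procedure used in class; `ss_within`, `df_within`, and `msw` are names chosen for this sketch.

```python
from statistics import mean

eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Sum of squares within: each firm's squared deviation from its OWN group mean.
ss_within = 0.0
for values in eps.values():
    group_mean = mean(values)
    ss_within += sum((x - group_mean) ** 2 for x in values)

# Degrees of freedom within: (nB - 1) + (nR - 1) + (nU - 1) = n - k.
df_within = sum(len(values) - 1 for values in eps.values())

msw = ss_within / df_within
print(f"SS within = {ss_within:.3f}, df = {df_within}, MSW = {msw:.2f}")
```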
ONE-WAY ANOVA
Using the same Table 1 data (nB = 9, nR = 10, nU = 10, n = 29; x̄B = 5.84, x̄R = 3.63, x̄U = 3.31, X̄ = 4.21):
Let’s now compute the BETWEEN Group Variance (Mean Square BETWEEN, MSB)?
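ONE-WAY ANOVA
Mean Square BETWEEN Groups (MSB):
Let’s see what we just did: we took the squared deviation of each group mean from the grand mean, weighted by the respective group sizes, summed these, and divided by the degrees of freedom.
Mathematical formula for MSB:
$$MSB = \frac{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{X})^2}{k - 1}$$
The denominator is called Degrees of Freedom = k - 1 = 3 - 1 = 2.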
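A matching sketch for the between-group computation, again on the Table 1 data and with illustrative names (`ss_between`, `df_between`, `msb`) that are assumptions of this example:

```python
from statistics import mean

eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Grand mean over all 29 firms.
grand_mean = mean(x for values in eps.values() for x in values)

# Sum of squares between: squared distance of each group mean from the
# grand mean, weighted by that group's size.
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in eps.values()
)

# Degrees of freedom between: number of groups minus 1.
df_between = len(eps) - 1

msb = ss_between / df_between
print(f"SS between = {ss_between:.3f}, df = {df_between}, MSB = {msb:.3f}")
```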
ONE-WAY ANOVA
• Mean Square Between Groups = MSB = 17.698
• MSB represents the portion of the total differences/variations in EPS (the dependent variable) that is attributable to (or explained by) differences BETWEEN groups (e.g., industries).
• That is, the part of the differences in companies’ EPS that results from whether they are banks, retailers, or utilities.
ONE-WAY ANOVA
• Mean Square Within Groups (MS Residual/Error) = MSW = 3.35
• MSW represents:
  • The differences in EPS (the dependent variable) that are due to all other factors that are not examined and not controlled for in the study (e.g., diversification level, firm size, etc.)
  • Plus . . .
  • The natural variability of EPS (the dependent variable) among members within each of the comparison groups (note that even banks with the same size and same level of diversification would have different EPS levels).
ONE-WAY ANOVA
• Now, let’s compare MSB & MSW: MSB = 17.698 and MSW = 3.35.
• QUESTION: Based on the logic of ANOVA, when would we consider two (or more) groups as different/unequal?
  • When MSB is significantly larger than MSW.
• QUESTION: What would be a reasonable index (a single number) that will show how large MSB is compared to MSW (i.e., a single number that will show if MSB is larger than, equal to, or smaller than MSW)?
Compare BETWEEN and WITHIN Group Variances/Mean Squares: Compute the F-Ratio
• Ratio of MSB and MSW (call it the F-Ratio):
$$F = \frac{MSB}{MSW} = \frac{17.698}{3.35} = 5.28$$
• What can we infer when the F-ratio is close to 1?
  • MSB and MSW are likely to be equal and, thus, there is a strong likelihood that NO difference exists among the comparison groups.
• How about when the F-ratio is significantly larger than 1?
  • The more the F-ratio exceeds 1, the larger MSB is compared to MSW and, thus, the stronger the likelihood/evidence that group difference(s) exist.
• Results of the above computations are usually summarized in an ANOVA TABLE such as the one that follows:
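A minimal sketch of how such an ANOVA table could be rebuilt from the Table 1 data in plain Python. The layout of the printed table and all variable names are assumptions of this sketch; SPSS output will be formatted differently.

```python
from statistics import mean

eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}
all_values = [x for values in eps.values() for x in values]
n, k = len(all_values), len(eps)
grand_mean = mean(all_values)

# Partition the total sum of squares into between and within components.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in eps.values())
ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in eps.values())
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_between, df_within = k - 1, n - k
msb = ss_between / df_between
msw = ss_within / df_within
f_ratio = msb / msw

print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ss_between:>10.3f}{df_between:>5}{msb:>10.3f}{f_ratio:>8.2f}")
print(f"{'Within':<10}{ss_within:>10.3f}{df_within:>5}{msw:>10.3f}")
print(f"{'Total':<10}{ss_total:>10.3f}{n - 1:>5}")
```

The total row is a useful check: the between and within sums of squares should add up to the total sum of squares, and the degrees of freedom should add up to n - 1 = 28.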
ONE-WAY ANOVA
Interpretation and Conclusion:
• QUESTION: What does the F = 5.28 mean, intuitively?
  • For our sample companies, the EPS difference across the three industries (MSB) is more than 5 times the EPS difference among firms within the industries (MSW).
• QUESTION: What is our null hypothesis?
• QUESTION: Is the above F-ratio of 5.28 large enough to warrant rejecting the null?
  • ANSWER: It would be if the chance of being wrong (in rejecting the null) does not exceed 5%. So, look up the F-value in the table of the F-distribution (under the appropriate degrees of freedom) to find out what the α-level will be if, given this F-value, we decide to reject the null.
• Degrees of Freedom: v1 = k – 1 = 2, v2 = n – k = 26
F = 3.37 is significant at α = 0.05 (if F = 3.37 and we reject H0, there is a 5% chance of being wrong).
• F = 4.27 is significant at α = 0.025.
• That is, if F = 4.27 and we reject H0, we would face a 2.5% chance of being wrong.
• But our F = 5.28 > 4.27.
• So, what can we say about our α-level? Will it be larger or smaller than 0.025?
ONE-WAY ANOVA
• Our F = 5.28 > 4.27
• The odds of being wrong, if we decide to reject the null, would be less than 2.5% (i.e., α < 0.025).
• Would rejecting the null be a safe bet?
• Conclusion? Reject the null and conclude that the average EPS is NOT EQUAL FOR ALL GROUPS (industries) being compared.
• Is the analysis complete?
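The table lookup can also be checked numerically. The sketch below relies on a special case that happens to apply here: when the numerator degrees of freedom equal 2 (three groups), the upper-tail probability of the F distribution has the closed form (1 + 2F/v2)^(-v2/2). The function name is hypothetical, and this shortcut does not apply to other numbers of groups, where statistical software or a full F table would be needed.

```python
def f_tail_prob_df1_2(f_value, df2):
    """P(F >= f_value) for an F distribution with (2, df2) degrees of freedom.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


DF_WITHIN = 26  # v2 = n - k from the slides

# The two tabled critical values quoted in the slides.
for f_crit in (3.37, 4.27):
    print(f"F = {f_crit:.2f} -> alpha = {f_tail_prob_df1_2(f_crit, DF_WITHIN):.3f}")

# Our observed F-ratio.
p_value = f_tail_prob_df1_2(5.28, DF_WITHIN)
print(f"F = 5.28 -> alpha = {p_value:.3f}")
print("Reject H0 at the 5% level" if p_value < 0.05 else "Do not reject H0")
```

The tail probabilities for 3.37 and 4.27 come out at roughly 0.05 and 0.025, matching the table, and the value for F = 5.28 falls below 0.025, in line with the conclusion above.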
ONE-WAY ANOVA
Is our analysis complete?
• It would be if we were comparing only two groups; simply examine which sample mean is larger than which and report!
• HOWEVER, if the null is rejected and more than two groups are being compared:
  • REMAINING QUESTION: Where exactly (i.e., between which groups) do the differences lie? And which group(s) of firms exhibit relatively higher, lower, or equal EPS levels?
  • ANSWER: Perform post hoc, multiple comparison tests. SPSS (and other software packages) offer a variety of options (e.g., LSD, Bonferroni, Tukey, etc.) to choose from.
Let’s now review the steps involved…
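To make the idea of pairwise post hoc comparisons concrete, here is a hedged sketch using Scheffé's method. Scheffé is swapped in purely because it only needs the F critical value already quoted in the slides (3.37 at α = 0.05 with 2 and 26 degrees of freedom); it is not one of the options named above, it is generally more conservative than Tukey, and SPSS's Tukey or Bonferroni output may differ for borderline pairs. All names are illustrative.

```python
from itertools import combinations
from statistics import mean

eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}
k = len(eps)
n = sum(len(v) for v in eps.values())

# Pooled within-group variance (MSW) from the overall ANOVA.
msw = sum(sum((x - mean(v)) ** 2 for x in v) for v in eps.values()) / (n - k)

F_CRIT_05 = 3.37  # tabled F at alpha = 0.05 with (2, 26) df, as quoted in the slides

results = {}
for a, b in combinations(eps, 2):
    diff = mean(eps[a]) - mean(eps[b])
    # Squared standard error of the difference between the two group means.
    se2 = msw * (1 / len(eps[a]) + 1 / len(eps[b]))
    # Scheffe statistic: pairwise F divided by (k - 1), compared with F_CRIT_05.
    scheffe_stat = diff ** 2 / (se2 * (k - 1))
    results[(a, b)] = (diff, scheffe_stat, scheffe_stat > F_CRIT_05)

for (a, b), (diff, stat, significant) in results.items():
    verdict = "different" if significant else "no difference found"
    print(f"{a} vs {b}: mean diff = {diff:+.2f}, statistic = {stat:.2f} -> {verdict}")
```

Under these assumptions, banks differ from both retailers and utilities, while retailers and utilities are not distinguishable. The Banking vs. Retailing comparison is close to the cutoff, so it is the one most likely to change under a different post hoc method.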
ONE-WAY ANOVA
Overall H0: All Group Means Are Equal
H1: Not All Groups Are Equal
Is overall F significant (i.e., α < 0.05)?
• No (α > .05): Don’t reject H0; no group diff. found; stop.
• Yes (α < .05): Reject H0; not all group means are equal (i.e., at least 2 groups are diff.). How many groups are being compared?
  • If only 2: Examine the group means. Report which group has higher/lower mean. Stop.
  • If more than 2: Conduct post-hoc pairwise comparison tests to see where the differences lie. Examine the results. Examine the group means. Report which groups have higher/lower means. Stop.
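The decision flow above can be written down as a small function. This is only a sketch of the flowchart's logic; the function name, its arguments, and the wording of the returned messages are assumptions, and the boundary case where the α-level equals exactly 0.05 (not covered by the flowchart) is treated here as "not significant".

```python
def one_way_anova_next_step(p_value, n_groups, alpha=0.05):
    """Return the next action in the one-way ANOVA decision flow."""
    if n_groups < 2:
        raise ValueError("ANOVA compares at least two groups")
    if p_value >= alpha:
        # Overall F not significant.
        return "Do not reject H0: no group difference found. Stop."
    if n_groups == 2:
        # Only two groups: the means themselves tell the story.
        return "Reject H0: examine the two group means and report which is higher."
    # More than two groups: locate the differences first.
    return ("Reject H0: run post hoc pairwise comparisons, "
            "then examine and report the group means.")


# EPS example: three industries, observed alpha-level below 0.025.
print(one_way_anova_next_step(p_value=0.012, n_groups=3))
```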
ANOVA in SPSS
Let’s now use SPSS to perform the same analysis.
NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT One-Way ANOVA” PDF file with them to class.
ONE_WAY_EPS_SPSS_FILE
TWO-WAY ANOVA (with Interaction)
In our EPS example, suppose you suspect that a company’s size category (small vs. large) may also have a significant effect on EPS. Since you did not attempt to control for company size when selecting your sample firms, small and large companies may not have been equally represented in the three industry groups (e.g., what if, compared to the banks in the sample, all or a much greater % of retailers and utilities were small?). You are therefore concerned that the potential confounding effect of company size may have distorted your earlier results.
So, you now wish to examine possible EPS differences among the 3 industries while controlling for the possible confounding effect of company size (i.e., holding size constant/equal for the firms in our three industries). In other words, you wish to know if there are any differences among the average EPS of banks, retailers, and utilities of equal size.
TWO-WAY ANOVA (with Interaction)
• So, Two-Way ANOVA will help us learn if banks in general, even after controlling for company size, would, on average, have higher EPS than retailers and utilities.
• But an additional advantage of Two-Way ANOVA is that it can also show us whether a particular group of firms (i.e., CERTAIN COMBINATIONS of industry and size category) has higher/lower EPS than other combinations of the two characteristics.
• As just one example, it can show us if only the larger banks (and not all banks in general) have significantly higher EPS compared to firms in the other two industries (or compared to only the smaller firms in the other two industries).
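One common way to write down the model behind a two-way ANOVA with interaction is shown below. The notation is an assumption of this sketch and does not come from the slides: i indexes industry, j indexes size category, and k indexes firms within each industry-by-size cell.

```latex
% Two-way ANOVA model with interaction (illustrative notation)
% Y_{ijk}            : EPS of firm k in industry i and size category j
% \mu                : overall mean EPS
% \alpha_i           : main effect of industry i
% \beta_j            : main effect of size category j
% (\alpha\beta)_{ij} : interaction effect of the industry-size combination
% \varepsilon_{ijk}  : residual (within-cell) variation
\[
  Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
\]
```

In this notation, the main-effect tests ask whether all the industry effects (or all the size effects) are zero, and the interaction test asks whether all the combination-specific terms are zero, which corresponds to the "only the larger banks" scenario described above.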
ANOVA Using SPSS
TWO-WAY ANOVA (with Main & Interaction Effects):
• Analyze: General Linear Model: Univariate
• Move Y to the “Dependent” box and categorical X1 & X2 to the “Fixed Factors” box
• Model: Full, Continue
• Plots: X1 to “Horizontal”, X2 to “Separate Lines”, Add, Continue
• Post Hoc: Move factors (IVs) with >2 groups to the “Post Hoc Tests” box, select “Tukey” or “Bonferroni”, Continue
• Options: Move Overall, X1, X2, and X1*X2 to the “Display Means” box, check “Descriptive Stats.”, Continue
• OK
NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT Two-Way ANOVA with Interaction” PDF file with them to class.
TWO_WAY_EPS_SPSS_FILE
TWO-WAY ANOVA (Main & Interaction Effects Model)
H0: There are no differences among the groups represented by either variable.
Is overall F significant (i.e., α < 0.05)?
• No: Don’t reject H0; no group diff. found; STOP.
• Yes: Reject H0; some differences exist among the groups represented by at least one of the variables. Determine if the interaction effect is significant:
  • YES: Examine the plot of the interaction effect for results. STOP.
  • NO:
    a. Examine which main effect, if any, is significant (i.e., differences exist across categories of which independent variable).
    b. Is the significant indep. var. dichotomous (i.e., represents only 2 groups)?
      • Yes, only 2 groups: Examine the group means for that variable; report which group has higher/lower mean. STOP.
      • No, more than 2 groups: Conduct post-hoc pairwise comparison tests for that var. to see where the differences lie. Examine the results. Examine the group means for that variable; report which groups have higher/lower means. STOP.
ANOVA CAUTION: Don’t get carried away with the number of factors (independent categorical variables); DON’T DO N-WAY ANOVA !!!
ANOVA Using SPSS
ANOTHER EXAMPLE: Using the gss.sav data file, we wish to find out if the age at which one gets married (agewed) is a function of one’s gender (sex) and highest educational degree (degree). That is, is average marriage age different between the two genders and among the various educational groups? If so, in what way?
NOTE: Here, we are considering/treating educational degree as a nominal/categorical variable, and NOT as an ordered metric variable.
ASSIGNMENT 4
1. Suppose, as a social scientist, you are interested in studying gender differences in preference for different types of music. Specifically, you wish to know if there are differences between men and women relative to how much they like classical music (variable classical). The gss.sav data file (on your SPSS Data Disk) includes data regarding such issues. This data set represents 1500 randomly selected cases from the 1993 General Social Survey. Use the data from this SPSS file to address the above questions.
NOTE: If you check the value labels for the variables classical, opera, and country in the gss.sav file, you will see that they were measured on 5-point scales (1=Like Very Much, 5=Dislike Very Much) and, thus, can be considered metric.
ASSIGNMENT 4
• As a staff researcher in the HR Department of a major company, you are interested in learning if there are differences among male and female employees and among employees who have different levels of education regarding the level of importance that they attach (a) to having a fulfilling job. Data regarding such issues have been obtained through the General Social Research Survey using a representative sample of approximately 1500 working men and women in the U.S. You have access to the resulting data (see gss.sav SPSS data file, variables sex, impjob, and degree). Use this data set to address the above issues.
IMPORTANT NOTES FOR QUESTIONS 2, 3, AND 4:
• Treat variable “degree” as a categorical/nominal variable.
• When interpreting the results, please pay attention to the fact that if you check the value labels for the dependent variables, you will notice that they were measured on 5-point scales (1=One of Most Important, 5=Not at All Important).
• If you find it necessary to conduct post hoc multiple comparison tests, use the Tukey option.
• IMPORTANT: If the alpha level for a given test is just slightly higher than 0.05 (e.g., 0.054), consider that difference statistically significant.
REMINDERS:
• For each analysis, include the Notes part of the SPSS output in the printout. Also edit the first page of every output to include your name.
• Make sure that you state your complete interpretations and explanations on the appropriate pages of the output. Be specific as to how you have used what parts of the output to reach your conclusions.
• Make sure that your explanations are complete. For example, it is not enough to say that there is a difference between groups A and B regarding characteristic C. You have to go on to indicate how the two groups are different on characteristic C (e.g., “on average, group A exhibits more/less of the characteristic C”).