

  1. Psych 5500/6500 Comparisons Among Treatment Means in Single-Factor Experiments Fall, 2008

  2. Example Let's go back to the example from the previous lecture. We have an experiment where the independent variable is 'Type of Therapy' (Control Group, Behavior Modification, Psychoanalysis, Client-Centered, Gestalt) and the dependent variable is the level of depression after 2 months. H0: μCG = μBM = μPA = μCC = μG. HA: at least one μ is different from the rest.

  3. Data

  4. Summary Table F(4,10) = 6.46, p = .008. We can conclude that at least one μ differs from the rest. We will refer to this as the 'overall' or 'omnibus' F.
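
The summary table itself was a slide image and is not preserved here. As a point of reference, here is a minimal sketch of an omnibus F test in Python using scipy.stats.f_oneway. The scores are made up for illustration (only the degrees of freedom match the slide), so the printed F and p will not equal F(4,10) = 6.46, p = .008.

```python
# A minimal sketch of the overall (omnibus) F test, assuming made-up
# depression scores; the course's actual data were on an image slide.
import numpy as np
from scipy import stats

cg = np.array([10.0, 12.0, 11.0])  # Control Group
bm = np.array([5.0, 6.0, 7.0])     # Behavior Modification
pa = np.array([8.0, 9.0, 10.0])    # Psychoanalysis
cc = np.array([7.0, 8.0, 9.0])     # Client-Centered
g  = np.array([9.0, 10.0, 8.0])    # Gestalt

# a = 5 groups, n = 3 per group, so df = (a - 1, N - a) = (4, 10),
# the same degrees of freedom as the slide's summary table.
F, p = stats.f_oneway(cg, bm, pa, cc, g)
print(f"F(4,10) = {F:.2f}, p = {p:.4f}")
```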

  5. Comparisons Comparison procedures allow us to ask much more specific questions. There are many different comparison procedures, but they all involve comparing just two things with each other. This can be accomplished several ways; for a specific comparison you can: • Drop all but two groups. • Add groups together to get down to two groups. • Drop some groups and add others together (to end up with two groups). An important restriction is that no group can appear on both sides of a comparison.

  6. Examples Dropping all but two groups. In this particular example we will compare each therapy group with the control group, giving us four different comparisons (note that other pair-wise comparisons could also be made, e.g. BM vs. Gestalt): • Control group vs. Behavior Mod H0: μCG = μBM • Control group vs. Psychoanalysis H0: μCG = μPA • Control group vs. Client-Centered H0: μCG = μCC • Control group vs. Gestalt H0: μCG = μG

  7. Example Control Group vs. Behavior Modification. These are the means being compared, but as we shall see, the analysis uses the within-group variance of all the groups.

  8. Adding groups together • Control group vs. Therapy (i.e. the Control group vs. all of the therapy groups combined into one large ‘therapy’ group).

  9. Example

  10. Dropping some groups and adding other groups together • Behavior Modification versus 'Talking Therapies' (i.e. drop the control group and compare the Behavior Mod group to a group consisting of Psychoanalysis, Client-Centered, and Gestalt combined).

  11. Computing the Comparison With a comparison we always end up with two groups, so we could simply do a t test for those two groups. Comparison procedures, however, do something that has more power; they: • Recompute MSBetween using the two groups of the comparison. • Borrow the MSWithin from the overall F test. • Compute F = MSBetween / MSWithin. • If you want to test a directional (one-tailed) hypothesis, adjust the p value accordingly.
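
To make the arithmetic concrete, here is a minimal sketch of the procedure using the standard contrast formula, SScomparison = ψ² / Σ(cj²/nj), which for a 1-df comparison equals its MSBetween. The scores are the same made-up values as in the earlier omnibus sketch, and the contrast shown is CG vs. BM.

```python
# A sketch of a comparison F test: MSBetween is recomputed from the
# contrast, while MSWithin is pooled across ALL five groups, as the
# slide describes. Data are made up for illustration.
import numpy as np
from scipy import stats

groups = {
    "CG": np.array([10.0, 12.0, 11.0]),
    "BM": np.array([5.0, 6.0, 7.0]),
    "PA": np.array([8.0, 9.0, 10.0]),
    "CC": np.array([7.0, 8.0, 9.0]),
    "G":  np.array([9.0, 10.0, 8.0]),
}

# MSWithin borrowed from the overall F test (all groups contribute).
a = len(groups)
n_total = sum(len(y) for y in groups.values())
df_within = n_total - a                                    # 15 - 5 = 10
ss_within = sum(((y - y.mean()) ** 2).sum() for y in groups.values())
ms_within = ss_within / df_within

# Contrast codes for "CG vs. BM": they sum to zero; other groups dropped.
c = {"CG": 1, "BM": -1, "PA": 0, "CC": 0, "G": 0}
psi = sum(c[g_] * groups[g_].mean() for g_ in groups)      # contrast estimate
ss_comparison = psi ** 2 / sum(c[g_] ** 2 / len(groups[g_]) for g_ in groups)

F = ss_comparison / ms_within                              # df = (1, df_within)
p = stats.f.sf(F, 1, df_within)
print(f"F(1,{df_within}) = {F:.2f}, p = {p:.4f}")
```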

  12. Computing the Comparison This comparison procedure has more power than just doing a t test on the two groups of the comparison: • The df for MSWithin comes from all of the groups, even if the comparison involves fewer than all of the groups. • When you combine groups together, that can increase within-group variance, which would hurt the power of the t test.

  13. I've changed the data here to make a point. If this were analyzed using the t test, then the variance within the therapy group (a combination of the data from three therapies) would be large and would hurt the power. But with the comparison technique, MSWithin only looks at the variance within each of the three therapy groups and within the control group.

  14. Number of Possible Comparisons As the number of groups in the experiment grows, the number of possible comparisons gets very large. For example, in an experiment with 4 groups ('A', 'B', 'C', and 'D') some of the comparisons would be: A vs B, A vs C, A vs D, B vs C, B vs D, C vs D, A vs (B+C), A vs (B+D), A vs (C+D), B vs (A+D), B vs (C+A), B vs (C+D), C vs (A+B), C vs (A+D), C vs (B+D)...etc...(A+B) vs (C+D), (A+C) vs (B+D)...etc...A vs (B+C+D), B vs (A+C+D), C vs (A+B+D), D vs (A+B+C).
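
As a quick illustration of how fast this grows, here is a brute-force sketch that counts every possible comparison: each group goes to the left side, the right side, or is dropped; both sides must be non-empty; and A vs B counts the same as B vs A.

```python
# Brute-force count of possible two-sided comparisons among groups.
from itertools import product

def count_comparisons(groups: str) -> int:
    seen = set()
    # 0 = drop the group, 1 = left side, 2 = right side
    for assignment in product((0, 1, 2), repeat=len(groups)):
        left = frozenset(g for g, s in zip(groups, assignment) if s == 1)
        right = frozenset(g for g, s in zip(groups, assignment) if s == 2)
        if left and right:
            seen.add(frozenset((left, right)))  # unordered: A vs B == B vs A
    return len(seen)

print(count_comparisons("ABCD"))   # 25 possible comparisons with 4 groups
print(count_comparisons("ABCDE"))  # 90 possible comparisons with 5 groups
```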

  15. Error Rate A concern is that if H0 is true, and we make lots of comparisons, and each has a .05 chance of making a Type 1 error (rejecting H0 when H0 is true), then the chance of making at least one Type 1 error becomes quite large. The (rather large and complicated) topic of 'comparisons' provides a variety of procedures for keeping the probability of making a Type 1 error under control when making many comparisons.

  16. Definitions • Error rate per comparison: p(making a Type 1 error when performing any one specific comparison|H0 true for that comparison) • Error rate per comparison set: p(making at least one Type 1 error when you perform a set of comparisons|H0 true for all of them) • Error rate per experiment: If you make several sets of comparisons when analyzing the data from an experiment, then this would be: p(making at least one Type 1 error in the analysis of the data from the experiment|H0 true for all of them)

  17. Example We are going to make two independent comparisons (and the null hypothesis happens to be true for both). Each has a .05 chance of making a Type 1 error (i.e. α = .05 per comparison). Thus the error rate per comparison = .05. Now let's calculate the error rate for that set of two comparisons...

  18. Remember, you can only make a Type 1 error when H0 is true; we will assume H0 is true for this example. We will let 'C' stand for making a correct decision to not reject H0: p(C) = .95. We will let 'E' stand for incorrectly rejecting H0, thus making a Type 1 error: p(E) = .05.

  19. Possible Outcomes With two comparisons there are four possible outcomes: CC (.95 × .95 = .9025), CE (.95 × .05 = .0475), EC (.05 × .95 = .0475), and EE (.05 × .05 = .0025). The probability of making at least one Type 1 error in two comparisons = .0475 + .0475 + .0025 = .0975. Thus the error rate per comparison set = .0975.

  20. Formula for Error Rate Per Comparison Set If αPC = the error rate per comparison, αPCS = the error rate per comparison set, and k = the number of independent comparisons that will be made, then: αPCS = 1 – (1 – αPC)^k

  21. Examples If the error rate per comparison = .05 and you do three comparisons, then the error rate for that comparison set is αPCS = 1 – (1 – .05)^3 = .143. With ten comparisons, the error rate per comparison set = .40 (a 40% chance of making at least one Type 1 error!)
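
These numbers are easy to verify with a few lines of code:

```python
# A quick check of the error-rate formula, assuming independent
# comparisons with an error rate of .05 per comparison.
def error_rate_per_set(alpha_pc: float, k: int) -> float:
    return 1 - (1 - alpha_pc) ** k

for k in (2, 3, 10):
    print(k, round(error_rate_per_set(0.05, k), 4))
# 2 -> 0.0975, 3 -> 0.1426 (the .143 above), 10 -> 0.4013 (the .40 above)
```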

  22. Non-independent Comparisons There is no simple formula for computing error rate per comparison set when the comparisons are not independent. We will get to the difference between independent and dependent comparisons in a second. First I would like to review the various concerns that arise when making several comparisons.

  23. Error Rate Concern #1 If H0 is true: • The more comparisons you make the more likely it is you’ll make a Type 1 error. This is the point we just covered.

  24. Error Rate Concern #2 • One improbable mean (one that differs from the others quite a bit just due to chance) can lead to many wrong decisions to reject H0. Every comparison involving that mean could lead to the decision to reject H0.

  25. Error Rate Concern #3 • If you have several groups in your experiment, a comparison of the lowest mean and highest mean is likely to lead to a rejection of H0. The overall F test is not fooled by this; it knows how many groups are in the experiment and looks at how much they all differ from each other. If you were, however, to drop all of the groups except the one with the highest mean and the one with the lowest mean, and do a t test on those two groups, you would probably be able to reject H0 even when the independent variable had no effect.

  26. Controlling Error Rate There are many, many procedures for controlling error rate when making multiple comparisons, and they all take a different approach to the problem. We will look at a few procedures that cover the basic concepts. To differentiate among those procedures we need to determine whether the comparisons are a priori or a posteriori, and whether they are orthogonal (independent) or non-orthogonal (non-independent).

  27. a priori and a posteriori a priori comparisons (also known as ‘planned comparisons’) are those you know you are going to want to do before you gather your data. They are based upon theory, specifically, upon which comparisons will shed light on the hypotheses you are testing. a posteriori comparisons (also known as ‘post hoc’ or ‘data snooping’ comparisons) are those you run to examine unexpected patterns in your data, or just to snoop out what else your data might tell you.

  28. Orthogonality Conceptual: Comparisons are orthogonal when they lead to analyses that are not redundant. For example, if one comparison allows us to compare group 1 with group 2, that analysis will not in any way predict the results of a comparison that allows us to compare group 3 with group 4. Those two comparisons are ‘orthogonal’ (i.e. non-redundant).

  29. Sets of Orthogonal Comparisons If 'a' is the number of groups in your experiment, it is possible to come up with a set of a – 1 comparisons that are all orthogonal to each other. To determine whether or not comparisons are orthogonal to each other we need to express them as 'contrast codes'.

  30. Contrast Codes Contrast codes are sets of constants that add up to zero and that indicate which groups are involved in a comparison (and how much to weight each one). Writing the codes in group order (CG, BM, PA, CC, G): Comparison 1: CG vs. BM → (1, –1, 0, 0, 0). Comparison 2: PA vs. (CC+G) → (0, 0, 1, –1/2, –1/2). Comparison 3: (CG+BM) vs. (CC+G) → (1/2, 1/2, 0, –1/2, –1/2).

  31. Testing Orthogonality Two comparisons are orthogonal if the sum of the products of their contrast codes equals zero. For comparisons 1 and 2, the sum of the products = 0 + 0 + 0 + 0 + 0 = 0, so these two comparisons are orthogonal.

  32. Testing Orthogonality Now let's see if comparisons 1 and 3 are orthogonal. The sum of the products = 1/2 + (–1/2) + 0 + 0 + 0 = 0, so these two comparisons are orthogonal.

  33. Testing Orthogonality Now let's see if comparisons 2 and 3 are orthogonal. The sum of the products = 1/4 + 1/4 + 0 + 0 + 0 = 1/2 ≠ 0, so these two comparisons are not orthogonal.
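
The same pairwise checks can be written as dot products of the contrast-code vectors from slide 30; two comparisons are orthogonal exactly when the dot product of their codes is zero.

```python
# Orthogonality checks for the three contrasts (group order: CG, BM, PA, CC, G).
import numpy as np

c1 = np.array([1, -1, 0, 0, 0])            # Comparison 1: CG vs. BM
c2 = np.array([0, 0, 1, -1/2, -1/2])       # Comparison 2: PA vs. (CC+G)
c3 = np.array([1/2, 1/2, 0, -1/2, -1/2])   # Comparison 3: (CG+BM) vs. (CC+G)

print(np.dot(c1, c2))  # 0.0 -> orthogonal
print(np.dot(c1, c3))  # 0.0 -> orthogonal
print(np.dot(c2, c3))  # 0.5 -> not orthogonal
```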

  34. Comparisons 2 and 3 are not orthogonal, so this was not a set of orthogonal comparisons (they all need to be orthogonal to each other, tested by looking at each pair of contrasts). However, as there are five groups in the experiment, we should be able to come up with a set of four comparisons that are all orthogonal to each other (pairwise). Actually, we can come up with many different sets of four orthogonal comparisons, but we can't come up with five comparisons that are orthogonal to each other if we have five groups in the experiment.

  35. Examples of Orthogonal Sets Check them out; they are all orthogonal to each other (tested as pairs). Note the general pattern here: compare two groups, combine those two groups and compare them to a third group, combine those three groups and compare them to a fourth, and so on.

  36. Another Example Check them out; this set of comparisons is also orthogonal (pairwise to each other). Note the pattern: compare 2 groups, compare 2 other groups, compare the first pair with the second, combine them all and compare them to yet another group. There are many possible sets of orthogonal comparisons, but unless the same comparison happens to appear in both sets (like comparison 4 here), the comparisons in this set will not be orthogonal to the comparisons in a different set.

  37. Orthogonal Sets There are various patterns of orthogonality. After a while you get the hang of creating them. Remember, the maximum number of comparisons that can all be orthogonal to each other is a – 1. Before going on to the next slide, go back and reread the material on error rate, error rate per comparison and error rate per comparison set, and the three factors that lead to high error rates.

  38. Error Rate Concerns (revisited) Any method to control error rate has to address these three concerns: • The error rate per comparison set goes up as the number of comparisons goes up. • One weird mean might lead to a lot of mistaken rejections of H0. • In an experiment with several groups, a comparison of the largest and smallest means is likely to be significant.

  39. Comparison Procedures We will be looking at three different comparison procedures; each addresses the error rate concerns in a different fashion: • A priori, orthogonal comparisons • Dunn's method for a priori, non-orthogonal comparisons (also called the Bonferroni method) • Scheffe's method for a posteriori comparisons.

  40. 1) A priori, orthogonal comparisons To perform comparisons on a set of a priori, orthogonal comparisons: • You do not have to first reject H0 on the overall F test before doing these comparisons (as we will see, you do need to reject H0 on the overall F test before doing a post hoc comparison). • Set the error rate per comparison at your normal significance level (i.e. .05), which will make the error rate per comparison set greater than .05. • You can transform the p value of the comparison to test a directional hypothesis.

  41. How Error Rate is Controlled • The number of comparisons is limited to a-1 by the requirement that they all be orthogonal to each other. • One weird mean is not likely to lead to lots of Type 1 errors, as the comparisons are orthogonal and thus aren’t redundant in the questions being asked. • As the comparisons are decided upon a priori you can’t wait until after you see the means and then select the biggest and smallest for your comparison.

  42. General Strategy for Controlling Error Rate The other comparison procedures keep error rate per comparison set from getting too large by using a more stringent error rate per comparison. In other words, they keep you from falsely rejecting H0 over many comparisons by making it harder to reject H0 for each comparison.

  43. Dunn's Method for a priori, Non-Orthogonal Comparisons* • You do not have to first reject H0 on the overall F test. • Set your significance level for each comparison at α/(# of comparisons). See the next slide for more details. • You can transform the F of the comparison to a t test to test a directional hypothesis, or simply adjust the p value. * Also known as Bonferroni t

  44. Significance Level for Dunn's Method You are controlling Type 1 error by using a smaller significance level equal to α/(# of comparisons). The appropriate '# of comparisons' is a matter of some controversy. The common options (in order of ascending conservatism) are: • The number of comparisons you will be making within the set of nonorthogonal a priori comparisons. • The total number of a priori comparisons you will be making (orthogonal as well as nonorthogonal). • The total number of statistical tests you will be performing on the data. For example, using criterion #1, if you are making 10 a priori, non-orthogonal comparisons then set your significance level for each comparison at .05/10 = .005
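
A minimal sketch of the adjustment under criterion #1; the observed p value here is hypothetical, purely for illustration:

```python
# Dunn/Bonferroni adjustment: divide alpha by the number of planned
# non-orthogonal comparisons (criterion #1 above).
alpha = 0.05
n_comparisons = 10
alpha_per_comparison = alpha / n_comparisons   # .05 / 10 = .005, as above

p_observed = 0.012                 # hypothetical p value for one comparison
print(p_observed < alpha_per_comparison)   # False: not significant after correction
```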

  45. How Error Rate is Controlled • You can make as many a priori comparisons as you would like, but the more you make the harder it is to reject H0 on each one. • One weird mean may appear in several comparisons in a redundant way, but the correction to the error rate per comparison should help control that to some degree. • As the comparisons are decided upon a priori you can’t wait until after you see the means and then select the biggest and smallest for your comparison.

  46. A Posteriori Comparisons The general rule for a posteriori comparisons is that you first need to reject H0 on the overall F test before you can do an a posteriori comparison. In other words, first you have to show there is an effect somewhere in the data before you can snoop around looking for where it is. There are many, many a posteriori comparison procedures available; we will look at one that can do any comparison you want.

  47. Scheffe's Method for a posteriori Comparisons • You first have to reject H0 on the overall F test. • For each comparison, use (Fc from the overall F test) × (a – 1) as your Fc value. In our example it would be Fc = (3.48)(5 – 1) = 13.92
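
A quick sketch reproducing that critical value; the overall Fc of 3.48 is the .05 critical value of F with df = (4, 10):

```python
# Scheffe's critical value for the running example.
from scipy import stats

a, df_within, alpha = 5, 10, 0.05
fc_overall = stats.f.ppf(1 - alpha, a - 1, df_within)   # 3.48
fc_scheffe = (a - 1) * fc_overall                       # (4)(3.48) = 13.92
print(round(fc_overall, 2), round(fc_scheffe, 2))
```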

  48. How Error Rate is Controlled • You can make every possible comparison, but the critical value is set so high that the error rate per comparison set = your significance level. • One weird mean may appear in several comparisons in a redundant way, but the correction to the error rate per comparison controls that. • The overall F test must be significant before you do this procedure, and the overall F test is not fooled by the likely big difference between the largest and smallest mean when you have many groups.

  49. Advantages and Disadvantages Advantage of Scheffe's method: you can do any comparisons you want, and as many as you want. Disadvantage: Scheffe's method assumes you are going to do every possible comparison, and so it makes the error rate per comparison very low. If you don't make a lot of comparisons, then you have conservatively biased your chances of rejecting H0.

  50. Additional Comparison Procedures • Tukey's HSD Test: this is used to make all possible pair-wise comparisons among the group means. The error rate per comparison set is your significance level (i.e. .05), while the error rate per comparison is less than your significance level. • Dunnett's Test: this is used to compare each treatment group one at a time with the control group. Again, the error rate per comparison set is your significance level, while the error rate per comparison is less than your significance level.
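
Both procedures are implemented in recent versions of SciPy (1.11 or later); here is a sketch using the same made-up scores as the earlier examples (values illustrative only):

```python
# Tukey's HSD and Dunnett's test on made-up data.
import numpy as np
from scipy import stats

control = np.array([10.0, 12.0, 11.0])
bm = np.array([5.0, 6.0, 7.0])
pa = np.array([8.0, 9.0, 10.0])

# Tukey's HSD: all pairwise comparisons at a familywise error rate of .05.
print(stats.tukey_hsd(control, bm, pa))

# Dunnett's test: each treatment group compared with the control group only.
print(stats.dunnett(bm, pa, control=control))
```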
