
Within-Subjects (Repeated Measures) Designs

PSY 4603 Research Methods. Within-Subjects (Repeated Measures) Designs. Introduction



  1. PSY 4603 Research Methods Within-Subjects (Repeated Measures) Designs

  2. Introduction • Both terms - within subjects and repeated measures - stress the nature of this type of design, namely, that repeated measurements are taken and treatment effects are associated with differences observed within subjects. • Within-subjects designs are very popular for analyzing changes, especially changes across time. • Within-subjects designs are the obvious choice to study such behavioral changes as learning, transfer of training, forgetting, attitude change, and so on. • Moreover, they are particularly efficient and sensitive, especially in comparison with an equivalent between-subjects design. • Here, I will examine the analysis of the single-factor within-subjects design. • Then, we will consider the issues that arise when a within-subjects factor is combined with a between-subjects factor in what is called a mixed within-subjects design.

  3. Advantages of Within-Subjects Designs • The main advantage of within-subjects designs is the control of subject variability - that is, individual differences. • Take as an example a single-factor experiment. • When we form independent groups of subjects in a between-subjects design, we do not expect them to be equivalent in ability or on any other factor, for that matter. • We realize that these between-group differences, which are the natural outcome of randomly assigning subjects to conditions, will be superimposed over whatever treatment effects we may have been fortunate enough to produce by our experimental manipulations. • But suppose we select only one group of subjects and have them serve in all the treatment conditions. • On the face of it, such a procedure seems to guarantee that any differences we observe among the treatment conditions will reflect the effects of the treatment alone. • A moment's reflection, however, will indicate that this is an oversimplification. • For this ideal outcome to take place, a subject would have to remain constant during the course of the experiment so that we can attribute any changes in behavior to the effects of the independent variable. Such an expectation is unrealistic, of course. • Individuals will respond differently on each test for a host of reasons: changes in attention and motivation, learning about the task, and so on. • Not only is a subject not the "same" individual on successive tasks, but we also expect other uncontrolled sources of variability, such as variations in the physical environment or in the testing apparatus, to show themselves by producing differences between the treatment means. • In short, then, the variance attributed to factor A will still contain an error component even when the same subjects are tested in all the treatment conditions. 
• On the other hand, these sorts of changes will probably not be as great as the differences produced through the random assignment of subjects to the different experimental conditions. The error component associated with factor A, therefore, should be smaller in the case of repeated measures than that expected in an experiment with independent groups of subjects. • This reduction in error variance represents a direct increase in economy and statistical power.
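The reduction in error variance described above can be illustrated with a minimal simulation sketch (hypothetical data, not from the lecture): when large individual differences are present, they inflate the error term of a between-subjects comparison but cancel out of within-subject difference scores.

```python
# Sketch: the same subjects measured under two conditions. Individual
# differences (subject_ability) dominate the between-subjects error
# term but drop out when each subject serves as their own control.
import numpy as np

rng = np.random.default_rng(0)
n = 30
subject_ability = rng.normal(0, 10, n)   # large individual differences
effect = 3.0                             # true treatment effect

a1 = subject_ability + rng.normal(0, 2, n)           # condition a1
a2 = subject_ability + effect + rng.normal(0, 2, n)  # condition a2

# Between-subjects view: error still contains subject variability
between_error_sd = np.sqrt((a1.var(ddof=1) + a2.var(ddof=1)) / 2)

# Within-subjects view: difference scores remove subject variability
within_error_sd = (a2 - a1).std(ddof=1)

print(between_error_sd, within_error_sd)
```

The within-subject error term is far smaller, which is the source of the gain in power.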

  4. Inter-subject Variability vs. Intra-subject Variability • Inter-subject variability randomly distributed: Completely Randomized Design • Inter-subject variability held constant (for included control variable(s)): Randomized Blocks Design • Inter-subject variability potentially eliminated (subjects serve as their own control): Repeated Measures Design

  5. There are also other ways in which the within-subjects design is economical. • The running time per observation, for example, may be cut drastically by the omission of detailed instructions that overlap across the different treatment conditions. • In addition to an increase in efficiency, the repeated-measures design has become the most common experimental design with which to study such phenomena as learning, transfer, and practice effects of all sorts. • In these research areas, the interest is in the changes in performance that result from successive experience with a task. • In a training experiment, for example, the experience consists of repeated exposure to the same learning task - number of trials becomes the independent variable. • In a transfer experiment, the interest may be in the development of learning skills through experience with other tasks and materials. • In these studies, each subject receives each level of the independent variable, which in this case consists of the various numbers of previous trials or tasks.

  6. Comparing a CRD & the RMD

  7. Disadvantages of Within-Subjects Designs • Several major disadvantages are associated with repeated-measures designs, and they are all interrelated. • One concerns the fact that subjects will change systematically during the course of multiple testing. • I will refer to any such overall change as a practice effect. • Subjects may show a general improvement during the course of testing, in which case the practice effect is positive; alternatively, fatigue or boredom may build up over the successive tests to produce a negative practice effect. • In some research areas, we can effectively disregard these positive practice effects when we have reason to believe that performance has effectively reached an asymptote, so that additional practice on the task does not produce any further improvement. • For instance, if we are studying sensory functioning or performance on motor tasks that require a lot of learning, we ordinarily assume that a general practice effect is no longer present. • On the other hand, if fatigue or boredom is the major source of change in performance with multiple testing, it may be possible to eliminate these factors by introducing a rest of sufficient length between successive tasks or by using highly motivating instructions and incentives. • In most cases, however, researchers generally assume that practice effects will be present and that they cannot be eliminated completely. • Since practice effects are a real possibility, it follows that the current performance of a subject will reflect in part the effect of the particular treatment being administered (the direct effect of the treatment) as well as any practice effect that may also be present (indirect effects from prior experience in the experiment). • Only the treatment administered first is immune to the effects of practice. 
• A problem arises if we use the same order in administering the treatments to all subjects since we will be unable to disentangle the contribution of the treatment effect from the contribution of the practice effect. • In this case, then, practice effects and treatment effects are confounded.

  8. A common solution to this problem is to employ enough testing orders to ensure the equal occurrence of each experimental treatment at each stage of practice in the experiment. • This is usually accomplished through counterbalancing. • A second difficulty with within-subjects designs is the possibility of differential carryover effects, which counterbalancing will not control. • In contrast with general practice effects, which affect all treatment conditions equally, differential carryover effects are quite specific: the earlier administration of one treatment affects a subject's performance on one later condition in one way and on a different condition in another. • Consider, for example, an experiment with k = 3 conditions. • Will two of the conditions - a1 and a2 - equally affect the subsequent administration of the third, a3? • Suppose the three treatment conditions differ greatly in difficulty. • Will subjects receiving the most difficult condition first behave exactly the same way under the conditions of medium difficulty, say, as subjects receiving the easiest condition first? • Or will subjects receiving a control condition first display the same level of performance on one experimental treatment as do subjects receiving instead another experimental treatment first? • In either case, if the answer is no, we have an instance of differential carryover effects.

  9. Controlling Practice Effects • Counterbalancing • Counterbalancing refers to a technique of ordering sequences of conditions so that each treatment is administered first, second, third, and so on an equal number of times. • Consider the counterbalancing arrangement in the upper portion of the table, which is often called a Latin square. • There are k = 4 treatment conditions and each subject receives each condition once. • The order in which the conditions are presented is represented by the columns of the square (called testing positions), and the particular sequence of treatments is indicated by the entries in the rows of the square. • To balance practice effects, we need four sequences. • In the first sequence, for example, the subject (s1) receives the treatments in the order a1, a2, a3, a4, whereas in the other three sequences, the subjects (s2, s3, and s4) receive the different treatments in the orders specified in the table (Latin Square = Incomplete Counterbalancing). If you examine the entries in the first column of the table, you will note that each level of factor A is presented once as the first task that different subjects receive. Moving to the second column, you can see that each condition is again presented once as the second task different subjects receive. The same property holds true for the remaining columns in the table. The purpose of this arrangement is to spread any practice effects over the four treatment conditions equally. No matter what form the practice effects may take, their influence is the same for each of the treatments.
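The Latin-square arrangement described above can be sketched in code. This is one simple cyclic construction (the lecture's table may use a different arrangement of rows), but it has the key property: each treatment appears exactly once in every sequence (row) and exactly once at every testing position (column).

```python
# A minimal sketch of a k x k Latin square of treatment orders
# (incomplete counterbalancing): each treatment occurs once per
# sequence (row) and once per testing position (column).
def latin_square(k):
    treatments = [f"a{j + 1}" for j in range(k)]
    # cyclic construction: row i is the treatment list rotated by i
    return [treatments[i:] + treatments[:i] for i in range(k)]

for sequence in latin_square(4):
    print(sequence)
```

With k = 4 the first row is the order a1, a2, a3, a4 given to subject s1, matching the first sequence in the lecture's table.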

  10. Examining Practice Effects. Consider the data presented in the body of this counterbalancing matrix. • I have rearranged these scores according to the relevant treatment conditions in the lower matrix of the last table (shown here too). • The column marginal means for this data matrix reflect the treatment effects we found in the example. • Suppose, for a moment, that we had presented the four conditions in only one of these orders, the fourth sequence, say. • As you can see, we would have observed a different pattern of differences than the one we found when we averaged the scores over the four sequences. • When we consider the column averages, a4, for example, is the best of the four conditions, but when we consider the same condition in the fourth sequence, it is the worst. • You will see other discrepancies as well. In fact, if you examine the patterns of results obtained with each of the four sequences, you will discover that each pattern is different. These differences in outcome have occurred because each score reflects the combined effects of the treatment received and of practice. It is only when we bring together the data from the different counterbalancing sequences that the treatment effects are revealed, without contamination by the effects of practice.

  11. To show you how this occurs, I have plotted the scores in the figure as a function of condition and of testing position. • An inspection of the figure reveals a marked practice effect, but one that is the same for each treatment condition. • That is, there is no interaction between testing position and treatments; the same treatment effects are found at each of the testing positions. • The absence of interaction between position and treatments is critical in a within-subjects design. Plot of the Data from the Table

  12. Now consider the interaction depicted in the next figure. • You can see that differences among the treatment conditions tend to diminish during the course of testing. • When the four tasks are given as first tasks in the four different sequences, the differences among the treatments are at a maximum. • In contrast, when the four tasks are given as the last tasks in the different sequences, the differences among them very nearly disappear. An Example of a Treatment X Position Interaction

  13. Given this particular outcome, we still might be willing to generalize these findings in spite of the interaction, because the results are consistent. • If the interaction had exhibited reversals, where the ordering of the conditions changed from test to test, however, an overall F test of the main effect of treatments would be relatively meaningless. • It is always possible to analyze the data from the first testing session separately from those of the other sessions. • Performance at this point is completely uncontaminated by the effects of prior testing. • Of course, such an analysis involves a retreat to a single-factor design with independent groups of subjects, and we lose the advantages of the within-subjects design. • We will also be left with a relatively impoverished experiment (that is, small numbers of subjects in each treatment condition) and quite low power. • It might be possible to use portions of the data matrix in which the interaction is absent. • In this example, we might base an analysis on the first two testing sessions, changing the design from the original within-subjects design in which each subject receives all treatments to one in which subjects receive only two.

  14. Differential Carryover Effects • The treatment x position interaction depicted in the last figure is an example of differential carryover effects. [Diagram: first and second test measures, with half of the subjects receiving each treatment order.]

  15. Statistical Model And Assumptions • Here, we will consider the statistical model and special assumptions that underlie the analysis of the single-factor within-subjects design. • As you will see, even small deviations from these assumptions complicate the interpretation of the overall F test. • Linear Model • The linear model underlying the analysis of variance is usually specified by expressing the basic score Y_ij as a sum of a number of quantities; in its standard additive form, Y_ij = μ + α_j + π_i + ε_ij, where μ is the grand mean, α_j the effect of treatment a_j, π_i the effect of subject i, and ε_ij experimental error (which, with one observation per cell, also absorbs the A x S interaction). • From this basic statement, expected values of the mean squares for the sources we normally extract in the analysis are written in terms of population variance components.

  16. The Homogeneity Assumptions • The statistical analysis of within-subjects designs operates under the same distribution assumptions required of completely randomized designs, namely, normality, homogeneity of within-treatment variances, and independence. • In addition, however, certain assumptions are made concerning the correlations between the multiple measures obtained from the same subjects. • The Assumptions. Suppose we arrange the data from the (A x S) design into a set of smaller AS matrices formed by isolating pairs of treatment conditions. • There would be three such matrices for our numerical example, one consisting of levels a1 and a2, another of levels a1 and a3, and a third of levels a2 and a3. • For each pair of treatments, suppose we subtract the two scores for each of the subjects and then calculate the variances based on these three sets of difference scores. • The assumption is that these three variances of differences are equal in the population. • This assumption is more formally stated in terms of population within-treatment variances and of correlations between pairs of treatments and is referred to as the sphericity assumption (also called the circularity assumption).
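The difference-score check described above is easy to carry out. The sketch below uses a hypothetical 6-subject, 3-condition data matrix (not the lecture's numerical example): for each pair of treatment levels it computes the variance of the within-subject difference scores, the quantities that sphericity requires to be equal in the population.

```python
# Sketch of the pairwise difference-score variances behind the
# sphericity assumption. Data are hypothetical: rows = subjects,
# columns = treatment levels a1..a3.
import numpy as np
from itertools import combinations

scores = np.array([
    [12, 15, 21],
    [10, 14, 18],
    [ 8, 11, 16],
    [14, 18, 22],
    [ 9, 13, 17],
    [11, 16, 20],
], dtype=float)

pair_variances = {}
for i, j in combinations(range(scores.shape[1]), 2):
    d = scores[:, j] - scores[:, i]          # difference scores for this pair
    pair_variances[(i + 1, j + 1)] = d.var(ddof=1)
    print(f"a{i+1} vs a{j+1}: variance of differences = "
          f"{pair_variances[(i + 1, j + 1)]:.2f}")
```

With a = 3 levels there are three such matrices and three variances, exactly as the slide describes.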

  17. Implications of Violating the Sphericity Assumption. Violating the assumption of homogeneity of within-group variances in completely randomized designs does not affect our evaluation of F tests unless the ratio of the largest to the smallest variance is greater than 3. • In contrast, even minor violations of the sphericity assumption in repeated-measures designs can seriously affect our interpretation of F ratios. • More specifically, these violations produce sampling distributions of the F ratio that are not distributed as F when the null hypothesis is true, which means that the standard F tables cannot be used directly to judge the significance of an observed F. • Since it is known that when violations are present the actual sampling distribution shifts to the right of the central F distribution, the critical values of F obtained from the table in the appendix of any statistics book are too small. • That is, the actual critical values we should be using are larger than those listed in the F table. • Under these circumstances, the F test is said to be biased in a positive direction. • It could be the case, for example, that the tabled value of F at α = .05 actually represents a significance level that is greater than .05 - for example, α = .10. • If we do not make an adjustment in our rejection procedure, we will in effect be operating at a more "lenient" significance level than we had set originally.

  18. Correcting the Positive Bias. Several ways of solving the problem of positive bias have been proposed in the literature. • One solution is to perform the usual analysis of variance but to evaluate the observed F ratios against a new critical value that, for statistical convenience, assumes the presence of maximal heterogeneity. • In practice, this is accomplished easily by evaluating the F in this design with df_num = 1 and df_denom = n - 1, instead of df_num = a - 1 and df_denom = (a - 1)(n - 1). • Applied to our numerical example, in which a = 3 and n = 6, we would use F(1, 5) = 6.61 as our critical value, rather than F(2, 10) = 4.10. • Since the observed value of F was 4.72, we would not have declared the overall F significant if we had performed this corrective test. • This procedure is known as the Geisser-Greenhouse correction (Geisser & Greenhouse, 1958). • It is important to note that the mean squares obtained from the analysis are still calculated with the usual df's and not with these corrected ones. • The corrected df's are used only when we turn to the F table to find the critical value. • The main difficulty with the Geisser-Greenhouse correction is that it tends to overcorrect, reducing the Type I error below the desired level. • That is, the significance level may actually be α = .02 rather than the planned value (α = .05). Only if the heterogeneity is at its theoretical maximum will the new statistical test reflect the correct significance level. • In other words, the F ratios are now biased in a negative direction.
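The corrected critical values quoted in the lecture's example (a = 3, n = 6) can be reproduced directly from the F distribution. A sketch using SciPy:

```python
# Geisser-Greenhouse lower-bound correction: compute F with the usual
# mean squares, but compare it to the critical value for df = (1, n - 1)
# instead of df = (a - 1, (a - 1)(n - 1)).
from scipy.stats import f

a, n, alpha = 3, 6, 0.05

usual_crit = f.ppf(1 - alpha, a - 1, (a - 1) * (n - 1))  # F(2, 10)
corrected_crit = f.ppf(1 - alpha, 1, n - 1)              # F(1, 5)

F_obs = 4.72  # observed F from the lecture's numerical example
print(round(usual_crit, 2), round(corrected_crit, 2))    # 4.10  6.61
print("significant under usual test:", F_obs > usual_crit)
print("significant under corrected test:", F_obs > corrected_crit)
```

The observed F of 4.72 exceeds 4.10 but not 6.61, so the corrective test withholds significance, exactly as the slide concludes.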

  19. As an example of a simple repeated-measures design, we will consider a study of the effectiveness of relaxation techniques in controlling migraine headaches. We all know just how stressful the workplace can be and that headaches can drastically diminish productivity. The data described here are fictitious, but they are in general agreement with data collected by Blanchard, Theobald, Williamson, Silver, and Brown (1978), who ran a similar, although more complex, study. • In this experiment we have recruited nine employees who have reported suffering from a number of migraines and have asked them to record the frequency and duration of their migraine headaches. After 4 weeks of baseline recording during which no training was given, we had a 6-week period of relaxation training. (Each experimental subject participated in the program at a different time, so such things as changes in climate and holiday events should not systematically influence the data.) For our example we will analyze the data for the last 2 weeks of baseline and the last 3 weeks of training. The dependent variable is the duration (hours/week) of headaches in each of those 5 weeks. The data and the calculations are shown in the table.
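The lecture's data table is not reproduced in this transcript, but the analysis it describes can be sketched on hypothetical data of the same shape (subjects as rows, the 5 weekly duration measures as columns). The partition is the standard single-factor within-subjects one: SS_total splits into SS_subjects, SS_A (weeks), and the A x S residual, and F = MS_A / MS_AxS.

```python
# Sketch of the single-factor within-subjects (A x S) partition on a
# hypothetical data matrix (NOT the lecture's migraine table):
# rows = subjects, columns = the five repeated measures (weeks).
import numpy as np

Y = np.array([
    [21, 22,  8,  6,  6],
    [20, 19, 10,  4,  4],
    [17, 15,  5,  4,  5],
    [25, 30, 13, 12, 17],
    [30, 27, 13,  8,  6],
], dtype=float)

n, a = Y.shape
grand = Y.mean()

ss_total    = ((Y - grand) ** 2).sum()
ss_subjects = a * ((Y.mean(axis=1) - grand) ** 2).sum()  # individual differences
ss_A        = n * ((Y.mean(axis=0) - grand) ** 2).sum()  # weeks (treatment) effect
ss_AxS      = ss_total - ss_subjects - ss_A              # error term

F = (ss_A / (a - 1)) / (ss_AxS / ((a - 1) * (n - 1)))
print(round(F, 2))
```

Note how the subject variability is removed from the error term before F is formed, which is the efficiency advantage discussed earlier.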

  20. The Results …

  21. The Huynh-Feldt-corrected p value (.000) does not differ here from the uncorrected p value for the F statistic. Compound symmetry appears to be satisfied, and headache duration changes significantly over the five measures (trials). Specificity = the amount of a variable's variance not explained by the extracted factors. Also called specific variance or uniqueness.

  22. The polynomial tests (analysis of trend) indicate that most of the trials effect (change across time) can be accounted for by a linear trend across time. • In fact, the sum of squares for TIME is 2449.200, and the sum of squares for the linear trend is almost as large (2016.400). • Thus, the linear polynomial accounts for roughly 82.3% of the change across the repeated measures: 2016.40 / 2449.20 = .8233, or 82.3%.
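The proportion-of-variance arithmetic on this slide is simply the linear-trend sum of squares divided by the TIME sum of squares:

```python
# Proportion of the TIME effect captured by the linear trend,
# using the sums of squares reported on the slide.
ss_time = 2449.200
ss_linear = 2016.400

proportion = ss_linear / ss_time
print(round(proportion, 3))  # about 0.823, i.e., 82.3%
```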

  23. Grade Inflation [Handout]
