Chapter 9: Factorial Experiments
Section 9.1: One-Factor Experiments • In general, a factorial experiment involves several variables. • One variable is the response variable, which is sometimes called the outcome variable or the dependent variable. • The other variables are called factors. • The question addressed in a factorial experiment is whether varying the levels of the factors produces a difference in the mean of the response variable.
More Basics • If there is just a single factor, then we say that it is a one-factor experiment. • The different values of the factor are called the levelsof the factor and can also be called treatments. • The objects upon which measurements are made are called experimental units. • The units assigned to a given treatment are called replicates.
Fixed Effects or Random Effects • If particular treatments are chosen deliberately by the experimenter, rather than at random from a larger population of treatments, then we say that the experiment follows a fixed effects model. An example is when we are testing four tomato fertilizers, then we have four treatments that we have chosen to test. • In some experiments, treatments are chosen at random from a population of possible treatments. In this case, the experiment is said to follow a random effects model. An example is an experiment to determine whether or not different flavors of ice cream melt at different speeds. We test a random sample of three flavors for a large population of flavors offered to the customer by a single manufacturer. • The methods of analysis for these two models is essentially the same, although the conclusions to be drawn from them differ.
Completely Randomized Experiment • Definition: A factorial experiment in which experimental units are assigned to treatments at random, with all possible assignments being equally likely, is called a completely randomized experiment. • In many situations, the results of an experiment can be affected by the order in which the observations are taken. • The ideal procedure is to take the observations in random order. • In a completely randomized experiment, it is appropriate to think of each treatment as representing a population, and the responses observed for the units assigned to that treatment as a simple random sample from that population.
Treatment Means • The data from the experiment thus consists of several random samples, each from a different population. • The population means are called treatment means. • The questions of interest concern the treatment means – whether they are all equal, and if not, which ones are different, how big the differences are, and so on. • To make a formal determination as to whether the treatment means differ, a hypothesis test is needed.
One-Way Analysis of Variance • We have I samples, each from a different treatment. • The treatment means are denoted 1,…, I. • The sample sizes are denoted J1,…, JI. • The total number in all the samples combined is denoted by N, N = J1+…+ JI. • The hypothesis that we wish to test is H0: 1 =…= I versus H1: two or more of the i are different • If there were only two samples, we might use the two-sample t test to test for the null hypothesis. • Since there are more than two samples, we use a method known as one-way analysis of variance (ANOVA).
Notation Needed • Since there are several samples, we use a double subscript to denote the observations. • Specifically, we let Xij denote the jth observation in the ith sample. • The sample mean of the ith sample: • The sample grand mean:
Example 1 Question: For the data in Table 1, find I, J1,…,JI , N, X23 , , and TABLE 1 Flux Sample Values Mean SD A 250, 264, 256, 260, 239 253.8 9.7570 B 263, 254, 267, 265, 267 263.2 5.4037 C 257, 279, 269, 273, 277 271.0 8.7178 D 253, 258, 262, 264, 273 262.0 7.4498
Example 1 cont. Answer: There are four samples, so I = 4. Each sample contains five observations, so J1= J2= J3= J4= 5. The total number of observations is N = 20. The quantity X23 is the third observation in the second sample, which is 267. The quantity is the sample mean of the third sample. This value is presented in the table and is 271.0. We can use the equation on a previous slide:
Treatment Sum of Squares • The variation of the sample means around the sample grand mean is measured by a quantity called the treatment sum of squares (SSTr), which is given by • Note that each squared distance is multiplied by the sample size corresponding to its sample mean, so that the means for the larger samples count more. • SSTr provides an indication of how different the treatment means are from each other. • If SSTr is large, then the sample means are spread widely, and it is reasonable to conclude that the treatment means differ and to reject H0. • If SSTr is small, then the sample means are all close to the sample grand mean and therefore to each other, so it is plausible that the treatment means are equal.
Error Sum of Squares • In order to determine, whether SSTr is large enough to reject H0, we compare it to another sum of squares, called the error sum of squares (SSE). • SSE measures the variation in the individual sample points around their respective sample means. • This variation is measured by summing the squares of the distances from each point to its own sample mean. • SSE is given by
Comments • The term that is squared in the formula for SSE is called a residual. • Therefore, SSE is the sum of the squared residuals. • SSE depends only on the distances of the sample points from their own means and is not affected by the location of the treatment means relative to one another. • So, SSE measures only the underlying random variation in the process being studied. • An easier computational formula is:
Example 1 cont. For the data in Table 1, compute SSTr and SSE.
Assumptions for the One-Way ANOVA The standard one-way ANOVA hypothesis test are valid under the following conditions: • The treatment populations must be normal. • The treatment populations must all have the same variance, which we will denote by 2. To check: • Look at a normal probability plot for each sample and see if the assumption of normality is violated. • The spreads of the observations within the various samples can be checked visually by making a residual plot.
When Assumptions Are Met • We can compute the means of SSE and SSTr. • The mean of SSTr depends on whether H0 is true, because SSTr tends to be smaller when H0 is true and larger when H0 is false. • The mean of SSTr satisfies the condition SSTr = (I – 1)2, when H0 is true SSTr > (I – 1)2, when H0 is false. • The likely size of SSE, and its mean, does not depend on whether H0 is true. The mean of SSE is given by SSE = (N – I)2.
The F test for One-Way ANOVA To test H0: 1 =…= I versus H1: two or more of the i are different • Compute SSTr. • Compute SSE. • Compute MSTr = SSTr/(I – 1) and MSE = SSE/(N – I). • Compute the test statistic: F = MSTr / MSE. • Find the P-value by consulting the F table (Table A.7 in Appendix A) with I – 1 and N – I degrees of freedom. Note: The total sum of squares, SST = SSTr + SSE.
Example 1 cont. For the data in Table 1, compute MSTr, MSE, and F. Find the P-value for testing the null hypothesis that all the means are equal. What do you conclude?
Confidence Intervals for the Treatment Means A level 100(1 - )% confidence interval for i is given by
Example 1 cont. Find a 95% confidence interval for the mean hardness of welds produced with flux A.
Computer Output One-way ANOVA: A, B, C, D Source DF SS MS F P Factor 3 743.40 247.800 3.87 0.029 Error 16 1023.60 63.975 Total 19 1767.00 S = 7.998 R-Sq = 42.07% R-Sq(adj) = 31.21% Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev ----+---------+---------+---------+----- A 5 253.80 9.76 (-------*------) B 5 263.20 5.40 (------*-------) C 5 271.00 8.72 (-------*-------) D 5 262.00 7.45 (-------*-------) ----+---------+---------+---------+----- 250 260 270 280 Pooled StDev = 8.00 • The fifth column is the one titled “F”. This gives the test statistic that we just discussed. • The column “P” presents the P-value for the F test. • Below the ANOVA table, the value “S” is the pooled estimate of the error standard deviation, . • Sample means and standard deviations are presented for each treatment group, as will as a graphic that illustrates a 95% CI for each treatment mean.
Checking Assumptions • To check normality, we construct a normal probability plot. If the sample sizes are large enough, then a separate probability plot can be constructed for each sample. • If this assumption is satisfied, then the plot of the residuals should plot approximately on a straight line. • The assumption of equal variances is difficult to check with only a few observations in each sample. If it seems reasonable, then it can be concluded that this assumption is satisfied. This can be checked with the residual plot and examining how spread out each sample is.
Balanced versus Unbalanced Designs • When equal numbers of units are assigned to each treatment, the design is said to be balanced. • With a balanced design, the effect of unequal variances is generally not great. • With an unbalanced design, the effect of unequal variances can be substantial. • The more unbalanced the design, the greater the effect of unequal variances.
Random Effects Model • If the treatments are chosen at random from a population of possibly treatments, then the experiments are said to follow a random effects model. When the treatments are deliberately chosen by the experimenter, the experiments are said to follow a fixed effects model. • In a random effects model, the interest is in the whole population of possible treatments and there is no particular interest in the ones that happen to be chosen for the experiment. • There is an important difference in interpretation between the results of a fixed effects model and those of a random effects model. • In a fixed effects model, the only conclusions that can be drawn are conclusions about the treatments actually used in the experiment.
Conclusions with Random Effects Model • In a random effects model, however, since the treatments are a simple random sample from a population of treatments, conclusions can be drawn concerning the whole population, including treatments not actually used in the experiment. • In the random effects model, the null hypothesis of interest is H0: the treatment means are equal for every level in the population. • Although the null hypothesis for the random effects model differs from that of the fixed effects model, the hypothesis test is exactly the same.
Section 9.2: Pairwise Comparisons in One-Factor Experiments • In a one-way ANOVA, an F test is used to test the null hypothesis that all the treatment means are equal. • If this hypothesis is rejected, we can conclude that the treatment means are not all the same. • But the test does not tell us which ones are different from the rest. • Sometimes an experimenter has in mind two specific treatments, i and j, and wants to study the difference i - j.
More on Pairwise Comparisons • In this case, a method known as Fisher’s least significant difference (LSD) method is appropriate and can be used to construct confidence intervals for i - j or to test the null hypothesis that i - j = 0. • At other times, the experimenter may want to determine all the pairs of means that can be concluded to differ from each other. • In this case a type of procedure called a multiple comparisons method must be used. We will discuss two such comparisons, the Bonferroni method and the Tukey-Kramer method.
Fisher’s Least Significant Difference Method for Confidence Intervals and Hypothesis Tests • The Fisher’s least significant difference confidence interval, at level 100(1-)%, for the difference i - j is • To test the null hypothesis H0: i - j = 0, the test statistic is • If H0 is true, this statistic has a Student’s t distribution with N – I degrees of freedom. Specifically, if then is rejected at level .
Example 1 cont. In the weld experiment, hardness measurements were made for five welds from each of four fluxes, A, B, C, and D. The sample mean hardness values are 253.8, 263.2, 271.0, and 262.0, respectively. Before the experiment was performed, the carbon contents of the fluxes were measured. Flux B had the lowest carbon content and flux C had the highest. The experimenter is therefore particularly interested in comparing the hardness obtained with these two fluxes. Find a 95% confidence interval for the difference in mean hardness between welds produced with flux B and those produced with flux C. Can we conclude that the two means differ?
Output • Consider the output on the following slide. • The output presents the 95% Fisher LSD CI’s for each difference between treatment means. • The values labeled “Center” are the differences between pairs of treatment means. • The quantities labeled “Lower” and “Upper” are the lower and upper bounds of the confidence interval.
Fisher LSD Confidence Intervals Fisher 95% Individual Confidence Intervals All Pairwise Comparisons Simultaneous Confidence Level = 81.11% A subtracted from: Lower Center Upper B -1.324 9.400 20.124 C 6.476 17.200 27.924 D -2.524 8.200 18.924 B subtracted from: Lower Center Upper C -2.924 7.800 18.524 D 11.924 -1.200 9.524 C subtracted from: Lower Center Upper D -19.724 -9.000 1.724
Simultaneous Tests • The simultaneous confidence level of 81.11% in the previous output indicates that although we are 95% confident that any given confidence interval contains it true difference in means, we are only 81.11% confident that all the confidence intervals contain their true differences. • When several confidence intervals or hypothesis tests are to be considered simultaneously, the confidence intervals must be wider, and the criterion for rejecting the null hypothesis more strict, than in situations where only a single interval or test is involved.
More on Multiple Intervals • In this situations, multiple comparison methods are used to produce simultaneous confidence intervals or simultaneous hypothesis tests. • If level 100(1-)% simultaneous confidence intervals are constructed for differences between every pair of means, then we are confident at the 100(1- )% level that every confidence interval contains the true difference. • If simultaneous hypothesis tests are conducted for all null hypotheses of the form H0: i - j = 0, then we may reject, at level , every null hypothesis whose P-value is less than .
The Bonferroni Method for Simultaneous Confidence Intervals • Assume that C differences of the form i - jare to be considered. The Bonferroni simultaneous confidence intervals, at level 100(1-)%, for the C differences i - j are • We are 100(1-)% confident that the Bonferroni confidence intervals contain the true value of the difference i - j for all C pairs under consideration.
Bonferroni Simultaneous Hypothesis Tests • To test the C null hypotheses of the form H0: i - j = 0, the test statistics are • To find the P-value for each test, consult the Student’s t table with N – I degrees of freedom, and multiply the P-value found there by C. • Specifically, if then H0is rejected at level .
Example 1 cont. For the weld data, use the Bonferroni method to determine which pair of fluxes, if any, can be concluded, at the 5% level, to differ in their effect on hardness.
Disadvantages • Although easy to use, the Bonferroni method has the disadvantage that as C becomes large, the confidence intervals become very wide, and the hypothesis tests have low power. • The reason for this is that the Bonferroni method is a general method, not specifically designed for analysis of variance or for normal populations. • In many cases C is fairly large, in particular it is often desired to compare all pairs of means.
Alternative Method • In these cases, a method called the Tukey-Kramer method is superior, because it is designed for multiple comparisons of means of normal populations. • The Tukey-Kramer method is based on a distribution called the Studentized range distribution, rather that on the Student’s t distribution. • The Studentized range distribution has two values for degrees of freedom, which for the Tukey-Kramer method are I and N – I. • The Tukey-Kramer method uses the 1 - quantile of the Studentized range distribution with I and N – I degrees of freedom, this quantity is denoted qI,N –I , .
Tukey-Kramer Method for Simultaneous Confidence Intervals • The Tukey-Kramer level 100(1-)% simultaneous confidence intervals for all differences i - j are • We are 100(1-)% confident that the Tukey-Kramer confidence intervals contain the true value of the difference i - j for every i and j.
Tukey-Kramer Method for Simultaneous Hypothesis Tests • To test all the null hypotheses of the form H0: i - j = 0 simultaneously, the test statistics are • The P-value for each test is found by consulting the Studentized range table (Table A.8) with I and N – I degrees of freedom. • For every pair of levels i and j for which the null hypothesis H0: i - j = 0 is rejected at level .
Output • The output on the following slide presents the 95% Tukey Simultaneous CI’s for each pair of treatment means. • The values labeled “Center” are the differences between pairs of treatment means. • The quantities labeled “Lower” and “Upper” are the lower and upper bounds of the confidence interval. • We are 95% confident that every one of these confidence intervals contains the true difference in treatment means. The “Individual confidence level” is 98.87%, this means that we are 98.87% confident that a specific confidence interval contains its true value.
Tukey-Kramer Confidence Intervals Tukey 95% Simulataneous Confidence Intervals All Pairwise Comparisons Individual Confidence Level = 98.87% 1 subtracted from: Lower Center Upper 2 109.4 385.3 661.1 3 21.4 312.3 603.1 4 -94.6 170.9 436.4 2 subtracted from: Lower Center Upper 3 -348.9 -73.0 202.9 4 -463.4 -214.3 34.7 3 subtracted from: Lower Center Upper 4 -406.8 -141.3 124.1
Example 1 cont. For the weld data, which pair of fluxes, if any, can be concluded, at the 5% level, to differ in their effect on hardness?
Section 9.3: Two-Factor Experiments • In one-factor experiments, the purpose is to determine whether varying the level of a single factor affects the response. • Many experiments involve varying several factors, each of which may affect the response. • In this section, we will discuss the case in which there are two factors. • The experiments are called two-factor experiments. • If one factor is fixed and one is random, then we say that the experiment follows a mixed model. • In the two factor case, the tests vary depending on whether the experiment follows a fixed effects model, a random effects model, or a mixed model. Here we discuss methods for experiments that follow a fixed effects model.
Example Table 2: Yields of a Chemical Process with Various Combinations of Reagent and Catalyst Reagent Catalyst 1 2 3 A 86.8, 82.4, 86.7, 83.5 93.4, 85.2, 94.8, 83.1 77.9, 89.6, 89.9, 83.7 B 71.9, 72.1, 80.0, 77.4 74.5, 87.1, 71.9, 84.1 87.5, 82.7, 78.3, 90.1 C 65.5, 72.4, 76.6, 66.7 66.7, 77.1, 76.7, 86.1 72.7, 77.8, 83.5, 78.8 D 63.9, 70.4, 77.2, 81.2 73.7, 81.6, 84.2, 84.9 79.8, 75.7, 80.5, 72.9 • Four runs of the process were made for each combination of three reagents and four catalysts. • There are two factors, the catalyst and reagent. • The catalyst is called the row factorsince its values varies from row to row in the table. • The reagent is called the column factor. • We will refer to each combination of factors as a treatment(some call these treatment combinations). • Recall that the units assigned to a given treatment are called replicates. • When the number of replicates is the same for each treatment , we will denote this number by K. • When observations are taken on every possible treatment, the design is called a complete design or a full factorial design.
Notes • Incomplete designs, in which there are no data for one or more treatments, can be difficult to interpret. • When possible, complete designs should be used. • When the number of replicates is the same for each treatment, the design is said to be balanced. • With two-factor experiments, unbalanced designs are much more difficult to analyze than balanced designs. • The factors may be fixed or random.
Set-Up • In a completely randomized design, each treatment represents a population, and the observation on that treatment are a simple random sample from that population. • We will denote the sample values for the treatment corresponding to the ith level of the row factor and the jth level of the column factor by Xij1,…, XijK. • We will denote the population mean outcome for this treatment by ij. • The values ij are often called thetreatment means. • In general, the purpose of a two-factor experiment is to determine whether the treatment means are affected by varying either the row factor, the column factor, or both. • The method of analysis appropriate for two-factor experiments is called the two-way analysis of variance.
Parameterization for Two-Way Analysis of Variance • For any level i of the row factor, the average of all the treatment means ij in the ithrow is denoted . We express in terms of the treatment means as follows: • Similarly, for level j of the column factor, the average of all the treatment means ij in the jthrow is denoted . We express in terms of the treatment means as follows: • We define the population grand mean, denoted by , which represents the average of all the treatment means ij. The population grand mean can be expressed in terms of the previous means:
More Notation • Using the quantities we just defined, we can decompose the treatment mean ij as follows: • This equation expresses the treatment mean ij as a sum of four terms. In practice, simpler notation is used for the three rightmost terms in the above equation.
Interpretations • The quantity is the population grand mean, which is the average of all the treatment means. • The quantity i is called the ith row effect. It is the difference between the average treatment mean for the ith level of the row factor and the population grand mean. The value of i indicates the degree to which the ith level of the row factor tends to produce outcomes that are larger or smaller than the population mean. • The quantity j is called the jth column effect. It is the difference between the average treatment mean for the jth level of the column factor and the population grand mean. The value of j indicates the degree to which the jth level of the column factor tends to produce outcomes that are larger or smaller than the population mean.