750 likes | 780 Vues
STT 511-STT411: DESIGN OF EXPERIMENTS AND ANALYSIS OF VARIANCE Dr. Cuixian Chen. Chapter 3: Experiments with a single factor: ANOVA. Portland Cement Formulation (page 26). Review: Two (independent) sample scenario.
 
                
                E N D
Design & Analysis of Experiments 8E 2012 Montgomery STT 511-STT411:DESIGN OF EXPERIMENTS AND ANALYSIS OF VARIANCEDr. Cuixian Chen Chapter 3: Experiments with a single factor: ANOVA
Portland Cement Formulation (page 26) Review: Two (independent) sample scenario An engineer is studying the formulation of a Portland cement mortar. He has added a polymer latex emulsion during mixing to determine if this impacts the curing time and tension bond strength of the mortar. The experimenter prepared 10 samples of the original formulation and 10 samples of the modified formulation. Q: How many factor(s)? How many levels? Factor: mortar formulation; Levels: two different formulations as two treatments or as two levels. Design & Analysis of Experiments 8E 2012 Montgomery
Review: In Chap 2, we compare two conditions or treatments... • It is a single-factor experiment with two levels of the factor, where the factor is mortar formulation and the two levels are the two different formulation methods. • Many experiments of this type involve more than two levels of the factor. Chapter 3 focuses on methods for the design and analysis of single factor experiments with an arbitrary number a levels of the factor (or a treatments). • Eg: In clinical trials, we compare mean responses among a # of parallel-dose groups, or among various strata based on patients’ background info, such as race, age group, or disease severity. • We assume that experiment has been completely randomized. Design & Analysis of Experiments 8E 2012 Montgomery
What If There Are More Than Two Factor Levels? • There are lots of practical situations where there are: either more than two levels of interest, or there are several factors of simultaneous interest. • The t-test does not directly apply. • The analysis of variance (ANOVA) is the appropriate analysis “engine” for these types of experiments • The ANOVA was developed by Fisher in the early 1920s, and initially applied to agricultural experiments • Used extensively today for industrial experiments Design & Analysis of Experiments 8E 2012 Montgomery
Example 1-- Review: Design Of Experiments (Chap2) Completely randomized designs (CRD) Repeat Repeat Random Allocation Compare Response Repeat Design & Analysis of Experiments 8E 2012 Montgomery
An Example (See pg. 66) • An engineer is interested in investigating the relationship between the RF power setting and the etch rate for this tool. The objective of an experiment like this is to model the relationship between etch rate and RF power, and to specify the power setting that will give a desired target etch rate. • The response variable is etch rate. • She is interested in a particular gas (C2F6) and gap (0.80 cm), and wants to test four levels of RF power: 160W, 180W, 200W, and 220W. She decided to test five wafers at each level of RF power. • The experimenter chooses 4 levels of RF power 160W, 180W, 200W, and 220W • The experiment is replicated 5 times – runs made in random order. Design & Analysis of Experiments 8E 2012 Montgomery
An Example (See pg. 66) Design & Analysis of Experiments 8E 2012 Montgomery
An Example (See pg. 66) • Does changing the power change the mean etch rate? • Is there an optimum level for power? • We would like to have an objective way to answer these questions. • The t-test really doesn’t apply here – more than two factor levels. Design & Analysis of Experiments 8E 2012 Montgomery
The Analysis of Variance (Sec. 3.2, pg. 68) • In general, there will be alevels of the factor, or atreatments, and n replicates of the experiment, run in randomorder…a completely randomized design(CRD) • N = # of total runs. • We consider the fixed effects case…the random effects case will be discussed later. • Objective is to test hypotheses about the equality of the atreatment means. Design & Analysis of Experiments 8E 2012 Montgomery
Notation for ANOVA Design & Analysis of Experiments 8E 2012 Montgomery Chapter 3
ANOVA Effects model • The name “analysis of variance” stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment. • The basic single-factor ANOVA effects model is • where NID means Normal Independent Distributed. Design & Analysis of Experiments 8E 2012 Montgomery
ANOVA Means Models for the Data There are several ways to write a model for the data: And you would expect: Design & Analysis of Experiments 8E 2012 Montgomery
The Analysis of Variance (ANOVA) Let’s add and subtract one term here… • Total variabilityis measured by the total sum of squares: • The basic ANOVA partitioning is: Does this expression depend on each j? Design & Analysis of Experiments 8E 2012 Montgomery Chapter 3
The Analysis of Variance (ANOVA) • Total variabilityis measured by the total sum of squares: • The basic ANOVA partitioning is: Does this expression depend on each j? Design & Analysis of Experiments 8E 2012 Montgomery Chapter 3
The fundamental ANOVA identity • The total variability in the data, as measured by the total corrected sum of squares, can be partitioned into a sum of squares of the differences between the treatment averages and the grand average, plus a sum of squares of the differences of observations within treatments from the treatment average. • Now, the difference between the observed treatment averages and the grand average is a measure of the differences between treatment means, whereas the differences of observations within a treatment from the treatment average can be due only to random error. Design & Analysis of Experiments 8E 2012 Montgomery Chapter 3
The Analysis of Variance (ANOVA) • SSTreatments: Sum of squares due to treatments (i.e. between treatments) • SSE: Sum of squares due to errors (i.e. within treatments) • A large value of SSTreatments reflects large differences in treatment means • A small value of SSTreatments likely indicates no differences in treatment means • Formal statistical hypotheses are: OR Design & Analysis of Experiments 8E 2012 Montgomery
The Analysis of Variance (ANOVA) • While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared. • A mean square is a sum of squares divided by its degrees of freedom: • If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal. • If treatment means differ, the treatment mean square will be larger than the error mean square. Design & Analysis of Experiments 8E 2012 Montgomery
The Analysis of Variance is Summarized in a Table • Computing…see text, pp 69 • The reference distribution for F0 is the F(a-1, a(n-1)) distribution • Reject the null hypothesis (equal treatment means) if Design & Analysis of Experiments 8E 2012 Montgomery
page 74: Calculation of ANOVA table Design & Analysis of Experiments 8E 2012 Montgomery
ANOVA Table • Use the ANOVA to test: H0:µ1=µ2=µ3=µ4 v.s. H1: some means are different. Q: Find SSTotal, SStreat, and SSerror; then find MStreat, MSerror, F0 and p-value. Design & Analysis of Experiments 8E 2012 Montgomery Chapter 3
ANOVA Table: Example 3-1 Q: SSTotal = 72,209.75, SStreat = 66,870.55, first find SSerror; then find MStreat, MSerror, F0 and p-value. Design & Analysis of Experiments 8E 2012 Montgomery
SSTotal = 72,209.75, SStreat = 66,870.55, SSerror = 5339.20, MStreat = 22,290.18, MSerror = 333.70, F0 = 66.80, p-value = 2.881934e-09. ANOVA Table: Example 3-1 Design & Analysis of Experiments 8E 2012 Montgomery
ANOVA Table: Example 3-1 R Example? Pvalue = 1-pf(66.80, 3, 16) = 2.881934e-09 Design & Analysis of Experiments 8E 2012 Montgomery
ANOVA Table: Example 3-1 Critical.value =qf(0.95, 3, 16) =3.238872 The Reference Distribution: P-value Design & Analysis of Experiments 8E 2012 Montgomery
Example 3.1: ANOVA in R ## One way Anova in R ## tab3.1<-read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Etch-Rate.TXT",header=TRUE); y=tab3.1[,2]; ## assign y as response variable x=tab3.1[,1]; ## assign y as explainary variable anova(lm(y~factor(x))); ##or anova(lm(tab3.1[,2]~factor(tab3.1[,1]))); ##or summary(aov(tab3.1$EtchRate~factor(tab3.1$Power))) ## or summary(aov(tab3.1[,2]~factor(tab3.1[,1]))) ## the WRONG way: anova(lm(y~x)); ## Power is NOT categorical variable here. Design & Analysis of Experiments 8E 2012 Montgomery
Note 1: A factor is a vector object used to specify a discrete classification. If the treatment is made of numerical values, “factor” must be used on the treatment variable (tab3.1$Power) . Note 2: If the treatment is made of character values, “factor” may be ignored. Note 3: always check the “df” to verify the results! Note 4: some software has a total row. Design & Analysis of Experiments 8E 2012 Montgomery
Example 3.1: ANOVA in R with Data input ## Here is the way we code the data into R: ## x=c(rep(160, 5), rep(180, 5), rep(200, 5), rep(220, 5)); y=c(575, 542, 530, 539, 570, 565, 593, 590, 579, 610, 600, 651, 610, 637, 629, 725, 700, 715, 685, 710); anova(lm(y~factor(x))) Design & Analysis of Experiments 8E 2012 Montgomery
Extra eg: ANOVA in R with Data input data.tab2.1<-read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Tension-Bond.TXT ", header=TRUE); x<-data.tab2.1[,1]; y<-data.tab2.1[,2]; With Table 2.1, how can we input the data to R for ANOVA? ## Here is the way we code the data into R: ## x=c(rep("Modified", 10), rep("Unmodified", 10)); y=c(16.85,16.40,17.21,16.35,16.52,17.04,16.96,17.15,16.59,16.57,16.62,16.75,17.37,17.12,16.98,16.87,17.34,17.02,17.08,17.27); anova(lm(y~factor(x))) Design & Analysis of Experiments 8E 2012 Montgomery
The connection of ANVOA with 2-sample (pooled) t-test, output analysis ## two sample pooled t-test is a special case of ANOVA? ## data.tab2.1<-read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Tension-Bond.TXT ", header=TRUE); x<-data.tab2.1[,1]; y<-data.tab2.1[,2]; t.test(x,y,alternative ="two.sided",mu=0,var.equal = TRUE); ## see what ANOVA does... ## data2.1<-c(x,y); group<-c(rep("modfied",10),rep("unmodified",10)); ## group is a categorical variable here ## anova(lm(data2.1~group)); Design & Analysis of Experiments 8E 2012 Montgomery
The connection of ANVOA with 2-sample (pooled) t-test, output analysis Design & Analysis of Experiments 8E 2012 Montgomery
The connection of ANVOA with 2-sample (pooled) t-test, Another example… ## another example... ## bhh.tab3.1<-read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/BHH2-Data/tab0301.dat",header=TRUE); ## Yield column is our response variable ## x<-bhh.tab3.1$yield[1:10]; y<-bhh.tab3.1$yield[11:20]; t.test(x, y, alternative ="less", mu=0, var.equal = TRUE); ## see what ANOVA does... ## Yield=bhh.tab3.1$yield; Method=bhh.tab3.1$method anova(lm(Yield~Method)); ## OR ## anova(lm(bhh.tab3.1$yield~bhh.tab3.1$method)); Design & Analysis of Experiments 8E 2012 Montgomery
The connection of ANVOA with t-test, another output analysis Design & Analysis of Experiments 8E 2012 Montgomery
Review: ANOVA Effects model • The name “analysis of variance” stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment. • The basic single-factor ANOVA effects model is • where NID means Normal Independent Distributed. Design & Analysis of Experiments 8E 2012 Montgomery
Review: ANOVA Means Models for the Data There are several ways to write a model for the data: And you would expect: Design & Analysis of Experiments 8E 2012 Montgomery
Estimation of the Model Parameters Design & Analysis of Experiments 8E 2012 Montgomery
A confidence interval estimate of ith treatment mean df=N-a df=N-a Design & Analysis of Experiments 8E 2012 Montgomery
Example: Estimation of the Model Parameters and find Confidence Interval Q1: Find the estimates of the overall mean and each of the treatment effects. Q2: Find a 95 percent confidence interval on the mean of treatment 4 (220W of RF power). Q3: Find a 95 percent confidence interval on µ1- µ4. Design & Analysis of Experiments 8E 2012 Montgomery
Example: Estimation of the Model Parameters and find Confidence Interval Design & Analysis of Experiments 8E 2012 Montgomery
A little (very little) humor… Design & Analysis of Experiments 8E 2012 Montgomery
ANOVA calculations are usually done via computer • Text exhibits sample calculations from multiple very popular software packages, R, SAS Design-Expert, JMP and Minitab • See pages 102-105 • Text discusses some of the summary statistics provided by these packages Design & Analysis of Experiments 8E 2012 Montgomery
Model Adequacy Checking in the ANOVAText reference, Section 3.4, pg. 80 • Checking assumptions is important • Normality for residuals • Constant/equal variance • Independence • Have we fit the right model? • Later we will talk about what to do if some of these assumptions are violated Design & Analysis of Experiments 8E 2012 Montgomery
Model Adequacy Checking in the ANOVA • Examination of residuals (see text, Sec. 3-4, pg. 80) • Computer software generates the residuals • Residual plots are very useful • Normal probability plot of residuals. If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns. If the underlying error distribution is normal, this plot will resemble a straight line. In visualizing the straight line, place more emphasis on the central values of the plot than on the extremes. Design & Analysis of Experiments 8E 2012 Montgomery
Plot of Residuals in Time Sequence/run order for correlation Plot of Residuals Versus Fitted Values for model checking …………………. Normal QQ-plot Plot1 is helpful in detecting strong correlation between residuals. A tendency to have runs of positive and negative residuals indicates positive correlation. If model is correct and the assumptions are satisfied, plot3 should be structureless. Design & Analysis of Experiments 8E 2012 Montgomery
Model Adequacy, Checking in the ANOVA ## residual analysis ## tab3.1<-read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Etch-Rate.TXT",header=TRUE); y=tab3.1$EtchRate; x=tab3.1$Power; model3.1<- lm(y~factor(x)) res3.1<- resid(model3.1); fit3.1<-model3.1$fitted par(mfrow=c(1,3)) plot(c(1:20), res3.1) ; abline(h=0); qqnorm(res3.1); qqline(res3.1); plot(fit3.1,res3.1); dev.off(); ## OR you can do## par(mfrow=c(2,2)) plot(model3.1) ; dev.off(); ################## equal variance test ##################### boxplot(tab3.1$EtchRate~factor(tab3.1$Power)) bartlett.test(tab3.1$EtchRate~factor(tab3.1$Power)) ################### multiple comparison in R ################## tapply(tab3.1$EtchRate, factor(tab3.1$Power),mean) pairwise.t.test(tab3.1$EtchRate, factor(tab3.1$Power)) Design & Analysis of Experiments 8E 2012 Montgomery
Statistical Tests for Equality of Variance: Bartlett’s test http://en.wikipedia.org/wiki/Bartlett's_test ln(10)=2.302585 Design & Analysis of Experiments 8E 2012 Montgomery
Statistical Tests for Equality of Variance: Bartlett’s test • Use Bartlett’s Test to test for: • H0: σ1=σ2=σ3= σ4, v.s. • Ha: some σi are different. Design & Analysis of Experiments 8E 2012 Montgomery
Statistical Tests for Equality of Variance: Bartlett’s test • Pvalue=1-pchisq(0.43, 3)=0.9339778 Design & Analysis of Experiments 8E 2012 Montgomery
Bartlett’s test in R for equal variances (a>=2) ## equal variance test ## tab3.1<-read.table("http://people.uncw.edu/chenc/STT411/dataset%20backup/Etch-Rate.TXT",header=TRUE); y=tab3.1[,2]; x=tab3.1[,1]; boxplot(y ~ factor(x)) bartlett.test(y ~ factor(x)) ## OR you can use ## boxplot(tab3.1$EtchRate~factor(tab3.1$Power)) bartlett.test(tab3.1$EtchRate~factor(tab3.1$Power)) Design & Analysis of Experiments 8E 2012 Montgomery
Post-ANOVA Comparison of Means • The ANOVA tests the hypothesis of equal treatment means • Assume that residual analysis is satisfactory • In ANOVA, if H0 is rejected, we don’t know whichspecificmeans are different. • (Review) Checking ANOVA assumptions is important: • Normality • Constant variance • Independence • Have we fit the right model? OR Design & Analysis of Experiments 8E 2012 Montgomery
Post-ANOVA Comparison of Means • Determining which specific means differ, following an ANOVA is called the multiple pairwise comparisons b/w all a populations means. • Suppose we are interesting in comparing all pairs of a treatment means and that the null hypothesis that we wish to test H0: µi=µj v.s. Ha: µi ≠ µj for all i≠j. • There are lots of ways to do this…see text, Section 3.5, pg. 91. • We will use Tukey's ‘Honest Significant Difference’ method Design & Analysis of Experiments 8E 2012 Montgomery