
STATISTICAL INFERENCE PART IX



Presentation Transcript


  1. STATISTICAL INFERENCE PART IX HYPOTHESIS TESTING - APPLICATIONS – MORE THAN TWO POPULATIONS

  2. INFERENCES ABOUT POPULATION MEANS • Example: • H0: μ1 = μ2 = μ3, where • μ1 = population mean for group 1 • μ2 = population mean for group 2 • μ3 = population mean for group 3 • H1: Not all of the means are equal.

  3. Assumptions • Each of the populations is normally distributed (or the sample sizes are large enough to invoke the CLT), with equal variances • Populations are independent • Cases within each sample are independent
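These assumptions are not checked anywhere in the slides; the following is a minimal sketch, assuming the Kenton data frame va1 introduced on slide 8 (numeric Sales column, factor Design column), of how one might look at them in R:

> # Normality within each group: Shapiro-Wilk test per design
> by(va1$Sales, va1$Design, shapiro.test)
> # Equal variances across groups: Bartlett's test (itself sensitive to non-normality)
> bartlett.test(Sales ~ Design, data = va1)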

  4. INFERENCES ABOUT POPULATION MEANS - ANOVA • Difference in means large relative to overall variability → F tends to be large • Difference in means small relative to overall variability → F tends to be small • Larger F-values typically yield more significant results. How large is large enough? We will compare with the tabulated value.
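The slides do not write out the test statistic; for a one-way ANOVA with k groups and N observations in total it is F = MSB / MSW = [SSB / (k - 1)] / [SSW / (N - k)], which under H0 follows an F distribution with (k - 1, N - k) degrees of freedom. A small sketch of looking up the tabulated value in R, using the group and sample counts of the Kenton example below:

> # 5% critical value for k = 4 groups and N = 19 observations
> qf(0.95, df1 = 4 - 1, df2 = 19 - 4)   # ≈ 3.29; reject H0 if the observed F exceeds this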

  5. INFERENCES ABOUT POPULATION MEANS • If the F test shows that there are significant differences between the means, then apply pairwise comparisons (e.g., pairwise t-tests) to see which one(s) are different. • Apply a multiple testing correction to control the Type I error rate.
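A minimal sketch of such pairwise comparisons with a Bonferroni correction, using base R's pairwise.t.test and assuming the Kenton data frame va1 from the next slides (the slides themselves use the multcomp approach shown later):

> # All pairwise t-tests between designs, with Bonferroni-adjusted p-values
> pairwise.t.test(va1$Sales, va1$Design, p.adjust.method = "bonferroni")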

  6. Example • The Kenton Food Company wants to test 4 different package designs for a new product. The designs are introduced in 20 randomly selected markets, which are similar to each other in terms of location and sales records. Due to a fire, one of these markets was removed from the study, leading to an unbalanced study design. Example taken from: Neter, J., Kutner, M.H., Nachtsheim, C.J., & Wasserman, W. (1996) Applied Linear Statistical Models, 4th edition, Irwin.

  7. Example Is there a difference among designs in terms of their average sales?

  8. Example
> va1=read.table("VAT1.txt",header=T)
> head(va1,3)
  Case Design Market Sales
1    1      1      1    11
2    2      1      2    17
3    3      1      3    16
> aov1 = aov(Sales ~ Design,data=va1)
> summary(aov1)
            Df Sum Sq Mean Sq F value    Pr(>F)
Design       1 483.08  483.08  31.186 3.289e-05 ***
Residuals   17 263.34   15.49
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The degrees of freedom are wrong! Since there are 4 different designs, the d.f. for Design should be 3.
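Not shown on the slide, but the root cause is easy to spot before fitting by inspecting the column types; a quick sketch with the same va1 data frame:

> # Design is stored as an integer, so aov() treats it as a single numeric
> # predictor (1 df, a linear trend) instead of a 4-level factor (3 df)
> str(va1)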

  9. Example
> class(va1[,2])
[1] "integer"
> va1[,2]=as.factor(va1[,2])
> aov1 = aov(Sales ~ Design,data=va1)
> summary(aov1)
            Df Sum Sq Mean Sq F value    Pr(>F)
Design       3 588.22 196.074  18.591 2.585e-05 ***
Residuals   15 158.20  10.547
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# or, alternatively:
> aov1 = aov(Sales ~ factor(Design),data=va1)
The 4 designs do not all have the same mean sales. But which one(s) are different?
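Not part of the original slides, but once the model is fitted the assumptions from slide 3 can also be revisited through residual diagnostics; a brief sketch:

> # Diagnostic plots for the fitted ANOVA: residuals vs fitted values
> # (equal variances) and a normal Q-Q plot (normality), among others
> par(mfrow = c(2, 2))
> plot(aov1)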

  10. Example
> library(multcomp)
> c1=glht(aov1, linfct = mcp(Design = "Tukey"))
> summary(c1)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = Sales ~ Design, data = va1)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0   -1.200      2.054  -0.584   0.9352
3 - 1 == 0    4.900      2.179   2.249   0.1545
4 - 1 == 0   12.600      2.054   6.135   <0.001 ***
3 - 2 == 0    6.100      2.179   2.800   0.0584 .
4 - 2 == 0   13.800      2.054   6.719   <0.001 ***
4 - 3 == 0    7.700      2.179   3.534   0.0141 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
The 4th design has higher average sales than all of the other designs. The 3rd design is marginally significantly better than the 2nd design (adjusted p = 0.0584).
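A small sketch, not on the slide, of visualizing these Tukey comparisons as simultaneous confidence intervals using multcomp's confint and plot methods for glht objects:

> # Simultaneous 95% confidence intervals for all pairwise design differences
> plot(confint(c1))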

# or, alternatively:
  11. Example # or, alternatively:
> TukeyHSD(aov1, "Design", conf.level=0.9)
• There are many functions in R for multiple testing correction. For instance, you can look into the “p.adjust” function in the stats package for other types of corrections (e.g., Bonferroni): supply raw p-values → obtain adjusted p-values.
• Different ANOVA types (e.g., 2-factor, repeated measures, ...) are available in R; reference: Ilk, O. (2011) R Yazilimina Giris, ODTU, Chp. 7
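A tiny sketch of the p.adjust workflow described above, with hypothetical raw p-values (not taken from this example):

> # Hypothetical raw p-values from several pairwise tests
> pvals = c(0.001, 0.020, 0.045, 0.300)
> p.adjust(pvals, method = "bonferroni")
[1] 0.004 0.080 0.180 1.000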
