STAT 3130 Guest Speaker: Ashok Krishnamurthy, Ph.D. Department of Mathematical and Statistical Sciences 24 January 2011 Correspondence: Ashok.Krishnamurthy@ucdenver.edu
Outline • A brief overview of STAT 3120 • One-way Analysis of Variance (ANOVA) • ANOVA example • Implementing ANOVA in R statistical programming language
A quick review of STAT 3120 • Catalog description • A SAS/SPSS based course aimed at providing students with a foundation in statistical methods, including review of descriptive statistics, confidence intervals, hypothesis testing, t-tests, basic Regression and Chi-Square tests.
Statistical Inference • Statistical inference is the process of drawing conclusions from data that are subject to random variation. • The conclusion of a statistical inference is a statistical proposition. • Estimating the mean and variance of a distribution. • Confidence interval estimation the mean and variance of a distribution. • Hypothesis tests on the mean and variance of a distribution.
Theory of point estimation • There is at least one parameter whose value is to be approximated on the basis of a sample. • The approximation is done using an appropriate statistic. • This statistic is called a point estimator for .
Hypothesis Testing • In the estimation problem there is no preconceived notion concerning the actual value of the parameter . • In contrast, when testing a hypothesis on , there is a preconceived notion concerning its value. • There are two theories, • The hypothesis proposed by the experimenter, denoted H1 • The negation of H1, denoted H0
Tests concerning the difference between two normal population means Independent samples of sizes n1and n2. (or) (or)
How Many Dependent Variables? What Type Of Outcome? How Many Predictors? If Categorical Predictor, Same Participants or Different in each category? Does Data Meet Parametric Assumptions? What type Of predictors? If Categorical Predictor, How many Categories? ANALYSIS TOOL Yes Independent T-test Different No Mann-Whitney Test Two Yes Paired T-test Same No Wilcoxon Rank Sum Categorical Yes One Way ANOVA Different No Kruskall Wallis Test Three + One Yes Repeated Measures ANOVA Same No Friedman’s ANOVA Yes Pearson Correlation or Regression Continuous Yes No Spearman Correlation or Kendall’s Tau Continuous Yes Ind. Factorial ANOVA or Regression Different Categorical Same No Factorial Repeated Measures ANOVA Yes Factorial Mixed ANOVA Both One Continuous Yes Multiple Regression Two + Both Yes Multiple Regression/ANCOVA Categorical Different Pearson Chi-Square or Likelihood Ratio One Continuous Logistic Regression Categorical Categorical Different Loglinear Analysis Two + Continuous Logistic Regression/Discriminant Both Different One Categorical Yes MANOVA Two + Continuous Categorical Yes Factorial MANOVA Two + Both Yes MANCOVA
Comparing several means • It is often necessary to compare many populations for a quantitative variable. • That is, we may want to compare the mean outcome over several populations to determine whether they have the same mean outcome and if not, where differences exist. • The standard method of analysis for these types of problems is the one-way Analysis of Variance, often abbreviated ANOVA
Can we just use several pairwise t-tests? • You might be tempted to use t-tests to make such comparisons. Why would this be difficult? # groups# pair-wise test 3 3 4 6 5 10 6 15 7 21 and so on….
One-way ANOVA Contd. • The method of ANOVA allow for comparison of the mean over more than two independent groups. • In particular, it tests the following hypotheses for comparing over k groups:
Assumptions of ANOVA • Populations have normal distributions • Population standard deviations are equal • Observations are independent, both within and between samples
One-way ANOVA Contd. • A rejection of the null hypothesis tells us that there is at least one group with a differing mean (though there could be more than one group that is different). • If we do not reject the null hypothesis, then we can only conclude that there is no significant difference among the groups.
One-way ANOVA procedure • Total variation in a measured response is partitioned into components that can be attributed to recognizable sources of variation. • For example, suppose we wish to investigate the sulfur content of 5 coal reams in a certain geographical region. Then we would test,
ANOVA Table * See page 682 for a general format of a One-Way ANOVA Table
ANOVA Example A biologist is doing research on elk in their natural Colorado habitat. Three regions are under study, each region having about the same amount of forage and natural cover. To determine if there is a difference in elk life spans between the three regions, a sample of 6, 5, and 6 mature elks from each region are tranquilized and have a tooth removed. A laboratory examination of the teeth reveals the ages of the elk. Results for each sample are given in the below table.
ANOVA Example Contd. Are there differences in age (elk life spans) over the different regions? If so, where are such differences occurring?
R Programming Language • Free software for statistical computing and graphics: http://www.r-project.org/ • Developed at Bell Laboratories • Considered a baby version of S/S+ • S+ sells for about $2000/year subscription
R code to run an ANOVA (elk data) > elk <- read.csv("elk.csv", sep=",", header=T) > boxplot(elk$Age~ elk$Region, ylab = "Age", xlab = "Region", main = "Boxplot for Elk data") > Elk.ANOVA <- aov(elk$Age ~ elk$Region) > summary(Elk.ANOVA)
R output for ANOVA (elk data) Source Df Sum Sq Mean Sq F value Pr(>F)___ elk$Region 2 48.000 24.000 5.0909 0.0218 * Residuals 14 66.000 4.714 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Next class: Post-hoc tests • Bonferronicorrection • Tukey’s HSD test • Fisher’s LSD • Newman-Keul test • Scheffe method
Fixed versus random effects • When we consider the effect of a factor, it can be either fixed or random. If we are interested in the particular levels of a factor, then it is fixed, e.g., gender, socio-economic class, fertilizer, drug. If we are not interested in the particular levels, but rather have selected the levels to make inference about the factor, then the factor is random. • For example, what if there was an effect of hospital on a person’s recovery? A random sample of hospitals would allow us to study this relationship. Here we are interested in whether there is a relationship rather than describing an effect for each individual hospital. These types of models are called random effects models.