This resource covers the concepts of normality in populations and statistical procedures related to t-distributions, ANOVA, and the Rank-Sum test. It discusses normal quantile plots to assess population normality, the effects of sample size on statistical distributions, and how to determine appropriate methods for comparing means across populations. The document outlines the steps for conducting ANOVA, the Tukey procedure, and the Rank-Sum alternative, emphasizing the use of these methods under various population conditions. A real-world example illustrates the Rank-Sum procedure, providing insights into hypothesis testing.
Announcements • Quiz 8 today • HW 7 due 5:00 PM • Bonus E due 5:00 PM • Projected schedule: Today: Normal Quantile & Rank Sum; R Oct. 26: Intro to F, One-Way ANOVA & Tukey Procedure; T Oct. 31: Blocked ANOVA & Two-Way ANOVA; R Nov. 2: Review day; T Nov. 7: EXAM II • Exam: all material since EXAM I; pencil, Scantron, reference page
Two Oversights • We have shown that for large samples (n > 30) the sampling distribution of the sample mean is approximately normal, and that substituting the sample SD for the population SD leads to the t-distribution • The t-distribution result also holds for samples of any size from a normal population • What about small samples from non-normal populations? • How do we tell if a population is normal?
Is it normal? • A normal quantile plot can help answer whether or not a population is normal. • (x,y) points are plotted where x is the “expected value” from a normal population and y is the observed value • If the population is normal, our sample should have a linear plot!
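As a rough illustration of how the (x, y) points can be built, the sketch below pairs each sorted observation with the standard normal quantile at the same position. The plotting position (i - 0.5) / n is an assumption made here; statistical packages use slightly different conventions. The function name is hypothetical, and the data are the old-method film times from the rank-sum example later in these slides.

```python
from statistics import NormalDist


def normal_quantile_points(sample):
    """Return (expected z-score, observed value) pairs for a normal quantile plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    points = []
    for i, value in enumerate(ordered, start=1):
        # Assumed plotting position (i - 0.5) / n; other conventions exist.
        expected = std_normal.inv_cdf((i - 0.5) / n)
        points.append((expected, value))
    return points


# Old-method developing times (sec) from the rank-sum example.
old_times = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
points = normal_quantile_points(old_times)
for expected, observed in points:
    print(f"{expected:6.2f}  {observed:4.1f}")
```

Plotting these pairs and checking how closely they follow a straight line is the visual check the slide describes.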
Is it normal? • Recall - our plot is based on a sample - so it won’t be exactly linear even if it is normal • However, obvious departure from normality is noticed in curvature of the plot • For example, this is right-skewed data (heavy right tail)
More Examples • This is a left-skewed data set - it has a long left tail • The curvature is obvious & the points are below the line on both ends
More Examples • This is symmetric with heavier tails than the normal • The points at the left are below the line and the points to the right are above it
One More Case • This data set came from a symmetric population with tails that are lighter than the normal curve • Points at the left are above the line and points at the right are below it
Tire Example • Using the tire data from HW 7 • Symmetric, no outliers • Light-tailed • Our method is conservative • We’re OK
When t-based procedures are OK • The t-based procedures will work for large samples (n > 30) from ANY population. • The t-based procedures will work for small samples from normal populations. • The t-based procedures are conservative (the Type I error probability is actually lower than α) for symmetric light-tailed populations. This is OK as well. • The t-based procedures fail for small samples taken from skewed or heavy-tailed populations.
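The four rules above can be written as a small lookup, which may make the order of the checks easier to see. This is only a sketch: the function name and the shape labels are hypothetical, and in practice the shape would be judged by eye from a normal quantile plot.

```python
def t_procedure_status(n, shape):
    """Summarize the slide's guidance for a sample of size n.

    shape is one of the hypothetical labels:
    'normal', 'symmetric-light', 'skewed', 'heavy-tailed'.
    """
    if n > 30:
        return "ok"            # large samples from any population
    if shape == "normal":
        return "ok"            # small samples from a normal population
    if shape == "symmetric-light":
        return "conservative"  # true Type I error rate is below alpha
    if shape in ("skewed", "heavy-tailed"):
        return "fails"         # consider the rank-sum procedure instead
    raise ValueError(f"unknown shape label: {shape!r}")


# A small, symmetric, light-tailed sample (like the tire data) is still usable.
print(t_procedure_status(8, "symmetric-light"))
```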
Rank-Sum Procedure • The rank-sum procedure was developed as an alternative to the two-sample t test and CI • The rank-sum procedure works for any type of population as long as the two population histograms have the same shape • It works for any sample size, but the two-sample t procedures should be used when both samples are large
Rank Sum Example • A new method is proposed to develop instant film (Polaroid); it is supposedly faster than the presently used method • We wish to compare the two methods • H0: γ_new − γ_old ≥ 0, HA: γ_new − γ_old < 0, α = 0.05 (γ = median) • Take 8 pictures using the old method. Times (sec): 8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5 • Take 8 pictures using the new method. Times (sec): 5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7
Rank Sum Example • Think of the two samples as one sample and rank the values • Lowest observation gets rank 1 • Ties get the average of what would have been their ranks • Add up the ranks for each group • This is the rank sum for the group • A low rank-sum for the new process supports HA in this case • Is it low enough to reject the null hypothesis? • There are tables for comparison
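The ranking steps can be sketched in a few lines of Python using the film data from the example. The helper name `average_ranks` is hypothetical; it pools the values, ranks them from lowest to highest, and gives tied values the average of the ranks they would have occupied.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


old = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
new = [5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7]

ranks = average_ranks(old + new)       # treat the two samples as one
rank_sum_old = sum(ranks[:len(old)])
rank_sum_new = sum(ranks[len(old):])
print(rank_sum_old, rank_sum_new)      # 78.5 57.5
```

The two 5.7 values would have held ranks 8 and 9, so each receives 8.5. The two rank sums always add up to N(N + 1)/2, here 16 × 17 / 2 = 136.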
Rank Sum Example • StataQuest (SQ) can compute a p-value • Enter the data with one column holding the group number and one column holding the times • Go to Statistics: Nonparametric Tests: Mann-Whitney • Choose the group and data variables appropriately • Output (process 0 is the old method and process 1 is the new method, as the rank sums show):

     process |  obs   rank sum   expected
    ---------+---------------------------
           0 |    8       78.5         68
           1 |    8       57.5         68
    ---------+---------------------------
    combined |   16        136        136

    unadjusted variance     90.67
    adjustment for ties     -0.13
                           ------
    adjusted variance       90.53

    Ho: median time(process==0) = median time(process==1)
             z =  1.104
    Prob > |z| =  0.2698

• SQ gives a two-tailed p-value. We want a one-tailed p-value, and the new process has the lower rank sum (the direction HA predicts), so we divide SQ’s p-value by two • Our p-value = 0.2698 / 2 = 0.1349 • Since 0.1349 > α = 0.05, we do not reject H0
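The numbers in the output above can be reproduced with the usual normal approximation for the rank sum. The sketch below is an assumption about what the software computes, chosen because it matches the printed values: expected rank sum n1(N + 1)/2, variance n1·n2·(N + 1)/12 reduced by a tie adjustment, and no continuity correction. The function name is hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def rank_sum_z(rank_sum, n1, n2, combined_values):
    """z statistic for one group's rank sum (normal approximation, tie-adjusted)."""
    total = n1 + n2
    expected = n1 * (total + 1) / 2
    variance = n1 * n2 * (total + 1) / 12
    # Each group of t tied values contributes t**3 - t to the adjustment.
    tie_term = sum(t**3 - t for t in Counter(combined_values).values())
    variance -= n1 * n2 * tie_term / (12 * total * (total - 1))
    return (rank_sum - expected) / sqrt(variance)


old = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
new = [5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7]

# 78.5 is the old-method rank sum reported in the output.
z = rank_sum_z(78.5, len(old), len(new), old + new)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
p_one_tailed = p_two_tailed / 2  # valid here because the data fall on HA's side

print(round(z, 3), round(p_two_tailed, 4), round(p_one_tailed, 4))
```

With only 8 observations per group the normal approximation is rough, which is why the slides also mention tables for small samples.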
Review and Preview • We developed t-based procedures for samples of any size from a normal population and for large samples from any type of population • We needed to determine whether a population is normal based on a sample; this led to the normal quantile plot • We needed procedures for small samples from non-normal populations; this led to the rank-sum procedure
Review and Preview • Next time, we discuss comparing means from more than two populations • This will involve finding sums of squares and the F-distribution
Normal Quantiles in SQ • After entering (or opening) the data, go to Graphs: One Variable: Normal Quantile Plot • Choose the variable of interest and click OK