Comparing Samples


Last Time • I talked about what could go wrong in an experiment where you compared a sample mean against a population with a known population mean and standard deviation. • You build a sampling (null) distribution and set an alpha level using the population values. The population has a fixed amount of variability (SD), but the variability in the sample statistics depends on the sample size: the smaller the sample, the more the sample statistics vary. For the mean, that variability is the standard error, $\sigma/\sqrt{n}$.

Example with SE of Means • SAS EG example of simulating a population • Draw a single sample, get the mean • Get many means • Calculate the mean and SD from these means • Compare vs. theoretical distribution
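The original demo was done in SAS EG. The sketch below follows the same steps in Python as an assumed stand-in. It simulates a hypothetical normal population (mean 100, SD 15, chosen only for illustration) and draws many samples of several sizes. It then compares the SD of the sample means with the theoretical standard error $\sigma/\sqrt{n}$.

```python
import random
import statistics

def simulate_means(pop_mean=100.0, pop_sd=15.0, pop_size=50_000,
                   sample_sizes=(5, 25, 100), reps=2_000, seed=1):
    """Simulate a population, draw many samples of each size, and compare
    the SD of the sample means with the theoretical SE, sigma / sqrt(n)."""
    rng = random.Random(seed)
    # Step 1: simulate a (hypothetical) normal population.
    population = [rng.gauss(pop_mean, pop_sd) for _ in range(pop_size)]
    sigma = statistics.pstdev(population)
    results = {}
    for n in sample_sizes:
        # Steps 2-3: draw many samples and keep each sample's mean.
        means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
        results[n] = {
            # Step 4: summarize the distribution of the means.
            "mean_of_means": statistics.fmean(means),
            "sd_of_means": statistics.stdev(means),
            # Step 5: the theoretical standard error for comparison.
            "theoretical_se": sigma / n ** 0.5,
        }
    return results

for n, r in simulate_means().items():
    print(f"n={n:3d}  mean={r['mean_of_means']:.2f}  "
          f"SD of means={r['sd_of_means']:.2f}  theory={r['theoretical_se']:.2f}")
```

The SD of the simulated means should track $\sigma/\sqrt{n}$ closely and shrink as the sample size grows.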

Critical Cut Points • Given the hypothesized mean, the standard deviation, and the sample size, you then determine how unusual a sample mean must be before you reject your null hypothesis. These cut points are set on the null distribution.
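Here is a minimal sketch of the two-tailed cut points when the population SD is known (the Z setting). The values below are hypothetical: mean 100, SD 15, n = 25, alpha = 0.05. `NormalDist` is from Python's standard `statistics` module.

```python
from statistics import NormalDist

def critical_cut_points(mu0, sigma, n, alpha=0.05):
    """Two-tailed cut points for a sample mean under the null hypothesis,
    assuming the population SD is known."""
    se = sigma / n ** 0.5                      # standard error of the mean
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    return mu0 - z * se, mu0 + z * se

lower, upper = critical_cut_points(mu0=100, sigma=15, n=25)
print(f"Reject H0 if the sample mean is below {lower:.2f} or above {upper:.2f}")
```

With these inputs the standard error is 3, so the cut points land at roughly 94.12 and 105.88.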

Alpha and Beta Again • If your sample data came from a different population, your best guess is that the sample means for this (sub)population are centered around your sample mean. Its distribution will therefore not completely overlap the null distribution. The area under this alternative distribution that falls beyond the critical cut points (in the null distribution's rejection region) is the power. The remaining area, inside the cut points, is beta.
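The sketch below computes power and beta for a two-tailed Z test. It assumes the same hypothetical null values as before (mean 100, SD 15, n = 25) and a hypothetical true mean of 106. Power is the area of the alternative sampling distribution that lies outside the null cut points.

```python
from statistics import NormalDist

def power_and_beta(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-tailed Z test when the true mean is mu1 instead of mu0."""
    se = sigma / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * se, mu0 + z * se   # cut points from the null distribution
    alt = NormalDist(mu1, se)                    # sampling distribution under the alternative
    # Power: area of the alternative distribution in the rejection region.
    power = alt.cdf(lower) + (1 - alt.cdf(upper))
    return power, 1 - power                      # (power, beta)

power, beta = power_and_beta(mu0=100, mu1=106, sigma=15, n=25)
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Here the alternative mean sits just past the upper cut point, so power comes out only slightly above one half. Increasing `n` narrows both distributions and raises the power.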

Graphical Example • Here is an R example of cut points in the theoretical distribution and how the alternative distribution overlaps with the null distribution:

Comparing Means • In reality, you will almost never have a known population mean and standard deviation and compare your sample against that. You will likely have a hypothetical population mean and you will want to see if your sample was likely to have come from the set of sample means distributed around that hypothetical population mean. Conceptually it is the same task but the shape of the sampling distribution is different when you don’t know the population SD.

Gosset worked out the distribution that applies when you compare means and estimate the population SD from the sample. • He figured it out while working at a brewery that would not let him publish under his own name, so he published under the pen name "Student". The distribution is now known as Student's T. • The T distribution describes the standardized sample means when you don't know the population standard deviation. Estimating the SD adds uncertainty, and that shows up as a wider, fatter-tailed distribution.
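A small simulation makes the extra uncertainty concrete. The sketch draws hypothetical samples of n = 5 from a standard normal population where the null hypothesis is true. It then counts how often the statistic exceeds the Z cutoff of 1.96, first using the known SD (Z) and then using the sample SD (T).

```python
import random
import statistics

def false_alarm_rates(n=5, reps=10_000, cutoff=1.96, seed=2):
    """Under a true null, count how often |statistic| exceeds the Z cutoff
    with the known sigma (Z) versus the estimated sample SD (T)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        z = (mean - mu) / (sigma / n ** 0.5)                     # known SD
        t = (mean - mu) / (statistics.stdev(sample) / n ** 0.5)  # estimated SD
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / reps, t_hits / reps

z_rate, t_rate = false_alarm_rates()
print(f"|Z| > 1.96: {z_rate:.3f}   |T| > 1.96: {t_rate:.3f}")
```

The Z rate should sit near 5%. The T rate comes out noticeably higher (around 12% for n = 5), which is why T needs wider cut points.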

Student’s T • Figure: the T distribution with 5 df

Asymptotic T • As your sample size gets bigger, the T distribution looks more and more like a Z distribution. By N = 30 the difference is small for most practical purposes: the two-tailed 5% critical value is about 2.05, compared with 1.96 for Z.
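To watch the convergence without a statistics package, the sketch below integrates the Student's T density numerically with Simpson's rule. It then finds the 97.5th percentile by bisection. This is a hand-rolled illustration chosen for self-containment; a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's T with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density from 0 to x."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, prob=0.975):
    """Upper critical value found by bisection on t_cdf."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
for df in (5, 10, 29, 100):
    print(f"df={df:3d}  t crit={t_critical(df):.3f}  (z crit={z_crit:.3f})")
```

The critical values should fall from about 2.571 at 5 df to about 2.045 at 29 df and 1.984 at 100 df, closing in on 1.960.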

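Calculate It • Doing the t-test is easy. First load the data into an analysis package, graph it, and then run the one-sample t-test. • See the example SAS Enterprise Guide project. • The formula for the statistic sure looks familiar. It is the Z formula with the sample SD $s$ in place of $\sigma$: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, with $n - 1$ degrees of freedom.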
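The project itself uses SAS Enterprise Guide. As a rough Python equivalent, this sketch computes the one-sample statistic by hand on five made-up measurements tested against a hypothesized mean of 5.

```python
import statistics

def one_sample_t(data, mu0):
    """One-sample t statistic: the Z formula with the sample SD s
    in place of the population SD."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)          # sample SD (n - 1 denominator)
    t = (mean - mu0) / (s / n ** 0.5)
    return t, n - 1                     # statistic and degrees of freedom

# Hypothetical data: five measurements tested against a hypothesized mean of 5.
t, df = one_sample_t([5, 6, 7, 8, 9], mu0=5)
print(f"t = {t:.3f} with {df} df")      # t = 2.828 with 4 df
```

If SciPy is installed, `scipy.stats.ttest_1samp([5, 6, 7, 8, 9], 5)` should report the same statistic along with a p-value.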

Two Samples • If you have two samples, the formula gets a bit more complicated. Instead of one sample, you now have two samples from which to estimate the population variability. If the samples are not the same size, you want to put more trust (weight) in the larger one.

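Estimated Variance • Basically you take a weighted average of the two sample variances, weighting each by its degrees of freedom. The denominator gets a tweak ($n_1 + n_2 - 2$ rather than $n_1 + n_2$) because two sample means had to be estimated along the way: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.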

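The T-Statistic • $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, where $s_p^2$ is the pooled (weighted-average) variance estimate. Compare the result against a T distribution with $n_1 + n_2 - 2$ degrees of freedom.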
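This sketch puts the pooled variance and the two-sample statistic together. It assumes equal population variances, which is what pooling implies. The two small groups are hypothetical.

```python
import statistics

def pooled_t(a, b):
    """Independent two-sample t statistic using the pooled variance."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    # Weighted average of the variances, weighted by degrees of freedom.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, n1 + n2 - 2

t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.3f} with {df} df")   # t = -2.000 with 8 df
```

The weighting only matters when the groups differ in size. With equal sizes, as here, the pooled variance is just the average of the two variances. SciPy's `scipy.stats.ttest_ind` (with its default `equal_var=True`) computes the same pooled statistic.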

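Paired samples? • What is your variance like if you sample the same person before and after a treatment, relative to sampling two different people? • Smaller. Each person serves as their own control, so the person-to-person differences cancel out when you take the before/after difference. The paired t-test is simply the one-sample t-test applied to those differences.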
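To see why pairing helps, the sketch runs the same hypothetical before/after scores for five people through two analyses. The first is a one-sample t on the differences (the paired test). The second wrongly treats the scores as two independent groups with a pooled t. Only the Python standard library is used.

```python
import statistics

def one_sample_t(data, mu0=0.0):
    n = len(data)
    return (statistics.fmean(data) - mu0) / (statistics.stdev(data) / n ** 0.5)

def pooled_t(a, b):
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a)
           + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Hypothetical before/after scores for the same five people.
before = [10, 12, 14, 16, 18]
after = [11, 13.5, 15, 17.5, 19]

diffs = [a - b for a, b in zip(after, before)]
paired = one_sample_t(diffs)        # paired test: one-sample t on the differences
unpaired = pooled_t(after, before)  # treating the scores as independent groups
print(f"paired t = {paired:.2f}, independent t = {unpaired:.2f}")
```

The people differ a lot from one another, but each improves by a similar amount. The paired statistic is therefore large (about 9.8), while the independent-groups statistic is buried in between-person variance (about 0.6). SciPy's `scipy.stats.ttest_rel` performs the paired version directly.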

ANOVA • To compare three or more groups you will want to use a method called ANOVA. Analysis of variance is baffling when you first see the algebra because you are looking for differences in group means by comparing variances.

How ANOVA Works • Begin by looking at the overall variability in your data around the overall (grand) mean. Then look at the variability of your data around each subgroup's own mean. If the treatments have no meaningful effect, the overall variability will look like the variability within the subgroups.

Reduced Variance • With the T or Z distributions you get excited if your sample mean is far from the proposed population mean. Here, you get excited if the ratio of the between-group variance to the within-group variance is much larger than 1. You need a distribution that describes the ratio of two variances, and that distribution is the F. It has two degrees-of-freedom parameters, one for the numerator of the ratio (number of groups minus 1) and one for the denominator (total number of subjects minus the number of groups).
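This sketch shows a one-way ANOVA done by hand on three hypothetical groups. The between-group sum of squares measures how far the group means sit from the grand mean. The within-group sum of squares measures spread around each group's own mean. Together they add up to the total sum of squares around the grand mean.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability of each value around its own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df1}, {df2}) = {f:.1f}")   # F(2, 6) = 27.0
```

For these groups the sums of squares are 54 between and 6 within, adding up to a total of 60. The large F reflects group means that are far apart relative to the spread inside each group. SciPy's `scipy.stats.f_oneway` returns the same F along with a p-value.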
