Introductory Statistics for Laboratorians dealing with High Throughput Data sets Centers for Disease Control
Statistics and Order • Random vs. Accidental • Snowflakes • Quincunx • http://www.stattucino.com/berrie/dsl/Galton.html • http://www.jcu.edu/math/ISEP/Quincunx/Quincunx.html • Statistical Determinism
Hypothesis Testing • Statistical methods are used to test scientific hypotheses. • You already understand this logic; you use it every day. • Statistical methods simply provide a way to put numbers to your logic. • That is, they compute the chances that you are wrong or right.
Harry and Sue • This is the story of Harry Heartthrob and his Girlfriend Sue Sweetheart. Harry would like to know that everything is fine between him and Sue. • Begin by stating two hypotheses: • H0: Things are fine in Harry’s love life. • Ha: Harry has romantic problems. • We assume that H0: is true – that things are fine. • This leads us to expect to see certain things when we observe nature/reality.
Hypotheses • H0: is the “Null Hypothesis.” It states that nothing is going on. • Ha: is the “Alternative Hypothesis.” It states that something is going on. • The null and alternative hypotheses must be mutually exclusive and exhaustive. • That is: they can’t both be true and they can’t both be false.
Harry and Sue • Believing that H0: is true (as we assume) leads to certain expectations about what will be observed in reality. • Specifically Harry will expect there to be “no signs of men” in Sue’s apartment. • If Harry observes things in her apartment that differ from his expectations he will begin to doubt the truth of H0:
Harry’s Love Life • H0: Assume all is OK • Expect no signs of men in her apartment • Ha: Harry has romantic problems
Proof • Can you ever prove that H0: is true? NO!!! • What is the strongest evidence that H0: is true? • No indication of men in the apartment! • Could this happen if H0: is false? YES!! • The strongest evidence for H0: is still weak. • We “assume” H0: is true if the evidence that it is false is weak.
Proof • Can Harry prove that H0: is false? YES!!! • There can be things that are so improbable if H0: is true that when you observe those things, (XXX) for example, you know H0: is false. • There can be strong evidence that H0: is false. • When we see that strong evidence we reject H0: • Then we say “we conclude Ha: is true” or “we have proved Ha:”
Statistically Speaking • Reject H0: if the probability of the observed event is small enough when H0: is assumed to be true. • [Diagram: a scale running from “H0: Assume All is OK” (expect no signs of men in her apartment) to “Ha: Harry has Romantic Problems,” marked 98%, 20%, 2%, .05%]
Statistical Error • Two types of error • Type I Error: Reject H0: when it is true (too jealous) • Type II Error: Continue to believe H0: when it is false (too trusting) • [Diagram: a scale running from “H0: Assume All is OK” (expect no signs of men in her apartment) to “Ha: Harry has Romantic Problems,” marked 98%, 20%, 2%, .05%]
Quality Control: Have the samples been watered down? • There is a severe shortage of flu vaccine in the USA this season. However, Canada has a large surplus and they are willing to sell it to us. • We are a little paranoid and wonder if the reason they have extra is because they watered it down. • We make a surprise visit to their warehouse and request a sample of the vaccine for evaluation purposes before we commit to purchase the lot. • They allow us to select 70 vials at random from the whole lot for testing.
Quality Control Example • All flu vaccine is made to standard specifications. It is all supposed to be 16 m/dl with a standard deviation of 0.4. (That’s what it’s supposed to be if it is not watered down). • We measure the 70 vials from Canada and get a mean of 15.8 m/dl. • Is the Canadian surplus watered down?
Step 1: State H0 and Ha • H0: This sample of 70 vials (with a mean of 15.8) comes from a population with a mean of 16 and a standard deviation of 0.4. • (Everything is fine.) • Ha: This sample of 70 vials could not have been drawn from a population with a mean of 16 and a standard deviation of 0.4. • (There is a problem.)
Step 2: Select a Region of Rejection • If the probability of observing a result like ours, assuming the null hypothesis is true, is less than 5 chances in 100 (.05), we will reject the null hypothesis. • Alpha = .05
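What Alpha = .05 means in practice can be sketched with a small simulation: if H0 really is true and we reject whenever the one-tailed probability falls below alpha, we should wrongly reject (a Type I error) in roughly 5% of experiments. This is only an illustration under stated assumptions: the population is taken to be normal (the slides do not say), the test is one-tailed in the "too low" direction, and the seed and the number of simulated experiments are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # arbitrary seed so the run is repeatable

# Values from the vaccine example; the normal population is an assumption.
pop_mean, pop_sd, n, alpha = 16.0, 0.4, 70, 0.05
standard_error = pop_sd / math.sqrt(n)

trials = 4000  # number of simulated experiments (arbitrary)
false_rejections = 0
for _ in range(trials):
    # Draw 70 vials from a population where H0 is true.
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    z = (fmean(sample) - pop_mean) / standard_error
    # One-tailed probability of a sample mean this low or lower under H0.
    if NormalDist().cdf(z) < alpha:
        false_rejections += 1

type_i_rate = false_rejections / trials
print(f"Share of true-H0 experiments rejected: {type_i_rate:.3f}")
```

The printed share should land near alpha, which is the sense in which alpha is the Type I error rate we agree to tolerate.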
Step 3: Make Observations • Conduct the experiment – make surprise trip to the Canadian warehouse, select vials at random, test each vial. • Compute the mean for the 70 vials. – mean = 15.8.
Step 4: Test the Null Hypothesis • What does the Central Limit Theorem tell us about the distribution of means of samples of size N = 70 from a population with a mean of 16 and standard deviation of 0.4? • Central Limit Theorem says: • Mean of the means of all possible samples should be 16 • Standard Error (Standard Deviation) of the means is 0.4/sqrt(70) = .048
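The standard error arithmetic can be checked directly, and a rough simulation shows the same number emerging from repeated sampling. This is a minimal sketch: the normal population, the seed, and the 5,000 repetitions are assumptions made for illustration, not part of the original example.

```python
import math
import random
from statistics import fmean, stdev

random.seed(0)  # arbitrary seed so the run is repeatable

pop_mean, pop_sd, n = 16.0, 0.4, 70

# Formula from the slide: standard error = sigma / sqrt(N)
standard_error = pop_sd / math.sqrt(n)
print(f"Formula standard error:   {standard_error:.3f}")

# Simulation: draw many samples of 70 and look at the spread of their means.
sample_means = [
    fmean(random.gauss(pop_mean, pop_sd) for _ in range(n))
    for _ in range(5000)
]
simulated_se = stdev(sample_means)
print(f"Simulated standard error: {simulated_se:.3f}")
print(f"Mean of sample means:     {fmean(sample_means):.3f}")
```

The two standard errors should agree closely, and the mean of the sample means should sit near 16, as the Central Limit Theorem bullets state.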
Step 4: Test the Null Hypothesis • Use http://davidmlane.com/hyperstat/z_table.html to compute the probability that a mean would be 15.8 or less if the Sampling Distribution of the Mean has a mean of 16 and a standard deviation of .048. • 15.8 is 4.17 standard errors (standard deviations) below the mean of the population (16). • Z = -4.17
Step 4: Test the Null Hypothesis • The probability of the mean of a sample of size 70 being 15.8 or less is .000015 (15 chances in 1,000,000). • This is in the region of rejection • Reject H0 -- There is a problem. This stuff has been watered down.
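The Z score and tail probability can also be computed without a table. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of the online calculator; that substitution is ours, not the original method. It computes Z twice: once with the standard error rounded to .048 as on the slides, and once unrounded, which gives a Z slightly further from zero.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, n = 16.0, 0.4, 70
sample_mean = 15.8

standard_normal = NormalDist()  # mean 0, standard deviation 1

# As on the slides: standard error rounded to .048 before dividing.
z_rounded = (sample_mean - pop_mean) / 0.048
p_rounded = standard_normal.cdf(z_rounded)

# Same calculation with the unrounded standard error.
z_exact = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
p_exact = standard_normal.cdf(z_exact)

print(f"Rounded SE:   Z = {z_rounded:.2f}, P(mean <= 15.8) = {p_rounded:.6f}")
print(f"Unrounded SE: Z = {z_exact:.2f}, P(mean <= 15.8) = {p_exact:.6f}")
print("Reject H0" if p_exact < 0.05 else "Do not reject H0")
```

Either way the probability is on the order of 15 in 1,000,000, far below Alpha = .05, so the rounding does not affect the decision.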
Region of Rejection for a Sample of size N = 70 from a Population with mean 16 and standard deviation of 0.4 • The region of rejection is anything below 15.921. • 15.921 cuts off .05 of the distribution. • There are only 5 chances in 100 of a mean being less than 15.921. • Our mean was 15.8.
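The 15.921 cutoff can be recovered by asking which value leaves 5% of the sampling distribution below it. The sketch uses the inverse CDF from Python's standard `statistics` module and assumes a one-tailed region in the lower direction, which is what the slide's wording implies.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, n, alpha = 16.0, 0.4, 70, 0.05
sample_mean = 15.8

# Sampling distribution of the mean under H0.
sampling_dist = NormalDist(mu=pop_mean, sigma=pop_sd / math.sqrt(n))

# Value with exactly alpha of the distribution below it.
critical_value = sampling_dist.inv_cdf(alpha)
reject_h0 = sample_mean < critical_value

print(f"Region of rejection: means below {critical_value:.3f}")
print(f"Observed mean {sample_mean} in region of rejection: {reject_h0}")
```

Comparing the observed mean with the cutoff is equivalent to comparing its tail probability with alpha; both routes lead to the same decision.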