Get prepared for your exam with this comprehensive review of key concepts, including hypothesis testing, probability, descriptive statistics, and more. Learn how to calculate summary descriptive statistics, perform hypothesis tests, and interpret the results. Use any resources allowed during the exam, such as books, course materials, and calculators.
Class 09 Exam1 Prep Things you should know
Exam Details • 75 minutes • Allowed • Any book, the course website and all files linked to it, your notes, Excel, Word, calculators. • Not Allowed • Communication with others (of any kind, using any device) • Websites other than the course website. • Short answers, lots of partial credit. Some easy. Some difficult.
Class 01 Jefferson Had Girls • We used BINOMDIST(false) to find the probabilities of 0,1,2,3,4 girls in four children IF the coin flip model is true. • We constructed a table of Observed vs Expected Counts for the 31,595 Danish Families. • You should be able to finish the hypothesis test. • You should be able to do the hypothesis test for a p other than 0.5.
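A minimal sketch of the Class 01 calculation, written in Python rather than Excel and assuming the coin flip model (p = 0.5, independent births). The name binom_pmf is illustrative. The observed counts are not reproduced here, so only the expected counts are computed.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    the same quantity as Excel's BINOMDIST(x, n, p, FALSE)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n_children = 4
n_families = 31595   # number of Danish families, from the slide
p_girl = 0.5         # coin flip model (H0); change this to test another p

# Probability of 0, 1, 2, 3, 4 girls if H0 is true
probs = [binom_pmf(x, n_children, p_girl) for x in range(n_children + 1)]

# Expected count = total families * probability
expected = [n_families * pr for pr in probs]

for x, (pr, e) in enumerate(zip(probs, expected)):
    print(f"{x} girls: probability {pr:.4f}, expected families {e:.1f}")
```

To finish the test, these expected counts would be compared with the observed counts using the chi-squared distance. Setting p_girl to a value other than 0.5 gives the expected counts for a different H0.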
Class 02 Prob, Pmfs, Binomial • The first probability problem • What if the situation were slightly different? • How probability works for two either/or events. • Athlete and Gender, Disease and Test. • Either build a 2x2 table of counts…or draw a prob tree…or use formulas. • What if one of the events had three outcomes? • Positive, Negative, Inconclusive • Strain A, Strain B, no disease • University Athlete, Intramural Athlete, Non-Athlete
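A sketch of the "build a 2x2 table of counts" approach for the Disease and Test problem. Every number below (population size, prevalence, sensitivity, false positive rate) is hypothetical and chosen only for illustration; none comes from the class.

```python
# Hypothetical inputs, for illustration only
population = 10_000
prevalence = 0.01            # P(disease)
sensitivity = 0.95           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

diseased = population * prevalence
healthy = population - diseased

# 2x2 table of counts: rows are disease status, columns are test result
table = {
    ("disease", "positive"): diseased * sensitivity,
    ("disease", "negative"): diseased * (1 - sensitivity),
    ("no disease", "positive"): healthy * false_positive_rate,
    ("no disease", "negative"): healthy * (1 - false_positive_rate),
}

# "Do I have the disease?" given a positive test:
# share of the positive column that falls in the disease row
positives = table[("disease", "positive")] + table[("no disease", "positive")]
p_disease_given_positive = table[("disease", "positive")] / positives

print(f"P(disease | positive) = {p_disease_given_positive:.4f}")
```

A third outcome (for example Inconclusive) would add a third column of counts; the conditional probability is still one cell divided by its column total.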
Class 02 Prob, Pmfs, Binomial • BINOMDIST(X,n,p,false) • Applying the rules of probability to calculate the probability of X successes in n independent trials. • This is a PMF • Characteristics of Pmfs • A schedule assigning the unit of prob to the possible values of X. • Mean, Mode, Median, Std Deviation, Variance. • Binomial Mean is n*p • Binomial Standard Deviation is [n*p*(1-p)]^.5 • Most of our problems had p=.5. What if p was different?
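A short check, under illustrative values of n and p, that the mean and standard deviation computed directly from the binomial pmf agree with n*p and [n*p*(1-p)]^.5. The values n = 20 and p = 0.3 are assumptions chosen to show a p other than .5.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """Same quantity as Excel's BINOMDIST(x, n, p, FALSE)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.3   # illustrative values, not from the class

pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# A pmf assigns the whole unit of probability to the possible values of X
total_prob = sum(pmf)

# Mean and variance computed from the pmf itself
mean = sum(x * pr for x, pr in enumerate(pmf))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(pmf))
std_dev = sqrt(variance)

print(f"sum of pmf = {total_prob:.6f}")
print(f"mean from pmf = {mean:.4f}, n*p = {n * p:.4f}")
print(f"std dev from pmf = {std_dev:.4f}, formula = {sqrt(n * p * (1 - p)):.4f}")
```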
Class 03 Lady Tasting Tea • Hypothesis Testing • Formulate H0 and Ha • Pick alpha (usually 0.05) • Identify and calculate the test statistic • So far it has either been number correct (proportion correct) or calculated chi-squared. • Calculate the p-value: the probability of observing a test statistic at least as extreme as the one observed if H0 is true. • So far we’ve used the binomial, the normal, and the chi-squared to calculate p-values. • Reject H0 (in favor of Ha) if the p-value is less than alpha (0.05). Say the result is statistically significant. (Our result is rare if H0 is true).
Class 04 Wunderdog and Normal • As n gets big, the binomial looks like the normal. • NORMDIST(X,μ,σ,TRUE) • As an approximation to the binomial set μ=n*p and σ=[n*p*(1-p)]^.5 • Normal is a PDF • Normal is a family of distributions….but all have identical properties…see next slide.
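A sketch comparing an exact binomial probability with its normal approximation, using Python's statistics.NormalDist in place of NORMDIST(X,μ,σ,TRUE). The values n = 100, p = 0.5 and the cutoff 55 are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5   # illustrative values
x = 55            # illustrative cutoff

# Exact binomial P(X <= x), the cumulative version of BINOMDIST
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Normal approximation with mu = n*p and sigma = [n*p*(1-p)]^.5
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma).cdf(x)   # like NORMDIST(x, mu, sigma, TRUE)

print(f"exact binomial P(X <= {x}) = {exact:.4f}")
print(f"normal approximation      = {approx:.4f}")
```

The two answers are close but not identical, which is what "approximation" means here; the gap shrinks as n gets bigger.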
[Figure: EMBS Fig 6.4, p 249] For more detail, you can use NORMDIST(X,0,1,TRUE) and NORMINV(p,0,1).
Normal continued • NORMDIST(X,μ,σ,false) • The height of the normal density curve. We have never used it…and never will. • NORMINV(p,μ,σ) • finds the x value such that P(X<x) = p • Lots of variations of questions you can use the normal to answer. • What is p? • What X gives a specified p? • What is σ? • Lots of decisions to make (what target Lorex should use)
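A sketch of the three question types using statistics.NormalDist, where cdf plays the role of NORMDIST(X,μ,σ,TRUE) and inv_cdf plays the role of NORMINV(p,μ,σ). The numbers (μ = 10.2, σ = 0.16, a lower limit of 10, a 1% shortfall rate) are assumptions for illustration and are not the actual Lorex case values.

```python
from statistics import NormalDist

mu, sigma = 10.2, 0.16   # illustrative values
limit = 10.0             # hypothetical lower limit

# What is p?  P(X < limit)
p_below = NormalDist(mu, sigma).cdf(limit)

# What X gives a specified p?  x such that P(X < x) = 0.975
x_975 = NormalDist(mu, sigma).inv_cdf(0.975)

# What target (mu) makes P(X < limit) equal to 1%, holding sigma fixed?
# Standardize: (limit - mu) / sigma = z, so mu = limit - z * sigma
z_01 = NormalDist(0, 1).inv_cdf(0.01)   # like NORMINV(0.01, 0, 1)
target_mu = limit - z_01 * sigma

# What is sigma?  Given mu, solve (limit - mu) / sigma = z for sigma
needed_sigma = (limit - mu) / z_01

print(f"P(X < {limit}) = {p_below:.4f}")
print(f"97.5th percentile = {x_975:.4f}")
print(f"target mean for a 1% shortfall = {target_mu:.4f}")
print(f"sigma for a 1% shortfall at mu = {mu}: {needed_sigma:.4f}")
```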
Class 06 Descriptive Statistics • Each of the characteristics of a prob distribution (mean, median, mode, var, stdev, skew) has a corresponding summary descriptive statistic • Know how to calculate summary descriptive statistics and what they tell you. • They are only useful for NUMERICAL (not categorical) scaled variables. • EXCEPTION: If there are only two categories (Male/Female), (Right/Wrong), (Athlete/Not) then you can change to a 1/0 number and descriptive statistics make sense. • Descriptive statistics make some sense for birth months…especially in the soccer context where birth month number predicts tournament participation.
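A sketch of the summary descriptive statistics using Python's statistics module, roughly the counterparts of Excel's AVERAGE, MEDIAN, MODE, VAR and STDEV. Both data sets are made up for illustration.

```python
import statistics

# Hypothetical numerical data
values = [3, 5, 5, 6, 8, 9, 13]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
variance = statistics.variance(values)   # sample variance (divides by n-1)
std_dev = statistics.stdev(values)       # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, std dev={std_dev:.4f}")

# The two-category exception: code Right/Wrong as 1/0.
# The mean of the 1/0 column is the proportion of 1s.
correct = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]   # hypothetical answers
proportion_correct = statistics.mean(correct)
print(f"proportion correct = {proportion_correct}")
```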
Class 07 Roulette and GOF • Observations of a categorical variable with multiple categories (n=904) are summarized using a table of counts. • Use Countif or pivottable to calculate the counts. • If we have an H0 that leads to expected counts, then we can perform a chi-squared GOF test. • H0: All categories are equally probable (roulette and soccer birth month) • Know how to perform and interpret a GOF test.
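A sketch of the GOF test for "all categories equally probable". The observed counts are hypothetical (six categories, not the roulette or soccer data). The helper chi2_sf is an illustrative stand-in for Excel's CHIDIST, written with the standard library only, and the degrees of freedom are taken as the number of categories minus one.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-squared with df degrees of freedom > x),
    from the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total and k < 10_000:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Hypothetical table of observed counts
observed = [18, 22, 25, 15, 20, 20]
n = sum(observed)

# H0: all categories equally probable, so each expected count is n / k
k_categories = len(observed)
expected = [n / k_categories] * k_categories

# Chi-squared distance: sum of (observed - expected)^2 / expected
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k_categories - 1
p_value = chi2_sf(chi_squared, df)

print(f"chi-squared = {chi_squared:.3f}, df = {df}, p-value = {p_value:.4f}")
print("statistically significant" if p_value < 0.05 else "NOT statistically significant")
```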
Class 07 Lorex GOF • Start with n=144 observations of a numerical variable. • Create (somewhat arbitrary) BINS, and a table of summary counts. • H0: the data came from N(10.2,.16) • Under H0, we can calculate the expected counts in each bin. • This is another use of the chi-squared GOF test. • Numerical data, bins and observed counts, H0 and expected counts, GOF test.
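A sketch of the "expected counts in each bin" step, reading N(10.2,.16) as mean 10.2 and standard deviation 0.16 and using n = 144 from the slide. The bin edges are hypothetical (the slide calls them somewhat arbitrary), and the observed counts are not reproduced here.

```python
from statistics import NormalDist

n = 144
h0 = NormalDist(10.2, 0.16)   # H0, assuming .16 is the standard deviation

# Hypothetical interior bin edges; the first and last bins are open-ended
edges = [10.0, 10.1, 10.2, 10.3, 10.4]

# Cumulative probability at each boundary, with 0 and 1 for the open ends
cum = [0.0] + [h0.cdf(e) for e in edges] + [1.0]

# Probability of each bin is the difference of adjacent cumulative values
bin_probs = [hi - lo for lo, hi in zip(cum[:-1], cum[1:])]
expected = [n * pr for pr in bin_probs]

labels = (["below 10.0"]
          + [f"{a} to {b}" for a, b in zip(edges[:-1], edges[1:])]
          + ["above 10.4"])
for label, e in zip(labels, expected):
    print(f"{label:>14}: expected count {e:.2f}")
```

The observed bin counts would then be compared with these expected counts using the same chi-squared distance as in the roulette test.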
Class 08 2-tailed tests and sample proportions • P-value is always the probability of observing a test statistic AS EXTREME as ours under H0. • The interpretation of “extreme” depends on Ha. • Ha: p>.5, extreme means X ≥ 8. • Ha: p≠.5, extreme means X ≥ 8 or X ≤ 2 • The calculation of the p-value depends on Ha. • The p-value for a 2-tailed test will usually be twice that from a one-tailed test. • It is more difficult to find statistical significance when you are less sure what you are looking for.
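A sketch of how the p-value calculation changes with Ha, for an observed X = 8. The slide does not state n; n = 10 is assumed here because it makes X ≤ 2 the mirror image of X ≥ 8 under p = .5.

```python
from math import comb

def binom_pmf(x, n, p):
    """Same quantity as Excel's BINOMDIST(x, n, p, FALSE)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p0 = 10, 0.5   # n = 10 is an assumption; p0 is the H0 value
x_obs = 8

# Ha: p > .5, so "as extreme" means X >= 8
p_one_tailed = sum(binom_pmf(x, n, p0) for x in range(x_obs, n + 1))

# Ha: p != .5, so "as extreme" means X >= 8 or X <= 2
p_two_tailed = p_one_tailed + sum(binom_pmf(x, n, p0) for x in range(0, n - x_obs + 1))

print(f"one-tailed p-value = {p_one_tailed:.4f}")
print(f"two-tailed p-value = {p_two_tailed:.4f}")
```

With p0 = .5 the binomial is symmetric, so the two-tailed value is exactly twice the one-tailed value in this case.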
There is often another way… • X is Binomial(n,p) • X is N(n*p,[n*p*(1-p)]^.5) • p̂ = X/n is N(p,[p*(1-p)/n]^.5) • Going from binomial to normal is an approximation that is good when n is big • The last two ARE EXACTLY THE SAME.
Suppose n=100, p=.5 • [n*p*(1-p)]^.5 = 5 • X is N(50,5) • X/n is N(0.5,0.05) • [Figure: one normal curve with two horizontal axes, X from 35 to 65 and X/n from 0.35 to 0.65]
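A quick check, using statistics.NormalDist, that working on the count scale and working on the proportion scale give the same probability for n=100, p=.5. The cutoff of 55 (equivalently 0.55) is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.5

# Count scale: X is N(n*p, [n*p*(1-p)]^.5)
count_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))

# Proportion scale: X/n is N(p, [p*(1-p)/n]^.5)
prop_dist = NormalDist(p, sqrt(p * (1 - p) / n))

x = 55                                   # illustrative cutoff
prob_count = count_dist.cdf(x)           # P(X <= 55)
prob_prop = prop_dist.cdf(x / n)         # P(X/n <= 0.55)

print(f"count scale:      mean {count_dist.mean}, sd {count_dist.stdev}")
print(f"proportion scale: mean {prop_dist.mean}, sd {prop_dist.stdev}")
print(f"P(X <= 55) = {prob_count:.6f}, P(X/n <= 0.55) = {prob_prop:.6f}")
```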
When interpreting Statistics…. • It’s all about the n! • 60% correct is unimpressive if n=10 and very impressive if n=100. • Even with the chi-squared GOF test, n matters • Distance = (15-10)^2/10 = 2.5 • Distance = (150-100)^2/100 = 25 • n is the total count (907 for roulette, 288 for soccer, 144 for Lorex)
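A sketch that puts numbers on "it's all about the n", assuming H0: p = .5 and a one-tailed test for the 60% correct example. The helper name upper_tail is illustrative.

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# 60% correct under H0: p = .5
p_value_n10 = upper_tail(6, 10, 0.5)     # 6 of 10 correct
p_value_n100 = upper_tail(60, 100, 0.5)  # 60 of 100 correct

print(f"6 of 10 correct:   p-value = {p_value_n10:.4f}")
print(f"60 of 100 correct: p-value = {p_value_n100:.4f}")

# The same 50% excess over the expected count gives a larger
# chi-squared distance when the counts are ten times bigger
distance_small = (15 - 10) ** 2 / 10
distance_large = (150 - 100) ** 2 / 100
print(f"distance with expected 10:  {distance_small}")
print(f"distance with expected 100: {distance_large}")
```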
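Hypothesis Tests we have come to love • [Summary table not recovered. Surviving entries: the standard deviations [149*.5*.5]^.5 for X and [.5*.5/149]^.5 for X/n, with n=149 and p=.5]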
Pfeifer’s Pfoibles • How many will she get correct? • Do I have the disease? • Give me probabilities as answers even though I did not explicitly ask for them. • H0 is a statement about the process, not the outcomes. • If p-value < 0.05 say …. • The result is statistically significant • We reject the null hypothesis. • If p-value > 0.05 say … • The result is NOT statistically significant. • We fail to (or cannot) reject H0.