Stat 31, Section 1, Last Time

Stat 31, Section 1, Last Time • Sampling • Experiments • Design • Controls • Randomization • Blind & Double Blind • Pepsi Challenge

Midterm I Coming up: Tuesday, Feb. 15 Material: HW Assignments 1 – 4 Extra Office Hours: Mon. Feb. 14, 8:30 – 12:00, 2:00 – 3:30 (Instead of Review Session) Bring Along: 1 8.5” x 11” sheet of paper with formulas

Sec. 3.4: Basics of “Inference” Idea: Build foundation for statistical inference, i.e. quantitative analysis (of uncertainty and variability) Fundamental Concepts: Population described by parameters e.g. mean μ, SD σ. Unknown, but can get information from…

Fundamental Concepts Last page: Population, here: Sample (usually random), described by corresponding “statistics” e.g. mean x̄, SD s. (Will become important to keep these apart)

Population vs. Sample E.g. 1: Political Polls • Population is “all voters” • Parameter of interest is: p = % in population for A (bigger than 50% or not?) • Sample is “voters asked by pollsters” • Statistic is p̂ = % in sample for A (careful to keep these straight!)

Population vs. Sample E.g. 1: Political Polls • Notes • p̂ is an “estimate” of p • Variability is critical • Will construct models of variability • Possible when sample is random • Recall random sampling also reduces bias
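A minimal simulation sketch of the polling example, to show why variability matters: several polls drawn at random from the same population give different values of the statistic, even though the parameter never changes. The population percentage and the poll size below are hypothetical values chosen only for illustration (in a real poll the parameter is unknown).

```python
import random

def sample_proportion(p, n, rng):
    """Fraction of n randomly sampled voters who favor candidate A."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.52                # hypothetical population parameter (unknown in practice)
n = 1000                # hypothetical number of voters asked by the pollster

# Five independent polls of the same population: same p, different p-hat.
p_hats = [sample_proportion(p, n, rng) for _ in range(5)]
for i, p_hat in enumerate(p_hats, start=1):
    print(f"poll {i}: p-hat = {p_hat:.3f}")
```

Each printed value is an estimate of the same p. With p this close to 50%, the scatter among the estimates is what makes the "bigger than 50% or not?" question hard to settle from a single poll.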

Population vs. Sample E.g. 2: Measurement Error (seemingly quite different…) • Population is “all possible measurem’ts” (a thought experiment only) • Parameters of interest are: μ = population mean, σ = population SD

Population vs. Sample E.g. 2: Measurement Error • Sample is “measurem’ts actually made” • Statistics are: x̄ = mean of measurements, s = SD of measurements

Population vs. Sample E.g. 2: Measurement Error • Notes: • x̄ estimates μ • s estimates σ • Again will model variability • “Randomness” is just a model for measurement error
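A small sketch of the measurement error example, under stated assumptions: the "population of all possible measurements" is modeled here as a Normal distribution, which is only one possible model for the error, and the values of the population mean and SD are hypothetical. The sample mean and sample SD are then computed from the measurements actually made.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population parameters (unknown in a real experiment).
mu = 10.0     # population mean
sigma = 0.5   # population SD

# Assumption: Normal measurement error. "Randomness" is just a model here.
measurements = [rng.gauss(mu, sigma) for _ in range(25)]

x_bar = statistics.mean(measurements)  # sample mean, estimates mu
s = statistics.stdev(measurements)     # sample SD (n - 1 divisor), estimates sigma

print(f"x-bar = {x_bar:.3f}  (mu = {mu})")
print(f"s     = {s:.3f}  (sigma = {sigma})")
```

The statistics land near the parameters but not exactly on them, which is the reason for keeping the two sets of symbols apart.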

Population vs. Sample HW: 3.59 3.61

Basic Mathematical Model Sampling Distribution Idea: Model for “possible values” of statistic E.g. 1: Distribution of p̂ in “repeated samplings” (thought experiment only) E.g. 2: Distribution of x̄ in “repeated samplings” (again thought experiment)

Basic Mathematical Model Sampling Distribution Tools Can study these with: • Histograms → “shape”: often Normal • Mean → Gives measure of “bias” • SD → Gives measure of “variation”
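The "repeated samplings" thought experiment can be carried out on a computer. The sketch below draws many random samples from a population with a hypothetical proportion, then applies the three tools: a crude text histogram for shape, the mean of the simulated statistics for bias, and their SD for variation. The population proportion, sample size, and number of repetitions are all illustrative assumptions.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

p = 0.40     # hypothetical population proportion
n = 25       # hypothetical sample size
reps = 2000  # number of repeated samplings

# One p-hat per repeated sample: this list approximates the sampling distribution.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

center = statistics.mean(p_hats)
bias = center - p                    # mean of sampling distribution minus parameter
spread = statistics.stdev(p_hats)    # SD of sampling distribution = variation

# Crude histogram on the [0, 1] scale with bins of width 0.1.
bins = [0] * 10
for v in p_hats:
    bins[min(int(v * 10), 9)] += 1
for i, count in enumerate(bins):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (count // 40)}")

print(f"bias = {bias:+.4f}, variation (SD) = {spread:.4f}")
```

Because the samples here are truly random, the bias should come out close to zero; the SD reports how much the statistic moves from one sample to the next.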

Bias and Variation Graphical Illustration Scanned from text: Fig. 3.9

Bias and Variation Class Example: Results from previous class on “Estimate % of males at UNC” https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg16.xls Recall several approaches to estimation (3 bad, one sensible)

E.g. % Males at UNC At top: • Counts • Corresponding proportions (on [0,1] scale) • Bin Grid (for histograms on [0,1] numbers) Next Part: • Summarize mean of each • Summarize SD (spread) of each Histograms (appear next)

E.g. % Males at UNC Recall 4 ways to collect data: Q1: Sample from class Q2: Stand at door and tally • Q1 “less spread and to left”? Q3: Make up names in head • Q3 “more to right”? Q4: Random Sample • Supposed to be best, can we see it?

E.g. % Males at UNC Better comparison: Q4 vs. each other one Use “interleaved histograms” Q1 & Q4: • Q1 has smaller center: • i.e. “biased”, since Class ≠ Population • And less spread: • since “drawn from smaller pool”

E.g. % Males at UNC Q2 & Q4: • Centers have Q2 bigger: • Reflects bias in door choice • And Q2 is “more spread” : • Reflects “spread in doors chosen” + “sampling spread”
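The decomposition of the Q2 spread into “spread in doors chosen” plus “sampling spread” can be illustrated with a toy simulation. This is not the class data: the campus proportion and the per-door proportions below are invented purely to show the mechanism, and the choice of three doors is an assumption.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable
n, reps = 25, 2000      # hypothetical tally size and number of repetitions

p_campus = 0.40                  # hypothetical true proportion of males
door_mixes = [0.30, 0.50, 0.70]  # hypothetical proportion of males passing each door

def tally(p_local):
    """Proportion of males among n people, each male with chance p_local."""
    return sum(rng.random() < p_local for _ in range(n)) / n

# Q4: random sample from the whole campus.
q4 = [tally(p_campus) for _ in range(reps)]
# Q2: first pick a door (extra variation and bias), then tally at that door.
q2 = [tally(rng.choice(door_mixes)) for _ in range(reps)]

for name, values in (("Q4 random sample", q4), ("Q2 door tally", q2)):
    print(f"{name}: center = {statistics.mean(values):.3f}, "
          f"spread = {statistics.stdev(values):.3f}")
```

With these made-up doors, the Q2 center sits above the campus value (bias from door choice) and the Q2 spread exceeds the Q4 spread, because the door-to-door differences add to the ordinary sampling spread.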

E.g. % Males at UNC Q3 & Q4: • Center for Q3 is bigger: • Reflects “more people think of males”? • And Q3 is “more spread” : • Reflects “more variation in human choice”

E.g. % Males at UNC A look under the hood: • Highlight an interleaved Chart • Click Chart Wizard • Note Bar (and interleaved subtype) • Different colors are in “series” • Computed earlier on left • Using Tools → Data Anal. → Histo’m

E.g. % Males at UNC Interesting question: What is “natural variation”? Will model this soon. This is “binomial” part of this example, which we will study later.

Bias and Variation HW: 3.62 (Hi bias – hi var, lo bias – lo var, lo bias – hi var, hi bias – lo var) 3.65

Chapter 4: Probability Goal: quantify (get numerical) uncertainty • Key to answering questions above (e.g. what is “natural variation” in a random sample?) (e.g. which effects are “significant”) Idea: Represent “how likely” something is by a number

Simple Probability E.g. (will use for a while, since simplicity gives easy insights) Roll a die (6 sided cube, faces 1,2,…,6) • 1 of 6 faces is a “4” • So say “chances of a 4” are: “1 out of 6”, i.e. 1/6. • What does that number mean? • How do we find such for harder problems?

Simple Probability A way to make this precise: “Frequentist Approach” In many replications (repeat of die roll), expect about 1/6 of total will be 4s Terminology (attach buzzwords to ideas): Think about “outcomes” from an “experiment” e.g. #s on die e.g. roll die, observe #
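The frequentist idea can be tried out directly: simulate many replications of the die roll and watch the fraction of 4s. The numbers of rolls below are arbitrary choices for illustration, and the simulated die is assumed to be fair.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

fractions = {}
for n_rolls in (60, 600, 6000, 60000):
    # Experiment: roll a fair six-sided die n_rolls times, observe the numbers.
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    # Fraction of replications whose outcome was a 4.
    fractions[n_rolls] = rolls.count(4) / n_rolls
    print(f"{n_rolls:6d} rolls: fraction of 4s = {fractions[n_rolls]:.4f}")

print(f"compare with 1/6 = {1 / 6:.4f}")
```

For small numbers of rolls the fraction can be noticeably off; as the number of replications grows it tends to settle near 1/6.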

Simple Probability Quantify “how likely” by assigning “probabilities” I.e. a number between 0 and 1, to each outcome, reflecting “how likely”: Intuition: • 0 means “can’t happen” • ½ means “happens half the time” • 1 means “must happen”

Simple Probability HW: C10: Match one of the probabilities: 0, 0.01, 0.3, 0.6, 0.99, 1 with each statement about an event: • Impossible, can’t occur. • Certain, will happen on every trial. • Very unlikely, but will occur once in a long while. • Event will occur more often than not.

Simple Probability Main Rule: Sum of all probabilities (i.e. over all outcomes) is 1: E.g. for die rolling: P{1} + P{2} + … + P{6} = 6 × (1/6) = 1
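A short check of the main rule for the die, using exact fractions so that no rounding is involved. The helper name `is_valid_assignment` is hypothetical, introduced only for this sketch.

```python
from fractions import Fraction

def is_valid_assignment(probs):
    """True if every probability is between 0 and 1 and they sum to exactly 1."""
    values = list(probs.values())
    return all(0 <= v <= 1 for v in values) and sum(values) == 1

# Fair die: each of the 6 outcomes gets probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
total = sum(die.values())
print("sum of probabilities:", total)
print("valid assignment:", is_valid_assignment(die))

# An assignment that breaks the rule: the probabilities sum to 7/6.
bad = {face: Fraction(1, 6) for face in range(1, 8)}
print("valid assignment:", is_valid_assignment(bad))
```

The same check applies to any finite list of outcomes, not just equally likely ones.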

Simple Probability HW: 4.13a 4.15

Probability General Rules for assigning probabilities: i. Frequentist View (what happens in many repetitions?) ii. Equally Likely: for n outcomes P{one outcome} = 1/n (e.g. die rolling) iii. Based on Observed Frequencies e.g. life tables summarize when people die Gives “prob of dying” at a given age “life expectancy”

Probability General Rules for assigning probabilities: iv. Personal Choice: • Reflecting “your assessment” • E.g. Oddsmakers • Careful: requires some care HW: 4.16
