INFERENTIAL STATISTICS

INFERENTIAL STATISTICS • Samples are only estimates of the population • Sample statistics will be slightly off from the true values of the population’s parameters • Sampling error: • The difference between a sample statistic and a population parameter • Probability theory • Permits us to estimate the accuracy or representativeness of the sample

The “Catch-22” of Inferential Statistics • When we collect a sample, we know nothing about the population’s distribution of scores • We can calculate the mean (x-bar) & standard deviation (s) of our sample, but μ and σ are unknown • The shape of the population distribution (normal or skewed?) is also unknown

Sampling Distribution (a.k.a. “Distribution of Sample Outcomes”) • “OUTCOMES” = proportions, means, obtained test statistics (z, t, F, chi-square) • Hypothetical, based on infinite random sampling, a mathematical description of all possible sampling event outcomes • And the probability of each one • Permits us to make the link between sample and population… • What is the likelihood that a sample’s findings accurately reflect the population? • Is what’s true for the sample also true for the population?

ESTIMATION

Introduction to Estimation • Estimation procedures • Purpose: • To estimate population parameters from sample statistics • Using the sampling distribution to infer from a sample to the population • Most commonly used for polling data • 2 components: • Point estimate (sample mean, sample proportion) • Confidence intervals

Sampling Distributions: Central tendency • Sample outcomes (means, proportions, etc.) will cluster around the population outcomes • Since samples are random, the sample outcomes should be distributed equally on either side of the population outcome • The mean of the sampling distribution for sample means (a bunch of x-bars) is always equal to the population mean (μ) • The mean of the sampling distribution for proportions (infinite number of sample p’s) is equal to the population proportion (Pμ)
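A quick simulation can make the clustering claim concrete. The sketch below is not from the slides: the population is a made-up skewed distribution, and the names `population` and `sample_means` and the sample size of 100 are assumptions chosen for illustration. It draws many random samples and compares the mean of the sample means with μ.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population (exponential, mean near 50); not from the slides.
population = [random.expovariate(1 / 50) for _ in range(100_000)]
mu = statistics.fmean(population)

def sample_means(pop, n, draws):
    """Return the means of `draws` random samples of size `n` taken from `pop`."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(draws)]

# A finite stand-in for the "infinite" sampling distribution of sample means.
means = sample_means(population, n=100, draws=2_000)
mean_of_means = statistics.fmean(means)

print(f"population mean (mu):     {mu:.2f}")
print(f"mean of the sample means: {mean_of_means:.2f}")
```

Even though the population is skewed, the two printed values should be very close, which is the point of the slide.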

Sampling Distribution: Dispersion • Standard Error (SE) • The standard deviation of a sampling distribution • Measures the spread of sampling error that occurs when a population is sampled repeatedly • How far does the typical sample outcome fall from the mean of the sampling distribution? • Formulas: s / √(N − 1) (standard error for means) • √(p(1 − p) / N) (standard error for proportions)
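The standard error formulas can be sketched as two small functions. The formula for means follows the slide (s divided by the square root of N − 1). The formula for proportions is the usual textbook one, √(p(1 − p) / N), and is an assumption here, since the slide's own version did not survive. The function names and the input values are hypothetical.

```python
import math

def se_mean(s, n):
    """Standard error of the mean, using the slide's formula s / sqrt(N - 1)."""
    return s / math.sqrt(n - 1)

def se_proportion(p, n):
    """Standard error of a proportion, assumed to be sqrt(p * (1 - p) / N)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample values, chosen so the arithmetic is easy to follow.
print(round(se_mean(s=12.0, n=101), 3))       # 12 / sqrt(100) = 1.2
print(round(se_proportion(p=0.5, n=400), 3))  # sqrt(0.25 / 400) = 0.025
```

Both functions shrink as N grows: larger samples produce sample outcomes that sit closer to the population value.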

CONNECTION • Probability theory tells us that outcomes plotted from repeated random samples will produce a normal distribution (approximately, once samples are reasonably large) • We use z scores • 95% of outcomes will fall within +/- 1.96 standard errors of the true population parameter • 99% of outcomes will fall within +/- 2.58 standard errors of the true population parameter
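The 1.96 and 2.58 cutoffs can be checked against the standard normal curve. This is a minimal sketch using Python's standard library; the helper name `central_area` is an assumption, not something from the slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(z_score):
    """Proportion of a normal distribution within +/- z_score of its mean."""
    return z.cdf(z_score) - z.cdf(-z_score)

print(round(central_area(1.96), 3))  # about 0.95
print(round(central_area(2.58), 3))  # about 0.99
```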

Calculate the Standard Error Based on Your Sample • Always need 2 things • Sample size (N) • Dispersion in sample • Sample measures of dispersion used as estimates of population • Formulas • Means: s / √(N − 1) • Proportions: √(p(1 − p) / N)
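Putting the pieces together, a confidence interval is the point estimate plus or minus a z score times the standard error. The sketch below assumes that standard construction, the slide's s / √(N − 1) for means, and the usual √(p(1 − p) / N) for proportions. The function names and all sample values are hypothetical.

```python
import math

Z_95 = 1.96  # z score for a 95% confidence level
Z_99 = 2.58  # z score for a 99% confidence level

def ci_mean(xbar, s, n, z=Z_95):
    """Confidence interval for a mean: xbar +/- z * s / sqrt(N - 1)."""
    se = s / math.sqrt(n - 1)
    return (xbar - z * se, xbar + z * se)

def ci_proportion(p, n, z=Z_95):
    """Confidence interval for a proportion: p +/- z * sqrt(p * (1 - p) / N)."""
    se = math.sqrt(p * (1 - p) / n)
    return (p - z * se, p + z * se)

# Hypothetical sample: mean 50, standard deviation 12, N = 101 (SE = 1.2).
print(ci_mean(xbar=50.0, s=12.0, n=101))          # 95% interval
print(ci_mean(xbar=50.0, s=12.0, n=101, z=Z_99))  # 99% interval is wider

# Hypothetical poll: 50% support among 400 respondents (SE = 0.025).
print(ci_proportion(p=0.5, n=400))
```

Note the trade-off: moving from 95% to 99% confidence widens the interval, so more certainty costs precision.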

HYPOTHESIS TESTING

Hypothesis Testing & Statistical Inference • We almost always test hypotheses using sample data • Draw conclusions about the population based on sample statistics • Therefore, always possible that any finding is due to sampling error • Are the findings regarding our hypothesis “real” or due to sampling error? • Is there a “statistically significant” finding? • Therefore, also referred to as “significance testing”

Testing a hypothesis 101 • State the research & null hypotheses • Set the criteria for a decision • Alpha, critical regions for the particular test statistic • Compute a “test statistic” • A measure of how different the finding is from what is expected under the null hypothesis • Make a decision • REJECT OR FAIL TO REJECT the null hypothesis • We cannot “prove” the null hypothesis (always some non-zero chance we are incorrect)
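The four steps can be traced with a one-sample t-test. This is a sketch under stated assumptions: the sample values and the null value of 100 are invented, the standard error uses the s / √(N − 1) form from the estimation slides, and the critical value 2.064 is the standard t-table entry for a two-tailed test at alpha = .05 with 24 degrees of freedom.

```python
import math

def one_sample_t(xbar, mu, s, n):
    """Obtained t: distance of the sample mean from mu, in standard errors."""
    return (xbar - mu) / (s / math.sqrt(n - 1))

# Step 1: state the hypotheses. Null: the population mean is 100 (hypothetical).
mu_null = 100.0

# Step 2: set the criteria. Alpha = .05, two-tailed, df = N - 1 = 24;
# the critical value is looked up in a t table.
t_critical = 2.064

# Step 3: compute the test statistic from (hypothetical) sample results.
t_obtained = one_sample_t(xbar=106.0, mu=mu_null, s=12.0, n=25)

# Step 4: make a decision.
decision = "reject" if abs(t_obtained) > t_critical else "fail to reject"

print(f"t obtained = {t_obtained:.3f}, t critical = +/-{t_critical}")
print(f"Decision: {decision} the null hypothesis")
```

Here the obtained t is about 2.449, which lies beyond the critical value, so the null is rejected.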

Sampling Distributions • Again, a HYPOTHETICAL distribution based on an infinite number of sample outcomes • Based on what would happen if we drew an infinite number of samples from a population where THE NULL WAS TRUE (that is, we assume the null hypothesis is correct) • “OUTCOMES” are the test statistics (t, F, chi-square) • If the null were true, t should be close to zero (null says means are equal) • If the null were true, F should be less than (or close to) 1

Decision Making • Since we assume that in the population the null is true… • Large observed “test statistics” indicate that our findings are odd, or rare, or quite different from what we would expect if our initial assumption was correct • At some point, they will get large enough that we reject our initial assumption

Decision Making Part II • Critical values • The value of a test statistic where the associated probability is the same as alpha • Dictate the “critical region” • If the observed test statistic is in the critical region, we reject the null; this is a “significant” finding/relationship • Sig values • The exact probability of obtaining a test statistic at least as extreme as the one observed • If the null is true, there is an ___% chance of obtaining this finding • If it drops below alpha, we reject the null (significant finding)
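The two decision routes always agree, which a short sketch can show. It uses a z statistic because Python's standard library includes the normal distribution; the same logic applies to t, F, and chi-square with their own tables. The function name `two_tailed_sig` and the two obtained z values are hypothetical.

```python
from statistics import NormalDist

def two_tailed_sig(z_obtained):
    """Probability of a z at least this far from zero if the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z_obtained)))

alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

for z_obtained in (1.50, 2.30):  # hypothetical obtained test statistics
    sig = two_tailed_sig(z_obtained)
    by_critical_value = abs(z_obtained) > z_critical  # in the critical region?
    by_sig_value = sig < alpha                        # sig below alpha?
    decision = "reject" if by_sig_value else "fail to reject"
    print(f"z = {z_obtained:.2f}, sig = {sig:.4f}, "
          f"in critical region: {by_critical_value}, decision: {decision}")
```

A test statistic falls in the critical region exactly when its sig value drops below alpha, so either route leads to the same decision.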

Getting the right test statistic • If one of the variables is interval-ratio (I-R)… • You can calculate a mean • How many means are you comparing? • Only one sample mean (and a population mean) = one sample t-test • Dummy variable (sex) = two means = two sample (or independent sample) t-test • Nominal with more than two categories = F-test • If both variables are nominal/ordinal • Only test is chi-square
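The decision rules above can be written as a small lookup. This is only a sketch of the slide's logic; the function name `choose_test` and its parameters are hypothetical, and it covers only the four tests named in this deck.

```python
def choose_test(has_interval_ratio, groups=None):
    """Pick a test statistic following the slide's rules.

    has_interval_ratio: True if one of the variables is interval-ratio,
        so a mean can be calculated.
    groups: number of means being compared; 1 means a single sample mean
        compared with a population mean.
    """
    if not has_interval_ratio:
        # Both variables are nominal/ordinal.
        return "chi-square"
    if groups == 1:
        return "one-sample t-test"
    if groups == 2:
        return "two-sample t-test"
    if groups is not None and groups > 2:
        return "F-test"
    raise ValueError("specify how many means are being compared")

print(choose_test(True, groups=1))   # one sample mean vs. a population mean
print(choose_test(True, groups=2))   # dummy variable, e.g. sex
print(choose_test(True, groups=4))   # nominal variable with 4 categories
print(choose_test(False))            # both variables nominal/ordinal
```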

Exam • No multiple choice • A few “short answer” • Lots of interpretation/decision making • Calculate (given formulas) • One sample t, chi-square • 95% and 99% confidence intervals (means, proportions) • Interpret • t, F, chi-square
