Final review - statistics Spring 03

Also, see final review - research design

Statistics
• Descriptive statistics: statistics to summarize and describe the data we collected
• Inferential statistics: statistics to make inferences from samples to populations

Descriptive Statistics
A summary of your data
• Center / central tendency
  • Indicates a central value for the variable
• Measures of dispersion (variability / spread)
  • Indicate how much participants’ scores vary from each other
• Measures of association
  • Indicate how much variables go together
• (Shown in tables, graphs, distributions)

Measures of Center
• Mode
  • A value with the highest frequency
  • The most common value
• Median
  • The “middle” score
• Mean
  • Average
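A minimal Python sketch of the three measures of center, using the standard-library `statistics` module on a small set of hypothetical scores (the numbers are invented for illustration):

```python
import statistics

# Hypothetical scores for seven participants (illustrative only)
scores = [2, 3, 3, 4, 5, 6, 12]

mode = statistics.mode(scores)      # most frequent value: 3
median = statistics.median(scores)  # middle of the sorted scores: 4
mean = statistics.mean(scores)      # sum / count = 35 / 7 = 5

print("mode:", mode, "median:", median, "mean:", mean)
```

The single high score (12) pulls the mean above the median, which is one reason the median is often preferred for skewed data.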

WHY are LEVELS / SCALES of MEASUREMENT IMPORTANT?
• Because you need to match the statistic you use to the kind of variable you have

Measures of Central Tendency (Center)
• Nominal: Mode
• Ordinal: Mode, Median
• Interval/Ratio: Mode, Median, Mean

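Level of Measurement (Summary)
• Nominal: difference (values carry information only about difference among values)
• Ordinal: difference + order
• Interval: difference + order + equal interval (you can calculate / do math)
• Ratio: difference + order + equal interval + meaningful zero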

Why Does “Equal Distance” Matter?
• If the distances between values are equal (as in interval or ratio data), you can calculate with the values (add, subtract, multiply, divide)
• You can get a mean only for interval/ratio variables
• A wider variety of statistical tests is available for interval/ratio variables

[Figure: frequency distribution of scores ranging from 4 to 10]
What are the mean, median, and mode for this distribution? What is this distribution shape called?

Types of Measures of Dispersion (Variability / Spread)
• Frequencies / percentages
• Range: the distance between the highest score and the lowest score (highest – lowest)
• Standard deviation / variance

Variance / Standard Deviation
• Variance (S²): an approximate average of the squared deviations from the mean (approximate because the sum of squared deviations is divided by n - 1 rather than n)
• Standard deviation (S or SD): the square root of the variance
• The larger the variance or SD, the more variable the data: scores spread more widely around the mean.
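The sketch below, again with hypothetical scores, computes the range, variance, and standard deviation with Python's `statistics` module. `variance` and `stdev` divide the sum of squared deviations by n - 1 (the sample formulas, S² and S), while `pvariance` divides by n, the population formula (σ²):

```python
import statistics

# Hypothetical interval/ratio scores (illustrative only)
scores = [4, 6, 7, 7, 8, 10]

score_range = max(scores) - min(scores)        # highest - lowest = 6
sample_var = statistics.variance(scores)       # squared deviations (20) / (n - 1) = 4
sample_sd = statistics.stdev(scores)           # square root of the sample variance = 2
population_var = statistics.pvariance(scores)  # divides by n instead: 20 / 6

print(score_range, sample_var, sample_sd, population_var)
```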

Measures of Dispersion
• Nominal: Frequency, %
• Ordinal: Frequency, %; Range, IQR
• Interval/Ratio: Frequency, %; Range, IQR; Standard deviation, Variance
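The review's two summaries (center and dispersion) amount to a lookup from level of measurement to suitable statistics. A minimal sketch of that lookup; the function name `suitable_statistics` is hypothetical, chosen for illustration:

```python
# Descriptive statistics suited to each level of measurement (cumulative:
# each higher level can also use everything listed for the levels below it)
CENTER = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval/ratio": ["mode", "median", "mean"],
}
DISPERSION = {
    "nominal": ["frequency", "percentage"],
    "ordinal": ["frequency", "percentage", "range", "IQR"],
    "interval/ratio": ["frequency", "percentage", "range", "IQR",
                       "standard deviation", "variance"],
}

def suitable_statistics(level):
    """Return (measures of center, measures of dispersion) for a level of measurement."""
    key = level.strip().lower()
    if key in ("interval", "ratio"):
        key = "interval/ratio"
    if key not in CENTER:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return CENTER[key], DISPERSION[key]

center, spread = suitable_statistics("Ordinal")
print("center:", center)
print("spread:", spread)
```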

CORRELATION
• Co-relation: 2 variables tend to “go together”
• Indicates how strongly and in which direction two variables are correlated with each other
• *** Correlation does NOT EQUAL causation

SIGN
• Positive correlation: as one variable increases, so does the second
• Negative correlation: as one variable increases, the second decreases
• 0: no systematic relationship

Correlation coefficient scale: -1 (perfect negative) ... 0 (none) ... +1 (perfect positive); relationships get stronger toward either end and weaker toward 0.

SIZE
• Ranges from -1 to +1
• 0 or close to 0 indicates NO relationship
• ±.2 to .4: weak
• ±.4 to .6: moderate
• ±.6 to .8: strong
• ±.8 to .9: very strong
• ±1.00: perfect
• Negative relationships are NOT weaker!
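A minimal sketch of Pearson's r computed from its definition, plus a labeller using the rule-of-thumb cutoffs above. The function names, the hypothetical data, the lower-bound-inclusive boundaries, and the label for |r| below .2 are all assumptions for illustration (Python 3.10+ also ships `statistics.correlation` for r itself):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-products of deviations, and squared deviations for each variable
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe_strength(r):
    """Label size (from |r|) and sign; cutoff boundaries are an assumption."""
    size = abs(r)
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "none or negligible"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction

hours = [1, 2, 3, 4, 5]            # hypothetical study hours
score_up = [50, 55, 65, 70, 80]    # scores rising with hours
score_down = [80, 70, 65, 55, 50]  # scores falling with hours

r_pos = pearson_r(hours, score_up)
r_neg = pearson_r(hours, score_down)
print(round(r_pos, 3), describe_strength(r_pos))
print(round(r_neg, 3), describe_strength(r_neg))
```

The two hypothetical relationships have the same size and opposite signs, so both get the same strength label: a negative relationship is not a weaker one.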

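Significance Test
• The correlation coefficient also comes with a significance test (p-value)
• p = .05: if there were no correlation in the population, a sample correlation at least this strong would occur by chance only 5% of the time; using .05 as the cutoff means accepting a 5% risk of a Type I error (95% confidence level)
• If p < .05, reject H0 and support Ha at the 95% confidence level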

Inferential Statistics
• Infer characteristics of a population from the characteristics of a sample
• Hypothesis testing
• Statistical significance
• The decision matrix

Inferential Statistics
• Sample statistics: X̄ (mean), SD, n
• Population parameters: μ (mean), σ (standard deviation), N

Inferential Statistics assess:
• Are the sample statistics good indicators of the population parameters?
• Did a difference between 2 groups happen by chance?
• What effect does random sampling error have on our results?

Random sampling error Random sampling error: Difference between the sample characteristics and the population characteristics caused by chance • Sampling bias: Difference between the sample characteristics and the population characteristics caused by biased (non-random) sampling

Probability
• Probability (p) ranges between 0 and 1
• p = 1 means that the event would occur in every trial
• p = 0 means the event would never occur in any trial
• The closer the probability is to 1, the more likely the event will occur
• The closer the probability is to 0, the less likely the event will occur

p > .05 means that …
• The means of the two groups fall within the central 95% area of the normal distribution around one population mean
[Figure: Mean 1 and Mean 2 both inside the central 95% of a single normal curve]

p < .05 means that …
• The means of the two groups do NOT fall within the central 95% area of the normal distribution of one population mean, so it is more reasonable to assume that they belong to different populations
[Figure: two separate distributions, labeled 1 and 2]

Null Hypothesis
• Says the IV has no influence on the DV
• There is no difference between the two variables.
• There is no relationship between the two variables.

Null Hypothesis
• States there is NO true difference between the groups
• If sample statistics show any difference, it is due to random sampling error
• Referred to as H0 (the research hypothesis is Ha)
• If you can reject H0, you can support Ha
• If you fail to reject H0, you reject Ha

Be conservative.
• What are the chances I would get these results if the null hypothesis is true?
• Only if the pattern is highly unlikely (p < .05) do you reject the null hypothesis and support your hypothesis
• Since you cannot be 100% sure your conclusion is correct, you take up to a 5% risk.
• Your p-value tells you the risk (the probability) of making a Type I error

Your decision | True state: wrong person to marry | True state: right person to marry
You think it’s the wrong person to marry | Correct | Type II error
You think it’s the right person to marry | Type I error | Correct

Your decision | True state: no fire | True state: fire
No alarm | Correct | Type II error
Alarm | Type I error | Correct

Your decision | True state: H0 (no fire) | True state: Ha (fire)
Accept H0 (no alarm) | Correct | Type II error
Reject H0 (alarm) | Type I error | Correct
H0 = null hypothesis = there is NO fire
Ha = alternative hypothesis = there IS a FIRE

Easy ways to LOSE points
• Use the word “prove”
  • Better to say “support the hypothesis” or “consistent with the hypothesis”
  • Tentative statements acknowledge the possibility of making a Type I or Type II error
• Use the word “random” incorrectly

Significance Test
• A significance test examines the probability of a Type I error (falsely rejecting H0)
• A significance test examines how probable it is that the observed difference is caused by random sampling error
• Reject the null hypothesis if the probability is < .05 (the probability of a Type I error is smaller than .05)

Principle (Logic)
p < .05 → reject the null hypothesis (H0) → support your hypothesis (Ha)

Logic of Hypothesis Testing
Statistical tests used in hypothesis testing deal with the probability of a particular event occurring by chance. Is the result common, or a rare occurrence, if only chance is operating? A score (or the result of a statistical test) is “significant” if it is unlikely to occur on the basis of chance alone.

Level of Significance
The “level of significance” is a cutoff point for determining significantly rare or unusual scores. When we adopt the standard “5% level of significance”, scores outside the middle 95% of a distribution are considered “rare”. This level of significance can be written as p = .05.
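A minimal sketch of the 5% cutoff, assuming the test statistic is a z score from a standard normal distribution (t, F, and chi-square tests use their own distributions). It uses `statistics.NormalDist` from Python's standard library; the helper names are hypothetical:

```python
from statistics import NormalDist

ALPHA = 0.05                  # the standard "5% level of significance"
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-tailed: split alpha between the two tails, leaving the middle 95%
z_cutoff = standard_normal.inv_cdf(1 - ALPHA / 2)   # about 1.96

def two_tailed_p(z):
    """Chance of a z score at least this extreme if only chance is operating."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

def is_rare(z):
    """True if z falls outside the middle 95%, i.e. p < .05."""
    return two_tailed_p(z) < ALPHA

for z in (1.0, 1.5, 2.5):
    print(z, round(two_tailed_p(z), 4), is_rare(z))
```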

Decision Rules
• Reject H0 (accept Ha) when the sample statistic is statistically significant at the chosen p level; otherwise accept H0 (reject Ha).
• Possible errors:
  • You reject the null hypothesis when in fact it is true: a Type I error, or error of rashness.
  • You accept the null hypothesis when in fact it is false: a Type II error, or error of caution.

Your decision | True state: data results are by chance (null is true) | True state: data indicate something is happening (null is false)
Nothing is happening except chance variation (accept the null) | Correct | Type II error
Data indicate something significant is happening (reject the null) | Type I error | Correct
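The decision rule and the decision matrix can be written as two small functions. This is an illustrative sketch (the function names are hypothetical); it uses the review's own terms "reject H0" and "accept H0":

```python
ALPHA = 0.05  # the review's standard level of significance

def decide(p_value, alpha=ALPHA):
    """Decision rule: reject H0 (support Ha) only when p < alpha."""
    return "reject H0" if p_value < alpha else "accept H0"

def outcome(decision, null_is_true):
    """Cell of the decision matrix, given the decision and the true state."""
    rejected = decision == "reject H0"
    if rejected and null_is_true:
        return "Type I error"    # alarm, but no fire
    if not rejected and not null_is_true:
        return "Type II error"   # fire, but no alarm
    return "Correct"

# In real research the true state is unknown; it is supplied here only to fill in the matrix
for null_is_true in (True, False):
    for p in (0.03, 0.89):
        d = decide(p)
        print(f"p = {p}: {d}; null true = {null_is_true} -> {outcome(d, null_is_true)}")
```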

Parametric Tests
• To compare two groups on mean scores, use the t-test.
• For more than 2 groups, use analysis of variance (ANOVA).
Nonparametric Tests
• You can’t get a mean from nominal or ordinal data.
• Chi-square tests the difference in frequency distributions of two or more groups.

Parametric Tests
• Used with data that have a mean score and standard deviation (interval/ratio data)
• Examples: t-test, ANOVA, and Pearson’s correlation r
• Use a t-test to compare mean differences between two groups (e.g., male/female and married/single).

Parametric Tests
• Use ANalysis Of VAriance (ANOVA) to compare more than two groups (such as age groups or family-income groups) and get a probability score for the overall group differences.
• Use post hoc tests to identify which subgroups differ significantly from each other.
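A minimal sketch of the one-way ANOVA F ratio (between-group variance divided by within-group variance) for hypothetical groups. It stops at F: turning F into a p-value needs the F distribution with (df_between, df_within) degrees of freedom, for example SciPy's `scipy.stats.f.sf(f_ratio, df_between, df_within)`, and post hoc tests are not shown. The function name is hypothetical:

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each score sits from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three groups (illustrative only)
f_ratio, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_ratio, df_b, df_w)
```

A large F means the group means differ by much more than the scores vary within each group.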

When comparing two groups on MEAN SCORES, use the t-test.

T-test
• If p < .05, we conclude that the two groups are drawn from populations with different means (reject H0) at the 95% confidence level
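A minimal sketch of Student's independent-samples t-test (pooled variance) computed from summary statistics (mean, SD, n). The group values are hypothetical. An exact p-value needs the t distribution with df = n1 + n2 - 2 (for example SciPy's `scipy.stats.t.sf`); to stay within the standard library, this sketch substitutes a normal approximation, which is only reasonable when df is large:

```python
import math
from statistics import NormalDist

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, pooling the two variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (mean1 - mean2) / se_diff, df

def approx_two_tailed_p(t):
    """Normal approximation to the two-tailed p-value (large df only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Hypothetical groups: same means and SDs, different sample sizes
t_large, df_large = pooled_t(3.0, 1.5, 100, 3.5, 1.5, 100)
t_small, df_small = pooled_t(3.0, 1.5, 10, 3.5, 1.5, 10)

p_large = approx_two_tailed_p(t_large)
print(f"n = 100 per group: t = {t_large:.2f}, df = {df_large}, p ~ {p_large:.3f}")
print(f"n = 10 per group:  t = {t_small:.2f}, df = {df_small}")
```

With the same mean difference, the smaller samples give a much smaller |t|, which is one way a low sample size can produce a nonsignificant result.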

Our research hypothesis: hair length leads to different perceptions of a person.
The null hypothesis: there will be no difference between the pictures.
When comparing two groups on MEAN SCORES, use the t-test.

“I think she is one of those people who quickly earns respect.”
Short hair: Mean = 2.2, SD = 1.9, n = 100
Long hair: Mean = 4.1, SD = 1.8, n = 100
p = .03
Possible conclusions:
• Accept Ha: mean scores come from different distributions.
• Accept H0: mean scores reflect just chance differences from a single distribution.

“In my opinion, she is a mature person.”
Short hair: Mean = 1.6, SD = 1.7, n = 100
Long hair: Mean = 3.6, SD = 1.2, n = 100
p = .01
Possible conclusions:
• Accept Ha: mean scores come from different distributions.
• Accept H0: mean scores reflect just chance differences from a single distribution.

“I think we are quite similar to one another.”
Short hair: Mean = 3.7, SD = 1.8, n = 100
Long hair: Mean = 3.9, SD = 1.5, n = 100
p = .89
Possible conclusions:
• Accept Ha: mean scores come from different distributions.
• Accept H0: mean scores reflect just chance differences from a single distribution.

A nonsignificant result may be caused by a
A. low sample size.
B. very cautious significance level.
C. weak manipulation of independent variables.
D. true null hypothesis.

When to use various statistics
• Parametric: interval or ratio data
• Non-parametric: ordinal and nominal data

Chi-Square (χ²)
• Chi-square tests the difference in frequency distributions of two or more groups.
• Test of significance
  • of two nominal variables, or
  • of a nominal variable and an ordinal variable
• Used with a cross-tabulation table
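A minimal sketch of the Pearson chi-square statistic for a cross-tabulation with hypothetical counts. No continuity correction is applied (SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2 x 2 tables by default, so its result would differ). The p-value helper uses `math.erfc` and is exact only for 1 degree of freedom, because a chi-square with 1 df is the square of a standard normal; larger tables need the chi-square distribution, for example `scipy.stats.chi2.sf`. The function names are hypothetical:

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square and degrees of freedom for a cross-tabulation (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_one_df(chi2):
    """Upper-tail p-value for a chi-square with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical 2 x 2 cross-tab: rows = group A / group B, columns = yes / no
table = [[30, 20],
         [20, 30]]
chi2, df = chi_square_statistic(table)
print(chi2, df, round(p_value_one_df(chi2), 4))
```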
