Understanding t Distribution for Statistical Analysis

Section 10.1 ~ t Distribution for Inferences about a Mean
Introduction to Probability and Statistics
Ms. Young

Sec. 10.1 Objective
• After this section you will understand when it is appropriate to use the t distribution rather than the normal distribution for constructing confidence intervals or conducting hypothesis tests for population means, and know how to make proper use of the t distribution.

Sec. 10.1 t Distribution for Inferences about a Mean
• When dealing with confidence intervals (ch. 8) and hypothesis testing (ch. 9), we worked with samples that were large enough to assume a normal distribution, which allowed us to use standard scores (z-scores) to find the probabilities of certain values occurring
• Recall that in order to find the z-score, the population standard deviation is needed
• In real applications, the population standard deviation is typically not available, so to find the confidence interval or conduct the hypothesis test we have to estimate it using the sample standard deviation
• Many statisticians believe that using the normal distribution with this estimate is not the best approach; they use what is known as a t distribution (or Student's t distribution) in place of the normal distribution
• As long as the sample size is greater than 30 or the population is normally distributed, a t distribution can be used to find a confidence interval and/or conduct a hypothesis test
• The t distribution is similar in shape and symmetry to the normal distribution
• It accounts for the greater variability that is expected with small samples
• Note ~ when you know the population standard deviation and the sample size is greater than 30 or the population is normally distributed, the normal distribution is best to use

Sec. 10.1 t Distribution for Inferences about a Mean
• The following diagram compares the standard normal distribution with two t distributions, for sample sizes n = 3 and n = 12
[Diagram: standard normal distribution and t distributions for n = 3 and n = 12]
• As you can see, they are very similar in shape, and as the sample size increases, the t distribution gets closer and closer to the standard normal distribution
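The comparison can also be checked numerically. The sketch below evaluates the density of each curve at the center (x = 0) and in the tail (x = 3). It assumes the usual convention that a sample of size n corresponds to n − 1 degrees of freedom, so n = 3 and n = 12 become 2 and 11 degrees of freedom; the function names `t_pdf` and `normal_pdf` are illustrative, not part of the course materials.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Assumption: degrees of freedom = n - 1, so n = 3 -> 2 and n = 12 -> 11.
for x in (0.0, 3.0):
    print(
        f"x = {x}: t (n = 3) = {t_pdf(x, 2):.4f}, "
        f"t (n = 12) = {t_pdf(x, 11):.4f}, "
        f"normal = {normal_pdf(x):.4f}"
    )
```

The t curves are lower at the center and higher in the tails than the normal curve, and the n = 12 curve sits between the n = 3 curve and the normal curve at both points.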

Sec. 10.1 Confidence Intervals Using the t Distribution
• When determining a confidence interval using a t distribution, we use t values rather than z-scores to determine significance
• A t value represents the number of standard deviations a value falls from the mean on a t distribution
• Recall that to write a confidence interval, you must first calculate the margin of error
• The formula for the margin of error using a t distribution is: $E = t \cdot \frac{s}{\sqrt{n}}$
• t = t value, found by looking up the value that corresponds to the appropriate number of degrees of freedom, n − 1 (Table 10.1 on p. 412)
• n = sample size
• s = standard deviation of the sample

Sec. 10.1 Critical Values of t
• Degrees of freedom (first column of the table)
• Use column 2 for a 95% confidence level for a two-tailed test (or confidence interval)
• Use column 2 for a 97.5% confidence level for a one-tailed test
• Use column 3 for a 90% confidence level for a two-tailed test (or confidence interval)
• Use column 3 for a 95% confidence level for a one-tailed test
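The column rules can be written as a small lookup. This is only a sketch: `T_TABLE` holds a few rows of a standard t table (4, 9, and 11 degrees of freedom), not the full Table 10.1, and the names `T_TABLE` and `critical_t` are illustrative. It is keyed by significance level, so a 95% confidence level corresponds to a significance of 0.05.

```python
# Partial t table: degrees of freedom -> (column 2 value, column 3 value).
# Column 2: area 0.025 in one tail, or 0.05 in two tails.
# Column 3: area 0.05 in one tail, or 0.10 in two tails.
T_TABLE = {
    4: (2.776, 2.132),
    9: (2.262, 1.833),
    11: (2.201, 1.796),
}

# (significance level, number of tails) -> index into the tuples above.
COLUMN_FOR = {
    (0.05, 2): 0,   # 95% confidence level, two-tailed
    (0.025, 1): 0,  # 97.5% confidence level, one-tailed
    (0.10, 2): 1,   # 90% confidence level, two-tailed
    (0.05, 1): 1,   # 95% confidence level, one-tailed
}


def critical_t(df, significance, tails):
    """Look up the critical t value for the given degrees of freedom."""
    if df not in T_TABLE:
        raise ValueError(f"no row for {df} degrees of freedom in this partial table")
    if (significance, tails) not in COLUMN_FOR:
        raise ValueError("unsupported combination of significance level and tails")
    return T_TABLE[df][COLUMN_FOR[(significance, tails)]]


print(critical_t(4, 0.05, 2))  # 95% confidence interval with n = 5
print(critical_t(9, 0.05, 1))  # one-tailed test at the 0.05 level with n = 10
```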

Sec. 10.1 Confidence Intervals Using the t Distribution
• Recall that the standard form for a confidence interval when dealing with means is: $\bar{x} - E < \mu < \bar{x} + E$
• Example 1 ~ Diastolic Blood Pressure
• Here are five measures of diastolic blood pressure from randomly selected adult men: 78, 54, 81, 68, 66. These five values result in these sample statistics: n = 5, $\bar{x} = 69.4$, and s = 10.7. Using this sample, construct the 95% confidence interval estimate of the mean diastolic blood pressure level for the population of all adult men.
• Note ~ we are using the t distribution because the population standard deviation is not known and it is reasonable to assume that blood pressure levels are normally distributed
• Before finding the margin of error, we must first find the t value from the table that corresponds to 4 degrees of freedom (since the sample size was 5, the degrees of freedom is 5 – 1, or 4)
• For the 95% confidence level, 4 degrees of freedom corresponds to a t value of t = 2.776
• Note ~ for confidence intervals, we use the t values for the “area in two tails” because the margin of error extends both below and above the sample mean

Sec. 10.1 Confidence Intervals Using the t Distribution
• Example 1 Cont’d…
• Recall the sample statistics for the five blood pressure measurements (78, 54, 81, 68, 66): n = 5, $\bar{x} = 69.4$, and s = 10.7
• Now that we know that t = 2.776, we can find the margin of error: $E = t \cdot \frac{s}{\sqrt{n}} = 2.776 \cdot \frac{10.7}{\sqrt{5}} \approx 13.3$
• To construct the confidence interval, subtract the margin of error from the sample mean ($\bar{x} = 69.4$) and add it to the sample mean: $69.4 - 13.3 < \mu < 69.4 + 13.3$, or $56.1 < \mu < 82.7$
• Based on the five sample measurements, we can be 95% confident that the true mean of diastolic blood pressure for adult men is between 56.1 and 82.7
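Example 1 can be reproduced with a few lines of Python as a check on the arithmetic. This sketch recomputes the sample mean and sample standard deviation from the five measurements and takes the t value of 2.776 from the table, as in the example; the variable names are illustrative.

```python
import math
import statistics

pressures = [78, 54, 81, 68, 66]

n = len(pressures)
mean = statistics.mean(pressures)   # sample mean
s = statistics.stdev(pressures)     # sample standard deviation (divides by n - 1)

t_value = 2.776                     # table value: 4 degrees of freedom, 95% confidence
margin = t_value * s / math.sqrt(n) # margin of error E = t * s / sqrt(n)

lower, upper = mean - margin, mean + margin
print(f"mean = {mean:.1f}, s = {s:.1f}, E = {margin:.1f}")
print(f"95% confidence interval: {lower:.1f} < mu < {upper:.1f}")
```

The code uses the unrounded standard deviation, which gives the same interval to one decimal place as the hand calculation with s = 10.7.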

Sec. 10.1 Hypothesis Tests Using the t Distribution
• When a t distribution is used to conduct a hypothesis test, the t value plays the role that the z-score played when we worked with the normal distribution
• Recall that we determined statistical significance by comparing the z-score to critical values or by using the z-score to determine the P-value
• Use the following formula to calculate the t value: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$, where $\mu$ is the population mean claimed by the null hypothesis
• This t value is then compared to the “Critical Values of t” table to determine significance
• Note ~ a P-value can be calculated, but this is usually done with the aid of statistical software, so we will not be calculating P-values using a t distribution in this course

Sec. 10.1 Hypothesis Tests Using the t Distribution
• Once you calculate t, you can decide whether or not to reject the null hypothesis using the following criteria:
• Right-tailed test: reject the null hypothesis if the t value that you found is ≥ the t value from the table (that corresponds to the appropriate degrees of freedom); use column 2 as the comparison if you want a 97.5% confidence level and column 3 if you want a 95% confidence level
• Left-tailed test: reject the null hypothesis if the t value that you found is ≤ the negative of the t value from the table (that corresponds to the appropriate degrees of freedom); use column 2 as the comparison if you want a 97.5% confidence level and column 3 if you want a 95% confidence level
• Two-tailed test: reject the null hypothesis if the absolute value of the t value that you found is ≥ the t value from the table (that corresponds to the appropriate degrees of freedom); use column 2 as the comparison if you want a 95% confidence level and column 3 if you want a 90% confidence level
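The three rejection criteria can be summarized in one small function. This is a sketch only; the function name `reject_null` and the tail labels "right", "left", and "two" are illustrative choices, and the critical value is assumed to have been read from the correct column of the table beforehand.

```python
def reject_null(t, critical, tail):
    """Apply the rejection criteria for a t test.

    t        -- the t value calculated from the sample
    critical -- the (positive) t value from the table
    tail     -- "right", "left", or "two"
    """
    if tail == "right":
        return t >= critical
    if tail == "left":
        return t <= -critical
    if tail == "two":
        return abs(t) >= critical
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Right-tailed: a calculated t of 2.0 against a table value of 1.833.
print(reject_null(2.0, 1.833, "right"))
# Left-tailed: a calculated t of -1.0 against the same table value.
print(reject_null(-1.0, 1.833, "left"))
# Two-tailed: only the size of t matters, not its sign.
print(reject_null(-2.5, 2.262, "two"))
```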

Sec. 10.1 Hypothesis Tests Using the t Distribution
• Example 2 ~ Right Tailed Hypothesis Test for a Mean
• Listed below are ten randomly selected IQ scores of statistics students: 111, 115, 118, 100, 106, 108, 110, 105, 113, 109
• Using methods from Chapter 4, you can confirm that these data have the following sample statistics: n = 10, $\bar{x} = 109.5$, and s = 5.2. Using a 0.05 significance level, test the claim that statistics students have a mean IQ score greater than 100, which is the mean IQ score of the general population.
• Step 1: $H_0: \mu = 100$ (null hypothesis) and $H_a: \mu > 100$ (alternative hypothesis)
• Step 2: Sample size: n = 10; sample mean: $\bar{x} = 109.5$; standard deviation of the sample: s = 5.2

Sec. 10.1 Hypothesis Tests Using the t Distribution
• Step 3: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{109.5 - 100}{5.2 / \sqrt{10}} \approx 5.777$
• Since this is a one-tailed test, the t value that we will be comparing against is found in the 3rd column of the table, in the row that corresponds to 9 degrees of freedom (10 – 1); it is 1.833
• Since this is a right-tailed test, the result is statistically significant if the t value that we found is greater than or equal to the t value of 1.833 (found in the table)
• 5.777 is greater than 1.833, so this is statistically significant at the 0.05 level
• Step 4: Since this is statistically significant at the 0.05 level, we have enough evidence to reject the null hypothesis and support the claim that the mean IQ score of statistics students is greater than 100
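The t value of 5.777 in Step 3 comes from the standard one-sample formula, t = (sample mean − claimed mean) / (s / √n). The sketch below checks that arithmetic for Example 2. It uses the rounded standard deviation s = 5.2 from the example so that the result matches; the function name `t_statistic` is illustrative.

```python
import math


def t_statistic(sample_mean, claimed_mean, s, n):
    """One-sample t value: (sample mean - claimed mean) / (s / sqrt(n))."""
    return (sample_mean - claimed_mean) / (s / math.sqrt(n))


scores = [111, 115, 118, 100, 106, 108, 110, 105, 113, 109]
n = len(scores)
sample_mean = sum(scores) / n

s = 5.2            # rounded sample standard deviation, as given in the example
claimed_mean = 100 # mean IQ score under the null hypothesis
critical = 1.833   # table value: 9 degrees of freedom, one-tailed, 0.05 level

t = t_statistic(sample_mean, claimed_mean, s, n)
significant = t >= critical  # right-tailed criterion

print(f"sample mean = {sample_mean}, t = {t:.3f}, significant: {significant}")
```

Using the unrounded standard deviation would give a slightly different t value, but the conclusion is the same because t is far above the critical value.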

Sec. 10.1 Hypothesis Tests Using the t Distribution
• Example 3 ~ Two Tailed Hypothesis Test for a Mean
• Using the same data from Example 2 and the same significance level of 0.05, test the claim that the mean IQ score is equal to 100
• Step 1: $H_0: \mu = 100$ (null hypothesis) and $H_a: \mu \neq 100$ (alternative hypothesis)
• Step 2: Sample size: n = 10; sample mean: $\bar{x} = 109.5$; standard deviation of the sample: s = 5.2
• Step 3: $t = \frac{109.5 - 100}{5.2 / \sqrt{10}} \approx 5.777$
• Since this is a two-tailed test, we are looking at column 2 for a 0.05 significance level
• The degrees of freedom is 9, so the t value in the table is 2.262
• Because this is a two-tailed test, the result is statistically significant at the 0.05 level if the absolute value of our t value (5.777) is greater than or equal to 2.262

Sec. 10.1 Hypothesis Tests Using the t Distribution
• Step 4: Since the absolute value of the t value that we found (5.777) is greater than 2.262, this is statistically significant at the 0.05 level, and we therefore reject the null hypothesis that the mean IQ score is equal to 100
• In other words, there is sufficient evidence to support the alternative hypothesis that the mean IQ score is not equal to 100
