Review #2

Review #2: Chapters 9, 10, 11 and 12

Chapter 9 Sampling Distributions
• A statistic is a random variable describing a characteristic of a random sample, for example:
  • the sample mean
  • the sample variance
• We use the values of statistics in inferential statistics, that is, to make inferences about population characteristics from sample characteristics.
• Statistics have distributions of their own (sampling distributions).

Chapter 9 The Central Limit Theorem
• The distribution of the sample mean is normal if the parent distribution is normal.
• The distribution of the sample mean approaches the normal distribution for sufficiently large samples (n ≥ 30), even if the parent distribution is not normal.
• The parameters of the sampling distribution of the mean are:
  • Mean: $\mu_{\bar{x}} = \mu$
  • Standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
(Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)
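A quick way to see the theorem at work is a small simulation. The sketch below assumes an exponential parent population with mean 1 and standard deviation 1 (an illustrative choice, not from the slides, picked because it is clearly not normal), draws many samples of size 30, and compares the spread of the sample means with the predicted $\sigma/\sqrt{n}$. It uses only Python's standard library.

```python
import random
from statistics import NormalDist, fmean, stdev

# Illustrative parent population (an assumption, not from the slides):
# exponential with mean 1 and standard deviation 1, which is clearly skewed.
random.seed(42)   # fixed seed so the run is reproducible
n = 30            # sample size at the n >= 30 rule of thumb
reps = 20_000     # number of simulated samples

sample_means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# CLT prediction: the sample mean has mean mu = 1 and sd sigma / sqrt(n)
predicted_sd = 1.0 / n ** 0.5
observed_mean = fmean(sample_means)
observed_sd = stdev(sample_means)

# Fraction of sample means within one predicted sd of mu, versus the normal value
within = sum(abs(m - 1.0) < predicted_sd for m in sample_means) / reps
normal_within = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(f"mean of sample means: {observed_mean:.4f} (CLT: 1.0000)")
print(f"sd of sample means:   {observed_sd:.4f} (CLT: {predicted_sd:.4f})")
print(f"within one sd:        {within:.3f} (normal model: {normal_within:.3f})")
```

Even though the parent distribution is skewed, the simulated sample means should land close to the CLT predictions.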

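Chapter 9 The Central Limit Theorem
• Problem 1 (Using Excel): Given a normal population whose mean is 50 and whose standard deviation is 5,
• Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52.
• Answer: $\sigma_{\bar{x}} = 5/\sqrt{4} = 2.5$, so the standardized limits are $z = (49 - 50)/2.5 = -.4$ and $z = (52 - 50)/2.5 = .8$. In Excel: =NORM.DIST(52,50,2.5,TRUE)-NORM.DIST(49,50,2.5,TRUE) ≈ .4436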

Chapter 9 The Central Limit Theorem
• Problem 1 (Using the table): Given a normal population whose mean is 50 and whose standard deviation is 5,
• Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52.
• Answer: $P(49 < \bar{x} < 52) = P(-.4 < Z < .8) = .1554 + .2881 = .4435$

Chapter 9 The Central Limit Theorem
• Problem 1
• Question 2: Find the probability that a random sample of 16 has a mean between 49 and 52.
• Answer: $\sigma_{\bar{x}} = 5/\sqrt{16} = 1.25$, so $P(49 < \bar{x} < 52) = P(-.8 < Z < 1.6) = .2881 + .4452 = .7333$
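The same two answers can be checked with a short sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of Excel or the printed table. The helper name `prob_mean_between` is hypothetical; it simply builds the sampling distribution $N(\mu, \sigma/\sqrt{n})$ and takes a difference of CDF values.

```python
from statistics import NormalDist

mu, sigma = 50, 5  # population parameters from Problem 1

def prob_mean_between(lo, hi, n):
    """P(lo < x-bar < hi) for a sample of size n from a normal population."""
    xbar = NormalDist(mu, sigma / n ** 0.5)  # sampling distribution of the mean
    return xbar.cdf(hi) - xbar.cdf(lo)

p4 = prob_mean_between(49, 52, 4)    # Question 1
p16 = prob_mean_between(49, 52, 16)  # Question 2
print(f"n = 4:  {p4:.4f}")
print(f"n = 16: {p16:.4f}")
```

The larger sample concentrates the sample mean closer to 50, which is why the probability rises from about .44 to about .73.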

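Chapter 9 The Central Limit Theorem
• Problem 2: The amount of time per day that adults spend watching TV is normally distributed with μ = 6 and σ = 1.5 hours.
• Question 1: What is the probability that a randomly selected adult watches TV for more than 7 hours a day?
• Answer: $P(X > 7) = P(Z > (7 - 6)/1.5) = P(Z > .67) = .5 - .2486 = .2514$
• Question 2: What is the probability that 5 randomly selected adults watch TV for 7 or more hours a day on average?
• Answer: $\sigma_{\bar{x}} = 1.5/\sqrt{5} = .6708$, so $P(\bar{x} \ge 7) = P(Z \ge (7 - 6)/.6708) = P(Z \ge 1.49) = .5 - .4319 = .0681$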

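Chapter 9 The Central Limit Theorem
• Problem 2 (continued):
• Question 3: What is the probability that the total TV watching time of the five adults will not exceed 28 hours?
• Answer: a total of at most 28 hours means $\bar{x} \le 28/5 = 5.6$, so $P(\bar{x} \le 5.6) = P(Z \le (5.6 - 6)/.6708) = P(Z \le -.60) = .5 - .2257 = .2743$
• Question 4: For samples of 5 adults, what total TV watching time is exceeded by only 3% of the samples?
• Answer: $P(Z > z_{.03}) = .03$ gives $z_{.03} = 1.88$, so $\bar{x} = 6 + 1.88(.6708) = 7.26$ and the total is $5(7.26) \approx 36.3$ hours.
Comments: 1. Excel's NORM.INV returns x for a given left-hand-tail probability, so use .97 here. 2. .6708 = 1.5/√5.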
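The four questions of Problem 2 can be reproduced in one sketch. It uses `statistics.NormalDist` for a single adult, for the mean of 5 adults, and for the total of 5 adults (mean $5\mu$, standard deviation $\sigma\sqrt{5}$); `inv_cdf(0.97)` plays the role of Excel's NORM.INV with a left-hand-tail probability of .97. Because these are exact normal values rather than table lookups, they may differ from the slide answers in the third or fourth decimal place.

```python
from statistics import NormalDist

mu, sigma, n = 6, 1.5, 5   # hours per day; sample of 5 adults

one_adult = NormalDist(mu, sigma)
mean_of_5 = NormalDist(mu, sigma / n ** 0.5)        # sd = 1.5/sqrt(5), about 0.6708
total_of_5 = NormalDist(n * mu, sigma * n ** 0.5)   # mean 30, sd about 3.354

q1 = 1 - one_adult.cdf(7)       # Question 1: P(X > 7)
q2 = 1 - mean_of_5.cdf(7)       # Question 2: P(x-bar >= 7)
q3 = total_of_5.cdf(28)         # Question 3: P(total <= 28)
q4 = total_of_5.inv_cdf(0.97)   # Question 4: total exceeded by only 3% of samples

print(f"Q1 = {q1:.4f}, Q2 = {q2:.4f}, Q3 = {q3:.4f}, Q4 = {q4:.2f} hours")
```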

Chapter 9 The Central Limit Theorem
• Problem 3: Assume that the mean monthly rent paid by students in a particular town is $350, with a standard deviation of $40. A random sample of 100 students who rented apartments was taken.
Question 1: What is the probability that the sample mean of the monthly rent exceeds $355?
Answer: $\sigma_{\bar{x}} = 40/\sqrt{100} = 4$, so $P(\bar{x} > 355) = P(Z > 1.25) = .5 - .3944 = .1056$

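Chapter 9 The Central Limit Theorem
• Problem 3 (continued)
Question 2: What is the probability that the total revenue from renting 10 randomly selected apartments falls between 3300 and 3700 dollars?
Answer: a total between $3,300 and $3,700 is the same as a sample mean between $330 and $370. With $\sigma_{\bar{x}} = 40/\sqrt{10} = 12.649$, $P(330 < \bar{x} < 370) = P(-1.58 < Z < 1.58) = 2(.4429) = .8858$. (With only n = 10, this relies on the rents themselves being approximately normally distributed.)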

Chapter 9 The Central Limit Theorem
• Problem 3 (continued)
Question 3: Let's assume the population mean was unknown, but the standard deviation was known to be $40. A sample of 100 rentals was selected in order to estimate the mean monthly rent paid by the whole student population. What is the probability that the sample mean differs from the actual mean by more than $5? How about by more than $10?
Answer: $\sigma_{\bar{x}} = 4$, so $P(|\bar{x} - \mu| > 5) = 2P(Z > 1.25) = 2(.1056) = .2112$ and $P(|\bar{x} - \mu| > 10) = 2P(Z > 2.5) = 2(.0062) = .0124$
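A sketch of the three Problem 3 calculations with `statistics.NormalDist`. Question 2 is modelled directly on the total of 10 rents (mean $10\mu$, standard deviation $\sigma\sqrt{10}$), which is equivalent to the sample-mean formulation; as noted above, with n = 10 this assumes the rents are roughly normal. Question 3 uses the symmetry $P(|\bar{x} - \mu| > d) = 2P(Z > d/\sigma_{\bar{x}})$.

```python
from statistics import NormalDist

mu, sigma = 350, 40   # monthly rent: mean and standard deviation

# Question 1: n = 100, P(x-bar > 355)
se100 = sigma / 100 ** 0.5                  # = 4
q1 = 1 - NormalDist(mu, se100).cdf(355)

# Question 2: total of 10 rents between 3300 and 3700
# (assumes rents are approximately normal, since n = 10 is small)
total10 = NormalDist(10 * mu, sigma * 10 ** 0.5)
q2 = total10.cdf(3700) - total10.cdf(3300)

# Question 3: P(|x-bar - mu| > d) for n = 100, via 2 * P(Z > d / se)
z = NormalDist()
q3_5 = 2 * (1 - z.cdf(5 / se100))
q3_10 = 2 * (1 - z.cdf(10 / se100))

print(f"Q1 = {q1:.4f}, Q2 = {q2:.4f}, Q3: {q3_5:.4f} (>$5), {q3_10:.4f} (>$10)")
```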

Chapter 9 The Central Limit Theorem • Problem 3 – continued

Chapter 9 Sampling distribution of the sample proportion
In a sample of size n, if np > 5 and n(1 − p) > 5, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters:
• Mean: $E(\hat{p}) = p$
• Standard deviation: $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$
(Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)

Sampling distribution of the sample proportion
• Problem 4:
• A commercial by a household appliance manufacturer claims that less than 5% of all of its products require a service call in the first year.
• A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.

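Sampling distribution of the sample proportion
Problem 4 (continued): Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year? If 10% of the sampled households did report a service call within the first year, what does that tell you about the manufacturer's claim?
Answer: with p = .05 and n = 400 (np = 20 > 5 and n(1 − p) = 380 > 5), $\sigma_{\hat{p}} = \sqrt{.05(.95)/400} = .0109$, so $P(\hat{p} > .10) = P(Z > (.10 - .05)/.0109) = P(Z > 4.59) \approx 0$. A sample proportion of 10% would be extremely unlikely if the claim were true, which casts serious doubt on the claim.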
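The proportion calculation in Problem 4 as a short standard-library sketch. It checks the np > 5 and n(1 − p) > 5 condition from the slide, then standardizes the observed 10% against the claimed 5%, taking p = .05 as the boundary of the manufacturer's claim.

```python
from statistics import NormalDist

p, n = 0.05, 400                        # claimed rate (boundary value) and survey size
assert n * p > 5 and n * (1 - p) > 5    # normal-approximation condition from the slide

se = (p * (1 - p) / n) ** 0.5           # standard deviation of p-hat, about 0.0109
z = (0.10 - p) / se                     # about 4.59
prob = 1 - NormalDist().cdf(z)          # P(p-hat > 0.10) if the claim is true

print(f"se = {se:.4f}, z = {z:.2f}, P = {prob:.7f}")
```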

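Sampling Distribution of the Difference Between Two Means
• If two independent populations are normally distributed with means and variances $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$ respectively, then $\bar{x}_1 - \bar{x}_2$ (from independent samples of sizes $n_1$ and $n_2$) is also normally distributed with:
  • Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
  • Variance: $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma_1^2/n_1 + \sigma_2^2/n_2$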

Sampling Distribution of the Difference Between Two Means
• When at least one of the populations is not normally distributed but the sample sizes are both at least 30, $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed, with a mean and a variance as indicated above.

Sampling Distribution of the Difference Between Two Means
• Example: A national TV telethon committee is interested in determining whether donations made by males are, on average, $4 larger than those made by females. Samples of 25 males and 25 females were selected and their donations recorded. If the standard deviations of the male and female populations are $2.4 and $1.8 respectively, and the mean male donation is in fact $4 larger, what is the probability that the sample mean of the male donations exceeds the sample mean of the female donations by at least $5? Assume donations for the two populations are normally distributed.

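Sampling Distribution of the Difference Between Two Means
• Solution: For males, $\sigma_1^2/n_1 = 2.4^2/25 = .2304$; for females, $\sigma_2^2/n_2 = 1.8^2/25 = .1296$. So $\bar{x}_1 - \bar{x}_2$ is normal with mean $\mu_1 - \mu_2 = 4$ and standard deviation $\sqrt{.2304 + .1296} = .6$.
$P(\bar{x}_1 - \bar{x}_2 \ge 5) = P(Z \ge (5 - 4)/.6) = P(Z \ge 1.67) = .5 - .4525 = .0475$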
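The telethon example reduces to one normal distribution for $\bar{x}_1 - \bar{x}_2$. The sketch below builds it from the variance formula above, assuming (as the example does) that the true difference in means is $4.

```python
from statistics import NormalDist

diff_mu = 4.0                      # assumed mu_males - mu_females
s1, s2, n1, n2 = 2.4, 1.8, 25, 25  # population sds and sample sizes

se = (s1 ** 2 / n1 + s2 ** 2 / n2) ** 0.5   # sqrt(0.2304 + 0.1296) = 0.6
diff = NormalDist(diff_mu, se)               # sampling distribution of x1-bar - x2-bar
prob = 1 - diff.cdf(5)                       # P(x1-bar - x2-bar >= 5)

print(f"se = {se:.2f}, P = {prob:.4f}")
```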

Chapter 10 Introduction to Estimation
• A population parameter can be estimated by a point estimator or by an interval estimator.
• A confidence interval with confidence level 1 − α is an interval estimator that, in repeated sampling, covers the estimated parameter 100(1 − α)% of the time.
• Confidence intervals are constructed using sampling distributions.

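Confidence interval of the mean – Known Variance
• We use the central limit theorem to build the following confidence interval: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
[Figure: standard normal curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail]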

Confidence interval of the mean – Known Variance
• Problem 5: How many classes do university students miss each semester? A survey of 100 students was conducted (see the data).
• Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student. Use a 99% confidence level.

Confidence interval of the mean – Known Variance
• Solution: $1 - \alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $z_{\alpha/2} = z_{.005} = 2.575$
$\bar{x} \pm z_{.005}\,\sigma/\sqrt{n} = 10.21 \pm 2.575(2.2/\sqrt{100}) = 10.21 \pm .57$
LCL = 9.64, UCL = 10.78
• You can also use Data Analysis Plus > Z-Estimate: Mean.

Confidence interval of the mean – Known Variance
• Solution (using Data Analysis Plus):
  • Highlight the data set (you may include the title label).
  • Select Data Analysis Plus, then "Z-Estimate: Mean".
  • Type in sigma (2.2), check Labels (if appropriate), type in alpha (.01), and click OK.
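As an alternative to Data Analysis Plus, the interval can be computed from the summary values in a few lines. This sketch uses the slide's sample mean (10.21) rather than the raw data, which is not reproduced here, and gets $z_{\alpha/2}$ from `NormalDist().inv_cdf`, which gives 2.576 instead of the table's 2.575; the limits agree to two decimals.

```python
from statistics import NormalDist

xbar, sigma, n, conf = 10.21, 2.2, 100, 0.99     # summary values from the slide
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}, about 2.576
margin = z * sigma / n ** 0.5                    # about 0.57
lcl, ucl = xbar - margin, xbar + margin

print(f"z = {z:.3f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```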

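Selecting the sample size
• The shorter the confidence interval, the more precise the estimate.
• We can, therefore, limit the width of the interval to 2W by setting $z_{\alpha/2}\,\sigma/\sqrt{n} = W$.
• From here we get $n = \left(z_{\alpha/2}\,\sigma/W\right)^2$, rounded up to the next integer.
W is called the "margin of error", or the "bound on the error of estimation".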

Selecting the sample size
• Problem 6: An operations manager wants to estimate the average amount of time a worker needs to assemble a new electronic component.
• Sigma is known to be 6 minutes.
• The estimate must be accurate to within 20 seconds.
• Find the sample size for confidence levels of 90% and 95%.

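Selecting the sample size
• Solution: σ = 6 min; W = 20 sec = 1/3 min.
• 1 − α = .90: $z_{\alpha/2} = z_{.05} = 1.645$, $n = (1.645 \cdot 6/(1/3))^2 = 29.61^2 = 876.75$, so n = 877.
• 1 − α = .95: $z_{\alpha/2} = z_{.025} = 1.96$, $n = (1.96 \cdot 6/(1/3))^2 = 35.28^2 = 1244.68$, so n = 1245.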
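The sample-size formula $n = (z_{\alpha/2}\sigma/W)^2$ as a small function. The name `sample_size` is hypothetical; it always rounds up, because rounding down would give a margin of error slightly larger than W. Exact z values are used, which here give the same sample sizes as the table values 1.645 and 1.96.

```python
import math
from statistics import NormalDist

sigma = 6.0   # minutes
W = 1 / 3     # 20 seconds, expressed in minutes

def sample_size(conf):
    """Smallest n whose margin of error z_{alpha/2} * sigma / sqrt(n) is at most W."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / W) ** 2)   # always round up

for conf in (0.90, 0.95):
    print(f"{conf:.0%}: n = {sample_size(conf)}")
```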

Chapter 11 Hypothesis tests
• In hypothesis tests we hypothesize a value of a population parameter, and test to see whether there is sufficient evidence to support our belief.
• The structure of a hypothesis test:
  • Formulate two hypotheses.
    • H0: the null hypothesis, the one we try to reject in favor of …
    • H1: the alternative hypothesis, the one we try to prove.
  • Define a significance level α.

Hypothesis tests
• The significance level is the probability of erroneously rejecting the null hypothesis: α = P(reject H0 when H0 is true).
• Sample from the population and calculate a statistic that indicates whether or not the parameter value under H1 is more likely to be true.
• We shall test the population mean assuming the standard deviation is known.

Hypothesis tests of the Mean – Known Variance
• Problem 7: A machine is set so that the average diameter of the ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?

Hypothesis tests of the Mean – Known Variance
• Solution: The population studied is the ball-bearing diameters.
• We hypothesize on the population mean.
• A good point estimator for the population mean is the sample mean.
• We use the distribution of the sample mean to build a sample statistic to test whether μ = .50 inch.

Hypothesis tests of the Mean – Known Variance
Solution (A Two-Tail Rejection Region)
• Define the hypotheses:
  • H0: μ = .50
  • H1: μ ≠ .50
• Define the significance level: α = .05, the probability of committing a Type I error.

Hypothesis tests of the Mean – Known Variance
Solution (A Two-Tail Rejection Region)
• Critical Z: $z_{.025} = 1.96$ (obtained from the Z table).
• Build a rejection region: $Z_{sample} > z_{\alpha/2}$ or $Z_{sample} < -z_{\alpha/2}$, that is, reject H0 if $Z_{sample} > 1.96$ or $Z_{sample} < -1.96$.
• Calculate the value of the sample Z statistic and compare it to the critical value: $Z_{sample} = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) = (.51 - .50)/(.05/\sqrt{100}) = 2$
• Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at the 5% significance level.

Hypothesis tests of the Mean – Known Variance
Solution (A Two-Tail Rejection Region)
• We can also perform the test in terms of the mean value.
• The critical sample-mean values for rejection are:
$\bar{x}_{L2} = \mu_0 + z_{.025}\,\sigma/\sqrt{n} = .50 + 1.96(.05)/\sqrt{100} = .5098$
$\bar{x}_{L1} = \mu_0 - z_{.025}\,\sigma/\sqrt{n} = .50 - 1.96(.05)/\sqrt{100} = .4902$
• Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.

Hypothesis tests of the Mean – Known Variance
• Calculate the p-value of this test.
• Solution: p-value = $P(Z > Z_{sample}) + P(Z < -Z_{sample}) = P(Z > 2) + P(Z < -2) = 2P(Z > 2) = 2(1 - .9772) = .0456$
• Since .0456 < .05, H0 is rejected.
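All three views of Problem 7 (standardized rejection region, critical sample means, and p-value) can be computed together. This is a standard-library sketch; the exact p-value is .0455, while the slide's .0456 comes from the rounded table value .9772.

```python
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 0.50, 0.05, 100, 0.51, 0.05
Z = NormalDist()

se = sigma / n ** 0.5                        # 0.005
z_sample = (xbar - mu0) / se                 # 2.0
z_crit = Z.inv_cdf(1 - alpha / 2)            # about 1.96
p_value = 2 * (1 - Z.cdf(abs(z_sample)))     # two-tail p-value
xl1, xl2 = mu0 - z_crit * se, mu0 + z_crit * se   # critical sample means

reject = abs(z_sample) > z_crit
print(f"z = {z_sample:.2f}, p = {p_value:.4f}, region: x-bar < {xl1:.4f} or > {xl2:.4f}")
print("reject H0" if reject else "do not reject H0")
```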

Hypothesis tests of the Mean – Known Variance
• Problem 8
• The average annual return on investment for American banks was found to be 10.2%, with a standard deviation of 0.8%.
• It is believed that banks that exercise comprehensive planning do better.
• A sample of 26 banks that exercise comprehensive planning provided the following result: mean return = 10.5%.
• Can we infer at the 10% significance level that this sample result supports the belief about bank performance?

Hypothesis tests of the Mean – Known Variance
• Solution (A Right-Hand-Tail Rejection Region): The population tested is the annual rate of return.
• H0: μ = 10.2
• H1: μ > 10.2
• Let us perform the test with the standardized rejection-region approach: reject H0 if $Z_{sample} > z_{.10}$ (right-hand-tail rejection region), where $z_{.10} = 1.28$.
• $Z_{sample} = (10.5 - 10.2)/(.8/\sqrt{26}) = 1.91 > 1.28$

Hypothesis tests of the Mean – Known Variance
• Conclusion
• At the 10% significance level there is sufficient evidence in the data to reject H0 in favor of H1, since the sample statistic falls inside the rejection region.
• Interpretation:
• If we are willing to accept a 10% chance of a Type I error (concluding that planning helps when it does not), we can conclude that banks conducting comprehensive planning perform better than banks that do not.

Hypothesis tests of the Mean – Known Variance
• Let us perform the test with the p-value method: $P(\bar{x} > 10.5 \mid \mu = 10.2) = P(Z > (10.5 - 10.2)/(.8/\sqrt{26})) = P(Z > 1.91) = .5 - .4719 = .0281$
• Since .0281 < .10, we reject the null hypothesis at the 10% significance level.

Hypothesis tests of the Mean – Known Variance
• Note the equivalence between the standardized (rejection-region) method and the p-value method: $P(Z > z_{.10}) = .10$ gives $z_{.10} = 1.28$. The statement "the p-value is smaller than α" is equivalent to the statement "the test statistic falls in the rejection region".
[Figure: right tail of the standard normal curve, with area .10 beyond 1.28 and area .0281 beyond the sample statistic 1.91]
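A standard-library sketch of Problem 8 that runs both decision rules side by side. The `assert` inside the block restates the equivalence described above: for a right-tail test, $Z_{sample} > z_{\alpha}$ holds exactly when the p-value is below α. The exact p-value is about .0279; the slide's .0281 uses z rounded to 1.91.

```python
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 10.2, 0.8, 26, 10.5, 0.10
Z = NormalDist()

z_sample = (xbar - mu0) / (sigma / n ** 0.5)   # about 1.91
z_crit = Z.inv_cdf(1 - alpha)                  # about 1.28 (right tail)
p_value = 1 - Z.cdf(z_sample)                  # about 0.028

# The rejection-region rule and the p-value rule always agree
assert (z_sample > z_crit) == (p_value < alpha)
print(f"z = {z_sample:.2f} vs {z_crit:.2f}; p = {p_value:.4f} vs alpha = {alpha}")
```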

Hypothesis tests of the Mean – Known Variance
• Problem 9
• In the midst of labor-management negotiations, the president of a company argues that the company's blue-collar workers, who are paid an average of $30K a year, are well paid because the mean annual pay for blue-collar workers in the country is less than $30K.
• This figure is disputed by the union. To test the president's belief, an arbitrator draws a random sample of 350 blue-collar workers from across the country, and their incomes are recorded (see file Salaries).
• If the arbitrator assumes that income is normally distributed with a standard deviation of $8,000, can it be inferred at the 5% significance level that the company's president is correct?

Hypothesis tests of the Mean – Known Variance
• Solution (A Left-Hand-Tail Rejection Region): The population tested is the annual salary.
• H0: μ = 30K; H1: μ < 30K
• Left-hand-tail rejection region: $Z < -z_{.05}$, that is, Z < −1.645.
• $Z_{sample} = (29{,}119.5 - 30{,}000)/(8{,}000/\sqrt{350}) = -2.059$
• Since −2.059 < −1.645, there is sufficient evidence at the 5% significance level to infer that, on average, blue-collar workers' income is lower than $30K.

Hypothesis tests of the Mean – Known Variance
• Calculate the p-value of this test:
• Solution: p-value = $P(Z < Z_{sample}) = P(Z < -2.059) \approx .0197$. Since .0197 < .05, H0 is rejected.
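Problem 9 as a left-tail z test in the same standard-library style. The sample mean 29,119.5 is taken from the slide, since the Salaries file itself is not reproduced here.

```python
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 30_000, 8_000, 350, 29_119.5, 0.05
Z = NormalDist()

z_sample = (xbar - mu0) / (sigma / n ** 0.5)   # about -2.059
z_crit = -Z.inv_cdf(1 - alpha)                 # about -1.645 (left tail)
p_value = Z.cdf(z_sample)                      # P(Z < z_sample)

reject = z_sample < z_crit
print(f"z = {z_sample:.3f}, critical = {z_crit:.3f}, p = {p_value:.4f}, reject = {reject}")
```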

Type II Error
• Problem 7a: Calculate β for the two-tail hypothesis test performed in Problem 7 (H0: μ = .500), when the actual mean diameter is .515 inch (H1: μ = .515).
• Solution
• The rejection region in terms of the critical values of the sample mean was found before: $\bar{x}_{L1} = .4902$, $\bar{x}_{L2} = .5098$.
$\beta = P(\text{do not reject } H_0 \text{ when } H_1 \text{ is true}) = P(.4902 < \bar{x} < .5098 \mid \mu = .515)$
$= P\left(\frac{.4902 - .515}{.05/\sqrt{100}} < Z < \frac{.5098 - .515}{.05/\sqrt{100}}\right) = P(-4.96 < Z < -1.04)$
$= P(1.04 < Z < 4.96) \approx 1 - P(Z < 1.04) = 1 - .8508 = .1492$
• This large probability may be reduced by taking larger samples.
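The β calculation, and the claim that larger samples reduce it, can be checked with a short sketch. The function name `type2_error` is hypothetical; for a given n it rebuilds the non-rejection interval around $\mu_0$ and measures its probability under the true mean .515. The extra sample sizes 200 and 400 are illustrative, not from the slides.

```python
from statistics import NormalDist

mu0, sigma, alpha = 0.50, 0.05, 0.05   # Problem 7 set-up
mu_true = 0.515                        # actual mean in Problem 7a
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

def type2_error(n):
    """beta = P(do not reject H0 | mu = mu_true) for a two-tail z test with sample size n."""
    se = sigma / n ** 0.5
    lo, hi = mu0 - z_crit * se, mu0 + z_crit * se   # non-rejection region for x-bar
    xbar = NormalDist(mu_true, se)                  # sampling distribution under H1
    return xbar.cdf(hi) - xbar.cdf(lo)

for n in (100, 200, 400):
    print(f"n = {n}: beta = {type2_error(n):.4f}")
```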

Ch 12: Inference when the Variance is Unknown
• In practice, the population variance is usually unknown.
• In this case we change the test statistic from Z to t when testing the population mean.
• To test the population proportion we use the normal distribution (under certain conditions).

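Testing the mean – unknown variance
• Replace the Z statistic with $t = (\bar{x} - \mu)/(s/\sqrt{n})$, which has a Student t distribution with n − 1 degrees of freedom.
• The original distribution must be normal (or at least mound-shaped).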

Testing the mean – unknown variance
• Problem 10
• A federal agency inspects packages to determine whether the contents are at least as large as advertised.
• Random samples of (i) 5 and (ii) 50 containers whose packaging states a weight of 8.04 ounces were drawn (the data is provided later).
• From the sample results:
  • Can we conclude that the average weight does not meet the stated weight? (Use α = .05.)
  • Estimate the mean weight of all containers with 99% confidence.
  • What assumption must be met?

Testing the mean – unknown variance
• Solution
• We hypothesize on the mean weight.
• H0: μ = 8.04
• H1: μ < 8.04
• (i) n = 5. For small samples, let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94.
• The rejection region: $t < -t_{\alpha, n-1} = -t_{.05, 5-1} = -2.132$
• Sample statistics: $\bar{x} = (8.07 + \dots + 7.94)/5 = 7.996$; $s = \{[(8.07 - 7.996)^2 + \dots + (7.94 - 7.996)^2]/4\}^{1/2} = 0.0546$

Testing the mean – unknown variance
• The t statistic is calculated as follows: $t_{sample} = (\bar{x} - \mu_0)/(s/\sqrt{n}) = (7.996 - 8.04)/(0.0546/\sqrt{5}) = -1.80$
• Since −1.80 > −2.132, the sample statistic does not fall in the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04 ounces at the 5% significance level.
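The small-sample t test can be reproduced from the five assumed weights. Python's standard library has no Student t distribution, so the critical value −2.132 is copied from the t table on the slide (an assumption of this sketch); `statistics.stdev` already uses the n − 1 divisor.

```python
from statistics import fmean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]   # the assumed sample from the slide
mu0 = 8.04
n = len(weights)

xbar = fmean(weights)                      # 7.996
s = stdev(weights)                         # sample sd (n - 1 divisor), about 0.0546
t_sample = (xbar - mu0) / (s / n ** 0.5)   # about -1.80
t_crit = -2.132                            # -t_{.05,4}, taken from a t table

reject = t_sample < t_crit
print(f"mean = {xbar:.3f}, s = {s:.4f}, t = {t_sample:.2f}, reject = {reject}")
```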

Testing the mean – unknown variance
• (ii) n = 50. To calculate the sample statistics we use Excel's "Descriptive Statistics" from the Tools > Data Analysis menu. From the sample we obtain: mean = 8.02; std. dev. = .04.
• The 99% confidence interval: $1 - \alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $t_{.005, 50-1} \approx 2.678$ from the t table.
$\bar{x} \pm t_{.005,49}\,s/\sqrt{n} = 8.02 \pm 2.678(.04/\sqrt{50}) = 8.02 \pm .015$
LCL = 8.005, UCL = 8.035
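A minimal sketch of the t-based interval from the reported summary statistics. The critical value is hard-coded as an approximate table value (an assumption); if SciPy is available, `scipy.stats.t.ppf(0.995, 49)` returns it directly.

```python
xbar, s, n = 8.02, 0.04, 50      # summary statistics reported on the slide
t_crit = 2.68                    # t_{.005,49}, approximate table value (assumption)

margin = t_crit * s / n ** 0.5   # about 0.015
lcl, ucl = xbar - margin, xbar + margin
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```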
