Statistical Inference and Estimation in Sampling Theory

Review #2: Chapters 9, 10, 11, and 12

Chapter 9 • A statistic is a random variable describing a characteristic of a random sample. • Sample mean • Sample variance • We use statistics in inferential statistics (making inferences about population characteristics from sample characteristics). • Statistics have distributions of their own.

The Central Limit Theorem • The distribution of the sample mean is normal if the parent distribution is normal. • The distribution of the sample mean approaches the normal distribution for sufficiently large samples ($n \ge 30$), even if the parent distribution is not normal. • The parameters of the sampling distribution of the mean are: • Mean: $\mu_{\bar{x}} = \mu$ • Standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

Problem 1 • Given a normal population whose mean is 50 and whose standard deviation is 5, • Find the probability that a random sample of 4 has a mean between 49 and 52. • Answer: $P(49 < \bar{x} < 52) = P(-.4 < Z < .8) = .7881 - .3446 = .4435$

Problem 1 • Find the probability that a random sample of 16 from the same population has a mean between 49 and 52. • Answer: $P(49 < \bar{x} < 52) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$
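The two sample-mean probabilities above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_mean_between` is hypothetical, chosen here for illustration. It uses exact normal probabilities, so the results may differ from Z-table lookups in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) when sampling n items from N(mu, sigma)."""
    # The sample mean is normal with mean mu and standard deviation sigma / sqrt(n).
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


p_n4 = prob_mean_between(50, 5, 4, 49, 52)    # sample of 4
p_n16 = prob_mean_between(50, 5, 16, 49, 52)  # sample of 16

print(f"n = 4:  {p_n4:.4f}")
print(f"n = 16: {p_n16:.4f}")
```

The larger sample gives the higher probability because the standard deviation of the sample mean shrinks from 2.5 to 1.25.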

Problem 2 • The amount of time per day spent by adults watching TV is normally distributed with $\mu = 6$ and $\sigma = 1.5$ hours. • What is the probability that a randomly selected adult watches TV for more than 7 hours a day? • Answer: $P(X > 7) = P(Z > .67) = 1 - .7486 = .2514$ • What is the probability that 5 adults watch TV for 7 or more hours a day on average? • Answer: $P(\bar{x} > 7) = P(Z > (7 - 6)/(1.5/\sqrt{5})) = P(Z > 1.49) = 1 - .9319 = .0681$

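Problem 2 • Additional question • What is the probability that the total TV watching time of the five adults sampled will exceed 28 hours? • Answer: A total above 28 hours means a sample mean above 28/5 = 5.6 hours, so $P(\bar{x} > 5.6) = P(Z > (5.6 - 6)/(1.5/\sqrt{5})) = P(Z > -.60) = .7257$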
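As a rough check of the three TV-watching questions, the sketch below evaluates each probability with `statistics.NormalDist`. The variable names are illustrative only, and because no Z-table rounding is involved, the printed values can differ slightly from table-based answers.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5

single_adult = NormalDist(mu, sigma)           # one adult's daily TV time
sample_mean = NormalDist(mu, sigma / sqrt(n))  # mean of 5 adults

p_one_over_7 = 1 - single_adult.cdf(7)   # one adult watches more than 7 hours
p_mean_over_7 = 1 - sample_mean.cdf(7)   # 5 adults average more than 7 hours
# A total above 28 hours is the same event as a sample mean above 28 / 5.
p_total_over_28 = 1 - sample_mean.cdf(28 / n)

print(f"P(X > 7)        = {p_one_over_7:.4f}")
print(f"P(mean > 7)     = {p_mean_over_7:.4f}")
print(f"P(total > 28 h) = {p_total_over_28:.4f}")
```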

Sampling distribution of the sample proportion • In a sample of size n, if np > 5 and n(1-p) > 5, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters: • Mean: $\mu_{\hat{p}} = p$ • Standard deviation: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$

Problem 3 • A commercial for a household appliance manufacturer claims that fewer than 5% of all of its products require a service call in the first year. • A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.

Problem 3 • Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year? • If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
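One way to approach Problem 3 is sketched below. It assumes the boundary value p = .05 for the manufacturer's claim (an assumption made here, since the claim only says "less than 5%") and uses the normal approximation to the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.05, 400          # assumed boundary value of the claim, sample size
observed_share = 0.10     # share of households reporting a service call

# The normal approximation requires np > 5 and n(1 - p) > 5.
assert n * p > 5 and n * (1 - p) > 5

std_error = sqrt(p * (1 - p) / n)            # std. dev. of the sample proportion
z = (observed_share - p) / std_error
p_more_than_10pct = 1 - NormalDist().cdf(z)  # P(sample proportion > .10)

print(f"z = {z:.2f}")
print(f"P(p-hat > .10) = {p_more_than_10pct:.2e}")
```

A probability this small suggests that a 10% service-call rate in the sample would be very hard to reconcile with the claim.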

Chapter 10 • A population’s parameter can be estimated by a point estimator and by an interval estimator. • A confidence interval with a $1 - \alpha$ confidence level is an interval estimator that covers the estimated parameter $100(1 - \alpha)\%$ of the time. • Confidence intervals are constructed using sampling distributions.

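Confidence interval of the mean • We use the central limit theorem to build the following confidence interval: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ • [Figure: standard normal curve with area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail]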

Problem 4 • How many classes do university students miss each semester? A survey of 100 students was conducted. (See Missed Classes) • Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student. • Use a 99% confidence level.

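Problem 4 • Solution: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = 10.21 \pm 2.575(2.2/\sqrt{100}) = 10.21 \pm .57$ • $1 - \alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $z_{\alpha/2} = z_{.005} = 2.575$ • LCL = 9.64, UCL = 10.78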
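The interval in Problem 4 can be reproduced with a short sketch. The sample mean 10.21 is taken from the slide; the critical value is computed with `NormalDist().inv_cdf` rather than read from a table, so it comes out near 2.576 instead of the tabulated 2.575.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 10.21, 2.2, 100   # sample mean, known std. dev., sample size
confidence = 0.99

alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
half_width = z * sigma / sqrt(n)

lcl = x_bar - half_width   # lower confidence limit
ucl = x_bar + half_width   # upper confidence limit

print(f"z = {z:.3f}, half-width = {half_width:.3f}")
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```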

Selecting the sample size • The shorter the confidence interval, the more precise the estimate. • We can, therefore, limit the interval to $\bar{x} \pm W$, and get $W = z_{\alpha/2}\,\sigma/\sqrt{n}$ • From here we have $n = (z_{\alpha/2}\,\sigma/W)^2$

Problem 5 • An operations manager wants to estimate the average amount of time needed by a worker to assemble a new electronic component. • Sigma is known to be 6 minutes. • The required estimate accuracy is within 20 seconds. • Use confidence levels of 90% and 95%. • Find the sample size for each.

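Problem 5 • Solution: $\sigma = 6$ min; W = 20 sec = 1/3 min; $n = (z_{\alpha/2}\,\sigma/W)^2$ • $1 - \alpha = .90$, $z_{\alpha/2} = z_{.05} = 1.645$: $n = (1.645 \times 6/(1/3))^2 = 876.75$, so n = 877 • $1 - \alpha = .95$, $z_{\alpha/2} = z_{.025} = 1.96$: $n = (1.96 \times 6/(1/3))^2 = 1244.68$, so n = 1245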
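The sample-size calculation can be sketched as a small function. The name `sample_size` is hypothetical, and W is taken to be the allowed distance between the estimate and the mean ("within 20 seconds"), as in the solution above. The result is rounded up because a fractional observation is not possible.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, w, confidence):
    """Smallest n for which the estimate is within w of the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil((z * sigma / w) ** 2)


sigma = 6.0        # minutes
w = 20 / 60        # 20 seconds expressed in minutes

n_90 = sample_size(sigma, w, 0.90)
n_95 = sample_size(sigma, w, 0.95)

print(f"90% confidence: n = {n_90}")
print(f"95% confidence: n = {n_95}")
```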

Chapter 11 • Hypothesis tests • In hypothesis tests we hypothesize on a value of a population parameter, and test to see if there is sufficient evidence to support our belief. • The structure of a hypothesis test • Formulate two hypotheses. • H0: The null hypothesis, the one we try to reject in favor of … • H1: The alternative hypothesis, the one we try to prove. • Define a significance level $\alpha$.

Hypothesis tests • The significance level is the probability of erroneously rejecting the null hypothesis: $\alpha$ = P(reject H0 when H0 is true) • Sample from the population and calculate a statistic that indicates whether or not the parameter value defined under H1 is more probable. • We shall test the population mean assuming the standard deviation is known.

Problem 6 • A machine is set so that the average diameter of ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?

Problem 6 • The population studied is the ball-bearing diameters. • We hypothesize on the population mean. • A good point estimator for the population mean is the sample mean. • We use the distribution of the sample mean to build a sample statistic to test whether $\mu = .50$ inch.

Problem 6 • Solution • Define the hypotheses: • H0: $\mu = .50$ • H1: $\mu \ne .50$ • Define a rejection region. Note that this is a two-tail test because H1 is an inequality ($\ne$). • $\alpha = .05$ is the probability of a type I error.

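Problem 6 • Critical Z: $z_{.025} = 1.96$ (obtained from the Z-table). • Build a rejection region: $Z_{sample} > z_{\alpha/2}$ or $Z_{sample} < -z_{\alpha/2}$, that is, $Z_{sample} > 1.96$ or $Z_{sample} < -1.96$. • Calculate the value of the sample Z statistic and compare it to the critical value: $Z_{sample} = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) = (.51 - .50)/(.05/\sqrt{100}) = 2$ • Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at the 5% significance level.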

Problem 6 • We can perform the test in terms of the mean value. • Let us find the critical mean values for rejection: $\bar{x}_{L1} = \mu_0 + z_{.025}\,\sigma/\sqrt{n} = .50 + 1.96(.05/\sqrt{100}) = .5098$; $\bar{x}_{L2} = \mu_0 - z_{.025}\,\sigma/\sqrt{n} = .50 - 1.96(.05/\sqrt{100}) = .4902$ • Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.
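Both versions of the Problem 6 test (standardized statistic and critical mean values) can be run side by side. This is a sketch under the slide's assumptions (known σ, two-tail test); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 0.50, 0.05, 100   # hypothesized mean, known std. dev., sample size
x_bar, alpha = 0.51, 0.05         # observed sample mean, significance level

std_error = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail critical value, about 1.96

# Method 1: compare the standardized sample statistic with the critical Z.
z_sample = (x_bar - mu0) / std_error
reject_by_z = abs(z_sample) > z_crit

# Method 2: compare the sample mean with the critical mean values.
upper = mu0 + z_crit * std_error
lower = mu0 - z_crit * std_error
reject_by_mean = x_bar > upper or x_bar < lower

print(f"z = {z_sample:.2f}, critical z = {z_crit:.2f}, reject H0: {reject_by_z}")
print(f"critical means = ({lower:.4f}, {upper:.4f}), reject H0: {reject_by_mean}")
```

The two methods always agree, because the critical means are just the critical Z values converted back to the original units.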

Problem 7 • The average annual return on investment for American banks was found to be 10.2% with a standard deviation of 0.8%. • It is believed that banks that exercise comprehensive planning do better. • A sample of 26 banks that conducted comprehensive planning provided the following result: Mean return = 10.5%. • Can we infer that the belief about bank performance is supported at the 10% significance level by this sample result?

Problem 7 • The population tested is the “annual rate of return.” H0: $\mu = 10.2$; H1: $\mu > 10.2$ • Let us perform the test with the p-value method: • $P(\bar{x} > 10.5 \text{ given that } \mu = 10.2) = P(Z > (10.5 - 10.2)/[.8/\sqrt{26}]) = P(Z > 1.91) = 1 - .9719 = .0281$ • Since .0281 < .10 we reject the null hypothesis at the 10% significance level.

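Problem 7 • Note the equivalence between the standardized (rejection region) method and the p-value method. • $P(Z > z_{.10}) = .10$, so $z_{.10} = 1.28$. Since 1.91 > 1.28, the sample statistic falls in the rejection region, just as the p-value .0281 falls below .10. • Run the test with Data Analysis Plus. See data in Return. • [Figure: standard normal curve marking the critical value 1.28, the sample statistic 1.91, and the p-value area .0281]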
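The equivalence of the two methods in Problem 7 can be illustrated with a few lines of Python. This sketch does not reproduce Data Analysis Plus; it only recomputes the right-tail Z test with `statistics.NormalDist`, so the p-value is based on the unrounded Z and may differ from the table-based figure in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 10.2, 0.8, 26   # hypothesized mean, known std. dev., sample size
x_bar, alpha = 10.5, 0.10       # observed sample mean, significance level

std_error = sigma / sqrt(n)
z_sample = (x_bar - mu0) / std_error

# p-value method: area to the right of the sample statistic.
p_value = 1 - NormalDist().cdf(z_sample)
reject_by_p_value = p_value < alpha

# Rejection region method: compare with the right-tail critical value z_alpha.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_by_region = z_sample > z_crit

print(f"z = {z_sample:.2f}, p-value = {p_value:.4f}, critical z = {z_crit:.2f}")
print(f"reject H0: {reject_by_p_value} (p-value), {reject_by_region} (region)")
```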

Type II Error • A type II error occurs when H0 is erroneously not rejected. • The probability of a type II error is called $\beta$: $\beta$ = P(do not reject H0 when H1 is true) • To calculate $\beta$: • H1 specifies an actual parameter value (not a range of values). Example: H0: $\mu = 100$; H1: $\mu = 110$ • The critical value is expressed in original terms (not in standard terms).

Problem 7a • What is the probability you’ll believe the mean return in problem 7 is 10.2% while actually it’s 10.6%, if the sample provided a mean return of 10.5%?

Problem 7a • Solution • The two hypotheses are: H0: $\mu = 10.2$; H1: $\mu = 10.6$ • H0 is not rejected (we believe $\mu = 10.2$) if the sample mean is less than a critical value. • Therefore, the probability required is: $\beta = P(\bar{x} < \bar{x}_{cr} \mid \mu = 10.6)$.

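Problem 7a • The critical value is (recall, this problem was a case of a right-hand tail test with a 10% significance level): $\bar{x}_{cr} = \mu_0 + z_{.10}\,\sigma/\sqrt{n} = 10.2 + 1.28(.8/\sqrt{26}) = 10.4$ • $\beta = P(\bar{x} < 10.4 \text{ when } \mu = 10.6) = P(Z < (10.4 - 10.6)/[.8/\sqrt{26}]) = P(Z < -1.27) = .102$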
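The two steps of the type II error calculation (find the critical mean under H0, then evaluate it under H1) can be sketched as follows. Variable names are illustrative, and the unrounded critical value is used, so β may differ from the slide's figure in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 10.2, 10.6            # mean under H0 and the specific mean under H1
sigma, n, alpha = 0.8, 26, 0.10

std_error = sigma / sqrt(n)

# Step 1: critical sample mean for the right-tail test, in original units.
z_alpha = NormalDist().inv_cdf(1 - alpha)
x_crit = mu0 + z_alpha * std_error

# Step 2: probability of NOT rejecting H0 (sample mean below x_crit)
# when the true mean is mu1.
beta = NormalDist(mu1, std_error).cdf(x_crit)

print(f"critical mean = {x_crit:.2f}")
print(f"beta = {beta:.3f}")
```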

Chapter 12 • Generally, the population standard deviation is unknown, just as the mean is. • When the standard deviation is unknown, we need to change the test statistic from “Z” to “t”. • We shall test three population parameters: • Mean • Variance • Proportion

Testing the mean (unknown variance) • Replace the statistic Z with t: $t = (\bar{x} - \mu)/(s/\sqrt{n})$, with n - 1 degrees of freedom. • The original distribution must be normal (or at least mound shaped).

Problem 8 • A federal agency inspects packages to determine whether the contents weigh at least as much as advertised. • A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight is 8.04 ounces was drawn. (See Content). • From the sample results… • Can we conclude that the average weight does not meet the weight stated? (Use $\alpha = .05$.) • Estimate the mean weight of all containers with 99% confidence. • What assumption must be met?

Problem 8 • Solution • We hypothesize on the mean weight. • H0: $\mu = 8.04$ • H1: $\mu < 8.04$ • (i) n = 5. For small samples let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94 • The rejection region: $t < -t_{\alpha, n-1} = -t_{.05, 5-1} = -2.132$ • $t_{sample}$ = ? • Mean = (8.07 + … + 7.94)/5 = 7.996 • Std. Dev. = $\{[(8.07 - 7.996)^2 + … + (7.94 - 7.996)^2]/4\}^{1/2} = 0.0546$

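Problem 8 • The t sample is calculated as follows: $t = (\bar{x} - \mu_0)/(s/\sqrt{n}) = (7.996 - 8.04)/(.0546/\sqrt{5}) = -1.80$ • Since -1.80 > -2.132 the sample statistic does not fall into the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04, at the 5% significance level. • [Figure: t distribution with the rejection region to the left of -2.132 and the sample statistic at -1.80]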
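For the small-sample case, the t statistic can be recomputed directly from the five assumed weights. The sketch below uses only the standard library, which has no t distribution, so the critical value 2.132 is taken from the t-table as on the slide rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]  # the assumed sample of 5 containers
mu0 = 8.04                                # stated weight (H0)
t_crit = 2.132                            # t_{.05, 4}, read from the t-table

x_bar = mean(weights)
s = stdev(weights)                        # sample std. dev. (divides by n - 1)
t_sample = (x_bar - mu0) / (s / sqrt(len(weights)))

# Left-tail test: reject H0 only if t_sample falls below -t_crit.
reject = t_sample < -t_crit

print(f"mean = {x_bar:.3f}, s = {s:.4f}, t = {t_sample:.2f}")
print(f"reject H0: {reject}")
```

The statistic it prints lies above the critical value -2.132, so H0 is not rejected for this sample.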

Problem 8 • (ii) n = 50. To calculate the sample statistics we use Excel, “Descriptive statistics” from the Tools > Data analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04 • The confidence interval is calculated by $\bar{x} \pm t_{\alpha/2, n-1}\,s/\sqrt{n} = 8.02 \pm 2.678(.04/\sqrt{50}) = 8.02 \pm .015$, or LCL = 8.005, UCL = 8.035 • $1 - \alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $t_{.005, 50-1}$ = about 2.678 from the t-table
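The t-based interval for the 50-container sample can be checked with the sketch below. The summary statistics and the table value 2.678 are taken from the slide; nothing here computes the t quantile itself.

```python
from math import sqrt

x_bar, s, n = 8.02, 0.04, 50   # summary statistics reported from Excel
t_table = 2.678                # approximate t_{.005, 49} from the t-table

half_width = t_table * s / sqrt(n)
lcl_t = x_bar - half_width     # lower confidence limit
ucl_t = x_bar + half_width     # upper confidence limit

print(f"half-width = {half_width:.3f}")
print(f"LCL = {lcl_t:.3f}, UCL = {ucl_t:.3f}")
```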

Problem 8 • Comments • Check whether it appears that the distribution is normal

Using Excel • To obtain an exact value for t use the TINV function: =TINV(0.01,49) returns 2.6799535, where .01 is the two-tail probability and 49 is the degrees of freedom.

Problem 8 • In our example recall: • H0: $\mu = 8.04$ • H1: $\mu < 8.04$ • The p-value = .000187 < .05 • There is sufficient evidence to reject H0 in favor of H1. • Note: $t = (8.018 - 8.04)/[.0403/\sqrt{50}] = -3.82 < -t_{.05, 49} = -1.676$

Inference about the population variance • The following statistic is $\chi^2$ (chi-squared) distributed with n - 1 degrees of freedom: $\chi^2 = (n-1)s^2/\sigma^2$ • We use this relationship to test and estimate the variance.

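Inference about the population variance • The hypotheses tested are: H0: $\sigma^2 = \sigma_0^2$ against H1: $\sigma^2 > \sigma_0^2$, $\sigma^2 < \sigma_0^2$, or $\sigma^2 \ne \sigma_0^2$ • The rejection region is, respectively: $\chi^2 > \chi^2_{\alpha, n-1}$; $\chi^2 < \chi^2_{1-\alpha, n-1}$; $\chi^2 < \chi^2_{1-\alpha/2, n-1}$ or $\chi^2 > \chi^2_{\alpha/2, n-1}$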

Problem 9 • A random sample of 100 observations was taken from a normal population. The sample variance was 29.76. • Can we infer at the 2.5% significance level that the population variance is smaller than 30? • Estimate the population variance with 90% confidence.

Problem 9 • Solution: • H0: $\sigma^2 = 30$ • H1: $\sigma^2 < 30$ • Rejection region (left-hand tail): $\chi^2 < \chi^2_{1-\alpha, n-1} = \chi^2_{.975, 100-1}$ = about 74.222 • $\chi^2 = (n-1)s^2/\sigma_0^2 = (100-1)(29.76)^2/(30)^2 = 97.42$ • Since 97.42 > 74.222, the statistic does not fall into the rejection region. There is insufficient evidence at the 2.5% significance level that the variance is smaller than 30. • For the confidence interval look at page 370.

Using Excel • We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$ for a given $\chi^2$ and known d.f. This makes it possible to determine the p-value. • Use the CHIDIST function: =CHIDIST(chi-squared value, d.f.) • For example: =CHIDIST(97.42,99) = .526. That is: $P(\chi^2_{99} > 97.42) = .526$ • In our example we had a left-hand tail rejection region. The p-value is calculated based on the $\chi^2$ value (97.42): $P(\chi^2_{99} < 97.42) = 1 - .526 = .474$

Using Excel • We can get the exact $\chi^2$ value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability $\alpha$ and known d.f. • Use the CHIINV function: =CHIINV($\alpha$, d.f.) • For example: =CHIINV(.025,99) = 128.4219. That is: $P(\chi^2_{99} > 128.4219) = .025$

Inference about a population proportion • The test and the confidence interval are based on the approximate normal distribution of the sample proportion, if np > 5 and n(1-p) > 5. • For the confidence interval of p we have: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$ • For the hypothesis test, we run a Z test.

Problem 10 • A consumer protection group ran a survey of 400 dentists to check a claim that 4 out of 5 dentists recommend ingredients included in a certain toothpaste. • The survey results are as follows: 71 – No; 329 – Yes • At the 5% significance level, can the consumer group infer that the claim is true?

Problem 10 • Solution • The two hypotheses are: • H0: p = .8 • H1: p > .8 • The rejection region: $Z > z_{\alpha} = z_{.05} = 1.645$ • Since 1.18 < 1.645 the consumer group cannot confirm the claim at the 5% significance level.
