4-1 Statistical Inference

4-1 Statistical Inference • The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population. • These methods utilize the information contained in a sample from the population in drawing conclusions. • Example: Estimate the average height of the class.


4-2 Point Estimation

4-2 Point Estimation • An estimator should be close in some sense to the true value of the unknown parameter. • Θ is an unbiased estimator of θ if E(Θ) = θ. • If the estimator Θ is not unbiased, then the difference E(Θ) − θ is called the bias of the estimator Θ.

4-2 Point Estimation Example 4-1 • Suppose that X is a random variable with mean μ and variance σ². • Let X1, X2, ..., Xn be a random sample of size n from the population represented by X. • Show that the sample mean X̄ and sample variance S² are unbiased estimators of μ and σ², respectively.

4-2 Point Estimation Example 4-1
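The two unbiasedness claims of Example 4-1 can be checked with a short derivation. This is a sketch, assuming X1, ..., Xn are independent with common mean μ and variance σ², and that S² uses the divisor n − 1. It relies on E(Xi²) = μ² + σ² and E(X̄²) = μ² + σ²/n.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\[4pt]
E(S^2) &= \frac{1}{n-1}\, E\!\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)
        = \frac{1}{n-1}\left[\, n(\mu^2 + \sigma^2) - n\!\left(\mu^2 + \frac{\sigma^2}{n}\right) \right]
        = \frac{(n-1)\,\sigma^2}{n-1} = \sigma^2 .
\end{aligned}
```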

4-2 Point Estimation Example 4-1 • The sample variance S² is an unbiased estimator of the population variance σ². • The sample standard deviation S is not an unbiased estimator of the population standard deviation σ.
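A small simulation can make the contrast between S² and S concrete. The sketch below assumes a normal population with σ = 2 and samples of size n = 5; these values, the seed, and the function name `average_s2_and_s` are illustrative choices, not part of the slides.

```python
import random
import statistics

def average_s2_and_s(n=5, sigma=2.0, reps=20000, seed=1):
    """Average S^2 and S over many simulated samples (illustrative setup)."""
    rng = random.Random(seed)
    s2_values = []
    s_values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        s2_values.append(s2)
        s_values.append(s2 ** 0.5)        # sample standard deviation
    return statistics.fmean(s2_values), statistics.fmean(s_values)

mean_s2, mean_s = average_s2_and_s()
print("average S^2:", round(mean_s2, 3), "(sigma^2 = 4)")
print("average S:  ", round(mean_s, 3), "(sigma = 2)")
```

Under these assumptions the average of S² should land near σ² = 4, while the average of S should fall visibly below σ = 2, which is the bias the slide refers to.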

4-2 Point Estimation Different Unbiased Estimators • Sometimes there are several unbiased estimators of the same population parameter. • Example: a random sample of size n = 10 from a normal population. • Sample mean • Sample median • The first observation X1 • All are unbiased estimators of the population mean μ. • We cannot rely on the property of unbiasedness alone to select the estimator.

4-2 Point Estimation Different Unbiased Estimators • Suppose that Θ1 and Θ2 are unbiased estimators of θ. • The variances of the distributions of the two estimators may be different. • If Θ1 has a smaller variance than Θ2, then Θ1 is more likely to produce an estimate close to the true value θ.

4-2 Point Estimation How to Choose an Unbiased Estimator • A logical principle of estimation is to choose the estimator that has minimum variance.
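The minimum-variance principle can be illustrated by simulating the three estimators of a mean mentioned earlier: the sample mean, the sample median, and the first observation. This is a sketch under assumed settings (normal population with μ = 5, σ = 1, n = 10); the function name `compare_estimators` is hypothetical.

```python
import random
import statistics

def compare_estimators(n=10, mu=5.0, sigma=1.0, reps=20000, seed=2):
    """Return {name: (average estimate, variance of estimate)} by simulation."""
    rng = random.Random(seed)
    estimates = {"mean": [], "median": [], "first": []}
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates["mean"].append(statistics.fmean(sample))
        estimates["median"].append(statistics.median(sample))
        estimates["first"].append(sample[0])
    return {
        name: (statistics.fmean(values), statistics.pvariance(values))
        for name, values in estimates.items()
    }

results = compare_estimators()
for name, (avg, var) in results.items():
    print(f"{name:>6}: average = {avg:.3f}, variance = {var:.3f}")
```

All three averages should sit close to μ, but the variances differ: the sample mean has the smallest spread, the single observation the largest.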

4-2 Point Estimation How to Choose a Good Estimator • In practice, one must occasionally use a biased estimator. • For example, S for σ. • What is the criterion? • Mean square error

4-2 Point Estimation Mean Square Error • MSE(Θ) = E[(Θ − θ)²] = E[(Θ − E(Θ))²] + [E(Θ) − θ]² = V(Θ) + (bias)² • Unbiased estimator ⇒ bias = 0 • A good estimator is one that minimizes V(Θ) + (bias)²
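The split of the mean square error into variance plus squared bias follows from adding and subtracting E(Θ) inside the square. A sketch of the step, using the same symbols as the slide:

```latex
\begin{aligned}
\mathrm{MSE}(\Theta) &= E\big[(\Theta - \theta)^2\big]
  = E\big[\big(\Theta - E(\Theta) + E(\Theta) - \theta\big)^2\big] \\
 &= E\big[(\Theta - E(\Theta))^2\big]
    + 2\,\big(E(\Theta) - \theta\big)\,\underbrace{E\big[\Theta - E(\Theta)\big]}_{=\,0}
    + \big(E(\Theta) - \theta\big)^2 \\
 &= V(\Theta) + (\text{bias})^2 .
\end{aligned}
```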

4-2 Point Estimation Mean Square Error • Given two estimators Θ1 and Θ2, • The relative efficiency of Θ1 to Θ2 is defined as r = MSE(Θ1)/MSE(Θ2) • r < 1 ⇒ Θ1 is better than Θ2. • Example: • Θ1 = X̄: the sample mean of a sample of size n. • Θ2 = Xi: the i-th observation.

4-2 Point Estimation Mean Square Error • r = MSE(Θ1)/MSE(Θ2) = (σ²/n)/σ² = 1/n • The sample mean is a better estimator than a single observation. • The square root of the variance of an estimator, √V(Θ), is called the standard error of the estimator.
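The ratio 1/n can be checked numerically by estimating both mean square errors from simulated samples. The sketch below assumes a normal population with μ = 0, σ = 1 and n = 10; the settings and the name `simulated_mse` are illustrative.

```python
import random
import statistics

def simulated_mse(n=10, mu=0.0, sigma=1.0, reps=20000, seed=3):
    """Estimate MSE of the sample mean and of the first observation."""
    rng = random.Random(seed)
    sq_err_mean = 0.0
    sq_err_first = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sq_err_mean += (statistics.fmean(sample) - mu) ** 2
        sq_err_first += (sample[0] - mu) ** 2
    return sq_err_mean / reps, sq_err_first / reps

mse_mean, mse_first = simulated_mse()
r = mse_mean / mse_first  # relative efficiency of the mean to one observation
print("estimated r:", round(r, 3), "(theory: 1/n = 0.1)")
```

The estimated r should be close to 0.1, up to simulation noise.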

4-2 Point Estimation Methods of obtaining an estimator: 1. Method of Moments Estimator (MME) Example: If the random sample is really from the population with pdf f(x; θ), the sample mean should resemble the population mean, and the MME of θ can be obtained by solving E(X) = X̄ for θ. The solution is the MME of θ.

MME – cont’d Example 2 (two parameters): The MMEs of θ1 and θ2 can be obtained by solving E(X) = X̄ and E(X²) = (1/n) Σ Xi². Therefore, the MMEs of θ1 and θ2 are the solutions of these two equations.

The MMEs of the population parameters can be obtained by: • Express the k population moments as functions of parameters • Replace the notation for population parameters by those of the MMEs • Equate the sample moments to the population moments • Solve the k equations to obtain the MMEs of the parameters
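The four steps above can be sketched for one concrete case. The example assumes the two parameters are the population mean μ and variance σ², so the population moments are E(X) = μ and E(X²) = σ² + μ²; equating them to the sample moments and solving gives the estimates. The function name and the data are made up for illustration.

```python
def mme_mean_variance(sample):
    """Method-of-moments estimates of (mu, sigma^2) from the first two moments."""
    n = len(sample)
    m1 = sum(sample) / n                 # first sample moment
    m2 = sum(x * x for x in sample) / n  # second sample moment
    # Solve  mu = m1  and  sigma^2 + mu^2 = m2  for the two parameters.
    mu_hat = m1
    sigma2_hat = m2 - m1 ** 2
    return mu_hat, sigma2_hat

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
mu_hat, sigma2_hat = mme_mean_variance(data)
print(mu_hat, sigma2_hat)  # 5.0 4.0
```

Note that the variance estimate obtained this way uses the divisor n, so it differs from the sample variance S² by the factor (n − 1)/n.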

4-2 Point Estimation Methods of obtaining an estimator: 2. Maximum Likelihood Estimator (MLE) The likelihood function of θ, given the data x1, ..., xn, is the joint distribution of X1, ..., Xn evaluated at the data: L(θ) = f(x1; θ) · f(x2; θ) ··· f(xn; θ). The idea of ML estimation is to find an estimator of θ which maximizes the likelihood of observing x1, ..., xn.

4-2 Point Estimation 2. Maximum Likelihood Estimator (MLE) Example: Maximizing L(θ) is equivalent to maximizing ln L(θ). We obtain the MLE by solving the first-derivative equation d ln L(θ)/dθ = 0; the solution is the MLE of θ.
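To make the log-likelihood recipe concrete, here is a worked case for an exponential population with rate λ. The choice of distribution is an assumption made for illustration and is not necessarily the example shown on the original slide.

```latex
\begin{aligned}
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
            = \lambda^{n} \exp\!\Big(-\lambda \sum_{i=1}^{n} x_i\Big), \\
\ln L(\lambda) &= n \ln \lambda - \lambda \sum_{i=1}^{n} x_i, \\
\frac{d \ln L(\lambda)}{d\lambda} &= \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}} .
\end{aligned}
```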

It can be shown that the MLEs of the mean and variance of the normal distribution are: μ̂ = X̄ and σ̂² = (1/n) Σ (Xi − X̄)².

4-3 Hypothesis Testing • The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a parameter. • Examples • Is there statistical evidence, in a random sample of potential customers, that supports the hypothesis that more than p% of the potential customers will purchase a new product? • Is a new drug effective in curing a certain disease? A sample of patients is randomly selected. Half of them are given the drug while the other half are given a placebo. The improvement in the patients' conditions is then measured and compared.

4-3 Hypothesis Testing 4-3.1 Statistical Hypotheses We like to think of statistical hypothesis testing as the data analysis stage of a comparative experiment, in which the engineer is interested, for example, in comparing the mean of a population to a specified value (e.g. mean pull strength).

4-3 Hypothesis Testing 4-3.1 Statistical Hypotheses • The critical concepts of hypothesis testing. • There are two hypotheses (about a population parameter): • The null hypothesis H0 [for example, μ = 5] • The alternative hypothesis H1 [μ > 5] • Assume the null hypothesis is true. • Build a statistic related to the parameter hypothesized. • Pose the question: How probable is it to obtain a statistic value at least as extreme as the one observed from the sample?

4-3 Hypothesis Testing 4-3.1 Statistical Hypotheses • The critical concepts of hypothesis testing (continued) • Make one of the following two decisions (based on the test): • Reject the null hypothesis in favor of the alternative hypothesis. • Do not reject the null hypothesis in favor of the alternative hypothesis. • Two types of errors are possible when making the decision whether to reject H0: • Type I error: reject H0 when it is true. • Type II error: do not reject H0 when it is false.

4-3 Hypothesis Testing 4-3.1 Statistical Hypotheses • For example, suppose that we are interested in the burning rate of a solid propellant used to power aircrew escape systems. • Now burning rate is a random variable that can be described by a probability distribution. • Suppose that our interest focuses on the mean burning rate (a parameter of this distribution). • Specifically, we are interested in deciding whether or not the mean burning rate is 50 centimeters per second.

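4-3 Hypothesis Testing 4-3.1 Statistical Hypotheses Two-sided Alternative Hypothesis: H0: μ = 50 centimeters per second, H1: μ ≠ 50 centimeters per second. One-sided Alternative Hypotheses: H0: μ = 50, H1: μ < 50, or H0: μ = 50, H1: μ > 50.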

4-3 Hypothesis Testing 4-3.1 Statistical Hypotheses • Test of a Hypothesis: a procedure leading to a decision about a particular hypothesis. • Hypothesis-testing procedures rely on using the information in a random sample from the population of interest. • If this information is consistent with the hypothesis, we will not reject the hypothesis; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses Based on the hypotheses and the information in the sample, the sample space is divided into two parts: the rejection region and the acceptance region.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses The rejection region (RR) is the subset of the sample space that leads to the rejection of the null hypothesis. RR in Fig. 4-3: x̄ < 48.5 or x̄ > 51.5. The complement of the rejection region is the acceptance region. AR: 48.5 ≤ x̄ ≤ 51.5.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses • The probabilities of committing errors are calculated to determine the performance of a test: α = P(type I error) = P(reject H0 when H0 is true) and β = P(type II error) = P(fail to reject H0 when H0 is false). Sometimes the type I error probability is called the significance level, or the α-error, or the size of the test.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses If the mean is 50, the probability of obtaining a sample mean less than 48.5 or greater than 51.5 is 0.0576.

Since the alternative hypothesis is composite, there are many distributions in the corresponding subset of the parameter space. Therefore, the probability of committing a type II error depends on the distribution chosen to calculate β.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses If the mean is actually 52, the probability of falsely accepting H0 is β = 0.2643.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses Similarly, the probability of falsely accepting H0 when μ = 50.5 is β = 0.8923, which is much higher than in the previous case. It is harder to detect the difference if the two distributions are close.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses The probabilities of committing errors also depend on the sample size n, that is, on the amount of information. For a fixed α, β decreases as n increases.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses 1. α can be decreased by making the AR larger, at the price that β will increase. 2. The probability of type II error is a function of the population mean μ. 3. The probabilities of committing both types of error can be reduced at the same time only by increasing the sample size.

4-3 Hypothesis Testing 4-3.2 Testing Statistical Hypotheses • The power is computed as 1 − β, and power can be interpreted as the probability of correctly rejecting a false null hypothesis. We often compare statistical tests by comparing their power properties. • For example, consider the propellant burning rate problem when we are testing H0: μ = 50 centimeters per second against H1: μ ≠ 50 centimeters per second. Suppose that the true value of the mean is μ = 52. When n = 10, we found that β = 0.2643, so the power of this test is 1 − β = 1 − 0.2643 = 0.7357 when μ = 52.
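The error probabilities quoted for the burning-rate test (0.0576, 0.2643, 0.8923) can be approximately reproduced with a short calculation. This is a sketch under two assumptions that this section does not state: a known population standard deviation σ = 2.5, and a nearby alternative mean of 50.5 for the third value. Both were picked because they come close to the quoted figures; the function name `error_probabilities` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu_true, sigma, n, lower, upper):
    """alpha, beta and power for the acceptance region lower <= xbar <= upper."""
    se = sigma / sqrt(n)                      # standard error of the sample mean
    under_h0 = NormalDist(mu0, se)
    alpha = under_h0.cdf(lower) + (1.0 - under_h0.cdf(upper))
    under_h1 = NormalDist(mu_true, se)
    beta = under_h1.cdf(upper) - under_h1.cdf(lower)
    return alpha, beta, 1.0 - beta

SIGMA = 2.5  # assumed value, not given in this section
alpha, beta_52, power_52 = error_probabilities(50, 52, SIGMA, 10, 48.5, 51.5)
_, beta_505, _ = error_probabilities(50, 50.5, SIGMA, 10, 48.5, 51.5)
print(f"alpha = {alpha:.4f}")
print(f"beta(mu=52) = {beta_52:.4f}, power = {power_52:.4f}")
print(f"beta(mu=50.5) = {beta_505:.4f}")
```

The results differ from the quoted values in the third or fourth decimal place because table-based calculations round z to two decimals first.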

Example #4-19 (p157). a) α = P(Z > 1.58) = 1 − P(Z ≤ 1.58) = 1 − 0.94295 = 0.057 b) β = P(Z ≤ −2.37) = 0.00889 c) 1 − β = 1 − 0.00889 = 0.99111 (the power of the test when the mean is 200)
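The normal probabilities in Example 4-19 can be checked without a table. The sketch below takes the z-values 1.58 and −2.37 as given and only evaluates the standard normal distribution function; how those z-values arise from the problem data is not shown in this section.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal distribution

alpha = 1.0 - Z.cdf(1.58)   # a) P(Z > 1.58)
beta = Z.cdf(-2.37)         # b) P(Z <= -2.37)
power = 1.0 - beta          # c) power of the test

print(f"alpha = {alpha:.5f}")
print(f"beta  = {beta:.5f}")
print(f"power = {power:.5f}")
```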

Example #4-20 (p157). a) Reject the null hypothesis and conclude that the mean foam height is greater than 175 mm. b) The probability that a value of at least 190 mm would be observed (if the true mean height is 175 mm) is only 0.0082. Thus, the sample value of x̄ = 190 mm would be an unusual result.

4-3 Hypothesis Testing 4-3.3 P-Values in Hypothesis Testing

Example #4-21 (p157). Using n = 16:

Example #4-22(a) (p157). n = 16: a) 0.0571 =

4-3 Hypothesis Testing 4-3.4 One-Sided and Two-Sided Hypotheses Two-Sided Test: H0: μ = μ0, H1: μ ≠ μ0. One-Sided Tests: H0: μ = μ0, H1: μ > μ0, or H0: μ = μ0, H1: μ < μ0.

4-3 Hypothesis Testing 4-3.5 General Procedure for Hypothesis Testing

4-4 Inference on the Mean of a Population, Variance Known Assumptions

4-4 Inference on the Mean of a Population, Variance Known 4-4.1 Hypothesis Testing on the Mean We wish to test: H0: μ = μ0 against H1: μ ≠ μ0. The test statistic is: Z0 = (X̄ − μ0)/(σ/√n).
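A test on the mean with known variance can be sketched in a few lines using the standard statistic Z0 = (x̄ − μ0)/(σ/√n), which is standard normal when H0 is true. The function name `z_test`, its `alternative` argument, and the numbers in the usage lines are illustrative assumptions rather than values taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z0, p_value) for a test on the mean with known sigma."""
    z0 = (sample_mean - mu0) / (sigma / sqrt(n))
    Z = NormalDist()
    if alternative == "two-sided":   # H1: mu != mu0
        p_value = 2.0 * (1.0 - Z.cdf(abs(z0)))
    elif alternative == "greater":   # H1: mu > mu0
        p_value = 1.0 - Z.cdf(z0)
    elif alternative == "less":      # H1: mu < mu0
        p_value = Z.cdf(z0)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z0, p_value

# Illustrative numbers: xbar = 51.3 from n = 25 observations, sigma = 2.
z0, p_value = z_test(51.3, 50.0, 2.0, 25)
print(f"z0 = {z0:.2f}, P-value = {p_value:.4f}")
```

With these numbers z0 is about 3.25, so H0: μ = 50 would be rejected at any commonly used significance level.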
