Analyze Phase Inferential Statistics


Inferential Statistics
Analyze Phase modules:
• Welcome to Analyze
• “X” Sifting
• Inferential Statistics
• Intro to Hypothesis Testing
• Hypothesis Testing ND P1
• Hypothesis Testing ND P2
• Hypothesis Testing NND P1
• Hypothesis Testing NND P2
• Wrap Up & Action Items
Topics in this module:
• Inferential Statistics
• Nature of Sampling
• Central Limit Theorem

Nature of Inference
in·fer·ence (n.): The act or process of deriving logical conclusions from premises known or assumed to be true. The act of reasoning from factual knowledge or evidence. (1)
Inferential Statistics: To draw inferences about the process or population being studied by modeling patterns of data in a way that accounts for randomness and uncertainty in the observations. (2)
Putting the pieces of the puzzle together…
(1) Dictionary.com
(2) Wikipedia.com

5 Step Approach to Inferential Statistics
1. What do you want to know?
2. What tool will give you that information?
3. What kind of data does that tool require?
4. How will you collect the data?
5. How confident are you with your data summaries?
So many questions…?

Types of Error
1. Error in sampling: Error due to differences among samples drawn at random from the population (luck of the draw). This is the only source of error that statistics can accommodate.
2. Bias in sampling: Error due to lack of independence among random samples or due to systematic sampling procedures (for example, measuring the height of horse jockeys only).
3. Error in measurement: Error in the measurement of the samples (MSA/GR&R).
4. Lack of measurement validity: The measurement does not actually measure what it is intended to measure (for example, placing a probe in the wrong slot, or measuring temperature with a thermometer that sits right next to a furnace).

Population, Sample, Observation
• Population: EVERY data point that has ever been or ever will be generated from a given characteristic.
• Sample: A portion (or subset) of the population, either at one time or over time.
• Observation: An individual measurement.

Significance
Significance is all about differences…
• Practical difference and significance is:
  • The amount of difference, change or improvement that will be of practical, economic or technical value to you.
  • The amount of improvement required to pay for the cost of making the improvement.
• Statistical difference and significance is:
  • The magnitude of difference or change required to distinguish between a true difference, change or improvement and one that could have occurred by chance.
Twins: Sure there are differences… but do they matter?

The Mission
• Mean Shift
• Variation Reduction
• Both

A Distribution of Sample Means
Imagine you have some population. The individual values of this population form some distribution. Take a sample of some of the individual values and calculate the sample Mean. Keep taking samples and calculating sample Means. Plot a new distribution of these sample Means. The Central Limit Theorem says that as the sample size becomes large, this new distribution (the distribution of sample Means) approaches a Normal Distribution, no matter what the shape of the population distribution of individuals.

Sampling Distributions: The Foundation of Statistics
Population values shown on the slide: 3, 5, 2, 12, 10, 1, 6, 12, 5, 6, 12, 14, 3, 6, 11, 9, 10, 10, 12
• Samples from the population, each with five observations:

Observation   Sample 1   Sample 2   Sample 3
1                    1          9          2
2                   12          8          3
3                    9          5          6
4                    7         14         11
5                    8         10         10
Mean               7.4        9.2        6.4

• In this example we have taken three samples out of the population, each with five observations in it. We computed a Mean for each sample. Note the Means are not the same!
• Why not?
• What would happen if we kept taking more samples?
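The three sample Means on this slide can be checked with a few lines of Python. This is only a sketch: the sample values are read column-wise from the slide's table, and the variable names are illustrative, not part of the original material.

```python
from statistics import mean

# Sample values as read from the slide's table (five observations each).
samples = {
    "Sample 1": [1, 12, 9, 7, 8],
    "Sample 2": [9, 8, 5, 14, 10],
    "Sample 3": [2, 3, 6, 11, 10],
}

# One Mean per sample; they differ because each sample holds different observations.
sample_means = {name: mean(values) for name, values in samples.items()}

for name, m in sample_means.items():
    print(f"{name}: mean = {m:.1f}")
```

Each sample is a different "luck of the draw" from the same population, which is why the three Means disagree.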

Constructing Sampling Distributions Open Minitab Worksheet “Die Example”. Roll ‘em!

Sampling Distributions Calc> Random Data> Sample from Columns…

Sampling Error
Calculate the Mean and Standard Deviation for each column and compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Descriptive Statistics: Population, Sample1, Sample2, Sample3, Sample4, Sample5
Range in Mean: 1.2 (4.600 – 3.400)
Range in StDev: 0.591 (2.074 – 1.483)

Sampling Error Create 5 more columns of data sampling 10 observations from the population. Calc> Random Data> Sample from Columns…

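Sampling Error - Reduced
Calculate the Mean and Standard Deviation for each column and compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Range in Mean: 0.9 (4.100 – 3.200)
Range in StDev: 0.668 (2.066 – 1.398)
With 10 observations the sample Means now differ less from one another (a range of 0.9 versus 1.2 with 5 observations). The range in Standard Deviation did not shrink in this particular draw (0.668 versus 0.591), a reminder that any single set of random samples can vary.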

Sampling Error - Reduced
Calc> Random Data> Sample from Columns…
Stat> Basic Statistics> Display Descriptive Statistics…

Variable     N    Mean   StDev
Sample 11   30   3.733   1.818
Sample 12   30   3.800   1.562
Sample 13   30   3.400   1.868
Sample 14   30   3.667   1.768
Sample 15   30   3.167   1.487

Range in Mean: 0.63 (3.800 – 3.167)
Range in StDev: 0.381 (1.868 – 1.487)
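The sampling exercise above can be approximated outside Minitab. The sketch below assumes a hypothetical population of 600 die rolls (100 of each face), because the actual contents of the “Die Example” worksheet are not shown here. The function name `range_of_sample_means` is illustrative only.

```python
import random
from statistics import mean

def range_of_sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples random samples (each without replacement) and
    return the largest sample Mean minus the smallest sample Mean."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(num_samples)]
    return max(means) - min(means)

rng = random.Random(1)  # fixed seed so a run can be repeated

# Hypothetical population: 100 of each die face (an assumption, not the worksheet data).
population = [face for face in range(1, 7) for _ in range(100)]

# Five samples at each sample size, mirroring the 5-column exercise on the slides.
for n in (5, 10, 30):
    spread = range_of_sample_means(population, n, num_samples=5, rng=rng)
    print(f"n = {n:2d}: range in sample Means = {spread:.3f}")
```

With only five samples per sample size, the ranges will bounce around from run to run. The tendency toward a smaller range at larger n shows up most clearly when the exercise is repeated many times.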

Sampling Distributions
In theory, if we kept taking samples of size n = 5 and n = 10 and calculated the sample Means, we could see how the sample Means are distributed. Simulate this in MINITAB™ by creating ten columns of 1000 rolls of a die.
Calc> Random Data> Integer…
Feeling lucky…?

Sampling Distributions
• For each row, calculate the Mean of the first five columns (C1-C5) and store the result in Mean5.
Calc> Row Statistics…
• Repeat this command to calculate the Mean of C1-C10 and store the result in Mean10.

Sampling Distributions
• Create a Histogram of C1, Mean5 and Mean10.
Graph> Histogram> Simple… Multiple Graphs… On separate graphs…
Select “Same X, including same bins” to facilitate comparison.
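For readers without Minitab, the die simulation can be sketched in plain Python. The names `c1`, `mean5` and `mean10` mirror the worksheet columns used above. The seed and the use of the standard library are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable
ROWS, COLS = 1000, 10

# Ten columns of 1000 die rolls, stored row by row.
rolls = [[rng.randint(1, 6) for _ in range(COLS)] for _ in range(ROWS)]

c1 = [row[0] for row in rolls]            # individual rolls (column C1)
mean5 = [mean(row[:5]) for row in rolls]  # row Mean of C1-C5
mean10 = [mean(row) for row in rolls]     # row Mean of C1-C10

# Compare center and spread of the three distributions.
for label, column in (("C1", c1), ("Mean5", mean5), ("Mean10", mean10)):
    print(f"{label:7s} mean = {mean(column):.3f}  stdev = {stdev(column):.3f}")
```

The printed means should all sit near 3.5, while the standard deviation should drop as more rolls are averaged. A histogram of each list would show the shape moving from flat toward bell-shaped.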

Different Distributions
[Figure: histograms of the individual rolls and of the sample Means]
• What is different about the three distributions?
• What happens as the number of die throws increases?

Observations
• As the sample size (number of die rolls) increases from 1 to 5 to 10, there are three points to note:
  • The center remains the same.
  • The variation decreases.
  • The shape of the distribution changes: it tends to become Normal.
The Mean of the sample Mean distribution: $\mu_{\bar{x}} = \mu$
The Standard Deviation of the sample Mean distribution, also known as the Standard Error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
Good news: the Mean of the sample Mean distribution is the Mean of the population.
Better news: I can reduce my uncertainty about the population Mean by increasing my sample size n.

Central Limit Theorem
If all possible random samples, each of size n, are taken from any population with a Mean μ and Standard Deviation σ, the distribution of sample Means will:
• have a Mean $\mu_{\bar{x}} = \mu$
• have a Std Dev $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• be Normally Distributed when the parent population is Normally Distributed, or be approximately Normal for samples of size 30 or more when the parent population is not Normally Distributed.
• This approximation improves with samples of larger size.
Bigger is Better!

So What?
So how does this theorem help me understand the risk I am taking when I use sample data instead of population data?
Recall that about 95% of Normally Distributed data is within ± 2 Standard Deviations of the Mean. Therefore the probability is about 95% that my sample Mean is within 2 standard errors of the true population Mean.
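The 95% claim can be checked by simulation. The sketch below assumes a fair six-sided die as the population (Mean 3.5, Standard Deviation √(35/12)), samples of size 30, and 2000 repeated samples. All of these choices are illustrative.

```python
import math
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

MU = 3.5                    # Mean of a fair six-sided die
SIGMA = math.sqrt(35 / 12)  # Standard Deviation of a fair six-sided die
N = 30                      # observations per sample
TRIALS = 2000               # number of repeated samples

standard_error = SIGMA / math.sqrt(N)

# Count how often the sample Mean lands within 2 standard errors of MU.
hits = 0
for _ in range(TRIALS):
    sample_mean = mean(rng.randint(1, 6) for _ in range(N))
    if abs(sample_mean - MU) <= 2 * standard_error:
        hits += 1

coverage = hits / TRIALS
print(f"Fraction of sample Means within 2 standard errors: {coverage:.3f}")
```

The fraction should come out close to 0.95 even though individual die rolls are not Normally Distributed, which is the practical payoff of the theorem.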

A Practical Example
• Let’s say your project is to reduce the setup time for a large casting:
  • Based on a sample of 20 setups you learn your baseline average is 45 minutes with a Standard Deviation of 10 minutes.
  • Because this is just a sample, the 45 minute average is an estimate of the true average.
  • The Standard Error is 10 / √20 ≈ 2.24 minutes, so by the Central Limit Theorem there is roughly a 95% probability the true average lies within 2 standard errors of 45 minutes: somewhere between 40.5 and 49.5 minutes.
  • Therefore do not get too excited if you made a process change resulting in a reduction of only 2 minutes.
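The arithmetic behind the 40.5 to 49.5 minute interval can be sketched as follows. It uses the slide's "± 2 standard errors" rule of thumb, and the variable names are illustrative.

```python
import math

n = 20         # number of setups sampled
x_bar = 45.0   # sample Mean, minutes
s = 10.0       # sample Standard Deviation, minutes

standard_error = s / math.sqrt(n)   # about 2.24 minutes

# Rule-of-thumb 95% interval: sample Mean plus or minus 2 standard errors.
lower = x_bar - 2 * standard_error
upper = x_bar + 2 * standard_error

print(f"Standard Error: {standard_error:.2f} minutes")
print(f"Approximate 95% interval: {lower:.1f} to {upper:.1f} minutes")
```

A 2 minute improvement falls well inside this interval, so it cannot be distinguished from sampling error with this sample. With only 20 observations, a t-based multiplier (about 2.09) would give a slightly wider interval than the factor of 2 used here.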

Sample Size and the Mean
[Figure: distribution of individuals in the population, overlaid with the theoretical distributions of sample Means for n = 2 and n = 10]

Standard Error of the Mean
• The Standard Deviation for the distribution of Means is called the standard error of the Mean and is defined as: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

Standard Error
[Figure: Standard Error plotted against Sample Size, for sample sizes from 0 to 30]
The rate of change in the Standard Error approaches zero at a sample size of about 30. This is why a sample size of 30 is often recommended when generating summary statistics such as the Mean and Standard Deviation. This is also the point at which the t and Z distributions become nearly equivalent.
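The flattening of the Standard Error curve can be tabulated directly from σ / √n. The sketch below assumes a population Standard Deviation of 1.0 and an arbitrary list of sample sizes. Both are illustrative choices.

```python
import math

def standard_error(sigma, n):
    """Standard Error of the Mean for population Std Dev sigma and sample size n."""
    return sigma / math.sqrt(n)

SIGMA = 1.0  # assumed population Standard Deviation (illustrative)
sizes = [1, 5, 10, 20, 30, 50, 100]

previous = None
for n in sizes:
    se = standard_error(SIGMA, n)
    # Show how much the Standard Error dropped since the previous sample size.
    change = "" if previous is None else f"  change = {se - previous:+.3f}"
    print(f"n = {n:3d}  SE = {se:.3f}{change}")
    previous = se
```

Most of the reduction happens in the first few dozen observations. Beyond roughly n = 30, each additional observation buys very little extra precision.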

Summary
At this point you should be able to:
• Explain the term “Inferential Statistics”
• Explain the Central Limit Theorem
• Describe what impact sample size has on your estimates of population parameters
• Explain Standard Error
