Understanding Probability and Randomness - A Comprehensive Overview

Probability and Sampling Distributions

Randomness There are 117,000,000 households in America (amazing what I learn when I read!). If a sample of 57,000 households is taken and household income is obtained from each, then we could calculate the sample mean income level. If I flip a coin and let it fall to the ground I can observe which side of the coin is up. After 1 flip the proportion of heads will be either 1 (1/1 if it is heads) or 0 (0/1 if it is tails). When we think about the sample of households or the flip of a coin, both are random in the sense that in the 1 sample or 1 flip we do not know what the outcome will be. But we do know the pattern of sample means or sample proportions if we repeat the process over and over. The authors say chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run. Sampling a population, or flipping a coin and observing the proportion of heads, is a random process or phenomenon.

Randomness In the coin flip example 1 flip is the short run and many flips is the long run. We expect the proportion of heads after many flips to be 0.5. In the sample of households, 1 sample includes 57,000 households. Another sample of 57,000 could be obtained and its sample mean income could be calculated. The long run would be having many samples of size 57,000, and the mean of the sample means is expected to be the population mean.

Randomness and Probability A phenomenon is random if individual outcomes are uncertain but there is nonetheless a regular distribution in a large number of repetitions. The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. So, probability here is defined as a proportion. In a coin flipping context the proportion of heads in n flips is the number of heads divided by n. Example. Say a coin is flipped 4040 times and comes up heads 2048 times. The proportion of heads is 2048/4040 = .5069.
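
The long run idea can be tried out with a short simulation. What follows is only a sketch: it assumes a fair coin, lets Python's pseudo-random generator stand in for real flips, and the function name proportion_of_heads and the seed are made up for this illustration.

```python
import random

def proportion_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that a rerun gives the same flips
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips

# Short run versus long run: the proportion settles near 0.5 as n grows.
for n in (1, 10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n))

# The worked example from the text: 2048 heads in 4040 flips.
print(round(2048 / 4040, 4))  # 0.5069
```

With 1 flip the proportion can only be 0 or 1; with 100,000 flips it should sit very close to 0.5, which is the short run versus long run contrast.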

Sample Space and Probability The sample space of a random phenomenon is the set of all possible outcomes. Let S be used to represent the sample space. An event is an outcome or a set of outcomes of a random phenomenon. So, an event is a subset of S. A probability model is a mathematical description of a random phenomenon. The model has 2 parts: a sample space and a way of assigning probabilities to the events.

Example I am going to describe a game based on rolling two dice. The game is this: if the sum of the 2 dice is 8 or more you win $1, and if it is 7 or less you win $2. The sample space for what you win is $1 or $2. How do we get the probabilities of each? Well, let’s consider rolling two dice. Below I show all possible outcomes as 2 numbers: first die number, second die number. So, we could have

6, 6   5, 6   4, 6   3, 6   2, 6   1, 6
6, 5   5, 5   4, 5   3, 5   2, 5   1, 5
6, 4   5, 4   4, 4   3, 4   2, 4   1, 4
6, 3   5, 3   4, 3   3, 3   2, 3   1, 3
6, 2   5, 2   4, 2   3, 2   2, 2   1, 2
6, 1   5, 1   4, 1   3, 1   2, 1   1, 1

(Note a sum of 11 can occur 2 ways: 6, 5 and 5, 6.) Of these 36 different outcomes, the sum is 8 or more 15 times and 7 or less the other 21 times. So if you rolled the dice many, many, many times you would expect the sum to be 8 or more in a proportion close to 15/36. So, winning $1 would have probability 15/36 and winning $2 would have probability 21/36.
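
Counting the outcomes by hand is easy to get wrong, so here is a small sketch that lists all 36 outcomes and counts them. It assumes two fair six-sided dice; the variable names are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Split the outcomes by what you win in the game.
wins_one_dollar = [o for o in outcomes if sum(o) >= 8]   # sum of 8 or more
wins_two_dollars = [o for o in outcomes if sum(o) <= 7]  # sum of 7 or less

p_one = Fraction(len(wins_one_dollar), len(outcomes))
p_two = Fraction(len(wins_two_dollars), len(outcomes))

print(len(outcomes), len(wins_one_dollar), len(wins_two_dollars))  # 36 15 21
print(p_one, p_two)  # 5/12 7/12, that is 15/36 and 21/36 in lowest terms
```

The two probabilities add to 1, as they must, because every roll pays either $1 or $2.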

Probability Rules Let’s note that on any trial of a random phenomenon some type of outcome must occur. Let’s talk in general about an event A in the sample space S.
1) 0 ≤ P(A) ≤ 1. Since probability is a proportion the lowest it could be is 0 and the highest it could be is 1. Note if P(A) = 1, A is a certain event, and if P(A) = 0 the event never occurs.
2) P(S) = 1. It is certain a member of the sample space will occur.
3) Let A^c mean not event A, then P(A^c) = 1 - P(A). This is often called the complement rule. This is like splitting the sample space into 2 parts, and since the sum of probabilities must be 1, P(A^c) + P(A) = 1 and thus the rule.

Probability Rules 4) Two events A and B are disjoint if they have no outcomes in common and so can never occur simultaneously. If A and B are disjoint, then P(A or B) = P(A) + P(B). This is a special case of the addition rule: the addition rule for disjoint events. Are the following disjoint events? When you roll two dice, the sum is a 3 or the sum is an odd number? They are not disjoint! 3 is an odd number! When you roll two dice, the sum is a 7 or the sum is an 11? They are disjoint!
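
The complement rule and the addition rule for disjoint events can be checked on the two-dice sample space. This is a sketch under the assumption of two fair dice; the helper name prob is hypothetical and simply counts favorable outcomes out of 36.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # the sample space S

def prob(event):
    """P(event), where event is a true/false test on a (die1, die2) outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

# Rule 2: the whole sample space has probability 1.
p_s = prob(lambda o: True)

# Rule 3 (complement): P(not 7) = 1 - P(7).
p_7 = prob(lambda o: sum(o) == 7)
p_not_7 = prob(lambda o: sum(o) != 7)

# Rule 4 (disjoint events): a sum of 7 and a sum of 11 cannot both happen.
p_11 = prob(lambda o: sum(o) == 11)
p_7_or_11 = prob(lambda o: sum(o) in (7, 11))

# Not disjoint: a sum of 3 is itself odd, so simply adding overcounts.
p_3 = prob(lambda o: sum(o) == 3)
p_odd = prob(lambda o: sum(o) % 2 == 1)
p_3_or_odd = prob(lambda o: sum(o) == 3 or sum(o) % 2 == 1)

print(p_7_or_11 == p_7 + p_11)    # True
print(p_3_or_odd == p_3 + p_odd)  # False
```

For the disjoint pair the probabilities add (6/36 + 2/36 = 8/36). For the pair that overlaps, P(3 or odd) is just P(odd) = 18/36, not 20/36.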

An event Each possible outcome of a random phenomenon is referred to as a simple event. So, in my tossing a coin a simple event is what happens with the coin: heads or tails is face up. Events in general may be more complex than simple events. An event is a collection or set of one or more simple events in a sample space. The probability of an event is the sum of the probabilities of the simple events that constitute the event.

Random Variables

-2   -1   0   1   2
Here I have reproduced a number line. I did this because we will use the number line. Remember a variable is a concept that can have a different value from subject to subject. An example might be daily Big Mac units sold during the lunch hour at the Wayne McDonalds. Each day (daily lunch hour units sold is the subject) could have a different value. Another example might be the temperature in Wayne during the lunch hour. On the number line we have the variable of interest. The amount of the number line we use depends on the thing we study.

Consider an experiment as a process that generates well defined outcomes. From our McDonalds example, each day is a new experiment where the daily units sold are well defined. But at the beginning of the day we do not know what the sales will be. Now I want to tell you about something called a random variable. A random variable is a numerical description of the outcome of an experiment. Each experimental outcome gets assigned a numerical value. In fact, most of the time the experimental outcome is a number so we just use that number as the number we assign. You could also say that a random variable is a function or rule that assigns a number to each outcome of an experiment. Random variables can be discrete or continuous. I was at a conference once and a guy told a joke about the difference between these two types of random variables (oh yes, the conference was really scintillating! – next screen for the joke)

The guy said he would give 50 bucks to the person or people (there were about 300 people in the room) who could guess the number between 1 and 10 that he had written on a card before the evening. He paused a moment and gave each of us a chance to write a number down. He then said his number was something like 6.732158497341. Get it? Arrrrrrrrrrrrrrrr. Now here is the point. We all assumed he meant discrete values between 1 and 10 (discrete really means more than just integer values, but this is all we need at this time). He meant continuous. Think back to the number line. Pick two points on the line, any two points you want. If all of the number line between those two points can also be considered values for the random variable, then the variable is continuous. But if only some values between the points can be considered possible values, then the variable is discrete. See next screen for examples.

Have you ever gone into a gas station with a food shop? Sure you have! Now, have you ever gone over to the cold pop section and seen a pop bottle with less than a full bottle of fluid? I did once and I still bought it. It was the old glass Coke bottle and it had no pop in it, but was sealed as tight as a drum. Novelty, you know. I can’t find that bottle now. The ounces of fluid in a bottle is a continuous variable because, at least in theory with a really good measuring device, we could get really precise measures. Any value on the number line from 0 to 20.2 ounces could happen. An example of a discrete variable could be the number of people who shop at the gas station that day. We would just have the values 0, 1, 2, 3, and so on, but the fractions in between each number would not be part of the variable.

Probability Distribution A probability distribution for a random variable is very similar to a frequency distribution that we saw before. Essentially we have probabilities associated with each value of the random variable. When the variable is discrete the probability distribution is called a probability function and often denoted P(X). Let’s do an example. The random variable, X, is the number of cars you have called your own. The possible values (in this example) are 1, 2, 3, or 4. On the next screen we show these values and the associated probabilities found by observing what happened over 20 people observed. Note in general we talk about Xi as the ith possible value. Here the values go from 1 to 4.

Remember P(X) represents probabilities. Note each value of Xi has P(Xi) ≥ 0, and the sum of the probabilities equals 1. The second part is written ΣP(Xi) = 1. Our example has these properties.

Xi   P(Xi)
1    3/20 = .15
2    5/20 = .25
3    8/20 = .4
4    4/20 = .2

[Graph: probability (vertical axis, .1 to .4) against number of cars (horizontal axis, 1 to 4).]

The graph assists in seeing which outcome has the highest probability. With the probability values we can answer questions about the likelihood of events. An example would be: what is the probability a randomly chosen person had 1 or 4 cars? Answer = .15 + .2 = .35 (this is an or statement, a union with no overlap: having 1 car and having 4 cars are disjoint!).

Expected Value or Mean The expected value of a discrete random variable is a measure of central location and is called mu, μ. The expected value has the formula E(X) = μ = ΣXiP(Xi). XiP(Xi) is the product of each value of the variable and its probability, and this is added across the values of the variable. From our car example we have μ = 1(.15) + 2(.25) + 3(.4) + 4(.2) = 2.65. The values could be 1, 2, 3, or 4 and we see the average amount is 2.65. So note the expected value does not have to be one of the discrete values in the problem.

Variance The variance and associated standard deviation are used to measure the variability of the random variable. The formula for the variance is Var(X) = σ^2 = Σ (Xi - E(X))^2 P(Xi). For our car example we have (1 - 2.65)^2(.15) + (2 - 2.65)^2(.25) + (3 - 2.65)^2(.4) + (4 - 2.65)^2(.2) = .41 + .11 + .05 + .36 = .93, and the standard deviation is the square root of .93, or .96.
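
The expected value and variance formulas can be applied to the car example in a few lines. This is a minimal sketch that stores the probability function as a Python dictionary of value: probability pairs; the names cars, mu, var and sd are made up for this illustration.

```python
from math import sqrt

# Car example: number of cars -> probability (from the 20 people observed).
cars = {1: 3 / 20, 2: 5 / 20, 3: 8 / 20, 4: 4 / 20}

# A probability function must sum to 1.
total = sum(cars.values())

# E(X) = sum of Xi * P(Xi)
mu = sum(x * p for x, p in cars.items())

# Var(X) = sum of (Xi - E(X))^2 * P(Xi); the standard deviation is its square root.
var = sum((x - mu) ** 2 * p for x, p in cars.items())
sd = sqrt(var)

print(round(mu, 2), round(var, 2), round(sd, 2))  # 2.65 0.93 0.96
```

The unrounded variance is .9275; the hand calculation reaches .93 by rounding each term first.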

Expected value and variance The expected value is a number we look to as an indicator of the center of the data. In the figure I have the arrows point in both directions to remind you that variance and standard deviation are measures of how spread out, or variable, the data are.

An example

Distribution C      Distribution D
X    P(X)           X    P(X)
0    .2             0    .1
1    .2             1    .2
2    .2             2    .4
3    .2             3    .2
4    .2             4    .1

a) Compute the expected value for each distribution.
b) Compute the standard deviation for each distribution.
c) Compare and contrast the results of distributions C and D.

a. For distribution C the expected value is 0(.2) + 1(.2) + 2(.2) + 3(.2) + 4(.2) = 0 + .2 + .4 + .6 + .8 = 2. For distribution D the expected value is 0(.1) + 1(.2) + 2(.4) + 3(.2) + 4(.1) = 0 + .2 + .8 + .6 + .4 = 2.
b. For distribution C the standard deviation is found as the square root of the variance. The variance is [(0 - 2)^2].2 + [(1 - 2)^2].2 + [(2 - 2)^2].2 + [(3 - 2)^2].2 + [(4 - 2)^2].2 = 4(.2) + 1(.2) + 0(.2) + 1(.2) + 4(.2) = .8 + .2 + 0 + .2 + .8 = 2. So, the standard deviation is the square root of 2 (= 1.414).

Problem continued For distribution D the standard deviation is found as the square root of the variance. The variance is [(0 - 2)^2].1 + [(1 - 2)^2].2 + [(2 - 2)^2].4 + [(3 - 2)^2].2 + [(4 - 2)^2].1 = 4(.1) + 1(.2) + 0(.4) + 1(.2) + 4(.1) = .4 + .2 + 0 + .2 + .4 = 1.2. So, the standard deviation is the square root of 1.2 (= 1.095).
c. The distributions have the same expected value. Distribution D has a smaller standard deviation and is thus less spread out. Note each distribution has the same possible values (0, 1, 2, 3, and 4), but values away from 2 occur less frequently for distribution D and thus it has the smaller standard deviation.
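
The comparison of distributions C and D can be repeated with one small helper applied to both. This is a sketch, not the only way to do it; the function name mean_and_sd and the dictionary layout are assumptions made for this illustration.

```python
from math import sqrt

def mean_and_sd(dist):
    """Return (expected value, standard deviation) for a {value: probability} dict."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)

dist_c = {0: .2, 1: .2, 2: .2, 3: .2, 4: .2}
dist_d = {0: .1, 1: .2, 2: .4, 3: .2, 4: .1}

mu_c, sd_c = mean_and_sd(dist_c)
mu_d, sd_d = mean_and_sd(dist_d)

print(round(mu_c, 3), round(sd_c, 3))  # 2.0 1.414
print(round(mu_d, 3), round(sd_d, 3))  # 2.0 1.095
```

Same center, different spread: D puts more of its probability at 2 and less at 0 and 4, so its standard deviation is smaller.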

The Sample Mean

68 - 95 - 99.7 rule Recall we learned a variable could have a normal distribution. This was useful because then we could say approximately 68% of the people in the data set have a value on the variable within 1 standard deviation of the mean. Approximately 95% have a value within 2 standard deviations of the mean, and approximately 99.7% have a value within 3 standard deviations of the mean. Let’s look at this idea again in the context of an example. Say we asked a whole bunch of people how many ounces of Mt. Dew they consume each year. Say the responses follow a normal distribution with mean = 5480 and standard deviation = 480.

The rule again
[Figure: normal curve of ounces of Mt. Dew per year, with the horizontal axis marked at 4040, 4520, 5000, 5480, 5960, 6440, and 6920. About 68% of the values fall between 5000 and 5960, about 95% between 4520 and 6440, and about 99.7% between 4040 and 6920.]

The rule So, by the rule we know that about 68% of the people in the data set have between 5000 and 5960 ounces of Mt. Dew (ozs of MD) per year. Remember a Z score = (Value - mean)/standard deviation. For example, a person who has 6920 ozs of MD each year has a Z value = (6920 - 5480)/480 = 1440/480 = 3. Similar Z’s are shown for this example on the next slide. Check each calculation! (Don’t say, “yea right man.” Do it.)

Z values put below actual values
Ounces of Mt. Dew per year:   4040   4520   5000   5480   5960   6440   6920
Z value:                        -3     -2     -1      0      1      2      3
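
If you would rather have the computer check each Z, here is a small sketch of the Z score formula applied to the Mt. Dew values. The mean of 5480 and standard deviation of 480 come from the example; the function name z_score is made up for this illustration.

```python
mean_ozs = 5480  # mean ounces of Mt. Dew per year in the example
sd_ozs = 480     # standard deviation in the example

def z_score(value, mean, sd):
    """Z = (value - mean) / standard deviation."""
    return (value - mean) / sd

values = [4040, 4520, 5000, 5480, 5960, 6440, 6920]
z_values = [z_score(v, mean_ozs, sd_ozs) for v in values]

for v, z in zip(values, z_values):
    print(v, z)  # 4040 pairs with -3.0, and so on up to 6920 with 3.0
```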

Questions What % of people in the data set had between 5480 and 6920 ozs of MD per year? The Z’s for these two values are 0 and 3, respectively. We saw before how to get the answer: look up Z = 3.00 in the Z table, which gives the amount between 0 and Z. Note Z is carried out to two decimal places. YOU SHOULD ALWAYS CARRY Z OUT TO TWO DECIMAL PLACES! The value we want is .4987.

Questions What does .4987 mean? If you were to meet someone at random who was a part of this data set, you would say the probability is .4987 that they consume between 5480 and 6920 ozs of MD per year. In other words, 49.87% of the people in the study had between 5480 and 6920 ozs of MD per year. What % of people had between 4040 and 5480 ozs of MD per year? The Z for 4040 is -3. The Z table is symmetric. So the amount between -3 and 0 is the same as between 0 and 3. So 49.87% of the people were between 4040 and 5480.

Questions What % of the people in the study had between 4040 and 6920 ozs of MD per year? The Z’s are -3 and 3. This means you would have people within 3 standard deviations of the mean. You take the amount between -3 and 0 and the amount between 0 and 3. That is 49.87 + 49.87 = 99.74, or close to 99.7%. What % of people had between 4520 and 6440 ozs of MD per year? The Z’s are -2 and 2. From 0 to 2 we have .4772, so the total is .4772 + .4772 = .9544 or 95.44%. So within 2 standard deviations is a little more than 95%.

Special Z = 1.96 If you have a Z of 1.96 the value in the table is .4750, so the amount between -1.96 and 1.96 is .4750 + .4750 = .9500. (A table that gives the total area to the left of Z would show .9750 for Z = 1.96.) So if you are within 1.96 standard deviations of the mean you will have 95% of the people in the data set. Before we said within 2 standard deviations would give you 95% of the people. But to be more precise we only have to be within 1.96 standard deviations to have 95% of the people.
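
The Z table lookups can be reproduced without a printed table. The sketch below relies on the standard identity that the area under the standard normal curve between 0 and z equals 0.5 * erf(z / sqrt(2)), using math.erf from Python's standard library; the function name area_zero_to_z is made up for this illustration.

```python
from math import erf, sqrt

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * erf(z / sqrt(2))

# Table-style values, rounded to four decimal places.
print(round(area_zero_to_z(3.00), 4))  # 0.4987
print(round(area_zero_to_z(2.00), 4))  # 0.4772
print(round(area_zero_to_z(1.96), 4))  # 0.475

# Area to the left of Z = 1.96: the lower half (0.5) plus the area from 0 to 1.96.
print(round(0.5 + area_zero_to_z(1.96), 4))  # 0.975

# Symmetric ranges: double the one-sided area.
within_1_96 = 2 * area_zero_to_z(1.96)
within_2 = 2 * area_zero_to_z(2.00)
within_3 = 2 * area_zero_to_z(3.00)
print(round(within_1_96, 4), round(within_2, 4), round(within_3, 4))  # 0.95 0.9545 0.9973
```

Doubling a table value that has already been rounded can differ in the last digit from these figures (.9544 versus .9545, .9974 versus .9973). That is rounding, not a different answer.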

Review Remember back to the normal distribution. We saw we could make probability statements about a range of values for a variable that has a normal distribution as long as we could calculate the Z score or we had access to Excel. Remember, to calculate the Z score we need to know the mean and the standard deviation of the distribution for the variable. We will now find another use for this rule.

Sample Means - overview • We usually take one sample from a population. From the sample we might calculate a sample mean and use the mean to help us understand the population the sample was taken from. • In principle we could take many more samples and calculate means for each sample, but we don’t because it is too time consuming and often expensive.

Sample Means - overview • If we did take more samples we would see the sample means are not all the same, and thus the sample statistic has a distribution. • The main point is that whether we take more than one sample or not, the statistics we calculate have a distribution. This section helps us understand the sampling distribution of the sample mean.

Sample Means - overview • We will see the sampling distribution of the sample mean
1) is normal
2) has the same mean as the mean of the population from which the sample is drawn
3) has a smaller standard deviation than the standard deviation of the population from which the sample was drawn.

Sample Means - The central limit theorem • We really don’t have to worry about the central limit theorem other than the fact that experts (people who live more than 50 miles away) tell us that the sampling distribution of the sample mean is normal, or very close to it, when the sample size is large.

Sample Means - Mean of sampling distribution
[Figure: distribution of the population variable, with the mean of the population marked.]
Some samples will have more values from the low side of the mean than other samples. But, overall the mean of the sample means will be right at the population mean.

Sample Means - Standard deviation of sampling distribution • The standard deviation of the sampling distribution is smaller than the standard deviation of the population from which the sample was drawn. • The standard deviation of the sampling distribution equals the standard deviation of the population divided by the square root of the sample size. I need you to accept this fact.

Let’s say we are told in a population the mean on a variable is 200 and the standard deviation is 50. You would think we could stop and not do any work because we know about the population. But we continue on because we want to illustrate some details. We are told the sample size is 100. From the central limit theorem we know the sampling distribution of the sample mean will be
1. A normal distribution
2. With mean equal to the mean in the population, or 200
3. With standard deviation = 50/sqrt(100) = 50/10 = 5
(more on next screen)
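
If you do not want to take the square root rule on faith, a simulation can make it believable. The sketch below assumes the population itself is normal with mean 200 and standard deviation 50 (the example does not say what shape the population has), draws many samples of size 100, and looks at the mean and standard deviation of the resulting sample means. The function name, the number of samples, and the seed are all made up for this illustration.

```python
import random
from statistics import mean, pstdev

def simulate_sample_means(pop_mean, pop_sd, n, n_samples, seed=0):
    """Draw n_samples samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # seeded so that a rerun gives the same samples
    means = []
    for _ in range(n_samples):
        # Assumption: the population is normal with the given mean and sd.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(mean(sample))
    return means

sample_means = simulate_sample_means(200, 50, n=100, n_samples=2000)

print(mean(sample_means))    # should land near the population mean, 200
print(pstdev(sample_means))  # should land near 50 / sqrt(100) = 5
```

The individual values vary with a standard deviation of 50, but the sample means cluster much more tightly around 200.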

In this example what is the probability the sample mean will be within + or - 5 of the population mean? To answer this we must use the sampling distribution of the sample mean. Another way to think of this is: what is the probability the sample mean will be between 195 and 205? 195 has Z = (195 - 200)/5 = -1.00. 205 has Z = 1.00. From 0 to 1 the table gives .3413, so the answer is .3413 + .3413 = .6826. What is the probability the sample mean will be between 190 and 210? The Z’s are -2 and 2. The answer is .4772 + .4772 = .9544.
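
The same two probabilities can be computed directly from the sampling distribution. This sketch uses the standard normal cumulative formula built from math.erf; the function name normal_cdf is made up for this illustration, and the inputs 200, 50, and 100 are the population mean, population standard deviation, and sample size from the example.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Probability that a normal(mean, sd) variable is at or below x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

pop_mean, pop_sd, n = 200, 50, 100

# Standard deviation of the sampling distribution of the sample mean.
sd_of_mean = pop_sd / sqrt(n)  # 50 / 10 = 5

p_195_205 = normal_cdf(205, pop_mean, sd_of_mean) - normal_cdf(195, pop_mean, sd_of_mean)
p_190_210 = normal_cdf(210, pop_mean, sd_of_mean) - normal_cdf(190, pop_mean, sd_of_mean)

print(sd_of_mean)           # 5.0
print(round(p_195_205, 4))  # 0.6827
print(round(p_190_210, 4))  # 0.9545
```

These agree with the Z table answers up to rounding in the fourth decimal place, since the table values were rounded before being doubled.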
