Statistics Chapter 4: Discrete Random Variables
Where We’ve Been • Using probability to make inferences about populations • Measuring the reliability of the inferences McClave, Statistics, 11th ed. Chapter 4: Discrete Random Variables
Where We’re Going • Develop the notion of a random variable • Numerical data and discrete random variables • Discrete random variables and their probabilities
Statistics in Action • Probability in a reverse cocaine sting: Was cocaine really sold?
4.1: Two Types of Random Variables • Definition 4.1 A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical value is assigned to each sample point.
EXAMPLE 4.1 VALUES OF A DISCRETE RANDOM VARIABLE: Wine Ratings Problem A panel of 10 experts for the Wine Spectator (a national publication) is asked to taste a new white wine and assign it a rating of 0, 1, 2, or 3. A score is then obtained by adding together the ratings of the 10 experts. How many values can this random variable assume?
Solution A sample point is a sequence of 10 numbers associated with the rating of each expert. For example, one sample point is {1, 0, 0, 1, 2, 0, 0, 3, 1, 0}. The random variable assigns a score to each one of these sample points by adding the 10 numbers together. Thus, the smallest score is 0 (if all 10 ratings are 0), and the largest score is 30 (if all 10 ratings are 3). Since every integer from 0 to 30 is a possible score, the random variable, denoted by the symbol x, can assume 31 values. Note that the value of the random variable for the sample point shown here is x = 8.
EXAMPLE 4.2 VALUES OF A DISCRETE RANDOM VARIABLE: EPA Application Problem Suppose the Environmental Protection Agency (EPA) takes readings once a month on the amount of pesticide in the discharge water of a chemical company. If the amount of pesticide exceeds the maximum level set by the EPA, the company is forced to take corrective action and may be subject to penalty. Consider the random variable x, the number of months before the company’s discharge exceeds the EPA’s maximum level. What values can x assume?
Solution The company’s discharge of pesticide may exceed the maximum allowable level in the first month of testing, the second month of testing, and so on. It is possible that the company’s discharge will never exceed the maximum level. Thus, the set of possible values for the number of months until the level is first exceeded is the set of all positive integers 1, 2, 3, 4, ... If we can list the values of a random variable x, even though the list is never ending, we call the list countable and the corresponding random variable discrete. Thus, the number of months until the company’s discharge first exceeds the limit is a discrete random variable.
EXAMPLE 4.3 VALUES OF A CONTINUOUS RANDOM VARIABLE: Another EPA Application Problem Refer to Example 4.2. A second random variable of interest is the amount x of pesticide (in milligrams per liter) found in the monthly sample of discharge waters from the same chemical company. What values can this random variable assume?
Solution Unlike the number of months before the company’s discharge exceeds the EPA’s maximum level, the set of all possible values for the amount of discharge cannot be listed (i.e., is not countable). The possible values for the amount x of pesticide would correspond to the points on the interval between 0 and the largest possible value the amount of the discharge could attain, the maximum number of milligrams that could occupy 1 liter of volume. (Practically, the interval would be much smaller, say, between 0 and 500 milligrams per liter.)
4.1: Two Types of Random Variables • Definition 4.2 A discrete random variable can assume a countable number of values. • Example: number of steps to the top of the Eiffel Tower* • Definition 4.3 A continuous random variable can assume any value along a given interval of a number line. • Example: the time a tourist stays at the top once s/he gets there *Believe it or not, the answer ranges from 1,652 to 1,789. See Great Buildings.
4.1: Two Types of Random Variables • Discrete random variables: number of sales, number of calls, shares of stock, people in line, mistakes per page • Continuous random variables: length, depth, volume, time, weight
EXAMPLE 4.4 FINDING A PROBABILITY DISTRIBUTION—Coin-Tossing Experiment Problem Recall the experiment of tossing two coins (p. 181), and let x be the number of heads observed. Find the probability associated with each value of the random variable x, assuming that the two coins are fair.
Solution The sample space and sample points for this experiment are reproduced in Figure 4.2. Note that the random variable x can assume values 0, 1, 2. Recall (from Chapter 3) that the probability associated with each of the four sample points is \(1/4\). Then, identifying the probabilities of the sample points associated with each of these values of x, we have \(P(x=0) = P(TT) = 1/4\), \(P(x=1) = P(TH) + P(HT) = 1/4 + 1/4 = 1/2\), and \(P(x=2) = P(HH) = 1/4\). Thus, we now know the values the random variable can assume (0, 1, 2) and how the probability is distributed over those values (\(1/4\), \(1/2\), \(1/4\)). This dual specification completely describes the random variable and is referred to as the probability distribution, denoted by the symbol \(p(x)\). The probability distribution for the coin-toss example is shown in tabular form in Table 4.1 and in graphic form in Figure 4.3.
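The same distribution can be recovered by brute-force enumeration. The sketch below is illustrative only: the variable names are hypothetical, and it assumes two fair coins so that all four sample points are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four sample points for two coins: HH, HT, TH, TT.
sample_points = list(product("HT", repeat=2))

# Assumption: fair coins, so each sample point has probability 1/4.
point_prob = Fraction(1, len(sample_points))

# x = number of heads; add the sample-point probabilities for each value of x.
heads_counts = Counter(point.count("H") for point in sample_points)
coin_dist = {x: n * point_prob for x, n in sorted(heads_counts.items())}

for x, prob in coin_dist.items():
    print(f"p({x}) = {prob}")
```

Using exact fractions keeps the check that the probabilities sum to 1 free of rounding error.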
Solution (continued) Since the probability distribution for a discrete random variable is concentrated at specific points (values of x), the graph in Figure 4.3a represents the probabilities as the heights of vertical lines over the corresponding values of x. Although the representation of the probability distribution as a histogram, as in Figure 4.3b, is less precise (since the probability is spread over a unit interval), the histogram representation will prove useful when we approximate probabilities of certain discrete random variables in Section 4.4. FIGURE 4.3 Probability distribution for coin-toss experiment: graphical form
4.2: Probability Distributions for Discrete Random Variables • Definition 4.4 The probability distribution of a discrete random variable is a graph, table, or formula that specifies the probability associated with each possible value the random variable can assume. • Requirements: \(p(x) \ge 0\) for all values of x, and \(\sum p(x) = 1\), where the sum is taken over all possible values of x.
EXAMPLE 4.5 PROBABILITY DISTRIBUTION USING A FORMULA: Parasitic Fish Problem The distribution of parasites (tapeworms) found in Mediterranean brill fish was studied in the Journal of Fish Biology (Aug. 1990). The researchers showed that the distribution of x, the number of brill that must be sampled until a parasitic infection is found in the digestive tract, can be modeled with the formula \(p(x) = (.6)(.4)^{x-1}\), x = 1, 2, 3, etc. Find the probability that exactly three brill fish must be sampled before a tapeworm is found in the digestive tract.
Solution We want to find the probability that x = 3. Using the formula, we have \(p(3) = (.6)(.4)^{3-1} = (.6)(.16) = .096\). Thus, there is about a 10% chance that exactly three fish must be sampled before a tapeworm is found in the digestive tract.
4.2: Probability Distributions for Discrete Random Variables • Say a random variable x follows this pattern: \(p(x) = (.3)(.7)^{x-1}\) for x = 1, 2, 3, ... • This table gives the probabilities (rounded to two digits) for x between 1 and 10.
x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
p(x) | .30 | .21 | .15 | .10 | .07 | .05 | .04 | .02 | .02 | .01
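The tabulated probabilities follow directly from the formula and can be checked with a few lines of code. This is a minimal sketch; the function name `p_of_x` is a hypothetical helper, not something from the slides.

```python
def p_of_x(x: int) -> float:
    """p(x) = (.3)(.7)^(x-1) for x = 1, 2, 3, ..."""
    return 0.3 * 0.7 ** (x - 1)

# Probabilities for x = 1..10, rounded to two digits as on the slide.
table = {x: round(p_of_x(x), 2) for x in range(1, 11)}
for x, prob in table.items():
    print(f"{x:>2}  {prob:.2f}")

# The first 10 values do not exhaust the distribution:
# the probability left over for x > 10 equals (.7)^10.
tail = 1 - sum(p_of_x(x) for x in range(1, 11))
print(f"P(x > 10) = {tail:.4f}")
```

Because the list of possible values never ends, the ten tabulated probabilities sum to slightly less than 1; the remainder sits in the tail.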
4.3: Expected Values of Discrete Random Variables • Definition 4.5 The mean, or expected value, of a discrete random variable x is \(\mu = E(x) = \sum x\,p(x)\).
EXAMPLE 4.6 FINDING AN EXPECTED VALUE: An Insurance Application Problem Suppose you work for an insurance company and you sell a $10,000 one-year term insurance policy at an annual premium of $290. Actuarial tables show that the probability of death during the next year for a person of your customer’s age, sex, health, etc., is .001. What is the expected gain (amount of money made by the company) for a policy of this type?
Solution The experiment is to observe whether the customer survives the upcoming year. The probabilities associated with the two sample points, Live and Die, are .999 and .001, respectively. The random variable you are interested in is the gain x, which can assume the values shown in the following table:
Gain x | Sample point | Probability
$290 | Customer lives | .999
-$9,710 | Customer dies | .001
If the customer lives, the company gains the $290 premium as profit. If the customer dies, the gain is negative because the company must pay $10,000, for a net “gain” of \(290 - 10{,}000 = -9{,}710\) dollars. The expected gain is therefore \(\mu = E(x) = \sum x\,p(x) = (290)(.999) + (-9{,}710)(.001) = 280\) dollars. In other words, if the company were to sell a very large number of $10,000 one-year policies to customers possessing the characteristics described, it would (on the average) net $280 per sale in the next year.
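The expected-gain arithmetic can be laid out as a short computation. The sketch below uses only the figures stated in the problem ($290 premium, $10,000 payout, death probability .001); the variable names are illustrative.

```python
premium = 290      # annual premium collected by the company, in dollars
payout = 10_000    # amount paid if the customer dies
p_die = 0.001      # probability of death during the year (from the problem)

# Gain x and its probability for the two sample points, Live and Die.
gain_dist = {
    premium: 1 - p_die,        # customer lives: company keeps the premium
    premium - payout: p_die,   # customer dies: net "gain" of -9,710
}

# Expected value: sum of x * p(x) over all values of x.
expected_gain = sum(x * p for x, p in gain_dist.items())
print(f"Expected gain per policy: ${expected_gain:.2f}")
```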
4.3: Expected Values of Discrete Random Variables • Definition 4.6 The variance of a discrete random variable x is \(\sigma^2 = E[(x-\mu)^2] = \sum (x-\mu)^2\,p(x)\). • Definition 4.7 The standard deviation of a discrete random variable x is \(\sigma = \sqrt{\sigma^2}\).
EXAMPLE 4.7 FINDING \(\mu\) AND \(\sigma\): Skin Cancer Treatment Problem Medical research has shown that a certain type of chemotherapy is successful 70% of the time when used to treat skin cancer. Suppose five skin cancer patients are treated with this type of chemotherapy, and let x equal the number of successful cures out of the five. The probability distribution for the number x of successful cures out of five is given in the following table:
x | 0 | 1 | 2 | 3 | 4 | 5
p(x) | .002 | .029 | .132 | .309 | .360 | .168
a. Find \(\mu = E(x)\). Interpret the result. b. Find \(\sigma = \sqrt{E[(x-\mu)^2]}\). Interpret the result. c. Graph p(x). Locate \(\mu\) and the interval \(\mu \pm 2\sigma\) on the graph. Use either Chebyshev’s rule or the empirical rule to approximate the probability that x falls into this interval. Compare your result with the actual probability. d. Would you expect to observe fewer than two successful cures out of five?
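Solution a. Applying the formula for \(\mu\), we obtain \(\mu = E(x) = \sum x\,p(x) = 0(.002) + 1(.029) + 2(.132) + 3(.309) + 4(.360) + 5(.168) = 3.50\). On average, the number of successful cures out of five skin cancer patients treated with chemotherapy will equal 3.5. Remember that this expected value has meaning only when the experiment (treating five skin cancer patients with chemotherapy) is repeated a large number of times. b. Now we calculate the variance of x: \(\sigma^2 = E[(x-\mu)^2] = \sum (x-\mu)^2\,p(x) = (0-3.5)^2(.002) + (1-3.5)^2(.029) + (2-3.5)^2(.132) + (3-3.5)^2(.309) + (4-3.5)^2(.360) + (5-3.5)^2(.168) = 1.05\). Thus, the standard deviation is \(\sigma = \sqrt{1.05} = 1.02\). This value measures the spread of the probability distribution of x, the number of successful cures out of five. A more useful interpretation is obtained by answering parts c and d.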
Solution (continued) c. The graph of p(x) is shown in Figure 4.6, with the mean \(\mu\) and the interval \(\mu \pm 2\sigma = 3.50 \pm 2(1.02) = 3.50 \pm 2.04 = (1.46, 5.54)\) also indicated. Note particularly that \(\mu = 3.5\) locates the center of the probability distribution. Since this distribution is a theoretical relative frequency distribution that is moderately mound shaped (see Figure 4.6), we expect (from Chebyshev’s rule) at least 75% and, more likely (from the empirical rule), approximately 95% of observed x values to fall between 1.46 and 5.54. You can see from the figure that the actual probability that x falls in the interval is the sum of p(x) for the values x = 2, x = 3, x = 4, and x = 5. This probability is \(p(2) + p(3) + p(4) + p(5) = .132 + .309 + .360 + .168 = .969\). Therefore, 96.9% of the probability distribution lies within two standard deviations of the mean. This percentage is consistent with both Chebyshev’s rule and the empirical rule.
Solution (continued) d. Fewer than two successful cures out of five implies that x = 0 or x = 1. Both of these values of x lie outside the interval \(\mu \pm 2\sigma\), and the empirical rule tells us that such a result is unlikely (approximate probability of .05). The exact probability is \(p(0) + p(1) = .002 + .029 = .031\). Consequently, in a single experiment in which five skin cancer patients are treated with chemotherapy, we would not expect to observe fewer than two successful cures.
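Parts a through d can be reproduced numerically from the probability table. The sketch below is hedged: p(0) = .002 and p(1) = .029 are stated in the solution, while the values used for x = 2 through 5 are assumed here (they are the 70%-success probabilities for five patients, rounded to three digits, and they agree with the stated total of .969).

```python
from math import sqrt

# Probability table for x = number of cures out of five.
# p(0) and p(1) come from the solution text; p(2)..p(5) are assumed values.
p = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}

# a. Mean: sum of x * p(x).
mu = sum(x * px for x, px in p.items())

# b. Variance: sum of (x - mu)^2 * p(x); standard deviation is its square root.
variance = sum((x - mu) ** 2 * px for x, px in p.items())
sigma = sqrt(variance)

# c. Exact probability that x falls within mu +/- 2 sigma.
low, high = mu - 2 * sigma, mu + 2 * sigma
within_two_sigma = sum(px for x, px in p.items() if low <= x <= high)

# d. Probability of fewer than two cures.
fewer_than_two = p[0] + p[1]

print(f"mu = {mu:.2f}, sigma^2 = {variance:.3f}, sigma = {sigma:.2f}")
print(f"P({low:.2f} < x < {high:.2f}) = {within_two_sigma:.3f}")
print(f"P(x < 2) = {fewer_than_two:.3f}")
```

The interval endpoints computed here differ slightly in the last digit from 1.46 and 5.54 because the slide rounds \(\sigma\) to 1.02 before doubling it; the set of x values inside the interval is the same either way.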
4.3: Expected Values of Discrete Random Variables • In a roulette wheel in a U.S. casino, a $1 bet on “even” wins $1 if the ball falls on an even number (same for “odd,” or “red,” or “black”). • The probability of winning this bet is 47.37%, so the expected gain per $1 bet is \((+1)(.4737) + (-1)(.5263) = -.0526\). • On average, bettors lose about a nickel for each dollar they put down on a bet like this. (These are the best bets for patrons.)
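The nickel-per-dollar figure is an expected value. The sketch below assumes the standard U.S. wheel layout of 38 pockets, 18 of which win an "even" bet; that layout is not stated on the slide, but 18/38 matches the quoted 47.37%.

```python
from fractions import Fraction

# Assumption: 38 pockets (1-36 plus 0 and 00), 18 of which win an "even" bet.
p_win = Fraction(18, 38)
p_lose = 1 - p_win

# Gain x is +1 dollar on a win and -1 dollar on a loss.
expected_gain = (+1) * p_win + (-1) * p_lose

print(f"P(win) = {float(p_win):.4f}")
print(f"Expected gain per $1 bet = {float(expected_gain):.4f}")
```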
4.4: The Binomial Distribution • A Binomial Random Variable • n identical trials • Two outcomes: Success or Failure • P(S) = p; P(F) = q = 1 – p • Trials are independent • x is the number of Successes in n trials
4.4: The Binomial Distribution • A Binomial Random Variable, illustrated with coin flips • n identical trials: flip a coin 3 times • Two outcomes, Success or Failure: outcomes are Heads or Tails • P(S) = p; P(F) = q = 1 – p: P(H) = .5; P(T) = 1 – .5 = .5 • Trials are independent: a head on flip i doesn’t change P(H) on flip i + 1 • x is the number of S’s in n trials
EXAMPLE 4.8 ASSESSING WHETHER X IS A BINOMIAL Problem For the following examples, decide whether x is a binomial random variable: a. A university scholarship committee must select two students to receive a scholarship for the next academic year. The committee receives 10 applications for the scholarships— 6 from male students and 4 from female students. Suppose the applicants are all equally qualified, so that the selections are randomly made. Let x be the number of female students who receive a scholarship. b. Before marketing a new product on a large scale, many companies conduct a consumer-preference survey to determine whether the product is likely to be successful. Suppose a company develops a new diet soda and then conducts a taste-preference survey in which 100 randomly chosen consumers state their preferences from among the new soda and the two leading sellers. Let x be the number of the 100 who choose the new brand over the two others.
EXAMPLE 4.8 ASSESSING WHETHER X IS A BINOMIAL Problem (continued) c. Some surveys are conducted by using a method of sampling other than simple random sampling (defined in Chapter 3). For example, suppose a television cable company plans to conduct a survey to determine the fraction of households in a certain city that would use the cable television service. The sampling method is to choose a city block at random and then survey every household on that block. This sampling technique is called cluster sampling. Suppose 10 blocks are so sampled, producing a total of 124 household responses. Let x be the number of the 124 households that would use the television cable service.
Solution a. In checking the binomial characteristics, a problem arises with independence (characteristic 4 in the preceding box). On the one hand, given that the first student selected is female, the probability that the second chosen is female is 3/9. On the other hand, given that the first selection is a male student, the probability that the second is female is 4/9. Thus, the conditional probability of a Success (choosing a female student to receive a scholarship) on the second trial (selection) depends on the outcome of the first trial, and the trials are therefore dependent. Since the trials are not independent, this variable is not a binomial random variable. (This variable is actually a hypergeometric random variable, the topic of Optional Section 4.6.)
Solution (continued) b. Surveys that produce dichotomous responses and use random-sampling techniques are classic examples of binomial experiments. In this example, each randomly selected consumer either states a preference for the new diet soda or does not. The sample of 100 consumers is a very small proportion of the totality of potential consumers, so the response of one would be, for all practical purposes, independent of another.* Thus, x is a binomial random variable. c. This example is a survey with dichotomous responses (Yes or No to the cable service), but the sampling method is not simple random sampling. Again, the binomial characteristic of independent trials would probably not be satisfied. The responses of households within a particular block would be dependent, since households within a block tend to be similar with respect to income, level of education, and general interests. Thus, the binomial model would not be satisfactory for x if the cluster sampling technique were employed.
EXAMPLE 4.9 DERIVING THE BINOMIAL PROBABILITY DISTRIBUTION: Passing a Physical Fitness Exam Problem The Heart Association claims that only 10% of U.S. adults over 30 years of age meet the President’s Physical Fitness Commission’s minimum requirements. Suppose four adults are randomly selected and each is given the fitness test. a. Use the steps given in Chapter 3 (box on p. 120) to find the probability that none of the four adults passes the test. b. Find the probability that three of the four adults pass the test. c. Let x represent the number of the four adults who pass the fitness test. Explain why x is a binomial random variable. d. Use the answers to parts a and b to derive a formula for p(x), the probability distribution of the binomial random variable x.
Solution a. 1. The first step is to define the experiment. Here we are interested in observing the fitness test results of each of the four adults: pass (S) or fail (F). 2. Next, we list the sample points associated with the experiment. Each sample point consists of the test results of the four adults. For example, SSSS represents the sample point denoting that all four adults pass, while FSSS represents the sample point denoting that adult 1 fails, while adults 2, 3, and 4 pass the test. The 16 sample points are listed in Table 4.2.
Solution (continued) 3. We now assign probabilities to the sample points. Note that each sample point can be viewed as the intersection of four adults’ test results, and assuming that the results are independent, the probability of each sample point can be obtained by the multiplicative rule as follows: \(P(SSSS) = P(S)P(S)P(S)P(S) = (.1)(.1)(.1)(.1) = (.1)^4 = .0001\). All other sample-point probabilities are calculated by similar reasoning. For example, \(P(FSSS) = (.9)(.1)(.1)(.1) = .0009\). You can check that this reasoning results in sample-point probabilities that add to 1 over the 16 points in the sample space.
Solution (continued) 4. Finally, we add the appropriate sample-point probabilities to obtain the desired event probability. The event of interest is that all four adults fail the fitness test. In Table 4.2, we find only one sample point, FFFF, contained in this event. All other sample points imply that at least one adult passes. Thus, \(P(\text{all four adults fail}) = P(FFFF) = (.9)^4 = .6561\). b. The event that three of the four adults pass the fitness test consists of the four sample points in the second column of Table 4.2: FSSS, SFSS, SSFS, and SSSF. To obtain the event probability, we add the sample-point probabilities: \(P(\text{three of four adults pass}) = P(FSSS) + P(SFSS) + P(SSFS) + P(SSSF) = 4(.1)^3(.9) = .0036\). Note that each of the four sample-point probabilities is the same, because each sample point consists of three S’s and one F; the order does not affect the probability because the adults’ test results are (assumed) independent.
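The four-step procedure can be mirrored by enumerating all 16 sample points. This is an illustrative sketch with hypothetical variable names; it assumes, as the solution does, that the four test results are independent with P(S) = .1.

```python
from itertools import product
from math import prod

p_pass = 0.1
outcome_prob = {"S": p_pass, "F": 1 - p_pass}

# Steps 2 and 3: list the 16 sample points and assign each a probability
# with the multiplicative rule (independent results assumed).
sample_points = {
    "".join(point): prod(outcome_prob[result] for result in point)
    for point in product("SF", repeat=4)
}

# Step 4: add the sample-point probabilities in each event of interest.
p_none_pass = sample_points["FFFF"]
p_three_pass = sum(
    prob for point, prob in sample_points.items() if point.count("S") == 3
)

print(f"P(none pass)  = {p_none_pass:.4f}")
print(f"P(three pass) = {p_three_pass:.4f}")
print(f"Total over all sample points = {sum(sample_points.values()):.4f}")
```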
Solution (continued) c. We can characterize this experiment as consisting of four identical trials: the four test results. There are two possible outcomes to each trial, S or F, and the probability of passing, p = .1, is the same for each trial. Finally, we are assuming that each adult’s test result is independent of all others, so that the four trials are independent. Then it follows that x, the number of the four adults who pass the fitness test, is a binomial random variable. d. The event probabilities in parts a and b provide insight into the formula for the probability distribution p(x). First, consider the event that three adults pass (part b). We found that \(P(x = 3) = (\text{number of sample points for which } x = 3) \times (.1)^3(.9)^1 = 4(.1)^3(.9)^1\). In general, we can use combinatorial mathematics to count the number of sample points. For example, the number of ways of choosing which 3 of the 4 trials result in S is \(\binom{4}{3} = \frac{4!}{3!\,(4-3)!} = 4\).
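Solution (continued) Thus, \(P(x = 3) = \binom{4}{3}(.1)^3(.9)^1\), which has the general form \(\binom{4}{x}(.1)^x(.9)^{4-x}\). The component \(\binom{4}{x}\) counts the number of sample points with x successes, and the component \((.1)^x(.9)^{4-x}\) is the probability associated with each sample point having x successes.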
Solution (continued) For the general binomial experiment, with n trials and probability p of Success on each trial, the probability of x successes is \(p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}\).
4.4: The Binomial Distribution • The Binomial Probability Distribution: \(p(x) = \binom{n}{x} p^x q^{n-x}\), x = 0, 1, 2, ..., n • p = P(S) on a single trial • q = 1 – p • n = number of trials • x = number of successes
4.4: The Binomial Distribution • The Binomial Probability Distribution: \(p(x) = \binom{n}{x} p^x q^{n-x}\) • \(\binom{n}{x}\): the number of ways of getting the desired results • \(p^x\): the probability of getting the required number of successes • \(q^{n-x}\): the probability of getting the required number of failures
EXAMPLE 4.10 APPLYING THE BINOMIAL DISTRIBUTION: Physical Fitness Problem Problem Refer to Example 4.9. Use the formula for a binomial random variable to find the probability distribution of x, where x is the number of adults who pass the fitness test. Graph the distribution.
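Solution For this application, we have n = 4 trials. Since a success S is defined as an adult who passes the test, p = P(S) = .1 and q = 1 - p = .9. Substituting n = 4, p = .1, and q = .9 into the formula for p(x), we obtain \(p(x) = \binom{4}{x}(.1)^x(.9)^{4-x}\), so that \(p(0) = 1(.1)^0(.9)^4 = .6561\), \(p(1) = 4(.1)^1(.9)^3 = .2916\), \(p(2) = 6(.1)^2(.9)^2 = .0486\), \(p(3) = 4(.1)^3(.9)^1 = .0036\), and \(p(4) = 1(.1)^4(.9)^0 = .0001\).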
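A minimal sketch of the binomial formula applied to this example follows. The function name `binomial_pmf` is a hypothetical helper, and the text bar chart is only a rough stand-in for the graph the problem asks for.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) counts the sample points with x successes;
    # p**x * (1 - p)**(n - x) is the probability of each such point.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n_adults, p_pass = 4, 0.1
fitness_dist = {x: binomial_pmf(x, n_adults, p_pass) for x in range(n_adults + 1)}

# Print the distribution with a crude bar (one '#' per .02 of probability).
for x, prob in fitness_dist.items():
    bar = "#" * round(prob * 50)
    print(f"x={x}  p(x)={prob:.4f}  {bar}")
```

The bars make the strong right skew visible: with p = .1, almost all of the probability sits at x = 0 and x = 1.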
4.4: The Binomial Distribution • Say 40% of the class is female. • What is the probability that 6 of the first 10 students walking in will be female?
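One way to answer the question is to treat the 10 arrivals as a binomial experiment with n = 10 and p = .4. That is an assumption: it requires each arrival to be female with probability .4 independently of the others, which is only approximately true for a finite class. Under that assumption, a short exact computation looks like this:

```python
from fractions import Fraction
from math import comb

p_female = Fraction(2, 5)   # 40% of the class is female
n_students, x_female = 10, 6

# Binomial formula: C(n, x) * p^x * q^(n - x), with q = 1 - p.
prob_six = (
    comb(n_students, x_female)
    * p_female**x_female
    * (1 - p_female) ** (n_students - x_female)
)

print(f"P(x = 6) = {float(prob_six):.4f}")
```

Here C(10, 6) = 210 counts the orderings in which 6 of the 10 arrivals are female, and the result is a probability of roughly .11.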
EXAMPLE 4.11 FINDING \(\mu\) AND \(\sigma\): Physical Fitness Problem Problem Refer to Examples 4.9 and 4.10. Calculate \(\mu\) and \(\sigma\), the mean and standard deviation, respectively, of the number of the four adults who pass the test. Interpret the results.