Statistics

Statistics Chapter 13: Categorical Data Analysis

Where We’ve Been
• Presented methods for making inferences about the population proportion associated with a two-level qualitative variable (i.e., a binomial variable)
• Presented methods for making inferences about the difference between two binomial proportions
McClave, Statistics, 11th ed. Chapter 13: Categorical Data Analysis

Where We’re Going
• Discuss qualitative (categorical) data with more than two outcomes
• Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis
• Present a chi-square hypothesis test relating two qualitative variables, called a two-way analysis

13.1: Categorical Data and the Multinomial Experiment
Properties of the Multinomial Experiment
• The experiment consists of n identical trials.
• There are k possible outcomes (called classes, categories, or cells) to each trial.
• The probabilities of the k outcomes, denoted by p1, p2, …, pk, where p1 + p2 + … + pk = 1, remain the same from trial to trial.
• The trials are independent.
• The random variables of interest are the cell counts n1, n2, …, nk, the numbers of observations that fall into each of the k categories.
(See Example 13.1.)
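The five properties can be made concrete with a small simulation. This is a minimal sketch only: the function name `simulate_multinomial`, the seed, and the category probabilities are illustrative assumptions, not values from the text.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=0):
    """Run n independent, identical trials; a trial falls in category i with probability probs[i]."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    k = len(probs)
    outcomes = rng.choices(range(k), weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally[i] for i in range(k)]     # cell counts n1, ..., nk

# Hypothetical probabilities for k = 3 categories (not taken from the text).
cell_counts = simulate_multinomial(n=150, probs=[0.5, 0.3, 0.2])
print(cell_counts, sum(cell_counts))
```

The cell counts always add up to n, which is why only k - 1 of them are free to vary.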

13.2: Testing Categorical Probabilities: One-Way Table
• Suppose three candidates are running for office, and 150 voters are asked their preferences.
• Candidate 1 is the choice of 61 voters.
• Candidate 2 is the choice of 53 voters.
• Candidate 3 is the choice of 36 voters.
• Do these data suggest the population may prefer one candidate over the others?

13.2: Testing Categorical Probabilities: One-Way Table
Candidate 1: 61 voters
Candidate 2: 53 voters
Candidate 3: 36 voters
n = 150

13.2: Testing Categorical Probabilities: One-Way Table
Reject the null hypothesis
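The arithmetic behind this decision is not preserved in the transcript. A worked sketch follows, assuming the null hypothesis of no preference among the three candidates (each probability equal to 1/3) and a significance level of .05; neither assumption is stated on the slide.

```latex
\begin{align*}
E_1 = E_2 = E_3 &= 150 \cdot \tfrac{1}{3} = 50 \\
\chi^2 &= \frac{(61-50)^2}{50} + \frac{(53-50)^2}{50} + \frac{(36-50)^2}{50}
        = \frac{121 + 9 + 196}{50} = 6.52 \\
\chi^2_{.05}\ (\text{df} = 2) &= 5.99147
\end{align*}
```

Because 6.52 exceeds 5.99147, the null hypothesis is rejected under these assumptions.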

13.2: Testing Categorical Probabilities: One-Way Table
Test of a Hypothesis about Multinomial Probabilities: One-Way Table
$H_0$: $p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$, where $p_{1,0}, p_{2,0}, \ldots, p_{k,0}$ represent the hypothesized values of the multinomial probabilities
$H_a$: At least one of the multinomial probabilities does not equal its hypothesized value
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}$, where $E_i = n p_{i,0}$ is the expected cell count given the null hypothesis.
Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has $k - 1$ df.
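The test statistic can be sketched as a short function. The name `chi_square_one_way` is illustrative, and the usage line assumes the no-preference null hypothesis (all three probabilities equal to 1/3) for the three-candidate data, which the slides do not state explicitly.

```python
def chi_square_one_way(observed, hypothesized_probs):
    """Return (chi-square statistic, degrees of freedom, expected counts) for a one-way table."""
    if len(observed) != len(hypothesized_probs):
        raise ValueError("need one hypothesized probability per cell")
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]          # E_i = n * p_{i,0}
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                  # k - 1 degrees of freedom
    return statistic, df, expected

# Candidate preferences from the slides: 61, 53, and 36 of 150 voters.
stat, df, expected = chi_square_one_way([61, 53, 36], [1 / 3, 1 / 3, 1 / 3])
print(round(stat, 2), df)
```

The statistic is then compared with the tabled chi-square value for the chosen significance level and k - 1 df.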

13.2: Testing Categorical Probabilities: One-Way Table
Conditions Required for a Valid χ² Test: One-Way Table
• A multinomial experiment has been conducted.
• The sample size n is large enough that, for every cell, the expected cell count E(ni) is 5 or more.

13.2: Testing Categorical Probabilities: One-Way Table
Example 13.2: Distribution of Opinions About Marijuana Possession Before Television Series Has Aired
Legalization: 7% | Decriminalization: 18% | Existing law: 65% | No opinion: 10%
Table 13.2: Distribution of Opinions About Marijuana Possession After Television Series Has Aired
[Table not reproduced in this transcript.]

13.2: Testing Categorical Probabilities: One-Way Table
Expected Distribution of 500 Opinions About Marijuana Possession After Television Series Has Aired
Reject the null hypothesis
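The expected distribution follows directly from the pre-series percentages given in Example 13.2 (7%, 18%, 65%, 10%) and the sample size of 500. A minimal sketch, with illustrative variable names, that also checks the "5 or more" condition:

```python
n = 500
# Proportions holding each opinion before the series aired (from Example 13.2).
prior = {
    "legalization": 0.07,
    "decriminalization": 0.18,
    "existing law": 0.65,
    "no opinion": 0.10,
}

# Expected cell count under the null hypothesis of no change: E_i = n * p_{i,0}.
expected = {category: n * p for category, p in prior.items()}
all_large_enough = all(e >= 5 for e in expected.values())

for category, e in expected.items():
    print(f"{category}: {e:.0f}")
print("every expected count is at least 5:", all_large_enough)
```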

13.2: Testing Categorical Probabilities: One-Way Table
• Inferences can be made on any single proportion as well:
• A 95% confidence interval on the proportion of citizens in the viewing area with no opinion is $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p}$ is the sample proportion with no opinion (the numerical interval is not reproduced in this transcript).
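A single-cell interval of this kind can be sketched with the usual large-sample formula for a proportion. The function name `proportion_ci` is illustrative, and the count of 40 below is a made-up placeholder, not the observed "no opinion" count from Table 13.2, which the transcript does not include.

```python
import math

def proportion_ci(count, n, z=1.96):
    """Large-sample confidence interval for one category proportion (z = 1.96 gives 95%)."""
    p_hat = count / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical count, for illustration only.
low, high = proportion_ci(count=40, n=500)
print(f"({low:.3f}, {high:.3f})")
```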

13.3: Testing Categorical Probabilities: Two-Way Table
• Chi-square analysis can also be used to investigate studies based on qualitative factors.
• Does having one characteristic make it more/less likely to exhibit another characteristic?

13.3: Testing Categorical Probabilities: Two-Way Table
The columns are divided according to the subcategories for one qualitative variable and the rows for the other qualitative variable.

13.3: Testing Categorical Probabilities: Two-Way Table
• The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).
[Table of marital status by religious affiliation not reproduced in this transcript.]
H0: Marital status and religious affiliation are independent
Ha: Marital status and religious affiliation are dependent

13.3: Testing Categorical Probabilities: Two-Way Table
• The expected frequencies (see Figure 13.4) are included below:
[Table of expected frequencies not reproduced in this transcript.]
The chi-square value computed with SAS is 7.1355, with p-value = .1289. Even at the α = .10 level, we cannot reject the null hypothesis.
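The reported p-value can be sanity-checked without statistical software. The sketch below assumes the table has 2 marital-status rows and 5 religious-affiliation columns, so that df = (2 - 1)(5 - 1) = 4; the transcript does not list the categories, so that layout is an assumption. For 4 df the chi-square upper-tail probability has a simple closed form.

```python
import math

def chi2_upper_tail_df4(x):
    """P(chi-square with 4 df > x); closed form exp(-x/2) * (1 + x/2), valid for df = 4 only."""
    return math.exp(-x / 2) * (1 + x / 2)

# Chi-square value reported from the SAS printout.
p_value = chi2_upper_tail_df4(7.1355)
print(round(p_value, 4))
```

Under the df = 4 assumption the result lands close to the reported .1289, which is larger than .10.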

13.4: A Word of Caution About Chi-Square Tests
Be sure

Chapter 13 Categorical Data Analysis

EXAMPLE 13.1: IDENTIFYING A MULTINOMIAL EXPERIMENT
Problem Consider the problem of determining the highest level of education attained by each of a sample of n = 40 National Hockey League (NHL) players. Suppose we categorize level of education into one of five categories (some high school, high school diploma, some college, college undergraduate degree, and graduate degree) and count the number of the 40 players that fall into each category. Is this a multinomial experiment, to a reasonable degree of approximation?

Solution Checking the five properties of a multinomial experiment shown in the box, we have the following:
1. The experiment consists of n = 40 identical trials, each of which is undertaken to determine the education level of an NHL player.
2. There are k = 5 possible outcomes to each trial, corresponding to the five education-level responses.
3. The probabilities of the k = 5 outcomes p1, p2, p3, p4, and p5, where pi represents the true probability that an NHL player attains level-of-education category i, remain the same from trial to trial (to a reasonable degree of approximation).
4. The trials are independent; that is, the education level attained by one NHL player does not affect the level attained by any other player.
5. We are interested in the count of the number of hockey players who fall into each of the five education-level categories. These five cell counts are denoted n1, n2, n3, n4, and n5.
Thus, the properties of a multinomial experiment are satisfied.

EXAMPLE 13.2: A ONE-WAY TEST: Effectiveness of a TV Program on Marijuana
Problem Suppose an educational television station has broadcast a series of programs on the physiological and psychological effects of smoking marijuana. Now that the series is finished, the station wants to see whether the citizens within the viewing area have changed their minds about how the possession of marijuana should be considered legally. Before the series was shown, it was determined that 7% of the citizens favored legalization, 18% favored decriminalization, 65% favored the existing law (an offender could be fined or imprisoned), and 10% had no opinion. A summary of the opinions (after the series was shown) of a random sample of 500 people in the viewing area is given in Table 13.2. Test at the α = .01 level to see whether these data indicate that the distribution of opinions differs significantly from the proportions that existed before the educational series was aired.

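Solution Define the proportions after the airing to be
$p_1$ = proportion of citizens favoring legalization
$p_2$ = proportion of citizens favoring decriminalization
$p_3$ = proportion of citizens favoring the existing law
$p_4$ = proportion of citizens with no opinion
Then the null hypothesis representing no change in the distribution of percentages is
$H_0$: $p_1 = .07,\ p_2 = .18,\ p_3 = .65,\ p_4 = .10$
and the alternative is
$H_a$: At least one of the proportions differs from its hypothesized value.
Thus, we have
Test statistic: $\chi^2 = \sum_{i=1}^{4} \frac{(n_i - E_i)^2}{E_i}$

Solution (continued) where
$E_1 = n p_{1,0} = 500(.07) = 35$
$E_2 = n p_{2,0} = 500(.18) = 90$
$E_3 = n p_{3,0} = 500(.65) = 325$
$E_4 = n p_{4,0} = 500(.10) = 50$
Since all these values are larger than 5, the $\chi^2$ approximation is appropriate. Also, if the citizens in the sample were randomly selected, then the properties of the multinomial probability distribution are satisfied.
Rejection region: For $\alpha = .01$ and df = $k - 1 = 3$, reject $H_0$ if $\chi^2 > \chi^2_{.01}$, where (from Table VII in Appendix A) $\chi^2_{.01} = 11.3449$.
We now calculate the test statistic from the observed counts in Table 13.2 (the computed value is not reproduced in this transcript). Since this value exceeds the table value of $\chi^2$ (11.3449), the data provide sufficient evidence that the opinions on the legalization of marijuana have changed since the series was aired.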
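The table value can be cross-checked against the significance level without a printed table. For 3 degrees of freedom the chi-square upper-tail probability has a closed form in terms of the complementary error function; the sketch below (function name illustrative) evaluates it at 11.3449 to see what tail area that cutoff corresponds to.

```python
import math

def chi2_upper_tail_df3(x):
    """P(chi-square with 3 df > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2); valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Tail area to the right of the tabled critical value used in the rejection region.
tail_area = chi2_upper_tail_df3(11.3449)
print(round(tail_area, 3))
```

A tail area of about .01 confirms that 11.3449 is the cutoff for a test at the .01 level with 3 df.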

Solution (continued) The test can also be conducted with the use of an available statistical software package. Figure 13.2 is an SPSS printout of the analysis of the data in Table 13.2. The test statistic and p-value of the test are highlighted on the printout. Since α = .01 exceeds p = .004, there is sufficient evidence to reject H0.

EXAMPLE 13.3: CONDUCTING A TWO-WAY ANALYSIS: Marital Status and Religion
Problem A social scientist wants to determine whether the marital status (divorced or not divorced) of U.S. men is independent of their religious affiliation (or lack thereof). A sample of 500 U.S. men is surveyed, and the results are tabulated as shown in Table 13.8.
a. Test to see whether there is sufficient evidence to indicate that the marital status of men who have been or are currently married is dependent on religious affiliation. Take α = [value missing in source].
b. Graph the data and describe the patterns revealed. Is the result of the test supported by the graph?

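EXAMPLE 13.3: CONDUCTING A TWO-WAY ANALYSIS: Marital Status and Religion
Problem (continued)
[Table 13.8: survey results on marital status and religious affiliation; not reproduced in this transcript.]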

Solution a. The first step is to calculate estimated expected cell frequencies under the assumption that the classifications are independent. Rather than compute these values by hand, we resort to a computer. The SAS printout of the analysis of Table 13.8 is displayed in Figure 13.4, each cell of which contains the observed (top) and expected (bottom) frequency in that cell. Note that E11, the estimated expected count for the Divorced, A cell, is 48.952. Similarly, the estimated expected count for the Divorced, B cell is E12 = 18.56. Since all the estimated expected cell frequencies are greater than 5, the χ² approximation for the test statistic is appropriate. Assuming that the men chosen were randomly selected from all married or previously married American men, the characteristics of the multinomial probability distribution are satisfied.
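Under independence, the estimated expected count for a cell is (row total × column total) / n. The sketch below checks the two quoted values using the divorced row total of 116 and n = 500 that appear in the solution to part b. The column totals 211 and 80 are not in this transcript; they are back-solved here from the quoted expected counts and should be treated as inferred, not quoted.

```python
def expected_count(row_total, column_total, n):
    """Estimated expected cell count under independence: (row total * column total) / n."""
    return row_total * column_total / n

n = 500
divorced_total = 116                        # from the part b solution: (116/500)100%
inferred_column_totals = {"A": 211, "B": 80}   # inferred from E11 and E12, not given in the text

expected_divorced = {
    label: expected_count(divorced_total, total, n)
    for label, total in inferred_column_totals.items()
}
print(expected_divorced)
```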

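Solution (continued) The null and alternative hypotheses we want to test are
H0: The marital status of U.S. men and their religious affiliation are independent.
Ha: The marital status of U.S. men and their religious affiliation are dependent.
The test statistic, χ² = 7.1355, is highlighted at the bottom of the printout, as is the observed significance level (p-value) of the test. Since the chosen α is less than p = .129, we fail to reject H0; that is, we cannot conclude that the marital status of U.S. men depends on their religious affiliation. (Note that we could not reject H0 even with α = .10.)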

Solution (continued) b. The marital status frequencies can be expressed as percentages of the number of men in each religious affiliation category. The expected percentage of divorced men under the assumption of independence is (116/500)100% ≈ 23%. An SAS graph of the percentages is shown in Figure 13.5. Note that the percentages of divorced men (see the bars in the “DIVORCED” block of the SAS graph) deviate only slightly from that expected under the assumption of independence, supporting the result of the test in part a. That is, neither the descriptive bar graph nor the statistical test provides evidence that the male divorce rate depends on (varies with) religious affiliation.
