Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly


Chapter 12 Learning Objectives (LOs) • LO 12.1: Conduct a goodness-of-fit test for a multinomial experiment. • LO 12.2: Determine whether two classifications of a population are independent. • LO 12.3: Conduct a goodness-of-fit test for normality. • LO 12.4: Perform the Jarque-Bera test for normality.

Is Brand Loyalty Related to Buyer’s Age? • The retail analyst for a marketing firm wants to know if different customer groups prefer one brand over another. She looks at data from 600 sales. • In particular, she feels that the brand Under Armour might appeal more to younger customers. • The more established brands (Nike and Adidas) might be capturing the older-customer market.

Is Brand Loyalty Related to Buyer’s Age? • Determine whether the two classifications (age and brand name) are dependent at the 5% significance level. • Discuss how the findings from the test for independence can be used.

12.1 Goodness-of-Fit Test for a Multinomial Experiment LO 12.1 Conduct a goodness-of-fit test for a multinomial experiment. • This test determines whether two or more population proportions are equal to one another or to a predetermined set of values. • For example, are four candidates in an election equally favored by voters? • Or, do people rate food quality in a restaurant the same way they did last year?

A Multinomial Experiment LO 12.1 A multinomial experiment consists of a series of n independent trials such that: • On each trial there are k possible outcomes. • The probability pi of falling into category i is the same on each trial. • The k probabilities sum to 1: p1 + p2 + … + pk = 1.
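To make the definition concrete, here is a minimal Python sketch that simulates one multinomial experiment: n independent trials, each landing in one of k categories with the same fixed probabilities. The helper name `simulate_multinomial` and the choice of n = 100 are hypothetical; the probabilities are the hypothesized values from the unequal-proportions example on the next slide.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=None):
    """Run n independent trials; each falls in category i with probability probs[i].

    Returns the observed count for each category, in category order.
    """
    # The k probabilities must sum to 1 (allowing for floating-point error).
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    rng = random.Random(seed)
    categories = range(len(probs))
    # Every trial uses the same probabilities and is drawn independently.
    draws = rng.choices(categories, weights=probs, k=n)
    tally = Counter(draws)
    return [tally[i] for i in categories]

# k = 4 categories with p = 0.4, 0.3, 0.2, 0.1 and a hypothetical n = 100.
counts = simulate_multinomial(100, [0.4, 0.3, 0.2, 0.1], seed=1)
print(counts)  # four nonnegative counts that add up to 100
```

The observed counts oi used throughout this chapter are exactly this kind of tally; the goodness-of-fit test asks whether they are plausible under the hypothesized pi.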

The Hypothesis Test LO 12.1 • The null hypothesis: the population proportions are equal to one another, or they are each equal to a specific value. • Equal population proportions: H0: p1 = p2 = p3 = p4 = 0.25; HA: Not all population proportions are equal to 0.25. • Unequal population proportions: H0: p1 = 0.4, p2 = 0.3, p3 = 0.2, p4 = 0.1; HA: At least one pi differs from its hypothesized value.

Restaurant Food Quality LO 12.1 Last year the management at a restaurant surveyed its patrons to rate the quality of its food. The results were as follows: Based on this and other survey results, management made changes to the menu.

This Year’s Results LO 12.1 This year, the management surveyed 250 patrons, asking the same questions about food quality. Here are the results: We want to know if the results agree with those from last year, or if there has been a significant change.

Methodology LO 12.1 • Compute an expected frequency for each category and compare it to what we actually observe. • Compute the difference between what was observed and expected for each category. • If the results this year are consistent with last year, these differences will be relatively small.

The ei (Expected Frequencies) LO 12.1 • We first compute the expected counts based on the survey of 250 restaurant patrons. • If the survey is consistent with last year’s results, we expect e1 = p1(250) = .15(250) = 37.5 responses to be in the “Excellent” category. • There actually were o1 = 46, a bit more than expected.
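The expected frequency for each category is simply ei = n·pi. A minimal Python sketch, using only the two proportions quoted on these slides (Excellent = .15, Fair = .45); the "Good" and "Poor" proportions are not reproduced in this text, so they are left out rather than guessed. The helper name `expected_counts` is hypothetical.

```python
def expected_counts(n, proportions):
    """Expected frequency e_i = n * p_i for each category under H0."""
    return {category: n * p for category, p in proportions.items()}

# Last year's proportions for the two categories quoted on the slides.
last_year = {"Excellent": 0.15, "Fair": 0.45}
e = expected_counts(250, last_year)
print(e)  # {'Excellent': 37.5, 'Fair': 112.5}
```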

Computing the Deviations LO 12.1 • In the first category, e1 = 37.5 and o1 = 46, so (o1 – e1) = 46 – 37.5 = 8.5. • In the third category, the “Fair” responses, e3 = p3(250) = .45(250) = 112.5. • There are 105 of these responses in the survey, so (o3 – e3) = 105 – 112.5 = –7.5.

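Standardizing the Deviations LO 12.1 • Each deviation is squared and then divided by its expected frequency: (oi – ei)²/ei. • Squaring keeps positive and negative deviations from canceling, and dividing by ei measures each deviation relative to the count we expected.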
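For the two categories worked on the previous slide, the deviations and their standardized squared values can be checked with a few lines of Python. This is a sketch; the function name `standardized_sq_deviation` is hypothetical, and the counts are the ones given on the slides.

```python
def standardized_sq_deviation(o, e):
    """One category's contribution to the chi-square statistic: (o - e)^2 / e."""
    return (o - e) ** 2 / e

# (observed, expected) for the "Excellent" and "Fair" categories.
cells = {"Excellent": (46, 37.5), "Fair": (105, 112.5)}
for name, (o, e) in cells.items():
    print(name, o - e, round(standardized_sq_deviation(o, e), 4))
# Excellent 8.5 1.9267
# Fair -7.5 0.5
```

Note that the "Fair" deviation is negative, but its contribution is positive; the test statistic only grows as observed counts move away from expected ones.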

The Chi-Square Test LO 12.1 $\chi^2_{df} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$ where df = k – 1, k is the number of categories, oi = observed frequency for category i, and ei = expected frequency for category i.
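The statistic just sums the standardized squared deviations over all k categories. A minimal Python sketch of the formula and its degrees of freedom; the function name and the small input counts are hypothetical, chosen so the arithmetic is easy to verify by hand.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic sum((o_i - e_i)^2 / e_i) with df = k - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df

# Hypothetical counts: (10-20)^2/20 + 0 + (30-20)^2/20 = 5 + 0 + 5.
stat, df = chi_square_gof([10, 20, 30], [20, 20, 20])
print(stat, df)  # 10.0 2
```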

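The Critical Value (at α = .05) LO 12.1 • With k = 4 categories, df = k – 1 = 3, and the critical value is $\chi^2_{0.05,3} = 7.815$. • We reject H0 if the computed test statistic exceeds this value.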


The Restaurant Example LO 12.1 • Since the computed test statistic of 6.520 is less than the critical value of 7.815, we do not reject H0. • At the 5% significance level, the data do not show that this year’s ratings differ from last year’s proportions; the menu changes did not produce a statistically significant shift.
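The critical value 7.815 and the p-value for 6.520 can both be reproduced in code. Python's standard library has no chi-square distribution, so this sketch implements the right tail directly with the closed-form series that exists for integer degrees of freedom, then finds the critical value by bisection. In practice a library routine such as SciPy's `scipy.stats.chi2.sf` and `chi2.ppf` would be the usual choice; the helper names `chi2_sf` and `chi2_critical` here are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1, then add one term for each extra 2 degrees of freedom.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def chi2_critical(alpha, df):
    """Value c with P(X > c) = alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail is small enough
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Restaurant example: k = 4 categories, so df = 3.
crit = chi2_critical(0.05, 3)
p_value = chi2_sf(6.520, 3)
print(round(crit, 3))      # 7.815
print(6.520 > crit)        # False: do not reject H0
print(p_value > 0.05)      # True: the p-value agrees with the critical-value decision
```

The two decision rules are equivalent: the statistic falls below the critical value exactly when its right-tail probability exceeds α.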

A Required Condition LO 12.1 • The test requires that the expected frequency (ei) in each category be at least 5. • That was not a problem in the restaurant example. • When some ei < 5, one remedy is to combine categories until every ei ≥ 5.

Example 12.1 LO 12.1 • Five companies manufacture a particular product. Their market shares for 2010 are: • Current-year shares are not yet known, so a market analyst surveys 200 recent customers to get an “advance look.”

Example 12.1 (continued) LO 12.1 • The survey showed the following results: • A minor complication is that for each of the two small companies, a 2% market share yields an expected frequency of only 4 (200 × 0.02), below the required minimum of 5. • We therefore combine companies 4 and 5 into one category when performing the analysis.
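The "at least 5" check and the merge can be sketched in Python. Only the two 2% shares are quoted on these slides, so the other companies are omitted rather than invented; the helper names `small_cells` and `merge_categories` are hypothetical. The same merge must also be applied to the observed counts so that each oi still pairs with its ei.

```python
def small_cells(expected, minimum=5):
    """Categories whose expected frequency falls below the required minimum."""
    return [k for k, v in expected.items() if v < minimum]

def merge_categories(expected, to_merge, merged_name):
    """Combine several categories into one by summing their expected counts."""
    merged = {k: v for k, v in expected.items() if k not in to_merge}
    merged[merged_name] = sum(expected[k] for k in to_merge)
    return merged

n = 200
# Companies 4 and 5 each hold a 2% share, so e = 200 * 0.02 = 4 for each.
expected = {"Company 4": n * 0.02, "Company 5": n * 0.02}
print(small_cells(expected))  # ['Company 4', 'Company 5']
combined = merge_categories(expected, ["Company 4", "Company 5"], "Companies 4+5")
print(combined, small_cells(combined))  # {'Companies 4+5': 8.0} []
```

Merging reduces the number of categories by one, which in turn reduces df by one.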


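Example 12.1 Computations LO 12.1 • After combining companies 4 and 5, there are k = 4 categories, so df = 3 and the critical value at the 5% level is 7.815. • Because the computed test statistic exceeds 7.815, we reject H0. • We conclude that market shares have shifted from their 2010 values.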

12.2 Chi-Square Test for Independence LO 12.2 Determine whether two classifications of a population are independent. • The goodness-of-fit test examines a single qualitative variable. A test of independence – also called a chi-square test of a contingency table – analyzes the relationship between two qualitative variables. • The competing hypotheses can be expressed as: H0: The two classifications are independent HA: The two classifications are dependent

Contingency Tables LO 12.2 • A contingency table shows the frequencies for two qualitative variables (e.g., brand of product and type of customer). • Each variable has two or more categories. • The test for independence is based on the expected and observed frequencies for each cell in the table.

Example LO 12.2 Does the brand of compression garment purchased depend on the customer’s age?

Notation LO 12.2 • We use the notation oij to denote the observed frequency in row i of column j. • Similarly, eij is the expected frequency in row i of column j. • Under the independence assumption, the expected frequency per cell is: eij = (Row i total)(Column j total)/Sample Size

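The Chi-Square Statistic LO 12.2 We apply the chi-square test statistic in the same manner as in the goodness-of-fit test, now summing over every cell of the table: $\chi^2_{df} = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$ where df = (r – 1)(c – 1), with r rows and c columns.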
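The whole test-of-independence statistic fits in one short Python function: compute row and column totals, form each eij from them, and sum the standardized squared deviations. This is a sketch with a hypothetical function name and a hypothetical 2 × 2 table; it is not the brand-by-age data, which is not reproduced here.

```python
def chi_square_independence(table):
    """Chi-square test-of-independence statistic for an r x c table of counts.

    Expected counts use e_ij = (row i total)(column j total) / n.
    Returns (statistic, df) with df = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table: every expected count is 30 * 30 / 60 = 15.
stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
```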

Computing Expected Frequencies LO 12.2 • For row 1 and column 1, the expected frequency, e11, is (396)(228)/600 = 150.48. • For row 1 and column 2, the expected frequency, e12, is (396)(204)/600 = 134.64. • For e13, we calculate (396)(168)/600 = 110.88, where 168 = 600 – 228 – 204 is the column 3 total.
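All six expected frequencies follow from the table's margins alone. In this Python sketch the margins are taken from the slides: row 1 totals 396 (so row 2 totals 600 – 396 = 204), and the column totals are 228, 204, and 600 – 228 – 204 = 168. The function name `expected_table` is hypothetical.

```python
def expected_table(row_totals, col_totals):
    """Expected counts under independence: e_ij = (row i total)(col j total) / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]

rows = [396, 204]        # row totals (age groups)
cols = [228, 204, 168]   # column totals (brands)
for row in expected_table(rows, cols):
    print([round(e, 2) for e in row])
# [150.48, 134.64, 110.88]
# [77.52, 69.36, 57.12]
```

A useful check: each row of expected counts sums to that row's observed total, and likewise for columns.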

Expected Frequencies and Deviations LO 12.2 The deviations (oij – eij) are:

Squared Deviations LO 12.2 • We square each deviation and divide by the respective expected frequency. These values are shown in the following table. • The standardized, squared deviations sum to 22.53, the value of the test statistic.

Summarizing the Example LO 12.2 Competing Hypotheses: H0: Age and brand name are independent. HA: Age and brand name are dependent. The test statistic is calculated using: $\chi^2_{df} = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$ where df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2. The critical value is 5.991 at the 5% significance level.

Summarizing the Example LO 12.2 • We reject H0 because the value of the test statistic is larger than the critical value: 22.53 > 5.991. Therefore, age and brand name are not independent of one another. • Alternatively, in Excel select Formulas > Insert Function > CHISQ.DIST.RT and enter X = 22.53 and Deg_freedom = 2 (equivalently, =CHISQ.DIST.RT(22.53, 2)). Excel returns the p-value for our test, about 0.0000128, which is very close to 0.
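With 2 degrees of freedom the chi-square distribution is exponential with mean 2, so its right tail has the closed form exp(–x/2). That makes the Excel p-value easy to reproduce in a few lines of Python; the function name is hypothetical.

```python
import math

def chi2_right_tail_df2(x):
    """P(X > x) for a chi-square variable with df = 2, which equals exp(-x/2).

    This is the same quantity CHISQ.DIST.RT(x, 2) reports.
    """
    return math.exp(-x / 2)

p_value = chi2_right_tail_df2(22.53)
print(f"{p_value:.2e}")  # 1.28e-05
```

A p-value this small is far below 0.05, so both the critical-value and p-value approaches reject independence.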

12.3 Chi-Square Test for Normality LO 12.3 Conduct a goodness-of-fit test for normality. • The goodness-of-fit test can also be used to determine if a population has a particular probability distribution. The expected frequencies are determined from this assumed distribution. • These expected frequencies are then compared to the observed frequencies to compute the familiar chi-square test statistic.

Testing for Normality LO 12.3 • The hypotheses for a test for normality: H0: The data follow a normal distribution with parameters µ and σ HA: The data do not follow this distribution • In practice, µ and σ are replaced by their point estimates calculated from the sample, the sample mean and sample standard deviation.

Example: A Sample of 50 Incomes LO 12.3 • Table 12.9 in the text shows 50 household incomes. The sample mean income is 63.80 (in $1000s) with standard deviation 45.78. • We next form k = 5 categories (up to 20, 20 to 40, etc.), and count how many households we observe with incomes in each category. • For the expected frequency, we calculate the probability of an income falling in each category, assuming income follows our hypothesized distribution.

Computing the Expected Counts LO 12.3 • There are 6 households in the first class of less than $20,000. • If µ = 63.8 and σ = 45.78, we compute: $P(X < 20) = P\left(Z < \frac{20 - 63.8}{45.78}\right) = P(Z < -0.96) = 0.1685$ • In this interval we expect 0.1685 × 50 = 8.43 households.
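The expected count for every class can be computed with Python's `statistics.NormalDist`. The raw incomes of Table 12.9 are not reproduced here, so this sketch covers only the expected side of the test. The class boundaries beyond the first two are an assumption: the slides say "up to 20, 20 to 40, etc.", and this sketch continues in steps of 20 with an open top class to get k = 5. Because the code does not round z to −0.96, its first-class probability comes out near 0.169 (about 8.47 households) rather than the z-table value of 0.1685.

```python
from statistics import NormalDist

mu, sigma, n = 63.8, 45.78, 50
income = NormalDist(mu, sigma)  # hypothesized distribution, in $1000s

# ASSUMED class boundaries: below 20, 20-40, 40-60, 60-80, 80 and above.
edges = [20, 40, 60, 80]
cdf = [0.0] + [income.cdf(x) for x in edges] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

for p, e in zip(probs, expected):
    print(f"{p:.4f}  {e:.2f}")
```

The probabilities telescope to 1, so the expected counts always add back up to n = 50.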

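Calculations for the Test LO 12.3 • df = k – 1 – 2: one degree of freedom is lost because the expected frequencies must sum to n, and two more are lost because the two parameters of the normal distribution, µ and σ, are estimated from the sample. • With k = 5, df = 2 and the critical value is 5.991.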
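Both numbers on this slide can be checked in a few lines. For df = 2 the right tail is exp(–c/2), so the critical value solves exp(–c/2) = α, giving c = –2 ln α. A Python sketch with hypothetical helper names; the test statistic 8.12 is the value reported when the test is concluded.

```python
import math

def gof_normal_df(k, estimated_params=2):
    """df for a chi-square normality test: k - 1 minus the number of estimated parameters."""
    return k - 1 - estimated_params

def chi2_critical_df2(alpha):
    """Upper-tail critical value for df = 2: solving exp(-c/2) = alpha gives c = -2 ln(alpha)."""
    return -2 * math.log(alpha)

df = gof_normal_df(5)
crit = chi2_critical_df2(0.05)
print(df, round(crit, 3))  # 2 5.991
print(8.12 > crit)         # True: reject H0
```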

Concluding the Test LO 12.3 • Since the value of the test statistic, 8.12, exceeds our critical value of 5.991, we reject the null hypothesis. • We conclude that the data do not come from a normal distribution with mean 63.8 and standard deviation 45.78. • A criticism of this method is that we first have to group the raw data into a set of arbitrary classes. • The result might be different if we had grouped the data differently.

12.4 The Jarque-Bera Test for Normality LO 12.4 Perform the Jarque-Bera test for normality. • An alternative to the goodness-of-fit test for normality is one developed by Jarque and Bera. • A normal distribution is symmetric (zero skewness) and has a fixed degree of peakedness relative to its spread (zero excess kurtosis). • The Jarque-Bera test measures how far the sample skewness and kurtosis depart from these values, without grouping the data into classes.

Skewness and Kurtosis LO 12.4 • Skewness is a measure of a distribution’s lack of symmetry; S = 0 for any normal distribution. • Kurtosis is a measure of peakedness; here K denotes excess kurtosis (kurtosis minus 3), so K = 0 for any normal distribution. • We can obtain S and K from Excel’s SKEW and KURT functions and use them to compute the test statistic.
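For readers working outside Excel, here is a Python sketch of the small-sample-adjusted skewness and excess-kurtosis formulas that, to our understanding, Excel documents for SKEW and KURT. The function names are hypothetical, and the five-point data set is a hypothetical, perfectly symmetric example.

```python
from statistics import mean, stdev

def excel_skew(xs):
    """Sample skewness with the adjustment n / ((n-1)(n-2)), as in Excel's SKEW."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def excel_kurt(xs):
    """Sample excess kurtosis (0 for a normal), with the adjustment used by Excel's KURT."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    a = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a * sum(((x - m) / s) ** 4 for x in xs) - b

data = [1, 2, 3, 4, 5]  # hypothetical, perfectly symmetric data
print(abs(excel_skew(data)) < 1e-12)  # True: no skew
print(round(excel_kurt(data), 4))     # -1.2: flatter than a normal
```

Both formulas need at least four observations (KURT divides by n – 3).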

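Hypotheses and Test Statistic LO 12.4 • H0: S = 0 and K = 0 (the data are normally distributed) HA: S ≠ 0 or K ≠ 0 (the data are not normally distributed) • The test statistic is $JB = \frac{n}{6}\left(S^2 + \frac{K^2}{4}\right)$, which under H0 approximately follows a χ² distribution with df = 2.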
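A Python sketch of the Jarque-Bera statistic, assuming the standard form JB = (n/6)(S² + K²/4) with K as excess kurtosis. Because the reference distribution has df = 2, the p-value is simply exp(–JB/2). The function name and the summary statistics below are hypothetical; they are not the data of Example 12.3.

```python
import math

def jarque_bera(n, skew, excess_kurt):
    """Jarque-Bera statistic JB = (n/6)(S^2 + K^2/4) and its chi-square(2) p-value.

    K is excess kurtosis (0 for a normal), as returned by Excel's KURT.
    Under H0 (S = 0 and K = 0), JB is approximately chi-square with df = 2,
    whose right tail is exp(-x/2).
    """
    jb = n / 6 * (skew ** 2 + excess_kurt ** 2 / 4)
    return jb, math.exp(-jb / 2)

# Hypothetical summary statistics: n = 60, S = 1.0, K = 2.0.
jb, p = jarque_bera(60, 1.0, 2.0)
print(jb, f"{p:.2e}")  # 20.0 4.54e-05
```

Large |S| or |K| both inflate JB, so the test rejects normality when either skewness or excess kurtosis is far from zero.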

Example 12.3 LO 12.4
