Applied Quantitative Methods MBA course Montenegro

Applied Quantitative Methods MBA course Montenegro. Peter Balogh PhD, baloghp@agr.unideb.hu. 13. Non-parametric tests.

Presentation Transcript


  1. Applied Quantitative Methods MBA course Montenegro Peter Balogh PhD baloghp@agr.unideb.hu

  2. Non-parametric tests

  3. 13. Non-parametric tests • When looking at hypothesis testing in the previous chapter we were concerned with specific parameters, which represent statements about the population, and these were then tested using statistics derived from the sample. • Such parametric tests are extremely important in the development of statistical theory and for testing many sample results, but they do not cover all types of data, particularly when parameters cannot be calculated in a meaningful way. • We therefore need other tests that can deal with such situations; a small range of these non-parametric tests is covered in this chapter.

  4. 13. Non-parametric tests • Parametric tests require the following conditions to be satisfied: • A null hypothesis can be stated in terms of parameters. • A level of measurement has been achieved that makes differences between values meaningful. • The test statistic follows a known distribution. • It is not always possible to define a meaningful parameter to represent the aspect of the population in which we are interested. • For instance, what is an average eye-colour? • Equally, it is not always possible to give meaning to differences in values, for instance if brands of soft drink are ranked in terms of value for money or taste.

  5. 13. Non-parametric tests • Where these conditions cannot be met, non-parametric tests may be appropriate, but note that in some circumstances there may be no suitable test. • As with all tests of hypothesis, remember that even when a test result is statistically significant, it may have no importance in practice. • A non-parametric test is still a hypothesis test, but rather than considering a single parameter, it looks at the overall distribution of the sample data and compares this with a known or expected distribution, usually based on the null hypothesis.

  6. 13. Non-parametric tests

  7. 13.1 Chi-squared tests • The format of this test is similar to the parametric tests already considered (see Chapter 12). • As before, we will define hypotheses, calculate a test statistic, and compare this to a value from tables in order to decide whether or not to reject the null hypothesis. • As the name may suggest, the statistic involves squaring values, and thus the result can never be negative. • This is by far the most widely used non-parametric hypothesis test and is almost invariably used in the early stages of the analysis of questionnaire results. • As statistical programs become easier to use, more people are able to conduct this test, which in the past took a long time to construct and calculate.

  8. 13.1 Chi-squared tests The shape of the chi-squared distribution is determined by the number of degrees of freedom (cf. the t-distribution used in the previous chapter). In general, for relatively low degrees of freedom, the distribution is skewed to the right (positively skewed), with a long tail towards high values, as shown in Figure 13.2. As the number of degrees of freedom approaches infinity, the shape of the distribution approaches a Normal distribution.
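The direction of the skew can be checked from standard results for the chi-squared distribution with ν degrees of freedom: mean ν, variance 2ν and skewness √(8/ν). The skewness is always positive (a long right-hand tail) and shrinks towards zero as ν grows, which is the approach to the Normal shape. A minimal Python sketch; the function name `chi2_moments` is purely illustrative:

```python
import math

def chi2_moments(df):
    """Mean, variance and skewness of a chi-squared distribution with df degrees of freedom."""
    mean = df
    variance = 2 * df
    skewness = math.sqrt(8 / df)  # always positive: the long tail is on the right
    return mean, variance, skewness

for df in (1, 4, 9, 30, 100):
    mean, var, skew = chi2_moments(df)
    print(f"df={df:>3}: mean={mean}, variance={var}, skewness={skew:.3f}")
```

The printed skewness falls from about 2.83 at one degree of freedom to about 0.28 at one hundred.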

  9. 13.1 Chi-squared tests

  10. 13.1 Chi-squared tests We shall look at two particular applications of the chi-squared (χ2) test. The first considers survey data, usually from questionnaires, and tries to find whether there is an association between the answers given to a pair of questions. The second uses a chi-squared test to check whether a particular set of data follows a known statistical distribution.

  11. 13.1.1 Tests of association Case study: In the Arbour Housing Survey we might be interested in how the responses to question 4 (type of property) and question 10 (use of the local post office) relate. Looking directly at each question would give us the following:

  12. 13.1.1 Tests of association

  13. 13.1.1 Tests of association • In addition to looking at one variable at a time, we can construct tables to show how the answers to one question relate to the answers to another; these are commonly referred to as cross-tabulations. • The single tabulations tell us that 150 respondents (or 50%) live in a house and that 40 respondents (or 13.3%) use their local post office 'once a month'. • They do not tell us how often people who live in a house use the local post office, or whether their pattern of usage is different from that of people who live in a flat. • To begin to answer these questions we need to complete the table (see Table 13.1).

  14. 13.1.1 Tests of association

  15. 13.1.1 Tests of association • It would be an extremely boring and time-consuming job to manually fill a table like Table 13.1, even with a small sample such as this. • In fact, in some cases we might want to cross-tabulate three or more questions. • However, most statistical packages will produce this type of table very quickly. • For relatively small sets of data you could use Excel, but for larger scale surveys it would be advantageous to use a more specialist package such as SPSS (the Statistical Package for the Social Sciences). • With this type of program it can take rather longer to prepare and enter the data, but the range of analysis and the flexibility offered make this well worthwhile.
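As a rough illustration of what a package does when it cross-tabulates two questions, the sketch below counts answer pairs with Python's standard `collections.Counter`. The function `cross_tabulate` and the five responses are hypothetical, not the Arbour survey data:

```python
from collections import Counter

def cross_tabulate(answers_a, answers_b):
    """Count how often each pair of answers (a, b) occurs across respondents."""
    if len(answers_a) != len(answers_b):
        raise ValueError("both questions need one answer per respondent")
    counts = Counter(zip(answers_a, answers_b))
    rows = sorted(set(answers_a))
    cols = sorted(set(answers_b))
    # Counter returns 0 for answer pairs that never occur
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

# Hypothetical answers from five respondents (one list per question)
property_type = ["House", "Flat", "House", "Bedsit", "Flat"]
post_office = ["Once a month", "Once a week", "Once a month", "Once a week", "Once a month"]

rows, cols, table = cross_tabulate(property_type, post_office)
print(cols)
for r, line in zip(rows, table):
    print(r, line)
```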

  16. 13.1.1 Tests of association The cross-tabulation of questions 4 and 10 will produce the type of table shown as Table 13.2.

  17. 13.1.1 Tests of association We are now in a better position to relate the two answers, but, because different numbers of people live in each of the types of accommodation, it is not immediately obvious if different behaviours are associated with their type of residence. A chi-squared test will allow us to find if there is a statistical association between the two sets of answers; and this, together with other information, may allow the development of a proposition that there is a causal link between the two.

  18. 13.1.1 Tests of association • To carry out the test we will follow the seven steps used in Chapter 12. • Step 1 State the hypotheses. • H0: There is no association between the two sets of answers. • H1: There is an association between the two sets of answers. • Step 2 State the significance level. • As with a parametric test, the significance level can be set at various values, but for most business data it is usually 5%. • Step 3 State the critical value. • The chi-squared distribution varies in shape with the number of degrees of freedom (in a similar way to the t-distribution), and thus we need to find this value before we can look up the appropriate critical value.

  19. 13.1.1 Tests of association: Degrees of freedom Consider Table 13.1. There are four rows and four columns, giving a total of 16 cells. Each of the row and column totals is fixed (i.e. these are the actual numbers given by the frequency count for each question), and thus the individual cell values must add up to the appropriate totals. In the first row, we have freedom to put any numbers into three of the cells, but the fourth is then fixed because all four must add to the (fixed) total (i.e. three degrees of freedom).

  20. 13.1.1 Tests of association: Degrees of freedom • The same will apply to the second row (i.e. three more degrees of freedom). • And again to the third row (three more degrees of freedom). • Now all of the values on the fourth row are fixed by the column totals (zero degrees of freedom). • Totaling these, we have 3 + 3 + 3 + 0 = 9 degrees of freedom for this table. • This is illustrated in Table 13.3. • As you can see, you can choose any three cells on the first row, not necessarily the first three.

  21. 13.1.1 Tests of association: Degrees of freedom

  22. 13.1.1 Tests of association: Degrees of freedom There is a short cut! If you take the number of rows minus one and multiply by the number of columns minus one, you get the number of degrees of freedom: $\nu = (r - 1) \times (c - 1)$, here $(4 - 1) \times (4 - 1) = 9$. Using the tables in Appendix E, we can now state the critical value at the 5% significance level as 16.9.
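If printed tables are not to hand, the critical value can also be computed. The sketch below assumes only the Python standard library: it evaluates the chi-squared upper-tail probability through the series expansion of the regularised lower incomplete gamma function, then finds by bisection the value that cuts off a 5% upper tail. The helper names are hypothetical; in practice a statistics library would normally be used instead.

```python
import math

def degrees_of_freedom(rows, cols):
    """nu = (r - 1) x (c - 1) for an r-by-c cross-tabulation."""
    return (rows - 1) * (cols - 1)

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, z)
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_critical(df, alpha=0.05):
    """The value x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too heavy: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0

nu = degrees_of_freedom(4, 4)
print(nu, round(chi2_critical(nu), 2))
```

For ν = 9 this gives about 16.92, in line with the 16.9 read from tables; for ν = 4 and ν = 2 it gives about 9.49 and 5.99.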

  23. 13.1.1 Tests of association: Degrees of freedom • Step 4 Calculate the test statistic. • The chi-squared statistic is given by the following formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$ • where O is the observed cell frequencies (the actual answers) and E is the expected cell frequencies (if the null hypothesis is true). • Finding the expected cell frequencies takes us back to some simple probability rules, since the null hypothesis assumes that the two sets of answers are independent of each other. If this is true, then the expected cell frequencies will depend only on the totals of each row and column.
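Step 4 translates directly into code. A minimal Python sketch, assuming the observed and expected frequencies are supplied as lists of rows of the same shape; the function name is illustrative. The single-cell call uses the first cell of the housing example, where 30 respondents were observed against an expected 150 x 40 / 300 = 20.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two tables with the same shape."""
    if len(observed) != len(expected):
        raise ValueError("tables must have the same number of rows")
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        if len(obs_row) != len(exp_row):
            raise ValueError("tables must have the same number of columns")
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

# Contribution of the first cell alone: (30 - 20)^2 / 20
print(chi_squared_statistic([[30]], [[20]]))  # 5.0
```

If every observed frequency equals its expected frequency the statistic is exactly zero.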

  24. 13.1.1 Tests of association: Calculating the expected values Consider the first cell of the table (i.e. the first row and the first column). The number of people living in houses is 150 out of a total of 300, and thus the probability of someone living in a house is: $P(\text{house}) = 150/300 = 0.5$. The probability of 'once a month' is: $P(\text{once a month}) = 40/300 = 0.13333$.

  25. 13.1.1 Tests of association: Calculating the expected values Thus, if the two answers are independent, the probability of living in a house and using the post office 'once a month' is: 0.5 x 0.13333 = 0.066665. Since there are 300 people in the sample, one would expect 0.066665 x 300 ≈ 20 people (exactly 20 without rounding) to fit the category of the first cell. (Note that the observed value was 30.)

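  26. 13.1.1 Tests of association: Calculating the expected values Again there is a short cut! Look at the way in which we found the expected value: one factor of the sample size cancels, leaving $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$, so for the first cell E = 150 x 40 / 300 = 20.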
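The short cut gives every expected frequency as row total x column total / grand total. A minimal Python sketch; the function name is illustrative and the 2 x 2 table is hypothetical, chosen only so that the arithmetic is easy to check by hand:

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed table (rows and columns are the answers to two questions)
observed = [[30, 10],
            [20, 40]]
print(expected_frequencies(observed))  # [[20.0, 20.0], [30.0, 30.0]]
```

The expected table keeps the same row and column totals as the observed one, which is why only (r − 1) × (c − 1) cells need to be calculated; the rest follow by subtraction.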

  27. 13.1.1 Tests of association: Calculating the expected values We need to complete this process for the other cells in the table, but remember that, because of the degrees of freedom, you only need to calculate nine of them, the rest being found by subtraction. Statistical packages will, of course, find these expected cell frequencies very quickly. The expected cell frequencies are shown in Table 13.4.

  28. 13.1.1 Tests of association: Calculating the expected values

  29. 13.1.1 Tests of association: Calculating the expected values If we were to continue with the chi-squared test by hand calculation we would need to produce the type of table shown as Table 13.5. Step 5 Compare the calculated value and the critical value. The calculated χ2 value of 140.875 > 16.9.

  30. 13.1.1 Tests of association: Calculating the expected values • Step 6 Come to a conclusion. • We already know that chi-squared cannot be below zero. • If all of the expected cell frequencies were exactly equal to the observed cell frequencies, then the value of chi-squared would be zero. • Any differences between the observed and expected cell frequencies may be due to either sampling error or to an association between the answers; the larger the differences, the more likely it is that there is an association. • Thus, if the calculated value is below the critical value, we will be unable to reject the null hypothesis, but if it is above the critical value, we reject the null hypothesis. • In this example, the calculated value is above the critical value, and thus we reject the null hypothesis.

  31. 13.1.1 Tests of association: Calculating the expected values • Step 7 Put the conclusion into English. • There appears to be an association between the type of property people are living in and the frequency of using the local post office. • We need now to examine whether such an association is meaningful within the problem context and the extent to which the association can be explained by other factors. • The chi-squared test is only telling you that the association (a word we use in this context in preference to relationship) is likely to exist but not what it is.
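Steps 5 and 6 reduce to a single comparison. A minimal Python sketch (the function name is illustrative), applied to the calculated and critical values quoted in this chapter's worked examples:

```python
def decide(statistic, critical_value):
    """Reject H0 when the calculated chi-squared exceeds the critical value."""
    if statistic > critical_value:
        return "reject H0"
    return "cannot reject H0"

# (calculated value, 5% critical value) pairs from the worked examples
examples = {
    "property type vs post office use (4 x 4)": (140.875, 16.9),
    "same, after combining categories (3 x 3)": (107.6, 9.49),
    "gender vs lager strength (2 x 3)": (3.956, 5.99),
}
for name, (stat, crit) in examples.items():
    print(f"{name}: {stat} vs {crit} -> {decide(stat, crit)}")
```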

  32. 13.1.1 Tests of association: Calculating the expected values

  33. 13.1.1 Tests of association: An adjustment • In fact, although the basic methodology of the test is correct, there is a problem. • One of the basic conditions for the chi-squared test is that all of the expected frequencies must be at least five. • This is not true for our example! • To meet this condition, we need to combine adjacent categories until their expected frequencies are five or more. • To do this, we will combine the two categories 'Bedsit' and 'Other' to represent all non-house or flat dwellers; it will also be necessary to combine 'Twice a week' with 'More often' to represent anything above once a week.

  34. 13.1.1 Tests of association: An adjustment • The new three-by-three cross-tabulation is shown as Table 13.6. • The number of degrees of freedom now becomes (3 - 1) x (3 - 1) = 4, and the critical value (from tables) is 9.49. • Re-computing the value of chi-squared from Step 4, we have a value of approximately 107.6, which is still substantially above the critical value of chi-squared. • However, in other examples, the amalgamation of categories may affect the decision. • In practice, one of the problems of meeting this condition is deciding which categories to combine, and deciding what, if anything, the new category represents. • (One of the most difficult examples is the need to combine ethnic groups in a sample.)
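The check on expected frequencies and the combining of categories can also be scripted. A minimal Python sketch, assuming tables are lists of rows; the helper names and the 2 x 3 table are hypothetical (they are not Table 13.6). Deciding which categories may sensibly be merged remains the analyst's judgement.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def small_cells(observed, minimum=5):
    """(row, column, expected) for every cell whose expected count is below `minimum`."""
    expected = expected_frequencies(observed)
    return [(i, j, e) for i, row in enumerate(expected)
            for j, e in enumerate(row) if e < minimum]

def merge_columns(observed, keep, drop):
    """Add column `drop` into column `keep`, then delete column `drop`."""
    merged = []
    for row in observed:
        new_row = list(row)
        new_row[keep] += new_row[drop]
        del new_row[drop]
        merged.append(new_row)
    return merged

def merge_rows(observed, keep, drop):
    """Add row `drop` into row `keep`, then delete row `drop`."""
    merged = [list(row) for row in observed]
    merged[keep] = [a + b for a, b in zip(merged[keep], merged[drop])]
    del merged[drop]
    return merged

# Hypothetical table whose last column is too sparse
observed = [[20, 15, 3],
            [18, 22, 2]]
print(small_cells(observed))  # both cells in the last column fall below 5
combined = merge_columns(observed, keep=1, drop=2)
print(combined, small_cells(combined))
```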

  35. 13.1.1 Tests of association: An adjustment This has been a particularly long example since we have been explaining each step as we have gone along. Performing the tests is much quicker in practice, even if a computer package is not used.

  36. 13.1.1 Tests of association: An adjustment Table 13.6

  37. 13.1.1 Tests of association: An adjustment Example Purchases of different strengths of lager are thought to be associated with the gender of the drinker and a brewery has commissioned a survey to find if this is true. Summary results are shown below.

  38. 13.1.1 Tests of association: An adjustment 1. H0: No association between gender and strength bought. H1: An association between the two. 2. Significance level is 5%. 3. Degrees of freedom = (2 - 1) x (3 - 1) = 2. Critical value = 5.99. 4. Find totals:

  39. 13.1.1 Tests of association: An adjustment

  40. 13.1.1 Tests of association: An adjustment Expected frequency for Male and High Strength is

  41. 13.1.1 Tests of association: An adjustment

  42. 13.1.1 Tests of association: An adjustment Chi-squared = 3.956 5. 3.956 < 5.99. 6. Therefore we cannot reject the null hypothesis. 7. There appears to be no association between the gender of the drinker and the strength of lager purchased at the 5% level of significance.

  43. 13.1.2 Tests of goodness-of-fit If the data has been collected and seems to follow some pattern, it would be useful to identify that pattern and to determine whether it follows some (already) known statistical distribution. If this is the case, then many more conclusions can be drawn about the data. (We have seen a selection of statistical distributions in Chapters 9 and 10.)

  44. 13.1.2 Tests of goodness-of-fit • The chi-squared test provides a suitable method for deciding if the data follows a particular distribution, since we have the observed values and the expected values can be calculated from tables (or by simple arithmetic). • For example, do the sales of whisky follow a Poisson distribution? • If the answer is 'yes', then sales forecasting might become a much easier process. • Again, we will work our way through examples to clarify the various steps taken in conducting goodness-of-fit tests. • The statistic used will remain as:

  45. 13.1.2 Tests of goodness-of-fit • $\chi^2 = \sum \frac{(O - E)^2}{E}$ • where O is the observed frequencies and E is the expected (theoretical) frequencies.

  46. 13.1.2 Tests of goodness-of-fit: Test for a uniform distribution You may recall the uniform distribution from Chapter 9; it implies that each item or value occurs the same number of times. Such a test would be useful where we want to find if several individuals are all working at the same rate, or if sales of various 'reps' are the same. Suppose we are examining the number of tasks completed in a set time by five machine operators and have available the data shown in Table 13.7.

  47. 13.1.2 Tests of goodness-of-fit: Test for a uniform distribution

  48. 13.1.2 Tests of goodness-of-fit: Test for a uniform distribution • We can again follow the seven steps: 1. State the hypotheses: • H0: All operators complete the same number of tasks. • H1: Not all operators complete the same number of tasks. • (Note that the null hypothesis is just another way of saying that the data follows a uniform distribution.) 2. The significance level will be taken as 5%. 3. The degrees of freedom will be the number of cells minus the number of parameters required to calculate the expected frequencies minus one. • Here no parameters are needed, so $\nu = 5 - 0 - 1 = 4$. • Therefore (from tables) the critical value is 9.49.

  49. 13.1.2 Tests of goodness-of-fit: Test for a uniform distribution 4. Since the null hypothesis proposes a uniform distribution, we would expect all of the operators to complete the same number of tasks in the allotted time. This number is the total number of tasks completed divided by the number of operators (five):

  50. 13.1.2 Tests of goodness-of-fit: Test for a uniform distribution
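For the uniform case every expected frequency is the same, namely the total divided by the number of categories, and the degrees of freedom follow the rule in step 3 (cells minus estimated parameters minus one). A minimal Python sketch; the function name is illustrative and the five task counts are hypothetical, not the figures in Table 13.7:

```python
def uniform_goodness_of_fit(observed, estimated_parameters=0):
    """Chi-squared statistic and degrees of freedom for H0: every category is equally likely."""
    k = len(observed)
    expected = sum(observed) / k  # identical expected count for every cell
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = k - estimated_parameters - 1
    return expected, statistic, df

# Hypothetical numbers of tasks completed by five operators
tasks = [30, 25, 35, 28, 32]
expected, statistic, df = uniform_goodness_of_fit(tasks)
print(expected, round(statistic, 3), df)  # 30.0 1.933 4
```

With 4 degrees of freedom the 5% critical value is 9.49, so counts like these would not lead to rejecting H0.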