
Nonparametric Statistical Methods

Nonparametric Statistical Methods. Definition. When the data are generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a non-parametric model.

shammond


Presentation Transcript


  1. Nonparametric Statistical Methods

  2. Definition: When the data are generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a non-parametric model. Statistical techniques that assume a non-parametric model are called non-parametric.

  3. For example: If you assume that your data have come from a normal distribution with mean μ and standard deviation σ (both unknown), then the data are generated from a process (model) that is known except for two parameters (μ and σ). The model is called a parametric model. Models that do not assume normality (or some other distribution with a finite number of parameters) are non-parametric.

  4. We will consider two nonparametric tests: • The sign test • Wilcoxon’s signed rank test. These are tests for the central location of a population. They are alternatives to the z-test and the t-test for the mean of a normal population.

  5. Nonparametric Statistical Methods

  6. Single sample nonparametric tests for central location: • The sign test • Wilcoxon’s signed rank test. These are tests for the central location of a population. They are alternatives to the z-test and the t-test for the mean of a normal population.

  7. Both the z-test and the t-test assume the data come from a normal population. If the data do not come from a normal population, properties of the z-test and the t-test that require this assumption will no longer hold. In particular, the probability of a type I error may differ from the desired value (0.05 or 0.01).

  8. Single sample nonparametric tests: If the data do not come from a normal population we should use one of the two nonparametric tests: • The sign test • Wilcoxon’s signed rank test. These tests do not assume the data come from a normal population.

  9. The sign test A nonparametric test for the central location of a distribution

  10. We want to test: H0: median = m0 against HA: median ≠ m0 (or against a one-sided alternative).

  11. The Sign test: • The test statistic: S = the number of observations that exceed m0. Comment: If H0: median = m0 is true we would expect 50% of the observations to be above m0 and 50% of the observations to be below m0.

  12. If H0 is true then S will have a binomial distribution with p = 0.50, n = sample size. [Figure: distribution with 50% of the area on each side of median = m0]

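  13. If H0 is not true then S will still have a binomial distribution. However, p will not be equal to 0.50. If m0 > median then p < 0.50. [Figure: distribution with m0 to the right of the median; p is the area above m0]

  14. If m0 < median then p > 0.50. [Figure: distribution with m0 to the left of the median; p is the area above m0] Here p = the probability that an observation is greater than m0.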

  15. Summarizing: If H0 is true then S will have a binomial distribution with p = 0.50, n = sample size. [Figure: binomial distribution for n = 10]

  16. The critical and acceptance region (n = 10): Choose the critical region so that α is close to 0.05 or 0.01. E.g., if the critical region is {0, 1, 9, 10} then α = .0010 + .0098 + .0098 + .0010 = .0216.

  17. E.g., if the critical region is {0, 1, 2, 8, 9, 10} then α = .0010 + .0098 + .0439 + .0439 + .0098 + .0010 = .1094 (n = 10).
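The significance levels for the two critical regions above can be checked directly rather than read from a table. The following is a minimal sketch; the helper names `binom_pmf` and `significance_level` are illustrative and not part of the presentation.

```python
from math import comb

def binom_pmf(i, n):
    """P[S = i] when S ~ Binomial(n, 1/2), i.e. under H0."""
    return comb(n, i) / 2 ** n

def significance_level(critical_region, n):
    """Probability under H0 that S lands in the critical region."""
    return sum(binom_pmf(i, n) for i in critical_region)

alpha_narrow = significance_level({0, 1, 9, 10}, 10)       # 22/1024
alpha_wide = significance_level({0, 1, 2, 8, 9, 10}, 10)   # 112/1024
print(alpha_narrow, alpha_wide)
```

The exact value for the first region is 22/1024, about 0.0215; the slide's .0216 comes from adding table entries that were already rounded to four decimals.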

  18. Example Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

  19. The data

  20. Suppose we want to test H0: the drug is not effective (median reduction ≤ 0) against HA: the drug is effective (median reduction > 0). The Sign test: S = the no. of positive obs.

  21. The Sign test: The test statistic is S = the no. of positive obs = 8. We will use the p-value approach: p-value = P[S ≥ 8] = 0.0439 + 0.0098 + 0.0010 = 0.0547. Since p-value > 0.05 we cannot reject H0.
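As a check on the arithmetic, the upper-tail probability P[S ≥ 8] for n = 10 and p = 1/2 can be computed exactly. This is a small sketch; the function name `upper_tail` is an illustrative choice.

```python
from math import comb

def upper_tail(s_observed, n):
    """P[S >= s_observed] for S ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(s_observed, n + 1)) / 2 ** n

p_value = upper_tail(8, 10)   # (45 + 10 + 1) / 1024
reject_at_5_percent = p_value < 0.05
print(p_value, reject_at_5_percent)
```

The exact p-value is 56/1024, which agrees with the tabled 0.0547 to four decimals and is just above 0.05, so H0 is not rejected at the 5% level.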

  22. Summarizing: To carry out the Sign test we • Compute S = the number of observations greater than m0. • Let s_observed = the observed value of S. • Compute the p-value = P[S ≥ s_observed] (2 P[S ≥ s_observed] for a two-tailed test). Use the table for the binomial distribution (p = ½, n = sample size). • Conclude HA (reject H0) if the p-value is less than 0.05 (or 0.01).
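The steps summarized above can be sketched as one function. Several things here are assumptions rather than content from the slides: the name `sign_test`, the invented sample `reductions` (the slide's data table is not reproduced in this transcript), and the two-tailed rule, which doubles the smaller tail and caps the result at 1 so that it also behaves sensibly when S falls below n/2. Observations equal to m0 are simply not counted as exceeding it.

```python
from math import comb

def sign_test(data, m0, two_tailed=False):
    """Return (s_observed, p_value) for the sign test of H0: median = m0."""
    n = len(data)
    # S = number of observations greater than m0
    s_observed = sum(1 for x in data if x > m0)
    upper = sum(comb(n, i) for i in range(s_observed, n + 1)) / 2 ** n
    if not two_tailed:
        return s_observed, upper
    # Assumption: double the smaller of the two tails, capped at 1
    lower = sum(comb(n, i) for i in range(0, s_observed + 1)) / 2 ** n
    return s_observed, min(1.0, 2 * min(upper, lower))

# Invented values for illustration only (not the presentation's data)
reductions = [12, -3, 5, 8, 20, 15, 7, -6, 9, 4]
s, p_one_tailed = sign_test(reductions, 0)
_, p_two_tailed = sign_test(reductions, 0, two_tailed=True)
print(s, p_one_tailed, p_two_tailed)
```

With eight positive values out of ten, the one-tailed p-value is the same 56/1024 as in the worked example, and the two-tailed value is twice that.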

  23. Sign Test for Large Samples

  24. If n is large we can use the Normal approximation to the Binomial. Namely, S has a Binomial distribution with p = ½ and n = sample size. Hence for large n, S has approximately a Normal distribution with mean n/2 and standard deviation √n / 2.

  25. Hence for large n, use z = (S – n/2) / (√n / 2) as the test statistic (in place of S). Choose the critical region for z from the Standard Normal distribution, i.e. reject H0 if z < –zα/2 or z > zα/2 (two-tailed; a one-tailed test can also be set up).
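A Binomial(n, ½) variable has mean n/2 and standard deviation √n / 2, so the large-sample statistic is z = (S – n/2) / (√n / 2). The sketch below computes it and applies the two-tailed rule; the counts (60 out of 100) are hypothetical, and `sign_test_z` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_z(s_observed, n):
    """Standardize S using the Binomial(n, 1/2) mean and standard deviation."""
    return (s_observed - n / 2) / (sqrt(n) / 2)

alpha = 0.05
z = sign_test_z(60, 100)                        # hypothetical: 60 of 100 exceed m0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}, about 1.96
reject_h0 = abs(z) > z_crit                     # two-tailed rule
print(z, z_crit, reject_h0)
```

No continuity correction is applied here, matching the statistic as stated; adding one is a common refinement for moderate n.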

  26. Nonparametric Confidence Intervals

  27. Assume that the data x1, x2, x3, … xn are a sample from an unknown distribution. Now arrange the data in increasing order: x(1) < x(2) < x(3) < … < x(n). Hence x(1) = the smallest observation, x(2) = the 2nd smallest observation, …, x(n) = the largest observation.

  28. Consider the kth smallest observation, x(k), and the kth largest observation, x(n – k + 1), in the data x1, x2, x3, … xn. If at least k observations lie below the median then x(k) < median. If at least k observations lie above the median then median < x(n – k + 1). Hence P[x(k) < median < x(n – k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median].

  29. Thus P[x(k) < median < x(n – k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median] = P[the number of observations below the median is at least k and at most n – k] = P[k ≤ S ≤ n – k], where S = the number of observations below the median. S has a binomial distribution with n = the sample size and p = 1/2.

  30. Hence P[x(k) < median < x(n – k + 1)] = P[k ≤ S ≤ n – k] = p(k) + p(k + 1) + … + p(n – k) = P, where the p(i)’s are binomial probabilities with n = the sample size and p = 1/2. This means that x(k) to x(n – k + 1) is a 100P% confidence interval for the median.

  31. Summarizing: x(k) to x(n – k + 1) is a 100P% confidence interval for the median, where P = p(k) + p(k + 1) + … + p(n – k) and the p(i)’s are binomial probabilities with n = the sample size and p = 1/2.

  32. Example: n = 10 and k = 2. From the binomial probabilities, P = p(2) + p(3) + p(4) + p(5) + p(6) + p(7) + p(8) = .9784. Hence x(2) to x(9) is a 97.84% confidence interval for the median.
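The confidence level P = p(k) + … + p(n – k) is a plain binomial sum and is easy to evaluate exactly. A minimal sketch, with `coverage` as an illustrative name:

```python
from math import comb

def coverage(n, k):
    """P[k <= S <= n - k] for S ~ Binomial(n, 1/2): the confidence level
    of the interval from x(k) to x(n - k + 1)."""
    return sum(comb(n, i) for i in range(k, n - k + 1)) / 2 ** n

level = coverage(10, 2)   # 1002 / 1024
print(level)
```

The exact value is 1002/1024, roughly 0.9785; the .9784 quoted in the example reflects rounding in the tabled probabilities.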

  33. Example Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

  34. The data

  35. The data arranged in order: x(2) = -3 to x(9) = 15 is a 97.84% confidence interval for the median.

  36. Example: Suppose we repeat the study from the previous example with n = 20 patients with high cholesterol.

  37. The data

  38. The binomial distribution with n = 20, p = 0.5. Note: p(6) + p(7) + p(8) + p(9) + p(10) + p(11) + p(12) + p(13) + p(14) = 0.037 + 0.0739 + 0.1201 + 0.1602 + 0.1762 + 0.1602 + 0.1201 + 0.0739 + 0.037 = 0.9586. Hence x(6) to x(15) is a 95.86% confidence interval for the median reduction in cholesterol.
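The choice k = 6 for n = 20 can be reproduced by searching for the largest k whose interval still has at least the desired confidence level. The sketch below does that and then reads the interval off the sorted data; `choose_k` and `median_ci` are illustrative names, and the sample `list(range(1, 21))` is a synthetic placeholder, not the study data.

```python
from math import comb

def coverage(n, k):
    """P[k <= S <= n - k] for S ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n - k + 1)) / 2 ** n

def choose_k(n, level=0.95):
    """Largest k with coverage at least `level`, or None if no k qualifies."""
    for k in range(n // 2, 0, -1):
        if coverage(n, k) >= level:
            return k
    return None

def median_ci(data, level=0.95):
    """Interval (x(k), x(n - k + 1)) and its exact confidence level."""
    n = len(data)
    k = choose_k(n, level)
    if k is None:
        raise ValueError("sample too small for the requested level")
    ordered = sorted(data)
    return ordered[k - 1], ordered[n - k], coverage(n, k)

placeholder = list(range(1, 21))   # synthetic values standing in for 20 observations
print(choose_k(20), median_ci(placeholder))
```

Searching from the largest k downward gives the narrowest interval that still meets the level, since coverage only grows as k shrinks.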

  39. The data arranged in order: x(6) = -1 to x(15) = 9 is a 95.86% confidence interval for the median.

  40. For large values of n one can use the normal approximation to the Binomial to find the value of k so that x(k) to x(n – k + 1) is a 95% confidence interval for the median, i.e. we want to find k so that P[k ≤ S ≤ n – k] ≈ 0.95.
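One way to carry this out is sketched below. The formula is an assumption, since the slide's own expression is not preserved in the transcript: S is treated as approximately Normal with mean n/2 and standard deviation √n / 2, a continuity correction of 0.5 is applied, and the result is rounded down, giving k = floor(n/2 + 0.5 – zα/2 · √n / 2). The name `approx_k` is illustrative.

```python
from math import floor, sqrt
from statistics import NormalDist

def approx_k(n, level=0.95):
    """Approximate k so that x(k) to x(n - k + 1) covers the median with
    probability close to `level` (normal approximation, continuity-corrected)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    return floor(n / 2 + 0.5 - z * sqrt(n) / 2)

print(approx_k(20), approx_k(100))
```

For n = 20 this reproduces k = 6, the value obtained from the exact binomial probabilities; for n = 100 it suggests k = 40, i.e. the interval from x(40) to x(61).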

  41. Next we will consider: The Wilcoxon signed rank test. The Wilcoxon signed rank test is an alternative to the Sign test, a test for the central location of a single population.

  42. The sign test A nonparametric test for the central location of a distribution

  43. We want to test: H0: median = m0 against HA: median ≠ m0 (or against a one-sided alternative).

  44. The Sign test: • The test statistic: S = the number of observations that exceed m0 • Comment: If H0: median = m0 is true then • the distribution of S is binomial • n = sample size • p = 0.50

  45. To carry out the Sign test: • Compute the test statistic: S = the number of observations that exceed m0 = s_observed. • Compute the p-value of the test statistic s_observed: p-value = P[S ≥ s_observed] (= 2 P[S ≥ s_observed] for a 2-tailed test), where S is binomial, n = sample size, p = 0.50. • Reject H0 if the p-value is low (< 0.05).

  46. Non-parametric confidence intervals for the median of a population: x(k) to x(n – k + 1) is a (1 – α)100% = 100P% confidence interval for the median, where x(k) = kth smallest xi and x(n – k + 1) = kth largest xi, P = p(k) + p(k + 1) + … + p(n – k), and the p(i)’s are binomial probabilities with n = the sample size and p = 1/2.

  47. The Wilcoxon Signed Rank Test An Alternative to the sign test

  48. Situation • A sample of size n, (x1, x2, …, xn), from an unknown distribution, and we want to test H0: the centre of the distribution, m = m0, against HA: m ≠ m0.

  49. For the sign test we would count S, the number of positive values of (x1 – m0 , x2 – m0 , … , xn – m0). • We would reject H0 if S was not close to n/2
