Biostatistics

Biostatistics. Unit 6 – Confidence Intervals. Statistical inference.

albertreid

Presentation Transcript


  1. Biostatistics Unit 6 – Confidence Intervals

  2. Statistical inference Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population. Estimation involves using the sample data to calculate a statistic that estimates the corresponding parameter of the population from which the sample was drawn.

  3. Point estimate A point estimate is a single numerical value used to estimate the corresponding population parameter.

  4. Interval estimate An interval estimate consists of two numerical values defining an interval that, with a specified degree of confidence, we believe includes the parameter being estimated.

  5. Estimator An estimator is a rule or formula that tells how to compute the estimate. An estimator is unbiased if its expected value, taken over all possible samples, equals the population parameter it estimates.

  6. Table of unbiased estimators

  7. Sampled and target populations The sampled population is the population from which we actually draw the sample.  The target population is the population about which we wish to make an inference. (continued)

  8. Sampled and target populations These two populations may or may not be the same. When they are the same, it is possible to use statistical inference procedures to draw conclusions about the target population. If the sampled and target populations are different, conclusions can be made about the target population only on the basis of nonstatistical considerations.

  9. Random and nonrandom samples The strict validity of statistical procedures depends on the assumption of random samples.

  10. Confidence intervals to be studied
      A) Confidence Interval for a Population Mean
      B) Confidence Interval for the Difference of Two Population Means
      C) Confidence Interval for a Population Proportion
      D) Confidence Interval for the Difference of Two Population Proportions
      E) Confidence Interval for the Variance of a Normally Distributed Population
      F) Confidence Interval for the Ratio of Variances of Two Normally Distributed Populations

  11. A) Confidence interval for a population mean: Estimating the mean Estimating the mean of a normally distributed population entails drawing a sample of size $n$ and computing $\bar{x}$, which is used as a point estimate of $\mu$. It is more meaningful to estimate $\mu$ by an interval that communicates information regarding the probable magnitude of $\mu$.

  12. Sampling distributions and estimation Interval estimates are based on sampling distributions. When the sample mean is being used as an estimator of a population mean, and the population is normally distributed, the sample mean will be normally distributed with mean, $\mu_{\bar{x}}$, equal to the population mean, $\mu$, and variance of $\sigma_{\bar{x}}^2 = \sigma^2/n$.

  13. The 95% confidence interval Approximately 95% of the values of $\bar{x}$ making up the distribution will lie within 2 standard deviations of the mean. The interval is marked by the two points, $\mu - 2\sigma_{\bar{x}}$ and $\mu + 2\sigma_{\bar{x}}$, so that 95% of the values are in the interval, $\mu \pm 2\sigma_{\bar{x}}$.

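  14. The 95% confidence interval Since $\mu$, and therefore $\mu_{\bar{x}}$, is unknown, the location of the distribution is uncertain. We can use $\bar{x}$ as a point estimate of $\mu$. If intervals of the form $\bar{x} \pm 2\sigma_{\bar{x}}$ are constructed from repeated samples, about 95% of these intervals would contain $\mu$.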

  15. Example Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of $\bar{x} = 22$. Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate $\mu$.

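  16. Solution An approximate 95% confidence interval for $\mu$ is given by: $\bar{x} \pm 2\sigma_{\bar{x}} = 22 \pm 2\sqrt{45/10} = 22 \pm 2(2.1213)$, that is, (17.76, 26.24).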
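
The arithmetic of this example can be checked with a short script. This is a minimal sketch, assuming the "2 standard deviations" rule from the preceding slides; the function name `approx_ci_known_variance` and its parameters are illustrative and do not come from the presentation.

```python
import math

def approx_ci_known_variance(sample_mean, variance, n, multiplier=2.0):
    """Return (lower, upper) for sample_mean +/- multiplier * sqrt(variance / n)."""
    standard_error = math.sqrt(variance / n)   # standard deviation of the sample mean
    margin = multiplier * standard_error       # half-width of the interval
    return sample_mean - margin, sample_mean + margin

# Enzyme example: sample mean 22, known variance 45, n = 10
lower, upper = approx_ci_known_variance(22, 45, 10)
print(f"({lower:.2f}, {upper:.2f})")
```

Passing `multiplier=1.96` instead of the rounded value 2 gives the slightly narrower interval that the exact reliability coefficient would produce.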

  17. Components of an interval estimate The interval estimate of $\mu$ is centered on the point estimate of $\mu$. Approximately 95% of the values of the standard normal curve lie within 2 standard deviations of the mean. The z score in this case is called the reliability coefficient. The value 2 is a rounding; the exact reliability coefficient for 95% confidence is 1.96.

  18. General expression for an interval estimate estimator ± (reliability coefficient) × (standard error)

  19. Table of confidence coefficients
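
The reliability coefficients for the usual confidence levels can be computed rather than looked up. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper name `reliability_coefficient` is illustrative.

```python
from statistics import NormalDist

def reliability_coefficient(confidence):
    """Return z such that the central `confidence` area of N(0, 1) lies in (-z, z)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: z = {reliability_coefficient(confidence):.3f}")
```

The computed values are unrounded, so the 99% coefficient comes out near 2.576 where many printed tables, and the solutions in this unit, use 2.575.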

  20. Interpretation of confidence intervals The interval estimate for $\mu$ is expressed as $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$. If $\alpha = .05$, we can say that, in repeated sampling, 95% of the intervals constructed this way will include $\mu$. This is based on the probability of occurrence of different values of $\bar{x}$. (continued)

  21. Interpretation of confidence intervals The area under the curve of $\bar{x}$ that is outside the interval is called $\alpha$, and the area inside the interval is called $1-\alpha$.

  22. Probabilistic interpretation of the interval In repeated sampling from a normally distributed population with a known standard deviation, $100(1-\alpha)$ percent of all intervals of the form $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$ will, in the long run, include the population mean, $\mu$. (continued)

  23. Probabilistic interpretation of the interval The quantity $1-\alpha$ is called the confidence coefficient or confidence level, and the interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, is called the confidence interval for $\mu$.
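
The repeated-sampling statement can be illustrated by simulation. The following is a rough sketch, not part of the original slides: it draws many samples from a normal population with known standard deviation, builds the interval for each, and counts how often the interval contains the true mean. The population values, the number of trials, and the name `coverage` are all assumptions made for the illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5                   # same half-width for every sample
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials

# Population loosely modeled on the enzyme example: mean 22, variance 45
rate = coverage(mu=22, sigma=45 ** 0.5, n=10, confidence=0.95, trials=10_000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the mean or does not; the 95% describes the long-run behavior of the procedure.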

  24. Practical interpretation When sampling is from a normally distributed population with known standard deviation, we are $100(1-\alpha)$ percent confident that the single computed interval, $\bar{x} \pm z_{(1-\alpha/2)}\,\sigma_{\bar{x}}$, contains the population mean, $\mu$.

  25. Precision The precision of the estimate is the half-width of the interval, that is, how far the interval limits lie from the point estimate. It is found by multiplying the reliability coefficient by the standard error of the mean. This is also called the margin of error.

  26. Exercise 6.2.2 We wish to estimate the mean serum indirect bilirubin level of 4-day-old infants. The mean for a sample of 16 infants was found to be 5.98 mg/dl. Assuming bilirubin levels in 4-day-old infants are approximately normally distributed with a standard deviation of 3.5 mg/dl, find:
      A) The 90% confidence interval for $\mu$
      B) The 95% confidence interval for $\mu$
      C) The 99% confidence interval for $\mu$

  27. Solution (1) Given: $\bar{x} = 5.98$, $\sigma = 3.5$, $n = 16$, so the standard error is $\sigma/\sqrt{n} = 3.5/4 = .875$

  28. (2) Sketch

  29. Solution (3) Calculations A) 90% interval (z = 1.645): 5.98 ± 1.645 (.875) = 5.98 ± 1.439375, giving (4.5406, 7.4194). With the unrounded z value a calculator gives (4.5408, 7.4192).

  30. Solution B)  95% interval (z = 1.96)                     5.98 ± 1.96 (.875)          (4.265, 7.695)

  31. Solution C) 99% interval (z = 2.575): 5.98 ± 2.575 (.875) = 5.98 ± 2.253125, giving (3.7269, 8.2331). With the unrounded z value a calculator gives (3.7261, 8.2339).
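
The three intervals for Exercise 6.2.2 can be reproduced in a few lines. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function name `z_interval` is illustrative. Because it uses unrounded z values, its results can differ in the fourth decimal place from hand calculations based on table values such as 1.645 and 2.575.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # reliability coefficient
    margin = z * sigma / n ** 0.5                        # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Bilirubin exercise: sample mean 5.98 mg/dl, sigma 3.5 mg/dl, n = 16
for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(5.98, 3.5, 16, confidence)
    print(f"{confidence:.0%}: ({low:.4f}, {high:.4f})")
```

The widths grow with the confidence level, which is the trade-off between confidence and precision that this exercise is meant to show.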

  32. Solution (4) Results A higher confidence level gives a wider interval. There is less chance that the interval misses $\mu$, but the estimate is less precise. Calculator answers can differ slightly from hand calculations because the calculator uses unrounded z values rather than rounded table values.

  33. The t distribution In most real-life situations the variance of the population is unknown. We know that the z score, $z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$, is normally distributed if the population is normally distributed and is approximately normally distributed when the sample is large. But it cannot be used because $\sigma$ is unknown.

  34. Estimation of the standard deviation The sample standard deviation, $s = \sqrt{\sum (x_i - \bar{x})^2/(n-1)}$, can be used to replace $\sigma$. If $n \ge 30$, then $s$ is a good approximation of $\sigma$. An alternate procedure is used when the samples are small. It is based on Student's t distribution.

  35. Student's t distribution Student's t distribution is used as an alternative for z with small samples. It uses the following formula: $t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}}$

  36. Student's t distribution Student's t distribution was developed in 1908 by W. S. Gosset (1876-1937) who worked for the Guinness Brewery.

  37. Properties of the t distribution
      1. Mean = 0
      2. It is symmetrical about the mean.
      3. Variance is greater than 1 but approaches 1 as the sample gets large. For df > 2, the variance = df/(df-2) or, since df = n-1, (n-1)/(n-3). (continued)

  38. Properties of the t distribution
      4. The range is $-\infty$ to $+\infty$.
      5. t is really a family of distributions, one for each value of n-1, the divisor used in computing $s^2$.
      6. Compared with the normal distribution, t is less peaked and has higher tails.
      7. The t distribution approaches the normal distribution as n-1 approaches infinity.

  39. Confidence interval for a mean using t General relationship: estimator ± (reliability coefficient) × (standard error). The reliability coefficient is obtained from the t distribution.

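  40. Confidence interval When sampling is from a normal distribution whose standard deviation, $\sigma$, is unknown, the $100(1-\alpha)$ percent confidence interval for the population mean, $\mu$, is given by: $\bar{x} \pm t_{(1-\alpha/2)}\,\dfrac{s}{\sqrt{n}}$, with $n-1$ degrees of freedom.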

  41. Deciding between z and t When constructing a confidence interval for a population mean, we must decide whether to use z or t. Which one to use depends on the size of the sample, whether the population is normally distributed, and whether the population variance is known. There are various flowcharts and decision keys that can be used to help decide. Mine appears below.

  42. Key for deciding between z and t in confidence interval construction
      1. Population normally distributed ........................ 2
         Population not normally distributed .................... 5
      2. Sample size is large (30 or higher) .................... 3
         Sample size is small (less than 30) .................... 4
      3. Population variance is known ........................... use z
         Population variance not known .......................... use t (or z)
      4. Population variance is known ........................... use z
         Population variance is not known ....................... use t
      5. Sample size is large ................................... 6
         Sample size is small ................................... 7
      6. Population variance is known ........................... use z
         Population variance not known
         (central limit theorem applies) ........................ use z
      7. Must use a non-parametric method
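
The key above can be written as a small function, which makes each branch easy to test. This is only a sketch of the key as stated on the slide; the function name `choose_statistic`, its parameter names, and the returned strings are illustrative choices.

```python
def choose_statistic(population_normal, n, variance_known, large_n=30):
    """Follow the decision key: return 'z', 't', 't (or z)', or 'non-parametric'."""
    large = n >= large_n                      # the key treats 30 or higher as large
    if population_normal:
        if variance_known:
            return "z"                        # steps 3 and 4, variance known
        return "t (or z)" if large else "t"   # steps 3 and 4, variance unknown
    if large:
        return "z"                            # step 6: central limit theorem applies
    return "non-parametric"                   # step 7

# Normal population, small sample, variance not known
print(choose_statistic(population_normal=True, n=10, variance_known=False))
```

The example call corresponds to a normally distributed population, a sample of 10, and an unknown population variance, which the key sends to t.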

  43. Example In a study of preeclampsia, Kaminski and Rechberger found the mean systolic blood pressure of 10 healthy, nonpregnant women to be 119 with a standard deviation of 2.1. (continued)

  44. Example (Preeclampsia:  Development of hypertension, albuminuria, or edema between the 20th week of pregnancy and the first week postpartum. Eclampsia:  Coma and/or convulsive seizures in the same time period, without other etiology.)

  45. Example
      a. What is the estimated standard error of the mean?
      b. Construct the 99% confidence interval for the mean of the population from which the 10 subjects may be presumed to be a random sample.
      c. What is the precision of the estimate?
      d. What assumptions are necessary for the validity of the confidence interval you constructed?

  46. Solution (1) Given: $n = 10$, $\bar{x} = 119$, $s = 2.1$

  47. (2) Sketch of t distribution

  48. Reading the t table
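
Once the reliability coefficient has been read from the t table, parts a to c of the example follow from the formula $\bar{x} \pm t\,s/\sqrt{n}$. The sketch below is one way to carry out that arithmetic and is not taken from the slides. It assumes the table value $t_{.995} = 3.2498$ for 9 degrees of freedom, and the names `t_interval` and `T_995_DF9` are illustrative. The critical value is passed in explicitly because the Python standard library has no t distribution.

```python
import math

def t_interval(sample_mean, s, n, t_critical):
    """Return (standard_error, precision, (lower, upper)) for a t-based interval."""
    standard_error = s / math.sqrt(n)            # part a: estimated standard error
    precision = t_critical * standard_error      # part c: margin of error
    return standard_error, precision, (sample_mean - precision, sample_mean + precision)

# Assumed table lookup: t for 99% confidence with n - 1 = 9 degrees of freedom
T_995_DF9 = 3.2498

standard_error, precision, (low, high) = t_interval(119, 2.1, 10, T_995_DF9)
print(f"standard error = {standard_error:.4f}")
print(f"precision      = {precision:.4f}")
print(f"99% CI         = ({low:.2f}, {high:.2f})")
```

Part d is not a calculation: the interval relies on the 10 subjects being a random sample from an approximately normally distributed population.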
