Probability and Statistics in the Life Sciences Winter 2011 AMS 110.01 Lecture Note 2

Presentation Transcript


    1. Probability and Statistics in the Life Sciences (Winter 2011) AMS 110.01 Lecture Note 2 Donghyung Lee

    2. Chapter 5 Sampling Distributions Read Section 5.1, 5.2, 5.3, 5.4, 5.5, 5.6

    3. Section 5.3 Quantitative Observations Statistic : A numerical value calculated using information gathered from a sample. Ex) Average of weights in our class A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

    4. Section 5.3 Quantitative Observations Example 1) The sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by $\bar{X}$; the calculated value of this statistic is $\bar{x}$. 2) S represents the sample standard deviation thought of as a statistic, and its computed value is s.

    5. Section 5.3 Quantitative Observations Sampling distribution : a sampling distribution is the probability distribution of a given statistic based on a random sample of certain size n. It may be considered as the distribution of the statistic for all possible samples of a given size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, and the sample size used. (from wikipedia.com)

    6. Section 5.3 Quantitative Observations Sampling distribution of the sample mean 1) a sampling distribution of the sample mean is the probability distribution of the sample mean based on a random sample of certain size n. 2) the probability distribution that describes sampling variability in the sample mean.

    7. Section 5.3 Quantitative Observations Sampling distribution of the sample mean 1. MEAN: The mean of the sampling distribution of the sample mean is equal to the population mean. 2. STANDARD DEVIATION: The standard deviation of the sampling distribution of the sample mean is equal to the population standard deviation divided by the square root of the sample size.

    8. Section 5.3 Quantitative Observations Sampling distribution of the sample mean 3. Shape (a) If the population distribution of X is normal, then the sampling distribution of $\bar{X}$ is normal, regardless of the sample size n. (b) Central Limit Theorem : If n is large, then the sampling distribution of $\bar{X}$ is approximately normal, even if the population distribution of X is not normal.

    9. Section 5.3 Quantitative Observations Proposition Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then for any n, $\bar{X}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The Central Limit Theorem (CLT) Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with mean $\mu$ and standard deviation $\sigma$. Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The larger the value of n, the better the approximation.
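
The CLT statement above can be checked with a small simulation. This is only a sketch: the exponential population (mean 1, standard deviation 1), the sample size n = 40, and the function name `simulate_sample_means` are illustrative assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(n, reps=20000, seed=0):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, SD 1; an assumed, clearly non-normal population)
    and return the list of sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

n = 40
means = simulate_sample_means(n)

# Mean of the sampling distribution should be close to mu = 1.
print(round(statistics.fmean(means), 3))
# SD of the sampling distribution should be close to sigma / sqrt(n).
print(round(statistics.stdev(means), 3), round(1 / n ** 0.5, 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.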

    10. Section 5.3 Quantitative Observations Example 1 A large population of seeds of the princess bean Phaseolus vulgaris is to be sampled. The weights of the seeds in the population follow a normal distribution with mean $\mu$ and standard deviation $\sigma$. Suppose now that a random sample of four seeds is to be weighed, and let $\bar{X}$ represent the mean weight of the four seeds. 1. Find the distribution of $\bar{X}$. 2. Find .

    11. Section 5.3 Quantitative Observations Example 1 1. Find the distribution of $\bar{X}$. According to the Proposition, the sampling distribution of $\bar{X}$ will be a normal distribution with mean and standard deviation as follows: $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{4} = \sigma/2$.

    12. Section 5.3 Quantitative Observations Example 1 2. Find .

    13. Section 5.3 Quantitative Observations Example 2 (Exercise 5.22 page 165) Professor Smith conducted a class exercise in which students ran a computer program to generate random samples from a population that had a mean of 50 mm and a standard deviation of 9 mm. Each of Smith's students took a random sample of size n and calculated the sample mean. Smith found that about 68% of the students had sample means between 48.5 and 51.5 mm. What was n? (Assume that n is large enough that the Central Limit Theorem is applicable.)

    14. Section 5.3 Quantitative Observations Example 2 (Exercise 5.22 page 165)
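
One way to work Example 2, sketched under the assumption that "about 68%" refers to the central 68% of the sampling distribution: roughly 68% of a normal distribution lies within one standard deviation of its mean, so 1.5 should be about $9/\sqrt{n}$. The variable names below are illustrative.

```python
from statistics import NormalDist

sigma = 9.0        # population SD (mm)
half_width = 1.5   # 51.5 - 50 = 50 - 48.5
central = 0.68     # fraction of sample means inside the interval

# z such that P(-z < Z < z) = 0.68; close to 1.
z = NormalDist().inv_cdf(0.5 + central / 2)

# half_width = z * sigma / sqrt(n)  =>  n = (z * sigma / half_width)^2
n = (z * sigma / half_width) ** 2
print(round(z, 3), round(n, 1), round(n))
```

With the usual rule of thumb z = 1 the algebra gives exactly n = (9/1.5)^2 = 36.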

    15. Section 5.3 Quantitative Observations Example 3 (Page 165, Exercises 5.18) The heights of a certain population of corn plants follow a normal distribution with mean 145 cm and standard deviation 22 cm. 1) What percentage of the plants are between 135 and 155 cm tall? 2) Suppose we were to choose at random from the population a large number of samples of 16 plants each. In what percentage of the samples would the sample mean height be between 135 and 155 cm? 3) If $\bar{X}$ represents the mean height of a random sample of 16 plants from the population, what is $P(135 \le \bar{X} \le 155)$? 4) If $\bar{X}$ represents the mean height of a random sample of 36 plants from the population, what is $P(135 \le \bar{X} \le 155)$?

    16. Section 5.3 Quantitative Observations Example 3 (Page 165, Exercises 5.18)
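
A sketch of the calculations for Example 3. It assumes that parts 3 and 4 ask about the same interval, 135 to 155 cm, as parts 1 and 2; the helper name `prob_between` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 145.0, 22.0   # population mean and SD of plant height (cm)

def prob_between(lo, hi, n=1):
    """P(lo <= mean of n plants <= hi); n = 1 gives a single plant."""
    dist = NormalDist(mu, sigma / sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)

p_single = prob_between(135, 155)      # part 1: individual plants
p_n16 = prob_between(135, 155, n=16)   # parts 2 and 3: means of 16 plants
p_n36 = prob_between(135, 155, n=36)   # part 4: means of 36 plants
print(round(p_single, 3), round(p_n16, 3), round(p_n36, 3))
```

The probability rises with n because the sampling distribution of the mean tightens around 145 cm as $\sigma/\sqrt{n}$ shrinks.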

    17. Section 5.3 Quantitative Observations Example 4 Professor Mendell conducted a class exercise in which students ran a computer program to generate random samples from a population that had a mean of 60 mm and a standard deviation of 12 mm. Each of Mendell's students took a random sample of size n and calculated the sample mean. Mendell found that about 98% of the students had sample means less than or equal to 62.5 mm. What was n? (Assume that n is large enough that the Central Limit Theorem is applicable.)

    18. Section 5.3 Quantitative Observations Example 4
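
Example 4 can be approached the same way as Example 2, this time with a one-sided probability. The sketch below sets $P(\bar{X} \le 62.5) = 0.98$ and solves for n; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 60.0, 12.0
cutoff = 62.5
prob = 0.98

# z with P(Z <= z) = 0.98 (a z table gives about 2.05).
z = NormalDist().inv_cdf(prob)

# (cutoff - mu) / (sigma / sqrt(n)) = z  =>  n = (z * sigma / (cutoff - mu))^2
n = (z * sigma / (cutoff - mu)) ** 2
print(round(z, 3), round(n, 1), round(n))
```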

    19. Section 5.3 Quantitative Observations Example 5 The time taken by a randomly selected applicant for a mortgage to fill out a certain form has a normal distribution with mean value 10 min and standard deviation 2 min. If five individuals fill out a form on one day and six on another, what is the probability that the sample average amount of time taken on each day is at most 11 min?

    20. Section 5.3 Quantitative Observations Example 5
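
A sketch for Example 5, assuming the two days are independent so that the probabilities for the two sample means multiply. Because the population is normal, the Proposition applies exactly even for n = 5 and n = 6.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 10.0, 2.0   # minutes

def prob_mean_at_most(limit, n):
    """P(sample mean of n applicants <= limit) for a normal population."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(limit)

p_day1 = prob_mean_at_most(11, 5)   # five applicants
p_day2 = prob_mean_at_most(11, 6)   # six applicants
p_both = p_day1 * p_day2            # independence between days is assumed
print(round(p_day1, 4), round(p_day2, 4), round(p_both, 4))
```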

    21. Section 5.2 Dichotomous Observations Population proportion and sample proportion

    22. Section 5.2 Dichotomous Observations Example (Superior Vision p152)

    23. Section 5.2 Dichotomous Observations Example (Superior Vision p152)

    24. Section 5.2 Dichotomous Observations Example (Superior Vision p152)

    25. Section 5.2 Dichotomous Observations Example (Superior Vision p152)

    26. Section 5.2 Dichotomous Observations Dependence on Sample Size

    27. Section 4.5 The Continuity Correction The Continuity Correction

    28. Section 4.5 The Continuity Correction The Continuity Correction

    29. Section 4.5 The Continuity Correction The Continuity Correction

    30. Section 4.5 The Continuity Correction The Continuity Correction

    31. Section 4.5 The Continuity Correction The Continuity Correction

    32. Section 5.5 The Normal Approximation to the Binomial Distribution Normal approximation to binomial distribution (Section 5.5) (a) If n is large, then the binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$, where n is the sample size (number of independent trials) and p is the population proportion (the probability of success in each independent trial). (b) If n is large, then the sampling distribution of $\hat{p}$ can be approximated by a normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.

    33. Section 5.5 The Normal Approximation to the Binomial Distribution Example (Superior Vision p152) In a previous example when n=20 and p=.3, we found that Here we can apply the normal approximation to this probability.

    34. Section 5.5 The Normal Approximation to the Binomial Distribution Example (Superior Vision p152) In a previous example when n=20 and p=.3, we found that Here we can apply the normal approximation to this probability.
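
The specific probability from the earlier slide is not preserved in this transcript, so the sketch below uses a hypothetical event, $P(5 \le Y \le 7)$ with n = 20 and p = .3, purely to illustrate how the exact binomial probability compares with the normal approximation using the continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.3

def binom_pmf(k):
    """Exact binomial probability P(Y = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical event chosen for illustration: 5 <= Y <= 7.
exact = sum(binom_pmf(k) for k in range(5, 8))

# Normal approximation with mean np and SD sqrt(np(1-p)).
# Continuity correction: widen the integer range by 0.5 on each side.
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(7.5) - approx_dist.cdf(4.5)

print(round(exact, 4), round(approx, 4))
```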

    35. Chapter 6 Confidence Intervals Read Section 6.1, 6.2, 6.3, 6.4, 6.5, 6.6

    36. Section 6.1 Statistical Estimation Statistical Estimation 1. determining an estimate of some feature of the population 2. assessing the precision of the estimate

    37. Section 6.1 Statistical Estimation Example (Soybean Growth page 179) As part of a study on plant growth, a plant physiologist grew 13 individually potted soybean seedlings of the type called Wells II. She raised the plants in a greenhouse under identical environmental conditions (light, temperature, soil, and so on). She measured the total stem length (cm) for each plant after 16 days of growth. The data are given as follows: Stem Length (cm) : 20.2, 22.9, 23.3, 20.0, 19.4, 22.0, 22.1, 22.0, 21.9, 21.5, 19.7, 21.5, 20.9

    38. Section 6.1 Statistical Estimation Example (Soybean Growth page 179) For the data, the mean is $\bar{x} = 21.34$ cm and the standard deviation is $s = 1.22$ cm. Assume that the 13 observations are a random sample from a population. The population could be described by its mean, $\mu$, and its standard deviation, $\sigma$. $\mu$ = the (population) mean stem length of Wells II soybean plants grown under the specified conditions. $\sigma$ = the (population) SD of stem lengths of Wells II soybean plants grown under the specified conditions.
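
The summary statistics for the soybean data can be reproduced from the 13 stem lengths listed on the previous slide. A minimal sketch using Python's standard library (`statistics.stdev` uses the n - 1 divisor, matching the sample standard deviation):

```python
import statistics

stem_length_cm = [20.2, 22.9, 23.3, 20.0, 19.4, 22.0, 22.1,
                  22.0, 21.9, 21.5, 19.7, 21.5, 20.9]

x_bar = statistics.mean(stem_length_cm)   # sample mean
s = statistics.stdev(stem_length_cm)      # sample SD (divides by n - 1)
print(round(x_bar, 2), round(s, 2))
```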

    39. Section 6.1 Statistical Estimation Example (Soybean Growth page 179) $\bar{x} = 21.34$ is an estimate of $\mu$. $s = 1.22$ is an estimate of $\sigma$. In general, $\bar{x}$ is an estimate of $\mu$, and $s$ is an estimate of $\sigma$. CAN WE BELIEVE IN THESE ESTIMATES? We should assess the reliability or precision of these estimates. To quantify our confidence in these estimates we use confidence intervals.

    40. Section 6.2 Standard Error of the Mean What is the standard error of the mean? The standard error of the mean, $SE_{\bar{x}} = s/\sqrt{n}$, is an estimate of the standard deviation of the sampling distribution of $\bar{X}$. The SE can be interpreted in terms of the expected sampling error. The SE is a measure of the reliability or precision of $\bar{x}$ as an estimate of $\mu$.

    41. Section 6.2 Standard Error of the Mean SE (standard error) Versus SD (standard deviation) - SE : describes the uncertainty (due to sampling error) in the mean of the data. - SD : describes the dispersion of the data. Ex) Lamb Birthweights (page 181) A geneticist weighed 28 female lambs at birth. The lambs were all born in April, were all the same breed (Rambouillet), and were all single births (no twins). The diet and other environmental conditions were the same for all parents. The birthweights (kg) are as follows: DATA: 4.3, 5.2, 6.2, 6.7, 5.3, 4.9, 4.7, 5.5, 5.3, 4.0, 4.9, 5.2, 4.9, 5.3, 5.4, 5.5, 3.6, 5.8, 5.6, 5.0, 5.2, 5.8, 6.1, 4.9, 4.5, 4.8, 5.4, 4.7

    42. Section 6.2 Standard Error of the Mean SE (standard error) Versus SD (standard deviation) Ex) Lamb Birthweights (page 181) DATA: 4.3, 5.2, 6.2, 6.7, 5.3, 4.9, 4.7, 5.5, 5.3, 4.0, 4.9, 5.2, 4.9, 5.3, 5.4, 5.5, 3.6, 5.8, 5.6, 5.0, 5.2, 5.8, 6.1, 4.9, 4.5, 4.8, 5.4, 4.7 The mean is 5.17 kg, the standard deviation SD is .65 kg, and the standard error SE is .12 kg. SD : describes the variability from one lamb to the next. SE : describes the variability associated with the sample mean (5.17 kg), viewed as an estimate of the population mean birthweight. Question) What if the sample size n increases? (the behavior of the sample mean, the sample SD, and the SE)
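
The three summaries quoted for the lamb data can be recomputed directly, which also makes the SD versus SE distinction concrete. A minimal sketch:

```python
import statistics
from math import sqrt

birthweight_kg = [4.3, 5.2, 6.2, 6.7, 5.3, 4.9, 4.7, 5.5, 5.3, 4.0,
                  4.9, 5.2, 4.9, 5.3, 5.4, 5.5, 3.6, 5.8, 5.6, 5.0,
                  5.2, 5.8, 6.1, 4.9, 4.5, 4.8, 5.4, 4.7]

n = len(birthweight_kg)
mean = statistics.mean(birthweight_kg)
sd = statistics.stdev(birthweight_kg)   # lamb-to-lamb variability
se = sd / sqrt(n)                       # uncertainty in the sample mean
print(n, round(mean, 2), round(sd, 2), round(se, 2))
```

On the closing question: as n grows, the sample mean and sample SD settle near $\mu$ and $\sigma$, while the SE keeps shrinking in proportion to $1/\sqrt{n}$.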

    43. Section 6.2 Standard Error of the Mean Example (Exercise 6.2 page 184) An agronomist measured the heights of n corn plants. The mean height was 220 cm and the standard deviation was 15 cm. Calculate the standard error of the mean if (a) n=25 (b) n=100
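
Applying $SE_{\bar{x}} = s/\sqrt{n}$ to the two cases gives the following worked values; note that quadrupling n halves the SE.

```latex
\begin{align*}
\text{(a) } n = 25:  &\quad SE_{\bar{x}} = \frac{15}{\sqrt{25}}  = \frac{15}{5}  = 3 \text{ cm} \\
\text{(b) } n = 100: &\quad SE_{\bar{x}} = \frac{15}{\sqrt{100}} = \frac{15}{10} = 1.5 \text{ cm}
\end{align*}
```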

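    44. Section 6.3 Confidence Interval for $\mu$ Derivation of Confidence Interval for $\mu$: $P\left(-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96\right) = .95$, which rearranges to $P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = .95$. Thus, the interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ will contain $\mu$ for 95% of all samples.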

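    45. Section 6.3 Confidence Interval for $\mu$ Definition (when $\sigma$ is known) If after observing the sample data we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ in place of $\bar{X}$, the resulting fixed interval is called a 95% confidence interval for $\mu$. This CI can be expressed either as: $\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right)$ is a 95% CI for $\mu$; or as: $\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}$ with 95% confidence.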

    46. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is known) Industrial engineers who specialize in ergonomics are concerned with designing workspace and devices operated by workers so as to achieve high productivity and comfort. The article "Studies on Ergonomically Designed Alphanumeric Keyboards" (Human Factors, 1985: 175-187) reports on a study of preferred height for an experimental keyboard with large forearm-wrist support. A sample of n=31 trained typists was selected, and the preferred keyboard height was determined for each typist. The resulting sample average preferred height was $\bar{x} = 80.0$ cm. Assuming that preferred height is normally distributed with $\sigma = 2.0$ cm (a value suggested by data in the article), obtain the 95% CI for the true average preferred height $\mu$.

    47. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is known) We can be highly confident that 79.3 < $\mu$ < 80.7. This interval is relatively narrow, indicating that $\mu$ has been rather precisely estimated.
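
The arithmetic behind this interval can be written out as follows. The inputs $\bar{x} = 80.0$ and $\sigma = 2.0$ are taken as assumptions here; they are the values consistent with n = 31 and the reported interval (79.3, 80.7).

```latex
\begin{align*}
\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}
  &= 80.0 \pm 1.96 \cdot \frac{2.0}{\sqrt{31}} \\
  &= 80.0 \pm 0.7 \\
  &= (79.3,\ 80.7)
\end{align*}
```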

    48. Section 6.3 Confidence Interval for $\mu$ Interpreting a Confidence Interval A correct interpretation of 95% confidence relies on the long-run frequency interpretation of probability: To say that an event A has probability .95 is to say that if the experiment on which A is defined is performed over and over again, in the long run A will occur 95% of the time. Suppose we obtain a sample from a population and compute a 95% interval. Then suppose we consider repeating this for a second sample, a third sample, a fourth sample, and so on. Let A be the event that $\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}$. Since P(A)=.95, in the long run 95% of our computed CIs will contain $\mu$.
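
The long-run interpretation can be illustrated by simulation. This is a sketch only: the population values (mean 80, SD 2), the sample size 31, and the function name `coverage` are assumptions picked for illustration.

```python
import random
from math import sqrt

def coverage(mu=80.0, sigma=2.0, n=31, reps=10000, seed=1):
    """Fraction of repeated samples whose 95% z-interval contains mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(rate)   # should land near 0.95
```

Each simulated interval either contains $\mu$ or it does not; the 95% describes the procedure over many repetitions, not any single interval.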

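    49. Section 6.3 Confidence Interval for $\mu$ Definition 1 (when $\sigma$ is known) A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by $\left(\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.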

    50. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is known) The production process for engine control housing units of a particular type has recently been modified. Prior to this modification, historical data had suggested that the distribution of hole diameters for bushings on the housings was normal with a standard deviation of .100 mm. It is believed that the modification has not affected the shape of the distribution or the standard deviation, but that the value of the mean diameter may have changed. A sample of 40 housing units is selected and hole diameter is determined for each one, resulting in a sample mean diameter of 5.426 mm. Find a confidence interval for the true average hole diameter using a confidence level of 90%.

    51. Section 6.3 Confidence Interval for $\mu$ Example With a reasonably high degree of confidence, we can say that 5.400 < $\mu$ < 5.452.
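
For a 90% level, $\alpha = .10$ and $z_{.05} = 1.645$. The worked calculation below, using the values stated in the example, reproduces the interval:

```latex
\begin{align*}
\bar{x} \pm z_{.05}\,\frac{\sigma}{\sqrt{n}}
  &= 5.426 \pm 1.645 \cdot \frac{.100}{\sqrt{40}} \\
  &= 5.426 \pm .026 \\
  &= (5.400,\ 5.452)
\end{align*}
```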

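    52. Section 6.3 Confidence Interval for $\mu$ Definition 2 (when $\sigma$ is unknown and $n \ge 30$) A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a population when the value of $\sigma$ is unknown and the sample size is $n \ge 30$ is given by $\left(\bar{x} - z_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,s/\sqrt{n}\right)$ or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$.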

    53. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and $n \ge 30$) As part of a study of the treatment of anemia in cattle, researchers measured the concentration of selenium in the blood of 56 cows who had been given a dietary supplement of selenium (2 mg/day). The cows were all the same breed (Santa Gertrudis) and had borne their first calf during the year. The sample mean selenium concentration was 6.21 μg/dLi and the sample standard deviation was 1.84 μg/dLi. Construct a 95% confidence interval for the population mean.

    54. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and $n \ge 30$)
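
With n = 56 the large-sample interval of Definition 2 applies. A worked sketch of the calculation, using $z_{.025} = 1.96$ and the values stated in the example:

```latex
\begin{align*}
\bar{x} \pm z_{.025}\,\frac{s}{\sqrt{n}}
  &= 6.21 \pm 1.96 \cdot \frac{1.84}{\sqrt{56}} \\
  &= 6.21 \pm 0.48 \\
  &= (5.73,\ 6.69)\ \mu\text{g/dLi}
\end{align*}
```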

    55. Section 6.3 Confidence Interval for $\mu$ t distribution (1) When $\sigma$ is known and the variable is normally distributed, or when $\sigma$ is unknown and $n \ge 30$, the standard normal distribution is used to find confidence intervals for the mean. However, in many situations, the population standard deviation is not known and the sample size is less than 30. In such situations, the standard deviation from the sample can be used in place of the population standard deviation for confidence intervals. But a somewhat different distribution, called the t distribution, must be used when the sample size is less than 30 and the variable is normally or approximately normally distributed.

    56. Section 6.3 Confidence Interval for $\mu$ t distribution (2) - Similar to the standard normal distribution in these ways: 1. It is bell-shaped. 2. It is symmetric about the mean. 3. The mean, median, and mode are equal to 0 and are located at the center of the distribution. 4. The curve never touches the x-axis. - Differs from the standard normal distribution in the following ways: 1. The variance is greater than 1. 2. The t distribution is actually a family of curves based on the concept of degrees of freedom (df), which is related to sample size (df = n-1). 3. As the sample size increases, the t distribution approaches the standard normal distribution.

    57. Section 6.3 Confidence Interval for $\mu$ t distribution (3)

    58. Section 6.3 Confidence Interval for $\mu$ t distribution (4): t distribution table (page 677, Table 4)

    59. Section 6.3 Confidence Interval for $\mu$ t distribution (5) When sample size =10 Find

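    60. Section 6.3 Confidence Interval for $\mu$ Definition 3 (when $\sigma$ is unknown and n < 30) A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is unknown and the sample size is n < 30 is given by $\left(\bar{x} - t_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2}\,s/\sqrt{n}\right)$ or, equivalently, by $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$. The degrees of freedom are n-1.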

    61. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and n < 30) As part of a study of the development of the thymus gland, researchers weighed the glands of five chick embryos after 14 days of incubation. The thymus weights (mg) were as follows: 29.6 21.5 28.0 34.6 44.9 For these data, the mean is 31.72 and the standard deviation is 8.729. (We assume that the population distribution is normal.) Construct a 90% confidence interval for the population mean.

    62. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and n < 30)
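
A sketch of the thymus calculation using Definition 3. The critical value $t_{.05} = 2.132$ for 4 degrees of freedom is hard-coded from a standard t table rather than computed, since Python's standard library has no t quantile function; that choice is an assumption of this sketch.

```python
import statistics
from math import sqrt

thymus_mg = [29.6, 21.5, 28.0, 34.6, 44.9]

n = len(thymus_mg)
x_bar = statistics.mean(thymus_mg)
s = statistics.stdev(thymus_mg)

# Upper .05 critical value of t with df = n - 1 = 4 (from a standard t table).
t_crit = 2.132

margin = t_crit * s / sqrt(n)
ci_90 = (x_bar - margin, x_bar + margin)
print(round(x_bar, 2), round(s, 3), tuple(round(v, 2) for v in ci_90))
```

The interval is wide because n = 5 is small and the t multiplier (2.132) is noticeably larger than the corresponding z value (1.645).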

    63. Section 6.3 Confidence Interval for $\mu$ When to use the z or t distribution

    64. Section 6.3 Confidence Interval for $\mu$ Example (example 6.9 page 192) Low bone mineral density often leads to hip fractures in the elderly. In an experiment to assess the effectiveness of hormone replacement therapy, researchers gave conjugated equine estrogen (CEE) to a sample of 94 women between the ages of 45 and 64. After taking the medication for 36 months, the bone mineral density was measured for each of the 94 women. The average density was .878 g/cm2, with a standard deviation of .126 g/cm2. Find a 95% confidence interval.

    65. Section 6.3 Confidence Interval for $\mu$ Example (example 6.9 page 192) : This is the solution from our textbook. (There are 93 degrees of freedom, but we use 80 degrees of freedom since Table 4 doesn't list 93 degrees of freedom.)

    66. Section 6.3 Confidence Interval for $\mu$ Example (example 6.9 page 192) : This is my solution. Thus, we are 95% confident that the average hip bone mineral density of all women age 45 to 64 who take CEE for 36 months is between .852 g/cm2 and .904 g/cm2.
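
The slide images with the two solutions are not preserved, so the following is a reconstruction under stated assumptions rather than the original working: a t-based version using the table value $t_{.025} = 1.990$ for 80 degrees of freedom, and a large-sample z-based version using $z_{.025} = 1.96$ (permitted by Definition 2, since n = 94).

```latex
\begin{align*}
\text{t-based (df = 80):} \quad
  & .878 \pm 1.990 \cdot \frac{.126}{\sqrt{94}} = .878 \pm .026 \\
\text{z-based:} \quad
  & .878 \pm 1.96 \cdot \frac{.126}{\sqrt{94}} = .878 \pm .025
\end{align*}
```

The two multipliers differ only slightly at this sample size, so the intervals are nearly identical.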

    67. Section 6.3 Confidence Interval for $\mu$ Example A massive multistate outbreak of food-borne illness was attributed to Salmonella enteritidis. Epidemiologists determined that the source of the illness was ice cream. They sampled nine production runs from the company that had produced the ice cream to determine the level of Salmonella enteritidis in the ice cream. These levels (MPN/g) are as follows: .593 .142 .329 .691 .231 .793 .519 .392 .418 Find a 99% confidence interval for the average level of Salmonella enteritidis in the ice cream.

    68. Section 6.3 Confidence Interval for $\mu$ Example
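
A sketch of how this interval could be computed with the small-sample t procedure. It assumes the levels are roughly normally distributed (n = 9 is too small for the large-sample rule) and hard-codes $t_{.005} = 3.355$ for 8 degrees of freedom from a standard t table.

```python
import statistics
from math import sqrt

level_mpn_per_g = [.593, .142, .329, .691, .231, .793, .519, .392, .418]

n = len(level_mpn_per_g)
x_bar = statistics.mean(level_mpn_per_g)
s = statistics.stdev(level_mpn_per_g)

# Upper .005 critical value of t with df = 8 (from a standard t table).
t_crit = 3.355

margin = t_crit * s / sqrt(n)
ci_99 = (x_bar - margin, x_bar + margin)
print(round(x_bar, 3), round(s, 3), tuple(round(v, 3) for v in ci_99))
```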

    69. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (1) How can we determine the number of observations to include in the sample? Data collection costs money. If the sample is too large, time and talent are wasted. Conversely, it is wasteful if the sample is too small. Hence, the number of observations to be included in the sample will be a compromise between the desired accuracy of the sample statistic as an estimate of the population parameter and the required time and cost to achieve this degree of accuracy. There are two considerations in determining the appropriate sample size for estimating $\mu$ using a confidence interval. First, the tolerable error establishes the desired width of the interval. The second consideration is the level of confidence. (From An Introduction to Statistical Methods and Data Analysis, 5th edition)

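    70. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (2) Suppose we want to estimate $\mu$ using a $100(1-\alpha)\%$ confidence interval having tolerable error (width) W. The margin of error E is defined to be the half width of a $100(1-\alpha)\%$ CI: $E = W/2 = z_{\alpha/2}\,\sigma/\sqrt{n}$. Solving for n gives $n = \left(z_{\alpha/2}\,\sigma/E\right)^2$, rounded up to the next whole number. Note that determining a sample size to estimate $\mu$ requires knowledge of the population standard deviation $\sigma$.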

    71. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (Example 1) AMS 110 students wanted to estimate the mean height of Stony Brook students with a 95% confidence interval having a tolerable error of 4 cm. The population standard deviation is known to be 7.5 cm. How many students must be included in the sample to achieve their specifications?

    72. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (Example 1) Thus, 55 students should be included in the sample to achieve their specifications.
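
The answer of 55 follows if the tolerable error of 4 cm is read as the full width W of the interval, so that the margin of error is E = W/2 = 2 cm. A sketch of the calculation under that reading (the function name `sample_size` is illustrative):

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, width, confidence=0.95):
    """Smallest n whose z-interval for the mean has total width <= `width`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = width / 2               # margin of error E = W / 2
    return ceil((z * sigma / margin) ** 2)

n_required = sample_size(sigma=7.5, width=4.0, confidence=0.95)
print(n_required)
```

The raw value is about 54.02; rounding up rather than to the nearest integer keeps the interval no wider than requested.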

    73. Section 6.4 Choosing the sample size for estimating Sample Size (Example 2) An insurance company is concerned about the number of worker compensation claims based on back injuries by baggers in grocery stores. They want to evaluate the fitness of baggers at the many grocery stores they insure. The workers selected for the study will be evaluated to determine the amount of weight that they can lift without undue back stress. From studies by other insurance companies, pounds. How many baggers must be included in the study to be 99% confident that the average weight lifted is estimated to within 8 pounds?

    74. Section 6.4 Choosing the sample size for estimating Sample Size (Example 2)

    75. Section 6.6 Confidence Interval for a Population Proportion Confidence interval for a population proportion Up to this point in Chapter 6, we have described confidence intervals when the observed variable is quantitative. Now we will turn our attention to situations in which the variable is categorical and the parameter of interest is a population proportion. Suppose a geneticist observes n guinea pigs whose coat color can be either black or white; let us fix attention on the category black. Let p denote the population proportion of the category, and let $\hat{p}$ denote the corresponding sample proportion: $\hat{p} = y/n$, where y is the number of black guinea pigs out of n. A natural estimate of the population proportion, p, is the sample proportion, $\hat{p}$.

    76. Section 6.6 Confidence Interval for a Population Proportion Normal approximation to binomial distribution (Section 5.5) (a) If n is large, then the binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$, where n is the sample size (number of independent trials) and p is the population proportion (the probability of success in each independent trial). (b) If n is large, then the sampling distribution of $\hat{p}$ can be approximated by a normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.

    77. Section 6.6 Confidence Interval for a Population Proportion Normal approximation to binomial distribution (Section 5.5) (a) If n is large, then the binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$, where n is the sample size (number of independent trials) and p is the population proportion (the probability of success in each independent trial). (b) If n is large, then the sampling distribution of $\hat{p}$ can be approximated by a normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.

    78. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (1) If n is large,

    79. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (2)

    80. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (3) Most books present the Wald confidence interval, since it is much simpler in form. However, the Wald confidence interval has poor coverage properties: A nominal 95% Wald confidence interval might actually cover p only 80% of the time, rather than 95% of the time. In addition, the Wald confidence interval doesn't work properly when the sample proportion is exactly one or exactly zero, when the sample size is small, or when the probability p is extreme.

    81. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (4) $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = y/n$ (the sample proportion)

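    82. Section 6.6 Confidence Interval for a Population Proportion Wilson confidence interval : This interval has good properties even for a small number of trials (small sample size) and/or an extreme probability. $\tilde{p} \pm z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/(n + z_{\alpha/2}^2)}$, where $\tilde{p} = (y + 0.5\,z_{\alpha/2}^2)/(n + z_{\alpha/2}^2)$ (the modified sample proportion)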

    83. Section 6.6 Confidence Interval for a Population Proportion Example 1 (ex 6.16 page 208) BRCA1 is a gene that has been linked to breast cancer. Researchers used DNA analysis to search for BRCA1 mutations in 169 women with family histories of breast cancer. Of the 169 women tested, 27 (16%) had BRCA1 mutations. Let p denote the probability that a woman with a family history of breast cancer will have a BRCA1 mutation. 1) Find 95% and 99% Wald confidence intervals for p. 2) Find 95% and 99% Wilson confidence intervals for p.

    84. Section 6.6 Confidence Interval for a Population Proportion Example 1 (ex 6.16 page208) 1) 95% and 99% Wald confidence interval

    85. Section 6.6 Confidence Interval for a Population Proportion Example 1 (ex 6.16 page208) 2) 95% and 99% Wilson confidence interval

    86. Section 6.6 Confidence Interval for a Population Proportion Example 1 (ex 6.16 page208) 2) 95% and 99% Wilson confidence interval
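
A sketch of both interval types for the BRCA1 data (y = 27, n = 169). The "Wilson" form used here is the adjusted version with $\tilde{p} = (y + 0.5z^2)/(n + z^2)$; that form is an assumption, chosen because it reproduces the endpoints .299 and .429 quoted for the ECMO example later in these notes. The z values 1.96 and 2.58 are table values, and the function names are illustrative.

```python
from math import sqrt

def wald_interval(y, n, z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = y / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def adjusted_wilson_interval(y, n, z):
    """Adjusted interval centered on p_tilde = (y + 0.5 z^2) / (n + z^2)."""
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return p_tilde - z * se, p_tilde + z * se

y, n = 27, 169
for label, z in (("95%", 1.96), ("99%", 2.58)):
    wald = wald_interval(y, n, z)
    wilson = adjusted_wilson_interval(y, n, z)
    print(label, [round(v, 3) for v in wald], [round(v, 3) for v in wilson])
```

With n this large and $\hat{p}$ well away from 0 and 1, the two methods give similar intervals; the adjusted interval is shifted slightly toward 0.5.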

    87. Section 6.6 Confidence Interval for a Population Proportion Example 2 (ex 6.16 page209) Extracorporeal membrane oxygenation (ECMO) is a potentially life saving procedure that is used to treat newborn babies who suffer from severe respiratory failure. An experiment was conducted in which 11 babies were treated with ECMO; none of the 11 babies died. Let p denote the probability of death for a baby treated with ECMO. 1) Find 95% and 99% Wald confidence interval for p. 2) Find 95% and 99% Wilson confidence interval for p.

    88. Section 6.6 Confidence Interval for a Population Proportion Example 2 (ex 6.16 page209) 1) 95% and 99% Wald confidence interval

    89. Section 6.6 Confidence Interval for a Population Proportion Example 2 (ex 6.16 page209) 2) 95% and 99% Wilson confidence interval

    90. Section 6.6 Confidence Interval for a Population Proportion Example 2 (ex 6.16 page209) 2) 95% and 99% Wilson confidence interval We know that p cannot be negative, so we state the confidence interval as (0,.299). Thus, we are 95% confident that the probability of death in a newborn with severe respiratory failure who is treated with ECMO is between 0 and .299.

    91. Section 6.6 Confidence Interval for a Population Proportion Example 2 (ex 6.16 page209) 2) 95% and 99% Wilson confidence interval

    92. Section 6.6 Confidence Interval for a Population Proportion Example 2 (ex 6.16 page209) 2) 95% and 99% Wilson confidence interval We know that p cannot be negative, so we state the confidence interval as (0,.429). Thus, we are 99% confident that the probability of death in a newborn with severe respiratory failure who is treated with ECMO is between 0 and .429.
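
The ECMO data (y = 0 deaths out of n = 11) show why the Wald interval fails at the boundary. In the sketch below the adjusted form with $\tilde{p} = (y + 0.5z^2)/(n + z^2)$ and the table values z = 1.96 and z = 2.58 are assumptions, chosen because they reproduce the upper limits .299 and .429 quoted above to within rounding; negative lower limits are truncated at 0 as the slides do.

```python
from math import sqrt

def wald_interval(y, n, z):
    """Wald interval; collapses to a single point when y = 0 or y = n."""
    p_hat = y / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def adjusted_wilson_interval(y, n, z):
    """Adjusted interval, truncated to the valid range [0, 1]."""
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return max(0.0, p_tilde - z * se), min(1.0, p_tilde + z * se)

y, n = 0, 11
print(wald_interval(y, n, 1.96))    # zero width: claims p is exactly 0
ci_95 = adjusted_wilson_interval(y, n, 1.96)
ci_99 = adjusted_wilson_interval(y, n, 2.58)
print([round(v, 3) for v in ci_95], [round(v, 3) for v in ci_99])
```

A zero-width Wald interval would suggest certainty that no treated baby can die, which 11 observations cannot support; the adjusted interval gives a usable upper bound instead.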
