Chapter 5. Summarizing Data Numerically. Homework 7. Read pages 299-313 LDI: 5.1-5.7 Exercises: (page 311) 5.1-5.9. Chapter 4 Quiz. Take practice quizzes online (Book site) Quiz will be given via MyCR Review homework. Wendall Zurkowitz, slave to the waffle light.
Chapter 5 Summarizing Data Numerically
Homework 7 • Read pages 299-313 • LDI: 5.1-5.7 • Exercises: (page 311) 5.1-5.9
Chapter 4 Quiz • Take practice quizzes online (Book site) • Quiz will be given via MyCR • Review homework
Will every waffle take the same amount of time to cook? Two things Wendall would like to know: What is the average cooking time, and how much variability is there in the cooking time? We cover the average in this section and variability in the next.
Measurement of Center • If we take a sample of n values and calculate what we have come to know as the average we have calculated the arithmetic mean of the data. • This measure of center is a statistic since it comes from a sample.
The Sample Mean • The sample mean is a statistic. Its purpose is to estimate a parameter, the population mean. • The sample mean is denoted by: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The Population Mean • The population mean is a parameter. The population mean is denoted by: $\mu$
Example • Let’s find the sample mean of the AGE data. We’ll use our calculator: 1-Var Stats AGE
Is the mean always the center? • Suppose that a sample of 100 is obtained from a population • Can the mean be larger than the maximum value or smaller than the minimum value? • Can the mean be the same as the max or min value? • Can the mean be the exact middle point of the distribution? • Can the mean not be equal to any of the data collected?
Let's Do It! 5.1, Page 303, • A Mean Is Not Always Representative • Kim's test scores are 7, 98, 25, 19, and 26. • Calculate Kim's mean test score. Explain why the mean does not do a very good job at summarizing Kim's test scores.
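A quick way to check the hand calculation for Kim's scores is sketched below. It uses Python's standard `statistics` module; the variable names are illustrative only and are not part of the course materials.

```python
from statistics import mean, median

# Kim's test scores from Let's Do It! 5.1
kim_scores = [7, 98, 25, 19, 26]

kim_mean = mean(kim_scores)      # sum of the scores divided by n
kim_median = median(kim_scores)  # middle value of the ordered scores

print("mean:", kim_mean)
print("median:", kim_median)
# Count how many of Kim's scores sit below the mean.
below = [x for x in kim_scores if x < kim_mean]
print("scores below the mean:", sorted(below))
```

Four of the five scores fall below the mean because the single score of 98 pulls the mean upward, which is one way to see why the mean summarizes these scores poorly.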
Let’s Do It! 5.2 Combining Means We have seven students. The mean score for three of these students is 54 and the mean score for the four other students is 76. • What is the mean score for all seven students?
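The combined mean is a weighted average: each group mean is multiplied by its group size to recover that group's total. A minimal sketch, assuming a hypothetical helper named `combined_mean`:

```python
def combined_mean(group_sizes, group_means):
    """Weighted mean: recover each group's total, then divide by the overall n."""
    total = sum(n * m for n, m in zip(group_sizes, group_means))
    return total / sum(group_sizes)

# Three students with mean 54 and four students with mean 76
overall = combined_mean([3, 4], [54, 76])
naive = (54 + 76) / 2  # simple average of the two means, ignores group sizes

print(round(overall, 2))
print(naive)
```

The two printed values differ because the group of four counts more heavily than the group of three.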
The Median! • The median of a set of n observations, ordered from smallest to largest, is a value such that at least half of the observations are less than or equal to that value and at least half of the observations are greater than or equal to that value.
Find the Median of the AGE data • The Hard way • The Easy way
Let’s Do It! 5.3: Median Number of Children per Household
Find the median number of children in a household from this sample of 10 households, that is, find the median of
Observation Number: 1 2 3 4 5 6 7 8 9 10
Number of Children: 2, 3, 0, 1, 4, 0, 3, 0, 1, 2
(a) Order the observations from smallest to largest:
(b) Calculate (n+1)/2 = _________________
(c) Median = ______________
(d) What happens to the median if the fifth observation in the first list was incorrectly recorded as 40 instead of 4?
(e) What happens to the median if the third observation in the first list was incorrectly recorded as -20 instead of 0?
• Note: The median is resistant; that is, it does not change, or changes very little, in response to extreme observations.
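The resistance of the median can be checked with a short sketch. The code below uses the household data from this exercise and the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]

# Fifth observation misrecorded as 40 instead of 4
high_error = children.copy()
high_error[4] = 40

# Third observation misrecorded as -20 instead of 0
low_error = children.copy()
low_error[2] = -20

for label, data in [("original", children),
                    ("40 for 4", high_error),
                    ("-20 for 0", low_error)]:
    print(label, "| ordered:", sorted(data),
          "| median:", median(data), "| mean:", mean(data))
```

With n = 10, (n+1)/2 = 5.5, so the median is the average of the 5th and 6th ordered values. The recording errors change the mean but leave those two middle values, and therefore the median, where they were.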
The Mode • To find the middle or measure of center of categorical (qualitative) data we are forced to use the Mode. It can also be used with numerical (quantitative) data, but it is not a good measure of center. • The mode of a set of data is the most frequently occurring value, the value with the highest frequency.
Example • Find the mode for the following data: (a) 1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7 (b) 1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6
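A data set can have more than one mode. The sketch below counts frequencies with `collections.Counter` and lists every most-frequent value with `statistics.multimode` (available in Python 3.8 and later); the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

data_a = [1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7]
data_b = [1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6]

# Frequency of each distinct value
print(Counter(data_a).most_common())
print(Counter(data_b).most_common())

# multimode returns every value tied for the highest frequency
modes_a = multimode(data_a)
modes_b = multimode(data_b)
print(modes_a)
print(modes_b)
```

In list (b) two values tie for the highest frequency, so the data are bimodal.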
The mode can be computed for qualitative data. • The modal race category is “white.”
Consider the following data: 2, 2, 2, 20, 34, 45, 210. What are the mode, median, and mean?
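The three measures can disagree sharply on skewed data. A minimal check using the standard `statistics` module (variable names are illustrative):

```python
from statistics import mean, median, mode

data = [2, 2, 2, 20, 34, 45, 210]

data_mode = mode(data)      # most frequent value
data_median = median(data)  # 4th of the 7 ordered values
data_mean = mean(data)      # sum divided by 7

print("mode:", data_mode)
print("median:", data_median)
print("mean:", data_mean)
```

The large value 210 pulls the mean well above the median, while the mode sits at the minimum.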
Think About It • Suppose that you compute the mean, median, and mode for a list of numbers. Which must always appear as one of the numbers in the list? • If the distribution is symmetric, which measure of center do you calculate, the mean or the median? Why?
Let's Do It! 5.4, Page 308, The Usefulness of Randomization • Consider a study to compare two antibiotics for treating strep throat in children, Amoxicillin and Cefadroxil. At one center for this study, 23 children (who met the study entrance criteria and for whom consent was given) were randomly assigned to one of the two treatment groups. One concern is that age of the child might influence the effectiveness of the antibiotics. The ages of the children in each treatment group are given below. • Calculate the mean, median, and mode for each of the two treatment groups. How do the two groups compare with respect to age?
• Amoxicillin Group (n=11) AGE: 14 17 11 10 11 14 9 12 8 10 9
• Cefadroxil Group (n=12) AGE: 9 14 8 10 13 7 9 11 16 10 12 9
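One way to check the hand calculations for the two treatment groups is sketched below, again with the standard `statistics` module. The dictionary layout and names are illustrative choices.

```python
from statistics import mean, median, multimode

ages = {
    "Amoxicillin": [14, 17, 11, 10, 11, 14, 9, 12, 8, 10, 9],
    "Cefadroxil": [9, 14, 8, 10, 13, 7, 9, 11, 16, 10, 12, 9],
}

summary = {}
for group, values in ages.items():
    summary[group] = {
        "n": len(values),
        "mean": round(mean(values), 2),
        "median": median(values),
        "modes": sorted(multimode(values)),  # all values tied for most frequent
    }
    print(group, summary[group])
```

The Amoxicillin group has several values tied for the highest frequency, which illustrates why the mode is a weak measure of center for numerical data.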
Let’s Do It! 5.5: Page 309, Attend Graduate School?
When do undergraduates make the decision to continue their education and attend graduate school? An undergraduate attending a four-year college with a semester system (versus a quarter system) would have a total of eight semesters of classes (excluding any summer sessions). A sample of 18 senior undergraduates who would be graduating and attending graduate school were asked the following question: "In which semester {1, 2, 3, 4, 5, 6, 7, or 8} did you decide you would continue your education and attend graduate school?" The responses are given below:
(a) Construct a frequency plot of these data.
(b) Obtain the following sample statistics for these data. Minimum = ___________ Maximum = ______________ Median = _____________ Mean = _____________
(c) How do the two measures of center, the median and the mean, compare? Select one: i. Median > Mean ii. Median < Mean iii. Median = Mean
Homework 8 • Read pages 314-340 • LDI: 5.8–5.12, 5.14 • Exercises: (page 333) 5.10, 5.11, 5.13, 5.14, 5.16, 5.18, 5.19
Measures of Variation • Now that we can measure the center of a distribution, we need to know something about the spread or variability of the data. • There are (as with the average) several popular ways of doing this measurement.
Why Measure Variation? • Consider the following plots • They both have mean of 60, but are they the same distribution?
The Range • Our first crude estimate of the variation of a data set is the range, which is simply: max – min. • Again, this measure is very limited in its ability to describe the spread in a data set.
Example • Consider these distributions: • They have the same range of 30 – 20 = 10, yet they have very different variation.
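The plotted distributions are not reproduced here, so the sketch below uses two made-up data sets (purely hypothetical values, chosen only so that both run from 20 to 30) to show how the range can hide a difference in spread.

```python
from statistics import stdev

# Hypothetical data: both sets have min 20 and max 30
clustered = [20, 25, 25, 25, 25, 25, 30]   # most values sit at the center
spread_out = [20, 20, 20, 25, 30, 30, 30]  # most values sit at the extremes

def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

print(data_range(clustered), data_range(spread_out))
print(round(stdev(clustered), 2), round(stdev(spread_out), 2))
```

Both ranges are equal, but the second set has a larger standard deviation because more of its values lie far from the center.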
Quartiles • Recall that the median is the middle number of a distribution. This means that 50% of the data will fall below this value. We can chop the data into four equal pieces by finding the median of the lower 50% and the upper 50%. These values are called the Quartiles.
Find the Quartiles for AGE • Q1 is the first quartile: 25% of the data fall below this value and 75% above it. It is the median of the data that fall below the median. • MED is the second quartile: 50% of the data fall below this value and 50% above it. • Q3 is the third quartile: 75% of the data fall below this value and 25% fall above it. It is the median of the data that fall above the median.
5-Number Summary and Boxplots • The 5-number summary is simply: Min, Q1, Med, Q3, Max • A Boxplot is a plot of these points. Draw a boxplot of the AGE data (page 283)
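The median-of-halves method for quartiles can be sketched in a few lines. The AGE data are not listed in these slides, so this example borrows the antibiotic study ages from Let's Do It! 5.4. It assumes that when n is odd the median itself is left out of both halves; other software may use a different quartile convention and give slightly different values. The function names are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) as medians of the lower and upper halves.

    Assumption: for odd n the middle value is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values below the median position
    upper = xs[half + (n % 2):]  # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

def five_number_summary(data):
    """Return (Min, Q1, Med, Q3, Max)."""
    q1, med, q3 = quartiles(data)
    return min(data), q1, med, q3, max(data)

amoxicillin = [14, 17, 11, 10, 11, 14, 9, 12, 8, 10, 9]
cefadroxil = [9, 14, 8, 10, 13, 7, 9, 11, 16, 10, 12, 9]

print("Amoxicillin:", five_number_summary(amoxicillin))
print("Cefadroxil: ", five_number_summary(cefadroxil))
```

These five values per group are the points needed to draw side-by-side boxplots by hand.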
InterQuartile Range • The InterQuartile Range or IQR is simply the difference between Q3 and Q1: IQR = Q3–Q1 Find the IQR for the AGE data.
1.5xIQR Rule • Any data value that falls more than 1.5xIQR above Q3 or more than 1.5xIQR below Q1 is considered an outlier. • Do modified boxplot of AGE data by hand • Do boxplots on TI-83
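The rule amounts to computing two fences and flagging anything outside them. The sketch below applies it to the small data set 2, 2, 2, 20, 34, 45, 210 used earlier, with quartiles found by the median-of-halves method (middle value excluded when n is odd, an assumption that may differ from other software). The function name `iqr_outliers` is hypothetical.

```python
from statistics import median

def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) under the 1.5 x IQR rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])            # median of the lower half
    q3 = median(xs[half + (n % 2):])  # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [2, 2, 2, 20, 34, 45, 210]
low, high, flagged = iqr_outliers(data)
print("fences:", low, high)
print("outliers:", flagged)
```

In a modified boxplot, the whiskers stop at the most extreme values inside the fences and each flagged value is plotted as a separate point.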
Let’s Do It • LDI 4.23 Use data to make side-by-side comparative boxplots.
Let's Do It! 5.11, Page 321, Comparing Ages – Antibiotic Study • Amoxicillin Group (n=11) AGE: 14 17 11 10 11 14 9 12 8 10 9 • Cefadroxil Group (n=12) AGE: 9 14 8 10 13 7 9 11 16 10 12 9 • Make side-by-side basic boxplots for the age data above. • Use the 1.5xIQR rule to determine if there are any outliers for either group.
Think About It • If the boxplot is symmetric, can we conclude that the distribution is symmetric?
Homework 9 • LDI: 5.17, 5.18 • Exercises: (page 342) 5.22, 5.25 • Exercises: (page 345) 5.30, 5.37, 5.47, 5.50
Standard Deviation • We want a way to measure spread based upon the mean. To do this we will find the average distance from the mean of our data. Well, actually we find the sum of the squared deviations and then divide by n – 1 and then take the square root.
Sample Standard Deviation Formula: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ • The TI-83 calculates the sample standard deviation of data (reported as Sx).
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$ • The TI-83 calculates the population standard deviation of data (reported as σx).
Find the Standard Deviation • Let’s do this small data set by hand: 1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7 • Let’s verify our result on the TI-83
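The hand calculation can be mirrored step by step in code and then checked against a library routine, much as the slide suggests verifying on the calculator. This is a sketch using Python's standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7]

n = len(data)
x_bar = sum(data) / n                            # step 1: the sample mean
squared_devs = [(x - x_bar) ** 2 for x in data]  # step 2: squared deviations
ss = sum(squared_devs)                           # step 3: sum of squares

s = sqrt(ss / (n - 1))  # sample standard deviation (divide by n - 1)
sigma = sqrt(ss / n)    # population standard deviation (divide by n)

print("n =", n, " mean =", round(x_bar, 4))
print("s =", round(s, 4), " library check:", round(stdev(data), 4))
print("sigma =", round(sigma, 4), " library check:", round(pstdev(data), 4))
```

The sample value is always a little larger than the population value for the same data because it divides by n – 1 rather than n.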