CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data
2.1 (part 3) Describing Location in a Distribution

Transforming Data
Transforming converts the original observations from the original units of measurement to another scale. Transformations can affect the shape, center, and spread of a distribution.
Effect of Adding (or Subtracting) a Constant
• Adding (or subtracting) the same number “a” to (or from) each individual observation:
  • Adds “a” to (or subtracts “a” from) measures of center and location (mean, median, quartiles, percentiles)
  • Does not change the shape of the distribution or measures of spread (range, IQR, standard deviation)

Example #1: Teachers’ Salaries
• A school system employs teachers at salaries between $30,000 and $60,000. The teachers’ union and the school board are negotiating the form of next year’s increase in the salary schedule. Suppose that every teacher is given a flat $1000 raise.
• How much will the MEAN salary increase?
• How much will the MEDIAN salary increase?
• Will a flat $1000 raise increase the spread as measured by the distance between the quartiles?
• Will a flat $1000 raise increase the spread as measured by the standard deviation of the salaries?
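One way to check your answers to Example #1 is to try the flat raise on a small data set. The sketch below uses made-up salaries (the slides do not give the actual salary list) and Python's standard `statistics` module. The helper `iqr` is a name introduced here for illustration.

```python
import statistics

# Hypothetical salaries (in dollars), invented for illustration only;
# the actual salary schedule is not given in the example.
salaries = [30000, 34000, 38500, 42000, 47000, 51000, 55500, 60000]

# Flat $1000 raise: add the same constant to every observation.
raised = [s + 1000 for s in salaries]

def iqr(values):
    """Distance between the quartiles (Q3 - Q1), default quantile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

mean_shift = statistics.mean(raised) - statistics.mean(salaries)
median_shift = statistics.median(raised) - statistics.median(salaries)
iqr_change = iqr(raised) - iqr(salaries)
sd_change = statistics.stdev(raised) - statistics.stdev(salaries)

print("Change in mean:   ", mean_shift)
print("Change in median: ", median_shift)
print("Change in IQR:    ", iqr_change)
print("Change in std dev:", sd_change)
```

With these made-up numbers, the mean and median each move by the amount of the raise, while the IQR and standard deviation do not change.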

Example #2: Estimating Room Width
Soon after the metric system was introduced in Australia, a group of students was asked to guess the width of their classroom to the nearest meter. The dotplot labeled “guess m” shows their responses. Later, the students found out that the actual width of the room was 13 meters. We can now examine the distribution of students’ guessing errors by defining a new variable as follows:
error = guess − 13
That is, we’ll subtract 13 from each observation in the data set. Try to predict what the shape, center, and spread of this new distribution will be.

From the figure below, it seems clear that subtracting 13 from each observation
• did NOT affect the shape or spread of the distribution,
• but did appear to decrease the center of the distribution by 13 meters.
The summary statistics in the table below confirm our beliefs.

Transforming Data
Transforming converts the original observations from the original units of measurement to another scale. Transformations can affect the shape, center, and spread of a distribution.
Effect of Multiplying (or Dividing) by a Constant
• Multiplying (or dividing) each observation by the same number “b”:
  • Multiplies/divides measures of center and location (mean, median, quartiles, percentiles) by “b”
  • Multiplies/divides measures of spread (range, IQR, standard deviation) by |b|
  • Does not change the shape of the distribution

Example #3: Teachers’ Salaries (revisited)
Suppose that instead of teachers getting a flat $1000 raise, they will receive a 5% raise, so the amount of the raise will vary from $1500 to $3000, depending on present salary.
• Will a 5% raise across the board increase the spread of the distribution as measured by the distance between the quartiles?
• Do you think it will increase the standard deviation?
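To explore Example #3 numerically, a 5% raise can be modeled as multiplying every salary by 1.05. The sketch below again uses made-up salaries (not data from the slides), and `iqr` is a helper name introduced here for illustration.

```python
import statistics

# Hypothetical salaries (in dollars), invented for illustration only.
salaries = [30000, 34000, 38500, 42000, 47000, 51000, 55500, 60000]

# 5% raise: multiply every observation by the same constant b = 1.05.
raised = [s * 1.05 for s in salaries]

def iqr(values):
    """Distance between the quartiles (Q3 - Q1), default quantile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Ratios of new spread to old spread.
iqr_ratio = iqr(raised) / iqr(salaries)
sd_ratio = statistics.stdev(raised) / statistics.stdev(salaries)

# Dollar amount of the raise at the bottom and top of the schedule.
smallest_raise = raised[0] - salaries[0]
largest_raise = raised[-1] - salaries[-1]

print("IQR ratio:    ", round(iqr_ratio, 4))
print("Std dev ratio:", round(sd_ratio, 4))
print("Raise range:  ", round(smallest_raise), "to", round(largest_raise))
```

Unlike the flat raise, both measures of spread grow here, each by the same factor as the salaries themselves.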

Example #4: Estimating Room Width (revisited)
• Because our group of Australian students is having some difficulty with the metric system, it may not be helpful to tell them that their guesses tended to be about 2 to 3 meters too high.
• Let’s convert the error data to feet before we report back to them. There are roughly 3.28 feet in a meter.
• So for the student whose error was −5 meters, that translates to: −5 × 3.28 = −16.4 feet.
• To change the units of measurement from meters to feet, we multiply each of the error values by 3.28.
• What effect will this have on the shape, center, and spread of the distribution?

The dotplots below show the students’ guessing errors in meters and in feet, along with summary statistics from computer software.
• The shape of the two distributions is the same: they are both skewed to the right and they are both bimodal.
• The centers and spreads of the two distributions are quite different. Can you see that the measures of center were multiplied by 3.28?
• If we multiply all the individual observations by 3.28, then the mean, median, standard deviation, IQR, and range should also be multiplied by a factor of 3.28. (** The variance would be multiplied by 3.28² ≈ 10.76.)
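The note about the variance can be checked directly. The sketch below uses a short list of made-up guessing errors (the students' actual data are not reproduced in these slides) and compares the standard deviation and variance before and after converting meters to feet.

```python
import statistics

# Hypothetical guessing errors in meters, invented for illustration only.
errors_m = [-5, -3, -1, 0, 2, 3, 4, 7, 12, 27]

FEET_PER_METER = 3.28  # approximate conversion factor used in the example

# Change units by multiplying every observation by the same constant.
errors_ft = [e * FEET_PER_METER for e in errors_m]

sd_ratio = statistics.stdev(errors_ft) / statistics.stdev(errors_m)
var_ratio = statistics.variance(errors_ft) / statistics.variance(errors_m)

print("Std dev ratio: ", round(sd_ratio, 4))
print("Variance ratio:", round(var_ratio, 4))
```

The standard deviation scales by the conversion factor, while the variance, being measured in squared units, scales by the square of that factor.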

Check your understanding: Temperature
• What happens if we transform a data set by both adding/subtracting a constant AND multiplying/dividing by a constant?
• To convert temperature data from Celsius to Fahrenheit, we use the formula F = (9/5)C + 32.
• Ex: If the mean temp is 8.43 degrees Celsius, find the mean temperature in Fahrenheit.
• Ex: If the std dev is 2.27 degrees Celsius, calculate the std dev of the temperature in Fahrenheit.
• Ex: If the 93rd percentile is 12 degrees Celsius, calculate the 93rd percentile in degrees Fahrenheit.
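After attempting the temperature questions by hand, the arithmetic can be checked with a short sketch like the one below. It assumes the standard conversion F = (9/5)C + 32. The function name `c_to_f` is introduced here for illustration.

```python
def c_to_f(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return 9 / 5 * celsius + 32

# Measures of center and location are affected by BOTH the multiplication
# and the addition, so they go through the full formula.
mean_f = c_to_f(8.43)
p93_f = c_to_f(12)

# Measures of spread are affected only by the multiplication:
# adding 32 shifts every value equally and does not change the spread.
sd_f = abs(9 / 5) * 2.27

print("Mean (F):           ", round(mean_f, 3))
print("Std dev (F):        ", round(sd_f, 3))
print("93rd percentile (F):", round(p93_f, 1))
```

The key design point is that the standard deviation is multiplied by 9/5 but never has 32 added to it.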

Connecting Transformations & Z-scores
• To standardize an observation, you subtract the mean of the distribution, and then divide by the standard deviation.
• What if we standardized EVERY observation in a distribution?
• Let’s go back to the class’s Statistics test scores, which had a mean of 80 and a std dev of 6.07. To convert the entire class’s test results to z-scores, we would subtract 80 from each test score, and then divide by 6.07.
• What effect would these transformations have on the shape, center, and spread of the distribution?
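Standardizing every observation can be tried on a small data set. The scores below are made up for illustration (they are not the class data referred to above, so their mean and standard deviation differ from 80 and 6.07); the sketch standardizes them using their own mean and standard deviation.

```python
import statistics

# Hypothetical test scores, invented for illustration only.
scores = [67, 72, 76, 76, 78, 79, 80, 81, 83, 84, 86, 90]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)

# Standardize: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mean) / sd for x in scores]

z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)

print("Mean of z-scores:   ", round(z_mean, 6))
print("Std dev of z-scores:", round(z_sd, 6))
```

Subtracting the mean moves the center to 0, and dividing by the standard deviation rescales the spread to 1. The order of the scores, and so the shape of the distribution, is unchanged.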

Check your understanding: Taxicabs
In 2010, taxicabs in NYC charged an initial fee of $2.50 plus $2 per mile. As an equation:
fare = 2.50 + 2(miles)
At the end of the month, a businessman collects all his taxicab receipts and calculates that the mean fare he paid was $15.45 and the standard deviation was $10.20.
Problem: What are the mean and standard deviation of the lengths of his cab rides in MILES?
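Once you have worked the taxicab problem by hand, a sketch like the following can confirm the arithmetic. It solves the fare equation for miles, miles = (fare − 2.50) / 2, and applies the transformation rules to the summary statistics. The variable names are chosen here for illustration.

```python
INITIAL_FEE = 2.50   # dollars, from the fare equation
RATE_PER_MILE = 2.0  # dollars per mile, from the fare equation

mean_fare = 15.45    # dollars
sd_fare = 10.20      # dollars

# Center: undo the addition, then undo the multiplication.
mean_miles = (mean_fare - INITIAL_FEE) / RATE_PER_MILE

# Spread: subtracting the initial fee has no effect; only the division matters.
sd_miles = sd_fare / abs(RATE_PER_MILE)

print("Mean ride length (miles):", round(mean_miles, 3))
print("Std dev of ride length (miles):", round(sd_miles, 2))
```

Note that the $2.50 initial fee enters the calculation of the mean but not the standard deviation.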

Describing Location in a Distribution
• FIND and INTERPRET the percentile of an individual value within a distribution of data.
• ESTIMATE percentiles and individual values using a cumulative relative frequency graph.
• FIND and INTERPRET the standardized score (z-score) of an individual value within a distribution of data.
• DESCRIBE the effect of adding, subtracting, multiplying by, or dividing by a constant on the shape, center, and spread of a distribution of data.
