BCOR 1020 Business Statistics

BCOR 1020 Business Statistics Lecture 4 – January 29, 2008

Overview • Chapter 4 – Descriptive Statistics… • Numerical Description • Central Tendency • Dispersion

Chapter 4 – Numerical Description Population (Size = N): Characterized by parameters, e.g., $\mu$ = pop. mean, $\sigma$ = pop. std. dev. Sample (Size = n): Statistics are computed from the sample and estimate the parameters, e.g., $\bar{x}$ = sample mean, S = sample std. dev. • Recall: • Statistics are descriptive measures derived from a sample (n items). • Parameters are descriptive measures derived from a population (N items).

Chapter 4 – Numerical Description There are three key characteristics of numerical data:

Chapter 4 – Numerical Description Example: Vehicle Quality • Consider the data set of vehicle defect rates from J. D. Power and Associates. • Defect rate = (total no. defects / no. inspected) × 100 • Numerical statistics can be used to summarize this random sample of brands. • Must allow for sampling error since the analysis is based on sampling.

Chapter 4 – Numerical Description • Number of defects per 100 vehicles, 2004 models.

Chapter 4 – Numerical Description • Sorted data provides insight into central tendency and dispersion.

Chapter 4 – Numerical Description Visual Displays: • The dot plot offers a visual impression of the data. • Histograms with 5 bins (suggested by Sturges’ Rule) and 10 bins are shown below. • Both are symmetric with no extreme values and show a modal class toward the low end.

Chapter 4 – Numerical Description • We can compute descriptive statistics using Excel and discuss measures of central tendency and dispersion… • Figures 4.4 and 4.5 in your text detail the Excel menus for computing descriptive statistics. • Figure 4.7 in your text details the MegaStat menus for computing descriptive statistics.

Chapter 4 – Numerical Description MegaStat output…

Chapter 4 – Central Tendency • Central tendency is the middle or typical value of a distribution. • Central tendency can be assessed using a dot plot or histogram, or more precisely with numerical statistics. • The text presents six measures of central tendency: • Mean • Median • Mode • Midrange • Geometric Mean (G) • Trimmed Mean • The mean and median are the most frequently used, but we will discuss the merits of all six.

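Chapter 4 – Central Tendency Mean ($\bar{x}$) – the sum of the data values divided by the number of observations: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • A familiar measure of central tendency. • In Excel, use function =AVERAGE(Data) where Data is an array of data values. • For the sample of n = 37 car brands: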

Chapter 4 – Central Tendency Characteristics of the Mean: • Arithmetic mean is the most familiar average. • Affected by every sample item. • The balancing point or fulcrum for the data. • Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero.
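
The balancing-point property can be checked numerically. The sketch below is only an illustration in Python (not a tool used in the lecture) and uses a small made-up data set; the signed deviations from the mean cancel out.

```python
# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 9]

x_bar = sum(data) / len(data)            # arithmetic mean
deviations = [x - x_bar for x in data]   # signed deviation of each point from the mean
total_deviation = sum(deviations)        # zero, up to floating-point rounding

print(x_bar, deviations, total_deviation)
```

Changing any single value in `data` changes `x_bar`, which is what "affected by every sample item" means in practice.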

Chapter 4 – Central Tendency Median (M) – the 50th percentile or midpoint of the sorted sample data. • Use Excel’s function =MEDIAN(Data) where Data is an array of data values. • M separates the upper and lower half of the sorted observations. • If n is even, the median is the average of the middle two observations in the data array. • If n is odd, the median is the middle observation in the data array.

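Chapter 4 – Central Tendency Median: • To compute the median by hand, sort the n observations in the data: $x_1 \le x_2 \le \dots \le x_n$, where $x_i$ is the i-th smallest observation. • For odd n, Median = $x_{(n+1)/2}$ • For even n, Median = $\frac{x_{n/2} + x_{n/2+1}}{2}$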

Chapter 4 – Central Tendency Example: • Consider the following n = 6 data values: 11 12 15 17 21 32 • What is the median? • For even n, Median = $\frac{x_{n/2} + x_{n/2+1}}{2}$ • n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4 • M = (x3 + x4)/2 = (15 + 17)/2 = 16
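
The by-hand procedure (sort, then pick the middle observation or average the middle two) can be sketched in Python. The function name `median_by_hand` is made up for this illustration; positions are 1-based on the slide and 0-based in the code, hence the index shifts.

```python
def median_by_hand(values):
    """Median via the sort-and-pick rule; `values` must be non-empty."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the observation in position (n + 1) / 2 (1-based).
        return xs[(n + 1) // 2 - 1]
    # Even n: average the observations in positions n/2 and n/2 + 1 (1-based).
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


data = [11, 12, 15, 17, 21, 32]   # the n = 6 example above
m = median_by_hand(data)
print(m)
```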

Clickers Consider the following n = 7 data values: 12 23 23 25 27 34 41 What is the median? A = 24 B = 25 C = 26 D = 27

Chapter 4 – Central Tendency Median • For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19. • So, the median is x19 = 121. • When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.

Chapter 4 – Central Tendency Characteristics of the Median • The median is insensitive to extreme data values. • For example, consider the following quiz scores for 3 students:
Tom’s scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285)
Jake’s scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380)
Mary’s scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350)
• What does the median for each student tell you?
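
The quiz-score summaries can be reproduced with Python's standard `statistics` module. This is a sketch for checking the numbers; the last two lines use a hypothetical change to Tom's lowest score to show that the median does not move when an extreme value does.

```python
from statistics import mean, median

scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# (mean, median, total) for each student
summary = {name: (mean(xs), median(xs), sum(xs)) for name, xs in scores.items()}
for name, (avg, med, total) in summary.items():
    print(f"{name}: mean={avg}, median={med}, total={total}")

# Hypothetical: Tom's lowest score drops from 20 to 0.
tom_changed = [0, 40, 70, 75, 80]
print(mean(tom_changed), median(tom_changed))   # the mean falls, the median stays at 70
```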

Chapter 4 – Central Tendency Mode – The most frequently occurring data value. • Similar to mean and median if data values occur often near the center of sorted data. • May have multiple modes or no mode. • Easy to define, not easy to calculate in large samples. • Use Excel’s function =MODE(Array) • will return #N/A if there is no mode. • will return first mode found if multimodal. • May be far from the middle of the distribution and not at all typical. • Generally isn’t useful for continuous data since data values rarely repeat. • Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
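
A minimal Python sketch of finding the mode(s) by counting. The helper name `modes` and the Likert responses are made up for illustration. Unlike Excel's =MODE, this version returns every mode and returns an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []          # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


likert = [4, 5, 3, 4, 2, 4, 5]    # hypothetical 5-point Likert responses
print(modes(likert))               # a single mode
print(modes([1, 1, 2, 2, 3]))      # two modes
print(modes([1.2, 3.4, 5.6]))      # continuous-looking data: no repeats, no mode
```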

Chapter 4 – Central Tendency Mode: • A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data. • Occurs when dissimilar populations are combined in one sample. For example,

Chapter 4 – Central Tendency Skewness: • Compare mean and median or look at histogram to determine degree of skewness. • Mean, Median & Skewness: • If median > mean, skewed left. • If median = mean, symmetric. • If median < mean, skewed right. • Mean, Mode & Skewness: • If mode > mean, skewed left. • If mode = mean, symmetric. • If mode < mean, skewed right.
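
The mean-versus-median comparison can be written as a small Python helper. This is a rule-of-thumb sketch, not a formal skewness statistic; the function name `skew_direction` and the tolerance parameter `tol` are assumptions added for this illustration. The data are the three students' quiz scores from the median slide.

```python
from statistics import mean, median


def skew_direction(values, tol=0.0):
    """Classify skewness by comparing the median with the mean."""
    avg, med = mean(values), median(values)
    if abs(avg - med) <= tol:
        return "symmetric"
    return "skewed left" if med > avg else "skewed right"


print(skew_direction([20, 40, 70, 75, 80]))   # Tom: median 70 > mean 57
print(skew_direction([60, 65, 70, 90, 95]))   # Jake: median 70 < mean 76
print(skew_direction([50, 65, 70, 75, 90]))   # Mary: median 70 = mean 70
```

With real data the mean and median are rarely exactly equal, so a nonzero `tol` (chosen by judgment) or a look at the histogram is usually needed.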

Chapter 4 – Central Tendency Midrange – the point halfway between the lowest and highest values of X: Midrange = $\frac{x_{\min} + x_{\max}}{2}$ • Easy to use but sensitive to extreme data values.
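
A short Python sketch of the midrange, using the n = 6 values from the median example. The second data set is hypothetical: it replaces the largest value with an extreme one to show how strongly the midrange reacts.

```python
def midrange(values):
    """Point halfway between the smallest and largest observation."""
    return (min(values) + max(values)) / 2


data = [11, 12, 15, 17, 21, 32]
with_outlier = [11, 12, 15, 17, 21, 320]   # hypothetical extreme value

print(midrange(data))           # (11 + 32) / 2
print(midrange(with_outlier))   # (11 + 320) / 2
```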

Clickers Consider the J. D. Power quality data (n=37): What is the midrange? A = 121 B = 122 C = 130 D = 173

Chapter 4 – Central Tendency Trimmed Mean: • To calculate the trimmed mean, first remove the highest and lowest k percent of the observations, then average the remaining values. • To determine how many observations to trim, multiply k × n, with k written as a proportion (e.g., 5% = 0.05). • Remove (k × n) observations from each end: the (k × n) highest and the (k × n) lowest. • Mitigates the effects of extreme values. • May exclude relevant data values.
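
A Python sketch of the trimming steps. Two assumptions are made here that the slide does not state: k is passed as a proportion (0.2 for 20%), and a fractional k × n is truncated to a whole number of observations per end. The function name `trimmed_mean` is made up, and Excel's own trimmed-mean function may round differently.

```python
def trimmed_mean(values, k):
    """Mean after dropping int(k * n) observations from each end of the sorted data."""
    xs = sorted(values)
    n = len(xs)
    t = int(k * n)                  # assumption: truncate k * n to a whole count
    if n - 2 * t <= 0:
        raise ValueError("trimming would remove every observation")
    kept = xs[t:n - t]              # drop the t lowest and the t highest
    return sum(kept) / len(kept)


data = [11, 12, 15, 17, 21, 32]     # the n = 6 values from the median example
print(trimmed_mean(data, 0.2))      # 0.2 * 6 = 1.2, so 1 value is trimmed per end
print(trimmed_mean(data, 0.0))      # no trimming: the ordinary mean
```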

Chapter 4 – Dispersion • Variation is the “spread” of data points about the center of the distribution in a sample. The text considers the following measures of dispersion: • Range • Variance ($S^2$) • Standard Deviation (S) • Coefficient of Variation (CV) • Mean Absolute Deviation (MAD) • The variance and standard deviation are the most frequently used, but we will briefly discuss the merits of all five.

Chapter 4 – Dispersion Range – The difference between the largest and smallest observation: Range = $x_{\max} - x_{\min}$ • Easy to calculate, but sensitive to extreme data values.
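
As a quick illustration in Python, the range separates the three students from the median slide, who all share a median of 70 but differ in spread. The helper name `data_range` is made up for this sketch.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}
ranges = {name: data_range(xs) for name, xs in scores.items()}
print(ranges)
```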

Chapter 4 – Dispersion Variance: • The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean $\mu$ divided by the population size: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ • For the sample variance ($s^2$), we divide by n – 1 instead of n: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$. Otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.
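
The difference between dividing by the number of items and dividing by n – 1 can be seen in a short Python sketch. It uses Mary's quiz scores from the median slide purely as sample numbers and checks the hand calculation against the standard library's `pvariance` (population) and `variance` (sample).

```python
from statistics import pvariance, variance

data = [50, 65, 70, 75, 90]          # Mary's quiz scores
n = len(data)
avg = sum(data) / n                  # 70

# Sum of squared deviations around the mean: 400 + 25 + 0 + 25 + 400
ss = sum((x - avg) ** 2 for x in data)

pop_var = ss / n                     # treat the data as a whole population
sample_var = ss / (n - 1)            # treat the data as a sample

print(pop_var, sample_var)
print(pvariance(data), variance(data))   # library versions of the same two quantities
```

The sample version is always the larger of the two for the same data, which is the correction for underestimation described above.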

Chapter 4 – Dispersion Standard Deviation – The square root of the variance. • Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ • Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ • Explains how individual values in a data set vary from the mean. • Units of measure are the same as X. • For the 37 vehicle quality ratings …


Chapter 4 – Dispersion Calculating Standard Deviation: • Excel’s built-in functions are… • The standard deviation is nonnegative because deviations around the mean are squared. • When every observation is exactly equal to the mean, the standard deviation is zero. • Standard deviations can be large or small, depending on the units of measure. • Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.
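
The dependence on units can be illustrated with a Python sketch. The heights below are hypothetical, recorded once in metres and once in centimetres; `statistics.stdev` computes the sample standard deviation.

```python
from statistics import stdev

heights_m = [1.5, 1.6, 1.7, 1.8]     # hypothetical heights in metres
heights_cm = [150, 160, 170, 180]    # the same heights in centimetres

sd_m = stdev(heights_m)
sd_cm = stdev(heights_cm)
ratio = sd_cm / sd_m                 # about 100: same spread, different units

constant = [5, 5, 5, 5]              # every observation equals the mean
sd_constant = stdev(constant)        # zero

print(sd_m, sd_cm, ratio, sd_constant)
```

The two standard deviations describe identical variation, yet one is 100 times the other, which is why comparisons across different units are not meaningful.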

Chapter 4 – Dispersion Coefficient of Variation – A unit-free measure of dispersion. • Expressed as a percent of the mean. • Useful for comparing variables measured in different units or with different means. • Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
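
A Python sketch of the coefficient of variation, taken here as 100 times the sample standard deviation divided by the mean (the usual definition; the slide's own formula did not survive in the text). The function name `coefficient_of_variation` is made up, and Mary's quiz scores from the median slide serve as sample numbers.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percent of the mean."""
    avg = mean(values)
    if avg <= 0:
        # Matches the slide: undefined when the mean is zero or negative.
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * stdev(values) / avg


mary = [50, 65, 70, 75, 90]
cv_mary = coefficient_of_variation(mary)
print(round(cv_mary, 2))
```

Because the units cancel in the ratio, the result is the same whether the data are recorded in points, dollars, or millimetres.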

Clickers Recall from the J. D. Power quality data (n=37): What is the Coefficient of Variation? A = 5.48% B = 18.26% C = 22.89% D = 125.38%

Chapter 4 – Dispersion Mean Absolute Deviation (MAD) – reveals the average distance from an individual data point to the mean (center of the distribution). • Uses absolute values of the deviations around the mean. • Excel’s function is =AVEDEV(Array).
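
The MAD calculation can be sketched in Python as follows. The helper name `mean_absolute_deviation` is made up; the data are Mary's and Tom's quiz scores from the median slide.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    avg = sum(values) / len(values)
    return sum(abs(x - avg) for x in values) / len(values)


mary = [50, 65, 70, 75, 90]   # mean 70; absolute deviations 20, 5, 0, 5, 20
tom = [20, 40, 70, 75, 80]    # mean 57; absolute deviations 37, 17, 13, 18, 23

mad_mary = mean_absolute_deviation(mary)
mad_tom = mean_absolute_deviation(tom)
print(mad_mary, mad_tom)
```

Taking absolute values keeps positive and negative deviations from cancelling, which is what would happen if the raw deviations were averaged.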

Chapter 4 – Dispersion Central Tendency vs. Dispersion: Manufacturing • Consider the histograms of hole diameters drilled in a steel plate during manufacturing. • Machine A: Desired mean (5 mm) but too much variation. • Machine B: Acceptable variation but mean is less than 5 mm. • The desired distribution is outlined in red.
