Business and Finance College Principles of Statistics Lecture 5 Eng. Heba Hamad week 3- 2008

Slides Prepared by JOHN S. LOUCKS St. Edward’s University

Chapter 3 Descriptive Statistics: Numerical Measures, Part B • Measures of Distribution Shape, Relative Location, and Detecting Outliers • Exploratory Data Analysis • Measures of Association Between Two Variables • The Weighted Mean and Working with Grouped Data

Measures of Distribution Shape, Relative Location, and Detecting Outliers • Distribution Shape • z-Scores • Empirical Rule • Detecting Outliers

Distribution Shape: Skewness • An important measure of the shape of a distribution is called skewness. • The formula for computing skewness for a data set is somewhat complex. • Skewness can be easily computed using statistical software.
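
The slides leave the skewness formula to software. As a rough illustration, the sketch below assumes the adjusted sample skewness that many spreadsheet and statistics packages report, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. The function name and the two small data sets are hypothetical and do not come from the lecture.

```python
import math


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3).

    Assumption: this is the formula intended by the slides; they do not state it.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


# Hypothetical data sets, for illustration only.
symmetric = [1, 2, 3, 4, 5]    # mirror image around 3
right_tail = [1, 2, 2, 3, 10]  # one large value stretches the right tail

print(sample_skewness(symmetric))   # zero for a symmetric data set
print(sample_skewness(right_tail))  # positive: skewed right
```

Negating every value in a data set flips the sign of its skewness, which matches the left-skewed and right-skewed cases on the following slides.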

Distribution Shape: Skewness • Symmetric (not skewed) • Skewness is zero. • Mean and median are equal. [Figure: relative frequency histogram, skewness = 0]

Distribution Shape: Skewness • Moderately Skewed Left • Skewness is negative. • Mean will usually be less than the median. [Figure: relative frequency histogram, skewness = -.31]

Distribution Shape: Skewness • Moderately Skewed Right • Skewness is positive. • Mean will usually be more than the median. [Figure: relative frequency histogram, skewness = .31]

Distribution Shape: Skewness • Highly Skewed Right • Skewness is positive (often above 1.0). • Mean will usually be more than the median. [Figure: relative frequency histogram, skewness = 1.25]

Distribution Shape: Skewness • Example: Apartment Rents Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed in ascending order on the next slide.

Distribution Shape: Skewness

Distribution Shape: Skewness [Figure: relative frequency histogram of the apartment rents, skewness = .92]

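z-Scores The z-score is often called the standardized value. It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \dfrac{x_i - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.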

z-Scores • An observation’s z-score is a measure of the relative location of the observation in a data set. • A data value less than the sample mean will have a z-score less than zero. • A data value greater than the sample mean will have a z-score greater than zero. • A data value equal to the sample mean will have a z-score of zero.
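
The three sign rules above can be checked with a short sketch. It assumes the sample standard deviation (divisor n - 1) is the one intended; the function name and the five data values are hypothetical.

```python
import statistics


def z_scores(data):
    """Return (x - mean) / s for each value, with s the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # divides by n - 1
    return [(x - mean) / s for x in data]


# Hypothetical sample with mean 5; two values sit exactly on the mean.
sample = [2, 4, 5, 5, 9]
z = z_scores(sample)

for x, score in zip(sample, z):
    print(f"x = {x}: z = {score:+.2f}")
```

Values below the mean get negative scores, values above it get positive scores, and the two values equal to the mean score zero.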

z-Scores • z-Score of Smallest Value (425) Standardized Values for Apartment Rents

Empirical Rule For data having a bell-shaped distribution: • 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean. • 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean. • 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
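
For a normal random variable, these shares can be computed directly from the error function. This is a minimal sketch, and the function name is hypothetical. The computed values (68.27%, 95.45%, 99.73%) differ from the slide's figures by about .01 percentage point, most likely because the slide's figures were built from rounded table entries.

```python
import math


def share_within(k):
    """Probability that a normal random variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {share_within(k):.2%}")
```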

Empirical Rule [Figure: bell-shaped curve over x with the intervals μ ± 1σ (68.26%), μ ± 2σ (95.44%), and μ ± 3σ (99.72%) marked]

Detecting Outliers • An outlier is an unusually small or unusually large value in a data set. • A data value with a z-score less than -3 or greater than +3 might be considered an outlier. • It might be: an incorrectly recorded data value; a data value that was incorrectly included in the data set; or a correctly recorded data value that belongs in the data set.
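
The |z| > 3 screening rule might be sketched as follows. The function name and the data are hypothetical (they are not the apartment rents), and the cutoff of 3 is taken from the slide. A flagged value still has to be reviewed by hand, since it may be a legitimate observation.

```python
import statistics


def find_outliers(data, cutoff=3.0):
    """Return the values whose z-score is below -cutoff or above +cutoff."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    return [x for x in data if abs((x - mean) / s) > cutoff]


# Hypothetical data: nineteen values near 50 and one suspicious entry of 120.
values = [50, 52, 49, 51, 50, 48, 53, 50, 49, 51,
          50, 52, 48, 51, 50, 49, 50, 51, 52, 120]

print(find_outliers(values))       # the 120 is flagged
print(find_outliers(values[:-1]))  # nothing is flagged once it is dropped
```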

Detecting Outliers • The most extreme z-scores are -1.20 and 2.27 • Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set. Standardized Values for Apartment Rents

Exploratory Data Analysis • Five-Number Summary • Box Plot

Five-Number Summary 1 Smallest Value 2 First Quartile 3 Median 4 Third Quartile 5 Largest Value

Five-Number Summary Lowest Value = 425 First Quartile = 445 Median = 475 Third Quartile = 525 Largest Value = 615

Five-Number Summary 1 Smallest Value = 425 2 First Quartile = 445 3 Median = 475 4 Third Quartile = 525 5 Largest Value = 615
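
The slides do not say which quartile rule produced these values, and software packages use several. The sketch below assumes one common textbook convention: compute the position i = (p/100)n in the sorted data; if i is not an integer, take the next position up, and if it is an integer, average the values at positions i and i + 1. The function names and the ten data values are hypothetical.

```python
def percentile(sorted_data, p):
    """p-th percentile by the location rule described above (assumed, not stated in the slides).

    Intended for p = 25, 50, 75 on data already sorted in ascending order.
    """
    n = len(sorted_data)
    q, r = divmod(p * n, 100)  # position i = p*n/100, split into whole part and remainder
    if r == 0:
        # Integer position: average the values at 1-based positions q and q + 1.
        return (sorted_data[q - 1] + sorted_data[q]) / 2
    # Non-integer position: round up to 1-based position q + 1.
    return sorted_data[q]


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    ordered = sorted(data)
    return (ordered[0],
            percentile(ordered, 25),
            percentile(ordered, 50),
            percentile(ordered, 75),
            ordered[-1])


# Hypothetical data set with n = 10.
scores = [24, 12, 35, 18, 41, 20, 30, 15, 27, 22]
print(five_number_summary(scores))
```

Other conventions (for example, interpolating between neighboring values) can give slightly different quartiles for the same data.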

Box Plot • A box is drawn with its ends located at the first and third quartiles. • A vertical line is drawn in the box at the location of the median (second quartile). [Figure: box plot on an axis from 375 to 625, with Q1 = 445, Q2 = 475, Q3 = 525]

Box Plot • The interquartile range is IQR = Q3 - Q1 = 525 - 445 = 80. • The lower limit is located 1.5(IQR) below Q1. Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325 • The upper limit is located 1.5(IQR) above Q3. Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645 • Data outside these limits are considered outliers. • The location of each outlier is shown with the symbol * . • There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
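
The limit arithmetic on this slide can be reproduced in a few lines. The quartiles 445 and 525 and the extreme values 425 and 615 come from the slides; the function names are hypothetical.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) located k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, lower, upper):
    """A value is an outlier if it falls outside the limits."""
    return x < lower or x > upper


lower, upper = box_plot_limits(445, 525)  # quartiles from the apartment rent example
print(lower, upper)

# The smallest (425) and largest (615) rents both fall inside the limits.
print(is_outlier(425, lower, upper), is_outlier(615, lower, upper))
```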

Box Plot • Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. Smallest value inside limits = 425 Largest value inside limits = 615 [Figure: box plot on an axis from 375 to 625, with whiskers extending to 425 and 615]

Weighted Mean • When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean. • In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade. • When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Weighted Mean $\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$ where: $x_i$ = value of observation i, $w_i$ = weight for observation i

GPA Example (A = 4, B = 3, C = 2)
Grade   Value   # Credits (Weight)   Product
A       4       4                    16
B       3       3                    9
B       3       2                    6
C       2       1                    2
Total           10                   33 (sum of above)
GPA = 33/10 = 3.3
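
The GPA calculation is a direct application of the weighted mean, with grade values as the observations and credit hours as the weights. A minimal sketch using the numbers from this slide; the function and variable names are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_values = [4, 3, 3, 2]  # A, B, B, C
credit_hours = [4, 3, 2, 1]  # weights

gpa = weighted_mean(grade_values, credit_hours)
print(gpa)  # 33 / 10
```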

Grouped Data • The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data. • To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class. • We compute a weighted mean of the class midpoints using the class frequencies as weights. • Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

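Mean for Grouped Data • Sample Data: $\bar{x} = \dfrac{\sum f_i M_i}{n}$ • Population Data: $\mu = \dfrac{\sum f_i M_i}{N}$ where: $f_i$ = frequency of class i, $M_i$ = midpoint of class i, and n (or N) = sum of the class frequencies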
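
As a small illustration of the grouped-data mean, the sketch below weights each class midpoint by its class frequency. The three classes are hypothetical and are not the apartment rent distribution; the function name is also an assumption.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean: sum of f_i * M_i divided by the total frequency."""
    n = sum(frequencies)
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n


# Hypothetical frequency distribution: classes 10-19, 20-29, 30-39.
midpoints = [14.5, 24.5, 34.5]
frequencies = [3, 5, 2]

print(grouped_mean(midpoints, frequencies))  # 235 / 10
```

The same function serves sample and population data, because both formulas divide by the total number of observations.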

Sample Mean for Grouped Data Given below is the previous sample of monthly rents for 70 efficiency apartments, presented here as grouped data in the form of a frequency distribution.

Sample Mean for Grouped Data This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data • For sample data: $s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$ • For population data: $\sigma^2 = \dfrac{\sum f_i (M_i - \mu)^2}{N}$
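
The grouped-data variance can be sketched the same way, with squared deviations of the class midpoints weighted by the class frequencies. The classes below are hypothetical (not the apartment rent distribution), and the function name and `sample` flag are assumptions made for the example.

```python
import math


def grouped_variance(midpoints, frequencies, sample=True):
    """Approximate variance from class midpoints M_i and frequencies f_i.

    Divides by n - 1 for sample data and by N for population data.
    """
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    return squared / (n - 1 if sample else n)


# Hypothetical frequency distribution: classes 10-19, 20-29, 30-39.
class_midpoints = [14.5, 24.5, 34.5]
class_frequencies = [3, 5, 2]

s2 = grouped_variance(class_midpoints, class_frequencies)                      # 490 / 9
sigma2 = grouped_variance(class_midpoints, class_frequencies, sample=False)   # 490 / 10
print(s2, math.sqrt(s2))  # approximate sample variance and standard deviation
print(sigma2)
```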

Sample Variance for Grouped Data (continued)

End of Chapter 3, Part B
