The Variance and Standard Deviation The most important measure of variability is based on deviations of individual observations about the central value. For this purpose the mean usually serves as the center.
MEASURES OF VARIABILITY • Variance • Population variance • Sample variance • Standard Deviation • Population standard deviation • Sample standard deviation • Coefficient of Variation (CV) • Sample CV • Population CV
MEASURES OF VARIABILITY POPULATION VARIANCE • The population variance is the mean squared deviation from the population mean: $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$ • Where $\sigma^2$ stands for the population variance • $\mu$ is the population mean • $N$ is the total number of values in the population • $x_i$ is the value of the i-th observation • $\sum$ represents a summation
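A minimal Python sketch of the population variance formula, dividing the sum of squared deviations by N. The function name `population_variance` and the eight-value test list are illustrative choices, not taken from the slides.

```python
def population_variance(values):
    """Mean squared deviation from the population mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n  # population mean
    return sum((x - mu) ** 2 for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative population, mean 5
print(population_variance(data))  # 4.0
```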
An example related to deviation about the central value • There are five SAT scores: 584, 613, 622, 693, 755. • The mean is (584 + 613 + 622 + 693 + 755)/5 = 653.4 • The deviation of each score is computed by subtracting the mean from the score: 755 - 653.4 = 101.6
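An example related to deviation about the central value (cont.) 693 - 653.4 = 39.6 • 622 - 653.4 = -31.4 • 613 - 653.4 = -40.4 • 584 - 653.4 = -69.4 • These deviations always sum to zero, so they cannot simply be averaged. Instead, they are summarized by a collective measure that squares each deviation, sums the squares, and divides by the number of values.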
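An example related to deviation about the central value (cont.) With the previous data, treating the five scores as the whole population, this procedure results in $$\sigma^2 = \frac{(101.6)^2 + (39.6)^2 + (-31.4)^2 + (-40.4)^2 + (-69.4)^2}{5} = \frac{19325.2}{5} = 3865.04$$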
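The SAT computation can be checked step by step in Python. This sketch assumes, as in the worked example, that the five scores are treated as the entire population (divisor N = 5); it also confirms that the raw deviations cancel, which is why they are squared.

```python
scores = [584, 613, 622, 693, 755]  # SAT scores from the example
n = len(scores)
mean = sum(scores) / n  # 653.4
deviations = [x - mean for x in scores]
print([round(d, 1) for d in deviations])  # [-69.4, -40.4, -31.4, 39.6, 101.6]
print(abs(sum(deviations)) < 1e-9)        # True: the deviations cancel
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations
print(round(ss, 2), round(ss / n, 2))     # 19325.2 3865.04
```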
Population Variance • In practice, the population variance usually cannot be computed directly because the entire population is not ordinarily observed. • An analogous measure of variability can be determined from sample data. • This is referred to as the sample variance.
MEASURES OF VARIABILITY SAMPLE VARIANCE • The sample variance is defined as follows: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$ • Where $s^2$ stands for the sample variance • $\bar{x}$ is the sample mean • $n$ is the total number of values in the sample • $x_i$ is the value of the i-th observation • $\sum$ represents a summation
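Python's standard `statistics` module provides both divisors: `variance` uses n - 1 (sample variance) and `pvariance` uses N (population variance). Applying them to the SAT scores from the earlier example shows how much the choice of divisor matters for a small sample.

```python
import statistics

scores = [584, 613, 622, 693, 755]     # SAT scores from the earlier example
s2 = statistics.variance(scores)       # sample variance, divisor n - 1
sigma2 = statistics.pvariance(scores)  # population variance, divisor N
print(s2, sigma2)                      # 4831.3 3865.04
```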
MEASURES OF VARIABILITY SAMPLE VARIANCE • Notice that the sample variance is defined as the sum of the squared deviations divided by n - 1. • The sample variance is computed to estimate the population variance. • An unbiased estimate of the population variance is obtained by dividing the sum of the squared deviations by n - 1 rather than by n. • Defining the sample variance as the mean squared deviation from the sample mean (dividing by n) tends to underestimate the population variance.
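The unbiasedness claim can be verified exactly on a toy case. This sketch assumes a hypothetical population {1, 2, 3} and samples of size 2 drawn with replacement (independent draws); it enumerates every equally likely sample and averages the two candidate estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true variance, 2/3

n = 2  # sample size; draws are with replacement

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

samples = list(product(population, repeat=n))  # all 9 equally likely samples
avg_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
print(sigma2, avg_n_minus_1, avg_n)  # 2/3 2/3 1/3
```

On average the n - 1 divisor recovers the population variance exactly, while the n divisor falls short.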
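MEASURES OF VARIABILITY SAMPLE VARIANCE • A shortcut formula for the sample variance: $$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$ • Where $s^2$ is the sample variance • $n$ is the total number of values in the sample • $x_i$ is the value of the i-th observation • $\sum$ represents a summation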
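A quick check that the shortcut formula agrees with the definition, reusing the SAT scores from the earlier example. The shortcut needs only the running sums of $x_i$ and $x_i^2$, so no deviations have to be computed.

```python
scores = [584, 613, 622, 693, 755]  # SAT scores from the earlier example
n = len(scores)

# Definitional form: squared deviations from the mean, divided by n - 1
mean = sum(scores) / n
s2_def = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Shortcut form: sum of squares minus (square of the sum) / n, divided by n - 1
sum_x = sum(scores)                  # 3267
sum_x2 = sum(x * x for x in scores)  # 2153983
s2_short = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(round(s2_def, 6), round(s2_short, 6))  # 4831.3 4831.3
```

With floating-point data whose values are large relative to their spread, the subtraction in the shortcut can lose precision, which is one reason numerical software often prefers the deviation-based form.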
MEASURES OF VARIABILITY POPULATION/SAMPLE STANDARD DEVIATION • The standard deviation is the positive square root of the variance: • Population standard deviation: $\sigma = \sqrt{\sigma^2}$ • Sample standard deviation: $s = \sqrt{s^2}$ • Compute the standard deviations of advertising and sales.
MEASURES OF VARIABILITY POPULATION/SAMPLE STANDARD DEVIATION • Compute the sample standard deviation of advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0 • Compute the sample standard deviation of sales data: 264, 116, 165, 101 and 209
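One way to work the exercise in Python is with `statistics.stdev`, which returns the square root of the n - 1 sample variance. The units in the comments follow the advertising/sales table in the association example.

```python
import math
import statistics

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # (000 $)
sales = [264, 116, 165, 101, 209]        # (000 units)

s_adv = statistics.stdev(advertising)  # sqrt of sample variance (n - 1 divisor)
s_sales = statistics.stdev(sales)
print(round(s_adv, 4), round(s_sales, 2))  # 0.6025 67.18

# The standard deviation is the square root of the sample variance
assert math.isclose(s_sales, math.sqrt(statistics.variance(sales)))
```

By hand: advertising has mean 1.64 and squared-deviation sum 1.452, so $s^2 = 0.363$; sales has mean 171 and squared-deviation sum 18054, so $s^2 = 4513.5$.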
MEASURES OF VARIABILITY POPULATION/SAMPLE CV • The coefficient of variation is the standard deviation divided by the mean: • Population coefficient of variation: $CV = \frac{\sigma}{\mu}$ • Sample coefficient of variation: $cv = \frac{s}{\bar{x}}$
MEASURES OF VARIABILITY POPULATION/SAMPLE CV • Compute the sample coefficient of variation of advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0 • Compute the sample coefficient of variation of sales data: 264, 116, 165, 101 and 209
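A small sketch of the exercise, assuming the sample definition $cv = s / \bar{x}$ expressed as a ratio rather than a percentage. The helper name `sample_cv` is hypothetical.

```python
import statistics

def sample_cv(values):
    """Sample coefficient of variation: s / x-bar (unitless ratio)."""
    return statistics.stdev(values) / statistics.mean(values)

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # (000 $)
sales = [264, 116, 165, 101, 209]        # (000 units)
print(round(sample_cv(advertising), 3), round(sample_cv(sales), 3))  # 0.367 0.393
```

Because the CV has no units, the two series can be compared directly: over these five months, sales vary slightly more relative to their mean than advertising does, even though the raw standard deviations (about 0.60 and 67.18) are on different scales. The CV is only meaningful when the mean is positive.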
MEASURES OF ASSOCIATION • A scatter diagram provides a graphical description of whether a relationship is positive or negative and linear or non-linear. • Numerical descriptions of the direction (positive/negative) and strength of a linear relationship are provided by: • Covariance • Population covariance • Sample covariance • Coefficient of correlation • Population coefficient of correlation • Sample coefficient of correlation
MEASURES OF ASSOCIATION: EXAMPLE • A sample of monthly advertising and sales data was collected, as shown below:

Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0

• What is the relationship between sales and advertising? Is it linear or non-linear, positive or negative?
POPULATION COVARIANCE • The population covariance is the mean of the products of the deviations from the population means: $$COV(X,Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$$ • Where COV(X,Y) is the population covariance • $\mu_x$, $\mu_y$ are the population means of X and Y respectively • $N$ is the total number of values in the population • $x_i$, $y_i$ are the values of the i-th observations of X and Y respectively • $\sum$ represents a summation
SAMPLE COVARIANCE • The sample covariance is the sum of the products of the deviations from the sample means, divided by n - 1: $$cov(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$ • Where cov(X,Y) is the sample covariance • $\bar{x}$, $\bar{y}$ are the sample means of X and Y respectively • $n$ is the total number of values in the sample • $x_i$, $y_i$ are the values of the i-th observations of X and Y respectively • $\sum$ represents a summation
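Applying both covariance formulas to the advertising (X) and sales (Y) data from the example. The helper `sum_cross_products` is a hypothetical name; the population version is shown only for comparison, as if the five months were the whole population.

```python
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # X, (000 $)
sales = [264, 116, 165, 101, 209]        # Y, (000 units)

def sum_cross_products(xs, ys):
    """Sum of products of paired deviations from the respective means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

n = len(advertising)
sxy = sum_cross_products(advertising, sales)  # 158.6
cov_sample = sxy / (n - 1)   # sample covariance, divisor n - 1
cov_population = sxy / n     # population formula, divisor N
print(round(cov_sample, 2), round(cov_population, 2))  # 39.65 31.72
```

The positive sign indicates that months with above-average advertising tend to have above-average sales. Python 3.10 and later also provide `statistics.covariance`, which uses the n - 1 divisor.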
POPULATION/SAMPLE COVARIANCE • If two variables tend to increase or decrease together, the covariance is a large positive number and the relationship is called positive. • If one variable tends to decrease when the other increases, and vice versa, the covariance is a large negative number and the relationship is called negative. • If two variables are unrelated, the covariance tends to be close to zero. • How large is large? How small is small?
POPULATION/SAMPLE COVARIANCE • How large is large? How small is small? A drawback of covariance is that its magnitude depends on the units in which X and Y are measured, so there is no general guideline for how large a covariance must be to indicate a strong relationship, or how small to indicate no relationship. • The coefficient of correlation overcomes this drawback to a certain extent.
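To see why covariance has no natural scale, this sketch recomputes the sample covariance after recording the same sales figures in single units instead of thousands (a hypothetical change of units). The relationship is unchanged, but the covariance grows by a factor of 1000.

```python
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]           # X, (000 $)
sales_thousands = [264, 116, 165, 101, 209]       # Y, (000 units)
sales_units = [y * 1000 for y in sales_thousands] # same Y, recorded in units

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

cov_k = sample_cov(advertising, sales_thousands)
cov_u = sample_cov(advertising, sales_units)
print(round(cov_k, 2), round(cov_u, 2))  # 39.65 39650.0
```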
POPULATION COEFFICIENT OF CORRELATION • The population coefficient of correlation is the population covariance divided by the product of the population standard deviations of X and Y: $$\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$$ • Where $\rho$ is the population coefficient of correlation • COV(X,Y) is the population covariance • $\sigma_x$, $\sigma_y$ are the population standard deviations of X and Y respectively
SAMPLE COEFFICIENT OF CORRELATION • The sample coefficient of correlation is the sample covariance divided by the product of the sample standard deviations of X and Y: $$r = \frac{cov(X,Y)}{s_x s_y}$$ • Where $r$ is the sample coefficient of correlation • cov(X,Y) is the sample covariance • $s_x$, $s_y$ are the sample standard deviations of X and Y respectively
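A sketch of the sample correlation for the advertising and sales data. Because the n - 1 divisors in cov(X,Y), $s_x$ and $s_y$ cancel, the code divides the sum of cross products by the square root of the product of the two sums of squares; the helper name `sample_corr` is hypothetical. Rescaling sales to single units leaves r unchanged, unlike the covariance.

```python
import math

advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # X, (000 $)
sales = [264, 116, 165, 101, 209]        # Y, (000 units)

def sample_corr(xs, ys):
    """r = cov(X,Y) / (s_x * s_y); the n - 1 divisors cancel."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = sample_corr(advertising, sales)
r_units = sample_corr(advertising, [y * 1000 for y in sales])  # sales in units
print(round(r, 3), round(r_units, 3))  # 0.98 0.98
```

Equivalently, r = 39.65 / (0.6025 × 67.18) ≈ 0.98. A value this close to +1 suggests a strong positive linear relationship between advertising and sales, although five observations are a very small sample.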
RELATIVE STANDING: BOX PLOTS • When the data set contains a small number of values, a box plot can be used to represent the data set graphically. These plots involve five values: • the minimum value (S) • the lower quartile (Q1) • the median (Q2) • the upper quartile (Q3) • and the maximum value (L)
RELATIVE STANDING: BOX PLOTS EXAMPLE • Example: Construct a box plot for the following data, which show the assets of the 15 largest North American banks, rounded to the nearest hundred million dollars: 111, 135, 217, 108, 51, 98, 65, 85, 75, 75, 93, 64, 57, 56, 98
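The five values needed for the box plot can be computed with `statistics.quantiles`. Quartile conventions differ between textbooks and software. This sketch assumes the default `method="exclusive"`, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 of the sorted data; a convention that includes the median in both halves would instead give Q1 = 64.5 and Q3 = 103.

```python
import statistics

# Bank assets from the example, rounded to the nearest hundred million dollars
assets = [111, 135, 217, 108, 51, 98, 65, 85, 75, 75, 93, 64, 57, 56, 98]

data = sorted(assets)
q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
summary = (data[0], q1, q2, q3, data[-1])     # (S, Q1, Q2, Q3, L)
print(summary)  # (51, 64.0, 85.0, 108.0, 217)
```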
RELATIVE STANDING: BOX PLOTS INTERPRETATION • If the median is near the center of the box, the distribution is approximately symmetric. • If the median falls to the left of the center of the box, the distribution is positively skewed. • If the median falls to the right of the center of the box, the distribution is negatively skewed. • If the lines (whiskers) on either side of the box are about the same length, the distribution is approximately symmetric. • If the line segment to the right of the box is longer than the one to the left, the distribution is positively skewed. • If the line segment to the left of the box is longer than the one to the right, the distribution is negatively skewed.
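The two reading rules can be encoded as a small function. The slides give no numeric threshold for "near the center" or "about the same length", so the relative tolerance `tol = 0.1` here is a hypothetical choice, and the whiskers are assumed to run from the box to the minimum and maximum, as in the five-value plot described above. The function name `read_box_plot` is also hypothetical.

```python
def read_box_plot(s, q1, q2, q3, l, tol=0.1):
    """Apply the median-position and whisker-length rules.

    tol is a hypothetical relative tolerance for "approximately";
    the slides give no numeric threshold.
    """
    center = (q1 + q3) / 2
    if abs(q2 - center) <= tol * (q3 - q1):
        median_rule = "approximately symmetric"
    elif q2 < center:              # median left of the box center
        median_rule = "positively skewed"
    else:                          # median right of the box center
        median_rule = "negatively skewed"

    left, right = q1 - s, l - q3   # whisker lengths
    if abs(right - left) <= tol * max(left, right):
        whisker_rule = "approximately symmetric"
    elif right > left:             # longer right whisker
        whisker_rule = "positively skewed"
    else:                          # longer left whisker
        whisker_rule = "negatively skewed"
    return median_rule, whisker_rule

# Five-number summary of the bank-assets example (exclusive quartiles)
print(read_box_plot(51, 64, 85, 108, 217))
# ('approximately symmetric', 'positively skewed')
```

For the bank data the median (85) sits almost at the box center (86), but the right whisker (109) is far longer than the left (13), so the whisker rule points to positive skew driven by the largest bank.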
EXAMPLE • Salary and expenses on cultural and sports-related activities were collected from 100 households; data for only 5 households are shown below: What are the relationships (linear/non-linear, positive/negative) between (i) salary and culture, (ii) salary and sports, and (iii) sports and culture?