Economics 173 Business Statistics

Economics 173 Business Statistics. Lectures 1 & 2 Summer, 2001 Professor J. Petry. Introduction. Purpose of Statistics is to pull out information from data “without data, ours is just another opinion” “without statistics, we are just another person on data overload”

Presentation Transcript


  1. Economics 173 Business Statistics • Lectures 1 & 2 • Summer, 2001 • Professor J. Petry

  2. Introduction • The purpose of statistics is to pull out information from data • “without data, ours is just another opinion” • “without statistics, we are just another person on data overload” • Because of its broad usage across disciplines, statistics is probably the most useful course irrespective of major. • More data, properly analyzed, allows for better decisions in personal as well as professional lives • Applicable in nearly all areas of business as well as social sciences • Greatly enhances credibility

  3. Statistics as “Tool Chest” • Different types of data allow different types of analysis
     • Quantitative data: values are real numbers; arithmetic calculations are valid
     • Qualitative data: categorical data; values are arbitrary names of possible categories; calculations involve how many observations fall in each category
     • Ranked data: categorical data; values must represent the ranked order of responses; calculations are based on an ordering process
     • Time series data: data collected across different points in time
     • Cross-sectional data: data collected at a certain point in time

  4. Statistics as “Tool Chest” • Different objectives call for alternative tool usage • Describe a single population • Compare two populations • Compare two or more populations • Analyze relationship between two variables • Analyze relationship among two or more variables • By conclusion of Econ 172 & 173, you will have about 35 separate tools to select from depending upon your data type and objective

  5. Problem objective? • Describe a single population • Compare two populations • Compare two or more populations • Analyze relationships between two variables • Analyze relationships among two or more variables

  6. Describe a single population • Data type?
     • Quantitative: type of descriptive measurement?
        • Central location: t-test & estimator of $\mu$
        • Variability: $\chi^2$-test & estimator of $\sigma^2$
     • Qualitative: number of categories?
        • Two: Z-test & estimator of $p$
        • Two or more: $\chi^2$ goodness-of-fit test
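
The single-population flowchart above can be read as a small lookup. The sketch below is only an illustration: the function name, argument names, and string labels are assumptions made for this example, not part of the lecture.

```python
def single_population_technique(data_type, measure=None, categories=None):
    """Return the technique the flowchart suggests for describing one population.

    Argument names and labels are hypothetical; they mirror the flowchart's
    questions ("Data type?", "Type of descriptive measurement?",
    "Number of categories?").
    """
    if data_type == "quantitative":
        if measure == "central location":
            return "t-test and estimator of mu"
        if measure == "variability":
            return "chi-squared test and estimator of sigma^2"
        raise ValueError("measure must be 'central location' or 'variability'")
    if data_type == "qualitative":
        if categories == 2:
            return "z-test and estimator of p"
        if categories is not None and categories > 2:
            return "chi-squared goodness-of-fit test"
        raise ValueError("categories must be an integer of at least 2")
    raise ValueError("data_type must be 'quantitative' or 'qualitative'")


print(single_population_technique("quantitative", measure="variability"))
print(single_population_technique("qualitative", categories=4))
```

The same pattern extends to the other problem objectives, with one branch per question in the corresponding flowchart.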

  7. Compare two populations • Data type?
     • Quantitative: type of descriptive measurement?
        • Central location: experimental design? (continued on the next slide)
        • Variability: F-test & estimator of $\sigma_1^2/\sigma_2^2$
     • Ranked: experimental design?
        • Independent samples: Wilcoxon rank sum test
        • Matched pairs: sign test
     • Qualitative: number of categories?
        • Two: Z-test & estimator of $p_1 - p_2$
        • Two or more: $\chi^2$-test of a contingency table

  8. Experimental design (continued)
     • Independent samples: population distribution?
        • Normal: population variances?
           • Equal: t-test & estimator of $\mu_1 - \mu_2$ (equal variances)
           • Unequal: t-test & estimator of $\mu_1 - \mu_2$ (unequal variances)
        • Nonnormal: Wilcoxon rank sum test
     • Matched pairs: distribution of differences?
        • Normal: t-test & estimator of $\mu_D$
        • Nonnormal: Wilcoxon signed rank sum test

  9. Compare two or more populations • Data type?
     • Quantitative: experimental design?
        • Independent samples: population distribution?
           • Normal: ANOVA (independent samples)
           • Nonnormal: Kruskal-Wallis test
        • Blocks: population distribution?
           • Normal: ANOVA (randomized blocks)
           • Nonnormal: Friedman test
     • Ranked: experimental design?
        • Independent samples: Kruskal-Wallis test
        • Blocks: Friedman test
     • Qualitative: $\chi^2$-test of a contingency table

  10. Analyze relationship between two variables • Data type?
     • Quantitative: population distribution?
        • Error is normal, or x and y are bivariate normal: simple linear regression and correlation
        • x and y are not bivariate normal: Spearman rank correlation
     • Ranked: Spearman rank correlation
     • Qualitative: $\chi^2$-test of a contingency table
     Analyze relationship among two or more variables • Data type?
     • Quantitative: multiple regression
     • Ranked: not covered
     • Qualitative: not covered

  11. Numerical Descriptive Measures • Measures of central location • arithmetic mean, median, mode, (geometric mean) • Measures of variability • range, variance, standard deviation, coefficient of variation • Measures of association • covariance, coefficient of correlation

  12. Measures of Central Location • Arithmetic mean: this is the most popular and useful measure of central location. Mean = (sum of the measurements) / (number of measurements)
     • Sample mean ($n$ = sample size): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
     • Population mean ($N$ = population size): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

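  13. Example • The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by $\bar{x} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = \frac{27}{6} = 4.5$ • Example • Calculate the mean of 212, -46, 52, -14, 66: $\frac{212 - 46 + 52 - 14 + 66}{5} = \frac{270}{5} = 54$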
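
As a quick check of the two examples above, here is a minimal sketch of the arithmetic mean. The helper name `arithmetic_mean` is an assumption made for this illustration.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)


first_mean = arithmetic_mean([7, 3, 9, -2, 4, 6])
second_mean = arithmetic_mean([212, -46, 52, -14, 66])
print(first_mean, second_mean)  # 4.5 54.0
```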

  14. The median • The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
     • Example 4.4: Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
        • Odd number of observations: first, sort the salaries; then locate the value in the middle. Sorted: 26, 26, 28, 29, 30, 32, 60. The median is 29.
     • Suppose one employee’s salary of $31,000 was added to the group recorded before. Find the median salary.
        • Even number of observations: first, sort the salaries; then locate the values in the middle. Sorted: 26, 26, 28, 29, 30, 31, 32, 60. There are two middle values (29 and 30), so the median is their average, 29.5.
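
The sort-then-pick-the-middle procedure can be sketched as follows. The function name `median_of` is hypothetical; Python's standard library also provides `statistics.median`, which behaves the same way on these inputs.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


salaries = [28, 60, 26, 32, 30, 26, 29]      # in 1000s, from the example
median_odd = median_of(salaries)             # seven observations
median_even = median_of(salaries + [31])     # eight observations
print(median_odd, median_even)  # 29 29.5
```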

  15. The mode • The mode of a set of measurements is the value that occurs most frequently. • A set of data may have one mode (or modal class), or two or more modes.

  16. Example • The manager of a men’s store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. • What is the modal value? 34. • This information seems valuable (for example, for the design of a new display in the store), much more than “the mean is 33.2 in.”
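
A minimal sketch of finding the mode by counting frequencies. The helper `modes_of` is an assumption for this illustration; it returns a list because a data set may have more than one mode.

```python
from collections import Counter


def modes_of(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]
waist_modes = modes_of(waist_sizes)
print(waist_modes)  # [34]
```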

  17. Relationship among Mean, Median, and Mode • If a distribution is symmetrical, the mean, median and mode coincide. • If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ. • In a positively skewed distribution (“skewed to the right”), the mean typically lies to the right of the median, and the mode to the left of it.

  18. Relationship among Mean, Median, and Mode • If a distribution is symmetrical, the mean, median and mode coincide. • If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ. • In a positively skewed distribution (“skewed to the right”), the typical order is mode, median, mean. • In a negatively skewed distribution (“skewed to the left”), the typical order is mean, median, mode.

  19. Example • A professor of statistics wants to report the results of a midterm exam taken by 100 students. He calculates the mean, median, and mode using Excel. Describe the information Excel provides. • The mean provides information about the overall performance level of the class. It can serve as a tool for making comparisons with other classes and/or other exams. • The median indicates that half of the class received a grade below 81% and half received a grade above 81%. • The mode must be used when data are qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then the mode becomes a logical measure to compute. (Excel results table not included in this transcript.)

  20. Measures of variability (Looking beyond the average) • Measures of central location fail to tell the whole story about the distribution. • A question of interest still remains unanswered: How typical is the average value of all the measurements in the data set? Or: How spread out are the measurements about the average value?

  21. Observe two hypothetical data sets • Low variability data set: the average value provides a good representation of the values in the data set. • High variability data set: the same average value does not provide as good a representation of the values in the data set as before.

  22. The range • The range of a set of measurements is the difference between the largest and smallest measurements: Range = largest measurement - smallest measurement. • Its major advantage is the ease with which it can be computed. • Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points. • But how do all the measurements spread out? The range cannot assist in answering this question.

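  23. The variance • This measure of dispersion reflects the values of all the measurements. • The variance of a population of $N$ measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ • The variance of a sample of $n$ measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$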

  24. Consider two small populations: Population A: 8, 9, 10, 11, 12; Population B: 4, 7, 10, 13, 16 • The mean of both populations is 10, but measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation. • Let us start by calculating the sum of deviations.
     • A: (8-10) + (9-10) + (10-10) + (11-10) + (12-10) = -2 - 1 + 0 + 1 + 2 = 0
     • B: (4-10) + (7-10) + (10-10) + (13-10) + (16-10) = -6 - 3 + 0 + 3 + 6 = 0
     • The sum of deviations is zero in both cases; therefore, another measure is needed.

  25. The sum of deviations is zero in both cases (A: -2 - 1 + 0 + 1 + 2 = 0; B: -6 - 3 + 0 + 3 + 6 = 0); therefore, another measure is needed. • The sum of squared deviations is used in calculating the variance.

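  26. Let us calculate the variance of the two populations: $\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$ and $\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$ • Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!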
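
The calculations for populations A and B can be sketched as follows. The helper names are assumptions for this illustration; the population variance divides by $N$, which is what `statistics.pvariance` in the standard library also does.

```python
def deviations(values):
    """Deviation of each measurement from the mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    return sum(d ** 2 for d in deviations(values)) / len(values)


population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

# The plain sum of deviations is zero for both, so it cannot rank dispersion.
sum_dev_a = sum(deviations(population_a))
sum_dev_b = sum(deviations(population_b))

var_a = population_variance(population_a)
var_b = population_variance(population_b)
print(sum_dev_a, sum_dev_b, var_a, var_b)  # 0.0 0.0 2.0 18.0
```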

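  27. Which data set has a larger dispersion? • Data set A: the values 1 and 3, each observed 5 times (mean 2) • Data set B: the values 1 and 5, each observed once (mean 3) • Data set B is more dispersed around the mean. • Let us calculate the sum of squared deviations for both data sets: SumA = $(1-2)^2 + \dots + (1-2)^2$ [5 times] $+ (3-2)^2 + \dots + (3-2)^2$ [5 times] = 10; SumB = $(1-3)^2 + (5-3)^2$ = 8. Judged by the sum alone, A would wrongly appear more dispersed. • However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked: $\sigma_A^2$ = SumA/N = 10/10 = 1; $\sigma_B^2$ = SumB/N = 8/2 = 4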

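  28. Example • Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7 • Solution • Mean: $\bar{x} = \frac{17.7}{6} = 2.95$ years • Variance, using a shortcut formula: $s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{5}\left[3.4^2 + 2.5^2 + \dots + 3.7^2 - \frac{(17.7)^2}{6}\right] = 1.075$ (years$^2$)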
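
The shortcut calculation can be checked against the definitional formula (sum of squared deviations divided by $n - 1$). This is a minimal sketch; the function names are assumptions for this illustration.

```python
import math


def sample_variance_shortcut(values):
    """s^2 = [sum(x^2) - (sum(x))^2 / n] / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x ** 2 for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_variance_definition(values):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


years = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]
sample_mean = sum(years) / len(years)
s2_shortcut = sample_variance_shortcut(years)
s2_definition = sample_variance_definition(years)
print(round(sample_mean, 2), round(s2_shortcut, 3), round(s2_definition, 3))
```

Both routes give the same variance up to floating-point rounding; the shortcut only avoids computing each deviation by hand.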

  29. The standard deviation • The standard deviation of a set of measurements is the square root of the variance of the measurements. • Example • Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk? Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05 Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4

  30. Solution • Let’s use the Excel printout that is run from the “Descriptive statistics” sub-menu. • Fund A should be considered riskier because its standard deviation is larger.
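
The Excel printout is not part of this transcript, so the comparison can be reproduced as a sketch with Python's standard library. It assumes the sample standard deviation (dividing by $n - 1$) is the one being compared, which is what `statistics.stdev` computes.

```python
import statistics

fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

# Sample standard deviation of each fund's annual returns.
sd_a = statistics.stdev(fund_a)
sd_b = statistics.stdev(fund_b)

# The fund with the larger standard deviation is treated as the riskier one.
riskier = "Fund A" if sd_a > sd_b else "Fund B"
print(f"sd A = {sd_a:.2f}, sd B = {sd_b:.2f}, riskier: {riskier}")
```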

  31. The coefficient of variation • The coefficient of variation of a set of measurements is the standard deviation divided by the mean value: $CV = \sigma / \mu$ for a population, $cv = s / \bar{x}$ for a sample. • This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
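
A minimal sketch of the proportionate comparison described above, using the two cases from the slide (standard deviation 10 with mean 100, and with mean 500). The function name is an assumption for this illustration.

```python
import math


def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean


cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)
print(cv_mean_100, cv_mean_500)  # 0.1 0.02
```

The same standard deviation is 10% of the mean in the first case and 2% in the second.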

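  32. Interpreting Standard Deviation • The standard deviation can be used to • compare the variability of several distributions • make a statement about the general shape of a distribution. • The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
     • $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
     • $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
     • $(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements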

  33. Example • The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements. • Solution • First check whether the histogram has an approximate mound shape.

  34. Calculate the mean and the standard deviation: • Mean = 10.26; Standard deviation = 4.29. • Calculate the intervals:
     Interval          Empirical rule   Actual percentage
     (5.97, 14.55)     68%              70%
     (1.68, 18.84)     95%              96.7%
     (-2.61, 23.13)    100%             100%
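
The three intervals follow directly from the reported mean and standard deviation. The sketch below rebuilds them; the function name is an assumption, and because the 30 call durations are not included in this transcript, the "actual percentage" column is not recomputed here.

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals mean ± k standard deviations for k = 1, 2, 3, rounded to 2 decimals."""
    return [
        (round(mean - k * std_dev, 2), round(mean + k * std_dev, 2))
        for k in (1, 2, 3)
    ]


intervals = empirical_rule_intervals(10.26, 4.29)
for (low, high), share in zip(intervals, ("68%", "95%", "virtually all")):
    print(f"({low}, {high}) should hold about {share} of the measurements")
```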

  35. Measures of Association • Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram. • Covariance: is there any pattern to the way two variables move together? • Correlation coefficient: how strong is the linear relationship between two variables?

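  36. The covariance • Population covariance: $COV(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$, where $\mu_x$ ($\mu_y$) is the population mean of the variable X (Y) and $N$ is the population size. • Sample covariance: $cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$, where $n$ is the sample size.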

  37. If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number. • If the two variables move in opposite directions (one increases when the other one decreases), the covariance is a large negative number. • If the two variables are unrelated, the covariance will be close to zero.

  38. The coefficient of correlation • Population: $\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$; sample: $r = \frac{cov(X,Y)}{s_x s_y}$ • This coefficient answers the question: How strong is the association between X and Y?

  39. The coefficient of correlation ($\rho$ or $r$) ranges from -1 to +1 • +1: strong positive linear relationship, COV(X,Y) > 0 • 0: no linear relationship, COV(X,Y) = 0 • -1: strong negative linear relationship, COV(X,Y) < 0

  40. If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). • If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). • No straight line relationship is indicated by a coefficient close to zero.

  41. Example Compute the covariance and the coefficient of correlation to measure how advertising expenditure and sales level are related to one another.

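  42. Use the procedure below to obtain the required summations: tabulate the columns $x$, $y$, $xy$, $x^2$, $y^2$ and total each one. (The data table is not included in this transcript.) • Similarly, $s_y = 8.839$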
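
The summation procedure can be sketched as follows. The advertising and sales figures from the lecture are not in this transcript, so the data below are hypothetical and chosen only to make the arithmetic easy to follow. The function names are also assumptions, and the formulas are the standard shortcut forms of the sample covariance and correlation.

```python
import math


def column_sums(xs, ys):
    """Totals of the x, y, xy, x^2 and y^2 columns."""
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
    }


def sample_cov_and_r(xs, ys):
    """Sample covariance and coefficient of correlation from the column totals."""
    s = column_sums(xs, ys)
    n = s["n"]
    cov = (s["sum_xy"] - s["sum_x"] * s["sum_y"] / n) / (n - 1)
    var_x = (s["sum_x2"] - s["sum_x"] ** 2 / n) / (n - 1)
    var_y = (s["sum_y2"] - s["sum_y"] ** 2 / n) / (n - 1)
    r = cov / math.sqrt(var_x * var_y)
    return cov, r


# Hypothetical data, NOT the lecture's advertising/sales figures.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

cov_xy, r_xy = sample_cov_and_r(advertising, sales)
print(f"cov = {cov_xy:.4f}, r = {r_xy:.4f}")
```

A positive covariance with a correlation coefficient near +1 would be read as a strong positive linear relationship between the two variables.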

  43. Excel printout (covariance matrix and correlation matrix) • Interpretation • The covariance (10.2679) indicates that advertisement expenditure and sales level are positively related. • The coefficient of correlation (.797) indicates that there is a strong positive linear relationship between advertisement expenditure and sales level.
