Descriptive Statistics

Prepared by Masood Amjad Khan, GCU, Lahore

Index
• Statistics (Definitions)
• Descriptive Statistics
• Inferential Statistics
• Examples of Descriptive and Inferential Statistics
• Data, Level of Measurement
• Variable
• Discrete Variable
• Continuous Variable
• Frequency Distribution
• Constructing a Frequency Distribution
• Example of Constructing a Frequency Distribution
• Displaying the Data
• Bar Chart, Pie Chart
• Stem and Leaf Plot
• Graph
• Histogram
• Frequency Polygon
• Cumulative Frequency Polygon
• Summary Measures
• Goals
• Arithmetic Mean
• Characteristics of the Mean
• Examples of Arithmetic Mean
• Weighted Mean
• Example of Weighted Mean
• Geometric Mean
• Example of Geometric Mean
• Median
• Example of Median
• Properties of Median
• Mode
• Examples of Mode
• Positions of Mean, Median and Mode
• Dispersion
• Range and Mean Deviation
• Example of Mean Deviation
• Variance

Index (continued)
• Examples of Variance
• Moments
• Examples of Moments
• Skewness
• Types of Skewness
• Coefficient of Skewness
• Example of Skewness
• Empirical Rule
• Exercise

STATISTICS (Definitions)

Numerical facts (common usage):
1. Number of children born in a hospital in some specified time.
2. Number of students enrolled in GCU in 2007.
3. Number of road accidents on the motorway.
4. Amount spent on Research and Development in GCU during 2006-2007.
5. Number of shutdowns of the computer network on a particular day.

Field or discipline of study (definition): the science of collecting, presenting, analyzing, and interpreting data to make decisions and forecasts.

Statistics divides into Descriptive Statistics and Inferential Statistics. Probability provides the transition between the two.

Descriptive Statistics
Consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.

Data
A data set is a collection of observations on one or more variables.

Organizing the Data: Tables

Frequency Distribution
A grouping of quantitative data into mutually exclusive classes showing the number of observations in each class.

Selling price of 80 vehicles
Vehicle Selling Price    Number of Vehicles
15000 to 24000           48
24000 to 33000           30
33000 to 42000           2

Frequency Table
A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class.

Preference of four types of beverage by 100 customers
Beverage     Number
Cola-Plus    40
Coca-Cola    25
Pepsi        20
7-UP         15

Displaying the Data
• Stem and Leaf Plot
• Diagrams/Charts: Bar Chart, Pie Chart
• Graphs: Histogram, Frequency Polygon

Variable
A characteristic under study that assumes different values for different elements (e.g. height of persons, number of students in GCU).

Quantitative variable
A variable that can be measured numerically is called a quantitative variable. It is either a continuous variable or a discrete variable.

Qualitative or categorical variable
A variable that cannot assume a numerical value but can be classified into two or more non-numeric categories is called a qualitative or categorical variable.
• Educational achievements
• Marital status
• Brand of PC

Continuous variable
A variable whose observations can assume any value within a specific range.
• Amount of income tax paid.
• Weight of a student.
• Yearly rainfall in Murree.
• Time elapsed between successive network breakdowns.

Discrete variable
A variable that can assume only certain values, with gaps between the values.
• Children in a family
• Strokes on a golf hole
• TV sets owned
• Cars arriving at GCU in an hour
• Students in each section of a statistics course

Inferential Statistics
Consists of methods that use sample results to help make decisions or predictions about a population.

Methods of Inferential Statistics
• Estimation: Point Estimation, Interval Estimation
• Testing of Hypothesis

Sample
1. A portion of the population selected for study.
2. A subset of data selected from a population.

Population
1. Consists of all individual items or objects whose characteristics are being studied.
2. A collection of data that describe some phenomenon of interest.

A population may be finite or infinite.

Examples
• Length of fish in a particular lake.
• Number of students of the Statistics course in BCS.
• Number of traffic violations on some specific holiday.
• Depth of a lake from any conceived position.
• Length of life of a certain brand of light bulb.
• Stars in the sky.

Examples of Descriptive and Inferential Statistics

Descriptive
• At least 5% of all fires reported last year in Lahore were deliberately set.
• Next to colonial homes, more residents in a specified locality prefer a contemporary design.

Inferential
• As a result of a recent poll, most Pakistanis are in favor of an independent and powerful parliament.
• As a result of recent cutbacks by the oil-producing nations, we can expect the price of gasoline to double in the next year.

Types of Data: Level of Measurement
• Data can be classified according to level of measurement.
• The level of measurement dictates the calculations that can be done to summarize and present the data.
• It also determines the statistical tests that should be performed.

Nominal: data may only be classified.
• Jersey numbers of football players.
• Make of car.

Ordinal: data are ranked; no meaningful difference between values.
• Your rank in class.
• Team standings.

Interval: meaningful difference between values.
• Temperature
• Dress size

Ratio: meaningful 0 point and ratio between values.
• Number of patients seen
• Number of sales calls made
• Distance students travel to class

Diagrams/Charts

Bar Chart
A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.

Pie Chart
A chart that shows the proportion or percent that each class represents of the total number of frequencies. The angle of each slice is Angle = (f/n) × 360, where n is the total frequency.

Class     f       Angle
White     130     36
Black     104     29
Lime      325     90
Orange    455     126
Red       286     79
n =       1300    360
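
The pie-chart angles in the table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name pie_angles is made up for this example, and the frequencies are the ones listed on the slide.

```python
# Class frequencies taken from the slide's colour table.
frequencies = {"White": 130, "Black": 104, "Lime": 325, "Orange": 455, "Red": 286}

def pie_angles(freqs):
    """Return the angle in degrees of each slice: Angle = (f / n) * 360."""
    n = sum(freqs.values())  # total frequency
    return {label: f / n * 360 for label, f in freqs.items()}

angles = pie_angles(frequencies)
for label, angle in angles.items():
    # Print class, frequency, and angle rounded to one decimal place.
    print(f"{label:<8}{frequencies[label]:>6}{angle:>8.1f}")
```

Rounding each angle to the nearest degree gives the whole-number angles shown in the table.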

Graphs
• Histogram
• Frequency Polygon
• Cumulative Frequency Polygon

Describing the Data: Summary Measures

Measures of Location
• Arithmetic Mean
• Weighted Arithmetic Mean
• Geometric Mean
• Median
• Mode

Measures of Dispersion
• Range, Mean Deviation
• Variance, Standard Deviation

Moments
• Moments about Origin
• Moments about Mean

Skewness

Summary Measures: Goals
• Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean.
• Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
• Identify the position of the mean, median, and mode for both symmetric and skewed distributions.
• Compute and interpret the range, mean deviation, variance, and standard deviation.
• Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
• Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations.

Characteristics of the Mean
The arithmetic mean is the most widely used measure of location. It requires at least the interval scale. It is calculated by summing the values and dividing by the number of values.
• Every set of interval-level and ratio-level data has a mean.
• All the values are included in computing the mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small data values.
• The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.

Selecting a Sample: Use of Tables of Random Numbers
• Random numbers are randomly produced digits from 0 to 9.
• A table of random numbers contains rows and columns of these randomly produced digits.
• In using the table:
  • choose the starting point at random;
  • read off the digits in groups containing one, two, three, or more digits in any predetermined direction (rows or columns).

Example
• Choose a sample of size 7 from a group of 80 objects.
• Label the objects 01, 02, 03, …, 80 in any order.
• Arbitrarily enter the table on any line and read out the pairs of digits in any two consecutive columns.
• Ignore numbers which recur and those greater than 80.
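
The table-reading procedure can be imitated in Python by generating random digits instead of reading them from a printed table. This is a rough sketch under stated assumptions: the function name sample_from_digit_pairs and the seed value are invented for the illustration, the pair 00 is skipped because no object carries that label, and the approach only works for populations of at most 99 objects.

```python
import random

def sample_from_digit_pairs(population_size, sample_size, rng):
    """Mimic reading two-digit groups from a table of random numbers."""
    chosen = []
    while len(chosen) < sample_size:
        # Two random digits read side by side form a number from 00 to 99.
        pair = rng.randint(0, 9) * 10 + rng.randint(0, 9)
        # Ignore 00, numbers above the population size, and repeats.
        if 1 <= pair <= population_size and pair not in chosen:
            chosen.append(pair)
    return chosen

rng = random.Random(2024)  # arbitrary seed, only to make the run repeatable
labels = sample_from_digit_pairs(population_size=80, sample_size=7, rng=rng)
print(labels)
```

The returned labels identify which of the 80 objects enter the sample.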

Construction of Frequency Distribution

Step 1: Decide on the number of groups (classes).
• Use just enough classes to reveal the shape of the distribution.
• Let k be the desired number of classes.
• k should be such that $2^k > n$.
• If n = 80 and we choose k = 6, then $2^6 = 64$, which is < 80, so k = 6 is not desirable. If we take k = 7, then $2^7 = 128$, which is > 80, so the number of classes should be 7.

Step 2: Determine the class interval (width).
• The class interval should be the same for all classes.
• The formula to determine class width is $i \ge \frac{H - L}{k}$, where i is the class width, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
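
Steps 1 and 2 can be sketched in Python as follows. The helper names number_of_classes and class_width are hypothetical, the values of n, H, and L are those of the vehicle selling-price example in this deck, and rounding the width up to the next multiple of 1000 is an assumption made only to arrive at a convenient number.

```python
import math

def number_of_classes(n):
    """Smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(high, low, k):
    """Minimum class width (H - L) / k."""
    return (high - low) / k

n, high, low = 80, 35925, 15546          # vehicle selling-price example
k = number_of_classes(n)                 # number of classes
i_min = class_width(high, low, k)        # smallest admissible width
i_rounded = math.ceil(i_min / 1000) * 1000  # assumed rounding to the next 1000

print(k, round(i_min), i_rounded)
```

With these inputs the sketch yields 7 classes, a minimum width of about 2911, and a rounded width of 3000.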

Construction of Frequency Distribution (continued)

Step 3: Set the individual class limits.
• Class limits should be very clear.
• Class limits should not overlap.
• Sometimes the class width is rounded, which may make the classes cover more than the range H - L.
• Make the lower limit of the first class a multiple of the class width.

Step 4: Tally the observations falling in each class.

Step 5: Count the number of items in each class (class frequency).

Construction of Frequency Distribution (Example)
Raw Data (Ungrouped Data)
23197 23372 20454 23591 24220 30655 22442 17891 18021 28683
30872 19587 21639 24296 15935 21558 20047 24285 24324 24609
26651 29076 20642 19889 19873 25251 28034 23169 28337 17399
20895 25277 20004 17357 20155 19688 28670 20818 19766 21981
20203 23765 25783 26661 24533 27453 32492 17968 25799 18263
23657 35851 20633 24052 15794 20642 20356 21442 21722 19331
32277 15546 29237 18890 20962 22845 26285 27896 35925 27443
17266 23613 21740 22374 24571 25449 22817 26613 19251 20445

Construction of Frequency Distribution (Example, continued)
• Following Step 1, with n = 80, k should be 7.
• Following Step 2, the class width should be at least (35925 - 15546)/7 ≈ 2911.
• The width is usually rounded up to a multiple of 10 or 100.
• The width is taken as i = 3000.
• Following Step 3, with i = 3000 and k = 7, the classes cover 7 × 3000 = 21000,
• whereas the actual range is H - L = 35925 - 15546 = 20379.
• The lower limit of the first class should be a multiple of the class width.
• Thus the lower limit of the starting class is taken as 15000.
• Following Step 4 and Step 5:

Selling Price          Frequency
15000 up to 18000      8
18000 up to 21000      23
21000 up to 24000      17
24000 up to 27000      18
27000 up to 30000      8
30000 up to 33000      4
33000 up to 36000      2
Total                  80

Histogram
A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

Example 1, k = 6
Group        H      cf    f
1.6 - 2.2    2.1    2     2
2.2 - 2.8    2.7    6     4
2.8 - 3.4    3.3    19    13
3.4 - 4.0    3.9    32    13
4.0 - 4.6    4.5    38    6
4.6 - 5.2    5.1    40    2

[Figure: Histogram (Example 1); horizontal axis: Groups]

Histogram (continued)

Example 1, k = 7
Group        H      cf    f
1.5 - 2.0    2      2     2
2.0 - 2.5    2.5    4     2
2.5 - 3.0    3      9     5
3.0 - 3.5    3.5    24    15
3.5 - 4.0    4      32    8
4.0 - 4.5    4.5    38    6
4.5 - 5.0    5      40    2

[Figure: Histogram (Example 1); horizontal axis: Groups; vertical axis: Percent]

Frequency Polygon
A graph in which the points formed by the intersections of the class midpoints and the class frequencies are connected by line segments.
Midpoint = $(L_i + H_i)/2$

Example 1, k = 6
Group        Mid pt    cf    f
1.6 - 2.2    1.9       2     2
2.2 - 2.8    2.5       6     4
2.8 - 3.4    3.1       19    13
3.4 - 4.0    3.7       32    13
4.0 - 4.6    4.3       38    6
4.6 - 5.2    4.9       40    2

[Figure: Frequency Polygon (Example 1); points plotted at the class midpoints; vertical axis: Percent]

Frequency Polygon (continued)

Example 1, k = 7
Group        Mid pt    cf    f
1.5 - 2.0    1.75      2     2
2.0 - 2.5    2.25      3     1
2.5 - 3.0    2.75      7     4
3.0 - 3.5    3.25      22    15
3.5 - 4.0    3.75      32    10
4.0 - 4.5    4.25      37    5
4.5 - 5.0    4.75      40    3

[Figure: Frequency Polygon (Example 1); points plotted at the class midpoints; vertical axis: Percent]

Cumulative Frequency Polygon
A graph in which the points formed by the intersections of the class midpoints and the class cumulative frequencies are connected by line segments. A cumulative frequency polygon portrays the number or percent of observations below a given value.

Example 1, k = 6
Group        Mid pt    cf    f
1.6 - 2.2    1.9       2     2
2.2 - 2.8    2.5       6     4
2.8 - 3.4    3.1       19    13
3.4 - 4.0    3.7       32    13
4.0 - 4.6    4.3       38    6
4.6 - 5.2    4.9       40    2
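
The cf column is simply the running total of the f column. A minimal Python sketch using the k = 6 frequencies from the table above (the variable names are chosen only for this illustration):

```python
from itertools import accumulate

groups = ["1.6 - 2.2", "2.2 - 2.8", "2.8 - 3.4", "3.4 - 4.0", "4.0 - 4.6", "4.6 - 5.2"]
f = [2, 4, 13, 13, 6, 2]          # class frequencies from the table

cf = list(accumulate(f))          # running totals = cumulative frequencies
n = cf[-1]                        # total number of observations
percent = [100 * c / n for c in cf]  # cumulative percent, for a percent-scaled polygon

for group, freq, cum, pct in zip(groups, f, cf, percent):
    print(f"{group:<12}{freq:>4}{cum:>5}{pct:>8.1f}")
```

The last cumulative frequency equals the number of observations, which is a quick check that no class was missed.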

Cumulative Frequency Polygon (continued)

Example 1, k = 7
Group        Mid pt    cf    f
1.5 - 2.0    1.75      2     2
2.0 - 2.5    2.25      3     1
2.5 - 3.0    2.75      7     4
3.0 - 3.5    3.25      22    15
3.5 - 4.0    3.75      32    10
4.0 - 4.5    4.25      37    5
4.5 - 5.0    4.75      40    3

Stem and Leaf Plot

What Is a Stem and Leaf Plot Diagram?
• A Stem and Leaf Plot is a type of graph that is similar to a histogram but shows more information.
• It summarizes the shape of a set of data.
• It provides extra detail regarding individual values.
• The data are arranged by place value.
• The digits in the largest place are referred to as the stem.
• The digits in the smallest place are referred to as the leaf.
• The leaves are always displayed to the right of the stem.

What Are They Used For?
• Stem and Leaf Plots are good organizers for large amounts of information.
• Series of scores on sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of when Stem and Leaf Plots could be used.

Constructing a Stem and Leaf Plot
Make a Stem and Leaf Plot with the following temperatures for June.
77 80 82 68 65 59 61 57 50 62
61 70 69 64 67 70 62 65 65 73
76 87 80 82 83 79 79 71 80 77

• Begin with the lowest temperature. The lowest temperature of the month was 50. Enter the 5 in the tens column and a 0 in the ones.
• The next lowest is 57. Enter a 7 in the ones.
• Next is 59; enter a 9 in the ones.
• Find all of the temperatures that were in the 60's, 70's and 80's.
• Enter the rest of the temperatures sequentially until your Stem and Leaf Plot contains all of the data.

Temperature
Stem (Tens)    Leaf (Ones)
5              0 7 9
6              1 1 2 2 4 5 5 5 7 8 9
7              0 0 1 3 6 7 7 9 9
8              0 0 0 2 2 3 7
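
The same construction can be sketched in Python. The function name stem_and_leaf is hypothetical, and the sketch assumes whole-number data with the tens digit as stem and the ones digit as leaf, as in the June temperatures.

```python
from collections import defaultdict

temperatures = [
    77, 80, 82, 68, 65, 59, 61, 57, 50, 62,
    61, 70, 69, 64, 67, 70, 62, 65, 65, 73,
    76, 87, 80, 82, 83, 79, 79, 71, 80, 77,
]

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); the ones digits are the leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(temperatures)
for stem in sorted(plot):
    # Leaves are written to the right of the stem.
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Sorting the data first means the leaves on each row come out in increasing order, matching the sequential entry described in the steps.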

Stem and Leaf Example
Make a Stem and Leaf Plot for the following data.
2.4 0.7 3.9 2.8 1.3 1.6 2.9 2.6 3.7 2.1
3.2 3.5 1.8 3.1 0.3 4.6 0.9 3.4 2.3 2.5
0.4 2.1 2.3 1.5 4.3 1.8 2.4 1.3 2.6 1.8
2.7 0.4 2.8 3.5 1.4 1.7 3.9 1.1 5.9 2.0
5.3 6.3 0.2 2.0 1.9 1.2 2.5 2.1 1.2 1.7

Stem and Leaf Plot Example
Following are the car battery life data. Make a Stem and Leaf Plot.


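Measures of Location: Arithmetic Mean

Ungrouped Data
• Population: N observations $X_1, X_2, \ldots, X_N$ in the population. The mean is defined as $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$.
• Sample: n observations $X_1, X_2, \ldots, X_n$ in the sample. The mean is defined as $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$.

Grouped Data
• Population: Let $X_i$ and $f_i$ be the midpoint and frequency respectively of the ith group in the population. The mean is defined as $\mu = \frac{\sum f_i X_i}{\sum f_i}$.
• Sample: Let $X_i$ and $f_i$ be the midpoint and frequency respectively of the ith group in the sample. The mean is defined as $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$.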

Numerical Examples of Arithmetic Mean (Ungrouped Data)

Example of Sample Mean
Following is a random sample of 12 clients showing the number of minutes each client used on a particular cell phone last month. What is the mean number of minutes used?

Example of Population Mean
There are automobile manufacturing companies in the U.S.A. Listed below is the number of patents granted by the US Government to each company. Is this information a sample or a population?

Numerical Examples of Arithmetic Mean (Grouped Data)
Following is the frequency distribution of selling prices of vehicles at Whitner Autoplex last month. Find the arithmetic mean.

Selling Price     Frequency    Midpoint
($ thousands)     f            X           fX
15 - 18           8            16.5        132.0
18 - 21           23           19.5        448.5
21 - 24           17           22.5        382.5
24 - 27           18           25.5        459.0
27 - 30           8            28.5        228.0
30 - 33           4            31.5        126.0
33 - 36           2            34.5        69.0
Total             80                       1845.0

The mean is 1845.0 / 80 ≈ 23.1 (in $ thousands).
So the mean vehicle selling price is about $23100.
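
The grouped-mean calculation in the table can be checked with a short Python sketch. The function name grouped_mean and the (lower, upper, frequency) tuple layout are choices made for this illustration; class midpoints stand in for the individual values, so the result is an approximation of the ungrouped mean.

```python
# (lower limit, upper limit, frequency) in $ thousands, from the table above.
classes = [
    (15, 18, 8), (18, 21, 23), (21, 24, 17), (24, 27, 18),
    (27, 30, 8), (30, 33, 4), (33, 36, 2),
]

def grouped_mean(classes):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

mean_thousands = grouped_mean(classes)
print(f"Mean selling price: about {mean_thousands:.2f} thousand dollars")
```

The unrounded value is 23.0625 thousand dollars, which the slide reports to the nearest hundred dollars.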

Point of Equilibrium
An object is balanced at $\bar{X}$ when $\sum (X_i - \bar{X}) = 0$.

Summary Measures: Weighted Mean
• A special case of the arithmetic mean.
• It arises when the values of the variable are associated with certain qualities (e.g. the price of a medium, large, or big drink) and occur with different weights.

The weighted mean of a set of numbers $X_1, X_2, \ldots, X_n$, with corresponding weights $w_1, w_2, \ldots, w_n$, is computed from the following formula:
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$

Soft Drink    Price    Weights
Medium        $0.90    3
Large         $1.25    4
Big           $1.50    3

EXAMPLE: Weighted Mean
The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
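
One way to work the two weighted-mean examples is sketched below in Python. The function name weighted_mean is an illustrative choice; the rates, prices, and weights are the ones given in the slides.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Carter Construction: hourly rates weighted by the number of employees.
carter_rate = weighted_mean([16.50, 19.00, 25.00], [14, 10, 2])

# Soft drink example: prices weighted by the listed weights.
drink_price = weighted_mean([0.90, 1.25, 1.50], [3, 4, 3])

print(f"Mean hourly rate: {carter_rate:.2f}")
print(f"Weighted mean drink price: {drink_price:.2f}")
```

For the Carter example the numerator is 14(16.50) + 10(19.00) + 2(25.00) = 471 and the denominator is 26, giving a mean hourly rate of about $18.12.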

Summary Measures: Geometric Mean
The geometric mean of a set of n positive numbers is defined as the nth root of the product of the n values. The formula for the geometric mean is written:
$GM = \sqrt[n]{X_1 X_2 \cdots X_n}$

The geometric mean used as the average percent increase over n periods is calculated as:
$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1$

• Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
• It has wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.
• The geometric mean will always be less than or equal to the arithmetic mean.

Example of Geometric Mean
The return on investment by a certain company for four successive years was 30%, 20%, -40%, and 200%. Find the geometric mean rate of return on investment.

Solution: The 1.3 represents the 30 percent return on investment, i.e. the original investment of 1.0 plus the return of 0.3. So
$GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = \sqrt[4]{2.808} \approx 1.294$
which shows that the average return is 29.4 percent.

If you earned $30000 in 1997 and $50000 in 2007, what is your annual rate of increase over the period?
$GM = \sqrt[10]{\frac{50000}{30000}} - 1 \approx 0.0524$
The annual rate of increase is 5.24 percent.
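
Both geometric-mean calculations can be checked with the following Python sketch. The function names are invented for the illustration, and math.prod assumes Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))

def average_rate_of_increase(start, end, periods):
    """(end / start) ** (1 / periods) - 1, the average growth rate per period."""
    return (end / start) ** (1 / periods) - 1

# Returns of 30%, 20%, -40%, and 200% expressed as growth factors.
growth_factors = [1.3, 1.2, 0.6, 3.0]
gm = geometric_mean(growth_factors)

# Earnings of 30000 in 1997 and 50000 in 2007: n = 10 yearly periods.
rate = average_rate_of_increase(30000, 50000, 10)

print(f"Average return: {(gm - 1) * 100:.1f} percent")
print(f"Annual rate of increase: {rate * 100:.2f} percent")
```

Note that the -40% return enters as the factor 0.6, which keeps every value positive as the definition requires.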

Median
The median is the midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest.

If the number of observations n is odd, the median is the ((n+1)/2)th observation. If n is even, the median is the average of the (n/2)th and (n/2 + 1)th observations.

Example: Determine the median for each set of data.
1) 41 15 39 54 31 15 33
2) 15 16 27 28 41 42

Arrange each set of data in order:
1) 15 15 31 33 39 41 54: n = 7, so the median is the 4th observation, that is 33.
2) 15 16 27 28 41 42: n = 6, so the median is the average of the 3rd and 4th observations, that is (27 + 28)/2 = 27.5.

Median for Grouped Data
The median is obtained by using the formula:
$\text{Median} = L_m + \frac{I_m}{f_m}\left(\frac{n}{2} - cf_{m-1}\right)$
where m is the group containing the (n/2)th observation; $L_m$, $I_m$, and $f_m$ are the lowest value, class width, and frequency of the mth group; and $cf_{m-1}$ is the cumulative frequency of the group before it.
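
The odd/even rule for ungrouped data can be written out directly in Python. This is a small sketch; the function name median is chosen here, and Python's standard library offers statistics.median for the same purpose.

```python
def median(values):
    """Median of ungrouped data using the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation (index mid, counting from 0).
        return ordered[mid]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([41, 15, 39, 54, 31, 15, 33]))
print(median([15, 16, 27, 28, 41, 42]))
```

The two calls use the data sets from the example and return the 4th ordered value and the average of the 3rd and 4th ordered values respectively.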

Example (Median)
Find the median for the following data.
n/2 = 20, so the median group is 3.40 - 4.00.
$L_m = 3.40$, $I_m = 0.6$, $f_m = 13$, $cf_{m-1} = 19$
Median = 3.40 + (0.6/13)(20 - 19) ≈ 3.45
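
The grouped-median calculation can be sketched in Python as follows. The function name grouped_median is hypothetical, and the frequencies are assumed to be those of the k = 6 table of Example 1, which match the values quoted here (13 observations in 3.4 - 4.0 with 19 below it).

```python
def grouped_median(lower_limits, width, freqs):
    """Median = L_m + (I_m / f_m) * (n / 2 - cf_{m-1}) for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            # This class contains the (n / 2)th observation.
            return lower + (width / f) * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("median class not found")

lower_limits = [1.6, 2.2, 2.8, 3.4, 4.0, 4.6]
freqs = [2, 4, 13, 13, 6, 2]   # assumed: k = 6 table of Example 1
med = grouped_median(lower_limits, 0.6, freqs)
print(f"Median: {med:.2f}")
```

The loop stops at the first class whose cumulative frequency reaches n/2, which is the median group described above.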

Properties of the Median
• There is a unique median for each data set.
• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• It can be computed for ratio-level, interval-level, and ordinal-level data.
• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

Mode
The mode is the value of the observation that appears most frequently.


Mode: Grouped Data
Calculating the mode for grouped data: calculate the mode of the following distribution.
Solution: The modal group is 2.8 - 3.4, with $f_m = 14$, $f_{m-1} = 4$, $f_{m+1} = 12$, and $I_m = 0.6$.
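
The slide's own formula and result are not preserved in this text. The quantities listed are the ones used by the common interpolation formula for the mode of grouped data, Mode = L_m + (f_m - f_{m-1}) / ((f_m - f_{m-1}) + (f_m - f_{m+1})) × I_m, so the Python sketch below assumes that formula; the function name grouped_mode is also an assumption.

```python
def grouped_mode(lower, width, f_m, f_prev, f_next):
    """Interpolated mode of grouped data (assumed standard formula)."""
    d1 = f_m - f_prev   # excess of the modal frequency over the class before
    d2 = f_m - f_next   # excess of the modal frequency over the class after
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower + d1 / (d1 + d2) * width

# Modal group 2.8 - 3.4 with f_m = 14, f_{m-1} = 4, f_{m+1} = 12, I_m = 0.6.
mode = grouped_mode(lower=2.8, width=0.6, f_m=14, f_prev=4, f_next=12)
print(f"Mode: {mode:.2f}")
```

Under that assumption the mode works out to 2.8 + (10/12)(0.6) = 3.3, which lies inside the modal group and closer to the neighbour with the larger frequency.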
