DESCRIPTIVE STATISTICS

STATISTICAL MEASUREMENT OF DATA • Location (central tendency) • Dispersion (spread) • Skewness (symmetry) • Kurtosis (peakedness)

MEASURES OF LOCATION • Arithmetic mean • Geometric mean • Harmonic mean • Median • Percentiles • Mode

ARITHMETIC MEAN (DISCRETE DATA) $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the ith observation and n is the number of observations.

ARITHMETIC MEAN (GROUPED DATA) $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$, where $x_i$ is the MCV (the midpoint of the ith class), $f_i$ is the frequency of the ith class, and n is the number of classes.
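Both versions of the arithmetic mean can be sketched in Python. The raw data reuse the seven observations from the median example in this deck; the grouped table (classes 10-14, 15-19, 20-24) is hypothetical and chosen only for illustration.

```python
def arithmetic_mean(observations):
    """Mean of raw (discrete) data: sum of the observations divided by n."""
    return sum(observations) / len(observations)


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f_i * x_i divided by the sum of f_i."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(f * x for x, f in zip(midpoints, frequencies))
    return weighted_sum / total_frequency


raw_mean = arithmetic_mean([27, 13, 62, 5, 44, 29, 16])  # 196 / 7 = 28.0

# Hypothetical classes 10-14, 15-19, 20-24 with midpoints 12, 17, 22.
table_mean = grouped_mean([12, 17, 22], [3, 5, 2])  # 165 / 10 = 16.5
```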

GEOMETRIC MEAN The geometric mean is used where relative changes (especially percentages) are being considered.

HARMONIC MEAN The harmonic mean is used when the data consist of rates such as prices ($/kg), speeds (km/h) or production (output/man-hour).
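No formula for the harmonic mean survives on this slide. The standard definition is $H = n / \sum_{i=1}^{n} (1/x_i)$, and a minimal Python sketch of it follows. The two speeds are hypothetical and assume equal distances travelled at each speed.

```python
def harmonic_mean(rates):
    """Harmonic mean: n divided by the sum of the reciprocals of the rates."""
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")
    return len(rates) / sum(1 / r for r in rates)


# Hypothetical trip: the same distance covered at 60 km/h and at 40 km/h.
average_speed = harmonic_mean([60, 40])  # 2 / (1/60 + 1/40), i.e. 48 km/h
```

The arithmetic mean of the two speeds would give 50 km/h, which overstates the average because more time is spent travelling at the lower speed.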

GEOMETRIC MEAN $G = \sqrt[n]{x_1 x_2 \cdots x_n}$, where n is the number of observations.
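The geometric mean is the nth root of the product of the n observations. A short Python sketch, using hypothetical yearly growth factors (+10%, +20%, -10%) as the relative changes:

```python
from math import prod


def geometric_mean(values):
    """Geometric mean: nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return prod(values) ** (1 / len(values))


# Hypothetical growth factors for three consecutive years.
growth_factors = [1.10, 1.20, 0.90]
average_factor = geometric_mean(growth_factors)  # about 1.059 per year
```

Applying the average factor three times reproduces the overall change of 1.10 x 1.20 x 0.90 = 1.188, which the arithmetic mean of the factors would not.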

MEDIAN The median is the middle observation of a set of data arranged in ascending or descending order, i.e., it divides the set of data into two equal parts in terms of the number of observations.

MEDIAN The rank of the median is given by $\frac{n + 1}{2}$, where n is the total number of observations.

MEDIAN (DISCRETE DATA) When n is odd, the median is the middle observation. When n is even, the median is the average or midpoint of the two middle observations.

MEDIAN (DISCRETE DATA) Example 1: 27, 13, 62, 5, 44, 29, 16. Rearranged: 5, 13, 16, 27, 29, 44, 62. Rank of median = (7 + 1) / 2 = 4. Median = 27.

MEDIAN (DISCRETE DATA) Example 2: 5, 13, 16, 27, 29, 44. Rank of median = (6 + 1) / 2 = 3.5, so the median lies midway between the 3rd and 4th observations. Median = (16 + 27) / 2 = 21.5.
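The rank rule from the two examples can be sketched in Python. The function name is illustrative; the data are the ones used on the slides.

```python
def median_by_rank(observations):
    """Median located through its rank (n + 1) / 2 in the ordered data."""
    ordered = sorted(observations)
    n = len(ordered)
    rank = (n + 1) / 2              # 1-based rank of the median
    lower = ordered[int(rank) - 1]  # observation at the whole part of the rank
    if rank.is_integer():           # n odd: the rank points at one observation
        return lower
    upper = ordered[int(rank)]      # n even: average the two middle observations
    return (lower + upper) / 2


example_1 = median_by_rank([27, 13, 62, 5, 44, 29, 16])  # rank 4   -> 27
example_2 = median_by_rank([5, 13, 16, 27, 29, 44])      # rank 3.5 -> 21.5
```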

MEDIAN (GROUPED DATA) In this case, the value of the median can only be estimated, since the individual observations are not known from the frequency distribution.

MEDIAN (GROUPED DATA) We proceed as follows: • Determine the rank of the median • Locate the cell in which the median is found • Use linear interpolation or simple proportion to evaluate the median

MEDIAN (GROUPED DATA) The method of linear interpolation assumes that the observations within each cell are evenly spread or uniformly distributed.

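MEDIAN (GROUPED DATA) $\text{Median} = L + \frac{r}{f} \times c$, where L is the lower boundary of the median cell, f is its frequency, c is its width, and r is the rank of the median in its cell. The rank r is obtained by taking the overall rank of the median and subtracting the cumulative frequency of the previous cell.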

MEDIAN (E.g. GROUPED DATA) n = 50. Rank of median = (50 + 1) / 2 = 25.5. Location of median: cell ‘15 – 19’.

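MEDIAN (E.g. GROUPED DATA) Ranks: 25, 25.5, 41. Values: 14.5, Q2, 19.5. By simple proportion, (Q2 – 14.5) / (19.5 – 14.5) = (25.5 – 25) / (41 – 25), so Q2 = 14.5 + 5(0.5 / 16) ≈ 14.66.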
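The interpolation in this example can be checked with a few lines of Python. The cell boundaries (14.5 and 19.5) and cumulative frequencies (25 and 41) are the ones shown on the slide; the function and parameter names are illustrative.

```python
def grouped_median(lower_boundary, cell_width, cell_frequency, cumulative_before, n):
    """Estimate the median by linear interpolation inside the median cell."""
    rank = (n + 1) / 2                     # overall rank of the median
    rank_in_cell = rank - cumulative_before
    return lower_boundary + (rank_in_cell / cell_frequency) * cell_width


# Cell '15 - 19': boundaries 14.5 to 19.5 (width 5), 41 - 25 = 16 observations,
# 25 observations in the cells before it, n = 50 in total.
q2 = grouped_median(14.5, 5, 16, 25, 50)  # 14.5 + (0.5 / 16) * 5 = 14.65625
```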

PERCENTILES Percentiles are statistics which divide a distribution into 100 equal parts in terms of the number of observations. The best-known special cases are the quartiles (the 25th, 50th and 75th percentiles) and the deciles (every 10th percentile).

QUARTILES The rank of the first or lower quartile (Q1) is given by $\frac{n + 1}{4}$. The rank of the third or upper quartile (Q3) is given by $\frac{3(n + 1)}{4}$, where n is the total number of observations.

QUARTILES (DISCRETE DATA) Example 1: 27, 13, 62, 5, 44, 29, 16. Rearranged: 5, 13, 16, 27, 29, 44, 62. Rank of Q1 = (7 + 1) / 4 = 2, so Q1 = 13. Rank of Q3 = 3(7 + 1) / 4 = 6, so Q3 = 44.

QUARTILES (DISCRETE DATA) Example 2: 5, 13, 16, 27, 29, 44. Rank of Q1 = (6 + 1) / 4 = 1.75, so Q1 = 5 + 0.75(13 – 5) = 11. Rank of Q3 = 3(6 + 1) / 4 = 5.25, so Q3 = 29 + 0.25(44 – 29) = 32.75.
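The two quartile examples follow the same recipe: compute the rank, then interpolate between neighbouring observations when the rank is fractional. A Python sketch of that recipe, assuming at least three observations so that both ranks fall inside the data (the helper names are illustrative):

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional, rank in sorted data."""
    whole = int(rank)
    fraction = rank - whole
    value = ordered[whole - 1]
    if fraction == 0:
        return value
    # Move the fractional part of the way towards the next observation.
    return value + fraction * (ordered[whole] - value)


def quartiles(observations):
    """Return (Q1, Q3) using the ranks (n + 1) / 4 and 3(n + 1) / 4."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three observations")
    q1 = value_at_rank(ordered, (n + 1) / 4)
    q3 = value_at_rank(ordered, 3 * (n + 1) / 4)
    return q1, q3


example_1 = quartiles([27, 13, 62, 5, 44, 29, 16])  # ranks 2 and 6
example_2 = quartiles([5, 13, 16, 27, 29, 44])      # ranks 1.75 and 5.25
```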

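PERCENTILES $P_k = L + \frac{r_k}{f} \times c$, where L is the lower boundary of the cell containing the kth percentile, f is the frequency of that cell, c is its width, and $r_k$ is the rank of the kth percentile in its cell. The rank $r_k$ is obtained by taking the overall rank of the percentile and subtracting the cumulative frequency of the previous cell.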
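A sketch of how a percentile might be estimated from a whole frequency table in Python. Two things here are assumptions rather than content from the slides: the overall rank of the kth percentile is taken as k(n + 1) / 100, by analogy with the median and quartile ranks, and the frequency table is hypothetical apart from the cell 15-19, which is made to match the median example (16 observations, 25 before it, n = 50).

```python
def grouped_percentile(k, lower_boundaries, cell_width, frequencies):
    """Estimate the kth percentile of grouped data by linear interpolation."""
    n = sum(frequencies)
    rank = k * (n + 1) / 100          # assumed overall rank of the percentile
    cumulative = 0                    # cumulative frequency of the previous cells
    for lower, frequency in zip(lower_boundaries, frequencies):
        if rank <= cumulative + frequency:
            rank_in_cell = rank - cumulative
            return lower + (rank_in_cell / frequency) * cell_width
        cumulative += frequency
    raise ValueError("rank falls beyond the last cell")


# Hypothetical cells 5-9, 10-14, 15-19, 20-24 (boundaries 4.5, 9.5, 14.5, 19.5).
boundaries = [4.5, 9.5, 14.5, 19.5]
frequencies = [10, 15, 16, 9]

p50 = grouped_percentile(50, boundaries, 5, frequencies)  # same as the median
p25 = grouped_percentile(25, boundaries, 5, frequencies)  # falls in cell 10-14
```

With this table the 50th percentile reproduces the grouped median of about 14.66 found earlier.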

PERCENTILES Percentiles can be estimated from a cumulative frequency ogive by interpolation.

MODE (DISCRETE DATA) The mode is the observation which occurs most often, i.e. which has the highest frequency. It can easily be located by visual inspection. NOTE: If more than one observation shares the highest frequency, we say that there are several modes; alternatively, we may say that there is no mode.
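Locating the mode amounts to counting frequencies and keeping every observation that reaches the highest count. A small Python sketch, with hypothetical data, that returns all modes so that ties are visible:

```python
from collections import Counter


def modes(observations):
    """Return every observation that has the highest frequency, in sorted order."""
    counts = Counter(observations)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


single_mode = modes([2, 3, 3, 5, 7, 3, 5])  # 3 occurs three times -> [3]
two_modes = modes([1, 1, 2, 2, 3])          # 1 and 2 tie -> [1, 2]
```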

MODE (GROUPED DATA) In this case, we talk about a modal class, which is the class with the highest frequency. A rough approximation for a single value of the mode is the MCV of the modal class. A better estimate of the mode can be obtained by using a formula or from a histogram.

MODE (GROUPED DATA) A useful empirical formula for estimating the mode is Mode ≈ mean – 3(mean – median), which holds approximately for moderately skewed unimodal distributions.

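MODE (GROUPED DATA) $\text{Mode} = L + \frac{f_1}{f_1 + f_2} \times c$, where L is the lower boundary of the modal class, c is its width, $f_1$ is the difference in frequencies between the modal class and the class preceding it, and $f_2$ is the difference in frequencies between the modal class and the class immediately after it.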
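The interpolation formula for the mode uses only the modal class and its two neighbours. A Python sketch follows; the frequencies (15, 16, 9) and the modal class 15-19 are hypothetical, and the function assumes the modal class is neither the first nor the last class.

```python
def grouped_mode(lower_boundary, class_width, f_before, f_modal, f_after):
    """Estimate the mode from the modal class and its neighbouring classes."""
    f1 = f_modal - f_before   # excess over the preceding class
    f2 = f_modal - f_after    # excess over the following class
    if f1 + f2 == 0:
        raise ValueError("modal class does not stand out from its neighbours")
    return lower_boundary + (f1 / (f1 + f2)) * class_width


# Hypothetical modal class 15-19 (boundaries 14.5 to 19.5, width 5).
mode_estimate = grouped_mode(14.5, 5, f_before=15, f_modal=16, f_after=9)
# f1 = 1, f2 = 7, so the estimate is 14.5 + (1 / 8) * 5 = 15.125
```

The estimate is pulled towards the neighbour with the larger frequency, here the preceding class.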

MODE (GROUPED DATA) We can also use a histogram to find the mode. We simply represent the modal class and the classes preceding it and immediately after it.

MEASURES OF DISPERSION • Range • Quartile deviation • Standard deviation • Coefficient of variation

RANGE (DISCRETE DATA) The range is the numerical difference between the maximum and the minimum observations.

RANGE (GROUPED DATA) The range is the numerical difference between the upper cell limit of the last cell and the lower cell limit of the first cell.

QUARTILE DEVIATION The quartile deviation or semi-interquartile range is defined as $\frac{Q_3 - Q_1}{2}$. This quantity is not affected by outliers and extreme values, since it depends only on the middle half of the data.
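The contrast between the range and the quartile deviation can be illustrated in Python. The first data set is the one from the quartile example; the second replaces its largest value with a hypothetical outlier (620). Quartiles are located with the (n + 1) / 4 rank rule, and the helper names are illustrative.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional, rank in sorted data."""
    whole = int(rank)
    fraction = rank - whole
    value = ordered[whole - 1]
    if fraction == 0:
        return value
    return value + fraction * (ordered[whole] - value)


def spread_summary(observations):
    """Return the range and the quartile deviation of raw data (n >= 3)."""
    ordered = sorted(observations)
    n = len(ordered)
    q1 = value_at_rank(ordered, (n + 1) / 4)
    q3 = value_at_rank(ordered, 3 * (n + 1) / 4)
    return {
        "range": ordered[-1] - ordered[0],
        "quartile_deviation": (q3 - q1) / 2,
    }


original = spread_summary([5, 13, 16, 27, 29, 44, 62])
with_outlier = spread_summary([5, 13, 16, 27, 29, 44, 620])
```

The range jumps from 57 to 615 when the outlier is introduced, while the quartile deviation stays at (44 - 13) / 2 = 15.5.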

STANDARD DEVIATION AND VARIANCE The standard deviation is the positive square root of the variance. All formulae are given in terms of the variance, which is equal to the square of the standard deviation.

STANDARD DEVIATION (discrete data) The standard deviation is the most widely used measure of spread since it can be used for further statistical processing.

STANDARD DEVIATION (grouped data) with the usual definitions of xi and fi.
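The variance formulae themselves are not preserved on these slides. The Python sketch below assumes the population convention (dividing by n for raw data and by the sum of the frequencies for grouped data); a sample-based convention would divide by n - 1 instead. Both data sets are hypothetical.

```python
from math import sqrt


def variance(observations):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(observations)
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / n


def grouped_variance(midpoints, frequencies):
    """Population variance of grouped data, weighting each midpoint by f_i."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / total


raw_sd = sqrt(variance([2, 4, 4, 4, 5, 5, 7, 9]))        # variance 4.0, SD 2.0

# Hypothetical classes 10-14, 15-19, 20-24 with midpoints 12, 17, 22.
grouped_sd = sqrt(grouped_variance([12, 17, 22], [3, 5, 2]))  # variance 12.25, SD 3.5
```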

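COEFFICIENT OF VARIATION The coefficient of variation expresses the standard deviation as a percentage of the mean: $CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$. Its purpose is to compare dispersions in various distributions.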
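A brief Python illustration of why a relative measure is needed when distributions have different means. The coefficient of variation is computed as the standard deviation expressed as a percentage of the mean; the two sets of summary figures are hypothetical.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * standard_deviation / mean


# Hypothetical distributions: B has the larger standard deviation in absolute
# terms, but A is more dispersed relative to its mean.
cv_a = coefficient_of_variation(standard_deviation=5, mean=50)    # 10.0 %
cv_b = coefficient_of_variation(standard_deviation=10, mean=200)  # 5.0 %
```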

SKEWNESS Skewness is a measure of asymmetry. It indicates whether there is a concentration of low or high observations. A distribution in which most observations are low, with a long tail towards high values, is positively skewed, whereas one in which most observations are high, with a long tail towards low values, displays negative skewness.

SKEWNESS A distribution which is symmetrical has zero skewness (e.g. the Normal distribution).

MEASURE OF SKEWNESS Coefficient of skewness:
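The coefficient shown on this slide is not preserved. Two commonly used choices are Pearson's coefficients, (mean - mode) / standard deviation and 3(mean - median) / standard deviation; whether the slide used either of them is an assumption. A Python sketch of both, using the population standard deviation and the seven observations from the median example:

```python
from statistics import mean, median, pstdev


def pearson_median_skewness(observations):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(observations) - median(observations)) / pstdev(observations)


def pearson_mode_skewness(mean_value, mode_value, standard_deviation):
    """Pearson's first coefficient: (mean - mode) / standard deviation."""
    return (mean_value - mode_value) / standard_deviation


data = [5, 13, 16, 27, 29, 44, 62]          # mean 28, median 27
skewness = pearson_median_skewness(data)     # small positive value
```

A positive result corresponds to a mean above the median (positive skewness), a negative result to a mean below it, and zero to symmetry.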

POSITIVE SKEWNESS Mode < Median (Q2) < Mean

NEGATIVE SKEWNESS Mean < Median (Q2) < Mode

ZERO SKEWNESS (SYMMETRY) Mean = Median = Mode

KURTOSIS Kurtosis indicates the degree of ‘peakedness’ of a unimodal frequency distribution. It usually indicates the extent to which a curve (distribution) departs from the bell-shaped or normal curve.

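KURTOSIS Platykurtic: flatter than the normal curve.

KURTOSIS Mesokurtic: similar in peakedness to the normal curve.

KURTOSIS Leptokurtic: more sharply peaked than the normal curve.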
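The slides give no formula for kurtosis. One common measure, used here as an assumption, is the moment coefficient m4 / m2^2, where m2 and m4 are the second and fourth moments about the mean; it equals 3 for the normal curve. A Python sketch with two hypothetical data sets and an arbitrary tolerance for the classification:

```python
def moment_kurtosis(observations):
    """Moment coefficient of kurtosis: fourth central moment over variance squared."""
    n = len(observations)
    mean = sum(observations) / n
    m2 = sum((x - mean) ** 2 for x in observations) / n
    m4 = sum((x - mean) ** 4 for x in observations) / n
    return m4 / m2 ** 2


def classify_kurtosis(coefficient, tolerance=0.05):
    """Label a coefficient relative to the normal-curve value of 3."""
    if abs(coefficient - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if coefficient > 3 else "platykurtic"


flat = moment_kurtosis([1, 2, 3, 4, 5])                    # 6.8 / 4 = 1.7
peaked = moment_kurtosis([-3, 0, 0, 0, 0, 0, 0, 0, 3])     # 18 / 4 = 4.5
```

Evenly spread values give a coefficient below 3 (platykurtic), while values concentrated at the centre with a few far from it give a coefficient above 3 (leptokurtic).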
