STATISTICS PNP COMPTROLLERSHIP COURSE
Statistics • The term has two meanings. • Statistics (singular) is the science of collecting, organizing, analyzing, and interpreting information. • Statistics (plural) are numbers calculated from a set or collection of information.
General Categories • Descriptive Statistics comprises those methods used to organize and describe information that has been collected. • Inferential Statistics involves the theory of probability and comprises those methods and techniques for making generalizations, predictions, or estimates about the population by using limited information.
Organizing Data • Data are the building blocks of statistics. • They are generally categorized as quantitative or qualitative. • They are also classified according to the type of measurement scale used such as: • Nominal scale • Ordinal scale • Interval scale • Ratio scale
Nominal Scale • Nominal scale exists for both quantitative and qualitative data. • Nominal scale for quantitative data assigns numbers to categories to distinguish one from another, such as basketball jersey numbers, postal zip codes, and telephone numbers. • Nominal scale for qualitative data is an unordered grouping of data into discrete categories where each datum can go into only one group, such as sex, blood type, or religion.
Ordinal Scale • Data measured on a nominal scale that are ordered in some fashion are referred to as ordinal data. Examples: • Letter grades such as A, B, C, D, and F • Ranks such as Inspector, Sr Inspector, Chief Inspector • Residence number • Performance ratings such as Poor, Fair, Good • Grades in school such as 1, 2, 3, and so on.
Interval Scale • Data measured on an ordinal scale for which distances between values can be calculated are called interval data. • The distance between two values is relevant. • Interval data are necessarily quantitative. • An interval scale does not necessarily have a true zero point, that is, a point which indicates the absence of what we are measuring.
Example • IQ test scores. We can say an IQ score of 180 is higher than an IQ score of 90. We can also say that it is 90 points higher. But we cannot say that a person with an IQ score of 180 is twice as smart as a person with an IQ score of 90. Likewise, a given difference between two IQ scores does not always have the same meaning. For example, 100 – 90 and 150 – 140 may have different interpretations even though both differences equal 10.
Another Example • Celsius temperature. A temperature of 80 degrees C is 40 degrees warmer than a temperature of 40 degrees C. But it is not correct to say that 80 degrees C is twice as warm as 40 degrees C. Note that 0 degrees C does not represent the absence of heat or zero heat. The absence of heat is represented by 0 kelvin, equivalent to -273.15 degrees C.
Ratio Scale • Data measured on an interval scale with a zero point meaning “none” are called ratio data. Because the zero point of the Celsius scale does not represent the absence of heat, the Celsius scale is not a ratio scale. The Kelvin scale is a ratio scale. Examples of other ratio scales are those commonly used to measure units such as feet, meters, pounds, and pesos. The results of counting objects are also ratio data.
Organizing Data Using Tables • The objective of organizing data is to arrange a set of data into useful form in order to reveal essential features and simplify certain analyses. • Data that are not organized in some fashion are called raw data. • One method of arranging data is to construct an ordered array; that is arranging data from low to high (or high to low). • If the number of data is large, the data may be difficult to manage, thus tables are often used as a general approach to organizing raw data.
Ungrouped Frequency Tables • The frequency of a measurement or category is the total number of times the measurement or category occurs in a collection of data. The symbol f is used to denote the frequency of a measurement. • For example, sample data representing the number of free throws missed by a basketball team during the last 7 games: 7 2 8 4 2 7 2
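As a minimal sketch, the ungrouped frequency table for the free-throw data above can be built in Python with the standard-library Counter. The variable names (missed, frequency) are illustrative choices, not part of the course material.

```python
from collections import Counter

# Free throws missed in the last 7 games (data from the example above).
missed = [7, 2, 8, 4, 2, 7, 2]

# Counter tallies how many times each measurement x occurs.
frequency = Counter(missed)

# Print the table in ascending order of the measurement x.
print("x  f")
for x in sorted(frequency):
    print(f"{x}  {frequency[x]}")
```

The measurement 2 occurs three times, 7 twice, and 4 and 8 once each, so the frequencies add up to the 7 games observed.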
Tally Marks • For a very large number of data, an intermediate step is to count observations through the use of tally marks to aid in determining the frequency f for each observation. • Corresponding to each observation we place a tally mark in a tally column. • After all tallies are placed, they are counted for each measurement x to determine the frequency.
Grouped Frequency Tables • A grouped frequency table shows frequencies according to groups or classes of measurements. • For example, a memorial hospital wants to study whether its emergency room staffing is adequate. To start the study, the manager tracks down the number of people visiting the emergency room each day for a 12-day period with result as:
Steps • The manager constructs six groupings or classes, the first class representing 1-10 patients; the second class, 11-20 patients; 3rd class, 21-30 patients; 4th class, 31-40; 5th class, 41-50; and, the 6th class, 51-60. • For the first class, the lower class limit is 1 while the upper class limit is 10. The rest of the classes will have a similar pattern of lower and upper limits. • Tally the number of patients that fall within each class. • Construct the grouped frequency table.
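The steps above can be sketched in Python as follows. The twelve daily counts in this sketch are HYPOTHETICAL values invented purely for illustration, because the hospital's actual figures are not reproduced in this text; only the six class limits come from the steps themselves.

```python
# HYPOTHETICAL daily patient counts for 12 days (illustration only;
# these are not the hospital's actual figures).
visits = [8, 23, 35, 41, 17, 29, 52, 38, 26, 14, 33, 47]

# Class limits from the steps above: 1-10, 11-20, ..., 51-60.
classes = [(lower, lower + 9) for lower in range(1, 61, 10)]

# Tally the number of observations that fall within each class.
table = {limits: 0 for limits in classes}
for v in visits:
    for lower, upper in classes:
        if lower <= v <= upper:
            table[(lower, upper)] += 1
            break

# Print the grouped frequency table.
for (lower, upper), f in table.items():
    print(f"{lower:>2}-{upper:<2}  f = {f}")
```

Because the classes do not overlap and together cover 1 to 60, every observation lands in exactly one class and the frequencies sum to 12.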
Basic Guidelines • Each class should have the same width. • No two classes should overlap. • Each piece of data should belong to a class.
Class Boundaries and Class Widths • Class boundaries determine class widths. • Class boundaries for a grouped frequency table are determined by considering the unit or precision of measurement. • The lower class boundary of a class interval is located one-half unit below the lower class limit. The upper class boundary is one-half unit above the upper class limit. • The class width w for any class interval is found by subtracting the lower class boundary from the upper class boundary, thus: w = l2 – l1 where: l1 is the lower class boundary; and, l2 is the upper class boundary for each class interval.
Basic Rules in Constructing a Grouped Frequency Table • How many classes should be used? • What should be the width of each class? • At what value should the first class start? • How is the class mark or midpoint computed?
Basic Rules in Constructing a Grouped Frequency Table • For the number of classes, Sturges' rule: c = 3.3(log n) + 1, where log is the base-10 logarithm and n is the number of observations. • For width, the rule is w = R/c where R is the range computed by subtracting the smallest measurement L from the largest measurement U; thus, R = U – L. • The lower limit of the first class should be near and at most as large as the smallest measurement L. • The class mark X, or midpoint, is computed by adding the lower class limit a and the upper class limit b and dividing the sum by 2; thus, X = (a + b) / 2
Relative Frequency Table • It is sometimes useful to express each value or class in a frequency table as a fraction of the total observations. • The relative frequency of a class is found by dividing the frequency f by the total number of observations n. • The table that describes the relative frequencies is then called a relative frequency table.
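A small sketch of the boundary and width rules, applied to the 1-10 class from the emergency room example. The function name class_boundaries and its unit parameter are hypothetical helpers introduced here, assuming measurements are recorded to the nearest whole unit.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (l1, l2): boundaries one-half unit beyond the class limits."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

# First class of the emergency room example: limits 1 and 10.
l1, l2 = class_boundaries(1, 10)

# Class width w = l2 - l1.
w = l2 - l1
print(l1, l2, w)  # 0.5 10.5 10.0
```

Note that the width is 10, not 9: subtracting the class limits (10 – 1) would understate it, which is why the boundaries are used.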
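The three rules can be sketched as small Python functions. This assumes the logarithm in Sturges' rule is base 10; the function names are illustrative, and the choice of c = 3 in the width example is arbitrary.

```python
import math

def sturges_classes(n):
    """Suggested number of classes c = 3.3 * log10(n) + 1."""
    return 3.3 * math.log10(n) + 1

def class_width(largest, smallest, c):
    """Width w = R / c, where the range R = U - L."""
    return (largest - smallest) / c

def class_mark(a, b):
    """Midpoint X = (a + b) / 2 of lower limit a and upper limit b."""
    return (a + b) / 2

# 12 observations, as in the 12-day emergency room study.
c = sturges_classes(12)
print(round(c, 2))

# Free-throw data: U = 8, L = 2, with c = 3 chosen arbitrarily.
print(class_width(8, 2, 3))

# Class mark of the class 1-10.
print(class_mark(1, 10))
```

In practice the value from Sturges' rule is rounded to a whole number of classes, and the width is usually rounded up to a convenient value.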
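As an illustration, the relative frequencies for the free-throw data used earlier can be computed as follows. The variable names are illustrative only.

```python
from collections import Counter

# Free throws missed in the last 7 games (data from the earlier example).
missed = [7, 2, 8, 4, 2, 7, 2]

frequency = Counter(missed)
n = len(missed)

# Relative frequency = f / n for each measurement x.
relative = {x: f / n for x, f in sorted(frequency.items())}

for x, rf in relative.items():
    print(f"{x}  {rf:.3f}")
```

The relative frequencies always sum to 1 (or 100% when expressed as percentages), which is a quick check on the table.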
Cumulative Frequency Table • There are many occasions when we are interested in the number of observations less than or equal to some value. Example: A teacher may want to know the number of students who got a score of less than or equal to 70% on an examination. The cumulative frequency will answer that. • The cumulative frequency for any measurement or class is the total of the frequency for that measurement or class and the frequencies of all measurements or classes of smaller value.
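A minimal sketch of a cumulative frequency table for the free-throw data used earlier, using itertools.accumulate to keep a running total. The variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Free throws missed in the last 7 games (data from the earlier example).
missed = [7, 2, 8, 4, 2, 7, 2]

frequency = Counter(missed)
values = sorted(frequency)

# Running total of the frequencies, from the smallest measurement upward.
cumulative = dict(zip(values, accumulate(frequency[x] for x in values)))

for x in values:
    print(f"{x}  f = {frequency[x]}  cumulative f = {cumulative[x]}")
```

The cumulative frequency for 7 is 6, meaning the team missed 7 or fewer free throws in 6 of the 7 games; the last cumulative frequency always equals n.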
Cumulative Relative Frequency Table • Cumulative frequency tables can also be constructed for tables containing relative frequencies or percentages. • The procedures are identical to those used for cumulative frequency tables except that relative frequencies or percentages are used. • Cumulative relative frequencies have many uses. One is in scoring standardized tests through the percentile method. A percentile score tells what part of the tested population scored lower. For example, if 50 is said to be the 90th percentile in an examination, it means that 90% of the scores were lower than 50.
Example • A final examination result has the following data. • In constructing the frequency table, assume c = 5.
Graphical Representation of Data: A Bar Graph
Graphical Representation of Data: A Pie Graph
Graphical Representation of Data: A Line Graph
Measures of Central Tendencies • The first characteristic of a set of data that we want to measure is the center or central tendency. The purpose is to summarize a collection of data to obtain a general overview that will serve as a representative for the rest of the data. • Common Measures of Central Tendencies: • Mean • Median • Mode • Midrange
Mean • The mean or arithmetic average is found by adding the numbers and then dividing the sum by the number of observations n. A sample mean is denoted by: x̄ = Σx / n • A population mean is denoted by: μ = Σx / N • The mean for grouped data, where X is the class mark of each class and f its frequency: x̄ = Σ(f X) / Σf
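As a sketch, both forms of the mean can be checked against each other using the free-throw data from the frequency table example. For an ungrouped frequency table, X is the measurement itself, so the two results agree exactly; for a grouped table, X would be the class mark and the result would be an approximation.

```python
from collections import Counter

# Free throws missed in the last 7 games (data from the earlier example).
missed = [7, 2, 8, 4, 2, 7, 2]

# Mean from the raw data: sum of x divided by n.
mean = sum(missed) / len(missed)

# Mean from the frequency table: sum of f * X divided by sum of f.
frequency = Counter(missed)
mean_from_table = (
    sum(f * x for x, f in frequency.items()) / sum(frequency.values())
)

print(round(mean, 3), round(mean_from_table, 3))  # 4.571 4.571
```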
Disadvantage of the Mean • The mean as a measure of center has a disadvantage. It is affected by the extreme measurements on one end of a distribution. It depends on the value of every measurement and extreme values can lead to the mean misrepresenting the data. • In this case, the median may provide a better measure than the mean inasmuch as it is not affected by the extreme values.
Median • In general, the median is found by first ranking the data. • If there is an odd number of observations, then the median is the number in the middle of the distribution. • If the number of observations is even, then the median is computed by adding the two numbers found in the middle positions and dividing the sum by 2.
Mode • The mode, if it exists, is the most frequent measurement or observation. • The mode has the advantage of being easily found, especially in small samples, and is usually not influenced by extreme measurements on one end of an ordered set of data. • Example: In an array of data arranged as follows: 1, 2, 3, 3, 3, 4, 5, and 6, the mode is 3.
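The two cases can be sketched as a short function. The function is a plain illustration of the rule above (Python's standard library also provides statistics.median), and the even-sized example simply uses the first six games of the free-throw data.

```python
def median(data):
    """Median of a non-empty list: rank the data, then take the middle."""
    ranked = sorted(data)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ranked[middle]
    # Even n: average of the two middle values.
    return (ranked[middle - 1] + ranked[middle]) / 2

missed = [7, 2, 8, 4, 2, 7, 2]
print(median(missed))      # 4   (ranked: 2 2 2 4 7 7 8)
print(median(missed[:6]))  # 5.5 (ranked: 2 2 4 7 7 8)
```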
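A minimal sketch for finding the mode. Treating a data set in which every value occurs only once as having no mode is an assumption made here to reflect the phrase "if it exists"; the function returns a list because more than one value can share the highest frequency.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s) of a non-empty list, or [] if none."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        # Assumption: no value repeats, so no mode exists.
        return []
    return sorted(x for x, f in counts.items() if f == highest)

print(modes([1, 2, 3, 3, 3, 4, 5, 6]))  # [3]
print(modes([7, 2, 8, 4, 2, 7, 2]))     # [2]
print(modes([1, 2, 3]))                 # []
```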
Relationships Between Mean, Median and Mode • Rightward skewness: typically Mean > Median and Median > Mode. • Leftward skewness: typically Mean < Median and Median < Mode. • Symmetry: Mean = Median = Mode.
Midrange • The midrange of a set of data is the average of the largest and smallest measurements, thus: Midrange = (U + L) / 2 • For data organized in a grouped frequency table, the midrange is approximately the average of the lower class boundary of the first class (l1fc) and the upper class boundary of the last class (l2lc), thus: Midrange = (l1fc + l2lc) / 2
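Both midrange formulas can be illustrated briefly. The raw-data case uses the free-throw data from earlier; the grouped case uses the emergency room classes 1-10 through 51-60, assuming whole-unit measurements so that the boundaries are 0.5 and 60.5.

```python
# Raw data: midrange = (U + L) / 2.
missed = [7, 2, 8, 4, 2, 7, 2]
midrange = (max(missed) + min(missed)) / 2
print(midrange)  # 5.0

# Grouped data: average of the lower boundary of the first class
# and the upper boundary of the last class.
lower_boundary_first_class = 1 - 0.5   # class 1-10
upper_boundary_last_class = 60 + 0.5   # class 51-60
grouped_midrange = (lower_boundary_first_class + upper_boundary_last_class) / 2
print(grouped_midrange)  # 30.5
```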
Measures of Dispersion or Variability • Quite often, measures of central tendency alone do not adequately describe a characteristic being observed. • Hence, variability is an important concept in statistics. As a result, there are many measures of variability for a collection of quantitative data such as: • Range • Variance • Standard deviation • Standard Score
Range • As previously defined, range is the difference between the largest and the smallest measurements; thus: R = U – L where: R is the range L is the smallest measurement U is the largest measurement
Deviation Score • Deviation score is the quantity defined by this relationship: x – x̄ • Deviation score represents the directed distance a measurement has from the mean of a set of data. • A positive deviation score means the measurement is above the mean; a negative deviation score means the measurement is below the mean; a zero deviation score means the measurement is equal to the mean.
Sum of Squares • Adding the deviation scores always gives zero, a useless result for analyzing a set of data. To avoid this situation the sum of squares is used. • Sum of squares SS is computed by first squaring the deviation scores, then adding them up; thus: SS = Σ(x – x̄)²
Variance • The variance of a population of measurements is defined as the average of the squared deviation scores, denoted by σ²; thus: σ² = SS / N • The variance of a sample, denoted by s², is defined by the following formula: s² = SS / (n – 1) • The variance for data in frequency tables is computed by first deriving the sum of squares using the following formula, then proceeding as above: SS = Σ(f x²) – (Σ f x)² / Σf where x is the class mark (midpoint) of each class.
Variance • Thus, the variance for grouped data of a population is: σ² = SS / Σf • The variance for grouped data of a sample, denoted by s², is defined by the following formula: s² = SS / (Σf – 1)
Standard Deviation • The standard deviation is defined as the positive square root of the variance. • The standard deviation of a population is denoted by σ. The standard deviation of a sample is denoted by s; thus: σ = √σ² and s = √s² • If the standard deviation of the population is given, the standard deviation of the mean of a sample of size n (the standard error of the mean) is derived from the following formula: σx̄ = σ / √n
Standard Score • A measure that takes into account the dispersion of the scores is called a standard score. • Standard scores also allow analysts to make comparisons across different distributions, thus giving them the ability to decide on rankings. • A standard score, denoted as z, is defined as: Standard score = Deviation Score / Standard Deviation; that is, z = (x – μ) / σ, where μ is the population mean and σ is the population standard deviation.
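A short sketch showing why the deviation scores cannot simply be added, and how the sum of squares is obtained, using the free-throw data from earlier. The variable names are illustrative.

```python
# Free throws missed in the last 7 games (data from the earlier example).
missed = [7, 2, 8, 4, 2, 7, 2]

mean = sum(missed) / len(missed)

# Deviation score of each measurement: x minus the mean.
deviations = [x - mean for x in missed]

# The deviation scores cancel out (zero, up to floating-point rounding).
print(round(sum(deviations), 10))

# Sum of squares: square each deviation score, then add them up.
ss = sum(d ** 2 for d in deviations)
print(round(ss, 3))  # 43.714
```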
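The frequency-table formulas can be sketched as follows, using the ungrouped frequency table of the free-throw data so that the results can be cross-checked against Python's statistics module. For a grouped table, x would be the class mark and the results would be approximations.

```python
import statistics
from collections import Counter

# Free throws missed in the last 7 games (data from the earlier example).
missed = [7, 2, 8, 4, 2, 7, 2]
table = Counter(missed)  # maps each measurement x to its frequency f

n = sum(table.values())                              # sum of f
sum_fx = sum(f * x for x, f in table.items())        # sum of f * x
sum_fx2 = sum(f * x ** 2 for x, f in table.items())  # sum of f * x^2

# Computational formula for the sum of squares.
ss = sum_fx2 - sum_fx ** 2 / n

population_variance = ss / n    # divides by sum of f
sample_variance = ss / (n - 1)  # divides by sum of f minus 1

print(round(population_variance, 3), round(sample_variance, 3))

# Cross-check against the standard library.
print(round(statistics.pvariance(missed), 3), round(statistics.variance(missed), 3))
```

Dividing by n – 1 rather than n makes the sample variance slightly larger, which compensates for a sample tending to understate the spread of the population it came from.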
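As a sketch, standard scores for the free-throw data can be computed as follows. Treating the seven games as the entire population (so that the population mean and population standard deviation apply) is an assumption made for this illustration.

```python
import statistics

# Free throws missed in the last 7 games, treated here as a whole population.
missed = [7, 2, 8, 4, 2, 7, 2]

mu = sum(missed) / len(missed)      # population mean
sigma = statistics.pstdev(missed)   # population standard deviation

# Standard score of each measurement: deviation score / standard deviation.
z_scores = [(x - mu) / sigma for x in missed]

for x, z in zip(missed, z_scores):
    print(f"x = {x}  z = {z:+.2f}")
```

A positive z means the measurement lies above the mean and a negative z means it lies below. Standard scores always have a mean of 0 and a standard deviation of 1, which is what makes scores from different distributions comparable.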