Dr. Ka-fu Wong

Dr. Ka-fu Wong ECON1003 Analysis of Economic Data

Chapter Two Describing Data: Frequency Distributions and Graphic Presentation GOALS • Organize data into a frequency distribution. • Portray a frequency distribution in a histogram, frequency polygon, and cumulative frequency polygon. • Develop a stem-and-leaf display. • Present data using such graphic techniques as line charts, bar charts, and pie charts.

Frequency Distribution A Frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class.

Construction of a Frequency Distribution • Question to be addressed. • Collect data (raw data). • Organize data. • Frequency distribution • Present data. • Draw conclusion.

Terms Related to Frequency Distribution • In constructing a frequency distribution, data are divided into exhaustive and mutually exclusive classes. • Mid-point: A point that divides a class into two equal parts. This is the average of the upper and lower class limits. • Class frequency: The number of observations in each class. • Class interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class.

Four steps to construct frequency distribution • Decide on the number of classes. • Determine the class interval or width. • Set the individual class limits. • Tally the data into the classes and count the number of items in each. For illustration, it is convenient to carry an example with us – example 1.

EXAMPLE 1 • Dr. Tillman is Dean of the School of Business Socastee University. He wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week. 15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6. Organize the data into a frequency distribution.

Step 1: Decide on the number of classes. • The goal is to use just enough groupings or classes to reveal the shape of the distribution. • “Just enough” Recipe –“2 to the k rule” • Select the smallest number (k) for the number of classes such that 2k is greater than the number of observations (n).

Sample size (n) = 80 21=2; 22=4; 23=8; 24=16; 25=32; 26=64; 27=128; … The rule suggest 7 classes. Sample size (n) = 1000 21=2; 22=4; 23=8; 24=16; 25=32; 26=64; 27=128; 28=256; 29=512; 210=1024 … The rule suggest 10 classes. 2 to the k rule • Select the smallest number (k) for the number of classes such that 2k is greater than the number of observations (n).

Sample size (n)=10000 21=2; 22=4; 23=8; 24=16; 25=32; 26=64; 27=128; …,213=8192; 214 = 16384 The rule suggest 14. Sample size (n)=100000 22=4; 23=8; 24=16; 25=32; 26=64; 27=128; … 2 to the k rule • Select the smallest number (k) for the number of classes such that 2k is greater than the number of observations (n).

2 to the k rule • Select the smallest number (k) for the number of classes such that 2k is greater than the number of observations (n). • We want to find smallest k such that 2k > n. • Smallest k such that k log 2 > log n • Smallest k such that k > (log n) / (log 2) • Example: If n=10000, (log n) / (log 2) = 13.28. Hence the recipe suggest 14 classes. Note: Same result for base 10 log and natural log.

Step 1: Decide on the number of classes. • 2 to the k rule • Select the smallest number (k) for the number of classes such that 2k is greater than the number of observations (n). • Example 1 (continued): • Sample size (n) = 30 • 21=2; 22=4; 23=8; 24=16; 25=32; 26=64; 27=128; … • The rule suggest 5 classes. • Alternative, by computing (Log 30/log 2) = 4.91, we get the same suggestion of 5 classes.

Step 2: Determine the class interval or width. • Generally the class interval or width should be the same for all classes. • The classes all taken together must cover at least the distance from the lowest value in the raw data to the highest value. • The classes must be mutually exclusive and exhaustive. Class interval ≥ (Highest value – lowest value) / number of classes. Usually we will chose some convenient number as class interval that satisfy the inequality.

Step 2: Determine the class interval or width. Class interval ≥ (Highest value – lowest value) / number of classes. • Example 1 (continued): • Highest value = 33.8 hours • Lowest value = 10.3 hours • k=5. • Hence, class interval ≥ (33.8-10.3)/5 = 4.7 • We choose class interval to be 5, some convenient number.

Step 3: Set the individual class limits • The class limits must be set so that the classes are mutually exclusive and exhaustive. • Round up so some convenient numbers.

Step 3: Set the individual class limits • Example 1 (continue): • Highest value = 33.8 hours. • Lowest value = 10.3 hours. • Range = highest – lowest = 23.5. • K=5; Interval = 5. • With k=5 and interval = 5, the classes will cover a range of 25. • Let’s split the surplus in the lower and upper tail equally. (25-23.5)/2 = 0.75. Hence, the lower limit of the first class should be around (10.3 – 0.75)=9.55 and upper limit of the last class should be (33.8 + 0.75)=34.55. • 9.55 and 34.55 look odd. Some convenient and close numbers would be 10 and 35.

Step 3: Set the individual class limits • Example 1 (continue): • We have determined • K=5; Interval = 5. • The lower limit of the first class = 10 and • The upper limit of the last class = 35. “10 up to 15” means the interval from 10 to 15 that includes 10 but not 15.

Step 4: Tally the data into the classes and count the number of items in each 15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6. Hours studying 7 10 up to 15 12 15 up to 20 20 up to 25 7 25 up to 30 3 30 up to 35 1

EXAMPLE 1 (continued)

EXAMPLE 1 (continued) • A relative frequency distribution shows the percent of observations in each class.

Stem-and-leaf Displays • Stem-and-leaf display: A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf. • Note: • An advantage of the stem-and-leaf display over a frequency distribution is we do not lose the identity of each observation. • An disadvantage is that it is not good for large data sets.

EXAMPLE 2 • Colin achieved the following scores on his twelve accounting quizzes this semester: 86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85. • Construct a stem-and-leaf chart.

EXAMPLE 2 (continued) 86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85. Stem Leaf 6 9 8 9 7 8 6 4 8 3 2 5 9 2 6 1

EXAMPLE 2 (continued) 86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85. Stem Leaf 6 9 9 8 7 4 2 3 8 5 6 8 9 1 6 2

Graphic Presentation of a Frequency Distribution • The three commonly used graphic forms are • histograms, • frequency polygons, and • a cumulative frequency distribution. • AHistogramis a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. • The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.

Graphic Presentation of a Frequency Distribution • A frequency polygon consists of line segments connecting the points formed by the class midpoint and the class frequency. • A cumulative frequency distribution is used to determine how many or what proportion of the data values are below or above a certain value.

Histogram for Hours Spent Studying Example 1 (continued): AHistogramis a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.

Frequency Polygon for Hours Spent Studying Example 1 (continued): A frequency polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.

Cumulative Frequency Distribution For Hours Studying Example 1 (continued): A cumulative frequency distribution is used to determine how many or what proportion of the data values are below or above a certain value.

Bar Chart • A bar chart can be used to depict any of the levels of measurement (nominal, ordinal, interval, or ratio).

Example 3 Construct a bar chart for the number of unemployed per 100,000 population for selected cities during 2001

Bar Chart for the Unemployment Data Example 3 (continued):

Pie Chart • A pie chart is useful for displaying a relative frequency distribution. A circle is divided proportionally to the relative frequency and portions of the circle are allocated for the different groups.

EXAMPLE 4 A sample of 200 runners were asked to indicate their favorite type of running shoe. Draw a pie chart based on the following information.

EXAMPLE 4 (continued) Compute the percentage and degree each type occupy out of 360o. Degree occupied in a circle = percentage x 360

Pie Chart for Running Shoes Example 4 (continued):

Chapter Two Describing Data: Frequency Distributions and Graphic Presentation - END -

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Presentation Transcript

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Ka-fu Wong University of Hong Kong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong

Dr. Ka-fu Wong