Chapter 1 Picturing Distributions with Graphs
Statistics Statistics is a science that involves the extraction of information from numerical data obtained during an experiment or from a sample. It involves: • the design of the experiment or sampling procedure, • the collection and analysis of the data, and • making inferences (statements, assumptions, guesses) about the population based upon information in a sample.
Individuals and Variables • Individuals • the objects described by a set of data • may be people, animals, or things • Variable • any characteristic of an individual • can take different values for different individuals
Variables • Categorical • Places an individual into one of several groups or categories • Quantitative (Numerical) • Takes numerical values for which arithmetic operations such as adding and averaging make sense
Example 1.1 • Individuals? • Variables? • Categorical? • Quantitative?
Distribution • Tells what values a variable takes and how often it takes these values • Can be a table, graph, or function
Displaying Distributions • Categorical variables • Pie charts • Bar graphs • Quantitative variables • Histograms • Stemplots (stem-and-leaf plots) • Time Plots
Example: Class Make-up on First Day Data Table
Example: Class Make-up on First Day Pie Chart
Example: Class Make-up on First Day Bar Graph
Example: U.S. Solid Waste (2000) Data Table
Example: U.S. Solid Waste (2000) Pie Chart
Example: U.S. Solid Waste (2000) Bar Graph
Example: U.S. Solid Waste (2000) Bar Graph
Histograms • For quantitative variables that take many values • Divide the possible values into class intervals (we will only consider equal widths) • Count how many observations fall in each interval (may change to percents) • Draw picture representing distribution
Case Study Weight Data Introductory Statistics classSpring, 1997 Virginia Commonwealth University
Weight Data: Frequency Table sqrt(53) = 7.2, or 8 intervals; range (260100=160) / 8 = 20 = class width
Number of students 100 120 140 160 180 200 220 240 260 280 Weight Data: Histogram Weight * Left endpoint is included in the group, right endpoint is not.
Histograms: Class Intervals • How many intervals? • One rule is to calculate the square root of the sample size, and round up. • Size of intervals? • Divide range of data (maxmin) by number of intervals desired, and round to convenient number • Pick intervals so each observation can only fall in exactly one interval (no overlap)
Examining the Distribution of Quantitative Data • Overall pattern of graph • Shape of the data • Center of the data • Spread of the data (Variation) • Deviations from overall pattern • Outliers
Shape of the Data • Symmetric • bell shaped • other symmetric shapes • Asymmetric • right skewed • left skewed • Unimodal, bimodal
SymmetricBell-Shaped SymmetricMound-Shaped SymmetricUniform
AsymmetricSkewed to the Left AsymmetricSkewed to the Right
Outliers • Extreme values that fall outside the overall pattern • May occur naturally • May occur due to error in recording • May occur due to error in measuring • Observational unit may be fundamentally different
Stemplots(Stem-and-Leaf Plots) • For quantitative variables • Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number) • Write the stems in a vertical column; draw a vertical line to the right of the stems • Write each leaf in the row to the right of its stem; order leaves if desired
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 192 5 152 2 135 2 Weight Data:Stemplot(Stem & Leaf Plot) Key 20|3 means203 pounds Stems = 10’sLeaves = 1’s
10 0166 11 009 12 0034578 13 00359 14 08 15 00257 16 555 17 000255 18 000055567 19 245 20 3 21 025 22 0 23 24 25 26 0 Weight Data:Stemplot(Stem & Leaf Plot) Key 20|3 means203 pounds Stems = 10’sLeaves = 1’s
Extended Stem-and-Leaf Plots If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.
151516161717 Extended Stem-and-Leaf Plots Example: if all of the data values were between 150 and 179, then we may choose to use the following stems: Leaves 0-4 would go on each upper stem (first “15”), and leaves 5-9 would go on each lower stem (second “15”).
Time Plots • A time plot shows behavior over time. • Time is always on the horizontal axis, and the variable being measured is on the vertical axis. • Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend. • Look for patterns that repeat at known regular intervals (seasonal variations).