
2: Frequency distributions

2: Frequency distributions. Stemplot, frequency tables, histograms. Stem-and-leaf plots (stemplots). Analyses start by exploring data with pictures. My favorite technique is the stemplot: a histogram-like display of data points. "You can observe a lot by looking" – Yogi Berra.

kay

Presentation Transcript


  1. 2: Frequency distributions Stemplot, frequency tables, histograms Frequency Distributions

  2. Stem-and-leaf plots (stemplots)
     • Analyses start by exploring data with pictures
     • My favorite technique is the stemplot: a histogram-like display of data points
     "You can observe a lot by looking" – Yogi Berra

  3. Illustrative example: sample.sav
     • An SRS (simple random sample) of AGE (in years)
     • Data as an ordered array (n = 10): 05 11 21 24 27 28 30 42 50 52
     • Divide each data point into
       • Stem values → first one or two digits
       • Leaf values → next digit
     • In this example
       • Stem values → tens place
       • Leaf values → ones place
       • e.g., 21 has a stem value of 2 and leaf value of 1

  4. Stemplot (cont.)
     • Draw stem-like axis from lowest to highest stem:
         0|
         1|
         2|
         3|
         4|
         5|
         ×10 → axis multiplier (important!)
     • Place leaves next to stem
     • 21 plotted (animation):
         2|1

  5. Continue plotting …
     • Rearrange leaves in rank order:
         0|5
         1|1
         2|1478
         3|0
         4|2
         5|02
         ×10
     • For discussion, let’s rotate the plot:
             8
             7
             4     2
         5 1 1 0 2 0
         -----------
         0 1 2 3 4 5  (×10)
         -----------
         Rotated stemplot
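
The plotting steps on slides 3 to 5 can be sketched in a few lines of Python. This is only an illustration, assuming non-negative whole-number data with the tens place as stem and the ones place as leaf; the function name `stemplot` is made up for this sketch and is not part of the original slides.

```python
from collections import defaultdict

# AGE data from the illustrative example (n = 10)
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones)."""
    rows = defaultdict(list)
    for value in sorted(values):           # sorting puts leaves in rank order
        stem, leaf = divmod(value, 10)     # e.g. 21 -> stem 2, leaf 1
        rows[stem].append(str(leaf))
    # Walk every stem from lowest to highest so empty stems still get a row
    return [f"{stem}|{''.join(rows[stem])}"
            for stem in range(min(rows), max(rows) + 1)]

for row in stemplot(ages):
    print(row)
print("x10")  # axis multiplier
```

Running it prints the same six rows as the slide: 0|5, 1|1, 2|1478, 3|0, 4|2, 5|02.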

  6. Interpreting frequency distributions
     • Central location
       • Gravitational center → mean
       • Middle value → median
     • Spread
       • Range and inter-quartile range
       • Standard deviation and variance (next week)
     • Shape
       • Symmetry
       • Modality
       • Kurtosis

  7. Mean = arithmetic average
     • “Eye-ball method” → visualize where plot would balance
     • Arithmetic method = total divided by n
             8
             7
             4     2
         5 1 1 0 2 0
         -----------
         0 1 2 3 4 5
         -----------
              ^
          Grav. center
     • Eye-ball method → balances around 25 to 30
     • Actual arithmetic average = 29.0
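
As a quick check of the arithmetic method, the total and the mean of the ten AGE values can be computed directly. A minimal sketch; the variable names are chosen for this illustration.

```python
# AGE data from the illustrative example (n = 10)
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)     # 290
n = len(ages)         # 10
mean = total / n      # total divided by n

print(f"total = {total}, n = {n}, mean = {mean:.1f}")
```

The result, 29.0, falls inside the 25 to 30 range suggested by the eye-ball method.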

  8. Middle point → median
     • Count from top to depth of (n + 1) ÷ 2
     • For illustrative data:
       • n = 10
       • Depth of median = (10 + 1) ÷ 2 = 5.5
       • A depth of 5.5 falls halfway between the 5th and 6th values, so median = (27 + 28) ÷ 2 = 27.5
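
The depth rule can be sketched as follows. The helper name `median_by_depth` is hypothetical; it assumes that a fractional depth such as 5.5 means averaging the two neighboring ordered values.

```python
import math

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

def median_by_depth(values):
    """Return (depth, median), where depth = (n + 1) / 2 counted from the top."""
    data = sorted(values)
    depth = (len(data) + 1) / 2
    # Depth is 1-based; when it ends in .5, average the values on either side
    below = data[math.floor(depth) - 1]
    above = data[math.ceil(depth) - 1]
    return depth, (below + above) / 2

depth, median = median_by_depth(ages)
print(depth, median)
```

For the AGE data this gives a depth of 5.5 and a median of 27.5, the average of 27 and 28.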

  9. Spread → variability
     • Easiest way to describe spread is by stating its range, e.g., “from 5 to 52” (not the best way)
     • A better way is to divide the data into a low group and a high group
       • Quartile 1 = median of low group
       • Quartile 3 = median of high group
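
The quartile rule above can be sketched in Python. One detail is an assumption of this sketch, since the slide does not cover it: when n is odd, the middle value is left out of both groups (other conventions include it in both).

```python
from statistics import median

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the low and high halves of the data."""
    data = sorted(values)
    half = len(data) // 2          # assumption: an odd n drops the middle value
    low_group = data[:half]
    high_group = data[-half:]
    return median(low_group), median(high_group)

q1, q3 = quartiles_by_halves(ages)
print(f"range: from {min(ages)} to {max(ages)}")
print(f"Q1 = {q1}, Q3 = {q3}, inter-quartile range = {q3 - q1}")
```

For the AGE data the low group is 5, 11, 21, 24, 27 and the high group is 28, 30, 42, 50, 52, giving Q1 = 21 and Q3 = 42.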

  10. Shape → visual pattern
      • Skyline silhouette of plot
      • Symmetry
      • Mounds
      • Outliers (if any)
      • When n is small, it’s too difficult to describe shape accurately
              X
              X
              X     X
          X X X X X X
          -----------
          0 1 2 3 4 5
          -----------

  11. What to look for in shape
      • Idealized shape = density curve
      • Look for:
        • General pattern
        • Symmetry
        • Outliers

  12. Symmetrical shapes

  13. Asymmetrical shapes

  14. Modality (no. of peaks)

  15. Kurtosis (steepness of peak)
      • Mesokurtic (medium)
      • Platykurtic (flat) → skinny tails
      • Leptokurtic (steep) → fat tails
      Kurtosis can NOT be easily judged by eye

  16. Second example (n = 8)
      • Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
      • Truncate extra digit (e.g., 1.47 → 1.4)
      • Stem = ones-place
      • Leaves = tenths-place
      • Do not plot decimal
          1|4
          2|03
          3|4779
          4|4
          (×1)
      • Center: between 3.4 & 3.7 (underlined on the slide)
      • Spread: 1.4 to 4.4
      • Shape: mound, no outliers
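
Truncating (rather than rounding) the extra digit can be sketched as below. The values are passed as strings and handled with Python's `decimal` module so that truncation is exact; that choice, and the function name `truncated_stemplot`, are assumptions of this sketch rather than anything shown on the slide. It assumes non-negative values.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN

data = ["1.47", "2.06", "2.36", "3.43", "3.74", "3.78", "3.94", "4.42"]

def truncated_stemplot(texts):
    """Stemplot rows with stem = ones place and leaf = tenths place (truncated)."""
    rows = defaultdict(list)
    for value in sorted(Decimal(text) for text in texts):
        # ROUND_DOWN drops the hundredths digit: 1.47 -> 1.4, not 1.5
        tenths = int(value.quantize(Decimal("0.1"), rounding=ROUND_DOWN) * 10)
        stem, leaf = divmod(tenths, 10)
        rows[stem].append(str(leaf))
    return [f"{stem}|{''.join(rows[stem])}"
            for stem in range(min(rows), max(rows) + 1)]

for row in truncated_stemplot(data):
    print(row)
print("x1")  # axis multiplier; the decimal point is not plotted
```

The rows match the slide: 1|4, 2|03, 3|4779, 4|4.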

  17. Third example (pollution.sav)
      Regular stem:
          1|4789
          2|223466789
          3|000123445678
          (×1)
      • Regular stemplot (top) → too squished
      • Split-stem (bottom)
        • First 1 on stem → leaves 0 to 4
        • Second 1 on stem → leaves 5 to 9
      Split-stem:
          1|4
          1|789
          2|2234
          2|66789
          3|00012344
          3|5678
          (×1)
      Note negative skew
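
A split-stem plot can be sketched by giving each stem two rows, one for leaves 0 to 4 and one for leaves 5 to 9. The raw pollution.sav values are not in the transcript, so the list below is read back from the regular stemplot on the slide and stored in tenths (14 stands for 1.4); the function name is hypothetical.

```python
# Values in tenths, read off the regular stemplot on the slide (14 means 1.4)
pollution_tenths = [14, 17, 18, 19,
                    22, 22, 23, 24, 26, 26, 27, 28, 29,
                    30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]

def split_stemplot(tenths):
    """Stemplot rows with each stem split in two: leaves 0-4, then leaves 5-9."""
    rows = {}
    for value in sorted(tenths):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf < 5 else 1        # 0 = first row of the stem, 1 = second
        rows.setdefault((stem, half), []).append(str(leaf))
    first, last = min(rows), max(rows)
    # Keep every row between the first and last used row, including empty ones
    keys = [(stem, half)
            for stem in range(first[0], last[0] + 1)
            for half in (0, 1)
            if first <= (stem, half) <= last]
    return [f"{stem}|{''.join(rows.get((stem, half), []))}" for stem, half in keys]

for row in split_stemplot(pollution_tenths):
    print(row)
```

The six rows printed are the same as the split-stem plot on the slide, with the longer rows toward the high end showing the negative skew.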

  18. How many stem-values?
      • Start with between 4 and 12 stem-values
      • Then, trial and error to draw out shape for the most informative plot (use judgment)

  19. Body weight (n = 53)
      • Data range from 100 to 260 lbs.
      • 100 lb. multiplier seems too broad (only two stem values)
      • 100 lb. multiplier w/ split stem-values still too broad (only 4 stem values)
      • Try 10 pound stem multiplier
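
The trial-and-error comparison above can be made concrete by counting how many stem rows each choice would produce. This is a rough sketch: `stem_rows` is a made-up helper, it counts empty rows too, and with split stems it gives an upper bound because the first or last stem may not need all of its rows.

```python
def stem_rows(lowest, highest, multiplier, splits=1):
    """Count stem rows from the lowest to the highest stem (empty rows included)."""
    first_stem = lowest // multiplier
    last_stem = highest // multiplier
    return (last_stem - first_stem + 1) * splits

# Body weight data range from 100 to 260 lbs.
options = [("x100", 100, 1), ("x100, stems split in two", 100, 2), ("x10", 10, 1)]
for label, multiplier, splits in options:
    print(f"{label}: {stem_rows(100, 260, multiplier, splits)} stem rows")
```

The first two options give 2 and 4 rows, matching the counts on the slide, while the 10 pound multiplier gives 17 rows.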

  20. Body weight (n = 53)
          10|0166
          11|009
          12|0034578
          13|00359
          14|08
          15|00257
          16|555
          17|000255
          18|000055567
          19|245
          20|3
          21|025
          22|0
          23|
          24|
          25|
          26|0
          (×10)
      10|0 means “100”
      • Shape: Positive skew, high outlier (260)
      • Location: median = 165 (underlined on the slide)
      • Spread: from 100 to 260

  21. Quintuple split: Body weight data (n = 53)
          1*|0000111
          1t|222222233333
          1f|4455555
          1s|666777777
          1.|888888888999
          2*|0111
          2t|2
          2f|
          2s|6
          (×100)
      • Codes:
        • * for leaves zero and one
        • t for leaves two and three
        • f for leaves four and five
        • s for leaves six and seven
        • . for leaves eight and nine
      • Example:
        • 2t|2 means a value of about 220 (2.2 × 100; the ones digit is truncated)

  22. Frequency counts (SPSS plot)
      Age of participants
      SPSS provides frequency counts w/ stemplot:
          Frequency    Stem &  Leaf
              2.00        3 .  0
              9.00        4 .  0000
             28.00        5 .  00000000000000
             37.00        6 .  000000000000000000
             54.00        7 .  000000000000000000000000000
             85.00        8 .  000000000000000000000000000000000000000000
             94.00        9 .  00000000000000000000000000000000000000000000000
             81.00       10 .  0000000000000000000000000000000000000000
             90.00       11 .  000000000000000000000000000000000000000000000
             57.00       12 .  0000000000000000000000000000
             43.00       13 .  000000000000000000000
             25.00       14 .  000000000000
             19.00       15 .  000000000
             13.00       16 .  000000
              8.00       17 .  0000
              9.00 Extremes    (>=18)
          Stem width:  1
          Each leaf:   2 case(s)
      3 . 0 means 3.0 years
      Because of large n, each leaf represents 2 observations

  23. Frequency tables
          AGE   |  Freq  Rel.Freq  Cum.Freq.
          ------+---------------------------
            3   |     2     0.3%      0.3%
            4   |     9     1.4%      1.7%
            5   |    28     4.3%      6.0%
            6   |    37     5.7%     11.6%
            7   |    54     8.3%     19.9%
            8   |    85    13.0%     32.9%
            9   |    94    14.4%     47.2%
           10   |    81    12.4%     59.6%
           11   |    90    13.8%     73.4%
           12   |    57     8.7%     82.1%
           13   |    43     6.6%     88.7%
           14   |    25     3.8%     92.5%
           15   |    19     2.9%     95.4%
           16   |    13     2.0%     97.4%
           17   |     8     1.2%     98.6%
           18   |     6     0.9%     99.5%
           19   |     3     0.5%    100.0%
          ------+---------------------------
          Total |   654   100.0%
      • Frequency = count
      • Relative frequency = proportion or %
      • Cumulative frequency → % less than or equal to current value
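
The three columns of the table can be reproduced from the raw counts. A minimal sketch, with the counts copied from the slide and a made-up function name; the cumulative column is computed from the running count, not by adding rounded percentages.

```python
# Frequency of each AGE value, copied from the table on the slide
counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
          12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}

def frequency_table(counts):
    """Return rows of (value, frequency, relative %, cumulative %)."""
    total = sum(counts.values())
    running = 0
    rows = []
    for value in sorted(counts):
        freq = counts[value]
        running += freq                    # count of cases <= current value
        rows.append((value, freq, 100 * freq / total, 100 * running / total))
    return rows

print("AGE   |  Freq  Rel.Freq  Cum.Freq.")
for value, freq, rel, cum in frequency_table(counts):
    print(f"{value:>4}  | {freq:>5}  {rel:>7.1f}%  {cum:>7.1f}%")
print(f"Total | {sum(counts.values()):>5}")
```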

  24. Class intervals
      • When data are sparse → group data into class intervals
      • Classes can be uniform or non-uniform

  25. Uniform class intervals
      • Create 4 to 12 class intervals
      • Set end-point convention: include left boundary and exclude right boundary
        • e.g., first class interval includes 0 and excludes 10 (0 to 9.99 years of age)
      • Tally frequencies
      • Calculate relative frequency
      • Calculate cumulative frequency (demo)
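
The steps for uniform class intervals can be sketched with the ten AGE values from the first example. The 10-year class width and the function name are assumptions of this sketch; the end-point convention (left boundary included, right boundary excluded) follows the slide.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

def class_interval_table(values, width, start=0):
    """Return rows of (left, right, frequency, relative %, cumulative %).

    Each class includes its left boundary and excludes its right boundary.
    """
    n = len(values)
    n_classes = int((max(values) - start) // width) + 1
    freqs = [0] * n_classes
    for value in values:                   # tally frequencies
        freqs[int((value - start) // width)] += 1
    rows, running = [], 0
    for i, freq in enumerate(freqs):
        running += freq
        left = start + i * width
        rows.append((left, left + width, freq, 100 * freq / n, 100 * running / n))
    return rows

for left, right, freq, rel, cum in class_interval_table(ages, width=10):
    print(f"{left:>2} to <{right:<3} | {freq:>2}  {rel:>5.1f}%  {cum:>5.1f}%")
```

With a width of 10 the AGE data fall into six classes, which sits inside the suggested 4 to 12.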

  26. Here’s age data in sample.sav…

  27. Histogram – for quantitative data
      Bars are contiguous

  28. Bar chart – for categorical data
      Bars are discrete
