
Displaying and Describing Categorical Data


lena




Presentation Transcript


  1. Displaying and Describing Categorical Data Chapter 3

  2. Contingency Table (a.k.a. Two-Way Table) • A table that shows the frequency distribution of two categorical variables at once: each cell counts the individuals that fall in one category of each variable
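A contingency table can be built by counting how often each (row category, column category) pair occurs. This is a minimal sketch: the records are hypothetical, made up for illustration (they are NOT the Super Bowl data from the slides), and `contingency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (winning league, market type) records, one per season.
# Illustrative only: these are NOT the real Super Bowl data.
records = [
    ("NFL", "Bull"), ("NFL", "Bull"), ("NFL", "Bear"),
    ("AFL", "Bear"), ("AFL", "Bull"), ("AFL", "Bear"),
]


def contingency_table(pairs):
    """Return sorted row labels, column labels, and a dict of cell counts."""
    counts = Counter(pairs)  # counts each distinct (row, column) pair
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {(r, c): counts[(r, c)] for r in rows for c in cols}
    return rows, cols, table


rows, cols, table = contingency_table(records)

# Print the table with row totals, column totals, and the grand total.
print("League", *cols, "Total", sep="\t")
for r in rows:
    cells = [table[(r, c)] for c in cols]
    print(r, *cells, sum(cells), sep="\t")
col_totals = [sum(table[(r, c)] for r in rows) for c in cols]
print("Total", *col_totals, len(records), sep="\t")
```

For these made-up records the printed table has AFL row 2 bear / 1 bull, NFL row 1 bear / 2 bull, and column totals of 3 and 3 out of 6.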

  3. The Super Bowl Indicator • Can the winner of the Super Bowl predict the stock market? • If the winner of the Super Bowl is from the original National Football League, there will be a bull market (Dow Jones Index increases). • If the winner of the Super Bowl is from the original American Football League, there will be a bear market (Dow Jones Index decreases). • Right 80% of the time
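The indicator is an if/then rule, so it can be written as a function and scored against outcomes. A minimal sketch: the function names and the four seasons below are hypothetical, chosen only to show how a hit rate such as "right 80% of the time" would be computed; they are not actual Super Bowl or market history.

```python
def indicator_prediction(original_league):
    """Original NFL winner -> bull market; original AFL winner -> bear market."""
    if original_league == "NFL":
        return "Bull"
    if original_league == "AFL":
        return "Bear"
    raise ValueError(f"unknown league: {original_league!r}")


def hit_rate(seasons):
    """Fraction of (league, actual market) seasons the indicator got right."""
    correct = sum(indicator_prediction(league) == market for league, market in seasons)
    return correct / len(seasons)


# Hypothetical seasons: (original league of the winner, actual market type)
seasons = [("NFL", "Bull"), ("AFL", "Bear"), ("NFL", "Bear"), ("NFL", "Bull")]
print(hit_rate(seasons))  # 3 of 4 correct -> 0.75
```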

  4. The Super Bowl Indicator

  5. Independent • Two variables are independent when the distribution of one variable is the “same” for all categories of the other variable. • Think carefully about which variable you are treating as the Who and which you are treating as the What.
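The definition can be checked mechanically: compute the distribution of one variable within each category of the other (the conditional distributions) and compare each with that variable's overall (marginal) distribution. A minimal sketch with hypothetical counts and hypothetical helper names; the tolerance `tol` is an assumption, because real data almost never match exactly and deciding what counts as the “same” is a judgment call.

```python
# Rows = winning league, columns = market type. Counts are hypothetical.

def marginal_distribution(table, rows, cols):
    """Proportion of the grand total falling in each column category."""
    total = sum(table.values())
    return {c: sum(table[(r, c)] for r in rows) / total for c in cols}


def conditional_distribution(table, row, cols):
    """Proportion of one row's total falling in each column category."""
    row_total = sum(table[(row, c)] for c in cols)
    return {c: table[(row, c)] / row_total for c in cols}


def looks_independent(table, rows, cols, tol=1e-9):
    """True if every row's conditional distribution matches the marginal one.

    `tol` is an assumed tolerance for treating two proportions as equal.
    """
    marginal = marginal_distribution(table, rows, cols)
    return all(
        abs(conditional_distribution(table, r, cols)[c] - marginal[c]) <= tol
        for r in rows
        for c in cols
    )


rows, cols = ["AFL", "NFL"], ["Bear", "Bull"]
associated = {("AFL", "Bear"): 2, ("AFL", "Bull"): 1, ("NFL", "Bear"): 1, ("NFL", "Bull"): 2}
independent = {("AFL", "Bear"): 1, ("AFL", "Bull"): 3, ("NFL", "Bear"): 2, ("NFL", "Bull"): 6}

print(looks_independent(associated, rows, cols))   # False
print(looks_independent(independent, rows, cols))  # True
```

In the second table both leagues have 25% bear and 75% bull, exactly matching the marginal distribution, so the check passes.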

  6. Independent? • Who: Market, What: Super Bowl Winner • Who: Super Bowl Winner, What: Market Type

  7. Not Independent • Since the distribution of winning league for a bull market is different from the marginal distribution of winning league, winning league and market type are not independent. • There appears to be an association between the original league of the Super Bowl winner and market type. • Since the distribution of market type when the Super Bowl winner is originally from the AFL is different from the marginal distribution of market type, market type and winning league are not independent. • There appears to be an association between market type and the original league of the Super Bowl winner.
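Slide 7 reaches the same verdict whether it conditions on market type or on winning league. That is no coincidence: if every conditional distribution of one variable equals its marginal distribution, the same holds with the roles of the two variables swapped. A minimal sketch that checks both directions on one table; the counts and the helper names `rows_match_marginal` and `transpose` are hypothetical.

```python
def rows_match_marginal(table, tol=1e-9):
    """True if each row's distribution over columns equals the column marginal.

    `tol` is an assumed tolerance for treating two proportions as equal.
    """
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    for r in rows:
        row_total = sum(table[(r, c)] for c in cols)
        for c in cols:
            col_total = sum(table[(x, c)] for x in rows)
            if abs(table[(r, c)] / row_total - col_total / total) > tol:
                return False
    return True


def transpose(table):
    """Swap the roles of the two variables: (row, col) -> (col, row)."""
    return {(c, r): n for (r, c), n in table.items()}


# Hypothetical counts: keys are (winning league, market type).
counts = {("AFL", "Bear"): 2, ("AFL", "Bull"): 1, ("NFL", "Bear"): 1, ("NFL", "Bull"): 2}

# Market type given league vs. the marginal distribution of market type
print(rows_match_marginal(counts))             # False
# Winning league given market type vs. the marginal distribution of league
print(rows_match_marginal(transpose(counts)))  # False
```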

  8. Independent?


  10. Displaying and Summarizing Quantitative Data Chapter 4

  11. Three Rules of Data Analysis • Make a picture • Make a picture • Make a picture

  12. Barry Bonds’ HRs • Who: MLB seasons from 1986 to 2007 • What: Barry Bonds’ home runs (HRs) • When: 1986 to 2007 • Where: Cities with MLB teams • Why: Mr. Gray likes baseball and needed an example • How: Data were gathered from baseball-reference.com

  13. Quantitative Data • A quantitative variable is a measured variable (with units) that answers questions about the quantity of what is being measured (e.g., income ($), height (inches), weight (pounds)). • Quantitative Data Condition: the data are values of a quantitative variable whose units are known.

  14. Histogram

  15. Histogram • When to use: • Number of variables: 1 • Data type: quantitative • Purpose: displaying the distribution of the data • Gaps in the graph are gaps in the data.
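A text-only histogram makes the point that gaps in the graph are gaps in the data. A minimal sketch, assuming Barry Bonds' season home-run totals for 1986 to 2007 as typed in below (transcribed for illustration; verify them at baseball-reference.com before relying on them) and an arbitrary bin width of 10 home runs. `histogram_counts` is a hypothetical helper, not a library function.

```python
# Barry Bonds' home runs per season, 1986-2007, in season order.
# Assumed values transcribed for illustration; verify at baseball-reference.com.
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]


def histogram_counts(data, width):
    """Count non-negative integers in bins [0, width), [width, 2*width), ..."""
    counts = [0] * (max(data) // width + 1)
    for x in data:
        counts[x // width] += 1
    return counts


counts = histogram_counts(bonds_hr, width=10)
for k, n in enumerate(counts):
    low = k * 10
    # One star per season; an empty row is a gap in the data.
    print(f"{low:>2}-{low + 9:<2} | {'*' * n}")
```

With these assumed values the 50-59 and 60-69 rows are empty, a gap between the 49-HR and 73-HR seasons. The same values also reproduce the five-number summary on slide 24 (5, 25, 34, 45, 73).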

  16. What to look for • When you describe a distribution, always describe its • Shape • Center • Spread

  17. Shape • Does the histogram have a single, central peak or several separated peaks? • Is the histogram symmetric? • Do any unusual features appear?

  18. 1. Peaks • The peaks in a histogram are called modes. • Uniform: no peaks • Unimodal: one peak • Bimodal: two peaks • Multimodal: three or more peaks

  19. 2. Symmetry (figure panels: Symmetric, Skewed)

  20. 3. Unusual Features • Outlier: an unusually small or large data value • Gap: a region with no data values between groups of data

  21. Center: “One number to rule them all” • When the distribution is skewed or has outliers, use the median. • Median: the middle value when the data are put in order. If there is an even number of data values, the median is the average of the two middle values. The median has the same units as the data! • When the distribution is unimodal and symmetric, use the mean.
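The median rule, including the even-count case, is short enough to write out and compare with the standard library. A minimal sketch using assumed Barry Bonds season totals (transcribed for illustration; verify at baseball-reference.com); the hand-written `median` is a hypothetical helper, while `statistics.median` and `statistics.mean` are Python standard-library functions.

```python
import statistics

# Barry Bonds' home runs per season, 1986-2007 (assumed values; verify at
# baseball-reference.com before reuse).
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]


def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(bonds_hr))                     # 34.0 (22 seasons: mean of 11th and 12th values)
print(statistics.median(bonds_hr))          # standard-library check: 34.0
print(round(statistics.mean(bonds_hr), 1))  # mean, for comparison: 34.6
```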

  22. Quartiles • 50% of the data lie below the median, and 50% lie above it. • Quartile 1 (Q1): the value with 25% of the data below it and 75% above it, i.e., “the median of the lower half of the data” • Quartile 3 (Q3): the value with 75% of the data below it and 25% above it, i.e., “the median of the upper half of the data”
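The “median of each half” definition translates directly into code. A minimal sketch using assumed Barry Bonds season totals (verify at baseball-reference.com); `quartiles` is a hypothetical helper. For an odd number of values it assumes the middle value is left out of both halves, which is only one of several conventions in use by textbooks, calculators, and software. With 22 seasons the question does not arise.

```python
# Barry Bonds' home runs per season, 1986-2007 (assumed values; verify at
# baseball-reference.com before reuse).
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]


def median_sorted(values):
    """Median of an already-sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 == 1 else (values[mid - 1] + values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3), with Q1 and Q3 as medians of the lower/upper halves.

    Assumption: when n is odd, the middle value is excluded from both halves.
    Needs at least 2 values so that each half is non-empty.
    """
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]         # values below the middle
    upper = ordered[(n + 1) // 2 :]   # values above the middle
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)


print(quartiles(bonds_hr))  # (25, 34.0, 45)
```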

  23. Spread • When the distribution is skewed or has outliers, use the IQR. • Interquartile Range (IQR): the difference between quartile 3 and quartile 1, IQR = Q3 − Q1. The IQR has the same units as the data! • When the distribution is unimodal and symmetric, use the standard deviation.
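Both spread measures are available in Python's standard `statistics` module. A minimal sketch using assumed Barry Bonds season totals (verify at baseball-reference.com). One caveat: `statistics.quantiles` (Python 3.8+) uses its own interpolation rule (default `method="exclusive"`), which is not the same as “median of each half”; the two happen to agree for this data set, but they can differ on others.

```python
import statistics

# Barry Bonds' home runs per season, 1986-2007 (assumed values; verify at
# baseball-reference.com before reuse).
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

# With n=4, quantiles() returns the three cut points Q1, median, Q3.
q1, med, q3 = statistics.quantiles(bonds_hr, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 25.0 45.0 20.0

# Sample standard deviation (divides by n - 1), the spread measure suggested
# for unimodal, symmetric distributions.
print(round(statistics.stdev(bonds_hr), 1))
```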

  24. Five-Number Summary (HRs per season) • Min: 5 • Q1: 25 • Median: 34 • Q3: 45 • Max: 73

  25. In Context • IQR: In the middle 50% of MLB seasons from 1986 to 2007, Barry Bonds hit between 25 and 45 home runs. • Median: In about half of the MLB seasons from 1986 to 2007, Barry Bonds hit fewer than 34 home runs, and in about half he hit more than 34.
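Statements like these can be sanity-checked by counting seasons directly. A minimal sketch using assumed Barry Bonds season totals (transcribed for illustration; verify at baseball-reference.com) and the slide's five-number summary values. Because several seasons tie the quartile and median values exactly, the shares come out near 50%, not exactly at it.

```python
# Barry Bonds' home runs per season, 1986-2007 (assumed values; verify at
# baseball-reference.com before reuse).
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]
q1, median, q3 = 25, 34, 45  # five-number summary values from slide 24

n = len(bonds_hr)
within = sum(q1 <= x <= q3 for x in bonds_hr)  # seasons from Q1 to Q3, inclusive
below = sum(x < median for x in bonds_hr)      # seasons strictly below the median
above = sum(x > median for x in bonds_hr)      # seasons strictly above the median

print(f"between Q1 and Q3: {within}/{n}")  # 14/22
print(f"below the median: {below}/{n}")    # 10/22
print(f"above the median: {above}/{n}")    # 10/22
```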
