
Describing Data: Two Variables

STAT 101 Dr. Kari Lock Morgan 9/11/12. Describing Data: Two Variables. SECTIONS 2.1, 2.4, 2.5 Two categorical (2.1) Quantitative and categorical (2.4) Two quantitative (2.5). The Big Picture. Sample. Population. Sampling. Statistical Inference. Descriptive Statistics.


Presentation Transcript


  1. STAT 101 Dr. Kari Lock Morgan 9/11/12 Describing Data: Two Variables • SECTIONS 2.1, 2.4, 2.5 • Two categorical (2.1) • Quantitative and categorical (2.4) • Two quantitative (2.5)

  2. The Big Picture Sample Population Sampling Statistical Inference Descriptive Statistics

  3. Two Categorical Variables • Look at the relationship between two categorical variables • Relationship status • Gender

  4. Two-Way Table • It doesn’t matter which variable is displayed in the rows and which in the columns Data from our class survey R: table(relationship, gender)
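A minimal sketch of how such a table might be built in base R. The class survey data are not reproduced in this transcript, so the counts are reconstructed from the fractions quoted on the slides that follow (32 of 107 females and 10 of 62 males in a relationship, 169 students in total); the object name `tab` is illustrative only.

```r
# Counts reconstructed from the fractions quoted in the slides (assumption:
# 32/107 females and 10/62 males are in a relationship, 169 students total).
# matrix() fills column by column: first the Female column, then the Male column.
tab <- as.table(matrix(c(32, 75, 10, 52), nrow = 2,
                       dimnames = list(relationship = c("Yes", "No"),
                                       gender = c("Female", "Male"))))

# Add row and column totals, as in a printed two-way table.
addmargins(tab)

# Swapping rows and columns carries the same information.
t(tab)
```

With the raw survey vectors available, `table(relationship, gender)` would produce the same kind of object directly.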

  5. Two-Way Table What proportion of students in this sample are in a relationship? • 42/169 ≈ 25% • 32/107 ≈ 30% • 10/62 ≈ 16% • 32/42 ≈ 76%

  6. Two-Way Table What proportion of females in this sample are in a relationship? • 42/169 ≈ 25% • 32/107 ≈ 30% • 10/62 ≈ 16% • 32/42 ≈ 76%

  7. Two-Way Table What proportion of males in this sample are in a relationship? • 42/169 ≈ 25% • 32/107 ≈ 30% • 10/62 ≈ 16% • 32/42 ≈ 76%

  8. Male and Female Proportions • 30% of females in the sample say they are in a relationship • 16% of males in the sample say they are in a relationship • Why the difference???

  9. Difference in Proportions • A difference in proportions compares the proportion in one category of a categorical variable across two levels of the other categorical variable • Example: (proportion of females in a relationship) minus (proportion of males in a relationship)
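As a quick worked example, the difference can be computed from the fractions quoted in the slides. The variable names are illustrative.

```r
# Counts quoted in the slides: 32 of 107 females and 10 of 62 males
# say they are in a relationship.
p_female <- 32 / 107
p_male   <- 10 / 62

# Difference in proportions (female minus male).
diff_prop <- p_female - p_male

round(c(female = p_female, male = p_male, difference = diff_prop), 3)
```

The difference comes out to about 0.14, that is, roughly 14 percentage points.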

  10. Two-Way Table What proportion of people in a relationship in this sample are female? • 42/169 ≈ 25% • 32/107 ≈ 30% • 10/62 ≈ 16% • 32/42 ≈ 76%

  11. Two-Way Table CAUTION: The proportion of females in a relationship is NOT THE SAME AS the proportion of people in a relationship who are female! • 30% ≠ 76%!

  12. Two-Way Table What proportion of students in this sample are female and in a relationship? • 42/169 ≈ 25% • 32/169 ≈ 19% • 32/107 ≈ 30% • 10/62 ≈ 16% • 32/42 ≈ 76%
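The different proportions asked about in slides 5 to 12 differ only in what they are divided by. A sketch using base R's `prop.table()`, with counts reconstructed from the fractions quoted in the slides (an assumption, since the raw survey is not in this transcript):

```r
# Two-way table reconstructed from the quoted fractions (assumption).
tab <- as.table(matrix(c(32, 75, 10, 52), nrow = 2,
                       dimnames = list(relationship = c("Yes", "No"),
                                       gender = c("Female", "Male"))))

# Condition on gender: each column sums to 1.
by_gender <- prop.table(tab, margin = 2)

# Condition on relationship status: each row sums to 1.
by_relationship <- prop.table(tab, margin = 1)

# Joint proportions: the whole table sums to 1.
joint <- prop.table(tab)

round(by_gender["Yes", "Female"], 2)        # females who are in a relationship
round(by_relationship["Yes", "Female"], 2)  # people in a relationship who are female
round(joint["Yes", "Female"], 2)            # students who are female AND in a relationship
```

The same cell count (32) appears in all three numerators; only the denominator changes (107, 42, or 169).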

  13. Side-by-Side Bar Chart • The height of each bar is the count in the corresponding cell of the two-way table R: barplot(table(relationship, gender), beside=TRUE)

  14. Segmented Bar Chart • A segmented bar chart is like a side-by-side bar chart, but the bars are stacked instead of side-by-side R: barplot(table(relationship, gender))
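Both charts can be drawn from the same two-way table in base R. This is a sketch, not the course's own code: the counts are reconstructed from the fractions quoted in the slides, and the axis labels are illustrative.

```r
# Two-way table reconstructed from the quoted fractions (assumption).
tab <- as.table(matrix(c(32, 75, 10, 52), nrow = 2,
                       dimnames = list(relationship = c("Yes", "No"),
                                       gender = c("Female", "Male"))))

# Side-by-side: one bar per cell, grouped by gender.
mid_side <- barplot(tab, beside = TRUE, legend.text = TRUE,
                    xlab = "Gender", ylab = "Count")

# Segmented (stacked): one bar per gender, split by relationship status.
mid_stack <- barplot(tab, beside = FALSE, legend.text = TRUE,
                     xlab = "Gender", ylab = "Count")
```

`barplot()` returns the bar midpoints, so the side-by-side call yields four positions and the stacked call yields two.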

  15. Summary: Two Categorical Variables • Summary Statistics • Two-way table • Difference in proportions • Visualization • Side-by-side bar chart • Segmented bar chart

  16. Kidney Stones R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham (1986). "Comparison of treatment of renal calculi by open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave lithotripsy". Br Med J (Clin Res Ed) 292 (6524): 879–882 • Which treatment is better at removing kidney stones? • Treatment A • Treatment B

  17. Kidney Stones • Which treatment is better at removing small kidney stones? • Treatment A • Treatment B

  18. Kidney Stones • Which treatment is better at removing large kidney stones? • Treatment A • Treatment B

  19. Kidney Stones • Treatment A is more effective for both small and large kidney stones, but the data show Treatment B to be more effective overall! • How is this possible!?!?

  20. Kidney Stones – Simpson’s Paradox

  21. Kidney Stones • Treatment A is used more often on large stones, which are harder to treat. • This is an example of Simpson’s Paradox: an observed relationship between two variables can change (or even reverse!) when a third variable is considered
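The reversal can be checked numerically. The slide tables did not survive in this transcript, so the sketch below assumes the counts commonly quoted from the cited Charig et al. (1986) study; treat them as an assumption rather than as the figures shown in class.

```r
# Successes and totals by treatment and stone size.
# Assumption: counts as commonly quoted from Charig et al. (1986).
kidney <- data.frame(
  treatment = c("A", "A", "B", "B"),
  size      = c("Small", "Large", "Small", "Large"),
  success   = c(81, 192, 234, 55),
  total     = c(87, 263, 270, 80)
)

# Success rate within each stone size: A is higher for small and for large.
kidney$rate <- kidney$success / kidney$total
kidney

# Success rate ignoring stone size: B is higher overall.
overall <- aggregate(cbind(success, total) ~ treatment, data = kidney, FUN = sum)
overall$rate <- overall$success / overall$total
overall

# Treatment A handles most of the large (harder) stones.
xtabs(total ~ treatment + size, data = kidney)
```

Stone size is the third variable here: it is related both to which treatment was chosen and to the chance of success.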

  22. Kidney Stones

  23. Slope = # successful / # unsuccessful = odds

  24. Slope = # successful / # unsuccessful = odds

  25. Quantitative and Categorical Relationships • Interested in a quantitative variable broken down by categorical groups

  26. Tea and the Immune System • Participants were randomized to drink five or six cups of either tea or coffee every day for two weeks (both drinks have caffeine but only tea has L-theanine) • After two weeks, blood samples were exposed to an antigen, and production of interferon gamma (immune system response) was measured • Explanatory variable: tea or coffee • Response variable: measure of interferon gamma Mednick, Cai, Kanady, and Drummond (2008). “Comparing the benefits of caffeine, naps and placebo on verbal, motor and perceptual memory,” Behavioral Brain Research, 193, 79-86.

  27. Tea and the Immune System If the tea drinkers have significantly higher levels of interferon gamma, can we conclude that drinking tea rather than coffee caused an increase in this aspect of the immune response? • Yes • No Randomized experiment – possible to make conclusions about causality

  28. Side-by-Side Boxplots R: boxplot(InterferonGamma~Drink)

  29. Quantitative Statistics by a Categorical Variable • Any of the statistics we use for a quantitative variable can be looked at separately for each level of a categorical variable R: mean(InterferonGamma~Drink) gives Coffee = 17.70000, Tea = 34.81818

  30. Difference in Means • Often, when comparing a quantitative variable across two categories, we compute the difference in means R: mean(InterferonGamma~Drink) gives Coffee = 17.70000, Tea = 34.81818 R: compareMean(InterferonGamma~Drink)
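The formula-style `mean()` and `compareMean()` calls on these slides are not base R and appear to rely on an add-on package. A base-R sketch of the same summaries follows. The individual measurements are not given on the slides; the values below are an assumption, chosen because they reproduce the group means shown above.

```r
# Assumed measurements (not from the slides); they reproduce the quoted means.
immune <- data.frame(
  InterferonGamma = c(5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58,   # tea
                      0, 0, 3, 11, 15, 16, 21, 21, 38, 52),        # coffee
  Drink = rep(c("Tea", "Coffee"), times = c(11, 10))
)

# Mean of the quantitative variable within each level of the categorical one.
group_means <- tapply(immune$InterferonGamma, immune$Drink, mean)
group_means

# Difference in means (tea minus coffee).
diff_means <- group_means[["Tea"]] - group_means[["Coffee"]]
diff_means

# Side-by-side boxplots of the same comparison.
boxplot(InterferonGamma ~ Drink, data = immune)
```

Any other summary (median, standard deviation, and so on) can be swapped in for `mean` in the `tapply()` call.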

  31. Summary: One Quantitative and One Categorical • Summary Statistics • Any summary statistics for quantitative variables, broken down by groups • Difference in means • Visualization • Side-by-side boxplots

  32. Two Quantitative Variables • Summary Statistics: correlation • Visualization: scatterplot

  33. Scatterplot A scatterplot is the graph of the relationship between two quantitative variables. R: plot(study_hours, gpa)

  34. Direction of Association • A positive association means that values of one variable tend to be higher when values of the other variable are higher • A negative association means that values of one variable tend to be lower when values of the other variable are higher • Two variables are not associated if knowing the value of one variable does not give you any information about the value of the other variable

  35. Cars Data Handout • Quantitative Variables: • Weight (pounds) • City MPG • Fuel capacity (gallons) • Page number (in Consumer Reports) • Time to go ¼ mile (in seconds) • Acceleration time from 0 to 60 mph • Relationships • Weight vs. CityMPG • Weight vs. FuelCapacity • PageNum vs. Fuel Capacity • Weight vs. QtrMile • Acc060 vs. QtrMile • CityMPG vs. QtrMile

  36. Car Associations

  37. Correlation The correlation is a measure of the strength and direction of linear association between two quantitative variables • Sample correlation: r • Population correlation: ρ (“rho”) R: cor(x,y)
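A short sketch of a scatterplot and sample correlation in base R. The Consumer Reports cars handout is not included in this transcript, so R's built-in `mtcars` data set is used as a stand-in (an assumption); its numbers will not match the car correlations quoted on the slides.

```r
# Stand-in data: R's built-in mtcars, NOT the Consumer Reports handout.
weight <- mtcars$wt * 1000   # wt is recorded in thousands of pounds
mpg    <- mtcars$mpg

# Always plot the data first.
plot(weight, mpg, xlab = "Weight (pounds)", ylab = "Miles per gallon")

# Sample correlation r.
r <- cor(weight, mpg)
r
```

Heavier cars tend to get fewer miles per gallon, so the points slope downward and `r` is negative.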

  38. Car Correlations (-.91) (.89) (-.45) (.51) (.99) (-.08) What are the properties of correlation?

  39. Correlation • -1 ≤ r ≤ 1 • The sign indicates the direction of association • positive association: r > 0 • negative association: r < 0 • no linear association: r ≈ 0 • The closer r is to ±1, the stronger the linear association • r has no units and does not depend on the units of measurement • The correlation between X and Y is the same as the correlation between Y and X
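These properties can be checked numerically. The sketch below again uses the built-in `mtcars` data as a stand-in for the cars handout (an assumption); the unit conversions are there only to illustrate that rescaling leaves `r` unchanged.

```r
x <- mtcars$wt    # weight, in thousands of pounds
y <- mtcars$mpg   # fuel economy, in miles per gallon
r <- cor(x, y)

# Symmetry: cor(X, Y) equals cor(Y, X).
same_when_swapped <- isTRUE(all.equal(cor(y, x), r))

# No units: converting to kilograms and km per litre does not change r.
x_kg  <- x * 1000 * 0.45359237
y_kml <- y * 0.425144
same_after_rescale <- isTRUE(all.equal(cor(x_kg, y_kml), r))

# Bounds: r always lies between -1 and 1.
within_bounds <- r >= -1 && r <= 1

c(swapped = same_when_swapped, rescaled = same_after_rescale, bounded = within_bounds)
```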

  40. Correlation Guessing Game http://istics.net/gett/gcstart.php?group_id=duke Highest scorer in the class gets one extra point on the first exam!

  41. Correlation NFL Teams r = 0.43

  42. Correlation r = 0.08 Same plot, but with Dolphins and Raiders (outliers) removed

  43. Human Cannonball (plot of Y vs. X) • What is the correlation between X and Y? • r > 0 • r < 0 • r = 0 • Are X and Y associated? • Yes • No

  44. Correlation Cautions • Correlation can be heavily affected by outliers. Always plot your data! • r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data! • Correlation does not imply causation!
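Two of these cautions are easy to demonstrate with small made-up data sets. All values below are hypothetical and chosen only for illustration; they are not the NFL or cannonball data from the slides.

```r
# Caution 1: a single outlier can dominate the correlation.
x <- c(1, 2, 3, 4, 5)
y <- c(3, 1, 2, 3, 1)
r_without <- cor(x, y)                 # weak negative association
r_with    <- cor(c(x, 20), c(y, 20))   # one extreme point added

# Caution 2: r near 0 does not mean "no association".
# A symmetric arc (like a cannonball's path) has no LINEAR trend.
pos    <- seq(-3, 3, by = 0.5)
height <- 9 - pos^2
r_arc  <- cor(pos, height)

round(c(without_outlier = r_without, with_outlier = r_with, arc = r_arc), 2)

# Plotting shows what the single number hides.
plot(pos, height, xlab = "X", ylab = "Y")
```

In the arc example, Y is completely determined by X, yet the correlation is essentially zero because the relationship is not linear.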

  45. Summary: Two Quantitative Variables • Summary Statistics: correlation • Visualization: scatterplot
