Chapter 3 Association: Contingency, Correlation, and Regression

terry

Presentation Transcript


  1. Chapter 3 Association: Contingency, Correlation, and Regression • Section 3.1 How Can We Explore the Association between Two Categorical Variables?

  2. Learning Objectives • Identify variable type: Response or Explanatory • Define Association • Contingency tables • Calculate proportions and conditional proportions

  3. Learning Objective 1: Response and Explanatory Variables • Response variable (dependent variable): the outcome variable on which comparisons are made • Explanatory variable (independent variable): defines the groups to be compared with respect to values of the response variable • Examples (response/explanatory): • Blood alcohol level/number of beers consumed • Grade on test/amount of study time • Yield of corn per acre/amount of rainfall

  4. Learning Objective 2: Association • The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association • An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable

  5. Learning Objective 3: Contingency Table • A contingency table: • Displays two categorical variables • The rows list the categories of one variable • The columns list the categories of the other variable • Entries in the table are frequencies

  6. Learning Objective 3: Contingency Table • What is the response variable? • What is the explanatory variable?

  7. Learning Objective 4: Calculate proportions and conditional proportions

  8. Learning Objective 4: Calculate proportions and conditional proportions • What proportion of organic foods contain pesticides? • What proportion of conventionally grown foods contain pesticides? • What proportion of all sampled items contain pesticide residues?

  9. Learning Objective 4: Calculate proportions and conditional proportions • Use side-by-side bar charts to show conditional proportions • This allows easy comparison of the groups defined by the explanatory variable with respect to the response variable

  10. Learning Objective 4: Calculate proportions and conditional proportions • If there were no association between food type (organic or conventional) and pesticide status, then the proportions for the response variable categories would be the same for each food type
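
The proportion questions on slides 8 to 10 can be sketched in a few lines of Python. The counts below are hypothetical, chosen only for illustration; the contingency table from the slides was not preserved in this transcript. The function names are likewise illustrative.

```python
# Hypothetical counts for illustration only; NOT the data from the slides.
# Rows: food type (explanatory variable); columns: pesticide status (response).
table = {
    "organic": {"present": 30, "absent": 70},
    "conventional": {"present": 150, "absent": 50},
}


def conditional_proportions(table):
    """For each row, return the proportion of that row in each column."""
    result = {}
    for row, counts in table.items():
        row_total = sum(counts.values())
        result[row] = {col: n / row_total for col, n in counts.items()}
    return result


def overall_proportion(table, column):
    """Proportion of all observations that fall in the given column."""
    total = sum(sum(counts.values()) for counts in table.values())
    in_column = sum(counts[column] for counts in table.values())
    return in_column / total


cond = conditional_proportions(table)
overall = overall_proportion(table, "present")

print(cond["organic"]["present"])       # conditional on food type = organic
print(cond["conventional"]["present"])  # conditional on food type = conventional
print(overall)                          # all sampled items
```

With these made-up counts the conditional proportions are 0.30 and 0.75, while the overall proportion is 0.60. Because the two conditional proportions differ, this hypothetical table would show an association between food type and pesticide status.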

  11. Chapter 3 Association: Contingency, Correlation, and Regression • Section 3.2 How Can We Explore the Association between Two Quantitative Variables?

  12. Learning Objectives: • Constructing scatterplots • Interpreting a scatterplot • Correlation • Calculating correlation

  13. Learning Objective 1: Scatterplot • Graphical display of relationship between two quantitative variables: • Horizontal Axis: Explanatory variable, x • Vertical Axis: Response variable, y

  14. Learning Objective 1: Internet Usage and Gross Domestic Product (GDP) Data Set

  15. Learning Objective 1: Internet Usage and Gross Domestic Product (GDP) • Enter values of explanatory variable (x) in L1 • Enter values of response variable (y) in L2 • STAT PLOT • Plot 1 on • Type: scatter plot • X list: L1 • Y list: L2 • ZOOM • 9:ZoomStat • Graph

  16. Learning Objective 1: Baseball Average and Team Scoring

  17. Learning Objective 1: Baseball Average and Team Scoring • Enter values of explanatory variable (x) in L1 • Enter values of response variable (y) in L2 • STAT PLOT • Plot 1 on • Type: scatter plot • X list: L1 • Y list: L2 • ZOOM • 9:ZoomStat • Graph • Note: to keep the data from the prior example, which is used again later in the presentation, enter this data set in L3 (x) and L4 (y) instead and set X list and Y list to match

  18. Learning Objective 2: Interpreting Scatterplots • You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables • Trend: linear, curved, clusters, no pattern • Direction: positive, negative, no direction • Strength: how closely the points fit the trend • Also look for outliers from the overall trend

  19. Learning Objective 2: Interpreting Scatterplots: Direction/Association • Two quantitative variables x and y are • Positively associated when • High values of x tend to occur with high values of y • Low values of x tend to occur with low values of y • Negatively associated when high values of one variable tend to pair with low values of the other variable

  20. Learning Objective 2: Example: 100 cars on the lot of a used-car dealership • Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer? • Positive association • Negative association • No association

  21. Learning Objective 2: Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?

  22. Learning Objective 3: Linear Correlation, r • Measures the strength and direction of the linear association between x and y • A positive r value indicates a positive association • A negative r value indicates a negative association • An r value close to +1 or -1 indicates a strong linear association • An r value close to 0 indicates a weak linear association

  23. Learning Objective 3: Correlation coefficient: Measuring Strength & Direction of a Linear Relationship

  24. Learning Objective 3: Properties of Correlation • Always falls between -1 and +1 • Sign of correlation denotes direction • (-) indicates negative linear association • (+) indicates positive linear association • Correlation is unitless: it does not depend on the variables’ units • Two variables have the same correlation no matter which is treated as the response variable • Correlation is not resistant to outliers • Correlation only measures the strength of a linear relationship
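
As a rough check on two of these properties, the sketch below computes r in the usual z-score form, r = (1/(n-1)) Σ z_x z_y, which is an assumption here because the slide's own formula was not preserved in this transcript. The data and function names are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Correlation r as the average product of z-scores, dividing by n - 1."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    products = (((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


# Hypothetical paired data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)  # treat x as the response instead

print(round(r, 4))          # about 0.7746 for this data
print(round(r_swapped, 4))  # same value: the roles of x and y do not matter
```

Swapping the two lists leaves r unchanged, and the value stays between -1 and +1, in line with the properties listed on the slide.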

  25. Learning Objective 4: Calculating the Correlation Coefficient • Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

  26. Learning Objective 4: Calculating the Correlation Coefficient

  27. Learning Objective 4: Internet Usage and Gross Domestic Product (GDP) • STAT CALC menu • Choose 8: LinReg(a+bx) • 1st list = x variable • 2nd list = y variable • Enter • Correlation = .889

  28. Learning Objective 4: Baseball Average and Team Scoring • Enter x data into L1 • Enter y data into L2 • STAT CALC menu • Choose 8: LinReg(a+bx) • 1st list = x variable • 2nd list = y variable • Enter • Correlation = .874

  29. Learning Objective 4: Cereal: Sodium and Sugar

  30. Chapter 3 Association: Contingency, Correlation, and Regression • Section 3.3 How Can We Predict the Outcome of a Variable?

  31. Learning Objectives • Definition of a regression line • Use a regression equation for prediction • Interpret the slope and y-intercept of a regression line • Identify the least-squares regression line as the one that minimizes the sum of squared residuals • Calculate the least-squares regression line

  32. Learning Objectives • Compare roles of explanatory and response variables in correlation and regression • Calculate and interpret $r^2$

  33. Learning Objective 1: Regression Analysis • The first step of a regression analysis is to identify the response and explanatory variables • We use y to denote the response variable • We use x to denote the explanatory variable

  34. Learning Objective 1: Regression Line • A regression line is a straight line that describes how the response variable (y) changes as the explanatory variable (x) changes • A regression line predicts the value of the response variable (y) for a given level of the explanatory variable (x) • The y-intercept of the regression line is denoted by a • The slope of the regression line is denoted by b

  35. Learning Objective 2: Example: How Can Anthropologists Predict Height Using Human Remains? • Regression Equation: • $\hat{y}$ is the predicted height and $x$ is the length of a femur (thighbone), measured in centimeters • Use the regression equation to predict the height of a person whose femur length was 50 centimeters

  36. Learning Objective 3: Interpreting the y-Intercept • y-Intercept: • The predicted value for y when x = 0 • Helps in plotting the line • May not have any interpretative value if no observations had x values near 0

  37. Learning Objective 3: Interpreting the Slope • Slope: measures the change in the predicted response ($\hat{y}$) for a 1-unit increase in the explanatory variable ($x$) • Example: A 1 cm increase in femur length corresponds to a 2.4 cm increase in predicted height
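
One way to see why the slope has this interpretation is to subtract the prediction at x from the prediction at x + 1, assuming the line has the form a + bx used in the LinReg(a+bx) calculator steps, with a the y-intercept and b the slope:

```latex
\[
\hat{y}(x+1) - \hat{y}(x)
  = \bigl[a + b(x+1)\bigr] - \bigl[a + bx\bigr]
  = b
\]
```

With b = 2.4, as in the femur example, the predicted height rises by 2.4 cm for each additional centimeter of femur length, whatever the value of a.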

  38. Learning Objective 3: Slope Values: Positive, Negative, Equal to 0

  39. Learning Objective 3: Regression Line • At a given value of x, the equation: • Predicts a single value of the response variable • But… we should not expect all subjects at that value of x to have the same value of y • Variability occurs in the y values!

  40. Learning Objective 3: The Regression Line • The regression line connects the estimated means of y at the various x values • In summary, it describes the relationship between x and the estimated means of y at the various values of x

  41. Learning Objective 4: Residuals • A residual measures the size of a prediction error: the vertical distance between the point and the regression line • Each observation has a residual • Calculation for each residual: $y - \hat{y}$ • A large residual indicates an unusual observation

  42. Learning Objective 4: “Least Squares Method” Yields the Regression Line • Residual sum of squares: $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$ • The least squares regression line is the line that minimizes the sum of squared vertical distances between the points and their predictions, i.e., it minimizes the residual sum of squares • Note: the sum of the residuals about the least squares regression line is always zero
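
The two claims on this slide can be checked numerically. The sketch below fits a line by the standard closed-form least-squares solution and compares its residual sum of squares with that of a slightly different line. The data, the comparison line, and the function names are hypothetical.

```python
def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed y minus predicted y for every observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def residual_sum_of_squares(x, y, intercept, slope):
    return sum(e ** 2 for e in residuals(x, y, intercept, slope))


# Hypothetical paired data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
res = residuals(x, y, a, b)
rss_fitted = residual_sum_of_squares(x, y, a, b)

# An arbitrary competing line with the same intercept but a steeper slope.
rss_other = residual_sum_of_squares(x, y, a, b + 0.1)

print(a, b)                  # fitted intercept and slope
print(sum(res))              # zero, up to floating-point rounding
print(rss_fitted, rss_other)
```

For this data the fitted line has a smaller residual sum of squares than the competing line, and its residuals sum to zero apart from rounding error.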

  43. Learning Objective 5: Regression Formulas for y-Intercept and Slope • Slope: $b = r \left( \frac{s_y}{s_x} \right)$ • y-Intercept: $a = \bar{y} - b\bar{x}$ • The regression line always passes through $(\bar{x}, \bar{y})$
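
A minimal sketch of this calculation, assuming the standard summary-statistic forms b = r·s_y/s_x and a = ȳ − b·x̄, where s_x and s_y are sample standard deviations. The data and the function name are hypothetical.

```python
from math import sqrt


def regression_from_summaries(x, y):
    """Compute the intercept and slope from r, the means and the SDs."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Correlation as the sum of z-score products divided by n - 1.
    r = sum(
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    ) / (n - 1)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return intercept, slope, x_bar, y_bar


# Hypothetical paired data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope, x_bar, y_bar = regression_from_summaries(x, y)

# Predicting at the mean of x gives back the mean of y.
prediction_at_mean = intercept + slope * x_bar
print(intercept, slope)
print(prediction_at_mean, y_bar)
```

The last line illustrates the point that the fitted line passes through the point of means: the prediction at x̄ equals ȳ.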

  44. Learning Objective 5: Calculating the slope and y-intercept for the regression line • Slope = 26.4 • y-intercept = -2.28

  45. Learning Objective 5: Internet Usage and Gross Domestic Product (GDP)

  46. Learning Objective 5: Internet Usage and Gross Domestic Product (GDP) • Enter x data into L1 • Enter y data into L2 • STAT CALC menu • Choose 8: LinReg(a+bx) • 1st list = x variable • 2nd list = y variable • Enter • $\hat{y} = 1.548x - 3.63$

  47. Learning Objective 5: Baseball Average and Team Scoring

  48. Learning Objective 5: Baseball Average and Team Scoring • Enter x data into L1 • Enter y data into L2 • STAT CALC menu • Choose 8: LinReg(a+bx) • 1st list = x variable • 2nd list = y variable • Enter

  49. Learning Objective 5: Cereal: Sodium and Sugar

  50. Learning Objective 6: The Slope and the Correlation • Correlation: • Describes the strength of the linear association between 2 variables • Does not change when the units of measurement change • Does not depend upon which variable is the response and which is the explanatory
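
To contrast these properties with the behavior of the slope, the sketch below refits a line after changing the units of x and after swapping the roles of the two variables. The measurements are hypothetical, not the femur data from the slides, and the function name is illustrative.

```python
from math import sqrt


def slope_and_correlation(x, y):
    """Return (least-squares slope of y on x, correlation r)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical measurements for illustration only.
femur_cm = [40.0, 43.0, 45.0, 48.0, 50.0]
height_cm = [158.0, 163.0, 170.0, 176.0, 181.0]

# Same femur lengths expressed in millimeters.
femur_mm = [10 * v for v in femur_cm]

slope_cm, r_cm = slope_and_correlation(femur_cm, height_cm)
slope_mm, r_mm = slope_and_correlation(femur_mm, height_cm)
slope_swapped, r_swapped = slope_and_correlation(height_cm, femur_cm)

print(r_cm, r_mm, r_swapped)              # all three correlations agree
print(slope_cm, slope_mm, slope_swapped)  # the slopes do not
```

Rescaling x by a factor of 10 divides the slope by 10 but leaves r unchanged, and swapping x and y changes the slope while r stays the same.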
