Bivariate Data – Pt 3

Bivariate Data – Pt 3. October 2011. Residuals (errors): the vertical deviations between the observations and the least-squares regression line (LSRL). The sum of the residuals is always zero. error = observed − expected. Residual plot: a scatterplot of the (x, residual) pairs.


Presentation Transcript


  1. Bivariate Data – Pt 3 October 2011

  2. Residuals (errors) • The vertical deviations between the observations and the least-squares regression line (LSRL) • The sum of the residuals is always zero • error = observed − expected
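As a minimal sketch of the definition above, the following fits a least-squares line to a small made-up data set and confirms that the residuals sum to zero. The data values and the helper names `fit_line` and `residuals` are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """residual = observed - expected, one value per observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print(res)
print(sum(res))  # zero up to floating-point rounding
```

The zero sum holds for any least-squares line that includes an intercept, so it is a property of the fitting method and says nothing about how well the line fits.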

  3. Residual plot • A scatterplot of the (x, residual) pairs. • Its purpose is to show whether the linear model adequately describes the relationship between the predictor and the response. • We are hoping to find a “shotgun blast” pattern: points scattered at random with no visible structure. That would suggest that a linear model is an appropriate fit.

  4. The Shotgun Blast • More precisely, the ideal residual pattern is homoscedastic, with no association between the residual and the predictor. • Homoscedasticity: the residuals have the same variance at every value of the predictor. • We can tell from the residual plot whether the residuals are homoscedastic. • We can also tell whether there is an association.
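One rough way to put a number on the "same variance" idea is to compare the spread of the residuals for small and large values of the predictor. The sketch below is only an informal check under assumptions: the residual values are hypothetical, `spread_by_half` is an illustrative helper, and splitting at the middle of the sorted x values is an arbitrary choice rather than a formal test.

```python
from statistics import pvariance


def spread_by_half(xs, res):
    """Variance of the residuals in the lower and upper halves of x."""
    pairs = sorted(zip(xs, res))
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return pvariance(low), pvariance(high)


# Hypothetical residuals: an even scatter vs. a fan that widens with x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
even = [0.5, -0.4, 0.3, -0.5, 0.4, -0.3, 0.5, -0.5]
fan = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.2, -1.1]

even_low, even_high = spread_by_half(xs, even)
fan_low, fan_high = spread_by_half(xs, fan)

# A ratio near 1 is consistent with homoscedasticity;
# a ratio far from 1 suggests heteroscedasticity.
print(even_high / even_low)
print(fan_high / fan_low)
```

A ratio like this is only a supplement to looking at the residual plot, which remains the main diagnostic in these slides.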

  5. Scatterplot • The following graph plots predictor (X) vs. response (Y). • There appears to be a positive association between the two variables. • The line of best fit is [equation not preserved in this transcript] • (What is r?) • A linear model looks like it might fit fairly well. • We calculate the residuals and plot them against X to see whether that impression holds up.

  6. Residual Plot • We plot the predictor (X) vs. the residual (ResY). • Look at the variation. The points are spread fairly evenly above and below the zero line, which suggests homoscedasticity. • What about association? Conclusion: Although the model’s r² is fairly low, the residual plot suggests that the linear model is a reasonable choice.

  7. Scatterplot • The following graph plots predictor (X) vs. response (Y). • There appears to be a negative association between the two variables. • The line of best fit is [equation not preserved in this transcript] • (What is r?) • A linear fit looks like it might be fairly good. • We calculate the residuals and plot them against X to see whether that impression holds up.

  8. Residual Plot • We plot the predictor (X) vs. the residual (ResY). • This is not so cut and dried. Look at the variation. There are several points where the residual is greater than 1.5, and two clusters of points with residuals between −0.5 and −1. • We would question homoscedasticity. • What about association? Conclusion: Although the model’s r² is high, the residual plot suggests that the linear model might not be a good choice.
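To illustrate how a high r² can coexist with a poor residual pattern, here is a small sketch that fits a straight line to deliberately curved, made-up data (y = x², which is an assumption for illustration and not the data behind the slide). The residuals come out positive at both ends and negative in the middle, which is a clear association with the predictor.

```python
# Deliberately curved, made-up data: y = x^2.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line and its r^2.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)

# residual = observed - expected
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(round(r_squared, 3))  # high, even though the relationship is curved
print(res)                  # U-shaped: positive, then negative, then positive
```

Because the linear correlation between x and the least-squares residuals is always zero by construction, this kind of curved pattern has to be spotted in the residual plot itself; it cannot be found by computing r on the residuals.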

  9. Non-Linear regression • Sometimes we fit a curve instead of a straight line. As it turns out, for this X and Y, a quadratic curve can be fit as shown at right. • The best-fit equation is [equation not preserved in this transcript] • We do not calculate an r or r² value, because correlation measures the strength of a linear association and this model is non-linear. • We still calculate residuals the same way: residual = observed − expected.
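The slide does not say how the quadratic was fitted. As one possible sketch, a least-squares quadratic can be found by solving the 3×3 normal equations; the helper names `solve` and `fit_quadratic` and the data values below are assumptions made for illustration.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution.
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - known) / aug[i][i]
    return sol


def fit_quadratic(xs, ys):
    """Least-squares coefficients (c0, c1, c2) for y = c0 + c1*x + c2*x^2."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve(normal, t)


# Made-up data that roughly follows y = x^2 (not from the slides).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0.3, 0.8, 4.1, 9.2, 15.7, 25.3, 35.9]

c0, c1, c2 = fit_quadratic(xs, ys)

# Residuals are computed exactly as for the line: observed - expected.
res = [y - (c0 + c1 * x + c2 * x * x) for x, y in zip(xs, ys)]
print(c0, c1, c2)
print(res)
```

In practice a numerical library routine would usually do this fit; the hand-rolled solver is used here only to keep the sketch free of dependencies.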

  10. Residual Plot • We plot the predictor (X) vs. the residual (ResY). • These residuals look better than the residuals for the linear model. • Are the residuals… Non-associated? Homoscedastic? Conclusion: Based on the residual plot, this model appears to be a better fit than the linear model.

  11. Conclusions • With bivariate data, examine the scatterplot to see whether the data are linearly associated, non-linearly associated, or not associated. • If they are linearly associated, determine the line of best fit and the correlation coefficient. • Calculate the residuals and plot them against the original predictor variable. If the residuals show no association with the predictor and are homoscedastic, that is a sign of a good model. If some form of association or heteroscedasticity remains, the model might be flawed.
