
AP Statistics Review



Presentation Transcript


  1. AP Statistics Review Linear Regression (C7-9 BVD)

  2. Scatterplots • Explanatory variable goes on the x-axis • Response variable goes on the y-axis • Don’t forget labels and scale • Statplot, 1st option, specify lists, Zoom 9 • Direction: positive slope or negative slope • Unusual points – outliers, influential points • Shape – straight or curved • Scatter – weak, moderate, strong
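If you want to reproduce the calculator plot off the TI, here is a minimal sketch in Python with matplotlib. The data and variable names (study_hours, exam_score) are made up for illustration and are not from the slides.

```python
# Minimal sketch of a scatterplot with the explanatory variable on x
# and the response variable on y, using hypothetical data.
import matplotlib.pyplot as plt

study_hours = [1, 2, 3, 4, 5, 6, 7, 8]          # explanatory -> x-axis
exam_score = [52, 60, 61, 70, 74, 80, 83, 90]   # response -> y-axis

plt.scatter(study_hours, exam_score)
plt.xlabel("Study hours (explanatory)")   # don't forget labels
plt.ylabel("Exam score (response)")
plt.title("Exam score vs. study hours")
plt.show()
```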

  3. Correlation (r) • Correlation measures the strength of linear association between two variables: r = Σ(z_x·z_y)/(n − 1), where z_x and z_y are the z-scores of x and y. • r is always between -1 and 1, inclusive. Negative r means a negative direction; positive r means a positive direction. r close to 0 indicates a weaker association; r near 1 or -1 indicates a stronger association. If r is exactly 1 or -1, the data are exactly linear. • Correlation is strongly affected by outliers: always look at the scatterplot and residual plot to make sure a linear model makes sense!
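As a check on the formula, here is a sketch that computes r as the average product of z-scores and compares it with numpy's built-in correlation. The data are the same hypothetical study-hours example used above.

```python
# r = sum(zx * zy) / (n - 1), using sample standard deviations (ddof=1).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 60, 61, 70, 74, 80, 83, 90], dtype=float)

zx = (x - x.mean()) / x.std(ddof=1)   # z-scores of x
zy = (y - y.mean()) / y.std(ddof=1)   # z-scores of y
r = np.sum(zx * zy) / (len(x) - 1)

print(r)                        # hand-rolled formula
print(np.corrcoef(x, y)[0, 1])  # should match
```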

  4. Linear Regression (Line of Best Fit) • Use a “hat” over the y variable to indicate a model’s prediction rather than an actual data value. • Use the actual variable names (in context) for y and x. • The line is y = mx + b from algebra, but statisticians may write b1 for slope and b0 for y-intercept, and the calculator may use b for slope and a for y-intercept. • To find a predicted y, plug x into the equation. • Extrapolation – using your model to make predictions for x-values far from the x-values used to build the model – is likely to lead to inaccurate or worthless predictions.
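Here is a sketch of fitting the line and making a prediction in Python. np.polyfit is used in place of the calculator's LinReg; the data are the same hypothetical example as above.

```python
# Fit y-hat = b0 + b1*x and predict for a new x.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 60, 61, 70, 74, 80, 83, 90], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)      # slope, intercept
print(f"y-hat = {b0:.2f} + {b1:.2f}*x")

x_new = 5.5
print(b0 + b1 * x_new)   # predicted score for 5.5 hours (inside the data range)
# Predicting at, say, x = 40 would be extrapolation -- far outside the
# observed x-values -- and the prediction would not be trustworthy.
```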

  5. More on Linear Regression • Residual = actual y value for a given x minus the y value the model predicts for that x. • A positive residual means the model underestimated; a negative residual means the model overestimated. • Least squares regression is the procedure that finds the line of best fit by minimizing the sum of the squared residuals. • LinReg L1, L2, Y1 in the calculator (make sure diagnostics are on to get r and r²).
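A short sketch of computing residuals by hand, again with the same hypothetical data. It also prints the quantity that least squares minimizes.

```python
# residual = actual y - predicted y; positive means the model underestimated.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 60, 61, 70, 74, 80, 83, 90], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x
residuals = y - y_hat

print(residuals)
print(np.sum(residuals**2))   # the sum of squared residuals least squares minimizes
```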

  6. More on Residuals • When doing linear regression, always check the scatterplot (DUSS) to make sure the shape is basically straight, and ALSO check the residual plot for any patterns. • The RESID list is automatically generated by the calculator whenever you run LinReg. It is under 2nd Stat Edit. Be careful: it overwrites itself each time you run a new LinReg. Graph it by changing the Y-list in Statplot to RESID. • You can also store the RESID list in another list, such as L3, for easy viewing of the residuals. • The standard deviation of the residuals (1-Var Stats works) gives the approximate size of a “typical” prediction error – how far off the model will typically be for a given x.
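The sketch below mimics the calculator's RESID plot and the "typical prediction error" idea: it plots residuals against x and prints their standard deviation. Data are the same hypothetical example.

```python
# Residual plot (look for patterns) and SD of residuals (typical prediction error).
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 60, 61, 70, 74, 80, 83, 90], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

plt.scatter(x, residuals)
plt.axhline(0)                 # residuals should scatter randomly around this line
plt.xlabel("Study hours")
plt.ylabel("Residual")
plt.show()

print(residuals.std(ddof=1))   # rough size of a typical prediction error
```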

  7. Linear Regression from Stats instead of Data • If you have summary statistics (r, plus the mean and standard deviation of x and y) instead of raw data, use these formulas to find the line of best fit: • b1 = r·Sy/Sx • b0 = (mean of y) – b1·(mean of x) • If you want to switch explanatory and response, use the same equations with the x and y means and standard deviations reversed to find the new model.
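A tiny sketch of those two formulas; the summary statistics below are made-up numbers, not from the slides.

```python
# Slope and intercept from summary statistics only: b1 = r*Sy/Sx, b0 = ybar - b1*xbar.
r, sx, sy = 0.95, 2.45, 13.1    # hypothetical correlation and sample SDs
xbar, ybar = 4.5, 71.25         # hypothetical means

b1 = r * sy / sx
b0 = ybar - b1 * xbar
print(f"y-hat = {b0:.2f} + {b1:.2f}*x")
```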

  8. Coefficient of Determination (r²) • Interpretation template: “___% of the variation in y (context) is accounted for by the regression line of y on x (context).”
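For simple linear regression, r² is both the square of the correlation and the fraction 1 − SS_residual/SS_total. The sketch below checks that the two agree on the same hypothetical data used earlier.

```python
# Coefficient of determination two ways: 1 - SS_res/SS_tot and r squared.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 60, 61, 70, 74, 80, 83, 90], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)

ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean())**2)
print(1 - ss_res / ss_tot)            # fraction of variation explained
print(np.corrcoef(x, y)[0, 1]**2)     # same number: r squared
```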

  9. Outliers • Outliers are points far from the other points. • Outliers that are far away in the y-direction but not the x-direction have large residuals and tend to weaken the correlation (pull r toward 0). • Outliers that are far away in the x-direction have high leverage: they pull the regression line toward themselves and can make r look artificially strong. Points that substantially change the slope or r are called influential points. • Consider running the regression with and without influential points and discussing them separately. • Also consider alternative models if appropriate. • Do not assume a large r means a linear model is THE best model for the data, or that it implies a cause-and-effect relationship between the variables.
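The sketch below illustrates influence with made-up numbers: adding one point far out in the x-direction that does not follow the pattern noticeably changes both r and the slope.

```python
# Effect of a single high-leverage point on r and the slope (illustrative only).
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 60, 61, 70, 74, 80, 83, 90], dtype=float)

print(np.corrcoef(x, y)[0, 1], np.polyfit(x, y, 1)[0])       # before

x2 = np.append(x, 30.0)   # far away in the x-direction (high leverage)...
y2 = np.append(y, 40.0)   # ...and far from the linear pattern
print(np.corrcoef(x2, y2)[0, 1], np.polyfit(x2, y2, 1)[0])   # after: line is pulled
```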
