1 / 21

Lecture 3: Bivariate Regression

Lecture 3: Bivariate Regression. January 20, 2013. Clicker Registration. Administrative. Homework 1 due. Hope you finished it How was it? First quiz this Wednesday 20 min - hard deadline. Should take 10 min if you know the material very well. If you don’t, you might not finish.

zurina
Télécharger la présentation

Lecture 3: Bivariate Regression

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 3: Bivariate Regression January 20, 2013

  2. Clicker Registration

  3. Administrative • Homework 1 due. Hope you finished it • How was it? • First quiz this Wednesday • 20 min - hard deadline. Should take 10 min if you know the material very well. If you don’t, you might not finish. • Data sets (if required) available at 8:59(W) or 10:29(X) • How should you study? • No linear regression (that’s Quiz 2); • Problem Set 1 and slides give you the potential topics • Problem set 2 due next Monday (1 week) • Any questions?

  4. Question of the day • Which of the following best describes your feelings about the quiz this coming Wednesday: • We have a quiz this Wednesday?? What? • Anxious because I don’t know what to expect • Nervous because I don’t know how to do the problems • Relaxed, it shouldn’t be too bad. • None of the above.

  5. Last Time Ordinary Least Squares (OLS) • The best fitting line collectively makes the squares of residuals as small as possible (the choice of β0and β 1minimizes the sum of the squared residuals).

  6. Regression (by hand) Data: diamonds.xls Regress price on weight: i.e., price is what we’re predicting (or the dependent variable) and weight is the explanatory variable (independent var). I.e, for all data points, i, find a line through them: PredictedPricei= β0+ β1* weighti. So how do we find the slope and intercept?

  7. Regression using Solver We’ll use Solver to find β0 + β1: • Guess some starting values for β0and β1 • Use those and the line eq to give you a fitted (predicted) Price • Calculate the Residual and Residual^2 • Calculate the sum of the Residual^2 • Use solver to minimize the sum of the residuals^2 by changing β0 and β1 • (solver isn’t perfect and might be off – hence we won’t use it for regression. And it’s also a pain.)

  8. Regression using Excel Doing regression by hand is good to do. Once or twice. • In the Data Analysis add-in Excel has a built in function that makes it much easier • If you’re a Mac person (I am) there is software you can download to give you Solver and the Data Analysis add-in. • Also very easy (probably easier) to do it with StatTools.

  9. Interpreting the Fitted Line Diamond Example • Estimated Price = 43.49 + 2669.7 * Weight • Our estimate of the intercept, β0 , is 43.49 • Our estimate of the slope, β1 , is 2669.7 So according to our model, we can estimate that the average price of a diamond that weighs .4 carat is 43.49 + 2669.7 * (0.4) = $1,111.33

  10. Question Using our model, Estimated Price = 43.49 + 2669.7 * Weight, a diamond that weighs 0.5 costs approximately how much more, on average, than one weighing 0.4: • $154 • $267 • $233 • $1378

  11. Question Using our model, Estimated Price = 43.49 + 2669.7 * Weight, a diamond that weighs 0.5 costs approximately how much more, on average, than one weighing 0.4 : • $154 • $267 • $233 • $1378

  12. Interpreting the Fitted Line Diamond Example: Estimated Price = 43.49 + 2669.7 * Weight So how do we interpret our estimates besides predicting prices? Well, it depends… Context of the problem is important. • Intercept: • The intercept estimates the average response when x = 0 (where the line crosses the y axis). • So the estimated average price of diamond that weighs 0 is $43.49. • Uh… that doesn’t make much sense. • The intercept is the portion of y that is present for all values of x (think about fixed costs)

  13. Interpreting the Fitted Line • Interpreting the intercept: • Unless the observed range of x values includes 0, our estimate (denoted b0) will be an extrapolation. Be cautious

  14. Interpreting the Fitted Line Interpreting our estimate (b1) of the slope (β1): • It’s the marginal change in y for a 1-unit change in x. • While tempting, it is not correct to describe the slope as the change in y caused by changing x. • We’re dealing with associations not causality. • In the context of our problem? • The slope estimates the marginal cost used to find the variable cost (i.e., marginal cost is $2,670 per carat).

  15. Properties of Residuals Residuals: • Show variation that remains in the data after accounting for the linear relationship defined by the fitted line. • Should be plotted against x to check for patterns. • Why?

  16. Properties of Residuals Residual Plots: • If the least squares line captures the association between x and y, then a plot of residuals versus x should stretch out horizontally with consistent vertical scatter. • Can use a visual test for association to check for the absence of a pattern. • Don’t look too long: if you look long enough, you’ll see a pattern. You want to check if there is an obvious and immediate one. • Is there a pattern? • Subtle; increasing as x increases

  17. Properties of Residuals Standard Deviation of the Residuals (se) • Measures how much the residuals vary around the fitted line. • Also known as standard error of the regression or the root mean squared error (RMSE). • For the diamond example, se = $170.21. • Since the residuals are approximately normal, the empirical rule implies that about 95% of the prices are within $340 of the regression.

  18. Explaining Variation R-squared (r2) • Is the square of the correlation between x and y • 0 ≤ r2 ≤ 1 • Is the fraction of the variation accounted for by the least squares regression line. • Higher is obviously better • For the diamond example, r2 = 0.4297 (i.e., the fitted line explains 42.97% of the variation in price). • But I see r-squared and “adjusted r-squared” reported. What’s the difference? • We’ll get there… Always report both r2 and seso others can judge how well the regression equation describes the data.

  19. Conditions for Simple Regression • Linear: Look at a scatterplot. Does pattern resemble a straight line? • Random residual variation: Look at the residual plot to make sure no pattern exists. • No obvious lurking variable: need to think about whether other explanatory variables might better explain the linear association between x and y. • Pay attention to the substantive context of the model • Be very very cautious of making predictions outside the range of observed conditions. • Look at the plots; look at the data!

  20. Example 2: Gas Consumption Data: gas_consumption.csv • Use a simple regression model to Predict gas consumption – Gas (CCF) – by Average Temp • Are the conditions for simple regression met? • Yes • No • What is simple regression? • I have no idea what language you’re speaking

  21. Example 2: Gas Consumption Data: gas_consumption.csv • Use a simple regression model to Predict gas consumption – Gas (CCF) – by Average Temp • Using a simple regression model, what is your estimate of the intercept? • 338.76 • -4.33 • 287.46 • 12.25 • None of the above.

More Related