Chapter 11 (Continued)

Presentation Transcript


  1. Chapter 11 (Continued) Regression and Correlation methods EPI809/Spring 2008

  2. Linear Multiple Regression Model

  3. Types of Regression Models

  4. Learning Objectives: This part focuses on the linear multiple regression model. After studying the materials in this section, you should be able to:
     • Understand the general concepts behind the linear multiple regression model
     • Fit a linear multiple regression model and interpret the computer output
     • Perform model diagnosis: test the overall and partial significance of a multiple regression model, and perform residual analysis
     • Describe linear regression pitfalls

  5. Regression Modeling Steps
     • Specify the model and estimate all unknown parameters
     • Evaluate the model
     • Use the model for prediction & estimation

  6. Linear Regression Model Specification
     • Decide what you want to do and select the dependent variable
     • List all potential independent variables for your model

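  7. Linear Multiple Regression Model
     1. The relationship between 1 dependent and 2 or more independent variables is a linear function:
        $$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
        where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables, $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.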

  8. Linear Regression Assumptions
     • Mean of the distribution of error is 0
     • Distribution of error has constant variance
     • Distribution of error is normal
     • Errors are independent
     Extremely important!

  9. Population Multiple Regression Model: Bivariate model

  10. Parameter Estimation: You gather the observations for all variables and estimate the model parameters

  11. Multiple Linear Regression Equations: Too complicated by hand! Ouch!
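
Although the equations are tedious by hand, the least-squares estimates have a compact matrix form. This is the standard textbook result, not something recovered from the slide; the notation ($X$ for the $n \times (k+1)$ design matrix with a leading column of ones, $\mathbf{y}$ for the vector of responses) is an assumption introduced here.

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$$

Statistical software such as SAS solves this system numerically, which is why the slides rely on computer output for the estimates.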

  12. Sample Multiple Regression Model: Bivariate model

  13. Interpretation of Estimated Coefficients

  14. Interpretation of Estimated Coefficients
      1. Slope ($\hat\beta_k$)
         • The estimated average Y changes by $\hat\beta_k$ for each 1-unit increase in $X_k$, holding all other variables constant
         • Example from the textbook: if $\hat\beta_1 = 0.13$, then the systolic blood pressure (Y) is expected to increase by 0.13 for each 1-unit increase in birthweight ($X_1$), given fixed age ($X_2$)

  15. Interpretation of Estimated Coefficients
      1. Slope ($\hat\beta_k$)
         • The estimated average Y changes by $\hat\beta_k$ for each 1-unit increase in $X_k$, holding all other variables constant
         • Example from the textbook: if $\hat\beta_1 = 0.13$, then the systolic blood pressure (Y) is expected to increase by 0.13 for each 1-unit increase in birthweight ($X_1$), given fixed age ($X_2$)
      2. Y-intercept ($\hat\beta_0$): the predicted average value of Y when all $X_k$ are set to 0

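  16. Variance of Error Estimate
      • Assuming the model is correctly specified…
      • The best (unbiased) estimator of $\sigma^2$ is $s^2 = \mathrm{SSE}/(n-k-1)$
      • It is used in the formula for computing the standard errors $s_{\hat\beta_k}$ of the estimated coefficients
      • The exact formula is too complicated to show
      • But a higher value of $s$ leads to a higher $s_{\hat\beta_k}$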

  17. Parameter Estimation Example
      • You're a vet epidemiologist for the county cooperative. You gather the following data:

        Milk  Food  weight
           1     1       2
           4     8       8
           1     3       1
           3     5       7
           2     6       4
           4    10       6

      • What is the linear relationship between cows' food intake, weight and milk yield?

  18. Model Specification Example
      • The dependent variable is milk yield (lb)
      • The independent variables for our model are food intake (lb) and weight (×100 lb)

  19. Sample SAS code for plotting

      data Cow;  /* Reading data in SAS */
        input Milk Food weight @@;
        cards;
      1 1 2  4 8 8  1 3 1  3 5 7  2 6 4  4 10 6
      ;
      run;

      /* Scatter plots of milk yield against each predictor */
      proc gplot data=Cow;
        plot milk*food milk*weight;
      run;

  20. Some plots

  21. Sample SAS code for fitting a multiple linear regression

      proc reg data=Cow;
        model milk = food weight;
      run;

  22. Parameter Estimation SAS Output

      Parameter Estimates
                          Parameter    Standard
      Variable     DF      Estimate       Error    t Value    Pr > |t|
      Intercept     1       0.06397     0.25986       0.25      0.8214
      Food          1       0.20492     0.05882       3.48      0.0399
      weight        1       0.28049     0.06860       4.09      0.0264

      The Parameter Estimate column gives $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$ (in general $\hat\beta_p$); the Standard Error column gives $s_{\hat\beta_p}$.
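
As a cross-check on the SAS output, the same estimates can be reproduced from the six observations with a small pure-Python least-squares routine. This is only an illustrative sketch, not the course's method: the helper names `invert` and `fit_ols` are hypothetical, and the approach (solving the normal equations by Gauss-Jordan elimination) is chosen for transparency rather than numerical robustness.

```python
import math

# Cow data from the slides: milk yield (lb), food intake (lb), weight (x100 lb).
milk = [1, 4, 1, 3, 2, 4]
food = [1, 8, 3, 5, 6, 10]
weight = [2, 8, 1, 7, 4, 6]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination (hypothetical helper)."""
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(y, *predictors):
    """Return (coefficients, standard errors, s) for y = b0 + b1*x1 + ... + error."""
    n, k = len(y), len(predictors)
    p = k + 1
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [x[i] for x in predictors] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    residuals = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - k - 1))  # estimate of the error standard deviation
    se = [s * math.sqrt(xtx_inv[a][a]) for a in range(p)]
    return beta, se, s


beta, se, s = fit_ols(milk, food, weight)
for name, b, e in zip(["Intercept", "Food", "weight"], beta, se):
    print(f"{name:<10} estimate={b:.5f}  std.err={e:.5f}  t={b / e:.2f}")
```

The printed estimates, standard errors and t values can be compared line by line with the Parameter Estimates table.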

  23. Parameter Estimation SAS Output

      Analysis of Variance
                                 Sum of        Mean
      Source             DF     Squares      Square    F Value    Pr > F
      Model               2     9.24974     4.62487      55.44    0.0043
      Error               3     0.25026     0.08342
      Corrected Total     5     9.50000

      Root MSE          0.28883    R-Square    0.9737
      Dependent Mean    2.50000    Adj R-Sq    0.9561
      Coeff Var        11.55309

      Root MSE is the estimate $s$ of the error standard deviation.
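
The remaining entries of this output follow from the two sums of squares and the degrees of freedom by simple arithmetic. The sketch below takes the printed sums of squares as given (so results agree with SAS only up to rounding) and uses variable names chosen here for illustration.

```python
import math

# Values read off the SAS output: n observations, k predictors,
# model and error sums of squares, and the mean of the dependent variable.
n, k = 6, 2
ss_model, ss_error = 9.24974, 0.25026
dependent_mean = 2.5

ss_total = ss_model + ss_error                       # Corrected Total
ms_model = ss_model / k                              # MS(Model)
ms_error = ss_error / (n - k - 1)                    # MS(Error)
f_value = ms_model / ms_error                        # F Value
root_mse = math.sqrt(ms_error)                       # Root MSE (s)
r_square = ss_model / ss_total                       # R-Square
adj_r_square = 1 - ms_error / (ss_total / (n - 1))   # Adj R-Sq
coeff_var = 100 * root_mse / dependent_mean          # Coeff Var (percent)

print(f"MS(Model)={ms_model:.5f}  MS(Error)={ms_error:.5f}  F={f_value:.2f}")
print(f"Root MSE={root_mse:.5f}  R-Square={r_square:.4f}  Adj R-Sq={adj_r_square:.4f}")
print(f"Coeff Var={coeff_var:.3f}")
```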

  24. Interpretation of Coefficients Solution
      1. Slope ($\hat\beta_1$)
         • Milk yield is expected to increase by 0.2049 lb for each 1 lb increase in food intake, holding weight constant

  25. Interpretation of Coefficients Solution
      1. Slope ($\hat\beta_1$)
         • Milk yield is expected to increase by 0.2049 lb for each 1 lb increase in food intake, holding weight constant
      2. Slope ($\hat\beta_2$)
         • Milk yield is expected to increase by 0.2805 lb for each 1-unit (×100 lb) increase in weight, holding food intake constant
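
The "holding the other variable constant" reading can be illustrated by plugging values into the fitted equation. In this sketch the coefficients are the rounded SAS estimates, while the function name `predict_milk` and the example cow (5 lb of food, 600 lb) are hypothetical choices made only for illustration.

```python
# Rounded estimates from the SAS output: intercept, food slope, weight slope.
b0, b1, b2 = 0.06397, 0.20492, 0.28049


def predict_milk(food_lb, weight_100lb):
    """Predicted average milk yield (lb) from the fitted equation."""
    return b0 + b1 * food_lb + b2 * weight_100lb


base = predict_milk(5, 6)        # hypothetical cow: 5 lb food, 600 lb
more_food = predict_milk(6, 6)   # one more lb of food, same weight
heavier = predict_milk(5, 7)     # same food, 100 lb heavier

print(f"effect of +1 lb food at fixed weight: {more_food - base:.4f}")
print(f"effect of +100 lb weight at fixed food: {heavier - base:.4f}")
```

Each difference equals the corresponding slope, whatever starting values are chosen, because the model is linear with no interaction term.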

  26. Model Evaluation

  27. Evaluating Multiple Regression Models
      1. Examine variation measures
      2. Test the significance of the overall model, portions of the overall model, and individual coefficients
      3. Check the conditions of a multiple linear regression model using residuals
      4. Assess multicollinearity among the independent variables

  28. Variation Measures

  29. Coefficient of Multiple Determination
      • Proportion of variation in Y 'explained' by all X variables taken together
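
In symbols, this proportion is the usual ratio of sums of squares. The formula was not recovered from the slide, so the notation below (SSR for the model sum of squares, SSE for the error sum of squares, SST for the corrected total) is an assumption.

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$

For the cow example, the SAS output gives $R^2 = 9.24974 / 9.50000 \approx 0.9737$.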

  30. Check Your Understanding
      • If you add a variable to the model, how will that affect the R-squared value for the model?

  31. Adjusted $R^2$
      • $R^2$ never decreases when a new X variable is added to the model (a disadvantage when comparing models)
      • Solution: adjusted $R^2$
      • Each additional variable reduces adjusted $R^2$, unless SSE goes down enough to compensate
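
The penalty for extra variables is visible in the standard definition of adjusted $R^2$, stated here as background rather than taken from the slide (with $n$ observations, $k$ predictors, and SST the corrected total sum of squares):

$$R^2_{\text{adj}} = 1 - \frac{\mathrm{SSE}/(n-k-1)}{\mathrm{SST}/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$

Adding a variable increases $k$ and so shrinks the denominator $n-k-1$; adjusted $R^2$ rises only if SSE falls by enough to offset that.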

  32. Check Your Understanding
      • Using the vet example: if you add a variable to the model, how will that affect R-squared and the estimate of the standard deviation of the error term?

  33. Check Your Understanding: solution
      • Model with food intake only: S = 0.64126, R-Square = 0.8269 & Adj R-Sq = 0.7836
      • Model with food intake and weight: S = 0.28883, R-Square = 0.9737 & Adj R-Sq = 0.9561
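
The food-only figures can be checked directly from the data with the simple-regression formulas. This is a minimal sketch in plain Python, with variable names chosen here for illustration.

```python
import math

# Cow data from the slides: milk yield (lb) and food intake (lb).
milk = [1, 4, 1, 3, 2, 4]
food = [1, 8, 3, 5, 6, 10]

n, k = len(milk), 1
mean_x = sum(food) / n
mean_y = sum(milk) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - mean_x) ** 2 for x in food)
syy = sum((y - mean_y) ** 2 for y in milk)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(food, milk))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = syy - slope * sxy                       # error sum of squares

s = math.sqrt(sse / (n - k - 1))              # estimate of error std. deviation
r_square = 1 - sse / syy
adj_r_square = 1 - (sse / (n - k - 1)) / (syy / (n - 1))

print(f"S={s:.5f}  R-Square={r_square:.4f}  Adj R-Sq={adj_r_square:.4f}")
```

Comparing with the two-variable model shows that adding weight lowers S and raises both R-Square and Adj R-Sq in this example.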

  34. Thinking challenge
      • 18 variables
      • n = 20
      • R-squared = 0.95

  35. Testing Overall Significance of regression parameters

  36. Testing Overall Significance
      • Tests whether there is a linear relationship between all X variables together & Y
      • Hypotheses
        • $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
        • $H_a$: at least one coefficient is not 0 (at least one X variable linearly affects Y)
      • Uses the F test statistic

  37. Overall Significance Test Statistic
      • Test statistic: $F = \dfrac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}$
      • Denotation in SAS: $F = \dfrac{\mathrm{MS(Model)}}{\mathrm{MS(Error)}}$

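  38. Overall Significance Rejection Rule
      • Reject $H_0$ in favor of $H_a$ if $f_{calc}$ falls in the colored area
      • Reject $H_0$ for $H_a$ if P-value $= P(F > f_{calc}) < \alpha$
      [Figure: F distribution starting at 0, with the "Do Not Reject $H_0$" region to the left of the critical value $F(k,\, n-k-1,\, 1-\alpha)$ and the colored "Reject $H_0$" region of area $\alpha$ to its right.]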

  39. Testing Overall Significance Example
      • You're a vet epidemiologist for the county cooperative. You gather the following data:

        Milk  Food  weight
           1     1       2
           4     8       8
           1     3       1
           3     5       7
           2     6       4
           4    10       6

      • Are cows' food intake and weight both linearly related to cows' milk yield? Test at the 5% significance level

  40. Testing Overall Significance Example
      • Model: $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, with $X_1$ = food intake and $X_2$ = weight
      • Hypotheses
        • $H_0$: $\beta_1 = \beta_2 = 0$ (no linear relationship)
        • $H_a$: at least one coefficient is not 0

  41. Testing Overall Significance SAS Computer Output

      Analysis of Variance
                                 Sum of        Mean
      Source             DF     Squares      Square    F Value    Pr > F
      Model               2     9.24974     4.62487      55.44    0.0043
      Error               3     0.25026     0.08342
      Corrected Total     5     9.50000

      The DF column holds $k$ (Model), $n-k-1$ (Error) and $n-1$ (Corrected Total); the Mean Square column holds MS(Model) and MS(Error); Pr > F is the P-value.
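
The P-value in this table can be checked by hand because, when the numerator has exactly 2 degrees of freedom, the upper tail of the F distribution has a closed form: $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$ with $\nu$ error degrees of freedom. The sketch below relies on that special case only, so it does not apply to models with a different number of predictors; the function name is hypothetical.

```python
def f_upper_tail_two_numerator_df(f_value, df_error):
    """P(F > f_value) for an F distribution with (2, df_error) degrees of freedom.

    Valid ONLY when the numerator degrees of freedom equal 2 (here k = 2 predictors).
    """
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


alpha = 0.05
p_value = f_upper_tail_two_numerator_df(55.44, 3)  # F Value and Error DF from SAS
reject_h0 = p_value < alpha

print(f"P-value = {p_value:.4f}; reject H0 at the 5% level: {reject_h0}")
```

The result agrees with the Pr > F entry, so $H_0$ is rejected at the 5% significance level.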

  42. Thinking Challenge
      • k = 18, n = 20, R-squared = 0.95
      • Would need an F-value > 247.3 to reject the null hypothesis!
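
To see why such a high R-squared is unconvincing here, the overall F statistic can be computed from R-squared alone, using the standard identity $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$. In this sketch the critical value 247.3 is simply the figure quoted on the slide, not recomputed, and the function name is hypothetical.

```python
def f_from_r_squared(r_squared, k, n):
    """Overall F statistic implied by R-squared with k predictors and n observations."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


k, n, r_squared = 18, 20, 0.95
f_value = f_from_r_squared(r_squared, k, n)
critical_value = 247.3  # threshold quoted on the slide

print(f"F = {f_value:.4f} with ({k}, {n - k - 1}) degrees of freedom")
print(f"exceeds the critical value: {f_value > critical_value}")
```

With only one error degree of freedom left, the implied F is close to 1, far below the threshold, so the model is not significant despite the large R-squared.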

  43. Thinking challenge
      • The F-test for the model is significant
      • Does the model have the best available predictors for y?
      • Are all the terms in the model important for predicting y?
      • Or what?