## Multiple regression


**Regression**
Problem: to draw a straight line through the points that best explains the variance.

**Regression**
Test with F, just like ANOVA:

$$F = \frac{\text{variance explained by x-variable} / df}{\text{variance still unexplained} / df}$$

• Variance explained: change in line lengths². Variance unexplained: residual line lengths².
• In regression, each x-variable will normally have 1 df.
• Essentially a cost-benefit analysis: is the benefit in variance explained worth the cost in using up degrees of freedom?

**Regression**
Also have R²: the proportion of total variance explained by the variable.

$$R^2 = \frac{\text{variance explained by x-variable}}{\text{variance explained by x-variable} + \text{variance still unexplained}}$$

**Regression example**
Total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance.
• What is the R²?
• What is the F ratio?
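The R² and F ratio asked for in the example follow directly from the definitions given earlier. The snippet below is a minimal sketch of that arithmetic; the function names `r_squared` and `f_ratio` are hypothetical, and the error df is assumed to be the number of points minus one for each x-variable and one for the intercept.

```python
def r_squared(explained, total):
    """Proportion of the total variance explained by the x-variable(s)."""
    return explained / total


def f_ratio(explained, total, n_points, n_predictors=1):
    """F ratio: (explained / model df) over (unexplained / error df)."""
    df_model = n_predictors                 # 1 df per x-variable
    df_error = n_points - n_predictors - 1  # assumption: one further df for the intercept
    unexplained = total - explained
    f = (explained / df_model) / (unexplained / df_error)
    return f, df_model, df_error


# Numbers from the slide: 32 points, 300 units in total, 150 explained.
r2 = r_squared(150, 300)
f, df_model, df_error = f_ratio(150, 300, n_points=32)
print(f"R2 = {r2}, F({df_model},{df_error}) = {f}")
```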
$$R^2 = \frac{150}{300} = 0.5 \qquad F_{1,30} = \frac{150/1}{150/30} = 30$$

Why is df error = 30? (32 data points give 31 total df; the x-variable uses 1, leaving 30.)

**Multiple regression**
[Figure: herbivore damage plotted against tree age, with higher-nutrient and lower-nutrient trees distinguished.]

$$\text{Damage} = m_1 \cdot \text{age} + b$$

**Damage = m1*age + m2*nutrient + b**
[Figure: herbivore damage against tree age; residuals of herbivore damage against tree nutrient concentration.]

$$\text{Damage} = m_1 \cdot \text{age} + m_2 \cdot \text{nutrient} + b$$

**No interaction (additive) and interaction (non-additive)**
[Figure: two panels with y on the vertical axis, one labeled "No interaction (additive)", the other "Interaction (non-additive)".]

$$\text{Damage} = m_1 \cdot \text{age} + m_2 \cdot \text{nutrient} + m_3 \cdot \text{age} \cdot \text{nutrient} + b$$

**Non-linear regression?**
Just a special case of multiple regression!

| X (x1) | X² (x2) | Y |
|---|---|---|
| 1 | 1 | 1.1 |
| 2 | 4 | 2.0 |
| 3 | 9 | 3.6 |
| 4 | 16 | 3.1 |
| 5 | 25 | 5.2 |
| 6 | 36 | 6.7 |
| 7 | 49 | 11.3 |

$Y = m_1 x + m_2 x^2 + b$ is the same model as $Y = m_1 x_1 + m_2 x_2 + b$, with $x_1 = x$ and $x_2 = x^2$.

**Jump height (how high ball can be raised off the ground)**
[Figure: jump heights in feet off the ground, axis running from 8 to 11.]
Total SS = 11.11

Jump height regressed on height of player alone:

| X variable | Parameter | SS | F1,13 | p |
|---|---|---|---|---|
| Height of player | +0.943 | 9.96 | 112 | <0.0001 |

Jump height regressed on weight of player alone:

| X variable | Parameter | SS | F1,13 | p |
|---|---|---|---|---|
| Weight of player | +0.040 | 7.92 | 32 | <0.0001 |

**An idea**
Perhaps if we took two people of identical height, the lighter one might actually jump higher? Excess weight may reduce ability to jump high…

Both variables in the same model:

| X variable | Parameter | SS | F | p |
|---|---|---|---|---|
| Height | +2.133 | 9.956 | 803 | <0.0001 |
| Weight | -0.059 | 1.008 | 81 | <0.0001 |

[Figure: axis labeled from "lighter" to "heavier".]

Compared with weight on its own (parameter +0.040, SS 7.92, F1,13 = 32):
• Why did the parameter estimates change?
• Why did the F tests change?

**Tall people can jump higher**
[Diagram: Height → Jump (+); Weight → Jump (-); Height ↔ Weight (+).]
• Heavy people are often tall (and tall people are often heavy).
• People who are light for their height can jump a bit more.

**The problem:**
The parameter estimate and significance of an x-variable are affected by the x-variables already in the model! How do we know which variables are significant, and in which order to enter them in the model?

**Solutions**
1) Use a logical order.
For example, it makes sense to test the interaction first.
2) Stepwise regression: "tries out" various orders of entering and removing variables.

**Stepwise regression**
Enters or removes variables in order of significance, and checks after each step whether the significance of the other variables has changed.
• Enters one by one: forward stepwise.
• Enters all, removes one by one: backwards stepwise.

**Forward stepwise regression**
• Enter the variable with the highest correlation with the y-variable first (if p < p-enter).
• Next, enter the variable that explains the most residual variation (if p < p-enter).
• Remove variables that become insignificant (p > p-leave) due to other variables being added.

And so on…

**General words of caution!**
• Correlation does not equal causation!
• Can interpolate between points, but don't extrapolate (Mark Twain effect):

"In the space of 176 years the lower Mississippi has shortened itself 242 miles. That is an average of a trifle over 1 1/3 miles per year. Therefore, any calm person, who is not blind or idiotic, can see that in the old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upwards of 1,300,000 miles long, and stuck out over the Gulf of Mexico like a fishing rod."
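The forward stepwise procedure described in the slides can be sketched in a few lines. This is a simplified illustration, not the slides' own method or any statistics package's implementation: the names `residual_ss` and `forward_stepwise` are hypothetical, entry is decided by a fixed F threshold (`f_enter`, an assumed stand-in for a p-to-enter cutoff), the removal step is omitted, and the data are synthetic.

```python
import numpy as np


def residual_ss(X, y):
    """Residual sum of squares from least-squares regression of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_enter=4.0):
    """Enter x-variables one at a time, best first, while the F-to-enter exceeds f_enter.

    Assumes more data points than x-variables plus one. No removal step is performed.
    """
    n, k = X.shape
    selected = []
    current_rss = float(((y - y.mean()) ** 2).sum())  # intercept-only model
    while len(selected) < k:
        best_j, best_f, best_rss = None, -np.inf, None
        for j in range(k):
            if j in selected:
                continue
            cols = selected + [j]
            new_rss = residual_ss(X[:, cols], y)
            df_error = n - len(cols) - 1
            # Each candidate costs 1 df: extra variance explained / remaining error variance.
            f = (current_rss - new_rss) / (new_rss / df_error)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new_rss
        if best_f < f_enter:
            break  # no remaining variable is worth its degree of freedom
        selected.append(best_j)
        current_rss = best_rss
    return selected


# Synthetic data (made up for illustration): y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = forward_stepwise(X, y)
print("Order of entry:", selected)
```

A fuller version would convert each F to a p-value and re-test the variables already in the model after every entry, dropping any whose p rises above p-leave.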
