
Regression analysis

Regression analysis. Objective: investigate the interplay of quantitative variables, identify relations between a dependent variable and one or several independent variables, and make predictions based on observed data.




Presentation Transcript


  1. Regression analysis

  2. Regression analysis • Objective: • Investigate the interplay of quantitative variables • Identify relations between a dependent variable and one or several independent variables • Make predictions based on observed data • Dependent variable: the variable whose values shall be explained • Independent variables: variables that have an impact on the dependent variable

  3. Regression analysis Linear regression

  4. Linear regression • Here it is assumed that the influence of the independent variables on the dependent variable is linear • A distinction is made between: • Simple linear regression: explanation of a dependent variable by one independent variable • Multiple linear regression: explanation of a dependent variable by several independent variables

  5. Regression analysis Simple linear regression

  6. Linear regression • The variables education and income are considered, whereof one variable (education) can be assumed to have an impact on the other (income) • Dependent variable Y: income • Independent variable X: education

  7. Linear regression • Basic idea of linear regression: find a straight line which optimally describes the correlation between the two variables

  8. Linear regression • Linear regression model (in R: lm(y~x)): Y = β₀ + β₁X + ε • Here, β₀ is called the (axis) intercept, β₁ is the slope, X is the predictor variable and ε is the residual • The residual is the difference between the regression line and the measured values of Y • Ŷ = β₀ + β₁X is called the estimate of Y and we have: ε = Y − Ŷ

  9. Linear regression • Objective: • Estimate the coefficients β₀, β₁ such that the model fits the data optimally • Prediction of Y values • The straight line shall be chosen in such a way that the squared distances between the values predicted by the model and the empirically observed values are minimized • We want: Σᵢ (yᵢ − ŷᵢ)² → min

  10. Linear regression • We will obtain estimates of the coefficients, which are also called regression coefficients: • β̂₁ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)² • β̂₀ = ȳ − β̂₁x̄ • β̂₀, β̂₁ are the least squares (LSQ) estimates of β₀, β₁
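The LSQ formulas on this slide can be sketched in a few lines of Python (a minimal illustration with made-up data, not the deck's education/income example):

```python
# Least squares (LSQ) estimates for simple linear regression:
#   slope:     b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   intercept: b0 = ybar - b1 * xbar

def lsq_estimates(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# hypothetical data lying exactly on y = 1 + 2x, so the fit recovers the line
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = lsq_estimates(x, y)
print(b0, b1)  # -> 1.0 2.0
```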

  11. Linear regression • In case the residuals ε are normally distributed we obtain confidence intervals for β₀ and β₁: • β̂ⱼ ± t₍n−2, 1−α/2₎ · SE(β̂ⱼ), j = 0, 1 • where SE(β̂₀), SE(β̂₁) are the respective standard errors of the estimates

  12. Linear regression • We obtain a t-test for the null hypothesis H₀: β₁ = 0 against the alternative H₁: β₁ ≠ 0 • Test statistic: t = β̂₁ / SE(β̂₁) • Reject H₀ if |t| > t₍n−2, 1−α/2₎
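The t statistic for the slope can be computed by hand as follows (pure-Python sketch with hypothetical data; in practice the critical value t₍n−2, 1−α/2₎ comes from a table or a stats library):

```python
import math

def slope_t_statistic(x, y):
    """t = b1 / SE(b1), where SE(b1) = s / sqrt(sum((xi - xbar)^2))
    and s^2 = RSS / (n - 2) estimates the residual variance."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    return b1 / math.sqrt(s2 / sxx)

# hypothetical noisy data with a clear positive trend
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
t = slope_t_statistic(x, y)
print(t)  # a large |t| leads to rejecting H0: beta1 = 0
```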

  13. Regression analysis Multiple linear regression

  14. Multiple linear regression • Now there are multiple independent variables X₁, …, Xₘ. A sample of size n now consists of the values (yᵢ, xᵢ₁, …, xᵢₘ), i = 1, …, n • Hence: yᵢ = β₀ + β₁xᵢ₁ + … + βₘxᵢₘ + εᵢ • Here, the βⱼ, j = 1, …, m, are the unknown regression coefficients and the εᵢ are the residuals • Matrix notation: Y = Xβ + ε

  15. Linear regression • Estimation of the regression coefficients is again performed with the least squares method. After extensive calculus one obtains β̂ = (XᵀX)⁻¹XᵀY • Estimation of the residual variance σ² is obtained according to σ̂² = RSS / (n − m − 1), where RSS = Σᵢ (yᵢ − ŷᵢ)² • The estimation process is computationally demanding (matrix inversion is needed!) and has to be done by computers
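To make the matrix formula concrete, here is a dependency-free sketch that solves the normal equations (XᵀX)β = XᵀY by Gaussian elimination, for a hypothetical design matrix with an intercept column and two predictors (real software uses more stable decompositions such as QR):

```python
# Solve the normal equations (X^T X) beta = X^T y for multiple regression.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [[1], [3], [4], [6], [8]]
Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [row[0] for row in matmul(Xt, y)]
beta = solve(XtX, Xty)
print(beta)  # recovers [1.0, 2.0, 3.0] up to rounding
```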

  16. Linear regression • In case the εᵢ are normally distributed we obtain an F-test for the null hypothesis H₀: β₁ = … = βₘ = 0 against the alternative H₁: βⱼ ≠ 0 for at least one j • We want to test whether the overall model is significant • Overall F-test statistic: F = ((TSS − RSS)/m) / (RSS/(n − m − 1)), where TSS = Σᵢ (yᵢ − ȳ)² • Reject H₀ if F > F₍m, n−m−1, 1−α₎
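The overall F statistic can be sketched as follows (hypothetical observed and fitted values, assuming a model with m = 2 predictors; the critical value F₍m, n−m−1, 1−α₎ comes from a table or a stats library):

```python
# Overall F statistic: F = ((TSS - RSS) / m) / (RSS / (n - m - 1))

def overall_f(y, yhat, m):
    n = len(y)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))    # residual sum of squares
    return ((tss - rss) / m) / (rss / (n - m - 1))

# hypothetical observed vs fitted values from a well-fitting model
y    = [1.0, 3.0, 4.0, 6.0, 8.0, 9.0]
yhat = [1.2, 2.9, 4.1, 5.8, 8.1, 8.9]
f = overall_f(y, yhat, 2)
print(f)  # a large F means the overall model is significant
```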

  17. Regression analysis Logistic regression

  18. Logistic regression • Until now, the dependent variable y was continuous. Now, we consider the situation where there are just two possible outcome values. An example is the dichotomous trait y = "affection status" with the values "1 = affected by the disease" and "0 = not affected" (healthy) • We want to predict the probability that an individual has the value 1 (= is affected). • The range of possible values is [0,1]. • => Linear regression cannot be used since the dependent variable is nominal. • => Instead, logistic regression is used.

  19. Logistic regression • Example of binary logistic regression: • Sample with information on survival of the sinking of the Titanic • Question: was the chance to survive dependent on sex?

  20. Logistic regression • The odds ratio is used to model the chance to survive: • We consider the ratio of the survival odds of women to the survival odds of men • The OR of 10.14 indicates that the odds to survive were about 10 times as high for women as for men • From here until slide 22: details for specialists • => A regression on the logarithmic odds (the so-called logit) that the 0/1-coded dependent variable takes on the value 1
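The odds ratio from a 2x2 table is a one-line computation. The counts below are made up for illustration (chosen so that the OR happens to equal 10.14); they are not the slide's Titanic data:

```python
# Odds ratio from a 2x2 table:
#   OR = (a/b) / (c/d) = (a*d) / (b*c)
# a = women survived, b = women died, c = men survived, d = men died

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

a, b = 300, 100   # hypothetical women: survived / died
c, d = 150, 507   # hypothetical men:   survived / died
print(round(odds_ratio(a, b, c, d), 2))  # -> 10.14
```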

  21. Logistic regression • Logarithmic odds are well suited for regression analysis since their values lie in (−∞, +∞) and since they are symmetric • Regression equation: ln( P(Y=1|x) / (1 − P(Y=1|x)) ) = β₀ + β₁x (the left-hand side is the logarithmic odds that the 0/1 variable takes on the value 1; the right-hand side is known from linear regression) • Now, the probability that the dependent variable takes on the value 1 given the value x can be computed as P(Y=1|x) = exp(β₀ + β₁x) / (1 + exp(β₀ + β₁x)) • Estimation of the regression coefficients is now done with maximum-likelihood estimation
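Converting a logit back to a probability, as in the equation above, looks like this (the coefficients are hypothetical, not fitted values from the Titanic data):

```python
import math

def inv_logit(z):
    """P(Y=1) from the logit z: exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def prob(b0, b1, x):
    """P(Y=1 | x) for the logistic model logit = b0 + b1*x."""
    return inv_logit(b0 + b1 * x)

b0, b1 = -1.0, 2.0  # hypothetical coefficients
print(round(prob(b0, b1, 0.0), 4))  # exp(-1)/(1+exp(-1)) -> 0.2689
print(round(prob(b0, b1, 1.0), 4))  # exp(1)/(1+exp(1))   -> 0.7311
```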

  22. Logistic regression • Probabilities depend on β₀ and β₁ • Interpretation of the parameters: • β₀: defines the probability P(Y=1|X=0) at the value X=0 of the independent variable X: P(Y=1|X=0) = exp(β₀) / (1 + exp(β₀)) • The larger β₀, the larger the probability P(Y=1|X=0) • (In the accompanying figure, β₁ was set to 1)

  23. Logistic regression • β₁: determines the slope of the probability function, and thereby how strongly differences in the independent variable X influence differences in the conditional probabilities • β₁ > 0: the conditional probability of category 1 ("affected") is a monotonically increasing function of X • β₁ < 0: the conditional probability of category 1 ("affected") is a monotonically decreasing function of X • β₁ = 0: X and Y are independent • (In the accompanying figure, β₀ was set to zero)

  24. Logistic regression • Titanic: • Coding: male=1, female=0, survival=1, non-survival=0 • Compute logarithmic odds according to logit = β₀ + β₁x (in R: glm(y~x,family=binomial("logit")) ): • Female: … • Male: … • The logit coefficients are difficult to interpret, therefore they are transformed back by P = exp(logit) / (1 + exp(logit)) • Female: …, Male: …

  25. Logistic regression • "Wald test" for the null hypothesis β = 0 • Reject the null hypothesis if the Wald statistic exceeds the critical value of the χ² distribution with p degrees of freedom, where p is the number of degrees of freedom (= the number of independent variables) • Titanic example: • Wald test for β₀: … Note: β₀ is typically not of interest. Here, it is significant because it detects that there were many more men than women on the Titanic. • Wald test for β₁: … • => The null hypothesis can be rejected
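A Wald test for a single coefficient can be sketched as follows. The estimate, standard error, and the χ² critical value 3.841 (for 1 degree of freedom at α = 0.05) are illustrative assumptions, not the slide's Titanic output:

```python
def wald_test(beta_hat, se, crit):
    """Wald statistic W = (beta_hat / SE(beta_hat))^2 for H0: beta = 0;
    reject H0 when W exceeds the chi-square critical value."""
    w = (beta_hat / se) ** 2
    return w, w > crit

# hypothetical coefficient estimate and standard error
w, reject = wald_test(2.3, 0.5, 3.841)
print(round(w, 2), reject)  # -> 21.16 True
```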

  26. Note • So far, the logistic regression example could have been computed with the chi-square test for 2x2 tables. The advantage of the logistic regression is that it can be extended to multiple independent variables and that the independent variables can be continuous.

  27. Logistic regression • Logistic regression with multiple predictor variables (in R: glm(y~x1+x2,family=binomial("logit")) ): • Multiple predictor variables can also be analyzed as a "cross-classification" • Example: does lung function depend on air pollution and smoking? • Dependent variable: lufu = lung function test, "normal"=1, "not normal"=0 • Independent variables: • LV = degree of air pollution, "no"=0, "yes"=1 • Smoking: "no"=0, "yes"=1

  28. Logistic regression • Data: (data table not reproduced in the transcript)

  29. Logistic regression • Logistic regression with R yields: (model output not reproduced in the transcript)
