
Presentation Transcript


  1. Assumptions of linear regression
  • There is a hypothesis about dependent and independent variables; the relation is supposed to be linear; we have a hypothesis about the distribution of errors around the hypothesized regression line.
  • There is a hypothesis about dependent and independent variables; the relation is non-linear; we have no data about the distribution of errors around the hypothesized regression line.
  • There is no clear hypothesis about dependent and independent variables; the relation is non-linear; we have no data about the distribution of errors around the hypothesized regression line.

  2. Assumptions:
  • A linear model applies.
  • The x-variable has no error term.
  • The distribution of the y errors around the regression line is normal.
  Least squares method.
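
A minimal sketch of a least squares fit under these assumptions, using simulated data (NumPy assumed available); the intercept and slope values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate data satisfying the assumptions: linear model,
# error-free x, normally distributed y-errors.
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, x.size)

# Least squares estimates of slope b and intercept a.
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
print(f"intercept = {a:.3f}, slope = {b:.3f}")
```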

  3. The second example is non-linear. We hypothesize the allometric relation W = a·B^z.
  • Non-linear regression model: W = a·B^z + error; assumption: the distribution of errors is normal.
  • Linearized regression model: ln W = ln a + z·ln B + error; assumption: the distribution of errors around W is lognormal (i.e. normal on the log scale).
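
A minimal sketch of the linearized fit, assuming the multiplicative (lognormal) error model and simulated data; a_true and z_true are illustrative values only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate W = a * B^z with multiplicative (lognormal) errors.
a_true, z_true = 2.0, 0.75
B = rng.uniform(1, 100, 200)
W = a_true * B**z_true * np.exp(rng.normal(0, 0.2, B.size))

# Linearize: ln W = ln a + z * ln B, then fit by least squares.
z_hat, ln_a_hat = np.polyfit(np.log(B), np.log(W), 1)
print(f"a = {np.exp(ln_a_hat):.3f}, z = {z_hat:.3f}")
```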

  4. Y = e^(0.1X) + norm(0; σ_Y) versus Y = X^0.5 · e^(norm(0; σ_Y)). In both cases we have some sort of autocorrelation. Using logarithms reduces the effect of autocorrelation and makes the distribution of errors more homogeneous. Non-linear estimation instead puts more weight on the larger y-values. If there is no autocorrelation, the log-transformation puts more weight on smaller values.
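
A small simulation contrasting the two error structures (a constant error scale is assumed here for simplicity); the log-scale residual scatter is homogeneous only for the multiplicative model:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(1, 10, 100)

# Additive normal errors:          Y = e^(0.1 X) + norm(0, s)
Y_add = np.exp(0.1 * X) + rng.normal(0, 0.1, X.size)
# Multiplicative lognormal errors: Y = X^0.5 * e^(norm(0, s))
Y_mul = X**0.5 * np.exp(rng.normal(0, 0.1, X.size))

# Log-scale residuals: homogeneous for the multiplicative model,
# shrinking with X for the additive one.
r_mul = np.log(Y_mul) - 0.5 * np.log(X)
r_add = np.log(Y_add) - 0.1 * X
for name, r in (("multiplicative", r_mul), ("additive", r_add)):
    print(name, r[:50].std().round(3), r[50:].std().round(3))
```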

  5. Linear regression: European bat species and environmental correlates.

  6. N = 62. Matrix approach to linear regression: X is not a square matrix, hence X^-1 does not exist; the least squares solution comes instead from the normal equations, b = (X'X)^-1 X'Y.
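
A sketch of the normal-equations estimate on simulated data (the design matrix X carries an intercept column; n = 62 as on the slide, true coefficients illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 62                                   # number of cases, as on the slide
x = rng.uniform(1, 100, n)
y = 1.0 + 0.5 * x + rng.normal(0, 2, n)

# X (n x 2, intercept plus predictor) is not square, so X^-1 does
# not exist; solve the normal equations b = (X'X)^-1 X'Y instead.
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                 # close to [1.0, 0.5]
```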

  7. The species–area relationship of European bats. What about the part of variance explained by our model? 1.16: average number of species per unit area (species density); 0.24: spatial species turnover.
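
As a sketch only: assuming the usual power-law form S = c·A^z of the species–area relationship, with the slide's 1.16 read as the density constant c and 0.24 as the exponent z (this mapping is an assumption), predictions look like:

```python
# Assumed power-law species-area relationship S = c * A^z,
# reading the slide's 1.16 as c and 0.24 as z (an assumption).
def expected_species(area):
    return 1.16 * area ** 0.24

for A in (1, 10, 100, 1000):
    print(A, round(expected_species(A), 2))
```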

  8. How to interpret the coefficient of determination. Total variance = explained (regression) variance + unexplained (rest, i.e. residual) variance; R² is the explained share, R² = 1 - SS_residual/SS_total. Statistical testing is done by an F- or a t-test.
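
A sketch computing R² and the overall F-test on simulated data (NumPy and SciPy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 62, 1                           # cases and number of predictors
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
ss_res = np.sum(resid ** 2)            # residual (unexplained) part
r2 = 1 - ss_res / ss_tot               # coefficient of determination

# Overall F-test of the regression:
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)
print(f"R2 = {r2:.3f}, F = {F:.1f}, p = {p:.2g}")
```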

  9. The general linear model. A model that assumes that a dependent variable Y can be expressed by a linear combination of predictor variables X is called a linear model: Y = XB + E. The vector E contains the error terms of each regression. The aim is to minimize E (in least squares, the sum of squared errors E'E).

  10. The general linear model. If the errors of the predictor variables are Gaussian, the error term e should also be Gaussian, and means and variances are additive: total variance = explained variance + unexplained (rest) variance.

  11. Multiple regression:
  • Model formulation
  • Estimation of model parameters
  • Estimation of statistical significance
  (All three steps are sketched in the code below.)
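
A minimal end-to-end sketch of the three steps on simulated data (NumPy/SciPy assumed; the variable names and true coefficients are illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 62, 3                              # cases and predictors

# 1. Model formulation: y = b0 + b1*x1 + b2*x2 + b3*x3 + e
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.5, -0.3, 0.0])
y = X @ beta_true + rng.normal(0, 1, n)

# 2. Estimation of model parameters via the normal equations:
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# 3. Estimation of statistical significance: t-test per coefficient.
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)          # residual variance
se = np.sqrt(s2 * np.diag(XtX_inv))       # standard errors
t = b / se
p = 2 * stats.t.sf(np.abs(t), n - k - 1)
for name, bi, pi in zip(["b0", "b1", "b2", "b3"], b, p):
    print(f"{name} = {bi:+.3f}  (p = {pi:.3f})")
```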

  12. Multiple R and R²

  13. Adjusted R². R: correlation matrix; n: number of cases; k: number of independent variables in the model. R²_adj = 1 - (1 - R²)(n - 1)/(n - k - 1). D < 0 is statistically not significant and should be eliminated from the model.
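
As a sketch, the standard adjusted R² formula (assumed here to be the one behind the slide) as a small Python helper:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n cases and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R^2 = 0.45 from n = 62 cases and k = 3 predictors.
print(adjusted_r2(0.45, 62, 3))   # ~0.42
```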

  14. A mixed model

  15. The final model:
  • Very low species density (log scale!)
  • Realistic increase of species richness with area
  • Increase of species richness with winter length
  • Increase of species richness at higher latitudes
  • A peak of species richness at intermediate latitudes
  Is this model realistic? The model makes realistic predictions. Problems might arise from the intercorrelation between the predictor variables (multicollinearity). We solve the problem by a step-wise approach, eliminating the variables that are either not significant or give unreasonable parameter values (a sketch follows below). The variance explanation of this final model is higher than that of the previous one.
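
A minimal sketch of the step-wise idea, assuming backward elimination by coefficient p-values only (the slide also drops variables with unreasonable parameter values, which is not automated here):

```python
import numpy as np
from scipy import stats

def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant predictor until all remaining
    coefficients are significant (intercept in column 0 is kept)."""
    names = list(names)
    while True:
        n, k = X.shape[0], X.shape[1] - 1
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        resid = y - X @ b
        s2 = resid @ resid / (n - k - 1)
        se = np.sqrt(s2 * np.diag(XtX_inv))
        p = 2 * stats.t.sf(np.abs(b / se), n - k - 1)
        worst = p[1:].argmax() + 1      # never drop the intercept
        if p[worst] <= alpha or k == 1: # stop when all remaining are
            return names, b, p          # significant (or one is left)
        X = np.delete(X, worst, axis=1)
        del names[worst]

# Demo: one informative predictor (x1) and two pure-noise ones.
rng = np.random.default_rng(8)
X = np.column_stack([np.ones(80), rng.normal(size=(80, 3))])
y = 2.0 + 1.5 * X[:, 1] + rng.normal(0, 1, 80)
print(backward_eliminate(X, y, ["b0", "x1", "x2", "x3"])[0])
```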

  16. Multiple regression solves systems of intrinsically linear algebraic equations: polynomial regression, the general additive model.
  • The matrix X'X must not be singular; that is, the variables have to be independent. Otherwise we speak of multicollinearity. Collinearities with r < 0.7 are in most cases tolerable.
  • To be safely applied, multiple regression needs at least 10 times as many cases as variables in the model.
  • Statistical inference assumes that errors have a normal distribution around the mean.
  • The model assumes linear (or algebraic) dependencies. Check first for non-linearities.
  • Check the distribution of residuals Y_exp - Y_obs. This distribution should be random.
  • Check whether the parameters have realistic values.
  Multiple regression is a hypothesis-testing and not a hypothesis-generating technique!
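
A sketch of the multicollinearity check from the list above, flagging predictor pairs with |r| ≥ 0.7 on simulated data (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)

# Pairwise correlations among predictors; |r| >= 0.7 flags pairs
# that make X'X near-singular (the slide's rule of thumb).
R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
pairs = np.argwhere(np.triu(np.abs(R) >= 0.7, k=1))
print("collinear predictor pairs:", pairs)   # -> [[0 1]]
```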

  17. Standardized coefficients of correlation. Z-transformed distributions have a mean of 0 and a standard deviation of 1. In the case of bivariate regression Y = aX + b, R_XX = 1; hence B = R_XY. The use of Z-transformed values therefore results in standardized correlation coefficients, termed b-values (beta values).
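
A small check of this identity on simulated data (NumPy assumed): the slope fitted on z-scores should match the correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(10, 3, 100)
y = 2 * x + rng.normal(0, 4, 100)

# Z-transform: mean 0, standard deviation 1.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# The slope of the bivariate regression on z-scores equals the
# correlation coefficient R_XY: the standardized (beta) coefficient.
beta = np.polyfit(zx, zy, 1)[0]
print(beta, np.corrcoef(x, y)[0, 1])   # the two values agree
```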
