
Pascal Peduzzi UNEP/GRID-Geneva

RiVAMP training: Statistics module. Quantifying the role of ecosystems. www.grid.unep.ch. Pascal Peduzzi UNEP/GRID-Geneva. Risk and Vulnerability Assessment Methodology development Project (RiVAMP), Kingston, 5-8 December 2011. Plan: brief overview of multiple regression statistical concepts.





Presentation Transcript


  1. RiVAMP training: Statistics module. Quantifying the role of ecosystems. www.grid.unep.ch. Pascal Peduzzi, UNEP/GRID-Geneva. Risk and Vulnerability Assessment Methodology development Project (RiVAMP), Kingston, 5-8 December 2011

  2. Plan • Brief overview of multiple regression statistical concepts. • Familiarization with Tanagra, an open-source statistical software package. • Statistical analysis (practice).

  3. 1. Overview of multiple regression statistical concepts

  4. Multiple regression analysis • This section is adapted from the online help of the StatSoft Electronic Statistics Textbook (http://www.statsoft.com/textbook/statistics-glossary/). • This example was prepared for research on the links between deforestation and landslides in North Pakistan. • Peduzzi, P., Landslides and vegetation cover in the 2005 North Pakistan earthquake: a GIS and statistical quantitative approach, Nat. Hazards Earth Syst. Sci., 10, 623-640, 2010 (http://www.nat-hazards-earth-syst-sci.net/10/623/2010/nhess-10-623-2010.html).

  5. Multiple regression analysis • Multiple regression identifies which parameters, taken together, influence a selected dependent variable. • E.g. slope and vegetation density can both be associated with landslide susceptibility. However, steep slopes may be well covered with vegetation, and deforested areas may lie in flat places, so one variable alone is not enough to describe landslide area.
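
As a rough illustration of the idea on this slide, the sketch below fits a dependent variable against two independent variables by ordinary least squares, using only the Python standard library. The variable names (`slope`, `vegetation`, `landslide_area`) and all numbers are made up for the example and are not taken from the Pakistan study; the helper functions are hypothetical, not part of Tanagra.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(predictors, y):
    """Return [intercept, b1, b2, ...] minimising the squared residuals.

    predictors is a list of columns, one per independent variable.
    """
    n = len(y)
    columns = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(ci[k] * cj[k] for k in range(n)) for cj in columns]
           for ci in columns]
    xty = [sum(ci[k] * y[k] for k in range(n)) for ci in columns]
    return solve_linear_system(xtx, xty)


# Made-up data: area generated exactly as 2 + 3*slope - 1.5*vegetation.
slope = [5.0, 10.0, 20.0, 30.0, 35.0, 40.0]
vegetation = [0.8, 0.2, 0.6, 0.1, 0.9, 0.3]
landslide_area = [2 + 3 * s - 1.5 * v for s, v in zip(slope, vegetation)]

coefficients = fit_multiple_regression([slope, vegetation], landslide_area)
print(coefficients)
```

Because the toy data are generated without noise, the fit should recover the generating coefficients; real data will of course leave residuals.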

  6. Multiple regression analysis When addressing the potential link between one variable (e.g. slope) and a dependent variable (e.g. landslide area), simple scatter plots provide useful information (see figure A1). [Figure A1: simple scatter plot]

  7. Then a few 3D visuals to test two variables at once.

  8. Pearson correlation, r • The independent variables in the model should not be correlated with one another. To build groups of independent variables, a correlation matrix is computed, and variables that are too strongly correlated should not be tested in the same hypothesis. Groups of uncorrelated variables should therefore be created (see appendix C). • r is the Pearson coefficient (or correlation coefficient), computed as follows: $r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}}$ • where • $x_i$ are the observed values and $y_i$ the modelled values • $\bar{x}$ is the average of the observed dependent variable • $\bar{y}$ is the average of the modelled variable
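
The Pearson coefficient described on this slide can be sketched in a few lines of standard-library Python. The function name `pearson_r` and the sample values are assumptions made for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Made-up observed and modelled values
observed = [1.0, 2.0, 3.0, 4.0]
modelled = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(observed, modelled))
```

A value close to 1 or -1 indicates a strong linear relationship; a value close to 0 indicates little or none.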

  9. Outliers

  10. Some traps to avoid Before getting too excited about a high correlation, check the following: • Do the observed versus modelled values lie along a line, or do you have a cluster of points with one or two points far away? • Do you have two (or more) independent variables that are correlated? Compute a correlation matrix to check this. • Do you have a large number of records?

  11. Example of “bad” correlation

  12. Correlation matrix
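
A correlation matrix of candidate independent variables, and a check for pairs that are too correlated to test together, might be sketched as follows. The 0.7 threshold, the variable names and the values are assumptions chosen for illustration; the training does not prescribe a specific cut-off here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def correlation_matrix(variables):
    """Return {name_a: {name_b: r}} for every pair of variables."""
    return {a: {b: pearson_r(variables[a], variables[b]) for b in variables}
            for a in variables}


def too_correlated_pairs(variables, threshold=0.7):
    """List pairs whose |r| exceeds the (assumed) threshold."""
    return [(a, b) for a, b in combinations(variables, 2)
            if abs(pearson_r(variables[a], variables[b])) > threshold]


# Made-up data: elevation is an exact linear function of slope here.
variables = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "elevation": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],
    "vegetation": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}

matrix = correlation_matrix(variables)
flagged = too_correlated_pairs(variables)
print(flagged)
```

Variables that appear together in a flagged pair should go into separate groups, so that they are not tested in the same hypothesis.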

  13. 1. Look at the normality of your dependent variable
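
One rough way to screen a dependent variable for departure from normality is to look at its skewness and excess kurtosis, both of which are close to 0 for normally distributed data. The sketch below is only a quick screen under that assumption, not a formal normality test, and the function name and sample values are made up for illustration.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness (g1) and excess kurtosis (g2) of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A strongly positive skewness, as in the second sample, suggests a long right tail; a transformation such as a logarithm is a common remedy before running a regression.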

  14. 2. What does it look like? Study scatterplots of each variable against your dependent variable and identify those that seem to be correlated. Data visualization → Scatterplots

  15. Differentiate cases • Many sites do not have coral. Separate the cases with coral from those with seagrass (in Excel, LibreOffice, or in Tanagra using Instance selection → Rule-based selection).
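
Outside a spreadsheet or Tanagra, the same kind of rule-based selection can be sketched in Python. The field names (`coral_width_m`, `seagrass_width_m`), the site labels and the values are hypothetical and only show the principle of splitting records by a rule.

```python
def select_where(records, rule):
    """Keep only the records for which rule(record) is true."""
    return [record for record in records if rule(record)]


# Made-up site records
sites = [
    {"site": "S1", "coral_width_m": 150.0, "seagrass_width_m": 0.0},
    {"site": "S2", "coral_width_m": 0.0, "seagrass_width_m": 80.0},
    {"site": "S3", "coral_width_m": 90.0, "seagrass_width_m": 40.0},
    {"site": "S4", "coral_width_m": 0.0, "seagrass_width_m": 120.0},
]

# Rule 1: sites where coral is present
coral_sites = select_where(sites, lambda r: r["coral_width_m"] > 0)

# Rule 2: sites without coral but with seagrass
seagrass_only_sites = select_where(
    sites,
    lambda r: r["coral_width_m"] == 0 and r["seagrass_width_m"] > 0,
)

print([r["site"] for r in coral_sites])
print([r["site"] for r in seagrass_only_sites])
```

Each subset can then be analysed separately, so that sites without coral do not distort the relationship estimated for sites with coral.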

  16. Correlation is not causality What we aim to show is that factor A (e.g. vegetation density) influences factor B (e.g. landslide area). However, a correlation between A and B can have several origins: A does influence B, or B influences A, or a third factor C influences both A and B. http://xkcd.com/925/

  17. What is a good model? Models are not reality; they try to approximate it through simplification. A good model is one in which: • A significant part of the observed differences is explained (high R²). • The observed versus modelled values are distributed along a line. • The number of independent variables is not too high (e.g. between 2 and 4). • The p-value of each independent variable is < 0.05. • No correlation is suspected between the independent variables (see correlation matrix). • It is based on a reasonable number of records.
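
The first criterion above, the share of observed differences explained by the model, can be computed from observed and modelled values as sketched below. The adjusted R², which penalises adding independent variables, is included as a common companion measure; it is not mentioned on the slide. Function names and numbers are assumptions for illustration.

```python
def r_squared(observed, modelled):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_records, n_predictors):
    """Adjusted R² for a model with an intercept and n_predictors variables."""
    if n_records - n_predictors - 1 <= 0:
        raise ValueError("too few records for this number of predictors")
    return 1.0 - (1.0 - r2) * (n_records - 1) / (n_records - n_predictors - 1)


# Made-up observed and modelled values
observed = [2.0, 4.0, 6.0, 8.0]
modelled = [2.5, 3.5, 6.5, 7.5]

r2 = r_squared(observed, modelled)
r2_adj = adjusted_r_squared(r2, n_records=len(observed), n_predictors=1)
print(r2, r2_adj)
```

With only four records, as in this toy example, even a high R² means little, which is why the number of records is part of the checklist.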

  18. 4. Let’s do it!
