Correlation and regression

coby

Presentation Transcript


  1. Correlation and regression http://sst.tees.ac.uk/external/U0000504

  2. Introduction • Scientific rules and principles are often expressed mathematically • There are two main approaches to finding a mathematical relationship between variables • Analytical • Based on theory • Empirical • Based on observation and experience

  3. The straight line (1) • Most graphs based on numerical data are curves. • The straight line is a special case • Data is often manipulated to yield straight line graphs as the straight line is relatively easy to analyse

  4. The straight line (2) • Straight line equation • y = mx + c • slope = m • m = Δy/Δx • Intercept = c
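As a minimal sketch of the slope and intercept definitions above, the following computes m and c for the line through two points. The function name `line_through` and the sample points are illustrative assumptions, not part of the original slides.

```python
def line_through(p1, p2):
    """Return (m, c) for the straight line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # slope = change in y / change in x
    c = y1 - m * x1            # intercept: the value of y at x = 0
    return m, c


# Hypothetical points chosen for illustration.
m, c = line_through((1.0, 5.0), (3.0, 9.0))
print(m, c)
```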

  5. Correlation & Regression • These are statistical processes which: • Suggest the existence of a relationship • Determine the best equation to fit the data • Correlation is a measure of the strength of a relationship between two variables • Regression is the process of determining that relationship

  6. Correlation and Regression The next few slides illustrate correlation and regression

  7. No Correlation

  8. Positive correlation

  9. Negative correlation

  10. Curvilinear correlation

  11. Correlation coefficient • A statistical measure of the strength of a relationship between two variables. • Pearson’s product-moment correlation coefficient, r • Spearman’s rank correlation coefficient, ρ • Both take a value in the range -1.0 to +1.0 • r or ρ = +1.0 represents a perfect positive correlation • r or ρ = -1.0 represents a perfect negative correlation • r or ρ = 0.0 represents no correlation • For a given sample size, each value of r or ρ is associated with a probability that a correlation this strong would arise by chance if there were no relationship.
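The two coefficients can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: the function names and the sample data are hypothetical, and Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks (tied values share their average rank).

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: y = x**2 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(pearson_r(xs, ys), spearman_rho(xs, ys))
```

With this data the rank coefficient is exactly 1 because y always increases with x, while Pearson's r is slightly below 1 because the relationship is curved rather than straight.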

  12. Linear regression • Is the process of trying to fit the best straight line to a set of data. • The usual method is based on minimising the squares of the errors between the data and the predicted line • For this reason, it is called “the method of least squares”
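The method of least squares for a straight line y = a + bx has a closed-form solution: b = Sxy / Sxx and a = mean(y) - b * mean(x). The sketch below assumes that standard result; the function names and the data are illustrative and do not come from the slides.

```python
def least_squares(xs, ys):
    """Return (a, b) minimising the sum of squared errors for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept: the line passes through (mean x, mean y)
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical measurements lying close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# Nudging either coefficient away from the fitted value increases the error.
print(a, b, best)
print(sum_squared_errors(xs, ys, a + 0.1, b) > best)
print(sum_squared_errors(xs, ys, a, b + 0.1) > best)
```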

  13. Linear regression - assumptions • The error in the independent (x) variable is negligible relative to the error in the dependent (y) variable • The errors are normally, independently and identically distributed with mean 0 and constant variance σ², i.e. NIID(0, σ²)

  14. Linear regression model • For a set of data, (x, y), there is an equation that best fits the data, of the form • y = a + bx + ε • x is the independent variable or the predictor • y is the measured value of the dependent (predicted) variable • Y = a + bx is the calculated value of the dependent (predicted) variable • ε is the error term and accounts for that part of y not “explained” by x. • For any individual data point, i, the difference between the observed and predicted value of y is called the residual, r_i • i.e. r_i = y_i - Y_i = y_i - (a + bx_i) • The residuals provide a measure of the error term
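A short sketch of the residual calculation: fit the line, compute the calculated value Y for each x, and subtract it from the measured y. The helper names and the four data points are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residuals_of_fit(xs, ys):
    """Return the residuals: observed y minus calculated Y = a + b*x."""
    a, b = least_squares(xs, ys)
    fitted = [a + b * x for x in xs]
    return [y - f for y, f in zip(ys, fitted)]


# Hypothetical data for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
residuals = residuals_of_fit(xs, ys)
print(residuals)
print(sum(residuals))  # close to zero for a least-squares line with an intercept
```

For a least-squares line that includes an intercept, the residuals always sum to zero (up to rounding), so their spread and pattern, not their total, carry the information about the error term.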

  15. Regression analysis (1) • Check the correlation coefficient • Hypotheses • H0 (null): There is no correlation between x & y • H1 (alternative): There is a correlation between x & y • Decision rule • Reject H0 if |r| ≥ critical value at α = 0.05 • If you cannot reject H0 then proceed no further; otherwise carry out a full regression
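The decision rule can be sketched as a simple comparison. The critical value depends on the sample size and must be looked up in a table; the figure 0.632 used below is an assumption taken from a standard table for n = 10 (two-tailed, α = 0.05) and should be checked against your own table. The function names are illustrative. The sketch also shows the equivalent t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r²)).

```python
import math


def reject_null(r, critical_value):
    """Decision rule: reject H0 (no correlation) if |r| >= critical value."""
    return abs(r) >= critical_value


def t_statistic(r, n):
    """Equivalent t statistic (n - 2 degrees of freedom) for testing r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Assumed table value: n = 10, two-tailed test, alpha = 0.05.
CRITICAL_R_N10 = 0.632

print(reject_null(0.75, CRITICAL_R_N10))   # strong positive correlation
print(reject_null(-0.70, CRITICAL_R_N10))  # the sign does not matter, only |r|
print(reject_null(0.40, CRITICAL_R_N10))   # too weak to reject H0 at n = 10
print(t_statistic(CRITICAL_R_N10, 10))
```

The t value printed for r = 0.632 and n = 10 can be compared with the tabulated two-tailed critical t for 8 degrees of freedom as a cross-check on the table entry.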

  16. Regression analysis (2) • Regression analysis can be carried out using either Excel or Minitab. Excel will need the Analysis ToolPak add-in installed. • The output from both Minitab and Excel will give the following information • The regression equation (in the form y = a + bx) • Probabilities (p-values) for testing whether a ≠ 0 and b ≠ 0 • The coefficient of determination, R² • Analysis of variance • In addition you will need to produce at least one of • Residuals vs. fitted values • Residuals vs. x-values • Residuals vs. y-values

  17. Interpreting output • Regression equation: this is the equation that best fits the data and provides the predicted values of y • Analysis of variance: determines what proportion of the variation in y can be accounted for by the regression equation and what proportion is accounted for by the error term. The p-value arising out of this tells us whether the regression equation accounts for a statistically significant part of that variation. • The proportion of the variation in the data accounted for by the regression equation is called the coefficient of determination, R². For a straight-line fit it is equal to the square of the correlation coefficient
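The split of the variation that the analysis of variance reports can be sketched directly: the total sum of squares about the mean of y divides into a part explained by the regression and a part left in the residuals, and R² is the explained fraction. The function name, the dictionary keys and the data below are illustrative assumptions.

```python
import math


def regression_summary(xs, ys):
    """Sums of squares, R squared and r for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    fitted = [a + b * x for x in xs]
    ss_regression = sum((f - my) ** 2 for f in fitted)       # explained by the line
    ss_error = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # left in the residuals
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "ss_total": syy,
        "ss_regression": ss_regression,
        "ss_error": ss_error,
        "r_squared": ss_regression / syy,
    }


# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
summary = regression_summary(xs, ys)
print(summary["r_squared"], summary["r"] ** 2)
```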

  18. Output plots • The output plots are used to check the assumptions about the errors • The normal probability plot should show the residuals lying on a straight line. • The residual plots should have no obvious pattern and should not show the residuals increasing or decreasing with increase in the fitted or measured values.

  19. Non-linear relationships • Many functions can be manipulated mathematically to yield a straight line equation. • Some examples are given in the next few slides

  20. Linearisation (2)

  21. Linearisation (3)

  22. Functions involving logs (1) • Some functions can be linearised by taking logs • These are • y = A x^n • and y = A e^(kx)

  23. Functions involving logs (2) • For y = A x^n, taking logs gives • log y = log A + n log x • A graph of log y vs. log x gives a straight line, slope n and intercept log A. • To find A you must take the antilog of the intercept (= 10^x for logs to base 10)

  24. Functions involving logs (3) • For y = A e^(kx), we must use natural logs • ln y = ln A + kx • This gives a straight line, slope k and intercept ln A • To find A we must take the antilog of the intercept (= e^x)
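Both log transformations can be sketched with one straight-line fit applied to transformed data. The function names are hypothetical, and the data are generated from known constants (A = 2, n = 3 and A = 5, k = 0.3) purely so the recovered values can be checked; real measurements would include scatter.

```python
import math


def fit_line(us, vs):
    """Return (intercept, slope) of the least-squares line v = intercept + slope*u."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return mv - slope * mu, slope


def fit_power_law(xs, ys):
    """Fit y = A * x**n via log y = log A + n log x (logs to base 10)."""
    intercept, n = fit_line([math.log10(x) for x in xs],
                            [math.log10(y) for y in ys])
    return 10 ** intercept, n  # antilog of the intercept gives A


def fit_exponential(xs, ys):
    """Fit y = A * exp(k*x) via ln y = ln A + k*x (natural logs)."""
    intercept, k = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), k  # antilog of the intercept gives A


# Synthetic data generated from known constants.
power_xs = [1.0, 2.0, 4.0, 8.0]
power_ys = [2.0 * x ** 3 for x in power_xs]
exp_xs = [0.0, 1.0, 2.0, 3.0, 4.0]
exp_ys = [5.0 * math.exp(0.3 * x) for x in exp_xs]

print(fit_power_law(power_xs, power_ys))
print(fit_exponential(exp_xs, exp_ys))
```

Both transformations require strictly positive y values (and positive x for the power law), since the log of zero or a negative number is undefined.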

  25. Polynomials • These are functions of general formula • y = a + bx + cx^2 + dx^3 + … • They cannot be linearised • Techniques for fitting polynomials exist • Both Excel and Minitab provide for fitting polynomials to data
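One such technique is least squares again: although a polynomial is not a straight line in x, it is linear in its coefficients, so the best-fit coefficients solve a small set of simultaneous (normal) equations. The sketch below is one possible plain-Python approach, not necessarily what Excel or Minitab do internally; the function name and data are illustrative, and for high degrees or poorly scaled data a numerical library would be a safer choice.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a, b, c, ...] for y = a + b*x + c*x^2 + ..."""
    m = degree + 1
    # Normal equations: sum over the data of x^(i+j) and of y * x^i.
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, m):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, m):
                matrix[row][k] -= factor * matrix[col][k]
            rhs[row] -= factor * rhs[col]

    # Back substitution, from the last coefficient to the first.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        known = sum(matrix[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (rhs[row] - known) / matrix[row][row]
    return coeffs


# Synthetic data generated from y = 1 + 2x + 3x^2 so the result can be checked.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print(coeffs)
```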
