
Correlation

17. Correlation (Chapter 17, p. 399). Semimetric distance: Pearson correlation coefficient or covariance. What about higher-dimensional data? It is useful to have a similar measure of how much the dimensions vary from the mean with respect to each other.

tudor

Presentation Transcript


  1. 17 Correlation

  2. Chapter 17, p. 399

  3. Semimetric distance – Pearson correlation coefficient or covariance
     • What about higher-dimensional data? It is useful to have a similar measure of how much the dimensions vary from their means with respect to each other.
     • Covariance is measured between two dimensions. For a 3-dimensional data set (X, Y, Z), one can calculate Cov(X, Y), Cov(X, Z) and Cov(Y, Z).
     • To compare heterogeneous pairs of variables, define the correlation coefficient, or Pearson correlation coefficient (PCC): r_XY = Cov(X, Y) / (σ_X σ_Y), with -1 ≦ r_XY ≦ 1.
     • r_XY = -1 → perfect anticorrelation; r_XY = 0 → no linear correlation (independent variables give 0, but 0 alone does not prove independence); r_XY = +1 → perfect correlation.
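
The pairwise covariances and correlation coefficients described on this slide can be sketched in a few lines of plain Python. This is only an illustration: the data set (X, Y, Z) is made up, the helper names are hypothetical, and the sample covariance (dividing by n - 1) is an assumed convention, since the slide does not say which normalisation it uses.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical 3-dimensional data set, invented for this example.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 6.0, 8.0, 10.0]   # Y = 2X, so X and Y rise together
Z = [5.0, 4.0, 3.0, 2.0, 1.0]    # Z = 6 - X, so Z falls as X rises

pairs = {"XY": (X, Y), "XZ": (X, Z), "YZ": (Y, Z)}
for name, (a, b) in pairs.items():
    print(name, "cov =", covariance(a, b), "r =", pearson(a, b))
```

Cov(X, Y) and Cov(X, Z) differ in magnitude because Y and Z are on different scales, whereas the correlation coefficients are scale-free, which is why the slide introduces r for comparing heterogeneous pairs.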

  4. Semimetric distance – the squared Pearson correlation coefficient
     • The Pearson correlation coefficient is useful for examining correlations in the data, but its sign separates correlated from anti-correlated profiles.
     • One may imagine an instance in which the same transcription factor (TF) causes both enhancement and repression of expression, so that both kinds of profile belong together.
     • A better alternative in that case is the squared Pearson correlation coefficient, r_sq = (r_XY)^2, which takes values in the range 0 ≦ r_sq ≦ 1.
     • r_sq = 0 → uncorrelated vectors; r_sq = 1 → perfectly correlated or anti-correlated vectors.
     • Both coefficients are measures of similarity. Similarity and distance have an inverse relationship: as similarity increases, distance decreases. d = 1 – r is typically used as a measure of distance.
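
A minimal sketch of turning the two similarity measures into distances, assuming d = 1 - r for the plain coefficient and, by analogy, d = 1 - r² for the squared one. The gene profiles and function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_distance(xs, ys):
    """d = 1 - r: ranges from 0 (r = +1) to 2 (r = -1)."""
    return 1.0 - pearson(xs, ys)

def squared_pearson_distance(xs, ys):
    """d = 1 - r^2: 0 for perfect correlation OR anti-correlation."""
    return 1.0 - pearson(xs, ys) ** 2

# Hypothetical expression profiles: gene_b mirrors gene_a exactly.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]

print(pearson_distance(gene_a, gene_b))          # far apart under 1 - r
print(squared_pearson_distance(gene_a, gene_b))  # identical under 1 - r^2
```

Under 1 - r the two mirrored profiles are as far apart as possible, while under 1 - r² they coincide, which is the behaviour wanted when one regulator both enhances and represses.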

  5. Semimetric distance – Pearson correlation coefficient or covariance
     • The resulting r_XY is larger than 0 if X and Y tend to increase together, below 0 if one tends to decrease as the other increases, and 0 if they are independent.
     • Remark: r_XY only tests for a linear dependence, Y = aX + b.
     • If two variables are independent, r_XY is low.
     • A low r_XY does not imply independence; the relation may be non-linear.
     • A high r_XY is a sufficient but not a necessary condition for dependence between the variables.
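
The remark that a low coefficient does not imply independence can be checked with a small made-up example: below, y is completely determined by x (y = x²), yet the Pearson coefficient is zero because the relation is not linear. The data and the helper name are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# x is symmetric around 0 and y depends on x exactly, but not linearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [value ** 2 for value in x]

r = pearson(x, y)
print("r =", r)
```

The positive and negative halves of x contribute equal and opposite terms to the covariance, so they cancel even though the dependence is perfect.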

  6. Semimetric distance – the squared Pearson correlation coefficient
     • To test for a non-linear relation in the data, one can transform it by substituting variables.
     • Suppose one wants to test the relation u(v) = a v^n (with a, u and v positive, so that the logarithms exist).
     • Take the logarithm of both sides: log u = log a + n log v.
     • Set Y = log u, b = log a and X = log v.
     • → a linear relation, Y = b + nX.
     • → log u correlates (n > 0) or anti-correlates (n < 0) with log v.
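
The log-log substitution can be sketched as follows. The data are generated from an assumed power law u = 3 v², and the least-squares line through (log v, log u) is an added step, not something the slide prescribes; it is used here only to show that the slope recovers n and the intercept recovers log a. The function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(v_values, u_values):
    """Fit Y = b + nX with X = log v, Y = log u; return (a, n, r)."""
    xs = [math.log(v) for v in v_values]   # requires v > 0
    ys = [math.log(u) for u in u_values]   # requires u > 0
    count = len(xs)
    mx, my = sum(xs) / count, sum(ys) / count
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    exponent = sxy / sxx                   # slope of the line, i.e. n
    intercept = my - exponent * mx         # b = log a
    r = sxy / math.sqrt(sxx * syy)         # Pearson r between log v and log u
    return math.exp(intercept), exponent, r

# Synthetic data from the assumed relation u = 3 * v**2.
v = [1.0, 2.0, 4.0, 8.0]
u = [3.0 * value ** 2 for value in v]

a_hat, n_hat, r_log = fit_power_law(v, u)
print("a =", a_hat, "n =", n_hat, "r(log v, log u) =", r_log)
```

With noise-free data the correlation between log v and log u is essentially 1; with a negative exponent it would be close to -1 instead.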

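  7. Semimetric distance – Pearson correlation coefficient or covariance matrix
     • A covariance matrix is simply the collection of all pairwise covariances arranged as a d × d matrix, whose entry (i, j) is Cov(X_i, X_j). The diagonal holds the variances, and the matrix is symmetric because Cov(X_i, X_j) = Cov(X_j, X_i).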
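
A small sketch of building the d × d covariance matrix from d data columns. The three columns are invented for illustration, `covariance_matrix` is a hypothetical helper, and the n - 1 normalisation is an assumed convention.

```python
def covariance_matrix(columns):
    """Return the d x d matrix of sample covariances for d equally long columns."""
    d = len(columns)
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    matrix = []
    for i in range(d):
        row = []
        for j in range(d):
            total = sum(
                (columns[i][k] - means[i]) * (columns[j][k] - means[j])
                for k in range(n)
            )
            row.append(total / (n - 1))
        matrix.append(row)
    return matrix

# Hypothetical 3-dimensional data set (one list per dimension).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 6.0, 8.0, 10.0]
Z = [5.0, 4.0, 3.0, 2.0, 1.0]

C = covariance_matrix([X, Y, Z])
for row in C:
    print(row)
```

The diagonal entries are the variances of X, Y and Z, and the off-diagonal entries are the three covariances Cov(X, Y), Cov(X, Z) and Cov(Y, Z), each appearing twice.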

  8. Spearman’s rank correlation (SRC)
     • One problem with the PCC is that it is easily skewed by outliers: a single data point can make two genes appear correlated even when all the other data points suggest that they are not.
     • Spearman’s rank correlation (SRC) is a non-parametric measure of correlation that is robust to outliers.
     • SRC ignores the magnitude of the changes. The idea is to transform the original values into ranks and then compute the correlation between the two series of ranks.
     • First, sort the values of gene A and of gene B in ascending order, and assign rank 1 to the lowest value. The SRC between A and B is defined as the PCC between the ranked A and the ranked B.
     • In case of ties, assign mid-ranks: if two values tie for ranks 5 and 6, each is assigned rank 5.5.
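
The ranking procedure, including mid-ranks for ties, can be sketched as below. The two gene profiles are invented so that a single shared outlier dominates the PCC; the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def midranks(values):
    """Ascending ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1      # mean of 1-based positions in the group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(xs, ys):
    """SRC defined as the PCC between the two rank series."""
    return pearson(midranks(xs), midranks(ys))

# Hypothetical profiles: opposite trends except for one large shared outlier.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
gene_b = [5.0, 4.0, 3.0, 2.0, 1.0, 100.0]

print("PCC =", pearson(gene_a, gene_b))
print("SRC =", spearman(gene_a, gene_b))
print(midranks([10, 20, 20, 30]))
```

In this example the outlier pushes the PCC close to +1, while the SRC stays slightly negative because five of the six points are ordered in opposite directions.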

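  9. Spearman’s rank correlation
     • The SRC can be calculated from the ranks with the following formula, where x_i and y_i denote the ranks of the i-th values of x and y respectively, and n is the number of data points: r_s = 1 - 6 Σ (x_i - y_i)^2 / (n (n^2 - 1)).
     • An approximate formula in case of ties is given by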
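
As a hedged illustration, the sketch below assumes the commonly used no-ties shortcut, r_s = 1 - 6 Σ d_i² / (n (n² - 1)) with d_i = x_i - y_i the difference between paired ranks, and checks it against the PCC computed directly on the ranks. The rank lists are made up, and the tie-corrected variant is not attempted here.

```python
import math

def spearman_shortcut(rank_x, rank_y):
    """No-ties shortcut: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ranks for five measurements, with no ties in either list.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

shortcut = spearman_shortcut(rank_x, rank_y)
direct = pearson(rank_x, rank_y)
print("shortcut =", shortcut, "PCC of ranks =", direct)
```

Without ties the two values agree; with ties the shortcut is only approximate, and the PCC of the mid-ranks is the safer definition.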

  10. SRC vs. PCC: PCC(A, B) = 0.633, SRC(A, B) = -0.086

  11. Chapter 17, p. 401

  12. Chapter 17, p. 408
