
Correlation


Presentation Transcript


  1. Correlation • I have two variables of practically "equal" standing (traditionally denoted X and Y). I ask whether they are independent and, if they are "correlated", how strongly.

  2. (Pearson) correlation coefficient • If positive deviations from the mean in X are paired with positive deviations in Y, and negative ones with negative ones, then the sum of cross-products is positive. • The coefficient r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²) is a dimensionless number (the covariance standardized by the standard deviations of the individual variables): −1 means a deterministic negative dependence, +1 a deterministic positive dependence.
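
A minimal Python sketch of this definition (the data here are made up for illustration): the standardized sum of cross-products computed by hand agrees with scipy.stats.pearsonr.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.7 * x + rng.normal(scale=0.5, size=50)   # roughly linear relation

# Pearson r from the definition: standardized sum of cross-products
dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_scipy, _ = stats.pearsonr(x, y)
print(r_manual, r_scipy)   # the two values agree
```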

  3. We assume a linear relation, or a two-dimensional (bivariate) normal distribution.

  4. Even here r ≈ 0, although the values are not independent. But note that Y does not have a normal distribution for a given X.
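
The original slide showed a scatter plot; a sketch of a comparable case (Y determined completely by X through a symmetric function, yet r ≈ 0) might look like this:

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 201)
y = x**2                      # Y is fully determined by X, but not linearly

r, p = stats.pearsonr(x, y)
print(round(r, 3))            # approximately 0: no *linear* dependence
```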

  5. [Scatter-plot examples with r = +0.99 and r = −0.99]

  6. [Scatter-plot examples with r = +0.83 and r = −0.83]

  7. [Scatter-plot examples with r = +0.45 and r = −0.45]

  8. Test of the null hypothesis H0: ρ = 0 • r is an estimate of the population parameter ρ. • The test again translates to a t-test, with t = r·√(n − 2) / √(1 − r²) and n − 2 degrees of freedom. • We can again use both a one- and a two-tailed test. • It is even possible to test the null hypothesis that ρ equals some non-zero value; the procedure is more complicated.
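
A sketch of this t-test on made-up data (assuming the t formula above); the two-tailed p-value matches what scipy.stats.pearsonr reports.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)
n = len(x)

r, p_scipy = stats.pearsonr(x, y)

# t statistic with n - 2 degrees of freedom
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_two_tailed = 2 * stats.t.sf(abs(t), df=n - 2)
p_one_tailed = stats.t.sf(t, df=n - 2)          # for H1: rho > 0

print(p_two_tailed, p_scipy)   # the two-tailed p-values agree
```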

  9. There are also tables of critical values of r (for different sample sizes).
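
Such a table can be reproduced from the t distribution: solving t = r·√(n − 2)/√(1 − r²) for r gives r_crit = t_crit/√(t_crit² + n − 2). A small sketch (the helper name is my own):

```python
from scipy import stats

def r_critical(n, alpha=0.05, two_tailed=True):
    """Critical value of Pearson r for a sample of size n."""
    df = n - 2
    q = 1 - alpha / 2 if two_tailed else 1 - alpha
    t_crit = stats.t.ppf(q, df)
    return t_crit / (t_crit**2 + df) ** 0.5

for n in (5, 10, 20, 50, 100):
    print(n, round(r_critical(n), 3))
# e.g. n = 10 gives about 0.632, n = 100 about 0.197
```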

  10. Comparison with regression • It holds that the coefficient of determination in regression (R²) is the square of the correlation coefficient computed from the same two variables. • The p-value of the significance test of independence is exactly the same for the regression and for the correlation coefficient.
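
A quick check of both claims with scipy, on made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = 1.2 * x + rng.normal(size=40)

r, p_corr = stats.pearsonr(x, y)
reg = stats.linregress(x, y)

print(np.isclose(reg.rvalue**2, r**2))   # R^2 equals r squared
print(np.isclose(reg.pvalue, p_corr))    # identical p-values
```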

  11. Only a manipulative experiment can prove causality.

  12. Power of the test • The regression is significant exactly when the correlation coefficient is significant. • The power of the test increases (in both cases) with the strength of the relationship and with the number of observations. • When I want to estimate roughly how many observations I need, I must have an idea of how tight the relationship is (how high R² or ρ is in the population).

  13. Power of the test: from the critical values of r it is possible to look up how many observations I need in order to have roughly a 50% chance of rejecting H0 at a given significance level (for a known ρ), because when ρ equals the critical value, about half of the sample values of r fall above it. More precise calculations are possible, but in any case I need to have an idea of what the correlation in the population is.
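
A rough sketch of that look-up (the function names are my own; it uses the slide's rule of thumb that the chance of rejection is about 50% when ρ equals the critical value):

```python
from scipy import stats

def r_critical(n, alpha=0.05):
    """Two-tailed critical value of Pearson r for sample size n."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit / (t_crit**2 + df) ** 0.5

def n_for_half_power(rho, alpha=0.05):
    """Smallest n whose critical r is <= the assumed population rho."""
    n = 4
    while r_critical(n, alpha) > rho:
        n += 1
    return n

for rho in (0.2, 0.4, 0.6, 0.8):
    print(rho, n_for_half_power(rho))
# a weak correlation (rho = 0.2) needs roughly a hundred observations,
# a strong one (rho = 0.8) only about seven
```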

  14. Rank correlation coefficient (Spearman) [there is also Kendall's] • I replace each value by its rank within its variable and compute the correlation coefficient from the ranks. • For larger samples, even the critical values of the ordinary (Pearson) correlation coefficient hold. • We can use the formula rₛ = 1 − 6·Σd² / (n(n² − 1)), where d is the difference in ranks within a pair of observations.
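
A sketch (made-up, tie-free data) showing that the three routes agree: ranking followed by an ordinary Pearson r, the d² formula, and scipy.stats.spearmanr.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=20)
y = np.exp(x) + rng.normal(scale=0.1, size=20)   # roughly monotonic relation

# Route 1: rank each variable, then compute Pearson r on the ranks
rx, ry = stats.rankdata(x), stats.rankdata(y)
r_from_ranks, _ = stats.pearsonr(rx, ry)

# Route 2: the classical formula (valid when there are no ties)
d = rx - ry
n = len(x)
r_formula = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Route 3: scipy's built-in Spearman coefficient
r_scipy, _ = stats.spearmanr(x, y)

print(r_from_ranks, r_formula, r_scipy)   # all three agree
```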

  15. But the Spearman coefficient will also be 0 in this case. We can say that the Pearson correlation coefficient is a measure of linear dependence, while the Spearman coefficient is a measure of monotonic dependence.
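
A small contrast on hypothetical data: for a monotonic but non-linear relation, Spearman is 1 while Pearson stays below 1; for a symmetric, non-monotonic relation (as on the slide), both are near 0.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 5, 100)

y_monotonic = np.exp(x)        # monotonic but strongly non-linear
y_parabola = (x - 2.55)**2     # non-monotonic: falls, then rises

print(stats.pearsonr(x, y_monotonic)[0],   # noticeably below 1
      stats.spearmanr(x, y_monotonic)[0])  # exactly 1

print(stats.pearsonr(x, y_parabola)[0],    # near 0
      stats.spearmanr(x, y_parabola)[0])   # also near 0
```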

  16. Another possibility is to use a permutation test • I randomly permute the values of the independent variable and count how many times the resulting relationship with the dependent variable is as "nice" (as strong) as in our actual data.
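
A minimal sketch of such a permutation test for the correlation coefficient (made-up data; one variable is shuffled and the absolute correlation is compared with the observed one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=25)
y = 0.5 * x + rng.normal(size=25)

r_obs, _ = stats.pearsonr(x, y)

n_perm = 9999
count = 0
for _ in range(n_perm):
    r_perm, _ = stats.pearsonr(x, rng.permutation(y))
    if abs(r_perm) >= abs(r_obs):       # "at least as nice" as the real data
        count += 1

# the observed ordering itself is counted among the permutations
p_value = (count + 1) / (n_perm + 1)
print(r_obs, p_value)
```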
