Friday, September 22, 2006

Lesson 10: Correlation

Correlation, the relationship between two variables, is closely related to prediction.

The greater the association between variables, the more accurately we can predict the outcome of events. Observed results rarely correlate exactly with a mathematical function: the points almost never fit exactly on the line. The question is therefore whether an observed association between two variables could have occurred by chance.

Correlation Coefficient

[Figure: Scatter plot and line of best fit]
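A line of best fit through a scatter plot is usually the ordinary least-squares line. Here is a minimal Python sketch of how its slope and intercept can be computed. It assumes least squares is the fitting method (the lesson does not say), and the sample data are invented for illustration:

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = line_of_best_fit(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x")  # y = 0.05 + 1.99x
```

The slope shows how steeply the points rise or fall. On its own, it says nothing about how tightly they cluster around the line. That tightness is what the correlation coefficient measures.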



There are numerous methods for calculating correlation, for example:
The parametric Pearson, or "r value", correlation
The nonparametric Spearman correlation

Pearson correlation calculations assume that both the X and Y values are sampled from populations that follow a normal (Gaussian) distribution, at least approximately, although with large samples this assumption matters less.
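Pearson's r is the sum of cross-products of the deviations from the means, divided by the square root of the product of the two sums of squares. A minimal sketch in plain Python follows. The function name `pearson_r` is only an illustrative choice; Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient r, between -1 and +1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```

Values near +1 or -1 mean the points lie close to a straight line. A value near 0 means there is little *linear* association, which is not the same as no relationship at all.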

Alternatively, the nonparametric Spearman correlation is based on ranking the two variables, and so makes no assumption about the distribution of the values.
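Spearman's coefficient is simply Pearson's r applied to the ranks, so it can be sketched in a few lines of Python. This version assumes tied values receive the average of the ranks they occupy, which is a common convention. The helper names are illustrative:

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end j of the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# y always increases with x, but not along a straight line
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(spearman_rho(xs, ys))  # 1.0: the ranks agree perfectly
```

For the same data, Pearson's r is slightly below 1 because the points curve away from a straight line. Spearman's rho only asks whether y consistently rises (or falls) as x rises.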

A correlation analysis is performed in the same way as any other statistical test of significance:

Formulate the null hypothesis. A simpler hypothesis takes priority over a more complex one, so the null hypothesis (H0) is that "There is no correlation between the datasets". You also need to set the significance level (α) before performing the test (e.g. 0.05).
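Once H0 and α are fixed, a common way to test a Pearson correlation is to convert r into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)), and compare it with the critical value of t. Here is a minimal Python sketch. The r value and sample size are invented. The critical value 2.306 (two-tailed, α = 0.05, 8 degrees of freedom) would normally be read from a t table or computed with a statistics library:

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation is zero (n - 2 degrees of freedom)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        # A perfect correlation makes the denominator zero: t is unbounded
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


def reject_null(r, n, t_critical):
    """Two-tailed test: reject H0 when |t| exceeds the critical value."""
    return abs(correlation_t_statistic(r, n)) > t_critical


# Hypothetical example: r = 0.70 from n = 10 pairs, alpha = 0.05 (two-tailed).
# 2.306 is the tabulated two-tailed 5% critical value of t for 8 degrees of freedom.
T_CRIT_DF8_ALPHA05 = 2.306
t = correlation_t_statistic(0.70, 10)
print(round(t, 2), reject_null(0.70, 10, T_CRIT_DF8_ALPHA05))
# t is about 2.77, which exceeds 2.306, so H0 would be rejected
```

With the same sample size, r = 0.50 gives t of about 1.63, so H0 would not be rejected. A moderate correlation from a small sample can easily arise by chance.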



Warning: Correlation tests are in some ways the most misused of all statistical procedures! They can show whether two variables could be connected. However, they cannot show that the variables are not connected! If one variable depends on another, i.e. there is a causal relationship, then it is always possible to find some kind of correlation between the two variables. However, if both variables depend on a third, they can show a correlation without any causal dependency between them. Take care!

Example:
There is a direct correlation between the number of mobile phone masts and the decline in the numbers of house sparrows, Passer domesticus. But do mobile phone masts harm sparrows, or are both effects caused by something else? Or are they completely independent observations that just happen to correlate? We don't know, because correlation tests do not reveal this information; further investigation is necessary.
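The "third variable" problem is easy to reproduce by simulation. In this hypothetical sketch, a hidden variable `z` drives both `x` and `y`, and neither influences the other. The data are invented and are not real mast or sparrow counts. Even so, Pearson's r between `x` and `y` comes out strongly positive. The variable names and the noise level are illustrative assumptions:

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_common_cause(n=1000, noise=0.5, seed=1):
    """Return x and y that each depend on a hidden z, but not on each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, noise) for zi in z]  # x = z + independent noise
    y = [zi + rng.gauss(0, noise) for zi in z]  # y = z + independent noise
    return x, y


x, y = simulate_common_cause()
# Theory gives r = 1 / (1 + noise**2) = 0.8 here; the sample value will be close to it
print(round(pearson_r(x, y), 2))
```

Nothing in `x` causes anything in `y`, yet a correlation test on these data would almost certainly reject H0. Only knowledge of `z` explains the association.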



Critical Values of the Correlation Coefficient
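Tables of critical values of r are usually derived from the t distribution. Rearranging t = r * sqrt((n - 2) / (1 - r^2)) gives r_crit = t_crit / sqrt(t_crit^2 + n - 2). Here is a minimal Python sketch of that conversion. It assumes a two-tailed test whose critical t value has already been looked up. The example uses t = 2.306 (α = 0.05, two-tailed, 8 degrees of freedom):

```python
import math


def critical_r(t_critical, n):
    """Smallest |r| that is significant, given the critical t for n - 2 degrees of freedom."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 pairs of observations")
    return t_critical / math.sqrt(t_critical ** 2 + df)


# n = 10 pairs, two-tailed alpha = 0.05, critical t (8 df) = 2.306
print(round(critical_r(2.306, 10), 3))  # 0.632
```

So with 10 pairs of observations, an observed |r| above about 0.632 would lead to rejecting H0 at the 5% level.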
