Correlation, explained.
A short but in-depth look at correlation.
Covariance:
Let’s say we have two variables, X and Y. We want to assess whether the relationship between them is strong or weak. How can we do that? Covariance is the starting point.
covariance = SUM((X - Xmean) * (Y - Ymean)) / (N - 1)
Basically, we are calculating how X and Y deviate from their respective means, on average. When X is some distance from its mean, is Y a similar distance from its own mean and, more importantly, does it move in the same direction as X?
We retain both the distance and the direction of the deviation by taking the product of X’s deviation from its mean and Y’s deviation from its mean. The product is positive when both deviate in the same direction and negative when they deviate in opposite directions. Once we sum these products over all (X, Y) pairs, the sign tells us whether the net co-movement is in the same direction (+), in opposite directions (-), or absent (0). So covariance captures both the direction and the magnitude.
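The covariance formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `sample_covariance` and the data are made up for the example.

```python
from statistics import mean

def sample_covariance(xs, ys):
    """Sum of the products of deviations from the mean, divided by N - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean, y_mean = mean(xs), mean(ys)
    # Each product is positive when X and Y deviate in the same direction,
    # negative when they deviate in opposite directions.
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: y_up rises with x, y_down falls as x rises.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(sample_covariance(x, y_up))    # 5.0  (same direction)
print(sample_covariance(x, y_down))  # -5.0 (opposite direction)
```

The sign flips between the two cases while the magnitude stays the same, which is the "direction and magnitude" behaviour described above.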
Correlation:
Correlation rescales the covariance so that it always falls between -1 and 1. This is needed because covariance is a raw magnitude whose size depends on the units of X and Y, which makes it hard to interpret on its own.
correlation = covariance / (standard deviation(X) * standard deviation(Y))
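The denominator squeezes the covariance between -1 and 1, which makes it easier to interpret. Weak relationship = correlation close to 0, e.g. 0.2; strong relationship = correlation close to 1 or -1, e.g. 0.9 or -0.9. The sign gives the direction of the relationship.
The denominator is the product of the standard deviations of X and Y because the absolute value of the covariance can never exceed that product (this follows from the Cauchy-Schwarz inequality). Dividing by it therefore keeps the result between -1 and 1.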
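To see why the raw covariance is hard to interpret while correlation is not, here is a small sketch that measures the same hypothetical data in two different units. The helper names and the height/weight numbers are assumptions made up for illustration, not real measurements.

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    x_mean, y_mean = mean(xs), mean(ys)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Hypothetical heights and weights; the same heights expressed in two units.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [52, 58, 69, 74, 88]

# Covariance grows by a factor of 100 just because the unit changed.
print(sample_covariance(height_m, weight_kg))
print(sample_covariance(height_cm, weight_kg))

# Correlation is unaffected by the unit and stays between -1 and 1.
print(correlation(height_m, weight_kg))
print(correlation(height_cm, weight_kg))
```

`stdev` uses the same N - 1 convention as the covariance here, so the two are consistent; mixing a population standard deviation with a sample covariance would give a slightly different ratio.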
Correlation shortcomings:
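As a single-number metric, correlation can be deceptive. For example, we can see visually in the plot below that there is a strong relationship: taken separately, the rising half has a correlation of +1 and the falling half has a correlation of -1. However, the calculation over the whole range gives a correlation of 0. This happens because correlation only measures a linear relationship in a single direction, and correlations over parts of the data do not add up: here the positive and negative halves cancel each other out.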
y-axis
^
|
| +1/\
|  /  \-1
| /    \
|/______\__________> x-axis

Actual correlation looks good. There is a strong relationship.
However, as per calculation, the correlation will be 0.
Isn't that interesting?
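The triangle-shaped case can be checked numerically. The sketch below uses a made-up symmetric dataset (Y rises with X, then falls at the same rate) and a hypothetical `correlation` helper built from the formula given earlier in this article.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    x_mean, y_mean = mean(xs), mean(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical triangle: rises for the first half, falls for the second half.
x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 1, 2, 3, 2, 1, 0]

print(correlation(x[:4], y[:4]))  # rising half: approximately +1
print(correlation(x[3:], y[3:]))  # falling half: approximately -1
print(correlation(x, y))          # whole range: approximately 0
```

The results are "approximately" +1, -1 and 0 because of floating-point rounding. Plotting the data before trusting a single correlation number is a simple way to catch cases like this one.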