Here, our objective is to generate two sets of N numbers which have prescribed Mean and Standard Deviation and whose Pearson Correlation coefficient is also prescribed.
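First we generate two collections of N numbers, a1, a2, ... aN and b1, b2, ... bN, each with Mean = 0 and SD = 1:
(1/N)Σ ak = 0 and (1/N)Σ bk = 0 (1)
(1/N)Σ ak² = 1 and (1/N)Σ bk² = 1 (2)
If we're satisfied with normally distributed numbers, we can do this with Excel, using NORMINV(RAND(),0,1) for each ak and bk. For finite N, such random draws have Mean = 0 and SD = 1 only approximately.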
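As a rough sketch of this first step outside Excel, the Python below draws normal numbers with `random.gauss` (playing the role of NORMINV(RAND(),0,1)). The extra steps are an assumption added here, not part of the original recipe: each set is rescaled so that Mean = 0 and SD = 1 hold exactly, and one Gram-Schmidt step removes any leftover correlation between the two sets. The names `standardize` and `make_base_sets` are hypothetical.

```python
import random
import statistics

def standardize(values):
    """Shift and scale so the population Mean is 0 and the population SD is 1."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)  # population SD, i.e. the 1/N convention
    return [(v - m) / s for v in values]

def make_base_sets(n, seed=0):
    """Return lists a, b with Mean 0, SD 1 and (1/N) * sum(ak * bk) = 0."""
    rng = random.Random(seed)
    a = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    b = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
    # Assumption (not in the original text): subtract from b its component
    # along a, then rescale, so the two sets are exactly uncorrelated.
    proj = sum(ak * bk for ak, bk in zip(a, b)) / n
    b = standardize([bk - proj * ak for ak, bk in zip(a, b)])
    return a, b

a, b = make_base_sets(1000)
print(statistics.fmean(a), statistics.pstdev(a))
print(statistics.fmean(b), statistics.pstdev(b))
print(sum(ak * bk for ak, bk in zip(a, b)) / len(a))
```

Without the rescaling, the sample Mean, SD and correlation of raw draws would drift slightly from 0, 1 and 0, and that drift would carry through to every set built from them.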
From the ak we construct another collection, collection 1, namely
xk = M1 + S1 ak
This collection will have Mean = M1 and SD = S1, since
(1/N)Σ xk = M1 + S1 (1/N)Σ ak = M1 and (1/N)Σ (xk - M1)² = S1² (1/N)Σ ak² = S1²
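Similarly, we construct collection 2, based upon the bk, namely
yk = M2 + S2 bk
which has Mean = M2 and SD = S2.
So far we've been able to construct two sets of numbers (x and y) with prescribed Means and Standard Deviations, starting with two sets (a and b) with Mean = 0 and SD = 1. Now we work on the Pearson Correlation r. Since xk - M1 = S1 ak and yk - M2 = S2 bk:
r(x,y) = { (1/N)Σ (xk - M1) (yk - M2) } / { S1 S2 } = (1/N)Σ ak bk
Hence, if the a-set and b-set are uncorrelated, so are the x-set and y-set:
(1/N)Σ ak bk = 0 (3)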
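The construction of the x-set and y-set can be checked numerically. This is a minimal sketch, assuming xk = M1 + S1 ak and yk = M2 + S2 bk and using the 1/N (population) convention throughout. The tiny base sets and the target values are made up for illustration, and `pearson` is a hypothetical helper name.

```python
import statistics

def pearson(p, q):
    """Population Pearson correlation, using 1/N averages throughout."""
    mp, mq = statistics.fmean(p), statistics.fmean(q)
    cov = statistics.fmean([(pk - mp) * (qk - mq) for pk, qk in zip(p, q)])
    return cov / (statistics.pstdev(p) * statistics.pstdev(q))

# Hand-picked base sets with Mean = 0, SD = 1 and zero correlation.
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]

M1, S1 = 10.0, 2.0   # illustrative targets for the x-set
M2, S2 = -3.0, 0.5   # illustrative targets for the y-set

x = [M1 + S1 * ak for ak in a]
y = [M2 + S2 * bk for bk in b]

print(statistics.fmean(x), statistics.pstdev(x))  # Mean and SD of x
print(statistics.fmean(y), statistics.pstdev(y))  # Mean and SD of y
print(pearson(x, y), pearson(a, b))               # the two correlations agree
```

Shifting and scaling by positive constants leaves the Pearson correlation unchanged, which is why r(x,y) matches r(a,b).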
Now we generate a u-set, like so:
uk = M1 + S1 (A ak + B bk)
where A and B are as-yet-unknown constants.
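By (1), the Mean of this u-set is still M1. Further, the Standard Deviation of this u-set is determined from:
SD²(u) = (1/N)Σ (uk - M1)² = S1² { A² (1/N)Σ ak² + 2AB (1/N)Σ ak bk + B² (1/N)Σ bk² }
So, using (1), (2) and (3) we get:
SD²(u) = S1² (A² + B²)
so the u-set keeps SD = S1 provided A² + B² = 1, that is, A = SQRT(1-B²).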
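A quick numerical check of the Mean and SD of the u-set, assuming the constants are tied together by A² + B² = 1 (so A = SQRT(1-B²)). The base sets, the value B = 0.6 and the function name `make_u` are illustrative choices, not from the original text.

```python
import math
import statistics

# Hand-picked base sets with Mean = 0, SD = 1 and zero correlation.
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]

def make_u(a, b, M1, S1, B):
    """Return uk = M1 + S1 * (A*ak + B*bk), with A chosen so A^2 + B^2 = 1."""
    if not -1.0 <= B <= 1.0:
        raise ValueError("B must lie in [-1, 1]")
    A = math.sqrt(1.0 - B * B)
    return [M1 + S1 * (A * ak + B * bk) for ak, bk in zip(a, b)]

u = make_u(a, b, M1=10.0, S1=2.0, B=0.6)
print(statistics.fmean(u), statistics.pstdev(u))  # compare with M1 and S1
```

The check only works out exactly because the base sets have Mean = 0, SD = 1 and zero correlation. With raw random draws the Mean and SD of u would be close to M1 and S1 but not equal.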
So far we've managed to modify the x-set, creating a u-set, yet maintaining the Mean and SD. Now, we calculate the correlation:
r(u,y) = { (1/N)Σ (uk - M1) (yk - M2) } / { S1 S2 } = (1/N)Σ (A ak + B bk) bk = A 0 + B 1 = B
using (2) and (3).
Finally, then, we can start with uncorrelated sets x and y, specify a correlation B, and construct a u-set with the same Mean and SD as the x-set, namely M1 and S1 ... but with the specified correlation, via
uk = M1 + S1 { SQRT(1-B²) ak + B bk }
Now we consider THREE sets: ak, bk and ck (with Mean = 0 and SD = 1) and three derived sets: xk, yk and zk ... all of which are uncorrelated. We generate a u-set (as we did above) according to:
uk = M1 + S1 (A ak + B bk + C ck)
* Note: to maintain SD = S1 we now need A² + B² + C² = 1, so A = SQRT(1 - B² - C²), which requires B² + C² ≤ 1.
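Now, for the correlations:
r(u,y) = { (1/N)Σ (uk - M1) (yk - M2) } / { S1 S2 } = (1/N)Σ (A ak + B bk + C ck) bk = A 0 + B 1 + C 0 = B
and
r(u,z) = { (1/N)Σ (uk - M1) (zk - M3) } / { S1 S3 } = (1/N)Σ (A ak + B bk + C ck) ck = A 0 + B 0 + C 1 = C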
We now have sets u, y and z with prescribed Means and Standard Deviations, namely (M1,S1), (M2,S2) and (M3,S3), and correlations r(u,y) = B, r(u,z) = C.
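The three-set case can be sketched the same way. The code assumes uk = M1 + S1 (A ak + B bk + C ck) with A = SQRT(1 - B² - C²), yk = M2 + S2 bk and zk = M3 + S3 ck. The base sets, the targets and the names `pearson` and `make_u3` are illustrative assumptions.

```python
import math
import statistics

def pearson(p, q):
    """Population Pearson correlation, using 1/N averages throughout."""
    mp, mq = statistics.fmean(p), statistics.fmean(q)
    cov = statistics.fmean([(pk - mp) * (qk - mq) for pk, qk in zip(p, q)])
    return cov / (statistics.pstdev(p) * statistics.pstdev(q))

# Hand-picked base sets: each has Mean = 0, SD = 1, and every pair is uncorrelated.
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]
c = [1.0, -1.0, -1.0, 1.0]

def make_u3(a, b, c, M1, S1, B, C):
    """Return uk = M1 + S1 * (A*ak + B*bk + C*ck) with A^2 + B^2 + C^2 = 1."""
    a_squared = 1.0 - B * B - C * C
    if a_squared < 0.0:
        raise ValueError("need B^2 + C^2 <= 1")
    A = math.sqrt(a_squared)
    return [M1 + S1 * (A * ak + B * bk + C * ck) for ak, bk, ck in zip(a, b, c)]

M1, S1 = 10.0, 2.0   # illustrative targets
M2, S2 = -3.0, 0.5
M3, S3 = 4.0, 1.5
B, C = 0.6, 0.3      # requested r(u,y) and r(u,z)

u = make_u3(a, b, c, M1, S1, B, C)
y = [M2 + S2 * bk for bk in b]
z = [M3 + S3 * ck for ck in c]

print(statistics.fmean(u), statistics.pstdev(u))  # compare with M1 and S1
print(pearson(u, y), pearson(u, z), pearson(y, z))
```

Setting C = 0 recovers the two-set construction. The sets y and z stay uncorrelated with each other, so only the two correlations involving u are prescribed.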