1 / 22

Generating Correlated Random Variables

Generating Correlated Random Variables. Kriss Harris Senior Statistician Kriss.5.Harris@gsk.com. Why?. I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables. Previous Method.

hafwen
Télécharger la présentation

Generating Correlated Random Variables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Generating Correlated Random Variables Kriss Harris Senior Statistician Kriss.5.Harris@gsk.com

  2. Why? • I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables.

  3. Previous Method Use Excel to fill down and then generate another column that was fairly correlated

  4. Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1to100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; output; end; run;

  5. Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1to100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; output; end; run;

  6. Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1to100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; output; end; run;

  7. Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1to100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; output; end; run;

  8. Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1to100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; output; end; run;

  9. Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1to100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; output; end; run;

  10. Y and x for different correlation coefficients

  11. Generating Correlated Random Variables using Proc IML • To generate more than 2 correlated random variables than it’s easier to use the Cholesky decomposition method in Proc IML. • IML = Interactive Matrix Language

  12. Generating Correlated Random Variables using Proc IML prociml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 49, 925}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); appendfrom final_simulated; quit; Use is similar to set. Reading in the simulated data and the means

  13. Generating Correlated Random Variables using Proc IML prociml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 49, 925}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); appendfrom final_simulated; quit; Variance covariance matrix

  14. Generating Correlated Random Variables using Proc IML prociml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 49, 925}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); appendfrom final_simulated; quit; Applying Cholesky’s decompositon

  15. Generating Correlated Random Variables using Proc IML prociml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 49, 925}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); appendfrom final_simulated; quit; Concatenating the variables

  16. Generating Correlated Random Variables using Proc IML prociml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 49, 925}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); appendfrom final_simulated; quit; Correlated Variables

  17. Generating Correlated Random Variables using Proc IML prociml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 49, 925}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); appendfrom final_simulated; quit; Outputting the variables

  18. References • Generating Multivariate Normal Data by using Proc IML Lingling Han, University of Georgia, Athens, GA

  19. Appendix • Correlation Coefficient =

  20. R Code - Generating Correlated Random Variables mean1 = 0 mean2 = 10 sig1 = 2 sig2 = 5 rho = 0.9 r1 = rnorm(100, 0, 1) r2 = rnorm(100, 0, 1) y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2;

  21. R Code - Generating Correlated Random Variables mean1 = 0 mean2 = 10 sig1 = 2 sig2 = 5 rho = 0.9 r1 = rnorm(100, 0, 1) r2 = rnorm(100, 0, 1) y1 = mean1 + sig1*r1 y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2

  22. R Code - Generating Correlated Random Variables using Matrices C = matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2) cholc = chol(C) R = matrix(c(r1,r2), nrow = 100, ncol = 2, byrow = F) mean = matrix(c(mean1,mean2), nrow = 100, ncol = 2, byrow = T) RC = mean + R %*% cholc Use previous values of r1 and r2

More Related