SAS Macro Coding for Jackknife Repeated Replication

Presentation Transcript


  1. SAS Macro Coding for Jackknife Repeated Replication
     • Jackknife Repeated Replication (JRR) is well suited to macro coding because the SAS macro language handles iterative, flexible code
     • This presentation demonstrates how to use a general JRR macro to correctly calculate variance estimates for means and regression coefficients (logistic and OLS models)
     SI Workshop: July 15, 2005

  2. Analysis of Complex Sample Survey Data
     • Data from complex sample surveys must be analyzed with techniques that adjust for the clustering of the sample design
     • The standard procedures in SAS, SPSS, and Stata assume a simple random sample and therefore do not calculate variances and standard errors correctly for such data

  3. Analysis of Complex Survey Data
     • SAS offers SURVEY procedures and Stata offers svy commands, both of which use the Taylor series linearization approach
     • JRR is a widely used replication approach that offers an alternative to the Taylor series method
     • JRR is flexible and can be adapted to many types of statistics, such as means, regression coefficients, and other statistics of interest

  4. Visual Representation of JRR process
     • JRR systematically removes a small portion of the sample, and the statistics of interest are computed repeatedly, once for each sub-sample
     • In this example, the cases with str=42 and secu=2 are deleted and the weights of the cases with str=42 and secu=1 are doubled
     • The process is repeated for each stratum until the entire dataset is covered
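
The replicate-weight step described above can be tried on a small scale. The following is a minimal sketch, not part of the original deck: the dataset `toy`, its values, and the variable names `wt` and `y` are hypothetical, and a data step array stands in for the macro loop used later in the slides. It assumes exactly two SECUs per stratum, coded 1 and 2.

```sas
/* Hypothetical toy data: 2 strata x 2 SECUs, two cases per SECU */
data toy;
  input str secu wt y;
  datalines;
1 1 1.0 10
1 1 1.5 12
1 2 2.0 20
1 2 1.0 18
2 1 1.0 30
2 1 2.0 28
2 2 1.5 25
2 2 1.0 27
;
run;

/* One replicate weight per stratum: in stratum r, double SECU 1 and
   zero out SECU 2; every other stratum keeps its full-sample weight */
data toy_rep;
  set toy;
  array pwt{2} pwt1-pwt2;
  do r = 1 to 2;
    pwt{r} = wt;
    if str = r and secu = 1 then pwt{r} = wt * 2;
    if str = r and secu = 2 then pwt{r} = 0;
  end;
  drop r;
run;

proc print data=toy_rep noobs;
run;
```

In the printed output, each case should carry its full weight in every replicate except the one for its own stratum, where the weight is either doubled or set to zero.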

  7. SAS JRR Macro: Logistic Regression
     *Logistic Regression Jackknife for Analysis of Complex Survey Data****************** ;
     *Pat Berglund, July 2003 for Summer Institute Workshop ;
     libname d 'd:\sumclass' ;
     options compress=yes nofmterr symbolgen ;
     options macrogen mprint ;
     *create outer jackknife macro with parameters ;
     *Parameters to fill in: ;
     *ncluster=number of clusters, in the NCS I dataset this is 42 ;
     *weight=case weight ;
     *depend=dependent variable for the logistic model ;
     *preds=predictor variables entered with a space between each one ;
     *indata=input dataset* ;
     %macro jacklogods(ncluster,weight,depend,preds,indata) ;

  8. *section 1: jackknife using strata and secu variables to do 42 jackknife selections* ;
     *each iteration of do loop selects one strata*secu combination and doubles the contribution of strata=x and secu=1 while setting strata=x and secu=2 to zero ;
     *all other combinations stay the same* ;
     %let nclust=%eval(&ncluster) ;
     data one ;
       set &indata ;
       %macro wgtcal ;
         %do i=1 %to &nclust ;
           pwt&i=&weight ;
           if str=&i and secu=1 then pwt&i=pwt&i*2 ;
           if str=&i and secu=2 then pwt&i=0 ;
         %end ;
       %mend ;
       %wgtcal ;

  9. **section 2: run base model/statistic of interest for entire sample using full weight* ;
     %macro base ;
       ods output parameterestimates=parms (keep=variable estimate) ;
       ods listing close ;
       proc logistic des data=ONE ;
         model &depend=&preds ;
         weight &weight ;
       run ;
       ods listing ;
       proc print data=parms ;
       run ;
       proc sort ;
         by variable ;
       run ;
     %mend base ;
     %base ;

  10. *Section 3: Run Replicate Models* ;
      * replicate models, one for each strata using weight developed in jackknife section 1* ;
      *save statistic of interest for use with variance estimation* ;
      %macro reps ;
        %do j=1 %to &nclust ;
          ods output parameterestimates=parms&j (keep=estimate variable rename=(estimate=estimate&j)) ;
          ods listing close ;
          proc logistic des data=ONE ;
            model &depend=&preds ;
            weight pwt&j ;
          run ;
          proc sort ;
            by variable ;
        %end ;
      %mend reps ;
      %reps ;

  11. *Section 4: Merge Base and Replicate files together for calculation of statistics of interest* ;
      data rep ;
        merge parms %do k=1 %to &nclust ; parms&k %end ; ;
        by variable ;
      proc print ;
      run ;

  12. *Section 5-Calculate complex design corrected variance and standard errors ;
      *variance = sum of the squared differences between the base statistic and the replicate statistics ;
      *standard error= square root of the sum of the squared differences (variance) ;
      *Odds Ratio=exponent of the coefficient ;
      *Confidence Intervals=OR+-1.96*corrected standard error* ;
      ods listing ;
      data calculate ;
        set rep ;
        %macro it ;
          %do j=1 %to &nclust ;
            sqdiff&j=(estimate-estimate&j)**2 ;
          %end ;
          sumdiff=sum(of sqdiff1-sqdiff&nclust) ;
          stderr=sqrt(sumdiff) ;
          or=exp(estimate) ;
          lowor=or-(1.96*stderr) ;
          upor=or+(1.96*stderr) ;
        %mend it ;
        %it ;
      run ;
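
The calculation in this data step can be written compactly. The notation is introduced here for illustration and does not appear in the slides: $\hat\theta$ is the full-sample estimate (`estimate`), $\hat\theta_{(h)}$ is the estimate from the replicate for stratum $h$ (`estimate&j`), and $H$ is the number of strata (`&nclust`, 42 in the NCS example).

```latex
\[
  v_{\mathrm{JRR}}(\hat\theta) \;=\; \sum_{h=1}^{H} \bigl(\hat\theta - \hat\theta_{(h)}\bigr)^{2},
  \qquad
  \mathrm{se}_{\mathrm{JRR}}(\hat\theta) \;=\; \sqrt{v_{\mathrm{JRR}}(\hat\theta)} .
\]
```

Here `sumdiff` corresponds to $v_{\mathrm{JRR}}$ and `stderr` to $\mathrm{se}_{\mathrm{JRR}}$. The sum carries no extra multiplier, which matches the code above, where one SECU per stratum is dropped and the other is doubled.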

  13. proc print ;
        var variable estimate stderr or lowor upor ;
      run ;
      %mend jacklogods ;
      %jacklogods(42,p2wtv3,deplt1,sexf,d.ncsdxdm3) ;

      *comparison with SRS logistic regression* ;
      proc logistic des data=d.ncsdxdm3 ;
        weight p2wtv3 ;
        model deplt1=sexf ;
      run ;

      *comparison with SAS surveylogistic ;
      proc surveylogistic data=d.ncsdxdm3 ;
        strata str ;
        cluster secu ;
        weight p2wtv3 ;
        model deplt1 (event='1') =sexf ;
      run ;
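
As a further cross-check that is not part of the original deck, later SAS/STAT releases (9.2 onward, to the best of my knowledge) let the survey procedures compute jackknife variances directly through the `VARMETHOD=` option. The sketch below assumes such a release and reuses the libname, dataset, and variable names from the slides, so it runs only where `d.ncsdxdm3` is available.

```sas
/* Assumption: SAS/STAT 9.2 or later, where VARMETHOD=JACKKNIFE exists.
   Libname, dataset, and variable names are the ones used on the slides. */
libname d 'd:\sumclass';

proc surveylogistic data=d.ncsdxdm3 varmethod=jackknife;
  strata str;
  cluster secu;
  weight p2wtv3;
  model deplt1 (event='1') = sexf;
run;
```

The built-in method is a delete-one-PSU jackknife that forms one replicate per PSU, rather than one replicate per stratum as in the macro. Its standard errors should be close to the JRR macro results but need not match them exactly.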

  14. Results from Logistic JRR
      Design Corrected Results:
      Variable   Estimate   stderr     or        lowor     upor
      SEXF       0.7434     0.088842   2.10315   1.92902   2.27728
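
The limits shown here are symmetric around the odds ratio because the macro adds and subtracts 1.96 times the standard error of the coefficient, which is on the log-odds scale, directly to the OR. A common alternative, not used in the original macro, is to build the interval on the coefficient scale and then exponentiate. With the figures from this slide, the arithmetic would be roughly:

```latex
\[
  \Bigl[\exp\bigl(\hat\beta - 1.96\,\mathrm{se}\bigr),\;
        \exp\bigl(\hat\beta + 1.96\,\mathrm{se}\bigr)\Bigr]
  = \bigl[\exp(0.7434 - 0.1741),\; \exp(0.7434 + 0.1741)\bigr]
  \approx [1.77,\; 2.50],
\]
\[
  \text{where } 1.96 \times 0.088842 \approx 0.1741 .
\]
```

For comparison, the reciprocals of the PROC SURVEYLOGISTIC confidence limits reported on slide 16, where the coefficient has the opposite sign, are about 1.77 and 2.51.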

  15. SRS Results
      Analysis of Maximum Likelihood Estimates
      Parameter   DF   Estimate   Std. Error   Wald Chi-Square   Pr > ChiSq
      SEXF         1     0.7434       0.0724          105.3802       <.0001

  16. SAS Surveylogistic Results
      Parameter   DF   Estimate   Std. Error   Chi-Square   Pr > ChiSq
      Intercept    1     2.0084       0.0776     669.6525       <.0001
      SEXF         1    -0.7434       0.0889      70.0103       <.0001

      Odds Ratio Estimates
      Effect   Point Estimate   95% Wald Confidence Limits
      SEXF              0.475          0.399      0.566

  17. Another approach: Linear Regression
      %macro jackgenmod(ncluster,weight,depend,preds,indata) ;
      %let nclust=%eval(&ncluster) ;
      data one ;
        set &indata ;
        %macro wgtcal ;
          %do i=1 %to &nclust ;
            pwt&i=&weight ;
            if str=&i and secu=1 then pwt&i=pwt&i*2 ;
            if str=&i and secu=2 then pwt&i=0 ;
          %end ;
        %mend ;
        %wgtcal ;

  18. Base Model for OLS
      %macro base ;
        ods output parameterestimates=parms (keep=variable estimate) ;
        title "Example of Proc Reg without design correction" ;
        proc reg data=ONE ;
          model &depend=&preds ;
          weight &weight ;
        run ;
        proc sort ;
          by variable ;
        run ;
      %mend base ;
      %base ;

  19. Replicate Models
      %macro reps ;
        %do j=1 %to &nclust ;
          ods output parameterestimates=parms&j (keep=estimate variable rename=(estimate=estimate&j)) ;
          ods listing close ;
          proc reg data=ONE ;
            model &depend=&preds ;
            weight pwt&j ;
          run ;
          proc sort ;
            by variable ;
        %end ;
      %mend reps ;
      %reps ;

  20. Merge Replicate Datasets with Base Dataset
      data rep ;
        merge parms %do k=1 %to &nclust ; parms&k %end ; ;
        by variable ;
      proc print ;
      run ;
      ods listing ;

  21. Calculate Corrected Standard Errors from Distribution of Replicate Coefficients
      data calculate ;
        set rep ;
        %macro it ;
          %do j=1 %to &nclust ;
            sqdiff&j=(estimate-estimate&j)**2 ;
          %end ;
          sumdiff=sum(of sqdiff1-sqdiff&nclust) ;
          stderr=sqrt(sumdiff) ;
        %mend it ;
        %it ;
      run ;

  22. Code to Print Results from JRR and Execute Outer Macro
      proc print ;
        title "Results from JRR for OLS regression" ;
        var variable estimate stderr ;
      run ;
      %mend jackgenmod ;
      %jackgenmod(42,p2wtv3,incpers,sexf ag25 ag35 ag45,d.ncsdxdm3) ;

  23. Proc SurveyReg Code
      proc surveyreg data=d.ncsdxdm3 ;
        title "Example of Proc SurveyReg" ;
        strata str ;
        cluster secu ;
        weight p2wtv3 ;
        model incpers=sexf ag25 ag35 ag45 ;
      run ;

  24. Parameter Estimates from OLS SRS Regression
      Parameter Estimates
      Variable    DF   Parameter Estimate   Std. Error   t Value
      Intercept    1                11077    485.53334     22.81
      SEXF         1               -12096    434.45468    -27.84
      AG25         1                15227    586.69609     25.95
      AG35         1                22194    600.60265     36.95
      AG45         1                21404    683.46087     31.32

  25. JRR Results
      Results from JRR for OLS regression
      Obs   Variable    Estimate    stderr
        1   Intercept      11077    529.49
        2   AG25           15227    698.83
        3   AG35           22194   1026.29
        4   AG45           21404   1055.67
        5   SEXF          -12096    689.31

  26. Proc SurveyReg Results
      Estimated Regression Coefficients
      Parameter     Estimate     Standard Error   t Value   Pr > |t|
      Intercept    11077.003          532.95062     20.78     <.0001
      SEXF        -12095.819          690.29149    -17.52     <.0001
      AG25         15227.170          698.54031     21.80     <.0001
      AG35         22194.355         1017.50689     21.81     <.0001
      AG45         21403.763         1062.42802     20.15     <.0001

  27. Conclusions
      • JRR is a flexible and convenient alternative to canned software procedures and programs
      • Any statistic or procedure can be used within the JRR structure, provided it makes statistical sense
      • SAS macro coding allows parsimonious syntax and is well suited to repetitive yet flexible coding
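
To illustrate the point that other statistics fit the same structure, here is a minimal sketch of the JRR steps applied to a weighted mean. It is not taken from the deck: the macro name `jackmean`, the dataset `toy`, and its values are hypothetical, and it assumes strata numbered 1 to `ncluster` with two SECUs each, coded 1 and 2, as in the slides.

```sas
/* Hypothetical toy data: 2 strata x 2 SECUs (values are illustrative only) */
data toy;
  input str secu wt y;
  datalines;
1 1 1.0 10
1 1 1.5 12
1 2 2.0 20
1 2 1.0 18
2 1 1.0 30
2 1 2.0 28
2 2 1.5 25
2 2 1.0 27
;
run;

%macro jackmean(ncluster, weight, var, indata);
  /* Step 1: build one replicate weight per stratum */
  data one;
    set &indata;
    %do i = 1 %to &ncluster;
      pwt&i = &weight;
      if str = &i and secu = 1 then pwt&i = pwt&i * 2;
      if str = &i and secu = 2 then pwt&i = 0;
    %end;
  run;

  /* Step 2: full-sample weighted mean */
  proc means data=one noprint;
    var &var;
    weight &weight;
    output out=base mean=estimate;
  run;

  /* Step 3: one weighted mean per replicate weight */
  %do j = 1 %to &ncluster;
    proc means data=one noprint;
      var &var;
      weight pwt&j;
      output out=rep&j (keep=estimate&j) mean=estimate&j;
    run;
  %end;

  /* Step 4: sum of squared differences, then the square root */
  data calculate;
    merge base (keep=estimate) %do k = 1 %to &ncluster; rep&k %end; ;
    sumdiff = 0;
    %do j = 1 %to &ncluster;
      sumdiff = sumdiff + (estimate - estimate&j)**2;
    %end;
    stderr = sqrt(sumdiff);
    keep estimate stderr;
  run;

  proc print data=calculate noobs;
  run;
%mend jackmean;

%jackmean(2, wt, y, toy);
```

The four steps mirror the logistic and OLS macros; only the procedure in steps 2 and 3 changes. Observations whose replicate weight is zero contribute nothing to the weighted mean, which is what the deletion of a SECU requires.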
