
Nearest neighbor matching

Nearest neighbor matching using the greedy match macro. Note: Much of the code was originally written by Lori Parsons (http://www2.sas.com/proceedings/sugi26/p214-26.pdf). This code has been written with simplicity as a primary concern.

trish

Presentation Transcript


  1. Nearest neighbor matching USING THE GREEDY MATCH MACRO. Note: Much of the code was originally written by Lori Parsons (http://www2.sas.com/proceedings/sugi26/p214-26.pdf). This code has been written with simplicity as a primary concern. If you do not have a large number of controls, you may want to modify it.

  2. /* Define the library for formats */ LIBNAME saslib "G:\oldpeople\sasdata\" ; OPTIONS NOFMTERR FMTSEARCH = (saslib) ;

  3. /* Define the library for study data */ LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;

  4. Include the Macro %INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearestmacro.sas' ;

  5. %propen(libname, dsname, idvariable, dependent, propensity), where LIBNAME = library (libref) holding the data sets; DSNAME = data set with the study data; IDVARIABLE = subject ID variable; DEPENDENT = dependent variable; PROPENSITY = propensity score produced by the logistic regression.

  6. For example: %propen(study,allpropen,id,athome,prob); Remember, we already have the study.allpropen data set with the propensity score (prob) from the PROC LOGISTIC we just ran.
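The macro expects the propensity score to be in the input data set already. As a minimal sketch of that earlier step, assuming a hypothetical WORK data set `survey` with hypothetical covariates `age` and `female` (not from the study), the OUTPUT statement of PROC LOGISTIC with `P=` saves each subject's predicted probability of `athome = 1` as `prob`:

```sas
/* Hypothetical toy data: id, outcome athome (1 = at home), two covariates */
data work.survey;
  input id athome age female;
  datalines;
1 1 72 1
2 0 68 0
3 1 80 1
4 0 75 1
5 1 66 0
6 0 79 0
7 1 85 1
8 0 70 1
9 1 77 0
10 0 73 0
;
run;

/* Fit the propensity model; P= stores the predicted probability of athome=1 */
proc logistic data=work.survey;
  model athome(event='1') = age female;
  output out=work.allpropen p=prob;
run;
```

The resulting file could then be passed to the macro as `%propen(work,allpropen,id,athome,prob);`.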

  7. Explaining the macro: a challenge

  8. %macro propen(lib,dsn,id,depend,prob); DATA in5; SET &lib..&dsn; RUN; This creates a temporary (WORK) data set, in5, holding a copy of the study data.

  9. Propensity scores are rounded to 5, then 4, 3, 2 and 1 decimal places: %Do countr = 1 %to 5; %let digits = %eval(6 - &countr); %let roundto = %eval(10**&digits); %let roundto = %sysevalf(1/&roundto); %let nextin = %eval(&digits - 1);
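To see what the loop computes on each pass, the same macro arithmetic can be run on its own and written to the log. This is a sketch; `trace_rounding` is a hypothetical helper, not part of the macro:

```sas
/* Sketch: reproduce the loop's macro arithmetic and print it to the log */
%macro trace_rounding;
  %do countr = 1 %to 5;
    %let digits  = %eval(6 - &countr);      /* 5, 4, 3, 2, 1 */
    %let roundto = %eval(10**&digits);      /* 100000 down to 10 */
    %let roundto = %sysevalf(1/&roundto);   /* rounding unit for ROUND() */
    %let nextin  = %eval(&digits - 1);      /* suffix of the leftover data set */
    %put NOTE: pass &countr digits=&digits roundto=&roundto nextin=&nextin;
  %end;
%mend trace_rounding;

%trace_rounding
```

On the first pass digits is 5, so the step reads in5 and unmatched records go to in4; on the last pass digits is 1 and unmatched records go to in0.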

  10. MACRO NOTES: %Do countr = 1 %to 5; /* Starts %DO loop */ Use the %EVAL function for integer arithmetic: %let digits = %eval(6 - &countr); Use the %SYSEVALF function for non-integer (floating-point) arithmetic, as in %let roundto = %sysevalf(1/&roundto);
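The distinction matters because %EVAL truncates integer division and rejects non-integer operands (for example, `%eval(1.5 + 1)` stops with an error), while %SYSEVALF performs floating-point arithmetic. A two-line check that can be run in open code:

```sas
%put NOTE: %eval(7/2);       /* integer division, truncated: 3 */
%put NOTE: %sysevalf(7/2);   /* floating-point division: 3.5 */
```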

  11. /* Output controls to one data set, intervention cases to another */ /* Create a random number to sort within group */

  12. Create 2 data sets: DATA yes1 (KEEP = &prob id_y depend_y randnum) no1 (KEEP = &prob id_n depend_n randnum); SET in&digits; We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places. We keep only four variables.

  13. Assignment statements: randnum = RANUNI(0); &prob = ROUND(&prob, &roundto); Create a random number and round the propensity score to the current number of decimal places.
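Two details of these statements can be tried in isolation. ROUND with a rounding unit coarsens a score step by step, which is what turns near-ties into exact ties on later passes. Also, RANUNI(0) seeds itself from the system clock, so each run can pull tied subjects in a different order; if repeatable matches are wanted, one option (an assumption, not part of the original macro) is a fixed seed with CALL STREAMINIT and RAND('UNIFORM'). The score 0.1234567 and the seed below are hypothetical:

```sas
/* Sketch: how ROUND with a rounding unit coarsens one score */
data work.round_demo;
  p = 0.1234567;                   /* hypothetical propensity score */
  do digits = 5 to 1 by -1;
    unit = 10**(-digits);          /* 0.00001, 0.0001, ..., 0.1 */
    p_rounded = round(p, unit);
    output;
  end;
run;

/* Repeatable alternative to RANUNI(0), which seeds from the clock */
data work.rand_demo;
  call streaminit(20240101);       /* hypothetical fixed seed */
  do i = 1 to 3;
    randnum = rand('uniform');
    output;
  end;
run;
```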

  14. Output to the case data set … IF &depend = 1 THEN DO; id_y = &id; depend_y = &depend; OUTPUT yes1; END; We need to rename the dependent and ID variables, or they will be overwritten when cases and controls are merged.

  15. … or output to the control data set: ELSE IF &depend = 0 THEN DO; id_n = &id; depend_n = &depend; OUTPUT no1; END; RUN; Notice that the data sets are named yes1 and no1; the reason becomes evident shortly.
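Why the renaming matters: when two data sets that share a variable name are merged, the value from the data set read last replaces the earlier one. A sketch with one hypothetical case (ID 101) and one hypothetical control (ID 202) sharing a rounded score of 0.25; for brevity it renames with the RENAME= option rather than the assignment statements the macro uses:

```sas
/* Hypothetical case and control records that share a rounded score */
data work.cases;
  input prob id;
  datalines;
0.25 101
;
run;

data work.controls;
  input prob id;
  datalines;
0.25 202
;
run;

/* Without renaming: only one ID survives (the control's, read last) */
data work.no_rename;
  merge work.cases work.controls;
  by prob;
run;

/* With renaming: both IDs are kept side by side */
data work.renamed;
  merge work.cases(rename=(id=id_y)) work.controls(rename=(id=id_n));
  by prob;
run;
```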

  16. /* Runs through the control and experimental groups and matches up to 20 subjects that share an identical propensity score */

  17. %Do i = 1 %to 20; %let j = %eval(&i + 1); proc sort data = yes&i; by &prob randnum; run; data yes&i yes&j; set yes&i; by &prob; if first.&prob then output yes&i; else output yes&j; run; NOTE: This is matching without replacement; each subject is used at most once.

  18. Same thing for controls: proc sort data = no&i; by &prob randnum; run; data no&i no&j; set no&i; by &prob; if first.&prob then output no&i; else output no&j; run; Sorting by randnum within each score ensures that subjects with tied scores are pulled in random order.
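The FIRST. logic can be seen on a small hypothetical file: three cases tied at a score of 0.3 and one at 0.5. After sorting by score and random number, the first record of each score stays in yes1 and the remaining tied records are set aside in yes2 for the next pass. The fixed seed in RANUNI(12345) is only there so the demo repeats:

```sas
/* Hypothetical cases: three share prob = 0.3, one has prob = 0.5 */
data work.yes1;
  input prob id_y;
  randnum = ranuni(12345);   /* fixed seed so the demo repeats */
  datalines;
0.3 1
0.3 2
0.3 3
0.5 4
;
run;

proc sort data=work.yes1;
  by prob randnum;
run;

/* First record per score stays in yes1; the other ties wait in yes2 */
data work.yes1 work.yes2;
  set work.yes1;
  by prob;
  if first.prob then output work.yes1;
  else output work.yes2;
run;
```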

  19. Merge matches, end loop: DATA match&i; MERGE yes&i (IN = ina) no&i (IN = inb); BY &prob; IF ina AND inb; RUN; %END;
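Because each yesN and noN file now holds at most one record per score, the BY-merge pairs records one-to-one, and the IN= test drops scores found in only one group. A sketch with hypothetical round-1 files:

```sas
/* Hypothetical round-1 files: at most one record per rounded score */
data work.yes1;
  input prob id_y;
  datalines;
0.3 1
0.5 4
;
run;

data work.no1;
  input prob id_n;
  datalines;
0.3 11
0.7 12
;
run;

/* Keep only scores present in both files: each row is one case-control pair */
data work.match1;
  merge work.yes1(in=ina) work.no1(in=inb);
  by prob;
  if ina and inb;
run;
```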

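  20. /* Adds all matches into a single data set */ DATA allmatches; SET %DO k = 1 %TO 20; match&k %END; ; RUN; This concatenates all 20 data sets of matches. The second semicolon after %END is needed to end the SET statement, because the first one is consumed by %END itself.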
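Because the match data sets are numbered consecutively, SAS 9.2 and later also accept a numbered range list in the SET statement, which does the same concatenation without a %DO loop. A sketch; `make_matches` is a hypothetical helper that only builds stand-in data sets so the example runs on its own:

```sas
/* Hypothetical stand-ins for match1-match20 so the sketch runs on its own */
%macro make_matches;
  %do k = 1 %to 20;
    data match&k;
      prob = &k / 100;
    run;
  %end;
%mend make_matches;
%make_matches

/* Same concatenation as the %DO version, using a numbered range list */
data allmatches;
  set match1-match20;
run;
```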

  21. Create two data sets with IDs: DATA allyes (RENAME = (id_y = &id depend_y = &depend)) allno (RENAME = (id_n = &id depend_n = &depend)); SET allmatches; RUN;

  22. Create one file of all matched IDs: DATA matchfile; SET allyes allno; RUN; And sort it … proc sort data = matchfile; by &id &depend; run;

  23. proc sort data = in&digits ; by &id &depend ;

  24. /* Creates a data set of all subjects with an n-digit match */ /* Creates a second data set of subjects with no match */ data matches&digits in&nextin; merge in&digits (in = ina) matchfile (in = inb); by &id &depend; if ina and inb then output matches&digits; else output in&nextin; run;
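This step is what makes the passes cascade: matched subjects are set aside, and everyone else becomes the input for the next, coarser pass. A sketch of the first pass with four hypothetical subjects, two of whom (IDs 1 and 2) were matched at 5 decimal places:

```sas
/* Hypothetical pass-5 input: four subjects */
data work.in5;
  input id athome prob;
  datalines;
1 1 0.123456
2 0 0.123459
3 1 0.400001
4 0 0.700002
;
run;

/* Hypothetical matchfile: IDs matched on this pass, sorted by id athome */
data work.matchfile;
  input id athome;
  datalines;
1 1
2 0
;
run;

/* Matched subjects go to matches5; the rest are retried at 4 digits */
data work.matches5 work.in4;
  merge work.in5(in=ina) work.matchfile(in=inb);
  by id athome;
  if ina and inb then output work.matches5;
  else output work.in4;
run;
```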

  25. JUST A GOOD HABIT: CHECK AS THE LOOP RUNS THROUGH. Title "Matches &roundto"; proc freq data = matches&digits; tables &depend; run; %end; End of loop. Now match to 4 decimal places, and so on.

  26. /* Adds 1- to 5-digit matches into a single data set */ data &lib..finalset; set %do m = 1 %to 5; matches&m %end; ; run;

  27. One final check and done! Title "Distribution of Dependent Variable in &lib..finalset"; proc freq data = &lib..finalset; tables &depend; run; %mend propen; run;
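The frequency table confirms equal numbers in each group. To also see how close the matched groups are on the score itself, one option is to summarize the score by group. A sketch on a hypothetical stand-in for finalset; with the example call, the data set would be study.finalset and the variables athome and prob:

```sas
/* Hypothetical stand-in for the final matched file */
data work.finalset;
  input id athome prob;
  datalines;
1 1 0.12346
2 0 0.12346
3 1 0.4
4 0 0.4
;
run;

/* Compare the score distribution between the matched groups */
proc means data=work.finalset n mean std min max;
  class athome;
  var prob;
  output out=work.balance mean=mean_prob;
run;
```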

  28. Did it work? (** P < .01; **** P < .0001)

  29. Model Comparison

  30. Odds ratio

  31. How near?
