Nearest neighbor matching

Nearest neighbor matching using the greedy match macro

Note: Much of the code was originally written by Lori Parsons (http://www2.sas.com/proceedings/sugi26/p214-26.pdf). This version was written with simplicity as the primary concern. If you do not have a large number of controls, you may want to modify it.

/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;

/* Define the library for study data */
LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;

Include the macro

%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearestmacro.sas' ;

%propen(libname, dsname, idvariable, dependent, propensity)

LIBNAME = library reference (libref) for the directory holding the data sets
DSNAME = data set with the study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable
PROPENSITY = propensity score produced by the logistic regression

For example:

%propen(study, allpropen, id, athome, prob);

Remember, we already have the study.allpropen data set, with the propensity score (prob), from the PROC LOGISTIC we just ran.

Explaining the macro: a challenge

%macro propen(lib, dsn, id, depend, prob);
Data in5 ;
set &lib..&dsn;

This creates a temporary data set, in5, from the study data.

Propensity scores are rounded to 5, then 4, 3, 2 and 1 decimal places

%Do countr = 1 %to 5;
%let digits = %eval(6 - &countr);
%let roundto = %eval(10**&digits);
%let roundto = %sysevalf(1/&roundto);
%let nextin = %eval(&digits - 1);

Macro notes

%Do countr = 1 %to 5; /* Starts the %DO loop */

Use the %EVAL function to do integer arithmetic:
%let digits = %eval(6 - &countr);

Use the %SYSEVALF function to do non-integer arithmetic.
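To see why the macro switches from %EVAL to %SYSEVALF when it computes the rounding unit, here is a minimal sketch. The macro variable names bad and good are made up for illustration and are not part of the macro.

```sas
/* %EVAL does integer arithmetic only, so the fraction is truncated */
%let bad = %eval(1/100);        /* resolves to 0 */

/* %SYSEVALF does floating-point arithmetic */
%let good = %sysevalf(1/100);   /* resolves to 0.01 */

%put bad=&bad good=&good;
```

With %EVAL the rounding unit would always be 0, which is why the macro computes 1/&roundto with %SYSEVALF.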

/* Output control to one data set, intervention to another */
/* Create random number to sort within group */

Create 2 data sets

DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum);
SET in&digits;

We go through this loop 5 times, creating data sets of records that match to 5, 4, 3, 2 and 1 decimal places. We keep only four variables in each data set.

Assignment statements

randnum = RANUNI(0);
&prob = ROUND(&prob,&roundto);

Create a random number, and round the propensity score to a set number of digits.
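As a rough illustration of how rounding to fewer decimal places loosens the match, the sketch below compares two made-up propensity scores. The variable names case_p and control_p and their values are assumptions for the example only.

```sas
data _null_;
    case_p    = 0.31247;   /* hypothetical score for a case */
    control_p = 0.31452;   /* hypothetical score for a control */
    do digits = 5 to 1 by -1;
        unit = 10 ** (-digits);   /* 0.00001, 0.0001, ... , 0.1 */
        /* same = 1 when the two rounded scores are identical */
        same = (round(case_p, unit) = round(control_p, unit));
        put digits= unit= same=;
    end;
run;
```

These two scores differ at 5, 4 and 3 decimal places but both round to 0.31 at 2 decimal places, so this pair would not be matched until the fourth pass of the loop.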

Output to the case data set …

IF &depend = 1 THEN DO ;
    id_y = &id ;
    depend_y = &depend ;
    OUTPUT yes1 ;
END ;

We need to rename the dependent and ID variables, or they'll get overwritten when the case and control data sets are merged.

… or to the control data set

ELSE IF &depend = 0 THEN DO ;
    id_n = &id ;
    depend_n = &depend ;
    OUTPUT no1 ;
END ;

Notice that the data sets are named no1 and yes1. The reason becomes evident shortly.

/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */

%Do i = 1 %to 20;
%let j = %eval(&i + 1);
proc sort data = yes&i;
by &prob randnum;
data yes&i yes&j;
set yes&i;
by &prob;
if first.&prob then output yes&i;
else output yes&j;

Note: This is matching without replacement.

Same thing for controls

proc sort data = no&i;
by &prob randnum;
data no&i no&j;
set no&i;
by &prob;
if first.&prob then output no&i;
else output no&j;

Sorting by randnum ensures that, among subjects with the same score, the one matched is pulled at random.

Merge matches, end loop

DATA match&i;
MERGE yes&i (in = ina) no&i (in = inb);
BY &prob;
IF ina AND inb;
run;
%END ;
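One pass of this split-and-merge logic can be tried outside the macro on toy data. The sketch below hard-codes i = 1 and uses invented IDs, scores and random numbers, so it is an illustration of the technique, not the macro itself.

```sas
/* Hypothetical cases: scores are already rounded */
data yes1;
    input id_y prob randnum;
    datalines;
1 0.3 0.42
2 0.3 0.17
3 0.5 0.80
;
/* Hypothetical controls */
data no1;
    input id_n prob randnum;
    datalines;
11 0.3 0.55
12 0.5 0.09
13 0.5 0.61
;
/* Keep the first case per score in yes1, push the rest to yes2 */
proc sort data = yes1; by prob randnum; run;
data yes1 yes2;
    set yes1;
    by prob;
    if first.prob then output yes1;
    else output yes2;
run;
/* Same for controls */
proc sort data = no1; by prob randnum; run;
data no1 no2;
    set no1;
    by prob;
    if first.prob then output no1;
    else output no2;
run;
/* One case and one control per score, kept only when both exist */
data match1;
    merge yes1 (in = ina) no1 (in = inb);
    by prob;
    if ina and inb;
run;
proc print data = match1; run;
```

With these values, match1 should pair case 2 with control 11 (score 0.3) and case 3 with control 12 (score 0.5). Case 1 and control 13 fall through to yes2 and no2, where the next pass of the loop would look at them again.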

/* Adds all matches into a single data set */
DATA allmatches;
SET %DO k = 1 %TO 20; match&k %END ;
; /* the semicolon after %END closes the %DO loop; this one ends the SET statement */

This concatenates all 20 data sets of matches.
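The %DO loop inside the SET statement only generates text, a list of data set names. A small sketch makes this visible by writing the generated text to the log. The macro name listnames and its parameters are hypothetical.

```sas
%macro listnames(prefix, n);
    %local k;
    /* Each iteration emits one name: &prefix.1, &prefix.2, ... */
    %do k = 1 %to &n; &prefix&k %end;
%mend listnames;

/* Writes the word SET followed by match1 match2 match3 to the log */
%put SET %listnames(match, 3);
```

Note that the loop generates no semicolon of its own, so a SET statement built this way still needs one after the %END.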

Create two data sets with IDs

DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend));
SET allmatches ;

Create one file of all matched IDs

DATA matchfile;
SET allyes allno;

And sort it …

proc sort data = matchfile;
by &id &depend ;

proc sort data = in&digits ; by &id &depend ;

/* Creates a data set of all subjects with n-digit match */
/* Creates a second data set of subjects with no match */
data matches&digits in&nextin ;
merge in&digits (in = ina) matchfile (in = inb) ;
by &id &depend ;
if ina and inb then output matches&digits ;
else output in&nextin ;

Just a good habit: check as the loop runs through

Title "Matches &roundto " ;
proc freq data = matches&digits ;
tables &depend ;
run ;
%end ;

End of loop. Now match to 4 decimal places, and so on.

/* Adds 1- to 5-digit matches into a single data set */
data &lib..finalset;
set %do m = 1 %to 5; matches&m %end ;
; /* the semicolon after %end closes the %do loop; this one ends the SET statement */

One final check and done!

Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset;
tables &depend ;
run;
%mend propen;
run;

Did it work?

(Results table not recovered. Significance legend: ** p < .01, **** p < .0001)

Model Comparison

Odds ratio

How near?
