Lesson 8 - Topics
• Creating SAS datasets from procedures
• Using ODS and data steps to make reports
• Using PROC RANK
• Programs 14-15 in course notes
• LSB 4:11; 5:3

Making SAS Datasets From Procedures
Output from SAS PROCs can be put into SAS datasets:
1. To do further processing of the information from the output
2. To reformat output to make a report
3. To restructure the original SAS dataset or create new variables

Ways to Put Output into SAS Datasets
• Using the OUTPUT statement, available in many procedures
• Using the ODS OUTPUT statement: any output table can be put into a SAS dataset
Report We Want to Generate

Quartiles of Weight by Gender and Center

sex     clinic    N     P25     P50     P75
Male    A         9   180.0   190.0   208.0
Male    B        16   158.3   174.8   218.3
Male    C        29   178.0   199.5   220.5
Male    D        11   172.0   184.5   194.0
Female  A         6   125.0   143.5   160.5
Female  B         9   150.0   164.5   184.0
Female  C         6   132.5   134.3   138.5
Female  D         6   131.0   137.5   148.5
Program 14

LIBNAME class 'C:\SAS_Files';

* Will use SAS dataset version of TOMHS data;
DATA wt;
  SET class.tomhsp (KEEP=ptid age sex clinic wtbl wt12);
  wtchg = wt12 - wtbl;
RUN;

PROC FORMAT;
  VALUE sexF 1 = 'Male' 2 = 'Female';
RUN;
* Create report by sex and clinic of univariate info;
PROC SORT DATA = wt;
  BY sex clinic;
RUN;

PROC UNIVARIATE DATA = wt NOPRINT;
  BY sex clinic;
  VAR wt12;
  OUTPUT OUT = univinfo N = n Q1 = p25 MEDIAN = p50 Q3 = p75;
RUN;

OUT= gives the name of the new dataset. Each "statistic name = variable name" pair (for example, Q1 = p25) names the variable that will hold that statistic.
Dataset univinfo will have one observation for each combination of sex and clinic.
PROC PRINT DATA = univinfo;
  FORMAT sex sexF.;
RUN;

Obs   sex     clinic    n      p75      p50      p25
  1   Male    A         9   208.00   190.00   180.00
  2   Male    B        16   218.25   174.75   158.25
  3   Male    C        29   220.50   199.50   178.00
  4   Male    D        11   194.00   184.50   172.00
  5   Female  A         6   160.50   143.50   125.00
  6   Female  B         9   184.00   164.50   150.00
  7   Female  C         6   138.50   134.25   132.50
  8   Female  D         6   148.50   137.50   131.00
PROC PRINT DATA = univinfo NOOBS;
  VAR sex clinic n p25 p50 p75;
  FORMAT p25 p50 p75 6.1;
  TITLE 'Quartiles of Weight by Gender/Center';
RUN;

Quartiles of Weight by Gender/Center

sex     clinic    N     P25     P50     P75
Male    A         9   180.0   190.0   208.0
Male    B        16   158.3   174.8   218.3
Male    C        29   178.0   199.5   220.5
Male    D        11   172.0   184.5   194.0
Female  A         6   125.0   143.5   160.5
Female  B         9   150.0   164.5   184.0
Female  C         6   132.5   134.3   138.5
Female  D         6   131.0   137.5   148.5
Using ODS to Send Output to a SAS Dataset

Syntax: ODS OUTPUT output-table = new-data-set;

* Output quantile table to a dataset;
ODS OUTPUT quantiles = qwt;
PROC UNIVARIATE DATA = wt;
  VAR wtbl wt12;
RUN;
ODS OUTPUT CLOSE;

PROC PRINT DATA = qwt;
RUN;
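The output-table name in the ODS OUTPUT syntax has to match a table that the procedure actually produces. One way to look these names up is ODS TRACE, which writes a record for each output table to the SAS log. The following is a minimal sketch, assuming the wt dataset from Program 14 has already been created:

```sas
* Sketch: list the ODS table names that PROC UNIVARIATE produces;
* Assumes dataset wt from Program 14 exists in the WORK library;
ODS TRACE ON;

PROC UNIVARIATE DATA = wt;
  VAR wt12;
RUN;

* Turn tracing off so later procedures do not clutter the log;
ODS TRACE OFF;
```

Each output table appears in the log with a Name: entry. That name (for example, Quantiles) is what goes on the left side of the equals sign in the ODS OUTPUT statement.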
Display of Output Dataset

Obs   Varname   Quantile      Estimate
  1   wtbl      100% Max        279.30
  2   wtbl      99%             274.15
  3   wtbl      95%             246.40
  4   wtbl      90%             237.40
  5   wtbl      75% Q3          215.15
  6   wtbl      50% Median      192.65
  7   wtbl      25% Q1          165.90
  8   wtbl      10%             141.50
  9   wtbl      5%              137.40
 10   wtbl      1%              130.25
 11   wtbl      0% Min          128.50
 12   wt12      100% Max        271.50
 13   wt12      99%             271.50
 14   wt12      95%             239.00
 15   wt12      90%             227.00
 16   wt12      75% Q3          202.50
 17   wt12      50% Median      180.00
 18   wt12      25% Q1          153.50
 19   wt12      10%             133.00
 20   wt12      5%              130.00
 21   wt12      1%              123.00
 22   wt12      0% Min          123.00

We would like to put the wtbl and wt12 estimates side by side.
* Separate the data into 2 datasets;
DATA wtbl wt12;
  SET qwt;
  IF varname = 'wtbl' THEN OUTPUT wtbl;
  ELSE IF varname = 'wt12' THEN OUTPUT wt12;
RUN;

* PROC DATASETS used for changing variable names;
PROC DATASETS;
  MODIFY wtbl;
  RENAME estimate = wtbl;
  MODIFY wt12;
  RENAME estimate = wt12;
RUN;
QUIT;

* Put 2 datasets side-by-side (one-to-one merge, no BY statement);
DATA all;
  MERGE wtbl wt12;
  DROP varname;
RUN;

PROC PRINT DATA = all;
RUN;
Obs   Quantile        wtbl     wt12
  1   100% Max      279.30   271.50
  2   99%           274.15   271.50
  3   95%           246.40   239.00
  4   90%           237.40   227.00
  5   75% Q3        215.15   202.50
  6   50% Median    192.65   180.00
  7   25% Q1        165.90   153.50
  8   10%           141.50   133.00
  9   5%            137.40   130.00
 10   1%            130.25   123.00
 11   0% Min        128.50   123.00
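The split, rename, and merge steps can also be collapsed into a single DATA step by reading qwt twice with the WHERE= and RENAME= dataset options. This is an alternative sketch, not the approach used in the course program; the dataset name all2 is a hypothetical name chosen here so that the original dataset all is not overwritten.

```sas
* Sketch: one-to-one merge of two filtered, renamed copies of qwt;
* Assumes qwt was created by the ODS OUTPUT step shown earlier;
DATA all2;
  MERGE qwt (WHERE=(varname = 'wtbl') RENAME=(estimate = wtbl))
        qwt (WHERE=(varname = 'wt12') RENAME=(estimate = wt12));
  DROP varname;
RUN;

PROC PRINT DATA = all2;
RUN;
```

As with the original program, this relies on both variables having the same quantile rows in the same order, since the merge has no BY statement and simply pairs observations by position.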
ODS OUTPUT ParameterEstimates (persist=proc) = betas;

PROC REG DATA = wt;
  MODEL dbpchg = wtchg age sex;
RUN;

PROC REG DATA = wt;
  MODEL sbpchg = wtchg age sex;
RUN;

ODS OUTPUT CLOSE;

PROC PRINT DATA = betas;
RUN;
Display of Output Dataset - Report

Obs   Dependent   Variable    Estimate   StdErr   tValue   Probt
  1   dbpchg      Intercept     -0.059    6.431    -0.01    0.99
  2   dbpchg      wtchg          0.175    0.073     2.38    0.02
  3   dbpchg      age           -0.101    0.112    -0.91    0.37
  4   dbpchg      sex           -2.622    1.572    -1.67    0.10
  5   sbpchg      Intercept     -3.849   13.304    -0.29    0.77
  6   sbpchg      wtchg          0.364    0.152     2.40    0.02
  7   sbpchg      age           -0.042    0.231    -0.18    0.86
  8   sbpchg      sex           -4.118    3.253    -1.27    0.21
Display of Output Dataset Using BY Statement

PROC PRINT;
  VAR variable estimate stderr tvalue probt;
  BY dependent NOTSORTED;
  FORMAT estimate 7.3 stderr 7.3 probt pvalue5.2;

Dependent=dbpchg

Obs   Variable    Estimate   StdErr   tValue   Probt
  1   Intercept     -0.059    6.431    -0.01    0.99
  2   wtchg          0.175    0.073     2.38    0.02
  3   age           -0.101    0.112    -0.91    0.37
  4   sex           -2.622    1.572    -1.67    0.10

Dependent=sbpchg

Obs   Variable    Estimate   StdErr   tValue   Probt
  5   Intercept     -3.849   13.304    -0.29    0.77
  6   wtchg          0.364    0.152     2.40    0.02
  7   age           -0.042    0.231    -0.18    0.86
  8   sex           -4.118    3.253    -1.27    0.21
PROC RANK
• Used to divide observations into equal-size categories based on the values of a variable
• Creates a new variable containing the categories
• New variable is added to the dataset or to a new dataset
• Example: Divide weight change into 5 equal categories (quintiles)
PROC RANK SYNTAX

PROC RANK DATA = dataset OUT = outdataset GROUPS = number-of-categories;
  VAR varname;
  RANKS newvarname;

Most of the time you can set OUT to be the same dataset specified in DATA.
PROC RANK produces no printed output.
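The quintile example mentioned above can be sketched with this syntax as follows. It assumes the wt dataset from Program 14 (which contains wtchg); the output dataset wtq and the new variable qwtchg are hypothetical names chosen for illustration.

```sas
* Sketch: quintiles of weight change, written to a separate dataset;
* wtq and qwtchg are illustrative names, not from the course notes;
PROC RANK DATA = wt GROUPS = 5 OUT = wtq;
  VAR wtchg;
  RANKS qwtchg;
RUN;

* Check that the five groups (coded 0 to 4) are about equal in size;
PROC FREQ DATA = wtq;
  TABLES qwtchg;
RUN;
```

Writing to a separate dataset leaves wt unchanged, which can be convenient while experimenting with different values of GROUPS=.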
PROGRAM 15

LIBNAME class 'C:\SAS_Files';

DATA wtchol;
  SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
  wtchg = wt12 - wtbl;
  cholchg = chol12 - cholbl;
RUN;

* This PROC will add a new variable to the dataset which is the
  tertile of weight change. The new variable will be 0, 1, or 2;
PROC RANK DATA = wtchol GROUPS = 3 OUT = wtchol;
  VAR wtchg;
  RANKS twtchg;
RUN;

The RANKS statement gives the name of the new variable (twtchg).
PARTIAL LOG

8    DATA wtchol;
9    SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12);
10   wtchg = wt12 - wtbl;
11   cholchg = chol12 - cholbl;
12   RUN;

NOTE: There were 100 observations read from the data set CLASS.TOMHSP.
NOTE: The data set WORK.WTCHOL has 100 observations and 9 variables.

     PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol;
20   VAR wtchg; RANKS twtchg;
21   RUN;

NOTE: The data set WORK.WTCHOL has 100 observations and 10 variables.
PROC FREQ DATA = wtchol;
  TABLES twtchg;
RUN;

OUTPUT:

Rank for Variable wtchg

                                  Cumulative    Cumulative
twtchg    Frequency    Percent     Frequency       Percent
-----------------------------------------------------------
     0           31      33.70            31         33.70
     1           30      32.61            61         66.30
     2           31      33.70            92        100.00

Frequency Missing = 8
PROC PRINT DATA = wtchol (obs=20);
  VAR ptid wtchg twtchg;
  TITLE 'Partial Listing of Dataset wtchol with new variable added';
RUN;

Partial Listing of Dataset wtchol with new variable added

Obs   PTID       wtchg   twtchg
  1   A00083    -12.00        1
  2   A00301       .          .
  3   A00312     -9.50        1
  4   A00354    -21.00        0
  5   A00400       .          .
  6   A00504     -9.25        1
  7   A00608       .          .
  8   A00720    -18.50        0
  9   A00762     -5.25        2
 10   A00811     -6.75        1
PROC MEANS N MEAN MIN MAX MAXDEC=2;
  VAR cholchg wtchg;
  CLASS twtchg;
  TITLE 'Mean Cholesterol Change by Tertile of Weight Change';
RUN;
Mean Cholesterol Change by Tertile of Weight Change

The MEANS Procedure

Rank for
Variable     N
wtchg      Obs   Variable    N      Mean   Minimum   Maximum
--------------------------------------------------------------
0           31   cholchg    30    -13.43    -55.00     47.00
                 wtchg      31    -22.51    -36.50    -14.30
1           30   cholchg    30     -4.70    -37.00     26.00
                 wtchg      30    -10.21    -14.00     -6.80
2           31   cholchg    31     -0.74    -52.00     45.00
                 wtchg      31     -1.82     -6.50     13.00
--------------------------------------------------------------

The Minimum and Maximum of wtchg within each group show the cutpoints for the tertiles.
Could graph this data in an x-y plot (3 points).
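The x-y plot suggested above could be produced by first saving the tertile means to a dataset with the OUTPUT statement and then plotting the three points. This is a minimal sketch, assuming wtchol from Program 15 and a SAS release that includes PROC SGPLOT; the names tmeans, mean_wtchg, and mean_cholchg are hypothetical.

```sas
* Sketch: save the mean of each variable by tertile to a dataset;
* NWAY keeps only the rows for each twtchg level (no overall row);
* tmeans, mean_wtchg and mean_cholchg are illustrative names;
PROC MEANS DATA = wtchol NWAY NOPRINT;
  CLASS twtchg;
  VAR wtchg cholchg;
  OUTPUT OUT = tmeans MEAN = mean_wtchg mean_cholchg;
RUN;

* Plot mean cholesterol change against mean weight change (3 points);
PROC SGPLOT DATA = tmeans;
  SCATTER X = mean_wtchg Y = mean_cholchg;
RUN;
```

Observations with a missing value of twtchg are excluded by PROC MEANS by default, so tmeans should contain one row per tertile.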