Learn to modify existing variables and create new ones in SAS. Understand syntax for recategorizing variables and create meaningful formats for output.
Prerequisites • Recommended modules to complete before viewing this module • 1. Introduction to the NLTS2 Training Modules • 2. NLTS2 Study Overview • 3. NLTS2 Study Design and Sampling • NLTS2 Data Sources, either • 4. Parent and Youth Surveys or • 5. School Surveys, Student Assessments, and Transcripts • NLTS2 Documentation • 10. Overview • 11. Data Dictionaries • 12. Quick References
Prerequisites • Recommended modules to complete before viewing this module (cont’d) • 13. Analysis Example: Descriptive/Comparative Using Longitudinal Data • Accessing Data • 14b. Files in SAS • 15b. Frequencies in SAS
Overview • Purpose • Modifying existing variables • Creating new variables • Summary • Closing • Important information
NLTS2 restricted-use data NLTS2 data are restricted. Data used in these presentations are from a randomly selected subset of the restricted-use NLTS2 data. Results in these presentations cannot be replicated with the NLTS2 data licensed by NCES.
Purpose • Learn to • Modify an existing variable • Create a new variable • Join/combine data from different sources
Modifying existing variables • How to modify a variable. • To collapse categories, break a continuous variable into categories, or recode a variable, it is not always necessary to create a new variable in SAS. • User-assigned formats control how output prints but do not change the variable. • Syntax for categorizing an existing variable with a format PROC FORMAT ; VALUE b2catfmt low-1 = "(<=1) 1 or younger" 2-5 = "(2-5) 2 to 5 years of age" 6-10 = "(6-10) 6 to 10 years of age" 11-high = "(>=11) 11 or older" ; RUN ; PROC FREQ data = collapse ; TABLES np1B2a ; FORMAT np1B2a b2catfmt. ; RUN ; These results cannot be replicated with the full dataset; all output in these modules was generated with a random subset of the full data.
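A minimal sketch of the point that a format changes only how values print, not what is stored. The data set `demo`, the variable `age`, and the format name `agecat` are hypothetical stand-ins, and the ages are toy values rather than NLTS2 data.

```sas
/* Toy data: hypothetical ages, not NLTS2 values */
DATA demo ;
  INPUT age ;
  DATALINES ;
0
3
8
12
;
RUN ;

/* Same ranges as the slide, under a hypothetical format name */
PROC FORMAT ;
  VALUE agecat
    low-1   = "(<=1) 1 or younger"
    2-5     = "(2-5) 2 to 5 years of age"
    6-10    = "(6-10) 6 to 10 years of age"
    11-high = "(>=11) 11 or older" ;
RUN ;

/* The frequency table is grouped by the formatted categories */
PROC FREQ DATA = demo ;
  TABLES age ;
  FORMAT age agecat. ;
RUN ;

/* The FORMAT statement above applied only to that PROC FREQ step,
   so this listing still shows the stored values 0, 3, 8, and 12 */
PROC PRINT DATA = demo ;
RUN ;
```

Because the format is attached inside the procedure step, nothing in `demo` itself is altered.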
Modifying existing variables • Syntax to modify an existing variable • Create a new variable rather than permanently changing the existing variable • Create a new format so values are meaningful PROC FORMAT ; VALUE b2catfmt 1 = "(1) 1 or younger" 2 = "(2) 2 to 5 years of age" 3 = "(3) 6 to 10 years of age" 4 = "(4) 11 or older" ; • Recode the variable in a data step • This would result in a temporary change. Why? What would make it a permanent change? DATA collapse ; SET sasdb.n2w1parent ;
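One way to think about the "temporary vs. permanent" question, sketched under assumptions: the libref `mylib` and its folder path are hypothetical, and `sasdb` is assumed to be assigned already as in the earlier modules. In a default SAS session, a one-level data set name is written to the WORK library, which is cleared when the session ends.

```sas
/* Hypothetical folder; replace with a location you can write to */
LIBNAME mylib "C:\mydata" ;

/* One-level name: stored in WORK and deleted when the SAS session ends */
DATA collapse ;
  SET sasdb.n2w1parent ;
RUN ;

/* Two-level name: stored in the mylib library and kept after the session ends */
DATA mylib.collapse ;
  SET sasdb.n2w1parent ;
RUN ;
```

Either way, the source file `sasdb.n2w1parent` is only read, so the original data stay unchanged.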
Modifying existing variables • Syntax to recode an existing variable into a new variable with value and variable labels. /* create age of youth when diagnosed - with age range categories */ if missing(np1B2a) then np1B2a_Cat = np1B2a ; else if np1B2a <= 1 then np1B2a_Cat = 1 ; else if 2<=np1B2a<=5 then np1B2a_Cat = 2 ; else if 6<=np1B2a<=10 then np1B2a_Cat = 3 ; else if np1B2a > 10 then np1B2a_Cat = 4 ; FORMAT np1B2a_Cat b2catfmt. ; LABEL np1B2a_Cat = '(np1B2a_cat) Age of youth when diagnosed - categorized into ranges' ; RUN ;
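The recode above tests for missing values before anything else. A small sketch of why that ordering matters: SAS treats a numeric missing value as smaller than any number, so a test such as `age <= 1` is true for a missing age. The data set `demo` and the variables `age`, `cat_unsafe`, and `cat_safe` are hypothetical.

```sas
DATA demo ;
  INPUT age ;

  /* Unsafe: a missing age satisfies age <= 1 and lands in category 1 */
  IF age <= 1 THEN cat_unsafe = 1 ;
  ELSE cat_unsafe = 2 ;

  /* Safe: test for missing first, as on the slide */
  IF MISSING(age) THEN cat_safe = age ;
  ELSE IF age <= 1 THEN cat_safe = 1 ;
  ELSE cat_safe = 2 ;

  DATALINES ;
.
0
7
;
RUN ;

/* First row: cat_unsafe is 1 while cat_safe stays missing */
PROC PRINT DATA = demo ;
RUN ;
```

Assigning the original value in the missing branch also carries over special missing codes, if the source variable uses them.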
Modifying existing variables • Look at results • Run a frequency of the new variable • Useful to look at a crosstab of the original variable by the new variable to check how values were coded • Look at frequency distributions and crosstab of new vs. old variables • The “LIST” option on the TABLES statement will print the crosstab table more compactly. • A FORMAT statement without a format specified will strip existing formats. PROC FREQ data = collapse ; TABLES np1B2a_Cat * np1B2a / MISSPRINT LIST ; FORMAT np1B2a ; RUN ;
Modifying existing variables: Example • Modifying a variable • Use Wave 3 parent/youth interview file • Collapse np3NbrProbs into a new variable • 0-1 • 2 • 3 • 4-6 • Remember to • Label the variable. • Add value formats. • Account for missing values.
Modifying existing variables: Example • PROC FREQ with a user-defined format (no change made to np3NbrProbs)
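A possible version of the format-only approach for this exercise, offered as a sketch. The format name `nbrfmt` and the label wording are hypothetical, and the file name `sasdb.n2w3paryouth` is assumed to be the Wave 3 parent/youth interview file.

```sas
PROC FORMAT ;
  VALUE nbrfmt   /* hypothetical format name */
    0-1 = "(0-1) 0 or 1"
    2   = "(2) 2"
    3   = "(3) 3"
    4-6 = "(4-6) 4 to 6" ;
RUN ;

/* np3NbrProbs itself is not changed; only the printed grouping is */
PROC FREQ DATA = sasdb.n2w3paryouth ;
  TABLES np3NbrProbs ;
  FORMAT np3NbrProbs nbrfmt. ;
RUN ;
```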
Modifying existing variables: Example • PROC FREQ with new variable np3NbrProbs_Cat created from np3NbrProbs
Modifying existing variables: Example • Created np3NbrProbs_Cat compared with original np3NbrProbs • Stripped existing formats from np3NbrProbs with format statement • FORMAT np3NbrProbs;
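The new-variable version of the exercise might look like the following sketch. The category codes 1 through 4, the format name `nbrcat`, the label text, and the file name `sasdb.n2w3paryouth` are assumptions made for illustration.

```sas
PROC FORMAT ;
  VALUE nbrcat   /* hypothetical format name */
    1 = "(1) 0 or 1"
    2 = "(2) 2"
    3 = "(3) 3"
    4 = "(4) 4 to 6" ;
RUN ;

DATA collapse ;
  SET sasdb.n2w3paryouth ;
  /* Carry missing values over before testing any ranges */
  IF MISSING(np3NbrProbs) THEN np3NbrProbs_Cat = np3NbrProbs ;
  ELSE IF np3NbrProbs <= 1 THEN np3NbrProbs_Cat = 1 ;
  ELSE IF np3NbrProbs = 2 THEN np3NbrProbs_Cat = 2 ;
  ELSE IF np3NbrProbs = 3 THEN np3NbrProbs_Cat = 3 ;
  ELSE IF 4 <= np3NbrProbs <= 6 THEN np3NbrProbs_Cat = 4 ;
  FORMAT np3NbrProbs_Cat nbrcat. ;
  LABEL np3NbrProbs_Cat = "(np3NbrProbs_Cat) np3NbrProbs collapsed into four categories" ;
RUN ;

/* Check the recode: new variable by original, original shown unformatted */
PROC FREQ DATA = collapse ;
  TABLES np3NbrProbs_Cat * np3NbrProbs / MISSPRINT LIST ;
  FORMAT np3NbrProbs ;
RUN ;
```

Any value above 6 would be left missing in the new variable, which the crosstab would reveal.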
Creating new variables • How to create a new variable. • The values in the new variable can be the results of calculations, assignments, or logic. • A new variable can be created from an existing variable or from multiple variables, including variables from other sources and/or waves. • Variables from other sources/waves must be added to the active data file before creating the new variable.
Creating new variables • Be aware of any coding differences between the variables when combining values. • Decide what to do with missing values. • Example: Create a variable using parent interview data from Waves 1 through 4. • Has student been suspended and/or expelled in any wave?
Creating new variables • Create a format for the new variable and join the data needed PROC FORMAT ; VALUE fmta 0 = "(0) Never suspended/expelled" 1 = "(1) Suspended or expelled in any wave" 2 = "(2) Suspended or expelled every wave" ; DATA collapse ; MERGE sasdb.n2w1parent (keep=ID np1d7h) sasdb.n2w2paryouth (keep=ID np2d5d) sasdb.n2w3paryouth (keep=ID np3d5d) sasdb.n2w4paryouth (keep=ID np4d5d) ; BY ID ;
Creating new variables • Syntax If np1D7h>=0 and np2D5d>=0 and np3D5d>=0 and np4D5d>=0 then do ; if np1D7h=1 and np2D5d=1 and np3D5d=1 and np4D5d=1 then np4D5d_ever = 2 ; else if np1D7h=1 or np2D5d=1 or np3D5d=1 or np4D5d=1 then np4D5d_ever = 1 ; else np4D5d_ever = 0 ; end ; • Code will result in a variable that • Requires a value for every wave • Is 0 if never suspended/expelled • Is 1 if suspended/expelled in any wave • Is 2 if suspended/expelled in every wave.
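The logic above can be tried on toy data before running it on the merged files. In this sketch the data set `demo` and the variables `w1` through `w4` and `ever` are hypothetical stand-ins for the four wave variables and the new variable, and 0/1 coding is assumed in every wave.

```sas
DATA demo ;
  INPUT ID w1 w2 w3 w4 ;

  /* Only assign a value when every wave has a nonmissing response */
  IF w1 >= 0 AND w2 >= 0 AND w3 >= 0 AND w4 >= 0 THEN DO ;
    IF w1 = 1 AND w2 = 1 AND w3 = 1 AND w4 = 1 THEN ever = 2 ;
    ELSE IF w1 = 1 OR w2 = 1 OR w3 = 1 OR w4 = 1 THEN ever = 1 ;
    ELSE ever = 0 ;
  END ;

  DATALINES ;
1 0 0 0 0
2 0 1 0 0
3 1 1 1 1
4 1 . 1 1
;
RUN ;

/* ID 1: ever = 0, ID 2: ever = 1, ID 3: ever = 2,
   ID 4: ever is missing because wave 2 has no value */
PROC PRINT DATA = demo ;
RUN ;
```

The last row shows the cost of requiring a value for every wave: a youth with one missing interview is dropped from the new variable even though the other waves report a suspension or expulsion.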
Creating new variables: Example • Creating a new variable • Use the Wave 4 parent/youth interview file. • Bring in np1F7 from Wave 1, np2P8_J4 from Wave 2, and np3P8_J4 from Wave 3 interview files. • Create a new variable np4P8_J4_ever (ever done volunteer or community service). • Initialize value to “0” if any value in np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is “0.” • Reassign to “1” if any value in np1F7, np2P8_J4, np3P8_J4, or np4P8_J4 is “1.”
Creating new variables: Example • Creating a new variable (cont’d) • Assign a variable label and value labels. • Run a frequency of np4P8_J4_ever. • Run a crosstabulation of np4P8_J4_ever by np4P8_J4.
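One way the exercise steps could be written, as a sketch rather than the module's own solution. The output data set name `volunteer`, the format name `everfmt`, the value labels, and the `IN=` flag `inW4` are hypothetical; keeping only Wave 4 respondents is an assumption based on the instruction to use the Wave 4 file. The files are assumed to be sorted by ID.

```sas
PROC FORMAT ;
  VALUE everfmt   /* hypothetical format name */
    0 = "(0) No"
    1 = "(1) Yes" ;
RUN ;

DATA volunteer ;   /* hypothetical output data set name */
  MERGE sasdb.n2w1parent   (KEEP = ID np1F7)
        sasdb.n2w2paryouth (KEEP = ID np2P8_J4)
        sasdb.n2w3paryouth (KEEP = ID np3P8_J4)
        sasdb.n2w4paryouth (KEEP = ID np4P8_J4 IN = inW4) ;
  BY ID ;
  IF inW4 ;   /* assumption: keep only youth present in the Wave 4 file */

  /* Initialize to 0 if any wave has a 0 */
  IF np1F7 = 0 OR np2P8_J4 = 0 OR np3P8_J4 = 0 OR np4P8_J4 = 0
    THEN np4P8_J4_ever = 0 ;
  /* Reassign to 1 if any wave has a 1 */
  IF np1F7 = 1 OR np2P8_J4 = 1 OR np3P8_J4 = 1 OR np4P8_J4 = 1
    THEN np4P8_J4_ever = 1 ;

  FORMAT np4P8_J4_ever everfmt. ;
  LABEL np4P8_J4_ever = "(np4P8_J4_ever) Ever done volunteer or community service" ;
RUN ;

PROC FREQ DATA = volunteer ;
  TABLES np4P8_J4_ever ;
  TABLES np4P8_J4_ever * np4P8_J4 / MISSPRINT LIST ;
  FORMAT np4P8_J4 ;   /* strip the existing format to see raw codes */
RUN ;
```

Unlike the suspension example, this logic does not require a value in every wave, so the new variable is missing only when all four source variables are missing or hold other codes.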
Summary • Be aware of differences in coding between similar variables when building composite variables. • Missing values must be considered. • Know how missing values are being coded, particularly when using more than one variable to create another. • Joined data are more likely to have missing values. • Weights • Generally, the analysis weight would be the weight from the smallest sample when combining data. • When filling in values for a variable in an active file with values from another, it is OK to use the weight in the active file.
Summary Know the values, mind the missing, and watch your weights!
Closing • Topics discussed in this module • Modifying existing variables • Creating new variables • Summary • Next module: • 18b. PROC SURVEY Procedures in SAS
Important information • NLTS2 website contains reports, data tables, and other project-related information http://nlts2.org/ • Information about obtaining the NLTS2 database and documentation can be found on the NCES website http://nces.ed.gov/statprog/rudman/ • General information about restricted data licenses can be found on the NCES website http://nces.ed.gov/statprog/instruct.asp • E-mail address: nlts2@sri.com