1 / 48

Introduction to the SAS ® System’s PROC FORMAT

Introduction to the SAS ® System’s PROC FORMAT . Ben Cochran The Bedford Group. A SAS Institute Alliance Partner bedfordgroup@nc.rr.com www.bedford-group.com. Introduction . A format is a set of instructions to the SAS system about how to WRITE or DISPLAY a stored value.

issac
Télécharger la présentation

Introduction to the SAS ® System’s PROC FORMAT

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to the SAS® System’s PROC FORMAT Ben Cochran The Bedford Group A SAS Institute Alliance Partner bedfordgroup@nc.rr.com www.bedford-group.com

  2. Introduction A format is a set of instructions to the SAS system about how to WRITE or DISPLAY a stored value. For example, the number 9876 can be written several ways: There are dozens of default formats that ship with the SAS system. Or, you can create your own with PROC FORMAT. more 2

  3. Introduction Use PROC FORMAT to: • create tables that store coded values and the definitions of the codes • reference these user-created formats when a table lookup operation is . needed. PROC FORMAT can be used to create: • value formats descriptive labels for numeric or character values • picture formats numeric templates such as 999-99-9999 for SSNs • informats directions to SAS on how to READ values User defined FORMATs and INFORMATS: • are stored in SAS catalogs • can be temporary or permanent more 3

  4. Introduction • creates user - defined formats • can create user - defined informats • does not generate output PROC FORMAT: Typical Syntax of the FORMAT Procedure : proc format options; value name options range1 = ‘label1’ range2=‘label2’ ; run ; Note: Formats can be used on a PUT or FORMAT statement, or with the PUT function. Examples  4

  5. Value Statement • single numbers - valueanswer1 =‘ Yes ’ 2=‘ No ’; • range of numbers - value gfmt low – 10 = ‘ Group 1 ’ . 11 – 20 =‘ Group 2 ’. 21 – high = ‘ Group 3 ’; • several values - value$state‘APEX’,’CARY’,’RALEIGH’=‘NC’.‘ATLANTA’,’SAVANNAH’= ‘GA’; • other -value sexfmt 1 = ‘ FEMALE ’ . 2 = ‘ MALE ’. other = ‘ ?‘ ; VALUE statement examples : example 5

  6. The VALUE Statement Create a format for monthly values : print step 6

  7. The VALUE Statement Generate a report to show the unformatted values of MONTH. Notice the unformatted values of MONTH. formatted 7

  8. The VALUE Statement Recall the program and format the values of MONTH. Notice the formatted values of MONTH.  8

  9. Examine SASHELP.PM proc freq data=sasuser.pm; run; Partial PROC FREQ output… 9

  10. Examining SASHELP.PM Partial PROC FREQ output… Note: There is also a variable REGION that has 2 values: NORTH AMERICA, and OTHER. Not all variables are appropriate to be processed with the FREQ procedure. 10

  11. PROC FREQ proc freq data = sasuser.pm ; tables region * code / nofreq; run ; Create a TWO WAY report from CODE and REGION. Notice the values of CODE and how they are formatted in this report. 11

  12. PROC FREQ Task: Use PROC FORMAT to help reformat the report so that the values of CODE ‘ fit ’ into the report. This is an example of formatting character values. proc format ; value $cdfmt ‘SECOND DAY’= ‘2nd DAY ’ ‘THIRD DAY’ = ‘3rd DAY ’ ‘NEXT WEEK’ = ‘Next Week’ ; proc freq data=sasuser.pm; format code $cdfmt. ; tables region * code / nofreq ; run ; more  12

  13. PROC FREQ Formatting Numeric Values Task: Use PROC FORMAT to format the values of a numeric variable. proc format; value incfmt LOW - 1000 = ‘Small ’ 1001 - 5000 = ‘Medium’ 5001 - HIGH = ‘Large ’ ; proc freq data=sasuser.pm; tables region * income / norow nocol nopercent ; format incomeincfmt. ; run ; 13

  14. FUZZ Factor Demonstrate the use of the FUZZ option on the VALUE statement. 14

  15. The PICTURE Statement You can also create formats with the PICTURE statement. The format (picture) names: • are valid SAS names (begin with a letter or underscore, 8 characters or . less ) • are unique (cannot be the same as a SAS format ) . Pictures: • are valid for numeric variables only • are a sequence of characters in quotes that specify how a number is to . be formatted – like a template • are created with 3 types of characters • 0 : defines positions for digits (leading zeros not printed) • 1 – 9 : defines positions for digits includes leading zeros • non – numeric : message characters (printed after numeric digits) Example: picture combin low – high = ’99-99-99’ ; more  15

  16. The PICTURE Statement Demonstrate the PICTURE statement. more  Why isn’t there a ‘$’ for the negative values of MONEY ? 16

  17. The PICTURE Statement Enhance the program to add a PREFIX to negative values. more  17

  18. The INVALUE Statement Create an INFORMAT with the INVALUE statement: more  18

  19. Creating Permanent Formats You can store a format in a permanent catalog so you won’t have to re- create a format every time a program is run. The LIBRARY= option in the PROC FORMAT statement specifies the storage location. The typical form of a PROC FORMAT statement is: proc format library =libref < .catalog > ; If the LIBRARY = option: • is missing, formats are stored in the WORK.FORMATS catalog • specifies only a libref, formats are stored in libref.FORMATS catalog • specifies libref.catalog, formats are stored in that catalog. Write access to the libref data library is required to create formats. Without the LIBRARY = option, the formats are stored temporarily. more  19

  20. Creating Permanent Formats The typical form of a PROC FORMAT step is: proc format library = libref < .catalog > ; value$char- format‘value1’ = ‘formatted - value – 1’ ‘value2’ = ‘formatted – value –2’ ‘value3’ = ‘formatted – value –3’; value num-format value = ‘formatted – value –1’ value = ‘formatted – value– 2’ value= ‘formatted – value– 3’; run; Character formats: • have names that are valid SAS names, except they begin with a ‘$’ and can . NOT end in a number • have quoted values • can only be associated with a character variable Numericformats: • have names that are valid SAS names but can NOT end in a number • can only be associated with a numeric variable more  20

  21. Creating Permanent Formats Task: Create both a Character and a Numeric format and store them permanently in the SAS_3 library: Notice the use of the keywords: other, low, and high. • other - a catch-all category • low - the lowest value of the variable with which this format will be associated • high - the highest value of the variable with which this format will be associated more  22

  22. Creating Permanent Formats Looking at the log verifies the storage location as SAS_3.FORMATS. These two formats ( $reg_fmt and amt_fmt) are stored in the FORMATS catalog of the SAS_3 library. The formats cannot be edited, so save the source code that is used to create the formats. more  23

  23. Using Permanent Formats Formats are referenced in: • FORMAT statements • PUT statements • PUT functions When a user defined format is referenced, SAS: • loads the format into memory from the catalog entry • performs a binary search on values in the table to execute a lookup • returns a single result for each lookup operation To help SAS find your user defined format quickly, use the FMTSEARCH= option. The typical form of the FMTSEARCH= option is: options fmtsearch=(item-1, item-2, item-3… item-n); item is either libref or libref.catalog example  24

  24. Using Permanent Formats Task: Use the FMTSEARCH option to tell SAS where to search for the user defined formats that are to be used in the PROC FORMAT step. Notice: • the OPTIONS statement • the FORMAT statement • the options on the TABLE statement What kind of variable is STORE? How many values does it have? What kind of variable is AMOUNT? How many values does it have? output  25

  25. Using Permanent Formats PROC FREQ Output... PUT  26

  26. Using Permanent Formats Using the PUT Function You can use the PUT function to return the formatted value of a variable for use in an expression of assignment statement. The typical form of the PUT function: . . . put (argument, format) . . . The PUT funtion: • always returns a character string • writes the value defined ( argument ) using the format specified • aligns the result depending on the type ( $ or num.) of format • requires that argument and format agree in type • can be used in a WHERE statement or subsetting IF to produce a subset . based on the formatted value. output  27

  27. Using Permanent Formats Using the PUT Function Illustrate how the PUT function works in an assignment statement. store region $ 4. $ 13. 850 Western region = put ( store, $regfmt.); The length of the variable REGION is determined by the default length of the character string returned by the format $REGFMT. In many cases, expecially if a permanent format exist, the PUT function requires less coding than IF – THEN / ELSE statements. example  28

  28. Using Permanent Formats Task: Use the PUT function to subset data, as well as creating a new variable. Note: In this example, a variable to store the values of REGION was NOT created. The logic in the WHERE statement is represented in the following two statements: region = put(store, $reg_fmt); where region= ‘Midwest’; In many cases, especially if a permanent format exist, the PUT function requires less coding than IF – THEN / ELSE statements. output  29

  29. Using Permanent Formats Task: Print the first 22 observations of the MIDWEST data set. example  30

  30. Creating Formats From SAS Data Sets You can create a format from a SAS data set. This is referred to as a CONTROL DATA SET, and it is read in to a format using the CNTLIN = option in PROC FORMAT. The typical form of the CNTLIN option is: proc format library = libref.catalog CNTLIN=SAS data set; run; The CNTLIN data set: • must contain the variables: FMTNAME, START, and LABEL • must contain the variable TYPE for character formats, unless the value for . FMTNAME begins with a ‘$’. • does not require a TYPE variable for a numeric format • can be used to create new formats, as well as edit existing ones • must be sorted by FMTNAME if multiple formats are specified. example  31

  31. Creating Formats From SAS Data Sets The STORES data set will be used as a CNTLIN data set to create a permanent format in the SAS_3.FORMATS catalog. Notice the names of the variables. example  32

  32. Creating Formats From SAS Data Sets Manipulate the STORES data set to make it suitable as a CNTLIN data set. Then write the PROC FORMAT step to create the permanent format. data stores; set sas_3.stores(rename=(store=start store_name=label)); fmtname = ‘strfmt’; type=‘C’; run; proc format library = sas_3.formats CNTLIN=stores ; run; Notice the use of the RENAME= option on the SET statement. Also note the name of the format. log  33

  33. Creating Formats From SAS Data Sets An examination of the log reveals that the $STRFMT format was created and is stored permanently in the SAS_3.FORMATS catalog. log  34

  34. Documenting Formats To obtain more information about formats, use the FMTLIB option in the PROC FORMAT statement. proc format library = sas_3 FMTLIB; run; log  35

  35. Documenting Formats SAS Output continued…. By default, documentation on all formats is displayed. To control which format gets displayed, use a SELECT statement with the format name. edit  36

  36. Maintaining Formats One way to maintain permanent formats is to use Control Data Sets with PROC FORMAT. FMT_ DATA Formats Catalog proc format library = sas_3.formats CNTLOUT = fmt_data; run;   EDITING PROCESS Formats Catalog FMT_ DATA proc format library = sas_3.formats CNTLIN = fmt_data; run;  • Use the CNTLOUT option to create an output dataset that has info on all formats, • Use PROC FSEDIT, or VIEWTABLE to add, delete or change information • Use the CNTLIN option to re-create the formats from the modified data set. example  37

  37. Maintaining Formats The Atlanta Eastside store has decided to move to Miami. Although their location has changed, their store id number remains the same: 200. Task: Change the $STRFMT format so that store 200 has a location of Miami. proc format library = sas_3.formats CNTLOUT = fmt_data; select $strfmt; run; proc print data = fmt_data(obs=7); run; example  38

  38. Maintaining Formats The first 7 observations and first 16 variables of the FMT_DATA data set. The objective is to change the value of ‘Atlanta Eastside’ to ‘Miami’ in observation 4. Partial PROC PRINT output PROC FSVIEW will be used to make these changes, but first, examine PROC CONTENTS output on the next page…. more  39

  39. Maintaining Formats PROC CONTENTS output… more  40

  40. Maintaining Formats To edit the data set, submit the following statements: proc fsview data = fmt_data; run; Next, issue the ‘ Modify Member ‘ command to open the window in Edit mode. The ‘E’ indicates we are in Edit mode. Next, move the cursor over to the LABEL column of the 4th row and type ‘ Miami ‘ over ‘ Atlanta Eastside ‘. Then, ‘end’ out of the window to save the changes. more  41

  41. Maintaining Formats Next, submit the following statements: procformatlibrary = sas_3 cntlin=fmt_data; select $strfmt; run; The following report appears in the Output window… notice line 4… more  42

  42. Advantages of Formats Advantages of using formats for table lookups: • formats can be used in PROC steps, frequently eliminating the need to . preprocess data in a DATA step. • values to be looked up can be discrete values, one of a list of values, or a range . of values. • DATA step lookups using the PUT function with a format require less CPU time . than a comparable MERGE step. • by default, a binary search is used to locate the value to be looked up. Disadvantages of using formats for table lookups: • each format can return only one value as the result of a lookup operation. • the values of only one variable are used to perform the lookup operation. • since the entire format must be loaded into memory, the size of the format is . limited by amount of available memory. • only one format can be applied to each variable using the FORMAT statement. 5.4  43

  43. V8 Enhancements The new enhancement to PROC FORMAT is the MULTILABLE option on the VALUE statement. This option allows multiple values for a given range or overlapping ranges. What this means, is that a variable’s value can be in two or more different ranges. The SASUSER.HOUSES data set is used to illustrate the new features of PROC FORMAT. Notice the number of observations. Notice the values of PRICE. 44

  44. V8 Enhancements Notice the placement of the MULTILABEL option on the VALUE statement in PROC FORMAT. Notice the use of the MLF option on the CLASS statement in PROC TABULATE. output  45

  45. V8 Enhancements How many observations in the SASUSER.HOUSES dataset? Add up the columns and compare that amount to the FREQUENCY cell. What is going on? PRINT  46

  46. V8 Enhancements Task: Illustrate how the multilabel optionworks. Create a new data set named houses with a new variable called PRICE2.Format the PRICE variable and leave PRICE2 unformatted. Use PROC PRINT to view the results. MEANS  47

  47. Version 8 Enhancements Date formats can now have users specified separators. The following are used to specify the separator : • B separates with a blank • C “ “ “ colon • D “ “ “ dash • P “ ““ period • S “ “ “ slash (default) • N indicates NO separator. • ex.mmddyyd10.  12-31-1999 more 48

  48. The End Any Questions?????????? more 49

More Related