Section 3.9: RETAIN & sum statements

ralph

Presentation Transcript


  1. Section 3.9: RETAIN & sum statements
     • Variables created by INPUT or assignment statements are reset to missing at the start of each iteration of the DATA step, so we need a statement to override this. RETAIN does the job: it keeps the previous iteration's value of whatever variable you list in the RETAIN statement.
     • The sum statement (variable + expression;) is used to accumulate values of a variable. Its accumulator is retained automatically, starts at 0, and skips missing values. See the example on p. 93.
     • Now use the RETAIN and sum statements to find the maximum-cost job and the total cost of all jobs for the data in section 3.5, p. 85. Use: RETAIN maxcost; maxcost = MAX(maxcost, cost); totcost + cost;

  2. Program for the home improvement data (section 3.5, p. 85):

         DATA homeimprovements;
            * Owner is read from columns 1-7 and Description from columns 9-33;
            * Cost is read with list input after column 33;
            INPUT Owner $ 1-7 Description $ 9-33 Cost;
            IF Cost = . THEN CostGroup = 'missing';
               ELSE IF Cost < 2000 THEN CostGroup = 'low';
               ELSE IF Cost < 10000 THEN CostGroup = 'medium';
               ELSE CostGroup = 'high';
            DATALINES;
         Bob     kitchen cabinet face-lift  1253.00
         Shirley bathroom addition         11350.70
         Silvia  paint exterior                   .
         Al      backyard gazebo            3098.63
         Norm    paint interior              647.77
         Kathy   second floor addition     75362.93
            ;
         PROC PRINT DATA = homeimprovements;
            TITLE 'Home Improvement Cost Groups';
         RUN;
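One way the exercise from slide 1 might be worked out is sketched below. It reuses the home improvement data and the three statements suggested on the slide. The data set name costsummary and the title text are assumptions made for this illustration, not names from the textbook.

```sas
DATA costsummary;   /* data set name is an assumption for this sketch */
   INPUT Owner $ 1-7 Description $ 9-33 Cost;
   RETAIN MaxCost;                 /* keep MaxCost from one iteration to the next */
   MaxCost = MAX(MaxCost, Cost);   /* MAX ignores missing arguments */
   TotCost + Cost;                 /* sum statement: running total, missing Cost is skipped */
   DATALINES;
Bob     kitchen cabinet face-lift  1253.00
Shirley bathroom addition         11350.70
Silvia  paint exterior                   .
Al      backyard gazebo            3098.63
Norm    paint interior              647.77
Kathy   second floor addition     75362.93
;
RUN;

PROC PRINT DATA = costsummary;
   TITLE 'Running Maximum and Total Cost';
RUN;
```

Each printed row shows the maximum and total so far, so the overall answers are the values on the last observation. Without the RETAIN statement, MaxCost would be reset to missing at the top of every iteration and would simply equal Cost on each row.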

  3. Now let's consider arrays in SAS. There are two types, implicit and explicit; we'll look at explicit arrays only, since they are the recommended type to use:
     • The ARRAY statement defines a set of variables (either all character or all numeric) so you may process them all at one time. An explicit ARRAY statement must contain a name for the array, a number that tells how many elements there are in the array, and a list of the elements (variables) in the array.
     • Arrays are often processed in DO groups so that the same thing is done to all elements of the array.
     • See the example in section 3.10 on p. 94-95.
     • i is used as an index variable to refer to members of the array. i is incremented by 1 each time through the DO loop.
     • The array itself (its name and references such as song(i)) does not become part of the data set; it exists only for the duration of the DATA step. The variables it groups are stored as usual, and so is the index variable i.

  4. Program for the song survey data (section 3.10, p. 94-95):

         DATA songs;
            * City is read from columns 1-15;
            INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
            *ratings on a scale of 1 to 5 for each of 10 songs;
            *missing values denoted by a 9 - not usual for SAS;
            *so create an array and replace 9 by . in each case;
            ARRAY song (10) domk wj hwow simbh kt aomm libm tr filp ttr;
            DO i = 1 TO 10;
               IF song(i) = 9 THEN song(i) = .;
            END;
            DATALINES;
         Albany          54 4 3 5 9 9 2 1 4 4 9
         Richmond        33 5 2 4 3 9 2 9 3 3 3
         Oakland         27 1 3 2 9 9 9 3 4 2 3
         Richmond        41 4 3 5 5 5 2 9 4 5 5
         Berkeley        18 3 4 9 1 4 9 3 9 3 2
            ;
         PROC PRINT DATA = songs;
            TITLE 'WBRK Song Survey';
         RUN;
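As a small variation on the program above (not taken from the slides or the textbook), the loop bound can be written with the DIM function, and the index variable can be removed from the output with a DROP statement. The data set name songs_clean is an assumption for this sketch, and only two of the survey rows are used to keep it short.

```sas
DATA songs_clean;   /* data set name is an assumption for this sketch */
   INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
   ARRAY song (10) domk wj hwow simbh kt aomm libm tr filp ttr;
   DO i = 1 TO DIM(song);          /* DIM returns the number of elements in the array */
      IF song(i) = 9 THEN song(i) = .;
   END;
   DROP i;                         /* keep the loop index out of the data set */
   DATALINES;
Albany          54 4 3 5 9 9 2 1 4 4 9
Richmond        33 5 2 4 3 9 2 9 3 3 3
;
RUN;

PROC PRINT DATA = songs_clean;
   TITLE 'Song Survey Without the Index Variable';
RUN;
```

The printed data set should contain City, Age, and the ten song variables, with no column for i and no column for the array name.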

  5. There are a couple of different ways to abbreviate lists of variable names in SAS:
     • Make your variables start with the same characters and end with consecutive numbers: a1, a2, a3, ... a24. You may then abbreviate them as a1-a24.
     • If your variables are not set up as above, you may refer to variables that are consecutive in the data set with the double hyphen method. For example, after INPUT x y r ca cb cc; you could refer to y, r, ca and cb with: PROC PRINT; VAR y--cb; RUN; To find the internal order of variables in your data set, use PROC CONTENTS with the POSITION option (VARNUM in more recent SAS releases).
     • There are a few special SAS name lists: _ALL_ refers to all the variables, _CHARACTER_ refers to all the character variables, and _NUMERIC_ refers to all the numeric variables in the data set. (PUT _ALL_ and MEAN(OF _NUMERIC_) are examples of how you might use these.)
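The name lists on this slide can be tried out with a short sketch like the one below. The data set name demo, the variable AvgC, and the two data rows are made up purely for illustration; only the INPUT variable names come from the slide.

```sas
DATA demo;                         /* demo, AvgC and the data values are illustrative */
   INPUT x y r ca cb cc;
   AvgC = MEAN(OF ca -- cc);       /* name range list inside a function call */
   PUT _ALL_;                      /* writes every variable, plus _ERROR_ and _N_, to the log */
   DATALINES;
1 2 3 4 5 6
7 8 9 10 11 12
;
RUN;

PROC PRINT DATA = demo;
   VAR y -- cb;                    /* y, r, ca, cb: consecutive in the data set */
RUN;

PROC CONTENTS DATA = demo VARNUM;  /* lists variables in their internal order */
RUN;

PROC MEANS DATA = demo;
   VAR _NUMERIC_;                  /* all numeric variables */
RUN;
```

The double hyphen depends on the order in which variables were created, which is why checking that order with PROC CONTENTS is worthwhile before relying on a range such as y--cb.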

  6. Notice how these abbreviations are used in the example program on page 97:

         DATA songs;
            INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
            ARRAY new (10) Song1 - Song10;
            ARRAY old (10) domk -- ttr;
            DO i = 1 TO 10;
               IF old(i) = 9 THEN new(i) = .;
                  ELSE new(i) = old(i);
            END;
            AvgScore = MEAN(OF Song1 - Song10);
            DATALINES;
         Albany          54 4 3 5 9 9 2 1 4 4 9
         Richmond        33 5 2 4 3 9 2 9 3 3 3
         Oakland         27 1 3 2 9 9 9 3 4 2 3
         Richmond        41 4 3 5 5 5 2 9 4 5 5
         Berkeley        18 3 4 9 1 4 9 3 9 3 2
            ;
         PROC PRINT DATA = songs;
            TITLE 'WBRK Song Survey';
         RUN;

  7. Homework for Wednesday:
     • Complete your reading of the textbook through Chapter 3.
     • Look at the "oscars" dataset:
        • read the Excel data into SAS
        • print it back out
        • begin exploring this dataset using the various methods we've talked about so far: SORTing and PRINTing; MEANS for the numeric variables as appropriate; FREQ for the categorical variables, with crosstabulations as you think might be interesting
     • We'll talk about it some more on Wednesday. This data will be part of your midterm.
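A possible starting point for the homework steps is sketched below. Everything specific in it is an assumption: the file path, the XLSX format (which also assumes SAS/ACCESS Interface to PC Files is available), and the variable name Year, which is hypothetical and should be replaced by a variable that actually exists in the oscars data.

```sas
/* Hypothetical file location; replace with the real path to the Excel file */
PROC IMPORT DATAFILE = 'C:\mydata\oscars.xlsx'
            OUT = oscars
            DBMS = XLSX
            REPLACE;
RUN;

PROC PRINT DATA = oscars;          /* print the data back out */
RUN;

PROC SORT DATA = oscars;
   BY Year;                        /* Year is a hypothetical variable name */
RUN;

PROC MEANS DATA = oscars;          /* with no VAR statement, all numeric variables are summarized */
RUN;

PROC FREQ DATA = oscars;
   TABLES _CHARACTER_;             /* one-way tables for all character variables */
RUN;
```

Crosstabulations can be requested by joining two variable names with an asterisk in the TABLES statement once you know which categorical variables the data set contains.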
