EPIB 698E Lecture 4 Instructor: Raul Cruz-Cano raulcruz@umd.edu 10/2/2013
Creating and Redefining Variables
• You can create and redefine variables with assignment statements of the form:
  variable = expression;
Home gardener's data
• Gardeners were asked to estimate the pounds they harvested for four crops: tomatoes, zucchini, peas and grapes. Here is the data (name, then pounds of tomatoes, zucchini, peas and grapes):
  Gregor 10 2 40 0
  Molly 15 5 10 1000
  Luther 50 10 15 50
  Susan 20 0 . 20
• Tasks:
  • add a new variable group with a value of 14;
  • add a variable Type to indicate a home gardener;
  • create a new variable Zucchini_1 equal to Zucchini*10;
  • derive the total pounds of crops for each gardener;
  • derive the percentage of tomatoes for each gardener.
Home gardener's data
DATA homegarden;
   INFILE 'E:\Garden.txt';
   INPUT Name $ 1-7 Tomato Zucchini Peas grapes;
   group = 14;
   Type = 'home';
   Zucchini_1 = Zucchini * 10;
   Total = tomato + zucchini_1 + peas + grapes;
   PerTom = (Tomato / Total) * 100;
RUN;
Home gardener's data
• Check the log window: "Missing values were generated as a result of performing an operation on missing values."
• The last gardener (Susan) has a missing value for peas, so Total and PerTom, which are calculated from Peas, are set to missing for her.
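To see the missing-value behavior directly, here is a minimal sketch that compares the + operator with the SUM function. It reads the same four gardeners from inline DATALINES instead of E:\Garden.txt, which is an assumption made so the step runs anywhere. The names homegarden2, Total_plus and Total_sum are hypothetical.

```sas
/* Sketch: garden data entered inline (assumption: no external file). */
data homegarden2;
   input Name $ Tomato Zucchini Peas Grapes;
   Zucchini_1 = Zucchini * 10;
   /* + returns missing if any operand is missing (Susan's Peas) */
   Total_plus = Tomato + Zucchini_1 + Peas + Grapes;
   /* SUM ignores missing arguments and adds the rest */
   Total_sum  = SUM(Tomato, Zucchini_1, Peas, Grapes);
   PerTom     = (Tomato / Total_sum) * 100;
   datalines;
Gregor 10 2 40 0
Molly 15 5 10 1000
Luther 50 10 15 50
Susan 20 0 . 20
;
run;

proc print data=homegarden2;
run;
```

For Susan, Total_plus is missing while Total_sum is 40 and PerTom is 50. Whether a missing crop should count as zero is an analysis decision, not a SAS rule.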
SAS functions
• SAS has over 400 functions, with the general form: function-name(argument, argument, …)
• All functions must have parentheses, even if they don't require any arguments.
• Examples:
  X = INT(LOG(10));
  Mean_score = MEAN(score1, score2, score3);
• The MEAN function returns the mean of the non-missing arguments. Adding the arguments and dividing by their number behaves differently: it returns a missing value if any argument is missing.
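The difference between MEAN and "add, then divide" can be checked with a small sketch. The scores 80, missing and 90 are made-up values for illustration. TODAY() shows a function that takes no arguments but still needs its parentheses.

```sas
data means;
   score1 = 80; score2 = .; score3 = 90;    /* hypothetical scores */
   Mean_score  = MEAN(score1, score2, score3);    /* (80 + 90) / 2 = 85 */
   Manual_mean = (score1 + score2 + score3) / 3;  /* missing: score2 is missing */
   X = INT(LOG(10));            /* LOG(10) is about 2.3026, so X = 2 */
   Today_date = TODAY();        /* no arguments, parentheses still required */
   format Today_date date9.;
run;
```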
Common Functions And Operators
• Functions:
  ABS: absolute value
  EXP: exponential
  LOG: natural logarithm
  MAX and MIN: maximum and minimum
  SQRT: square root
  SUM: sum of variables, e.g. SUM(OF x1-x10, x21)
• Arithmetic operators: +, -, *, /, ** (exponentiation; not ^)
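The following sketch calls each function from the list once on small made-up values, so you can compare each result with its comment. The OF list here is written with blanks (x1-x3 x4), which is one valid way to combine a numbered range with another variable.

```sas
data funcs;
   x1 = -3; x2 = 16; x3 = 2; x4 = 5;  /* hypothetical values */
   a = ABS(x1);            /* 3 */
   b = EXP(0);             /* 1 */
   c = LOG(1);             /* natural log: 0 */
   d = MAX(x1, x2, x3);    /* 16 */
   e = MIN(x1, x2, x3);    /* -3 */
   f = SQRT(x2);           /* 4 */
   g = SUM(OF x1-x3 x4);   /* x1 through x3 plus x4: -3 + 16 + 2 + 5 = 20 */
   h = x3 ** 3;            /* exponentiation uses **: 8 */
run;
```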
Example: pumpkin carving contest data
• This data contains each contestant's name, age, type of pumpkin (carved or decorated), date of entry and the scores from 5 judges. In the raw file the values sit in fixed columns. Each score takes 4 columns, which is why 9.5 and 10.0 run together in Jose Martinez's record:
  Alicia Grossman 13 c 10-28-2003 7.8 6.5 7.2 8.0 7.9
  Matthew Lee 9 D 10-30-2003 6.5 5.9 6.8 6.0 8.1
  Elizabeth Garcia 10 C 10-29-2003 8.9 7.9 8.5 9.0 8.8
  Lori Newcombe 6 D 10-30-2003 6.7 5.6 4.9 5.2 6.1
  Jose Martinez 7 d 10-31-2003 8.9 9.510.0 9.7 9.0
  Brian Williams 11 C 10-29-2003 7.8 8.4 8.5 7.9 8.0
• We will derive the mean score using the MEAN function
• Transform the values of Type to upper case (the raw data mix c/C and d/D)
• Get the day of the month from the SAS date
Example: pumpkin carving contest data
DATA contest;
   INFILE "C:\Pumpkin.txt";
   INPUT @1 Name $16. @18 Age 2. @21 Type $1. @23 Date MMDDYY10.
         (Scr1 Scr2 Scr3 Scr4 Scr5) (4.1);
   AvgScore = MEAN(Scr1, Scr2, Scr3, Scr4, Scr5);
   DayEntered = DAY(Date);
   Type = UPCASE(Type);
RUN;
Working with SAS Dates
• A SAS date is a numeric value equal to the number of days since January 1, 1960. For example, January 1, 1960 is stored as 0, January 2, 1960 as 1, and December 31, 1959 as -1.
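A short sketch of how SAS stores dates. Date literals ('ddMONyyyy'd) and MDY build SAS date values, and DAY, MONTH and YEAR extract their parts. The date 10/28/2003 is taken from the first pumpkin-contest record. The variable names are hypothetical.

```sas
data dates;
   d0 = '01JAN1960'd;           /* day 0 */
   d1 = '02JAN1960'd;           /* 1 */
   dm = '31DEC1959'd;           /* -1: dates before 1960 are negative */
   entry = MDY(10, 28, 2003);   /* month, day, year -> SAS date value */
   entry_day   = DAY(entry);    /* 28 */
   entry_month = MONTH(entry);  /* 10 */
   entry_year  = YEAR(entry);   /* 2003 */
   format entry mmddyy10.;      /* display only; the stored value is a day count */
run;
```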
Using IF-THEN statements
• The IF-THEN statement is used for conditional processing. Example: you want to derive mean test scores for female students but not for male students, so the mean is computed only when gender = 'F'.
• Syntax: IF condition THEN action;
  e.g. IF gender = 'F' THEN mean_score = MEAN(scr1, scr2);
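Using IF-THEN statements
• Logical comparison operators (symbol or mnemonic):
  = or EQ: equal to
  ^= or ~= or NE: not equal to
  > or GT: greater than
  < or LT: less than
  >= or GE: greater than or equal to
  <= or LE: less than or equal to
  IN: equal to one of a list of values
• Note: in comparisons, a missing numeric value is treated as smaller than any non-missing number, even the most negative one.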
Using IF-THEN statements
• Example: the data contain the following information for each subject:
  Age Gender Midterm Quiz FinalExam
  21 M 80 B- 82
  20 F 90 A 93
  35 M 87 B+ 85
  48 F 80 C 76
  59 F 95 A+ 97
  15 M 88 C 93
• Task: group students by age (<20, [20, 40), [40, 60), >=60)
data conditional;
   input Age Gender $ Midterm Quiz $2. FinalExam;
   datalines;
21 M 80 B- 82
20 F 90 A 93
35 M 87 B+ 85
48 F 80 C 76
59 F 95 A+ 97
15 M 88 C 93
;
run;

data new1;
   set conditional;
   if Age < 20 then AgeGroup = 1;
   if 20 <= Age < 40 then AgeGroup = 2;
   if 40 <= Age < 60 then AgeGroup = 3;
   if Age >= 60 then AgeGroup = 4;
run;
Multiple conditions with AND and OR
• IF condition1 AND condition2 THEN action;
• e.g. IF age < 40 AND gender = 'F' THEN group = 1;
       IF age < 40 OR gender = 'F' THEN group = 2;
IF-THEN statement, multiple conditions
• Example: the data contain the following information for each subject:
  Age Gender Midterm Quiz FinalExam
  21 M 80 B- 82
  20 F 90 A 93
  35 M 87 B+ 85
  48 F 80 C 76
  59 F 95 A+ 97
  15 M 88 C 93
• Task: group students by age (<40, >=40) and gender
data new1;
   set conditional;
   if age < 40 and gender = 'F' then group = 1;
   if age >= 40 and gender = 'F' then group = 2;
   if age < 40 and gender = 'M' then group = 3;
   if age >= 40 and gender = 'M' then group = 4;
run;
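• Note: in comparisons, a missing numeric value is treated as smaller than any non-missing number.
• Example: group ages into age groups when some ages are missing:
  21 M 80 B- 82
  20 F 90 A 93
  . M 87 B+ 85
  48 F 80 C 76
  59 F 95 A+ 97
  . M 88 C 93
• With the AgeGroup program shown earlier, the two subjects with a missing Age satisfy Age < 20 and are wrongly put into AgeGroup 1.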
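One way to keep missing ages out of the youngest group is to test MISSING(Age) first. The sketch below does this and then uses the IF-THEN/ELSE form so that only one branch fires. It reads the example data with the colon modifier (Quiz :$2.), which is an assumption chosen so that grades of one or two characters are both read up to the next blank. The data set name new2 is hypothetical.

```sas
data new2;
   input Age Gender $ Midterm Quiz :$2. FinalExam;
   if missing(Age) then AgeGroup = .;   /* handle missing first */
   else if Age < 20 then AgeGroup = 1;
   else if Age < 40 then AgeGroup = 2;
   else if Age < 60 then AgeGroup = 3;
   else AgeGroup = 4;
   datalines;
21 M 80 B- 82
20 F 90 A 93
. M 87 B+ 85
48 F 80 C 76
59 F 95 A+ 97
. M 88 C 93
;
run;
```

An equivalent alternative is to write the first range as `if . < Age < 20`, which excludes missing values explicitly.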
IF-THEN statement, with multiple actions
• Example: the data contain the following information for each subject:
  Age Gender Midterm Quiz FinalExam
  21 M 80 B- 82
  20 F 90 A 93
  35 M 87 B+ 85
  48 F 80 C 76
  59 F 95 A+ 97
  15 M 88 C 93
• Task: group students by age and assign a test date based on the age group
Multiple actions with DO and END
• Syntax:
  IF condition THEN DO;
     action1;
     action2;
  END;
• e.g.
  IF age <= 20 THEN DO;
     group = 1;
     exam_date = 'Monday';
  END;
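A sketch of the full task with one DO group per age group. The age cut-offs (20 and 40) and the weekdays assigned to the later groups are hypothetical choices for illustration. The LENGTH statement matters here: without it, exam_date would take its length from the first value assigned ('Monday', 6 characters), and 'Wednesday' would be truncated.

```sas
data exams;
   input Age Gender $;
   length exam_date $ 9;          /* long enough for 'Wednesday' */
   if Age <= 20 then do;
      group = 1;
      exam_date = 'Monday';
   end;
   else if Age <= 40 then do;     /* hypothetical cut-off */
      group = 2;
      exam_date = 'Tuesday';
   end;
   else do;
      group = 3;
      exam_date = 'Wednesday';
   end;
   datalines;
21 M
20 F
35 M
48 F
59 F
15 M
;
run;
```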
IF-THEN/ELSE statements
• Syntax:
  IF condition1 THEN action1;
  ELSE IF condition2 THEN action2;
  ELSE IF condition3 THEN action3;
• IF-THEN/ELSE has two advantages over a series of IF-THEN statements:
  (1) It is more efficient: once a condition is true, the remaining ELSE IF conditions are not evaluated, which saves computing time.
  (2) The ELSE logic makes the groups mutually exclusive, so one observation cannot be put into more than one group.
IF-THEN/ELSE statements
data new1;
   set conditional;
   if Age < 20 then AgeGroup = 1;
   else if Age >= 20 and Age < 40 then AgeGroup = 2;
   else if Age >= 40 and Age < 60 then AgeGroup = 3;
   else if Age >= 60 then AgeGroup = 4;
run;
The IN operator
• To test whether a value is one of several possible choices, you can use multiple OR conditions:
  IF grade = 'A' OR grade = 'B' OR grade = 'C' THEN PASS = 'yes';
• An alternative is the IN operator. The values in the list can be separated by blanks or by commas:
  IF grade IN ('A' 'B' 'C') THEN PASS = 'yes';
  IF grade IN ('A', 'B', 'C') THEN PASS = 'yes';
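A minimal runnable version of the IN test. The three students and their grades are hypothetical. An ELSE branch is added so that failing grades get an explicit value instead of a blank. LENGTH fixes PASS at 3 characters, which is enough for both 'yes' and 'no'.

```sas
data grades;
   input Name $ grade $;          /* hypothetical students */
   length PASS $ 3;
   if grade in ('A' 'B' 'C') then PASS = 'yes';
   else PASS = 'no';
   datalines;
Ann A
Bob D
Cy C
;
run;
```

The comparison is exact. A grade such as 'B-' would not match 'B' in this list.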
The iterative DO loop
• An iterative DO loop executes a group of SAS statements multiple times.
• One form of the iterative DO statement:
  DO index-variable = start TO stop BY increment;
     SAS statements;
  END;
• If BY increment is omitted, the increment defaults to 1.
• Example: compute the total amount of money you will have if you start with $100 and invest it at a 3.75% interest rate for 3 years.
data compound;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 3;
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
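To check the loop, compare it with the compound-interest formula 100*(1 + r)**n. The sketch below computes both side by side. The variable Closed_form is hypothetical. The yearly totals come out as $103.75, $107.64 and $111.68.

```sas
data compound_check;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 3;
      Total = Total + Interest*Total;             /* loop, as on the slide */
      Closed_form = 100 * (1 + Interest)**Year;   /* 100 * (1 + r)**n */
      output;
   end;
   format Total Closed_form dollar10.2;
run;
```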
• Example: generate a table of the integers from 1 to 10, along with their squares and square roots:
data table;
   do n = 1 to 10;
      Square = n*n;
      SquareRoot = sqrt(n);
      output;
   end;
run;
• Using a DO loop to graph an equation:
data equation;
   do X = -10 to 10 by .01;
      Y = 2*X**3 - 5*X**2 + 15*X - 8;
      output;
   end;
run;
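The DATA step above only generates the points. To draw the curve you need a plotting procedure. The sketch below uses PROC SGPLOT with a SERIES statement, which joins the points with a line. This assumes SAS 9.2 or later with ODS Graphics available.

```sas
data equation;
   do X = -10 to 10 by .01;
      Y = 2*X**3 - 5*X**2 + 15*X - 8;   /* ** binds tighter than *, so 2*X**3 = 2*(X**3) */
      output;
   end;
run;

proc sgplot data=equation;
   series x=X y=Y;                       /* connect the points in X order */
run;
```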
Other forms of the iterative DO loop
• DO x = 1, 2, 5, 10;
• DO month = 'Jan', 'Feb', 'Mar';
• DO n = 1, 3, 5 TO 9 BY 2, 100 TO 200 BY 50;   (values of n: 1, 3, 5, 7, 9, 100, 150, 200)
data test;
   do gender = 'F', 'M';
      do x = 1 to 5;
         if gender = 'F' then y = 2*x;
         else if gender = 'M' then y = x;
         output;
      end;
   end;
run;
DO WHILE and DO UNTIL statements
• Instead of choosing a stopping value for an iterative DO loop, you can stop the loop when a condition becomes true (UNTIL) or keep it running while a condition is true (WHILE). The basic forms are:
  DO UNTIL (condition);
     statements;
  END;
  DO WHILE (condition);
     statements;
  END;
Difference between DO UNTIL and DO WHILE
• DO UNTIL: the loop repeats until the condition in parentheses is true. The condition is tested at the bottom of the loop, so a DO UNTIL loop always executes at least once.
• DO WHILE: the loop iterates as long as the condition following WHILE is true. The condition is tested at the top of the loop, so a DO WHILE loop does not execute even once if the condition is initially false.
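A small sketch that makes the top-versus-bottom test visible. Each loop is given a condition that is already settled before the loop starts: true for UNTIL and false for WHILE. The counters show that the UNTIL body runs once and the WHILE body never runs.

```sas
data until_vs_while;
   until_count = 0;
   do until (1 = 1);      /* true, but only tested at the bottom */
      until_count = until_count + 1;
   end;
   while_count = 0;
   do while (1 = 0);      /* false, tested at the top */
      while_count = while_count + 1;
   end;
run;
```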
• Back to the interest problem: instead of asking how much money you have after a certain number of years, you want to know how many years you need to keep your money in the bank to double it.
data double;
   Interest = .0375;
   Total = 100;
   do until (Total ge 200);
      Year + 1;   /* sum statement: Year starts at 0 and is retained; Year = Year + 1 alone would stay missing */
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
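The loop's answer can be cross-checked against the formula for the smallest n with (1 + r)**n >= 2, which is n = CEIL(LOG(2) / LOG(1 + r)). The sketch below computes both. Years_formula is a hypothetical name. At 3.75%, both give 19 years, with the balance first reaching about $201.27.

```sas
data double_check;
   Interest = .0375;
   Total = 100;
   do until (Total ge 200);
      Year + 1;
      Total = Total + Interest*Total;
   end;
   /* smallest n such that (1 + r)**n >= 2 */
   Years_formula = CEIL(LOG(2) / LOG(1 + Interest));
   format Total dollar10.2;
run;
```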
data double;
   Interest = .0375;
   Total = 100;
   do while (Total le 200);
      Year + 1;
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
A caution when using a DO UNTIL statement
• If the condition in a DO UNTIL statement is never true, you get an infinite loop, e.g. do until (Total = 200); Total jumps past 200 without ever equaling it exactly.
• To prevent an infinite loop, combine a regular iterative DO loop with an UNTIL condition. The loop then stops after at most 100 iterations:
data double;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100 until (Total = 200);
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
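Because Total never equals 200 exactly, the guarded loop above runs all 100 iterations. A sketch of the usual fix keeps the iteration cap but tests with GE instead of =, so the loop stops as soon as the balance reaches 200. The data set name double_safe is hypothetical. With these inputs it outputs 19 observations, one per year.

```sas
data double_safe;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100 until (Total ge 200);   /* cap plus a reachable condition */
      Total = Total + Interest*Total;
      output;
   end;
   format Total dollar10.2;
run;
```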
LEAVE statement
• The LEAVE statement inside a DO loop transfers control to the statement that follows the loop's END statement.
data leave_it;
   Interest = .0375;
   Total = 100;
   do Year = 1 to 100;
      Total = Total + Interest*Total;
      output;
      if Total ge 200 then leave;
   end;
   format Total dollar10.2;
run;
Simplifying programs with arrays
• A SAS array is a collection of elements (usually SAS variables) that lets you write SAS statements referencing this whole group of variables by a single name.
• Arrays are defined with the ARRAY statement:
  ARRAY name (n) variable-list;
  name: the name you give to the array
  n: the number of variables in the array
• e.g. ARRAY store (4) macys sears target costco;
  store(1) is the variable macys
  store(2) is the variable sears
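A sketch of the store array in use. The single row of numbers and the 10% increase are hypothetical. The loop runs to DIM(store), which returns the number of elements (4), so the loop bound does not need to change if variables are added to the array. The index variable i is dropped so it does not end up in the output data set.

```sas
data stores;
   input macys sears target costco;            /* hypothetical values */
   array store (4) macys sears target costco;
   do i = 1 to dim(store);                     /* DIM(store) = 4 */
      store(i) = store(i) * 1.10;              /* hypothetical 10% increase */
   end;
   drop i;
   datalines;
100 200 300 400
;
run;
```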
Simplifying programs with arrays
• A radio station is conducting a survey asking people to rate 10 songs on a scale of 1 to 5, where 1 = do not like the song and 5 = like the song.
• If a listener does not want to rate a song, a "9" is recorded to indicate a missing value.
• Here are the data: location, listener's age and the ratings for the 10 songs:
  Albany 54 4 3 5 9 9 2 1 4 4 9
  Richmond 33 5 2 4 3 9 2 9 3 3 3
  Oakland 27 1 3 2 9 9 9 3 4 2 3
  Richmond 41 4 3 5 5 5 2 9 4 5 5
  Berkeley 18 3 4 9 1 4 9 3 9 3 2
• We want to change each 9 to a missing value (.)
Simplifying programs with arrays
DATA songs;
   INFILE 'F:\radio.txt';
   INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
   ARRAY song (10) domk wj hwow simbh kt aomm libm tr filp ttr;
   DO i = 1 TO 10;
      IF song(i) = 9 THEN song(i) = .;
   END;
RUN;
Using shortcuts for lists of variable names
• SAS programs often require long lists of variable names. When a data set has many variables, shortcuts for lists of variable names are helpful.
• Numbered range list: variables that start with the same characters and end with consecutive numbers can be written as a numbered range list
• e.g. INPUT cat8 cat9 cat10 cat11;
  is equivalent to
  INPUT cat8 - cat11;
Using shortcuts for lists of variable names
• Name range list: a name range list depends on the internal order, or position, of the variables in the SAS data set. This order is determined by the order in which the variables appear in the DATA step.
• e.g. DATA new; INPUT x1 x2 y2 y3; RUN;
  The internal order is then: x1 x2 y2 y3
• The shortcut for this variable list is x1--y3 (two hyphens; a single hyphen denotes a numbered range list)
• PROC CONTENTS with the POSITION option shows the internal order of the variables
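A sketch contrasting the two list types on the slide's data set new, with one made-up row of values. PROC CONTENTS with POSITION prints the variables in creation order. In the second step, x1--y3 (name range) picks up x1, x2, y2 and y3, while y2-y3 (numbered range) picks up only y2 and y3. The names ranges, total_named and total_num are hypothetical.

```sas
data new;
   input x1 x2 y2 y3;
   datalines;
1 2 3 4
;
run;

proc contents data=new position;   /* lists variables in creation order */
run;

data ranges;
   set new;
   total_named = SUM(OF x1--y3);   /* name range: x1 x2 y2 y3 -> 10 */
   total_num   = SUM(OF y2-y3);    /* numbered range: y2 y3 -> 7 */
run;
```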
Using shortcuts for lists of variable names
DATA songs;
   INFILE 'F:\radio.txt';
   INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
   ARRAY new (10) Song1 - Song10;
   ARRAY old (10) domk wj hwow simbh kt aomm libm tr filp ttr;
   DO i = 1 TO 10;
      IF old(i) = 9 THEN new(i) = .;
      ELSE new(i) = old(i);
   END;
   AvgScore = MEAN(OF Song1 - Song10);
RUN;
Homework #3 • Due 10/8/2013