“LAG with a WHERE” and other DATA Step Stories

# “LAG with a WHERE” and other DATA Step Stories

Télécharger la présentation

## “LAG with a WHERE” and other DATA Step Stories

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. “LAGwith a WHERE”and other DATA Step Stories Neil Howard A

2. Table of Contents • Chapter 1“LAG with a WHERE” • Chapter 2“A DIFferent LAG” • Chapter 3“To LAG or to LEAD” • Chapter 4“When RETAIN Doesn’t Retain” • Chapter 5“Don’t Order My Variables Around” • Chapter 6“The Case of the Missing Values”

3. Chapter One LAG with a WHERE

4. “LAG with a WHERE” • an interesting fairy tale • must first understand: • LAG function • WHERE statement • implications of conditional execution

5. LAG function • Syntax: • nspecifies number of lagged values • argument is numeric or character LAG<n>(argument)

6. LAG function • LAG functions return values from a queue • A LAGn function stores a value in a queue and returns a value stored previously in that queue • Each occurrence of a LAGnfunction generates its own queue • n is the length of the queue

7. LAG function • LAG function is executable • LAG function can be conditionally executed • NOTE: storing and returning values from the queue occurs only when the function is executed

8. SIMPLE LAG data new; input x @@; lag1=lag1(x); lag2=lag2(x); cards; 1 2 3 4 5 6 ;

9. (Note initialization to missing) X LAG1 LAG2 1 . . 2 1 . 3 2 1 4 3 2 5 4 3 6 5 4

10. CONDITIONAL LAG data new; input a b @@; LAGa = LAG(a); if b=2 then LAGb=LAG(a); cards; 1 1 2 1 3 2 4 1 5 2 6 1 ;

11. A B LAGA LAGB 1 1 . . 2 1 1 . 3 2 2 . 4 1 3 . 5 2 4 3 6 1 5 .

12. Every other lagged value ? data new; input x @@; * conditional; if mod(x,2)=0 then condLAG1 = lag(x); LAGx=lag(x); * unconditional; if mod(x,2)=0 then condLAG2 = LAGx; cards; 1 2 3 4 5 6 7 8 ;

13. right answer X LAGx condLAG1 condLAG21 . . . 2 1 . 1 3 2 . . 4 3 2 3 5 4 . . 6 5 4 5 7 6 . . 8 7 6 7

14. WHERE statement • Selects observations before they’re brought into the LPDV • After data set options applied • Before any other data step statements executed, including SET, BY, etc. • Functions differently with BY and first. and last. • Only works w/ SAS data (not raw data)

15. Given this data: VISIT WEIGHT 01JAN2003 88 02JAN2003 22 03JAN2003 154 04JAN2003 21 05JAN2003 112 CUTOFF

16. WHERE Subsetting IF data w ; set q ; lagwgt = lag(weight) ; where visit>"01jan2003"d ; run ; data w ; set q ; lagwgt = lag(weight) ; if visit > "01jan2003"d ; run ; • DIFFERENCE? • WHERE will not pick up first lagged value • subsetting IF will…

17. Output from WHERE VISIT WEIGHT LAGWGT 02JAN2003 22 . 03JAN2003 154 22 04JAN2003 21 154 05JAN2003 112 21

18. Output from subsetting IF VISIT WEIGHT LAGWGT 02JAN2003 22 88 03JAN2003 154 22 04JAN2003 21 154 05JAN2003 112 21

19. Chapter Two A DIFferent LAG

20. “A DIFferent LAG” • DIF function • Syntax: • n specifies number of lags • argument is numeric DIF<n>(argument)

21. DIF function • DIF function returns the first difference between the argument and its nth lag. • Defined as: DIF(X) = X - LAGn(X) ;

22. DIF function • Same storing/returning from LAGn queues apply • Same caveats for conditional execution

23. data new; input x @@; lagx = lag(x); difx = dif(x); cards; 1 2 8 4 3 9 7 ;

24. x lagx difx1 . . 2 - 1 = 1 8 2 6 4 8 -4 3 4 -1 9 3 6 7 9 -2

25. Chapter Three To LAG or to LEAD

26. Is there a LEAD function? • No LEAD function or negative LAG • Several solutions at: • www.sconsig.com • Including: • Sort in descending order (reverse) • …then use the LAG function

27. Most elegant solution: • MERGE the data set with itself • Read the data set twice • Using a 1:1 MERGE • No BY statement • Using firstobs=2

28. data lagged ; merge master ( keep = var ) master ( firstobs = 2 rename = (var =nextvar ) ) ; **** no BY statement ; run;

29. Results of merge

30. Chapter Four When RETAIN Doesn’t Retain

31. Retained Variables • all SAS special variables, e.g. • _N_ • _ERROR_ • all vars in RETAIN statement • all vars from SET or MERGE • accumulator vars in SUM stmt

32. Variables Not Retained • Variables from INPUT statement • User-defined variables/ vars created in DATA step • UNLESS……what?

33. concatenation data A ; input id \$ site \$; cards; 10212 00 10213 00 ; dataB; input id \$; cards; 02001 03005 06900 ; data c; setA B; if missing(site) then site = substr(id,1,2); run;

34. ? ?

35. Solution data C; setA B(in=inb); if inb then site = substr(id,1,2); run; test that the observation has come from B and only then extract the site value....

36. !

37. Chapter Five Don’t Order My Variables Around “the variable order is not always declared where it seems to occur…” Ron Fehd

38. Question posed: How do I reorder the variables in my SAS data set?

39. “Don’t Order My Variables Around” • WHY? • exporting / export wizard • SAS Viewer end users • manipulate groups/lists of vars (age - - diag) • with PUT or ARRAY • what else?

40. “Don’t Order My Variables Around” • storage: • in LPDV • in SAS data set • presentation layer

41. My question to you: What forces the order of the variables in a SAS data set in the first place? The order in which they are seen by the compiler when the data set is created.

42. “Don’t Order My Variables Around” • RETAIN statement • (ATTRIB statement) • (LENGTH statement) • (PROC TRANSPOSE) • ??????

43. “Don’t Order My Variables Around” • Why RETAIN? • retain functionality implicit for vars coming from SET or MERGE • Nothing you can mess up (attributes, etc.)!

44. Original CONTENTS PROCEDURE -----Variables Ordered by Position----- # Variable Type Len Pos 1 NAME Char 8 0 2 SEX Char 8 8 3 AGE Num 8 16 4 ID Num 8 24 5 RX_GRP Num 8 32

45. Original NAME SEX AGE ID RX_GRP John M 35 101 2 Dan M 53 206 1 Howard M 45 321 3

46. data new; retain id rx_grp name sex age; *** 1st reference to compiler; set master; run;

47. Reordered CONTENTS PROCEDURE -----Variables Ordered by Position----- # Variable Type Len Pos 1 ID Num 8 0 2 RX_GRP Num 8 8 3 NAME Char 8 16 4 SEX Char 8 24 5 AGE Num 8 32

48. Reordered ID RX_GRP NAME SEX AGE 101 2 John M 35 206 1 Dan M 53 321 3 Howard M 45

49. Chapter Six The Case of the Missing Values

50. “How do MISSINGs compare?” • QUESTION: If A > B then<perform action>; If either A or B is missing, isn’t the statement just ignored? What if both are missing?