Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

##### Presentation Transcript

1. Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2 • Mayer Goldberg and Roman Manevich • Ben-Gurion University

2. Today • Review top-down parsing • Recursive descent • LL(1) parsing • Start bottom-up parsing • LR(k) • SLR(k) • LALR(k)

3. Top-down parsing • Parser repeatedly faces the following problem • Given program P starting with terminal t • Non-terminal N with list of possible production rules: N → α1 … N → αk • Predict which rule should be used to derive P

4. Recursive descent parsing • Define a function for every nonterminal • Every function works as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead

5. Boolean expressions example • Grammar: E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor • Input: not ( not true or false ) • The production to apply is known from the next token • Derivation: E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) • [Figure: parse tree for not ( not true or false )]

6. Flavors of top-down parsers • Manually constructed • Recursive descent (previous lecture, review now) • Generated (this lecture) • Based on pushdown automata • Does not use recursion

7. Recursive descent parsing • Define a function for every nonterminal • Every function works as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead

8. Matching tokens • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor • match(token t) { if (current == t) current = next_token(); else error; } • Variable current holds the current input token

9. Functions for nonterminals • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor • E() { if (current ∈ {TRUE, FALSE}) /* E → LIT */ { LIT(); } else if (current == LPAREN) /* E → ( E OP E ) */ { match(LPAREN); E(); OP(); E(); match(RPAREN); } else if (current == NOT) /* E → not E */ { match(NOT); E(); } else error; } • LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; }

10. Implementation via recursion • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor • E() { if (current ∈ {TRUE, FALSE}) { LIT(); } else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); } else if (current == NOT) { match(NOT); E(); } else error; } • LIT() { if (current == TRUE) match(TRUE); else if (current == FALSE) match(FALSE); else error; } • OP() { if (current == AND) match(AND); else if (current == OR) match(OR); else if (current == XOR) match(XOR); else error; }
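
The functions E, LIT and OP above can be sketched as a small runnable program. This is only an illustration in Python, not the course's implementation: the whitespace tokenizer, the `$` end marker, the `ParseError` class and the `accepts` helper are all assumptions added here.

```python
EOF = "$"  # assumed end-of-input marker, not part of the slide's grammar


class ParseError(Exception):
    pass


class Parser:
    # Grammar: E -> LIT | ( E OP E ) | not E
    #          LIT -> true | false
    #          OP -> and | or | xor
    def __init__(self, text):
        # Assumed tokenizer: pad parentheses, then split on whitespace.
        padded = text.replace("(", " ( ").replace(")", " ) ")
        self.tokens = padded.split() + [EOF]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Consume the current token if it equals t, otherwise fail.
        if self.current != t:
            raise ParseError(f"expected {t!r}, found {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):    # E -> LIT
            self.LIT()
        elif self.current == "(":                # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":              # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError(f"unexpected {self.current!r}")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError(f"unexpected {self.current!r}")


def accepts(text):
    # True when the whole input derives from E.
    parser = Parser(text)
    try:
        parser.E()
        parser.match(EOF)
        return True
    except ParseError:
        return False
```

For instance, `accepts("not ( not true or false )")` follows the derivation from slide 5, while `accepts("( true and )")` fails inside the second call to `E`.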

11. How is prediction done? p. 189 • For simplicity, let’s assume no null production rules • See book for general case • Find out the token that can appear first in a rule – FIRST sets

12. FIRST sets • For every production rule A → α • FIRST(α) = all terminals that α can start with • Every token that can appear as first in α under some derivation for α • In our Boolean expressions example • FIRST( LIT ) = { true, false } • FIRST( ( E OP E ) ) = { ‘(’ } • FIRST( not E ) = { not } • No intersection between FIRST sets => can always pick a single rule • If the FIRST sets intersect, may need longer lookahead • LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens • LL(1) is an important and useful class

13. Computing FIRST sets • Assume no null productions A → ε • Initially, for all nonterminals A, set FIRST(A) = { t | A → tω for some ω } • Repeat the following until no changes occur: for each nonterminal A, for each production A → Bω, set FIRST(A) = FIRST(A) ∪ FIRST(B) • This is known as a fixed-point computation

14. FIRST sets computation example • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

15. 1. Initialization • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

16. 2. Iterate 1 • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

17. 2. Iterate 2 • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

18. 2. Iterate 3 – fixed-point • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant
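
The fixed-point computation on the STMT/EXPR/TERM grammar can be sketched as follows. This is a minimal Python illustration under the slides' assumption of no null productions; the dictionary encoding of the grammar and the names `GRAMMAR` and `first_sets` are choices made here, not part of the lecture.

```python
# Each nonterminal maps to its list of right-hand sides.
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"],
             ["zero?", "TERM"],
             ["not", "EXPR"],
             ["++", "id"],
             ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_sets(grammar):
    # Initialization: terminals that directly start a production of A.
    first = {a: {rhs[0] for rhs in prods if rhs[0] not in grammar}
             for a, prods in grammar.items()}
    passes = 0
    changed = True
    while changed:                      # repeat until no changes occur
        changed = False
        passes += 1
        for a, prods in grammar.items():
            for rhs in prods:
                b = rhs[0]
                # For A -> B w with B a nonterminal, add FIRST(B) to FIRST(A).
                if b in grammar and not first[b] <= first[a]:
                    first[a] |= first[b]
                    changed = True
    return first, passes


FIRST, PASSES = first_sets(GRAMMAR)
```

With this encoding the loop runs three passes: the first two grow FIRST(STMT) and FIRST(EXPR), and the third changes nothing, which is the fixed point.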

19. LL(k) grammars • A grammar is in the class LL(k) when its sentences can be derived via: • Top-down derivation • Scanning the input from left to right (L) • Producing the leftmost derivation (L) • With lookahead of k tokens (k) • For LL(1): for every two productions A → α and A → β we have FIRST(α) ∩ FIRST(β) = {} (and FIRST(A) ∩ FOLLOW(A) = {} for null productions) • A language is said to be LL(k) when it has an LL(k) grammar • What can we do if grammar is not LL(k)?

20. LL(k) parsing via pushdown automata • Pushdown automaton uses • Prediction stack • Input stream • Transition table • nonterminals × tokens → production alternative • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when current input starts with t

21. LL(k) parsing via pushdown automata • Two possible moves • Prediction • When top of stack is nonterminal N, pop N, look up table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise syntax error • Match • When top of prediction stack is a terminal T, it must be equal to next input token t. If (t == T), pop T and consume t • If (t ≠ T) syntax error • Parsing terminates when prediction stack is empty • If input is empty at that point, success. Otherwise, syntax error

22. Model of non-recursive predictive parser • [Figure: predictive parsing program connected to the stack, the parsing table, and the output]

23. Example transition table • (1) E → LIT • (2) E → ( E OP E ) • (3) E → not E • (4) LIT → true • (5) LIT → false • (6) OP → and • (7) OP → or • (8) OP → xor • [Table: rows are nonterminals, columns are input tokens, each entry says which rule should be used]
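
One way to fill such a table is to place each rule under every token in the FIRST set of its right-hand side. The sketch below does this in Python for the eight numbered rules. The helper names `first` and `build_table` are assumptions, and the simple recursive `first` is only adequate because this grammar has no null productions and no left recursion.

```python
# Rules numbered as on the slide; index 0 is unused so the numbers line up.
RULES = [
    None,
    ("E", ["LIT"]),
    ("E", ["(", "E", "OP", "E", ")"]),
    ("E", ["not", "E"]),
    ("LIT", ["true"]),
    ("LIT", ["false"]),
    ("OP", ["and"]),
    ("OP", ["or"]),
    ("OP", ["xor"]),
]
NONTERMINALS = {"E", "LIT", "OP"}


def first(symbol):
    # FIRST of a terminal is the terminal itself; for a nonterminal,
    # take the union over the first symbol of each of its rules.
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for lhs, rhs in RULES[1:]:
        if lhs == symbol:
            result |= first(rhs[0])
    return result


def build_table():
    # table[(nonterminal, token)] = number of the rule to predict.
    table = {}
    for number, (lhs, rhs) in enumerate(RULES[1:], start=1):
        for token in first(rhs[0]):
            if (lhs, token) in table:
                raise ValueError(f"LL(1) conflict at {lhs}, {token}")
            table[(lhs, token)] = number
    return table


TABLE = build_table()
```

Pairs missing from `TABLE`, such as `("E", "and")`, correspond to empty table entries, which the parser treats as syntax errors.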

24. Running parser example • Input: aacbb\$ • Grammar: A → aAb | c

25. Illegal input example • Input: abcbb\$ • Grammar: A → aAb | c
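
The two runs can be reproduced with a small table-driven parser that uses the prediction and match moves from slide 21. This is a Python sketch under assumptions: the table is written by hand for A → aAb | c, `$` is pushed below the start symbol to stand for the end marker, and the function name `ll1_parse` is invented here.

```python
# Transition table for A -> aAb | c, indexed by (nonterminal, token).
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}
NONTERMINALS = {"A"}


def ll1_parse(tokens, start="A"):
    stack = ["$", start]          # top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        t = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            # Prediction move: replace N by the alternative in table[N, t].
            rhs = TABLE.get((top, t))
            if rhs is None:
                return False      # empty table entry: syntax error
            stack.extend(reversed(rhs))
        elif top == t:
            pos += 1              # match move: consume the token
        else:
            return False          # terminal mismatch: syntax error
    # Stack is empty: succeed only if the input is fully consumed.
    return pos == len(tokens)
```

Here `ll1_parse(list("aacbb$"))` succeeds, while `ll1_parse(list("abcbb$"))` stops at the second token because the entry for (A, b) is empty.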

26. Using top-down parsing approach • Compute parsing table • If table is conflict-free then we have an LL(k) parser • If table contains conflicts investigate • If grammar is ambiguous try to disambiguate • Try using left-factoring/substitution/left-recursion elimination to remove conflicts

27. Marking “end-of-file” • Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S’ and a new production rule S’ → S \$, where \$ is not part of the set of tokens • To parse an input P with G’ we change it into P\$ • Simplifies top-down parsing with null productions and LR parsing

28. Bottom-up parsing • No problem with left recursion • Widely used in practice • Shift-reduce parsing: LR(k), SLR, LALR • All follow the same pushdown-based algorithm • Read input left-to-right producing rightmost derivation • Differ on type of “LR Items” • Parser generator CUP implements LALR

29. Some terminology • The opposite of derivation is called reduction • Let A → α be a production rule • Let βAµ be a sentential form • Derivation replaces the left-hand side of the rule in the sentential form: βAµ => βαµ • Reduction goes the other way: βαµ is reduced to βAµ • A handle is a substring that is reduced during a series of steps in a rightmost derivation

30. Rightmost derivation example • Grammar: E → E + (E) | i • Input: 1 + (2) + (3) • Rightmost derivation in reverse: 1 + (2) + (3), then E + (2) + (3), then E + (E) + (3), then E + (3), then E + (E), then E • Each non-leaf node represents a handle • [Figure: parse tree for 1 + ( 2 ) + ( 3 )]

31. LR item • N → α · β • α: already matched • β: to be matched • Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β

32. LR items • N → α · β: shift item • N → αβ ·: reduce item

33. Example • Z → expr EOF • expr → term | expr + term • term → ID | ( expr ) • Shorthand for the grammar above: Z → E \$ • E → T | E + T • T → i | ( E )

34. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Input: i + i \$ • Initial item: Z → · E \$ • Additional items: E → · T, E → · E + T, T → · i, T → · ( E ) • Why do we need these additional LR items? Where do they come from? What do they mean?

35. ε-closure • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • ε-closure({ Z → · E \$ }) = { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) } • Given a set S of LR(0) items: if P → α · Nβ is in S, then for each rule N → γ in the grammar, S must also contain N → · γ
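
The closure rule can be sketched with a worklist. In this Python illustration an LR(0) item is encoded as a tuple (left-hand side, right-hand side, dot position); the encoding and the names `GRAMMAR` and `closure` are assumptions made for the example.

```python
# Grammar: Z -> E $ ; E -> T | E + T ; T -> i | ( E )
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    # items: iterable of (lhs, rhs, dot) tuples.
    result = set(items)
    worklist = list(result)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        # If the dot stands before a nonterminal N, add N -> . gamma
        # for every rule N -> gamma of the grammar.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            n = rhs[dot]
            for gamma in GRAMMAR[n]:
                item = (n, gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


INITIAL = closure({("Z", ("E", "$"), 0)})
```

`INITIAL` holds the five items listed on the slide: the start item plus one item with the dot at the front for each rule of E and T.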

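36. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Input: i + i \$ • Remember position from which we’re trying to reduce • Items denote possible future handles • Item set: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }

37. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Input: i + i \$ • Match items with current token • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { T → i · } • Reduce item!

38. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: T + i \$ (T covers the first i) • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { E → T · } • Reduce item!

39. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E + i \$ (E covers T, which covers the first i) • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { E → T · } • Reduce item!

40. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E + i \$ • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }

41. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E + i \$ • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }, { E → E + · T, T → · i, T → · ( E ) }

42. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E + T \$ (T covers the second i) • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }, { E → E + · T, T → · i, T → · ( E ) }

43. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E + T \$ • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }, { E → E + · T, T → · i, T → · ( E ) }, { E → E + T · } • Reduce item!

44. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E \$ (E covers E + T) • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }

45. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: E \$ • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }, { Z → E \$ · } • Reduce item!

46. Example: parsing with LR items • Grammar: Z → E \$ • E → T | E + T • T → i | ( E ) • Current form: Z (Z covers E \$) • Item sets on the stack: { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) }, { Z → E · \$, E → E · + T }, { Z → E \$ · } • Reduce item!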

47. Computing item sets • Initial set • Z is the start symbol • ε-closure({ Z → · α | Z → α is in the grammar }) • Next set from a set S and the next symbol X • step(S,X) = { N → αX · β | N → α · Xβ in the item set S } • nextSet(S,X) = ε-closure(step(S,X))
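
Starting from the initial set and applying nextSet until no new sets appear yields all states of the LR(0) automaton. The following Python sketch does this for the example grammar; items are encoded as (left-hand side, right-hand side, dot position) tuples, and `next_set` and `build_automaton` are names assumed here. States are numbered in order of discovery, which need not match the q-labels used in the slides.

```python
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items):
    result = set(items)
    worklist = list(result)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def step(items, x):
    # Move the dot over X in every item that expects X next.
    return {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in items
            if dot < len(rhs) and rhs[dot] == x}


def next_set(items, x):
    return closure(step(items, x))


def build_automaton():
    start = closure({("Z", rhs, 0) for rhs in GRAMMAR["Z"]})
    states = [start]
    transitions = {}                  # (state index, symbol) -> state index
    i = 0
    while i < len(states):
        current = states[i]
        symbols = {rhs[dot] for (_, rhs, dot) in current if dot < len(rhs)}
        for x in sorted(symbols):
            target = next_set(current, x)
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


STATES, TRANSITIONS = build_automaton()
```

For this grammar the construction stops with ten item sets and fifteen transitions.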

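48. LR(0) automaton example • Shift states: q0 = { Z → · E \$, E → · T, E → · E + T, T → · i, T → · ( E ) } • q1 = { Z → E · \$, E → E · + T } • q3 = { E → E + · T, T → · i, T → · ( E ) } • q7 = { T → ( · E ), E → · T, E → · E + T, T → · i, T → · ( E ) } • q8 = { T → ( E · ), E → E · + T } • Reduce states: q2 = { Z → E \$ · } • q4 = { E → E + T · } • q5 = { T → i · } • q6 = { E → T · } • q9 = { T → ( E ) · } • Transitions: q0 on E to q1, on T to q6, on i to q5, on ‘(’ to q7 • q1 on \$ to q2, on + to q3 • q3 on T to q4, on i to q5, on ‘(’ to q7 • q7 on E to q8, on T to q6, on i to q5, on ‘(’ to q7 • q8 on ‘)’ to q9, on + to q3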

49. GOTO/ACTION tables • Empty entry – error move • [Table: GOTO table and ACTION table]

50. LR(0) parser tables • Two types of rows: • Shift row – tells which state to GOTO for current token • Reduce row – tells which rule to reduce (independent of current token); its GOTO entries are blank
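
A driver loop over such tables might look like the Python sketch below. The tables are written by hand for the grammar Z → E \$, E → T | E + T, T → i | ( E ). The state numbers are an assumption meant to follow the q0 to q9 labels of the automaton slide, and `REDUCE`, `GOTO` and `lr0_parse` are names invented for the example.

```python
# Reduce rows: state -> (left-hand side, length of right-hand side).
REDUCE = {
    2: ("Z", 2),   # Z -> E $
    4: ("E", 3),   # E -> E + T
    5: ("T", 1),   # T -> i
    6: ("E", 1),   # E -> T
    9: ("T", 3),   # T -> ( E )
}

# Shift rows: (state, symbol) -> next state. A missing entry is an error.
GOTO = {
    (0, "E"): 1, (0, "T"): 6, (0, "i"): 5, (0, "("): 7,
    (1, "$"): 2, (1, "+"): 3,
    (3, "T"): 4, (3, "i"): 5, (3, "("): 7,
    (7, "E"): 8, (7, "T"): 6, (7, "i"): 5, (7, "("): 7,
    (8, ")"): 9, (8, "+"): 3,
}


def lr0_parse(tokens):
    stack = [0]                        # stack of automaton states
    pos = 0
    while True:
        state = stack[-1]
        if state in REDUCE:
            # Reduce row: the rule does not depend on the current token.
            lhs, length = REDUCE[state]
            del stack[-length:]
            if lhs == "Z":
                return pos == len(tokens)   # accept if input is used up
            target = GOTO.get((stack[-1], lhs))
            if target is None:
                return False
            stack.append(target)
        else:
            # Shift row: the current token selects the next state.
            if pos >= len(tokens):
                return False
            target = GOTO.get((state, tokens[pos]))
            if target is None:
                return False           # empty entry: syntax error
            stack.append(target)
            pos += 1
```

Calling `lr0_parse("i + i $".split())` goes through the same shifts and reductions as the walkthrough on slides 36 to 46, and `lr0_parse("i + $".split())` fails in state 3, which has no entry for `$`.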