
Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2

Winter 2012-2013 Compiler Principles Syntax Analysis (Parsing) – Part 2. Mayer Goldberg and Roman Manevich, Ben-Gurion University. Today: review of top-down parsing (recursive descent, LL(1) parsing) and the start of bottom-up parsing (LR(k), SLR(k), LALR(k)).


Presentation Transcript


  1. Winter 2012-2013 Compiler Principles: Syntax Analysis (Parsing) – Part 2 • Mayer Goldberg and Roman Manevich • Ben-Gurion University

  2. Today • Review top-down parsing • Recursive descent • LL(1) parsing • Start bottom-up parsing • LR(k) • SLR(k) • LALR(k)

  3. Top-down parsing • Parser repeatedly faces the following problem • Given program P starting with terminal t • Non-terminal N with list of possible production rules: N → α1 … N → αk • Predict which rule should be used to derive P

  4. Recursive descent parsing • Define a function for every nonterminal • Every function works as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead

  5. Boolean expressions example • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor • Input: not ( not true or false ) • Production to apply is known from the next token • Derivation: E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) • [Figure: parse tree for the input]

  6. Flavors of top-down parsers • Manually constructed • Recursive descent (previous lecture, review now) • Generated (this lecture) • Based on pushdown automata • Does not use recursion

  7. Recursive descent parsing • Define a function for every nonterminal • Every function works as follows • Find applicable production rule • Terminal function checks match with next input token • Nonterminal function calls (recursively) other functions • If there are several applicable productions for a nonterminal, use lookahead

  8. Matching tokens • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor • Variable current holds the current input token
     match(token t) {
       if (current == t)
         current = next_token();
       else
         error;
     }

  9. Functions for nonterminals • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor
     E() {
       if (current ∈ {TRUE, FALSE}) {        // E → LIT
         LIT();
       } else if (current == LPAREN) {        // E → ( E OP E )
         match(LPAREN); E(); OP(); E(); match(RPAREN);
       } else if (current == NOT) {           // E → not E
         match(NOT); E();
       } else {
         error;
       }
     }
     LIT() {
       if (current == TRUE) match(TRUE);
       else if (current == FALSE) match(FALSE);
       else error;
     }

  10. Implementation via recursion • E → LIT | ( E OP E ) | not E • LIT → true | false • OP → and | or | xor
     E() {
       if (current ∈ {TRUE, FALSE}) {
         LIT();
       } else if (current == LPAREN) {
         match(LPAREN); E(); OP(); E(); match(RPAREN);
       } else if (current == NOT) {
         match(NOT); E();
       } else {
         error;
       }
     }
     LIT() {
       if (current == TRUE) match(TRUE);
       else if (current == FALSE) match(FALSE);
       else error;
     }
     OP() {
       if (current == AND) match(AND);
       else if (current == OR) match(OR);
       else if (current == XOR) match(XOR);
       else error;
     }
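The functions E, LIT and OP above can be turned into a small runnable recognizer. The following is a minimal sketch in Python, not the lecture's own code: the whitespace-based tokenizer, the `$` end marker, and the names `Parser`, `ParseError` and `accepts` are assumptions made for illustration.

```python
# Grammar from the slides:
#   E   -> LIT | ( E OP E ) | not E
#   LIT -> true | false
#   OP  -> and | or | xor
END = "$"  # hypothetical end-of-input marker (not in the slides' pseudocode)


class ParseError(Exception):
    """Raised where the slides' pseudocode says `error`."""


class Parser:
    def __init__(self, text):
        # Assumption: tokens are separated by whitespace.
        self.tokens = text.split() + [END]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume the current token if it is the expected one.
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, got {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):   # E -> LIT
            self.LIT()
        elif self.current == "(":               # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":             # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected token {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError(f"expected a literal, got {self.current!r}")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError(f"expected an operator, got {self.current!r}")


def accepts(text):
    """Return True if the whole input is derivable from E."""
    parser = Parser(text)
    try:
        parser.E()
        parser.match(END)  # no trailing tokens allowed
        return True
    except ParseError:
        return False
```

Calling `accepts("not ( not true or false )")` follows the same sequence of rule choices as the derivation on the Boolean expressions slide, because each branch is selected by the current token alone.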

  11. How is prediction done? p. 189 • For simplicity, let’s assume no null production rules • See book for general case • Find out the token that can appear first in a rule – FIRST sets

  12. FIRST sets • For every production rule A → α • FIRST(α) = all terminals that α can start with • Every token that can appear as first in α under some derivation for α • In our Boolean expressions example • FIRST( LIT ) = { true, false } • FIRST( ( E OP E ) ) = { '(' } • FIRST( not E ) = { not } • No intersection between FIRST sets => can always pick a single rule • If the FIRST sets intersect, may need longer lookahead • LL(k) = class of grammars in which production rule can be determined using a lookahead of k tokens • LL(1) is an important and useful class

  13. Computing FIRST sets • Assume no null productions A → ε • Initially, for all nonterminals A, set FIRST(A) = { t | A → tω for some ω } • Repeat the following until no changes occur: for each nonterminal A, for each production A → Bω, set FIRST(A) = FIRST(A) ∪ FIRST(B) • This is known as a fixed-point computation

  14. FIRST sets computation example • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

  15. 1. Initialization • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

  16. 2. Iterate 1 • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

  17. 2. Iterate 2 • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant

  18. 2. Iterate 3 – fixed-point • STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; • EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id • TERM → id | constant
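The fixed-point computation can be tried out on the STMT/EXPR/TERM grammar. Below is a minimal Python sketch under the same assumption as the slides (no null productions); the dictionary encoding of the grammar and the name `first_sets` are illustrative choices, not part of the lecture.

```python
def first_sets(grammar):
    """Compute FIRST(A) for every nonterminal A.

    `grammar` maps each nonterminal to a list of productions, each a list
    of symbols. Assumes there are no null productions.
    """
    nonterminals = set(grammar)
    # Initialization: terminals that directly start some production of A.
    first = {
        a: {rhs[0] for rhs in rules if rhs[0] not in nonterminals}
        for a, rules in grammar.items()
    }
    # Iterate until no FIRST set changes (the fixed point).
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                b = rhs[0]
                if b in nonterminals and not first[b] <= first[a]:
                    first[a] |= first[b]
                    changed = True
    return first


GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"], ["zero?", "TERM"], ["not", "EXPR"],
             ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}

FIRST = first_sets(GRAMMAR)
```

At the fixed point, FIRST(TERM) = { id, constant }, FIRST(EXPR) adds zero?, not, ++ and -- to that, and FIRST(STMT) adds if and while to FIRST(EXPR). The number of passes needed depends on the order in which productions are visited, so it may differ from the three iterations on the slides.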

  19. LL(k) grammars • A grammar is in the class LL(k) when its sentences can be parsed via: • Top-down derivation • Scanning the input from left to right (L) • Producing the leftmost derivation (L) • With lookahead of k tokens (k) • For k = 1: for every two productions A → α and A → β we have FIRST(α) ∩ FIRST(β) = {} (and FIRST(A) ∩ FOLLOW(A) = {} for null productions) • A language is said to be LL(k) when it has an LL(k) grammar • What can we do if grammar is not LL(k)?

  20. LL(k) parsing via pushdown automata • Pushdown automaton uses • Prediction stack • Input stream • Transition table • nonterminals × tokens → production alternative • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when current input starts with t

  21. LL(k) parsing via pushdown automata • Two possible moves • Prediction • When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error • Match • When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t • If (t ≠ T) syntax error • Parsing terminates when prediction stack is empty • If input is empty at that point, success. Otherwise, syntax error

  22. Model of non-recursive predictive parser • [Figure: components are the predictive parsing program, the stack, the parsing table, and the output]

  23. Example transition table • (1) E → LIT • (2) E → ( E OP E ) • (3) E → not E • (4) LIT → true • (5) LIT → false • (6) OP → and • (7) OP → or • (8) OP → xor • [Table: rows are nonterminals, columns are input tokens, each entry says which rule should be used]
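The entries of such a table follow from the FIRST sets: rule number n for nonterminal N goes into entry (N, t) for every token t that the right-hand side can start with. The sketch below assumes no null productions and hard-codes the FIRST sets of the nonterminals; `build_table` and the tuple encoding of rules are hypothetical names chosen for this example.

```python
# Numbered rules (1)..(8) of the Boolean-expression grammar.
RULES = [
    ("E", ["LIT"]),                       # (1)
    ("E", ["(", "E", "OP", "E", ")"]),    # (2)
    ("E", ["not", "E"]),                  # (3)
    ("LIT", ["true"]),                    # (4)
    ("LIT", ["false"]),                   # (5)
    ("OP", ["and"]),                      # (6)
    ("OP", ["or"]),                       # (7)
    ("OP", ["xor"]),                      # (8)
]

# FIRST sets of the nonterminals, written out by hand for this grammar.
FIRST = {
    "E": {"true", "false", "(", "not"},
    "LIT": {"true", "false"},
    "OP": {"and", "or", "xor"},
}


def first_of(symbol):
    # A terminal starts only with itself.
    return FIRST.get(symbol, {symbol})


def build_table(rules):
    """Map (nonterminal, token) to a rule number; raise on a conflict."""
    table = {}
    for number, (lhs, rhs) in enumerate(rules, start=1):
        # Without null productions, FIRST(rhs) is FIRST of its first symbol.
        for token in first_of(rhs[0]):
            key = (lhs, token)
            if key in table:
                raise ValueError(
                    f"conflict at {key}: rules {table[key]} and {number}")
            table[key] = number
    return table


TABLE = build_table(RULES)
```

For this grammar the construction finishes without raising, so every (nonterminal, token) pair selects at most one rule; missing keys correspond to the empty (error) entries of the table.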

  24. Running parser example • Input: aacbb$ • Grammar: A → aAb | c

  25. Illegal input example • Input: abcbb$ • Grammar: A → aAb | c
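The prediction and match moves can be replayed on the two inputs above. This is a minimal Python sketch of a table-driven parser for A → aAb | c; the table is written by hand, and the names `TABLE`, `NONTERMINALS` and `ll1_parse` are assumptions for illustration.

```python
# Transition table for A -> a A b | c: (nonterminal, token) -> right-hand side.
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}
NONTERMINALS = {"A"}


def ll1_parse(tokens, start="A"):
    """Return True if `tokens` (given without the trailing $) is accepted."""
    stack = [start]  # prediction stack; the top is the last element
    pos = 0
    while stack:
        top = stack.pop()
        token = tokens[pos] if pos < len(tokens) else "$"
        if top in NONTERMINALS:
            # Prediction move: replace the nonterminal by the chosen alternative.
            rhs = TABLE.get((top, token))
            if rhs is None:
                return False  # empty table entry: syntax error
            stack.extend(reversed(rhs))
        elif top == token:
            # Match move: consume the input token.
            pos += 1
        else:
            return False  # terminal mismatch: syntax error
    # The prediction stack is empty; succeed only if the input is used up too.
    return pos == len(tokens)
```

On `list("aacbb")` the parser predicts A → aAb twice and then A → c, and ends with both stack and input empty. On `list("abcbb")` it fails at the second token, because the entry for (A, b) is empty.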

  26. Using top-down parsing approach • Compute parsing table • If table is conflict-free then we have an LL(k) parser • If table contains conflicts investigate • If grammar is ambiguous try to disambiguate • Try using left-factoring/substitution/left-recursion elimination to remove conflicts
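As one concrete case of conflict removal, immediate left recursion A → Aα | β can be rewritten as A → βA', A' → αA' | ε. The sketch below shows this standard textbook transformation, which the slides name but do not spell out; the function name and the convention that an empty list stands for the null production are assumptions.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite  A -> A a1 | ... | b1 | ...  as
         A  -> b1 A' | ...
         A' -> a1 A' | ... | (null production)

    Productions are lists of symbols; [] stands for the null production.
    Only immediate (direct) left recursion is handled.
    """
    recursive = [rhs[1:] for rhs in productions
                 if rhs and rhs[0] == nonterminal]
    others = [rhs for rhs in productions
              if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}  # nothing to do
    fresh = nonterminal + "'"  # assumes this name is not already in use
    return {
        nonterminal: [rhs + [fresh] for rhs in others],
        fresh: [tail + [fresh] for tail in recursive] + [[]],
    }


# E -> T | E + T   becomes   E -> T E',  E' -> + T E' | (null)
RESULT = eliminate_immediate_left_recursion("E", [["T"], ["E", "+", "T"]])
```

The rewritten grammar contains a null production, so building its parsing table needs FOLLOW sets in addition to FIRST sets.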

  27. Marking “end-of-file” • Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G’ with a new start non-terminal S’ and a new production rule S’ → S $, where $ is not part of the set of tokens • To parse an input P with G’ we change it into P$ • Simplifies top-down parsing with null productions and LR parsing

  28. Bottom-up parsing • No problem with left recursion • Widely used in practice • Shift-reduce parsing: LR(k), SLR, LALR • All follow the same pushdown-based algorithm • Read input left-to-right, producing a rightmost derivation in reverse • Differ on type of “LR Items” • Parser generator CUP implements LALR

  29. Some terminology • The opposite of derivation is called reduction • Let A → α be a production rule • Let βAµ be a sentential form • A derivation step replaces the left-hand side of the rule by its right-hand side: βAµ => βαµ; a reduction step goes the other way, from βαµ back to βAµ • A handle is a substring that is reduced during a series of steps in a rightmost derivation

  30. Rightmost derivation example • Grammar: E → E + ( E ) | i • Input: 1 + (2) + (3) • Rightmost derivation in reverse: 1 + (2) + (3), E + (2) + (3), E + (E) + (3), E + (3), E + (E), E • Each non-leaf node represents a handle • [Figure: parse tree for 1 + (2) + (3)]

  31. LR item • N → α·β • α: already matched • β: to be matched against the input • Hypothesis about αβ being a possible handle: so far we’ve matched α, expecting to see β

  32. LR items • N → α·β: shift item • N → αβ·: reduce item

  33. Example • Z → expr EOF • expr → term | expr + term • term → ID | ( expr ) • Shorthand for the grammar above: Z → E $ • E → T | E + T • T → i | ( E )

  34. Example: parsing with LR items • Grammar: Z → E $ • E → T | E + T • T → i | ( E ) • Input: i + i $ • Item set: Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) • Why do we need these additional LR items? Where do they come from? What do they mean?

  35. ε-closure • Grammar: Z → E $ • E → T | E + T • T → i | ( E ) • Given a set S of LR(0) items: if P → α·Nβ is in S, then for each rule N → γ in the grammar, S must also contain N → ·γ • Example: ε-closure({ Z → ·E $ }) = { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }

  36. Example: parsing with LR items • Grammar: Z → E $ • E → T | E + T • T → i | ( E ) • Input: i + i $ • Remember position from which we’re trying to reduce • Items denote possible future handles • Item set: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }

  37. Example: parsing with LR items • Input: i + i $ • Match items with current token • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, then after i: { T → i· } • Reduce item!

  38. Example: parsing with LR items • Sentential form: T + i $ (i reduced to T) • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, then after T: { E → T· } • Reduce item!

  39. Example: parsing with LR items • Sentential form: E + i $ (T reduced to E by the reduce item E → T·) • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, { E → T· }

  40. Example: parsing with LR items • Sentential form: E + i $ • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, then after E: { Z → E·$, E → E·+ T }

  41. Example: parsing with LR items • Sentential form: E + i $ • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, then after E: { Z → E·$, E → E·+ T }, then after +: { E → E +·T, T → ·i, T → ·( E ) }

  42. Example: parsing with LR items • Sentential form: E + T $ (i reduced to T) • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, { Z → E·$, E → E·+ T }, { E → E +·T, T → ·i, T → ·( E ) }

  43. Example: parsing with LR items • Sentential form: E + T $ • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, { Z → E·$, E → E·+ T }, { E → E +·T, T → ·i, T → ·( E ) }, then after T: { E → E + T· } • Reduce item!

  44. Example: parsing with LR items • Sentential form: E $ (E + T reduced to E) • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, then after E: { Z → E·$, E → E·+ T }

  45. Example: parsing with LR items • Sentential form: E $ • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, { Z → E·$, E → E·+ T }, then after $: { Z → E $· } • Reduce item!

  46. Example: parsing with LR items • Sentential form: Z (E $ reduced to Z by the reduce item Z → E $·) • Item sets: { Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) }, { Z → E·$, E → E·+ T }, { Z → E $· }

  47. Computing item sets • Initial set • Z is the start symbol • ε-closure({ Z → ·α | Z → α is in the grammar }) • Next set from a set S and the next symbol X • step(S, X) = { N → αX·β | N → α·Xβ is in the item set S } • nextSet(S, X) = ε-closure(step(S, X))
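The definitions of the initial set, step and nextSet translate almost directly into code. The following Python sketch computes all LR(0) item sets for the grammar Z → E $, E → T | E + T, T → i | ( E ); the item encoding `(lhs, rhs, dot)` and the function names are illustrative assumptions.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

# An LR(0) item is a triple (lhs, rhs, dot); dot is an index into rhs.


def closure(items):
    """Add N -> .gamma for every nonterminal N that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, body in GRAMMAR:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def step(items, symbol):
    """Move the dot over `symbol` in every item that expects it next."""
    return {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
            if dot < len(rhs) and rhs[dot] == symbol}


def next_set(items, symbol):
    return closure(step(items, symbol))


def item_sets(start="Z"):
    """Return (initial set, all reachable sets, transitions between them)."""
    initial = closure({(lhs, rhs, 0) for lhs, rhs in GRAMMAR if lhs == start})
    states, work, edges = {initial}, [initial], {}
    while work:
        state = work.pop()
        for symbol in {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}:
            target = next_set(state, symbol)
            edges[(state, symbol)] = target
            if target not in states:
                states.add(target)
                work.append(target)
    return initial, states, edges


INITIAL, STATES, EDGES = item_sets()
```

For this grammar the computation yields ten item sets, which matches the number of states q0 to q9 in the LR(0) automaton example.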

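  48. LR(0) automaton example • [Figure: LR(0) automaton with shift states and reduce states] • q0: Z → ·E $, E → ·T, E → ·E + T, T → ·i, T → ·( E ) • q1: Z → E·$, E → E·+ T • q2: Z → E $· • q3: E → E +·T, T → ·i, T → ·( E ) • q4: E → E + T· • q5: T → i· • q6: E → T· • q7: T → (·E ), E → ·T, E → ·E + T, T → ·i, T → ·( E ) • q8: T → ( E·), E → E·+ T • q9: T → ( E )· • Transitions: q0 on E to q1, on T to q6, on i to q5, on ( to q7; q1 on $ to q2, on + to q3; q3 on T to q4, on i to q5, on ( to q7; q7 on E to q8, on T to q6, on i to q5, on ( to q7; q8 on ) to q9, on + to q3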

  49. GOTO/ACTION tables • [Table: ACTION table and GOTO table for the LR(0) automaton] • Empty entry: error move

  50. LR(0) parser tables • Two types of rows: • Shift row – tells which state to GOTO for current token • Reduce row – tells which rule to reduce (independent of current token) • GOTO entries are blank
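The two kinds of rows can be seen at work in a small driver. The sketch below hard-codes tables for the grammar Z → E $, E → T | E + T, T → i | ( E ); the state numbering is an assumption reconstructed from the automaton example, and the names `GOTO`, `REDUCE` and `lr0_parse` are chosen for illustration only.

```python
# Shift rows: state -> {symbol: next state}. Assumed numbering q0..q9.
GOTO = {
    0: {"i": 5, "(": 7, "E": 1, "T": 6},
    1: {"+": 3, "$": 2},
    3: {"i": 5, "(": 7, "T": 4},
    7: {"i": 5, "(": 7, "E": 8, "T": 6},
    8: {"+": 3, ")": 9},
}
# Reduce rows: state -> (left-hand side, length of right-hand side).
REDUCE = {
    2: ("Z", 2),  # Z -> E $
    4: ("E", 3),  # E -> E + T
    5: ("T", 1),  # T -> i
    6: ("E", 1),  # E -> T
    9: ("T", 3),  # T -> ( E )
}


def lr0_parse(tokens):
    """Return True if `tokens` (including the final "$") is accepted."""
    stack = [0]  # stack of automaton states
    pos = 0
    while True:
        state = stack[-1]
        if state in REDUCE:
            # Reduce row: the rule does not depend on the current token.
            lhs, length = REDUCE[state]
            if lhs == "Z":
                return pos == len(tokens)  # accept once all input is consumed
            del stack[-length:]                  # pop the handle
            stack.append(GOTO[stack[-1]][lhs])   # follow the nonterminal edge
        else:
            # Shift row: the current token selects the next state.
            if pos == len(tokens):
                return False
            target = GOTO[state].get(tokens[pos])
            if target is None:
                return False  # empty entry: error move
            stack.append(target)
            pos += 1
```

Running `lr0_parse("i + i $".split())` performs the same shifts and reductions as the LR-item walkthrough for the input i + i $. This driver shifts $ like any other token and accepts when it reaches the reduce row for Z → E $.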
