
Bottom-Up Parsing



Presentation Transcript


  1. Bottom-Up Parsing CS 471 September 19, 2007

  2. Where Are We? • Finished Top-Down Parsing • Starting Bottom-Up Parsing • [Figure: compiler phases Lexical Analysis → Syntactic Analysis → Semantic Analysis]

  3. Building a Parser • Have a complete recipe for building a parser: Language Grammar → LL(1) Grammar → Predictive Parse Table → Recursive-Descent Parser → Recursive-Descent Parser w/ AST Generation

  4. Bottom-Up Parsing • More general than top-down parsing • And just as efficient • Builds on ideas in top-down parsing • Preferred method in practice • Also called LR parsing • L means that tokens are read left to right • R means that it constructs a rightmost derivation (in reverse)

  5. Top Down vs. Bottom Up Parsing • Bottom-up: don’t need to figure out as much of the parse tree for a given amount of input • [Figure: top-down vs. bottom-up parse trees, with the input split into scanned and unscanned parts]

  6. An Introductory Example • Consider the following grammar: E → E + ( E ) | int • Why is this not LL(1)? • LR parsers: • Can handle left-recursion • Don’t need left factoring

  7. The Idea • LR parsing reduces a string to the start symbol by inverting productions:
      str = input string of terminals
      repeat
        • Identify β in str such that A → β is a production (i.e., str = αβγ)
        • Replace β by A in str (i.e., str becomes αAγ)
      until str = start symbol
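The loop above does not say which β to pick. As a minimal sketch (plain backtracking, not an LR parser), the Python below tries every inverse production at every position for the introductory grammar E → E + ( E ) | int and returns the first sequence of sentential forms that reaches the start symbol. The names `PRODUCTIONS`, `START` and `reduces_to_start` are illustrative. This kind of search is exponential in general; here it terminates because every reduction either removes an `int` or shortens the string.

```python
from functools import lru_cache

# Introductory grammar: E -> E + ( E ) | int
PRODUCTIONS = [
    ("E", ("E", "+", "(", "E", ")")),
    ("E", ("int",)),
]
START = "E"

@lru_cache(maxsize=None)
def reduces_to_start(form):
    """Return the sentential forms from `form` down to START, or None.

    Backtracking search: try every inverse production at every position.
    """
    if form == (START,):
        return [form]
    for lhs, rhs in PRODUCTIONS:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                # str = alpha beta gamma becomes alpha A gamma
                reduced = form[:i] + (lhs,) + form[i + n:]
                rest = reduces_to_start(reduced)
                if rest is not None:
                    return [form] + rest
    return None

tokens = tuple("int + ( int ) + ( int )".split())
for step in reduces_to_start(tokens):
    print(" ".join(step))
```

On this input the search happens to follow the same sequence of reductions as the detailed parse in slides 8-13; the next slides explain how an LR parser makes these choices without backtracking.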

  8. A Bottom-up Parse in Detail (1) • int + (int) + (int) • [Figure: the parse tree is built up over the tokens int + ( int ) + ( int ) as each reduction is made]

  9. A Bottom-up Parse in Detail (2) • int + (int) + (int) • E + (int) + (int)

  10. A Bottom-up Parse in Detail (3) • int + (int) + (int) • E + (int) + (int) • E + (E) + (int)

  11. A Bottom-up Parse in Detail (4) • int + (int) + (int) • E + (int) + (int) • E + (E) + (int) • E + (int)

  12. A Bottom-up Parse in Detail (5) • int + (int) + (int) • E + (int) + (int) • E + (E) + (int) • E + (int) • E + (E)

  13. A Bottom-up Parse in Detail (6) • int + (int) + (int) • E + (int) + (int) • E + (E) + (int) • E + (int) • E + (E) • E

  14. Another example • Grammar: [shown in a figure] • Is “abbcde” in L(G)? • Yes • “Reverse” derivation: [Figure: parse tree built bottom-up over a b b c d e, with non-terminal nodes A and B]

  15. Choosing reductions • Basic algorithm: • Search for right sides of productions, reduce • Does this work? • Not always: • Problem: “aAAcde” is not part of any sentential form

  16. How do we choose? • Important Fact #1 about bottom-up parsing: • An LR parser traces a rightmost derivation in reverse

  17. Why does this help? • Rightmost derivation: S → γ1 → γ2 → γ3 → γ4 → γ5 → input (S = start symbol) • Suppose γ3 = αAw and γ4 = αβw, using the production A → β • A is the rightmost non-terminal in γ3 • w contains only terminal symbols • Unambiguous grammar • Rightmost derivation is unique • At each step, the reduction is unique
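To make the “rightmost derivation in reverse” property concrete, the sketch below takes the sentential forms from slides 8-13 and checks that, read backwards, each step expands the rightmost non-terminal A, so that everything to its right (w) is terminals. It assumes the introductory grammar E → E + ( E ) | int; `is_rightmost_step` is a hypothetical helper, not part of any library.

```python
# Sentential forms from slides 8-13, in the order the parser produces them.
TRACE = [
    "int + ( int ) + ( int )",
    "E + ( int ) + ( int )",
    "E + ( E ) + ( int )",
    "E + ( int )",
    "E + ( E )",
    "E",
]
PRODUCTIONS = [("E", ("E", "+", "(", "E", ")")), ("E", ("int",))]
NONTERMINALS = {"E"}

def is_rightmost_step(before, after):
    """True if `after` follows from `before` by one rightmost derivation step:
    before = alpha A w (w all terminals) and after = alpha beta w for A -> beta."""
    positions = [i for i, sym in enumerate(before) if sym in NONTERMINALS]
    if not positions:
        return False
    i = positions[-1]                          # A must be the rightmost non-terminal
    alpha, a, w = before[:i], before[i], before[i + 1:]
    return any(lhs == a and after == alpha + rhs + w for lhs, rhs in PRODUCTIONS)

forms = [tuple(f.split()) for f in TRACE]
# Read the trace backwards: each (later, earlier) pair must be a rightmost step.
print(all(is_rightmost_step(forms[k + 1], forms[k]) for k in range(len(forms) - 1)))
```

Expanding any other E (for example, the leftmost one in `E + ( E )`) makes `is_rightmost_step` return False, which is why each reduction in the trace is forced.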

  18. Notation • Split input into two substrings • Right substring (a string of terminals) is as yet unexamined by parser • Left substring has terminals and non-terminals • The dividing point is marked by a | • The | is not part of the string • Initially, all input is unexamined: | x1 x2 . . . xn

  19. Shift-Reduce Parsing • Bottom-up parsing uses only two kinds of actions: Shift and Reduce • Shift: Move | one place to the right • Shifts a terminal to the left string • E + ( | int ) → E + ( int | ) • Reduce: Apply an inverse production at the right end of the left string • If E → E + ( E ) is a production, then • E + ( E + ( E ) | ) → E + ( E | )

  20. Shift-Reduce Example
      • | int + (int) + (int)$   shift
      • int | + (int) + (int)$   red. E → int
      • E | + (int) + (int)$   shift 3 times
      • E + (int | ) + (int)$   red. E → int
      • E + (E | ) + (int)$   shift
      • E + (E) | + (int)$   red. E → E + (E)
      • E | + (int)$   shift 3 times
      • E + (int | )$   red. E → int
      • E + (E | )$   shift
      • E + (E) | $   red. E → E + (E)
      • E | $   accept
      • [Figure: parse tree over int + ( int ) + ( int )]
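The same trace can be produced mechanically. Below is a minimal sketch for E → E + ( E ) | int in which the left string is a Python list whose end plays the role of the | marker. The decision rule (reduce as soon as a right-hand side appears on top of the stack, otherwise shift) is an assumption that happens to be safe for this grammar only; slide 23 explains why it does not work in general. `RULES` and `shift_reduce` are illustrative names.

```python
# Grammar: E -> E + ( E ) | int
RULES = [("E", ["E", "+", "(", "E", ")"]), ("E", ["int"])]

def shift_reduce(tokens):
    """Shift-reduce parse; returns a trace of (stack, remaining input, action).

    Decision rule (valid for this grammar only): reduce whenever the top of
    the stack matches a right-hand side, otherwise shift.
    """
    stack, rest, trace = [], list(tokens) + ["$"], []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                trace.append((" ".join(stack), " ".join(rest),
                              f"reduce {lhs} -> {' '.join(rhs)}"))
                del stack[-len(rhs):]       # pop the right-hand side
                stack.append(lhs)           # push the left-hand side
                break
        else:                               # no right-hand side on top: shift or stop
            if rest == ["$"]:
                ok = stack == ["E"]
                trace.append((" ".join(stack), "$", "accept" if ok else "error"))
                return trace
            trace.append((" ".join(stack), " ".join(rest), "shift"))
            stack.append(rest.pop(0))

for stack, rest, action in shift_reduce("int + ( int ) + ( int )".split()):
    print(f"{stack} | {rest}   {action}")
```

Each "shift 3 times" line of the slide appears here as three separate shift entries.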

  21. How do we keep track? • Left part string implemented as a stack • Top of the stack is the | • Shift: • Pushes a terminal on the stack • Reduce: • Pops 0 or more symbols off of the stack • Symbols are right-hand side of a production • Pushes a non-terminal on the stack (production LHS) • Terminology • We refer to the right-hand side on top of the stack that is about to be reduced as a handle

  22. Shift-Reduce Parsing • Each row: derivation ← stack | input stream, followed by the action
      • (1+2+(3+4))+5 ← | (1+2+(3+4))+5   shift
      • (1+2+(3+4))+5 ← ( | 1+2+(3+4))+5   shift
      • (1+2+(3+4))+5 ← (1 | +2+(3+4))+5   reduce E → num
      • (E+2+(3+4))+5 ← (E | +2+(3+4))+5   reduce S → E
      • (S+2+(3+4))+5 ← (S | +2+(3+4))+5   shift
      • (S+2+(3+4))+5 ← (S+ | 2+(3+4))+5   shift
      • (S+2+(3+4))+5 ← (S+2 | +(3+4))+5   reduce E → num
      • (S+E+(3+4))+5 ← (S+E | +(3+4))+5   reduce S → S + E
      • (S+(3+4))+5 ← (S | +(3+4))+5   shift
      • (S+(3+4))+5 ← (S+ | (3+4))+5   shift
      • (S+(3+4))+5 ← (S+( | 3+4))+5   shift
      • (S+(3+4))+5 ← (S+(3 | +4))+5   reduce E → num

  23. Problem • How do we know which action to take: whether to shift or reduce, and which production? • Sometimes can reduce but shouldn’t • e.g., X → ε can always be reduced • Sometimes can reduce in different ways

  24. Action Selection Problem • Given stack σ and look-ahead symbol b, should parser: • shift b onto the stack (making it σb) • reduce some production X → γ, assuming that stack has the form αγ (making it αX) • If stack has form αγ, should apply reduction X → γ (or shift) depending on stack prefix α • α is different for different possible reductions, since γ’s have different lengths • How to keep track of possible reductions?

  25. Parser States • Goal: know what reductions are legal at any given point • Idea: summarize all possible stack prefixes α as a finite parser state • Parser state is computed by a DFA that reads in the stack σ • Accept states of DFA: unique reduction! • Summarizing discards information • affects what grammars parser handles • affects size of DFA (number of states)

  26. LR(0) Parser • Left-to-right scanning, Right-most derivation, “zero” look-ahead characters • Too weak to handle most language grammars (e.g., “sum” grammar) • But will help us understand shift-reduce parsing

  27. LR(0) States • A state is a set of items keeping track of progress on possible upcoming reductions • An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production • Stuff before “.” is already on stack (beginnings of possible γ’s to be reduced) • Stuff after “.”: what we might see next • The prefixes α are represented by the state itself • [Figure: a state containing the items E → num . and E → ( . S )]

  28. Start State & Closure • Grammar: S → ( L ) | id; L → S | L , S • Constructing a DFA to read stack: • First step: augment grammar with prod’n S' → S $ • Start state of DFA: empty stack = S' → . S $ • Closure of a state adds items for all productions whose LHS occurs in an item in the state, just after “.” • Set of possible productions to be reduced next • Added items have the “.” located at the beginning: no symbols for these items on the stack yet • [Figure: closure of {S' → . S $} = {S' → . S $, S → . ( L ), S → . id}]
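A sketch of the closure operation for the list grammar, assuming items are encoded as `(lhs, rhs, dot)` tuples with the dot placed before `rhs[dot]`. `GRAMMAR`, `closure` and `show` are illustrative names, not a specific tool's API.

```python
# Augmented list grammar: S' -> S $ ; S -> ( L ) | id ; L -> S | L , S
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}

def closure(items):
    """LR(0) closure. An item is (lhs, rhs, dot): the dot sits before rhs[dot]."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # non-terminal just after "."
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)            # added items start with "." first
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"

start = closure({("S'", ("S", "$"), 0)})
for item in sorted(start, key=show):
    print(show(item))
```

The worklist only re-examines newly added items, so the loop stops once no item with a non-terminal after the dot contributes anything new.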

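  29. Applying Terminal Symbols • Grammar: S → ( L ) | id; L → S | L , S • In new state, include all items that have appropriate input symbol just after dot, advance dot in those items, and take closure. • [Figure: from the start state {S' → . S $, S → . ( L ), S → . id}, ( leads to {S → ( . L ), L → . S, L → . L , S, S → . ( L ), S → . id} and id leads to {S → id .}; the new ( state loops to itself on ( and also leads to {S → id .} on id]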

  30. Applying Nonterminal Symbols • Grammar: S → ( L ) | id; L → S | L , S • Non-terminals on stack treated just like terminals (except added by reductions) • [Figure: from the state {S → ( . L ), L → . S, L → . L , S, S → . ( L ), S → . id}, L leads to {S → ( L . ), L → L . , S} and S leads to {L → S .}]

  31. Applying Reduce Actions • Pop RHS off stack, replace with LHS X (X → γ) • [Figure: the same DFA, highlighting the states causing reductions: {S → id .} and {L → S .}]

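  32. Full DFA (Appel p. 62) • Grammar: S → ( L ) | id; L → S | L , S
      • State 1: S' → . S $, S → . ( L ), S → . id   (on ( go to 3, on id go to 2, on S go to 4)
      • State 2: S → id .
      • State 3: S → ( . L ), L → . S, L → . L , S, S → . ( L ), S → . id   (on ( go to 3, on id go to 2, on L go to 5, on S go to 7)
      • State 4: S' → S . $   (on $: final state)
      • State 5: S → ( L . ), L → L . , S   (on ) go to 6, on , go to 8)
      • State 6: S → ( L ) .
      • State 7: L → S .
      • State 8: L → L , . S, S → . ( L ), S → . id   (on ( go to 3, on id go to 2, on S go to 9)
      • State 9: L → L , S .
      • Current state: push stack through DFA • Reduce-only state: reduce • If there is a shift transition for the look-ahead: shift; otherwise: syntax error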
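The whole DFA can be generated from closure plus a goto step (advance the dot over one symbol, then take the closure). This sketch uses the same hypothetical `(lhs, rhs, dot)` encoding and builds states in discovery order, so its state numbers differ from Appel's. It deliberately builds no state for shifting `$`, treating that transition as accept, which matches the 9 states in the figure.

```python
# Augmented list grammar: S' -> S $ ; S -> ( L ) | id ; L -> S | L , S
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("(", "L", ")"), ("id",)],
    "L": [("S",), ("L", ",", "S")],
}

def closure(items):
    """LR(0) closure of a set of (lhs, rhs, dot) items."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Advance the dot over `symbol` wherever it comes next, then take the closure."""
    return closure({(lhs, rhs, dot + 1) for lhs, rhs, dot in state
                    if dot < len(rhs) and rhs[dot] == symbol})

def build_lr0_dfa(start_item):
    """Return (states, edges); edges maps (state index, symbol) -> state index."""
    states, edges = [closure({start_item})], {}
    i = 0
    while i < len(states):
        next_symbols = {rhs[dot] for _, rhs, dot in states[i] if dot < len(rhs)}
        for sym in sorted(next_symbols):
            if sym == "$":            # shifting $ means accept: no new state
                continue
            target = goto(states[i], sym)
            if target not in states:
                states.append(target)
            edges[(i, sym)] = states.index(target)
        i += 1
    return states, edges

states, edges = build_lr0_dfa(("S'", ("S", "$"), 0))
print(len(states), "states")
```

Reduce-only states are those in which every item has its dot at the end; here there are four of them (S → id ., L → S ., S → ( L ) ., L → L , S .).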

  33. Parsing Example: ((x),y) • Grammar: S → ( L ) | id; L → S | L , S • Each row: derivation ← stack | input, followed by the action; each stack symbol is followed by its DFA state
      • ((x),y) ← 1 | ((x),y)   shift, goto 3
      • ((x),y) ← 1 (3 | (x),y)   shift, goto 3
      • ((x),y) ← 1 (3 (3 | x),y)   shift, goto 2
      • ((x),y) ← 1 (3 (3 x2 | ),y)   reduce S → id
      • ((S),y) ← 1 (3 (3 S7 | ),y)   reduce L → S
      • ((L),y) ← 1 (3 (3 L5 | ),y)   shift, goto 6
      • ((L),y) ← 1 (3 (3 L5 )6 | ,y)   reduce S → (L)
      • (S,y) ← 1 (3 S7 | ,y)   reduce L → S
      • (L,y) ← 1 (3 L5 | ,y)   shift, goto 8
      • (L,y) ← 1 (3 L5 ,8 | y)   shift, goto 2
      • (L,y) ← 1 (3 L5 ,8 y2 | )   reduce S → id
      • (L,S) ← 1 (3 L5 ,8 S9 | )   reduce L → L , S
      • (L) ← 1 (3 L5 | )   shift, goto 6
      • (L) ← 1 (3 L5 )6 |   reduce S → (L)
      • S ← 1 S4 | $   done

  34. Implementation: LR Parsing Table • Rows: states • Columns: input (terminal) symbols and non-terminal symbols • Action table (terminal columns): gives the next action; used at every step to decide whether to shift or reduce • Goto table (non-terminal columns): gives the next state; used only when reducing, to determine the next state

  35. Shift-Reduce Parsing Table • Action table (rows: states; columns: terminal symbols) gives the next actions: • 1. shift and goto state n • 2. reduce using X → γ • pop symbols γ off stack • using state label of top (end) of stack, look up X in goto table and goto that state • Goto table (columns: non-terminal symbols) gives the next state on reduction • DFA + stack = push-down automaton (PDA)
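A table-driven sketch of this push-down automaton for the list grammar. The `SHIFT`, `REDUCE` and `GOTO` tables are transcribed by hand from the full DFA on slide 32 (Appel's state numbers), and `id` stands for any identifier such as `x` or `y`. Because this is LR(0), a reduce-only state reduces without looking at the next token. Only states are kept on the stack; the grammar symbols shown in the slide 33 trace are implicit.

```python
# LR(0) tables for S -> ( L ) | id ; L -> S | L , S, transcribed from the DFA
# (Appel's state numbers). Tokens: "(", ")", ",", "id", with "$" appended.
SHIFT = {                      # state -> {terminal: next state}
    1: {"(": 3, "id": 2},
    3: {"(": 3, "id": 2},
    5: {")": 6, ",": 8},
    8: {"(": 3, "id": 2},
}
REDUCE = {                     # reduce-only state -> (lhs, |rhs|, production)
    2: ("S", 1, "S -> id"),
    6: ("S", 3, "S -> ( L )"),
    7: ("L", 1, "L -> S"),
    9: ("L", 3, "L -> L , S"),
}
GOTO = {1: {"S": 4}, 3: {"S": 7, "L": 5}, 8: {"S": 9}}
ACCEPT_STATE = 4               # S' -> S . $ : accept when the look-ahead is $

def lr0_parse(tokens):
    """Return the reductions performed, in order, or raise SyntaxError."""
    stack = [1]                # only states are stored; symbols are implicit
    stream = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        state = stack[-1]
        if state in REDUCE:    # LR(0): reduce without consulting the look-ahead
            lhs, n, text = REDUCE[state]
            del stack[-n:]     # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(text)
        elif state == ACCEPT_STATE and stream[pos] == "$":
            return reductions
        elif stream[pos] in SHIFT.get(state, {}):
            stack.append(SHIFT[state][stream[pos]])
            pos += 1
        else:
            raise SyntaxError(f"unexpected {stream[pos]!r} in state {state}")

# ((x),y) with x and y tokenized as id
print(lr0_parse(["(", "(", "id", ")", ",", "id", ")"]))
```

The reductions come out in the same order as the "reduce" rows of the ((x),y) trace.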

  36. List Grammar Parsing Table

  37. Shift-Reduce Parsing • Grammars can be parsed bottom-up using a DFA + stack • DFA processes stack σ to decide what reductions might be possible • Together they form a shift-reduce parser, or push-down automaton (PDA) • Compactly represented as LR parsing table • State construction converts grammar into states that decide action to take

  38. Checkpoint • Limitations of LR(0) grammars • SLR, LR(1), LALR parsers • Automatic parser generators

  39. LR(0) Limitations • An LR(0) machine only works if states with reduce actions have a single reduce action; in those states, always reduce ignoring lookahead • With more complex grammar, construction gives states with shift/reduce or reduce/reduce conflicts • Need to use look-ahead to choose • [Figure: example states. OK: {L → L , S .}; shift/reduce: {L → L , S ., S → S . , L}; reduce/reduce: {L → L , S ., L → S .}]

  40. LR(0) Construction • Grammar: S → E + S | E; E → num | ( S ) • State 1: S' → . S $, S → . E + S, S → . E, E → . num, E → . ( S ) • State 2 (from 1 on E): S → E . + S, S → E . • State 3 (from 2 on +): S → E + . S • What do we do in state 2?
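One way to see the trouble in state 2 mechanically: an LR(0) state has a shift/reduce conflict if it holds a completed item (dot at the end) together with an item that has a terminal after the dot, and a reduce/reduce conflict if it holds two completed items. `lr0_conflicts` is a hypothetical helper over `(lhs, rhs, dot)` items.

```python
def lr0_conflicts(state, nonterminals):
    """Classify LR(0) conflicts in a state of (lhs, rhs, dot) items."""
    completed = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    shift_on = {rhs[dot] for _, rhs, dot in state
                if dot < len(rhs) and rhs[dot] not in nonterminals}
    kinds = []
    if completed and shift_on:
        kinds.append("shift/reduce")
    if len(completed) > 1:
        kinds.append("reduce/reduce")
    return kinds

# State 2 of the slide: S -> E . + S and S -> E .
state2 = {("S", ("E", "+", "S"), 1), ("S", ("E",), 1)}
print(lr0_conflicts(state2, {"S'", "S", "E"}))   # ['shift/reduce']
```

The same check flags the reduce/reduce example from slide 39, {L → L , S ., L → S .}.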

  41. SLR grammars • Idea: Only add reduce action to table if look-ahead symbol is in the FOLLOW set of the non-terminal being reduced • Eliminates some conflicts • FOLLOW(S) = { $, ) } • Many language grammars are SLR
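The FOLLOW sets that SLR relies on can be computed by fixed-point iteration. This sketch computes FIRST, nullable, and FOLLOW for S → E + S | E; E → num | ( S ), augmented with S' → S $ so that `$` appears explicitly. With FOLLOW(S) = {$, )}, state 2 reduces S → E only on `$` or `)` and shifts on `+`, so the LR(0) conflict disappears. Function names are illustrative.

```python
# Augmented grammar: S' -> S $ ; S -> E + S | E ; E -> num | ( S )
GRAMMAR = {
    "S'": [["S", "$"]],
    "S": [["E", "+", "S"], ["E"]],
    "E": [["num"], ["(", "S", ")"]],
}

def first_sets(grammar):
    """FIRST set of every non-terminal, plus the set of nullable non-terminals."""
    first = {nt: set() for nt in grammar}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for sym in rhs:
                    add = first[sym] if sym in grammar else {sym}
                    if not add <= first[lhs]:
                        first[lhs] |= add
                        changed = True
                    if sym not in nullable:   # terminals are never nullable
                        break
                else:                         # every symbol of rhs is nullable
                    if lhs not in nullable:
                        nullable.add(lhs)
                        changed = True
    return first, nullable

def follow_sets(grammar, first, nullable):
    """FOLLOW(X): terminals that can appear right after X in a sentential form."""
    follow = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    # FIRST of what follows sym, plus FOLLOW(lhs) if that part is nullable
                    add = set()
                    for nxt in rhs[i + 1:]:
                        add |= first[nxt] if nxt in grammar else {nxt}
                        if nxt not in nullable:
                            break
                    else:
                        add |= follow[lhs]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

first, nullable = first_sets(GRAMMAR)
follow = follow_sets(GRAMMAR, first, nullable)
print(sorted(follow["S"]), sorted(follow["E"]))
```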

  42. LR(1) Parsing • Gets as much power as possible out of a parsing table with 1 look-ahead symbol • LR(1) grammar = recognizable by a shift/reduce parser with 1 look-ahead • LR(1) item = LR(0) item + look-ahead symbols possibly following production • LR(0): S → . S + E • LR(1): S → . S + E, +

  43. LR(1) State • LR(1) state = set of LR(1) items • LR(1) item = LR(0) item + set of lookahead symbols • No two items in state have same production + dot configuration • [Figure: the items S → S . + E, + and S → S . + E, $ are combined into the single item S → S . + E, {+, $}; S → S + . E, num is unchanged]

  44. LR(1) Closure • Consider the item A → β . C δ, λ • Closure formed just as for LR(0) except: • Look-ahead symbols include characters following the non-terminal symbol to the right of dot: FIRST(δ) • If non-terminal symbol may produce last symbol of production (δ is nullable), look-ahead symbols include look-ahead symbols of production (λ) • Example, grammar S → E + S | E; E → num | ( S ): state 1 = {S' → . S $; S → . E + S, $; S → . E, $; E → . num, {+, $}; E → . ( S ), {+, $}}
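A sketch of LR(1) closure for the same grammar, reproducing state 1. Items are `(lhs, rhs, dot, lookahead)` tuples with one look-ahead each, grouped afterwards the way the slides write them. The FIRST sets are written out by hand as an assumption (nothing in this grammar is nullable), and the start item's look-ahead is a placeholder that never propagates because `$` follows S explicitly.

```python
# Augmented grammar: S' -> S $ ; S -> E + S | E ; E -> num | ( S )
GRAMMAR = {
    "S'": [("S", "$")],
    "S": [("E", "+", "S"), ("E",)],
    "E": [("num",), ("(", "S", ")")],
}
# Assumption: FIRST sets worked out by hand; no non-terminal here is nullable.
FIRST = {"S'": {"num", "("}, "S": {"num", "("}, "E": {"num", "("}}
NULLABLE = set()

def first_of(delta, lam):
    """Look-aheads for added items: FIRST(delta), plus lam if delta is nullable."""
    out = set()
    for sym in delta:
        out |= FIRST.get(sym, {sym})         # a terminal's FIRST set is itself
        if sym not in NULLABLE:
            return out
    out.add(lam)
    return out

def closure(items):
    """LR(1) closure of a set of (lhs, rhs, dot, lookahead) items."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot, lam = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # A -> beta . C delta, lam
            c, delta = rhs[dot], rhs[dot + 1:]
            for b in first_of(delta, lam):
                for prod in GRAMMAR[c]:
                    item = (c, prod, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

def grouped(state):
    """Collect look-aheads per (lhs, rhs, dot) core, as the slides write them."""
    cores = {}
    for lhs, rhs, dot, la in state:
        cores.setdefault((lhs, rhs, dot), set()).add(la)
    return cores

# The start item's look-ahead is a placeholder: $ follows S explicitly.
state1 = closure({("S'", ("S", "$"), 0, "$")})
for (lhs, rhs, dot), las in sorted(grouped(state1).items()):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]), ",", sorted(las))
```

The E items pick up `+` from S → . E + S (FIRST of `+ S`) and `$` from S → . E (empty δ, so λ = `$` is inherited).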

  45. LR(1) DFA construction • Given LR(1) state, for each symbol (terminal or non-terminal) following a dot, construct a state with dot shifted across symbol, perform closure • Example, grammar S → E + S | E; E → num | ( S ): state 1 = {S' → . S $; S → . E + S, $; S → . E, $; E → . num, {+, $}; E → . ( S ), {+, $}}; on E, state 1 goes to state 2 = {S → E . + S, $; S → E ., $}

  46. LR(1) example • Reductions unambiguous if look-aheads are disjoint and not to the right of any dot in state • [Figure: the same states 1 and 2; in state 2, S → E . reduces on $ while + is shifted]

  47. LALR Grammars • Problem with LR(1): too many states • LALR(1) (Look-Ahead LR) • Merge any two LR(1) states whose items are identical except look-ahead • Results in smaller parser tables; works extremely well in practice • Usual technology for automatic parser generators • [Figure: can the states {S → id ., +; S → E ., $} and {S → id ., $; S → E ., +} be merged?]
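Merging by core can be sketched directly. The two input states below are the ones in the figure, in a hypothetical `(lhs, rhs, dot, lookahead)` encoding. Each is conflict-free on its own, but merging them unions the look-aheads, so both reductions apply on `+` and on `$`. This illustrates how LALR merging can introduce reduce/reduce conflicts that the LR(1) states did not have.

```python
from collections import defaultdict

def merge_by_core(states):
    """Merge LR(1) states (sets of (lhs, rhs, dot, lookahead)) that share a core."""
    merged = defaultdict(lambda: defaultdict(set))
    for state in states:
        core = frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)
        for lhs, rhs, dot, la in state:
            merged[core][(lhs, rhs, dot)].add(la)
    return [dict(items) for items in merged.values()]

def reduce_reduce_conflicts(state):
    """Look-aheads on which two different completed items would both reduce."""
    owner, clashes = {}, set()
    for (lhs, rhs, dot), lookaheads in state.items():
        if dot == len(rhs):                   # completed item: reduce
            for la in lookaheads:
                if la in owner and owner[la] != (lhs, rhs):
                    clashes.add(la)
                owner[la] = (lhs, rhs)
    return clashes

# The two LR(1) states from the figure (hypothetical encoding).
state_a = {("S", ("id",), 1, "+"), ("S", ("E",), 1, "$")}
state_b = {("S", ("id",), 1, "$"), ("S", ("E",), 1, "+")}
print(reduce_reduce_conflicts(merge_by_core([state_a])[0]))   # set()
print(sorted(reduce_reduce_conflicts(merge_by_core([state_a, state_b])[0])))
```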

  48. How are Parsers Written? • Automatic parser generators: yacc, bison • Accept LALR(1) grammar specification • plus: declarations of precedence, associativity • output: LR parser code (inc. parsing table) • Some parser generators accept LL(1) • less powerful

  49. Associativity • S → S + E | E • E → num | ( S ) • E → E + E | num | ( E ) • What happens if we run this grammar through LALR construction?

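  50. Conflict! • E → E + E | num | ( E ) • A state containing E → E + E ., + and E → E . + E, {+, $} has a shift/reduce conflict on + • Parsing 1+2+3 with 1+2 on the stack and + as look-ahead: shift gives 1+(2+3); reduce gives (1+2)+3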
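The two ways of resolving this conflict correspond to the two associativities. In this hand-written sketch for E → E + E | num (parentheses omitted for brevity), `prefer` is a hypothetical switch: `"reduce"` behaves like declaring `+` left-associative and `"shift"` like declaring it right-associative. `E → num` is reduced as soon as a number is shifted, so a number on the stack already stands for an E; trees are nested tuples.

```python
def parse_sum(tokens, prefer):
    """Shift-reduce parse of E -> E + E | num, resolving the conflict on '+'.

    tokens: ints and "+" strings. prefer: "reduce" (left-assoc) or "shift"
    (right-assoc). Returns the parse tree as nested (left, "+", right) tuples.
    """
    stack, stream, i = [], list(tokens) + ["$"], 0
    while True:
        la = stream[i]
        handle = (len(stack) >= 3 and stack[-2] == "+"
                  and stack[-1] != "+" and stack[-3] != "+")   # E + E on top
        if handle and (la == "$" or prefer == "reduce"):
            right = stack.pop()
            stack.pop()                                # the "+"
            left = stack.pop()
            stack.append((left, "+", right))           # reduce E -> E + E
        elif la != "$":
            stack.append(la)                           # shift
            i += 1
        else:
            break
    if len(stack) != 1 or stack[0] == "+":
        raise SyntaxError("malformed sum")
    return stack[0]

print(parse_sum([1, "+", 2, "+", 3], "reduce"))
print(parse_sum([1, "+", 2, "+", 3], "shift"))
```

Only the look-ahead `+` is a real choice; on `$` the handle E + E must be reduced either way.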
