340 likes | 470 Vues
This lecture covers the fundamental concepts of bottom-up parsing and LL(1) grammars as part of the CS412/413 course, led by Andrew Myers in Spring '99. Key topics include creating LL(1) grammars, limitations of LL(1) parsing, and an introduction to LR(0) parser construction. Additionally, it addresses the necessity of transforming left-recursive grammars to right-recursive ones for successful parsing, strategies for error detection and recovery, and practical applications in programming language syntax parsing.
E N D
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 5: Bottom-up parsing
Outline • Creating LL(1) grammars • Limitations of LL(1) grammars • Bottom-up parsing • LR(0) parser construction CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Administration • Should have received mail about group assignments by now • Homework 1 due next class (Friday) • Monday considered 2 days late (-20%), Tuesday 3 days (-40%) • No class next Monday (Feb 8) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Programming Assignment • Due Monday, Feb 15 • Implement a lexer for Iota language • Do not need to implement DFA construction • Opportunity to work as group • We expect high quality CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Review • Can construct recursive descent parsers for LL(1) grammars Language grammar How to perform this step? LL(1) grammar predictive parse table recursive-descent parser recursive-descent parser w/ AST generation CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Grammars • Have been using grammar for language of “sums with parentheses” • Original grammar: S S + E| E E number | ( S ) • LL(1) grammar for same language: S ES’ S’ | + S E number | ( S ) (1+(3+4))+5 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
(1 + 2 + (3 + 4)) + 5 Left-recursive vs Right-recursive • Original grammar was left-recursive SS+ E S E • LL(1) grammar is right-recursive : parsed top-down S E S’ S’ | + S • Left-recursive grammars don’t work with top-down parsing -- need an arbitrary amount of look-ahead S E + S S E S S + E (...) + (...) + (...) + (...) ... S + E S + E CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
How to create an LL(1) grammar • Write a right-recursive grammar S E + S S E • Left-factor common prefixes, place suffix in new non-terminal S E S’ S’ S’ + S CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
(1 + 2 + (3 + 4)) + 5 Right Recursion S • Right recursion : right-associative E S’ + ( S ) + S + 5 E S’ 5 1 + 1 + S 2 + E S’ 3 4 2 + S E S’ • Left recursion : left-associative + ( S ) + 5 E S’ + + + S 3 E 1 2 3 4 4 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Associativity • We can provide left-associativity by massaging the recursive-descent code void parse_S() { switch (token) { case ‘(’: case number: parse_E(); parse_S’(); return; default: throw new ParseError(); } } void parse_S’() { switch (token) { case ‘+’: token = input.read(); parse_S(); return; case ‘)‘: return; case EOF: return; default: throw new ParseError(); } } CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Associativity void parse_S() {// parses a sequence of E + E + E ... switch (token) { case ‘(’: case number: parse_E(); switch (token) { case ‘+’: token = input.read(); parse_S(); return; case ‘)‘: return; case EOF: return; default: throw new ParseError(); } return; default: throw new ParseError(); } } tail recursion CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
+ 3 4 Flattening Associative Operators void parse_S () { // parses an arbitrary sequence of E + E + E ... while (true) { switch (token) { case ‘(’: case number: parse_E (); switch (token) { case ‘+’: token = input.read(); break; case ‘)‘: case EOF: return; default: throw new ParseError(); } break; default: throw new ParseError(); } } } (1 + 2 + (3+4)) + 5 + + 5 1 2 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Summary • Now have complete recipe for building a parser Language grammar LL(1) grammar predictive parse table recursive-descent parser recursive-descent parser w/ AST generation CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Bottom-up parsing • A more powerful parsing technology • LR grammars -- more power than LL • can handle left-recursive grammars, virtually all programming languages • More natural expression of programming language syntax • Shift-reduce parsers • automatic parser generators (e.g. yacc) • detect errors as soon as possible • allows better error recovery CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
S S + E| E E number | ( S ) Top-down parsing (1+2+(3+4))+5 S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E(1+2+E)+E ... • In left-most derivation, entire tree above a token (2) has been expanded when encountered • Must be able to predict! S S + E E 5 ( S ) S + E ( S ) S + E E S + E 2 4 1 E 3 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
S S + E| E E number | ( S ) Bottom-up parsing • Right-most derivation-- backward • Start with the tokens • End with the start symbol (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 (S+(3+4))+5 (S+(E+4))+5 (S+(S+4))+5 (S+(S+E))+5 (S+(S))+5 (S+E)+5 (S)+5 E+5 S+E S CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
S S + E| E E number | ( S ) Bottom-up parsing (1+2+(3+4))+5 (1+2+(3+4))+5 (E+2+(3+4))+5 (1 +2+(3+4))+5 (S+2+(3+4))+5 (1 +2+(3+4))+5 (S+E+(3+4))+5 (1+2 +(3+4))+5 (S+(3+4))+5 (1+2+(3 +4))+5 (S+(E+4))+5 (1+2+(3 +4))+5 (S+(S+4))+5 (1+2+(3 +4))+5 (S+(S+E))+5 (1+2+(3+4 ))+5 (S+(S))+5 (1+2+(3+4 ))+5 (S+E)+5 (1+2+(3+4) )+5 (S)+5 (1+2+(3+4) )+5 E+5 (1+2+(3+4)) +5 S+E(1+2+(3+4))+5 S(1+2+(3+4))+5 right-most derivation CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
S S + E| E E number | ( S ) Bottom-up parsing • (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 … • Advantage of bottom-up parsing: can select productions based on more information S S + E E 5 ( S ) S + E ( S ) S+E E S + E 2 4 1 E 3 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Top-down vs. Bottom-up Bottom-up: Don’t need to figure out as much of the parse tree for a given amount of input scanned unscanned scanned unscanned Top-down Bottom-up CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Shift-reduce parsing • Parsing is a sequence of shift and reduce operations • Parser state is a stack of terminals and non-terminals (grows to the right) • Unconsumed input is a string of terminals • Current derivation step is always stack+input • Shift -- push head of input onto stack stack input ( 1+2+(3+4))+5 (1 +2+(3+4))+5 CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Reduce • Replace symbols in top of stack with non-terminal symbol X, corresponding to production X (pop , push X) stack input (S+E +(3+4))+5 reduce S S+E (S +(3+4))+5 • What effect does this have on derivation? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
S S + E| E E number | ( S ) Shift-reduce parsing derivation input stream action stack (1+2+(3+4))+5 (1+2+(3+4))+5 shift (1+2+(3+4))+5 ( 1+2+(3+4))+5 shift (1+2+(3+4))+5 (1 +2+(3+4))+5 reduceEnum (E+2+(3+4))+5 (E +2+(3+4))+5 reduceS E (S+2+(3+4))+5 (S +2+(3+4))+5 shift (S+2+(3+4))+5 (S+ 2+(3+4))+5 shift (S+2+(3+4))+5 (S+2 +(3+4))+5 reduceEnum (S+E+(3+4))+5 (S+E +(3+4))+5 reduce S S+E (S+(3+4))+5 (S +(3+4))+5 shift (S+(3+4))+5 (S+ (3+4))+5 shift (S+(3+4))+5 (S+( 3+4))+5 shift (S+(3+4))+5 (S+(3 +4))+5 reduce Enum CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Problem • How do we know which action to take -- whether to shift or reduce, and which production? • Sometimes can reduce but shouldn’t • e.g., X can always be reduced • Sometimes can reduce in different ways CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Action Selection Problem • Given stack and input symbol b, should we • shift b onto the stack (making it b) • reduce some production X assuming that stack has the form (making it X) • Should apply reduction X depending on what stack prefix is -- but is different for different possible reductions, since ’s have different length. How to keep track? CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Parser States • Idea: summarize all possible stack prefixes as a parser state • A state transition function updates the parser state as shifts and reductions are performed: DFA • Summarizing discards information • affects what grammars parser handles • affects size of DFA (number of states) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
LR(0) parser • Left-to-right scanning, Right-most derivation, zero look-ahead characters • Too weak to handle most language grammars (including this one) • But will help us understand how to build better parsers CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
LR(0) states • A state is a set of items • An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production • Stuff before “.” already on stack (beginnings of possible ’s) • Stuff after “.” : what we might see next • The prefixes represented by state E number . E ( .S ) state item CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
An LR(0) grammar: non-empty lists S ( L ) S id L S L L , S x (x,y) (x, (y,z), w) ((((x)))) (x, (y, (z, w))) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
S ( L ) | id L S | L, S Closure S ’ . S $ S . ( L ) S . id start state Closure S ’ .S $ • Closure of a state adds items for all productions whose LHS occurs in an item in the state, just after “.” • Added items have the “.”located at the beginning • Like NFA DFA conversion CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Applying shift actions S ( .L ) L .S L .L , S S . ( L ) S . id S ( L ) | id L S | L , S S ’ . S $ S . ( L ) S . id ( ( id id S id . In new state, include all items that have appropriate input symbol just after dot, and advance dot in those items. (and take closure) CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Applying reduce actions S ( .L ) L .S L .L , S S . ( L ) S . id S ( L.) L L . , S • Need to set state after reducing • On reduction, pop back to old state and take DFA transition on non-terminal reduced L S ’ . S $ S . ( L ) S . id ( S ( L S. id id S id . states causing reductions CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Full DFA (Appel p. 63) 8 9 2 L L , . S S . ( L ) S . id 1 id S id S ’ . S $ S . ( L ) S . id S id . L L , S . id 3 S ( .L ) L .S L .L , S S . ( L ) S . id ( 5 L S ( L.) L L . , S S ) ( S 6 S ( L ). 4 7 L S. S ’ S . $ $ final state CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Idea: stack is labeled w/state Let’s try parsing ((x),y) derivation stack input action ((x),y) 1 ((x),y) shift, goto 3 ((x),y) 1 (3 (x),y)shift, goto 3 ((x),y) 1 (3 (3 x),y)shift, goto 2 ((x),y) 1 (3 (3 x2 ),y)reduce Sid ((S),y) 1 (3 (3 S7 ),y)reduce LS ((L),y) 1 (3 (3 L5 ),y)shift, goto 6 ((L),y) 1 (3 (3 L5)6 ,y)reduce S(L) (S,y) 1 (3 S7 ,y)reduce LS (L,y) 1 (3 L5 ,y)shift, goto 8 (L,y) 1 (3 L5 , 8 y)shift, goto 9 (L,y) 1 (3 L5 , 8 y2 )reduce Sid (L,S) 1 (3 L5 , 8 S9 )reduce LL , S (L) 1 (3 L5 )shift, goto 6 (L) 1 (3 L5 )6 reduce S(L) S 1S4$ done S ( L ) | id L S | L, S
Summary • Grammars can be parsed bottom-up using a DFA + stack • State construction converts grammar into states that capture information needed to know what action to take • Stack entries labeled by state index • Next time: SLR, LR(1) parsers, automatic parser generators CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers