This discussion explores LL(1) grammars and table-driven parsing, emphasizing parsing approaches like full backtracking and deterministic parsing. It highlights the purpose and construction of simple LL(1) grammars, using prefix expression grammars as examples. The efficiency of LL(1) parsers, based on single-symbol lookahead, is dissected along with the limitation of simple LL(1) grammars and potential enhancements to grammar rules. The discussion provides insights into constructing parse tables and the relationship between production rules and parsing decisions, ultimately detailing how LL(1) strategies can yield efficient parsing outcomes.
Topics • Approaches to Parsing • Full backtracking • Deterministic • Simple LL(1), table-driven parsing • Improvements to simple LL(1) grammars
Prefix Expression Grammar
• Consider the following grammar (which yields prefix expressions for binary operators):
E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4
• Here, prefix expressions associate an operator with the next two operands.
* + 2 3 4 means (* (+ 2 3) 4), that is, (2 + 3) * 4 = 20
* 2 + 3 4 means (* 2 (+ 3 4)), that is, 2 * (3 + 4) = 14
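The two evaluations above can be checked with a small recursive evaluator. This is a minimal sketch, not part of the original slides: the function name `eval_prefix` is hypothetical, tokens are assumed to be separated by spaces, and "/" is assumed to mean true division.

```python
import operator

# Operators of the grammar mapped to Python functions.
# Assumption: "/" is true division; the slides do not say.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_prefix(text):
    """Evaluate a space-separated prefix expression such as '* + 2 3 4'."""
    tokens = text.split()

    def walk(pos):
        # Returns (value, position of the next unread token).
        tok = tokens[pos]
        if tok in OPS:                       # E -> O E E
            left, pos = walk(pos + 1)
            right, pos = walk(pos)
            return OPS[tok](left, right), pos
        return int(tok), pos + 1             # E -> N

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("trailing input after a complete expression")
    return value

print(eval_prefix("* + 2 3 4"))  # (2 + 3) * 4
print(eval_prefix("* 2 + 3 4"))  # 2 * (3 + 4)
```

Each operator simply evaluates the next two complete operands, which is why no parentheses are needed in prefix form.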
Top-Down Parsing with Backtracking
• Input: * + 3 4 2
• Grammar:
E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4
What are the obvious problems? • We never know what production to try. • It appears to be terribly inefficient—and it is. • Are there grammars for which we can always know what rule to choose? Yes! • Characteristics: • Only single symbol look ahead • Given a non-terminal and a current symbol, we always know which production rule to apply
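To make the inefficiency concrete, here is a minimal sketch of a full-backtracking recognizer for the prefix grammar. It tries every production in order and counts the attempts. The names `derive` and `accepts` and the `stats` counter are illustrative assumptions, not something the slides define.

```python
# Prefix expression grammar: each non-terminal maps to its alternatives.
GRAMMAR = {
    "E": [["N"], ["O", "E", "E"]],
    "O": [["+"], ["-"], ["*"], ["/"]],
    "N": [["0"], ["1"], ["2"], ["3"], ["4"]],
}

def derive(symbols, tokens, pos, stats):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # We do not know which production is right, so try them all in order.
        for rhs in GRAMMAR[head]:
            stats["tried"] += 1
            yield from derive(rhs + rest, tokens, pos, stats)
    elif pos < len(tokens) and tokens[pos] == head:
        yield from derive(rest, tokens, pos + 1, stats)

def accepts(text):
    """Return (accepted?, number of productions tried)."""
    tokens = text.split()
    stats = {"tried": 0}
    ok = any(end == len(tokens) for end in derive(["E"], tokens, 0, stats))
    return ok, stats["tried"]

ok, tried = accepts("* + 3 4 2")
print(ok, tried)  # far more productions are tried than there are tokens
```

A deterministic parser with single-symbol lookahead would apply exactly one production per non-terminal expansion instead.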
LL(1) Parsers • An LL parser parses the input from Left to right, and constructs a Leftmost derivation of the sentence. • An LL(k) parser uses k tokens of look-ahead. • LL(1) parsers, although fairly restrictive, are attractive because they only need to look at the current non-terminal and the next token to make their parsing decisions. • LL(1) parsers require LL(1) grammars.
Simple LL(1) Grammars
For simple LL(1) grammars, all rules have the form
A → a_1 α_1 | a_2 α_2 | … | a_n α_n
where
• a_i is a terminal, 1 <= i <= n
• a_i ≠ a_j for i ≠ j, and
• α_i is a sequence of terminals and non-terminals or is empty, 1 <= i <= n
Creating Simple LL(1) Grammars
• Why is this not a simple LL(1) grammar?
E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4
The right-hand sides of E begin with the non-terminals N and O, not with terminals.
• How can we change it to simple LL(1)? By making all production rules of the form
A → a_1 α_1 | a_2 α_2 | … | a_n α_n
Thus,
E → 0 | 1 | 2 | 3 | 4 | +EE | -EE | *EE | /EE
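The two conditions on simple LL(1) rules can be checked mechanically. The sketch below assumes a grammar is given as a dictionary from non-terminal to a list of right-hand sides; `is_simple_ll1` and the two grammar constants are hypothetical names chosen for illustration.

```python
def is_simple_ll1(grammar):
    """Check that every alternative starts with a terminal and that the
    leading terminals of one non-terminal's alternatives are all distinct."""
    nonterminals = set(grammar)
    for alternatives in grammar.values():
        leading = []
        for rhs in alternatives:
            if not rhs or rhs[0] in nonterminals:
                return False              # empty or starts with a non-terminal
            leading.append(rhs[0])
        if len(leading) != len(set(leading)):
            return False                  # two alternatives share a first terminal
    return True

# Original prefix grammar: E's alternatives start with N and O.
ORIGINAL = {
    "E": [["N"], ["O", "E", "E"]],
    "O": [["+"], ["-"], ["*"], ["/"]],
    "N": [["0"], ["1"], ["2"], ["3"], ["4"]],
}

# Rewritten grammar: every alternative starts with a distinct terminal.
REWRITTEN = {
    "E": [list(rhs) for rhs in
          ["0", "1", "2", "3", "4", "+EE", "-EE", "*EE", "/EE"]],
}

print(is_simple_ll1(ORIGINAL))   # False
print(is_simple_ll1(REWRITTEN))  # True
```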
Example: LL(1) Parsing
E → (1) 0 | (2) 1 | (3) 2 | (4) 3 | (5) 4 | (6) +EE | (7) -EE | (8) *EE | (9) /EE
• Input * + 2 3 4: Success! Output = 8 6 3 4 5
• Input 2 * 3: Fail!
[Figure: parse trees for the two inputs, with nodes labeled by production number]
Simple LL(1) Parse Table
A parse table is defined as follows:
ACTION: (V ∪ {#}) × (VT ∪ {#}) → {(β, i), pop, accept, error}
where
• β is the right side of production number i
• # marks the end of the input string (# ∉ V)
If A ∈ (V ∪ {#}) is the symbol on top of the stack and a ∈ (VT ∪ {#}) is the current input symbol, then ACTION(A, a) =
• pop, if A = a for a ∈ VT
• accept, if A = # and a = #
• (aα, i), which means “pop, then push aα and output i”, if A → aα is the ith production
• error, otherwise
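Parse Table
E → (1) 0 | (2) 1 | (3) 2 | (4) 3 | (5) +EE | (6) *EE
Rows are stack symbols in V ∪ {#}; columns are input symbols in VT ∪ {#}.
• Row E: ACTION(E, 0) = (0, 1); ACTION(E, 1) = (1, 2); ACTION(E, 2) = (2, 3); ACTION(E, 3) = (3, 4); ACTION(E, +) = (+EE, 5); ACTION(E, *) = (*EE, 6)
• Rows 0, 1, 2, 3, +, *: pop in the column for the same symbol
• Row #: accept in column #
All blank entries are error.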
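The ACTION table and the stack loop that uses it can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `build_table` and `parse` are hypothetical names, productions are numbered in list order starting at 1, and tokens are separated by spaces. It uses the nine-production grammar from the earlier parsing example.

```python
# E -> (1)0 | (2)1 | (3)2 | (4)3 | (5)4 | (6)+EE | (7)-EE | (8)*EE | (9)/EE
PRODUCTIONS = [("E", list(rhs)) for rhs in
               ["0", "1", "2", "3", "4", "+EE", "-EE", "*EE", "/EE"]]

def build_table(productions):
    """ACTION(A, a) = (a alpha, i) for the ith production A -> a alpha."""
    table = {}
    for number, (lhs, rhs) in enumerate(productions, start=1):
        table[(lhs, rhs[0])] = (rhs, number)
    return table

def parse(text, productions=PRODUCTIONS, start="E"):
    """Return the list of production numbers output, or raise SyntaxError."""
    table = build_table(productions)
    tokens = text.split() + ["#"]        # '#' marks the end of the input
    stack = ["#", start]                 # '#' marks the bottom of the stack
    output, pos = [], 0
    while True:
        top, current = stack[-1], tokens[pos]
        if top == "#" and current == "#":
            return output                            # accept
        if (top, current) in table:
            rhs, number = table[(top, current)]
            stack.pop()
            stack.extend(reversed(rhs))              # leftmost symbol on top
            output.append(number)
        elif top == current and top != "#":
            stack.pop()                              # pop: terminal matched
            pos += 1
        else:
            raise SyntaxError(f"error at token {pos}: {current!r}")  # blank entry

print(parse("* + 2 3 4"))  # [8, 6, 3, 4, 5]
```

Calling `parse("2 * 3")` raises `SyntaxError`: after E → 2 consumes the 2, the stack holds only # while the input still holds * 3.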
Simple LL(1): More Restrictive than Necessary • Simple LL(1) grammars are very easy and efficient to parse but also very restrictive. • The good news: we can achieve the same desirable results without being so restrictive. • How? We only need to retain the restriction that single-symbol lookahead uniquely determines which rule to use.
Relaxing Simple LL(1) Restrictions
• Consider the following grammar, which is not simple LL(1):
E → (1) N | (2) OEE
O → (3) + | (4) *
N → (5) 0 | (6) 1 | (7) 2 | (8) 3
• What are the problem rules? (1) & (2)
• Observe that it is possible to distinguish between rules 1 and 2.
• N leads to {0, 1, 2, 3}
• O leads to {+, *}
• {0, 1, 2, 3} ∩ {+, *} = ∅
• Thus, if we see 0, 1, 2, or 3 we choose (1), and if we see + or *, we choose (2).
LL(1) Grammars
• FIRST(α) = {a | α ⇒* aβ and a ∈ VT}
• A grammar is LL(1) if for all rules of the form
A → α_1 | α_2 | … | α_n
the sets FIRST(α_1), FIRST(α_2), …, and FIRST(α_n) are pair-wise disjoint; that is,
FIRST(α_i) ∩ FIRST(α_j) = ∅ for i ≠ j
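The disjointness condition can be tested directly once FIRST is computable. The sketch below makes two simplifying assumptions that match the grammars in these slides but not grammars in general: no right-hand side is empty, and there is no left recursion. The names `first` and `is_ll1` are hypothetical.

```python
def first(grammar, symbol):
    """FIRST of a single symbol.
    Assumes no empty right-hand sides and no left recursion, so the first
    symbol of each alternative fully determines the result."""
    if symbol not in grammar:
        return {symbol}                      # a terminal starts with itself
    result = set()
    for rhs in grammar[symbol]:
        result |= first(grammar, rhs[0])
    return result

def is_ll1(grammar):
    """True if the FIRST sets of each non-terminal's alternatives are
    pair-wise disjoint."""
    for alternatives in grammar.values():
        seen = set()
        for rhs in alternatives:
            start = first(grammar, rhs[0])   # FIRST(alpha) under the assumptions
            if seen & start:
                return False
            seen |= start
    return True

GRAMMAR = {
    "E": [["N"], ["O", "E", "E"]],
    "O": [["+"], ["*"]],
    "N": [["0"], ["1"], ["2"], ["3"]],
}

print(sorted(first(GRAMMAR, "N")))  # ['0', '1', '2', '3']
print(sorted(first(GRAMMAR, "O")))  # ['*', '+']
print(is_ll1(GRAMMAR))              # True
```

Handling empty right-hand sides would additionally require FOLLOW sets, which these slides do not cover.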
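LL(1) Parse Table
E → (1) N | (2) OEE
O → (3) + | (4) *
N → (5) 0 | (6) 1 | (7) 2 | (8) 3
For (A, a), we select (α, i) if a ∈ FIRST(α) and α is the right-hand side of rule i.
Rows are stack symbols in V ∪ {#}; columns are input symbols in VT ∪ {#}.
• Row E: (N, 1) in columns 0, 1, 2, 3; (OEE, 2) in columns + and *
• Row O: (+, 3) in column +; (*, 4) in column *
• Row N: (0, 5) in column 0; (1, 6) in column 1; (2, 7) in column 2; (3, 8) in column 3
• Rows 0, 1, 2, 3, +, *: pop in the column for the same symbol
• Row #: accept in column #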
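Filling the table from FIRST sets and then running the usual stack loop can be sketched as follows. The names `build_ll1_table` and `ll1_parse` are hypothetical, and the code assumes no empty right-hand sides and no left recursion, which holds for this grammar.

```python
PRODUCTIONS = [                      # numbered 1..8 in list order
    ("E", ["N"]), ("E", ["O", "E", "E"]),
    ("O", ["+"]), ("O", ["*"]),
    ("N", ["0"]), ("N", ["1"]), ("N", ["2"]), ("N", ["3"]),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}

def first(symbol):
    """FIRST of one symbol (assumes no empty rules, no left recursion)."""
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for lhs, rhs in PRODUCTIONS:
        if lhs == symbol:
            result |= first(rhs[0])
    return result

def build_ll1_table():
    """For (A, a), select (alpha, i) if a is in FIRST(alpha)."""
    table = {}
    for number, (lhs, rhs) in enumerate(PRODUCTIONS, start=1):
        for terminal in first(rhs[0]):
            if (lhs, terminal) in table:
                raise ValueError("FIRST sets overlap: grammar is not LL(1)")
            table[(lhs, terminal)] = (rhs, number)
    return table

def ll1_parse(text, start="E"):
    """Return the production numbers output while parsing `text`."""
    table = build_ll1_table()
    tokens = text.split() + ["#"]
    stack, output, pos = ["#", start], [], 0
    while stack[-1] != "#" or tokens[pos] != "#":    # not yet accept
        top, current = stack[-1], tokens[pos]
        if (top, current) in table:
            rhs, number = table[(top, current)]
            stack.pop()
            stack.extend(reversed(rhs))
            output.append(number)
        elif top == current:
            stack.pop()                              # matched a terminal
            pos += 1
        else:
            raise SyntaxError(f"error at token {pos}: {current!r}")
    return output

print(ll1_parse("* + 1 2 3"))  # [2, 4, 2, 3, 1, 6, 1, 7, 1, 8]
```

The only difference from the simple LL(1) table is that one production may fill several columns of a row, one for each terminal in its FIRST set.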
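What does 2 4 2 3 1 6 1 7 1 8 mean?
E → (1) N | (2) OEE
O → (3) + | (4) *
N → (5) 0 | (6) 1 | (7) 2 | (8) 3
Starting from E, the sequence 2 4 2 3 1 6 1 7 1 8 defines a parse tree via a preorder traversal.
[Figure: parse tree rooted at E, with nodes labeled (2) OEE, (4) *, (2) OEE, (3) +, (1) N, (6) 1, (1) N, (7) 2, (1) N, (8) 3]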
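Because the output lists productions in preorder, the parse tree can be rebuilt by expanding the leftmost unexpanded non-terminal with the next number in the sequence. A minimal sketch, with `build_tree` and `leaves` as hypothetical helper names and trees represented as nested `(symbol, children)` tuples:

```python
PRODUCTIONS = {
    1: ("E", ["N"]), 2: ("E", ["O", "E", "E"]),
    3: ("O", ["+"]), 4: ("O", ["*"]),
    5: ("N", ["0"]), 6: ("N", ["1"]), 7: ("N", ["2"]), 8: ("N", ["3"]),
}
NONTERMINALS = {"E", "O", "N"}

def build_tree(sequence, start="E"):
    """Rebuild the parse tree from a preorder list of production numbers."""
    numbers = iter(sequence)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol                          # terminal leaf
        lhs, rhs = PRODUCTIONS[next(numbers)]
        if lhs != symbol:
            raise ValueError(f"production for {lhs} used where {symbol} expected")
        # Children are expanded left to right, which is exactly preorder.
        return (symbol, [expand(child) for child in rhs])

    tree = expand(start)
    if next(numbers, None) is not None:
        raise ValueError("unused production numbers remain")
    return tree

def leaves(tree):
    """Read the terminals off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1] for leaf in leaves(child)]

tree = build_tree([2, 4, 2, 3, 1, 6, 1, 7, 1, 8])
print(" ".join(leaves(tree)))  # * + 1 2 3
```

Reading the leaves back gives the sentence that was parsed, so the number sequence carries the same information as the tree.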