This document covers top-down parsing methods such as recursive descent and predictive parsing. It emphasizes finding leftmost derivations so that the parse tree is built left to right, consistent with scanning the input. It also addresses error recovery and backtracking, and how backtracking can be eliminated by left factoring and by removing left recursion from the grammar. Finally, it discusses transition diagrams for visualizing parser operation and the steps for simplifying them.
Top-Down Parsing
• Identify a leftmost derivation for an input string
• Why? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed to develop the parse tree in a left-to-right fashion that is consistent with scanning the input.
• Example: A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c - accept!)
• Recursive-descent parsing concepts
• Predictive parsing
  • Recursive / brute-force technique
  • Non-recursive / table-driven
• Error recovery
• Implementation
Top-Down Parsing • From Grammar to Parser, take I
Recursive Descent Parsing
• General category of top-down parsing
• Choose a production rule based on the input symbol
• May require backtracking to correct a wrong choice
• Example: S → c A d, A → ab | a, input: cad
[Figure: parse trees for input cad. S is expanded to c A d. A is first expanded to a b, which fails against the input (Problem: backtrack). A is then expanded to a, which matches.]
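The backtracking in this example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the input is a plain string of one-character tokens, and the function names `parse_S` and `parse_A` are made up for this sketch.

```python
# Grammar from the slide: S -> c A d,  A -> a b | a
# Alternatives for A are tried in the order they are written.
A_ALTERNATIVES = ["ab", "a"]

def parse_A(text, pos):
    """Yield every position at which A could end when it starts at pos."""
    for alternative in A_ALTERNATIVES:
        if text.startswith(alternative, pos):
            yield pos + len(alternative)

def parse_S(text):
    """Return True if text is derived from S, backtracking over A's alternatives."""
    if not text.startswith("c"):
        return False
    for end_of_A in parse_A(text, 1):
        # If this check fails, the loop backtracks to the next alternative of A.
        if text.startswith("d", end_of_A) and end_of_A + 1 == len(text):
            return True
    return False

print(parse_S("cad"))   # True: A -> ab fails, A -> a succeeds
print(parse_S("cabd"))  # True: A -> ab succeeds
print(parse_S("cab"))   # False: no alternative of A is followed by d
```

The generator makes the cost of backtracking visible: every alternative of A may have to be tried before the parser can decide.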
Top-Down Parsing • From Grammar to Parser, take II
Predictive Parsing
• Backtracking is bad!
• To eliminate backtracking, what must we ensure about the grammar?
  • No left recursion
  • Apply left factoring
• When the grammar satisfies these conditions, the current input symbol together with the current non-terminal (frequently) uniquely determines the production that needs to be applied.
• Utilize transition diagrams. For each non-terminal of the grammar:
  1. Create an initial and a final state
  2. If A → X1X2…Xn is a production, add a path from the initial to the final state with edges X1, X2, …, Xn
• Once the transition diagrams have been developed, turn them into code in a straightforward way: one procedure per diagram, possibly recursive.
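As a rough sketch of the "no left recursion" condition, the function below rewrites an immediately left-recursive rule A → Aα | β into A → βA’, A’ → αA’ | ε. The function name, the list-of-symbols representation, and the sample rule E → E + T | T are assumptions made for this illustration; the sketch also assumes that the non-recursive alternatives are not ε.

```python
EPSILON = "ε"

def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    Each alternative is a list of grammar symbols. Returns a dict that maps
    each resulting non-terminal to its list of alternatives.
    """
    # The α parts: what follows the left-recursive occurrence of A.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The β parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    new_name = nonterminal + "'"
    return {
        nonterminal: [beta + [new_name] for beta in others],
        new_name: [alpha + [new_name] for alpha in recursive] + [[EPSILON]],
    }

result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], ['ε']]}
```

The result has the same shape as the E and E’ rules of the expression grammar used in the rest of these slides.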
Transition Diagrams
• Recall the earlier grammar and its associated transition diagrams:
  E → TE’
  E’ → +TE’ | ε
  T → FT’
  T’ → *FT’ | ε
  F → ( E ) | id
[Figure: transition diagrams. E: states 0-2 with edges T, E’. E’: states 3-6 with edges +, T, E’ and an ε edge. T: states 7-9 with edges F, T’. T’: states 10-13 with edges *, F, T’ and an ε edge. F: states 14-17 with edges (, E, ) and id.]
• Unlike their lexical equivalents, each edge represents a token
• A transition implies: if the edge is a token, match the input; otherwise call the procedure for the non-terminal
• How are transition diagrams used? Are ε-moves a problem? Can we simplify transition diagrams? Why is simplification critical?
How are Transition Diagrams Used?
main() { TD_E(); }
TD_E() { TD_T(); TD_E’(); }
TD_E’() { token = get_token(); if token = ‘+’ then { TD_T(); TD_E’(); } }
TD_T() { TD_F(); TD_T’(); }
TD_T’() { token = get_token(); if token = ‘*’ then { TD_F(); TD_T’(); } }
TD_F() { token = get_token(); if token = ‘(’ then { TD_E(); match(‘)’); } else if token.value <> id then { error + EXIT } else ... }
What happened to the ε-moves? In TD_E’ and TD_T’ they become “else unget() and terminate”: the token is pushed back and the procedure returns.
NOTE: not all error conditions have been represented.
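The procedures above can be sketched as a runnable Python recognizer. This is an illustration under assumptions, not the slides' own implementation: tokens are given as a list of strings, a one-token lookahead (`peek`) replaces the get_token/unget pair, and the names `Parser`, `ParseError`, and `accepts` are invented for this sketch.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""

class Parser:
    """Recursive-descent recognizer for
    E -> T E',  E' -> + T E' | ε,  T -> F T',  T' -> * F T' | ε,  F -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):
        if self.peek() == "+":
            self.match("+")
            self.parse_T()
            self.parse_E_prime()
        # Otherwise take the ε-move: return without consuming input.

    def parse_T(self):
        self.parse_F()
        self.parse_T_prime()

    def parse_T_prime(self):
        if self.peek() == "*":
            self.match("*")
            self.parse_F()
            self.parse_T_prime()
        # Otherwise take the ε-move.

    def parse_F(self):
        if self.peek() == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.parse_E()
        parser.match("$")
        return True
    except ParseError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))  # True
print(accepts(["id", "+"]))                   # False
```

Peeking at the next token instead of reading and pushing it back is a design choice of this sketch; it plays the same role as the "else unget() and terminate" branch.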
How can Transition Diagrams be Simplified?
[Figure sequence, steps (1) to (5): starting from the diagram for E’ (states 3, 4, 5, 6; edges +, T, E’), the diagram is simplified step by step, and the result is then combined with the diagram for E (states 0, 1, 2; edges T, E’). The last step shows a single diagram for E with edges T and +.]
Additional Transition Diagram Simplifications
• Similar steps for T and T’
• Simplified transition diagrams:
[Figure: simplified diagrams. T’: states 10, 11, 13 with edges * and F. T: states 7, 10, 13 with edges F and *. F: states 14-17 with edges (, E, ) and id.]
• Why is simplification important? How does the code change?
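One plausible answer to "How does the code change?" is that the recursive calls to E’ and T’ turn into loops, so only three procedures remain. The sketch below assumes that reading of the simplified diagrams (E as T followed by any number of "+ T", T as F followed by any number of "* F"); the function names are invented for the illustration.

```python
def parse_expression(tokens):
    """Recognize a token list using the simplified diagrams:
    E = T (+ T)*,  T = F (* F)*,  F = ( E ) | id.
    Returns True on success and raises SyntaxError otherwise.
    """
    tokens = list(tokens) + ["$"]  # "$" marks the end of the input
    pos = 0

    def expect(token):
        nonlocal pos
        if tokens[pos] != token:
            raise SyntaxError(f"expected {token!r}, found {tokens[pos]!r}")
        pos += 1

    def factor():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            expression()
            expect(")")
        else:
            expect("id")

    def term():
        nonlocal pos
        factor()
        while tokens[pos] == "*":  # loop replaces the recursive call to T'
            pos += 1
            factor()

    def expression():
        nonlocal pos
        term()
        while tokens[pos] == "+":  # loop replaces the recursive call to E'
            pos += 1
            term()

    expression()
    expect("$")
    return True

print(parse_expression(["id", "+", "id", "*", "id"]))  # True
```

Fewer procedures and no tail recursion mean fewer calls per token, which is one reason the simplification matters in practice.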
Top-Down Parsing • From Grammar to Parser, take III
Motivating Table-Driven Parsing
1. Scan the input left to right
2. Find a leftmost derivation
Grammar: E → TE’, E’ → +TE’ | ε, T → id
Input: id + id $ ($ is the terminator)
Derivation: E
Processing Stack:
Non-Recursive / Table Driven
[Figure: the predictive parsing program reads the input (string + terminator, e.g. a + b $), uses a stack (X Y Z $, where $ is the empty-stack symbol) holding non-terminal and terminal symbols of the CFG, consults the parsing table M[A,a] to decide what action to take based on stack and input, and produces output.]
• General parser behavior, with X the top of the stack and a the current input:
  1. When X = a = $: halt, accept, success
  2. When X = a ≠ $: POP X off the stack, advance the input, go to 1
  3. When X is a non-terminal, examine M[X,a]:
     • if it is an error, call the recovery routine
     • if M[X,a] = {X → UVW}, POP X, PUSH W, V, U (so U ends up on top)
     • DO NOT expend any input
Algorithm for Non-Recursive Parsing
Set ip (the input pointer) to point to the first symbol of w$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then pop X from the stack and advance ip
    else error()
  else /* X is a non-terminal */
    if M[X,a] = X → Y1Y2…Yk then begin
      pop X from the stack;
      push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
      output the production X → Y1Y2…Yk
        /* may also execute other code based on the production used */
    end
    else error()
until X = $ /* stack is empty */
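Example
Our well-worn example! Grammar: E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → ( E ) | id
Table M (rows: non-terminals, columns: input symbols, - marks an error entry):
Non-terminal | id | + | * | ( | ) | $
E | E → TE’ | - | - | E → TE’ | - | -
E’ | - | E’ → +TE’ | - | - | E’ → ε | E’ → ε
T | T → FT’ | - | - | T → FT’ | - | -
T’ | - | T’ → ε | T’ → *FT’ | - | T’ → ε | T’ → ε
F | F → id | - | - | F → ( E ) | - | -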
Trace of Example
STACK | INPUT | OUTPUT
$E | id + id * id$ |
$E’T | id + id * id$ | E → TE’
$E’T’F | id + id * id$ | T → FT’
$E’T’id | id + id * id$ | F → id
$E’T’ | + id * id$ |
$E’ | + id * id$ | T’ → ε
$E’T+ | + id * id$ | E’ → +TE’
$E’T | id * id$ |
$E’T’F | id * id$ | T → FT’
$E’T’id | id * id$ | F → id
$E’T’ | * id$ |
$E’T’F* | * id$ | T’ → *FT’
$E’T’F | id$ |
$E’T’id | id$ | F → id
$E’T’ | $ |
$E’ | $ | T’ → ε
$ | $ | E’ → ε
Input is expended only in the steps where a terminal on top of the stack matches the current input symbol.
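The algorithm, the table M, and the trace can be tied together in a short Python sketch. It assumes that tokens arrive as a list of strings and that a missing table entry means an error; the names `M`, `NONTERMINALS`, and `predictive_parse` are choices made for this illustration.

```python
EPS = "ε"

# Parsing table M for the expression grammar; a missing key is an error entry.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS], ("T'", "$"): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    w = list(tokens) + ["$"]
    stack = ["$", start]          # the top of the stack is the end of the list
    ip = 0
    output = []
    while True:
        X, a = stack[-1], w[ip]
        if X == "$" and a == "$":
            return output         # accept
        if X not in NONTERMINALS: # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1               # expend one input symbol
        else:
            body = M.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            # Push the body in reverse so its first symbol ends up on top.
            stack.extend(reversed([s for s in body if s != EPS]))
            output.append((X, body))

for head, body in predictive_parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body))
```

For the input id + id * id the printed productions follow the OUTPUT column of the trace, from E → TE’ down to E’ → ε.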
Leftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E ⇒ TE’ ⇒ FT’E’ ⇒ id T’E’ ⇒ id E’ ⇒ id + TE’ ⇒ id + FT’E’ ⇒ id + id T’E’ ⇒ id + id * FT’E’ ⇒ id + id * id T’E’ ⇒ id + id * id E’ ⇒ id + id * id
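The derivation can be rebuilt mechanically from the productions in the OUTPUT column of the trace, because each one always applies to the leftmost non-terminal. The sketch below assumes that the productions are given as (head, body) pairs; the function name `leftmost_derivation` is invented for this illustration.

```python
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Productions in the order the parser outputs them for id + id * id.
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("T", ["F", "T'"]), ("F", ["id"]), ("T'", [EPS]),
    ("E'", ["+", "T", "E'"]), ("T", ["F", "T'"]), ("F", ["id"]),
    ("T'", ["*", "F", "T'"]), ("F", ["id"]), ("T'", [EPS]), ("E'", [EPS]),
]

def leftmost_derivation(start, productions, nonterminals):
    """Apply each production to the leftmost non-terminal and
    return the list of sentential forms as strings."""
    form = [start]
    steps = [" ".join(form)]
    for head, body in productions:
        index = next(i for i, s in enumerate(form) if s in nonterminals)
        if form[index] != head:
            raise ValueError(f"{head} is not the leftmost non-terminal")
        # Replace the non-terminal by the production body (ε vanishes).
        form[index:index + 1] = [s for s in body if s != EPS]
        steps.append(" ".join(form) if form else EPS)
    return steps

print(" ⇒ ".join(leftmost_derivation("E", PRODUCTIONS, NONTERMINALS)))
```

The printed chain has twelve sentential forms and ends in id + id * id, matching the derivation above.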
What’s the Missing Puzzle Piece?
Constructing the Parsing Table M!
1st: Calculate First & Follow for the grammar
2nd: Apply the construction algorithm for the parsing table (we’ll see this shortly)
Basic Tools:
First: Let α be a string of grammar symbols. First(α) is the set of all terminals that appear leftmost in α or in any string derived from α. NOTE: If α ⇒* ε, then ε is in First(α).
Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β). NOTE: If S ⇒* αA, then $ is in Follow(A).
Motivation Behind First & Follow
First: used to help find the appropriate production to apply, given the top-of-the-stack non-terminal and the current input symbol.
Example: If A → α and a is in First(α), then when a is the input, replace A with α on the stack. (a is one of the first symbols of α, so when A is on the stack and a is the input, POP A and PUSH α.)
Follow: used when First has a conflict, to resolve choices, or when First gives no suggestion. When α = ε or α ⇒* ε, what follows A dictates the next choice to be made.
Example: If A → α and b is in Follow(A), then when α ⇒* ε and b is the input symbol, we expand A with α, which will eventually expand to ε, after which b follows. (α ⇒* ε means that First(α) contains ε.)
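The two rules above are enough to fill in a parsing table. The sketch below applies them to the expression grammar. The First sets of the production bodies and the Follow sets are written out by hand here as assumptions (they are the usual textbook values for this grammar, not computed by the code), and the name `build_table` is invented for this illustration.

```python
EPS = "ε"

GRAMMAR = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", [EPS]),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", [EPS]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]

# Assumed inputs: First of each production body (same order as GRAMMAR)
# and Follow of each non-terminal, both written out by hand.
FIRST_OF_BODY = [
    {"(", "id"}, {"+"}, {EPS}, {"(", "id"}, {"*"}, {EPS}, {"("}, {"id"},
]
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def build_table(grammar, first_of_body, follow):
    """For A -> α: enter it under every a in First(α); if ε is in First(α),
    also enter it under every b in Follow(A)."""
    table = {}
    for (head, body), first in zip(grammar, first_of_body):
        lookaheads = first - {EPS}
        if EPS in first:
            lookaheads |= follow[head]
        for a in lookaheads:
            if (head, a) in table:
                raise ValueError(f"conflict at M[{head}, {a}]")
            table[(head, a)] = body
    return table

table = build_table(GRAMMAR, FIRST_OF_BODY, FOLLOW)
print(table[("E'", "+")])  # ['+', 'T', "E'"]  (chosen by First)
print(table[("E'", ")")])  # ['ε']             (chosen by Follow)
```

A conflict raised by this sketch would mean that the lookahead does not uniquely determine the production for that grammar.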
An example.
Grammar: S → a B C d, B → CB | ε | S a, C → b
STACK | INPUT | OUTPUT
$S | abbd$ |
Computing First(X): All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal and X → Y1Y2…Yk is a production rule:
   • Place First(Y1) in First(X)
   • if Y1 ⇒* ε, place First(Y2) in First(X)
   • if Y2 ⇒* ε, place First(Y3) in First(X)
   • …
   • if Yk-1 ⇒* ε, place First(Yk) in First(X)
   • NOTE: As soon as some Yi does not derive ε, stop.
Repeat the above steps until no more elements are added to any First( ) set.
Checking “Yj ⇒* ε?” essentially amounts to checking whether ε belongs to First(Yj).
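The three rules and the "repeat until nothing changes" loop can be sketched as a fixed-point computation. The representation is an assumption of this sketch: the grammar is a dict from non-terminal to a list of alternatives, each alternative is a list of symbols, and an ε-production is written as `[EPS]`. ε itself is added to First(X) only when every symbol of an alternative can derive ε.

```python
EPS = "ε"

def compute_first(grammar, terminals):
    """Return a dict that maps every grammar symbol to its First set."""
    first = {t: {t} for t in terminals}          # rule 1
    for nonterminal in grammar:
        first[nonterminal] = set()
    changed = True
    while changed:                               # repeat until no set grows
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                before = len(first[head])
                if body == [EPS]:                # rule 2
                    first[head].add(EPS)
                else:                            # rule 3
                    all_derive_eps = True
                    for symbol in body:
                        first[head] |= first[symbol] - {EPS}
                        if EPS not in first[symbol]:
                            all_derive_eps = False
                            break                # stop at the first such Yi
                    if all_derive_eps:
                        first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

expression_grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}
first = compute_first(expression_grammar, {"+", "*", "(", ")", "id"})
print(sorted(first["E"]))   # ['(', 'id']
print(sorted(first["E'"]))  # ['+', 'ε']
```

Several passes are needed for this grammar because First(E) depends on First(T), which in turn depends on First(F).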
Computing First(X): All Grammar Symbols - continued
Informally, suppose we want to compute
First(X1 X2 … Xn) = First(X1)
  “+” First(X2) if ε is in First(X1)
  “+” First(X3) if ε is in First(X2)
  “+” …
  “+” First(Xn) if ε is in First(Xn-1)
Note 1: Only add ε to First(X1 X2 … Xn) if ε is in First(Xi) for all i
Note 2: For First(X1), if X1 → Z1 Z2 … Zm, then we need to compute First(Z1 Z2 … Zm)!
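The informal rule for a string of symbols can be sketched as a short loop. The First sets of the individual symbols are taken as given; the values below are those of the expression grammar, written out by hand as an assumption of this sketch, and the function name `first_of_string` is invented for the illustration.

```python
EPS = "ε"

# First sets of single symbols for the expression grammar (assumed as given).
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
    "T'": {"*", EPS}, "F": {"(", "id"},
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
}

def first_of_string(symbols, first):
    """Compute First(X1 X2 ... Xn) from the First sets of the single symbols."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result   # this symbol cannot vanish, so stop here
    result.add(EPS)         # Note 1: every Xi can derive ε
    return result

print(sorted(first_of_string(["T", "E'"], FIRST)))   # ['(', 'id']
print(sorted(first_of_string(["E'", "T'"], FIRST)))  # ['*', '+', 'ε']
```

The second call shows Note 1 in action: ε is in the result only because both E’ and T’ can derive ε.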
Example 1
Given the production rules:
S → i E t S S’ | a
S’ → e S | ε
E → b
Verify that
First(S) = { i, a }
First(S’) = { e, ε }
First(E) = { b }
Example 2
E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → ( E ) | id
Computing First for:
• First(E) = First(TE’) = First(T). First(E’) is not added, since T does not derive ε.
• First(T) = First(FT’) = First(F). First(T’) is not added, since F does not derive ε.
• First(F) = First((E)) “+” First(id) = { (, id }
Overall:
First(E) = First(T) = First(F) = { (, id }
First(E’) = { +, ε }
First(T’) = { *, ε }