Chapter 9

Chapter 9 Syntax Analysis SEG2101 Chapter 9

Contents • Context free grammars • Top-down parsing • Bottom-up parsing • Attribute grammars • Dynamic semantics • Tools for syntax analysis • Chomsky’s hierarchy

The Role of Parser

9.1: Context Free Grammars • A context free grammar consists of terminals, nonterminals, a start symbol, and productions. • Terminals are the basic symbols from which strings are formed. • Nonterminals are syntactic variables that denote sets of strings. • One nonterminal is distinguished as the start symbol. • The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. • A language that can be generated by a grammar is said to be a context-free language.
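
A minimal sketch of how the four components of a context free grammar might be held in a program. The `Grammar` class, its field names, and the `validate` helper are illustrative assumptions, not part of the course material. The example instance encodes the expression grammar E → E+E | E*E | (E) | -E | id.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    terminals: frozenset      # basic symbols from which strings are formed
    nonterminals: frozenset   # syntactic variables denoting sets of strings
    start: str                # the distinguished nonterminal
    productions: dict         # nonterminal -> list of right-hand sides (tuples)

    def validate(self):
        """Check that the four components are mutually consistent."""
        assert self.start in self.nonterminals
        assert not (self.terminals & self.nonterminals)
        symbols = self.terminals | self.nonterminals
        for head, bodies in self.productions.items():
            assert head in self.nonterminals
            for body in bodies:
                assert all(symbol in symbols for symbol in body)
        return True


expr_grammar = Grammar(
    terminals=frozenset({"+", "*", "(", ")", "-", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions={
        "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("-", "E"), ("id",)],
    },
)
expr_grammar.validate()
```

Storing each right-hand side as a tuple of symbols keeps multi-character terminals such as id intact; an empty tuple would stand for ε.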

Example of Grammar

Notational Conventions • Aho p. 166 • Example p. 167: E → E A E | (E) | -E | id, A → + | - | * | / | ↑

Derivations • E ⇒ -E is read “E derives -E” • E ⇒ -E ⇒ -(E) ⇒ -(id) is called a derivation of -(id) from E. • If A → γ is a production and α and β are arbitrary strings of grammar symbols, we say αAβ ⇒ αγβ. • If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn.

Derivations (II) • ⇒ means “derives in one step.” • ⇒* means “derives in zero or more steps.” • α ⇒* α for any string α • If α ⇒* β and β ⇒ γ, then α ⇒* γ • ⇒+ means “derives in one or more steps.” • If S ⇒* α, where α may contain nonterminals, then we say that α is a sentential form.

Derivations (III) • G: grammar, S: start symbol, L(G): the language generated by G. • Strings in L(G) may contain only terminal symbols of G. • A string of terminals w is said to be in L(G) if and only if S ⇒+ w. • The string w is called a sentence of G. • A language that can be generated by a grammar is said to be a context-free language. • If two grammars generate the same language, the grammars are said to be equivalent.

Derivations (IV) • E → E A E | (E) | -E | id, A → + | - | * | / | ↑ • The string -(id+id) is a sentence of the above grammar because E ⇒ -E ⇒ -(E) ⇒ -(E A E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id). • We write E ⇒* -(id+id).
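
The derivation of -(id+id) can be replayed mechanically. The sketch below is only an illustration: the `derive` helper and the list-of-symbols encoding are assumptions, and only four of the operator alternatives for A are listed because that is all this example needs. Each step rewrites the leftmost nonterminal, so the sequence is a leftmost derivation.

```python
# Grammar E -> E A E | (E) | -E | id and A -> + | - | * | /
GRAMMAR = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"]],
}


def derive(form, position, body):
    """Apply one derivation step: rewrite the nonterminal at `position` with `body`."""
    head = form[position]
    if body not in GRAMMAR.get(head, []):
        raise ValueError(f"{head} -> {' '.join(body)} is not a production")
    return form[:position] + body + form[position + 1:]


steps = [
    (0, ["-", "E"]),        # E => -E
    (1, ["(", "E", ")"]),   # => -(E)
    (2, ["E", "A", "E"]),   # => -(E A E)
    (2, ["id"]),            # => -(id A E)
    (3, ["+"]),             # => -(id + E)
    (4, ["id"]),            # => -(id + id)
]

form = ["E"]
history = [form]
for position, body in steps:
    form = derive(form, position, body)
    history.append(form)

for sentential_form in history:
    print(" ".join(sentential_form))
```

Every intermediate list in `history` is a sentential form; only the last one consists solely of terminals and is therefore a sentence.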

Parse Tree • E → E+E | E*E | (E) | -E | id

Parse Tree (II)

Two Parse Trees

Ambiguity • A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
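
One way to see ambiguity concretely is to count parse trees. The sketch below assumes the fragment E → E+E | E*E | id of the expression grammar (parentheses and unary minus are left out to keep it short), and the function name `count_parse_trees` is hypothetical. It picks each operator in turn as the root of the tree and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct parse trees deriving tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator applied at the root of the tree.
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens)) if tokens else 0


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.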

Eliminating Ambiguity • Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. • E.g. “match each else with the closest unmatched then”

Eliminating Left Recursion • A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α. • A → Aα | β can be replaced by A → βA’, A’ → αA’ | ε • A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn can be replaced by A → β1A’ | β2A’ | … | βnA’, A’ → α1A’ | α2A’ | … | αmA’ | ε
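
The rewrite for immediate left recursion can be sketched as a small function. The name `eliminate_immediate_left_recursion` and the list-of-symbols encoding (with `[]` standing for ε) are assumptions made for illustration, and the grammar E → E + T | T is just a familiar test case. The sketch also assumes there is no production of the form A → A.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    `bodies` is a list of right-hand sides, each a list of symbols; [] is epsilon.
    Returns a dict mapping each nonterminal to its new right-hand sides.
    """
    # The alpha parts: what follows A in each left-recursive alternative.
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    # The beta parts: alternatives that do not start with A.
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: bodies}
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for nonterminal, alternatives in result.items():
    print(nonterminal, "->", " | ".join(" ".join(b) or "ε" for b in alternatives))
```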

Algorithm: Eliminating Left Recursion

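Examples • Original grammar: S → Aa | b, A → Ac | Sd | ε • After substituting the S-productions into A → Sd: A → Ac | Aad | bd | ε • After eliminating the immediate left recursion: S → Aa | b, A → bdA’ | A’, A’ → cA’ | adA’ | ε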
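
The example can be reproduced with a sketch of the general procedure: fix an order on the nonterminals, substitute earlier nonterminals that appear at the start of a right-hand side, then remove immediate left recursion. The function name and encoding (`[]` for ε) are assumptions. The usual textbook statement of this procedure requires a grammar without cycles or ε-productions; this example has an ε-production and still works out.

```python
def eliminate_left_recursion(grammar, order):
    """Return a copy of `grammar` without left recursion (sketch, see caveats above)."""
    g = {head: [list(body) for body in bodies] for head, bodies in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for body in g[a_i]:
                if body and body[0] == a_j:
                    # Replace A_i -> A_j gamma by A_i -> delta gamma for each A_j -> delta.
                    expanded += [delta + body[1:] for delta in g[a_j]]
                else:
                    expanded.append(body)
            g[a_i] = expanded
        # Remove immediate left recursion among the A_i productions.
        recursive = [body[1:] for body in g[a_i] if body and body[0] == a_i]
        if recursive:
            others = [body for body in g[a_i] if not body or body[0] != a_i]
            new_head = a_i + "'"
            g[a_i] = [beta + [new_head] for beta in others]
            g[new_head] = [alpha + [new_head] for alpha in recursive] + [[]]
    return g


grammar = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for nonterminal, alternatives in result.items():
    print(nonterminal, "->", " | ".join(" ".join(b) or "ε" for b in alternatives))
```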

Left Factoring • Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. • The basic idea is that when it is not clear which of two alternative productions to use to expand a nonterminal A, we may be able to rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice. • stmt → if expr then stmt else stmt | if expr then stmt

Algorithm: Left Factoring

Left Factoring (example p. 178) • A → αβ1 | αβ2 • The following grammar abstracts the dangling-else problem: • S → iEtS | iEtSeS | a • E → b
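
A single left-factoring step can be sketched as follows, applied to the S-productions of the dangling-else grammar. The function name `left_factor_once` and the encoding (`[]` for ε) are assumptions. The function factors out the longest prefix shared by at least two alternatives; a full transformation would repeat the step until no two alternatives share a prefix.

```python
def left_factor_once(head, bodies):
    """Factor out the longest prefix shared by two or more alternatives of `head`."""
    best = []
    for i, first in enumerate(bodies):
        for second in bodies[i + 1:]:
            prefix = []
            for x, y in zip(first, second):
                if x != y:
                    break
                prefix.append(x)
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {head: bodies}  # nothing to factor
    new_head = head + "'"
    n = len(best)
    factored = [body[n:] for body in bodies if body[:n] == best]
    rest = [body for body in bodies if body[:n] != best]
    return {head: [best + [new_head]] + rest, new_head: factored}


s_bodies = [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]]
result = left_factor_once("S", s_bodies)
for nonterminal, alternatives in result.items():
    print(nonterminal, "->", " | ".join(" ".join(b) or "ε" for b in alternatives))
```

The result is S → iEtSS’ | a with S’ → ε | eS, so the choice between the two if-forms is postponed until the parser has seen whether an e follows.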

9.2: Top Down Parsing • Recursive-descent parsing • Predictive parsers • Nonrecursive predictive parsing • FIRST and FOLLOW • Construction of predictive parsing table • LL(1) grammars • Error recovery in predictive parsing (if time permits)

Recursive-Descent Parsing • Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. • It can also be viewed as an attempt to construct a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder. • Grammar: S → cAd, A → ab | a • Input string w = cad

Predictive Parsers • By carefully writing a grammar, eliminating left recursion, and left factoring the resulting grammar, we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser. • S → cAd, A → aA’, A’ → b | ε

Predictive Parser (II) • Recursive-descent parsing is a top-down method of syntax analysis in which we execute a set of recursive procedures to process the input. • A procedure is associated with each nonterminal of a grammar. • Predictive parsing is the special case in which the look-ahead symbol unambiguously determines the procedure selected for each nonterminal. • The sequence of procedures called in processing the input implicitly defines a parse tree for the input.
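
A minimal sketch of a predictive recursive-descent parser for the grammar S → cAd, A → aA’, A’ → b | ε, with one procedure per nonterminal. The class and method names are assumptions made for illustration. Each input character is treated as one token, and the ε-alternative of A’ is chosen only when the lookahead is d, the one terminal that can follow A’ in this grammar.

```python
class ParseError(Exception):
    pass


class PredictiveParser:
    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # "$" marks the end of the input
        self.pos = 0
        self.trace = []                   # productions used, in call order

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_S(self):
        self.trace.append("S -> c A d")
        self.match("c")
        self.parse_A()
        self.match("d")

    def parse_A(self):
        self.trace.append("A -> a A'")
        self.match("a")
        self.parse_A_prime()

    def parse_A_prime(self):
        # The lookahead alone decides which alternative of A' to use.
        if self.lookahead == "b":
            self.trace.append("A' -> b")
            self.match("b")
        elif self.lookahead == "d":
            self.trace.append("A' -> epsilon")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} while parsing A'")

    def parse(self):
        self.parse_S()
        self.match("$")
        return self.trace


print(PredictiveParser("cad").parse())
```

The list returned by `parse` records the procedure calls, which is the sense in which the call sequence implicitly defines the parse tree.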

Nonrecursive predictive parsing

Predictive Parsing Program

Parsing Table M • Grammar: [not captured in this transcript] • Input: id + id * id

Moves Made by Predictive Parser
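
A table-driven (nonrecursive) predictive parser can be sketched with an explicit stack. The grammar and table on the slides are not captured in this transcript, so the sketch assumes the common textbook expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id and a hand-written table for it. `TABLE`, `NONTERMINALS`, and `predictive_parse` are hypothetical names.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[A, a] -> right-hand side to expand; [] is epsilon, a missing key is an error entry.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        a = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            output.append((top, body))
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == a:
            pos += 1                      # terminal (or $) matches the input
        else:
            raise SyntaxError(f"expected {top!r}, found {a!r}")
    return output


moves = predictive_parse(["id", "+", "id", "*", "id"])
for head, body in moves:
    print(head, "->", " ".join(body) or "ε")
```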

FIRST and FOLLOW • If α is any string of grammar symbols, FIRST(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α). • FOLLOW(A), for nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form, i.e. the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β. • If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).

Compute FIRST(X)

Compute FOLLOW(A)
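
One common way to compute FIRST and FOLLOW is to apply their defining rules repeatedly until no set grows. The sketch below takes that fixed-point approach; it is not necessarily the formulation shown on the slides. The function names, the `[]`-for-ε encoding, and the sample expression grammar are assumptions.

```python
EPS = "ε"
END = "$"


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST sets for single symbols."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)  # every symbol (or the empty string) can derive epsilon
    return result


def compute_first_follow(grammar, start):
    nonterminals = set(grammar)
    terminals = {s for bodies in grammar.values() for b in bodies for s in b} - nonterminals
    first = {t: {t} for t in terminals}
    first.update({n: set() for n in nonterminals})
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)
    changed = True
    while changed:  # repeat until neither FIRST nor FOLLOW grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of_string(body, first)
                changed |= len(first[head]) != before
                for i, symbol in enumerate(body):
                    if symbol not in nonterminals:
                        continue
                    tail = first_of_string(body[i + 1:], first)
                    before = len(follow[symbol])
                    follow[symbol] |= tail - {EPS}
                    if EPS in tail:  # symbol can end the body
                        follow[symbol] |= follow[head]
                    changed |= len(follow[symbol]) != before
    return first, follow


GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first, follow = compute_first_follow(GRAMMAR, "E")
print(sorted(first["E"]), sorted(follow["F"]))
```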

Construction of Predictive Parsing Tables

Example of Producing Parsing Table

LL(1) Grammars • A grammar whose parsing table has no multiply-defined entries is said to be LL(1). • First L: scanning the input from left to right • Second L: producing a leftmost derivation • 1: using one input symbol of lookahead at each step to make parsing action decisions.

Properties of LL(1) • No ambiguous or left recursive grammar can be LL(1). • Grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following hold: • For no terminal a do both α and β derive strings beginning with a: FIRST(α) ∩ FIRST(β) = ∅. • At most one of α and β can derive the empty string. • If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A): FIRST(α FOLLOW(A)) ∩ FIRST(β FOLLOW(A)) = ∅.
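
These conditions can be checked by building the parsing table and looking for multiply-defined entries. The sketch below does so for the left-factored dangling-else grammar S → iEtSS’ | a, S’ → eS | ε, E → b. The FIRST and FOLLOW sets are supplied by hand as assumptions rather than computed, and all names are hypothetical.

```python
from collections import defaultdict

EPS = "ε"


def build_parsing_table(productions, first_of_body, follow):
    """Place A -> alpha in M[A, a] for a in FIRST(alpha), and in M[A, b] for
    b in FOLLOW(A) when alpha can derive epsilon."""
    table = defaultdict(list)
    for head, body in productions:
        firsts = first_of_body[head, body]
        for a in firsts - {EPS}:
            table[head, a].append(body)
        if EPS in firsts:
            for b in follow[head]:
                table[head, b].append(body)
    return dict(table)


def is_ll1(table):
    # LL(1) means no table entry holds more than one production.
    return all(len(entries) == 1 for entries in table.values())


PRODUCTIONS = [
    ("S", ("i", "E", "t", "S", "S'")), ("S", ("a",)),
    ("S'", ("e", "S")), ("S'", ()),
    ("E", ("b",)),
]
# Hand-supplied FIRST of each right-hand side and FOLLOW of each nonterminal.
FIRST_OF_BODY = {
    ("S", ("i", "E", "t", "S", "S'")): {"i"},
    ("S", ("a",)): {"a"},
    ("S'", ("e", "S")): {"e"},
    ("S'", ()): {EPS},
    ("E", ("b",)): {"b"},
}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}

table = build_parsing_table(PRODUCTIONS, FIRST_OF_BODY, FOLLOW)
conflicts = {key: entries for key, entries in table.items() if len(entries) > 1}
print(is_ll1(table), conflicts)
```

The entry M[S’, e] receives both S’ → eS and S’ → ε, which reflects the ambiguity of the dangling else: left factoring alone does not make this grammar LL(1).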

LL(1) Grammars: Example

Non-LL(1) Grammar: Example

Error recovery in predictive parsing • An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A,a] is empty. • Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.

How to select synchronizing set? • Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue. • We might add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.

How to select synchronizing set? (II) • If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. This may postpone some error detection, but cannot cause an error to be missed. This approach reduces the number of nonterminals that have to be considered during error recovery. • If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal and issue a message saying that the terminal was inserted.

Example: error recovery • “synch” indicates synchronizing tokens obtained from the FOLLOW set of the nonterminal in question. • If the parser looks up entry M[A,a] and finds that it is blank, the input symbol a is skipped. • If the entry is synch, the nonterminal on top of the stack is popped. • If a token on top of the stack does not match the input symbol, then we pop the token from the stack.
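
The three recovery rules can be sketched on top of a table-driven parser. The table on the slide is not captured in this transcript, so the sketch assumes the textbook expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id, with synch entries taken from the FOLLOW sets of E, T and F. All names are hypothetical, and popping a nonterminal on a blank entry at end of input is an added safeguard rather than one of the stated rules.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
# Mark FOLLOW(A) terminals as synch wherever the entry would otherwise be blank.
for nonterminal, follow in {"E": ")$", "T": "+)$", "F": "+*)$"}.items():
    for terminal in follow:
        TABLE.setdefault((nonterminal, terminal), SYNCH)


def parse_with_recovery(tokens, start="E"):
    """Parse `tokens`, recovering in panic mode; return the list of error messages."""
    tokens = list(tokens) + ["$"]
    stack, pos, errors = ["$", start], 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top not in NONTERMINALS:
            stack.pop()
            if top == a:
                pos += 1
            else:
                # Unmatched terminal: pop it and report it as inserted.
                errors.append(f"inserted missing {top!r} before {a!r}")
        else:
            entry = TABLE.get((top, a))
            if entry is None and a != "$":
                errors.append(f"skipped unexpected {a!r}")  # blank entry
                pos += 1
            elif entry is None or entry == SYNCH:
                errors.append(f"popped {top} at {a!r}")      # synch entry
                stack.pop()
            else:
                stack.pop()
                stack.extend(reversed(entry))
    if tokens[pos] != "$":
        errors.append(f"unexpected trailing input starting at {tokens[pos]!r}")
    return errors


print(parse_with_recovery(["id", "*", "+", "id"]))
```

For the input id * + id, the parser meets + while expecting F, finds a synch entry, pops F, and then continues normally with the rest of the input.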

Example: error recovery (II)

9.3: Bottom Up Parsing and LR Parsers • Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (bottom) and working up towards the root (top). • This can be viewed as “reducing” a string w to the start symbol of a grammar. • At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Example • Grammar: S → aABe, A → Abc | b, B → d • Reduction: abbcde, aAbcde, aAde, aABe, S
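
The reduction sequence can be replayed step by step. In the sketch below the substring to reduce at each step is supplied by hand, since choosing it automatically is the job of the parser; the `reduce_at` helper and the string encoding are assumptions made for illustration.

```python
# Grammar S -> aABe, A -> Abc | b, B -> d, with one character per symbol.
PRODUCTIONS = {"S": ["aABe"], "A": ["Abc", "b"], "B": ["d"]}


def reduce_at(form, position, head, body):
    """Replace the occurrence of `body` at `position` in `form` by `head`."""
    if body not in PRODUCTIONS[head]:
        raise ValueError(f"{head} -> {body} is not a production")
    if form[position:position + len(body)] != body:
        raise ValueError(f"{body!r} does not occur at position {position} in {form!r}")
    return form[:position] + head + form[position + len(body):]


# (position, left side, right side) of the substring reduced at each step.
steps = [(1, "A", "b"), (1, "A", "Abc"), (2, "B", "d"), (0, "S", "aABe")]

form = "abbcde"
forms = [form]
for position, head, body in steps:
    form = reduce_at(form, position, head, body)
    forms.append(form)

print(" , ".join(forms))
```

Read backwards, the list is S, aABe, aAde, aAbcde, abbcde, in which each step rewrites the rightmost nonterminal, i.e. a rightmost derivation in reverse.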

Operator-Precedence Parsing • Grammar for expressions: [not captured in this transcript] • Can be rewritten as: [not captured in this transcript] • With the precedence relations inserted, id + id * id can be written as: [not captured in this transcript]

LR(k) Parsers • L: left-to-right scanning of the input • R: constructing a rightmost derivation in reverse • k: the number of input symbols of lookahead that are used in making parsing decisions.

LR Parsing
