Understanding Context-Free Grammars and Parsing Techniques

Contents • Introduction • A Simple Compiler • Scanning – Theory and Practice • Grammars and Parsing • LL(1) Parsing • LR Parsing • Lex and yacc • Semantic Processing • Symbol Tables • Run-time Storage Organization • Code Generation and Local Code Optimization • Global Optimization

Chapter 4 Grammars and Parsing

Outline • Context-Free Grammars • Errors in Context-Free Grammars • Transforming Extended BNF Grammars • Parsers and Recognizers • Grammar Analysis Algorithms

Context-Free Grammars: Concepts and Notation • A context-free grammar G = (Vt, Vn, S, P) • A finite terminal vocabularyVt • The token set produced by scanner • A finite set of nonterminal vacabularyVn • Intermediate symbols • A start symbolSVn that starts all derivations • Also called goal symbol • P, a finite set of productions (rewriting rules) of the form AX1X2Xm • AVn, Xi  Vn∪ Vt, 1i m • A is a valid production

Context-Free Grammars: Concepts and Notation (Cont’d.) • Other notations • The vocabulary V of a CFG is the set of terminal and nonterminal symbols • V= Vn∪Vt • L(G), the set of strings derivable from S comprise • Context-free language of grammar G • Notational conventions • a, b, c,  denote symbols in Vt • A, B, C,  denote symbols in Vn • U, V, W,  denote symbols in V • , , , denote strings in V* • u, v, w,  denote strings in Vt*

Context-Free Grammars: Concepts and Notation (Cont’d.) • Derivation • One step derivation • If A, then A  • One-step derivation  • One or more steps derivation + • Zero or more steps derivation * • If S *, then  is said to be sentential form of the CFG. • SF(G) is the set of sentential forms of grammar G. • L(G) = {x Vt*| S+ x} • L(G) = SF(G) ∩Vt*; that is, the language of G is simply those sentential forms of G that are terminal strings.

EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail  G0 Context-Free Grammars: Concepts and Notation (Cont’d.) • Left-most derivation, a top-down parsers • lm,  lm+,  lm* • A sentential form produced via a leftmost derivation sequence is called a left sentential form. • E.g. of leftmost derivation of F(V+V) E lm Prefix(E)  lm F(E)  lm F(V Tail)  lm F(V+E)  lm F(V+V Tail)  lm F(V+V)

EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail  G0 Context-Free Grammars: Concepts and Notation (Cont’d.) • Right-most derivation (canonical derivation) • rm,  rm+,  rm* • Bottom-up parsers • A sentential form produced via a rightmost derivation sequence is called a right sentential form. • E.g. of rightmost derivation of F(V+V) E rm Prefix(E)  rm Prefix(V Tail)  rm Prefix(V+E)  rm Prefix(V+V Tail)  rm Prefix(V+V)  rm F(V+V) Same # of steps, but different order

Context-Free Grammars: Concepts and Notation (Cont’d.) • A parse tree • Rooted by the start symbol • Its leaves are grammar symbols or 

Context-Free Grammars: Concepts and Notation (Cont’d.) • Aphase of a sentential form is a sequence of symbols descended from a single nonterminal in the parse tree. • Simple or prime phrase • A simple phrase is a sequence of symbols directly derived form a nonterminal. • The handle of a sentential form is the left-most simple phrase.

Errors in Context-Free Grammars • CFGs are a definitional mechanism. They may have errors, just as programs may. • Flawed CFG • Useless nonterminals • Unreachable • Derive no terminal string SA|B Aa BBb Cc Nonterminal C cannot be reached form S Nonterminal B derives no terminal string Do exercise 7. S is the start symbol.

Errors in Context-Free Grammars (Cont’d.) • Ambiguous • Grammars that allow different parse trees for the same terminal string • It is impossible to decide whether a given CFG is ambiguous

Errors in Context-Free Grammars (Cont’d.) • It is impossible to decide whether a given CFG is ambiguous • For certain grammar classes, we can prove that constituent grammars are unambiguous • Wrong language • A general comparison algorithm applicable to all CFGs is known to be impossible

Transforming Extended BNF Grammars • Extended BNF  BNF • Extended BNF allows • Square bracket [] • Optional list {}

Parsers and Recognizers • Recognizer • An algorithm that does Boolean-valued test • Is this input syntactically valid? • Parser • Answers more general questions • Is this input valid? • And, if it is, what is its structure (parse tree)?

Parsers and Recognizers (Cont’d.) • Two general approaches to parsing • Top-down parser • Expanding the parse tree (via predictions) in a depth-first manner • Preorder traversal of the parse tree • Predictive in nature • lm • LL parser, recursive descent

Parsers and Recognizers (Cont’d.) • Bottom-up parser • Beginning at its bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves • Postorder traversal of the parse tree • rm • LR parser, shift-reduce parser

Parsers and Recognizers (Cont’d.) To parse begin SimpleStmt; SimpleStmt; end$

Parsers and Recognizers (Cont’d.) • Naming of parsing techniques • Top-down • LL • Bottom-up • LR The way to parse token sequence L: Leftmost R: Righmost

Grammar Analysis Algorithms • Goal of this section: • Discuss a number of important analysis algorithms for Grammars

Grammar Analysis Algorithms (Cont’d.) • The data structure of a grammar G

Grammar Analysis Algorithms (Cont’d.) • What nonterminals can derive ? A  BCD  BC  B   • An iterative marking algorithm

Grammar Analysis Algorithms (Cont’d.) • First() • The set of all the terminal symbols that can begin a sentential form derivable from  • If  is the right-hand side of a production, then First() contains terminal symbols that begin strings derivable from  First()={aVt|  * a} {if  *  then {} else }

根據定義, FIRST(X) 集合之計算可依下列三步驟而得: • 1. If XT, then FIRST(X) = {X}. • 2. If XN, X→, then add  to FIRST(X). • 3. If XN, and X → Y1 Y2 . . . Yn, then add all non- elements of FIRST(Y1) to FIRST(X),if FIRST(Y1), then add all non- elements of FIRST(Y2) to FIRST(X), ..., if FIRST(Yn), then add  to FIRST(X).

文法 G 定義如下: E  TE’ E’  +TE’ |  T  FT’ T’ *FT’ |  F  (E) | id 則其 FIRST 求解如下: FIRST E ( id E’ +  T ( id T’ *  F ( id

Follow(A) • A is any nonterminal • Follow(A) is the set of terminals that may follow A in some sentential form Follow(A)={aVt|S*  Aa  } {if S + A then {} else }

根據定義, FOLLOW(X)集合之計算可依下列三步驟而得: • 1. Put $ into FOLLOW(S). • 2. For each A B, add all non- elements of FIRST() to FOLLOW(B). • 3. For each A B or A B, where FIRST(), add all of FOLLOW(A) to FOLLOW(B).

文法 G 定義如下: E  TE’ E’  +TE’ |  T  FT’ T’ *FT’ |  F  (E) | id 則其 FIRST 求解如下: FIRST F ( id T’ *  T ( id E’ +  E ( id FOLLOW之求解: FOLLOW E $ ) E’ $ ) T + $ ) T’ + $ ) F * + $ )

Grammar Analysis Algorithms (Cont’d.) • Definition of C data structures and subroutines • first_set[X] • contains terminal symbols and  • X is any single vocabulary symbol • follow_set[A] • contains terminal symbols and  • A is a nonterminal symbol

It is a subroutine of fill_first_set()

EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail  G0 The execution of fill_first_set() using grammar G0

EPrefix(E) EV Tail PrefixF Prefix Tail+E Tail  G0 The execution of fill_follow_set() using grammar G0 $,) ( $,)

More examples The execution of fill_first_set() S  aSe S  B B  bBe B C C  cCe C  d The execution of fill_follow_set() $,e $,e $,e

More examples The execution of fill_first_set() S  ABc A  a A   B b B   The execution of fill_follow_set() $ b,c c

Understanding Context-Free Grammars and Parsing Techniques

Understanding Context-Free Grammars and Parsing Techniques

Presentation Transcript

Contents

Contents

Contents

Contents

Contents

Contents

Contents

CONTENTS

Contents

Contents