This document dives into the fundamental concepts of context-free grammars (CFGs), parsing methods, and their applications in compiler design. It covers grammar structures, derivation processes, and addresses common errors in CFGs. Theoretical and practical aspects of parsing, including top-down and bottom-up techniques, are explained with examples. Additionally, the role of tools like Lex and Yacc in semantic processing, symbol tables, and code generation is discussed, providing a comprehensive overview for students and professionals in computer science and software engineering.
Contents • Introduction • A Simple Compiler • Scanning – Theory and Practice • Grammars and Parsing • LL(1) Parsing • LR Parsing • Lex and yacc • Semantic Processing • Symbol Tables • Run-time Storage Organization • Code Generation and Local Code Optimization • Global Optimization
Outline • Context-Free Grammars • Errors in Context-Free Grammars • Transforming Extended BNF Grammars • Parsers and Recognizers • Grammar Analysis Algorithms
Context-Free Grammars: Concepts and Notation • A context-free grammar G = (Vt, Vn, S, P) • A finite terminal vocabulary Vt • The token set produced by the scanner • A finite nonterminal vocabulary Vn • Intermediate symbols • A start symbol S ∈ Vn that starts all derivations • Also called the goal symbol • P, a finite set of productions (rewriting rules) of the form A → X1 X2 … Xm • A ∈ Vn, Xi ∈ Vn ∪ Vt, 1 ≤ i ≤ m • A → λ is a valid production
Context-Free Grammars: Concepts and Notation (Cont’d.) • Other notations • The vocabulary V of a CFG is the set of terminal and nonterminal symbols • V = Vn ∪ Vt • L(G), the set of strings derivable from S, comprises the context-free language of grammar G • Notational conventions • a, b, c, … denote symbols in Vt • A, B, C, … denote symbols in Vn • U, V, W, … denote symbols in V • α, β, γ, … denote strings in V* • u, v, w, … denote strings in Vt*
Context-Free Grammars: Concepts and Notation (Cont’d.) • Derivation • If A → γ is a production, then αAβ ⇒ αγβ • One-step derivation: ⇒ • Derivation in one or more steps: ⇒+ • Derivation in zero or more steps: ⇒* • If S ⇒* β, then β is said to be a sentential form of the CFG. • SF(G) is the set of sentential forms of grammar G. • L(G) = {x ∈ Vt* | S ⇒+ x} • L(G) = SF(G) ∩ Vt*; that is, the language of G is simply those sentential forms of G that are terminal strings.
Context-Free Grammars: Concepts and Notation (Cont’d.) • Grammar G0: E → Prefix ( E ), E → V Tail, Prefix → F, Prefix → λ, Tail → + E, Tail → λ • Leftmost derivation, used by top-down parsers • ⇒lm, ⇒lm+, ⇒lm* • A sentential form produced via a leftmost derivation sequence is called a left sentential form. • E.g., the leftmost derivation of F(V+V): E ⇒lm Prefix(E) ⇒lm F(E) ⇒lm F(V Tail) ⇒lm F(V+E) ⇒lm F(V+V Tail) ⇒lm F(V+V)
Context-Free Grammars: Concepts and Notation (Cont’d.) • Grammar G0: E → Prefix ( E ), E → V Tail, Prefix → F, Prefix → λ, Tail → + E, Tail → λ • Rightmost derivation (canonical derivation) • ⇒rm, ⇒rm+, ⇒rm* • Used by bottom-up parsers • A sentential form produced via a rightmost derivation sequence is called a right sentential form. • E.g., the rightmost derivation of F(V+V): E ⇒rm Prefix(E) ⇒rm Prefix(V Tail) ⇒rm Prefix(V+E) ⇒rm Prefix(V+V Tail) ⇒rm Prefix(V+V) ⇒rm F(V+V) • Same number of steps as the leftmost derivation, but in a different order
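The two derivations of F(V+V) can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the helper `derive` and the step lists are illustrative names, and an empty list stands for λ. It rewrites the leftmost or the rightmost nonterminal at each step and records every sentential form.

```python
# Nonterminals of grammar G0; every other symbol is a terminal.
NONTERMINALS = {"E", "Prefix", "Tail"}

# Productions applied, in order, for each derivation of F(V+V).
# An empty right-hand side stands for lambda.
LEFTMOST_STEPS = [
    ("E", ["Prefix", "(", "E", ")"]), ("Prefix", ["F"]), ("E", ["V", "Tail"]),
    ("Tail", ["+", "E"]), ("E", ["V", "Tail"]), ("Tail", []),
]
RIGHTMOST_STEPS = [
    ("E", ["Prefix", "(", "E", ")"]), ("E", ["V", "Tail"]), ("Tail", ["+", "E"]),
    ("E", ["V", "Tail"]), ("Tail", []), ("Prefix", ["F"]),
]

def derive(steps, leftmost=True):
    """Rewrite the leftmost (or rightmost) nonterminal with each step in turn
    and return the list of sentential forms, starting from E."""
    form = ["E"]
    forms = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"step {lhs} does not match nonterminal {form[i]}")
        form[i:i + 1] = rhs  # replace one nonterminal by the right-hand side
        forms.append(" ".join(form))
    return forms

if __name__ == "__main__":
    for line in derive(LEFTMOST_STEPS, leftmost=True):
        print("lm:", line)
    for line in derive(RIGHTMOST_STEPS, leftmost=False):
        print("rm:", line)
```

Both calls end in the same terminal string after six steps; only the order in which nonterminals are expanded differs.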
Context-Free Grammars: Concepts and Notation (Cont’d.) • A parse tree • Rooted by the start symbol • Its leaves are grammar symbols or λ
Context-Free Grammars: Concepts and Notation (Cont’d.) • A phrase of a sentential form is a sequence of symbols descended from a single nonterminal in the parse tree. • Simple or prime phrase • A simple phrase is a sequence of symbols directly derived from a nonterminal. • The handle of a sentential form is the leftmost simple phrase.
Errors in Context-Free Grammars • CFGs are a definitional mechanism. They may have errors, just as programs may. • Flawed CFG • Useless nonterminals • Unreachable • Derive no terminal string • Example (S is the start symbol): S → A | B, A → a, B → B b, C → c • Nonterminal C cannot be reached from S • Nonterminal B derives no terminal string • Do exercise 7.
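Both kinds of useless nonterminal can be found by simple fixed-point computations. The sketch below is one way to do it and is not taken from the slides; the dictionary encoding of the grammar and the function names `productive` and `reachable` are assumptions made for illustration.

```python
# The flawed example grammar: S -> A | B, A -> a, B -> B b, C -> c.
# Keys are nonterminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["A"], ["B"]],
    "A": [["a"]],
    "B": [["B", "b"]],
    "C": [["c"]],
}

def productive(grammar):
    """Return the nonterminals that derive at least one terminal string."""
    done = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in done:
                continue
            for rhs in alternatives:
                # A right-hand side is productive when every symbol in it is
                # a terminal or an already-productive nonterminal.
                if all(sym in done or sym not in grammar for sym in rhs):
                    done.add(lhs)
                    changed = True
                    break
    return done

def reachable(grammar, start):
    """Return the nonterminals that appear in some derivation from start."""
    seen = {start}
    work = [start]
    while work:
        lhs = work.pop()
        for rhs in grammar[lhs]:
            for sym in rhs:
                if sym in grammar and sym not in seen:
                    seen.add(sym)
                    work.append(sym)
    return seen

if __name__ == "__main__":
    print("derive no terminal string:", set(GRAMMAR) - productive(GRAMMAR))
    print("unreachable:", set(GRAMMAR) - reachable(GRAMMAR, "S"))
```

On this grammar the first set is {B} and the second is {C}, matching the two remarks on the slide.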
Errors in Context-Free Grammars (Cont’d.) • Ambiguous • Grammars that allow different parse trees for the same terminal string • It is impossible to decide whether a given CFG is ambiguous
Errors in Context-Free Grammars (Cont’d.) • Ambiguity is undecidable in general, but for certain grammar classes we can prove that every grammar in the class is unambiguous • Wrong language • The grammar may define a language other than the intended one • A general comparison algorithm applicable to all CFGs is known to be impossible
Transforming Extended BNF Grammars • Extended BNF → BNF • Extended BNF allows • Square brackets [ ], which enclose an optional item • Braces { }, which enclose an optional list (zero or more repetitions)
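One common way to remove the extended constructs is to introduce a fresh nonterminal N for each bracketed part: A → α [X1 … Xn] β becomes A → α N β with N → X1 … Xn | λ, and A → α {X1 … Xn} β becomes A → α N β with N → X1 … Xn N | λ. The sketch below assumes a hypothetical encoding in which an optional part is the tuple ("opt", [...]) and an optional list is ("rep", [...]). The generated names N1, N2, … and the sample productions are illustrative only.

```python
from itertools import count

def ebnf_to_bnf(productions):
    """Rewrite (lhs, rhs) pairs so that no rhs contains an "opt" or "rep" item.
    An empty rhs in the result stands for lambda."""
    fresh = count(1)
    result = []
    work = list(productions)
    while work:
        lhs, rhs = work.pop(0)
        new_rhs = []
        for item in rhs:
            if isinstance(item, tuple):
                kind, body = item
                name = f"N{next(fresh)}"  # fresh nonterminal for this construct
                new_rhs.append(name)
                if kind == "opt":
                    work.append((name, list(body)))           # N -> X1 ... Xn
                else:
                    work.append((name, list(body) + [name]))  # N -> X1 ... Xn N
                work.append((name, []))                       # N -> lambda
            else:
                new_rhs.append(item)
        result.append((lhs, new_rhs))
    return result

# Illustrative extended BNF productions (not from the slides).
EBNF = [
    ("StmtList", ["Stmt", ("rep", [";", "Stmt"])]),
    ("IfStmt", ["if", "Expr", "then", "Stmt", ("opt", ["else", "Stmt"])]),
]

if __name__ == "__main__":
    for lhs, rhs in ebnf_to_bnf(EBNF):
        print(lhs, "->", " ".join(rhs) if rhs else "lambda")
```

Because newly created productions are put back on the work list, nested brackets and braces are handled by the same loop.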
Parsers and Recognizers • Recognizer • An algorithm that performs a Boolean-valued test • Is this input syntactically valid? • Parser • Answers more general questions • Is this input valid? • And, if it is, what is its structure (parse tree)?
Parsers and Recognizers (Cont’d.) • Two general approaches to parsing • Top-down parser • Expanding the parse tree (via predictions) in a depth-first manner • Preorder traversal of the parse tree • Predictive in nature • Traces a leftmost derivation (⇒lm) • LL parser, recursive descent
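A recursive-descent parser makes the top-down idea concrete. The sketch below handles grammar G0 (E → Prefix ( E ) | V Tail, Prefix → F | λ, Tail → + E | λ) with one procedure per nonterminal. The prediction rules in the comments were worked out by hand for this grammar, and the function name `parse` and the token spelling are assumptions of this example.

```python
def parse(tokens):
    """Parse a token list with grammar G0 and return the productions
    predicted, in the order a leftmost derivation applies them."""
    toks = list(tokens) + ["$"]  # "$" marks end of input
    pos = 0
    used = []

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {toks[pos]!r}")
        pos += 1

    def parse_e():
        if toks[pos] in ("F", "("):      # predict E -> Prefix ( E )
            used.append("E -> Prefix ( E )")
            parse_prefix()
            match("(")
            parse_e()
            match(")")
        elif toks[pos] == "V":           # predict E -> V Tail
            used.append("E -> V Tail")
            match("V")
            parse_tail()
        else:
            raise SyntaxError(f"unexpected {toks[pos]!r} at start of E")

    def parse_prefix():
        if toks[pos] == "F":             # predict Prefix -> F
            used.append("Prefix -> F")
            match("F")
        elif toks[pos] == "(":           # predict Prefix -> lambda
            used.append("Prefix -> lambda")
        else:
            raise SyntaxError(f"unexpected {toks[pos]!r} at start of Prefix")

    def parse_tail():
        if toks[pos] == "+":             # predict Tail -> + E
            used.append("Tail -> + E")
            match("+")
            parse_e()
        elif toks[pos] in (")", "$"):    # predict Tail -> lambda
            used.append("Tail -> lambda")
        else:
            raise SyntaxError(f"unexpected {toks[pos]!r} at start of Tail")

    parse_e()
    match("$")
    return used

if __name__ == "__main__":
    for production in parse(["F", "(", "V", "+", "V", ")"]):
        print(production)
```

For F(V+V) the productions come out in the same order as the leftmost derivation of that string, which is the sense in which a top-down parser traces ⇒lm.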
Parsers and Recognizers (Cont’d.) • Bottom-up parser • Beginning at its bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves • Postorder traversal of the parse tree • Traces a rightmost derivation (⇒rm) in reverse • LR parser, shift-reduce parser
Parsers and Recognizers (Cont’d.) To parse begin SimpleStmt; SimpleStmt; end$
Parsers and Recognizers (Cont’d.) • Naming of parsing techniques • Top-down • LL • Bottom-up • LR • First letter: the way the token sequence is read (L: left to right) • Second letter: the derivation produced (L: leftmost, R: rightmost)
Grammar Analysis Algorithms • Goal of this section: • Discuss a number of important analysis algorithms for Grammars
Grammar Analysis Algorithms (Cont’d.) • The data structure of a grammar G
Grammar Analysis Algorithms (Cont’d.) • What nonterminals can derive λ? • The derivation may take more than one step, e.g. A ⇒ BCD ⇒ BC ⇒ B ⇒ λ • An iterative marking algorithm
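The iterative marking idea can be sketched as follows: mark every nonterminal that has a production whose right-hand side consists only of marked symbols, and repeat until no new mark is added. The slide's own algorithm is not reproduced in the text, so this is a plain fixed-point version under assumptions. The sample productions are hypothetical and chosen so that A ⇒ BCD ⇒ BC ⇒ B ⇒ λ is possible.

```python
# Hypothetical productions; an empty right-hand side stands for lambda.
GRAMMAR = {
    "A": [["B", "C", "D"]],
    "B": [["b"], []],
    "C": [[]],
    "D": [["d"], []],
    "S": [["A", "x"]],  # never derives lambda because of the terminal x
}

def derives_lambda(grammar):
    """Return the set of nonterminals that can derive the empty string."""
    marked = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in marked:
                continue
            # all() over an empty right-hand side is True, which covers
            # lambda productions directly.
            if any(all(sym in marked for sym in rhs) for rhs in alternatives):
                marked.add(lhs)
                changed = True
    return marked

if __name__ == "__main__":
    print(sorted(derives_lambda(GRAMMAR)))
```

A is marked only on a later pass, after B, C, and D have been marked, which is why a single scan of the productions is not enough.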
Grammar Analysis Algorithms (Cont’d.) • First(α) • The set of all the terminal symbols that can begin a sentential form derivable from α • If α is the right-hand side of a production, then First(α) contains the terminal symbols that begin strings derivable from α • First(α) = {a ∈ Vt | α ⇒* aβ} ∪ (if α ⇒* λ then {λ} else ∅)
By definition, the set FIRST(X) can be computed in the following three steps: • 1. If X ∈ T, then FIRST(X) = {X}. • 2. If X ∈ N and X → λ, then add λ to FIRST(X). • 3. If X ∈ N and X → Y1 Y2 … Yn, then add all non-λ elements of FIRST(Y1) to FIRST(X); if λ ∈ FIRST(Y1), then add all non-λ elements of FIRST(Y2) to FIRST(X); …; if λ ∈ FIRST(Yn), then add λ to FIRST(X).
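Grammar G is defined as follows: E → T E’, E’ → + T E’ | λ, T → F T’, T’ → * F T’ | λ, F → ( E ) | id • Its FIRST sets are: • FIRST(E) = {(, id} • FIRST(E’) = {+, λ} • FIRST(T) = {(, id} • FIRST(T’) = {*, λ} • FIRST(F) = {(, id}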
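The three FIRST rules can be run to a fixed point on the expression grammar. This is a minimal sketch, not the slides' fill_first_set(): the names `first_of_string` and `compute_first`, the dictionary encoding, and the use of the string "λ" as a marker are all assumptions of this example.

```python
LAMBDA = "λ"

# E -> T E', E' -> + T E' | λ, T -> F T', T' -> * F T' | λ, F -> ( E ) | id
# An empty right-hand side stands for lambda.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a symbol string, given the FIRST sets known so far."""
    out = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # rule 1: a terminal's FIRST is itself
        out |= sym_first - {LAMBDA}
        if LAMBDA not in sym_first:
            return out                     # this symbol cannot vanish; stop here
    out.add(LAMBDA)                        # every symbol can derive lambda
    return out

def compute_first(grammar):
    """Apply rules 2 and 3 repeatedly until no FIRST set grows."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

if __name__ == "__main__":
    for nt, symbols in compute_first(GRAMMAR).items():
        print(f"FIRST({nt}) =", sorted(symbols))
```

The result agrees with the hand computation: E, T, and F start with ( or id, while E' and T' contain their operator plus λ.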
Follow(A) • A is any nonterminal • Follow(A) is the set of terminals that may follow A in some sentential form • Follow(A) = {a ∈ Vt | S ⇒* … A a …} ∪ (if S ⇒+ αA then {λ} else ∅)
By definition, the set FOLLOW(X) can be computed in the following three steps: • 1. Put $ into FOLLOW(S). • 2. For each A → αBβ, add all non-λ elements of FIRST(β) to FOLLOW(B). • 3. For each A → αB, or A → αBβ where λ ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).
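Grammar G is defined as follows: E → T E’, E’ → + T E’ | λ, T → F T’, T’ → * F T’ | λ, F → ( E ) | id • Its FIRST sets are: • FIRST(F) = {(, id} • FIRST(T’) = {*, λ} • FIRST(T) = {(, id} • FIRST(E’) = {+, λ} • FIRST(E) = {(, id} • Its FOLLOW sets are: • FOLLOW(E) = {$, )} • FOLLOW(E’) = {$, )} • FOLLOW(T) = {+, $, )} • FOLLOW(T’) = {+, $, )} • FOLLOW(F) = {*, +, $, )}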
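The FOLLOW rules can be iterated in the same fixed-point style, using "$" as the end marker as in the three-step formulation. The sketch below is self-contained and therefore recomputes FIRST. The function names and the grammar encoding are assumptions of this example, not the slides' fill_follow_set().

```python
LAMBDA = "λ"

# E -> T E', E' -> + T E' | λ, T -> F T', T' -> * F T' | λ, F -> ( E ) | id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a symbol string; terminals are their own FIRST set."""
    out = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        out |= sym_first - {LAMBDA}
        if LAMBDA not in sym_first:
            return out
    out.add(LAMBDA)
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                            # step 1
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue                      # only nonterminals have FOLLOW
                    tail = first_of_string(rhs[i + 1:], first)
                    new = tail - {LAMBDA}             # step 2
                    if LAMBDA in tail:                # step 3: the tail can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

if __name__ == "__main__":
    follow = compute_follow(GRAMMAR, "E", compute_first(GRAMMAR))
    for nt, symbols in follow.items():
        print(f"FOLLOW({nt}) =", sorted(symbols))
```

An empty tail yields {λ} from `first_of_string`, so the case A → αB and the case A → αBβ with λ ∈ FIRST(β) are handled by the same branch.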
Grammar Analysis Algorithms (Cont’d.) • Definition of C data structures and subroutines • first_set[X] • Contains terminal symbols and λ • X is any single vocabulary symbol • follow_set[A] • Contains terminal symbols and λ • A is a nonterminal symbol
It is a subroutine of fill_first_set()
The execution of fill_first_set() using grammar G0 • G0: E → Prefix ( E ), E → V Tail, Prefix → F, Prefix → λ, Tail → + E, Tail → λ
The execution of fill_follow_set() using grammar G0 • G0: E → Prefix ( E ), E → V Tail, Prefix → F, Prefix → λ, Tail → + E, Tail → λ • Result: Follow(E) = {$, )} • Follow(Prefix) = {(} • Follow(Tail) = {$, )}
More examples • Grammar: S → a S e, S → B, B → b B e, B → C, C → c C e, C → d • The execution of fill_first_set() • The execution of fill_follow_set() • Result: Follow(S) = {$, e} • Follow(B) = {$, e} • Follow(C) = {$, e}
More examples • Grammar: S → A B c, A → a, A → λ, B → b, B → λ • The execution of fill_first_set() • The execution of fill_follow_set() • Result: Follow(S) = {$} • Follow(A) = {b, c} • Follow(B) = {c}