Chapter 3 Context-Free Grammar and Parsing
The Parsing Process
• Parsing is the task of determining the syntax, or structure, of a program; for this reason it is also called syntax analysis.
• The syntax of a programming language is usually given by the grammar rules of a context-free grammar.
• The rules of a context-free grammar are recursive.
• The data structure used to represent the syntactic structure of a program is called a parse tree or syntax tree.
• There are two general categories of parsing algorithms: top-down parsing and bottom-up parsing.
• The parsing process may be viewed as: sequence of tokens → parser → syntax tree.
CS302 Ch3
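The pipeline above starts from a sequence of tokens. As a minimal sketch (the `tokenize` helper is hypothetical, not part of the course material), a scanner for the arithmetic expressions used in this chapter could map every integer literal to the terminal `number` and keep operators and parentheses as tokens of their own:

```python
import re

def tokenize(text):
    """Scan an arithmetic expression into (kind, lexeme) pairs.

    Integer literals become the terminal "number"; +, -, *, ( and ) are their
    own token kinds. Any other non-blank character is rejected.
    """
    tokens = []
    for lexeme in re.findall(r"\d+|\S", text):
        if lexeme.isdecimal():
            tokens.append(("number", lexeme))
        elif lexeme in ("+", "-", "*", "(", ")"):
            tokens.append((lexeme, lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens

kinds = [kind for kind, _ in tokenize("(34 - 3)*42")]
print(" ".join(kinds))  # ( number - number ) * number
```

The parser then works only on the token kinds, never on individual characters.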
Context-Free Grammar Terminology
• An alphabet or set of basic symbols (like regular expressions, except that the symbols are now whole tokens, not characters), including . (Terminals)
• A set of names for structures, such as statement, expression and definition. (Nonterminals)
• A set of grammar rules expressing the structure of each name. (Productions)
• A start symbol: the name of the most general structure (compilation unit in C).
Context-Free Grammars
• Example:
  exp → exp op exp | ( exp ) | number
  op → + | - | *
• Names are written in italics.
• Choice and concatenation are written as in regular expressions.
• Repetition is expressed by recursion.
• Arrows replace the equals sign.
• Grammar rules in this form are said to be in Backus-Naur Form, or BNF notation.
Example
  exp → exp op exp | ( exp ) | number
  op → + | - | *
• 2 nonterminals (exp, op) and 6 terminals ( (, ), number, +, -, * ).
• 6 productions (3 on each line): exp → number is the "base" rule; exp → exp op exp and exp → ( exp ) are the recursive rules.
• In what way does such a context-free grammar differ from a regular expression? Compare:
  digit = 0|1|…|9
  number = digit digit*
• Recursion!
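To make the four components concrete, here is a minimal Python sketch that stores the example grammar as data; the `Grammar` class and its method names are hypothetical. It recovers the counts above (2 nonterminals, 6 terminals, 6 productions) and flags exp as the directly recursive nonterminal, the feature that regular definitions such as number = digit digit* do not have:

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    """A context-free grammar: a start symbol plus productions.

    `productions` maps each nonterminal to a list of right-hand sides, each a
    tuple of symbols. A symbol that never appears on a left-hand side is
    treated as a terminal.
    """
    start: str
    productions: dict

    def nonterminals(self):
        return set(self.productions)

    def terminals(self):
        return {sym
                for rhss in self.productions.values()
                for rhs in rhss
                for sym in rhs
                if sym not in self.productions}

    def production_count(self):
        return sum(len(rhss) for rhss in self.productions.values())

    def recursive_nonterminals(self):
        """Nonterminals that appear in one of their own right-hand sides."""
        return {a for a, rhss in self.productions.items()
                if any(a in rhs for rhs in rhss)}

exp_grammar = Grammar(
    start="exp",
    productions={
        "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
        "op": [("+",), ("-",), ("*",)],
    },
)

print(len(exp_grammar.nonterminals()), len(exp_grammar.terminals()),
      exp_grammar.production_count())  # 2 6 6
```

Only direct recursion is detected here; indirect recursion through other nonterminals would need a reachability check.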
Derivations
• A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.
• The arithmetic expression (34 - 3)*42 corresponds to the legal token string (number - number)*number, which is derived as follows:
  (1) exp ⇒ exp op exp [exp → exp op exp]
  (2) ⇒ exp op number [exp → number]
  (3) ⇒ exp * number [op → *]
  (4) ⇒ (exp) * number [exp → ( exp )]
  (5) ⇒ (exp op exp) * number [exp → exp op exp]
  (6) ⇒ (exp op number) * number [exp → number]
  (7) ⇒ (exp - number) * number [op → -]
  (8) ⇒ (number - number) * number [exp → number]
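Each step above rewrites the rightmost remaining nonterminal, so this derivation is a rightmost one. The following Python sketch (names such as `rightmost_step` are hypothetical) replays it mechanically from the list of chosen right-hand sides and rejects any choice that is not a production of the grammar:

```python
# Productions of the expression grammar: nonterminal -> allowed right-hand sides.
GRAMMAR = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}

def rightmost_step(form, rhs):
    """Replace the rightmost nonterminal in `form` by `rhs`, checking the rule exists."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:
            if tuple(rhs) not in GRAMMAR[form[i]]:
                raise ValueError(f"no rule {form[i]} -> {' '.join(rhs)}")
            return form[:i] + list(rhs) + form[i + 1:]
    raise ValueError("sentential form has no nonterminal left")

# The right-hand side chosen at each of the eight steps of the derivation.
choices = [("exp", "op", "exp"), ("number",), ("*",), ("(", "exp", ")"),
           ("exp", "op", "exp"), ("number",), ("-",), ("number",)]

form = ["exp"]
for step, rhs in enumerate(choices, start=1):
    form = rightmost_step(form, rhs)
    print(f"({step}) {' '.join(form)}")
```

Step (3) replaces op rather than exp because, after step (2), op is the rightmost nonterminal in the sentential form.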
Abstracting the Structure of a Derivation into a Parse Tree
Definitions
• The start symbol is the nonterminal on the left-hand side of the first grammar rule of the language; every derivation starts from it.
• Nonterminals are structure names that must be replaced further on in the derivation.
• Terminals are symbols of the alphabet; they terminate the derivation.
• Left recursion: A → A α | β
• Right recursion: A → α A | β
Repetition and Recursion
• Left recursion: A → A x | y
  Derivation of yxx: A ⇒ A x ⇒ A x x ⇒ y x x
• Right recursion: A → x A | y
  Derivation of xxy: A ⇒ x A ⇒ x x A ⇒ x x y
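A small Python sketch (the `derive` helper is hypothetical) applies each recursive rule n times and then the base rule A → y. Left recursion produces y followed by x's and right recursion produces x's followed by y, i.e. the regular languages y x* and x* y:

```python
import re

def derive(recursive_rhs, n):
    """Apply the recursive rule for A n times, then finish with A -> y."""
    form = ["A"]
    for _ in range(n):
        i = form.index("A")
        form[i:i + 1] = recursive_rhs  # replace the single A by the rule's right-hand side
    form[form.index("A")] = "y"
    return "".join(form)

LEFT = ["A", "x"]   # A -> A x | y
RIGHT = ["x", "A"]  # A -> x A | y

print(derive(LEFT, 2), derive(RIGHT, 2))  # yxx xxy

# Both rules express repetition: the languages are y x* and x* y.
for n in range(6):
    assert re.fullmatch(r"yx*", derive(LEFT, n))
    assert re.fullmatch(r"x*y", derive(RIGHT, n))
```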
Parsing Algorithms
• Top-down
  • Recursive descent (choices coded by hand)
  • “Predictive” table-driven, “LL”
• Bottom-up
  • “LR” and its cousin “LALR” (machine-generated choice [Yacc / Bison])
  • Operator-precedence
Languages Generated by Grammars
1- G: E → ( E ) | a
   L(G) = { a, (a), ((a)), (((a))), … }
   Derivation of the input string ((a)): E ⇒ (E) ⇒ ((E)) ⇒ ((a))
2- G: E → ( E )
   L(G) = { }: the grammar yields no strings, because no rule ever removes the nonterminal E.
3- G: E → E + a | a
   L(G) = { a, a + a, a + a + a, … }
   Derivation of the input string a + a + a: E ⇒ E + a ⇒ E + a + a ⇒ a + a + a
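The three examples can be checked by brute force. The Python sketch below enumerates every string up to a given length; the `language` helper and the length bound are assumptions of this illustration, since a language like the one in example 1 is infinite and only a finite part of it can be listed:

```python
from collections import deque

def language(productions, start, max_len):
    """All terminal strings of at most max_len symbols derivable from `start`.

    Breadth-first search over sentential forms, always expanding the leftmost
    nonterminal. Assumes the grammar has no empty (epsilon) right-hand sides, so
    forms never shrink and any form longer than max_len can be discarded.
    Terminals are single characters here, so they are joined without spaces.
    """
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:  # only terminals left: a sentence of the language
            found.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))

G1 = {"E": [("(", "E", ")"), ("a",)]}   # E -> ( E ) | a
G2 = {"E": [("(", "E", ")")]}           # E -> ( E )
G3 = {"E": [("E", "+", "a"), ("a",)]}   # E -> E + a | a

print(language(G1, "E", 7))  # ['a', '(a)', '((a))', '(((a)))']
print(language(G2, "E", 7))  # []
print(language(G3, "E", 5))  # ['a', 'a+a', 'a+a+a']
```

For G2 the search only ever produces forms that still contain E, so no sentence is found, matching L(G) = { }.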
Examples
Parse Tree
• A parse tree corresponding to a derivation is a labeled tree in which the interior nodes are labeled by nonterminals, the leaf nodes are labeled by terminals, and the children of each interior node represent the replacement of the associated nonterminal in one step of the derivation.
• Example: the derivation
  exp ⇒ exp op exp ⇒ number op exp ⇒ number + exp ⇒ number + number
  corresponds to the parse tree whose root exp has the children exp, op and exp, with the leaves number, + and number.
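Rightmost and Leftmost Derivations
• Leftmost derivation (preorder numbering of the interior nodes):
  (1) exp ⇒ exp op exp
  (2) ⇒ number op exp
  (3) ⇒ number + exp
  (4) ⇒ number + number
  In the parse tree, the root exp is node 1, and its children exp, op, exp are nodes 2, 3, 4 from left to right.
• Rightmost derivation (reverse postorder numbering of the interior nodes):
  (1) exp ⇒ exp op exp
  (2) ⇒ exp op number
  (3) ⇒ exp + number
  (4) ⇒ number + number
  In the parse tree, the root exp is node 1, and its children exp, op, exp are nodes 4, 3, 2 from left to right.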
Example
A leftmost derivation (the derivation on the Derivations slide was rightmost):
  (1) exp ⇒ exp op exp [exp → exp op exp]
  (2) ⇒ (exp) op exp [exp → ( exp )]
  (3) ⇒ (exp op exp) op exp [exp → exp op exp]
  (4) ⇒ (number op exp) op exp [exp → number]
  (5) ⇒ (number - exp) op exp [op → -]
  (6) ⇒ (number - number) op exp [exp → number]
  (7) ⇒ (number - number) * exp [op → *]
  (8) ⇒ (number - number) * number [exp → number]
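The correspondence between parse trees and leftmost/rightmost derivations can be made concrete. In this Python sketch (the `Node` class and `derivation` function are hypothetical), the parse tree of (number - number) * number is stored explicitly. Expanding its interior nodes leftmost-first reproduces the eight-step leftmost derivation above, and expanding them rightmost-first reproduces the rightmost derivation from the Derivations slide:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # empty list for a terminal leaf

def derivation(root, leftmost=True):
    """Sentential forms obtained by expanding the tree's interior nodes one per step,
    always picking the leftmost (or rightmost) node that is still unexpanded."""
    form = [root]
    steps = [" ".join(n.label for n in form)]
    while True:
        interior = [i for i, n in enumerate(form) if n.children]
        if not interior:
            return steps
        i = interior[0] if leftmost else interior[-1]
        form[i:i + 1] = form[i].children
        steps.append(" ".join(n.label for n in form))

def exp_number():
    """The subtree for exp -> number."""
    return Node("exp", [Node("number")])

# Parse tree of (number - number) * number under exp -> exp op exp | ( exp ) | number.
tree = Node("exp", [
    Node("exp", [Node("("),
                 Node("exp", [exp_number(), Node("op", [Node("-")]), exp_number()]),
                 Node(")")]),
    Node("op", [Node("*")]),
    exp_number(),
])

for form in derivation(tree):
    print(form)
```

Both orders visit the same nine sentential-form states in number (start plus eight steps) and end in the same string, which is why leftmost and rightmost derivations, unlike arbitrary derivations, correspond one-to-one with parse trees.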
Abstract Syntax Trees
• An abstract syntax tree, or syntax tree, is a tree representation of a shorthand notation for the structure of the ordinary syntax (the parse tree).
• Grammar:
  statement → if-stmt | other
  if-stmt → if ( exp ) statement | if ( exp ) statement else statement
  exp → 0 | 1
• Input: if (0) other else other
  Parse tree: statement → if-stmt, whose children are if, (, exp (→ 0), ), statement (→ other), else, statement (→ other).
  Syntax tree: an if node with three children: 0, other, other.
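As a rough illustration of how much a syntax tree abstracts away, the Python sketch below encodes the parse tree of if (0) other else other as nested tuples and abstracts it with two hypothetical rules chosen for this small grammar: drop keyword and punctuation leaves, and collapse any node left with a single child. Real compilers usually build the syntax tree directly during parsing instead of converting a full parse tree:

```python
# A parse-tree node is (label, children); a leaf has no children.
def t(label, *children):
    return (label, list(children))

# Hypothetical abstraction rules for this grammar: keyword and punctuation
# leaves are dropped, and a node left with one child is replaced by that child.
DROPPED = {"if", "(", ")", "else"}

def to_syntax_tree(node):
    label, children = node
    if not children:
        return label
    kept = [to_syntax_tree(c) for c in children
            if not (c[0] in DROPPED and not c[1])]
    if len(kept) == 1:
        return kept[0]
    return ("if" if label == "if-stmt" else label, kept)

parse_tree = t("statement",
               t("if-stmt", t("if"), t("("), t("exp", t("0")), t(")"),
                 t("statement", t("other")), t("else"),
                 t("statement", t("other"))))

print(to_syntax_tree(parse_tree))  # ('if', ['0', 'other', 'other'])
```

The twelve-node parse tree shrinks to a four-node syntax tree: one if node and its three operands.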
Examples
• G: stmt-sequence → stmt ; stmt-sequence | stmt
     stmt → s
• Input string: s ; s ; s
  Parse tree: stmt-sequence → stmt ; stmt-sequence, applied twice, followed by stmt-sequence → stmt; each stmt → s.
  Syntax tree: a ; node whose children are s and a second ; node, whose children are s and s.
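Ambiguous Grammars
• Parse trees and syntax trees uniquely express the structure of syntax, as do leftmost and rightmost derivations, but derivations in general do not.
• A grammar that generates a string with two distinct parse trees is called an ambiguous grammar.
• Consider again the string number - number * number: under exp → exp op exp | ( exp ) | number it has two parse trees, one with - at the root and one with * at the root. The correct one, under the usual precedence, groups number * number below the -.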
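The ambiguity matters because the two trees mean different things. This Python sketch (helper names are hypothetical, and parentheses are left out for brevity) lists every parse tree of a token string under exp → exp op exp | number and evaluates each one, using the numbers 34, 3 and 42 for the three number tokens:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def all_trees(tokens):
    """Every parse tree of number op number op ... under exp -> exp op exp | number,
    as nested (left, op, right) triples."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # try each operator as the root
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

for tree in all_trees([34, "-", 3, "*", 42]):
    print(tree, "=", evaluate(tree))
# (34, '-', (3, '*', 42)) = -92
# ((34, '-', 3), '*', 42) = 1302
```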
Ambiguity
• Sources of ambiguity:
  • Associativity and precedence of operators.
  • Extent of a substructure (dangling else).
• Dealing with ambiguity:
  • Disambiguating rules: state a rule that specifies, in each ambiguous case, which of the parse trees is the correct one.
  • Change the grammar (but not the language): rewrite the grammar into a form that forces the construction of the correct parse tree.
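Precedence and Associativity
• Example: integer arithmetic
  exp → exp addop term | term
  addop → + | -
  term → term mulop factor | factor
  mulop → *
  factor → ( exp ) | number
• Parse tree for number - number * number: exp → exp addop term, where the left exp derives number through term and factor, addop is -, and term → term mulop factor derives number * number. The * is grouped below the -, so multiplication has higher precedence; because exp and term are left-recursive, the operators are left-associative.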
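This grammar can be turned into a recursive-descent parser with one caveat: a procedure for exp → exp addop term would call itself before consuming any input and never terminate. The Python sketch below therefore uses the equivalent repetition form exp → term { addop term } (and likewise for term), building the tree left to right so that - and * stay left-associative. The class and method names are illustrative, and trees are nested (left, op, right) tuples:

```python
import re

class Parser:
    """Recursive-descent parser for the integer-arithmetic grammar, written in
    the repetition form exp -> term {addop term}, term -> factor {mulop factor}."""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def exp(self):
        tree = self.term()
        while self.peek() in ("+", "-"):        # addop
            op = self.advance()
            tree = (tree, op, self.term())      # old tree becomes the left operand
        return tree

    def term(self):
        tree = self.factor()
        while self.peek() == "*":               # mulop
            op = self.advance()
            tree = (tree, op, self.factor())
        return tree

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.advance()
            tree = self.exp()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok is not None and tok.isdecimal():
            self.advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(text)
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("34 - 3 * 42"))  # (34, '-', (3, '*', 42))
print(parse("34 - 3 - 42"))  # ((34, '-', 3), '-', 42)
```

Precedence comes from the call structure (exp calls term, which calls factor), and associativity from the loop that keeps extending the tree on its left.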
Dangling else Ambiguity
• Example:
  statement → if-stmt | other
  if-stmt → if ( exp ) statement | if ( exp ) statement else statement
  exp → 0 | 1
• The following string has two parse trees, depending on which if the else is attached to:
  if (0) if (1) other else other
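Parse Trees for Dangling else
• Of the two parse trees, the correct one is selected by the most closely nested disambiguating rule: an else is matched with the closest preceding if that does not already have an else. In if (0) if (1) other else other, the else belongs to if (1).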
Changing the Grammar Rules for the Dangling else Problem
The grammar becomes:
  statement → matched-stmt | unmatched-stmt
  matched-stmt → if ( exp ) matched-stmt else matched-stmt | other
  unmatched-stmt → if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmt
  exp → 0 | 1
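Parse Tree for the Solution
Input string: if (0) if (1) other else other
Parse tree: statement → unmatched-stmt → if ( exp ) statement, with exp → 0. That inner statement → matched-stmt → if ( exp ) matched-stmt else matched-stmt, with exp → 1 and both matched-stmt → other. The else is therefore attached to the inner if (1).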
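Hand-written recursive-descent parsers often apply the most closely nested rule directly instead of using the rewritten grammar. In this Python sketch (function names are hypothetical; trees are tuples ('if', condition, then-part, else-part-or-None)), each if greedily takes an else as soon as it sees one, and the innermost if is always the one looking, so it produces the same grouping as the matched/unmatched grammar:

```python
import re

def parse_if(text):
    """Parse the statement grammar, giving 'other' or ('if', cond, then, else_or_None)."""
    tokens = re.findall(r"\w+|\S", text)

    def at(i):
        return tokens[i] if i < len(tokens) else None

    def expect(i, wanted):
        if at(i) not in wanted:
            raise SyntaxError(f"expected one of {wanted} at token {i}, got {at(i)!r}")
        return at(i)

    def statement(i):
        """Parse a statement starting at token index i; return (tree, next index)."""
        if at(i) == "other":
            return "other", i + 1
        expect(i, ("if",))
        expect(i + 1, ("(",))
        cond = expect(i + 2, ("0", "1"))
        expect(i + 3, (")",))
        then, i = statement(i + 4)
        # Most closely nested rule: an else seen here is taken by the
        # innermost if still being parsed.
        if at(i) == "else":
            otherwise, i = statement(i + 1)
        else:
            otherwise = None
        return ("if", cond, then, otherwise), i

    tree, end = statement(0)
    if end != len(tokens):
        raise SyntaxError(f"unexpected token {at(end)!r}")
    return tree

print(parse_if("if (0) if (1) other else other"))
# ('if', '0', ('if', '1', 'other', 'other'), None)
```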
Extended BNF Notation
• Extended BNF (EBNF):
  • New metasymbols […] and {…}.
  • Recursion used only to express repetition is largely eliminated by these.
• Repetition:
  A → A α | β (left recursion)
  A → α A | β (right recursion)
• These are equivalent to:
  A → β α*
  A → α* β
• Using EBNF notation:
  A → β {α}
  A → {α} β
Extended BNF Notation
• Example: stmt-sequence → stmt ; stmt-sequence | stmt
• Using EBNF:
  stmt-sequence → { stmt ; } stmt (right recursion)
  stmt-sequence → stmt { ; stmt } (left recursion)
• Optional: using the previous example,
  stmt-sequence → stmt [ ; stmt-sequence ]
• Example: exp → exp addop term | term
  Using EBNF: exp → [ exp addop ] term
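The practical payoff of the braces is that a parser can implement repetition with a loop. A minimal Python sketch for stmt-sequence → stmt { ; stmt } with stmt → s (the function name is hypothetical) collects the statements of s ; s ; s into a flat list rather than a nested chain of stmt-sequence nodes:

```python
import re

def parse_stmt_sequence(text):
    """stmt-sequence -> stmt { ; stmt }, stmt -> s; returns the list of statements."""
    tokens = re.findall(r"\w+|\S", text)
    pos = 0

    def stmt():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "s":
            pos += 1
            return "s"
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected 's', found {found!r}")

    stmts = [stmt()]
    # The braces { ; stmt } become a loop instead of a recursive call.
    while pos < len(tokens) and tokens[pos] == ";":
        pos += 1
        stmts.append(stmt())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return stmts

print(parse_stmt_sequence("s ; s ; s"))  # ['s', 's', 's']
```

A trailing semicolon, as in s ; s ;, is rejected, because after each ; the grammar requires another stmt.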
Syntax Diagram
• Example: factor → ( exp ) | number
• Repetition: A → { B }
• Optional: A → [ B ]