Syntax Analysis / Parser

Syntax Analysis / Parser • Parser works on a stream of tokens. • The smallest item is a token. token source program parse tree get next token

Syntax Analysis • At syntax analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are grouped according to the syntactic rules of the language. • If the tokens in a string are grouped according to the language’s rules of syntax, then the string of tokens generated by the lexical analyser is accepted as a valid construct of the language; otherwise an error handler is called. • Therefore, a suitable notation must be used to specify the constructs of a languageCFG

Syntax Analyzer • Syntax Analyzer creates the syntactic structure (Parse Tree)of the given source program. • The syntax of a programming is described by a CFG. We will use BNF notation in the description of CFGs. • The syntax analyzer checks whether a given source program satisfies the rules implied by a CFG or not. • If it satisfies, the parser creates the parse tree of that program. • Otherwise the parser gives an error messages. • A context-free grammar • gives a precise syntactic specification of a programming language. • the design of the grammar is an initial phase of the design of a compiler.

Parsers We categorize the parsers into two groups: • Top-Down Parser • the parse tree is created top to bottom, starting from the root. (from N.T  T) • Bottom-Up Parser • the parse tree is created bottom to top, starting from the leaves. (from TN.T) • Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).

Context-Free Grammars (CFG) • Inherently recursive structures of a programming language are defined by a CFG. • In a CFG, we have: • A finite set of terminals (in our case, this will be the set of tokens) • A finite set of non-terminals (syntactic-variables) • A finite set of productions rules in the following form A   where A is a non-terminal and  is a string of terminals and non-terminals (including the empty string) • A start symbol (one of the non-terminal symbol) • Example: E  E + E | E – E | E * E | E / E | - E E  ( E ) E  id

Derivations E  E+E E+E derives from E • we can replace E by E+E • to able to do this, we have to have a production rule EE+E in our grammar. E  E+E  id+E  id+id “A sequence of replacements of non-terminal symbols is called derivation”. In general a derivation step is A  if there is a production rule A in our grammar where  and  are arbitrary strings of terminal and non-terminal symbols 1  2  ...  n (n derived from 1 or 1 derives n )  : derives in one step  : derives in zero or more steps  : derives in one or more steps * +

CFG - Terminology • L(G) is the language of G (the language generated by G) which is a set of sentences. • A sentence of L(G) is a string of terminal symbols of G. • If S is the start symbol of G then  is a sentence of L(G) iff S   where  is a string of terminals of G. • If G is a context-free grammar, L(G) is a context-free language. • Two grammars are equivalent if they produce the same language. • S   - If  contains non-terminals, it is called as a sentential form of G. - If  does not contain non-terminals, it is called as a sentence of G. +

Derivation Example Let the grammar be E  E + E | E – E | E * E | E / E | - E E  ( E ) E  id Elmd -E  -(E)  -(E+E)  -(E * E+E)  -(id * E+E)  -(id * id+E)  -(id*id+ id) Ermd -E  -(E) -(E*E)  -(E * E +E) -(E *E + id) -(E* id + id) -(id*id +id) • At each derivation step, we can choose any of the non-terminal in the sentential form of G for the replacement. • If we always choose the left-most non-terminal in each derivation step, this derivation is called as left-most derivation. • If we always choose the right-most non-terminal in each derivation step, this derivation is called as right-most derivation. Exercise • Derive a sentence -(id + id* id) using LMD and RMD?

Left-Most and Right-Most Derivations Left-Most Derivation E  -E  -(E)  -(E+E)  -(id+E)  -(id+id) Right-Most Derivation E  -E  -(E)  -(E+E)  -(E+id)  -(id+id) • Top-down parsers try to find the left-most derivation of the given source program. • Bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.

E E E - E - E - E ( E ) ( E ) E + E E E - E - E ( E ) ( E ) E E + E + E id id id Parse Tree • Inner nodes of a parse tree are non-terminal symbols. • The leaves of a parse tree are terminal symbols. • A parse tree can be seen as a graphical representation of a derivation. E  -E  -(E)  -(E+E)  -(id+E)  -(id+id)

Ambiguity • A grammar that produces more than one parse tree for a sentence is called as an ambiguous grammar. • Therefore, the grammar having the productions E→ E +E E → E * E E → id Is an ambiguous grammar because there exists more than one parse tree for the sentence id + id * id in L(G).

E E + E id * E E id id E * E E E + E id id id Two parse trees for id + id*id E  E+E  id+E  id+E*E  id+id*E  id+id*id E  E*E  E+E*E  id+E*E  id+id*E  id+id*id

Ambiguity – Operator Precedence Rewrite the grammar • Use different N for each precedence levels • Start with the lowest precedence EE-E|E/E|(E)|id Rewrite to: EE-E|T TT/T|F Fid|(E)

Conts.. W=id-id/id EE-E E-T E-T/T E-T/F E-T/id E-F/id E-id/id id-id/id

Workout E  E + E | E – E | E * E | E / E E  ( E ) E  id Rewrite the above P in operator precedence and parse this string w= id+id-id (both in RMD and LMD)

Left Recursion • A grammar is left recursive if it has a non-terminal A such that there is a derivation. A  A for some string  • Top-down parsing techniques cannot handle left-recursive grammars. • So, we have to convert our left-recursive grammar into a grammar which is not left-recursive. • The left-recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation. +

Left-Recursion -- Problem • A grammar can be a direct and indirect left recursion • By just eliminating the immediate left-recursion, we may not get • a grammar which is not left-recursive. • S  Aa | b • A  Sc | d This grammar is not immediately left-recursive, • but it is still left-recursive. • S  Aa  Sca or • A  Sc  Aac causes to a left-recursion • So, we have to eliminate all left-recursions from our grammar

Eliminating Left –recursion • Eliminate -production • Eliminate cycles AA (one / more derivation of time if it produces A for A) +

Workout SXZ XXb|  ZaZ|ZX|  • What are the nullable variables? • What are the new productions?

Cont..(Eliminating Cycles) SXX|Y|X XaSb|b |S YYb |b|Z ZbZa|ba YY , ZZ SXX|Y|X XaSb|b |S YY’b|b|Z ZbZ’a|ba Y’Yb|b|Z Z’bZa|ba SXX|Y|X XaSb|b |S YYb Yb YZ ZbZa Zba

Immediate Left-Recursion where: A is a left-recursive nonterminal  is a sequence of nonterminals and terminals that is not null  is a sequence of nonterminals and terminals that does not start with A. replace the A-production by the production: And create a new nonterminal This newly created symbol is often called the "tail", or the "rest".

Immediate Left-Recursion -- Example For each rule which contains a left-recursive option, A  A c | Bc Introduce a new nonterminal A' and rewrite the rule as A  B c A' A'  | c A’

Left-Factoring • A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored. grammar  a new equivalent grammar suitable for predictive parsing stmt  if expr then stmt else stmt | ????? if expr then stmt (which one to use ???)

Left-Factoring -- Steps • For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, let say A  1 | ... | n | 1 | ... | m convert it into A  A’ | 1 | ... | m A’  1 | ... | n

Left-Factoring (cont.) • In general, A  1 | 2 where  is non-empty and the first symbols of 1 and 2 are different. • when processing  we cannot know whether expand A to 1 or A to 2 • But, if we re-write the grammar as follows A  A’ A’  1 | 2 so, we can immediately expand A to A’

Left-Factoring – Example1 A  abB | aB | cdg | cdeB | cdfB  A  aA’ | cdg | cdeB | cdfB A’  bB | B  A  aA’ | cdA’’ A’  bB | B A’’  g | eB | fB

Left-Factoring – Example2 A  ad | a | ab | abc | b  A  aA’ | b A’  d |  | b | bc  A  aA’ | b A’  d |  | bA’’ A’’   | c

Syntax Analysis / Parser

Syntax Analysis / Parser

Presentation Transcript

Apple Computers

Job Analysis

Abstract Syntax

Analysis of Diction and Syntax

Semantic Analysis

Optimalitätstheorie am Phonologie-/Morphologie-/Syntax-Interface der slavischen Sprachen

Morphology: It's Relation with Phonology, Syntax and Semantics.

Introduction to C/C++

Syntax I Checklist

Compilation 0368-3133 (Semester A, 2013/14)

Motion Analysis Summer Course

5. Analysis

Formal Semantics

Implementando un Parser LR

Rule-Based Deduction

Lecture 2. Basic Java Syntax

Parsers

Code Generation

Semantic Analysis

Optimalitätstheorie am Phonologie-/Morphologie-/Syntax-Interface der slavischen Sprachen

The Historical Syntax of the Brythonic Languages