Compiler Construction: Syntax Analysis
Goals of parsing • Every programming language has syntactic rules • Decide whether a program satisfies the syntactic structure • Error detection • Error recovery • Simplification: the rules are stated on tokens • Build the abstract syntax tree
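Ex. From text to abstract syntax tree • Program text: 5 + (7 * x) • The lexical analyzer turns the text into a token stream: num + ( num * id ) • Grammar (CFG): E -> id | num | E + E | E * E | ( E ) • The parser checks the token stream against the grammar and produces a parse tree (valid syntax) or reports a syntax error • Abstract syntax tree: + at the root with children 5 and *, where * has children 7 and x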
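The first stage of the pipeline above can be sketched in a few lines. This is a minimal illustration, not the course's lexer: the function name `tokenize`, the regular expression, and the (token, lexeme) pair layout are assumptions made for this example.

```python
import re

# One alternative per token class of the example grammar (assumed spelling rules).
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[+*()]))")

def tokenize(text):
    """Turn program text into a list of (token, lexeme) pairs."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"lexical error at position {pos}")
        kind = m.lastgroup            # name of the alternative that matched
        lexeme = m.group(kind)
        # Operators and parentheses are their own token; num and id keep the class name.
        tokens.append((lexeme if kind == "op" else kind, lexeme))
        pos = m.end()
    return tokens

print([tok for tok, _ in tokenize("5 + (7 * x)")])
# ['num', '+', '(', 'num', '*', 'id', ')']
```

The parser only looks at the first component of each pair; the lexeme is kept so that the values 5, 7 and x can later be placed in the abstract syntax tree.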
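CFG (context-free grammar): a 4-tuple G = (V, T, P, S), where V is the set of non-terminals, T the set of terminals, P the set of production rules and S the start symbol • Grammar rules: E -> id | num | E + E | E * E | ( E ) • Symbols: terminals (tokens) + * ( ) id num; non-terminal E • Derivation: E => E + E => 1 + E => 1 + E * E => 1 + 2 * E => 1 + 2 * 3 • Parse tree: the root E has children E, + and E; the left E derives 1, the right E has children E, * and E, which derive 2 and 3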
Production rules actually define the language • Ex. Let the language be L = { a^n b^n | n >= 1 } • G = (V, T, P, S) • V = {S} • T = {a, b} • Production rules: S -> aSb • S -> ab
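To see how the two rules generate exactly the strings a^n b^n, the sketch below applies S -> aSb repeatedly and finishes with S -> ab. The function names `derive` and `in_language` are chosen for this illustration only.

```python
def derive(n):
    """Return the derivation steps of a^n b^n: S -> aSb applied n-1 times, then S -> ab."""
    if n < 1:
        raise ValueError("the language requires n >= 1")
    sentential = "S"
    steps = [sentential]
    for _ in range(n - 1):
        sentential = sentential.replace("S", "aSb")
        steps.append(sentential)
    sentential = sentential.replace("S", "ab")
    steps.append(sentential)
    return steps

def in_language(s):
    """Check membership by undoing the productions from the outside in."""
    if s == "ab":                     # matches S -> ab
        return True
    # matches S -> aSb if the inner part is again derivable from S
    return len(s) > 2 and s[0] == "a" and s[-1] == "b" and in_language(s[1:-1])

print(" => ".join(derive(3)))   # S => aSb => aaSbb => aaabbb
```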
Introduction • A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right • It can also be viewed as finding a leftmost derivation for an input string • Example: id+id*id with the grammar E -> TE’ • E’ -> +TE’ | ɛ • T -> FT’ • T’ -> *FT’ | ɛ • F -> (E) | id • Leftmost derivation (first steps, one parse tree per step): E =>lm TE’ =>lm FT’E’ =>lm id T’E’ =>lm id E’ =>lm id + TE’
Top-down parsing • The parse tree is generated from top to bottom (root to leaves) • Consider the grammar • S -> xPz • P -> yw | y • Consider the input string xyz • Construct the parse tree for this grammar • Step 1: expand the start symbol S • Step 2: expand P as yw; w does not match the input string • Go back and try P -> y
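The trial-and-error steps above can be sketched with generators, where asking for the next result of a generator plays the role of "go back and try the other rule". This is a minimal sketch for the grammar S -> xPz, P -> yw | y; the names `GRAMMAR`, `parse`, `match_sequence` and `accepts` are assumptions of this example.

```python
GRAMMAR = {
    "S": [["x", "P", "z"]],
    "P": [["y", "w"], ["y"]],     # alternatives are tried in this order
}

def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:            # backtracking point
        yield from match_sequence(alternative, text, pos)

def match_sequence(symbols, text, pos):
    """Match a whole right-hand side, symbol by symbol."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos):
        yield from match_sequence(symbols[1:], text, mid)

def accepts(text):
    """True if some choice of productions consumes the entire input."""
    return any(end == len(text) for end in parse("S", text, 0))

print(accepts("xyz"))    # True: P -> yw fails on 'z', then P -> y succeeds
```

Because every alternative may be retried, the running time can grow quickly with the input, which is the cost of backtracking mentioned on the next slide.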
1. Backtracking parser • It tries the different production rules one after another to find a match for the input string, backtracking each time a rule fails • It is more powerful than a predictive parser • It is slower
Predictive parser • Tries to predict the next construction using one or more lookahead symbols from the input string • 1. Recursive descent • 2. LL(1) parser
Recursive descent parser • It consists of a set of procedures, one for each non-terminal • The CFG is used to build the recursive routines • The RHS of each production rule is converted directly into program code • Execution begins with the procedure for the start symbol
A typical procedure for a non-terminal
void A()
{
    choose an A-production, A -> X1 X2 … Xk;
    for (i = 1 to k)
    {
        if (Xi is a non-terminal)
            call procedure Xi();
        else if (Xi == current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
In its general form the procedure cannot easily choose an A-production, so all alternatives may have to be tried • If one alternative fails, the input pointer must be reset and another alternative tried • Ex. E -> num T • T -> * num T | ɛ • Input string: 3 * 4 $
Input string    Production rule used
3 * 4 $         E -> num T
3 * 4 $         T -> * num T
3 * 4 $         T -> * num T
3 * 4 $         Declare success, halt
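For this small grammar one token of lookahead is enough to choose between T -> * num T and T -> ɛ, so the procedures never need to reset the input pointer. The sketch below writes one method per non-terminal; the class name `Parser` and the helpers `tokenize` and `parse` are assumptions of this example.

```python
import re

def tokenize(text):
    """Map digits to the token 'num', keep other symbols, append the end marker '$'."""
    lexemes = re.findall(r"\d+|\S", text)
    return ["num" if lx.isdigit() else lx for lx in lexemes] + ["$"]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0                      # input pointer

    def match(self, expected):
        """Advance the input pointer if the current token is the expected terminal."""
        found = self.tokens[self.pos]
        if found != expected:
            raise SyntaxError(f"expected {expected!r}, found {found!r}")
        self.pos += 1

    def E(self):                          # E -> num T
        self.match("num")
        self.T()

    def T(self):                          # T -> * num T | ɛ
        if self.tokens[self.pos] == "*":
            self.match("*")
            self.match("num")
            self.T()
        # otherwise T -> ɛ: nothing is consumed

def parse(text):
    """Run the procedure for the start symbol, then require the end marker."""
    parser = Parser(tokenize(text))
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False

print(parse("3 * 4"))    # True
```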
Predictive LL(1) parser • It is non recursive • Here a parsing table is built
Parsing methods • LL(1) • “L”: left-to-right scan of the input • “L”: uses a leftmost derivation for the input string • “1”: one symbol of lookahead is used to decide which production to apply • It uses the following data structures: stack, input buffer, parsing table
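Model of LL(1) parser • Input buffer: the input tokens followed by $, e.g. a + b $ • Stack: TOP marks the topmost symbol • Parsing table: consulted by the LL(1) parser program • Output: produced by the LL(1) parser program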
The stack is used to hold the left sentential form, Aγ • The symbols on the RHS of a rule are pushed onto the stack in reverse order (from right to left): for A -> PQR, push R, Q, P • Using an explicit stack is what makes this algorithm non-recursive • Table entries are M[A, a], where A is a non-terminal and a is the current input symbol • The parser works as follows • 1. It reads the top of the stack and the current input symbol; these two symbols together determine the parsing action
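The stack discipline described above can be sketched as a short driver loop. The slide's own list of actions is cut off, so the four actions used here (accept, match, expand, error) are the conventional ones and are an assumption of this sketch. The table is hard-coded for the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id used elsewhere in these slides; the names `TABLE` and `ll1_parse` are illustrative.

```python
# M[A, a] as a dictionary; an empty list stands for an ɛ-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]                  # top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top, current = stack[-1], buffer[pos]
        if top == "$" and current == "$":
            return output                 # accept
        if top not in NONTERMINALS:       # terminal (or $) on top: must match the input
            if top != current:
                raise SyntaxError(f"expected {top!r}, found {current!r}")
            stack.pop()
            pos += 1
        else:                             # non-terminal on top: consult M[top, current]
            rhs = TABLE.get((top, current))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {current}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push the RHS right to left
            output.append(f"{top} -> {' '.join(rhs) or 'ɛ'}")

for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The printed productions form a leftmost derivation of id + id * id.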
Construction of predictive LL(1) parser • The construction of a predictive LL(1) parser is based on two very important functions, FIRST and FOLLOW • Overall construction steps • 1. Compute the FIRST and FOLLOW functions • 2. Construct the predictive parsing table using FIRST and FOLLOW • 3. Parse the input string with the help of the predictive parsing table
Compute FIRST • FIRST(A) is the set of terminal symbols that can appear first in the strings derived from A; if A =>* ɛ then ɛ is also in FIRST(A) • To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ɛ can be added to any FIRST set: • If a is a terminal symbol, then FIRST(a) = {a} • If X -> ɛ is a production, then add ɛ to FIRST(X) • For a rule A -> X1 X2 X3 … Xk, add FIRST(X1) except ɛ to FIRST(A); if ɛ is in FIRST(X1), also add FIRST(X2) except ɛ, and so on; if ɛ is in FIRST(Xi) for every i, add ɛ to FIRST(A)
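The rules above describe a fixed-point computation: keep applying them until no set grows. A minimal sketch follows, using the expression grammar from these slides as data; the dictionary layout and the names `first_of_sequence` and `compute_first` are assumptions of this example.

```python
EPS = "ɛ"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_sequence(symbols, first):
    """FIRST of a string X1 X2 ... Xk, given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        # A symbol without an entry in `first` is a terminal: FIRST(a) = {a}.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                 # Xi cannot vanish, stop here
    result.add(EPS)                       # every Xi can derive ɛ
    return result

def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:                        # repeat until nothing can be added
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

for nonterminal, symbols in compute_first(GRAMMAR).items():
    print(nonterminal, sorted(symbols))
```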
FOLLOW • FOLLOW(A) is defined as the set of terminal symbols that can appear immediately to the right of A in some sentential form; in other words, • FOLLOW(A) = { a | S =>* αAaβ } • To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set: • Place $ in FOLLOW(S), where S is the start symbol • If there is a production A -> αBβ, then everything in FIRST(β) except ɛ is in FOLLOW(B) • If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains ɛ, then everything in FOLLOW(A) is in FOLLOW(B)
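The three FOLLOW rules can be sketched the same way, as a loop that runs until no set changes. To keep the example short, the FIRST sets of the expression grammar from these slides are hard-coded rather than recomputed; the names `FIRST`, `first_of_sequence` and `compute_follow` are assumptions of this sketch.

```python
EPS = "ɛ"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, taken as given for this example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}

def first_of_sequence(symbols):
    """FIRST of the string β; an empty β yields {ɛ}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})     # terminals: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add("$")                    # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:    # only non-terminals have FOLLOW sets
                        continue
                    beta_first = first_of_sequence(rhs[i + 1:])
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:     # rule 3: β is empty or can vanish
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

for nonterminal, symbols in compute_follow(GRAMMAR, "E").items():
    print(nonterminal, sorted(symbols))
```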
Consider the grammar • E -> TE’ • E’ -> +TE’ | ɛ • T -> FT’ • T’ -> *FT’ | ɛ • F -> (E) | id • Find FIRST and FOLLOW
Construction of predictive parsing table • For each production A -> α in the grammar do the following: • For each terminal a in FIRST(α), create the entry M[A, a] = A -> α • If ɛ is in FIRST(α), create the entry M[A, b] = A -> α for each terminal b in FOLLOW(A) • If ɛ is in FIRST(α) and $ is in FOLLOW(A), also create the entry M[A, $] = A -> α • All the remaining entries in the table M are marked as syntax errors
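The table-filling rules translate almost line for line into code. The sketch below takes the FIRST and FOLLOW sets of the expression grammar as given data and reports a conflict if two productions land in the same cell, which would mean the grammar is not LL(1). The names `build_table` and `first_of_sequence` are assumptions of this example, and missing dictionary keys play the role of the error entries.

```python
EPS = "ɛ"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_sequence(symbols, first):
    """FIRST(α) for a right-hand side α."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})     # terminals and ɛ stand for themselves
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar, first, follow):
    table = {}
    for head, alternatives in grammar.items():
        for rhs in alternatives:
            alpha_first = first_of_sequence(rhs, first)
            columns = alpha_first - {EPS}     # each terminal a in FIRST(α)
            if EPS in alpha_first:
                columns |= follow[head]       # each b in FOLLOW(A), including $ if present
            for terminal in columns:
                if (head, terminal) in table:
                    raise ValueError(f"not LL(1): conflict at M[{head}, {terminal}]")
                table[(head, terminal)] = rhs
    return table

table = build_table(GRAMMAR, FIRST, FOLLOW)
print(table[("T'", "+")])    # ['ɛ']
```

Since $ is stored in the FOLLOW sets themselves, the separate rule for M[A, $] needs no extra code in this representation.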
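Example • Grammar: E -> TE’ • E’ -> +TE’ | ɛ • T -> FT’ • T’ -> *FT’ | ɛ • F -> (E) | id
FIRST and FOLLOW sets:
Symbol | FIRST | FOLLOW
F | {(, id} | {+, *, ), $}
T | {(, id} | {+, ), $}
E | {(, id} | {), $}
E’ | {+, ɛ} | {), $}
T’ | {*, ɛ} | {+, ), $}
Parsing table (rows: non-terminals; columns: input symbols; empty cells are errors):
Non-terminal | id | + | * | ( | ) | $
E | E -> TE’ | | | E -> TE’ | |
E’ | | E’ -> +TE’ | | | E’ -> ɛ | E’ -> ɛ
T | T -> FT’ | | | T -> FT’ | |
T’ | | T’ -> ɛ | T’ -> *FT’ | | T’ -> ɛ | T’ -> ɛ
F | F -> id | | | F -> (E) | |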
Grammar hierarchy • LR(0) grammars are contained in SLR(1), SLR(1) in LALR(1), and LALR(1) in the non-ambiguous CFGs • LL(1) grammars are also non-ambiguous CFGs
Some commonly used compiler construction tools for writing a compiler include • 1. Parser generators: automatically produce syntax analyzers from a grammatical description of a programming language • 2. Scanner generators: produce lexical analyzers from a regular-expression description of the tokens of the language • 3. Syntax-directed translation engines: produce collections of routines for walking a parse tree and generating intermediate code
4. Code-generator generators: produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language of a target machine • 5. Data-flow analysis engines: facilitate the gathering of information about how values are transmitted from one part of a program to each other part; data-flow analysis is a key part of code optimization • 6. Compiler construction toolkits: provide an integrated set of routines for constructing the various phases of a compiler
Error handling • Common programming errors • Lexical errors • Syntactic errors • Semantic errors • Error handler goals • Report the presence of errors clearly and accurately • Recover from each error quickly enough to detect subsequent errors • Add minimal overhead to the processing of correct programs
Error-recovery strategies • Panic mode recovery • Discard input symbols one at a time until one of a designated set of synchronizing tokens is found • Phrase level recovery • Replace a prefix of the remaining input by some string that allows the parser to continue • Error productions • Augment the grammar with productions that generate the erroneous constructs • Global correction • Choose a minimal sequence of changes to obtain a globally least-cost correction
Strategies for recovering from syntax errors • There are various methods for error recovery; some strategies are • Panic mode: • This is the simplest method • It is used by most parsing methods • On discovering an error, the parser discards input symbols one at a time until one of the designated set of synchronizing tokens is found • Synchronizing tokens are typically ; or . since these tokens mark the end of an input statement • Thus in panic mode recovery a considerable amount of input may be skipped without checking it for additional errors • This method is guaranteed not to go into an infinite loop • If there are only a few errors in the same statement, this strategy is the best choice
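The skipping step of panic mode can be sketched on its own, independent of any particular parser. The function name `panic_mode_skip`, the token list and the choice to resume just after the synchronizing token are assumptions of this example.

```python
SYNC_TOKENS = {";", "."}

def panic_mode_skip(tokens, error_pos, sync_tokens=SYNC_TOKENS):
    """Discard tokens from `error_pos` up to the next synchronizing token.

    Returns (resume_pos, skipped): the index at which parsing resumes and the
    tokens that were thrown away without being checked.
    """
    pos = error_pos
    # pos only moves forward and the input is finite, so the loop always ends.
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1
    skipped = tokens[error_pos:pos]
    resume_pos = min(pos + 1, len(tokens))    # continue after the sync token
    return resume_pos, skipped

tokens = ["x", "=", "3", "+", "*", "4", ";", "y", "=", "1", ";"]
resume_pos, skipped = panic_mode_skip(tokens, error_pos=4)   # error detected at '*'
print(skipped, tokens[resume_pos])    # ['*', '4'] y
```

The skipped tokens are never examined, so a second error inside them would go unreported, which is the trade-off noted in the bullets above.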
Phrase level recovery • On discovering an error, the parser performs a local correction on the remaining input • It can replace a prefix of the remaining input by some string; typical local corrections are replacing , by ; or deleting an extra semicolon or inserting a missing semicolon • The type of local correction is decided by the compiler designer • While performing a local correction we must be careful to choose replacements that do not lead to infinite loops • The drawback is that it is difficult to handle situations where the actual error occurred before the point of detection
Error productions (used to display error messages) • If we know the common errors that can be encountered, we can incorporate them by augmenting the grammar with error productions • When an error production is used during parsing, we can generate an appropriate error message and parsing can continue • This is extremely difficult to maintain, because whenever the grammar changes the corresponding error productions must be changed as well
Global correction • Ideally, a compiler makes as few changes as possible when processing an incorrect input string • We want the smallest number of insertions, deletions and changes of tokens needed to recover from the erroneous input • Such methods increase the time and space requirements at parsing time • The aim is to choose a minimal sequence of changes that gives a globally least-cost correction