Chapter 4: Syntax Analysis

brice

Presentation Transcript


  1. Chapter 4: Syntax Analysis • Csci 465

  2. Objectives • Parser and its role in the design of a compiler • Techniques used to build hand-implemented parsers • Top-down parsing • LL parser • Algorithms used to build automated parser generators • Bottom-up parsing • LR parser • Simple LR (SLR) • CFG • Derivations (leftmost and rightmost) • FIRST and FOLLOW • Error recovery and handling techniques

  3. Syntax Analysis • Every PL has a set of rules prescribing the syntactic structure of the programs written in that language • E.g., Pascal • A Pascal program is made out of blocks • A block is itself made out of statements • A statement is made out of expressions • An expression is made out of tokens • A token is made out of characters specified by a regular expression (RE)

  4. Grammars • Grammars? • The set of structural rules that governs the composition of clauses, phrases, and words in a given natural language • Formal grammars? • A set of production rules for strings in a formal language

  5. Significance of Grammars • Provides a precise, easy-to-understand syntactic specification • Automates the construction of an efficient parser • Supports the evolution of an existing language implementation by making it easier to add new programming constructs

  6. Parser vs scanner • Lexical analyzer • Recognizes tokens (terminal symbols) from the sequence of characters in an input string • Parser • Recognizes a set of related words (or phrases) • Checks how these words are combined to form a syntactically correct program

  7. Limitation of Regular Expressions (revisited) • Regular expressions and their recognizers are suitable for identifying errors at the word level • E.g., • a misspelled identifier, keyword, or operator • REs cannot be used to handle nested or balanced parentheses • E.g., an arithmetic expression with unbalanced parentheses
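
To make the limitation concrete, here is a minimal Python sketch of a balanced-parentheses check. It needs a counter that can grow without bound (in effect, a one-symbol stack), which is the kind of memory a finite-state recognizer lacks. The function name `parens_balanced` is illustrative and does not come from the slides.

```python
def parens_balanced(text: str) -> bool:
    """Return True if every '(' in text has a matching ')'."""
    depth = 0  # number of parentheses currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0  # nothing may remain open at the end


print(parens_balanced("(a + (b * c))"))
print(parens_balanced("(a + b))"))
```

The counter plays the role of the stack in a push-down automaton: each "(" pushes and each ")" pops.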

  8. Role of parser • [Figure: block diagram with the labels source program, lexical analyzer (LA), token / getchar(), parser, parse tree, rest of front end (FE), symbol table]

  9. Types of Parser • Universal parsing methods • Cocke-Younger-Kasami (CYK) algorithm • Can parse any context-free grammar • Not efficient enough to use in production compilers • Top-down • LL parsers (hand-written) • Bottom-up • LR parsers (automated)

  10. Context Free Grammar (CFG) • A grammar can be used to describe most of the syntax of a PL • PLs allow sentence construction with nested and matched parentheses • Some PL constructs cannot be defined by a grammar • E.g., define/use rules (a name must be defined before it is used) • Languages with nested, matched structure are specified by CFGs • Every language defined by a CFG can be recognized by a push-down automaton (PDA), and every language accepted by a PDA is context-free

  11. CFG and PDA • The focus here is on context-free languages (CFLs) that are accepted by PDAs • In particular: • languages defined by LL(k) context-free grammars • LL? • Parses the input from Left to right and constructs a Leftmost derivation of the sentence

  12. LL Parsing • What is an LL(k) grammar? • A grammar from which we can construct a deterministic, top-down PDA that looks ahead at most k symbols on the input tape • What is an LL(1) grammar? • The most common form of LL(k) grammar • Looks ahead at most one symbol • The easiest to convert into a PDA

  13. Predictive Parsing

  14. PDA • A push-down automaton is formally defined as a 7-tuple as follows • P = (Σ, Q, Δ, H, h0, q0, F) • Σ: input alphabet • Q: finite set of states • Δ: transition function (written T in the transition rules) • H: finite stack alphabet • h0: initial stack symbol in H • q0: initial state in Q • F: finite set of final states

  15. PDA • The transition function Δ (written T) has the following form • T: Q × (Σ ∪ {ε}) × H → Q × H* • i.e., every transition is defined for a particular state; • reads one input token or skips the input (an ε-move) • always pops one symbol off the stack • moves to a new state • pushes a string of zero or more (i.e., *) symbols back onto the stack

  16. Model of PDA

  17. Example 1 • Let P0 = (Σ = {a, b, c}, Q = {A, B, C}, Δ, H = {h, i}, h0 = i, q0 = A, F = { }) be a PDA • where the transition function Δ (written T) is defined as follows • T(A, a, i) = (B, h) • T(B, a, h) = (B, hh) • T(C, b, h) = (C, ε) • T(A, c, i) = (A, ε) • T(B, c, h) = (C, h)
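
The behaviour of P0 can be explored with a small simulator. The following Python sketch encodes the five transitions listed on the slide. Since F is empty, it assumes acceptance by empty stack once the whole input has been read; that acceptance criterion and the names `TRANSITIONS` and `accepts` are assumptions made for illustration.

```python
# Transitions of P0: (state, input token, stack top) -> (new state, string to push).
# "" stands for the empty string; the pushed string is written top-first.
TRANSITIONS = {
    ("A", "a", "i"): ("B", "h"),
    ("B", "a", "h"): ("B", "hh"),
    ("C", "b", "h"): ("C", ""),
    ("A", "c", "i"): ("A", ""),
    ("B", "c", "h"): ("C", "h"),
}


def accepts(word: str, start_state: str = "A", start_symbol: str = "i") -> bool:
    """Run P0 on word; accept if all input is read and the stack is empty (assumed)."""
    state = start_state
    stack = [start_symbol]  # the top of the stack is the last list element
    for token in word:
        if not stack:
            return False  # stack emptied before the input ended: the PDA blocks
        key = (state, token, stack.pop())  # every transition pops one symbol
        if key not in TRANSITIONS:
            return False  # no applicable transition: the PDA blocks
        state, push = TRANSITIONS[key]
        stack.extend(reversed(push))  # push so that push[0] ends up on top
    return not stack


for w in ["c", "acb", "aacbb", "aacb", "ab"]:
    print(w, accepts(w))
```

Under this assumption the machine accepts strings of the form a^n c b^n: each a pushes one h, the c switches to the popping state, and each b pops one h.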

  18. table

  19. Push Down Automata (PDA): Implementation • A PDA can be used to implement a top-down parser • Starts with the goal symbol on the stack • Rewrites the leftmost nonterminal until the leftmost symbol is a terminal matching the first token of the input string • Takes the transition that reads (matches) that token • Repeats the process until the entire input has been read or the PDA blocks

  20. Top-Down Parsing (revisited) • Top down parsing • Building a parse tree for input string • Starting from the root • Creating the nodes for the tree in preorder (depth first) fashion • Finding a leftmost derivation for an input

  21. Example: Grammar for Arithmetic expression

  22. Suppose G is defined as follows: • S → c A d • A → a b | a

  23. FIRST and FOLLOW • The construction of both top-down and bottom-up parsers requires two functions • FIRST() • FOLLOW() • These functions help to select the appropriate production

  24. FIRST and FOLLOW Sets • To show a grammar is LL(k), we need to • Build Firstk(w) for all right-hand sides w in the grammar’s productions • Build Followk(N) for all nonterminals N in the grammar • Create selection sets for all productions • First and Follow sets help to fill in the entries of the parsing table

  25. First and Follow

  26. FIRSTk(w) • Firstk of any string w is the set of all terminal strings of k tokens or fewer that begin the strings derivable from w (fewer than k only when the whole derived string is shorter than k) • Firstk(uv) = Firstk(Firstk(u) Firstk(v)) • (i.e., Firstk(u) concatenated with Firstk(v), each result truncated to its first k tokens) • Firstk(N) = ∪ Firstk(w) • (i.e., the union of Firstk(w) over all productions N → w) • Firstk(x) = {x} • (i.e., for any terminal x) • Firstk(ε) = {ε} • (i.e., for the empty string)

  27. Example 1 • First2(uv) = First2(First2(u) First2(v)) • where • First2(u) = {ab, cd, d, dd, ε} • First2(v) = {cc, d, ε} • therefore • First2(uv) is formed by concatenating each element of First2(u) with each element of First2(v) • {abcc, abd, ab, cdcc, cdd, cd, dcc, dd, d, ddcc, ddd, dd, cc, d, ε} • Take the first two characters of each • {ab, ab, ab, cd, cd, cd, dc, dd, d, dd, dd, dd, cc, d, ε} • Remove the duplicates • First2(uv) = {ab, cd, dc, dd, d, cc, ε}
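
The concatenate, truncate, and deduplicate steps of this example can be reproduced in a few lines of Python. This is a sketch that assumes every token is a single character and writes ε as the empty string `""`; the function name `first_k_concat` is illustrative.

```python
def first_k_concat(first_u: set, first_v: set, k: int) -> set:
    """Concatenate every pair, keep the first k symbols, and drop duplicates."""
    return {(u + v)[:k] for u in first_u for v in first_v}


first2_u = {"ab", "cd", "d", "dd", ""}  # "" stands for ε
first2_v = {"cc", "d", ""}

print(sorted(first_k_concat(first2_u, first2_v, 2)))
```

Using a set comprehension performs the duplicate removal automatically, so the three manual steps on the slide collapse into one expression.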

  28. Example 2 • Consider the simple grammar G: • A → B a • B → b • B → c • First1(A) = First1(First1(B) First1(a)) • = First1((First1(b) ∪ First1(c)) First1(a)) • = First1({b, c}{a}) • = First1({ba, ca}) • = {b, c} • where • First1(b) = {b} • First1(c) = {c} • First1(a) = {a}

  29. Followk(A) • Followk of a nonterminal A • Refers to the set of all terminal strings of k tokens or fewer that can follow whatever A derives

  30. Example: Follow set • For every production B → uAv, Followk(A) can be built as • Followk(A) = ∪ Firstk(Firstk(v) Followk(B)) • This means that • to construct Follow(A), look in the grammar for all productions in which A occurs on the right-hand side (R.H.S.) and apply the following rules: • 1. Take the FIRST of everything to the right of A, including Follow(B), where B is the nonterminal on the L.H.S. // B → uAv • 2. If A is the rightmost symbol in some sentential form, then add ε (or $, the end-of-input marker) to Follow(A) • 3. If v is a nonterminal, then everything in FIRST(v) except ε is placed in Follow(A) • 4. If v derives ε (v ⇒* ε), then everything in Follow(B) is also in Follow(A)

  31. Follow: Example 1 • Consider the following grammar • S → B x • A → a A • A → b • B → y A z A • Compute Follow1(A)

  32. Follow: Example 1 (solution) • Consider the following grammar • S → B x • A → a A • A → b • B → y A z A • Compute Follow1(A) • Find every A on the R.H.S. • Find any terminal right after A • In B → y A z A the first A is followed by z, so add z to the set: {z} • Where A ends a R.H.S., add the Follow of the nonterminal on the L.H.S. • B → y A z A ends in A, and Follow(B) = First(x) = {x} • A → a A also ends in A, but its L.H.S. is A itself, so this recursive case adds nothing new • Follow(A) = {x, z}

  33. Example 2: First and Follow • Consider the following grammar • E → T E’ • E’ → + T E’ | ε • T → F T’ • T’ → * F T’ | ε • F → ( E ) | id

  34. Solution for FIRST() • FIRST(E) = FIRST(T) = FIRST(F) = {(, id} • FIRST(E’) = {+, ε} • FIRST(T’) = {*, ε}

  35. Solution for FOLLOW() • Consider the following grammar • E → T E’ • E’ → + T E’ | ε • T → F T’ • T’ → * F T’ | ε • F → ( E ) | id • FOLLOW(E) = FOLLOW(E’) = {), $} // applied rules 2, 1 • FOLLOW(T) = FOLLOW(T’) = {+, ), $} // applied rules 3, 4 • FOLLOW(F) = {*, +, ), $} // applied rules 3, 4
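
The FIRST and FOLLOW sets for this expression grammar can be checked mechanically. The Python sketch below iterates the rules to a fixed point for k = 1. It assumes "$" as the end-of-input marker, writes E’ and T’ with a plain apostrophe, and uses illustrative names (`GRAMMAR`, `first_of`, `compute_first`, `compute_follow`) that do not come from the slides.

```python
EPS, END = "ε", "$"  # END as the end-of-input marker is an assumed convention

# Grammar from the slide; each right-hand side is a list of symbols, [] is an ε-production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of(symbols, first):
    """FIRST of a symbol string, given the current FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol (or the empty string) can derive ε
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, rhss in GRAMMAR.items():
            for rhs in rhss:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)  # the goal symbol is followed by end of input
    changed = True
    while changed:
        changed = False
        for lhs, rhss in GRAMMAR.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # the tail can vanish, so FOLLOW(lhs) applies
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

A fixed-point loop is used because the sets depend on one another (for example, FOLLOW(E’) needs FOLLOW(E)); iterating until nothing changes avoids having to order the nonterminals by hand.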

  36. Selection Sets • The selection set Selectk of a production is the set of lookahead strings of at most k tokens that tell a deterministic top-down parser to choose that production

  37. More on Selection • For each production A → w in a grammar, Selectk(A → w) = Firstk(Firstk(w) Followk(A)) • A nonterminal A in a grammar is LL(k) iff • for any two selection sets S1 and S2 of distinct productions of A, the following condition holds • S1 ∩ S2 = {} • A grammar is LL(k) if every nonterminal in that grammar is LL(k)

  38. Example of Selection • Consider the simple grammar G • S → a S b • S → ε

  39. More on Selection • S → a S b • Select1(S → a S b) = First1(First1(aSb) Follow1(S)) • = First1({a}{$, b}) • $ is in Follow1(S) because S is the goal symbol • = First1({a$, ab}) • = {a}

  40. Cont’d (S → ε) • Select1(S → ε) • = First1(First1(ε) Follow1(S)) • = First1({ε} Follow1(S)) • = First1({ε}{$, b}) • = {$, b} • {$, b} ∩ {a} = {} • i.e., the two selection sets have no elements in common • ∴ G is LL(1)
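
The selection-set computation for this grammar can be replayed in Python. The sketch below takes First1 and Follow1 as given on the slides rather than deriving them, assumes one character per token, and writes ε as `""`; the helper names `first_k` and `select` are illustrative.

```python
def first_k(strings, k=1):
    """Truncate every string to its first k symbols (one character per token)."""
    return {s[:k] for s in strings}


def select(first_rhs, follow_lhs, k=1):
    """Select_k(A -> w) = First_k(First_k(w) Follow_k(A))."""
    return first_k({w + f for w in first_rhs for f in follow_lhs}, k)


follow_S = {"$", "b"}                 # Follow1(S), taken from the slides
select_aSb = select({"a"}, follow_S)  # First1(aSb) = {a}
select_eps = select({""}, follow_S)   # First1(ε) = {ε}, written "" here

print(select_aSb, select_eps)
print("LL(1):", select_aSb.isdisjoint(select_eps))
```

Disjoint selection sets mean that one lookahead token always identifies a single production for S, which is exactly the LL(1) condition.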

  41. In Class Quiz • Consider the following grammar • S → B x • A → a A • A → b • B → y A z A • B → A A • Compute Follow1(A)

  42. Converting CFG to PDA: 1 • A PDA can be constructed from a CFG as follows: • PDA.Σ = CFG.Σ // input alphabet = terminals • PDA.H = CFG.N ∪ CFG.Σ // finite stack alphabet: nonterminals, plus the terminals that productions push • PDA.h0 = the goal symbol of the CFG • PDA.Q = {q} // a single state; the PDA halts on empty stack

  43. Converting CFG to PDA: 2 • Two rules • 1. T(q, x, x) = (q, ε) (i.e., for every terminal x: match x and pop it) • 2. T(q, ε, A) = (q, α) (i.e., for every production A → α: replace nonterminal A by α) • where α is a string of terminal and nonterminal symbols on the R.H.S.

  44. Example: From CFG to PDA • Consider the following grammar G1, which generates strings of a’s followed by an equal number of b’s • L(G1) = {ε, ab, aabb, aaabbb, …} • 1) S → a S b • 2) S → ε • First(S) = {a, ε} • Follow(S) = {b, $}

  45. Example 2: Transitions • Convert G1 to a PDA • T(q, ε, S) = (q, aSb) • T(q, ε, S) = (q, ε) • T(q, a, a) = (q, ε) • T(q, b, b) = (q, ε)

  46. Example 2: Parsing • Input string: aabb • Initial configuration Cnfg0: (q, aabb, S) • Transitions: • T(q, ε, S) = (q, aSb) // use First to choose • T(q, ε, S) = (q, ε) // use Follow to choose • T(q, a, a) = (q, ε) • T(q, b, b) = (q, ε)
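
The parse of aabb can be traced with a small table-driven simulator of this single-state PDA. In the Python sketch below, the lookahead table encodes Select1(S → aSb) = {a} and Select1(S → ε) = {b, $}. The "$" end marker, the `TABLE` layout, and the function name `parse` are assumptions made for illustration.

```python
# Predictive choice for nonterminal S, keyed by the lookahead token:
# "a" selects S -> aSb (from First); "b" and "$" select S -> ε (from Follow).
TABLE = {"a": "aSb", "b": "", "$": ""}


def parse(word: str) -> bool:
    """Single-state PDA for G1; accepts when the input is read and the stack is empty."""
    tokens = list(word) + ["$"]  # "$" marks end of input (an assumed convention)
    stack = ["S"]                # the top of the stack is the last list element
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top == "S":
            if look not in TABLE:
                return False                     # no production is selected
            stack.extend(reversed(TABLE[look]))  # T(q, ε, S) = (q, α)
        elif top == look:
            pos += 1                             # T(q, x, x) = (q, ε)
        else:
            return False                         # the PDA blocks
    return tokens[pos] == "$"


for w in ["aabb", "ab", "", "aab", "abb"]:
    print(repr(w), parse(w))
```

Because the lookahead token picks the production for S, the simulator never has to backtrack, which is the practical payoff of G1 being LL(1).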
