1 / 93

Compilers

Compilers. 4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang. References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.

asimpkins
Télécharger la présentation

Compilers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compilers 4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000. 3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)

  2. Introduction • Context-free Grammar • The syntax of programming language constructs can be described by context-free grammar • Important aspects • A grammar serves to impose a structure on the linear sequence of tokens which is the program. • Using techniques from the field of formal languages, a grammar can be employed to construct a parser for it automatically. • Grammars aid programmers to write syntactically correct programs and provide answer to detailed questions about the syntax.

  3. The role of the parser

  4. Context-Free Grammars • A context-free grammar(CFG) is a compact, finite representation of a language, defined by the following four components: • A finite terminal alphabet Σ • A finite non-terminal alphabet N • A start symbol S N • A finite set of productions P

  5. A Simple Expression Grammar

  6. Leftmost Derivations • A sentential form produced via a leftmost derivation is called a left sentential form. • The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation. Hence, these parsers are said to produce a leftmost parse. • Example: f(V+V) E lm Prefix(E)  lm f(E)  lm f(V Tail)  lm f(V+E)  lm f(V+V Tail)  lm f(V+V)

  7. Rightmost Derivations • As a bottom-up parser discovers the productions that derive a given token sequence, it traces a rightmost derivation, but the productions are applied in reverse order. • Called rightmost or canonical parse • Example: f(V+V) E rm Prefix(E)  rm Prefix(V Tail)  rm Prefix(V+E)  rm Prefix(V+V Tail)  rm Prefix(V+V)  rm f(V+V)

  8. Parse Tree • It is rooted by the start symbol S • Each node is either a grammar symbol or 

  9. Properties of CFGs • The grammar may include useless symbols • The grammar may allow multiple, distinct derivations (parse trees) for some input string. • The grammar may include strings that do not belong in the language, or the grammar may exclude strings that are in the language.

  10. Ambiguity (1) • Some grammars allow a derived string to have two or more different parse trees (and thus a nonunique structure). • Example: 1. Expr →Expr – Expr 2. | id • This grammar allows two different parse tree for id - id - id.

  11. Ambiguity (2)

  12. Parsers and Recognizers • Two approaches • A parser is considered top-down if it generates a parse tree by starting at the root of the tree, expanding the tree by applying productions in a depth-first manner. • The bottom-up parsers generate a parse tree by starting the tree’s leaves and working toward its root.

  13. Two approaches of Parser • Deterministic left-to-right top-down • LL method • Deterministic left-to-right bottom-up • LR method • Left-to-right • The sequence of tokens is processed from left to right • Deterministic • No searching is involved: each token brings the parser one step closer to the goal of constructing the syntax tree

  14. Parsers (Top-down)

  15. Parsers (bottom-Up)

  16. Pre-order and post-order (1) • The top-down method constructs the syntax tree in pre-order • The bottom-up method constructs the syntax tree in post-order

  17. Pre-order and post-order (2)

  18. Principles of top-down parsing • The main task of a top-down parser is to choose the correct alternatives for known non-terminals

  19. Principles of bottom-up parsing • The main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed.

  20. Creating a top-down parser manually • Recursive descent parsing • Simplest way but has its limitations

  21. Recursive descent parsing program (1)

  22. Recursive descent parsing program (2)

  23. Drawbacks • Three drawbacks • There is still some searching through the alternatives • The method often fails to produce a correct parser • Error handling leaves much to be desired

  24. Second problems (1) • Example 1 • Index_element will never be tried • IDENTIFIER ‘[‘

  25. Second problems (2) • Example 2 • The recognizer will not recognize ab

  26. Second problems (3) • Example 3 • Recursive descent parsers cannot handle left-recursive grammars

  27. Creating a top-down parser automatically • The principles of constructing a top-down parser automatically derive from those of writing one by hand, by applying precomputation. • Grammars which allow the construction of a top-down parser to be performed are called LL(1) grammars.

  28. LL(1) parsing • FIRST set • The sets of first tokens produced by all alternatives in the grammar. • We have to precompute the FIRST sets of all non-terminals • The first sets of the terminals are obvious. • Finding FIRST() is trivial when  starts with a terminal. • FIRST(N) is the union of the FIRST sets of its alternatives. • First()={a Σ|  * a}

  29. Predictive recursive descent parser • The FIRST sets can be used in the construction of a predictive parser because it predicts the presence of a given alternative without trying to find out if it is there.

  30. Closure algorithm for computing the FIRST set (1) • Data definitions

  31. Closure algorithm for computing the FIRST set (2) • Initializations

  32. Closure algorithm for computing the FIRST set (3) • Inference rules

  33. FIRST sets example(1) • Grammar

  34. FIRST sets example(2) • The initial FIRST sets

  35. FIRST sets example(3) • The final FIRST sets

  36. Another Example of First Set

  37. Another Example of First Set (II)

  38. Algorithms of Computing First(α)

  39. The predictive parser (1)

  40. The predictive parser (2)

  41. Practice • Find the FIRST sets of all alternative of the following grammar. • E -> TE’ • E’->+TE’| • T->FT’ • T’->*FT’| • F->(E)|id

  42. Nullable alternatives • A complication arises with the case label for the empty alternative (ex. rest_expression). Since it does not itself start with any token, how can we decide whether it is the correct alternative?

  43. FOLLOW sets • Follow sets • Determining the set of tokens that can immediately follow a given non-terminal N. • LL(1) parser • ‘LL’ because the parser works from Left to right identifying the nodes in what is called Leftmost derivation order. • ‘(1)’ because all choices are based on a one token look-ahead. • Follow(A)={b Σ |S+  Ab β}

  44. Closure algorithm for computing the FOLLOW sets

  45. The first and follow sets

  46. Another Example of Follow Set

  47. Another Example of Follow Set (II)

  48. Algorithm of Follow(A)

  49. Recall the predictive parser rest_expression  ‘+’ expression | FIRST(rest_expr) = {‘+’, } void rest_expression(void) {switch (Token.class) {case '+': token('+'); expression(); break;case EOF: case ')': break;default: error(); } } FOLLOW(rest_expr) = {EOF, ‘)’}

  50. LL(1) conflicts • Example • The codes

More Related