Chapter 9

kenyon

Presentation Transcript


  1. Chapter 9 Syntax Analysis SEG2101 Chapter 9

  2. Contents • Context free grammars • Top-down parsing • Bottom-up parsing • Attribute grammars • Dynamic semantics • Tools for syntax analysis • Chomsky’s hierarchy

  3. The Role of the Parser

  4. 9.1: Context Free Grammars • A context free grammar consists of terminals, nonterminals, a start symbol, and productions. • Terminals are the basic symbols from which strings are formed. • Nonterminals are syntactic variables that denote sets of strings. • One nonterminal is distinguished as the start symbol. • The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. • A language that can be generated by a context free grammar is said to be a context-free language.

  5. Example of Grammar

  6. Notational Conventions • Aho P.166 • Example P.167: E → E A E | (E) | -E | id, A → + | - | * | / | ↑

  7. Derivations • E ⇒ -E is read “E derives -E”. • E ⇒ -E ⇒ -(E) ⇒ -(id) is called a derivation of -(id) from E. • If A → γ is a production and α and β are arbitrary strings of grammar symbols, we say αAβ ⇒ αγβ. • If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn.

  8. Derivations (II) • ⇒ means “derives in one step.” • ⇒* means “derives in zero or more steps.” • α ⇒* α for any string α. • If α ⇒* β and β ⇒ γ, then α ⇒* γ. • ⇒+ means “derives in one or more steps.” • If S ⇒* α, where α may contain nonterminals, then we say that α is a sentential form.

  9. Derivations (III) • G: grammar, S: start symbol, L(G): the language generated by G. • Strings in L(G) may contain only terminal symbols of G. • A string of terminals w is said to be in L(G) if and only if S ⇒+ w. • The string w is called a sentence of G. • A language that can be generated by a grammar is said to be a context-free language. • If two grammars generate the same language, the grammars are said to be equivalent.

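  10. Derivations (IV) • Grammar: E → E A E | (E) | -E | id, A → + | - | * | / | ↑ • The string -(id+id) is a sentence of the above grammar because E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id), where the step -(E) ⇒ -(E+E) abbreviates -(E) ⇒ -(E A E) ⇒ -(E+E). • We write E ⇒* -(id+id).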
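
The derivation on this slide can be replayed mechanically. The following is a minimal sketch rather than anything from the original slides: it assumes the operator nonterminal A has been expanded in place, so the grammar reads E → E+E | E*E | (E) | -E | id, and the helper names `leftmost_step` and `derive` are hypothetical.

```python
# Grammar assumed for illustration: E -> E+E | E*E | (E) | -E | id
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
}


def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal of `form` by the right side `rhs`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if rhs not in GRAMMAR[sym]:
                raise ValueError(f"{sym} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


def derive(start, choices):
    """Return every sentential form produced by applying `choices` in order."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms


steps = derive("E", [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]])
for form in steps:
    print("".join(form))
```

Each printed line is one sentential form; the last one contains only terminals, so it is a sentence of the grammar.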

  11. Parse Tree • E → E+E | E*E | (E) | -E | id

  12. Parse Tree (II)

  13. Two Parse Trees

  14. Ambiguity • A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
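
One way to see ambiguity concretely is to count parse trees. The sketch below is an illustration under stated assumptions: it hard-codes the grammar E → E+E | E*E | (E) | -E | id, and the function name `count_parse_trees` is hypothetical. It counts trees by trying every production on every token span, with memoization.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E+E | E*E | (E) | -E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive tokens[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                        # E -> id
        if tokens[i] == "-":
            total += count(i + 1, j)          # E -> -E
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)      # E -> (E)
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):       # E -> E+E or E -> E*E, split at k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence shows that the grammar is ambiguous; here id + id * id has one tree that groups the addition first and one that groups the multiplication first.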

  15. Eliminating Ambiguity • Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. • E.g. “match each else with the closest unmatched then”

  16. Eliminating Left Recursion • A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α. • A → Aα | β can be replaced by A → βA’, A’ → αA’ | ε • A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn can be replaced by A → β1A’ | β2A’ | … | βnA’, A’ → α1A’ | α2A’ | … | αmA’ | ε
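
The rewriting rule for immediate left recursion can be sketched in a few lines. This is an illustrative sketch, not the algorithm from the next slide: it handles only immediate left recursion for a single nonterminal, the function name is hypothetical, and it assumes the fresh name A' is unused and that there is no production A → A.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    Returns a dict mapping each nonterminal to its new list of alternatives.
    """
    # The alpha parts: what follows A in the left-recursive alternatives.
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [nonterminal]]
    # The beta parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if alt[:1] != [nonterminal]]
    if not recursive:
        return {nonterminal: alternatives}
    fresh = nonterminal + "'"  # assumed not to clash with an existing name
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }


result = eliminate_immediate_left_recursion(
    "A", [["A", "c"], ["A", "a", "d"], ["b", "d"], []]
)
print(result)
```

For the input A → Ac | Aad | bd | ε the result corresponds to A → bdA’ | A’ and A’ → cA’ | adA’ | ε.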

  17. Algorithm: Eliminating Left Recursion

  18. SAa|b AAc|Sd| AAc|Aad|bd| SAa|b AbdA’|A’ A’cA’|adA’| Examples SEG2101 Chapter 9

  19. Left Factoring • Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. • The basic idea is that when it is not clear which of two alternative productions to use to expand a nonterminal A, we may be able to rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice. • stmt → if expr then stmt else stmt | if expr then stmt

  20. Algorithm: Left Factoring

  21. Left Factoring (example p178) • A → αβ1 | αβ2 • The following grammar abstracts the dangling-else problem: • S → iEtS | iEtSeS | a • E → b
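
A single round of left factoring can be sketched as follows. This is a simplified illustration, not the full algorithm from the previous slide: it factors out only the longest prefix shared by at least two alternatives, does not repeat until no common prefixes remain, and the function names and the fresh name S' are assumptions.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(nonterminal, alternatives):
    """Factor out the longest prefix shared by at least two alternatives."""
    best = []
    for i, a in enumerate(alternatives):
        for b in alternatives[i + 1:]:
            prefix = common_prefix(a, b)
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {nonterminal: alternatives}  # nothing to factor
    fresh = nonterminal + "'"  # assumed not to clash with an existing name
    n = len(best)
    # Suffixes of the alternatives that start with the shared prefix.
    factored = [alt[n:] for alt in alternatives if alt[:n] == best]
    rest = [alt for alt in alternatives if alt[:n] != best]
    return {nonterminal: [best + [fresh]] + rest, fresh: factored}


result = left_factor_once(
    "S", [list("iEtS"), list("iEtSeS"), ["a"]]
)
print(result)
```

Applied to S → iEtS | iEtSeS | a, the sketch yields S → iEtSS’ | a and S’ → ε | eS, where the empty list in the output stands for ε.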

  22. 9.2: Top Down Parsing • Recursive-descent parsing • Predictive parsers • Nonrecursive predictive parsing • FIRST and FOLLOW • Construction of predictive parsing table • LL(1) grammars • Error recovery in predictive parsing (if time permits)

  23. Recursive-Descent Parsing • Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. • It can also be viewed as an attempt to construct a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder. • Grammar: (figure not captured in the transcript) • Input string: w = cad

  24. Predictive Parsers • By carefully writing a grammar, eliminating left recursion, and left factoring the resulting grammar, we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser. • S → cAd, A → aA’, A’ → b | ε

  25. Predictive Parser (II) • Recursive-descent parsing is a top-down method of syntax analysis in which we execute a set of recursive procedures to process the input. • A procedure is associated with each nonterminal of a grammar. • Predictive parsing is the special case in which the look-ahead symbol unambiguously determines the procedure selected for each nonterminal. • The sequence of procedures called in processing the input implicitly defines a parse tree for the input.
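
The idea of one procedure per nonterminal can be sketched for the small grammar S → cAd, A → aA’, A’ → b | ε. The class and method names below are hypothetical, single characters are treated as tokens, and "$" is assumed as an end-of-input marker; it is a sketch, not the parser from the course.

```python
class ParseError(Exception):
    pass


class PredictiveParser:
    """Recursive-descent parser for S -> c A d, A -> a A', A' -> b | epsilon."""

    def __init__(self, text):
        self.text = text + "$"  # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.text[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise ParseError(
                f"expected {terminal!r}, found {self.lookahead!r} at position {self.pos}"
            )
        self.pos += 1

    def parse_S(self):
        self.match("c")
        subtree = self.parse_A()
        self.match("d")
        return ("S", "c", subtree, "d")

    def parse_A(self):
        self.match("a")
        return ("A", "a", self.parse_A_prime())

    def parse_A_prime(self):
        if self.lookahead == "b":  # the lookahead selects A' -> b
            self.match("b")
            return ("A'", "b")
        # Otherwise choose A' -> epsilon; a wrong symbol is reported by the
        # caller when it tries to match "d".
        return ("A'",)

    def parse(self):
        tree = self.parse_S()
        self.match("$")
        return tree


print(PredictiveParser("cad").parse())
print(PredictiveParser("cabd").parse())
```

No backtracking is needed because each procedure decides which production to use by looking only at the current lookahead symbol. The nested tuples returned by `parse` mirror the call sequence, which is the implicit parse tree mentioned above.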

  28. Nonrecursive predictive parsing

  29. Predictive Parsing Program

  30. Parsing Table M • Grammar: (figure not captured in the transcript) • Input: id + id * id

  31. Moves Made by Predictive Parser
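
A table-driven (nonrecursive) predictive parser keeps an explicit stack instead of using recursive calls. Since the grammar and table on these slides were not captured in the transcript, the sketch below assumes the standard textbook expression grammar E → T E’, E’ → + T E’ | ε, T → F T’, T’ → * F T’ | ε, F → (E) | id, with a hand-written table for it; the names `TABLE` and `predictive_parse` are hypothetical.

```python
# Parsing table M for the assumed expression grammar.
# Keys are (nonterminal, lookahead); values are right sides ([] is epsilon).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order (a leftmost derivation)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # "$" marks the bottom of the stack
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        a = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            output.append((top, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == a:
            pos += 1  # terminal (or "$") matches the input
        else:
            raise SyntaxError(f"expected {top!r}, found {a!r}")
    return output


for head, body in predictive_parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

The printed productions are the moves of the parser on id + id * id, in the order in which they would appear in a leftmost derivation.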

  32. FIRST and FOLLOW • If α is any string of grammar symbols, FIRST(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α). • FOLLOW(A), for nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form, i.e. the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β. • If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).

  33. Compute FIRST(X)

  34. Compute FOLLOW(A)
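
FIRST and FOLLOW are usually computed by repeating the defining rules until no set changes. The sketch below is one possible fixed-point formulation, not necessarily the procedure shown on the slides; the example grammar (the standard expression grammar) and all function names are assumptions made for illustration.

```python
EPS = "ε"
END = "$"

# Assumed example grammar; the empty list stands for epsilon.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of symbols, given FIRST of every nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish, or the string is empty
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # $ follows the start symbol
    changed = True
    while changed:  # repeat until no FOLLOW set grows
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is defined for nonterminals only
                    tail = first_of_string(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the rest can vanish: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "E")
print(first)
print(follow)
```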

  35. Construction of Predictive Parsing Tables

  36. Example of Producing Parsing Table

  37. LL(1) Grammars • A grammar whose parsing table has no multiply-defined entries is said to be LL(1). • First L: scanning the input from left to right • Second L: producing a leftmost derivation • 1: using one input symbol of lookahead at each step to make parsing decisions.

  38. Properties of LL(1) • No ambiguous or left recursive grammar can be LL(1). • Grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold: • For no terminal a do both α and β derive strings beginning with a: FIRST(α) ∩ FIRST(β) = ∅. • At most one of α and β can derive the empty string. • If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A): FIRST(α FOLLOW(A)) ∩ FIRST(β FOLLOW(A)) = ∅.
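
Whether a grammar is LL(1) can be checked by building the parsing table and looking for cells with more than one production. The sketch below does this for the left-factored dangling-else grammar S → iEtSS’ | a, S’ → eS | ε, E → b. To keep it short, the FIRST and FOLLOW sets are written in by hand (an assumption of this example rather than computed), and the function names are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "S": [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],  # [] stands for epsilon
    "E": [["b"]],
}
# Hand-computed sets for this grammar (assumed inputs of the sketch).
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal starts with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    """Map (nonterminal, terminal) to the list of applicable right sides."""
    table = {}
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            lookaheads = first_of_string(alt)
            if EPS in lookaheads:  # alt can vanish: use FOLLOW(nt) as well
                lookaheads = (lookaheads - {EPS}) | FOLLOW[nt]
            for a in lookaheads:
                table.setdefault((nt, a), []).append(alt)
    return table


def conflicts(table):
    """Cells with more than one production (multiply-defined entries)."""
    return {cell: alts for cell, alts in table.items() if len(alts) > 1}


print(conflicts(build_table()))
```

The only conflict is the cell M[S’, e], which holds both S’ → eS and S’ → ε, so this grammar is not LL(1). This is the dangling-else ambiguity showing up in the table.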

  39. LL(1) Grammars: Example

  40. Non-LL(1) Grammar: Example

  41. Error recovery in predictive parsing • An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when nonterminal A is on top of the stack, a is the next input symbol, and the parsing table entry M[A,a] is empty. • Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.

  42. How to select synchronizing set? • Place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue. • We might add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.

  43. How to select synchronizing set? (II) • If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. This may postpone some error detection, but cannot cause an error to be missed. This approach reduces the number of nonterminals that have to be considered during error recovery. • If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal and issue a message saying that the terminal was inserted.

  44. Example: error recovery • “synch” marks the synchronizing tokens obtained from the FOLLOW set of the nonterminal in question. • If the parser looks up entry M[A,a] and finds that it is blank, the input symbol a is skipped. • If the entry is synch, the nonterminal on top of the stack is popped. • If a token on top of the stack does not match the input symbol, then we pop the token from the stack.
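
The three recovery rules above can be added to a table-driven parser with little extra code. The sketch below is an illustration under assumptions: the table on the slide was not captured, so it uses the standard expression grammar E → T E’, E’ → + T E’ | ε, T → F T’, T’ → * F T’ | ε, F → (E) | id with hand-written FOLLOW sets, and the name `parse_with_recovery` and the message texts are hypothetical. It also adds two guards that the slide does not mention, for leftover input and for a blank entry at the end of the input.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}
# Mark every blank cell M[A, a] with a in FOLLOW(A) as a synch entry.
for nt, tokens_after in FOLLOW.items():
    for a in tokens_after:
        TABLE.setdefault((nt, a), SYNCH)


def parse_with_recovery(tokens, start="E"):
    """Parse `tokens`, returning the list of error messages (empty if none)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos, errors = 0, []
    while stack:
        top, a = stack[-1], tokens[pos]
        if top == "$":
            if a == "$":
                stack.pop()  # stack and input exhausted together: done
            else:
                errors.append(f"unexpected {a!r} after the expression")
                pos += 1     # guard: skip leftover input
        elif top not in NONTERMINALS:
            stack.pop()
            if top == a:
                pos += 1     # terminal matches the input
            else:
                errors.append(f"missing {top!r} before {a!r}")  # pop the terminal
        else:
            entry = TABLE.get((top, a))
            if entry is None and a != "$":
                errors.append(f"skipping {a!r}")  # blank entry: skip the input symbol
                pos += 1
            elif entry is None or entry == SYNCH:
                errors.append(f"incomplete {top} before {a!r}")  # synch: pop the nonterminal
                stack.pop()
            else:
                stack.pop()
                stack.extend(reversed(entry))
    return errors


print(parse_with_recovery(["id", "*", "+", "id"]))
```

On the input id * + id the parser finds F on top of the stack with + as the next symbol, which is a synch entry, so it pops F, reports one error, and continues parsing the rest of the input.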

  45. Example: error recovery (II)

  46. 9.3: Bottom Up Parsing and LR Parsers • Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (bottom) and working up towards the root (top). • It can be viewed as “reducing” a string w to the start symbol of a grammar. • At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

  47. Example • Grammar: S → aABe, A → Abc | b, B → d • Reduction sequence: abbcde, aAbcde, aAde, aABe, S
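
The claim that these reductions trace a rightmost derivation in reverse can be checked directly. The sketch below does not implement a shift-reduce parser; it only builds the rightmost derivation of abbcde for the grammar on this slide and reverses it. The helper name `rightmost_step` and the chosen order of productions are assumptions of the example.

```python
# Grammar from the slide: S -> aABe, A -> Abc | b, B -> d
GRAMMAR = {
    "S": [list("aABe")],
    "A": [list("Abc"), ["b"]],
    "B": [["d"]],
}


def rightmost_step(form, rhs):
    """Expand the rightmost nonterminal of `form` with the right side `rhs`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:
            if rhs not in GRAMMAR[form[i]]:
                raise ValueError(f"{form[i]} -> {''.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


# Productions applied in rightmost order: S -> aABe, B -> d, A -> Abc, A -> b
choices = [list("aABe"), ["d"], list("Abc"), ["b"]]
forms = [["S"]]
for rhs in choices:
    forms.append(rightmost_step(forms[-1], rhs))

derivation = ["".join(form) for form in forms]
reductions = derivation[::-1]  # reading the derivation backwards
print(derivation)
print(reductions)
```

Reversing the derivation S, aABe, aAde, aAbcde, abbcde gives exactly the reduction sequence listed on the slide.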

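  48. Operator-Precedence Parsing • Grammar for expressions: (figure not captured in the transcript) • Can be rewritten as: (figure not captured in the transcript) • With the precedence relations inserted, id + id * id can be written as: (figure not captured in the transcript)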

  49. LR(k) Parsers • L: left-to-right scanning of the input • R: constructing a rightmost derivation in reverse • k: the number of input symbols of lookahead that are used in making parsing decisions.

  50. LR Parsing
