1 / 52

Chapter 3

Chapter 3. Chang Chi-Chung 2007.4.12. The Role of the Lexical Analyzer. Token. Lexical Analyzer. Parser. Source Program. getNextToken. error. error. Symbol Table. The Reason for Using the Lexical Analyzer. Simplifies the design of the compiler

jase
Télécharger la présentation

Chapter 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 3 Chang Chi-Chung 2007.4.12

  2. The Role of the Lexical Analyzer Token LexicalAnalyzer Parser SourceProgram getNextToken error error Symbol Table

  3. The Reason for Using the Lexical Analyzer • Simplifies the design of the compiler • LL(1) or LR(1) parsing with 1 token lookahead would not be possible (multiple characters/tokens to match) • Compiler efficiency is improved • Systematic techniques to implement lexical analyzers by hand or automatically from specifications • Stream buffering methods to scan input • Compiler portability is enhanced • Input-device-specific peculiarities can be restricted to the lexical analyzer.

  4. Tokens, Patterns, and Lexemes • Token (符號單元) • A pair consisting of a token name and optional arrtibute value. • Example: num, id • Pattern (樣本) • A description of the form for the lexemes of a token. • Example: “non-empty sequence of digits”, “letter followed by letters and digits” • Lexeme (詞) • A sequence of characters that matches the pattern for a token. • Example: 123, abc

  5. Example: Tokens, Patterns, and Lexemes

  6. Input Buffering lexemeBegin forward Sentinels

  7. Strings and Languages • Alphabet • An alphabet is a finite set of symbols (characters) • String • A stringis a finite sequence of symbols from  • s denotes the length of string s •  denotes the empty string, thus  = 0 • Language • A language is a countable set of strings over some fixed alphabet  • Abstract Language Φ • {ε}

  8. String Operations • Concatenation (連接) • The concatenation of two strings x and y is denoted by xy • Identity (單位元素) • The empty string is the identity under concatenation. • s = s = s • Exponentiation • Define s0 = si = si-1s for i > 0 • By Define s1 = s s2 = ss

  9. Language Operations • UnionL M = { ssL or sM } • ConcatenationL M = { xyx L and yM} • ExponentiationL0 = {  } Li = Li-1L • Kleene closure(封閉包)L* = ∪i=0,…,Li • Positive closureL+ = ∪i=1,…,Li

  10. Regular Expressions • Regular Expressions • A convenient means of specifying certain simple sets of strings. • We use regular expressions to define structures of tokens. • Tokens are built from symbols of a finite vocabulary. • Regular Sets • The sets of strings defined by regular expressions.

  11. Regular Expressions • Basis symbols: •  is a regular expression denoting language L() = {} • a   is a regular expression denoting L(a) = {a} • If r and s are regular expressions denoting languages L(r) and M(s) respectively, then • rs is a regular expression denoting L(r)  M(s) • rs is a regular expression denoting L(r)M(s) • r* is a regular expression denoting L(r)* • (r) is a regular expression denoting L(r) • A language defined by a regular expression is called a regular set.

  12. Operator Precedence

  13. Algebraic Laws for Regular Expressions

  14. Regular Definitions • If Σ is an alphabet of basic symbols, then a regular definitions is a sequence of definitions of the form: d1 r1 d2 r2 …dn rn • Each di is a new symbol, not in Σ and not the same as any other of d’s. • Each ri is a regular expression over the alphabet  {d1, d2, …, di-1 } • Any dj in rican be textually substituted inrito obtain an equivalent set of definitions

  15. Example: Regular Definitions Regular Definitions letter_ A | B | … | Z | a | b | … | z | _ digit 0 | 1 | … | 9 id letter_ ( letter_ | digit )* Regular definitions are not recursivedigits  digit digits digit wrong

  16. Extensions of Regular Definitions • One or more instance • r+ = rr* = r*r • r* = r+ | ε • Zero or one instance • r?=r |ε • Character classes • [a-z] = abc…z • [A-Za-z] = A|B|…|Z|a|…|z • Example • digit  [0-9] • num  digit+ (. digit+)? ( E (+-)? digit+ )?

  17. Regular Definitions and Grammars Context-FreeGrammars stmt ifexprthenstmtif exprthenstmtelsestmtexpr  term reloptermtermterm  idnum ws ( blank | tab | newline )+ Regular Definitions digit [0-9] letter [A-Za-z] if if then then else elserelop <<=<>>>== id letter ( letter | digit )* num digit+ (. digit+)? ( E (+ | -)? digit+ )?

  18. Transition Diagrams relop <<=<>>>== start < = 0 1 2 return(relop, LE) > 3 return(relop, NE) other * 4 return(relop, LT) = return(relop, EQ) 5 > = 6 7 return(relop, GE) other * 8 return(relop, GT)

  19. Transition Diagrams id letter ( letter | digit )* letter or digit * other start letter 9 10 11 return (getToken(), installID() )

  20. Finite Automata • Finite Automata are recognizers. • FA simply say “Yes” or “No” about each possible input string. • A FA can be used to recognize the tokens specified by a regular expression • Use FA to design of a Lexical Analyzer Generator • Two kind of the Finite Automata • Nondeterministic finite automata (NFA) • Deterministic finite automata (DFA) • Both DFA and NFA are capable of recognizing the same languages.

  21. NFA Definitions • NFA = { S, , , s0, F } • A finite set of statesS • A set of input symbols Σ • input alphabet, ε is not in Σ • A transition function  •  : S  S • A special start state s0 • A set of final states F, F  S (accepting states)

  22. Transition Graph for FA is a state is a transition is a the start state is a final state

  23. 3 Example a 0 1 2 a b c c • This machine accepts abccabc, but it rejects abcab. • This machine accepts (abc+)+.

  24. a start b b a 0 1 2 3 b Transition Table • The mapping  of an NFA can be represented in a transition table (0, a) = {0,1}(0, b) = {0}(1, b) = {2}(2, b) = {3}

  25. DFA • DFA is a special case of an NFA • There are no moves on input ε • For each state s and input symbol a, there is exactly one edge out of s labeled a. • Both DFA and NFA are capable of recognizing the same languages.

  26. Input An input string x terminated by an end-of-file character eof. A DFA D with start state s0, accepting states F, and transition function move. Output Answer “yes” if D accepts x; “no” otherwise. s = s0 c = nextChar(); while ( c != eof ) { s = move(s, c); c = nextChar(); } if (s is in F ) return “yes”; else return “no”; Simulating a DFA

  27. a start b b a 0 1 2 3 b b a b b 0 1 2 3 a a a S = {0,1,2,3} = {a, b}s0 = 0F = {3} NFA vs DFA (a | b)*abb

  28. The Regular Language • The regularlanguagedefinedby an NFA is the set of input strings it accepts. • Example: (ab)*abb for the example NFA • An NFA accepts an input string x if and only if • there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph • A state transition from one state to another on the path is called a move.

  29. Theorem • The followings are equivalent • Regular Expression • NFA • DFA • Regular Language • Regular Grammar

  30. Convert Concept Regular Expression Minimization Deterministic Finite Automata Nondeterministic Finite Automata Deterministic Finite Automata

  31. N(s)  s | t   N(t) a  s t N(s) N(t)   N(s) s*  Construction of an NFA from a Regular Expression Use Thompson’s Construction ε a 

  32. r11 Example r9 r10 • ( a | b )* a b b r7 r8 b r5 r6 b r4 * a r3 ( ) r3 = r4 r1 | r2 a b

  33. a 2 3     start a b b 0 1 6 7 8 9 10   b 4 5  Example • ( a | b )* a b b

  34. Conversion of an NFA to a DFA • The subset construction algorithm converts an NFA into a DFA using the following operation.

  35. Subset Construction(1) Initially, -closure(s0) is the only state in Dstates and it is unmarked; while (there is an unmarked state T in Dstates){ mark T;for (each input symbol a   ) { U = -closure( move(T, a) );if (U is not in Dstates) add U as an unmarked state to DstatesDtran[T, a] = U } }

  36. Computing ε- closure(T)

  37. a 2 3     start a b b 0 1 6 7 8 9 10   b 4 5  Example • ( a | b )* a b b b C b a b start a b b A B D E a a a

  38. a 1 2  start a b b 3 0 4 5 6  a b  7 8 b Example • a • abb • a*b+ DstatesA = {0,1,3,7}B = {2,4,7}C = {8}D = {7}E = {5,8}F = {6,8} a 0137 247 a b 7 b b b 8 68 58 b b

  39. Minimizing the DFA • Step 1 • Start with an initial partition II with two group: F and S-F (aceepting and nonaccepting) • Step 2 • Split Procedure • Step 3 • If ( IInew = II ) IIfinal = II and continue step 4 else II = IInewand go to step 2 • Step 4 • Construct the minimum-state DFA by IIfinal group. • Delete the dead state

  40. Split Procedure Initially, let IInew = II for ( each group G of II ) { Partition G into subgroup such that two states s and t are in the same subgroup if and only if for all input symbol a, states s and t have transition on a to states in the same group of II. /* at worst, a state will be in a subgroup by itself */ replace G in IInew by the set of all subgroup formed }

  41. Example • initially, two sets {1, 2, 3, 5, 6}, {4, 7}. • {1, 2, 3, 5, 6} splits {1, 2, 5}, {3, 6} on c. • {1, 2, 5} splits {1}, {2, 5} on b.

  42. Minimizing the DFA • Major operation: partition states into equivalent classes according to • final / non-final states • transition functions ( A B C D E ) ( A B C D ) ( E ) ( A B C ) ( D ) ( E ) ( A C ) ( B ) ( D ) ( E )

  43. Important States of an NFA • The “important states” of an NFA are those without an -transition, that is • if move({s}, a) for some athen s is an important state • The subset construction algorithm uses only the important states when it determines-closure ( move(T, a) ) • Augment the regular expression r with a special end symbol # to make accepting states important: the new expression is r#

  44. Converting a RE Directly to a DFA • Construct a syntax tree for (r)# • Traverse the tree to construct functions nullable, firstpos, lastpos, and followpos • Construct DFAD by algorithm 3.62

  45. Function Computed From the Syntax Tree • nullable(n) • The subtree at node n generates languages including the empty string • firstpos(n) • The set of positions that can match the first symbol of a string generated by the subtree at node n • lastpos(n) • The set of positions that can match the last symbol of a string generated be the subtree at node n • followpos(i) • The set of positions that can follow position i in the tree

  46. Rules for Computing the Function

  47. Computing followpos for (eachnode n in the tree) { //n is a cat-node with left child c1 and right child c2 if ( n == c1.c2) for (each i in lastpos(c1) ) followpos(i) = followpos(i)  firstpos(c2);else if (n is a star-node) for ( eachi in lastpos(n) )followpos(i) = followpos(i)  firstpos(n); }

  48. Converting a RE Directly to a DFA Initialize Dstates to contain only the unmarked state firstpos(n0), where n0 is the root of syntax tree T for (r)#; while ( there is an unmarked state Sin Dstates) { mark S; for (each input symbol a  ) {let U be the union of followpos(p) for all p in S that correspond to a; if (U is not in Dstates )add U as an unmarked state to DstatesDtran[S,a] = U; } }

  49. # ○ 6 b ○ 5 n b ○ 4 a * 3 | a b 1 2 Example ( a | b )* a b b # n = ( a | b )* a nullable(n) = false firstpos(n) = { 1, 2, 3 } lastpos(n) = { 3 } followpos(1) = {1, 2, 3 }

  50. Example {1, 2, 3} {6} ( a | b )* a b b # # {6} {6} {1, 2, 3} {5} 6 b {1, 2, 3} {4} {5} {5} nullable 5 b {1, 2, 3} {3} {4} {4} 4 firstpos lastpos a {3} {3} {1, 2} * {1, 2} 3 | {1, 2} {1, 2} a b {1} {1} {2} {2} 1 2

More Related