Introduction to LR Parsing: Simple LR Overview

Chapter 4 Syntax Analysis

Content
• Overview of this chapter
• 4.1 Introduction
• 4.2 Context-Free Grammars
• 4.3 Writing a Grammar
• 4.4 Top-Down Parsing
• 4.5 Bottom-Up Parsing
• 4.6 Introduction to LR Parsing: Simple LR
• 4.7 More Powerful LR Parsers
• 4.8 Using Ambiguous Grammars
• 4.9 Parser Generators

4.6 Introduction to LR Parsing: Simple LR

LR(k) parsing
• "L": left-to-right scanning of the input
• "R": constructing a rightmost derivation in reverse
• "k": the number of input symbols of lookahead
• Only k <= 1 is considered here
In this section, we introduce
• Basic concepts of LR parsing
• "Simple LR" (SLR)

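4.6.1 Why LR Parsers?
1. LR parsers can recognize virtually all programming-language constructs for which context-free grammars can be written
2. LR parsing is the most general nonbacktracking shift-reduce parsing method known
3. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input
4. The class of grammars that can be parsed by LR methods is a proper superset of the class that can be parsed by LL (predictive) methods
Principal drawback
• Constructing an LR parser by hand is too much work, so an LR parser generator such as Yacc is needed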

4.6.2 Items and the LR(0) Automaton
• LR(0) item (item for short): a production of G with a dot at some position of the body, e.g.
(1) Production A -> XYZ yields four items:
A -> .XYZ
A -> X.YZ
A -> XY.Z
A -> XYZ.
(2) Production A -> ε generates only one item: A -> .
• State: each state of the automaton represents a set of items
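The item definition above can be sketched in a few lines of Python. The representation of an item as a (head, body, dot) triple and the names `lr0_items` and `show` are assumptions made for this illustration, not notation from the slides.

```python
def lr0_items(head, body):
    """Return every LR(0) item of the production head -> body.

    An item is a (head, body, dot) triple, where dot is the position of
    the dot inside the body (0 = leftmost, len(body) = rightmost).
    """
    body = tuple(body)
    return [(head, body, dot) for dot in range(len(body) + 1)]


def show(item):
    """Render an item in the usual dotted notation."""
    head, body, dot = item
    symbols = list(body[:dot]) + ["."] + list(body[dot:])
    return f"{head} -> {' '.join(symbols)}"


# A -> XYZ yields four items; an epsilon production (empty body) yields one.
for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))
print(show(lr0_items("A", [])[0]))
```

A production with n body symbols always yields n + 1 items, which is why the empty production yields exactly one.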

• Constructing the canonical LR(0) collection
1. Augmented grammar G': add a new start symbol S' and the production S' -> S
2. Two functions: CLOSURE and GOTO
• Constructing CLOSURE(I)
1. Initially, add every item in I to CLOSURE(I)
2. If A -> α.Bβ is in CLOSURE(I) and B -> γ is a production, then add the item B -> .γ to CLOSURE(I) if it is not already there
3. Apply this rule until no more new items can be added to CLOSURE(I)

e.g. Consider the augmented expression grammar
E' -> E
E -> E + T | T
T -> T * F | F
F -> ( E ) | id
If I = {[E' -> .E]}, then CLOSURE(I) contains:
E' -> .E
E -> .E + T
E -> .T
T -> .T * F
T -> .F
F -> .( E )
F -> .id
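The CLOSURE rule can be checked against this example with a small worklist sketch. The dictionary encoding of the grammar and the (head, body, dot) item triples are assumptions of the sketch; only the rule itself comes from the slides.

```python
# Augmented expression grammar: nonterminal -> list of production bodies.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """LR(0) CLOSURE of a set of (head, body, dot) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        # Rule 2: the dot stands immediately before a nonterminal B.
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)  # B -> .gamma
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result


I0 = closure({("E'", ("E",), 0)})
for head, body, dot in sorted(I0):
    print(head, "->", " ".join(body[:dot] + (".",) + body[dot:]))
```

The worklist guarantees termination because an item is queued only the first time it is added, and the number of items in a grammar is finite.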

• Two classes of items:
1. Kernel items: the initial item S' -> .S, and all items whose dots are not at the left end
2. Nonkernel items: all items with their dots at the left end, except for S' -> .S
• Function GOTO
GOTO(I, X): the closure of the set of all items [A -> αX.β] such that [A -> α.Xβ] is in I
e.g. If I = {[E' -> E.], [E -> E.+T]}, then GOTO(I, +) contains:
E -> E + .T
T -> .T * F
T -> .F
F -> .( E )
F -> .id
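A minimal sketch of GOTO, taking GOTO(I, X) to be the closure of the items obtained by moving the dot past X. The grammar encoding and item triples are assumptions of the sketch, and `closure` is repeated so that the block runs on its own.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """LR(0) CLOSURE of a set of (head, body, dot) items."""
    result, worklist = set(items), list(items)
    while worklist:
        _, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result


def goto(items, symbol):
    """Move the dot over `symbol` wherever it directly follows the dot, then close."""
    moved = {
        (head, body, dot + 1)
        for head, body, dot in items
        if dot < len(body) and body[dot] == symbol
    }
    return closure(moved)


I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
result = goto(I, "+")
```

Only [E -> E.+T] has "+" after the dot, so the kernel of the result is the single item [E -> E+.T]; the other four items come from the closure.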

• LR(0) automaton for the expression grammar (4.1)

• Using the LR(0) automaton to parse id * id

4.6.3 The LR-Parsing Algorithm
• LR parser:
• Structure of the LR parsing table
1. ACTION function: takes a state i and a terminal a (or $, the input endmarker) as arguments. ACTION[i, a] can have one of four forms:

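(a) Shift j, where j is a state
(b) Reduce A -> β
(c) Accept
(d) Error
2. GOTO function: if GOTO[Ii, A] = Ij, then GOTO maps state i and nonterminal A to state j
• LR-parser configurations
A configuration of an LR parser is a pair (stack contents, remaining input):
(s0 s1 ... sm, ai ai+1 ... an $)
This configuration represents the right-sentential form:
X1 X2 ... Xm ai ai+1 ... an
where Xi is the grammar symbol represented by state si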

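• Behavior of the LR parser, given the state sm on top of the stack and the current input symbol ai
1. If ACTION[sm, ai] = shift s: the parser makes a shift move, pushing state s onto the stack and advancing to the next input symbol
2. If ACTION[sm, ai] = reduce A -> β: the parser makes a reduce move, popping r states off the stack and pushing state s, where r is the length of β and s = GOTO[sm-r, A]
3. If ACTION[sm, ai] = accept: parsing is completed
4. If ACTION[sm, ai] = error: the parser has discovered an error and calls an error recovery routine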

• LR-parsing program

• LR-parsing table for the expression grammar (4.1)
• The only difference between one LR parser and another is the information in the ACTION and GOTO fields of the parsing table
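The table-driven behavior described above can be sketched as a short driver. The ACTION and GOTO entries below are transcribed from the standard SLR table for the expression grammar with productions numbered (1) E -> E + T, (2) E -> T, (3) T -> T * F, (4) T -> F, (5) F -> ( E ), (6) F -> id; the string encoding ("s5", "r2", "acc") and the function name `parse` are assumptions of this sketch.

```python
# Production number -> (head, length of body).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# "sN" = shift and go to state N, "rN" = reduce by production N, "acc" = accept.
# Missing entries mean "error".
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def parse(tokens):
    """Run the LR driver; return the production numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # stack of states; state 0 is the initial state
    position = 0
    reductions = []
    while True:
        action = ACTION[stack[-1]].get(tokens[position])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[position]!r} at position {position}")
        if action == "acc":
            return reductions
        if action[0] == "s":                      # shift move
            stack.append(int(action[1:]))
            position += 1
        else:                                     # reduce move
            number = int(action[1:])
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]       # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(number)


print(parse(["id", "*", "id"]))
```

The returned list is a rightmost derivation in reverse: for id * id the driver reduces by F -> id, T -> F, F -> id, T -> T * F, and finally E -> T.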

4.6.4 Constructing SLR-Parsing Tables
• Constructing an SLR-parsing table
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A -> α.aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j"; here a must be a terminal
(b) If [A -> α.] is in Ii, then set ACTION[i, a] to "reduce A -> α" for all a in FOLLOW(A); here A may not be S'
(c) If [S' -> S.] is in Ii, then set ACTION[i, $] to "accept"
If any conflicting actions result from these rules, the grammar is not SLR(1)
3. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j, for all nonterminals A
4. All entries not defined by rules (2) and (3) are made "error"
5. The initial state of the parser is the one constructed from the set of items containing [S' -> .S]
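The construction steps above can be sketched end to end for the expression grammar. This is an illustration under stated assumptions: items are (production index, dot) pairs, the FOLLOW sets are written out by hand rather than computed, and state numbers other than 0 depend on discovery order, so they need not match the textbook's numbering.

```python
GRAMMAR = [
    ("E'", ("E",)),            # 0: augmented start production
    ("E", ("E", "+", "T")),    # 1
    ("E", ("T",)),             # 2
    ("T", ("T", "*", "F")),    # 3
    ("T", ("F",)),             # 4
    ("F", ("(", "E", ")")),    # 5
    ("F", ("id",)),            # 6
]
NONTERMINALS = {head for head, _ in GRAMMAR}
# Assumption: FOLLOW sets worked out by hand for this grammar.
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    worklist.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def build_slr_table():
    states = [closure({(0, 0)})]      # rule 5: state 0 contains [S' -> .S]
    action, goto_table = {}, {}

    def set_action(state, terminal, value):
        # A second, different entry for the same cell is a conflict.
        if action.setdefault((state, terminal), value) != value:
            raise ValueError(f"conflict in state {state} on {terminal!r}: not SLR(1)")

    i = 0
    while i < len(states):
        for prod, dot in states[i]:
            head, body = GRAMMAR[prod]
            if dot < len(body):
                symbol = body[dot]
                target = goto(states[i], symbol)
                if target not in states:
                    states.append(target)
                j = states.index(target)
                if symbol in NONTERMINALS:
                    goto_table[(i, symbol)] = j             # rule 3
                else:
                    set_action(i, symbol, ("shift", j))     # rule 2(a)
            elif prod == 0:
                set_action(i, "$", ("accept",))             # rule 2(c)
            else:
                for terminal in FOLLOW[head]:
                    set_action(i, terminal, ("reduce", prod))  # rule 2(b)
        i += 1
    return states, action, goto_table


states, action, goto_table = build_slr_table()
print(len(states), "states,", len(action), "ACTION entries,", len(goto_table), "GOTO entries")
```

Entries absent from `action` play the role of "error" (rule 4). Because a conflict raises an exception, a successful run also confirms that the grammar is SLR(1).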

• Every SLR(1) grammar is unambiguous, but there are many unambiguous grammars that are not SLR(1) (textbook, Example 4.48)

4.6.5 Viable Prefixes
• The stack contents must be a prefix of a right-sentential form
• Not all prefixes of right-sentential forms can appear on the stack
• Viable prefixes: the prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser. Equivalently, a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form
• SLR parsing is based on the fact that LR(0) automata recognize viable prefixes

• Item A -> β1.β2 is valid for a viable prefix αβ1 if there is a rightmost derivation S' =>* αAw => αβ1β2w
• In general, an item will be valid for many viable prefixes

4.7 More Powerful LR Parsers

In this section, we extend the previous LR parsing techniques to use one symbol of lookahead on the input
• "Canonical-LR" or just "LR" method
• "Lookahead-LR" or "LALR" method

4.7.1 Canonical LR(1) Items
• LR(1) item: [A -> α.β, a], where A -> αβ is a production and a is a terminal or the right endmarker $
Notes:
(1) The lookahead a has no effect in an item of the form [A -> α.β, a] where β is not ε
(2) An item of the form [A -> α., a] calls for a reduction by A -> α only if the next input symbol is a

• LR(1) item [A -> α.β, a] is valid for a viable prefix γ if there is a rightmost derivation S =>* δAw => δαβw, where
1. γ = δα, and
2. Either a is the first symbol of w, or w is ε and a is $

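4.7.2 Constructing LR(1) Sets of Items
• Building the sets of LR(1) items:
1. The method is essentially the same as for the canonical collection of sets of LR(0) items
2. Only the procedures CLOSURE and GOTO need to be modified: in CLOSURE, for each item [A -> α.Bβ, a] and each production B -> γ, add [B -> .γ, b] for every terminal b in FIRST(βa); GOTO moves the dot as before and carries each lookahead along unchanged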

• Example: Consider the following augmented grammar
S' -> S
S -> C C
C -> c C | d
The sets of LR(1) items are:
I0: S' -> .S, $
    S -> .CC, $
    C -> .cC, c/d
    C -> .d, c/d
I1: S' -> S., $
I2: S -> C.C, $
    C -> .cC, $
    C -> .d, $
I3: C -> c.C, c/d
    C -> .cC, c/d
    C -> .d, c/d
I4: C -> d., c/d
I5: S -> CC., $
I6: C -> c.C, $
    C -> .cC, $
    C -> .d, $
I7: C -> d., $
I8: C -> cC., c/d
I9: C -> cC., $
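The ten sets above can be reproduced with a short sketch of the LR(1) versions of CLOSURE and GOTO. Assumptions of the sketch: items are (production index, dot, lookahead) triples, FIRST is computed by a fixed point that is valid only because this grammar has no ε-productions, and the symbol order S, C, c, d is chosen so that the numbering matches I0 to I9.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOL_ORDER = ["S", "C", "c", "d"]   # exploration order, chosen to match I0..I9


def compute_first():
    """FIRST sets of the nonterminals (grammar assumed free of epsilon-productions)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            lead = body[0]
            new = first[lead] if lead in NONTERMINALS else {lead}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


FIRST = compute_first()


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot, lookahead = worklist.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            rest = body[dot + 1:]
            if rest:   # FIRST(beta a) is FIRST of beta's first symbol (no epsilon)
                lookaheads = FIRST[rest[0]] if rest[0] in NONTERMINALS else {rest[0]}
            else:      # beta is empty, so FIRST(beta a) = {a}
                lookaheads = {lookahead}
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot]:
                    for b in lookaheads:
                        if (index, 0, b) not in result:
                            result.add((index, 0, b))
                            worklist.append((index, 0, b))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1, a) for p, d, a in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def canonical_collection():
    states = [closure({(0, 0, "$")})]
    for state in states:               # the list grows while it is being scanned
        for symbol in SYMBOL_ORDER:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states


states = canonical_collection()
print(len(states), "sets of LR(1) items")
```

Sets I3 and I6 differ only in their lookaheads, which is what the LR(1) construction adds over LR(0): the same core appears once per distinct lookahead context.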

4.7.3 Canonical LR(1) Parsing Tables
• Constructing a canonical-LR parsing table
1. Construct C' = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A -> α.aβ, b] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j"; here a must be a terminal
(b) If [A -> α., a] is in Ii and A is not S', then set ACTION[i, a] to "reduce A -> α"
(c) If [S' -> S., $] is in Ii, then set ACTION[i, $] to "accept"
If any conflicting actions result from these rules, the grammar is not LR(1)
3. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j
4. All entries not defined by rules (2) and (3) are made "error"
5. The initial state of the parser is the one constructed from the set of items containing [S' -> .S, $]

S' -> S
S -> C C
C -> c C | d

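4.7.4 Constructing LALR Parsing Tables
• LALR method:
• Merge the sets of LR(1) items that have the same core, that is, the same set of first components
• LALR tables are considerably smaller than canonical LR tables
• Merging has no effect on the GOTO function, because GOTO(I, X) depends only on the core of I
• Merging never introduces a new shift/reduce conflict
• Merging may produce a reduce/reduce conflict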

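• An easy, but space-consuming, LALR table construction
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items
2. For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union
3. Construct the parsing actions from the merged sets in the same way as for the canonical LR table. If there is a parsing action conflict, then the grammar is not LALR(1)
4. Construct the GOTO table: if J is the union of one or more sets of LR(1) items, then GOTO(J, X) is the union of all sets of items having the same core as GOTO(I, X) for any I in J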
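Step 2, merging by core, can be sketched on the ten sets of LR(1) items for the grammar S' -> S, S -> C C, C -> c C | d. The textual item encoding and the helper names `items` and `merge_by_core` are assumptions of this illustration.

```python
def items(lookaheads, *texts):
    """Build LR(1) items (dotted production text, lookahead) for each lookahead."""
    return {(text, a) for text in texts for a in lookaheads}


# The ten canonical LR(1) sets for S' -> S, S -> C C, C -> c C | d.
LR1_SETS = {
    0: items("$", "S' -> . S", "S -> . C C") | items("cd", "C -> . c C", "C -> . d"),
    1: items("$", "S' -> S ."),
    2: items("$", "S -> C . C", "C -> . c C", "C -> . d"),
    3: items("cd", "C -> c . C", "C -> . c C", "C -> . d"),
    4: items("cd", "C -> d ."),
    5: items("$", "S -> C C ."),
    6: items("$", "C -> c . C", "C -> . c C", "C -> . d"),
    7: items("$", "C -> d ."),
    8: items("cd", "C -> c C ."),
    9: items("$", "C -> c C ."),
}


def merge_by_core(lr1_sets):
    """Union all sets sharing a core; return (original state numbers, merged items) pairs."""
    merged = {}   # core -> (member state numbers, union of items)
    for number in sorted(lr1_sets):
        core = frozenset(text for text, _ in lr1_sets[number])
        members, union = merged.setdefault(core, ([], set()))
        members.append(number)
        union |= lr1_sets[number]      # in-place update of the stored set
    return [(tuple(members), union) for members, union in merged.values()]


lalr = merge_by_core(LR1_SETS)
for members, union in lalr:
    print(members, sorted(union))
```

The ten LR(1) sets collapse to seven LALR states: I3 with I6, I4 with I7, and I8 with I9. The merged reduce items carry the lookaheads c, d, and $ together.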

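4.7.5 Efficient Construction of LALR Parsing Tables
• Modifications of Algorithm 4.59 that avoid constructing the full collection of sets of LR(1) items:
1. Represent any set of LR(0) or LR(1) items by its kernel
2. Construct the LALR(1) item kernels from the LR(0) item kernels by a process of propagation and spontaneous generation of lookaheads
3. Generate the LALR(1) parsing table by closing each kernel and then computing the table entries as for canonical LR(1) sets of items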

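4.7.6 Compaction of LR Parsing Tables
• Many rows of the ACTION table are identical, so creating a pointer for each state into a one-dimensional array saves considerable space; pointers for states with the same actions point to the same location
• Alternatively, create a list of the actions of each state. The list consists of (terminal-symbol, action) pairs. The most frequent action for a state can be placed at the end of the list, and in place of a terminal we may use the notation "any", meaning that this action applies if the current input symbol has not been found so far on the list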
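Both compaction ideas can be sketched on an excerpt of the ACTION table for the expression grammar (states 0, 2, 4, 6, and 7). The function names and the string encoding of actions are assumptions of this sketch. One design choice here is also an assumption: the "any" default is restricted to a reduce action or "error", so that no error entry is ever turned into a shift.

```python
from collections import Counter

TERMINALS = ["id", "+", "*", "(", ")", "$"]
# Excerpt of the ACTION table; missing entries mean "error".
ACTION = {
    0: {"id": "s5", "(": "s4"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    4: {"id": "s5", "(": "s4"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
}


def compress_rows(action_table):
    """Share identical rows: return (pointer per state, list of unique rows)."""
    unique_rows, pointer, seen = [], {}, {}
    for state, row in action_table.items():
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen[key] = len(unique_rows)
            unique_rows.append(row)
        pointer[state] = seen[key]
    return pointer, unique_rows


def to_action_list(row, terminals):
    """Encode a row as (terminal, action) pairs ending with an ("any", default) pair."""
    candidates = Counter(
        a for a in (row.get(t, "error") for t in terminals)
        if a == "error" or a.startswith("r")
    )
    default = candidates.most_common(1)[0][0] if candidates else "error"
    pairs = [(t, row[t]) for t in terminals if t in row and row[t] != default]
    return pairs + [("any", default)]


def lookup(action_list, terminal):
    """Scan the list; the final "any" entry matches every remaining terminal."""
    for symbol, action in action_list:
        if symbol == terminal or symbol == "any":
            return action


pointer, rows = compress_rows(ACTION)
print(pointer)
print(to_action_list(ACTION[2], TERMINALS))
```

In state 2 the error entries for id and ( are replaced by the default reduce. This delays error detection by some reduce moves, but the error is still caught before the next shift.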

Homework 4
• 4.6.5
• 4.6.6
• 4.7.1

The end of Lecture06
