LR(1) & LALR PARSER


INTRODUCTION The LALR (Look-Ahead LR) parsing technique lies between SLR and canonical LR, both in the class of grammars it can parse and in ease of implementation. It is often used in practice because its tables are considerably smaller than canonical LR tables, yet most common syntactic constructs of programming languages can be expressed conveniently by an LALR grammar. The same is almost true of SLR grammars, but a few constructs cannot be handled by SLR techniques.

CONSTRUCTING LALR PARSING TABLES The core of a set of LR(1) items is the set of LR(0) items obtained by dropping the lookaheads. The canonical LR(1) collection for a grammar may contain two or more sets of items with the same core. Example: Let S1 and S2 be two states of a canonical LR(1) parser. S1 = {C -> c.C, c/d; C -> .cC, c/d; C -> .d, c/d} S2 = {C -> c.C, $; C -> .cC, $; C -> .d, $} These two states have the same core, which consists of the dotted productions alone, without any lookahead information.
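The comparison of S1 and S2 can be checked mechanically. The sketch below is only an illustration: the tuple encoding of an item as (lhs, rhs, dot, lookahead) and the helper names `core` and `items` are assumptions, not part of the slides.

```python
# An LR(1) item is modelled as (lhs, rhs, dot, lookahead); this encoding is an assumption.

def core(state):
    """Return the LR(0) core of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def items(dotted_productions, lookaheads):
    """Expand each dotted production over a set of lookahead symbols."""
    return {(lhs, rhs, dot, la)
            for lhs, rhs, dot in dotted_productions
            for la in lookaheads}

dotted = [("C", ("c", "C"), 1), ("C", ("c", "C"), 0), ("C", ("d",), 0)]
s1 = items(dotted, {"c", "d"})   # S1 from the example
s2 = items(dotted, {"$"})        # S2 from the example

print(core(s1) == core(s2))  # True: same core
print(s1 == s2)              # False: different lookaheads
```

Two states are candidates for merging exactly when `core` returns the same value for both.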

CONSTRUCTION IDEA: • Construct the collection of sets of LR(1) items. • Merge the sets that have a common core into one set. • For an LR(1) grammar, merging cannot introduce a shift-reduce conflict, because shift actions depend only on the core; it can only introduce a reduce-reduce conflict. • If a conflict arises, the grammar is not LALR. • The parsing table is constructed from the collection of merged sets of items using the same algorithm as for LR(1) parsing.

ALGORITHM: • Input: An augmented grammar G’. • Output: The LALR parsing table functions action and goto for G’.

Method: • Construct C = {I0, I1, I2, …, In}, the collection of sets of LR(1) items. • For each core present among the sets of LR(1) items, find all sets having that core and replace them by their union. • Construct the parsing action table from the merged sets as for canonical LR. • Construct the goto table as follows. If J is the union of one or more sets of LR(1) items, that is, J = I1 ∪ I2 ∪ … ∪ Ik, then the cores of goto(I1, X), goto(I2, X), …, goto(Ik, X) are the same, since I1, I2, …, Ik all have the same core. Let K be the union of all sets of items having the same core as goto(I1, X). Then goto(J, X) = K.
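The merging step and the goto rule of the method can be sketched as follows. This is a minimal illustration under assumptions: items are (lhs, rhs, dot, lookahead) tuples, states are referred to by list index, and the function name `merge_by_core` is hypothetical. The sample data is a fragment of the canonical collection for the grammar S -> CC, C -> cC | d, with I3, I6, I8, I9 stored at indices 0 to 3.

```python
from collections import defaultdict

def core(state):
    """Drop the lookaheads of a set of LR(1) items."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def merge_by_core(states, goto):
    """Merge states with equal cores.

    states: list of sets of LR(1) items
    goto:   dict mapping (state_index, symbol) to state_index
    Returns (merged_states, merged_goto, old_to_new).
    """
    groups = defaultdict(list)
    for i, state in enumerate(states):
        groups[core(state)].append(i)

    merged_states, old_to_new = [], {}
    for members in groups.values():          # insertion order is preserved
        new_index = len(merged_states)
        merged_states.append(frozenset().union(*(states[i] for i in members)))
        for i in members:
            old_to_new[i] = new_index

    # goto(J, X) = K: every member of J leads to a state with the same core,
    # so remapping each old transition gives a consistent result.
    merged_goto = {(old_to_new[i], sym): old_to_new[j]
                   for (i, sym), j in goto.items()}
    return merged_states, merged_goto, old_to_new

def items(dotted, lookaheads):
    return frozenset((l, r, d, a) for l, r, d in dotted for a in lookaheads)

shift_c = [("C", ("c", "C"), 1), ("C", ("c", "C"), 0), ("C", ("d",), 0)]
done_cC = [("C", ("c", "C"), 2)]
states = [items(shift_c, "cd"), items(shift_c, "$"),   # I3, I6
          items(done_cC, "cd"), items(done_cC, "$")]   # I8, I9
goto = {(0, "C"): 2, (1, "C"): 3, (0, "c"): 0, (1, "c"): 1}

merged_states, merged_goto, old_to_new = merge_by_core(states, goto)
print(len(merged_states))  # 2
print(merged_goto)         # {(0, 'C'): 1, (0, 'c'): 0}
```

The action table would then be built from `merged_states` exactly as for canonical LR; that step is not shown here.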

EXAMPLE GRAMMAR: • S’ -> S • S -> CC • C -> cC • C -> d

SET OF ITEMS:
I0: S’ -> .S, $ ; S -> .CC, $ ; C -> .cC, c/d ; C -> .d, c/d
I1: S’ -> S., $
I2: S -> C.C, $ ; C -> .cC, $ ; C -> .d, $
I3: C -> c.C, c/d ; C -> .cC, c/d ; C -> .d, c/d
I4: C -> d., c/d
I5: S -> CC., $
I6: C -> c.C, $ ; C -> .cC, $ ; C -> .d, $
I7: C -> d., $
I8: C -> cC., c/d
I9: C -> cC., $
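The ten sets above can be reproduced with a small closure/goto computation. The sketch below is not taken from the slides: the data layout and function names are assumptions, and the FIRST computation relies on the fact that this grammar has no empty productions. The state numbering it produces may differ from the slide numbering.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def compute_first():
    """FIRST sets by fixpoint; valid only because no production is empty."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = compute_first()

def closure(items):
    """LR(1) closure of a set of (lhs, rhs, dot, lookahead) items."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:]
            if rest:
                head = rest[0]
                lookaheads = FIRST[head] if head in NONTERMINALS else {head}
            else:
                lookaheads = {la}
            for p_lhs, p_rhs in GRAMMAR:
                if p_lhs == rhs[dot]:
                    for b in lookaheads:
                        item = (p_lhs, p_rhs, 0, b)
                        if item not in items:
                            items.add(item)
                            work.append(item)
    return frozenset(items)

def goto(state, symbol):
    """Move the dot over `symbol` and close the result."""
    moved = {(l, r, d + 1, a) for l, r, d, a in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)

def canonical_collection():
    start = closure({("S'", ("S",), 0, "$")})
    states, index, transitions = [start], {start: 0}, {}
    i = 0
    while i < len(states):
        symbols = {r[d] for _, r, d, _ in states[i] if d < len(r)}
        for sym in sorted(symbols):
            target = goto(states[i], sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, sym)] = index[target]
        i += 1
    return states, transitions

states, transitions = canonical_collection()
cores = {frozenset((l, r, d) for l, r, d, _ in s) for s in states}
print(len(states))  # 10 canonical LR(1) states
print(len(cores))   # 7 distinct cores
```

The gap between the two counts is what LALR merging removes: three pairs of states share a core.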

The goto graph:

CANONICAL PARSING TABLE

LALR PARSER Merge the sets with common cores: • I3 and I6 • I4 and I7 • I8 and I9

LALR PARSING TABLE

SHIFT-REDUCE CONFLICT COMPARISON OF LR(1) AND LALR: • If LR(1) has a shift-reduce conflict, then LALR will also have it. • If LR(1) does not have a shift-reduce conflict, LALR will not have one either. • Any shift-reduce conflict that is removed by LR(1) is also removed by LALR. • When no two LR(1) sets share a core, LR(1) and LALR produce the same parsing tables.

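SHIFT-REDUCE CONFLICT How does a shift-reduce conflict arise? Without item lookaheads, a state IA = {E -> α.βc ; F -> γ.} has a shift-reduce conflict when β is in FOLLOW(F). For example, I3 = {E -> L.=R ; R -> L.} with = in FOLLOW(R). LR(1) items carry a lookahead, so the reduction is made only on the lookaheads attached to the completed item. Merging two such states in the LALR parser: • IA = {E -> α.βc, ω1 ; F -> γ., δ1} • IB = {E -> α.βc, ω2 ; F -> γ., δ2} • IAB = {E -> α.βc, ω1|ω2 ; F -> γ., δ1|δ2} A shift-reduce conflict in IAB would require β to be in δ1 or δ2, and then IA or IB would already have had the conflict.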

SHIFT-REDUCE CONFLICT COMPARISON OF SLR AND LALR: • If SLR has a shift-reduce conflict, LALR may or may not remove it. • SLR and LALR tables for a grammar always have the same number of states. Hence LALR parsing is well suited to parsing general programming languages. The table is much smaller than the canonical LR(1) table, and by carefully designing the grammar it can be made free of conflicts. For example, for a language like Pascal an LALR table will have a few hundred states, while a canonical LR table will have thousands. So it is more convenient to use LALR parsing.

REDUCE-REDUCE CONFLICT Reduce-reduce conflicts may still arise when states are merged. Consider the following states: • IA = {A -> γ1., δ1 ; B -> γ2., δ2} If δ1 and δ2 are disjoint lookahead sets, IA has no reduce-reduce conflict. • IB = {A -> γ1., δ3 ; B -> γ2., δ4} Likewise, IB has no conflict if δ3 and δ4 are disjoint. • IAB = {A -> γ1., δ1|δ3 ; B -> γ2., δ2|δ4} A conflict arises in the merged state when δ1 and δ4, or δ2 and δ3, share a lookahead.

Example of R-R Conflict • S'-> S • S -> aAd • S -> bBd • S -> aBe • S -> bAe • A -> c • B -> c

Generating the LR(1) items for the above grammar
I0: S’ -> .S, $ ; S -> .aAd, $ ; S -> .bBd, $ ; S -> .aBe, $ ; S -> .bAe, $
I1: S’ -> S., $
I2: S -> a.Ad, $ ; S -> a.Be, $ ; A -> .c, d ; B -> .c, e
I3: S -> b.Bd, $ ; S -> b.Ae, $ ; A -> .c, e ; B -> .c, d
I4: S -> aA.d, $
I5: S -> aB.e, $
I6: A -> c., d ; B -> c., e
I7: S -> bB.d, $
I8: S -> bA.e, $
I9: B -> c., d ; A -> c., e
I10: S -> aAd., $
I11: S -> aBe., $
I12: S -> bBd., $
I13: S -> bAe., $
The items of interest are those in I6 and I9, which have the same core. When we make the parsing table for LR(1), we will get something like this…
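The effect of merging I6 and I9 can be checked directly. The following sketch assumes items encoded as (lhs, rhs, dot, lookahead) tuples; the function name `reduce_reduce_conflicts` is hypothetical.

```python
from collections import defaultdict

def reduce_reduce_conflicts(state):
    """Map each lookahead to the completed productions competing on it.

    Only lookaheads with two or more distinct reductions are returned.
    """
    by_lookahead = defaultdict(set)
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):                  # dot at the end: a reduce item
            by_lookahead[la].add((lhs, rhs))
    return {la: prods for la, prods in by_lookahead.items() if len(prods) > 1}

I6 = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
I9 = {("B", ("c",), 1, "d"), ("A", ("c",), 1, "e")}
merged = I6 | I9                             # same core, so LALR merges them

print(reduce_reduce_conflicts(I6))              # {}
print(reduce_reduce_conflicts(I9))              # {}
print(sorted(reduce_reduce_conflicts(merged)))  # ['d', 'e']
```

Each state is conflict-free on its own, but in the merged state both A -> c and B -> c are reducible on d and on e.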

The LR (1) Parsing Table

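The LALR Parsing Table The LR(1) parsing table reduced to the LALR parsing table. Merging I6 and I9 gives I69 = {A -> c., d/e ; B -> c., d/e}, which calls for both reductions on d and on e.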

Conclusion • So we find that the LALR table gains a reduce-reduce conflict that the corresponding LR(1) table does not have. This shows that LALR is less powerful than LR(1). • However, we have already seen that LALR has no shift-reduce conflicts when the corresponding LR(1) parser has none, whereas SLR (or LR(0)) is not necessarily free of shift-reduce conflicts. So LALR is more powerful than SLR.

Conclusion Shift-reduce conflicts present in SLR -> (some of them are resolved in) LR(1) -> (all those resolved stay resolved in) LALR. So we have answered all the questions about LALR that we raised intuitively.

Shift-Reduce Parsers • Reviewing some terminology: • Phrase • Simple phrase • Handle of a sentential form [Figure: parse tree of a sentential form, with a simple phrase and the handle marked.]

Shift-reduce parser • A parse stack • Initially empty, contains symbols already parsed • Elements in the stack are terminal or nonterminal symbols • The parse stack concatenated with the remaining input always represents a right sentential form • Tokens are shifted onto the stack until the top of the stack contains the handle of the sentential form

Shift-reduce parser • Two questions • Have we reached the end of a handle, and how long is the handle? • Which nonterminal does the handle reduce to? • We use tables to answer the questions • ACTION table • GOTO table

Shift-reduce parser • LR parsers are driven by two tables: • Action table, which specifies the action to take • Shift, reduce, accept or error • Goto table, which specifies state transitions • We push states, rather than symbols, onto the stack • Each state represents the possible subtrees of the parse tree
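A table-driven LR driver loop can be sketched as below. The ACTION and GOTO tables are worked out by hand for the grammar S -> CC, C -> cC | d from the LALR example, with merged states named 36, 47 and 89; they are not recovered from the slides. The production numbering (1: S -> CC, 2: C -> cC, 3: C -> d) and the function name `parse` are assumptions.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

ACTION = {
    (0, "c"): ("shift", 36), (0, "d"): ("shift", 47),
    (1, "$"): ("accept", None),
    (2, "c"): ("shift", 36), (2, "d"): ("shift", 47),
    (36, "c"): ("shift", 36), (36, "d"): ("shift", 47),
    (47, "c"): ("reduce", 3), (47, "d"): ("reduce", 3), (47, "$"): ("reduce", 3),
    (5, "$"): ("reduce", 1),
    (89, "c"): ("reduce", 2), (89, "d"): ("reduce", 2), (89, "$"): ("reduce", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (36, "C"): 89}

def parse(tokens):
    """Return the list of reductions performed, or None on a syntax error."""
    stack = [0]                        # a stack of states, not of symbols
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:              # blank table entry means error
            return None
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]        # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                          # accept
            return reductions

print(parse("cdd"))  # [3, 2, 3, 1]
print(parse("cd"))   # None
```

The reduction sequence for "cdd" is a rightmost derivation in reverse: C -> d, C -> cC, C -> d, S -> CC.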

Shift-reduce parser [Figure: parse tree with root <program> over begin <stmts> end $, in which the nested <stmts> nodes (SimpleStmt ; <stmts>) are labeled with the reductions R2, R2 and R4.]

LR Parsers • LR(1): • left-to-right scanning • rightmost derivation in reverse • 1-token lookahead • LR parsers are deterministic • no backup or retry parsing actions • LR(k) parsers • decide the next action by examining the tokens already shifted and at most k lookahead tokens • are the most powerful of the deterministic bottom-up parsers with at most k lookahead tokens.

LR(0) Parsing
• A production has the form A → X1X2…Xj
• By adding a dot, we get a configuration (or an item):
A → • X1X2…Xj
A → X1X2…Xi • Xi+1…Xj
A → X1X2…Xj •
• The • indicates how much of the RHS has been shifted onto the stack.

LR(0) Parsing
• An item with the • at the end of the RHS, A → X1X2…Xj •, indicates that the RHS has been recognized and should be reduced to the LHS
• An item with the • at the beginning of the RHS, A → • X1X2…Xj, predicts that the RHS will be shifted onto the stack

LR(0) Parsing
• An LR(0) state is a set of configurations
• This means that the actual state of an LR(0) parser is denoted by one of the items.
• The closure0 operation: if there is a configuration B → δ • A ρ in the set, then add all configurations of the form A → • γ to the set.
• The initial configuration set: s0 = closure0({S → • α $})
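The closure0 operation can be sketched with a worklist. The example below reuses the grammar S' -> S, S -> CC, C -> cC | d from the LALR example earlier in this transcript, without the endmarker; the tuple encoding (lhs, rhs, dot) is an assumption.

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure0(configs):
    """Add A -> . gamma for every nonterminal A that appears right after a dot."""
    result = set(configs)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for p_lhs, p_rhs in GRAMMAR:
                if p_lhs == rhs[dot]:
                    new = (p_lhs, p_rhs, 0)
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)

s0 = closure0({("S'", ("S",), 0)})
for lhs, rhs, dot in sorted(s0):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
print(len(s0))  # 4
```

The worklist guarantees termination because each configuration is added at most once.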

LR(0) Parsing

LR(0) Parsing • Given a configuration set s, we can compute its successor, s', under a symbol X • Denoted go_to0(s, X) = s'

LR(0) Parsing • Characteristic finite state machine (CFSM) • It is a finite automaton, p.148, para. 2. • Identifying configuration sets and successor operation with CFSM states and transitions

LR(0) Parsing • For example, given grammar G2: S' → S $ ; S → ID | λ

LR(0) Parsing • CFSM is the goto table of LR(0) parsers.

LR(0) Parsing
• Because LR(0) uses no lookahead, we must extract the action function directly from the configuration sets of the CFSM
• Let Q = {Shift, Reduce1, Reduce2, …, Reducen}, where there are n productions in the CFG
• Let S0 be the set of CFSM states
• P : S0 → 2^Q
• P(s) = {Reducei | B → ρ • ∈ s and production i is B → ρ} ∪ (if A → α • a β ∈ s for some a ∈ Vt then {Shift} else ∅)

LR(0) Parsing
• G is LR(0) if and only if ∀ s ∈ S0, |P(s)| = 1
• If G is LR(0), the action table is trivially extracted from P
• P(s) = {Shift} ⇒ action[s] = Shift
• P(s) = {Reducej}, where production j is the augmenting production ⇒ action[s] = Accept
• P(s) = {Reducei}, i ≠ j ⇒ action[s] = Reducei
• P(s) = ∅ ⇒ action[s] = Error

Consider G1 • S → E $ • E → E + T | T • T → ID | ( E ) CFSM for G1:
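Since the CFSM figure is not reproduced in this transcript, the machine can be rebuilt from closure0 and go_to0. The sketch below assumes G1 reads S → E $, E → E + T | T, T → ID | ( E ) and treats $ as an ordinary shiftable terminal; the encoding and function names are assumptions, and the state numbering need not match the original figure.

```python
G1 = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in G1}

def closure0(configs):
    result = set(configs)
    work = list(result)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for p_lhs, p_rhs in G1:
                new = (p_lhs, p_rhs, 0)
                if p_lhs == rhs[dot] and new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def go_to0(state, symbol):
    """Successor of a configuration set under a grammar symbol."""
    moved = {(l, r, d + 1) for l, r, d in state
             if d < len(r) and r[d] == symbol}
    return closure0(moved)

def build_cfsm():
    start = closure0({G1[0] + (0,)})
    states, index, transitions = [start], {start: 0}, {}
    i = 0
    while i < len(states):
        symbols = {r[d] for _, r, d in states[i] if d < len(r)}
        for sym in sorted(symbols):
            target = go_to0(states[i], sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, sym)] = index[target]   # one entry of the goto table
        i += 1
    return states, transitions

cfsm_states, cfsm_transitions = build_cfsm()
print(len(cfsm_states))       # 10
print(len(cfsm_transitions))  # 15
```

The `cfsm_transitions` dictionary plays the role of the LR(0) goto table.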

LR(0) Parsing • Any state s ∈ S0 for which |P(s)| > 1 is said to be inadequate • Two kinds of parser conflicts create inadequacies in configuration sets • Shift-reduce conflicts • Reduce-reduce conflicts

LR(0) Parsing
• It is easy to introduce inadequacies in CFSM states
• Hence, few real grammars are LR(0). For example:
• Consider λ-productions
• The only possible configuration involving a λ-production is of the form A → λ •
• However, if A can generate any terminal string other than λ, then a shift action must also be possible (First(A))
• An LR(0) parser will have problems in handling operator precedence properly
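The function P and the inadequacy caused by a λ-production can be illustrated on a single state. The sketch assumes that G2 reads S' → S $, S → ID | λ, that productions are numbered from 1 in the order listed, and that a λ-production is encoded as an empty right-hand side; all of these are assumptions made for the example.

```python
G2 = [
    ("S'", ("S", "$")),   # production 1 (augmenting production)
    ("S", ("ID",)),       # production 2
    ("S", ()),            # production 3: S -> lambda, encoded as an empty tuple
]
TERMINALS = {"ID", "$"}

def P(state):
    """Parser-action set of a configuration set of (lhs, rhs, dot) triples."""
    actions = set()
    for lhs, rhs, dot in state:
        if dot == len(rhs):                       # B -> rho .
            actions.add(("Reduce", G2.index((lhs, rhs)) + 1))
        elif rhs[dot] in TERMINALS:               # A -> alpha . a beta
            actions.add(("Shift",))
    return actions

def is_inadequate(state):
    return len(P(state)) > 1

# Start state of the CFSM for G2: closure0 of S' -> . S $
start = {("S'", ("S", "$"), 0), ("S", ("ID",), 0), ("S", (), 0)}
after_id = {("S", ("ID",), 1)}

print(sorted(P(start)))      # [('Reduce', 3), ('Shift',)]
print(is_inadequate(start))  # True
print(P(after_id))           # {('Reduce', 2)}
```

The start state asks both to shift ID and to reduce by S → λ, so under these assumptions G2 is not LR(0).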

LR(1) Parsing
• An LR(1) configuration, or item, is of the form A → X1X2…Xi • Xi+1…Xj, l where l ∈ Vt ∪ {λ}
• The lookahead component l represents a possible lookahead after the entire right-hand side has been matched
• The λ appears as lookahead only for the augmenting production, because there is no lookahead after the endmarker

LR(1) Parsing
• We use the following notation to represent the set of LR(1) configurations that share the same dotted production:
A → X1X2…Xi • Xi+1…Xj, {l1…lm} = {A → X1X2…Xi • Xi+1…Xj, l1} ∪ {A → X1X2…Xi • Xi+1…Xj, l2} ∪ … ∪ {A → X1X2…Xi • Xi+1…Xj, lm}
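In code, this notation corresponds to grouping single-lookahead items by their dotted production. A minimal sketch, assuming items are (lhs, rhs, dot, lookahead) tuples and using the hypothetical helper name `group_lookaheads`:

```python
from collections import defaultdict

def group_lookaheads(items):
    """Collect the lookaheads of items that share the same dotted production."""
    grouped = defaultdict(set)
    for lhs, rhs, dot, lookahead in items:
        grouped[(lhs, rhs, dot)].add(lookahead)
    return dict(grouped)

single_items = {
    ("C", ("d",), 0, "c"),
    ("C", ("d",), 0, "d"),
    ("S", ("C", "C"), 0, "$"),
}
compact = group_lookaheads(single_items)
print(sorted(compact[("C", ("d",), 0)]))  # ['c', 'd']
print(len(compact))                       # 2
```

Storing one lookahead set per dotted production, rather than one item per lookahead, is one simple way to reduce the storage cost of LR(1) configuration sets.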

LR(1) Parsing • There are many more distinct LR(1) configurations than LR(0) configurations. • In fact, the major difficulty with LR(1) parsers is not their power but rather finding storage-efficient ways to represent them.
