Introduction to Compilers

Introduction to Compilers, Chapter 8: LR Parsing

Contents
I. LR Parsers
II. The Canonical Collection of LR(0) Items
III. Construction of LR Parsing Tables
  III.1 SLR Method
  III.2 CLR Method
  III.3 LALR Method
IV. Deterministic Parsing of Ambiguous Grammars
V. Compaction of LR Parsing Tables
VI. Implementation of an LR Parser

I. LR Parsers
• LR parsers are efficient bottom-up parsers for a large and useful class of context-free grammars.
• The "L" stands for a left-to-right scan of the input; the "R" stands for constructing a rightmost derivation in reverse.
• Why LR parsers are attractive :
  (1) LR parsers can be constructed for most programming languages.
  (2) The LR parsing method is more general than the LL parsing method.
  (3) LR parsers detect syntactic errors as early as possible in a left-to-right scan.
• However, it is too much work to implement an LR parser by hand for a typical programming-language grammar.
  =====> Parser Generator

• Parser Generating Systems (PGS)
  Input : a grammar in BNF notation
  Output : a parsing table
  [Figure: Grammar (BNF notation) --> PGS --> Parsing Table, which is consulted by the Driver Routine]
* The driver routine is the same for all LR parsers; only the parsing table changes from one parser to another.

• Techniques for producing LR parsing tables
  • Simple LR (SLR) : LR(0) items and FOLLOW
  • Canonical LR (CLR) : LR(1) items
  • Lookahead LR (LALR) : LR(1) items, or LR(0) items plus lookaheads
  [Figure: nested grammar classes, SLR inside LALR inside CLR]

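• LR parser
  [Figure: the Driver Routine reads the input a1 ... ai ... an $, consults the Parsing Table, and maintains a stack whose top state is Sm]
  • stack : S0 X1 S1 X2 ... Xm Sm, where each Si is a state and each Xi ∈ V.
  • input : a1 ... ai ... an $
• Configuration of an LR parser : (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $)
  The first component is the stack contents; the second is the unscanned input.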

• Parsing Table (ACTION table + GOTO table)
  Rows are indexed by states. The ACTION table has one column per terminal; the GOTO table has one column per nonterminal.
• The LR parsing algorithm ::= same as the shift-reduce parsing algorithm.
• Four actions : 1. shift 2. reduce 3. accept 4. error

1. ACTION[Sm, ai] = shift S ::=
   (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $) ⇒ (S0 X1 S1 ... Xm Sm ai S, ai+1 ... an $)
2. ACTION[Sm, ai] = reduce A → α and |α| = r ::=
   (S0 X1 S1 ... Xm Sm, ai ai+1 ... an $) ⇒ (S0 X1 S1 ... Xm-r Sm-r, ai ai+1 ... an $),
   then, with GOTO(Sm-r, A) = S, ⇒ (S0 X1 S1 ... Xm-r Sm-r A S, ai ai+1 ... an $)
3. ACTION[Sm, ai] = accept : parsing is completed.
4. ACTION[Sm, ai] = error : the parser has discovered an error and calls an error recovery routine.

ex) G : 1. LIST → LIST , ELEMENT
        2. LIST → ELEMENT
        3. ELEMENT → a
Parsing Table :
  state |  a     ,     $   | LIST  ELEMENT
  ------+------------------+---------------
    0   |  s3              |  1      2
    1   |        s4    acc |
    2   |        r2    r2  |
    3   |        r3    r3  |
    4   |  s3              |         5
    5   |        r1    r1  |
where sj means shift and stack state j, ri means reduce by production number i, acc means accept, and blank means error.

Input : ω = a,a
  STACK                      INPUT   ACTION
  0                          a,a$    s3           (initial configuration)
  0 a 3                      ,a$     r3, GOTO 2
  0 ELEMENT 2                ,a$     r2, GOTO 1
  0 LIST 1                   ,a$     s4
  0 LIST 1 , 4               a$      s3
  0 LIST 1 , 4 a 3           $       r3, GOTO 5
  0 LIST 1 , 4 ELEMENT 5     $       r1, GOTO 1
  0 LIST 1                   $       accept
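The driver routine for this example can be sketched in a few lines of Python. This is only an illustrative sketch: the names PRODUCTIONS, ACTION, GOTO and lr_parse are assumptions made here, and the tables are copied by hand from the LIST example rather than generated.

```python
# Productions of G, numbered as in the example: (left-hand side, length of right side).
PRODUCTIONS = {1: ("LIST", 3), 2: ("LIST", 1), 3: ("ELEMENT", 1)}

# ACTION table: (state, terminal) -> action; a missing entry means "error".
ACTION = {
    (0, "a"): "s3",
    (1, ","): "s4", (1, "$"): "acc",
    (2, ","): "r2", (2, "$"): "r2",
    (3, ","): "r3", (3, "$"): "r3",
    (4, "a"): "s3",
    (5, ","): "r1", (5, "$"): "r1",
}

# GOTO table: (state, nonterminal) -> state.
GOTO = {(0, "LIST"): 1, (0, "ELEMENT"): 2, (4, "ELEMENT"): 5}


def lr_parse(tokens):
    """Return the list of actions taken; raise SyntaxError on an error entry."""
    tokens = list(tokens) + ["$"]
    stack = [0]                      # S0 X1 S1 ... Xm Sm, bottom first
    pos = 0
    trace = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        action = ACTION.get((state, lookahead))
        if action is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        trace.append(action)
        if action == "acc":
            return trace
        number = int(action[1:])
        if action[0] == "s":         # shift: push the symbol and the new state
            stack += [lookahead, number]
            pos += 1
        else:                        # reduce: pop 2*r entries, then push A and GOTO
            lhs, rhs_len = PRODUCTIONS[number]
            del stack[len(stack) - 2 * rhs_len:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]
```

Calling lr_parse(["a", ",", "a"]) follows the same sequence of actions as the trace for ω = a,a.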

II. The Canonical Collection of LR(0) Items
• Methods for constructing an LR parsing table from a grammar : ① SLR ② LALR ③ CLR. Each builds on LR(0) items.
• Definition : an LR(0) item
  • a production with a dot at some position of the right side.
  ex) For A → XYZ ∈ P, the items are [A → .XYZ], [A → X.YZ], [A → XY.Z], [A → XYZ.].
  mark symbol ::= the symbol after the dot, if it exists.
  kernel item ::= [A → α.β] if α ≠ ε or A = S'.
  closure item ::= [A → .α] : the result of performing the CLOSURE operation.

• [A → α.β] means that
  • an input string derivable from α has just been seen, and
  • if an input string derivable from β is seen next, we may be able to reduce by the production A → αβ.
• Definition : Augmented Grammar
  G = (VN, VT, P, S) ⇒ G' = (VN ∪ {S'}, VT, P ∪ {S' → S}, S'), where S' is a new start symbol not in VN.
• The purpose of this new starting production is to indicate to the parser when it should stop parsing and announce acceptance of the input. That is, acceptance occurs when and only when the parser is about to reduce by S' → S.

• If S ⇒*rm αAω ⇒rm αβ1β2ω, then αβ1 is a viable prefix.
  "A viable prefix is a prefix of a right sentential form that does not continue past the right end of the handle of that sentential form."
• We say that the item [A → β1.β2] is valid for the viable prefix αβ1 if there is a derivation S ⇒*rm αAω ⇒rm αβ1β2ω.
  "In general, an item will be valid for many viable prefixes."
• Canonical collection of LR(0) items ::= the set of valid items for each viable prefix that can appear on the stack of an LR parser.
• Computation : the CLOSURE and GOTO functions

The CLOSURE operation
• Definition : CLOSURE(I) = I ∪ {[B → .γ] | [A → α.Bβ] ∈ CLOSURE(I), B → γ ∈ P}
• Meaning : [A → α.Bβ] in CLOSURE(I) indicates that, at some point in the parsing process, we next expect to see a substring derivable from Bβ as input. If B → γ is a production, we would also expect to see a substring derivable from γ at this point. For this reason, we also include [B → .γ] in CLOSURE(I).

Computing Algorithm :
Algorithm CLOSURE(I);
begin
  CLOSURE := I;
  repeat
    if [A → α.Bβ] ∈ CLOSURE and B → γ ∈ P then
      if [B → .γ] ∉ CLOSURE then
        CLOSURE := CLOSURE ∪ {[B → .γ]}
      fi
    fi
  until no change
end.

ex) G : E' → E
        E → E + T | T
        T → T * F | F
        F → (E) | id
  • CLOSURE({[E' → .E]}) = {[E' → .E], [E → .E+T], [E → .T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}.
  • CLOSURE({[E → E.+T]}) = {[E → E.+T]}.
ex) G : S → AS | b
        A → SA | a
  • CLOSURE({[S → A.S]}) = {[S → A.S], [S → .AS], [S → .b], [A → .SA], [A → .a]}.
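The CLOSURE computation can be sketched in Python as a worklist loop. The representation is an assumption made for illustration: an item is a tuple (lhs, rhs, dot), and GRAMMAR and closure are names introduced here, not part of the slides.

```python
# Expression grammar from the example; keys are nonterminals.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items, grammar):
    """LR(0) closure of a set of items (lhs, rhs, dot), rhs being a tuple of symbols."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        # Only items whose mark symbol is a nonterminal B contribute new items.
        if dot < len(rhs) and rhs[dot] in grammar:
            for gamma in grammar[rhs[dot]]:
                new_item = (rhs[dot], gamma, 0)      # [B -> .gamma]
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)
```

For the start item ("E'", ("E",), 0) this yields the seven items listed in the first example, and for ("E", ("E", "+", "T"), 1) it returns only that item, since the mark symbol + is a terminal.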

The GOTO operation
• Definition : GOTO(I, X) = CLOSURE({[A → αX.β] | [A → α.Xβ] ∈ I}).
• Meaning : If I is the set of items that are valid for some viable prefix γ, then GOTO(I, X) is the set of items that are valid for the viable prefix γX.
  ex) I = {[E' → E.], [E → E.+T]}
      GOTO(I, +) = CLOSURE({[E → E+.T]}) = {[E → E+.T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}
• Canonical Collection : C0 = {CLOSURE({[S' → .S]})} ∪ {GOTO(I, X) | I ∈ C0, X ∈ V}
• The following algorithm constructs C0, the canonical collection of sets of LR(0) items for an augmented grammar.

Construction algorithm of C0.
Algorithm Canonical_Collection;
begin
  C0 := { CLOSURE({[S' → .S]}) };
  repeat
    for I ∈ C0 do
      Closure := CLOSURE(I);
      for each X ∈ MARK SYMBOL of Closure do
        J := GOTO(I, X);
        if J = Ji for some Ji ∈ C0 then GOTO[I, X] := Ji
        else GOTO[I, X] := J; C0 := C0 ∪ {J}
        fi
      end for
    end for
  until no change
end.

ex) G : LIST → LIST , ELEMENT
        LIST → ELEMENT
        ELEMENT → a
  Augmented Grammar G' : ACCEPT → LIST
                         LIST → LIST , ELEMENT
                         LIST → ELEMENT
                         ELEMENT → a

C0 :
• I0 : CLOSURE({[ACCEPT → .LIST]}) = {[ACCEPT → .LIST], [LIST → .LIST,ELEMENT], [LIST → .ELEMENT], [ELEMENT → .a]}.
• GOTO(I0, LIST) = I1 = {[ACCEPT → LIST.], [LIST → LIST.,ELEMENT]}.
• GOTO(I0, ELEMENT) = I2 = {[LIST → ELEMENT.]}.
• GOTO(I0, a) = I3 = {[ELEMENT → a.]}.
• GOTO(I1, ,) = I4 = {[LIST → LIST,.ELEMENT], [ELEMENT → .a]}.
• GOTO(I4, ELEMENT) = I5 = {[LIST → LIST,ELEMENT.]}.
• GOTO(I4, a) = I3.
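The construction of C0 for the LIST grammar can be sketched as follows. This is a minimal sketch under assumed names (GRAMMAR, closure, goto, canonical_collection) and an assumed item representation (lhs, rhs, dot). States are numbered in order of discovery, so the indices need not coincide with I1 ... I5 above, although the same six item sets result.

```python
GRAMMAR = {
    "ACCEPT": [("LIST",)],
    "LIST": [("LIST", ",", "ELEMENT"), ("ELEMENT",)],
    "ELEMENT": [("a",)],
}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def canonical_collection(start_item):
    states = [closure({start_item})]      # states[0] plays the role of I0
    transitions = {}                      # (state index, symbol) -> state index
    for i, state in enumerate(states):    # the list grows while it is scanned
        marks = sorted({rhs[dot] for _, rhs, dot in state if dot < len(rhs)})
        for symbol in marks:
            target = goto(state, symbol)
            if target not in states:      # a new set J is added to C0
                states.append(target)
            transitions[(i, symbol)] = states.index(target)
    return states, transitions
```

Running canonical_collection(("ACCEPT", ("LIST",), 0)) produces six states and six transitions, and the two transitions on a lead to the same state, as in GOTO(I0, a) = GOTO(I4, a) = I3.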

GOTO graph ::= a directed graph in which the nodes are labeled by the sets of items and the edges by grammar symbols.
Ex) [Figure: GOTO graph of the LIST grammar]
    I0 --LIST--> I1, I0 --ELEMENT--> I2, I0 --a--> I3, I1 --,--> I4, I4 --ELEMENT--> I5, I4 --a--> I3

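ex) C0 for the grammar P' → P, P → bD;Se, D → d;D | d, S → s;S | s :
  I0 : [P' → .P], [P → .bD;Se]
  I1 : [P' → P.]
  I2 : [P → b.D;Se], [D → .d;D], [D → .d]
  I3 : [P → bD.;Se]
  I4 : [D → d.;D], [D → d.]
  I5 : [P → bD;.Se], [S → .s;S], [S → .s]
  I6 : [D → d;.D], [D → .d;D], [D → .d]
  I7 : [P → bD;S.e]
  I8 : [S → s.;S], [S → s.]
  The remaining states (I9 to I12) hold [D → d;D.], {[S → s;.S], [S → .s;S], [S → .s]}, [P → bD;Se.], and [S → s;S.].
  [Figure: GOTO graph over these states, with edges labeled P, b, D, d, ;, s, S, e]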

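III. Construction of LR Parsing Tables
• Three methods
  • SLR (Simple LR) : C0 and FOLLOW
  • CLR (Canonical LR) : C1
  • LALR (Lookahead LR) : C1, or C0 plus lookaheads
• Parsing Table
  Rows are the states 0, 1, 2, 3, ... The ACTION table has one column for each symbol in VT ∪ {$} and holds shift, reduce, accept, or error entries; the GOTO table has one column for each symbol in VN and holds GOTO entries.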

• State i is constructed from Ii, where Ii ∈ C0 (Ii ∈ C1 for the CLR method).
• The size of a parsing table depends on the number of states, and |C0| << |C1|.
  SLR : |V| × |C0|
  CLR : |V| × |C1|
  LALR : |V| × |C0|

III.1 Constructing an SLR parsing table
::= the method of constructing the SLR parsing table from C0.
• Constructing Algorithm : C0 = {I0, I1, I2, ..., In}
  1. ACTION[i, a] := "shift j" if [A → α.aβ] ∈ Ii and GOTO(Ii, a) = Ij.
  2. ACTION[i, a] := "reduce A → α" for all a ∈ FOLLOW(A), if [A → α.] ∈ Ii and A ≠ S'.
  3. ACTION[i, $] := "accept" if [S' → S.] ∈ Ii.
  4. GOTO[i, A] := j if GOTO(Ii, A) = Ij.
  5. All undefined entries are "error"; the initial state is the state i such that [S' → .S] ∈ Ii.
• For a reduce item, FOLLOW is used to decide on which input symbols to reduce.

ex) G : 0. A → L       (A : ACCEPT, L : LIST, E : ELEMENT)
        1. L → L , E
        2. L → E
        3. E → a
  • FOLLOW(A) = { $ }
  • FOLLOW(L) = { , , $ }
  • FOLLOW(E) = { , , $ }
  C0 :
  I0 : [A → .L], [L → .L,E], [L → .E], [E → .a]
  I1 : [A → L.], [L → L.,E]
  I2 : [L → E.]
  I3 : [E → a.]
  I4 : [L → L,.E], [E → .a]
  I5 : [L → L,E.]
  GOTO graph : I0 --L--> I1, I0 --E--> I2, I0 --a--> I3, I1 --,--> I4, I4 --E--> I5, I4 --a--> I3

• Parsing Table :
          Action Table       GOTO Table
  state |  a     ,     $   |  L    E
  ------+------------------+---------
   I0   |  s3              |  1    2
   I1   |        s4    acc |
   I2   |        r2    r2  |
   I3   |        r3    r3  |
   I4   |  s3              |       5
   I5   |        r1    r1  |
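The five SLR rules can be applied mechanically to this example. The sketch below is illustrative only: C0, its GOTO edges, and the FOLLOW sets are copied by hand from the slides, and the names STATES, EDGES and build_slr_table are assumptions introduced here. An item is written as (production number, dot position).

```python
PRODUCTIONS = [("A", ("L",)), ("L", ("L", ",", "E")), ("L", ("E",)), ("E", ("a",))]
FOLLOW = {"A": {"$"}, "L": {",", "$"}, "E": {",", "$"}}
NONTERMINALS = {"A", "L", "E"}

STATES = [
    {(0, 0), (1, 0), (2, 0), (3, 0)},   # I0
    {(0, 1), (1, 1)},                   # I1
    {(2, 1)},                           # I2
    {(3, 1)},                           # I3
    {(1, 2), (3, 0)},                   # I4
    {(1, 3)},                           # I5
]
# Edges of the GOTO graph: (state, symbol) -> state.
EDGES = {(0, "L"): 1, (0, "E"): 2, (0, "a"): 3, (1, ","): 4, (4, "E"): 5, (4, "a"): 3}


def build_slr_table():
    action, goto, conflicts = {}, {}, []

    def set_action(state, terminal, value):
        old = action.setdefault((state, terminal), value)
        if old != value:                      # two different actions for one entry
            conflicts.append((state, terminal, old, value))

    for (state, symbol), target in EDGES.items():
        if symbol in NONTERMINALS:
            goto[(state, symbol)] = target            # rule 4
        else:
            set_action(state, symbol, f"s{target}")   # rule 1
    for state, items in enumerate(STATES):
        for prod, dot in items:
            lhs, rhs = PRODUCTIONS[prod]
            if dot == len(rhs):                       # reduce item [A -> alpha.]
                if prod == 0:
                    set_action(state, "$", "acc")     # rule 3
                else:
                    for terminal in FOLLOW[lhs]:
                        set_action(state, terminal, f"r{prod}")   # rule 2
    return action, goto, conflicts                    # rule 5: missing entries are errors
```

For this grammar the conflict list stays empty and the resulting entries agree with the parsing table above; a grammar that is not SLR(1) would leave entries in conflicts.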

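ex) G : 1. S → L = R
        2. S → R
        3. L → * R
        4. L → id
        5. R → L
• C0 :
  I0 : [S' → .S], [S → .L=R], [S → .R], [L → .*R], [L → .id], [R → .L]
  I1 : [S' → S.]
  I2 : [S → L.=R], [R → L.]
  I3 : [S → R.]
  I4 : [L → *.R], [R → .L], [L → .*R], [L → .id]
  I5 : [L → id.]
  I6 : [S → L=.R], [R → .L], [L → .*R], [L → .id]
  I7 : [L → *R.]
  I8 : [R → L.]
  I9 : [S → L=R.]
  GOTO graph : I0 --S--> I1, I0 --L--> I2, I0 --R--> I3, I0 --*--> I4, I0 --id--> I5, I2 --=--> I6,
               I4 --R--> I7, I4 --L--> I8, I4 --*--> I4, I4 --id--> I5,
               I6 --R--> I9, I6 --L--> I8, I6 --*--> I4, I6 --id--> I5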

Consider I2 :
  • ACTION[2, =] := "shift 6"
  • ACTION[2, =] := "reduce R → L" (∵ = ∈ FOLLOW(R))
  ⇒ shift/reduce conflict, so G is not SLR(1).
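That = belongs to FOLLOW(R) can be checked with a small fixed-point computation. The sketch below is written only for grammars without ε-productions, such as this one; the function names first_sets and follow_sets are assumptions made for illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first_sets(grammar):
    """FIRST of every nonterminal; sufficient here because no production derives ε."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue                      # terminals have no FOLLOW set
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]             # symbol ends the right side
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow
```

follow_sets(GRAMMAR, "S'") gives FOLLOW(R) = FOLLOW(L) = { =, $ }: the production L → *R passes FOLLOW(L), which contains =, on to R, and this is what makes the reduce entry collide with "shift 6" in state 2.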

III.2 Constructing CLR Parsing Tables
• In the SLR method, if [A → α.] ∈ Ii, then M[i, a] := reduce A → α for all a ∈ FOLLOW(A). But in some situations a cannot follow A in state i, even though a ∈ FOLLOW(A). The reduction by A → α would then be invalid on a in that state.
• To solve this problem, each item must carry more information, which allows us to rule out some of these invalid reductions by A → α. This information is called the lookahead of the item; it is a state-dependent FOLLOW symbol.

LR(1) item ::= LR(0) item + lookahead information
• form : [A → α.β, a], where A → αβ ∈ P and a ∈ VT ∪ {$}.
  1. The "1" in LR(1) is the length of the lookahead.
  2. A → α.β is called the core.
  3. a is called the lookahead of the item.
• The lookahead has no effect in an item of the form [A → α.β, a] where β ≠ ε, but an item of the form [A → α., a] calls for a reduction by A → α only if the next input symbol is a.
• The method for constructing the collection of sets of valid LR(1) items is essentially the same as the C0 construction, except for the CLOSURE operation.

CLOSURE operation for LR(1) items :
  CLOSURE(I) = I ∪ {[B → .γ, b] | [A → α.Bβ, a] ∈ CLOSURE(I), B → γ ∈ P, b ∈ FIRST(βa)}.
ex) G : S' → S
        S → CC
        C → cC
        C → d
  CLOSURE({[S' → .S, $]}) = {[S' → .S, $], [S → .CC, $], [C → .cC, c/d], [C → .d, c/d]}.
• We use the notation [C → .cC, c/d] as a shorthand for the two items [C → .cC, c] and [C → .cC, d].
• CLOSURE({[A → α.Bβ, a]}) = {[A → α.Bβ, a]} ∪ {[B → .γ, b] | B → γ ∈ P, b ∈ FIRST(βa)}.
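The LR(1) CLOSURE can be sketched in the same worklist style as the LR(0) version; the only new step is the lookahead b ∈ FIRST(βa). In this sketch the FIRST sets are written out by hand, the grammar is assumed to have no ε-productions, and the names closure_lr1 and first_of_string are introduced here for illustration.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
# FIRST sets of the nonterminals, written out by hand for this grammar.
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}


def first_of_string(symbols):
    """FIRST of a nonempty symbol string, assuming no ε-productions."""
    head = symbols[0]
    return FIRST[head] if head in GRAMMAR else {head}


def closure_lr1(items):
    """LR(1) closure; an item is (lhs, rhs, dot, lookahead)."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot, lookahead = worklist.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                                  # no nonterminal after the dot
        b_symbol = rhs[dot]
        beta_a = rhs[dot + 1:] + (lookahead,)         # the string "beta a"
        for b in first_of_string(beta_a):
            for gamma in GRAMMAR[b_symbol]:
                item = (b_symbol, gamma, 0, b)        # [B -> .gamma, b]
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)
```

Starting from ("S'", ("S",), 0, "$") the result contains six items, which is the set shown above once [C → .cC, c/d] and [C → .d, c/d] are expanded into their two items each.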

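ex) C1 for the grammar S' → S, S → CC, C → cC | d :
  I0 : [S' → .S, $], [S → .CC, $], [C → .cC, c/d], [C → .d, c/d]
  I1 : [S' → S., $]
  I2 : [S → C.C, $], [C → .cC, $], [C → .d, $]
  I3 : [C → c.C, c/d], [C → .cC, c/d], [C → .d, c/d]
  I4 : [C → d., c/d]
  I5 : [S → CC., $]
  I6 : [C → c.C, $], [C → .cC, $], [C → .d, $]
  I7 : [C → d., $]
  I8 : [C → cC., c/d]
  I9 : [C → cC., $]
  GOTO graph : I0 --S--> I1, I0 --C--> I2, I0 --c--> I3, I0 --d--> I4, I2 --C--> I5, I2 --c--> I6, I2 --d--> I7,
               I3 --C--> I8, I3 --c--> I3, I3 --d--> I4, I6 --C--> I9, I6 --c--> I6, I6 --d--> I7
• I6 differs from I3 only in the second components (the lookaheads).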

Construction of CLR parsing table ::= same as the SLR method except that ACTION[i, a] := reduce A → α if [A → α., a] ∈ Ii.
ex) G : S → L = R | R
        L → * R | id
        R → L
  Augmented grammar G' : 0) S' → S
                         1) S → L = R
                         2) S → R
                         3) L → * R
                         4) L → id
                         5) R → L
• C1 (I X = J abbreviates GOTO(I, X) = J) :
  • I0 : [S' → .S, $], [S → .L=R, $], [S → .R, $], [L → .*R, =/$], [L → .id, =/$], [R → .L, $]
    The L items carry the lookahead = from [S → .L=R, $] and the lookahead $ from [R → .L, $].
  • I0 S = I1 : [S' → S., $]
  • I0 L = I2 : [S → L.=R, $], [R → L., $]
  • I0 R = I3 : [S → R., $]
  • I0 * = I4 : [L → *.R, =/$], [R → .L, =/$], [L → .*R, =/$], [L → .id, =/$]
  • I0 id = I5 : [L → id., =/$]

I2= = I6 : [S  L=.R,$] • [R .L,$] • [L .R,$] • [L .id,$] • I6 R = I9 : [S  L=R.,$] • I6 L = I10 : [R  L.,$] • I6 = I11 : [L  .R,$] • [R .L,$] • [L .R,$] • [L .id,$] • I6 id= I12 : [L  id.,$] • I4 R = I7 : [L  *R.,=] • I4 L = I8 : [R  L.,=] • I4 = I4 • I4 id = I5 • I11 R = I13 : [L  R.,$] • I11 L = I10 • I11 = I11 • I11 id = I12

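• Parsing Table :
          Action Table               Goto Table
  state |  $     =     *     id   |  S    L    R
  ------+-------------------------+--------------
    0   |              s4    s5   |  1    2    3
    1   |  acc                    |
    2   |  r5    s6               |
    3   |  r2                     |
    4   |              s4    s5   |       8    7
    5   |  r4    r4               |
    6   |              s11   s12  |       10   9
    7   |  r3    r3               |
    8   |  r5    r5               |
    9   |  r1                     |
   10   |  r5                     |
   11   |              s11   s12  |       10   13
   12   |  r4                     |
   13   |  r3                     |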

LR(1) Parsing of * id = id :
     ( S0, * id = id $ )                 s4
  ⇒ ( S0 * S4, id = id $ )              s5
  ⇒ ( S0 * S4 id S5, = id $ )           r4, GOTO 8
  ⇒ ( S0 * S4 L S8, = id $ )            r5, GOTO 7
  ⇒ ( S0 * S4 R S7, = id $ )            r3, GOTO 2
  ⇒ ( S0 L S2, = id $ )                 s6
  ⇒ ( S0 L S2 = S6, id $ )              ...

III.3 Constructing LALR Parsing Tables
• Two methods
  • from C1, by merging states
  • from C0, by computing lookaheads
• The C1 method
  • LR(1) item : [A → α.β, a], where A → α.β is the core and a is the lookahead.
  • The general idea of the algorithm is to construct C1 and, if no conflicts arise, merge sets with common cores.
  • In general, a core is a set of LR(0) items for the grammar at hand. Thus the SLR and LALR tables for a grammar always have the same number of states.

ex) I3 + I6 ⇒ I36 : {[C → c.C, c/d/$], [C → .cC, c/d/$], [C → .d, c/d/$]}.
    I4 + I7 ⇒ I47 : {[C → d., c/d/$]}.
    I8 + I9 ⇒ I89 : {[C → cC., c/d/$]}.
• Parsing table (1 : S → CC, 2 : C → cC, 3 : C → d) :
          Action Table        Goto Table
  state |  c     d     $    |  S    C
  ------+-------------------+----------
    0   |  s36   s47        |  1    2
    1   |              acc  |
    2   |  s36   s47        |       5
   36   |  s36   s47        |       89
   47   |  r3    r3    r3   |
    5   |              r1   |
   89   |  r2    r2    r2   |
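Merging states with common cores amounts to grouping LR(1) item sets by their cores and taking the union of each group. The sketch below covers only the three pairs of states from the example; the item encoding (lhs, rhs, dot, lookahead) and the names C1_STATES, core and merge_by_core are assumptions made for illustration.

```python
from collections import defaultdict

# The six states of C1 that have a partner with the same core.
C1_STATES = {
    "I3": {("C", "cC", 1, "c"), ("C", "cC", 1, "d"), ("C", "cC", 0, "c"),
           ("C", "cC", 0, "d"), ("C", "d", 0, "c"), ("C", "d", 0, "d")},
    "I6": {("C", "cC", 1, "$"), ("C", "cC", 0, "$"), ("C", "d", 0, "$")},
    "I4": {("C", "d", 1, "c"), ("C", "d", 1, "d")},
    "I7": {("C", "d", 1, "$")},
    "I8": {("C", "cC", 2, "c"), ("C", "cC", 2, "d")},
    "I9": {("C", "cC", 2, "$")},
}


def core(state):
    """Drop the lookaheads: the core is a set of LR(0) items."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    groups = defaultdict(lambda: ([], set()))
    for name, items in states.items():
        names, merged = groups[core(items)]
        names.append(name)
        merged |= items              # union of the items, hence of the lookaheads
    return {"+".join(names): merged for names, merged in groups.values()}
```

merge_by_core(C1_STATES) returns three merged states, and the one built from I4 and I7 holds [C → d.] with the lookaheads c, d and $, matching I47 above.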

• The merging of states with common cores can never produce a shift-reduce conflict that was not present in one of the original states, because shift actions depend only on the core, not on the lookahead. It is possible, however, that a merger will produce a reduce-reduce conflict.
• shift/reduce conflict : the parser cannot decide whether to shift or to reduce.
  reduce/reduce conflict : the parser cannot decide which of several reductions to make.

• The C0 method : complex, but needs less time and space.
• The C1 method : simple, but time- and space-consuming.
• References :
  1. Korenjak, A.J. [1969]. "A Practical Method for Constructing LR(k) Processors," CACM 12:11, pp. 613-623.
  2. DeRemer, F.L. [1969]. Practical Translators for LR(k) Languages, Ph.D. dissertation, MIT.
  3. DeRemer, F.L. and T.J. Pennello [1982]. "Efficient Computation of LALR(1) Look-Ahead Sets," ACM TOPLAS 4:4, pp. 615-649.
• The C0 method : C0 plus lookaheads

Efficient Computation of Lookahead Sets
• Definition : LA(p, [A → α.]) = {a | a ∈ FIRST(ω), S' ⇒* δAω, δα accesses p},
  where "δα accesses p" means that, starting from the start state, scanning the string δα results in a sequence of state transitions, the last of which is state p.
• Computing formula :
  LA(p, [A → α.]) = ∪ { FIRST(β2) ⊕ LA(q, [B → β1.Aβ2]) | q ∈ PRED(p, α), [B → β1.Aβ2] ∈ q },
  where X ⊕ Y = X if ε ∉ X, and (X − {ε}) ∪ Y otherwise.
• PRED(p, α) = {q | GOTO(q, α) = p}, where GOTO is applied symbol by symbol along α.

Computing Lookahead Sets by Recursive Calls
function LALR(p : state; I : item) : set of VT;
  assume I = [A → α.β];
  LALR := {};
  if A <> S' then
    for q ∈ PRED(p, α) do
      for [B → β1.Aβ2] ∈ q do
        LALR := LALR ∪ FIRST(β2);
        if ε ∈ FIRST(β2) and MAP(q, [B → β1.Aβ2]) then
          LALR := LALR ∪ LALR(q, [B → β1.Aβ2])
        fi
      end for
    end for
  fi
end function
• Lookahead of the augmented rule : LA(I0, [S' → .S]) = {$}.

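ex) (Textbook p. 295, Example 13) C0 for S' → S, S → L=R | R, L → *R | id, R → L :
  I0 : [S' → .S], [S → .L=R], [S → .R], [R → .L], [L → .*R], [L → .id]
  I1 : [S' → S.]
  I2 : [S → L.=R], [R → L.]
  I3 : [S → R.]
  I4 : [L → *.R], [R → .L], [L → .*R], [L → .id]
  I5 : [L → id.]
  I6 : [S → L=.R], [R → .L], [L → .*R], [L → .id]
  I7 : [L → *R.]
  I8 : [R → L.]
  I9 : [S → L=R.]
  [Figure: GOTO graph over these states, with edges labeled S, L, R, =, *, id]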

• LA(I2, [R → L.]) = (FIRST(ε) − {ε}) ∪ LA(I0, [S → .R])
                   = LA(I0, [S → .R])
                   = (FIRST(ε) − {ε}) ∪ LA(I0, [S' → .S])
                   = {$}
  Since = ∉ LA(I2, [R → L.]), the shift/reduce conflict of the SLR method in state 2 does not arise.
• LA(I5, [L → id.])
• Construction of LALR parsing tables
  • same as the SLR method except that ACTION[p, a] := reduce A → α for all a ∈ LA(p, [A → α.]).

IV. Deterministic Parsing of Ambiguous Grammars
• Reference : Aho, A.V., Johnson, S.C., and Ullman, J.D. "Deterministic Parsing of Ambiguous Grammars," Comm. ACM 18:8, pp. 441-452.
• No ambiguous grammar is LR, so an ambiguous grammar always gives rise to conflicts, either shift-reduce or reduce-reduce. But some ambiguous grammars are quite useful in the specification of languages, and they can also make the parser faster.

• shift-reduce conflict : the parser cannot decide whether to shift or to reduce.
• reduce-reduce conflict : the parser cannot decide which of several reductions to make.
⇒ These conflicts can be resolved using precedence and associativity information.
  • Precedence : if the lookahead operator has higher precedence, shift; if it has lower precedence, reduce.
  • Associativity (equal precedence) : left-associative ⇒ reduce; right-associative ⇒ shift.

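Ex) G : E → E + E | E * E | (E) | id
• C0 :
  I0 : [E' → .E], [E → .E+E], [E → .E*E], [E → .(E)], [E → .id]
  I1 : [E' → E.], [E → E.+E], [E → E.*E]
  I2 : [E → (.E)], [E → .E+E], [E → .E*E], [E → .(E)], [E → .id]
  I3 : [E → id.]
  I4 : [E → E+.E], [E → .E+E], [E → .E*E], [E → .(E)], [E → .id]
  I5 : [E → E*.E], [E → .E+E], [E → .E*E], [E → .(E)], [E → .id]
  I6 : [E → (E.)], [E → E.+E], [E → E.*E]
  I7 : [E → E+E.], [E → E.+E], [E → E.*E]
  I8 : [E → E*E.], [E → E.+E], [E → E.*E]
  I9 : [E → (E).]
  [Figure: GOTO graph over these states, with edges labeled E, +, *, (, ), id]
• Conflicting entries in states I7 and I8 (1 : E → E+E, 2 : E → E*E) :
  state |  id    +       *       (     )     $   |  E
  ------+----------------------------------------+-----
   I7   |        r1,s4   r1,s5         r1    r1  |
   I8   |        r2,s4   r2,s5         r2    r2  |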
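The precedence and associativity rules can be applied to the four conflicting entries of I7 and I8. The sketch below assumes the usual convention that * binds tighter than + and that both operators are left-associative; the tables PRECEDENCE and ASSOCIATIVITY and the function resolve are names introduced here for illustration.

```python
# Assumed operator table: a larger number binds tighter; both are left-associative.
PRECEDENCE = {"+": 1, "*": 2}
ASSOCIATIVITY = {"+": "left", "*": "left"}


def resolve(rule_operator, lookahead):
    """Choose between reducing by E -> E <rule_operator> E and shifting `lookahead`."""
    if PRECEDENCE[lookahead] > PRECEDENCE[rule_operator]:
        return "shift"               # the lookahead operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[rule_operator]:
        return "reduce"              # the completed handle binds tighter
    # Equal precedence: associativity decides.
    return "reduce" if ASSOCIATIVITY[rule_operator] == "left" else "shift"


# State I7 holds [E -> E+E.] and state I8 holds [E -> E*E.].
RESOLVED = {(state, a): resolve(op, a)
            for state, op in (("I7", "+"), ("I8", "*"))
            for a in ("+", "*")}
```

Under these assumptions the only entry that becomes a shift is (I7, *), that is s5; the other three entries keep their reduce action (r1 in I7 on +, r2 in I8 on + and *).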

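ex) The "Dangling-else" Ambiguity
  G : S' → S
      S → iSeS | iS | a
• C0 :
  I0 : [S' → .S], [S → .iSeS], [S → .iS], [S → .a]
  I1 : [S' → S.]
  I2 : [S → i.SeS], [S → i.S], [S → .iSeS], [S → .iS], [S → .a]
  I3 : [S → a.]
  I4 : [S → iS.eS], [S → iS.]
  I5 : [S → iSe.S], [S → .iSeS], [S → .iS], [S → .a]
  I6 : [S → iSeS.]
  GOTO graph : I0 --S--> I1, I0 --i--> I2, I0 --a--> I3, I2 --S--> I4, I2 --i--> I2, I2 --a--> I3,
               I4 --e--> I5, I5 --S--> I6, I5 --i--> I2, I5 --a--> I3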
