Introduction to Compilers, Chapter 7: LL Parsing
I. Deterministic Parsing
▶ Deterministic top-down parsing ::= deterministic selection of the production rules to be applied in top-down syntax analysis.
▶ One pass, no backup
   1. The input string is scanned once from left to right.
   2. The parsing process is deterministic.
▶ Top-down parsing with no backup ::= deterministic top-down parsing, called LL parsing: "Left-to-right scanning and Left parse".
▶ How to decide which production is to be applied:
   sentential form : ω1 ω2 … ωi-1 X α
   input string    : ω1 ω2 … ωi-1 ωi ωi+1 … ωn
   When X → α1 | α2 | ... | αk ∈ P, the X-production to apply must be determined uniquely by looking at ωi.
⇒ The condition for no backtracking (the LL condition) requires FIRST and FOLLOW.
FIRST
▶ Computation of FIRST(X), where X ∈ V.
   1) if X ∈ VT, then FIRST(X) = {X}
   2) if X ∈ VN and X → aα ∈ P, then FIRST(X) = FIRST(X) ∪ {a}
      if X → ε ∈ P, then FIRST(X) = FIRST(X) ∪ {ε}
   3) if X → Y1Y2…Yk ∈ P and Y1Y2…Yi-1 ⇒* ε, then FIRST(X) = FIRST(X) ∪ ( ∪ (j = 1..i) (FIRST(Yj) − {ε}) )
      if Y1Y2…Yk ⇒* ε, then FIRST(X) = FIRST(X) ∪ {ε}
▶ FIRST(α) ::= the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
   FIRST(A) ::= { a ∈ VT ∪ {ε} | A ⇒* aβ, β ∈ V* }.
Text p.230
ex1) E  → TE'
     E' → +TE' | ε
     T  → FT'
     T' → *FT' | ε
     F  → (E) | id
     FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
     FIRST(E') = {+, ε}
     FIRST(T') = {*, ε}
ex2) PROGRAM → begin d semi X end
     X → d semi X
     X → s Y
     Y → semi s Y | ε
     FIRST(PROGRAM) = {begin}
     FIRST(X) = {d, s}
     FIRST(Y) = {semi, ε}
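The FIRST rules can be read as a fixed-point iteration: keep applying them until no set grows. The Python sketch below is one possible way to do that. It assumes a grammar is stored as a dictionary from each nonterminal to its list of right-hand sides, with an empty list standing for ε; the names `first_sets` and `EXPR` are illustrative and do not come from the text.

```python
EPS = "ε"  # marker for the empty string


def first_sets(grammar):
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    # Rule 1: the FIRST set of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break
                # Rules 2 and 3: ε is added when the whole right-hand side can vanish.
                if all_nullable:
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first


# Grammar of ex1 (assumed encoding: [] is the ε alternative).
EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

FIRST = first_sets(EXPR)
for nt in EXPR:
    print(nt, sorted(FIRST[nt]))
```

The loop order does not matter for the result, because every rule only ever adds elements to a set.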
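▶ Left-dependency graph
   - The vertices are the terminal and nonterminal symbols, and there is an arc from X to Y if and only if X → X1...XnYα, where n ≥ 0 and each of X1,...,Xn can produce the empty string.
   - FIRST(X) is then the set of terminals reachable from X in the graph, together with ε if X is nullable.
ex) S → AB
    A → aA | ε
    B → bB | ε
    Arcs: S to A, S to B (because A is nullable), A to a, B to b.
    FIRST(S) = {a, b, ε}
    FIRST(A) = {a, ε}
    FIRST(B) = {b, ε}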
★ In general, for A → A1A2...An:
   if A1 is non-nullable : arc from A to A1
   if A1 is nullable     : arcs from A to A1 and A2
   if A1A2 is nullable   : arcs from A to A1, A2 and A3
FOLLOW
▶ FOLLOW(A) ::= the set of terminals that can appear immediately to the right of A in some sentential form. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A). $ is the input right marker.
   ::= { a ∈ VT ∪ {$} | S ⇒* αAaβ, α, β ∈ V* }.
▶ Computation of FOLLOW(A)
   1) Initially FOLLOW(S) = {$}
   2) if A → αBβ ∈ P and β ≠ ε, then FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) − {ε})
   3) if A → αB ∈ P, or A → αBβ ∈ P and β ⇒* ε, then FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A).
Text p.233
ex) E  → TE'
    E' → +TE' | ε
    T  → FT'
    T' → *FT' | ε
    F  → (E) | id
    Nullable = { E', T' }
    FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
    FIRST(E') = {+, ε}
    FIRST(T') = {*, ε}
    FOLLOW(E) = {), $}
    FOLLOW(E') = {), $}
    FOLLOW(T) = {+, ), $}
    FOLLOW(T') = {+, ), $}
    FOLLOW(F) = {*, +, ), $}
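The three FOLLOW rules can be iterated in the same way as the FIRST rules. The following Python sketch is one possible reading of them, not the textbook's code; the dictionary encoding of the grammar and the helper names `first_of`, `first_sets` and `follow_sets` are assumptions made for this illustration.

```python
EPS, END = "ε", "$"


def first_of(symbols, first, grammar):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}  # every symbol was nullable (or the string is empty)


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    beta_first = first_of(rhs[i + 1:], first, grammar)
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:     # rule 3: β is empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


EXPR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FOLLOW = follow_sets(EXPR, "E")
for nt in EXPR:
    print(nt, sorted(FOLLOW[nt]))
```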
▶ LL condition ::= no-backup condition ::= the condition for deterministic parsing in the top-down method.
   input          : ω1ω2 ... ωi-1ωi ... ωn
   derived string : ω1ω2 ... ωi-1Xα
   X → α1 | α2 | ... | αm
   By looking at ωi, the rule used to expand X is selected deterministically from the X-productions.
★ <LL condition> For A → α | β ∈ P,
   1. FIRST(α) ∩ FIRST(β) = ∅
   2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅
ex) A → aBc | Bc | dAa
    B → bB | ε
    FIRST(A) = {a, b, c, d}    FOLLOW(A) = {$, a}
    FIRST(B) = {b, ε}          FOLLOW(B) = {c}
    1) For A → aBc | Bc | dAa: FIRST(aBc) = {a}, FIRST(Bc) = {b, c} and FIRST(dAa) = {d} are pairwise disjoint.
    2) For B → bB | ε: FIRST(bB) ∩ FOLLOW(B) = {b} ∩ {c} = ∅.
    By 1) and 2), the grammar satisfies the LL condition.
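The check carried out in this example can be written as a small pairwise test. The Python sketch below takes the FIRST and FOLLOW sets as given data (copied from the example rather than computed); the function name `ll_condition_holds` and the string keys used for the alternatives are assumptions of this illustration.

```python
from itertools import combinations

EPS = "ε"


def ll_condition_holds(first_of_alt, follow_of_lhs):
    """Check the LL condition for the alternatives of one nonterminal.

    first_of_alt maps each alternative (written as a string) to its FIRST set;
    follow_of_lhs is the FOLLOW set of the left-hand side.
    """
    for (_, f1), (_, f2) in combinations(first_of_alt.items(), 2):
        if f1 & f2:                         # condition 1
            return False
        if EPS in f1 and follow_of_lhs & f2:  # condition 2
            return False
        if EPS in f2 and follow_of_lhs & f1:  # condition 2, roles swapped
            return False
    return True


# Sets taken from the example: A -> aBc | Bc | dAa, B -> bB | ε
A_ALTS = {"aBc": {"a"}, "Bc": {"b", "c"}, "dAa": {"d"}}
B_ALTS = {"bB": {"b"}, "ε": {EPS}}

print(ll_condition_holds(A_ALTS, {"$", "a"}))
print(ll_condition_holds(B_ALTS, {"c"}))
# Hypothetical variation: if b were also in FOLLOW(B), condition 2 would fail.
print(ll_condition_holds(B_ALTS, {"b", "c"}))
```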
II. Recursive-Descent Parser
▶ Recursive-descent parsing ::= a top-down method that uses a set of recursive procedures to recognize its input with no backtracking.
▶ Create a procedure for each nonterminal.
ex) G : S → aA | bB
        A → aA | c
        B → bB | d

    procedure pS;
    begin
      if nextsymbol = qa then begin get_nextsymbol; pA end
      else if nextsymbol = qb then begin get_nextsymbol; pB end
      else error
    end;
ω = aac$

    procedure pA;
    begin
      if nextsymbol = qa then begin get_nextsymbol; pA end
      else if nextsymbol = qc then get_nextsymbol
      else error
    end;

    procedure pB;
    ...

    (* main *)
    begin
      get_nextsymbol;
      pS;
      if nextsymbol = '$' then accept else error
    end.

⇒ Procedure call sequence ::= leftmost derivation
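For readers who want to run the example, here is a Python transcription of the same parser for G. It is a sketch under a few assumptions: each input character is one token, `$` is appended as the end marker, and the `trace` list (not in the original procedures) records the production chosen by each call so that the call sequence can be compared with the leftmost derivation. The body of `pB` is filled in by analogy with `pA`.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for S -> aA | bB, A -> aA | c, B -> bB | d."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # one character per token, then end marker
        self.pos = 0
        self.trace = []                   # productions in the order they are applied

    @property
    def nextsymbol(self):
        return self.tokens[self.pos]

    def get_nextsymbol(self):
        self.pos += 1

    def pS(self):
        if self.nextsymbol == "a":
            self.trace.append("S -> aA")
            self.get_nextsymbol()
            self.pA()
        elif self.nextsymbol == "b":
            self.trace.append("S -> bB")
            self.get_nextsymbol()
            self.pB()
        else:
            raise ParseError(f"unexpected {self.nextsymbol!r} in S")

    def pA(self):
        if self.nextsymbol == "a":
            self.trace.append("A -> aA")
            self.get_nextsymbol()
            self.pA()
        elif self.nextsymbol == "c":
            self.trace.append("A -> c")
            self.get_nextsymbol()
        else:
            raise ParseError(f"unexpected {self.nextsymbol!r} in A")

    def pB(self):
        if self.nextsymbol == "b":
            self.trace.append("B -> bB")
            self.get_nextsymbol()
            self.pB()
        elif self.nextsymbol == "d":
            self.trace.append("B -> d")
            self.get_nextsymbol()
        else:
            raise ParseError(f"unexpected {self.nextsymbol!r} in B")

    def parse(self):
        self.pS()
        if self.nextsymbol != "$":
            raise ParseError("input continues after a complete sentence")
        return self.trace


print(Parser("aac").parse())
```

For ω = aac the recorded trace corresponds to the leftmost derivation S ⇒ aA ⇒ aaA ⇒ aac.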
▶ The main problem in constructing a recursive-descent syntax analyzer is the choice of production when a procedure is first entered. To resolve this problem, we can compute the lookahead of each production.
▶ LOOKAHEAD of a production
   Definition: LOOKAHEAD(A → α) = FIRST({ω | S ⇒* μAβ ⇒ μαβ ⇒* μω ∈ VT*}).
   Meaning: the set of terminals that can begin a string generated by α; if α ⇒* ε, then FOLLOW(A) is added to the set.
   Computing formula: LOOKAHEAD(A → X1X2...Xn) = FIRST(X1X2...Xn) ⊕ FOLLOW(A),
   where S1 ⊕ S2 = S1 if ε ∉ S1, and (S1 − {ε}) ∪ S2 otherwise.
ex) S → aSA | ε
    A → c
    Nullable set = {S}
    FIRST(S) = {a, ε}    FOLLOW(S) = {$, c}
    FIRST(A) = {c}       FOLLOW(A) = {$, c}
    LOOKAHEAD(S → aSA) = FIRST(aSA) ⊕ FOLLOW(S) = {a}
    LOOKAHEAD(S → ε)   = FIRST(ε) ⊕ FOLLOW(S) = {$, c}
    LOOKAHEAD(A → c)   = FIRST(c) ⊕ FOLLOW(A) = {c}
⇒ Order of computation: Nullable => FIRST => FOLLOW => LOOKAHEAD
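The computation order of this example can be sketched in Python as follows. This is only an illustration: the grammar encoding (an empty list for ε), the helper names, and the choice to represent nullability by the presence of ε in a FIRST set rather than by a separate Nullable pass are all assumptions of the sketch.

```python
EPS, END = "ε", "$"
GRAMMAR = {"S": [["a", "S", "A"], []], "A": [["c"]]}  # S -> aSA | ε, A -> c


def first_of(symbols, first):
    """FIRST of a symbol string; ε is included only if every symbol is nullable."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}


def grow(target, new):
    """Add new to target in place; report whether target changed."""
    if new <= target:
        return False
    target |= new
    return True


def compute_sets(start):
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)

    changed = True
    while changed:  # FIRST, to a fixed point
        changed = False
        for a, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                changed |= grow(first[a], first_of(rhs, first))

    changed = True
    while changed:  # FOLLOW, using the finished FIRST sets
        changed = False
        for a, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b in GRAMMAR:
                        tail = first_of(rhs[i + 1:], first)
                        new = (tail - {EPS}) | (follow[a] if EPS in tail else set())
                        changed |= grow(follow[b], new)
    return first, follow


def lookahead(lhs, rhs, first, follow):
    """FIRST(rhs) ⊕ FOLLOW(lhs): FOLLOW replaces ε when rhs is nullable."""
    f = first_of(rhs, first)
    return (f - {EPS}) | follow[lhs] if EPS in f else f


FIRST, FOLLOW = compute_sets("S")
for lhs, alternatives in GRAMMAR.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or EPS, sorted(lookahead(lhs, rhs, FIRST, FOLLOW)))
```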
▶ Strong LL condition
   • Definition: for A → α | β ∈ P, LOOKAHEAD(A → α) ∩ LOOKAHEAD(A → β) = ∅.
   • Meaning: for each distinct pair of productions with the same left-hand side, the parser can select the unique alternative that derives a string beginning with the input symbol.
   • Definition: the grammar G is said to be strong LL(1) if it satisfies the strong LL condition.
ex) G : S → aSA | ε
        A → c
    LOOKAHEAD(S → aSA) = {a}
    LOOKAHEAD(S → ε) = FOLLOW(S) = {$, c}
    LOOKAHEAD(S → aSA) ∩ LOOKAHEAD(S → ε) = ∅
    ⇒ G is strong LL(1).
▶ Implementation of a recursive-descent parser
   If a grammar is strong LL(1), we can construct a parser for sentences of the grammar using the following scheme.

   For each a ∈ VT:

    procedure pa;   (* get_nextsymbol = scanner *)
    begin
      if nextsymbol = qa then get_nextsymbol
      else error
    end;

   get_nextsymbol : the routine that plays the role of the scanner; it reads one token from the input stream and assigns it to the variable nextsymbol.
Text p.240
   For each A ∈ VN:

    procedure pA;
    var i: integer;
    begin
      case nextsymbol of
        LOOKAHEAD(A → X1X2...Xm): for i := 1 to m do pXi;
        LOOKAHEAD(A → Y1Y2...Yn): for i := 1 to n do pYi;
          :
        LOOKAHEAD(A → Z1Z2...Zr): for i := 1 to r do pZi;
        LOOKAHEAD(A → ε): ;
        otherwise: error
      end (* case *)
    end;
▶ Improving the efficiency and structure of a recursive-descent parser
   1) Eliminating terminal procedures ::= In practice it is better not to write a procedure for each terminal. Instead, the nonterminal procedures themselves can advance the input marker. In this way many redundant tests can be eliminated.
      ex) text p.241 [Example 9]
   2) BNF → EBNF : reduces the number of productions and nonterminals.
      ① repetitive part : { }
      ② optional part : [ ]
      ③ alternation : ( | )
ex) <IF_st> ::= 'if' <C> 'then' <S> [ 'else' <S> ]

    procedure pIF;
    begin
      if nextsymbol = qif then begin
        get_nextsymbol;
        pC;
        if nextsymbol = qthen then begin get_nextsymbol; pS end
        else error(10)
      end
      else error(20);
      if nextsymbol = qelse then begin get_nextsymbol; pS end
    end;
ex) <id_list> ::= 'id' { ',' 'id' }

    procedure pID_LIST;
    begin
      if nextsymbol = qid then begin
        get_nextsymbol;
        while (nextsymbol = qcomma) do begin
          get_nextsymbol;
          if nextsymbol = qid then get_nextsymbol
          else error
        end
      end
      else error   (* the list must start with an id *)
    end;
<Problem> Convert the following grammar to extended BNF and write the corresponding procedure for a recursive-descent parser.
    <D> ::= 'label' <L> | 'integer' <L>
    <L> ::= <id> <R>
    <R> ::= ';' | ',' <L>
⇒ <L> → <id> ( ',' <id> )* ';'
⇒ <D> ::= ( 'label' | 'integer' ) <id> { ',' <id> } ';'
    procedure pD;
    begin
      if nextsymbol in [qlabel, qinteger] then begin
        get_nextsymbol;
        if nextsymbol = qid then begin
          get_nextsymbol;
          while (nextsymbol = qcomma) do begin
            get_nextsymbol;
            if nextsymbol = qid then get_nextsymbol
            else error(3)
          end
        end
        else error(2);
        if nextsymbol = qsemi then get_nextsymbol
        else error(4)
      end
      else error(1)
    end;
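The same procedure can be tried out in Python. The sketch below assumes the scanner has already produced a list of token kinds such as `"label"`, `"id"`, `","` and `";"`; the function name `parse_D`, the appended `$` end marker and the use of `SyntaxError` are choices made for this illustration. Unlike the Pascal version, it stops at the first error instead of continuing.

```python
def parse_D(tokens):
    """Recognize ('label' | 'integer') id {',' id} ';' and return the number of tokens consumed."""
    toks = list(tokens) + ["$"]  # "$" is an end marker added for this sketch
    pos = 0

    def expect(kind, code):
        nonlocal pos
        if toks[pos] != kind:
            raise SyntaxError(f"error({code}): expected {kind!r}, found {toks[pos]!r}")
        pos += 1

    if toks[pos] not in ("label", "integer"):
        raise SyntaxError(f"error(1): expected 'label' or 'integer', found {toks[pos]!r}")
    pos += 1
    expect("id", 2)
    while toks[pos] == ",":  # the EBNF repetition { ',' id }
        pos += 1
        expect("id", 3)
    expect(";", 4)
    return pos


print(parse_D(["integer", "id", ",", "id", ";"]))
```

The error numbers mirror those of `pD`, so a failing input can be matched against the branch of the Pascal procedure that would report it.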
Programming Assignment #1
▶ Implement a recursive-descent syntax analyzer for the grammar given in exercise 5.24 (text p. 189).
▶ Problem specifications
   - input : an SPL program that finds a minimum and a maximum.
   - output : left parse
   - methods :
     (1) write the get_nextsymbol routine.
     (2) compute the LOOKAHEADs for each production.
     (3) create a procedure for each nonterminal.
     (4) assemble the procedures with the main program.
[Figure: a set of productions → Computation of LOOKAHEADs → LOOKAHEADs for each nonterminal]
III. Predictive Parsing
▶ Predictive parsing ::= a deterministic parsing method using a stack. The stack contains a sequence of grammar symbols.
▶ Model of a predictive parser
[Figure: input buffer holding ω$, a stack with $ at the bottom, the driver routine, the parsing table, and the output]
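   • Parsing proceeds according to the relation between the current input symbol and the stack top symbol.
   • The input buffer contains the string to be parsed, followed by $.
   • Initial configuration :
        STACK    INPUT
        $S       ω$
   • Parsing table (LL) : determines the parsing action.
   ※ M[X,a] = r : when the stack top symbol is X and the current input symbol is a, expand X by production number r.
[Figure: parsing table with one row per nonterminal X, one column per terminal a, and entry r]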
▶ Parsing actions
   X : stack top symbol, a : current input symbol
   1. if X = a = $, then accept.
   2. if X = a, then pop X and advance the input.
   3. if X ∈ VN, then if M[X,a] = r (X → α), then replace X by α, else error.
Text p.246
▶ Predictive parsing algorithm

    set ip to point to the first symbol of ω$;
    repeat
      let X be the top stack symbol and a the symbol pointed to by ip;
      if X is a terminal or $ then
        if X = a then pop X from the stack and advance ip
        else error(1)
      else  /* X is a nonterminal */
        if M[X,a] = X → Y1Y2...Yk then begin
          pop X from the stack;
          push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
          output the production X → Y1Y2...Yk
        end
        else error(2)
    until X = $  /* stack is empty */
ex) G : 1. S → aSb
        2. S → bA
        3. A → aA
        4. A → b
    string : aabbbb
• Parsing table (rows: nonterminals, columns: terminals):
          a    b
     S    1    2
     A    3    4
    STACK     INPUT      ACTION               OUTPUT
    $S        aabbbb$    expand 1             1
    $bSa      aabbbb$    pop a and advance
    $bS       abbbb$     expand 1             1
    $bbSa     abbbb$     pop a and advance
    $bbS      bbbb$      expand 2             2
    $bbAb     bbbb$      pop b and advance
    $bbA      bbb$       expand 4             4
    $bbb      bbb$       pop b and advance
    $bb       bb$        pop b and advance
    $b        b$         pop b and advance
    $         $          accept
※ The remaining question is how to construct a predictive parsing table for a grammar.
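The trace above can be reproduced with a short driver. The Python sketch below follows the three parsing actions; it assumes single-character grammar symbols, a table stored as a dictionary keyed by (nonterminal, terminal), and the function name `predictive_parse`, none of which come from the text.

```python
# Productions and parsing table of the example grammar.
PRODUCTIONS = {1: ("S", "aSb"), 2: ("S", "bA"), 3: ("A", "aA"), 4: ("A", "b")}
TABLE = {("S", "a"): 1, ("S", "b"): 2, ("A", "a"): 3, ("A", "b"): 4}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}


def predictive_parse(text, start="S"):
    """Return the left parse (list of production numbers) or raise SyntaxError."""
    stack = ["$", start]          # top of the stack is the end of the list
    tokens = list(text) + ["$"]
    ip = 0
    output = []
    while True:
        x, a = stack[-1], tokens[ip]
        if x == "$" and a == "$":
            return output         # action 1: accept
        if x not in NONTERMINALS:
            if x != a:
                raise SyntaxError(f"error(1): expected {x!r}, found {a!r}")
            stack.pop()           # action 2: match a terminal
            ip += 1
        else:
            r = TABLE.get((x, a))
            if r is None:
                raise SyntaxError(f"error(2): no entry M[{x},{a}]")
            stack.pop()           # action 3: expand X by production r
            stack.extend(reversed(PRODUCTIONS[r][1]))  # leftmost symbol ends on top
            output.append(r)


print(predictive_parse("aabbbb"))
```

Pushing the right-hand side in reverse is what keeps the leftmost symbol on top of the stack, so the output is a left parse.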
IV. Constructing the Predictive Parsing Table
▶ Main idea : If A → α is a production with a in FIRST(α), then the parser will expand A by α when the current input symbol is a. And if α ⇒* ε, then we should again expand A by α when the current input symbol is in FOLLOW(A).
▶ Parsing table (LL), with rows X ∈ VN and columns a ∈ VT:
   M[X,a] = r : expand X with production r
   blank : error
▶ Algorithm : for each production A → α,
   1. for each a ∈ FIRST(α), M[A,a] := <A → α>
   2. if α ⇒* ε, then for each b ∈ FOLLOW(A), M[A,b] := <A → α>.
ex) G : 1. E  → TE'
        2. E' → +TE'
        3. E' → ε
        4. T  → FT'
        5. T' → *FT'
        6. T' → ε
        7. F  → (E)
        8. F  → id
    FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
    FIRST(E') = { + , ε }
    FIRST(T') = { * , ε }
    FOLLOW(E) = FOLLOW(E') = { ) , $ }
    FOLLOW(T) = FOLLOW(T') = { + , ) , $ }
    FOLLOW(F) = { + , * , ) , $ }
Parsing table (rows: nonterminals, columns: terminals):
          id   +    *    (    )    $
    E     1              1
    E'         2              3    3
    T     4              4
    T'         6    5         6    6
    F     8              7
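The table can also be generated mechanically from the two steps of the algorithm. The Python sketch below is one way to do so under stated assumptions: productions are kept in a numbered list, FIRST and FOLLOW are grown together in a single fixed-point loop (a shortcut of this sketch, not the order used in the text), and each table cell holds a list of production numbers so that multiply-defined entries would remain visible.

```python
EPS, END = "ε", "$"

PRODUCTIONS = [  # numbered 1..8 in list order
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def first_of(symbols, first):
    """FIRST of a symbol string, given FIRST of each nonterminal."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in NONTERMINALS else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}


def compute_first_follow(start):
    first = {nt: set() for nt in NONTERMINALS}
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(END)
    changed = True
    while changed:  # both families of sets only grow, so this terminates
        changed = False
        for lhs, rhs in PRODUCTIONS:
            new_first = first_of(rhs, first)
            if not new_first <= first[lhs]:
                first[lhs] |= new_first
                changed = True
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    tail = first_of(rhs[i + 1:], first)
                    new = (tail - {EPS}) | (follow[lhs] if EPS in tail else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


def build_table(start):
    first, follow = compute_first_follow(start)
    table = {}
    for number, (lhs, rhs) in enumerate(PRODUCTIONS, start=1):
        f = first_of(rhs, first)
        columns = f - {EPS}          # step 1: terminals in FIRST(α)
        if EPS in f:
            columns |= follow[lhs]   # step 2: α is nullable, use FOLLOW(A)
        for a in columns:
            table.setdefault((lhs, a), []).append(number)
    return table


TABLE = build_table("E")
for (lhs, a), rules in sorted(TABLE.items()):
    print(f"M[{lhs},{a}] = {rules}")
```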
▶ LL(1) grammar ::= a grammar whose parsing table has no multiply-defined entries.
   • If an entry is multiply defined, the parser cannot decide which rule to expand by, so it cannot parse deterministically.
▶ LL(1) condition: for A → α | β,
   1. FIRST(α) ∩ FIRST(β) = ∅.
   2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅.
ex) G : 1. S  → iCtSS'
        2. S  → a
        3. S' → eS
        4. S' → ε
        5. C  → b
    FIRST(S) = {i, a}     FOLLOW(S) = {$, e}
    FIRST(S') = {e, ε}    FOLLOW(S') = {$, e}
    FIRST(C) = {b}        FOLLOW(C) = {t}
Parsing table:
          a    b    e    i    t    $
    S     2              1
    S'              3,4            4
    C          5
M[S',e] := <3,4> is multiply defined. When the stack top is S' and the input symbol is e, the parser cannot tell whether to expand by rule 3 or by rule 4. Therefore G is not an LL(1) grammar.
ex) text p.252 [Example 14]
    G : S → aA | abA
        A → Ab | a
    ω : abab
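The multiply-defined entry can be found mechanically by letting each table cell collect every production that claims it. The Python sketch below takes the FIRST sets of the right-hand sides and the FOLLOW sets as given data from the example rather than computing them; the dictionary names are assumptions of this illustration.

```python
EPS = "ε"

# Productions 1..5 of G, described by their left-hand side and FIRST(right-hand side).
LHS = {1: "S", 2: "S", 3: "S'", 4: "S'", 5: "C"}
FIRST_OF_RHS = {1: {"i"}, 2: {"a"}, 3: {"e"}, 4: {EPS}, 5: {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "C": {"t"}}


def build_table():
    """Fill M[A,a] with every production number that the algorithm assigns to it."""
    table = {}
    for r, lhs in LHS.items():
        f = FIRST_OF_RHS[r]
        columns = (f - {EPS}) | (FOLLOW[lhs] if EPS in f else set())
        for a in columns:
            table.setdefault((lhs, a), []).append(r)
    return table


TABLE = build_table()
CONFLICTS = {cell: rules for cell, rules in TABLE.items() if len(rules) > 1}
print(CONFLICTS)
```

A grammar is LL(1) exactly when `CONFLICTS` comes out empty, which is the table-based form of the LL(1) condition.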
V. Strong LL(k) and LL(k) Grammars
▶ FIRSTk(α) = { ω | α ⇒* ωβ and |ω| = k, or α ⇒* ω and |ω| < k }
▶ G is said to be strong LL(k), for some fixed integer k > 0, if whenever there are two leftmost derivations
   1. S ⇒* ω1Aγ1 ⇒ ω1αγ1 ⇒* ω1x ∈ VT*, and
   2. S ⇒* ω2Aγ2 ⇒ ω2βγ2 ⇒* ω2y ∈ VT*
   such that
   3. FIRSTk(x) = FIRSTk(y),
   it follows that
   4. α = β.
▶ Meaning: Consider any state of the parse in which A is the nonterminal currently being parsed and FIRSTk(x) is the k-lookahead at the current point. If the k-lookahead is the same, the two productions A → α and A → β are identical. Any other information provided by the closed portion and the open portion of the current state of the parse is disregarded.
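▶ S ⇒* ωAγ, where ω : closed portion, γ : open portion
▶ Two states of the parse: FIRSTk(x) = FIRSTk(y) ===> α = β.
[Figure: two parse trees rooted at S, one expanding A by α with remaining input x, the other expanding A by β with remaining input y]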
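▶ Def) LL(k) grammar: whenever there are two leftmost derivations
   1. S ⇒* ωAγ ⇒ ωαγ ⇒* ωx ∈ VT*, and
   2. S ⇒* ωAγ ⇒ ωβγ ⇒* ωy ∈ VT*
   such that
   3. FIRSTk(x) = FIRSTk(y),
   it follows that
   4. α = β.
ex) S → aAaa | bAba
    A → b | ε
[Figure: two parse trees, S → aAaa and S → bAba, each with A expanded to b or ε]
    When the lookahead is ba, which of A → b and A → ε should be selected? If the symbol just read is a, select A → b; if it is b, select A → ε. The choice therefore depends on the closed portion, so the grammar is not SLL(2) but it is LL(2).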
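Because this grammar generates only four sentences, the two definitions can be checked by brute force. The Python sketch below lists every derivation as a triple (closed portion, production used for A, remaining input); the triple encoding and the function names are assumptions of this illustration, and only the choice between the A-productions is examined, since the S-productions are already separated by their first symbol.

```python
def first_k(s, k):
    """FIRSTk of a terminal string is simply its prefix of length at most k."""
    return s[:k]


# All derivations of S -> aAaa | bAba with A -> b | ε:
# (closed portion before A, production applied to A, input remaining at that point)
DERIVATIONS = [
    ("a", "A -> b", "baa"),
    ("a", "A -> ε", "aa"),
    ("b", "A -> b", "bba"),
    ("b", "A -> ε", "ba"),
]


def is_strong_ll(k):
    """Strong LL(k): the k-lookahead alone must determine the production."""
    chosen = {}
    for _, production, rest in DERIVATIONS:
        if chosen.setdefault(first_k(rest, k), production) != production:
            return False
    return True


def is_ll(k):
    """LL(k): the closed portion together with the k-lookahead must determine it."""
    chosen = {}
    for closed, production, rest in DERIVATIONS:
        key = (closed, first_k(rest, k))
        if chosen.setdefault(key, production) != production:
            return False
    return True


print("strong LL(2):", is_strong_ll(2))
print("LL(2):", is_ll(2))
```

The only difference between the two checks is the dictionary key: dropping the closed portion from the key is exactly what makes the lookahead ba ambiguous.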
▶ SLL(k) and LL(k)
[Figure: the SLL(k) grammars drawn as a subset of the LL(k) grammars]
▶ <theorem> strong LL(1) ⇔ LL(1)
Proof) (⇒) clear!
(⇐) Suppose that G is not strong LL(1). Then, by definition, there are two distinct productions A → α and A → β such that
    S ⇒* ω1Aγ1 ⇒ ω1αγ1 ⇒* ω1x1γ1 ⇒* ω1x1y1
    S ⇒* ω2Aγ2 ⇒ ω2βγ2 ⇒* ω2x2γ2 ⇒* ω2x2y2
and FIRST(x1y1) = FIRST(x2y2).
Now we must prove that G is not LL(1).
1) x1 = x2 = ε : G is not LL(1). Indeed, it is ambiguous.
2) One (or both) of x1 and x2 is not ε. Say x1 ≠ ε; then FIRST1(x1y1) = FIRST1(x1) = FIRST1(x2y2). But then
    S ⇒* ω2Aγ2 ⇒ ω2αγ2 ⇒* ω2x1γ2 ⇒* ω2x1y2
    S ⇒* ω2Aγ2 ⇒ ω2βγ2 ⇒* ω2x2γ2 ⇒* ω2x2y2
satisfy the property FIRST1(x1y2) = FIRST1(x1) = FIRST1(x2y2). Thus, by definition, G is not LL(1).