Chapter 5: Compiler Overview

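Chapter 5 Compiler Overview
• 5.1 Introduction
• 5.2 General Structure of a Compiler
• 5.3 Compiler Automation Tools
• 5.4 Lexical Analysis
• 5.5 Syntax Analysis
• Parsing methods
• Output of a parser
• Top-down methods
• Recursive-descent parser
• LL parser
• Bottom-up methods
• Shift-reduce parsing
• LR parser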

5.2 General Structure of a Compiler
• Compiler: "A compiler is a computer program which translates programs written in a particular high-level programming language into executable code for a specific target computer."
• ex) C compiler on SPARC: takes a C program as input and outputs code that can run on SPARC.

Compiler Structure
• Front-End: language-dependent part
• Back-End: machine-dependent part

1.3 General compiler structure

1. Lexical Analyzer (Scanner)
• Converts each token of the source program into an integer (token number) that is efficient and easy to handle inside the compiler.
ex) if ( a > 10 ) ...
Token        : if   (   a   >   10   )
Token Number : 32   7   4   25  5    8

2. Syntax Analyzer (Parser)
• Function: syntax checking, tree generation.
• Output: for an incorrect program, an error message; for a correct program, the program structure (in tree form).
ex) if (a > 10) a = 1;
          if
        /    \
       >      =
      / \    / \
     a  10  a   1

3. Intermediate Code Generator
• Semantic checking
• Intermediate code generation
ex) if (a > 10) a = 1.0;  ☞ semantic error when a is an integer!
ex) a = b + 1;
Tree:
      =
     / \
    a   +
       / \
      b   1
Ucode:
    lod 1 2
    ldc 1
    add
    str 1 1
- variable reference: (base, offset)
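The step from tree to Ucode can be sketched as a post-order walk. This is only an illustration: the symbol table addresses (a at (1, 1), b at (1, 2)) are inferred from the lod/str operands on the slide, and the tuple tree encoding and the function names gen_expr and gen_assign are assumptions, not part of the original material.

```python
# Hypothetical symbol table: name -> (base, offset), inferred from the slide's Ucode.
SYMBOLS = {"a": (1, 1), "b": (1, 2)}

def gen_expr(node, code):
    """Emit stack-machine code that leaves the value of `node` on the stack."""
    if isinstance(node, int):              # constant leaf
        code.append(f"ldc {node}")
    elif isinstance(node, str):            # variable leaf
        base, offset = SYMBOLS[node]
        code.append(f"lod {base} {offset}")
    else:                                  # interior node: ("+", left, right)
        op, left, right = node
        gen_expr(left, code)
        gen_expr(right, code)
        code.append({"+": "add"}[op])

def gen_assign(tree):
    """Translate ("=", target, expr) into a list of Ucode-like instructions."""
    _, target, expr = tree
    code = []
    gen_expr(expr, code)
    base, offset = SYMBOLS[target]
    code.append(f"str {base} {offset}")
    return code

ucode = gen_assign(("=", "a", ("+", "b", 1)))
print("\n".join(ucode))
```

For the tree of a = b + 1 this emits the four instructions listed on the slide, in the same order.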

4. Code Optimizer
• Optional phase
• Identifies inefficient code and replaces it with more efficient code.
• Meaning of optimization
• major part: improve running time
• minor part: reduce code size
ex) LDC R1, 1
    LDC R1, 1 (x)
• Criteria for optimization
• preserve the program meanings
• speed up on average
• be worth the effort

Local optimization
• Identifies inefficient code through local inspection and replaces it with more efficient code.
1. Constant folding
2. Eliminating redundant load, store instructions
3. Algebraic simplification
4. Strength reduction
Global optimization
• Uses flow analysis techniques.
1. Common subexpression
2. Moving loop invariants
3. Removing unreachable code
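Three of the local optimizations listed above can be illustrated on a small expression tree. This is a minimal sketch: the tuple encoding (op, left, right) and the specific rewrite rules (x + 0, x * 1, x * 2) are illustrative choices, not rules taken from the slides.

```python
def fold(node):
    """Simplify a tree whose nodes are (op, left, right), an int, or a variable name."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    # Constant folding: both operands are known at compile time.
    if isinstance(left, int) and isinstance(right, int):
        return {"+": left + right, "-": left - right, "*": left * right}[op]
    # Algebraic simplification: x + 0 -> x, x * 1 -> x.
    if op == "+" and right == 0:
        return left
    if op == "*" and right == 1:
        return left
    # Strength reduction: x * 2 -> x + x (an addition is usually cheaper).
    if op == "*" and right == 2:
        return ("+", left, left)
    return (op, left, right)

print(fold(("+", "x", ("*", 2, 3))))
print(fold(("*", "x", 2)))
```

Each rewrite keeps the meaning of the expression, which is the first optimization criterion named in the previous slide.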

5. Target Code Generator
• Generates machine instructions from the intermediate code.
• Code generator tasks
1. instruction selection & generation
2. register management
3. storage allocation
4. code optimization (machine-dependent optimization)

6. Error Recovery
• Error recovery: adjusting the parse so that an error does not affect other statements.
• Error repair: correcting the error when it occurs.
• Error handling
• Error detection
• Error recovery
• Error reporting
• Error repair
• Error kinds
• Syntax error
• Semantic error
• Run-time error

5.3 Compiler Automation Tools
• Compiler Generating Tools (= Compiler-Compiler, Translator Writing System)
• As languages and machines advance, more compilers are needed.
• Why new languages are developed: the range of computer applications keeps widening.
• Implementing N languages on M computers requires N*M compilers.
ex) 2 languages: C, Java
    3 machines: IBM, SPARC, Pentium
    C-to-IBM, C-to-SPARC, C-to-Pentium
    Java-to-IBM, Java-to-SPARC, Java-to-Pentium

Compiler-compiler Model
• Language descriptions are based on grammar theory, whereas machine descriptions have not yet been formalized.
• HDL (Hardware Description Language): used to design computer architecture.
• Automatic compiler generation has been studied as machine architectures and programming languages have evolved.

1. LEX: devised by M. E. Lesk in 1975.
• A useful tool for writing programs that find tokens, described by regular expressions, in an input stream.

2. Parser Generator (PGS: Parser Generating System)
(1) Stanford PGS
• John Hennessy
• Written in Pascal: 5000 lines
• Feature: obtains the syntactic structure in AST form.
• Output: a parsing table that includes Abstract Syntax Tree (AST) information.

(2) Wisconsin PGS
• C. N. Fischer
• Written in Pascal: 10000 lines
• Feature: error recovery
(3) YACC (Yet Another Compiler Compiler)
• Runs on UNIX.
• Written in C.

3. Automatic Code Generation
• Three aspects
1. Machine description: ISP, ISPS, HDL
2. Intermediate language
3. Code generating algorithm (CGA)
• CGA
• Pattern matching code generation
• Table driven code generation

4. Compiler Compiler System
(1) PQCC (Production Quality Compiler Compiler System)
• W. A. Wulf (Carnegie-Mellon University)
• Takes a language description and a target machine description as input and outputs a PQC (Production Quality Compiler) and tables.
• Uses TCOL, a tree-structured intermediate language.
• Generates code by pattern matching code generation.
(2) ACK (Amsterdam Compiler Kit)
• A back-end automation tool for compilers, developed under Andrew S. Tanenbaum at Vrije Universiteit.
• Starts from the UNCOL concept (N*M => N+M).
• Uses an abstract machine code called EM as the intermediate language.
• Convenient for building portable compilers.

PQCC Model

ACK Model

5.4 Lexical Analysis
• Lexical analysis: the process by which the compiler groups certain strings of characters into individual tokens.
• Lexical Analyzer ≡ Scanner ≡ Lexer

Token
• The smallest grammatically meaningful unit.
• Token: a single syntactic entity (terminal symbol).
• Token Number: an integer used so that strings can be processed efficiently.
• Token Value: numeric value or string value.
ex) if ( a > 10 ) ...
Token        : if   (   a    >   10   )
Token Number : 32   7   4    25  5    8
Token Value  : 0    0   'a'  0   10   0
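The (token number, token value) pairs above can be produced by a very small scanner. The following is a sketch under stated assumptions: the token numbers are the ones shown in the example, only the symbols appearing in that example are handled, and the names TOKEN_NUMBER, TN_IDENT, TN_CONST and scan are hypothetical.

```python
import re

# Token numbers taken from the slide's example; other tokens are out of scope here.
TOKEN_NUMBER = {"if": 32, "(": 7, ">": 25, ")": 8}
TN_IDENT, TN_CONST = 4, 5

TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<sym>[()>]))")

def scan(source):
    """Return a list of (token number, token value) pairs."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            word = m.group("id")
            if word in TOKEN_NUMBER:                # keyword: value is 0
                tokens.append((TOKEN_NUMBER[word], 0))
            else:                                   # identifier: value is its name
                tokens.append((TN_IDENT, word))
        elif m.group("num"):                        # constant: value is the number
            tokens.append((TN_CONST, int(m.group("num"))))
        else:                                       # operator or delimiter
            tokens.append((TOKEN_NUMBER[m.group("sym")], 0))
    return tokens

print(scan("if ( a > 10 )"))
```

The parser then works only with the integers, and consults the token value when it needs the identifier name or the constant.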

Token classes
• Special form: defined by the language designer
1. Keywords: const, else, if, int, ...
2. Operator symbols: +, -, *, /, ++, -- etc.
3. Delimiters: ;, ,, (, ), [, ] etc.
• General form: defined by the programmer
4. Identifiers: stk, ptr, sum, ...
5. Constants: 526, 3.0, 0.1234e-10, 'c', "string" etc.
• Token structure: represented by a regular expression.
ex) id = (l + _)(l + d + _)*

Uses of the symbol table
• Stores the information about identifiers that is collected during lexical analysis and syntax analysis.
• Used during semantic analysis and code generation.
• name + attributes
ex) Hashed symbol table
• See Chapter 12.

5.4.2 Token Recognition
• Specification of token structure: RE (regular expression)
• Specification of PL: CFG (context-free grammar)
• Scanner design steps
1. Describe the structure of tokens in regular expressions.
2. Or, directly design a transition diagram for the tokens.
3. Program a scanner according to the diagram.
4. Verify the scanner's behavior through regular language theory.
• Character classification
• letter: a | b | c ... | z | A | B | C | ... | Z  ⇒ l
• digit: 0 | 1 | 2 ... | 9  ⇒ d
• special character: + | - | * | / | . | , | ...

4.2.1 Identifier Recognition
• Transition diagram
• Regular grammar
    S → lA | _A
    A → lA | dA | _A | ε
• Regular expression
    S = lA + _A = (l + _)A
    A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*
    ∴ S = (l + _)(l + d + _)*
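The regular grammar above corresponds to a two-state transition diagram, which can be simulated directly. In this sketch the state names S and A follow the grammar; restricting letters and digits to ASCII is an assumption.

```python
def is_identifier(text):
    """Simulate the transition diagram for (l + _)(l + d + _)*."""
    state = "S"
    for ch in text:
        is_letter = ch.isascii() and ch.isalpha()
        is_digit = ch.isascii() and ch.isdigit()
        if state == "S" and (is_letter or ch == "_"):
            state = "A"                      # S -> lA | _A
        elif state == "A" and (is_letter or is_digit or ch == "_"):
            state = "A"                      # A -> lA | dA | _A
        else:
            return False                     # no transition: reject
    return state == "A"                      # A -> epsilon makes A the final state

print(is_identifier("ptr"), is_identifier("1abc"))
```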

4.2.2 Integer Number Recognition
• Form: integers are written in decimal, octal, or hexadecimal.
• decimal: starts with a non-zero digit
• octal: starts with 0
• hexadecimal: starts with 0x or 0X
• Transition diagram
    n: non-zero digit   o: octal digit   h: hexa digit
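The three forms can be told apart by their first characters. The sketch below uses regular expressions rather than an explicit transition diagram; treating a lone 0 as octal is an assumption borrowed from C, and classify_integer is a hypothetical name.

```python
import re

# Checked in this order so that "0x..." is not mistaken for an octal prefix.
PATTERNS = [
    ("hex", 16, re.compile(r"0[xX][0-9a-fA-F]+")),   # 0x or 0X, then hexa digits
    ("octal", 8, re.compile(r"0[0-7]*")),            # 0, then octal digits
    ("decimal", 10, re.compile(r"[1-9][0-9]*")),     # non-zero digit, then digits
]

def classify_integer(text):
    """Return (kind, value) for a valid integer literal, or None otherwise."""
    for kind, base, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return kind, int(text, base)
    return None

for literal in ["526", "017", "0x1F", "089"]:
    print(literal, classify_integer(literal))
```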

5.5 Syntax Analysis
5.5.1 Parsing Methods
5.5.2 Output of a Parser
5.5.3 Top-down Methods
5.5.4 Bottom-up Methods

5.5.1 Parsing Methods
• How to check whether an input string is a sentence of a grammar, and how to construct a parse tree for the string.
• A parser for grammar G is a program that takes as input a string ω and produces as output either a parse tree (or derivation tree) for ω, if ω is a sentence of G, or an error message indicating that ω is not a sentence of G.
• Parsing: ω ∈ L(G)?

Two basic types of parsers for context-free grammars
① Top-down: starting with the root and working down to the leaves. Recursive-descent parser, predictive parser.
② Bottom-up: beginning at the leaves and working up to the root. Precedence parser, shift-reduce parser.
ex) A → XYZ
• top-down: expand A into X Y Z ("toward the sentence")
• bottom-up: reduce X Y Z to A ("toward the start symbol")

5.5.2 Output of a Parser
• The output of a parser:
① Parse: left parse, right parse
② Parse tree
③ Abstract syntax tree
ex) G: 1. E → E + T
       2. E → T
       3. T → T * F
       4. T → F
       5. F → (E)
       6. F → a
    string: a + a * a

• left parse: the sequence of production rule numbers applied in a leftmost derivation.
    E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T ⇒ a + T * F ⇒ a + F * F ⇒ a + a * F ⇒ a + a * a
    rules applied: 1 2 4 6 3 4 6 6
    ∴ left parse: 1 2 4 6 3 4 6 6
• right parse: the reverse order of the production rule numbers applied in a rightmost derivation.
    E ⇒ E + T ⇒ E + T * F ⇒ E + T * a ⇒ E + F * a ⇒ E + a * a ⇒ T + a * a ⇒ F + a * a ⇒ a + a * a
    rules applied: 1 3 6 4 6 2 4 6
    ∴ right parse: 6 4 2 6 4 6 3 1
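Both parses can be checked mechanically by replaying the rule numbers against the grammar. A minimal sketch, assuming the six numbered productions of the example grammar; the function name replay and the list-of-symbols representation are illustrative.

```python
RULES = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["a"]),
}
NONTERMINALS = {"E", "T", "F"}

def replay(rule_numbers, leftmost=True):
    """Apply each rule to the leftmost (or rightmost) nonterminal, starting from E."""
    form = ["E"]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {number} does not apply to {form[i]}")
        form[i:i + 1] = rhs                  # replace the nonterminal by the right side
    return " ".join(form)

left_parse = [1, 2, 4, 6, 3, 4, 6, 6]
right_parse = [6, 4, 2, 6, 4, 6, 3, 1]

print(replay(left_parse))                                  # leftmost derivation
print(replay(list(reversed(right_parse)), leftmost=False)) # rightmost derivation
```

Both calls derive the sentence a + a * a, which confirms that the right parse is the rightmost derivation read backwards.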

• parse tree: derivation tree
string: a + a * a
            E
          / | \
         E  +  T
         |   / | \
         T  T  *  F
         |  |     |
         F  F     a
         |  |
         a  a

Abstract Syntax Tree (AST) ::= a transformed parse tree that is a more efficient representation of the source program.
• leaf node: operand (identifier or constant)
• internal node: operator (meaningful production rule name)
ex) G: 1. E → E + T  ⇒ add
       2. E → T
       3. T → T * F  ⇒ mul
       4. T → F
       5. F → (E)
       6. F → a
    string: a + a * a
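Because only rules 1 and 3 carry names, an AST for the example can be built from the right parse by creating a node for those two rules and passing subtrees through the others unchanged. This is a sketch; the tuple representation and the name build_ast are assumptions.

```python
# rule number -> (number of subtrees consumed, AST node name or None if the rule is unnamed)
AST_RULES = {
    1: (2, "add"),   # E -> E + T
    2: (1, None),    # E -> T
    3: (2, "mul"),   # T -> T * F
    4: (1, None),    # T -> F
    5: (1, None),    # F -> (E)
    6: (0, "a"),     # F -> a, a leaf operand
}

def build_ast(right_parse):
    """Build an AST from rule numbers given in bottom-up (right parse) order."""
    stack = []
    for number in right_parse:
        arity, name = AST_RULES[number]
        if arity == 0:
            stack.append(name)                   # leaf node
        elif name is not None:
            operands = stack[-arity:]
            del stack[-arity:]
            stack.append((name, *operands))      # internal node named after the rule
        # unnamed rules leave their single subtree on the stack untouched
    (root,) = stack
    return root

print(build_ast([6, 4, 2, 6, 4, 6, 3, 1]))
```

For a + a * a the result is an add node whose right operand is a mul node, so the chain nodes E, T and F of the parse tree disappear.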

※ A meaningful terminal becomes a terminal node; a meaningful production rule becomes a nonterminal node.
→ naming: assigned by the compiler designer.
ex) if (a > b) a = b + 1; else a = b - 2;

5.5.3 Top-Down Methods
::= Beginning with the start symbol of the grammar, it attempts to produce a string of terminal symbols that is identical to a given source string. This matching process proceeds by successively applying the productions of the grammar to produce substrings from nonterminals.
::= In the terminology of trees, this is moving from the root of the tree to a set of leaves in the parse tree for a program.
• Top-down parsing methods
(1) Parsing with backup or backtracking.
(2) Parsing with limited or partial backup.
(3) Parsing with no backtracking.
• backtracking: making repeated scans of the input.

General Top-Down Parsing method
• called a brute-force method
• with backtracking (≡ top-down parsing with full backup)
1. Given a particular nonterminal that is to be expanded, the first production for this nonterminal is applied.
2. Compare the newly expanded string with the input string. In the matching process, a terminal symbol is compared with an input symbol, while a nonterminal symbol is selected for expansion and its first production is applied.
3. If the generated string does not match the input string, an incorrect expansion has occurred. In that case the process is backed up by undoing the most recently applied production, and the next production of this nonterminal is used as the next expansion.
4. This process continues either until the generated string becomes the input string or until there are no further productions to be tried. In the latter case, the given string cannot be generated from the grammar.
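The four steps above can be sketched with Python generators, where exhausting a generator plays the role of backing up to the next production. The grammar S → aAd, A → bc | b is a hypothetical example chosen because the input abd forces one backup; it does not come from the slides.

```python
# Hypothetical grammar: S -> a A d,  A -> b c | b
GRAMMAR = {
    "S": [["a", "A", "d"]],
    "A": [["b", "c"], ["b"]],
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                             # nonterminal: try productions in order
        for production in GRAMMAR[first]:
            for mid in derive(production, text, pos):
                yield from derive(rest, text, mid)   # a failure here falls back to the next choice
    elif pos < len(text) and text[pos] == first:     # terminal: must equal the input symbol
        yield from derive(rest, text, pos + 1)

def accepts(text):
    """True if some sequence of expansions generates exactly `text`."""
    return any(end == len(text) for end in derive(["S"], text, 0))

print(accepts("abd"), accepts("abcd"), accepts("ad"))
```

On abd the parser first tries A → bc, fails when c meets d, undoes that expansion, and succeeds with A → b. With a left-recursive grammar the same code would recurse without consuming input, so it should only be run on grammars without left recursion.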

Several problems with the top-down parsing method
• Left recursion
• A nonterminal A is left recursive if A ⇒+ Aα for some α.
• A grammar G is left recursive if it has a left-recursive nonterminal.
⇒ A left-recursive grammar can cause a top-down parser to go into an infinite loop.
∴ Eliminate the left recursion.
• Backtracking
• the repeated scanning of the input string.
• parsing becomes much slower (very time consuming).
⇒ The conditions for parsing with no backtracking are defined formally using FIRST and FOLLOW.

• Elimination of left recursion
• direct left recursion: A → Aα ∈ P
• indirect left recursion: A ⇒+ Aα
• general form: A → Aα ┃ β  ⇒  A = Aα + β = βα*
• introduce a new nonterminal A' which generates α*.
    ==> A → βA'
        A' → αA' ┃ ε

ex) E → E + T | T
    T → T * F | F
    F → (E) | a
    E → E + T | T gives E = T(+T)*. Let E' generate (+T)*, so E' → +TE' | ε.
    ※ E → TE'
       E' → +TE' | ε
• general method:
    A → Aα1 ┃ Aα2 ┃ ... ┃ Aαm ┃ β1 ┃ β2 ┃ ... ┃ βn
    ==> A → β1A' | β2A' | ... | βnA'
        A' → α1A' | α2A' | ... | αmA' | ε
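The general method can be written down almost literally. A sketch for direct left recursion only, with productions as lists of symbols and the empty list standing for ε; the function name and the representation are assumptions.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without direct left recursion."""
    # alphas: what follows A in the left-recursive alternatives
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    # betas: the alternatives that do not start with A
    betas = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}        # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in betas],            # A  -> b1 A' | ... | bn A'
        new: [alpha + [new] for alpha in alphas] + [[]],          # A' -> a1 A' | ... | am A' | epsilon
    }

print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E → E + T | T this gives E → TE' and E' → +TE' | ε, matching the example above.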

Left-factoring
• If A → αβ | αγ are two A-productions and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ or to αγ.
    ==> left-factoring: the process of factoring out the common prefixes of alternates.
• method:
    A → αβ | αγ
    ==> A → α(β | γ)
    ==> A → αA', A' → β | γ
ex) S → iCtS | iCtSeS | a
    C → b
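The method can be sketched as a single factoring pass over the alternatives of one nonterminal. This is an illustration under assumptions: alternatives are grouped by their first symbol, the longest prefix common to a whole group is factored out, and the helper names are hypothetical. A full implementation would repeat the pass until no common prefixes remain.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(nonterminal, productions):
    """One pass of left-factoring; the empty list stands for epsilon."""
    groups = {}
    for p in productions:
        groups.setdefault(p[0] if p else None, []).append(p)
    result = {nonterminal: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nonterminal].extend(group)         # nothing shared: keep as is
            continue
        prefix = group[0]
        for p in group[1:]:
            prefix = common_prefix(prefix, p)
        new = nonterminal + "'" * len(result)         # A', A'', ... for successive groups
        result[nonterminal].append(prefix + [new])    # A  -> alpha A'
        result[new] = [p[len(prefix):] for p in group]  # A' -> beta | gamma
    return result

print(left_factor("S", [list("iCtS"), list("iCtSeS"), ["a"]]))
```

For the example grammar this yields S → iCtSS' | a and S' → ε | eS.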

5.5.4 Bottom-up Methods
::= Reducing a given string to the start symbol of the grammar.
::= It attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
ex) G: S → aAcBe
       A → Ab | b
       B → d
    string: abbcde

Reduce
[Def 3.1] reduce: the replacement of the right side of a production with the left side.
    Given A → β ∈ P and S ⇒*rm αAω ⇒rm αβω, a reduce step replaces β by A in αβω, giving αAω.
[Def 3.2] handle: If S ⇒*rm αAω ⇒rm αβω, then β is a handle of αβω.
[Def 3.3] handle pruning:
    S = r0 ⇒rm r1 ⇒rm ... ⇒rm rn-1 ⇒rm rn = ω
    ω = rn → rn-1 → rn-2 → ... → S   ("reduce sequence")
ex) G: S → bAe
       A → a;A | a
    ω: b a ; a e

Shift-Reduce Parsing ::= a bottom-up style of parsing.
• Two problems for automatic parsing
1. How to find a handle in a right sentential form.
2. What production to choose in case there is more than one production with the same right-hand side.
====> The method depends on the kind of grammar, but in every case a stack is used to hold the handle.

Four actions of a shift-reduce parser
"The action is chosen by consulting the parsing table with the stack top and the current input symbol."
1. shift: the next input symbol is shifted to the top of the stack.
2. reduce: the handle is reduced to the left side of the production.
3. accept: the parser announces successful completion of parsing.
4. error: the parser discovers that a syntax error has occurred and calls an error recovery routine.

ex) G: E → E + T | T
       T → T * F | F
       F → (E) | a
    string: a + a * a

      STACK         INPUT          ACTION
      ------------  -------------  -------------------
(1)   $             a + a * a $    shift a
(2)   $a            + a * a $      reduce F → a
(3)   $F            + a * a $      reduce T → F
(4)   $T            + a * a $      reduce E → T
(5)   $E            + a * a $      shift +
(6)   $E +          a * a $        shift a
(7)   $E + a        * a $          reduce F → a
(8)   $E + F        * a $          reduce T → F
(9)   $E + T        * a $          shift *
(10)  $E + T *      a $            shift a
(11)  $E + T * a    $              reduce F → a
(12)  $E + T * F    $              reduce T → T * F
(13)  $E + T        $              reduce E → E + T
(14)  $E            $              accept
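The trace above can be reproduced by a small shift-reduce loop. Note the assumption: instead of a generated parsing table (SLR, LALR or CLR), this sketch hard-codes the shift/reduce decisions for this one expression grammar, using the stack top and one lookahead symbol. The name shift_reduce is hypothetical.

```python
def shift_reduce(tokens):
    """Parse E -> E+T | T, T -> T*F | F, F -> (E) | a and return the list of actions."""
    stack, rest, actions = [], list(tokens) + ["$"], []

    def reduce(lhs, length, rule):
        del stack[-length:]                  # pop the handle
        stack.append(lhs)                    # push the left side
        actions.append(f"reduce {rule}")

    while True:
        look = rest[0]                       # current input symbol
        if stack[-1:] == ["a"]:
            reduce("F", 1, "F -> a")
        elif stack[-3:] == ["(", "E", ")"]:
            reduce("F", 3, "F -> (E)")
        elif stack[-3:] == ["T", "*", "F"]:
            reduce("T", 3, "T -> T * F")
        elif stack[-1:] == ["F"]:
            reduce("T", 1, "T -> F")
        elif stack[-1:] == ["T"] and look != "*":   # with * ahead, T is not yet a handle
            if stack[-3:] == ["E", "+", "T"]:
                reduce("E", 3, "E -> E + T")
            else:
                reduce("E", 1, "E -> T")
        elif stack == ["E"] and look == "$":
            actions.append("accept")
            return actions
        elif look == "$":
            raise SyntaxError(f"cannot reduce stack {stack}")
        else:
            stack.append(rest.pop(0))
            actions.append(f"shift {stack[-1]}")

for step, action in enumerate(shift_reduce("a + a * a".split()), start=1):
    print(f"({step}) {action}")
```

Every reduction inspects only the top of the stack, which is the property discussed in the thinking points that follow.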

<< Thinking points >>
1. The handle will always eventually appear on top of the stack, never inside.
    ∵ The parse is a rightmost derivation in reverse. The contents of the stack together with the string remaining in the input form a right sentential form, so it is always the top part of the stack that is reduced.
2. How to make a parsing table for a given grammar.
    → The way the parsing table is built depends on the kind of grammar: SLR (Simple LR), LALR (LookAhead LR), CLR (Canonical LR).

Constructing a Parse tree
1. shift: create a terminal node labeled with the shifted symbol.
2. reduce: A → X1X2...Xn.
(1) A new node labeled A is created.
(2) The X1X2...Xn are made direct descendants of the new node.
(3) If A → ε, then the parser merely creates a node labeled A with no descendants.
ex) G: 1. LIST → LIST , ELEMENT
       2. LIST → ELEMENT
       3. ELEMENT → a
    string: a , a
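The two rules above can be sketched by replaying a shift/reduce action sequence while keeping tree nodes on the stack. The action list for a , a is written out by hand here (an assumption, since the slide's trace is not in the text), and nodes are represented as (label, children) pairs.

```python
# rule number -> (left side, length of right side)
RULES = {
    1: ("LIST", 3),      # LIST -> LIST , ELEMENT
    2: ("LIST", 1),      # LIST -> ELEMENT
    3: ("ELEMENT", 1),   # ELEMENT -> a
}

def build_tree(actions):
    """Replay ("shift", symbol) and ("reduce", rule number) actions into a parse tree."""
    stack = []
    for kind, arg in actions:
        if kind == "shift":
            stack.append((arg, []))                 # terminal node, no descendants
        else:
            lhs, length = RULES[arg]
            cut = len(stack) - length               # also correct when length is 0 (A -> epsilon)
            children = stack[cut:]
            del stack[cut:]
            stack.append((lhs, children))           # new node labeled with the left side
    (root,) = stack
    return root

# Hand-written action sequence for the string "a , a".
ACTIONS = [
    ("shift", "a"), ("reduce", 3), ("reduce", 2),
    ("shift", ","), ("shift", "a"), ("reduce", 3), ("reduce", 1),
]
tree = build_tree(ACTIONS)
print(tree)
```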
