Compilers

Chapter 5: Compilers
• Basic Compiler Functions
• Machine-Dependent Compiler Features
• Machine-Independent Compiler Features
• Compiler Design Options
• Implementation Examples

Basic Compiler Functions • Grammars • Lexical Analysis • Syntactic Analysis • Code Generation

High-Level Programming Language
• A high-level programming language is described in terms of a grammar, which specifies the syntax of legal statements.
• An assignment statement, for example, consists of a variable name, an assignment operator, and an expression.

Example program:

    PROGRAM STATS ;
    VAR
       SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER ;
    BEGIN
       SUM := 0 ;
       SUMSQ := 0 ;
       FOR I := 1 TO 100 DO
          BEGIN
             READ( VALUE ) ;
             SUM := SUM + VALUE ;
             SUMSQ := SUMSQ + VALUE * VALUE ;
          END ;
       MEAN := SUM DIV 100 ;
       VARIANCE := SUMSQ DIV 100 - MEAN * MEAN ;
       WRITE( MEAN, VARIANCE )
    END .

Compiler
Compilation: matching statements (written by programmers) to structures (defined by the grammar) and generating the appropriate object code.
• Lexical analysis (scanning): scanning the source statements and recognizing and classifying the various tokens, including keywords, variable names, data types, and operators.
• Syntactic analysis (parsing): recognizing each statement as some language construct described by the grammar.
• Semantics (code generation): generating the object code.

Grammars
• A grammar is a formal description of the syntax.
• BNF (Backus-Naur Form): a simple and widely used notation for writing grammars, introduced by John Backus and Peter Naur in about 1960.
• Meta-symbols of BNF:
  ::=  "is defined as"
  |    "or"
  < >  angle brackets, used to surround nonterminal symbols
• A BNF rule defining a nonterminal has the form
  nonterminal ::= sequence_of_alternatives
  where the alternatives are strings of terminals (tokens) or nonterminals, separated by the meta-symbol |.

Grammar
G = <N, T, σ, P>
• N: nonterminal symbol set
• T: terminal symbol set
• σ: start symbol, σ ∈ N
• P: production rule set; each rule has the form α → β, where α, β ∈ (N ∪ T)*, α ≠ ε, and ε is the empty string
• N ∩ T = ∅

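Grammar (example)
G = <N, T, σ, P>
• N = {A, B, S, T, σ}
• T = {0, 1}
• P = {σ → S, S → 1A, A → 1A, A → 0B, B → 1T, T → ε}
• Each rule has the form α → β, where α, β ∈ (N ∪ T)*, α ≠ ε, and ε is the empty string
Derivation: S ⇒ 1A ⇒ 1(1A) ⇒* 1⁺A ⇒ 1⁺0B ⇒ 1⁺01T ⇒ 1⁺01
[Figure: state diagram with states S, A, B, T and transitions labeled 1, 1, 0, 1]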
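A minimal sketch of how the derivation can be checked mechanically. It assumes the productions read S → 1A, A → 1A | 0B, B → 1T, T → ε (an interpretation of the slide); the function name `derive` and the length bound are illustrative choices, not part of the original material.

```python
# Productions as read from the slide (an assumption); "" stands for the empty string.
PRODUCTIONS = {"S": ["1A"], "A": ["1A", "0B"], "B": ["1T"], "T": [""]}


def derive(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    results = set()
    frontier = ["S"]
    while frontier:
        form = frontier.pop()
        # Each sentential form of this grammar holds at most one nonterminal.
        idx = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if idx is None:
            results.add(form)  # no nonterminal left: a finished sentence
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(ch not in PRODUCTIONS for ch in new_form)
            if terminals <= max_len:  # prune forms that are already too long
                frontier.append(new_form)
    return sorted(results, key=len)


print(derive(5))
```

Under these assumptions every generated string consists of one or more 1s followed by 01, which agrees with the derivation shown on the slide.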

Four Language/Grammar/Machine Types

Regular Set / Regular Expression
Definition: let I be the input set. The rules for forming a regular expression (RE) are:
• ε is an RE (ε denotes the empty string)
• ∀ a ∈ I, a is an RE
• If R, S are REs, R | S is an RE
• If R, S are REs, RS is an RE
• If R is an RE, (R) is an RE
• If R is an RE, R* is an RE
• If R is an RE, R+ is an RE

Regular Expression: (a | b)* aba
[Figures: a nondeterministic finite automaton and a deterministic finite automaton for this regular expression, each with states 1 to 4 and transitions labeled a and b]
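A small sketch of a deterministic finite automaton for (a | b)* aba, simulated with a transition table. The state numbering is an assumption (1 = no progress, 2 = seen "a", 3 = seen "ab", 4 = seen "aba") and may not match the numbering in the original figure.

```python
# Transition table: state -> {input symbol -> next state}.
# State meanings (assumed): 1 = start, 2 = "a", 3 = "ab", 4 = "aba" (accepting).
DFA = {
    1: {"a": 2, "b": 1},
    2: {"a": 2, "b": 3},
    3: {"a": 4, "b": 1},
    4: {"a": 2, "b": 3},
}
START = 1
ACCEPTING = {4}


def accepts(text):
    """Return True if text is in the language of (a | b)* aba."""
    state = START
    for ch in text:
        if ch not in DFA[state]:
            return False  # symbol outside the alphabet {a, b}
        state = DFA[state][ch]
    return state in ACCEPTING


print(accepts("bbaba"), accepts("abab"))
```

Because the automaton is deterministic, each input symbol causes exactly one table lookup, so recognition takes time proportional to the length of the input.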

Homework
Give deterministic finite automata (DFA) accepting the following languages over the alphabet {0, 1}:
• The set of all strings with three consecutive 0’s.
• The set of all strings ending in 00.
• The set of all strings such that every block of five consecutive symbols contains at least two 0’s.
• The set of all strings beginning with a 1 which, interpreted as the binary representation of an integer, is congruent to zero modulo 3.
• The set of all strings not containing 101 as a substring.

Simplified Pascal Grammar

    1  <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
    2  <prog-name> ::= id
    3  <dec-list> ::= <dec> | <dec-list> ; <dec>
    4  <dec> ::= <id-list> : <type>
    5  <type> ::= INTEGER
    6  <id-list> ::= id | <id-list> , id
    7  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
    8  <stmt> ::= <assign> | <read> | <write> | <for>
    9  <assign> ::= id := <exp>
    10 <exp> ::= <term> | <exp> + <term> | <exp> - <term>
    11 <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
    12 <factor> ::= id | int | ( <exp> )
    13 <read> ::= READ ( <id-list> )
    14 <write> ::= WRITE ( <id-list> )
    15 <for> ::= FOR <index-exp> DO <body>
    16 <index-exp> ::= id := <exp> TO <exp>
    17 <body> ::= <stmt> | BEGIN <stmt-list> END

Slide annotation: Recursive rule

Parse Tree (Syntax Tree)
• READ(VALUE)
• VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
• In the parse tree, multiplication and division take precedence over addition and subtraction.

Parse Tree

Lexical Analysis
• Tokens might be defined by grammar rules to be recognized by the parser:
  <ident> ::= <letter> | <ident><letter> | <ident><digit>
  <letter> ::= A | B | C | D | … | Z
  <digit> ::= 0 | 1 | 2 | 3 | … | 9
• For better efficiency, a scanner can be used instead to recognize the tokens and output them as a sequence, each token represented by a fixed-length code and an associated token specifier.
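A minimal scanner sketch showing the idea of emitting each token as a fixed code plus a token specifier. The numeric codes in `TOKEN_CODES`, `ID`, and `INT` are hypothetical values chosen for illustration, and the regular expression covers only a few of the tokens in the sample program.

```python
import re

# Hypothetical token codes; a real compiler would define its own table.
TOKEN_CODES = {"READ": 1, "(": 2, ")": 3, ":=": 4, "+": 5,
               "-": 6, "*": 7, "DIV": 8, ";": 9, ",": 10}
ID, INT = 22, 23

# Optional whitespace, then one token: ":=", an identifier, an integer,
# or a single-character operator or delimiter.
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z][A-Za-z0-9]*|[0-9]+|[()+\-*;,])")


def scan(source):
    """Return a list of (code, specifier) pairs for the source text."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        text = match.group(1)
        pos = match.end()
        if text.upper() in TOKEN_CODES:
            tokens.append((TOKEN_CODES[text.upper()], None))  # keyword/operator
        elif text[0].isalpha():
            tokens.append((ID, text))        # specifier: the identifier name
        else:
            tokens.append((INT, int(text)))  # specifier: the integer value
    return tokens


print(scan("SUM := SUM + VALUE ;"))
```

Keywords and operators need no specifier because the code alone identifies them; identifiers and integers carry their name or value so later phases can use it.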

Lexical Scan

Modeling Scanners as Finite Automata
• Tokens can often be recognized by a finite automaton, which consists of
  • a finite set of states (including a starting state and one or more final states)
  • a set of transitions from one state to another
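The states-and-transitions description above can be sketched as a table-driven recognizer for identifiers and integers. The state names (`start`, `ident`, `int`) and the helper `char_class` are illustrative assumptions; the automata in the original figures may be organized differently.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# (state, input class) -> next state; a missing entry means "no transition".
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("start", "digit"): "int",
    ("int", "digit"): "int",
}
FINAL_STATES = {"ident", "int"}


def recognize(text, start=0):
    """Return (final state, end index) of the longest token at start, or None."""
    state = "start"
    pos = start
    last_final = None
    while pos < len(text):
        next_state = TRANSITIONS.get((state, char_class(text[pos])))
        if next_state is None:
            break  # no transition: stop and report the last final state seen
        state = next_state
        pos += 1
        if state in FINAL_STATES:
            last_final = (state, pos)
    return last_final


print(recognize("SUMSQ := 0"), recognize("100 - MEAN"))
```

Remembering the last final state reached is what lets the recognizer return the longest possible token rather than stopping at the first accepting state.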

Finite Automata for Typical Tokens

Finite Automata for Tokens from Fig.5.5

Token Recognition Algorithm

Syntactic Analysis • Operator-Precedence Parsing • Recursive-Descent Parsing

Syntactic Analysis
• Syntactic analysis: building the parse tree for the statements being translated.
• Parse tree:
  • Root: goal grammar rule
  • Leaves: terminal symbols
• Methods:
  • Bottom-up: operator-precedence parsing
  • Top-down: recursive-descent parsing

Operator-Precedence Parsing
• The operator-precedence method uses the precedence relations between consecutive operators to guide the parsing process.
• Example: A + B * C - D
• The subexpression B * C is to be computed first because * has higher precedence than the surrounding operators. This means that * appears at a lower level than + or - in the parse tree.
• Precedence relations: ⋖ (lower precedence), ≐ (equal precedence), ⋗ (higher precedence). In the example, + ⋖ * and * ⋗ -.
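A simplified sketch of the idea for arithmetic expressions only. It derives the relation between two consecutive operators from assumed numeric precedence levels (`PREC`) instead of a full precedence matrix, treats all operators as left-associative, and uses `$` as an assumed end marker; parentheses are not handled.

```python
# Assumed precedence levels: higher number binds tighter.
PREC = {"+": 1, "-": 1, "*": 2, "DIV": 2}


def relation(left, right):
    """Return '<' or '>' for two consecutive operators ('$' marks either end)."""
    if left == "$":
        return "<"
    if right == "$":
        return ">"
    # Equal precedence reduces first, which makes operators left-associative.
    return "<" if PREC[left] < PREC[right] else ">"


def parse(tokens):
    """Build a parse tree as nested (operator, left, right) tuples."""
    operands = []
    operators = ["$"]
    for token in tokens + ["$"]:
        if token in PREC or token == "$":
            # Reduce while the operator on the stack takes precedence.
            while operators[-1] != "$" and relation(operators[-1], token) == ">":
                op = operators.pop()
                right = operands.pop()
                left = operands.pop()
                operands.append((op, left, right))
            if token != "$":
                operators.append(token)
        else:
            operands.append(token)
    return operands[0]


print(parse("A + B * C - D".split()))
```

For A + B * C - D the subtree for B * C is built first, so * ends up lower in the tree than + and -, as described above.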

Precedence Matrix
An empty entry means that the two tokens cannot appear next to each other.

Example: READ ( VALUE )

Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

Shift-Reduce Parsing
• Operator-precedence parsing can deal with operator grammars, which have the property that no production right side has two adjacent nonterminals.
• Shift-reduce parsing is a more general bottom-up parsing method for LR(k) grammars.
• It makes use of a stack to store tokens that have not yet been recognized.
• Actions:
  • Shift: push the current token onto the stack.
  • Reduce: replace the symbols on top of the stack that match the right side of a grammar rule with the nonterminal that the rule defines.
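A toy illustration of the shift and reduce actions for the READ statement. It is not an LR(k) parser: instead of parse tables and lookahead it simply tries the rules in a fixed order (longest right side first) after every shift, which happens to be enough for this small rule set. The `RULES` list and the `trace` output are assumptions made for the sketch.

```python
# (left side, right side) pairs; longer right sides are listed first so that
# "<id-list> , id" is preferred over a bare "id".
RULES = [
    ("<read>", ["READ", "(", "<id-list>", ")"]),
    ("<id-list>", ["<id-list>", ",", "id"]),
    ("<id-list>", ["id"]),
]


def shift_reduce(tokens):
    """Return (final stack, trace of actions) for the token list."""
    stack = []
    trace = []
    for token in tokens:
        stack.append(token)                      # shift
        trace.append(("shift", list(stack)))
        reduced = True
        while reduced:                           # reduce as long as a rule fits
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(("reduce", list(stack)))
                    reduced = True
                    break
    return stack, trace


stack, trace = shift_reduce(["READ", "(", "id", ")"])
for action, contents in trace:
    print(action, contents)
```

The input is accepted when the stack ends up holding only the goal symbol, here `<read>`.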

Recursive-Descent Parsing
• A recursive-descent parser is made up of a procedure for each nonterminal symbol in the grammar.
• The procedure attempts to find a substring of the input that can be interpreted as the nonterminal.
• The procedure may call other procedures, or even itself recursively, to search for other nonterminals.
• The procedure must decide which alternative in the grammar rule to use by examining the next input token.
• Top-down parsers cannot be used directly with a grammar containing immediate left recursion, because the procedure for a left-recursive rule would call itself without consuming any input.

Modified Grammar without Left Recursion
The modified grammar is still recursive, but a chain of calls always consumes at least one token.
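A minimal recursive-descent sketch for the READ and assignment statements. The modified grammar itself is not reproduced in this transcript, so the sketch assumes the usual iterative rewriting of the left-recursive rules, for example `<id-list> ::= id { , id }` and `<exp> ::= <term> { + <term> | - <term> }`. The class name `Parser` and the tuple-based tree format are illustrative choices.

```python
KEYWORDS = {"READ", "DIV"}


class Parser:
    """One method per nonterminal; each consumes the tokens it recognizes."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def ident(self):
        token = self.peek()
        if token is None or not token.isidentifier() or token in KEYWORDS:
            raise SyntaxError(f"expected identifier, found {token!r}")
        self.pos += 1
        return token

    def id_list(self):      # <id-list> ::= id { , id }
        names = [self.ident()]
        while self.peek() == ",":
            self.pos += 1
            names.append(self.ident())
        return names

    def read(self):         # <read> ::= READ ( <id-list> )
        self.expect("READ")
        self.expect("(")
        names = self.id_list()
        self.expect(")")
        return ("READ", names)

    def assign(self):       # <assign> ::= id := <exp>
        target = self.ident()
        self.expect(":=")
        return (":=", target, self.exp())

    def exp(self):          # <exp> ::= <term> { + <term> | - <term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):         # <term> ::= <factor> { * <factor> | DIV <factor> }
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):       # <factor> ::= id | int | ( <exp> )
        token = self.peek()
        if token == "(":
            self.pos += 1
            node = self.exp()
            self.expect(")")
            return node
        if token is not None and token.isdigit():
            self.pos += 1
            return token
        return self.ident()


print(Parser("VARIANCE := SUMSQ DIV 100 - MEAN * MEAN".split()).assign())
```

Each loop iteration consumes an operator token before calling another procedure, so no chain of calls can continue without consuming input.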

Recursive-Descent Procedure for READ Statement

Recursive-Descent Procedure for Assignment Statement

Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

Code Generation
• When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding semantic routine (code-generation routine) is executed.
• As an example, a symbolic representation of the object code for a SIC/XE machine is generated.
• Two data structures are used for working storage:
  • a list (associated with a variable LISTCOUNT)
  • a stack
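A much-simplified sketch of generating symbolic SIC/XE-style code for an assignment statement. It is not the set of code-generation routines from the slides: it walks an already-built expression tree rather than running during parsing, it does not use the LISTCOUNT list or the stack, and the temporary names `T1`, `T2`, … are hypothetical. It uses only the register A instructions LDA, STA, ADD, SUB, MUL, and DIV, with `#` marking an immediate operand.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}


def operand(leaf):
    """Integer constants become immediate operands; names are used as-is."""
    return f"#{leaf}" if leaf.isdigit() else leaf


def generate(assign):
    """Return symbolic instructions for a (':=', target, expression) tree."""
    code = []
    temps = 0

    def gen(node):
        """Emit code that leaves the value of node in register A."""
        nonlocal temps
        if isinstance(node, str):
            code.append(f"LDA {operand(node)}")
            return
        op, left, right = node
        if isinstance(right, str):
            gen(left)
            code.append(f"{OPCODES[op]} {operand(right)}")
        else:
            # Evaluate the right operand first and save it in a temporary,
            # so that register A is free for the left operand.
            gen(right)
            temps += 1
            temp = f"T{temps}"
            code.append(f"STA {temp}")
            gen(left)
            code.append(f"{OPCODES[op]} {temp}")

    _, target, expression = assign
    gen(expression)
    code.append(f"STA {target}")
    return code


tree = (":=", "VARIANCE", ("-", ("DIV", "SUMSQ", "100"), ("*", "MEAN", "MEAN")))
print("\n".join(generate(tree)))
```

A production code generator would also track what register A currently holds so that redundant loads and stores can be avoided.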

Example: READ ( VALUE )
[Figure annotations: "placed in register L"; "Argument passing"]

Example: VARIANCE := SUMSQ DIV 100 - MEAN * MEAN

Other Code-Generation Routines

Symbolic Representation of the Generated Object Code
