Simple Syntax-directed Translator for Programming Languages

Chapter 2: A Simple Syntax-directed Translator Salam R. Al-E’mari Adham University College

Languages are described by their syntaxes and semantics What is the meaning of Syntax? • It means the form or structure of the expressions, statements and program units. The Syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. What is the meaning of Semantics? It means the meaning of the expressions, statements, and program units.

character stream The Role of the Lexical Analyzer Lexical Analyzer Analysis part Front end of the compiler token stream Syntax Analyzer (Parser) syntax tree Semantic Analyzer Symbol Table syntax tree Phases of a compiler Intermediate Code Generator intermediate representation Machine-Independent Code Optimizer Synthesis part Back end of the compiler intermediate representation Code Generator target-machine code Machine-Dependent Code Optimizer target-machine code

Syntax Definition • Definition of Grammars • Derivations • Parse trees • Ambiguity • Associativity of operators • Precedence of operators

Definition of Grammars • Context-free grammar is a 4-tuple with • A set of tokens (terminal symbols) • A set of nonterminals • A set of productions • A designated start symbol • Lexemeis basic component during lexical analysis. A lexeme consists of related alphabet from the language. e.g. numeric literals, operators (+), and special words (begin). • Token (variable names) : of a language is a category of its lexemes, and a lexeme is an instance of a token.

Example: consider the following Java statement • index = 2 * count +17; • The lexemes and tokens of this statement are represented in the following table

Formal Methods of Describing Syntax The formal language that is used to describe the syntax of programming languages is called grammar. • BNF (Backus-Naur Form) and Context-Free Grammars (FG) BNF is widely accepted way to describe the syntax of a programming language. • Context-Free Grammars Regular expression and context-free grammar are useful in describing the syntax of a programming language. Regular expression describes how a token is made of alphabets, and context-free grammar determines how the tokens are put together as a valid sentence in the grammar.

BNF Fundamentals • Metalanguage is a language that is used to describe another language. BNF is a metalanguage for programming languages. • The BNF consists of rules (or productions). A rule has a left-hand side (LHS) as the abstraction, and a right-hand side (RHS) as the definition. As in java assignment statement the definition could be as follows: <assign> → <var> = <expression> • The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals. • Lexemes and tokens of the rules are called terminal symbols, or simply terminals.

Example Grammar Context-free grammar for simple expressions: G = <{list,digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list> with productions P = list list+digit list list-digit list digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

2. Derivation • Given a CF grammar we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation • We begin with the start symbol • In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal

Derivation for the Example Grammar listlist+digit list-digit+digit digit-digit+digit 9 -digit+digit 9 - 5 +digit 9 - 5 + 2 This is an example leftmost derivation, because we replacedthe leftmost nonterminal (underlined) in each step.Likewise, a rightmost derivation replaces the rightmostnonterminal in each step

Example : A derivation of a program in this language follows: <program> → begin <stmt_list> end <stmt_list> → <stmt> | <stmt>; <stmt_list> <stmt> → <var> = <expression> <var> → A | B | C <expression> → <var> + <var> | <var> - <var> | <var> <program> => begin <stmt_list> end => begin <stmt> ; <stmt_list> end => begin <var> = <expression>; <stmt_list> end => begin A= <expression>; <stmt_list> end => begin A = <var> + < var> ; <stmt_list> end => begin A = B+ < var> ; <stmt_list> end => begin A = B + C; <stmt_list> end => begin A = B + C; <stmt> end => begin A = B + C ; <var> = <expression> end => begin A = B + C ; B = <expression> end => begin A = B + C ; B = <var> end => begin A = B + C ; B = C end

3. Parse Trees • The root of the tree is labeled by the start symbol • Each leaf of the tree is labeled by a terminal (=token) or  • Each interior node is labeled by a nonterminal • If AX1 X2 … Xn is a production, then node A has immediate childrenX1, X2, …, Xn where Xi is a (non)terminal or  ( denotes the empty string)

Parse Tree for the Example Grammar Parse tree of the string 9-5+2 using grammar G list list digit list digit digit The sequence ofleaf is called theyield of the parse tree 9 - 5 + 2

Parse Tree for the Example Grammar <assign> A parse tree for the simple statement: A = B * (A + C) <id> <expr> = <id> <expr> * A ( ) <expr> B <id> <expr> + <id> A C

4. Ambiguity Consider the following context-free grammar: G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string> with production P = string string+string | string-string | 0 | 1 | … | 9 This grammar is ambiguous, because more than one parse treerepresents the string 9-5+2

Ambiguity (cont’d) string string string string string string string string string string 9 - 5 + 2 9 - 5 + 2

Ambiguity Example : Two distinct parse trees (generated by the grammar on the next slide) for the same sentence, A = B +C * A <assign> <assign> <id> <expr> = <id> <expr> = <expr> <expr> A + <expr> <expr> A * <expr> <expr> * <expr> <expr> <id> + <id> <id> <id> <id> <id> B A A 19 C C B

5- Associativity of Operators Left-associative operators have left-recursive productions left left+term | term String a+b+c has the same meaning as (a+b)+c Right-associative operators have right-recursive productions right term=right | term String a=b=c has the same meaning as a=(b=c)

Associativity of Operators Operator associativity can also be indicated by a grammar. <expr> => <expr> + <expr> | const (ambiguous) <expr> => <expr> + const | const (unambiguous) <expr> <expr> const + <expr> const + const

6. Precedence of Operators Operators with higher precedence “bind more tightly” expr expr+term | termterm  term *factor | factorfactor  number | ( expr ) String 2+3*5 has the same meaning as 2+(3*5) expr expr term term term factor factor factor number number number + * 2 3 5

Example: If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr> → <expr> - <term> | <term> <term> → <term> / const | const

Extended BNF - EBNF has the same expression power as BNF - The addition includes optional constructs (parts), repetition, and multiple choices, very much like regular expression. - Optional parts are placed in brackets ([ ]) • <if_stmt> → if (<expression>) <statement> [else <statement>] - Alternative parts of RHSs in parentheses and separate them with vertical bars • <term> → <term> (* | / | %) <factor> - Put repetitions (0 or more) are placed inside braces ({}) • <ident_list> → <identifier> {, <identifier>}

BNF <expr>  <expr> + <term> | <expr> - <term> | <term> <term>  <term> * <factor> | <term> / <factor> | <factor> EBNF <expr>  <term> {(+ | -) <term>} <term>  <factor> {(* | /) <factor>}

Syntax-Directed Translation • Uses a CF grammar to specify the syntactic structure of the language • AND associates a set of attributes with the terminals and nonterminals of the grammar • AND associates with each production a set of semantic rules to compute values of attributes • A parse tree is traversed and semantic rules applied: after the tree traversal(s) are completed, the attribute values on the nonterminals contain the translated form of the input

Synthesized and Inherited Attributes • An attribute is said to be: • Synthesized if its value at a parse-tree node is determined from the attribute values at the children of the node. • Inherited if its value at a parse-tree node is determined by the parent (by enforcing the parent’s semantic rules)

Example Attribute Grammar String concat operator Production Semantic Rule expr expr1+termexpr  expr1-termexpr  termterm  0term  1…term  9 expr.t := expr1.t // term.t // “+”expr.t := expr1.t // term.t // “-”expr.t := term.tterm.t := “0”term.t := “1”… term.t := “9”

Example Annotated Parse Tree expr.t = “95-2+” expr.t = “95-” term.t = “2” expr.t = “9” term.t = “5” term.t = “9” 9 - 5 + 2

Depth-First Traversals procedure visit(n : node);begin for each child m of n, from left to right dovisit(m); evaluate semantic rules at node nend

Depth-First Traversals (Example) expr.t = “95-2+” expr.t = “95-” term.t = “2” expr.t = “9” term.t = “5” term.t = “9” 9 - 5 + 2 Note: all attributes areof the synthesized type

Translation Schemes • A translation scheme is a CF grammar embedded with semantic actions rest +term { print(“+”) } rest Embeddedsemantic action rest + term { print(“+”) } rest

Example Translation Scheme expr expr+termexpr  expr-termexpr  termterm  0term  1…term  9 { print(“+”) }{ print(“-”) }{ print(“0”) }{ print(“1”) }…{ print(“9”) }

Example Translation Scheme (cont’d) expr { print(“+”) } + expr term { print(“2”) } { print(“-”) } - 2 expr term { print(“5”) } 5 term { print(“9”) } 9 Translates 9-5+2 into postfix 95-2+

Parsing • Parsing = process of determining if a string of tokens can be generated by a grammar • For any CF grammar there is a parser that takes at most O(n3) time to parse a string of n tokens • Linear algorithms suffice for parsing programming language source code • Top-down parsing“constructs” a parse tree from root to leaves • Bottom-up parsing“constructs” a parse tree from leaves to root

Predictive Parsing • Recursive descent parsing is a top-down parsing method • Each nonterminal has one (recursive) procedure tha tis responsible for parsing the nonterminal’s syntactic category of input tokens • When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information • Predictive parsing is a special form of recursive descent parsing where we use one lookahead token to unambiguously determine the parse operations

Example Predictive Parser (Grammar) type simple|^ id|array [ simple ] of typesimple integer|char| num dotdot num

Example Predictive Parser (Program Code) procedure match(t : token);begin if lookahead = t thenlookahead := nexttoken()else error()end;procedure type();begin if lookahead in { ‘integer’, ‘char’, ‘num’ } thensimple()else if lookahead = ‘^’thenmatch(‘^’); match(id)else if lookahead = ‘array’thenmatch(‘array’); match(‘[‘); simple();match(‘]’); match(‘of’); type()else error()end; procedure simple();begin if lookahead = ‘integer’thenmatch(‘integer’)else if lookahead = ‘char’thenmatch(‘char’)else if lookahead = ‘num’thenmatch(‘num’);match(‘dotdot’);match(‘num’)else error()end;

Example Predictive Parser (Execution Step 1) type() Check lookaheadand call match match(‘array’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 2) type() match(‘array’) match(‘[’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 3) type() match(‘array’) match(‘[’) simple() match(‘num’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 4) type() match(‘array’) match(‘[’) simple() match(‘num’) match(‘dotdot’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 5) type() match(‘array’) match(‘[’) simple() match(‘num’) match(‘dotdot’) match(‘num’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 6) type() match(‘array’) match(‘[’) simple() match(‘]’) match(‘num’) match(‘dotdot’) match(‘num’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 7) type() match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) match(‘num’) match(‘dotdot’) match(‘num’) Input: array [ num dotdot num ] of integer lookahead

Example Predictive Parser (Execution Step 8) type() match(‘array’) match(‘[’) simple() match(‘]’) match(‘of’) type() match(‘num’) match(‘dotdot’) match(‘num’) simple() match(‘integer’) Input: array [ num dotdot num ] of integer lookahead

Left Factoring When more than one production for nonterminal A startswith the same symbols, the FIRST sets are not disjoint stmt if expr then stmtendif | if expr then stmt else stmt endif We can use left factoring to fix the problem stmt if expr then stmt opt_elseopt_else else stmt endif|endif

Left Recursion When a production for nonterminal A starts with aself reference then a predictive parser loops forever A A  |  |  We can eliminate left recursive productions by systematicallyrewriting the grammar using right recursive productions A  R|  RR  R| 

A Translator for Simple Expressions expr expr+termexpr  expr-termexpr  termterm  0term  1…term  9 { print(“+”) }{ print(“-”) }{ print(“0”) }{ print(“1”) }…{ print(“9”) } After left recursion elimination: expr term rest rest  +term { print(“+”) } rest | -term { print(“-”) } rest | term  0 { print(“0”) }term  1 { print(“1”) }…term  9 { print(“9”) }

Thank you 

Simple Syntax-directed Translator for Programming Languages

Simple Syntax-directed Translator for Programming Languages

Presentation Transcript

English Syntax

Syntax V

Syntax-Directed Translation

INTERNAL Commands

Bayesian networks

Lesson 6

Chapter 2 A Simple Syntax-Directed Translator

UNIT – 5 SYNTAX-DIRECTED TRANSLATION

SYNTAX DIRECTED TRANSLATION

Present Simple

Syntax Directed Definitions Synthesized Attributes

Syntax Part 1

Syntax-directed Transformations of XML Streams

CPSC 388 – Compiler Design and Construction

Bayesian networks

Chapter 5

Chapter 3 Syntax Part 1

Syntax Directed Translation

Syntax-Directed Translation

Syntax and Messages

Lecture 10

Chapter 5 Syntax