480 likes | 642 Vues
Programming Languages 2nd edition Tucker and Noonan. Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth. Contents. 2.1 Grammars 2.1.1 Backus-Naur Form 2.1.2 Derivations 2.1.3 Parse Trees
E N D
Programming Languages2nd editionTucker and Noonan Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth
Contents 2.1 Grammars 2.1.1 Backus-Naur Form 2.1.2 Derivations 2.1.3 Parse Trees 2.1.4 Associativity and Precedence 2.1.5 Ambiguous Grammars 2.2 Extended BNF 2.3 Syntax of a Small Language: Clite 2.3.1 Lexical Syntax 2.3.2 Concrete Syntax 2.4 Compilers and Interpreters 2.5 Linking Syntax and Semantics 2.5.1 Abstract Syntax 2.5.2 Abstract Syntax Trees 2.5.3 Abstract Syntax of Clite
Grammar G1:Review Expr Expr + Term | Expr – Term | Term Term Term * Factor | Term / Factor | Term % Factor | Factor Factor Primary ** Factor | Primary Primary 0| ... | 9|( Expr ) Red indicates a terminal of G1, blue indicates meta-symbols (symbols that aren’t part of the language but are used to describe the language) Example: 10 * ( 5 – 3)
2.3 Syntax of a Small Language: Clite • Motivation for using a subset of C: Grammar Language (pages) Reference Pascal 5 Jensen & Wirth C 6 Kernighan & Richie C++ 22 Stroustrup Java 14 Gosling, et. al. • The Clite grammar fits on one page (next 3 slides), so it’s a far better tool for studying language design.
CliteEBNF Grammar: StatementsFig. 2.7 Program int main ( ){ Declarations Statements } Declarations {Declaration } Declaration Type Identifier [[Integer ]]{ , Identifier [[Integer ]] }; Type int | bool | float | char Statements {Statement } Statement ;|Block | Assignment |IfStatement|WhileStatement Block {Statements } Assignment Identifier [[Expression ]]=Expression ; IfStatement if(Expression ) Statement [elseStatement ] WhileStatement while(Expression ) Statement
Fig. 2.7Clite Grammar: Expressions Expression Conjunction {||Conjunction } Conjunction Equality {&&Equality } Equality Relation [EquOp Relation ] EquOp== | != Relation Addition [RelOp Addition ] RelOp < | <= | > | >= Addition Term {AddOp Term } AddOp+ | - Term Factor {MulOp Factor } MulOp * | / | % Factor [UnaryOp]Primary UnaryOp - | ! Primary Identifier [[Expression ]] | Literal | (Expression ) | Type (Expression )
Clite grammar: lexical level Identifier Letter { Letter | Digit } Letter a | b | ... | z | A | B | ... | Z Digit 0 | 1 | ... | 9 Literal Integer | Boolean | Float | Char Integer Digit { Digit } Boolean true | false Float Integer.Integer Char ‘ASCII Char‘ (ASCII Char is the set of ASCII characters)
Clite Grammar - Discussion • 13 grammar rules – compare to 4 pages, for C++ • Metabraces { }(0 or more) is interpreted to mean left associativity; e.g., • Addition → Term { AddOp Term }AddOp → + | - • Metabrackets [ ] (optional) means an addition can only be followed by one or no relational operators plus another addition. • Relation Addition [RelOp Addition ] • RelOp < | <= | > | >= • (no a > b > c for example)
Issues Not Addressed by this Grammar • Comments • The significance of whitespace • Distinguishing one token <= from two tokens < = • Distinguishing identifiers from keywords like if
Grammar Levels • The Clite grammar has two levels • lexical level (described by lexical syntax) • syntactic level (described by concrete syntax) • They correspond to two separate parts of a compiler. • The issues on the previous slide are lexical issues.
2.3.1 Lexical Syntax & Analysis • Examples of lexical entities (tokens): • Identifiers e.g., numbr1, X • Literals e.g., 123, 'x', 3.25, true • Keywords bool char else false float if int main true while • Operators = || && == != < <= > >= + - * / ! % • Punctuation ; , { } ( ) • Char e.g.,‘?’
Whitespace • Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment • No token may contain embedded whitespace • (unless it is a character or string literal) • Example: >= one token > = two tokens
The Role of Whitespace • while ( a <= b) legal - spacing between tokens • while(a<=b) also legal - spacing not needed • while (a < = b)no lexical errors butillegal syntactically – lexer would identify tokens while, (, a, <, =, b, )
Comments • Clite uses // comment style of C++ • Not defined in Clite grammar (but could be) • Instead, it’s defined outside the grammar • The use of whitespace to differentiate between one and two character operators is also defined outside the grammar.
Clite Identifiers • Sequence of letters and digits, starting with a letter • “if”is an identifier which also is a keyword • Keywords versus reserved words: • Keyword: predefined by the language • Reserved word: can only be used as defined. • In most languages all keywords are also reserved, but in a few; e.g., Pascal, a subset of the keyword identifiers are predefined but not reserved (and can be redefined by the programmer). Implications? Flexibility, confusion …
2.3.2 Concrete Syntax • Concrete syntax of a language is the set of rules for writing correct programs • The structure of a specific program can be represented by a parse tree, based on the concrete syntax of the language, usingthe stream of Tokens identified during lexical analysis • The root of the parse tree is the Start Symbol of the language (Program, in Clite).
Clite Grammar - Discussion • Clite’s expression rulesare non-ambiguous with respect to precedence and associativity • Rule ordering defines precedence; rule format defines associativity. • C expression grammar definition is ambiguous – precedence and associativity are specified separately.
Associativity and Precedence Clite Operator Associativity Unary - ! none * /left + - left < <= > >= none (i.e., no a < b < c) == != none && left || left
Clite Relational Operators • … are non-associative.(an idea borrowed from Ada) • Why is this important? In C & C++, the expression: if (a < x < b) is not equivalent to if (a < x && x < b) But it is error-free! So, what does it mean?
Syntax versus Semantics • Grammar rules don’t specify the operand types to be used with various operators; e.g., is true + 13 a legal expression? What is the type of the expression 123.78 + 37 ? • These are type and semantic issues, not lexical or syntax.
2.4 Compilers and InterpretersFigure 2.8: Major Stages in the Compilation Process Intermediate Code (IC) Intermediate Code (IC) Abstract Syntax Tokens Source Program Machine Code Lexical Analyzer (lexer) Syntactic Analyzer (parser) Code Optimizer Code Generator Semantic Analyzer
Lexer • Input: characters (the program) • Output: tokens & token type • Lexical grammars are simpler than syntax grammars • Often generated automatically by lexical analyzer generating programs
Parser (Syntax Analyzer) • Based on BNF/EBNF grammar • Input: tokens • Output: abstract syntax tree or some other representation of the program • Abstract syntax: similar to a concrete parse tree but with punctuation, many nonterminals discarded
Semantic Analysis/Type Checker • Typical tasks: • Check that all identifiers are declared • Perform type checking for expressions, assignments, … • Insert implied conversion operators (i.e., make them explicit) • Context free grammars can’t express the semantic rules that are needed for this phase of translation. • Output: Intermediate code (IC) tree, modified abstract syntax tree representation.
Code Optimization • Purpose: Improve the run-time performance of the object code • Usually, to make it run faster • Other possibilities: reduce amount of storage required • Drawback: optimization is time-consuming; slows down debugging • Output: Intermediate code, similar to abstract syntax notation; closer to machine code
Code Optimization - Techniques • Evaluate constant expressions at compile-time • In-line expansion (of function calls) • Loop unrolling • Reorder code to improve cache performance • Eliminate common sub-expressions • Eliminate unnecessary code • Store local variables/intermediate results in registers rather than on the stack or elsewhere in memory
Code Generation • Output: machine code • Instruction selection, register management • “Peephole” optimization: look at a small segment of machine code, make it more efficient; e.g., • x = y; →→ load y, R0 store R0, xz = x * 2; load x, R0 // redundant codemul R0, #2
Interpreter • Replaces last 2 phases of a compiler with direct execution • Input: • Mixed: generates & uses intermediate code/abstract syntax • Pure: start from stream of ASCII characters each time a statement is executed • Mixed interpreters • Java, Perl, Python, Haskell, Scheme • Pure interpreters: • most Basics, shell commands
Interpreter versus Compiler • Source code: a = x + y; • Compiler-generated object code: load r0, x; add r0, y; store r0, a; • Will be executed later, with remainder of program • Interpreter: Call an interpretive routine to actually perform the operation. e.g., add(x, y, a);
Stages of Translation Are Interleaved • It’s not the case that the lexical analyzer identifies all the tokens, and then the parser analyzes all the tokens, and then the type/semantic analysis is performed. • Instead, parser repeatedly contacts lexer to get another token • As tokens are received they either do or don’t match the expected syntax • If there’s a match, perform any type or semantic testing, possibly generate int. code, call for another token.
2.5 Linking Syntax and Semantics • Output of parser: the concrete parse tree is large – probably more than needed for next phase • The compiler usually produces some more compact representation of the program • Example: Fig. 2.9 (page 46)
Finding a More Efficient Tree • The shape of the parse tree reveals the meaning of the program. • So as output of syntax analysis we want a tree that removes its inefficiency and keeps its meaning. • Remove separator/punctuation terminal symbols • Remove all trivial nonterminals • Replace remaining nonterminals with leaf terminals • Example: Fig. 2.10
Abstract Syntax Removes unnecessary details but keeps the essential language elements; e.g., consider the following two equivalent loops: PascalC++ while j < n do begin while (j < n) { j := j + 1; j = j + 1; end; } Essential information: 1) it is a loop, 2) its terminating condition is j >= n, and 3) its body increments the current value of j.
Abstract Syntax • Purpose: an intermediate form of the source code • Generated by the parser during syntax analysis • Used during type checking/semantic analysis • Abstract syntax is defined by the compiler (or interpreter). • One concrete syntax can have several abstract syntaxes associated with it, depending on the design of the translator.
General Form of Abstract Syntax Rules • LHS = RHS • LHS names an abstract syntax class • RHS either • (1) gives a list of one or more alternatives • or • (2) lists the essential elements of the syntax class
Assignment = Variable target; Expression source Expression = Variable| Value | Binary | Unary Variable = String id Value = Integer value Binary = Operator op; Expression term1, term2 Unary = UnaryOpop; Expression term Operator = +| - | * | / | ! Priority? Associativity? …. Simplified Abstract Assignment Syntax Figure 2.11
Compare Concrete to Abstract • Concrete:Assignment Identifier [[Expression ]]=Expression ; • Abstract:Assignment = Variable target; Expression source
Operator Variable Binary + x Operator Value Variable y 2 * Abstract Syntax Tree for z = x + 2 * y target source Binary z
Binaries and Unaries op term1 term2 Binary node op term Unary node Binaries and unaries represent information that can be used for later processing
Assignment = Variable target; Expression source Expression = VariableRef | Value | Binary | Unary VariableRef = Variable | ArrayRef Variable = String id ArrayRef = String id; Expression index Value = IntValue | BoolValue | FloatValue | CharValue Binary = Operator op; Expression term1, term2 Unary = UnaryOpop; Expression term Operator = ArithmeticOp | RelationalOp | BooleanOp IntValue = Integer intValue … Abstract Syntax of Clite Assignments– Fig. 2.14
abstract class Expression { } abstract class VariableRef extends Expression { } class Variable extends VariableRef { String id; } class ArrayRef extends VariableRef { String id; Expression index} class Value extends Expression { … } class Binary extends Expression { Operator op; Expression term1, term2; } class Unary extends Expression { UnaryOp op; Expression term; } Abstract Syntax as Java Classes - Examples
Remaining Abstract Syntax of Clite (Declarations and Statements)Fig 2.14
Summary • Lexical syntax: small, simple, defines language tokens • Concrete syntax: detailed, specific, defines correct programs, used to direct parsing algorithms(language specific, not implementation specific) • Abstract syntax: simpler than concrete, used to describe the structure of the intermediate code (implementation specific, not language specific) NOT INTENDED TO BE USED FOR PARSING • Semantics: program “meaning”, or runtime behavior
Summary • A syntax is ambiguous if a portion of a program has two or more possible interpretations (parse trees) • Non-ambiguous grammars can be written but some ambiguity may be tolerated to reduce grammar size. • Operator associativity and precedence can be defined by the concrete syntax • Compilers generate code for later execution while interpreters execute program statements as they are analyzed.