Chapter 3: Describing Syntax and Semantics

Chapter 3: Describing Syntax and Semantics • Introduction • Terminology • Formal Methods of Describing Syntax • Attribute Grammars – Static Semantics • Describing the Meanings of Programs: Dynamic Semantics

Introduction • Syntax: the form or structure of the expressions, statements, and program units, e.g., DD/DD/ DDDD • lexical specification • grammar • Semantics: the meaning of the expressions, statements, and program units, e.g., 先月後日 • Syntax and semantics provide a language’s definition • Users of a language definition • Other language designers • Implementers • Programmers (the users of the language)

Terminology • A sentence is a string of characters over some alphabet • A language is a set of sentences • A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, x), given by the lexical specification • A token is a category of lexemes (e.g., identifier)

Formal Methods of Describing Syntax • Context-Free Grammars • Developed by Noam Chomsky in the mid-1950s • meant to describe the syntax of natural languages • Define a class of languages called context-free languages • Backus-Naur Form (1959) • Invented by John Backus to describe Algol 58 • BNF is equivalent to context-free grammars • The Most widely known metalanguage, which is used to describe another language • Extended BNF • Improves readability and writability of BNF

Four parts of a Context-Free Grammar • a set of terminals: lexemes and tokens, the atomic symbols in the language, • a set of nonterminals: abstractions, used to represent constructs in the language; they act like syntactic variables • a set of rules (or called productions): • identifying the components of a construct • A rule has a nonterminal as the left-hand side (LHS), and the right-hand side (RHS) may consist of terminal and nonterminal symbols • Examples of a BNF rule: <if_stmt> → if <logic_expr> then <stmt> • A nonterminal chosen as the starting nonterminal.

BNF Rules • Nonterminals are enclosed between symbols “ < ” and “ > ”. • An abstraction (or nonterminal symbol) can have more than one RHS. Each alternative separated by “|” is a distinct rule. • “ ” is read as “can be”. “|” is read as “or”. • Example: <stmt>  <single_stmt> | begin <stmt_list> end • It sometimes uses subscripts, like [1],on the right side to distinguish between occurrences of a construct.

Describing Lists • Syntactic lists are described using recursion <ident_list>  ident | ident, <ident_list> • Example: BNF rules for real numbers <real-number>  <integer-part > . <fraction> <integer-part>  <digit> | <integer-part><digit> <fraction>  <digit> |<digit><fraction> <digit>  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivation • A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols) • Every string of symbols in the derivation is a sentential form, which may consist of nonterminals. • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded • A derivation may be neither leftmost nor rightmost

Example • An Example Grammar <program>  <stmts> <stmts>  <stmt> | <stmt> ; <stmts> <stmt>  <var> = <expr> <var>  a | b | c | d <expr>  <term> + <term> | <term> - <term> <term>  <var> | const • Example of deviation: <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const

Parse Tree • A hierarchical representation of a derivation <program> <program> => <stmts> => <stmt> => <var> = <expr> =>a =<expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b

Parse Tree (cont) • Each leaf is labeled with a terminal. • Each non leaf node is labeled with a nonterminal. • The label of a non leaf node is the left side of some rule, and the labels of the children of the node, from left to right, form the right side of that production. • The root is labeled with the starting nonterminal. • A parse tree generates the sentence formed by reading the terminals at its leaves from left to right. • The construction of a parse tree is called parsing.

Parser • top - down parser : from the root of a parse tree toward the leaves; • bottom - up parser: from leaves of a parse tree toward the root;

Ambiguity in Grammars • A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees • Ambiguity can be resolved by establishing conventions. • Example: dangling-else ambiguity Consider the following grammar: <S>  if <E> then <S> <S>  if <E> then <S> else <S> Consider the sentential form: if E1 then if E2 then S1 else S2

Dangling-else ambiguity (a) corresponds to: if E1 then (if E2 then S1 else S2) (b) corresponds to: if E1 then (if E2 then S1) else S2It is resolved by matching an else with the nearest unmatched if.

Expression notations • prefix notation: op E1 E2, e.g., * + 20 30 60 = * 50 60 = 3000; easy to decode during a left-to-right scan of an expression. • postfix notation: E1 E2 op, e.g., 20 30 + 60 *=50 60 *=3000; can be mechanically evaluated with a stack data structure.

Expression notations(cont) • infix notation: E1 op E2 • familiar and easy to read; • without rules for specifying the relative “precedence” of operators, parentheses would be needed in expressions to make explicit the operands of an infix operator, e.g., a+b*c  a+(b*c). • An operator is “ left associate” if subexpressions containing multiple occurrences of the operator are grouped from left to right, • e.g. 4-2-1  (4-2)-1 • An operator is “ right associate” if subexpressions containing multiple occurrences of the operator are grouped from right to left, • e.g. x=y=3  x=(y=3)

An Ambiguous Expression Grammar <expr>  <expr> <op> <expr> | const <op>  / | - <expr> <expr> <expr> <op> <expr> <expr> <op> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> const - const / const const - const / const

An Unambiguous Expression Grammar • If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr>  <expr> - <term> | <term> <term>  <term> / const| const <expr> <expr> - <term> <term> <term> / const const const

Associativity of Operators • Operator associativity can also be indicated by a grammar • <expr> -> <expr> + <expr> | const (ambiguous) • <expr> -> <expr> + const | const (unambiguous) <expr> <expr> <expr> + const <expr> + const const

L R - - L number 1 R number 4 - - L R number 2 number 2 number 4 number 1 Associativity of Operators (cont) (1) <L> -> <L> + number (2) <R> -> number + <R> | <L> - number | number - <R> | number | number Although both grammars are unambiguous, (1) is more suitable for left associate operators, because its parse tree grows down and to the left, which is close to the semantics.

Handling Associativity and Precedence • The syntax of expressions in a language can be characterized by a table giving the associativity and precedence of operators. • Suppose we have a table, where all operators on the same line have the same associativity and precedence. (see the next page) • A grammar for expressions can be designed by choosing a nonterminal for each precedence level, and an additional nonterminal for the smallest subexpression (factors).

Extended BNF • Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] • Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term>(+|-) const • Repetitions (0 or more) are placed inside braces { } <ident> → letter {letter|digit}

BNF and EBNF • BNF <expr>  <expr> + <term> | <expr> - <term> | <term> <term>  <term> * <factor> | <term> / <factor> | <factor> • EBNF <expr>  <term> {(+ | -) <term>} <term>  <factor> {(* | /) <factor>}

Attribute Grammars • Context-free grammars (CFGs) cannot describe all of the syntax of programming languages. For example, all variables must be declared before they are referenced. • attribute grammars (AGs): additions to CFGs to carry some static semantic information along parse trees • Static semantics are related to the legal form of a program, not directly related to the meaning of programs during execution. Many static semantic rules state the type constraints of a language. • Primary value of attribute grammars (AGs) • Static semantics specification • Compiler design (static semantics checking)

Attribute Grammars : Definition • An attribute grammar is a context-free grammar with the following additions: • For each grammar symbol x there is a set A(x) of attribute values • Each rule has a set of semantic functions that define certain attributes of the nonterminals in the rule • Each rule has a (possibly empty) set of predicates to check for attribute consistency

Attribute Grammars: Definition (cont) • Let X0 X1 ... Xn be a rule • Functions of the form S(X0) = f(A(X1), ... ,A(Xn)) define synthesized attributes, which are used to pass semantic information up a parse tree. • Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1<= j <= n, define inherited attributes, which pass semantic information down and across a tree. • Initially, there are intrinsic attributes on the leaves, whose values are determined outside the parse tree. For example, the types of variables come from the symbol table.

Attribute Grammars: An Example • Syntax <assign> -> <var> = <expr> <expr> -> <var> + <var> | <var> <var> -> A | B | C • Attributes • actual_type: synthesized for <var> and <expr> • expected_type: inherited for <expr> • We assume the variables can be one of two types: int or real. • In the next page, the look-up function looks up a given variable name in the symbol table and returns the type.

Example of an Attribute Grammar (cont) • Syntax rule: <assign>  <var> = <expr> Semantic rules: <expr>.expected_type  <var>.actual_type 2. Syntax rule: <expr>  <var>[2] + <var>[3] Semantic rules: <expr>.actual_type  if (<var>[2].actual_type == int) and (<var>[3].actual_type == int) then int else real end if Predicate: <expr>.actual_type == <expr>.expected_type 3. Syntax rule: <expr>  <var> Semantic rules: <expr>.actual_type  <var>.actual_type Predicate: <expr>.actual_type == <expr>.expected_type 4. Syntax rule: <var>  A | B | C Semantic rule: <var>.actual_type  lookup (<var>.string)

Computing Attribute Values • How are attribute values computed? • If all attributes were inherited, the tree could be decorated in top-down order. • If all attributes were synthesized, the tree could be decorated in bottom-up order. • In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.

Example of Computing Attribute Values • For the sentence: A = A + B • <var>.actual_type  look-up(A) (Rule4) • <expr>.expected_type  <var>.actual_type (Rule1) • <var>[2].actual_type  lookup (A) (Rule4) <var>[3].actual_type  lookup (B) (Rule4) 4. <expr>.actual_type  either int or real (Rule2) 5. <expr>.expected_type == <expr>.actual_type is either TRUE or FALSE (Rule2)

Example of Computing Attribute Values (cont)

Semantics • There is no single widely acceptable notation or formalism for describing semantics • Axiomatic Semantics • Based on formal logic (predicate calculus) • Axioms or inference rules are defined for each statement type in the language, to state the meaning of statements and programs. • The main purpose is for formal program verification. We will talk about this in Chapter 8.

Chapter 3: Describing Syntax and Semantics