## Chapter 3


**Chapter 3: Describing Syntax and Semantics**

**Syntax and Semantics**
• Syntax of a programming language: the form of its expressions, statements, and program units.
• Semantics: the meaning of those expressions, statements, and program units.
• Together, syntax and semantics provide a language's definition.

**An Example of Syntax and Semantics**
• The syntax of a Java while statement: while (<boolean_expr>) <statement>
• The semantics of this statement: when the current value of the Boolean expression is true, the embedded statement is executed, and control then implicitly returns to the Boolean expression to repeat the process; otherwise, control continues after the while construct.

**Description Easiness**
• Describing syntax is easier than describing semantics, partly because a concise and universally accepted notation is available for syntax description, whereas none has yet been developed for semantics.

**Definitions of Languages and Sentences**
• A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet.
• The strings of a language are called sentences or statements.

**Syntax Rules of a Language**
• The syntax rules of a language specify which strings of characters from the language's alphabet are in the language.

**Definitions of Lexemes**
• The lexemes of a programming language include its numeric literals, operators, and special words, among others (e.g., *, sum, begin).
• A lexeme is the lowest-level syntactic unit of a language. For simplicity's sake, formal descriptions of the syntax of programming languages often do not include descriptions of lexemes.
• However, the description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language.

**An Informal Definition of a Program**
• A program can be thought of as a string of lexemes rather than of characters.

**Lexeme Group**
• Lexemes are partitioned into groups.
• For example, in a programming language, the names of variables, methods, classes, and so forth form a group called identifiers.
• Each of these groups is represented by a name, or token.

**Definitions of Tokens**
• A token is a category of lexemes (e.g., identifier).
• A token may have only a single lexeme. For example, the token for the arithmetic operator symbol +, which may have the name plus_op, has just one possible lexeme.
• A lexeme of a token can also be called an instance of the token.

**Example of Tokens**
• Identifier is a token that can have lexemes, or instances, such as sum and total.

**Lexemes and Tokens [csusb], [Lee]**
• Lexeme: a string of characters in a language that is treated as a single unit in the syntax and semantics. For example, identifiers, numbers, and operators are often lexemes.
• A programming language has a very large number of lexemes, perhaps even an infinite number; however, it has only a small number of tokens.

**Examples of Lexemes and Tokens [Lee]**
• while (y >= t) y = y - 3 ; will be represented by a set of pairs.
[Table: the pairs for this statement, with one entry annotated as a token with only one lexeme]

**Ways to Define a Language**
• In general, languages can be formally defined in two distinct ways: by recognition and by generation.
• Note: neither approach provides a definition that is practical by itself for people trying to learn or use a programming language.

**Language Recognizer**
• A recognizer is a device that reads strings of characters from the alphabet of a language and decides whether each input string is in the language.
• Example: the syntax analysis part of a compiler is a recognizer for the language the compiler translates.
• Recognizers are not used to enumerate all of the sentences of a language.

**Language Generators**
• A generator is a device that generates the sentences of a language.
• We can think of the generator as having a button that produces a sentence of the language every time it is pushed.
• However, the particular sentence that a generator produces when its button is pushed is unpredictable.
• One can determine whether the syntax of a particular sentence is correct by comparing it to the structure of the generator.
• People learn a language from examples of its sentences.

**Grammars [wiki]**
• In computer science and linguistics, a formal grammar, or sometimes simply grammar, is a precise description of a formal language, that is, of a set of strings.
• Grammars are commonly used to describe the syntax of programming languages.

**Formal Grammars [wiki]**
• A formal grammar, or sometimes simply grammar, consists of:
• a finite set of terminal symbols;
• a finite set of nonterminal symbols;
• a finite set of production rules, each with a left-hand and a right-hand side consisting of a sequence of these symbols;
• a start symbol.

**Grammar Example [wiki]**
• Nonterminals are usually represented by uppercase letters, terminals by lowercase letters, and the start symbol by S.
• For example, the grammar with
• terminals {a, b},
• nonterminals {S, A, B},
• production rules
S → ABS
S → ε (where ε is the empty string)
BA → AB
BS → b
Bb → bb
Ab → ab
Aa → aa
• and start symbol S
• defines the language of all words of the form a^n b^n (i.e., n copies of a followed by n copies of b).

**Formal Grammars and Formal Languages [wiki]**
• A formal grammar defines (or generates) a formal language, which is a (possibly infinite) set of sequences of symbols that may be constructed by applying production rules to a sequence of symbols that initially contains just the start symbol.

**Grammar Classes [wiki]**
• In the mid-1950s, Chomsky described four classes of generative devices, or grammars, distinguished by the format of their production rules. They define four classes of languages:
• recursively enumerable
• context-sensitive
• context-free
• regular

**Application of Context-free and Regular Grammars**
• The tokens of programming languages can be described by regular grammars.
• Whole programming languages, with minor exceptions, can be described by context-free grammars.

**Review of Context-Free Grammars**
• Context-free grammars were developed by Noam Chomsky in the mid-1950s.
• They are language generators, meant to describe the syntax of natural languages.
• They define a class of languages called context-free languages.

**Terminology - metalanguage**
• A metalanguage is a language that is used to describe another language.

**Backus-Naur Form (BNF)**
• Backus-Naur Form (1959) was invented by John Backus to describe Algol 58.
• It is the most widely known method for describing programming language syntax.
• BNF is equivalent to context-free grammars.
• BNF is a metalanguage used to describe another language.

**Extended BNF**
• Extended BNF improves the readability and writability of BNF.

**BNF Abstraction**
• BNF uses abstractions for syntactic structures.
• For example, a simple Java assignment statement might be represented by the abstraction <assign>.
• (Angle brackets are often used to delimit names of abstractions.)
• In fact, tokens are also abstractions.

**Synonyms**
• Nonterminal symbols (nonterminals): BNF abstractions
• Terminal symbols (terminals): lexemes and tokens

**BNF Rules**
• A rule, also called a production, has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols.
• An abstraction is defined by a rule.
• Examples of BNF rules:
<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>
• A grammar is a finite nonempty set of rules.

**Multiple Definitions of a Nonterminal**
• Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language.
• Multiple definitions can be written as a single rule, with the different definitions separated by the symbol |.

**Example of Multiple Definitions of a Nonterminal**
• For example, an Ada if statement can be described with the rules:
<if_stmt> → if <logic_expr> then <stmt>
<if_stmt> → if <logic_expr> then <stmt> else <stmt>
• Or with the single rule:
<if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>

**A Rule Example**
<assign> → <var> = <expression>
• The above rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>.
• One example whose syntactic structure is described by the rule is: total = subtotal1 + subtotal2

**Expressive Capability of BNF**
• Although BNF is simple, it is sufficiently powerful to describe nearly all of the syntax of programming languages.
• BNF can describe:
• lists of similar constructs
• the order in which different constructs must appear
• nested structures to any depth
• operator precedence (by implication)
• operator associativity (by implication)

**Variable-length Lists and Recursive Rules**
• Variable-length lists in mathematics are often written using an ellipsis (…).
• 1, 2, … is an example.
• BNF does not include the ellipsis; it uses recursive rules as an alternative.
• A rule is recursive if its LHS appears in its RHS.

**Example of a Recursive Rule**
• Syntactic lists are described using recursion:
<ident_list> → ident | ident, <ident_list>
• The above rule defines <ident_list> as either a single token (identifier) or an identifier followed by a comma followed by another instance of <ident_list>.

**Grammar and Derivation**
• A grammar is a generative device for defining languages.
• The sentences of a language are created through derivations.
• A derivation is a repeated application of rules, starting with a special nonterminal of the grammar called the start symbol and ending with a sentence (all terminal symbols).

**An Example of Start Symbols**
• In a grammar for a complete programming language, the start symbol represents a complete program and is usually named <program>.

**Example 3.1 - An Example Grammar**
• What follows is a grammar for a small language. Note: <program> is the start symbol.

**Language Defined by the Grammar in Example 3.1**
• The language in Example 3.1 has only one statement form: assignment.
• A program consists of the special word begin, followed by a list of statements separated by semicolons, followed by the special word end.
• An expression is either a single variable or two variables separated by either a + or a - operator.
• The only variable names in this language are A, B, and C.

**An Example Derivation**
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end

**Sentential Form**
• In the derivation above, each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal's definitions.
• Every string of symbols in the derivation is a sentential form.

**Order of Derivation**
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.
• The derivation continues until the sentential form contains no nonterminals.
• The sentential form consisting of only terminals, or lexemes, is the generated sentence.
• A rightmost derivation is one in which the rightmost nonterminal in each sentential form is the one that is expanded.
• A derivation may be neither leftmost nor rightmost.
• However, the derivation order has no effect on the language generated by a grammar.

**Using the Grammar in Example 3.1 to Generate Different Sentences**
• By choosing alternative RHSs of rules with which to replace nonterminals in the derivation, different sentences in the language can be generated.
• By exhaustively choosing all combinations of choices, the entire language can be generated.
• This language, like most others, is infinite, so one cannot generate all of its sentences in finite time.

**Example 3.2**
• This grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses.

**The Leftmost Derivation for A = B * (A + C)**
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )

**Parse Trees**
• One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define.
• These hierarchical structures are called parse trees.

**Figure 3.1 - a Parse Tree for A = B * (A + C)**
• Figure 3.1 is a parse tree that shows the structure of the assignment statement derived previously.

**A Derivation and Its Corresponding Parse Tree**
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )

**Mapping between a Grammar and Its Corresponding Parse Tree**
• Every internal node of a parse tree is labeled with a nonterminal symbol.
• Every leaf is labeled with a terminal symbol.
• Every subtree of a parse tree describes one instance of an abstraction in the statement.

**Ambiguity in Grammars**
• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees.
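
**A Sketch of Checking Ambiguity**
• The definition of ambiguity can be made concrete by counting parse trees. The Python sketch below is illustrative only. It assumes a small expression grammar that does not appear in these slides, <expr> → <expr> + <expr> | <expr> * <expr> | <id> with <id> → A | B | C, and the function name count_parse_trees is hypothetical. Parentheses are left out to keep the sketch short.

```python
from functools import lru_cache

# Assumed grammar (an illustration, not taken from the slides):
#   <expr> -> <expr> + <expr> | <expr> * <expr> | <id>
#   <id>   -> A | B | C
IDS = {"A", "B", "C"}
OPS = {"+", "*"}


def count_parse_trees(tokens):
    """Count the distinct parse trees that derive `tokens` from <expr>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of parse trees for the slice tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] in IDS else 0
        total = 0
        # Each operator in the slice may serve as the root of a tree;
        # the operands on either side are counted independently.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPS:
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens)) if tokens else 0


if __name__ == "__main__":
    # Two trees: (B + C) * A and B + (C * A), so the assumed grammar
    # is ambiguous.
    print(count_parse_trees("B + C * A".split()))  # 2
```

• A count greater than 1 for any sentence shows that the assumed grammar is ambiguous. A count of 1 for the sentences tried does not prove that a grammar is unambiguous, because only finitely many sentences can be checked this way.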