Explore Context-Free (CF) grammar with examples, BNF fundamentals, list descriptions, derivations, parse trees, and compilation in programming languages.
Note • As usual, these notes are based on the Sebesta text. • The tree diagrams in these slides are from the lecture slides provided in the instructor resources for the text, and were made by David Garrett.
Context-free (CF) grammar • A CF grammar is formally presented as a 4-tuple G = (T, NT, P, S), where: • T is a set of terminal symbols (the alphabet) • NT is a set of non-terminal symbols • P is a set of productions (or rules), where P ⊆ NT × (T ∪ NT)* • S ∈ NT is the start symbol
Example 1 L1 = { 0, 00, 1, 11 } G1 = ( {0,1}, {S}, { S → 0, S → 00, S → 1, S → 11 }, S )
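The 4-tuple definition can be written down directly as a data structure. The following is a minimal sketch, assuming a hypothetical `Grammar` type and `is_well_formed` helper (neither name comes from the slides); it checks the two conditions P ⊆ NT × (T ∪ NT)* and S ∈ NT for G1.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    """Hypothetical container for a CF grammar G = (T, NT, P, S)."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple   # pairs (lhs, rhs); rhs is a tuple of symbols
    start: str

def is_well_formed(g: Grammar) -> bool:
    """Check P ⊆ NT × (T ∪ NT)* and S ∈ NT."""
    symbols = g.terminals | g.nonterminals
    return g.start in g.nonterminals and all(
        lhs in g.nonterminals and all(sym in symbols for sym in rhs)
        for lhs, rhs in g.productions
    )

G1 = Grammar(
    terminals=frozenset({"0", "1"}),
    nonterminals=frozenset({"S"}),
    productions=(
        ("S", ("0",)),
        ("S", ("0", "0")),
        ("S", ("1",)),
        ("S", ("1", "1")),
    ),
    start="S",
)

# G1 has no recursion, so its language is simply the set of
# right-hand sides of the start symbol.
language = {"".join(rhs) for lhs, rhs in G1.productions if lhs == G1.start}
```

Here `language` comes out as the four strings of L1, because every rule of G1 rewrites S directly into terminals.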
Example 2 L2 = { the dog chased the dog, the dog chased a dog, a dog chased the dog, a dog chased a dog, the dog chased the cat, … } G2 = ( { a, the, dog, cat, chased }, { S, NP, VP, Det, N, V }, { S → NP VP, NP → Det N, Det → a | the, N → dog | cat, VP → V | V NP, V → chased }, S ) Notes: S = Sentence, NP = Noun Phrase, N = Noun, VP = Verb Phrase, V = Verb, Det = Determiner
BNF Fundamentals • Sample rules [p. 128] <assign> → <var> = <expression> <if_stmt> → if <logic_expr> then <stmt> <if_stmt> → if <logic_expr> then <stmt> else <stmt> • non-terminals/tokens surrounded by < and > • lexemes are not surrounded by < and > • keywords in language are in bold • → separates LHS from RHS • | expresses alternative expansions for LHS <if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt> • = is in this example a lexeme
BNF Rules • A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols • A grammar is often given simply as a set of rules (terminal and non-terminal sets are implicit in rules, as is start symbol)
Describing Lists • There are many situations in which a programming language allows a list of items (e.g. parameter list, argument list). • Such a list can typically be empty or consist of a single item. • Such lists typically have no upper bound on their length. • How is their structure described?
Describing lists • They are described using recursive rules. • Here is a pair of rules describing a list of identifiers, whose minimum length is one: <ident_list> → ident | ident, <ident_list> • Notice that ‘,’ is part of the object language
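A recursive rule translates naturally into a recursive function. The sketch below is one possible recognizer for <ident_list>; it assumes the input is already split into tokens and that an ident is any token accepted by Python's `str.isidentifier` (both are assumptions for illustration, not part of the slides).

```python
def is_ident_list(tokens):
    """Recognize <ident_list> → ident | ident , <ident_list>."""
    # Both alternatives start with an ident.
    if not tokens or not tokens[0].isidentifier():
        return False
    # First alternative: a single ident.
    if len(tokens) == 1:
        return True
    # Second alternative: ident , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])
```

For example, `is_ident_list(["a", ",", "b"])` is accepted, while an empty list or a list ending in a comma is rejected, which matches the minimum length of one.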
Example 3 L3 = { 0, 1, 00, 11, 000, 111, 0000, 1111, … } G3 = ( {0,1}, {S, ZeroList, OneList}, { S → ZeroList | OneList, ZeroList → 0 | 0 ZeroList, OneList → 1 | 1 OneList }, S )
Derivation of sentences from a grammar • A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols).
Example: derivation from G2 • Example: derivation of the dog chased a cat S => NP VP => Det N VP => the N VP => the dog VP => the dog V NP => the dog chased NP => the dog chased Det N => the dog chased a N => the dog chased a cat
Example: derivations from G3 • Example: derivation of 0 0 0 0 S => ZeroList => 0 ZeroList => 0 0 ZeroList => 0 0 0 ZeroList => 0 0 0 0 • Example: derivation of 1 1 1 S => OneList => 1 OneList => 1 1 OneList => 1 1 1
Observations about derivations • Every string of symbols in the derivation is a sentential form. • A sentence is a sentential form that has only terminal symbols. • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. • A derivation can be leftmost, rightmost, or neither.
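A leftmost derivation can be reproduced mechanically: at each step, find the leftmost nonterminal and replace it with the right-hand side of a chosen rule. The sketch below assumes G2 is encoded as a dictionary and that VP has the alternatives V and V NP, which is what the derivation of "the dog chased a cat" uses. The function name and the `choices` encoding (an index into each nonterminal's list of alternatives) are illustrative assumptions.

```python
# G2 as a mapping from each nonterminal to its alternative right-hand sides.
G2 = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "Det": [["a"], ["the"]],
    "N": [["dog"], ["cat"]],
    "VP": [["V"], ["V", "NP"]],
    "V": [["chased"]],
}

def leftmost_derivation(grammar, start, choices):
    """Return every sentential form of a leftmost derivation.

    choices[k] selects which alternative to apply to the leftmost
    nonterminal at step k.
    """
    form = [start]
    steps = [form]
    for choice in choices:
        # Index of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        form = form[:i] + grammar[form[i]][choice] + form[i + 1:]
        steps.append(form)
    return steps

steps = leftmost_derivation(G2, "S", [0, 0, 1, 0, 1, 0, 0, 0, 1])
for form in steps:
    print(" ".join(form))
```

Each printed line is a sentential form; the last one contains only terminals and is therefore a sentence.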
An example grammar <program> → <stmts> <stmts> → <stmt> | <stmt> ; <stmts> <stmt> → <var> = <expr> <var> → a | b | c | d <expr> → <term> + <term> | <term> - <term> <term> → <var> | const
A leftmost derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
Parse tree • A hierarchical representation of a derivation (children are indented under their parent)
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
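One simple way to hold such a tree in a program is as nested tuples, where the first element is the nonterminal and the remaining elements are its children. This representation and the `leaves` helper are assumptions chosen for illustration. Reading the leaves from left to right recovers the derived sentence.

```python
# Parse tree for: a = b + const
tree = (
    "<program>",
    ("<stmts>",
     ("<stmt>",
      ("<var>", "a"),
      "=",
      ("<expr>",
       ("<term>", ("<var>", "b")),
       "+",
       ("<term>", "const")))),
)

def leaves(node):
    """Collect the terminal symbols of a parse tree, left to right."""
    if isinstance(node, str):      # a terminal symbol is a leaf
        return [node]
    result = []
    for child in node[1:]:         # node[0] is the nonterminal label
        result.extend(leaves(child))
    return result

sentence = leaves(tree)
```

The interior nodes are exactly the nonterminals expanded in the derivation, and `sentence` is the list of terminals `a`, `=`, `b`, `+`, `const`.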
Parse trees and compilation • A compiler builds a parse tree for a program (or for different parts of a program). • If the compiler cannot build a well-formed parse tree from a given input, it reports a compilation error. • The parse tree serves as the basis for semantic interpretation/translation of the program.
Extended BNF • Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] • Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term>(+|-) const • Repetitions (0 or more) are placed inside braces { } <ident> → letter {letter|digit}
Comparison of BNF and EBNF • sample grammar fragment expressed in BNF <expr> → <expr> + <term> | <expr> - <term> | <term> <term> → <term> * <factor> | <term> / <factor> | <factor> • same grammar fragment expressed in EBNF <expr> → <term> {(+ | -) <term>} <term> → <factor> {(* | /) <factor>}
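The EBNF form maps almost line for line onto code: each { } repetition becomes a loop. The sketch below evaluates an expression while parsing it. The fragment does not define <factor>, so the code assumes a <factor> is a numeric literal; the function names and the pre-split token list are likewise assumptions.

```python
def parse_expr(tokens, pos=0):
    """<expr> → <term> {(+ | -) <term>}; returns (value, next position)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):   # the { } part
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    """<term> → <factor> {(* | /) <factor>}; returns (value, next position)."""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):   # the { } part
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos

def parse_factor(tokens, pos):
    """Assumption: a <factor> is a numeric literal."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    try:
        return float(tokens[pos]), pos + 1
    except ValueError:
        raise SyntaxError(f"expected a number, got {tokens[pos]!r}")
```

Because each loop folds new operands into the running value from the left, `parse_expr("10 - 8 - 6".split())` computes (10 - 8) - 6, and `*` and `/` bind more tightly than `+` and `-` because they are handled one level deeper.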
Ambiguity in grammars • A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
An ambiguous grammar for arithmetic expressions <expr> → <expr> <op> <expr> | const <op> → / | - • The sentence const - const / const has two distinct parse trees under this grammar: one groups it as (const - const) / const, the other as const - (const / const).
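The ambiguity can be checked by counting parse trees. The following sketch counts the trees that the rule <expr> → <expr> <op> <expr> | const assigns to a token sequence: any operator can serve as the root, and the counts for the left and right parts multiply. The function name and token encoding are assumptions.

```python
from functools import lru_cache

OPS = {"/", "-"}

def count_parse_trees(tokens):
    """Count parse trees for <expr> → <expr> <op> <expr> | const."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of <expr> trees that span tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "const" else 0
        # Try every operator as the root <op> of the tree.
        return sum(
            count(lo, i) * count(i + 1, hi)
            for i in range(lo + 1, hi - 1)
            if tokens[i] in OPS
        )

    return count(0, len(tokens))

n_trees = count_parse_trees("const - const / const".split())
```

A result greater than one for any sentence shows that the grammar is ambiguous; for `const - const / const` the count is 2.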
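Disambiguating the grammar • If we use the parse tree to indicate precedence levels of the operators, we can remove the ambiguity. • The following rules give / a higher precedence than - <expr> → <expr> - <term> | <term> <term> → <term> / const | const • Under these rules const - const / const has a single parse tree: the root <expr> expands to <expr> - <term>, the left <expr> derives const via <term>, and the right <term> expands to <term> / const. The sentence is therefore grouped as const - (const / const).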
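A small parser for the disambiguated rules makes the unique grouping visible. This is a sketch under stated assumptions: the left-recursive rules are implemented with loops (a common way to handle left recursion in a hand-written parser, not something the slides prescribe), and trees are returned as nested tuples of the form (operator, left, right).

```python
def parse(tokens):
    """Parse with <expr> → <expr> - <term> | <term>
    and <term> → <term> / const | const."""
    pos = 0

    def expect_const():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "const":
            raise SyntaxError(f"expected const at position {pos}")
        pos += 1
        return "const"

    def term():
        nonlocal pos
        node = expect_const()
        # Each further "/ const" wraps the tree built so far (left-recursive rule).
        while pos < len(tokens) and tokens[pos] == "/":
            pos += 1
            node = ("/", node, expect_const())
        return node

    def expr():
        nonlocal pos
        node = term()
        # Each further "- <term>" wraps the tree built so far.
        while pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            node = ("-", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

tree = parse("const - const / const".split())
```

The division ends up nested inside the subtraction, so it is evaluated first; that nesting is what "higher precedence" means in terms of the parse tree.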
Links to BNF-style grammars for actual programming languages Below are some links to grammars for real programming languages. Look at how the grammars are expressed. • http://www.schemers.org/Documents/Standards/R5RS/ • http://www.sics.se/isl/sicstuswww/site/documentation.html In the ones listed below, find the parts of the grammar that deal with operator precedence. • http://java.sun.com/docs/books/jls/index.html • http://www.lykkenborg.no/java/grammar/JLS3.html • http://www.enseignement.polytechnique.fr/profs/informatique/Jean-Jacques.Levy/poly/mainB/node23.html • http://www.lrz-muenchen.de/~bernhard/Pascal-EBNF.html
Associativity of operators • When multiple operators appear in an expression, we need to know how to interpret the expression. • Some operators (e.g. +) are associative, meaning that the meaning of an expression with multiple instances of the operator is the same no matter how it is interpreted: (a+b)+c = a+(b+c) • Some operators (e.g. -) are not associative: (a-b)-c ≠ a-(b-c) e.g. try a=10, b=8, c=6: (10-8)-6 = -4 but 10-(8-6) = 8
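The same check can be run for any binary operator by grouping a list of operands from the left and from the right. The helper names `fold_left` and `fold_right` are assumptions introduced for this sketch.

```python
from functools import reduce
import operator

def fold_left(op, values):
    """Group from the left: ((a op b) op c) ..."""
    return reduce(op, values)

def fold_right(op, values):
    """Group from the right: a op (b op (c ...))"""
    *init, last = values
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)

values = [10, 8, 6]
sub_left = fold_left(operator.sub, values)     # (10 - 8) - 6
sub_right = fold_right(operator.sub, values)   # 10 - (8 - 6)
add_left = fold_left(operator.add, values)     # (10 + 8) + 6
add_right = fold_right(operator.add, values)   # 10 + (8 + 6)
```

For subtraction the two groupings disagree (-4 versus 8), while for addition they agree (24 both ways), matching the example above.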
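Associativity of Operators • Operator associativity can also be indicated by a grammar <expr> → <expr> + <expr> | const (ambiguous) <expr> → <expr> + const | const (unambiguous) • With the unambiguous, left-recursive rule, const + const + const has a single parse tree: the root <expr> expands to <expr> + const, the inner <expr> again expands to <expr> + const, and the innermost <expr> is const. The grouping is (const + const) + const, so + is left-associative.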
Links to BNF-style grammars for actual programming languages Below are some links to grammars for real programming languages. Look at how the grammars are expressed. • http://www.schemers.org/Documents/Standards/R5RS/ • http://www.sics.se/isl/sicstuswww/site/documentation.html In the ones listed below, find the parts of the grammar that deal with operator associativity. • http://java.sun.com/docs/books/jls/index.html • http://www.lykkenborg.no/java/grammar/JLS3.html • http://www.enseignement.polytechnique.fr/profs/informatique/Jean-Jacques.Levy/poly/mainB/node23.html • http://www.lrz-muenchen.de/~bernhard/Pascal-EBNF.html