Understanding Context-Free Grammar: Examples and Derivations

Note • As usual, these notes are based on the Sebesta text. • The tree diagrams in these slides are from the lecture slides provided in the instructor resources for the text, and were made by David Garrett.

Context-free (CF) grammar • A CF grammar is formally presented as a 4-tuple G = (T, NT, P, S), where: • T is a set of terminal symbols (the alphabet) • NT is a set of non-terminal symbols • P is a set of productions (or rules), where P ⊆ NT × (T ∪ NT)* • S ∈ NT is the start symbol

Example 1 L1 = { 0, 00, 1, 11 } G1 = ( {0,1}, {S}, { S → 0, S → 00, S → 1, S → 11 }, S )
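The 4-tuple definition and G1 can be written down almost literally in code. The following is only a minimal sketch: the names `Grammar` and `is_well_formed` are made up for illustration, and the check that T and NT are disjoint is a common convention assumed here rather than something stated on the slide.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    terminals: frozenset      # T
    nonterminals: frozenset   # NT
    productions: tuple        # P, as (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str                # S

def is_well_formed(g: Grammar) -> bool:
    """Check the side conditions of the 4-tuple definition."""
    symbols = g.terminals | g.nonterminals
    return (
        g.terminals.isdisjoint(g.nonterminals)   # assumed convention
        and g.start in g.nonterminals            # S is in NT
        and all(
            lhs in g.nonterminals and all(s in symbols for s in rhs)
            for lhs, rhs in g.productions        # P is a subset of NT x (T u NT)*
        )
    )

# G1 from Example 1
g1 = Grammar(
    terminals=frozenset({"0", "1"}),
    nonterminals=frozenset({"S"}),
    productions=(
        ("S", ("0",)),
        ("S", ("0", "0")),
        ("S", ("1",)),
        ("S", ("1", "1")),
    ),
    start="S",
)

print(is_well_formed(g1))
```

A production with more than one symbol on its left-hand side could not be stored in this shape at all, which is exactly what makes the grammar context-free.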

Example 2 L2 = { the dog chased the dog, the dog chased a dog, a dog chased the dog, a dog chased a dog, the dog chased the cat, … } G2 = ( { a, the, dog, cat, chased }, { S, NP, VP, Det, N, V }, { S → NP VP, NP → Det N, Det → a | the, N → dog | cat, VP → V | VP NP, V → chased }, S ) Notes: S = Sentence, NP = Noun Phrase, N = Noun, VP = Verb Phrase, V = Verb, Det = Determiner

Examples of lexemes and tokens

BNF Fundamentals • Sample rules [p. 128] <assign> → <var> = <expression> <if_stmt> → if <logic_expr> then <stmt> <if_stmt> → if <logic_expr> then <stmt> else <stmt> • non-terminals/tokens surrounded by < and > • lexemes are not surrounded by < and > • keywords in language are in bold • → separates LHS from RHS • | expresses alternative expansions for LHS <if_stmt> → if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt> • = is in this example a lexeme

BNF Rules • A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols • A grammar is often given simply as a set of rules (terminal and non-terminal sets are implicit in rules, as is start symbol)

Describing Lists • There are many situations in which a programming language allows a list of items (e.g. parameter list, argument list). • Such a list can typically be empty or consist of a single item. • Such lists typically have no upper bound on their length. • How is their structure described?

Describing lists • They are described using recursive rules. • Here is a pair of rules describing a list of identifiers, whose minimum length is one: <ident_list> → ident | ident, <ident_list> • Notice that ‘,’ is part of the object language
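The recursion in the <ident_list> rule maps directly onto a recursive function. This is a sketch only: the function name `is_ident_list` is invented here, the input is assumed to be already split into tokens, and "ident" is approximated by Python's own notion of an identifier.

```python
def is_ident_list(tokens: list[str]) -> bool:
    """Recognize <ident_list> -> ident | ident , <ident_list>."""
    # Both alternatives start with an ident, so an empty list is rejected.
    # Assumption: an ident is whatever str.isidentifier() accepts.
    if not tokens or not tokens[0].isidentifier():
        return False
    if len(tokens) == 1:
        return True                      # first alternative: a single ident
    # second alternative: ident , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["x", ",", "y", ",", "z"]))
print(is_ident_list([]))
```

Because each recursive call consumes two tokens, the function accepts lists of any length, which is how a finite pair of rules describes an unbounded list.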

Example 3 L3 = { 0, 1, 00, 11, 000, 111, 0000, 1111, … } G3 = ( {0,1}, {S, ZeroList, OneList}, { S → ZeroList | OneList, ZeroList → 0 | 0 ZeroList, OneList → 1 | 1 OneList }, S )

Derivation of sentences from a grammar • A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols).

Example: derivation from G2 • Example: derivation of the dog chased a cat S => NP VP => Det N VP => the N VP => the dog VP => the dog VP NP => the dog V NP => the dog chased NP => the dog chased Det N => the dog chased a N => the dog chased a cat

Example: derivations from G3 • Example: derivation of 0 0 0 0 S => ZeroList => 0 ZeroList => 0 0 ZeroList => 0 0 0 ZeroList => 0 0 0 0 • Example: derivation of 1 1 1 S => OneList => 1 OneList => 1 1 OneList => 1 1 1

Observations about derivations • Every string of symbols in the derivation is a sentential form. • A sentence is a sentential form that has only terminal symbols. • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded. • A derivation can be leftmost, rightmost, or neither.
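A leftmost derivation can be replayed mechanically: at each step, find the leftmost nonterminal and replace it with one of its alternatives. The sketch below does this for G3. The dictionary encoding and the name `leftmost_derivation` are illustrative choices, and the list of alternative indices is supplied by hand rather than discovered by a parser.

```python
# G3 as a mapping from each nonterminal to its list of alternatives.
G3 = {
    "S": [["ZeroList"], ["OneList"]],
    "ZeroList": [["0"], ["0", "ZeroList"]],
    "OneList": [["1"], ["1", "OneList"]],
}

def leftmost_derivation(grammar, start, choices):
    """Yield each sentential form, expanding the leftmost nonterminal
    with the alternative whose index is the next entry of `choices`."""
    form = [start]
    yield form
    for choice in choices:
        # Index of the leftmost symbol that is a nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        form = form[:i] + grammar[form[i]][choice] + form[i + 1:]
        yield form

steps = list(leftmost_derivation(G3, "S", [0, 1, 1, 1, 0]))
print(" => ".join(" ".join(form) for form in steps))
# S => ZeroList => 0 ZeroList => 0 0 ZeroList => 0 0 0 ZeroList => 0 0 0 0

# The last sentential form is a sentence: it contains only terminals.
print(all(sym not in G3 for sym in steps[-1]))
```

Every list yielded along the way is a sentential form; only the final one, which has no nonterminals left, is a sentence.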

A leftmost derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const

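Parse tree • A hierarchical representation of a derivation • [Figure: parse tree for a = b + const. The root <program> has the single child <stmts>, which has the single child <stmt>; <stmt> has the children <var> = <expr>; <var> yields a; <expr> has the children <term> + <term>, where the first <term> yields <var> and then b, and the second <term> yields const.]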

Parse trees and compilation • A compiler builds a parse tree for a program (or for different parts of a program). • If the compiler cannot build a well-formed parse tree from a given input, it reports a compilation error. • The parse tree serves as the basis for semantic interpretation/translation of the program.

Extended BNF • Optional parts are placed in brackets [ ] <proc_call> -> ident [(<expr_list>)] • Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term>(+|-) const • Repetitions (0 or more) are placed inside braces { } <ident> → letter {letter|digit}

Comparison of BNF and EBNF • sample grammar fragment expressed in BNF <expr> → <expr> + <term> | <expr> - <term> | <term> <term> → <term> * <factor> | <term> / <factor> | <factor> • same grammar fragment expressed in EBNF <expr> → <term> {(+ | -) <term>} <term> → <factor> {(* | /) <factor>}
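One practical appeal of the EBNF form is that each pair of braces corresponds to a loop in a hand-written parser. The following sketch parses the EBNF fragment above into nested tuples. The function names are invented for this example, the input is assumed to be whitespace-separated, and since <factor> is not defined in these notes it is assumed here to be a single alphanumeric token.

```python
def parse_expr(tokens, pos=0):
    """<expr> -> <term> {(+ | -) <term>}"""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):   # the { ... } part
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (tree, op, right)          # nest to the left as we go
    return tree, pos

def parse_term(tokens, pos):
    """<term> -> <factor> {(* | /) <factor>}"""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        tree = (tree, op, right)
    return tree, pos

def parse_factor(tokens, pos):
    """Assumption: a <factor> is one alphanumeric token (name or number)."""
    if pos >= len(tokens) or not tokens[pos].isalnum():
        raise SyntaxError(f"expected a factor at position {pos}")
    return tokens[pos], pos + 1

def parse(text):
    tokens = text.split()
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("a + b * c - d"))
# (('a', '+', ('b', '*', 'c')), '-', 'd')
```

The left-recursive BNF rules cannot be turned into recursive functions this directly, because `parse_expr` would call itself before consuming any input; the loop form avoids that while producing the same left-nested structure.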

Ambiguity in grammars • A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

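An ambiguous grammar for arithmetic expressions • <expr> → <expr> <op> <expr> | const • <op> → / | - • [Figure: two distinct parse trees for const - const / const. In both, the root <expr> expands to <expr> <op> <expr>. In the first tree the left <expr> expands again and covers const - const, so / is at the root; in the second tree the right <expr> expands again and covers const / const, so - is at the root.]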
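The ambiguity can be made concrete by enumerating every parse tree the grammar allows for a given token sequence. In this sketch, the names `parse_trees` and `evaluate` are invented, and the numbers 8, 4 and 2 are arbitrary stand-ins for const chosen so that the two trees give visibly different values.

```python
OPS = ("/", "-")

def parse_trees(tokens):
    """Return every parse tree for `tokens` under
    <expr> -> <expr> <op> <expr> | const   and   <op> -> / | -
    Leaves are operand tokens; inner nodes are (left, op, right)."""
    if len(tokens) == 1:
        return [] if tokens[0] in OPS else [tokens[0]]
    trees = []
    # Try every operator position as the top-level <op>.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    """Evaluate a tree, treating each leaf as a number (an assumption)."""
    if isinstance(tree, str):
        return float(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a / b

trees = parse_trees("8 - 4 / 2".split())
for tree in trees:
    print(tree, "=", evaluate(tree))
# ('8', '-', ('4', '/', '2')) = 6.0
# (('8', '-', '4'), '/', '2') = 2.0
```

Two trees for one input, with two different values, is why an ambiguous grammar is a problem for a compiler that derives meaning from the parse tree.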

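Disambiguating the grammar • If we use the parse tree to indicate precedence levels of the operators, we can remove the ambiguity. • The following rules give / a higher precedence than - • <expr> → <expr> - <term> | <term> • <term> → <term> / const | const • [Figure: the single parse tree for const - const / const under these rules. The root <expr> expands to <expr> - <term>; the left <expr> expands to <term> and then const; the right <term> expands to <term> / const, whose inner <term> expands to const. The division therefore sits lower in the tree than the subtraction.]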

Links to BNF-style grammars for actual programming languages Below are some links to grammars for real programming languages. Look at how the grammars are expressed. • http://www.schemers.org/Documents/Standards/R5RS/ • http://www.sics.se/isl/sicstuswww/site/documentation.html In the ones listed below, find the parts of the grammar that deal with operator precedence. • http://java.sun.com/docs/books/jls/index.html • http://www.lykkenborg.no/java/grammar/JLS3.html • http://www.enseignement.polytechnique.fr/profs/informatique/Jean-Jacques.Levy/poly/mainB/node23.html • http://www.lrz-muenchen.de/~bernhard/Pascal-EBNF.html

Associativity of operators • When multiple operators appear in an expression, we need to know how to interpret the expression. • Some operators (e.g. +) are associative: an expression with multiple instances of the operator has the same value however it is grouped: (a+b)+c = a+(b+c) • Some operators (e.g. -) are not associative: (a-b)-c ≠ a-(b-c) e.g. try a=10, b=8, c=6: (10-8)-6 = -4 but 10-(8-6) = 8
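The arithmetic above is easy to check by grouping the same operands both ways. This is a small sketch using the standard `functools.reduce` and `operator` modules; the helper names `group_left` and `group_right` are invented for the example.

```python
from functools import reduce
import operator

def group_left(op, values):
    """Compute ((a op b) op c) ..."""
    return reduce(op, values)

def group_right(op, values):
    """Compute a op (b op (c ...))"""
    *init, last = values
    # Fold from the right: start with the last value and work backwards.
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)

values = [10, 8, 6]
print(group_left(operator.add, values), group_right(operator.add, values))  # 24 24
print(group_left(operator.sub, values), group_right(operator.sub, values))  # -4 8
```

For + the grouping makes no difference, so a language designer is free to pick either; for - the two groupings disagree, so the grammar has to commit to one.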

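Associativity of Operators • Operator associativity can also be indicated by a grammar • <expr> -> <expr> + <expr> | const (ambiguous) • <expr> -> <expr> + const | const (unambiguous) • [Figure: parse tree for const + const + const under the unambiguous rule. The root <expr> expands to <expr> + const; its left <expr> expands again to <expr> + const; the innermost <expr> expands to const. The tree nests to the left, so + is left associative.]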

Links to BNF-style grammars for actual programming languages Below are some links to grammars for real programming languages. Look at how the grammars are expressed. • http://www.schemers.org/Documents/Standards/R5RS/ • http://www.sics.se/isl/sicstuswww/site/documentation.html In the ones listed below, find the parts of the grammar that deal with operator associativity. • http://java.sun.com/docs/books/jls/index.html • http://www.lykkenborg.no/java/grammar/JLS3.html • http://www.enseignement.polytechnique.fr/profs/informatique/Jean-Jacques.Levy/poly/mainB/node23.html • http://www.lrz-muenchen.de/~bernhard/Pascal-EBNF.html
