180 likes | 395 Vues
Chapter 3 - 2. Construction Techniques. Section 3.3 Grammars. A grammar is a finite set of rules, called productions, that are used to describe the strings of a language.
 
                
                E N D
Chapter 3 - 2 Construction Techniques
Section 3.3 Grammars • A grammar is a finite set of rules, called productions, that are used to describe the strings of a language. • Notational Example. The productions take the form a→ ß, where a and ß are strings over an alphabet of terminals and nonterminals. Read a→ ß as, “a produces ß,” “a derives ß,” or “a is replaced by ß.” The following four expressions are productions for a grammar. S → aSB Alternative Short form S → Λ S → aSB | Λ B → bB B → bB | b B → b.
Terms • Terminals: {a, b}, the alphabet of the language. • Nonterminals: {S, B}, the grammar symbols (uppercase letters), disjoint from terminals. • Start symbol: S, a specified nonterminal alone on the left side of some production. • Sentential form: any string of terminals and/or nonterminals.
Derivation • a transformation of sentential forms by means of productions as follows: If xay is a sentential form and a→ß is a production, then the replacement of a by ß in xay to obtain xßy is a derivation step, which we denote by xay → xßy. • Example Derivation: S ➯ aSB ➯ aaSBB ➯ aaBB ➯ aabBB ➯ aabbB ➯ aabbb. This is a leftmost derivation, where each step replaces the leftmost nonterminal. • The symbol ➯+ means one or more steps and ➯* means zero or more steps. So we could write S ➯+ aabbb or S ➯* aabbb or aSB ➯* aSB, and so on.
The Language of a Grammar • The language of a grammar is the set of terminal strings derived from the start symbol. • Example. Can we find the language of the grammar S → aSB | Λ and B → bB | b? • Solution: Examine some derivations to see if a pattern emerges. S ➯ Λ S ➯ aSB ➯ aB ➯ ab S ➯ aSB ➯ aB ➯ abB ➯ abbB ➯ abbb S ➯ aSB ➯ aaSBB ➯ aaBB ➯ aabB ➯ aabb S ➯ aSB ➯ aaSBB ➯ aaBB ➯ aabBB ➯ aabbBB ➯ aabbbB ➯ aabbbb. So we have a pretty good idea that the language of the grammar is {anbn+k | n, k ∊ N}. • Quiz (1 minute). Describe the language of the grammar S → a | bcS. • Solution: {(bc)na | n ∊ N}.
Construction of Grammars • Example. Find a grammar for {anb | n ∊ N}. • Solution: We need to derive any string of a’s followed by b. The production S → aS can be used to derive strings of a’s. The production S → b will stop the derivation and produce the desired string ending with b. So a grammar for the language is S → aS | b.
Quizzes • Quiz (1 minute). Find a grammar for {ban | n ∊ N}. • Solution: S → Sa | b. • Quiz (1 minute). Find a grammar for {(ab)n | n ∊ N}. • Solution: S → Sab | Λ or S → abS | Λ.
Rules for Combining Grammars • Let L and M be two languages with grammars that have start symbols A and B, respectively, and with disjoint sets of nonterminals. Then the following rules apply. • L ∪ M has a grammar starting with S → A | B. • LM has a grammar starting with S → AB. • L* has a grammar starting with S → AS | Λ.
Example • Find a grammar for {ambmcn | m, n ∊ N}. • Solution: The language is the product LM, where L = {ambm | m ∊ N} and M = {cn | n ∊ N}. So a grammar for LM can be written in terms of grammars for L and M as follows. S → AB A → aAb | Λ B → cB | Λ.
Example • Find a grammar for the set Odd, of odd decimal numerals with no leading zeros, where, for example, 305 ∊ Odd, but 0305 ∉ Odd. • Solution: Notice that Odd can be written in the form Odd = (PD*)*O, where O = {1, 3, 5, 7, 9}, P = {1, 2, 3, 4, 5, 6, 7, 8, 9}, and D = {0} ∪ P. Grammars for O, P, and D can be written with start symbols A, B, and C as: A → 1 | 3 | 5 | 7 | 9, B → A | 2 | 4 | 6 | 8, and C → B | 0. Grammars for D* and PD* and (PD*)* can be written with start symbols E, F, and G as: E → CE | Λ, F → BE, and G → FG | Λ. So a grammar for Odd with start symbol S is S → GA.
Example • Find a grammar for the language L defined inductively by, • Basis: a, b, c ∊ L. • Induction: If x, y ∊ L then ƒ(x), g(x, y) ∊ L. • Solution: We can get some idea about L by listing some of its strings. a, b, c, ƒ(a), ƒ(b), …, g(a, a), …, g(ƒ(a), ƒ(a)), …, ƒ(g(b, c)), …, g(ƒ(a), g(b, ƒ(c))), … So L is the set of all algebraic expressions made up from the letters a, b, c, and the function symbols ƒ and g of arities 1 and 2, respectively. A grammar for L can be written as S → a | b | c | ƒ(S) | g(S, S). For example, a leftmost derivation of g(ƒ(a), g(b, ƒ(c))) can be written as S ➯ g(S, S) ➯ g(ƒ(S), S) ➯ g(ƒ(a), S) ➯ g(ƒ(a), g(S, S)) ➯ g(ƒ(a), g(b, S)) ➯ g(ƒ(a), g(b, ƒ(S))) ➯ g(ƒ(a), g(b, ƒ(c))).
Parse Tree S • A Parse Tree is a tree that represents a derivation. The root is the start symbol and the children of a nonterminal node are the symbols(terminals, nonterminals, or Λ) on the right side of the production used in the derivation step that replaces that node. • Example. The tree shown in the picture is the parse tree for the following derivation: S ➯ g(S, S) ➯ g(ƒ(S), S) ➯ g(ƒ(a), S) ➯ g(ƒ(a), b). g ( , S S ) ( S ) b f a
Ambiguous Grammar • Means there is at least one string with two distinct parse trees, or equivalently, two distinct leftmost derivations or two distinct rightmost derivations. • Example. Is the grammar S → SaS | b ambiguous? • Solution: Yes. For example, the string babab has two distinct leftmost derivations: S ⇒ SaS ⇒ SaSaS ⇒ baSaS ⇒ babaS ⇒ babab. S ⇒ SaS ⇒ baS ⇒ baSaS ⇒ babaS ⇒ babab. The parse trees for the derivations are pictured in the next slide.
parse trees S S S a S S a S S a S b b S a S b b b b
Quiz (2 minutes) • Show that the grammar S → abS | Sab | c is ambiguous. • Solution: The string abcab has two distinct leftmost derivations: S ⇒ abS ⇒ abSab ⇒ abcab S ⇒ Sab ⇒ abSab ⇒ abcab.
Unambiguous Grammar • Sometimes one can find a grammar that is not ambiguous for the language of an ambiguous grammar. • Example. The previous example showed S → SaS | b is ambiguous. The language of the grammar is {b, bab, babab, …}. Another grammar for the language is S → baS | b. It is unambiguous because S produces either baS or b, which can’t derive the same string.
Example • The previous quiz showed S → c | abS | Sab is ambiguous. Its language is {(ab)mc(ab)n | m, n ∈ N}. Another grammar for the language is S → abS | cT and T → abT | ٨. • It is unambiguous because S produces either abS or cT, which can’t derive the same string. • Similarly, T produces either abT or ٨, which can’t derive the same string.