Properties of Context-Free Languages

Juan Carlos Guzmán, CS 6413 Theory of Computation, Southern Polytechnic State University

Summary • Normal Forms • Pumping Lemma • Closure Properties • Decision Properties

Normal Forms • Recall that many different grammars generate the same language • We would like to restrict the form of the productions of the CFG • Chomsky Normal Form • Greibach Normal Form • Tasks to accomplish • Eliminate useless symbols • Eliminate ε-productions • Eliminate unit productions

Grammar Transformations • We are about to present a series of transformations on grammars • You should consider each of them as a “transformation function” $T: \mathit{Grammar} \to \mathit{Grammar}$

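Elimination of Useless Symbols • Let $G=(V,T,P,S)$ • $X \in V$ is useful if there exist $\alpha$, $\beta$, and $w \in T^*$ such that $S \Rightarrow^* \alpha X \beta \Rightarrow^* w$ • Two considerations: • X must generate strings • $X \Rightarrow^* v$ for some $v \in T^*$ • X must be reachable from S • $S \Rightarrow^* \alpha X \beta$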

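Elimination of Non-Generating Symbols ($T_g$) • Let $G=(V,T,P,S)$ be a CFG • $G' = (V' \cup \{S\}, T, P', S)$, where • $V'$ is the smallest set such that $V' = \{ A \mid (A \to \alpha) \in P \wedge \alpha \in (T \cup V')^* \}$ • $P' = \{ (A \to \alpha) \mid (A \to \alpha) \in P \wedge A \in V' \wedge \alpha \in (T \cup V')^* \}$ • $G'$ contains only generating symbols (except possibly $S$, which is kept so that $G'$ still has a start symbol)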
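
A minimal Python sketch of $T_g$, assuming a grammar is stored as a dict from each variable to a list of right-hand-side tuples, where any symbol that is not a dict key is a terminal (this representation and the function names are hypothetical, chosen only for illustration). The loop computes $V'$ as a least fixed point, following the recursive definition above.

```python
# Assumed representation (hypothetical): variable -> list of right-hand sides,
# each RHS a tuple of symbols; symbols that are not dict keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def generating_variables(g: Grammar) -> set[str]:
    """Least fixed point of V' = { A | (A -> alpha) in P, alpha in (T u V')* }."""
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head in gen:
                continue
            # A is generating once some body uses only terminals and known-generating variables
            if any(all(x not in g or x in gen for x in body) for body in bodies):
                gen.add(head)
                changed = True
    return gen


def remove_non_generating(g: Grammar, start: str) -> Grammar:
    """T_g: keep V' u {S} and only the productions whose symbols are all generating."""
    gen = generating_variables(g)
    return {
        head: [body for body in bodies if all(x not in g or x in gen for x in body)]
        for head, bodies in g.items()
        if head in gen or head == start
    }


# The example grammar from the next slide: C is the only non-generating variable.
G = {
    "S": [("a",), ("A",)],
    "A": [("A", "B"), ("B", "C", "A"), ("a",)],
    "B": [("b",)],
    "C": [("A", "C", "A"), ("B", "C", "B")],
}
print(remove_non_generating(G, "S"))
# {'S': [('a',), ('A',)], 'A': [('A', 'B'), ('a',)], 'B': [('b',)]}
```

If $S$ itself is not generating, it survives with no productions, so the resulting grammar generates the empty language.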

Example • G = ({S,A,B,C },{a,b },P ,S ), where • P = { Sa | A, AAB | BCA | a, Bb, CACA | BCB } • V’ = {S,A,B } • G’ = ({S,A,B },{a,b },P’ ,S ), where • P’ = { Sa | A, AAB | a, Bb }

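Elimination of Non-Reachable Symbols ($T_r$) • Let $G=(V,T,P,S)$ be a CFG • $G' = (V',T,P',S)$, where • $V'$ is the smallest set such that $V' = \{S\} \cup \{ B \mid (A \to \alpha B \beta) \in P \wedge A \in V' \}$ • $P' = \{ A \to \alpha \mid (A \to \alpha) \in P \wedge A \in V' \}$ • $G'$ contains only reachable symbols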
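
A minimal sketch of $T_r$, assuming the same hypothetical dict-of-tuples grammar representation (variables are the dict keys). A worklist search from $S$ computes the smallest $V'$ described above.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def reachable_variables(g: Grammar, start: str) -> set[str]:
    """Smallest V' with S in V', and B in V' whenever (A -> alpha B beta) in P and A in V'."""
    reach = {start}
    work = [start]
    while work:
        head = work.pop()
        for body in g.get(head, []):
            for x in body:
                if x in g and x not in reach:  # x is a variable not reached before
                    reach.add(x)
                    work.append(x)
    return reach


def remove_non_reachable(g: Grammar, start: str) -> Grammar:
    """T_r: keep only the productions whose head is reachable from the start symbol."""
    reach = reachable_variables(g, start)
    return {head: list(bodies) for head, bodies in g.items() if head in reach}


# The example grammar from the next slide: C is not reachable from S.
G = {
    "S": [("a",), ("A",)],
    "A": [("A", "B"), ("a",)],
    "B": [("b",)],
    "C": [("A", "C", "A"), ("B", "C", "B")],
}
print(sorted(reachable_variables(G, "S")))  # ['A', 'B', 'S']
```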

Example • G = ({S,A,B,C },{a,b },P ,S ), where • P = { Sa | A, AAB | a, Bb, CACA | BCB } • V’ = {S,A,B } • G’ = ({S,A,B },{a,b },P’ ,S ), where • P’ = { Sa | A, AAB | a, Bb }

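Useful Symbols • Remove, in this order, • non-generating symbols • non-reachable symbols • The order matters: removing non-generating symbols can leave other symbols unreachable, but removing unreachable symbols never makes a remaining symbol non-generating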

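Elimination of ε-Productions ($T_\varepsilon$) • Let $G=(V,T,P,S)$ be a CFG • $V_\varepsilon$ (the nullable variables) is the smallest set such that $V_\varepsilon = \{ A \mid (A \to \alpha) \in P \wedge \alpha \in V_\varepsilon^* \}$ • $G' = (V,T,P',S)$, where • $P' = \{ A \to \alpha_0 X_1 \alpha_1 \ldots X_k \alpha_k \mid (A \to \alpha_0 B_1 \alpha_1 \ldots B_k \alpha_k) \in P \wedge \forall\, 1 \le i \le k:\ B_i \in V_\varepsilon \wedge X_i \in \{\varepsilon, B_i\} \wedge \forall\, 0 \le i \le k:\ \alpha_i \in (T \cup (V - V_\varepsilon))^* \wedge |\alpha_0 X_1 \alpha_1 \ldots X_k \alpha_k| > 0 \}$ • $G'$ does not contain ε-productions and generates $L(G) - \{\varepsilon\}$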
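
A minimal sketch of $T_\varepsilon$, assuming the same hypothetical dict-of-tuples representation, with an ε-production written as an empty tuple. Each nullable occurrence $B_i$ is independently kept or dropped (the choice $X_i \in \{\varepsilon, B_i\}$), and bodies that become empty are discarded.

```python
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(g: Grammar) -> set[str]:
    """Least fixed point of V_eps = { A | (A -> alpha) in P, alpha in V_eps* }."""
    null: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            # an empty body () satisfies all(...) trivially, so A -> eps makes A nullable
            if head not in null and any(all(x in null for x in body) for body in bodies):
                null.add(head)
                changed = True
    return null


def remove_epsilon(g: Grammar) -> Grammar:
    """T_eps: each nullable occurrence B_i becomes X_i in {eps, B_i}; empty bodies are dropped."""
    null = nullable_variables(g)
    result: Grammar = {}
    for head, bodies in g.items():
        new_bodies: list[tuple[str, ...]] = []
        for body in bodies:
            # per symbol: nullable -> keep it or drop it; anything else -> keep it
            options = [((x,), ()) if x in null else ((x,),) for x in body]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                if candidate and candidate not in new_bodies:  # |alpha_0 X_1 ... X_k alpha_k| > 0
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# The example from the next slide: S -> aSbS | bSaS | eps
G = {"S": [("a", "S", "b", "S"), ("b", "S", "a", "S"), ()]}
for body in remove_epsilon(G)["S"]:
    print("".join(body))  # aSbS, aSb, abS, ab, bSaS, bSa, baS, ba (one per line)
```

A body with $k$ nullable occurrences expands into up to $2^k$ bodies, which is why ε-elimination can blow up the grammar size.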

Example • G = ({S },{a,b },P ,S ), where • P = { SaSbS | bSaS | ε } • Vε = {S } • G’ = ({S },{a,b },P’ ,S ), where • P’ = {SaSbS | aSb | abS | ab | bSaS | bSa | baS | ba } • Note that G’ does not generate ε

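Elimination of Unit Productions ($T_u$) • Let $G=(V,T,P,S)$ be a CFG • Let $U_p$ be the smallest set such that $U_p = \{ (A,A) \mid A \in V \} \cup \{ (A,C) \mid (A,B) \in U_p \wedge (B \to C) \in P \wedge C \in V \}$ • $G' = (V,T,P',S)$, where • $P' = \{ A \to \alpha \mid (A,B) \in U_p \wedge (B \to \alpha) \in P \wedge \alpha \notin V \}$ • $G'$ does not contain unit productions and generates $L(G)$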
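
A minimal sketch of $T_u$, assuming the same hypothetical dict-of-tuples representation. `unit_pairs` computes $U_p$ as a fixed point; `remove_unit` then copies every non-unit body of $B$ to $A$ for each pair $(A,B)$.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def is_unit(g: Grammar, body: tuple[str, ...]) -> bool:
    """A unit body is a single variable."""
    return len(body) == 1 and body[0] in g


def unit_pairs(g: Grammar) -> set[tuple[str, str]]:
    """Least fixed point: (A, A) for every A, plus (A, C) when (A, B) in U_p and B -> C in P."""
    pairs = {(a, a) for a in g}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):  # iterate over a snapshot while the set grows
            for body in g[b]:
                if is_unit(g, body) and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True
    return pairs


def remove_unit(g: Grammar) -> Grammar:
    """T_u: A -> alpha for every (A, B) in U_p and every non-unit B -> alpha in P."""
    result: Grammar = {a: [] for a in g}
    for a, b in sorted(unit_pairs(g)):
        for body in g[b]:
            if not is_unit(g, body) and body not in result[a]:
                result[a].append(body)
    return result


# The expression grammar from the next slide
G = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("a",), ("(", "E", ")")],
}
for head, bodies in remove_unit(G).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

The alternatives may be printed in a different order than on the slide, since this sketch visits the pairs in sorted order; the sets of productions are the same.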

Example • $G = (\{E,T,F\},\{+,*,(,),a\},P,E)$, where • $P = \{ E \to E+T \mid T,\ T \to T*F \mid F,\ F \to a \mid (E) \}$ • $U_p = \{(E,E),(E,T),(E,F),(T,T),(T,F),(F,F)\}$ • $G' = (\{E,T,F\},\{+,*,(,),a\},P',E)$, where • $P' = \{ E \to E+T \mid T*F \mid a \mid (E),\ T \to T*F \mid a \mid (E),\ F \to a \mid (E) \}$

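Summary of Transformations • Given a CFG G, we can obtain a new grammar G’ with • no ε-productions • no unit productions • no useless symbols • by applying $T_\varepsilon$ first, then $T_u$, then $T_g$, and finally $T_r$, i.e., by computing $(T_r \circ T_g \circ T_u \circ T_\varepsilon)(G)$ • $G'$ generates $L(G) - \{\varepsilon\}$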

Results of the Transformations • After the transformations • the grammars do not have useless symbols (or the productions that use them) • their productions $A \to \alpha$ are neither • ε-productions, nor • unit productions • Therefore, $\alpha$ must satisfy • $|\alpha| > 1$, or • $\alpha \in T$

Implications for Transformed Grammars • Transformed grammars have some nice properties • No unit productions • No ε-productions • However, they can still produce “bushy” parse trees, in which a node may have arbitrarily many children

Chomsky Normal Form • Any CFG whose language does not contain ε can be transformed so that each of its productions has one of the forms • $A \to BC$, where $A,B,C \in V$ • $A \to a$, where $A \in V$ and $a \in T$ • The idea behind CNF is to obtain grammars whose parse trees are binary trees

Chomsky Normal Form • Productions of grammars not yet in CNF, but already transformed, have one of the following forms • $A \to X_1 \ldots X_k$, with $k > 1$ and all $X_i \in T \cup V$, or • $A \to a$, with $a \in T$ • We need to further transform the first kind of productions so that • the right-hand side consists only of variables, and • long right-hand sides are broken into chains of productions

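Chomsky Normal Form • Transformations • For every terminal $a$ that appears on a right-hand side of length 2 or more • Create a new variable $A_a$ and a production $A_a \to a$ • Replace $a$ with $A_a$ in all such productions • Replace every production $A \to B_1 \ldots B_k$ ($k > 2$) with • $A \to B_1 C_1$ • $C_1 \to B_2 C_2$ • … • $C_{k-2} \to B_{k-1} B_k$ • where $C_1, \ldots, C_{k-2}$ are new variables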
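
A minimal sketch of both CNF steps, assuming the same hypothetical dict-of-tuples representation and an input that has already been cleaned of ε-productions, unit productions, and useless symbols. The names `X_a` (for $A_a$) and `C1`, `C2`, … are hypothetical and are assumed not to clash with existing variables.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def to_cnf(g: Grammar) -> Grammar:
    """Assumes g has no eps-productions, unit productions, or useless symbols.
    New variable names "X_<a>" and "C1", "C2", ... are hypothetical and assumed unused in g."""
    result: Grammar = {head: [] for head in g}
    term_var: dict[str, str] = {}  # terminal a -> its new variable A_a
    counter = 0

    def var_for(t: str) -> str:
        # Create A_a -> a the first time terminal a appears in a long body
        if t not in term_var:
            term_var[t] = f"X_{t}"
            result[term_var[t]] = [(t,)]
        return term_var[t]

    for head, bodies in g.items():
        for body in bodies:
            if len(body) == 1:  # already of the form A -> a
                result[head].append(body)
                continue
            # Step 1: replace terminals on bodies of length >= 2 by their variables
            syms = [x if x in g else var_for(x) for x in body]
            # Step 2: A -> B1 C1, C1 -> B2 C2, ..., C_{k-2} -> B_{k-1} B_k
            current = head
            while len(syms) > 2:
                counter += 1
                fresh = f"C{counter}"
                result[fresh] = []
                result[current].append((syms[0], fresh))
                current, syms = fresh, syms[1:]
            result[current].append(tuple(syms))
    return result


# S -> aSb | ab generates { a^k b^k | k >= 1 }
print(to_cnf({"S": [("a", "S", "b"), ("a", "b")]}))
# {'S': [('X_a', 'C1'), ('X_a', 'X_b')], 'X_a': [('a',)], 'X_b': [('b',)], 'C1': [('S', 'X_b')]}
```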

Greibach Normal Form • All productions must be of the form • $A \to a B_1 \ldots B_k$, with $k \ge 0$ • Note that each derivation step generates exactly one terminal • This translates nicely to PDA’s, where each move of the automaton is guided by reading one input character • To convert to GNF • Order the variables ($A_1, \ldots, A_n$) • Modify the production set so that • $A_i \to A_j \alpha$ implies that $i \le j$ • remove left recursion, i.e., $A_i \to A_j \alpha$ implies that $i < j$ • substitute back so that every production has the form $A_i \to a \alpha$, with $a \in T$ and $\alpha \in V^*$ • The algorithm resembles matrix triangularization • It appears in the 1st edition of our book

Relation Between Height and Yield of a CNF Parse Tree • Note that the nodes of CNF parse trees are • binary nodes, for productions $A \to BC$ • nodes with a single terminal child, for productions $A \to a$ • The yield of a complete CNF parse tree of height $n$ has length at most $2^{n-1}$ • [Figure: a parse tree rooted at S; its variable nodes span height $n-1$, and the terminal leaves $a_1 a_2 a_3 \ldots a_t$ at height $n$ number at most $2^{n-1}$]
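
The bound follows by induction on the height, assuming (as the figure suggests) that height counts the edges on a longest root-to-leaf path, and taking the hypothesis for every height below $n$:

```latex
\begin{aligned}
n = 1:&\quad \text{the root uses } A \to a, \text{ so } |\mathrm{yield}| = 1 = 2^{0} \\
n > 1:&\quad \text{the root uses } A \to BC; \text{ both subtrees have height at most } n-1, \\
      &\quad \text{so } |\mathrm{yield}| \le 2^{n-2} + 2^{n-2} = 2^{n-1}
\end{aligned}
```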

Pumping Lemma • Let L be a context-free language. Then there exists a constant n (which depends on L) such that for every string z in L with $|z| \ge n$, we can break z into five strings, $z = uvwxy$, such that: • $|vwx| \le n$ • $vx \ne \varepsilon$ • For all $i \ge 0$, the string $uv^iwx^iy$ is also in L
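
The lemma can be explored by brute force for one fixed candidate constant n. This sketch (function names hypothetical) tries every split $z = uvwxy$ with $|vwx| \le n$ and $vx \ne \varepsilon$, but checks only $i \in \{0, 1, 2\}$ as a finite stand-in for all $i \ge 0$: a False result is conclusive for that z and n, while a True result only means no counterexample was found among those i. It contrasts $\{a^kb^k\}$, which is context-free, with $\{a^kb^kc^k\}$, the standard example of a language that is not.

```python
from typing import Callable


def in_anbn(s: str) -> bool:
    """Membership in { a^k b^k | k >= 0 } (context-free)."""
    k = len(s) // 2
    return s == "a" * k + "b" * k


def in_anbncn(s: str) -> bool:
    """Membership in { a^k b^k c^k | k >= 0 } (not context-free)."""
    k = len(s) // 3
    return s == "a" * k + "b" * k + "c" * k


def pumpable(z: str, n: int, member: Callable[[str], bool]) -> bool:
    """Is there a split z = uvwxy, |vwx| <= n, vx != '', with u v^i w x^i y in L for i = 0, 1, 2?"""
    for start in range(len(z)):                                  # vwx begins at z[start]
        for end in range(start + 1, min(start + n, len(z)) + 1):  # vwx = z[start:end]
            for j in range(start, end + 1):                      # v = z[start:j]
                for k in range(j, end + 1):                      # w = z[j:k], x = z[k:end]
                    u, v, w, x, y = z[:start], z[start:j], z[j:k], z[k:end], z[end:]
                    if not v and not x:
                        continue
                    if all(member(u + v * i + w + x * i + y) for i in range(3)):
                        return True
    return False


for n in range(2, 5):
    print(n, pumpable("a" * n + "b" * n, n, in_anbn),
          pumpable("a" * n + "b" * n + "c" * n, n, in_anbncn))
# prints: 2 True False / 3 True False / 4 True False (one line each)
```

For $a^nb^nc^n$ no split survives, because a window of length at most n cannot touch both the a's and the c's, so pumping down leaves one letter count unchanged.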

Pumping Lemma • In plain words • For any context-free language • Every sufficiently long word contains a substring • anywhere in the word • not null, and not too long • That substring can itself be broken into three pieces vwx • v not null or x not null • v and x can be “pumped” (together) over and over again • The new words are guaranteed to be in the language • How long the words must be in order to be considered “long” depends on the actual language

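Pumping Lemma – Proof • Find a CNF grammar for the language • The length of the word forces a minimum height: a yield longer than $2^{h-1}$ requires a parse tree of height greater than $h$ • [Figure: a longest root-to-leaf path $A_0, A_1, A_2, \ldots, A_k, a$ in the parse tree]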

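Pumping Lemma – Proof • Find a CNF grammar for the language • For large words, a variable must be repeated: if the grammar has m variables and $n = 2^m$, then $|z| \ge n$ forces a path with more than m variables, so by the pigeonhole principle some variable appears twice • [Figure: parse tree rooted at S with $A_i = A_j$, $i < j$, on one path; the subtree at $A_j$ yields w, the subtree at $A_i$ yields vwx, and the whole tree yields uvwxy]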

Related Strings • The strings • $uwy$ • $uvvwxxy$ • $uv^iwx^iy$, for any $i \ge 0$ • are also in the language

How about ε? • If the language contains ε • The transformations remove ε from the grammar • Therefore the transformed grammar generates a different language, $L(G) - \{\varepsilon\}$!!! • A CNF grammar cannot generate ε • If a language contains ε • A new grammar can be given, which generates the same language • ε is generated in a single derivation step (e.g., by $S_0 \to \varepsilon$ for a new start symbol $S_0$ that appears on no right-hand side) • All other productions comply with CNF

Closure Properties • Context-free languages are closed under • Substitution • Regular Operators • Homomorphism • Reversal • Intersection with regular language • Inverse homomorphism

Substitution • A substitution is an operation that replaces each character with strings • These strings are drawn from a language associated with that character

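Substitution – Formally • Let Σ be an alphabet • For each $a \in \Sigma$, let $L_a$ be a language associated with $a$ • $s(a) = L_a$ • $s(\varepsilon) = \{\varepsilon\}$ • $s(a_1 a_2 \ldots a_n) = s(a_1)s(a_2)\ldots s(a_n) = L_{a_1} L_{a_2} \ldots L_{a_n}$ • $s(L) = \bigcup_{w \in L} s(w)$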

Substitution • CFL’s are closed under substitution with CFL’s • Let $G = (V,\Sigma,P,S)$, such that $L(G) = L$ • Let $G_a = (V_a,T_a,P_a,S_a)$, such that $L(G_a) = L_a$, with all the variable sets pairwise disjoint (rename if necessary) • Let $G' = (V',T',P',S)$ where • $V' = V \cup \bigcup_{a \in \Sigma} V_a$ • $T' = \bigcup_{a \in \Sigma} T_a$ • $P' = \bigcup_{a \in \Sigma} P_a \cup P''$, where • $P''$ consists of all productions of P, with each terminal a replaced by the corresponding $S_a$ • G’ generates $s(L)$

Example • G = ({S},{0,1},P,S), where • P= {S  SS| 0S1 | ε} • L0 = {(} • L1 ={)} • Or • L0 =0* • L1=1*

Closure Under Regular Operators • CFL’s are closed under • Union • Concatenation • Closure (*), and positive closure(+)
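
Each regular operation can be realized with one fresh start symbol. A minimal sketch, assuming the hypothetical dict-of-tuples representation, grammars with disjoint variable sets, and that the default name `"S_new"` (also hypothetical) is not already in use:

```python
Grammar = dict[str, list[tuple[str, ...]]]


def union(g1: Grammar, s1: str, g2: Grammar, s2: str, s: str = "S_new") -> Grammar:
    """L(G1) u L(G2): S -> S1 | S2."""
    return {**g1, **g2, s: [(s1,), (s2,)]}


def concat(g1: Grammar, s1: str, g2: Grammar, s2: str, s: str = "S_new") -> Grammar:
    """L(G1) L(G2): S -> S1 S2."""
    return {**g1, **g2, s: [(s1, s2)]}


def star(g1: Grammar, s1: str, s: str = "S_new") -> Grammar:
    """L(G1)*: S -> S1 S | eps."""
    return {**g1, s: [(s1, s), ()]}


def plus(g1: Grammar, s1: str, s: str = "S_new") -> Grammar:
    """L(G1)+: S -> S1 S | S1."""
    return {**g1, s: [(s1, s), (s1,)]}


A = {"A": [("a", "A", "b"), ()]}  # a^k b^k
B = {"B": [("c", "B"), ()]}       # c*
print(concat(A, "A", B, "B"))     # adds S_new -> A B, generating a^k b^k c^m
```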

Closure Under Homomorphism • CFL’s are closed under homomorphism • This is a special case of substitution • Substitution with a single string

Reversal • CFL’s are closed under reversal • Just reverse the right-hand side of every production
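
A minimal sketch, assuming the hypothetical dict-of-tuples representation: every body $X_1 \ldots X_k$ becomes $X_k \ldots X_1$.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def reverse_grammar(g: Grammar) -> Grammar:
    """L(G)^R: replace every production A -> X1 ... Xk by A -> Xk ... X1."""
    return {head: [tuple(reversed(body)) for body in bodies] for head, bodies in g.items()}


G = {"S": [("a", "S", "b", "b"), ()]}  # S -> aSbb | eps generates a^k b^(2k)
print(reverse_grammar(G))  # {'S': [('b', 'b', 'S', 'a'), ()]}, generating b^(2k) a^k
```

An induction on the length of derivations shows that $A \Rightarrow^* w$ in the original grammar exactly when $A \Rightarrow^* w^R$ in the reversed one.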

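Intersection with a Regular Language • CFL’s are not closed under intersection • e.g., $\{a^nb^nc^m\}$ and $\{a^mb^nc^n\}$ are context-free, but their intersection $\{a^nb^nc^n\}$ is not • They are closed under intersection with a regular language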

Inverse Homomorphism • CFL’s are closed under inverse homomorphism

Decision Properties of CFL’s • Complexity of transforming grammars to PDA’s, and within PDA’s • Complexity of transformation to CNF • Testing Emptiness of CFL’s • Testing Membership in a CFL
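
The slide lists these tests without naming algorithms. A minimal sketch, assuming the hypothetical dict-of-tuples grammar representation and, for membership, a grammar already in CNF: emptiness reduces to asking whether the start symbol is generating, and membership uses the CYK (Cocke–Younger–Kasami) dynamic program, one standard choice that runs in $O(|w|^3)$ time for a fixed grammar.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def is_empty(g: Grammar, start: str) -> bool:
    """L(G) is empty iff the start symbol is not generating."""
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in gen and any(all(x not in g or x in gen for x in b) for b in bodies):
                gen.add(head)
                changed = True
    return start not in gen


def cyk(g: Grammar, start: str, w: str) -> bool:
    """Membership of w in L(G) for a grammar in CNF (A -> BC or A -> a)."""
    n = len(w)
    if n == 0:
        return False  # a CNF grammar does not generate eps
    # table[i][l] = set of variables that derive the substring w[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        for head, bodies in g.items():
            if (ch,) in bodies:
                table[i][0].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):  # first part has `split` characters
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, bodies in g.items():
                    for b in bodies:
                        if len(b) == 2 and b[0] in left and b[1] in right:
                            table[i][length - 1].add(head)
    return start in table[0][n - 1]


# CNF grammar for { a^k b^k | k >= 1 }: S -> AC | AB, C -> SB, A -> a, B -> b
G = {"S": [("A", "C"), ("A", "B")], "C": [("S", "B")], "A": [("a",)], "B": [("b",)]}
print(is_empty(G, "S"), cyk(G, "S", "aabb"), cyk(G, "S", "aab"))  # False True False
```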

Undecidable Problems • Is a given CFG G ambiguous? • Is a given CFL L inherently ambiguous? • Is the intersection of two CFL’s empty? • Are two CFL’s the same? • Is a given CFL equal to Σ*, where Σ is the alphabet of the language?