Languages That Are and Are Not Context-Free

Languages That Are and Are Not Context-Free Section 3.5 Wed, Oct 26, 2005

Regular vs. Context-Free • Theorem: Every regular language is context-free. • Proof: • Let L be regular. • Given a DFA for L, add a stack, but do not use the stack. • That is, change each DFA transition (p, a, q) to a PDA transition ((p, a, e), (q, e)). • The result is a PDA whose language is L. • Therefore, L is context-free.
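The construction in this proof can be sketched in Python. This is a minimal illustration, not a full PDA implementation: the tuple encodings, the helper names dfa_to_pda and run_stackless_pda, and the example DFA are all assumptions made for this sketch, with "" standing for e.

```python
def dfa_to_pda(dfa_transitions):
    """Rewrite each DFA move (p, a, q) as a PDA move that ignores the stack."""
    return [((p, a, ""), (q, "")) for (p, a, q) in dfa_transitions]


def run_stackless_pda(transitions, start, finals, word):
    """Run a PDA whose moves all pop "" and push "" (the stack stays empty)."""
    state = start
    for symbol in word:
        targets = [q for ((p, a, pop), (q, push)) in transitions
                   if p == state and a == symbol]
        if not targets:
            return False      # no move available: reject
        state = targets[0]    # the source DFA has one move per (state, symbol)
    return state in finals


# Hypothetical example DFA over {a, b}: accepts words with an even number of a's.
dfa = [("even", "a", "odd"), ("odd", "a", "even"),
       ("even", "b", "even"), ("odd", "b", "odd")]
pda = dfa_to_pda(dfa)

print(pda[0])                                            # first converted move
print(run_stackless_pda(pda, "even", {"even"}, "abab"))  # word with two a's
print(run_stackless_pda(pda, "even", {"even"}, "ab"))    # word with one a
```

Because no move ever pops or pushes, the PDA passes through exactly the states the DFA would, so it accepts the same words.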

Closure under Union • Theorem: Let L1 and L2 be CFLs. Then L1 ∪ L2 is also a CFL. • Proof: • Let L1 have grammar (V1, Σ1, R1, S1) and let L2 have grammar (V2, Σ2, R2, S2). • Then L1 ∪ L2 has the grammar (V, Σ, R, S) where • Σ = Σ1 ∪ Σ2 • V = V1 ∪ V2 ∪ {S} • S is the new start symbol • R = R1 ∪ R2 ∪ {S → S1 | S2}.

Proof, continued • Therefore, L1 ∪ L2 is a CFL. • We must assume in the proof that (V1 – Σ1) ∩ (V2 – Σ2) = ∅. • Why?
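One way to explore the question above is to build the union grammar for two tiny grammars, once with a shared nonterminal and once without. The sketch below is only an illustration: the dictionary representation (nonterminal mapped to a list of right-hand sides), the helper names union_rules and language, and the example grammars are assumptions, and the bounded search is exhaustive only for grammars without e-rules.

```python
from collections import deque


def union_rules(rules1, start1, rules2, start2, new_start="S"):
    """Build R1 ∪ R2 plus the new rules S -> S1 | S2."""
    merged = {}
    for rules in (rules1, rules2):
        for lhs, rhss in rules.items():
            merged.setdefault(lhs, []).extend(rhss)
    if new_start in merged:
        raise ValueError("the new start symbol must not already be in use")
    merged[new_start] = [(start1,), (start2,)]
    return merged


def language(rules, start, max_len):
    """Strings of length <= max_len derivable from start (leftmost search).

    Sentential forms longer than max_len are dropped. Without e-rules a form
    never shrinks, so nothing of length <= max_len is missed.
    """
    words, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:                 # no nonterminal left: a finished word
            words.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


g1 = {"S1": [("a", "A")], "A": [("a",)]}        # L1 = {aa}
g2_clash = {"S2": [("b", "A")], "A": [("b",)]}  # L2 = {bb}, reuses the name A
g2_ok = {"S2": [("b", "B")], "B": [("b",)]}     # L2 = {bb}, disjoint names

print(sorted(language(union_rules(g1, "S1", g2_clash, "S2"), "S", 2)))
print(sorted(language(union_rules(g1, "S1", g2_ok, "S2"), "S", 2)))
```

With the shared name A, the merged rules let A produce either a or b, so the first grammar also derives ab and ba, which belong to neither language. Renaming the nonterminals of one grammar restores the intended union.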

Closure under Concatenation • Theorem: Let L1 and L2 be CFLs. Then L1L2 is also a CFL. • Proof: • Let L1 have grammar (V1, Σ1, R1, S1) and let L2 have grammar (V2, Σ2, R2, S2). • Then L1L2 has the grammar (V, Σ, R, S) where • Σ = Σ1 ∪ Σ2 • V = V1 ∪ V2 ∪ {S} • S is the new start symbol • R = R1 ∪ R2 ∪ {S → S1S2}.

Proof, continued • Therefore, L1L2 is a CFL. • Again, we must assume that (V1 – Σ1) ∩ (V2 – Σ2) = ∅.

Closure under Kleene Star • Theorem: Let L be a CFL. Then L* is also a CFL. • Proof: • Let L have grammar (V, Σ, R, S). • Then L* has the grammar (V ∪ {S'}, Σ, R', S') where • S' is a new start symbol • R' = R ∪ {S' → e | S'S}. • Therefore, L* is a CFL.
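A minimal Python sketch of the star construction. It assumes a fresh start symbol (called S' here, a name chosen for illustration) with rules S' → e | S'S, so that the new rules cannot be reached from recursive uses of S inside R. The dictionary representation and the helper names star_rules, derive and in_L_star are assumptions of this sketch. The last lines illustrate what can go wrong if S → e | SS is added directly to a grammar whose start symbol already occurs on a right-hand side.

```python
def star_rules(rules, start, new_start="S'"):
    """Keep R and add S' -> e | S' S for a fresh start symbol S'."""
    if new_start in rules:
        raise ValueError("the new start symbol must be fresh")
    starred = {lhs: list(rhss) for lhs, rhss in rules.items()}
    starred[new_start] = [(), (new_start, start)]   # () encodes e
    return starred


def derive(rules, start, choices):
    """Leftmost derivation; choices[i] is the index of the rule used at step i."""
    form = (start,)
    for choice in choices:
        idx = next(i for i, s in enumerate(form) if s in rules)
        form = form[:idx] + rules[form[idx]][choice] + form[idx + 1:]
    return "".join(form)


def in_L_star(word):
    """Membership in {a^n b^n | n >= 1}*, peeling off one block a^i b^i at a time."""
    pos = 0
    while pos < len(word):
        i = 0
        while pos + i < len(word) and word[pos + i] == "a":
            i += 1
        if i == 0 or word[pos + i: pos + 2 * i] != "b" * i:
            return False
        pos += 2 * i
    return True


# Hypothetical example: L = {a^n b^n | n >= 1}, with a recursive start symbol.
rules = {"S": [("a", "S", "b"), ("a", "b")]}

starred = star_rules(rules, "S")
good = derive(starred, "S'", [1, 1, 0, 1, 0, 1])   # S' => S'S => S'SS => SS => ...
print(good, in_L_star(good))

# Adding S -> e | SS without a fresh start symbol lets SS appear inside a S b.
naive = {"S": [("a", "S", "b"), ("a", "b"), (), ("S", "S")]}
bad = derive(naive, "S", [0, 3, 1, 1])             # S => aSb => aSSb => ...
print(bad, in_L_star(bad))
```

The second derivation produces a string that is not a concatenation of blocks a^i b^i, which is why the sketch keeps the star rules on a separate start symbol.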

Intersection of a Regular Language and a CFL. • Theorem: The intersection of a CFL and a regular language is a CFL. • Proof (outline): • Use the cross product to construct the intersection of the PDA and the DFA. • Only one component uses the stack. • Therefore, there is no complication. • The cross product will function as a PDA.

Intersection of a Regular Language and a CFL. • More specifically, the transitions (p, a) → q from the DFA and (p', a, β) → (q', γ) from the PDA may be combined into ((p, p'), a, β) → ((q, q'), γ) for the new PDA.
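The combination of transitions can be sketched as follows. This is an illustration under stated assumptions: the encodings of the DFA (a dictionary) and the PDA (a list of moves), the helper names product_transitions and accepts, and the example machines are all hypothetical. The sketch also assumes that a PDA move reading no input leaves the DFA component unchanged, and that acceptance means final state with empty stack and all input consumed.

```python
def product_transitions(dfa, pda):
    """Pair DFA moves (p, a) -> q with PDA moves ((p2, a, beta), (q2, gamma))."""
    dfa_states = {p for (p, _) in dfa} | set(dfa.values())
    combined = []
    for (p2, a, beta), (q2, gamma) in pda:
        if a == "":
            # Assumption: a move that reads no input keeps the DFA state.
            for p in dfa_states:
                combined.append((((p, p2), "", beta), ((p, q2), gamma)))
        else:
            for (p, b), q in dfa.items():
                if a == b:
                    combined.append((((p, p2), a, beta), ((q, q2), gamma)))
    return combined


def accepts(transitions, start, finals, word):
    """Nondeterministic search over (state, input position, stack) triples."""
    todo, seen = [(start, 0, "")], set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(word) and stack == "" and state in finals:
            return True
        for (p, a, beta), (q, gamma) in transitions:
            if p != state or not stack.startswith(beta):
                continue
            if a == "":
                nxt = pos
            elif word[pos:pos + 1] == a:
                nxt = pos + 1
            else:
                continue
            todo.append((q, nxt, gamma + stack[len(beta):]))  # stack top is on the left
    return False


# Hypothetical PDA for {a^n b^n | n >= 0}: push A per a, then pop A per b.
pda = [(("s", "a", ""), ("s", "A")),
       (("s", "", ""), ("f", "")),
       (("f", "b", "A"), ("f", ""))]
# Hypothetical DFA: even number of a's.
dfa = {("even", "a"): "odd", ("odd", "a"): "even",
       ("even", "b"): "even", ("odd", "b"): "odd"}

both = product_transitions(dfa, pda)
start, finals = ("even", "s"), {("even", "f")}
print(accepts(both, start, finals, "aabb"))   # n = 2
print(accepts(both, start, finals, "ab"))     # n = 1
```

Only the PDA component touches the stack, so the combined machine is again a PDA. In this example it accepts a^n b^n exactly when n is even.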

Complementation and Intersection • The complement of a context-free language is not necessarily context-free. • The intersection of two context-free languages is not necessarily context-free. • Counterexamples will be given later.

The Concept behind the Pumping Lemma for CFLs • The Pumping Lemma for CFLs will allow us to show that some languages are not context-free. • If a CFL contains a word w with a sufficiently long derivation S ⇒* w, then some nonterminal A must appear more than once along a single path of the parse tree. • This is the Pigeonhole Principle.

The Concept behind the Pumping Lemma for CFLs • That is, we have S ⇒* uAz ⇒* uvAyz ⇒* uvxyz. • Thus, A ⇒* vAy and A ⇒* x. • We may repeat the derivation A ⇒* vAy as many times as we like (including zero times), producing strings uv^n xy^n z, for any n ≥ 0.
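The repetition can be tried on a concrete case. The sketch below assumes a hypothetical grammar S → aSb | ab for {a^n b^n | n ≥ 1}, in which the repeated nonterminal A is S itself; the function names and the chosen decomposition are illustrative only.

```python
def pump(u, v, x, y, z, n):
    """Return u v^n x y^n z."""
    return u + v * n + x + y * n + z


def in_anbn(s):
    """Membership in {a^n b^n | n >= 1}."""
    half = len(s) // 2
    return half >= 1 and s == "a" * half + "b" * half


# Derivation S => aSb => aaSbb => aaabbb, read as S =>* uAz =>* uvAyz =>* uvxyz
# with A = S, u = "a", v = "a", x = "ab", y = "b", z = "b".
u, v, x, y, z = "a", "a", "ab", "b", "b"

for n in range(4):
    word = pump(u, v, x, y, z, n)
    print(n, word, in_anbn(word))
```

Taking n = 0 skips the step A ⇒* vAy entirely, and n = 1 gives back the original word aaabbb.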

The Length of a Path in a Parse Tree • In a parse tree T, define a path to be • empty, or • a sequence of nodes, starting at a node in the tree and ending at one of its descendants, and including every node in between, each one a child of the node before it. • The length of a path is • 0, if the path is empty, or • 1 less than the number of nodes in the path.

Height and Fanout • The height of a parse tree is the length of the tree’s longest path. • Given a grammar G, the fanout of G, denoted φ(G), is the largest number of symbols on the right side of any rule in G.

A Lemma for the Lemma • Lemma: Let G be a CFG. The yield of any parse tree of G of height h has length no greater than φ(G)^h. • Proof: • The longest possible string is obtained if we always use a grammar rule with the maximum number of symbols on the right-hand side. • Therefore, if we apply grammar rules to each nonterminal in the string at most h times, then the length of the resulting string is at most φ(G)^h.
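The bound in the lemma can be checked numerically on a small grammar. The sketch below is an illustration only: the dictionary representation, the function names fanout and max_yield, and the example grammar are assumptions. It computes the longest yield over all parse trees of height at most h and compares it with the fanout raised to the power h.

```python
def fanout(rules):
    """Largest number of symbols on the right side of any rule."""
    return max(len(rhs) for rhss in rules.values() for rhs in rhss)


def max_yield(rules, symbol, height):
    """Longest yield of a parse tree rooted at symbol with height <= height.

    Returns None when no complete parse tree of that height exists.
    """
    if symbol not in rules:
        return 1                       # a terminal leaf contributes one symbol
    if height == 0:
        return None                    # a nonterminal still needs a rule applied
    best = None
    for rhs in rules[symbol]:
        parts = [max_yield(rules, s, height - 1) for s in rhs]
        if None in parts:
            continue
        total = sum(parts)
        if best is None or total > best:
            best = total
    return best


# Hypothetical example grammar: S -> SS | aSb | ab
rules = {"S": [("S", "S"), ("a", "S", "b"), ("a", "b")]}
phi = fanout(rules)

for h in range(1, 6):
    print(h, max_yield(rules, "S", h), phi ** h)
```

For this grammar the longest yield doubles with each extra level, while the bound triples, so the inequality of the lemma holds with room to spare.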

The Pumping Lemma for CFLs • The Pumping Lemma for CFLs: Let G = (V, Σ, R, S) be a context-free grammar. Then any string w ∈ L(G) with length at least n = φ(G)^(|V – Σ| + 1) can be written as w = uvxyz for some strings u, v, x, y, z ∈ Σ* such that • |v| > 0 or |y| > 0, • |vxy| ≤ n, and • uv^k xy^k z ∈ L(G) for every k ≥ 0.

The Pumping Lemma for CFLs • Proof: • Let n = φ(G)^(|V – Σ| + 1). • Let w ∈ L(G) with |w| ≥ n. • Let T be a parse tree for w that has the smallest possible number of leaves (this minimizes the number of leaves labeled with the empty string). • Let P be a path of maximum length in T. • Since |w| ≥ n > φ(G)^|V – Σ| (assuming φ(G) ≥ 2), the length of P is greater than |V – Σ|, i.e., the length of P is at least |V – Σ| + 1. (Lemma) • Therefore, the number of nodes on P is at least |V – Σ| + 2.

The Pumping Lemma for CFLs • Let P' be the last part of P consisting of exactly |V – Σ| + 2 nodes. • P' must contain exactly |V – Σ| + 1 nonterminals. • Therefore, at least one nonterminal must be repeated. • Let A be the first nonterminal that is repeated as we follow the path from the leaf back towards the root. • Let T' be the subtree with root at the second-to-last occurrence of A on the path P. • If we remove T' from T, except for its root A, the result is a parse tree for a string uAz.

The Pumping Lemma for CFLs • Let T'' be the subtree whose root node is the last occurrence of A on the path P. • T'' is a parse tree for a string x. • If we remove T'' from T', except for its root A, the result is a parse tree for a string vAy. • This parse tree may be attached at the leaf A in the tree T – T' as many times as we like (including zero times), creating parse trees for uv^k Ay^k z for any k ≥ 0. • Finally, we re-attach T'' and get a parse tree for uv^k xy^k z.

The Pumping Lemma for CFLs • If v = e and y = e, then the part of the tree between the two occurrences of A could have been eliminated, producing a smaller parse tree for w. • We assumed that T was the smallest possible parse tree for w. • Therefore, v ≠ e or y ≠ e. • The path from the second-to-last A to the last A and then to the leaf has length at most |V – Σ| + 1. • Therefore, the yield of the subtree T' has no more than φ(G)^(|V – Σ| + 1) terminals. (Lemma) • Thus, |vxy| ≤ n.

Standard Example of a Non-CFL • The language L = {a^n b^n c^n | n ≥ 0} is not context-free. • Proof: • Suppose it is. • Let n be the n of the Pumping Lemma. • Let w = a^n b^n c^n. • Then w = uvxyz where |v| > 0 or |y| > 0 and |vxy| ≤ n. • Since |vxy| ≤ n, vxy contains at most two different symbols. • Suppose it contains only a's and b's (but no c's). • Then vy contains at least one a or at least one b.

Standard Example of a Non-CFL • Say vy contains i a's and j b's, for some i and j, with i > 0 or j > 0. • Then uv^2 xy^2 z contains n + i a's and n + j b's, at least one of which is greater than n. • But uv^2 xy^2 z contains only n c's. • Thus, uv^2 xy^2 z ∉ L. • This is a contradiction. • Therefore, this language is not context-free. • The other case, where vxy contains b's and c's, but no a's, is handled similarly.
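The case analysis can be checked by brute force for small n. The sketch below is not a proof (the n of the Pumping Lemma depends on an unknown grammar); it only confirms, for a few small values chosen for illustration, that no way of splitting a^n b^n c^n with |vxy| ≤ n survives pumping. The function names are assumptions of this sketch.

```python
def in_L(s):
    """Membership in {a^n b^n c^n | n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def decompositions(w, n):
    """All splits w = u v x y z with |vxy| <= n and v, y not both empty."""
    for i in range(len(w) + 1):
        end = min(i + n, len(w))
        for j in range(i, end + 1):
            for k in range(j, end + 1):
                for m in range(k, end + 1):
                    if (j - i) + (m - k) > 0:
                        yield w[:i], w[i:j], w[j:k], w[k:m], w[m:]


def survives_pumping(w, n):
    """True if some split keeps u v^k x y^k z in L for both k = 0 and k = 2."""
    return any(
        all(in_L(u + v * k + x + y * k + z) for k in (0, 2))
        for u, v, x, y, z in decompositions(w, n)
    )


for n in range(1, 6):
    w = "a" * n + "b" * n + "c" * n
    print(n, survives_pumping(w, n))
```

Every split fails already at k = 0 or k = 2, which matches the argument that pumping changes the count of at most two of the three symbols.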

Example of a Non-CFL • The language {a^m b^n c^m d^n | m, n ≥ 0} is not context-free. • Proof: • Suppose that it is context-free. • Let n be the n of the Pumping Lemma. • Let w = a^n b^n c^n d^n. • Complete the proof using the Pumping Lemma.

Example of a Non-CFL • The language L = {w ∈ Σ* | #a's = #b's = #c's} is not context-free. • Proof: • Suppose that it is context-free. • Intersect it with L(a*b*c*), which is regular. • Since the intersection of a CFL and a regular language is a CFL, the intersection would be context-free. • But the intersection is {a^n b^n c^n | n ≥ 0}, which is known not to be context-free. • Therefore, the language L is not context-free.

Nonclosure Properties • Theorem: The set of context-free languages is not closed under intersection. • Proof: • Let L1 = {a^n b^n c^m | m, n ≥ 0} and let L2 = {a^m b^n c^n | m, n ≥ 0}. • Clearly, L1 and L2 are context-free. • However, L1 ∩ L2 = {a^n b^n c^n | n ≥ 0}, which is known to be non-context-free.
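The intersection in this proof can be checked on all short strings. The sketch below enumerates every string over {a, b, c} up to length 6 (a bound chosen for illustration) and keeps those in both languages; the function names are assumptions of this sketch.

```python
import re
from itertools import product


def in_L1(s):
    """a^n b^n c^m: equally many a's and b's, any number of c's."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


def in_L2(s):
    """a^m b^n c^n: any number of a's, equally many b's and c's."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(2)) == len(m.group(3))


def words_up_to(max_len, alphabet="abc"):
    """Every string over the alphabet of length at most max_len."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


both = sorted(w for w in words_up_to(6) if in_L1(w) and in_L2(w))
print(both)  # ['', 'aabbcc', 'abc']
```

Membership in L1 forces #a's = #b's and membership in L2 forces #b's = #c's, so only strings of the form a^n b^n c^n remain.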

Nonclosure Properties • Theorem: The set of context-free languages is not closed under complementation. • Proof: • Suppose it were closed under complementation. • Let L1 and L2 be context-free languages. • Then L1' and L2' are context-free, so L1' ∪ L2' is context-free (closure under union), and hence (L1' ∪ L2')' is also context-free. • However, by De Morgan’s Laws, this is L1 ∩ L2, which we now know is not necessarily context-free.

Example • The language L = {w ∈ Σ* | w ≠ uu for any u ∈ Σ*} is context-free. • The language L′ = {w ∈ Σ* | w = uu for some u ∈ Σ*} is not context-free.
