Closure properties for CFLs.

Closure properties for CFLs. • CFLs are closed under union, concatenation, and Kleene closure. • We’ve seen the proof when observing that all regular languages are CFLs. • Recall that it involves creating a new start symbol S with rules of the form • S → S1 | S2 (union), S → S1S2 (concatenation), or S → λ | S1S (Kleene closure), where S1 and S2 are the start symbols of the given grammars.
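The three constructions can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the slides: a grammar is a pair (start, rules), rules is a dict from each variable to a list of bodies, the empty body [] stands for λ, and the helper names tagged, union, concat and star are hypothetical.

```python
# A grammar is (start, rules). rules maps each variable to a list of bodies,
# each body is a list of symbols, and the empty body [] stands for lambda.
# Assumption: every variable appears as a key of rules; other symbols are terminals.

def tagged(grammar, tag):
    """Append tag to every variable so that two grammars share no variables."""
    start, rules = grammar
    ren = lambda s: s + tag if s in rules else s
    new_rules = {ren(h): [[ren(s) for s in body] for body in bodies]
                 for h, bodies in rules.items()}
    return ren(start), new_rules

def union(g1, g2):
    (s1, r1), (s2, r2) = tagged(g1, "_1"), tagged(g2, "_2")
    return "S", {"S": [[s1], [s2]], **r1, **r2}       # S -> S1 | S2

def concat(g1, g2):
    (s1, r1), (s2, r2) = tagged(g1, "_1"), tagged(g2, "_2")
    return "S", {"S": [[s1, s2]], **r1, **r2}         # S -> S1 S2

def star(g1):
    s1, r1 = tagged(g1, "_1")
    return "S", {"S": [[], [s1, "S"]], **r1}          # S -> lambda | S1 S

# Example grammars: A generates {0^n 1^n | n>0}, B generates {2^n | n>0}.
g_a = ("A", {"A": [["0", "A", "1"], ["0", "1"]]})
g_b = ("B", {"B": [["2", "B"], ["2"]]})
print(concat(g_a, g_b))
```

Renaming the variables first matters: without it, two grammars that both use a variable named A would have their rules mixed together.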

Using CFL closure properties • For example, the languages below are CFLs • {0^m1^n2^n | m,n>0} and {0^m1^m2^n | m,n>0} • The first is {0^m | m>0} ∙ {1^n2^n | n>0} • The second is {0^m1^m | m>0} ∙ {2^n | n>0}. • But the intersection of these two CFLs is {0^n1^n2^n | n>0}, which isn’t a CFL. • So the class of CFLs is not closed under intersection.

Nonclosure under complementation • Suppose that the class of CFLs were closed under complementation. • Then, since it is closed under union, De Morgan's law would make it closed under intersection, just as for regular languages. • But it is not closed under intersection, so the class of CFLs isn't closed under complementation. • However, the intersection of a CFL and a regular language is always a CFL. • This is shown as Theorem 8.5 of Linz.

Intersecting CFLs with regular languages • The intuition is that a DFA and a PDA may be run in parallel • just as we did for two DFAs, by using the Cartesian product of the sets of states • This won't work for two PDAs • since we can't sensibly define the Cartesian product of two stacks. • But if there is only one PDA, then we can simply use its stack with no problem.

Decision algorithms for CFGs • To check whether L(G) is empty for a CFG G • we merely check whether S generates a string of terminals • To check whether L(G) is infinite for a CFG G • eliminate useless symbols, λ-productions, and unit productions from G to get a new CFG G' • then determine whether any variable is nontrivially reachable from itself in G'. • This is equivalent to determining whether there is a cycle in a dependency graph
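The emptiness test can be sketched as a fixed-point computation of the variables that generate some terminal string. The representation is an assumption made for illustration: rules is a dict from each variable to a list of bodies, any symbol that is not a key of rules counts as a terminal, and the function names are hypothetical.

```python
def generating_variables(rules):
    """Return the set of variables that derive at least one terminal string."""
    gen = set()
    changed = True
    while changed:                      # repeat until no new variable is added
        changed = False
        for head, bodies in rules.items():
            # head is generating if some body uses only terminals and
            # variables already known to be generating (an empty body counts)
            if head not in gen and any(
                all(s not in rules or s in gen for s in body) for body in bodies
            ):
                gen.add(head)
                changed = True
    return gen

def is_empty(start, rules):
    """L(G) is empty exactly when the start symbol is not generating."""
    return start not in generating_variables(rules)

g1 = {"S": [["A", "B"], ["a"]], "A": [["A", "a"]], "B": [["b"]]}
g2 = {"S": [["A", "B"]], "A": [["A", "a"]], "B": [["b"]]}
print(is_empty("S", g1))  # False: S -> a
print(is_empty("S", g2))  # True: A never terminates, so neither does S
```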

A sample CFG generating an infinite language • Consider the CFG G with rules below, where we ignore rules for Det, N, V, P • S → NP VP • NP → Det N | Det N PP • VP → V NP • PP → P NP • Since there’s a cycle involving NP and PP, L(G) is infinite • That is, NP ⇒* Det N P NP • where Det, N, and P generate nontrivial strings
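The cycle test on this grammar can be sketched with a depth-first search over the dependency graph. The sketch assumes the grammar already has no useless symbols, λ-productions, or unit productions, and it treats Det, N, V and P as terminals because their rules are ignored here. The name is_infinite and the dict representation are illustrative choices.

```python
def is_infinite(rules):
    """Detect a cycle in the variable dependency graph.

    Assumes the grammar has no useless symbols, lambda-productions, or unit
    productions; without that cleanup a cycle does not imply an infinite language.
    """
    # Edge A -> B whenever B is a variable occurring in some body of A.
    graph = {h: {s for body in bodies for s in body if s in rules}
             for h, bodies in rules.items()}
    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited, on current path, finished
    color = dict.fromkeys(graph, WHITE)

    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            # A GRAY neighbour is on the current path, so we found a cycle.
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

sample = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"]],
    "PP": [["P", "NP"]],
}
print(is_infinite(sample))  # True: NP -> PP -> NP
```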

Nonexistence of decision algorithms • For some questions about CFGs, there is no possible solution algorithm! • for example, there’s no algorithm for determining whether two CFGs generate the same language • Showing how to justify such a claim is a major goal of the rest of the course

The pumping lemma for CFLs • CFLs have an analog of the pumping lemma. • Its statement is slightly more complicated than for regular languages. • The basic idea is again simple • every parse tree with a sufficiently long yield must have a long path, and thus a repeated variable on that path.

Deriving the pumping lemma for CFLs • Suppose a CFG G has m variables. • We may assume that G is in CNF (we won’t care whether λ ∈ L(G)). • If a parse tree has height greater than m, then it must have a path with more than m nonleaves. • Among the lowest m+1 nonleaves must be two that correspond to the same variable A.

Repeated variables on a path • If the higher node is A_hi and the lower one A_lo • then the yield of A_lo is a substring of the yield of A_hi. • Let w be the yield of A_lo and vwx that of A_hi. • Let uvwxy be the yield of the entire tree. • Since G is context free, we may replace the tree rooted at A_lo by a copy of the tree rooted at A_hi.

The ability to pump • The yield of the new tree will be uv^2wx^2y. • We may repeat the process to get trees that yield uv^iwx^iy for any i>0. • We may also replace the tree rooted at A_hi by the tree rooted at A_lo to get a tree yielding uwy, which is the case i=0. • So for any i>=0, uv^iwx^iy is in L(G). • That is, z = uvwxy may be pumped • but slightly differently than for DFAs.

Making the pumped part short • Here, it is vwx whose length we will bound. • Since G is in CNF, a parse tree of height at most m has yield of length at most 2^(m-1). • For n = 2^m, any string z in L(G) of length at least n has a parse tree with a path of length greater than m, so • z can be pumped, • |vwx| <= n (if the path is a longest one, the subtree yielding vwx has height at most m+1), • and vwx is a proper superstring of w (that is, |vx| > 0). • We get a new pumping lemma (Linz, Th. 8.1).
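Collecting the conditions derived so far, the lemma can be written out as follows. This is a restatement assembled from the slides above, not a quotation of Linz's wording.

```latex
\textbf{Pumping lemma for CFLs.} For every context-free language $L$ there is
an integer $n$ such that every $z \in L$ with $|z| \ge n$ can be written as
$z = uvwxy$ with
\begin{align*}
  |vwx| &\le n, \\
  |vx|  &\ge 1, \\
  u v^i w x^i y &\in L \quad \text{for all } i \ge 0 .
\end{align*}
For a CNF grammar with $m$ variables, the derivation above gives $n = 2^m$.
```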

Using the pumping lemma for CFLs • Suppose that the language L = {0^j1^j2^j | j>=0} is a CFL. • If we choose z = 0^n1^n2^n in the pumping lemma for CFLs, then since |vwx| <= n, vwx must contain at most two distinct symbols. • Since vx is nonempty, uwy therefore contains more of the third symbol than of one of the other two symbols, and is thus not in L. • So L can't be a CFL.
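The argument can be checked by brute force for small values of n. The sketch below enumerates every split z = uvwxy with |vwx| <= n and |vx| >= 1 and tests whether uwy stays in L. It only checks specific values of n, so it illustrates the proof rather than replacing it, and the function names are hypothetical.

```python
def in_L(s):
    """Membership in L = {0^j 1^j 2^j | j >= 0}."""
    j = len(s) // 3
    return s == "0" * j + "1" * j + "2" * j

def decompositions(z, n):
    """Yield every (u, v, w, x, y) with z = uvwxy, |vwx| <= n and |vx| >= 1."""
    for a in range(len(z) + 1):                        # u = z[:a]
        for d in range(a, min(a + n, len(z)) + 1):     # vwx = z[a:d]
            for b in range(a, d + 1):                  # v = z[a:b]
                for c in range(b, d + 1):              # w = z[b:c], x = z[c:d]
                    if (b - a) + (d - c) >= 1:
                        yield z[:a], z[a:b], z[b:c], z[c:d], z[d:]

def survives_pumping_down(z, n):
    """True if some allowed decomposition keeps uwy (the case i = 0) in L."""
    return any(in_L(u + w + y) for u, v, w, x, y in decompositions(z, n))

for n in range(1, 6):
    z = "0" * n + "1" * n + "2" * n
    print(n, survives_pumping_down(z, n))   # False for every n tried here
```

The length bound does real work. For z = 000111222, allowing |vwx| up to 5 admits the split v = 01, w = 11, x = 2, whose uwy is 001122 and lies in L. With the bound set to 3, no split survives.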
