This text introduces the CYK algorithm, designed for determining if a given string can be generated by a context-free grammar in Chomsky Normal Form (CNF). It outlines the problem and presents a systematic method for analyzing substrings of varying lengths derived from the grammar, exemplified with the string "10110". The process illustrates how to substitute terminals with their generating variables. By following these steps, one can ascertain whether the string is representable by the grammar, concluding with a methodical way to fill in a parsing table to reach the answer efficiently.
Introduction • Problem: given a context-free grammar and a string s, can we decide whether the grammar generates s? • Unless the grammar is in a special form, this test is not very efficient. • If the grammar is in Chomsky Normal Form, there is an elegant algorithm for the test: the CYK algorithm.
The CYK algorithm • Suppose that we are given a grammar in Chomsky Normal Form:
S → AB
A → BB | 0
B → AA | 1
• We would like to see whether this grammar generates 10110.
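The grammar above can be written down as data before any table is built. The sketch below is one possible encoding, not something the slides prescribe: the names `RULES` and `index_rules` are illustrative, and it assumes every rule body is either a single terminal or exactly two variables, as Chomsky Normal Form requires.

```python
from collections import defaultdict

# The slide's grammar as (head, body) pairs. Hypothetical encoding.
RULES = [("S", "AB"), ("A", "BB"), ("A", "0"), ("B", "AA"), ("B", "1")]


def index_rules(rules):
    """Index CNF rules by their right-hand side.

    Assumes a body of length 1 is a terminal and a body of length 2
    is a pair of variables; anything else is rejected.
    """
    terminal = defaultdict(set)  # terminal -> variables producing it
    binary = defaultdict(set)    # (Y, Z)   -> variables X with X -> YZ
    for head, body in rules:
        if len(body) == 1:
            terminal[body].add(head)
        elif len(body) == 2:
            binary[(body[0], body[1])].add(head)
        else:
            raise ValueError(f"not a CNF rule: {head} -> {body}")
    return dict(terminal), dict(binary)


terminal_rules, binary_rules = index_rules(RULES)
```

Indexing by right-hand side matches how the table is filled: each step starts from a terminal or a pair of variables and asks which variables produce it.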
Substrings of length 1 • The only way to produce a terminal is a rule of the form X → a (here A → 0 and B → 1), so just replace every terminal with the variables that produce it.
1 0 1 1 0
B A B B A
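As a minimal sketch of this first row, assuming the terminal rules are stored in a dictionary (the names `TERMINAL_RULES` and `length_one_row` are illustrative):

```python
# Terminal rules of the slide's grammar: A -> 0 and B -> 1.
TERMINAL_RULES = {"0": {"A"}, "1": {"B"}}


def length_one_row(s):
    """Replace each terminal with the set of variables that produce it."""
    return [set(TERMINAL_RULES.get(ch, set())) for ch in s]


row1 = length_one_row("10110")
```

Each entry is a set because, in general, several variables may produce the same terminal; in this grammar every set has exactly one element.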
Substrings of length 2 • Suppose now that we want to see how every substring of length 2 can be generated. This is equivalent to finding ways to produce the length 2 substrings in which terminals have been replaced by the variables that represent them. • Since every remaining rule has the form X → YZ, it suffices to replace every two consecutive variables with the variables that produce the pair: BA is produced by nothing, AB by S, BB by A. A dash marks a substring that no variable produces.
1 0 1 1 0
B A B B A
- S A -
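The length 2 row can be sketched as a lookup over adjacent entries of the length 1 row. The helper name `combine` and the dictionary layout are assumptions made for illustration.

```python
# Binary rules of the slide's grammar, keyed by the pair on the right-hand side.
BINARY_RULES = {("A", "B"): {"S"}, ("B", "B"): {"A"}, ("A", "A"): {"B"}}


def combine(left, right):
    """Variables X with a rule X -> YZ, for some Y in left and Z in right."""
    produced = set()
    for y in left:
        for z in right:
            produced |= BINARY_RULES.get((y, z), set())
    return produced


# Length 1 row for 10110, as on the slide.
row1 = [{"B"}, {"A"}, {"B"}, {"B"}, {"A"}]
# One entry per pair of neighbours; an empty set plays the role of the dash.
row2 = [combine(row1[i], row1[i + 1]) for i in range(len(row1) - 1)]
```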
Substrings of length 3 • To produce the substring 101 (in 10110) we can either take 1 with 01 or 10 with 1. • 1 with 01: the pair is BS, which cannot be produced by any variable. • 10 with 1: we don't have a pair, since 10 cannot be produced. • So the entry for 101 is empty.
1 0 1 1 0
B A B B A
- S A -
-
Substrings of length 3 • To produce the substring 011 (in 10110) we can either take 0 with 11 or 01 with 1. • 0 with 11: the pair is AA, which can be produced by B. • 01 with 1: the pair is SB, which cannot be produced by any variable. • So the entry for 011 is B.
1 0 1 1 0
B A B B A
- S A -
- B
Substrings of length 3 • To produce the substring 110 (in 10110) we can either take 1 with 10 or 11 with 0. • 1 with 10: we don't have a pair, since 10 cannot be produced by a variable. • 11 with 0: the pair is AA, which can be produced by B. • So the entry for 110 is B.
1 0 1 1 0
B A B B A
- S A -
- B B
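The phrase "take 1 with 01 or 10 with 1" amounts to enumerating every way of cutting a substring into a non-empty left part and a non-empty right part. A small sketch, with the hypothetical helper name `splits`:

```python
def splits(sub):
    """All ways to cut sub into a non-empty prefix and a non-empty suffix."""
    return [(sub[:k], sub[k:]) for k in range(1, len(sub))]


length3 = {sub: splits(sub) for sub in ("101", "011", "110")}
```

A substring of length n has n - 1 such cuts, which is why the length 3 slides consider two cases and the length 4 slides consider three.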
Substrings of length 4 • To produce the substring 1011 (in 10110) we can take 1 with 011, or 10 with 11, or 101 with 1. • 1 with 011: the pair is BB, which can be produced by A. • 10 with 11: we don't have a pair, since 10 cannot be produced. • 101 with 1: we don't have a pair, since 101 cannot be produced. • So the entry for 1011 is A.
1 0 1 1 0
B A B B A
- S A -
- B B
A
Substrings of length 4 • To produce the substring 0110 (in 10110) we can take 0 with 110, or 01 with 10, or 011 with 0. • 0 with 110: the pair is AB, which can be produced by S. • 01 with 10: we don't have a pair, since 10 cannot be produced. • 011 with 0: the pair is BA, which cannot be produced by any variable. • So the entry for 0110 is S.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
Combine previous solutions • To produce the whole string 10110 we can take 1 with 0110, or 10 with 110, or 101 with 10, or 1011 with 0. • 1 with 0110: the pair is BS, which cannot be produced. • 10 with 110: we don't have a pair, since 10 cannot be produced. • 101 with 10: we don't have a pair, since neither 101 nor 10 can be produced. • 1011 with 0: the pair is AA, which is produced by B. • So the entry for 10110 is B.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
B
Answer • If the last line contains the start variable S, then there is a way to produce the string; otherwise the string cannot be generated. • In our example the last line contains only B, so 10110 cannot be generated.
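Putting the steps together gives a complete recognizer. What follows is a minimal sketch under stated assumptions, not the slides' own code: the function names `cyk_table` and `generates` are illustrative, the grammar is passed in as the two dictionaries shown, and the empty string is simply rejected because a grammar with only rules of the form X → YZ and X → a cannot produce it.

```python
TERMINAL_RULES = {"0": {"A"}, "1": {"B"}}
BINARY_RULES = {("A", "B"): {"S"}, ("B", "B"): {"A"}, ("A", "A"): {"B"}}


def cyk_table(s, terminal_rules, binary_rules):
    """Return the table as a list of rows; row[length - 1][start] is the
    set of variables producing s[start:start + length]."""
    n = len(s)
    table = [[set() for _ in range(n - length + 1)] for length in range(1, n + 1)]
    for i, ch in enumerate(s):
        table[0][i] = set(terminal_rules.get(ch, ()))
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            cell = table[length - 1][start]
            for k in range(1, length):  # left part has length k
                left = table[k - 1][start]
                right = table[length - k - 1][start + k]
                for y in left:
                    for z in right:
                        cell |= binary_rules.get((y, z), set())
    return table


def generates(s, terminal_rules=TERMINAL_RULES, binary_rules=BINARY_RULES, start="S"):
    """True if the last line of the table contains the start variable."""
    if not s:
        return False  # assumption: no rule produces the empty string
    return start in cyk_table(s, terminal_rules, binary_rules)[-1][0]


table = cyk_table("10110", TERMINAL_RULES, BINARY_RULES)
```

The three nested loops over length, start position and cut point give a running time cubic in the length of the string, times the cost of the rule lookups.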
Mechanical way • Now that we have shown why this method works, let's give an easy way to compute the table. • Suppose that we are about to fill in the circled position. We take the pairs that the arrows designate.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
Mechanical way • So finally:
1 0 1 1 0
B A B B A
- S A -
- B B
A S
B
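The arrows themselves do not survive in the text, so the following is an assumption about what they designate: the usual pairing, in which one arrow walks down the column above the circled position and the other walks along the diagonal to its right. The helper name `cell_pairs` and the (length, start) addressing with 0-based start are illustrative.

```python
def cell_pairs(length, start):
    """Pairs of table positions to combine for the substring of the given
    length beginning at index start (0-based).

    Each position is (length, start). The left position moves down the
    column; the right position moves up the diagonal.
    """
    return [((k, start), (length - k, start + k)) for k in range(1, length)]


# Positions consulted for 1011, the first entry of the length 4 row.
pairs_for_1011 = cell_pairs(4, 0)
```

For 1011 this yields the entries for 1 and 011, then 10 and 11, then 101 and 1, which are the three cases listed on the length 4 slides.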
A string that is produced • Run the CYK algorithm for the string 10111.
1 0 1 1 1
B A B B B
- S A A
- B S
A A
S
• The last line contains S, so 10111 is generated by the grammar. • The derivation starts from S.
[Figure: derivation tree for 10111, rooted at S]
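A derivation can be recovered by remembering, for every variable placed in the table, one cut and one pair that produced it. The sketch below is one way to do that, under the assumption that any single derivation is acceptable; it returns the first one found, which is not necessarily the tree drawn on the slide. The name `parse_tree` and the nested-tuple tree format are illustrative.

```python
TERMINAL_RULES = {"0": {"A"}, "1": {"B"}}
BINARY_RULES = {("A", "B"): {"S"}, ("B", "B"): {"A"}, ("A", "A"): {"B"}}


def parse_tree(s, start="S"):
    """Return one derivation tree as nested tuples, or None if s is not generated.

    back[(length, i)] maps each variable producing s[i:i + length] to None
    (terminal rule) or to (k, Y, Z): cut after k symbols, rule X -> YZ.
    """
    n = len(s)
    back = {}
    for i, ch in enumerate(s):
        back[(1, i)] = {x: None for x in TERMINAL_RULES.get(ch, ())}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = {}
            for k in range(1, length):
                for y in back[(k, i)]:
                    for z in back[(length - k, i + k)]:
                        for x in BINARY_RULES.get((y, z), ()):
                            cell.setdefault(x, (k, y, z))  # keep the first way found
            back[(length, i)] = cell

    def build(x, length, i):
        if length == 1:
            return (x, s[i])
        k, y, z = back[(length, i)][x]
        return (x, build(y, k, i), build(z, length - k, i + k))

    if n == 0 or start not in back[(n, 0)]:
        return None
    return build(start, n, 0)


tree = parse_tree("10111")
```

Read from the root, this tree corresponds to the derivation S ⇒ AB ⇒ BBB ⇒ BAAB ⇒ BABBB, followed by replacing each variable with its terminal to obtain 10111.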