## Chomsky Normal Form CYK Algorithm


**Normal Forms**

A grammar can be brought into certain special forms that make it easier to work with:

- Chomsky Normal Form
- Greibach Normal Form

**Chomsky Normal Form**

A context-free grammar is in Chomsky Normal Form (CNF) if every rule has one of the forms:

- A ⟶ BC
- A ⟶ a
- S ⟶ ε

where A, B, C are variables (B and C are not the start variable), a is a terminal, and S is the start variable, the only variable allowed to produce ε.

There are 5 steps to follow in order to transform a grammar into CNF:

1. Add a new start variable S0 and the production rule S0 ⟶ S.
2. Eliminate the ε-rules.
3. Eliminate the unary productions A ⟶ B.
4. Add rules of the form Vt ⟶ t for every terminal t and replace t with the variable Vt.
5. Transform the remaining rules to the form A ⟶ BC (A, B, C variables).

**1. Add a new start variable**

- We have to make sure that the start variable does not occur on the right side of any rule.
- Thus, we add a new start variable S0 and the rule S0 ⟶ S, where S is the old start variable.

**2. Eliminate ε-rules**

- We have to eliminate all productions of the form A ⟶ ε, for A being any non-start variable.
- To do so, we remove the rule A ⟶ ε and, for every other rule whose right side contains A, we add the variants of that rule in which occurrences of A are deleted (one variant for each combination of occurrences). If this creates a new ε-rule, we repeat the step.

**3. Eliminate unary productions**

- A unary production is a production of the form A ⟶ B (with both A and B being variables).
- Rules involving only variables must have the form V1 ⟶ V2V3, thus we have to eliminate unary productions.
- To do so, we replace B in A ⟶ B with the right sides of all the rules that have B on the left side.

**4. Add Vt ⟶ t and replace t with Vt**

- Terminals may only appear in rules of the form A ⟶ t, thus they must disappear from every rule whose right side has more than one symbol.
- To do so, we add a new variable Vt for every terminal t and we replace every appearance of t with Vt, except in rules of the form A ⟶ t.

**5. Transform rules to A ⟶ BC**

- All the rules involving only variables should be of the form A ⟶ BC. Thus we have to take care of all the rules with more than 2 variables on the right side.
- For the rule V ⟶ A1A2A3…An, we shrink the right side by repeatedly replacing its first two variables with one new variable (resulting in the creation of n-2 new variables).

Starting from V ⟶ A1A2A3A4A5A6…An, the rule for V changes as follows, and each round adds one new rule:

- V ⟶ B1A3A4A5A6…An, with B1 ⟶ A1A2
- V ⟶ B2A4A5A6…An, with B2 ⟶ B1A3
- V ⟶ B3A5A6…An, with B3 ⟶ B2A4
- V ⟶ B4A6…An, with B4 ⟶ B3A5
- …
- V ⟶ Bn-2An, with Bn-2 ⟶ Bn-3An-1

The final set of rules is V ⟶ Bn-2An, Bn-2 ⟶ Bn-3An-1, …, B4 ⟶ B3A5, B3 ⟶ B2A4, B2 ⟶ B1A3, B1 ⟶ A1A2.

**Example**

The grammar to convert:

- S ⟶ CSC | B
- C ⟶ 00 | ε
- B ⟶ 01B | 1

1\. Add a new start variable:

- S0 ⟶ S
- S ⟶ CSC | B
- C ⟶ 00 | ε
- B ⟶ 01B | 1

2\. Eliminate ε-rules. We remove C ⟶ ε and add to S ⟶ CSC the variants with one or both occurrences of C deleted (CS, SC, S):

- S0 ⟶ S
- S ⟶ CSC | B | CS | SC | S
- C ⟶ 00
- B ⟶ 01B | 1

3\. Eliminate unary productions. First we drop S ⟶ S, then we replace B in S ⟶ B with the right sides of B (01B | 1), and finally we replace S in S0 ⟶ S with the right sides of S:

- S0 ⟶ CSC | 01B | 1 | CS | SC
- S ⟶ CSC | 01B | 1 | CS | SC
- C ⟶ 00
- B ⟶ 01B | 1

4\. Create Vt for every terminal t. We add Z ⟶ 0 and replace 0 with Z, then add A ⟶ 1 and replace 1 with A everywhere except in the rules whose right side is the single terminal 1:

- S0 ⟶ CSC | ZAB | 1 | CS | SC
- S ⟶ CSC | ZAB | 1 | CS | SC
- C ⟶ ZZ
- B ⟶ ZAB | 1
- Z ⟶ 0
- A ⟶ 1

5\. Take care of long rules. We add D ⟶ CS and rewrite CSC as DC, then add E ⟶ ZA and rewrite ZAB as EB:

- S0 ⟶ DC | EB | 1 | CS | SC
- S ⟶ DC | EB | 1 | CS | SC
- C ⟶ ZZ
- B ⟶ EB | 1
- Z ⟶ 0
- A ⟶ 1
- D ⟶ CS
- E ⟶ ZA

**CYK Introduction**

- Problem: given a context-free grammar and a string s, decide whether s can be generated by the grammar.
- If the grammar is in arbitrary form, this test is not efficient.
- If the grammar is in Chomsky Normal Form, we have an elegant algorithm for testing this, the CYK algorithm.

**The CYK algorithm**

Suppose that we are given a grammar in Chomsky Normal Form:

- S → AB
- A → BB | 0
- B → AA | 1

We would like to see whether 10110 is generated by this grammar or not.

In the tables below, the header row is the string itself. The cell in row k under a given character holds the variables that produce the substring of length k starting at that character, and "-" means that no variable produces it.

**Substrings of length 1**

Since the only way to produce terminals is by following the rules A → a, we just replace every terminal with the variables that produce it.

| Substring length | 1 | 0 | 1 | 1 | 0 |
|---|---|---|---|---|---|
| 1 | B | A | B | B | A |

**Substrings of length 2**

Suppose now that we want to see how every substring of length 2 can be generated. This is equivalent to finding ways to produce all the length-2 substrings in which terminals are replaced with the variables that represent them. Since every rule is of the form A → BC, it suffices to replace every two consecutive variables with the variables that produce them.

| Substring length | 1 | 0 | 1 | 1 | 0 |
|---|---|---|---|---|---|
| 1 | B | A | B | B | A |
| 2 | - | S | A | - | |

**Substrings of length 3**

- To produce the substring 101 (in 10110) we can either take 1 with 01 or 10 with 1. The first split gives the pair BS, which cannot be produced by any variable. The second split gives no pair, since 10 cannot be produced. The cell stays empty.
- To produce the substring 011 we can either take 0 with 11 or 01 with 1. The first split gives AA, which can be produced by B. The second split gives SB, which cannot be produced by any variable. The cell holds B.
- To produce the substring 110 we can either take 1 with 10 or 11 with 0. The first split gives no pair, since 10 cannot be produced by a variable. The second split gives AA, which can be produced by B. The cell holds B.

| Substring length | 1 | 0 | 1 | 1 | 0 |
|---|---|---|---|---|---|
| 1 | B | A | B | B | A |
| 2 | - | S | A | - | |
| 3 | - | B | B | | |

**Substrings of length 4**

- To produce the substring 1011 (in 10110) we can take 1 with 011, 10 with 11, or 101 with 1. The first split gives BB, which can be produced by A. The second gives no pair, since 10 cannot be produced. The third gives no pair, since 101 cannot be produced. The cell holds A.
- To produce the substring 0110 we can take 0 with 110, 01 with 10, or 011 with 0. The first split gives AB, which can be produced by S. The second gives no pair, since 10 cannot be produced. The third gives BA, which cannot be produced by any variable. The cell holds S.

| Substring length | 1 | 0 | 1 | 1 | 0 |
|---|---|---|---|---|---|
| 1 | B | A | B | B | A |
| 2 | - | S | A | - | |
| 3 | - | B | B | | |
| 4 | A | S | | | |

**Combine previous solutions**

In order to produce the whole string 10110 we can take 1 with 0110, 10 with 110, 101 with 10, or 1011 with 0. The first split gives the pair BS, which cannot be produced by any variable.
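
The table-filling procedure walked through above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names `cyk_table` and `generates` and the dictionary encoding of the rules are choices made for this illustration and do not come from the slides. The empty string is deliberately left out, since it would need a separate check for a rule S ⟶ ε.

```python
from itertools import product

# Grammar of the CYK walkthrough: S -> AB, A -> BB | 0, B -> AA | 1.
# The dictionary layout is an assumption made for this sketch.
UNIT_RULES = {"0": {"A"}, "1": {"B"}}       # terminal -> variables producing it
PAIR_RULES = {("A", "B"): {"S"},            # S -> AB
              ("B", "B"): {"A"},            # A -> BB
              ("A", "A"): {"B"}}            # B -> AA

# Result of the CNF conversion example (start variable S0).
CNF_UNIT = {"0": {"Z"}, "1": {"S0", "S", "B", "A"}}
CNF_PAIR = {("D", "C"): {"S0", "S"}, ("E", "B"): {"S0", "S", "B"},
            ("C", "S"): {"S0", "S", "D"}, ("S", "C"): {"S0", "S"},
            ("Z", "Z"): {"C"}, ("Z", "A"): {"E"}}


def cyk_table(s, unit_rules, pair_rules):
    """Return table[length][start]: the variables producing s[start:start+length]."""
    n = len(s)
    # Row 0 is unused so that the row index equals the substring length.
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, ch in enumerate(s):
        table[1][i] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            # Try every way of cutting the substring into a left and a right part.
            for split in range(1, length):
                left = table[split][start]
                right = table[length - split][start + split]
                for pair in product(left, right):
                    table[length][start] |= pair_rules.get(pair, set())
    return table


def generates(s, unit_rules, pair_rules, start="S"):
    """True if the start variable produces the whole non-empty string s."""
    if not s:
        raise ValueError("the empty string needs a separate check for S -> ε")
    return start in cyk_table(s, unit_rules, pair_rules)[len(s)][0]


table = cyk_table("10110", UNIT_RULES, PAIR_RULES)
print(table[2][:4])                                   # [set(), {'S'}, {'A'}, set()]
print(table[5][0])                                    # {'B'}
print(generates("10110", UNIT_RULES, PAIR_RULES))     # False
print(generates("00100", CNF_UNIT, CNF_PAIR, "S0"))   # True
```

Running the sketch on 10110 finishes the last step of the walkthrough: the only split that yields a usable pair is 1011 with 0 (AA, produced by B), so the top cell holds B rather than S and the string is not generated by this grammar. The second call checks the grammar obtained in the CNF conversion example, which accepts 00100 through S0 ⟶ DC with D ⟶ CS. The three nested loops over length, start and split give the usual cubic running time in the length of the string.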