Chomsky Normal Form CYK Algorithm

Chomsky Normal Form CYK Algorithm

Télécharger la présentation

Chomsky Normal Form CYK Algorithm

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

1. Chomsky Normal FormCYK Algorithm

2. Normal Forms There are some special forms in which I can bring the grammar to work with it more easily. • Chomsky Normal Form • Greibach Normal Form

3. Chomsky Normal Form A Context Free Grammar is in Chomsky Normal Form if the rules are of the form: • A⟶BC • A ⟶ a • S ⟶ε with A, B, C being variables (B,C not being the start variable), a being a terminal and S only being the start variable.

4. Chomsky Normal Form There are 5 steps to follow in order to transform a grammar into CNF: • Add the a new start variable S0 and the production rule S0⟶ S. • Eliminate the ε-rules. • Eliminate the unary productions A ⟶ B. • Add rules of the form Vt⟶ t for every terminal t and replace t with the variable Vt. • Transform the remaining of the rules to the form A ⟶ BC (A, B, C variables).

5. 1. Add a new start variable • We have to make sure that the start variable doesn’t occur to the right side of some rule. • Thus, we add a new start variable S0 and the rule S0⟶ S, where S is the old start variable.

6. 2. Eliminate ε-rules • We have to eliminate all productions of the form A⟶ ε, for A being any non-start variable. • To do so we should remove the rule A⟶ ε and replace every appearance of A with ε in all other rules.

7. 3. Eliminate unary productions • A unary production is a production of the form A ⟶ B (with both A, B being variables). • There should only be productions of the form V1 ⟶ V2V3 involving variables, thus we have to eliminate unary productions. • To do so, we replace B in A ⟶ B with the right parts of the rules involving B in the left part.

8. 4. Add Vt⟶ t and replace t with Vt • There should only be rules of the form A ⟶ t involving terminals, thus terminals should disappear from every other rule involving more than just one single literal. • To do so, we add a new variable Vt for every terminal t and we replace every appearance of t with Vt , except those in rules of the form A ⟶ t.

9. 5. Transform rules to A ⟶ BC • All the rules involving only variables should be of the form A ⟶ BC. Thus we should take care of all the rules involving more than 2 variables in the right part • For the rule V ⟶ A1A2A3…An,we start reducing the size of the right part by replacing every two variables with one new variable (resulting in the creation of n-2 new variables).

10. 5. Transform rules to A ⟶ BC V ⟶ A1A2A3A4A5A6…An

11. 5. Transform rules to A ⟶ BC V ⟶ B1A3A4A5A6…An B1 ⟶ A1A2

12. 5. Transform rules to A ⟶ BC V ⟶ B2A4A5A6…An B2 ⟶ B1A3 B1 ⟶ A1A2

13. 5. Transform rules to A ⟶ BC V ⟶ B3A5A6…An B3 ⟶ B2A4 B2 ⟶ B1A3 B1 ⟶ A1A2

14. 5. Transform rules to A ⟶ BC V ⟶ B4A6…An B4 ⟶ B3A5 B3 ⟶ B2A4 B2 ⟶ B1A3 B1 ⟶ A1A2

15. 5. Transform rules to A ⟶ BC V ⟶ Bn-2An Bn-2 ⟶ Bn-3An-1 … B4 ⟶ B3A5 B3 ⟶ B2A4 B2 ⟶ B1A3 B1 ⟶ A1A2

16. Example S ⟶ CSC | B C ⟶ 00 | ε B ⟶ 01B | 1

17. Example 1. Add new start variablle S0 ⟶ S S ⟶ CSC | B C ⟶ 00 | ε B ⟶ 01B | 1

18. Example 2. Eliminate ε-moves S0 ⟶ S S ⟶ CSC | B C ⟶ 00 | ε B ⟶ 01B | 1

19. Example 2. Eliminate ε-moves S0 ⟶ S S ⟶ CSC | B |CS| SC |S C ⟶ 00 B ⟶ 01B | 1

20. Example 3. Eliminate Unary Productions S0 ⟶ S S ⟶ CSC | B | CS | SC|S C ⟶ 00 B ⟶ 01B | 1

21. Example 3. Eliminate Unary Productions S0 ⟶ S S ⟶ CSC | B | CS | SC C ⟶ 00 B ⟶ 01B | 1

22. Example 3. Eliminate Unary Productions S0 ⟶ S S ⟶ CSC | B | CS | SC C ⟶ 00 B ⟶ 01B | 1

23. Example 3. Eliminate Unary Productions S0 ⟶ S S ⟶ CSC| 01B | 1 | CS | SC C ⟶ 00 B ⟶ 01B | 1

24. Example 3. Eliminate Unary Productions S0 ⟶ S S ⟶ CSC| 01B | 1 | CS | SC C ⟶ 00 B ⟶ 01B | 1

25. Example 3. Eliminate Unary Productions S0 ⟶ CSC | 01B | 1 | CS|SC S ⟶ CSC| 01B | 1 | CS | SC C ⟶ 00 B ⟶ 01B | 1

26. Example 4. Create Vt for every terminal t S0 ⟶ CSC | 01B | 1 | CS | SC S ⟶ CSC| 01B | 1 | CS | SC C ⟶ 00 B ⟶ 01B | 1 Z ⟶ 0

27. Example 4. Create Vt for every terminal t S0 ⟶ CSC | Z1B | 1 | CS | SC S ⟶ CSC| Z1B | 1 | CS | SC C ⟶ZZ B ⟶Z1B | 1 Z ⟶ 0

28. Example 4. Create Vt for every terminal t S0 ⟶ CSC | Z1B | 1 | CS | SC S ⟶ CSC | Z1B | 1 | CS | SC C ⟶ ZZ B ⟶ Z1B | 1 Z ⟶ 0 A ⟶ 1

29. Example 4. Create Vt for every terminal t S0 ⟶ CSC | ZAB | 1 | CS | SC S ⟶ CSC| ZAB | 1 | CS | SC C ⟶ ZZ B ⟶ZAB | 1 Z ⟶ 0 A ⟶ 1

30. Example 5. Take care of long rules S0 ⟶ CSC | ZAB | 1 | CS | SC S ⟶ CSC | ZAB | 1 | CS | SC C ⟶ ZZ B ⟶ ZAB | 1 Z ⟶ 0 A ⟶ 1 D ⟶ CS

31. Example 5. Take care of long rules S0 ⟶ DC | ZAB | 1 | CS | SC S ⟶ DC | ZAB | 1 | CS | SC C ⟶ ZZ B ⟶ ZAB | 1 Z ⟶ 0 A ⟶ 1 D ⟶ CS

32. Example 5. Take care of long rules S0 ⟶ DC | ZAB | 1 | CS | SC S ⟶ DC | ZAB | 1 | CS | SC C ⟶ ZZ B ⟶ ZAB | 1 Z ⟶ 0 A ⟶ 1 D ⟶ CS E ⟶ ZA

33. Example 5. Take care of long rules S0 ⟶ DC | EB | 1 | CS | SC S ⟶ DC | EB | 1 | CS | SC C ⟶ ZZ B ⟶EB | 1 Z ⟶ 0 A ⟶ 1 D ⟶ CS E ⟶ ZA

34. CYK Introduction • Problem: Given a context free grammar and a string s is it possible to decide whether s can be generated by the grammar or not? • If the grammar is not in a very special form this is not so efficient. • If the grammar is in Chomsky Normal Form, we have an elegant algorithm for testing this, the CYK algorithm.

35. The CYK algorithm • Suppose that we are given a grammar in Chomsky Normal form S → AB A → BB | 0 B → AA |1 • We would like to see if 10110 is generated by this grammar or not.

36. Substrings of length 1 • Since the only way to produce terminals is by following the rules A → a, just replace every terminal with the variables that produce it. 1 0 1 1 0 B A B B A

37. Substrings of length 2 Suppose now that we want to see how every substring of length 2 can be generated. This is equivalent with finding ways to produce all the length 2 substrings where terminals are replaced with the variables that represent them. But since every rule is of the form A → BC, it suffices to replace every two consecutive variables with the variables that produce them. 1 0 1 1 0 B A B B A - S A -

38. Substrings of length 3 • To produce the substring 101 (in 10110) we can either take 1 with 01 or 10 with 1. Here BS cannot be produced by any variable. 10 1 1 0 B A B B A - S A - -

39. Substrings of length 3 • To produce the substring 101 (in 10110) we can either take 1 with 01 or 10 with 1. Here we don’t have a pair since 10 cannot be produced. 1 01 1 0 B A BB A - S A - -

40. Substrings of length 3 • To produce the substring 011 (in 10110) we can either take 0 with 11 or 01 with 1. Here AA can be produced by B. 101 1 0 B A B B A - S A - - B

41. Substrings of length 3 • To produce the substring 011 (in 10110) we can either take 0 with 11 or 01 with 1. Here SB cannot be produced by any variable 1 0 11 0 B A B B A - S A - - B

42. Substrings of length 3 • To produce the substring 110 (in 10110) we can either take 1 with 10 or 11 with 0. Here we don’t have a pair since 10 cannot be produced by a variable. 1011 0 B A BB A - S A - - B -

43. Substrings of length 3 • To produce the substring 110 (in 10110) we can either take 1 with 10 or 11 with 0. Here AA can be produced by B 10 110 B A B BA - S A - - B B

44. Substrings of length 4 • To produce the substring 1011 (in 10110) we can take 1 with 011 or 10 with 11, or 101 with 1. Here BB can be produced by A. 10 1 1 0 B A B B A - S A - - BB A

45. Substrings of length 4 • To produce the substring 1011 (in 10110) we can take 1 with 011 or 10 with 11, or 101 with 1. Here we don’t have a pair since 10 cannot be produced. 1 011 0 B A B B A - S A - - B B A

46. Substrings of length 4 • To produce the substring 1011 (in 10110) we can take 1 with 011 or 10 with 11, or 101 with 1. Here we don’t have a pair since 101 cannot be produced. 1 0 11 0 B A B B A - S A - - B B A

47. Substrings of length 4 • To produce the substring 0110 (in 10110) we can take 0 with 110 or 01 with 10, or 011 with 0. Here AB can be produced by S. 101 1 0 B A B B A - S A - - B B A S

48. Substrings of length 4 • To produce the substring 0110 (in 10110) we can take 0 with 110 or 01 with 10, or 011 with 0. Here we don’t have a pair since 10 cannot be produced. 1 0110 B A B B A - S A - - B B A S

49. Substrings of length 4 • To produce the substring 0110 (in 10110) we can take 0 with 110 or 01 with 10, or 011 with 0. Here BA cannot be produced by any variable. 1 0 1 10 B A B BA - S A - - BB A S

50. Combine previous solutions • In order now to produce the whole string 10110 we can take 1 with 0110 or 10 with 110 or 101 with 10, or 1011 with 0. Here, BS cannot be produced. 1 0 1 1 0 B A B B A - S A - - B B A S -