
Formal Languages, Grammars, Regex, & Automata

Formal Languages, Grammars, Regex, & Automata. Shallow Processing Techniques for NLP Ling 570 October 3, 2011. Roadmap. Motivation: Defining a language Formal languages Regular languages Regular expressions, formally Formal grammars: Regular grammars, Context-free grammars


Presentation Transcript


  1. Formal Languages, Grammars, Regex, & Automata Shallow Processing Techniques for NLP Ling 570 October 3, 2011

  2. Roadmap • Motivation: • Defining a language • Formal languages • Regular languages • Regular expressions, formally • Formal grammars: • Regular grammars, Context-free grammars • Finite-State Automata

  3. What’s a Language? • How can we define a language?

  4. What’s a Language? • How can we define a language? • Strings in a language: • He put the map on the table. • The quick brown fox jumped over the lazy dog. • The sky is blue.

  5. What’s a Language? • How can we define a language? • Strings in a language: • He put the map on the table. • The quick brown fox jumped over the lazy dog. • The sky is blue. • Strings not in the language: • *Green furiously colorless sleep ideas. • *The the on the. • *sdfsdfoiumerwweokc.

  6. What’s in a Language? • What are all the pronunciations of words in a language?

  7. What’s in a Language? • What are all the pronunciations of words in a language? • Some sounds in a language: • ah b aw • ah b aw t • t ah m ey t ow • t ah m aa t ow

  8. What’s in a Language? • What are all the pronunciations of words in a language? • Some sound sequences in a language: • ah b aw • ah b aw t • t ah m ey t ow • t ah m aa t ow • Some sound sequences not in the language: • k k t p k • m g p n aa

  9. Defining Language • A language is defined as all and only those acceptable strings in the language.

  10. Defining Language • A language is defined as all and only those acceptable strings in the language. • How can we describe the language?

  11. Defining Language • A language is defined as all and only those acceptable strings in the language. • How can we describe the language? • Enumerate?

  12. Defining Language • A language is defined as all and only those acceptable strings in the language. • How can we describe the language? • Enumerate? • Problems • Languages are infinitely productive • Inefficient • Misses basic regularities

  13. Better Definitions • Grammars: • Start symbol • Expand with rewrite rules • Stop at word strings

  14. Better Definitions • Grammars: • Start symbol • Expand with rewrite rules • Stop at word strings • Automata: • Start in start state • Transition to other states • Until reach final state

  15. Better Definitions • Grammars: • Start symbol • Expand with rewrite rules • Stop at word strings • Automata: • Start in start state • Transition to other states • Until reach final state • Generate/recognize strings in language

  16. Better Definitions • Grammars: • Start symbol • Expand with rewrite rules • Stop at word strings • Automata: • Start in start state • Transition to other states • Until reach final state • Generate/recognize strings in language • Reject those not in language
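The automaton bullets above (start in a start state, follow transitions, accept in a final state, reject otherwise) can be sketched in a few lines of Python. This is only an illustration: the state names `q0`, `q1`, `q2`, the transition table, and the choice of the language ab*c are assumptions made for the example, not part of the slides.

```python
# Hypothetical finite-state automaton for the strings described by ab*c.
# State names and the transition table are illustrative assumptions.
START = "q0"
FINAL = {"q2"}
TRANSITIONS = {
    ("q0", "a"): "q1",  # read the initial a
    ("q1", "b"): "q1",  # loop on any number of b's
    ("q1", "c"): "q2",  # read the closing c
}


def accepts(string):
    """Return True if reading `string` from START ends in a final state."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined for this symbol: reject
            return False
    return state in FINAL


print(accepts("abbc"))  # a string in the language
print(accepts("ab"))    # stops in a non-final state, so it is rejected
```

The same table can be read as a generator: any path from the start state to a final state spells out a string in the language.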

  17. Formal Languages

  18. Acoustic Model • P(signal|words): words -> phones + phones -> vector quantization • Words -> phones: pronunciation dictionary lookup • Multiple pronunciations? Probability distribution • Dialect variation: tomato • + Coarticulation • Product along path • [Figure: weighted pronunciation network for "tomato"; phone labels: aa, t, ow, m, ey, ax; arc probabilities: 0.5, 0.5, 0.2, 0.5, 0.5, 0.8]

  19. Pronunciation Example • Observations: 0/1

  20. Formal Languages • Formal language: A model that can recognize/generate all and only the strings of a formal language acts as a definition of the language

  21. Formal Languages • Formal language: A model that can recognize/generate all and only the strings of a formal language acts as a definition of the language • Alphabet: Finite set of symbols • Σ = {a, b, c}

  22. Formal Languages • Formal language: A model that can recognize/generate all and only the strings of a formal language acts as a definition of the language • Alphabet: Finite set of symbols • Σ = {a, b, c} • String: Finite sequence of symbols from alphabet • “aababc” • Empty string: ε

  23. Formal Languages • Formal language: A model that can recognize/generate all and only the strings of a formal language acts as a definition of the language • Alphabet: Finite set of symbols • Σ = {a, b, c} • String: Finite sequence of symbols from alphabet • “aababc” • Empty string: ε • Formal language: Set of strings defined over alphabet • {aa, bb, cc, aaaa, bbbb} • {a^n b^n | n > 0} • Empty set ∅
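As a rough illustration of these definitions, the sketch below treats an alphabet as a Python set, a string as a Python `str` (with `""` standing in for ε), and a language as a membership test. The helper names `is_string_over` and `in_anbn` are made up for this example.

```python
# Alphabet from the slide: a finite set of symbols.
SIGMA = {"a", "b", "c"}


def is_string_over(s, alphabet=SIGMA):
    """True if every symbol of s comes from the alphabet ("" plays the role of ε)."""
    return all(symbol in alphabet for symbol in s)


def in_anbn(s):
    """Membership test for the language {a^n b^n | n > 0}."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n


# A finite language can simply be listed as a set of strings.
FINITE_LANGUAGE = {"aa", "bb", "cc", "aaaa", "bbbb"}

print(is_string_over("aababc"))
print(in_anbn("aabb"), in_anbn("aab"))
print("bb" in FINITE_LANGUAGE)
```

An infinite language such as {a^n b^n | n > 0} cannot be enumerated, which is why it is given here as a test rather than a set.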

  24. Regular Language

  25. Kleene Closure • L^2 = L · L • L^n = L^(n-1) · L • L* = {ε} ∪ L^1 ∪ L^2 ∪ …

  26. Kleene Closure • L^2 = L · L • L^n = L^(n-1) · L • L* = {ε} ∪ L^1 ∪ L^2 ∪ … • E.g. • L = {a, b} • L^2 = {aa, ab, bb, ba} • L* = {ε, a, b, aa, ab, ba, bb, aaa, …, aaaa, …, abab, …} (infinite)
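The definitions of L^n and L* can be checked on the example L = {a, b} with a short sketch. Because L* is infinite, the code only builds the union up to a chosen power; the function names (`concat`, `power`, `kleene_star_up_to`) are invented for this illustration.

```python
def concat(l1, l2):
    """Concatenation of two languages: every x from l1 followed by every y from l2."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """L^n, with L^0 = {ε} (the empty string is written "")."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def kleene_star_up_to(lang, max_power):
    """Finite approximation of L*: the union of L^0 through L^max_power."""
    result = set()
    for n in range(max_power + 1):
        result |= power(lang, n)
    return result


L = {"a", "b"}
print(sorted(power(L, 2)))              # ['aa', 'ab', 'ba', 'bb']
print("abab" in kleene_star_up_to(L, 4))
```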

  27. Regular Languages • Closed under • Concatenation: · • Union/Disjunction: ∪ • Kleene star: *

  28. Regular Languages • Closed under • Concatenation: · • Union/Disjunction: ∪ • Kleene star: * • Also • Intersection: If L1 and L2 are R.L.s, then L1 ∩ L2 is R.L. • Difference: If L1 and L2 are R.L.s, then L1 − L2 is R.L. • Complementation: If L1 is R.L., then Σ* − L1 is R.L. • Reversal
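One way to see why complementation preserves regularity: given a deterministic automaton with a transition for every state and symbol, swapping final and non-final states yields an automaton for Σ* − L1. The sketch below assumes a made-up two-state automaton over {a, b} that accepts strings with an even number of a's; the automaton and all names are illustrative, not taken from the slides.

```python
# Hypothetical complete DFA over {a, b}: tracks the parity of the number of a's.
STATES = {"even", "odd"}
START = "even"
TRANSITIONS = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def accepts(final_states, string):
    """Run the DFA on a string over {a, b} and test the state it ends in."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in final_states


EVEN_A = {"even"}            # final states for L1: even number of a's
COMPLEMENT = STATES - EVEN_A  # final states for Σ* − L1: odd number of a's

for s in ["", "ab", "abab", "baa"]:
    print(repr(s), accepts(EVEN_A, s), accepts(COMPLEMENT, s))
```

Every string is accepted by exactly one of the two automata, which is what complementation requires.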

  29. Regular Languages? • Any finite set of strings? • {x x^R} • {a*b*} • {a^n b^n | n > 0} • {a^n b^n c^n | n > 0}

  30. Regular Expressions

  31. Regular Expressions (as a Formal Language) • ε is a regular expression • Every symbol a in Σ is a regular expression • If r1 and r2 are regular expressions, then • r1 r2 is a regular expression, • r1 | r2 is a regular expression, • and r1* is a regular expression

  32. Basic Regular Expressions • Examples: • ab*c • a (0|1) b • C? V N?, where C is consonant, V is vowel, N is nasal

  33. Basic Regular Expressions • Examples: • ab*c • a (0|1) b • C? V N?, where C is consonant, V is vowel, N is nasal • Others: • a+: 1 or more • a?: 0 or 1 • . : wildcard (any single character) • [0123]: disjunction • [^0123]: negated disjunction (any character except 0, 1, 2, 3)
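The operators listed above can be tried out with Python's standard `re` module. In this sketch `re.fullmatch` is used so that the whole string must match the pattern. The character classes standing in for C, V, and N are small made-up samples, not a real phoneme inventory.

```python
import re


def matches(pattern, string):
    """True if the entire string matches the pattern."""
    return re.fullmatch(pattern, string) is not None


print(matches(r"ab*c", "ac"), matches(r"ab*c", "abbbc"))    # b* allows zero or more b's
print(matches(r"a(0|1)b", "a1b"), matches(r"a(0|1)b", "a2b"))
print(matches(r"a+", ""), matches(r"a?", ""))               # + needs at least one a; ? allows none
print(matches(r"[0123]", "2"), matches(r"[^0123]", "2"))

# C? V N? with assumed sample classes: C = [ptk], V = [aeiou], N = [mn]
SYLLABLE = r"[ptk]?[aeiou][mn]?"
print(matches(SYLLABLE, "tan"), matches(SYLLABLE, "a"), matches(SYLLABLE, "tt"))
```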

  34. More Complex RegEx

  35. More Complex RegEx

  36. More Complex RegEx

  37. More Complex RegEx

  38. More Complex RegEx

  39. More Complex RegEx

  40. More Complex RegEx • Examples: • \d+ dollars matches "10 dollars", "105 dollars", etc. • Escape: • \ turns off the special meaning of the following character • \\ matches a literal backslash (backslashitis)
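A short sketch of the `\d+ dollars` example and of escaping, using Python's `re` module. The sample sentences are invented for the illustration. Raw strings (`r"..."`) are used so that each backslash reaches the regex engine unchanged, which is one common way to limit backslashitis in Python.

```python
import re

text = "It costs 10 dollars, not 105 dollars."
print(re.findall(r"\d+ dollars", text))  # ['10 dollars', '105 dollars']

# Escaping: \$ and \. match a literal dollar sign and a literal period.
print(re.findall(r"\$\d+\.\d\d", "Price: $4.99"))

# Without a raw string, the backslash itself must be doubled.
print("\\d+" == r"\d+")  # True
```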

  41. Searching for ‘the’ • Idea: • /the/

  42. Searching for ‘the’ • Idea: • /the/ • Idea 2: • /[Tt]he/

  43. Searching for ‘the’ • Idea: • /the/ • Idea 2: • /[Tt]he/ • Idea 3: • /\b[Tt]he\b/

  44. Searching for ‘the’ • Idea: • /the/ • Idea 2: • /[Tt]he/ • Idea 3: • /\b[Tt]he\b/ • Balancing: • Improving coverage (lower miss rate, aka Type 2 error)

  45. Searching for ‘the’ • Idea: • /the/ • Idea 2: • /[Tt]he/ • Idea 3: • /\b[Tt]he\b/ • Balancing: • Improving coverage (lower miss rate, aka Type 2 error) • Improving precision (lower false-alarm rate, aka Type 1 error)
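The three ideas can be compared on a small made-up sentence using Python's `re.findall`. The first pattern misses the capitalized "The" (a miss) and also fires inside "other", "there", and "then" (false alarms); the second fixes the miss; the third removes the false alarms.

```python
import re

text = "The other day the cat sat there; then the end."

idea_1 = re.findall(r"the", text)
idea_2 = re.findall(r"[Tt]he", text)
idea_3 = re.findall(r"\b[Tt]he\b", text)

print(len(idea_1))  # 5
print(len(idea_2))  # 6
print(idea_3)       # ['The', 'the', 'the']
```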

  46. Equivalences • Every regular language can be obtained from a regular expression. • Every regular expression can be associated with a regular language.

  47. Formal Grammars

  48. Representation: Formal Grammars • A formal grammar is a concise description of a formal language

  49. Representation: Formal Grammars • A formal grammar is a concise description of a formal language • Grammars: 4-tuple • A set of terminal symbols: Σ

  50. Representation: Formal Grammars • A formal grammar is a concise description of a formal language • Grammars: 4-tuple • A set of terminal symbols: Σ • A set of non-terminal symbols: N
