
Regular Languages Regular Expressions Finite-State Automata


Presentation Transcript


  1. Regular Languages, Regular Expressions, Finite-State Automata • Torbjörn Lager, Stockholm University

  2. Languages • Sets of strings • Example: {“ac” “abc” “abbc” “abbbc” ...} FST - Torbjörn Lager, UU

  3. Finite-State Automata • Directed graphs consisting of states, and arcs labelled with symbols. For example: [Figure: an automaton with states 0, 1 and 2, and arcs labelled a, b and c]

  4. Regular expressions • Example: [a b* c] • Note: • Atomic symbols (e.g. ‘a’) denote languages (e.g. {“a”}) • Operations (here: concatenation and Kleene star) are operations over languages

  5. Regular expressions, regular languages and automata • Regular expressions denote regular languages • Regular expressions compile into finite-state automata • Finite-state automata generate/accept regular languages

  6. Regular expressions, regular languages and automata • [a b* c] denotes {“ac” “abc” “abbc” ...} • [a b* c] compiles into an automaton with states 0, 1 and 2, and arcs labelled a, b and c • The automaton generates/accepts {“ac” “abc” “abbc” ...}
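
A minimal sketch of how such an automaton could be run in Python. The arc layout (0 to 1 on a, a loop on state 1 for b, 1 to 2 on c) is an assumption read off the expression [a b* c]; the names `TRANSITIONS`, `START`, `FINAL` and `accepts` are illustrative only.

```python
# Assumed arcs for [a b* c]: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2 (state 2 is final).
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START = 0
FINAL = {2}


def accepts(string):
    """Return True if the automaton is in a final state after reading string."""
    state = START
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no arc for this symbol from this state: reject
        state = TRANSITIONS[key]
    return state in FINAL


for candidate in ["ac", "abc", "abbc", "ab", ""]:
    print(repr(candidate), accepts(candidate))
```

Storing the arcs in a dictionary keyed by (state, symbol) makes the automaton deterministic by construction, since each key can hold only one target state.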

  7. Regular expression operators • Concatenation: A B • Union: A | B • Iteration (Kleene star): A* • Difference: A - B • Intersection: A & B • Grouping of expressions: [A]

  8. Examples • a b denotes {“ab”} • a [b|c] denotes {“ab” “ac”} • a* [b|c]* denotes {“” “a” “ab” ... } • [a|b] & [b|c] denotes {“b”} • [a|b] - b denotes {“a”} • [a|b] - [b|a] denotes {}
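
The examples above can be reproduced with ordinary Python sets, as a rough sketch. Infinite languages are cut off at a hypothetical `MAX_LEN`, so this only approximates the starred expressions; `concat` and `star` are illustrative helpers, not part of any finite-state toolkit.

```python
MAX_LEN = 3  # hypothetical cut-off so that starred languages stay finite


def concat(A, B):
    """Concatenation of two languages, keeping only strings up to MAX_LEN."""
    return {x + y for x in A for y in B if len(x + y) <= MAX_LEN}


def star(A):
    """Kleene star of A, restricted to strings up to MAX_LEN."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = concat(frontier, A) - result  # only genuinely new strings
        result |= frontier
    return result


a, b, c = {"a"}, {"b"}, {"c"}

print(concat(a, b))                       # a b
print(sorted(concat(a, b | c)))           # a [b|c]
print((a | b) & (b | c))                  # [a|b] & [b|c]
print((a | b) - b)                        # [a|b] - b
print((a | b) - (b | a))                  # [a|b] - [b|a]
print(sorted(concat(star(a), star(b | c)), key=lambda s: (len(s), s)))  # a* [b|c]*
```

Union, intersection and difference map directly onto the built-in set operators; only concatenation and iteration need helper functions.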

  9. Special symbols • ? : the any symbol • ?* : the universal language • [] : the empty-string language • 0 (or “”) : the empty string (epsilon)

  10. Regular expression operators • Optionality: (A) • Kleene plus: A+ • Complement: ~A • Containment: $A • Restriction: A => B _ C

  11. Examples • a (b) c denotes {“ac” “abc”} • a b+ c denotes {“abc” “abbc” “abbbc” ...} • ~[a b c] denotes {“” ... “a” ... “ab” ... “abca” ...} • $[a|b] denotes {“a” ... “abba” ... “abcd” ...} • b => a _ c denotes {“” ... “a” ... “ccc” ... “abc” ...}
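
As a hedged illustration, the five examples above can be phrased as membership tests in Python. The translations into Python's `re` syntax are this sketch's own, and the function names are hypothetical. Restriction is read here as "every b is immediately preceded by a and immediately followed by c".

```python
import re


def in_optional(s):
    """a (b) c : an optional b between a and c."""
    return re.fullmatch(r"ab?c", s) is not None


def in_plus(s):
    """a b+ c : one or more b between a and c."""
    return re.fullmatch(r"ab+c", s) is not None


def in_complement(s):
    """~[a b c] : every string except "abc"."""
    return s != "abc"


def in_contains(s):
    """$[a|b] : every string containing an a or a b somewhere."""
    return re.search(r"[ab]", s) is not None


def in_restriction(s):
    """b => a _ c : each b has an a directly before it and a c directly after it."""
    return all(
        i > 0 and s[i - 1] == "a" and i + 1 < len(s) and s[i + 1] == "c"
        for i, ch in enumerate(s)
        if ch == "b"
    )


for word in ["", "a", "ccc", "abc", "ab", "b"]:
    print(repr(word), in_restriction(word))
```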

  12. Component technologies in FST • Word lists and lexica • Tokenisers • Morphological analysers • Part-of-speech taggers • Parsers

  13. Applications of FST • Named-entity recognition • Information extraction • Corpus linguistics • Spelling and grammar checking • Speech processing applications

  14. The Xerox Finite-State Tool • Compiles extended regular expressions into finite-state machines (automata and transducers) • Allows the user to display, examine and modify the machines

  15. Non-deterministic FSAs • At least one state has more than one transition leading from it labelled with the same symbol [Figure: an automaton with states 0, 1 and 2, with two arcs labelled b and one arc labelled a]

  16. Determinization of FSAs • Any non-deterministic FSA can be transformed into an equivalent deterministic FSA. • Example: [Figure: a non-deterministic automaton and an equivalent deterministic one, each with states 0, 1 and 2 and arcs labelled a and b] • Determinize for efficiency!
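
One standard way to determinize is the subset construction, sketched below under some assumptions: the NFA has no epsilon arcs, and the example machine (two arcs labelled b leaving state 0) is hypothetical rather than the one in the slide's figure.

```python
from collections import deque


def determinize(nfa, start, finals):
    """Subset construction. nfa maps (state, symbol) to a set of target states."""
    alphabet = {symbol for (_, symbol) in nfa}
    start_set = frozenset([start])
    dfa, dfa_finals = {}, set()
    queue, seen = deque([start_set]), {start_set}
    while queue:
        current = queue.popleft()
        if current & finals:  # a subset is final if it contains a final NFA state
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(t for s in current for t in nfa.get((s, symbol), ()))
            if not target:
                continue  # no arc on this symbol: leave the DFA partial
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, start_set, dfa_finals


# Hypothetical NFA for b+ a: state 0 has two arcs labelled b (to 0 and to 1).
NFA = {(0, "b"): {0, 1}, (1, "a"): {2}}
DFA, DFA_START, DFA_FINALS = determinize(NFA, 0, {2})


def accepts(string):
    """Run the deterministic machine: one lookup per input symbol."""
    state = DFA_START
    for symbol in string:
        state = DFA.get((state, symbol))
        if state is None:
            return False
    return state in DFA_FINALS


print(accepts("bba"), accepts("b"))
```

Each DFA state is a frozenset of NFA states, so recognition needs only one dictionary lookup per symbol instead of tracking several possible paths.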

  17. Minimization of FSAs • Any (deterministic) FSA can be transformed into an equivalent FSA that has a minimal number of states. • Minimize for space!
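
A rough sketch of minimization by partition refinement (in the style of Moore's algorithm), assuming every state is reachable and treating missing arcs as leading to an implicit dead state. The example machine, a small trie for the hypothetical word list {"cat", "rat"}, is an illustration only.

```python
def minimize(states, alphabet, delta, start, finals):
    """Merge states that cannot be told apart by any continuation of the input."""
    block_of = {s: (s in finals) for s in states}  # first split: final vs. non-final
    symbols = sorted(alphabet)
    while True:
        ids, refined = {}, {}
        for s in states:
            # Signature: own block plus the block reached on each symbol (None = dead).
            signature = (block_of[s],) + tuple(
                block_of.get(delta.get((s, a))) for a in symbols
            )
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block_of.values())):
            break  # no block was split: the partition is stable
        block_of = refined
    min_delta = {(block_of[s], a): block_of[t] for (s, a), t in delta.items()}
    return min_delta, block_of[start], {block_of[s] for s in finals}


# Trie for the hypothetical word list {"cat", "rat"}: 7 states before minimization.
STATES = [0, 1, 2, 3, 4, 5, 6]
DELTA = {
    (0, "c"): 1, (1, "a"): 2, (2, "t"): 3,
    (0, "r"): 4, (4, "a"): 5, (5, "t"): 6,
}
MIN_DELTA, MIN_START, MIN_FINALS = minimize(STATES, {"a", "c", "r", "t"}, DELTA, 0, {3, 6})
MIN_STATES = {s for (s, _) in MIN_DELTA} | set(MIN_DELTA.values())
print(len(STATES), "states before,", len(MIN_STATES), "after")
```

The two branches share the suffix "at", so their states become indistinguishable and collapse into one path.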

  18. Representing word lists • Think of a word list as a regular language • Use the calculus of regular expressions to query and update the word list • Determinize for speed! • Minimize for space!

  19. Various equivalences • (A) = A | [] • A+ = A A* • A+ = A* - [] (provided A does not contain the empty string) • A - B = A & ~B • ~A = ?* - A • $A = ?* A ?* • ~[A | B] = ~A & ~B • ~[A & B] = ~A | ~B

  20. Various equivalences • A - A = ~[?*] • A | ~[?*] = A • A [] = A • A ~[?*] = ~[?*] • A & ?* = A • A | ?* = ?*
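
Several of these equivalences can be spot-checked on finite approximations, as in the sketch below. The alphabet `SIGMA`, the bound `MAX_LEN` and the sample languages are all hypothetical, and a bounded check like this is evidence rather than proof. It also shows that A+ = A* - [] relies on A not containing the empty string.

```python
from itertools import product

SIGMA = "ab"   # hypothetical two-letter alphabet
MAX_LEN = 3    # hypothetical length bound
# Stand-in for ?* : every string over SIGMA up to MAX_LEN.
UNIVERSE = {"".join(p) for n in range(MAX_LEN + 1) for p in product(SIGMA, repeat=n)}
EMPTY_STRING = {""}  # stand-in for []


def comp(A):
    return UNIVERSE - A


def concat(A, B):
    return {x + y for x in A for y in B if len(x + y) <= MAX_LEN}


def star(A):
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A) - result
        result |= frontier
    return result


def plus(A):
    return concat(A, star(A))  # A+ = A A*


A, B = {"a", "ab"}, {"ab", "b"}
C = {"", "a"}  # contains the empty string

checks = {
    "A - B = A & ~B": A - B == A & comp(B),
    "~[A | B] = ~A & ~B": comp(A | B) == comp(A) & comp(B),
    "~[A & B] = ~A | ~B": comp(A & B) == comp(A) | comp(B),
    "A - A = ~[?*]": A - A == comp(UNIVERSE),
    "A [] = A": concat(A, EMPTY_STRING) == A,
    "A ~[?*] = ~[?*]": concat(A, comp(UNIVERSE)) == comp(UNIVERSE),
    "A+ = A* - [] (A without empty string)": plus(A) == star(A) - EMPTY_STRING,
    "A+ = A* - [] (C with empty string)": plus(C) == star(C) - EMPTY_STRING,
}
for name, holds in checks.items():
    print(name, holds)
```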

  21. Important theoretical results • Kleene’s theorem (concerning FSAs) • Closure properties of regular languages and regular relations • Decidability

  22. FSAs and regular expressions • Kleene’s theorem: Any language recognised by an FSA is denoted by a regular expression, and any language denoted by a regular expression can be recognised by an FSA.

  23. Regex to FSA to regex [Figure: cycle of conversions: Regular expressions → Nondeterministic FSA with epsilon transitions → Nondeterministic FSA → Deterministic FSA → Regular expressions. Picture adapted from Hopcroft & Ullman 1979]

  24. From regular expressions to finite-state automata • The only really necessary operators: • Disjunction • Concatenation • Iteration • Sidenote: Compare regular grammars: • A --> x B • A --> x (where A and B are nonterminals, and where x is a sequence of terminals)
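
To illustrate why three operators suffice, here is a sketch of a Thompson-style construction that compiles disjunction, concatenation and iteration into an FSA with epsilon transitions. The tuple encoding of expressions ("sym", "cat", "alt", "star") and all function names are assumptions of this sketch, not a description of how any particular tool compiles expressions.

```python
from itertools import count


def compile_regex(node, arcs, fresh):
    """Add arcs for node to arcs (label None = epsilon); return (start, end) states."""
    kind = node[0]
    start, end = next(fresh), next(fresh)
    if kind == "sym":
        arcs.append((start, node[1], end))
    elif kind == "cat":
        s1, e1 = compile_regex(node[1], arcs, fresh)
        s2, e2 = compile_regex(node[2], arcs, fresh)
        arcs += [(start, None, s1), (e1, None, s2), (e2, None, end)]
    elif kind == "alt":
        s1, e1 = compile_regex(node[1], arcs, fresh)
        s2, e2 = compile_regex(node[2], arcs, fresh)
        arcs += [(start, None, s1), (start, None, s2), (e1, None, end), (e2, None, end)]
    elif kind == "star":
        s1, e1 = compile_regex(node[1], arcs, fresh)
        arcs += [(start, None, end), (start, None, s1), (e1, None, s1), (e1, None, end)]
    else:
        raise ValueError(f"unknown operator: {kind}")
    return start, end


def closure(states, arcs):
    """All states reachable from states using epsilon arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for p, label, q in arcs:
            if p == s and label is None and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(regex, string):
    """Compile regex, then simulate the epsilon-NFA on string."""
    arcs = []
    start, end = compile_regex(regex, arcs, count())
    current = closure({start}, arcs)
    for symbol in string:
        moved = {q for p, label, q in arcs if p in current and label == symbol}
        current = closure(moved, arcs)
    return end in current


# [a b* c] in the tuple encoding assumed above.
A_BSTAR_C = ("cat", ("sym", "a"), ("cat", ("star", ("sym", "b")), ("sym", "c")))
print(accepts(A_BSTAR_C, "abbc"), accepts(A_BSTAR_C, "ab"))
```

Each operator contributes a fixed pattern of epsilon arcs around the fragments for its operands, so the compiler is a straightforward recursion over the expression.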

  25. Closure properties of regular languages • A set is said to be closed under an operation iff applying the operation to members of the set will never take us outside the set • Example: if A and B are regular languages, then [A|B] is always regular. Therefore regular languages are closed under union.
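
Closure under union can be made concrete with the product construction, sketched here for deterministic automata. The dictionary layout of a machine, the use of `None` as an implicit dead state, and the two example machines are assumptions of this sketch.

```python
def union_dfa(m1, m2, alphabet):
    """Product construction: run both DFAs in lockstep; accept if either accepts."""
    start = (m1["start"], m2["start"])
    delta, finals = {}, set()
    todo, seen = [start], {start}
    while todo:
        pair = todo.pop()
        p, q = pair
        if p in m1["finals"] or q in m2["finals"]:
            finals.add(pair)
        for a in alphabet:
            # A missing arc sends that component to the dead state None.
            nxt = (m1["delta"].get((p, a)), m2["delta"].get((q, a)))
            delta[(pair, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {"delta": delta, "start": start, "finals": finals}


def accepts(machine, string):
    state = machine["start"]
    for a in string:
        state = machine["delta"].get((state, a))
    return state in machine["finals"]


# Hypothetical machines: M1 for [a b* c], M2 for the single string "ab".
M1 = {"delta": {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}, "start": 0, "finals": {2}}
M2 = {"delta": {(0, "a"): 1, (1, "b"): 2}, "start": 0, "finals": {2}}
UNION = union_dfa(M1, M2, "abc")

for word in ["ac", "abc", "ab", "a"]:
    print(word, accepts(UNION, word))
```

Because the result is again a finite automaton, the union of the two languages is regular. Changing `or` to `and` in the finality test would give intersection instead.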

  26. Decidability • Given one automaton A: • Is the string S a string in L(A)? • Does L(A) contain any strings at all? • Is L(A) equivalent to ?*? • Given two automata A1 and A2: • Is L(A1) a subset of L(A2)? • Are L(A1) and L(A2) equivalent? • Do L(A1) and L(A2) overlap?
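
Most of these questions reduce to a reachability search over pairs of states, as the following sketch suggests. It assumes deterministic automata stored as dictionaries, with `None` standing for an implicit dead state; the example machines and function names are hypothetical, and universality is judged relative to the given alphabet.

```python
def reachable_pairs(m1, m2, alphabet):
    """State pairs reachable when both DFAs read the same input (None = dead state)."""
    start = (m1["start"], m2["start"])
    seen, todo = {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            nxt = (m1["delta"].get((p, a)), m2["delta"].get((q, a)))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def is_empty(m, alphabet):
    return not any(p in m["finals"] for p, _ in reachable_pairs(m, m, alphabet))


def is_universal(m, alphabet):
    return all(p in m["finals"] for p, _ in reachable_pairs(m, m, alphabet))


def is_subset(m1, m2, alphabet):
    # L(m1) is a subset of L(m2) unless some input is accepted by m1 but not by m2.
    return not any(
        p in m1["finals"] and q not in m2["finals"]
        for p, q in reachable_pairs(m1, m2, alphabet)
    )


def is_equivalent(m1, m2, alphabet):
    return is_subset(m1, m2, alphabet) and is_subset(m2, m1, alphabet)


def overlaps(m1, m2, alphabet):
    return any(
        p in m1["finals"] and q in m2["finals"]
        for p, q in reachable_pairs(m1, m2, alphabet)
    )


# Hypothetical machines: A1 for [a b+ c], A2 for [a b* c].
A1 = {"delta": {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2, (2, "c"): 3}, "start": 0, "finals": {3}}
A2 = {"delta": {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}, "start": 0, "finals": {2}}

print(is_subset(A1, A2, "abc"), is_subset(A2, A1, "abc"), overlaps(A1, A2, "abc"))
```

Every search visits at most the product of the two state counts, so each question is answered in finite time.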
