
CSA3050: NL Algorithms

CSA3050: NL Algorithms. Introduction to English Morphology and Finite State Transducers. Acknowledgement: for further details see Jurafsky & Martin, Ch. 3. Morphology is the study of how word-parts combine to form word wholes.


Presentation Transcript


  1. CSA3050: NL Algorithms Introduction to English Morphology Finite State Transducers CSA3050: NLP Algorithms

  2. Acknowledgement
     For further details see Jurafsky & Martin, Ch. 3.

  3. Morphology
     • Morphology is the study of how word-parts combine to form word wholes.
     • Several different dimensions:
       • Orthographic: rules for combining strings of characters together.
       • Syntactic: effect on syntactic category.
       • Semantic: effect on meaning.

  4. Examples of Morphological Processes
     • Affixation
       • prefix
       • suffix
       • circumfix: German ge + stem + t, e.g. sagen, gesagt
       • infix: un-bloody-likely
     • Vowel change: swim/swam
     • Consonant change: send/sent

  5. Inflectional/Derivational Morphology
     Inflectional
     • +s plural, +ed past
     • category preserving
     • productive: always applies (esp. new words, e.g. fax)
     • systematic: same semantic effect
     Derivational
     • +ment
     • category changing: escape+ment
     • not completely productive: detractment*
     • not completely systematic: catchment

  6. English Inflectional Morphology
     • Applies to nouns, verbs and adjectives only
     • Number of inflections relatively small
     • Nouns
       • Plural, Possessive
     • Verbs
       • Verb forms
     • Adjectives
       • Comparison

  7. Noun Inflections

  8. Regular Verb Inflections

  9. Irregular Verb Inflections

  10. Morphological Parsing
      Input word: cats → Morphological Parser → Output analysis: cat + PL
      • Output is a string of morphemes
      • Reversibility?

  11. Morphological Parsing: Examples

  12. Morphemes
      • The morpheme is a theoretical construct ...
      • but has a practical use
      • Choice of morpheme vocabulary: theoretical and practical motivation
      • Distinction between an underlying morpheme and its realisation.
      • A string of morphemes could be turned into another representation later

  13. Morphological Parsing Requires
      • Lexicon: list of stems and affixes + related information (e.g. syntactic category)
      • Morphotactics: a model of ordering constraints over morphemes (e.g. the fact that +s comes after the stem, not before).
      • Correspondences between input and output strings
      • Spelling rules: city + s → cities
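
The spelling-rule requirement above can be illustrated with a minimal sketch. The function name `attach_plural` and the choice of rules (consonant + y becomes ies; sibilant endings take es) are assumptions made for illustration; they cover the slide's city + s → cities example but are not a full account of English orthography.

```python
import re

def attach_plural(stem: str) -> str:
    """Join a noun stem and the plural morpheme +s, applying two spelling rules."""
    # y -> ie before +s when the y follows a consonant: city + s -> cities
    if re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ies"
    # e-insertion after sibilant endings: fox + s -> foxes
    if re.search(r"(s|x|z|ch|sh)$", stem):
        return stem + "es"
    # default: plain concatenation, cat + s -> cats
    return stem + "s"

print(attach_plural("city"))  # cities
```

Irregular nouns are deliberately left out; in the setup described on this slide they would be handled through the lexicon rather than by a spelling rule.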

  14. Lexicon
      • Lexicon is generally divided into sublexicons
        • Stem Lexicon
          • Noun Stems
          • Verb Stems
          • etc.
        • Suffix Lexicon
        • Prefix Lexicon
      • Can all be represented as FSAs

  15. FSA for Sublexicon Fragment
      [Figure: letter-labelled FSA; arc labels o, t, h, e, s, a, e, i, t, s]
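
One way to represent a sublexicon as an FSA is a letter-by-letter trie, sketched below. The word list is hypothetical (the words in the slide's figure are not recoverable), and `build_fsa` and `accepts` are illustrative names, not part of the course material.

```python
def build_fsa(words):
    """Build a letter-by-letter FSA (a trie) accepting exactly the given words."""
    transitions = {}   # (state, letter) -> next state
    finals = set()     # accepting states
    next_state = 1     # state 0 is the initial state
    for word in words:
        state = 0
        for letter in word:
            key = (state, letter)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        finals.add(state)
    return transitions, finals

def accepts(fsa, word):
    """Follow one arc per letter; accept only if we end in a final state."""
    transitions, finals = fsa
    state = 0
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:
            return False
    return state in finals

# Hypothetical noun-stem sublexicon.
noun_stems = build_fsa(["cat", "city", "fox"])
print(accepts(noun_stems, "city"))  # True
print(accepts(noun_stems, "cit"))   # False
```

Words sharing a prefix (here "cat" and "city") share the initial arcs, which is what keeps such a lexicon compact.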

  16. FSA for Morphotactics for Noun Inflection
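
The figure for this slide is not part of the transcript, so the following is only a sketch of what a noun-inflection morphotactics FSA might look like. The state numbers and the morpheme-class labels (`reg-noun`, `irreg-sg-noun`, `irreg-pl-noun`, `plural-s`) are assumptions in the style of Jurafsky & Martin, not a reconstruction of the slide.

```python
# Arcs are labelled with morpheme classes rather than letters.
# (state, morpheme class) -> next state; all labels here are assumed.
MORPHOTACTICS = {
    (0, "reg-noun"): 1,
    (0, "irreg-sg-noun"): 2,
    (0, "irreg-pl-noun"): 2,
    (1, "plural-s"): 2,
}
FINALS = {1, 2}

def well_ordered(classes):
    """True if the sequence of morpheme classes is a legal noun form."""
    state = 0
    for morpheme_class in classes:
        state = MORPHOTACTICS.get((state, morpheme_class))
        if state is None:
            return False
    return state in FINALS

print(well_ordered(["reg-noun", "plural-s"]))  # True
print(well_ordered(["plural-s", "reg-noun"]))  # False
```

The second call fails because no arc leaves the initial state on `plural-s`, which encodes the constraint that +s comes after the stem, not before.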

  17. Morphotactics for Verb Inflection

  18. Input/Output Correspondences
      • Problem: how to specify the correspondence between input word and output analysis.
      • Given: both input and output are strings.
      • Two-level morphology (Koskenniemi 1983) proposes
        • Surface Tape (words)
        • Lexical Tape (concatenation of morphemes)

  19. 2 Level Model
      The automaton used to perform the mapping between these levels is the finite state transducer (FST).

  20. Basic FS Transducer
      • Each transition of a transducer is labelled with a pair of symbols.
      • Input symbols are matched against the lower-side symbols on transitions.
      • If analysis succeeds, return the string of upper-side symbols.
      [Figure: a transition labelled with an output symbol and an input symbol]

  21. Morphological Analysis
      [Figure: upper side C A T +N +PL; lower side C A T S, with one ε]
      { ("CATS", "CAT+N+PL"), ("CAT", "CAT+N+SG") }
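
The analysis procedure (match input against lower-side symbols, return the upper-side string) can be sketched for the two pairs in this relation. The state numbering, the names `TRANSITIONS` and `analyse`, and the alignment of +N with ε on the lower side are assumptions; the slide's figure does not survive well enough to confirm the exact alignment.

```python
EPS = ""  # the empty string stands for epsilon

# state -> list of (lower/input symbol, upper/output symbol, next state)
TRANSITIONS = {
    0: [("C", "C", 1)],
    1: [("A", "A", 2)],
    2: [("T", "T", 3)],
    3: [(EPS, "+N", 4)],                     # assumed alignment: +N over epsilon
    4: [("S", "+PL", 5), (EPS, "+SG", 5)],
}
FINAL_STATES = {5}

def analyse(surface, state=0, output=""):
    """Return every upper-side string paired with the given surface string."""
    results = []
    if not surface and state in FINAL_STATES:
        results.append(output)
    for lower, upper, nxt in TRANSITIONS.get(state, []):
        if lower == EPS:
            # epsilon on the lower side consumes no input
            results += analyse(surface, nxt, output + upper)
        elif surface.startswith(lower):
            results += analyse(surface[len(lower):], nxt, output + upper)
    return results

print(analyse("CATS"))  # ['CAT+N+PL']
print(analyse("CAT"))   # ['CAT+N+SG']
```

The search is a plain depth-first walk and assumes the machine has no ε-cycles; a machine with such cycles would need a visited-set to guarantee termination.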

  22. FST Formal Definition
      • States, initial state, final states: same as FSA
      • Alphabets I and O are input and output alphabets, not necessarily disjoint.
      • FST alphabet Σ ⊆ I × O
      • Transition function δ(q, i:o) defines the state q' that ensues when the machine is in state q and encounters complex symbol i:o.
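
Read this way, an FST is an FSA whose alphabet consists of complex symbols i:o. The sketch below stores the components of the definition directly and checks whether a sequence of complex symbols is accepted. The class name `FST`, its field names, and the lowercase cat/+PL machine are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class FST:
    states: frozenset
    initial: int
    finals: frozenset
    sigma: frozenset   # complex symbols (i, o), a subset of I x O
    delta: dict        # (q, (i, o)) -> q'

    def accepts(self, pairs):
        """True if the sequence of complex symbols i:o ends in a final state."""
        q = self.initial
        for pair in pairs:
            if pair not in self.sigma or (q, pair) not in self.delta:
                return False
            q = self.delta[(q, pair)]
        return q in self.finals

# Hypothetical machine accepting exactly c:c a:a t:t s:+PL.
path = [("c", "c"), ("a", "a"), ("t", "t"), ("s", "+PL")]
cat_plural = FST(
    states=frozenset(range(5)),
    initial=0,
    finals=frozenset({4}),
    sigma=frozenset(path),
    delta={(q, pair): q + 1 for q, pair in enumerate(path)},
)
print(cat_plural.accepts(path))      # True
print(cat_plural.accepts(path[:3]))  # False
```

Because δ is keyed on whole pairs, the same machine can be read in either direction: project the first components for one tape and the second components for the other.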

  23. FST Alphabet Example
      I × O (Σ is drawn inside this table of pairs):

                O:  c     a     t     ε
      I:  a         a:c   a:a   a:t   a:ε
          c         c:c   c:a   c:t   c:ε
          '         ':c   ':a   ':t   ':ε
          t         t:c   t:a   t:t   t:ε
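
The table of I × O pairs can be generated mechanically, which makes the relationship between Σ and I × O concrete. The alphabets below are read off the slide; the particular choice of `sigma` is hypothetical, since the transcript does not show which pairs the slide places in Σ.

```python
from itertools import product

I = ["a", "c", "'", "t"]   # input alphabet (rows of the table)
O = ["c", "a", "t", "ε"]   # output alphabet (columns of the table)

# Every complex symbol i:o in I x O, in row-major order.
all_pairs = [f"{i}:{o}" for i, o in product(I, O)]

# Hypothetical Σ: identity pairs plus deletion of the apostrophe.
sigma = {"a:a", "c:c", "t:t", "':ε"}

print(len(all_pairs))           # 16
print(sigma <= set(all_pairs))  # True
```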

  24. Summary
      • Morphological processing can be handled by finite state machinery.
      • Finite State Transducers are formally very similar to Finite State Automata.
      • They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.
