CS621: Artificial Intelligence



  1. CS621: Artificial Intelligence Pushpak Bhattacharyya, CSE Dept., IIT Bombay Lecture 36,37 – Part of Speech Tagging and HMM 21st and 25th Oct, 2010 (forward, backward computation and Baum Welch Algorithm will be done later)

  2. Part of Speech Tagging • POS Tagging is a process that attaches to each word in a sentence a suitable grammatical tag (noun, verb etc.) from a given set of tags. • The set of tags is called the Tag-set. • Standard Tag-set: Penn Treebank (for English).
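As a quick illustration of Penn Treebank style tagging, the sketch below uses NLTK's off-the-shelf tagger. This is not part of the original slides; it assumes NLTK is installed and that the tokenizer and tagger resources have been downloaded.

```python
# A minimal sketch, assuming NLTK is installed and the 'punkt' and
# 'averaged_perceptron_tagger' resources have been downloaded via nltk.download().
import nltk

tokens = nltk.word_tokenize("The dog barked loudly at the red car")
print(nltk.pos_tag(tokens))
# Output has the form [('The', 'DT'), ('dog', 'NN'), ('barked', 'VBD'), ...],
# where the tags come from the Penn Treebank tag-set mentioned above.
```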

  3. POS: A kind of sequence labeling task • Other such tasks • Marking tags on genomic sequences • Training for predicting protein structure: labels are primary (P), secondary (S), tertiary (T) • Named entity labels • Washington_PLACE voted Washington_PERSON to power • पूजा_PERS ने पूजा के लिए फूल ख़रीदा (Puja bought flowers for worshipping) • Shallow parsing (noun phrase marking) • The_B little_I boy_I sprained his_B ring_I finger_I.

  4. POS Tags • NN – Noun; e.g. Dog_NN • VM – Main Verb; e.g. Run_VM • VAUX – Auxiliary Verb; e.g. Is_VAUX • JJ – Adjective; e.g. Red_JJ • PRP – Pronoun; e.g. You_PRP • NNP – Proper Noun; e.g. John_NNP • etc.

  5. POS Tag Ambiguity • In English: I bank1 with the bank2 on the river bank3. • Bank1 is a verb, the other two banks are nouns • {Aside: generator of humour (incongruity theory)}: A man returns to his parked car and finds the sticker “Parking fine”. He goes and thanks the policeman for appreciating his parking skill. fine_adverb vs. fine_noun

  6. For Hindi • Rama achhaa gaataa hai. (hai is VAUX: Auxiliary verb); Ram sings well • Rama achhaa ladakaa hai. (hai is VCOP: Copula verb); Ram is a good boy

  7. Process • List all possible tags for each word in the sentence. • Choose the best suitable tag sequence.

  8. Example • "People jump high". • People : Noun/Verb • jump : Noun/Verb • high : Noun/Verb/Adjective • We can start with probabilities.
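A small sketch of this process for the slide's example: enumerate every tag combination allowed by the per-word candidate lists, which is the search space over which probabilities later pick the best sequence. The candidate lists come from the slide; everything else is illustrative.

```python
from itertools import product

# Per-word tag candidates from the slide ("People jump high").
candidates = {
    "People": ["Noun", "Verb"],
    "jump":   ["Noun", "Verb"],
    "high":   ["Noun", "Verb", "Adjective"],
}

words = ["People", "jump", "high"]
for tags in product(*(candidates[w] for w in words)):
    print(list(zip(words, tags)))
# 2 x 2 x 3 = 12 candidate tag sequences; the tagger must choose the best one.
```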

  9. Challenge of POS tagging: Examples from Indian languages

  10. Tagging of jo, vaha, kaun and their inflected forms in Hindi and their equivalents in multiple languages

  11. DEM and PRON labels • Jo_DEM ladakaa kal aayaa thaa, vaha cricket acchhaa khelletaa hai • Jo_PRON kal aayaa thaa, vaha cricket acchhaa khelletaa hai

  12. Disambiguation rule-1 • If • Jo is followed by noun • Then • DEM • Else • …
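A minimal sketch of rule-1, under the simplifying assumption that we only inspect the POS tag of the word immediately after jo; the tag names and the fallback value are illustrative, not the actual rule set from the lecture.

```python
# Hypothetical sketch of Disambiguation rule-1: if "jo" is immediately
# followed by a noun, tag it DEM; otherwise fall through to other tests.
NOUN_TAGS = {"NN", "NNP"}   # illustrative noun tags

def tag_jo_rule1(next_tag):
    if next_tag in NOUN_TAGS:
        return "DEM"
    return None  # the "Else ..." branch on the slide

print(tag_jo_rule1("NN"))   # DEM
print(tag_jo_rule1("VM"))   # None -> needs further tests
```

Because this rule only looks at the immediately following word, it misses cases where material intervenes between jo and the noun, which is exactly the false negative shown on the next slide.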

  13. False Negative • When there is an arbitrary amount of text between the jo and the noun • Jo_??? bhaagtaa huaa, haftaa huaa, rotaa huaa, chennai academy a koching lene vaalaa ladakaa kal aayaa thaa, vaha cricket acchhaa khelletaa hai

  14. False Positive • Jo_DEM (wrong!) duniyadarii samajhkar chaltaa hai, … • Jo_DEM/PRON? manushya manushyoM ke biich ristoM naatoM ko samajhkar chaltaa hai, … (ambiguous)

  15. False Positive for Bengali • Je_DEM (wrong!) bhaalobaasaa paay, sei bhaalobaasaa dite paare (one who gets love can give love) • Je_DEM (right!) bhaalobaasaa tumi kalpanaa korchho, taa e jagat e sambhab nay (the love that you are imagining exists is impossible in this world)

  16. Will fail • In similar situations for • Jis, jin, vaha, us, un • All these forms add to the corpus count

  17. Disambiguation rule-2 • If • Jo is oblique (attached with ne, ko, se, etc.) • Then • It is PRON • Else • <other tests>
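A matching sketch of rule-2, assuming obliqueness is detected by a case marker immediately following jo; the marker list is illustrative and not exhaustive.

```python
# Hypothetical sketch of Disambiguation rule-2: "jo" (or an inflected form)
# followed by a case marker such as ne / ko / se is treated as oblique,
# hence PRON; otherwise fall through to other tests.
CASE_MARKERS = {"ne", "ko", "se"}

def tag_jo_rule2(next_word):
    if next_word in CASE_MARKERS:
        return "PRON"
    return None  # <other tests> on the slide

print(tag_jo_rule2("ne"))    # PRON
print(tag_jo_rule2("kal"))   # None -> needs further tests
```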

  18. Will fail (false positive) • In the case of languages that demand agreement between the jo-form and the noun it qualifies • E.g. Sanskrit • Yasya_PRON (wrong!) baalakasya aananam drshtyaa… (jis ladake kaa muha dekhkar) • Yasya_PRON (wrong!) kamaniyasya baalakasya aananam drshtyaa…

  19. Will also fail for • Rules that depend on whether the noun following jo/vaha/kaun or its form is oblique or not • Because the case marker can be far from the noun • <vaha or its form> ladakii jise piliyaa kii bimaarii ho gayii thii ko … • Needs discussion across languages

  20. Remark on DEM and PRON • DEM vs. PRON cannot be disambiguated IN GENERAL at the level of the POS tagger • i.e. cannot assume parsing, cannot assume semantics

  21. Mathematics of POS tagging

  22. Derivation of POS tagging formula • Best tag sequence = T* = argmax P(T|W) = argmax P(T) P(W|T) (by Bayes' Theorem) • P(T) = P(t0=^ t1 t2 … tn+1=.) = P(t0) P(t1|t0) P(t2|t1 t0) P(t3|t2 t1 t0) … P(tn|tn-1 tn-2 … t0) P(tn+1|tn tn-1 … t0) = P(t0) P(t1|t0) P(t2|t1) … P(tn|tn-1) P(tn+1|tn) = ∏ i=0..n+1 P(ti|ti-1) (Bigram Assumption)
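A small sketch of the bigram assumption as code: P(T) becomes a product of transition probabilities over adjacent tag pairs, with ^ and . as sentence boundary tags. The numbers in the table are made up for illustration only.

```python
# Sketch: P(T) under the bigram assumption, as a product of P(t_i | t_{i-1}).
def prob_tag_sequence(tags, trans):
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= trans.get((prev, cur), 0.0)
    return p

# Illustrative (made-up) transition probabilities.
trans = {("^", "N"): 0.6, ("N", "V"): 0.4, ("V", "R"): 0.3, ("R", "."): 0.5}
print(prob_tag_sequence(["^", "N", "V", "R", "."], trans))  # 0.6 * 0.4 * 0.3 * 0.5
```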

  23. Lexical Probability Assumption • P(W|T) = P(w0|t0…tn+1) P(w1|w0, t0…tn+1) P(w2|w1 w0, t0…tn+1) … P(wn|w0…wn-1, t0…tn+1) P(wn+1|w0…wn, t0…tn+1) • Assumption: A word is determined completely by its tag. This is inspired by speech recognition. • = P(w0|t0) P(w1|t1) … P(wn+1|tn+1) = ∏ i=0..n+1 P(wi|ti) (Lexical Probability Assumption)
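Continuing the same sketch, the lexical probability assumption turns P(W|T) into a product of per-position P(wi|ti), and the score of a candidate tag sequence is simply P(T) * P(W|T); all probability values remain illustrative.

```python
# Sketch: P(W|T) under the lexical probability assumption, and the combined
# score P(T) * P(W|T) used to compare candidate tag sequences.
def prob_words_given_tags(words, tags, lex):
    p = 1.0
    for w, t in zip(words, tags):
        p *= lex.get((w, t), 0.0)
    return p

def score(words, tags, trans, lex):
    # prob_tag_sequence is the function from the previous sketch.
    return prob_tag_sequence(tags, trans) * prob_words_given_tags(words, tags, lex)
```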

  24. Generative Model • ^_^ People_N Jump_V High_R ._. • [Diagram: each tag state (^ N V R .) emits its word via lexical probabilities; adjacent tag states are linked by bigram probabilities, with alternative tags (N, V, A) shown for each position] • This model is called a Generative model. Here words are observed from tags as states. This is similar to an HMM.

  25. Parts of Speech Tags (Simplified situation) • Noun (N) – boy • Verb (V) – sing • Adjective (A) – red • Adverb (R) – loudly • Preposition (P) – to • Article (T) – a, an • Conjunction (C) – and • Wh-word (W) – who • Pronoun (U) – he

  26. Hidden Markov Model and POS tagging • Parts of Speech tags are states • Words are observations • S = {N, V, A, R, P, C, T, W, U} • O = {Words of the language}
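As a hedged illustration of how this HMM view is used for decoding, the sketch below finds the highest scoring tag sequence by dynamic programming (Viterbi) over the states S and observations O defined above. The transition and lexical tables are assumed inputs; the actual tables appear on the following slides but are not reproduced in this transcript.

```python
# A minimal Viterbi sketch for the HMM view of tagging: states are tags,
# observations are words.  'trans' and 'lex' are transition and lexical
# probability tables as in the earlier sketches; '^' is the start state.
def viterbi(words, tag_set, trans, lex):
    best = {"^": (1.0, ["^"])}                      # best (prob, path) per state
    for w in words:
        new_best = {}
        for t in tag_set:
            emit = lex.get((w, t), 0.0)
            cands = [(p * trans.get((prev, t), 0.0) * emit, path + [t])
                     for prev, (p, path) in best.items()]
            new_best[t] = max(cands, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])   # (probability, tag path)
```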

  27. Example • Test sentence • “^ People laugh aloud $”

  28. Transition Table

  29. Lexical or Word Probabilities

  30. Corpus • Collection of coherent text • ^_^ People_N laugh_V aloud_A $_$ • [Diagram: Corpus divides into Spoken (e.g. Switchboard Corpus) and Written (e.g. Brown, BNC)]
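As a final sketch, the transition and lexical probability tables used above can be estimated from a tagged corpus like the one on this slide by simple counting (maximum likelihood); the one-sentence corpus here is only illustrative.

```python
from collections import Counter

# Sketch: maximum-likelihood estimates of P(t_i|t_{i-1}) and P(w_i|t_i)
# from a tagged corpus of (word, tag) pairs.
corpus = [[("^", "^"), ("People", "N"), ("laugh", "V"), ("aloud", "A"), ("$", "$")]]

trans_counts, lex_counts, tag_counts = Counter(), Counter(), Counter()
for sent in corpus:
    tags = [t for _, t in sent]
    for prev, cur in zip(tags, tags[1:]):
        trans_counts[(prev, cur)] += 1
    for w, t in sent:
        lex_counts[(w, t)] += 1
        tag_counts[t] += 1

trans = {pair: c / tag_counts[pair[0]] for pair, c in trans_counts.items()}
lex = {pair: c / tag_counts[pair[1]] for pair, c in lex_counts.items()}
```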
