
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

Dipanjan Das (Carnegie Mellon University) and Slav Petrov (Google Research). ACL 2011, June 21.


Presentation Transcript


  1. Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. Dipanjan Das (Carnegie Mellon University), Slav Petrov (Google Research). June 21, ACL 2011.

  2. Part-of-Speech Tagging
     Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

  3. (Nearly) Universal Part-of-Speech Tags. See Petrov, Das and McDonald (2011).

  4. (Nearly) Universal Part-of-Speech Tags
     Example Penn Treebank tag maps: NN → NOUN, NNP → NOUN, NNPS → NOUN, NNS → NOUN, PRP → PRON, PRP$ → PRON, WP → PRON, WP$ → PRON
     Example Spanish Treebank tag maps: p0 → PRON, pd → PRON, pe → PRON, pi → PRON, pn → PRON, pp → PRON, pr → PRON, pt → PRON, px → PRON, np → NOUN, nc → NOUN
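
The tag maps on this slide amount to a lookup table from treebank-specific tags to universal tags. A minimal Python sketch, restricted to the entries listed on the slide; the names `PTB_TO_UNIVERSAL`, `ES_TO_UNIVERSAL` and `to_universal` are illustrative and do not come from the talk:

```python
# Fine-grained treebank tag -> universal tag, limited to the slide's entries.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
}
ES_TO_UNIVERSAL = {
    "p0": "PRON", "pd": "PRON", "pe": "PRON", "pi": "PRON", "pn": "PRON",
    "pp": "PRON", "pr": "PRON", "pt": "PRON", "px": "PRON",
    "np": "NOUN", "nc": "NOUN",
}

def to_universal(tags, tag_map):
    """Map treebank tags to universal tags; raises KeyError for unmapped tags."""
    return [tag_map[t] for t in tags]

mapped = to_universal(["NNP", "PRP$", "NNS"], PTB_TO_UNIVERSAL)
print(mapped)
```

A full map would cover every tag of the treebank; only the noun and pronoun rows are shown on the slide.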

  5. (Nearly) Universal Part-of-Speech Tags
     English: Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
     German: Portland/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./.
     Bengali: পোর্টল্যান্ড/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.
     Supervised training data available for ~20 languages.

  6. Supervised Universal POS Tagging. TnT (Brants, 2000) generalizes well in the supervised setting: average accuracy is 96.2%.

  7. Resource-Poor Languages. Several major languages have little or no annotated data, e.g. (native speakers):
     • Punjabi: 109 million
     • Vietnamese: 69 million
     • Polish: 40 million
     • Indonesian-Malay: 37 million
     • Oriya: 32 million
     • Azerbaijani: 20 million
     • Haitian: 7.7 million
     However, lots of parallel and unannotated data! Basic NLP tools like POS tagging are essential for the development of language technologies. See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size

  8. State of the Art in Unsupervised POS Tagging

  9. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm.
     Observation sequence: Portland hat eine prächtig gedeihende Musikszene .
     State sequence: ? ? ? ? ? ? ?

  10. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm. Each state is one of the 12 coarse tags.
     Observation sequence: Portland hat eine prächtig gedeihende Musikszene .
     State sequence: ? ? ? ? ? ? ?

  11. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm. Transition multinomials connect consecutive states.
     Observation sequence: Portland hat
     State sequence: ? ?

  12. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm. Emission multinomials connect each state to its observed word.
     Observation sequence: Portland hat
     State sequence: ? ?
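
To make the roles of the transition and emission multinomials concrete, here is a small Python sketch of the joint probability an HMM assigns to a tagged sentence. All numbers are invented toy values, the two-tag inventory is a simplification of the 12 coarse tags, and the separate `start` distribution is an assumption of this sketch (the slides do not show how the first state is handled):

```python
import math

def hmm_log_joint(words, tags, start, trans, emit):
    """log P(words, tags): start prob, then one transition and one emission per word."""
    logp = math.log(start[tags[0]])
    for i, (word, tag) in enumerate(zip(words, tags)):
        if i > 0:
            logp += math.log(trans[tags[i - 1]][tag])   # transition multinomial
        logp += math.log(emit[tag][word])               # emission multinomial
    return logp

# Toy parameters (hypothetical values, two tags only).
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"Portland": 0.9, "hat": 0.1},
        "VERB": {"Portland": 0.1, "hat": 0.9}}

good = hmm_log_joint(["Portland", "hat"], ["NOUN", "VERB"], start, trans, emit)
bad = hmm_log_joint(["Portland", "hat"], ["VERB", "NOUN"], start, trans, emit)
print(good > bad)
```

In the unsupervised setting the tags are unobserved, so EM would sum over all tag sequences instead of scoring a single one as done here.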

  13. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm. [Chart: EM-HMM.] Poor average result.

  14. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) with locally normalized log-linear models for the emission multinomials. Berg-Kirkpatrick et al. (2010)
     Observation sequence: Portland hat
     State sequence: ? ?

  15. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) with locally normalized log-linear models for the emission multinomials. Features: suffix, hyphen, capital letters, numbers, ... Berg-Kirkpatrick et al. (2010)
     Observation sequence: Portland hat
     State sequence: ? ?

  16. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) with locally normalized log-linear models for the emission multinomials. Features: suffix, hyphen, capital letters, numbers, ... Estimated using gradient-based methods. Berg-Kirkpatrick et al. (2010)
     Observation sequence: Portland hat
     State sequence: ? ?
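
A locally normalized log-linear emission model can be sketched as a softmax over the vocabulary, computed separately for each tag. The Python below is an illustration only: the feature templates follow the slide's list (suffix, hyphen, capital letters, numbers), but their exact form, the function names and the weight value are assumptions, not the authors' implementation:

```python
import math

def features(word, tag):
    """Indicator features of a word, each conjoined with the tag (illustrative)."""
    feats = [f"word={word}|{tag}", f"suffix={word[-2:]}|{tag}"]
    if "-" in word:
        feats.append(f"hyphen|{tag}")
    if word[0].isupper():
        feats.append(f"capital|{tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"digit|{tag}")
    return feats

def emission_probs(tag, vocab, weights):
    """P(word | tag) = exp(w . f(word, tag)) / sum over all words in vocab."""
    scores = {w: sum(weights.get(f, 0.0) for f in features(w, tag)) for w in vocab}
    top = max(scores.values())                      # subtract max for stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {w: math.exp(s - top) / z for w, s in scores.items()}

vocab = ["Portland", "hat", "eine", "Musikszene"]
weights = {"capital|NOUN": 2.0}                     # hypothetical learned weight
probs = emission_probs("NOUN", vocab, weights)
print(probs)
```

Because capitalized words share the `capital|NOUN` feature, a single weight raises the NOUN emission probability of every capitalized word at once, which a plain multinomial cannot do.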

  17. Unsupervised Part-of-Speech Tagging. Hidden Markov Model (HMM) with locally normalized log-linear models, estimated using gradient-based methods. [Chart: EM-HMM, Feature-HMM.] Improvements across all languages. Berg-Kirkpatrick et al. (2010)

  18. Unsupervised POS Tagging with Dictionaries

  19. Unsupervised POS Tagging with Dictionaries. Hidden Markov Model (HMM) with locally normalized log-linear models. State space constrained by possible gold tags. [Figure: each word of "Portland hat eine prächtig gedeihende Musikszene ." with its candidate tags, drawn from PRON, DET, ADJ, NUM, ADV, NOUN, VERB and ".".]

  20. Unsupervised POS Tagging with Dictionaries. Hidden Markov Model (HMM) with locally normalized log-linear models. State space constrained by possible gold tags. [Chart: EM-HMM, Feature-HMM, w/ gold dictionary.] Average result close to supervised accuracy!
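
Constraining the state space with a tag dictionary means that each word may only take the tags its dictionary entry lists. The Python sketch below shows this for Viterbi decoding; the same restriction would apply to the sums computed during training. The dictionary entries, the probabilities, the `1e-6` floor for unseen emissions and the rule that out-of-dictionary words may take any tag are all assumptions made for illustration:

```python
import math

def constrained_viterbi(words, tag_dict, all_tags, start, trans, emit):
    """Best tag sequence, where word i may only use tags allowed by tag_dict."""
    allowed = [tag_dict.get(w, all_tags) for w in words]
    floor = 1e-6                                    # assumed floor for unseen words
    # best[i][t] = (log-score of best path ending in t at position i, previous tag)
    best = [{t: (math.log(start[t]) + math.log(emit[t].get(words[0], floor)), None)
             for t in allowed[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in allowed[i]:
            e = math.log(emit[t].get(words[i], floor))
            column[t] = max((best[i - 1][p][0] + math.log(trans[p][t]) + e, p)
                            for p in allowed[i - 1])
        best.append(column)
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):          # follow back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

all_tags = ["DET", "NOUN", "VERB"]
start = {t: 1 / 3 for t in all_tags}
trans = {p: {t: 1 / 3 for t in all_tags} for p in all_tags}
trans["VERB"] = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
emit = {t: {} for t in all_tags}                    # uninformative emissions
tag_dict = {"Portland": ["NOUN"], "hat": ["VERB"], "eine": ["DET", "NOUN"]}

path = constrained_viterbi(["Portland", "hat", "eine"], tag_dict, all_tags,
                           start, trans, emit)
print(path)
```

With uninformative emissions the dictionary alone fixes the first two tags, and the transition model resolves the remaining ambiguity for "eine".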

  21. Morphologically rich languages only have base forms in dictionaries. For most languages, access to high-quality tag dictionaries is not realistic.
     Ideas:
     • Use supervision in resource-rich languages
     • Use translated data
     • Construct projected tag lexicons

  22. Bilingual Projection. Automatic labels from a supervised tagger (97% accuracy):
     Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

  23. Bilingual Projection. Automatic unsupervised alignments from translation data (available for more than 50 languages):
     English (tagged): Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
     German: Portland hat eine prächtig gedeihende Musikszene .

  24. Bilingual Projection. Idea 1: direct projection (Yarowsky and Ngai, 2001). Tags are projected from the English sentence across the alignments; an unaligned word receives NOUN (the most frequent tag).
     English (tagged): Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
     German: Portland hat eine prächtig gedeihende Musikszene .

  25. Bilingual Projection. Idea 1: direct projection (Yarowsky and Ngai, 2001).
     Portland/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./.
     This sentence plus more projected tagged sentences are used to train a supervised tagger (Brants, 2000).
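
Direct projection can be sketched in a few lines of Python: copy each source tag across its alignment link and give unaligned target words the default tag NOUN, as on the slide. The alignment pairs below are written by hand for this example (the slides show them only as a figure), and `project_tags` is a hypothetical helper name:

```python
def project_tags(src_tags, alignment, tgt_len, default="NOUN"):
    """Copy source tags to aligned target positions; unaligned words get `default`.

    alignment is a list of (source_index, target_index) pairs. If several source
    words align to one target word, the last pair wins in this sketch.
    """
    tgt_tags = [default] * tgt_len
    for src_i, tgt_i in alignment:
        tgt_tags[tgt_i] = src_tags[src_i]
    return tgt_tags

src_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]   # Portland has a thriving music scene .
tgt = ["Portland", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]
# Hand-written links: thriving -> gedeihende, music/scene -> Musikszene; prächtig unaligned.
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)]

projected = project_tags(src_tags, alignment, len(tgt))
print(list(zip(tgt, projected)))
```

The unaligned word "prächtig" ends up tagged NOUN, which illustrates why direct projection introduces noise into the projected training data.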

  26. Bilingual Projection. Idea 1: direct projection (Yarowsky and Ngai, 2001). [Chart: EM-HMM, Feature-HMM, Direct projection.]

  27. Bilingual Projection. Idea 1: direct projection (Yarowsky and Ngai, 2001). [Chart: EM-HMM, Feature-HMM, Direct projection.] Consistent improvements over unsupervised models.

  28. Bilingual Projection. Idea 2: lexicon projection

  29. Bilingual Projection. Idea 2: lexicon projection.
     English (tagged): Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
     German: Portland hat eine prächtig gedeihende Musikszene .

  30. Bilingual Projection. Idea 2: lexicon projection. Ignore the unaligned word (prächtig); the aligned words Portland, hat, eine, gedeihende, Musikszene and "." remain.

  31. Bilingual Projection. Idea 2: lexicon projection. Bag of alignments: the aligned German words (Portland, hat, eine, gedeihende, Musikszene, ".") paired with the English words and tags they align to.

  32. Bilingual Projection. Idea 2: lexicon projection. [Figure: bag of alignments, continued.]

  33. Bilingual Projection. Idea 2: lexicon projection. Further alignments are added to the bag, e.g. English "one" tagged PRON and "one" tagged NUM.

  34. Bilingual Projection. Idea 2: lexicon projection. Further alignments are added to the bag, e.g. English "thriving" tagged VERB in addition to ADJ.

  35. Bilingual Projection. Idea 2: lexicon projection. After scanning all the parallel data, each word (Portland, hat, eine, gedeihende, Musikszene, ".") has a distribution giving the probability of a tag given the word.
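
The lexicon built from the bag of alignments can be sketched as relative-frequency counting: every alignment link contributes one (target word, source tag) pair, and the counts are normalized per word. The pairs and counts in this Python example are invented for illustration, and `build_projected_lexicon` is a hypothetical name:

```python
from collections import Counter, defaultdict

def build_projected_lexicon(aligned_pairs):
    """Estimate P(tag | word) from (target_word, source_tag) alignment links.

    aligned_pairs holds one pair per alignment link over the parallel data;
    unaligned target words never appear and therefore get no entry.
    """
    counts = defaultdict(Counter)
    for word, tag in aligned_pairs:
        counts[word][tag] += 1
    return {word: {tag: c / sum(tag_counts.values())
                   for tag, c in tag_counts.items()}
            for word, tag_counts in counts.items()}

# Invented bag of alignments: "eine" seen aligned to "a" (DET) and "one" (NUM, PRON).
bag = [("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
       ("gedeihende", "ADJ"), ("gedeihende", "VERB"), ("Musikszene", "NOUN")]
lexicon = build_projected_lexicon(bag)
print(lexicon["eine"])
```

A tag dictionary for the constrained model could then be obtained by keeping, for each word, the tags whose probability exceeds some cutoff; the choice of cutoff is not given on these slides.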

  36. Bilingual Projection. Idea 2: lexicon projection. Feature-HMM constrained with the projected dictionary. [Chart: EM-HMM, Feature-HMM, Direct projection, Projected Dictionary.] Improvements over simple projection for the majority of the languages.

  37. No information about unaligned words. Can coverage be improved? Idea: projected lexicon expansion and refinement using a lot of unlabeled data.

  38. Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data

  39. [Figure: graph of labeled datapoints (with supervised label distributions) and unlabeled datapoints (with distributions to be found); edge weights 0.9, 0.1, 0.01, 0.9, 0.8 are entries of a symmetric weight matrix.] Zhu, Ghahramani and Lafferty, 2003

  40. Label Propagation. [Figure: the same graph, edge weights 0.9, 0.1, 0.01, 0.9, 0.8.] Zhu, Ghahramani and Lafferty, 2003

  41. Label Propagation. [Same graph.] Annotation: set of distributions over unlabeled vertices. Zhu, Ghahramani and Lafferty, 2003

  42. Label Propagation. [Same graph.] Annotation: unlabeled vertices. Zhu, Ghahramani and Lafferty, 2003

  43. Label Propagation. [Same graph.] Annotation: brings the distributions of similar vertices closer. Zhu, Ghahramani and Lafferty, 2003

  44. Label Propagation. [Same graph.] Annotations: size of the label set; brings the distributions of uncertain neighborhoods close to the uniform distribution. Zhu, Ghahramani and Lafferty, 2003

  45. Label Propagation. [Same graph.] Iterative updates for optimization. Zhu, Ghahramani and Lafferty, 2003
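
The objective itself did not survive in this transcript, so the following Python sketch is one plausible reading of the annotations rather than the talk's exact formulation. It assumes an objective with a smoothness term weighted by `mu` (pulling neighboring distributions together) and a term weighted by `nu` (pulling each distribution toward uniform), holds labeled vertices fixed, and applies the resulting closed-form update to each unlabeled vertex in turn. The symbols `mu` and `nu` and all numeric values are assumptions:

```python
def propagate(weights, labeled, n_labels, mu=1.0, nu=0.1, iters=50):
    """Iteratively update label distributions on unlabeled vertices.

    weights: {u: {v: w_uv}} symmetric edge weights; labeled: {u: distribution}.
    Update for unlabeled u: q_u = (mu * sum_v w_uv q_v + nu * U) / (mu * sum_v w_uv + nu)
    """
    uniform = [1.0 / n_labels] * n_labels
    q = {u: list(labeled.get(u, uniform)) for u in weights}
    for _ in range(iters):
        new_q = {}
        for u, nbrs in weights.items():
            if u in labeled:                        # labeled vertices stay fixed
                new_q[u] = q[u]
                continue
            denom = mu * sum(nbrs.values()) + nu
            new_q[u] = [(mu * sum(w * q[v][y] for v, w in nbrs.items())
                         + nu * uniform[y]) / denom
                        for y in range(n_labels)]
        q = new_q
    return q

# Toy graph: unlabeled vertex "c" is strongly tied to "a" and weakly tied to "b".
weights = {"a": {"c": 0.9}, "b": {"c": 0.1}, "c": {"a": 0.9, "b": 0.1}}
labeled = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
q = propagate(weights, labeled, n_labels=2)
print(q["c"])
```

The unlabeled vertex ends up close to its heavily weighted neighbor, while the `nu` term keeps it from copying that neighbor's distribution exactly.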

  46. Idea 3: Graph-Based Projections. How can label propagation help? Subramanya, Petrov and Pereira (2010)
     For a language:
     • Build a graph over a lot of trigram types as vertices
     • Compute the similarity matrix using co-occurrence statistics
     • Label distribution at each vertex: tag distribution over the trigram's middle word
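
The graph construction steps can be sketched in Python as follows. The slide only says that similarity comes from co-occurrence statistics; the specific features used here (neighboring words and the trigram's outer words), the cosine similarity and the k-nearest-neighbor sparsification are assumptions made to obtain a runnable example:

```python
import math
from collections import Counter, defaultdict

def trigram_features(sentences):
    """Collect trigram types with counts of simple context features (assumed set)."""
    feats = defaultdict(Counter)
    for sent in sentences:
        for i in range(len(sent) - 2):
            tri = tuple(sent[i:i + 3])
            left = sent[i - 1] if i > 0 else "<s>"
            right = sent[i + 3] if i + 3 < len(sent) else "</s>"
            for f in ("L=" + left, "R=" + right, "W1=" + tri[0], "W3=" + tri[2]):
                feats[tri][f] += 1
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_graph(feats, k=2):
    """Link each trigram to its k most similar trigrams; edges are symmetric."""
    graph = defaultdict(dict)
    for u in feats:
        sims = sorted(((cosine(feats[u], feats[v]), v) for v in feats if v != u),
                      reverse=True)
        for sim, v in sims[:k]:
            if sim > 0:
                graph[u][v] = graph[v][u] = sim
    return graph

sentences = [["das", "ist", "wichtig", "bei", "uns"],
             ["das", "ist", "gut", "bei", "uns"]]
graph = knn_graph(trigram_features(sentences), k=1)
print(graph[("ist", "wichtig", "bei")])
```

Trigrams whose middle words occur in the same contexts, such as "ist wichtig bei" and "ist gut bei", become neighbors, so a tag distribution known for one middle word can be propagated to the other. The all-pairs comparison here is quadratic and only suitable for toy data.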

  47. Example Graph in German. Trigram vertices: [gutem Essen zugetan], [zum Essen niederlassen], [fuers Essen drauf], [schlechtes Essen und], [1000 Essen pro]; [ist wichtig bei], [ist gut bei], [ist fein bei], [ist lebhafter bei]; [zu realisieren ,], [zu essen ,], [zu stecken ,], [zu erreichen ,].

  48. Example Graph in German. Same trigram vertices, with the label NOUN shown next to [gutem Essen zugetan] and the label VERB shown next to [zu erreichen ,].

  49. Idea 3: Graph-Based Projections. How can label propagation help?
     For a target language:
     • Build a graph over 2M trigram types as vertices
     • Compute the similarity matrix using co-occurrence statistics
     • Label distribution at each vertex: tag distribution over the trigram's middle word
     • Plug in auto-tagged words from a source language
     • Links between source and target language units are word alignments

  50. Bilingual Graph.
     Auto-tagged English vertices: important/ADJ, nicely/ADV, good/ADJ, fine/ADJ, food/NOUN, eating/VERB, eat/VERB, eat/VERB.
     German trigram vertices: [gutem Essen zugetan], [zum Essen niederlassen], [fuers Essen drauf], [schlechtes Essen und], [1000 Essen pro]; [ist wichtig bei], [ist gut bei], [ist fein bei], [ist lebhafter bei]; [zu realisieren ,], [zu essen ,], [zu stecken ,], [zu erreichen ,].
