Multilingual Guidance for Unsupervised Linguistic Structure Prediction

Multilingual Guidance for Unsupervised Linguistic Structure Prediction. Dipanjan Das, Carnegie Mellon University. CLSP Seminar, Johns Hopkins University, September 27, 2011. Joint work with Shay Cohen (Columbia University), Slav Petrov (Google), and Noah Smith (Carnegie Mellon University).


Presentation Transcript


  1. Multilingual Guidance for Unsupervised Linguistic Structure Prediction. Dipanjan Das, Carnegie Mellon University. CLSP Seminar, Johns Hopkins University, September 27, 2011.

  2. Joint work with Shay Cohen (Columbia University), Slav Petrov (Google), and Noah Smith (Carnegie Mellon University).

  3. Goal: learn linguistic structure without any labeled data in a target language. Tasks: part-of-speech tagging and dependency parsing. Example: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

  4. Multilingual Unsupervised Learning. Approaches vary along two dimensions: no parallel data (hard) vs. using parallel data, and supervision in source language(s) vs. joint learning for multiple languages. Prior work: Yarowsky and Ngai (2001); Cohen and Smith (2009); Snyder et al. (2009); Xi and Hwa (2005); Berg-Kirkpatrick and Klein (2010); Naseem et al. (2010); Smith and Eisner (2009); McDonald et al. (2011).

  5. Multilingual Unsupervised Learning. The same two dimensions (no parallel data vs. using parallel data; supervision in source language(s) vs. joint learning for multiple languages), marking where this talk fits.

  6. Part 1: Multilingual unsupervised learning using parallel data, with supervision in source language(s).

  7. Part-of-Speech Tagging: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

  8. Supervised POS Tagging. In the supervised setting, average accuracy is 96.2% with TnT (Brants, 2000).

  9. Resource-Poor Languages. Several major languages have little or no annotated data. Numbers of native speakers, e.g.: Punjabi, 109 million; Vietnamese, 69 million; Oriya, 32 million; Indonesian-Malay, 37 million; Azerbaijani, 20 million; Haitian, 7.7 million. However, these languages have lots of parallel and unannotated data! Basic NLP tools such as POS taggers are essential for developing language technologies. See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size

  10. (Nearly) Universal POS Tags

  11. (Nearly) Universal POS Tags. Example Penn Treebank tag maps: PRP → PRON, PRP$ → PRON, WP → PRON, WP$ → PRON, NN → NOUN, NNP → NOUN, NNPS → NOUN, NNS → NOUN. Example Spanish Treebank tag maps: p0, pd, pe, pi, pn, pp, pr, pt, px → PRON; np, nc → NOUN. See Petrov, Das and McDonald (2011).
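A tag map is a lookup table from fine-grained treebank tags to coarse universal tags. The sketch below encodes only the example entries listed on the slide. The dictionary and function names are hypothetical, and the real mapping files cover every tag in each treebank.

```python
# Partial maps: only the example entries shown on the slide.
PTB_TO_UNIVERSAL = {
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
}
SPANISH_TO_UNIVERSAL = {
    **{tag: "PRON" for tag in ["p0", "pd", "pe", "pi", "pn", "pp", "pr", "pt", "px"]},
    "np": "NOUN",
    "nc": "NOUN",
}


def to_universal(tagged_sentence, tag_map):
    """Replace each fine-grained tag with its coarse universal tag.

    Raises KeyError for any tag this partial map does not cover.
    """
    return [(word, tag_map[tag]) for word, tag in tagged_sentence]


print(to_universal([("They", "PRP"), ("Baltimore", "NNP")], PTB_TO_UNIVERSAL))
# [('They', 'PRON'), ('Baltimore', 'NOUN')]
```

The function raises on unmapped tags rather than guessing a fallback. A silent default would hide gaps in a partial map.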

  12. (Nearly) Universal POS Tags. English: Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. ; German: Baltimore/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./. ; Bengali: বাল্টিমোর/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.

  13. State of the Art in Unsupervised POS Tagging

  14. Unsupervised Part-of-Speech Tagging: a hidden Markov model (HMM) estimated with the Expectation-Maximization (EM) algorithm (Merialdo, 1994). Figure: the observation sequence "Baltimore hat eine prächtig gedeihende Musikszene ." with an unknown state (?) above each word, forming the state sequence.

  15. Unsupervised Part-of-Speech Tagging: HMM estimated with EM (Merialdo, 1994). Each hidden state is one of the 12 coarse tags.

  16. Unsupervised Part-of-Speech Tagging: HMM estimated with EM (Merialdo, 1994). Transition multinomials link consecutive states, e.g., the states above "Baltimore" and "hat".

  17. Unsupervised Part-of-Speech Tagging: HMM estimated with EM (Merialdo, 1994). Emission multinomials link each state to its observed word, e.g., "Baltimore" and "hat".
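In standard first-order HMM notation (ours, not the slide's), let the words be x_1 … x_n, the tags be y_1 … y_n, and y_0 be a fixed start state. The transition and emission multinomials then combine as shown below. EM maximizes the marginal likelihood p(x), summing over the unobserved tag sequences.

```latex
p(\mathbf{x}, \mathbf{y}) = \prod_{i=1}^{n} \underbrace{p(y_i \mid y_{i-1})}_{\text{transition}} \, \underbrace{p(x_i \mid y_i)}_{\text{emission}},
\qquad
p(\mathbf{x}) = \sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y})
```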

  18. Unsupervised Part-of-Speech Tagging: HMM estimated with EM. Chart: the EM-HMM gives a poor average result (Johnson, 2007).
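The EM-HMM baseline can be sketched as Baum-Welch training. This is a minimal, unscaled pure-Python version for toy inputs. The function names and the toy corpus are hypothetical, and it is not the setup behind the results on the slide. Practical implementations work in log space or rescale the forward-backward quantities to avoid underflow on long sentences.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Turn nonnegative counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def forward_backward(obs, states, start, trans, emit):
    """Unscaled forward-backward pass (adequate for short toy sentences)."""
    n = len(obs)
    alpha = [{} for _ in range(n)]
    beta = [{} for _ in range(n)]
    for s in states:
        alpha[0][s] = start[s] * emit[s][obs[0]]
    for t in range(1, n):
        for s in states:
            alpha[t][s] = emit[s][obs[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
    for s in states:
        beta[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in states:
            beta[t][s] = sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in states)
    likelihood = sum(alpha[n - 1][s] for s in states)  # p(x)
    return alpha, beta, likelihood


def em_step(corpus, states, start, trans, emit):
    """One EM iteration: returns new parameters and the log-likelihood under the old ones."""
    start_c = {s: 0.0 for s in states}
    trans_c = {r: {s: 0.0 for s in states} for r in states}
    emit_c = {s: defaultdict(float) for s in states}
    log_lik = 0.0
    for obs in corpus:
        alpha, beta, z = forward_backward(obs, states, start, trans, emit)
        log_lik += math.log(z)
        for t, word in enumerate(obs):
            for s in states:
                gamma = alpha[t][s] * beta[t][s] / z  # posterior p(y_t = s | x)
                emit_c[s][word] += gamma
                if t == 0:
                    start_c[s] += gamma
        for t in range(len(obs) - 1):
            for r in states:
                for s in states:
                    # posterior p(y_t = r, y_{t+1} = s | x)
                    trans_c[r][s] += (alpha[t][r] * trans[r][s] * emit[s][obs[t + 1]]
                                      * beta[t + 1][s] / z)
    new_trans = {r: normalize(trans_c[r]) for r in states}
    new_emit = {s: normalize(emit_c[s]) for s in states}
    return normalize(start_c), new_trans, new_emit, log_lik


# Hypothetical toy corpus with two hidden states.
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
states = ["A", "B"]
vocab = sorted({word for sentence in corpus for word in sentence})
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.3, "B": 0.7}, "B": {"A": 0.6, "B": 0.4}}
# Slightly different emissions per state so EM can break the symmetry.
emit = {s: normalize({w: 1.0 + 0.1 * ((i + k) % 3) for i, w in enumerate(vocab)})
        for k, s in enumerate(states)}

history = []
for _ in range(10):
    start, trans, emit, log_lik = em_step(corpus, states, start, trans, emit)
    history.append(log_lik)
```

EM never decreases the likelihood, so the values collected in `history` should be non-decreasing. It can still converge to a poor local optimum, which is consistent with the weak average result reported for the EM-HMM.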

  19. Unsupervised Part-of-Speech Tagging: HMM with locally normalized log-linear models, shown for the emission multinomials (Berg-Kirkpatrick et al., 2010).

  20. Unsupervised Part-of-Speech Tagging: HMM with locally normalized log-linear models, estimated using gradient-based methods (Berg-Kirkpatrick et al., 2010). The emission features include suffixes, hyphens, capital letters, numbers, ...
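In the Feature-HMM, each emission multinomial p(word | tag) is replaced by a locally normalized log-linear model. The sketch below computes that distribution for a fixed weight vector. The feature templates (word identity, three-character suffix, hyphen, capitalization, digit) are a hypothetical rendering of the slide's examples. The gradient-based training of the weights is not shown.

```python
import math


def features(word, tag):
    """Hypothetical feature templates, each conjoined with the tag."""
    props = [f"word={word.lower()}", f"suffix3={word[-3:].lower()}"]
    if "-" in word:
        props.append("hyphen")
    if word[:1].isupper():
        props.append("capitalized")
    if any(ch.isdigit() for ch in word):
        props.append("digit")
    return [f"{tag}|{p}" for p in props]


def emission_distribution(tag, vocab, weights):
    """p(word | tag) = exp(w . f(word, tag)) normalized over the vocabulary."""
    scores = {w: sum(weights.get(f, 0.0) for f in features(w, tag)) for w in vocab}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


vocab = ["Baltimore", "hat", "eine", "Musikszene", "1000"]
weights = {"NOUN|capitalized": 2.0, "NOUN|digit": -1.0}  # hypothetical weights
p_noun = emission_distribution("NOUN", vocab, weights)
print(p_noun)
```

Shared features let evidence generalize across word types. A single weight on capitalization raises every capitalized German word under NOUN, which separate multinomial entries cannot do.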

  21. Unsupervised Part-of-Speech Tagging. Chart comparing the EM-HMM and the Feature-HMM: the Feature-HMM improves results across all languages.

  22. Unsupervised POS Tagging with Dictionaries: HMM with locally normalized log-linear models. Figure: for "Baltimore hat eine prächtig gedeihende Musikszene .", the state space is constrained to the possible gold tags of each word (candidates shown include PRON, DET, NUM, ADJ, ADV, NOUN, VERB, and .).
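A tag dictionary restricts the states each word may take. The same restriction applies during EM training, but for brevity this hypothetical sketch shows it only in Viterbi decoding. The toy log-scores are invented so that, without the dictionary, the word "Musikszene" would be tagged VERB.

```python
def constrained_viterbi(words, tag_dict, all_tags, log_trans, log_emit):
    """Viterbi decoding where each word may only take the tags in tag_dict.

    Words absent from tag_dict may take any tag in all_tags. log_trans[(prev, tag)]
    and log_emit[(tag, word)] are log-probabilities; "<s>" is the start state and
    missing entries count as impossible.
    """
    neg_inf = float("-inf")
    allowed = [tag_dict.get(w, all_tags) for w in words]
    # best[i][tag] = (best log-score of a path ending in tag at position i, previous tag)
    best = [{t: (log_trans.get(("<s>", t), neg_inf) + log_emit.get((t, words[0]), neg_inf), None)
             for t in allowed[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in allowed[i]:
            prev, score = max(
                ((p, best[i - 1][p][0] + log_trans.get((p, t), neg_inf)) for p in allowed[i - 1]),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit.get((t, words[i]), neg_inf), prev)
        best.append(column)
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):  # follow back-pointers to the start
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy scores: every transition costs -1 except DET -> NOUN.
all_tags = ["DET", "PRON", "NUM", "NOUN", "VERB"]
words = ["eine", "Musikszene"]
log_trans = {(p, t): -1.0 for p in ["<s>"] + all_tags for t in all_tags}
log_trans[("DET", "NOUN")] = -0.1
log_emit = {(t, w): -2.0 for t in all_tags for w in words}
log_emit[("VERB", "Musikszene")] = -0.01  # the unconstrained model prefers VERB here
tag_dict = {"eine": ["DET", "PRON", "NUM"], "Musikszene": ["NOUN"]}

print(constrained_viterbi(words, tag_dict, all_tags, log_trans, log_emit))  # ['DET', 'NOUN']
print(constrained_viterbi(words, {}, all_tags, log_trans, log_emit))        # ['DET', 'VERB']
```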

  23. Unsupervised POS Tagging with Dictionaries. Chart comparing the EM-HMM, the Feature-HMM, and the Feature-HMM with a gold dictionary.

  24. Morphologically rich languages have only base forms in dictionaries. For most languages, access to high-quality tag dictionaries is not realistic. • Ideas: • use supervision in resource-rich languages • use parallel data • construct projected tag lexicons

  25. Bilingual Projection. Automatic labels from a supervised tagger (97% accuracy): Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

  26. Bilingual Projection. The tagged English sentence (Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.) is word-aligned to "Baltimore hat eine prächtig gedeihende Musikszene .". The alignments are produced automatically, without supervision, from translation data (available for more than 50 languages).

  27. Bilingual Projection, Baseline 1: direct projection (Yarowsky and Ngai, 2001). Tags are copied across the alignments. The unaligned word (prächtig) receives NOUN, the most frequent tag.

  28. Bilingual Projection, Baseline 1: direct projection (Yarowsky and Ngai, 2001). The projected sentence Baltimore/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./., together with more projected tagged sentences, is used for supervised training of a tagger (Brants, 2000).
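Direct projection fits in a few lines. In this sketch the alignment for the slide's sentence pair is written out by hand. The indices are an assumption consistent with the figure: gedeihende aligns to thriving, Musikszene to music and scene, and prächtig is unaligned. When a target word has several aligned source words, it takes the tag of the leftmost one; this tie-break is our hypothetical choice.

```python
def direct_projection(target_words, source_tags, alignment, default_tag="NOUN"):
    """Copy source tags to target words through (source_index, target_index) links.

    A word aligned to several source words takes the tag of the leftmost one
    (hypothetical tie-break); unaligned words get default_tag.
    """
    projected = []
    for j in range(len(target_words)):
        sources = sorted(i for i, k in alignment if k == j)
        projected.append(source_tags[sources[0]] if sources else default_tag)
    return projected


source_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]  # Baltimore has a thriving music scene .
target = ["Baltimore", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]
# Hand-written alignment (an assumption): prächtig (index 3) is unaligned.
alignment = {(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)}
print(list(zip(target, direct_projection(target, source_tags, alignment))))
```

The projected sentences then serve as ordinary training data for a supervised tagger.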

  29. Bilingual Projection, Baseline 1: direct projection (Yarowsky and Ngai, 2001). Chart comparing the EM-HMM, the Feature-HMM, and direct projection.

  30. Bilingual Projection, Baseline 1: direct projection (Yarowsky and Ngai, 2001). Direct projection gives consistent improvements over the unsupervised models.

  31. Bilingual Projection, Baseline 2: lexicon projection.

  32. Bilingual Projection, Baseline 2: lexicon projection. Figure: the tagged English sentence Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. aligned to "Baltimore hat eine prächtig gedeihende Musikszene .".

  33. Bilingual Projection, Baseline 2: lexicon projection. Figure: the unaligned word (prächtig) is ignored.

  34. Bilingual Projection, Baseline 2: lexicon projection. Figure: the aligned words (Baltimore, hat, eine, gedeihende, Musikszene, .), each paired with the tag of its English counterpart, go into a bag of alignments.

  35. Bilingual Projection, Baseline 2: lexicon projection. Figure: the bag of alignments (continued).

  36. Bilingual Projection, Baseline 2: lexicon projection. Figure: alignments from other sentences are added to the bag, e.g., "eine" aligned to "one" tagged NUM and to "one" tagged PRON.

  37. Bilingual Projection, Baseline 2: lexicon projection. Figure: further alignments are added, e.g., "gedeihende" aligned to "thriving" tagged VERB.

  38. Bilingual Projection, Baseline 2: lexicon projection. After scanning all the parallel data, each aligned target word (Baltimore, hat, eine, gedeihende, Musikszene, .) has a projected distribution p(tag | word), the probability of a tag given the word.
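Lexicon projection replaces per-token tags with per-type counts. This hypothetical sketch accumulates (target word, source tag) pairs from aligned sentence pairs, skips unaligned words, and normalizes the counts to p(tag | word). The two extra toy sentences stand in for the additional alignments on the slides (eine aligned to one as NUM and as PRON, gedeihende aligned to thriving as VERB).

```python
from collections import Counter, defaultdict


def project_lexicon(parallel_corpus):
    """Estimate p(tag | target word) from aligned, source-tagged sentence pairs.

    Each item is (target_words, source_tags, alignment) with alignment a set of
    (source_index, target_index) pairs; unaligned target words contribute nothing.
    """
    counts = defaultdict(Counter)
    for target_words, source_tags, alignment in parallel_corpus:
        for i, j in alignment:
            counts[target_words[j]][source_tags[i]] += 1
    return {word: {tag: c / sum(tag_counts.values()) for tag, c in tag_counts.items()}
            for word, tag_counts in counts.items()}


corpus = [
    (["Baltimore", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."],
     ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."],
     {(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)}),
    # Hypothetical extra pairs: eine ~ one/NUM, eine ~ one/PRON, gedeihende ~ thriving/VERB.
    (["eine"], ["NUM"], {(0, 0)}),
    (["eine", "gedeihende"], ["PRON", "VERB"], {(0, 0), (1, 1)}),
]
lexicon = project_lexicon(corpus)
print(lexicon["eine"])  # DET, NUM and PRON each get probability 1/3
```

Note that "prächtig" never enters the lexicon. This is the coverage gap that the next slides address.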

  39. Bilingual Projection, Baseline 2: lexicon projection. The Feature-HMM is constrained with the projected dictionary. Chart comparing the EM-HMM, the Feature-HMM, direct projection, and the projected dictionary: improvements over simple projection for the majority of the languages.

  40. No information about unaligned words: when Baltimore/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./. is aligned to "Baltimore hat eine prächtig gedeihende Musikszene .", the word "prächtig" receives no projected tag. Can coverage be improved? Idea: projected lexicon expansion and refinement using label propagation.

  41. How can label propagation help? Our Model: Graph-Based Projections • For a language: • build a graph with 2M trigram types as vertices • compute a similarity matrix using co-occurrence statistics • label distribution at each vertex: the tag distribution over the trigram's middle word (Subramanya, Petrov and Pereira, 2010)
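The slide does not specify the co-occurrence statistics. As an assumption, this sketch represents each trigram type by counts of the words immediately to its left and right. It scores pairs of trigrams with cosine similarity and keeps the k nearest neighbors of each vertex. All function names are hypothetical, and brute-force comparison is only feasible for toy data, not for 2M vertices.

```python
import math
from collections import Counter, defaultdict


def trigram_contexts(sentences):
    """Map each trigram type to counts of the words just left and right of it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        padded = ["<s>", "<s>"] + sentence + ["</s>", "</s>"]
        for i in range(2, len(padded) - 2):  # i indexes the trigram's middle word
            trigram = tuple(padded[i - 1:i + 2])
            contexts[trigram]["L=" + padded[i - 2]] += 1
            contexts[trigram]["R=" + padded[i + 2]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_graph(contexts, k=5):
    """Brute-force k-nearest-neighbor graph: {vertex: {neighbor: similarity}}."""
    graph = {}
    for u in contexts:
        sims = [(cosine(contexts[u], contexts[v]), v) for v in contexts if v != u]
        sims = sorted((pair for pair in sims if pair[0] > 0), key=lambda pair: pair[0], reverse=True)
        graph[u] = {v: s for s, v in sims[:k]}
    return graph


sentences = [s.split() for s in ["das gute Essen ist lecker", "das warme Essen ist lecker"]]
graph = knn_graph(trigram_contexts(sentences), k=2)
print(graph[("gute", "Essen", "ist")])
```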

  42. Example Graph in German. Trigram vertices: "gutem Essen zugetan", "zum Essen niederlassen", "fuers Essen drauf", "schlechtes Essen und", "1000 Essen pro", "ist wichtig bei", "ist gut bei", "ist fein bei", "ist lebhafter bei", "zu realisieren ,", "zu erreichen ,", "zu stecken ,", "zu essen ,".

  43. Example Graph in German. The same trigram vertices, now with labels: NOUN (at "gutem Essen zugetan") and VERB (at "zu essen ,").

  44. How can label propagation help? Our Model: Graph-Based Projections • For a target language: • build a graph with 2M trigram types as vertices • compute a similarity matrix using co-occurrence statistics • label distribution at each vertex: the tag distribution over the trigram's middle word • plug in auto-tagged words from a source language • links between source- and target-language units are word alignments

  45. Bilingual Graph. Figure: the German trigram vertices from the example graph ("gutem Essen zugetan", "ist wichtig bei", "zu essen ,", etc.) linked by word alignments to English words tagged by the source tagger (important, good, fine, nicely, food, eat, eating; tags ADJ, ADV, NOUN, VERB).

  46. How can label propagation help? Our Model: Graph-Based Projections • For a target language: • plug in auto-tagged words from a source language • links between source- and target-language units are word alignments • run the first stage of label propagation, from the source language to the target language
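One simple way to realize the first stage is assumed here rather than taken from the talk. Each target vertex receives the alignment-count-weighted average of the tag distributions of the source words aligned to its middle word. The function name, alignment counts, and source distributions below are hypothetical.

```python
from collections import Counter, defaultdict


def first_stage(alignment_counts, source_dists):
    """Alignment-weighted average of source tag distributions for each target vertex.

    alignment_counts: {(target_vertex, source_word): count}
    source_dists: {source_word: {tag: probability}} from the source-language tagger
    """
    totals = defaultdict(float)
    mixed = defaultdict(Counter)
    for (vertex, source_word), count in alignment_counts.items():
        if source_word not in source_dists:
            continue  # no tag information for this source word
        totals[vertex] += count
        for tag, prob in source_dists[source_word].items():
            mixed[vertex][tag] += count * prob
    return {v: {tag: m / totals[v] for tag, m in mixed[v].items()} for v in mixed}


# Hypothetical alignment counts between German trigram vertices and English words.
alignment_counts = {
    ("zum Essen niederlassen", "food"): 3,
    ("zu essen ,", "eat"): 2,
    ("zu essen ,", "eating"): 1,
}
source_dists = {"food": {"NOUN": 1.0}, "eat": {"VERB": 1.0}, "eating": {"VERB": 0.5, "NOUN": 0.5}}
labels = first_stage(alignment_counts, source_dists)
print(labels["zu essen ,"])
```

Vertices with no aligned source words get no entry here. Filling them in is the job of the second stage.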

  47. First Stage of Label Propagation. Figure: the bilingual graph, with tag distributions propagated from the tagged English words to the aligned German vertices.


  49. How can label propagation help? Our Model: Graph-Based Projections • For a target language: • plug in auto-tagged words from a source language • links between source- and target-language units are word alignments • run the first stage of label propagation, from the source language to the target language • run the second stage of label propagation within the target-language vertices, using a graph objective function with squared penalties
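The exact objective and hyperparameters behind the talk are not given on the slide. As an assumption, consider a generic squared-penalty objective: the sum over edges (each counted once) of w_uv‖q_u − q_v‖², plus ν times the sum over non-seed vertices of ‖q_u − U‖². Here U is the uniform tag distribution, and seed vertices from the first stage are held fixed. Setting the gradient for a non-seed vertex to zero gives the Jacobi update used in this sketch. The graph weights, seeds, and the value of nu are hypothetical.

```python
def second_stage(graph, seeds, tags, nu=0.1, iterations=50):
    """Jacobi updates for a squared-penalty label propagation objective.

    graph: {u: {v: w_uv}}, assumed symmetric and listing every vertex.
    seeds: {u: {tag: prob}} vertices labeled in the first stage, kept fixed.
    Each non-seed vertex moves to (sum_v w_uv q_v + nu * U) / (sum_v w_uv + nu),
    where U is uniform over tags; the result stays a probability distribution.
    """
    uniform = {t: 1.0 / len(tags) for t in tags}
    q = {u: ({t: seeds[u].get(t, 0.0) for t in tags} if u in seeds else dict(uniform))
         for u in graph}
    for _ in range(iterations):
        updated = {}
        for u, neighbors in graph.items():
            if u in seeds:
                updated[u] = q[u]
                continue
            denom = sum(neighbors.values()) + nu
            updated[u] = {t: (sum(w * q[v][t] for v, w in neighbors.items()) + nu * uniform[t]) / denom
                          for t in tags}
        q = updated
    return q


# Hypothetical graph weights and seed labels.
graph = {
    "gutem Essen zugetan": {"schlechtes Essen und": 1.0},
    "schlechtes Essen und": {"gutem Essen zugetan": 1.0, "zu essen ,": 0.1},
    "zu essen ,": {"schlechtes Essen und": 0.1},
}
seeds = {"gutem Essen zugetan": {"NOUN": 1.0}, "zu essen ,": {"VERB": 1.0}}
q = second_stage(graph, seeds, tags=["NOUN", "VERB", "ADJ"])
print(q["schlechtes Essen und"])
```

The nu term pulls poorly connected vertices toward the uniform distribution instead of letting them copy a single distant neighbor. Because the neighbor weights and nu share one denominator, every update stays a valid distribution without explicit renormalization.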

  50. Second Stage of Label Propagation. Figure: the bilingual graph, with tag distributions propagated among the German (target-language) vertices.
