
Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models. Yuval Marton, Ph.D. Dissertation Defense, Department of Linguistics, University of Maryland.





Presentation Transcript


  1. Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models Yuval Marton Ph.D. Dissertation Defense Department of Linguistics University of Maryland

  2. Dissertation Theme • Hybrid knowledge/corpus-based statistical NLP models using fine-grained linguistic soft constraints • Unified corpus-based model with soft linguistic constraints: • Semantic (words) in word-pair similarity tasks • Semantic (phrases) in statistical machine translation • Syntactic (parsing) in statistical machine translation Yuval Marton, Dissertation Defense

  3. Pure vs. Hybrid Models • Pure models • Corpus-based: data-driven, distributional, statistical • Statistical machine translation • Distributional profiles (context vectors) • Knowledge-based: manually crafted linguistic knowledge (rules, word grouping by concept), theory-driven • Rule-based / syntax-driven machine translation • WordNet/thesaurus-based semantic similarity measures • Hybrid models • Here: bias data-driven models with linguistic constraints

  4. Hard and Soft Constraints • Hard constraints • {0,1}; in/out • Decrease the search space • Theory-driven • Faster, slimmer • Soft constraints • [0..1]; fuzzy • Only bias the model • Data-driven: let patterns emerge • [Figure: hard vs. soft constraints over the hypothesis universe]
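The distinction can be sketched in code (a minimal illustration, not from the dissertation; the function names, hypotheses, and weights are invented): a hard constraint rules hypotheses in or out of the search space, while a soft constraint only adds a weighted feature term to their scores.

```python
# Sketch: hard constraints prune hypotheses; soft constraints only bias scores.
# Hypotheses, feature names, and weights here are illustrative.

def hard_filter(hypotheses, is_valid):
    """Hard constraint: binary in/out membership, shrinking the search space."""
    return [h for h in hypotheses if is_valid(h)]

def soft_score(base_score, features, weights):
    """Soft constraint: add weighted feature values in [0..1] to the score."""
    return base_score + sum(weights[f] * v for f, v in features.items())

hyps = ["a", "ab", "abc"]
assert hard_filter(hyps, lambda h: len(h) > 1) == ["ab", "abc"]

# A soft constraint merely biases the score; the hypothesis stays in the space.
biased = soft_score(1.0, {"matches_constituent": 1.0}, {"matches_constituent": 0.5})
assert abs(biased - 1.5) < 1e-9
```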

  5. Fine-Grained Soft Linguistic Constraints • Fine granularity is a big deal • Soft syntactic constraints in SMT • Chiang 2005 vs. Marton and Resnik 2008 • Negative results → positive results • Soft semantic constraints in word-pair similarity ranking • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009 • Positive results → better results • Soft semantic constraints in paraphrase generation for SMT • Callison-Burch et al. 2006 vs. Marton, Callison-Burch and Resnik 2009

  6. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  7. Statistical Machine Translation: Hiero • Chiang 2005, 2007 • Weighted synchronous CFG • Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year> • Translation model features, e.g., ϕ3 = log p(e|f) • Log-linear model; plus a rule penalty feature and “glue” rules • These trees are not necessarily “syntactic”! • Not syntactic in the linguistic sense • [Figure: example derivation over 的竞选 / election, 投票在初选 / voted in the primaries]
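A toy sketch of how one such synchronous rule rewrites both sides at once (the sub-translation pair substituted for X1 below is an invented example, not from the slides):

```python
# Toy sketch of one Hiero-style synchronous CFG rule (Chiang 2005):
#   X -> <今年 X1, X1 this year>
# The unnamed non-terminal X1 is rewritten the same way on both sides.

def apply_rule(src_template, tgt_template, sub_src, sub_tgt):
    """Substitute a sub-translation <sub_src, sub_tgt> for X1 on both sides."""
    return (src_template.replace("X1", sub_src),
            tgt_template.replace("X1", sub_tgt))

# Invented sub-translation pair, for illustration only.
src, tgt = apply_rule("今年 X1", "X1 this year", "选举", "the election")
assert src == "今年 选举"
assert tgt == "the election this year"
```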

  8. Previous (Coarse) Soft Syntactic Constraints • X → X1 speech ||| X1 discurso • What should be the span of X1? • Chiang’s (2005) constituency feature • Reward the rule’s score if the rule’s source side matches a constituent span • Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward) • Good idea, but a negative result

  9. New (Fine-Grained) Soft Syntactic Constraints • A separate weighted feature for each constituent, e.g.: • NP-only: (NP=) • VP-only: (VP=)

  10. New Constraint Conditions • VP-only, revisited: • We saw VP-match (VP=): reward an exact match of a VP sub-tree span • We can also incur a penalty for crossing constituent boundaries, e.g., VP-cross (VP+)

  11. Constraint (Feature) Space • {NP, VP, IP, CP, …} × {match (=), cross-boundary (+)} • Basic translation models: • For each feature, add it (and only it) to the default feature set, assigning it a separate weight • Feature “combo” translation models: • NP2 (double feature): add both NP= and NP+, with a separate weight for each • NP_ (conflated feature): ties the weights of NP= and NP+ • XP=, XP+, XP2, XP_: conflate all labels that correspond to “standard” X-bar Theory XP constituents in each condition • All-labels= (Chiang’s), All-labels+, All-labels_, All-labels2
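A minimal sketch of how the match (=) and cross-boundary (+) features might be computed for a rule's source-side span, assuming constituent spans come from a source-language parse (the span representation and helper names are illustrative, not the dissertation's code):

```python
# Sketch of the match (=) and cross-boundary (+) constraint features:
# given a rule's source span and the constituent spans of a parse, fire
# VP= when the span exactly matches a VP, and VP+ when it crosses a VP
# boundary. Span encoding (start, end), end-exclusive, is an assumption.

def span_features(span, constituents):
    """constituents: dict label -> list of (start, end) spans."""
    feats = {}
    s, e = span
    for label, spans in constituents.items():
        for cs, ce in spans:
            if (s, e) == (cs, ce):
                feats[label + "="] = 1.0   # exact match: reward
            elif s < cs < e < ce or cs < s < ce < e:
                feats[label + "+"] = 1.0   # crosses the boundary: penalty
    return feats

parse = {"VP": [(2, 5)], "NP": [(0, 2)]}
assert span_features((2, 5), parse) == {"VP=": 1.0}
assert "VP+" in span_features((3, 6), parse)
```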

  12. Chinese-English Results • Replicated the Chiang 2005 constituency feature (negative result) • NP=, QP+, VP+: up to 0.74 BLEU points better • XP+, IP2, all-labels_, VP2, NP_: up to 1.65 BLEU points better • Validated on the NIST MT08 test set • BLEU score: higher = better • *, **: significantly better than baseline; +, ++: better than Chiang-05 (replicated)

  13. Arabic-English Results • New result for Chiang’s constituency feature (MT06, MT08) • PP+, AdvP=: up to 1.40 BLEU points better than Chiang’s and the baseline • AP2, AdvP2: up to 1.94 better • Validated on the NIST MT08 test set • *, **: significantly better than baseline; +, ++: better than Chiang-05

  14. PP+ Example: Arabic MT06

  15. Arabic-English Results – MIRA • Chiang, Marton and Resnik (2008) • The previous feature-selection problem is solved here

  16. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  17. Semantic Models • Forget Frege, alternative worlds, <e,t>, … • To model the meaning of words, we can use: • “Pure” models • Knowledge-based: manually crafted linguistic resources (dictionary, thesaurus, taxonomies, WordNet) • Usage-based: machine-generated distributional profiles (containing word co-occurrence-based information) • Hybrid models • Bias distributional profiles with soft semantic constraints • As we just saw with soft syntactic constraints • E.g., use thesaurus “concepts” as word senses, with which to alter co-occurrence counts in distributional profiles

  18. Word-Based Distributional Profiles (DPs) • Distributional Hypothesis (Harris 1940; Firth 1957) • DP (context vector) of “bank”: which words “bank” occurs next to • Strength of association • Counts, PMI, TF/IDF-based, log-likelihood ratios, … • Vector similarity (cosine, L1, L2, …) • [Figure: context vector for “bank”: linguist, money, river, teller, water, …]
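A minimal sketch of a word-based DP as a sparse context vector scored with cosine similarity (the association strengths below are invented for illustration):

```python
# Sketch: a word-based distributional profile as a sparse context vector
# (context word -> association strength), compared with cosine similarity.
# The counts below are invented for illustration.
import math

def cosine(dp1, dp2):
    """Cosine similarity of two sparse context vectors."""
    shared = set(dp1) & set(dp2)
    dot = sum(dp1[w] * dp2[w] for w in shared)
    n1 = math.sqrt(sum(v * v for v in dp1.values()))
    n2 = math.sqrt(sum(v * v for v in dp2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

bank  = {"money": 4.0, "teller": 3.0, "river": 2.0, "water": 1.0}
shore = {"river": 3.0, "water": 4.0, "boat": 2.0}
sim = cosine(bank, shore)
assert 0.0 < sim < 1.0   # partial overlap: some, but not full, similarity
```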

  19. Taxonomies and Groupings • WordNet • Synsets • Classical relations (“is-a”) • Arc distance • “The tennis problem” • Thesaurus • Flat lists of related words • Potentially coarse • Implicit relations, potentially non-classical • [Figure: is-a hierarchy: job → industry job → CEO; job → academic job → professor]

  20. Concept-Based Distributional Profiles • Mohammad & Hirst (2006), Macquarie Thesaurus • Concepts list words: FIN.INST (bank, dollar, deposit, …); RIVER (bank, boat, wave, …) • Word-based DP vs. concept-based DP • Approximate senses • Aggregated, coarse • “bank” is listed under several concepts • A DP for each sense • [Figure: word-based DP of “bank” next to concept-based DPs]

  21. Concept-Based Distributional Profiles • Mohammad & Hirst (2006), Macquarie Thesaurus • FIN.INST (bank, dollar, deposit, …); RIVER (bank, boat, wave, …); PHYSICS (amp., wave, freq., …) • How similar are “bank” and “wave”? • Compare all pairs of senses: • FIN.INST, PHYSICS • FIN.INST, RIVER • RIVER, PHYSICS • RIVER, RIVER • Return the closest sense pair • Problem: bank = wave??

  22. New: Word/Concept Hybrid Model (Word-Sense DP) • Given the word’s word-based DP and its concept-based DPs: • Bias the DP of “bank” towards the DP of RIVER, yielding bank_RIVER • Create bank_FIN.INST similarly, etc.

  23. Fine-Grained Soft Semantic Constraints • Hybrid models: the best of all: fine-grained, sense-aware, widely applicable • bank_FIN.INST ≠ bank_RIVER ≠ wave_RIVER! • Two hybrid flavors: • Hybrid-filtered • Hybrid-proportional
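One plausible reading of the two flavors in code (a sketch under the assumption that "filtered" keeps only contexts attested for the concept and "proportional" reweights word counts by the concept's strengths; the dissertation's exact formulas may differ, and the counts are invented):

```python
# Sketch (one plausible reading, NOT the dissertation's exact formulas) of
# biasing a word's DP toward a concept's DP to get a sense-specific DP.

def hybrid_filtered(word_dp, concept_dp):
    """Keep only the word's contexts that also appear in the concept DP."""
    return {w: v for w, v in word_dp.items() if w in concept_dp}

def hybrid_proportional(word_dp, concept_dp):
    """Reweight the word's counts by the concept's relative strengths."""
    total = sum(concept_dp.values()) or 1.0
    return {w: v * (concept_dp.get(w, 0.0) / total) for w, v in word_dp.items()}

bank  = {"money": 4.0, "river": 2.0, "water": 1.0}
RIVER = {"river": 3.0, "water": 2.0, "boat": 1.0}

bank_river = hybrid_filtered(bank, RIVER)
assert bank_river == {"river": 2.0, "water": 1.0}   # "money" filtered out
```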

  24. Evaluation: Word-Pair Similarity Task • Give each word pair a similarity score • Rooster – voyage: 0.12 • Coast – shore: 0.93 • Same part-of-speech pairs • Noun-noun (Rubinstein & Goodenough, 1965; Finkelstein et al. 2002) • Verb-verb (Resnik & Diab, 2000) • Result: list of pairs ordered by similarity • Evaluation metric: Spearman rank correlation
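The evaluation metric can be sketched as follows (the no-ties Spearman formula; the toy system scores are invented):

```python
# Sketch of the evaluation metric: Spearman rank correlation between a
# system's similarity scores and gold human ratings (toy scores below).

def ranks(xs):
    """1-based ranks of xs (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank + 1.0
    return r

def spearman(xs, ys):
    """Spearman rho via the no-ties formula: 1 - 6*sum(d^2)/(n*(n^2-1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

gold   = [0.12, 0.93, 0.55]   # e.g. rooster-voyage, coast-shore, ...
system = [0.20, 0.80, 0.60]   # invented system scores, same ordering
assert spearman(gold, system) == 1.0   # identical ranking: perfect correlation
```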

  25. Word-Pair Similarity Results

  26. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  27. Words  Phrases • Extend the word-based semantic similarity measures to “phrases” • she declined to provide any other information … • police refused to provide any other details … • So far: See if y is similar to xNow: Find y’s similar to x • Can solve other problems now! • Use these extended phrasal DPs to find good paraphrases of unknown “phrases” in machine translation models to provide any other bank information money declined teller details … Yuval Marton, Dissertation Defense

  28. Coverage Problem in Statistical Machine Translation • Trained on parallel text • Every new test document contains some “phrases” unknown to the model • [Figure: parallel Spanish-English training text; a test-set Spanish phrase missing from the model]

  29. Previous Solution: Pivoting • Use other parallel texts to increase coverage • Drawback: parallel text is a limited resource! • [Figure: pivoting through additional parallel texts (French, German)]

  30. New Solution: Monolingually-Derived Paraphrases • Use monolingual text to increase coverage • Resources available in abundance! • [Figure: monolingual Spanish text yields paraphrases (Spanish’, Spanish’’, …) for unknown test-set phrases]

  31. Find Paraphrases • Gather all contexts L _ R for the phrase “to provide any other”: • What else appears between L _ R?

  32. Find Paraphrases • Gather all contexts L _ R for the phrase “to provide any other”: • What else appears between L _ R? • Measure distributional similarity to each candidate, e.g., “to provide any other” vs. “to give further”
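A toy sketch of the context-gathering step (the helper names and the two-sentence corpus are invented; the real system operates over large monolingual corpora, and candidates are then ranked by distributional similarity):

```python
# Sketch: gather L _ R contexts of a phrase in a tokenized corpus, then
# collect whatever else appears between the same L and R as paraphrase
# candidates. Helper names and the toy corpus are illustrative.
from collections import defaultdict

def contexts_of(tokens, phrase, width=1):
    """Yield (left, right) context tuples around each occurrence of phrase."""
    n = len(phrase)
    for i in range(width, len(tokens) - n - width + 1):
        if tuple(tokens[i:i + n]) == tuple(phrase):
            yield (tuple(tokens[i - width:i]), tuple(tokens[i + n:i + n + width]))

def candidates_in_contexts(tokens, ctxs, max_len=4):
    """Count phrases occurring between any gathered L _ R context."""
    cands = defaultdict(int)
    for left, right in ctxs:
        w = len(left)
        for i in range(len(tokens) - w):
            if tuple(tokens[i:i + w]) == left:
                for n in range(1, max_len + 1):
                    j = i + w + n
                    if tuple(tokens[j:j + w]) == right:
                        cands[tuple(tokens[i + w:j])] += 1
    return dict(cands)

corpus = ("she declined to provide any other information . "
          "he declined to give further information .").split()
ctxs = list(contexts_of(corpus, ("to", "provide", "any", "other")))
cands = candidates_in_contexts(corpus, ctxs)
assert ("to", "give", "further") in cands   # a paraphrase candidate emerges
```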

  33. Paraphrase Examples (Phrases)

  34. Paraphrase Examples (Unigrams)

  35. Paraphrase Feature Model • Evidence reinforcement: if more than one paraphrase f_i of f exists, aggregate the score with a “quasi-online” update: asim_i = asim_{i-1} + (1 − asim_{i-1}) sim(f_i, f), where asim_0 = 0 • Analogous to Callison-Burch et al. (2006)
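The update rule from the slide, directly in code (only the function name is invented):

```python
# The slide's evidence-reinforcement rule:
#   asim_i = asim_{i-1} + (1 - asim_{i-1}) * sim(f_i, f),  asim_0 = 0.
# Each additional paraphrase reinforces the aggregate, which stays in [0, 1].

def aggregate_similarity(sims):
    asim = 0.0
    for s in sims:
        asim = asim + (1.0 - asim) * s
    return asim

assert aggregate_similarity([0.5]) == 0.5
assert aggregate_similarity([0.5, 0.5]) == 0.75   # reinforced, still below 1
```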

  36. English-to-Chinese Results • A 29k-line subset created to emulate a low-density language setting • *: better than baseline; +: better than the non-hybrid counterpart

  37. English-Chinese Translation Examples

  38. Spanish to English

  39. Comparison with Corpus Size & Pivoting

  40. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  41. Unified Model • Soft linguistic constraints in a log-linear model • Syntactic • Semantic • … • score(x) = Σ_i λ_i h_i(x) • Constraints: add more λh(x) terms to the sum: Σ_i λ_i h_i(x) + Σ_j λ_j h_j(x) • λ_i: weight / importance of feature i • h_i: features / constraints
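A minimal sketch of the unified log-linear form, score(x) = Σ_i λ_i h_i(x), where adding a soft constraint simply adds one more weighted feature term (the feature names and weights below are illustrative):

```python
# Sketch of the unified log-linear model: score(x) = sum_i lambda_i * h_i(x).
# Adding a soft constraint only adds another weighted feature term.
# Feature names and weights are invented for illustration.

def loglinear_score(features, weights):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

base = {"log_p(e|f)": -2.0, "rule_penalty": 1.0}
weights = {"log_p(e|f)": 1.0, "rule_penalty": -0.5, "VP=": 0.3}

without = loglinear_score(base, weights)
with_constraint = loglinear_score({**base, "VP=": 1.0}, weights)
assert abs(with_constraint - (without + 0.3)) < 1e-9   # one extra lambda*h term
```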

  42. Unified Model (Soft Syntactic Constraints) • Straightforward: if Σ_i λ_i ϕ_i(f,e) is a translation model, bias it syntactically, e.g., by adding λ_j ϕ_j(f,e), where ϕ_j(f,e) = 1 if the source-language word sequence f is a VP, and 0 otherwise

  43. Unified Model (Soft Semantic Constraints) • Semantic distance of word e in sense s from word e’ in sense s’: cos(e_s, e’_s’) = [ Σ_i fSense(e,s,w_i) fSense(e’,s’,w_i) + Σ_i fSense(e,s,w_i) fWord(e’,w_i) + Σ_i fWord(e,w_i) fSense(e’,s’,w_i) + Σ_i fWord(e,w_i) fWord(e’,w_i) ] / Z_C • The first term is proportional to cosSense(e_s, e’_s’); the middle two are cross-terms; the last equals K · cosWord(e, e’)

  44. Main Contributions • Fine-grained linguistic soft constraints in state-of-the-art end-to-end phrase-based SMT systems • Unified corpus-based model with soft linguistic constraints • Semantic (words) in word-pair similarity tasks • Semantic (phrases) in statistical machine translation • Syntactic (parsing) in statistical machine translation • Distributional paraphrase generation • Evidence reinforcement component

  45. Thanks to… • Defense Committee: • Philip Resnik, Chair/Advisor • Amy Weinberg, Advisor • William Idsardi, Member • Chris Callison-Burch, Special Member (JHU) • Bonnie Dorr, Dean's Representative • Ling Chair: • Norbert Hornstein • Ling Cohort: • Ellen Lau • Phil Monahan • Eri Takahashi • Rebecca McKeown • Chizuru Nakao • CLIP Lab: • David Chiang, Smara Muresan, Hendra Setiawan, Adam Lopez, Chris Dyer, Asad Sayeed, Vlad Eidelman, Zhongqiang Huang, Denis Filimonov, and many others!

  46. Thank you! • Questions?
