
Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models. Yuval Marton, Ph.D. Dissertation Defense, Department of Linguistics, University of Maryland.





Presentation Transcript


  1. Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models Yuval Marton Ph.D. Dissertation Defense Department of Linguistics University of Maryland

  2. Dissertation Theme • Hybrid knowledge/corpus-based statistical NLP models using fine-grained linguistic soft constraints • Unified corpus-based model with soft linguistic constraints: • Semantic (words) in word-pair similarity tasks • Semantic (phrases) in statistical machine translation • Syntactic (parsing) in statistical machine translation Yuval Marton, Dissertation Defense

  3. Pure vs. Hybrid Models • Pure models • Corpus-based: data-driven, distributional, statistical • Statistical machine translation • Distributional profiles (context vectors) • Knowledge-based: manually crafted linguistic knowledge (rules, word grouping by concept), theory-driven • Rule-based / syntax-driven machine translation • WordNet/thesaurus-based semantic similarity measures • Hybrid models • Here: bias data-driven models with linguistic constraints

  4. Hard and Soft Constraints • Hard constraints • {0,1}; in/out • Decrease the search space • Theory-driven • Faster, slimmer • Soft constraints • [0..1]; fuzzy • Only bias the model • Data-driven: let patterns emerge • [Figure: hard vs. soft constraints over the hypothesis universe]
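The distinction can be sketched in code (a minimal illustration, not from the dissertation; the function names, hypotheses, and weights are invented): a hard constraint rules hypotheses in or out of the search space, while a soft constraint only adds a weighted feature term to their scores.

```python
# Sketch: hard constraints prune hypotheses; soft constraints only bias scores.
# Hypotheses, feature names, and weights here are illustrative.

def hard_filter(hypotheses, is_valid):
    """Hard constraint: binary in/out membership, shrinking the search space."""
    return [h for h in hypotheses if is_valid(h)]

def soft_score(base_score, features, weights):
    """Soft constraint: add weighted feature values in [0..1] to the score."""
    return base_score + sum(weights[f] * v for f, v in features.items())

hyps = ["a", "ab", "abc"]
assert hard_filter(hyps, lambda h: len(h) > 1) == ["ab", "abc"]

# A soft constraint merely biases the score; the hypothesis stays in the space.
biased = soft_score(1.0, {"matches_constituent": 1.0}, {"matches_constituent": 0.5})
assert abs(biased - 1.5) < 1e-9
```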

  5. Fine-Grained Soft Linguistic Constraints • Fine granularity is a big deal • Soft syntactic constraints in SMT • Chiang 2005 vs. Marton and Resnik 2008 • Negative results → positive results • Soft semantic constraints in word-pair similarity ranking • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009 • Positive results → better results • Soft semantic constraints in paraphrase generation for SMT • Callison-Burch et al. 2006 vs. Marton, Callison-Burch and Resnik 2009

  6. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  7. Statistical Machine Translation: Hiero • Chiang 2005, 2007 • Weighted synchronous CFG • Unnamed non-terminals: X → <e, f>, e.g., X → <今年 X1, X1 this year> • Translation model features, e.g., ϕ3 = log p(e|f) • Log-linear model; plus a rule penalty feature and “glue” rules • These trees are not necessarily “syntactic”! • Not syntactic in the linguistic sense • [Figure: example derivation over 的竞选 / election, 投票在初选 / voted in the primaries]
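A toy sketch of how one such synchronous rule rewrites both sides at once (the sub-translation pair substituted for X1 below is an invented example, not from the slides):

```python
# Toy sketch of one Hiero-style synchronous CFG rule (Chiang 2005):
#   X -> <今年 X1, X1 this year>
# The unnamed non-terminal X1 is rewritten the same way on both sides.

def apply_rule(src_template, tgt_template, sub_src, sub_tgt):
    """Substitute a sub-translation <sub_src, sub_tgt> for X1 on both sides."""
    return (src_template.replace("X1", sub_src),
            tgt_template.replace("X1", sub_tgt))

# Invented sub-translation pair, for illustration only.
src, tgt = apply_rule("今年 X1", "X1 this year", "选举", "the election")
assert src == "今年 选举"
assert tgt == "the election this year"
```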

  8. Previous (Coarse) Soft Syntactic Constraints • X → X1 speech ||| X1 discurso • What should be the span of X1? • Chiang’s (2005) constituency feature • Reward the rule’s score if the rule’s source side matches a constituent span • Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward) • Good idea, but a negative result

  9. New (Fine-Grained) Soft Syntactic Constraints • A separate weighted feature for each constituent, e.g.: • NP-only: (NP=) • VP-only: (VP=)

  10. New Constraint Conditions • VP-only, revisited: • We saw VP-match (VP=): reward an exact match of a VP sub-tree span • We can also incur a penalty for crossing constituent boundaries, e.g., VP-cross (VP+)

  11. Constraint (Feature) Space • {NP, VP, IP, CP, …} × {match (=), cross-boundary (+)} • Basic translation models: • For each feature, add it (and only it) to the default feature set, assigning it a separate weight • Feature “combo” translation models: • NP2 (double feature): add both NP= and NP+, with a separate weight for each • NP_ (conflated feature): ties the weights of NP= and NP+ • XP=, XP+, XP2, XP_: conflate all labels that correspond to “standard” X-bar Theory XP constituents in each condition • All-labels= (Chiang’s), All-labels+, All-labels_, All-labels2
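A minimal sketch of how the match (=) and cross-boundary (+) features might be computed for a rule's source-side span, assuming constituent spans come from a source-language parse (the span representation and helper names are illustrative, not the dissertation's code):

```python
# Sketch of the match (=) and cross-boundary (+) constraint features:
# given a rule's source span and the constituent spans of a parse, fire
# VP= when the span exactly matches a VP, and VP+ when it crosses a VP
# boundary. Span encoding (start, end), end-exclusive, is an assumption.

def span_features(span, constituents):
    """constituents: dict label -> list of (start, end) spans."""
    feats = {}
    s, e = span
    for label, spans in constituents.items():
        for cs, ce in spans:
            if (s, e) == (cs, ce):
                feats[label + "="] = 1.0   # exact match: reward
            elif s < cs < e < ce or cs < s < ce < e:
                feats[label + "+"] = 1.0   # crosses the boundary: penalty
    return feats

parse = {"VP": [(2, 5)], "NP": [(0, 2)]}
assert span_features((2, 5), parse) == {"VP=": 1.0}
assert "VP+" in span_features((3, 6), parse)
```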

  12. Chinese-English Results • Replicated the Chiang 2005 constituency feature (negative result) • NP=, QP+, VP+: up to 0.74 BLEU points better • XP+, IP2, all-labels_, VP2, NP_: up to 1.65 BLEU points better • Validated on the NIST MT08 test set • BLEU score: higher = better • *, **: significantly better than baseline; +, ++: better than Chiang-05 (replicated)

  13. Arabic-English Results • New result for Chiang’s constituency feature (MT06, MT08) • PP+, AdvP=: up to 1.40 BLEU points better than Chiang’s and the baseline • AP2, AdvP2: up to 1.94 better • Validated on the NIST MT08 test set • *, **: significantly better than baseline; +, ++: better than Chiang-05

  14. PP+ Example: Arabic MT06

  15. Arabic-English Results – MIRA • Chiang, Marton and Resnik (2008) • The previous feature-selection problem is solved here

  16. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  17. Semantic Models • Forget Frege, alternative worlds, <e,t>, … • To model the meaning of words, we can use: • “Pure” models • Knowledge-based: manually crafted linguistic resources (dictionary, thesaurus, taxonomies, WordNet) • Usage-based: machine-generated distributional profiles (containing word co-occurrence-based information) • Hybrid models • Bias distributional profiles with soft semantic constraints • As we just saw with soft syntactic constraints • E.g., use thesaurus “concepts” as word senses, with which to alter co-occurrence counts in distributional profiles

  18. Word-Based Distributional Profiles (DPs) • Distributional Hypothesis (Harris 1940; Firth 1957) • DP (context vector) of “bank”: which words “bank” occurs next to • Strength of association • Counts, PMI, TF/IDF-based, log-likelihood ratios, … • Vector similarity (cosine, L1, L2, …) • [Figure: context vector for “bank”: linguist, money, river, teller, water, …]
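A minimal sketch of a word-based DP as a sparse context vector scored with cosine similarity (the association strengths below are invented for illustration):

```python
# Sketch: a word-based distributional profile as a sparse context vector
# (context word -> association strength), compared with cosine similarity.
# The counts below are invented for illustration.
import math

def cosine(dp1, dp2):
    """Cosine similarity of two sparse context vectors."""
    shared = set(dp1) & set(dp2)
    dot = sum(dp1[w] * dp2[w] for w in shared)
    n1 = math.sqrt(sum(v * v for v in dp1.values()))
    n2 = math.sqrt(sum(v * v for v in dp2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

bank  = {"money": 4.0, "teller": 3.0, "river": 2.0, "water": 1.0}
shore = {"river": 3.0, "water": 4.0, "boat": 2.0}
sim = cosine(bank, shore)
assert 0.0 < sim < 1.0   # partial overlap: some, but not full, similarity
```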

  19. Taxonomies and Groupings • WordNet • Synsets • Classical relations (“is-a”) • Arc distance • “The tennis problem” • Thesaurus • Flat lists of related words • Potentially coarse • Implicit relations, potentially non-classical • [Figure: is-a hierarchy: job → industry job → CEO; job → academic job → professor]

  20. Concept-Based Distributional Profiles • Mohammad & Hirst (2006), Macquarie Thesaurus • Concepts list words: FIN.INST (bank, dollar, deposit, …); RIVER (bank, boat, wave, …) • Word-based DP vs. concept-based DP • Approximate senses • Aggregated, coarse • “bank” is listed under several concepts • A DP for each sense • [Figure: word-based DP of “bank” next to concept-based DPs]

  21. Concept-Based Distributional Profiles • Mohammad & Hirst (2006), Macquarie Thesaurus • FIN.INST (bank, dollar, deposit, …); RIVER (bank, boat, wave, …); PHYSICS (amp., wave, freq., …) • How similar are “bank” and “wave”? • Compare all pairs of senses: • FIN.INST, PHYSICS • FIN.INST, RIVER • RIVER, PHYSICS • RIVER, RIVER • Return the closest sense pair • Problem: bank = wave??

  22. New: Word/Concept Hybrid Model (Word-Sense DP) • Given the word’s word-based DP and its concept-based DPs: • Bias the DP of “bank” towards the DP of RIVER, yielding bank_RIVER • Create bank_FIN.INST similarly, etc.

  23. Fine-Grained Soft Semantic Constraints • Hybrid models: the best of all: fine-grained, sense-aware, widely applicable • bank_FIN.INST ≠ bank_RIVER ≠ wave_RIVER! • Two hybrid flavors: • Hybrid-filtered • Hybrid-proportional
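One plausible reading of the two flavors in code (a sketch under the assumption that "filtered" keeps only contexts attested for the concept and "proportional" reweights word counts by the concept's strengths; the dissertation's exact formulas may differ, and the counts are invented):

```python
# Sketch (one plausible reading, NOT the dissertation's exact formulas) of
# biasing a word's DP toward a concept's DP to get a sense-specific DP.

def hybrid_filtered(word_dp, concept_dp):
    """Keep only the word's contexts that also appear in the concept DP."""
    return {w: v for w, v in word_dp.items() if w in concept_dp}

def hybrid_proportional(word_dp, concept_dp):
    """Reweight the word's counts by the concept's relative strengths."""
    total = sum(concept_dp.values()) or 1.0
    return {w: v * (concept_dp.get(w, 0.0) / total) for w, v in word_dp.items()}

bank  = {"money": 4.0, "river": 2.0, "water": 1.0}
RIVER = {"river": 3.0, "water": 2.0, "boat": 1.0}

bank_river = hybrid_filtered(bank, RIVER)
assert bank_river == {"river": 2.0, "water": 1.0}   # "money" filtered out
```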

  24. Evaluation: Word-Pair Similarity Task • Give each word pair a similarity score • Rooster – voyage: 0.12 • Coast – shore: 0.93 • Same part-of-speech pairs • Noun-noun (Rubinstein & Goodenough, 1965; Finkelstein et al. 2002) • Verb-verb (Resnik & Diab, 2000) • Result: list of pairs ordered by similarity • Evaluation metric: Spearman rank correlation
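The evaluation metric can be sketched as follows (the no-ties Spearman formula; the toy system scores are invented):

```python
# Sketch of the evaluation metric: Spearman rank correlation between a
# system's similarity scores and gold human ratings (toy scores below).

def ranks(xs):
    """1-based ranks of xs (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank + 1.0
    return r

def spearman(xs, ys):
    """Spearman rho via the no-ties formula: 1 - 6*sum(d^2)/(n*(n^2-1))."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

gold   = [0.12, 0.93, 0.55]   # e.g. rooster-voyage, coast-shore, ...
system = [0.20, 0.80, 0.60]   # invented system scores, same ordering
assert spearman(gold, system) == 1.0   # identical ranking: perfect correlation
```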

  25. Word-Pair Similarity Results

  26. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  27. Words  Phrases • Extend the word-based semantic similarity measures to “phrases” • she declined to provide any other information … • police refused to provide any other details … • So far: See if y is similar to xNow: Find y’s similar to x • Can solve other problems now! • Use these extended phrasal DPs to find good paraphrases of unknown “phrases” in machine translation models to provide any other bank information money declined teller details … Yuval Marton, Dissertation Defense

  28. Coverage Problem in Statistical Machine Translation • Trained on parallel text • Every new test document contains some “phrases” unknown to the model • [Figure: parallel Spanish-English training text; a test-set Spanish phrase missing from the model]

  29. Previous Solution: Pivoting • Use other parallel texts to increase coverage • Drawback: parallel text is a limited resource! • [Figure: pivoting through additional parallel texts (French, German)]

  30. New Solution: Monolingually-Derived Paraphrases • Use monolingual text to increase coverage • Resources available in abundance! • [Figure: monolingual Spanish text yields paraphrases (Spanish’, Spanish’’, …) for unknown test-set phrases]

  31. Find Paraphrases • Gather all contexts L _ R for the phrase “to provide any other”: • What else appears between L _ R?

  32. Find Paraphrases • Gather all contexts L _ R for the phrase “to provide any other”: • What else appears between L _ R? • Measure distributional similarity to each candidate, e.g., “to provide any other” vs. “to give further”
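A toy sketch of the context-gathering step (the helper names and the two-sentence corpus are invented; the real system operates over large monolingual corpora, and candidates are then ranked by distributional similarity):

```python
# Sketch: gather L _ R contexts of a phrase in a tokenized corpus, then
# collect whatever else appears between the same L and R as paraphrase
# candidates. Helper names and the toy corpus are illustrative.
from collections import defaultdict

def contexts_of(tokens, phrase, width=1):
    """Yield (left, right) context tuples around each occurrence of phrase."""
    n = len(phrase)
    for i in range(width, len(tokens) - n - width + 1):
        if tuple(tokens[i:i + n]) == tuple(phrase):
            yield (tuple(tokens[i - width:i]), tuple(tokens[i + n:i + n + width]))

def candidates_in_contexts(tokens, ctxs, max_len=4):
    """Count phrases occurring between any gathered L _ R context."""
    cands = defaultdict(int)
    for left, right in ctxs:
        w = len(left)
        for i in range(len(tokens) - w):
            if tuple(tokens[i:i + w]) == left:
                for n in range(1, max_len + 1):
                    j = i + w + n
                    if tuple(tokens[j:j + w]) == right:
                        cands[tuple(tokens[i + w:j])] += 1
    return dict(cands)

corpus = ("she declined to provide any other information . "
          "he declined to give further information .").split()
ctxs = list(contexts_of(corpus, ("to", "provide", "any", "other")))
cands = candidates_in_contexts(corpus, ctxs)
assert ("to", "give", "further") in cands   # a paraphrase candidate emerges
```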

  33. Paraphrase Examples (Phrases)

  34. Paraphrase Examples (Unigrams)

  35. Paraphrase Feature Model • Evidence reinforcement: if more than one paraphrase f_i of f exists, aggregate the score with a “quasi-online” update: asim_i = asim_{i-1} + (1 − asim_{i-1}) sim(f_i, f), where asim_0 = 0 • Analogous to Callison-Burch et al. (2006)
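The update rule from the slide, directly in code (only the function name is invented):

```python
# The slide's evidence-reinforcement rule:
#   asim_i = asim_{i-1} + (1 - asim_{i-1}) * sim(f_i, f),  asim_0 = 0.
# Each additional paraphrase reinforces the aggregate, which stays in [0, 1].

def aggregate_similarity(sims):
    asim = 0.0
    for s in sims:
        asim = asim + (1.0 - asim) * s
    return asim

assert aggregate_similarity([0.5]) == 0.5
assert aggregate_similarity([0.5, 0.5]) == 0.75   # reinforced, still below 1
```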

  36. English-to-Chinese Results • A 29k-line subset created to emulate a low-density language setting • *: better than baseline; +: better than the non-hybrid counterpart

  37. English-Chinese Translation Examples

  38. Spanish to English

  39. Comparison with Corpus Size & Pivoting

  40. Road Map • Hybrid models with soft constraints • Pure and hybrid models • Hard and soft constraints • Fine-grained • Soft syntactic constraints • In statistical machine translation • Soft semantic constraints • In word-pair similarity tasks • In paraphrasing for statistical machine translation • Unified model

  41. Unified Model • Soft linguistic constraints in a log-linear model • Syntactic • Semantic • … • score(x) = Σ_i λ_i h_i(x) • Constraints: add more λh(x) terms to the sum: Σ_i λ_i h_i(x) + Σ_j λ_j h_j(x) • λ_i: weight / importance of feature i • h_i: features / constraints
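A minimal sketch of the unified log-linear form, score(x) = Σ_i λ_i h_i(x), where adding a soft constraint simply adds one more weighted feature term (the feature names and weights below are illustrative):

```python
# Sketch of the unified log-linear model: score(x) = sum_i lambda_i * h_i(x).
# Adding a soft constraint only adds another weighted feature term.
# Feature names and weights are invented for illustration.

def loglinear_score(features, weights):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

base = {"log_p(e|f)": -2.0, "rule_penalty": 1.0}
weights = {"log_p(e|f)": 1.0, "rule_penalty": -0.5, "VP=": 0.3}

without = loglinear_score(base, weights)
with_constraint = loglinear_score({**base, "VP=": 1.0}, weights)
assert abs(with_constraint - (without + 0.3)) < 1e-9   # one extra lambda*h term
```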

  42. Unified Model (Soft Syntactic Constraints) • Straightforward: if Σ_i λ_i ϕ_i(f,e) is a translation model, bias it syntactically, e.g., by adding λ_j ϕ_j(f,e), where ϕ_j(f,e) = 1 if the source-language word sequence f is a VP, and 0 otherwise

  43. Unified Model (Soft Semantic Constraints) • Semantic distance of word e in sense s from word e’ in sense s’: cos(e_s, e’_s’) = [ Σ_i fSense(e,s,w_i) fSense(e’,s’,w_i) + Σ_i fSense(e,s,w_i) fWord(e’,w_i) + Σ_i fWord(e,w_i) fSense(e’,s’,w_i) + Σ_i fWord(e,w_i) fWord(e’,w_i) ] / Z_C • The first term is proportional to cosSense(e_s, e’_s’); the middle two are cross-terms; the last equals K · cosWord(e, e’)

  44. Main Contributions • Fine-grained linguistic soft constraints in state-of-the-art end-to-end phrase-based SMT systems • Unified corpus-based model with soft linguistic constraints • Semantic (words) in word-pair similarity tasks • Semantic (phrases) in statistical machine translation • Syntactic (parsing) in statistical machine translation • Distributional paraphrase generation • Evidence reinforcement component

  45. Thanks to… • Defense Committee: • Philip Resnik, Chair/Advisor • Amy Weinberg, Advisor • William Idsardi, Member • Chris Callison-Burch, Special Member (JHU) • Bonnie Dorr, Dean's Representative • Ling Chair: • Norbert Hornstein • Ling Cohort: • Ellen Lau • Phil Monahan • Eri Takahashi • Rebecca McKeown • Chizuru Nakao • CLIP Lab: • David Chiang, Smara Muresan, Hendra Setiawan, Adam Lopez, Chris Dyer, Asad Sayeed, Vlad Eidelman, Zhongqiang Huang, Denis Filimonov, and many others!

  46. Thank you! • Questions?
