Fine-Grained Soft Semantic Constraints

Presentation Transcript


  1. Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland http://umiacs.umd.edu/~ymarton/pub/umanch/Hybrid Knowledge-CorpusBasedSem-Manchester_090614.ppt

  2. Why Care? Tell’em apart: These, too: Yuval Marton, U Manchester talk

  3. FOX • FOX = • FOX = FOrkhead/winged-heliX replicator gene

  4. Road map • Brief overview of doctoral work • Hybrid knowledge / corpus-based semantic similarity methods • Pure and hybrid methods • Hard and soft constraints • Fine-grained • Named-entities

  5. Dissertation Theme • Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Syntactic and Semantic Constraints • Soft Constraints • Fine-Grained • Syntactic (parsing) • Semantic (“concepts”, paraphrases) • Evaluated in • Word-pair similarity ranking and • Statistical Machine Translation (SMT)

  6. Soft Constraints (figure labels: Univ. Hard, Univ. Soft) • Hard constraints • {0,1}; in/out • Decrease search space • “structural zeroes” • Theory-driven • Faster, slimmer • Soft constraints • [0, 1]; fuzzy • Only bias the model • Data-driven: Let patterns emerge

  7. Fine-grained • Granularity is a big deal • Soft syntactic constraints in SMT • Chiang 2005 vs. Marton and Resnik 2008 • Neg results → pos results • Soft semantic constraints in word-pair similarity ranking • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009 • Pos results → better results

  8. Soft Syntactic Constraints • X → X₁ speech ||| X₁ espiche • What should be the span of X₁? • Chiang’s 2005 constituency feature • Reward rule’s score if rule’s source side matches a constituent span • Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward) • Good idea, but neg result • But what if…

  9. Rule granularity • Chiang: Single weight for all constituents (parse tags) • … But what if we can assign a separate feature and weight for each constituent? • E.g., NP-only: (NP=) • Or VP-only: (VP=)
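
The contrast on this slide can be illustrated with a small sketch: a single constituency feature with one shared weight versus one feature (and weight) per parse tag. The function names, feature values, and weights below are hypothetical and chosen only for illustration; they are not taken from Chiang (2005) or Marton and Resnik (2008).

```python
# Hypothetical sketch: feature names follow the slide's "NP=" / "VP=" notation,
# but all weights are invented for illustration.

def coarse_features(span_label):
    """One shared feature that fires for any span matching a constituent."""
    return {"constituent": 1.0} if span_label is not None else {}

def fine_features(span_label):
    """One feature per parse tag, e.g. 'NP=' or 'VP='."""
    return {f"{span_label}=": 1.0} if span_label is not None else {}

def score(features, weights):
    """Log-linear contribution: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

coarse_weights = {"constituent": 0.5}        # assumed single weight
fine_weights = {"NP=": 0.8, "VP=": -0.3}     # assumed per-tag weights

for label in ("NP", "VP", None):
    # None stands for a span that matches no constituent: it gets no reward,
    # but it is not excluded either, which is what makes the constraint soft.
    print(label,
          score(coarse_features(label), coarse_weights),
          score(fine_features(label), fine_weights))
```

With the coarse feature, NP and VP matches are rewarded identically; with per-tag features, tuning can learn to reward some constituent types and penalize others.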

  10. Fine-grained • Granularity is a big deal • ✓ Soft syntactic constraints in SMT • Chiang 2005 vs. Marton and Resnik 2008 • Neg results → pos results • Soft semantic constraints in word-pair similarity ranking • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009 • Pos results → better results

  11. Word-pair similarity ranking • Give each word pair a similarity score • Rooster – voyage • Coast – shore • Noun-noun (Rubenstein & Goodenough, 1965) • Verb-verb (Resnik & Diab, 2000) • Result: list of pairs ordered by similarity • Spearman rank correlation
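
As a rough illustration of the evaluation described on this slide, the sketch below ranks word pairs by a model's scores and by human judgments, then computes the Spearman rank correlation as the Pearson correlation of the two rank lists. All scores are invented for this example and do not come from the Rubenstein and Goodenough or Resnik and Diab datasets; `pair_c` and `pair_d` are placeholder pairs.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

pairs = ["rooster-voyage", "coast-shore", "pair_c", "pair_d"]
human = [0.1, 3.6, 2.0, 1.2]    # invented human similarity judgments
model = [0.05, 0.9, 0.4, 0.5]   # invented model similarity scores

print(spearman(human, model))
```

In this toy case the model swaps the order of the two middle pairs, so the correlation is high but below 1.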

  12. Similarity measures • Distributional profiles (DP) • Which words did I occur next to? • Context vectors • Similar vectors → similar meaning

  13. Bank (pure word-based) Bank

  14. Bank (pure concept-based) • Bank, Teller, Money, … • River, Bank, Water, … • Compare closest senses • Bank_river = water ??

  15. Bank (Hybrid Model) • Bank_Fin.Inst • Bank_River

  16. Fine-grained • Granularity is a big deal • ✓ Soft syntactic constraints in SMT • Chiang 2005 vs. Marton and Resnik 2008 • Neg results → pos results • ✓ Soft semantic constraints in word-pair similarity ranking • Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009 • Pos results → better results

  17. Unified Model • Soft constraints in a log-linear model • Syntactic • Semantic • … • $\sum_i \lambda_i h_i(x)$ • Add more terms to the sum

  18. Road map • Brief overview of doctoral work • Hybrid knowledge / corpus-based semantic similarity methods • Pure and hybrid methods • Hard and soft constraints • Fine-grained • Named-entities

  19. Distributional profiles (DPs) • DPW: word-based distributional profile • First order • Distributional Hypothesis (Harris 1940; Firth 1957) • Second order (vector representation) • Strength of association • Counts, PMI, TF/IDF-based, log-likelihood ratios, … • Vector similarity (cosine, L1, L2, …)
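
The ingredients listed on this slide (co-occurrence counts, an association measure such as PMI, and a vector similarity such as cosine) can be sketched as follows. The toy corpus, the window size, and the choice to keep only positive PMI values are assumptions made for this illustration, not details from the talk.

```python
import math
from collections import Counter, defaultdict

def build_profiles(sentences, window=2):
    """Word-based profiles (DPW): count the words within +/- window tokens."""
    profiles = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[target][tokens[j]] += 1
    return profiles

def pmi_weight(profiles):
    """Replace raw counts with PMI, keeping positive values only (an assumption)."""
    total = sum(sum(c.values()) for c in profiles.values())
    row = {w: sum(c.values()) for w, c in profiles.items()}
    col = Counter()
    for c in profiles.values():
        col.update(c)
    weighted = {}
    for w, c in profiles.items():
        weighted[w] = {}
        for ctx, n in c.items():
            pmi = math.log((n * total) / (row[w] * col[ctx]))
            if pmi > 0:
                weighted[w][ctx] = pmi
    return weighted

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented toy corpus, for illustration only.
corpus = [
    ["the", "coast", "has", "sand", "and", "waves"],
    ["the", "shore", "has", "sand", "and", "rocks"],
    ["the", "rooster", "crows", "at", "dawn"],
]
profiles = build_profiles(corpus)
print(cosine(profiles["coast"], profiles["shore"]))
print(cosine(profiles["coast"], profiles["rooster"]))
```

On this toy corpus, "coast" and "shore" share all of their context words, while "coast" and "rooster" share only "the", so the first similarity is higher. Swapping `profiles` for `pmi_weight(profiles)` changes the strength-of-association measure without changing the similarity function.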

  20. Taxonomies and Groupings • WordNet • Synsets • Relations (“is-a”) • Arc distance • UMLS • Thesaurus • Flat • Coarse • Bank_river = water ?? • (Figure: “is-a” tree with nodes job, Industry job, Academic job, CEO, Postdoc)

  21. Hybrid measures • WordNet • Resnik’s method (info content) • Lin and others • Thesaurus concept-based • Mohammad and Hirst (coarse-grained) • Word may be listed under several concepts • Distance between most similar senses • Pro: Resource-poor languages and domains • Con: Small thesaurus → low applicability • WCCM: Financial instit. ~ academic instit. • Bank_river = water ??

  22. WCCM: Concept-Word matrix • WCCM: word-concept collocation matrix • DPC: concept-based distributional profile • Potentially iterative process • Clean-up

  23. Bank • Use concept-based DPCs to bias word-based DPWs

  24. Fine-grained soft constraints • DPWS: distributional profile of word senses • Use concept-based DPCs to bias word-based DPWs • Hybrid-filtered • Hybrid-proportional

  25. Hybrid-filtered • Filter out collocates in the DPW if they do not appear in the DPC
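
A minimal sketch of the filtering step, assuming profiles are stored as dictionaries from collocate to association value. The concept names, collocates, and counts are invented for illustration, and `hybrid_filtered` is a hypothetical helper name rather than the author's code.

```python
def hybrid_filtered(dpw, dpc):
    """Keep only the DPW collocates that also appear in the given DPC."""
    return {colloc: value for colloc, value in dpw.items() if colloc in dpc}

# Invented toy profiles for the word "bank" and two of its concepts.
dpw_bank = {"water": 3, "boat": 1, "money": 6, "teller": 2}
dpcs = {
    "RIVER": {"water": 10, "boat": 5, "bank": 4},
    "FIN_INST": {"money": 12, "teller": 3, "bank": 7},
}

# One sense-specific profile (DPWS) per concept of the target word.
dpws_bank = {concept: hybrid_filtered(dpw_bank, dpc) for concept, dpc in dpcs.items()}
print(dpws_bank)
```

Filtering is all-or-nothing per collocate: a collocate either keeps its full DPW value or is removed from the sense profile.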

  26. Hybrid-proportional • Only discount a collocate’s value in the DPW, in proportion to the ratio of its count in the current DPC relative to its count in all DPCs of the target word
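
One possible reading of this discounting rule is sketched below: each collocate's DPW value is multiplied by its count in the current concept's DPC divided by its summed count over all DPCs of the target word. The data are invented, `hybrid_proportional` is a hypothetical helper name, and dropping collocates that occur in none of the DPCs is an assumption of this sketch, since the slide does not say how they are treated.

```python
def hybrid_proportional(dpw, dpcs, concept):
    """Discount each DPW value by count-in-this-DPC / count-in-all-DPCs."""
    result = {}
    for colloc, value in dpw.items():
        total = sum(dpc.get(colloc, 0) for dpc in dpcs.values())
        if total:
            result[colloc] = value * dpcs[concept].get(colloc, 0) / total
        # Collocates absent from every DPC are dropped here (assumption).
    return result

# Invented toy profiles for the word "bank" and two of its concepts.
dpw_bank = {"water": 10, "money": 10, "teller": 4}
dpcs = {
    "RIVER": {"water": 9, "boat": 5, "money": 1},
    "FIN_INST": {"money": 9, "teller": 3, "water": 1},
}

bank_river = hybrid_proportional(dpw_bank, dpcs, "RIVER")
bank_fin = hybrid_proportional(dpw_bank, dpcs, "FIN_INST")
print(bank_river)
print(bank_fin)
```

Unlike filtering, a collocate shared by several concepts is split between the sense profiles rather than kept or removed outright, so in this toy data "water" stays weakly associated with the financial sense.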

  27. WSD with DPWS • Each sense of each word has a unique profile • Bank_fin.inst ≠ Bank_river ≠ water! • Pro: • Not aggregated (DPC profiles are) • Non/less smearing (DPW profiles smear all senses in a single profile)

  28. Results

  29. Evaluation • Word-pair similarity ranking • Spearman rank correlation • Paraphrasing in SMT • BLEU, TER, METEOR, …

  30. Comparison • WordNet results • LSA results

  31. Challenges • Antonyms (black – white) • “Hyperonyms” (vehicle – car) • Co-hypernyms / co-taxonyms

  32. Named Entities • Challenges: • Bush – Obama • Potentially helpful: • H2O – Water • FOX – “forkhead/winged-helix replicator” • FOXP2 – SPCH1 • “SPCH1” turned out to be a member of the FOX (forkhead/winged-helix replicator genes) family, of which several other genes are known all across the animal world. It was then labeled FOXP2, that being its current, and more conventional, name.

  33. Biomedical/Chemical WSD • Explore hybrid methods to create DPWS • FOX_gene, FOX_animal • Requires a lexical resource • UMLS or other resources • Useful for smaller training sets!

  34. Conclusion (figure label: Univ. Soft) • Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Constraints • Soft Constraints • Fine-Grained • Semantic (“concepts”) • Resource-poor settings, special domains

  35. Thank you! Questions? • ymarton@umiacs.umd.edu • Advisors: Philip Resnik & Amy Weinberg • Department of Linguistics and CLIP Lab

  36. Fine-grained semantic • Word-based: • Bank: river, money, water, teller, … • “Concept”-based: • River: water, bank, boat, … • Financial institution: bank, money, teller, … • Humans compare closest senses • Bank_river = water ?? • Hybrid: • Bank_river: more strongly associated with water • Bank_fin.inst: more strongly associated with money

  37. SMT • Statistical Machine Translation • What translational units to use? • Syntactic constituents, re-ordering • “es gibt” • Paraphrases • Pivoting vs. bitext-free paraphrasing • Typically monolingual • Translation = bilingual / cross-domain paraphrasing • Can be evaluated in SMT
