
Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns


Presentation Transcript


  1. Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns Yejin Choi and Claire Cardie (Cornell University) Ellen Riloff and Siddharth Patwardhan (University of Utah) EMNLP 2005

  2. Introduction • Source identification is especially useful in opinion question answering and summarization. Ex. "How does X feel about Y?" • The goal is to identify both direct and indirect sources. Ex. "International officers said US officials want the EU to prevail." Here "International officers" is the direct source of the reported speech, and "US officials" is an indirect source reported within it.

  3. Source Identification • View source identification as an information extraction task. • Tackle it with sequence tagging and pattern-matching techniques. • Evaluate on the NRRC corpus (Wiebe et al., 2005).

  4. Big Picture • This task goes beyond NER: it requires recognizing role relationships, not just entity boundaries. • Learning-based methods: graphical models and extraction pattern learning. • Hybrid method: a CRF (as commonly used for NER) combined with AutoSlog (an extraction pattern learner from information extraction).

  5. Semantic Tagging via CRF • A sequence of tokens x = x_1 x_2 … x_n • A sequence of labels y = y_1 y_2 … y_n • Label values: 'S', 'T', '-' • 'S': the first token of a source • 'T': a non-initial token of a source • '-': a token not part of any source
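
A concrete labeling under this scheme, using the example sentence from the introduction (a minimal sketch in Python; the labels are this example's reading of the sentence, not gold annotations from the corpus):

```python
# 'S' = first token of a source, 'T' = non-initial source token,
# '-' = outside any source.  Labeling is illustrative, not gold.
tokens = ["International", "officers", "said", "US", "officials",
          "want", "the", "EU", "to", "prevail", "."]
labels = ["S", "T", "-", "S", "T", "-", "-", "-", "-", "-", "-"]

for token, label in zip(tokens, labels):
    print(f"{label}\t{token}")
```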

  6. CRF Algorithm (1) • G = (V, E), where V is the set of random variables Y = {Y_i | 1 <= i <= n} for the n tokens of an input sentence. • E = {(Y_{i-1}, Y_i) | 1 < i <= n} is the set of n-1 edges forming a linear chain. • For each edge: feature functions f_k(y_{i-1}, y_i, x) with weights λ_k. • For each node: feature functions g_k(y_i, x) with weights μ_k.

  7. CRF Algorithm (2) • The conditional probability of a sequence of labels y given a sequence of tokens x is P(y|x) = (1/Z_x) exp( Σ_i [ Σ_k λ_k f_k(y_{i-1}, y_i, x) + Σ_k μ_k g_k(y_i, x) ] ), where Z_x is a normalization factor obtained by summing the same exponential over all possible label sequences.
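
A brute-force illustration of this probability, assuming toy node and edge scores in place of the learned weighted feature sums (real implementations compute Z_x with the forward algorithm rather than enumerating all label sequences):

```python
import itertools
import math

LABELS = ["S", "T", "-"]

def seq_score(seq, node_scores, edge_scores):
    # Per-node scores plus per-edge transition scores; these stand
    # in for sum_k lambda_k * f_k and sum_k mu_k * g_k.
    score = sum(node_scores[i][y] for i, y in enumerate(seq))
    score += sum(edge_scores[(seq[i - 1], seq[i])] for i in range(1, len(seq)))
    return score

def prob(y, node_scores, edge_scores):
    # Z_x: sum of exp(score) over every possible label sequence.
    n = len(node_scores)
    z = sum(math.exp(seq_score(seq, node_scores, edge_scores))
            for seq in itertools.product(LABELS, repeat=n))
    return math.exp(seq_score(tuple(y), node_scores, edge_scores)) / z
```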

  8. CRF Algorithm (3) • Training maximizes the conditional log-likelihood of the labeled training data: L = Σ_j log P(y^(j) | x^(j)). • Given a sentence x in the test data, the tagging sequence is the most probable one: y* = argmax_y P(y|x).
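
The argmax over label sequences is tractable with dynamic programming. A minimal Viterbi sketch, again with toy node and edge scores standing in for the trained model:

```python
LABELS = ["S", "T", "-"]

def viterbi(node_scores, edge_scores):
    # node_scores[i][y]: score of label y at position i;
    # edge_scores[(y_prev, y)]: transition score.  Both are toy
    # stand-ins for the learned weighted feature sums.
    n = len(node_scores)
    best = [{y: node_scores[0][y] for y in LABELS}]
    back = []
    for i in range(1, n):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS,
                       key=lambda yp: best[i - 1][yp] + edge_scores[(yp, y)])
            best[i][y] = (best[i - 1][prev] + edge_scores[(prev, y)]
                          + node_scores[i][y])
            back[i - 1][y] = prev
    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(n - 2, -1, -1):
        y = back[i][y]
        path.append(y)
    return list(reversed(path))
```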

  9. Features • The sources of opinions are mostly noun phrases. • The source phrases should be semantic entities that can bear or express opinions. • The source phrases should be directly related to an opinion expression.

  10. Features (1) • Capitalization features: all-capital, initial-capital. • POS features: in a [-2, +2] window. • Opinion lexicon feature: in a [-1, +1] window, with subclass information such as 'moderately subjective'.
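
A sketch of how these token-level features might be assembled, following the window sizes on the slide (OPINION_LEXICON and the feature names are illustrative placeholders, not the paper's implementation):

```python
# Tiny stand-in for a subjectivity lexicon with subclass labels.
OPINION_LEXICON = {"complained": "strongly subjective",
                   "want": "moderately subjective"}

def token_features(tokens, pos_tags, i):
    feats = {
        "all_capital": tokens[i].isupper(),
        "initial_capital": tokens[i][:1].isupper(),
    }
    # POS tags in a [-2, +2] window around token i.
    for off in range(-2, 3):
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"pos[{off}]"] = pos_tags[j]
    # Opinion lexicon membership (with subclass) in a [-1, +1] window.
    for off in range(-1, 2):
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"opinion[{off}]"] = OPINION_LEXICON.get(tokens[j].lower(), "none")
    return feats
```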

  11. Features (2) • Dependency tree features: syntactic chunking and opinion word propagation. • Semantic class features (from the Sundance shallow parser).

  12. Extraction Pattern Features • SourcePatt: whether a word activates any source extraction pattern (e.g., "complained"). • SourceExtr: whether a word is extracted by any pattern (e.g., "Jacques" in "Jacques complained …"). • Frequency and probability statistics for each are added as well (4 extra features).
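
A hypothetical encoding of the two binary features (SOURCE_PATTERNS and extracted_spans are placeholders for what the AutoSlog-learned patterns would supply; the four frequency/probability variants are omitted here):

```python
# Toy stand-in for words that trigger a learned source pattern.
SOURCE_PATTERNS = {"complained", "said", "accused"}

def pattern_features(tokens, i, extracted_spans):
    word = tokens[i].lower()
    return {
        # SourcePatt: the word activates an extraction pattern.
        "SourcePatt": word in SOURCE_PATTERNS,
        # SourceExtr: the word lies inside a span extracted by some
        # pattern, e.g. "Jacques" in "Jacques complained ...".
        "SourceExtr": any(s <= i < e for s, e in extracted_spans),
    }
```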

  13. Measures • Overlap match (OL): lenient; any overlap with a gold span counts. • Head match (HM): conservative; the heads must match. • Exact match (EM): strict; the span boundaries must match exactly. • Precision, recall, and F-measure are reported under each criterion.
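
A sketch of the three criteria over (start, end) token spans with exclusive ends; treating the head as the last token of a span is an assumption of this sketch, not the paper's definition:

```python
def exact_match(pred, gold):
    return pred == gold                      # boundaries identical

def head_match(pred, gold):
    return pred[1] == gold[1]                # same final (head) token

def overlap_match(pred, gold):
    return pred[0] < gold[1] and gold[0] < pred[1]   # any shared token

def precision_recall_f(preds, golds, match):
    correct_preds = sum(any(match(p, g) for g in golds) for p in preds)
    covered_golds = sum(any(match(p, g) for p in preds) for g in golds)
    precision = correct_preds / len(preds) if preds else 0.0
    recall = covered_golds / len(golds) if golds else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```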

  14. Baselines • Baseline1: label all phrases that belong to the source-related semantic categories. • Baseline2: label noun phrases that satisfy any of the following: • the NP is the subject of a verb phrase containing an opinion word • the NP contains a possessive and is preceded by an opinion word • the NP follows "according to" (see the sketch below) • the NP follows "by" and attaches to an opinion word • Baseline3: combines the noun-phrase conditions of Baseline1 and Baseline2.
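
A toy sketch of just the "according to" rule, over pre-chunked noun phrase spans (a simplification: the actual baselines operate over parsed sentences):

```python
def follows_according_to(tokens, np_span):
    # np_span is a (start, end) token span with exclusive end.
    start, _ = np_span
    return start >= 2 and [t.lower() for t in tokens[start - 2:start]] == ["according", "to"]

tokens = "according to US officials , the plan failed".split()
print(follows_according_to(tokens, (2, 4)))  # True for "US officials"
```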

  15. Experiment Results

  16. Error Analysis • Errors from the sentence boundary detector in GATE and from parsing • Complex and unusual sentence structures • Limited coverage of the opinion lexicon (slang, idioms)

  17. Conclusion and Future Work • A hybrid approach to identifying opinion sources • Future work: make the AutoSlog pattern learning fully automatic • Handle both direct and indirect sources • Handle sources that cross sentence boundaries • Incorporate coreference resolution • The strength of the opinion expression may be a useful signal
