
Exploiting Subjectivity Classification to Improve Information Extraction

Ellen Riloff (University of Utah), Janyce Wiebe (University of Pittsburgh), William Phillips (University of Utah)


Presentation Transcript


  1. Exploiting Subjectivity Classification to Improve Information Extraction. Ellen Riloff, University of Utah; Janyce Wiebe, University of Pittsburgh; William Phillips, University of Utah

  2. Subjectivity? • Definition: Subjective language expresses or refers to opinions, emotions, sentiments and other private states. • Related Work: • Sentiments (Turney & Littman 2003; Dave, Lawrence, & Pennock 2003; Pang & Lee 2004) • Product Reputation Tracking (Morinaga et al. 2002; Yi et al. 2003) • Opinion Oriented Summarization and QA (Hu & Liu 2004; Yu & Hatzivassiloglou 2003) • Opinion: personal beliefs • Emotion: state of mind • Sentiments: positive/negative judgements

  3. Motivation • Our observation: many false hits produced by Information Extraction (IE) systems come from subjective sentences. • Hypothesis: we can improve IE performance by avoiding extractions from subjective sentences.

  4. Examples “D’Aubuisson unleashed harsh attacks on Duarte…” “The Parliament exploded into fury against the government when word leaked out…” “The subversives must suspend the aggression against the people and the destruction of the economy…”

  5. The Big Picture [diagram: a Subjective Sentence Classifier splits the input into subjective sentences, which go to Selective Information Extraction, and objective sentences, which go to Full Information Extraction]
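The routing in this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the function name `extract_with_subjectivity_routing` and the three callables are hypothetical stand-ins for the classifier and the two extraction modes.

```python
from typing import Callable, Iterable, List


def extract_with_subjectivity_routing(
    sentences: Iterable[str],
    is_subjective: Callable[[str], bool],
    full_extract: Callable[[str], List[str]],
    selective_extract: Callable[[str], List[str]],
) -> List[str]:
    """Send each sentence to full or selective extraction.

    All three callables are placeholders (assumptions): the slides do not
    specify the interfaces of the classifier or of the IE system.
    """
    extractions: List[str] = []
    for sentence in sentences:
        if is_subjective(sentence):
            # Subjective sentences only get restricted extraction.
            extractions.extend(selective_extract(sentence))
        else:
            # Objective sentences get the full pattern set.
            extractions.extend(full_extract(sentence))
    return extractions


# Toy stand-ins, invented purely so the sketch runs end to end.
def toy_is_subjective(sentence: str) -> bool:
    return "fury" in sentence.lower()


def toy_full_extract(sentence: str) -> List[str]:
    return [sentence.split()[0]]


def toy_selective_extract(sentence: str) -> List[str]:
    return []
```

With the toy stand-ins, an objective sentence contributes an extraction while the metaphorical "exploded into fury" sentence contributes none, which is the behaviour the filtering strategies on the following slides refine.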

  6. The Subjectivity Classifier • Most documents contain a mix of subjective and objective sentences. • 44% of sentences in newspaper articles are subjective! (Wiebe et al. 2004) • We used the Naïve Bayes subjective sentence classifier developed by Wiebe & Riloff (2005). • Classifies at the sentence level • Unsupervised • Rivals the best supervised methods

  7. Initial Training Data Creation [diagram: unlabeled texts and subjective clues feed a rule-based subjective sentence classifier and a rule-based objective sentence classifier, which output subjective & objective sentences]

  8. Naïve Bayes training [diagram with components: training set, extraction pattern learner, subjective patterns, objective patterns, subjective clues, POS features, Naïve Bayes Training, Naïve Bayes Classifier]

  9. NB Confidence Measure CM =

  10. MUC-4 IE Task • Goal: extract information about terrorist events in Latin America. • Evaluated performance on 4 types of information: • perpetrators (individuals), victims, targets, weapons • Corpus: 1700 texts • 1400 used for training, 100 for tuning, 200 for testing • Used AutoSlog-TS to generate extraction patterns • The system used 397 patterns

  11. Base IE System Performance
      System   Rec   Prec   F     #Correct   #Wrong
      IE       .52   .42    .47   266        367

  12. Filtering Subjective Sentences
      System          Rec   Prec   F     #Correct    #Wrong
      IE              .52   .42    .47   266         367
      IE+SubjFilter   .44   .44    .44   218 (-48)   273 (-94)

  13. Source Attribution Sentences • In news articles, factual information is often prefaced with a source attribution. Examples: “The Associated Press reported…” “The President stated…” • Source attribution sentences often contain important facts even if subjective language is also present.

  14. Source Attribution Modification • Keep the subjective sentences if they contain a source attribution. 1) the sentence contains a communication verb: {affirm, announce, cite, confirm, convey, disclose, report, tell, say, state} 2) the subjectivity classifier considers the sentence to be only weakly subjective (CM ≤ 25)
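A rough sketch of this rule in Python follows. It rests on several assumptions that the slide does not confirm: the two conditions are combined with a logical AND, the threshold is an upper bound (CM ≤ 25), and verbs are matched by naive suffix inflection with no part-of-speech check. The names `surface_forms` and `keep_subjective_sentence` are hypothetical.

```python
import re

# Communication verbs listed on the slide.
COMMUNICATION_VERBS = (
    "affirm", "announce", "cite", "confirm", "convey",
    "disclose", "report", "tell", "say", "state",
)

# Irregular past forms that the naive suffix rules below would miss.
IRREGULAR_FORMS = {"say": ["said"], "tell": ["told"]}


def surface_forms(verb: str) -> set:
    """Naive inflection (assumption): base, -s, -ed/-d, -ing, plus irregulars."""
    stem = verb[:-1] if verb.endswith("e") else verb
    past = verb + "d" if verb.endswith("e") else verb + "ed"
    forms = {verb, verb + "s", past, stem + "ing"}
    forms.update(IRREGULAR_FORMS.get(verb, []))
    return forms


VERB_FORMS = set().union(*(surface_forms(v) for v in COMMUNICATION_VERBS))


def keep_subjective_sentence(sentence: str, cm: float, weak_threshold: float = 25.0) -> bool:
    """Return True if a subjective sentence should still be used for extraction.

    Assumes both conditions must hold: a communication verb is present and
    the classifier confidence `cm` is at most `weak_threshold`.
    No part-of-speech check is done, so the noun "state" also matches.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    has_communication_verb = any(token in VERB_FORMS for token in tokens)
    return has_communication_verb and cm <= weak_threshold
```

For example, `keep_subjective_sentence("The President stated that troops were attacked.", cm=10.0)` returns `True`, while the same sentence with `cm=40.0` is filtered out.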

  15. Results with Source Attribution Modification
      System           Rec   Prec   F     #Correct    #Wrong
      IE               .52   .42    .47   266         367
      IE+SubjFilter    .44   .44    .44   218 (-48)   273 (-94)
      IE+SubjFilter2   .46   .44    .45   231 (-35)   289 (-78)

  16. Selective Filtering • We observed that subjective sentences can contain important facts. For example: “He was outraged by the terrorist attack on the World Trade Center.” • Modification: selectively extract information from subjective sentences • Done using Indicator Patterns.

  17. Indicator Patterns • We defined an indicator pattern as a pattern that has the following AutoSlog-TS statistics: P(relevant | pattern) ≥ 0.65 and Frequency ≥ 10 • Indicator Patterns clearly represent a fact of interest • “murder of X” • “X was assassinated”
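The indicator-pattern test can be written as a small predicate. This is a sketch under stated assumptions: the comparison operators are taken to be "at least" (the transcript lost the symbols), and the `PatternStats` record and its field names are hypothetical, not AutoSlog-TS output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternStats:
    """Hypothetical record of per-pattern statistics."""
    pattern: str
    p_relevant: float  # estimated P(relevant | pattern)
    frequency: int     # number of times the pattern fired in training texts


def is_indicator(stats: PatternStats, min_prob: float = 0.65, min_freq: int = 10) -> bool:
    """True if the pattern meets both thresholds (assumed to be inclusive)."""
    return stats.p_relevant >= min_prob and stats.frequency >= min_freq


def indicator_patterns(all_stats):
    """Return the patterns that may still extract from subjective sentences."""
    return [s.pattern for s in all_stats if is_indicator(s)]
```

The statistics passed in would come from the pattern learner; any numbers used to exercise this sketch are invented. Only patterns passing `is_indicator` would be applied to subjective sentences, while objective sentences keep the full pattern set.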

  18. Results for Selective Subjectivity Filtering
      System           Rec   Prec   F     #Correct    #Wrong
      IE               .52   .42    .47   266         367
      IE+SubjFilter    .44   .44    .44   218 (-48)   273 (-94)
      IE+SubjFilter2   .46   .44    .45   231 (-35)   289 (-78)
      IE+SF2+Slct      .51   .45    .48   258 (-8)    311 (-56)

  19. Removing Subjective Extraction Patterns • Example: “…to destroy the building.” “…to destroy the process of reconciliation.” • Use subjectivity analysis to remove subjective patterns. • We classified a pattern as subjective if: 1) P(subjective | pattern) > .50 and 2) frequency ≥ 10
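One way to apply this test is sketched below. It assumes P(subjective | pattern) is estimated as the fraction of a pattern's occurrences that fall in sentences labeled subjective, and that the frequency threshold is inclusive; neither detail is spelled out on the slide. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def find_subjective_patterns(
    occurrences: Iterable[Tuple[str, bool]],
    min_prob: float = 0.50,
    min_freq: int = 10,
) -> Set[str]:
    """Return patterns that fire mostly in subjective sentences.

    `occurrences` is a hypothetical input: one (pattern, sentence_is_subjective)
    pair per time a pattern fired in the training texts.
    """
    total = Counter()
    in_subjective = Counter()
    for pattern, sentence_is_subjective in occurrences:
        total[pattern] += 1
        if sentence_is_subjective:
            in_subjective[pattern] += 1
    return {
        pattern
        for pattern, count in total.items()
        # Relative-frequency estimate of P(subjective | pattern) (assumption).
        if count >= min_freq and in_subjective[pattern] / count > min_prob
    }
```

Patterns returned by this function would be dropped from the extraction pattern set before running the IE system.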

  20. Final Results
      System                Rec   Prec   F     #Correct    #Wrong
      IE                    .52   .42    .47   266         367
      IE+SubjFilter         .44   .44    .44   218 (-48)   273 (-94)
      IE+SubjFilter2        .46   .44    .45   231 (-35)   289 (-78)
      IE+SF2+Slct           .51   .45    .48   258 (-8)    311 (-56)
      IE+SF2+Slct-SubjEPs   .51   .46    .48   258 (-8)    305 (-62)
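The precision column appears consistent with #Correct / (#Correct + #Wrong). That relationship is inferred from the numbers rather than stated on the slide, so the short check below should be read as a sanity check under that assumption. The counts are copied from the table; the names `COUNTS` and `precision` are introduced here for illustration.

```python
# (#Correct, #Wrong) per system, copied from the Final Results table.
COUNTS = {
    "IE": (266, 367),
    "IE+SubjFilter": (218, 273),
    "IE+SubjFilter2": (231, 289),
    "IE+SF2+Slct": (258, 311),
    "IE+SF2+Slct-SubjEPs": (258, 305),
}


def precision(correct: int, wrong: int) -> float:
    """Precision as correct / (correct + wrong); the formula is an assumption."""
    return correct / (correct + wrong)


# Precision per system, rounded to two decimals for comparison with the table.
ROUNDED_PRECISION = {
    system: round(precision(correct, wrong), 2)
    for system, (correct, wrong) in COUNTS.items()
}

for system, value in ROUNDED_PRECISION.items():
    print(f"{system:22s} precision = {value:.2f}")
```

Recall is not recomputed here because the number of answer-key items is not given in the slides.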

  21. Subjectivity Filtering Combined with Topic Classification
      System                         Rec   Prec
      IE                             .52   .42
      IE w/Perfect TC                .52   .53
      IE w/Perfect TC + SubjFilter   .51   .56

  22. Conclusions • Subjectivity filtering strategies improved IE precision with minimal recall loss. • The benefits of subjectivity classification are synergistic with those of topic classification. • As subjectivity classification improves, we expect corresponding improvements to IE.

  23. IE Evaluation • Performed at extraction level, before template generation [diagram: Standard IE System, in which texts enter the Slot Extraction Component and its extracts feed the Template Generation Component]

  24. We defined an indicator pattern as a pattern that has the following AutoSlog-TS statistics: P(relevant | pattern) ≥ 0.65 and Frequency ≥ 10 • Using only the indicator patterns for IE is not sufficient.
      System                 Rec   Prec   F
      IE                     .52   .42    .47
      IE (Indicators Only)   .40   .54    .46

  25. IE System • We used AutoSlog-TS to generate extraction patterns. • 40,553 distinct patterns were learned. • We manually reviewed the top patterns (2808 patterns). • The final system used 397 patterns.

  26. Examples of Filtered Extractions • The demonstrators, convoked by the solidarity with Latin America Committee, verbally attacked Salvadoran President Alfredo Cristiani and have asked the Spanish government to offer itself as a mediator to promote and end to the armed conflict. PATTERN: attacked <dobj> VICTIM: “Salvadoran President Alfredo Cristiani”

  27. Examples of Filtered Extractions • The crime was directed at hindering the development of the electoral process and destroying the reconciliation process… PATTERN: destroying <dobj> TARGET: “the reconciliation process” • Presidents, political and social figures of the continent have said that the solution is not based on the destruction of a native plant but in active fight against drug consumption. PATTERN: destruction of <np> TARGET: “a native plant”

  28. Breakdown by Extraction Type
      Category   Baseline        SubjFilter
                 Rec    Prec     Rec    Prec
      Perp       .47    .33      .45    .38
      Victim     .51    .50      .50    .52
      Target     .63    .42      .62    .47
      Weapon     .45    .39      .43    .42
      Total      .52    .42      .51    .46

  29. Subjective Patterns The following extraction patterns were classified as subjective:
      • attacks on <np>
      • to attack <dobj>
      • communique by <np>
      • to destroy <dobj>
      • <subj> was linked
      • leaders of <np>
      • <subj> unleashed
      • was aimed at <np>
      • offensive against <np>
      • dialogue with <np>

  30. Metaphor • False hits can come from subjective sentences that contain metaphorical language. “The Parliament exploded into fury against the government when word leaked out…”
