
CERATOPS Center for Extraction and Summarization of Events and Opinions in Text

CERATOPS Center for Extraction and Summarization of Events and Opinions in Text. Janyce Wiebe, U. Pittsburgh Claire Cardie, Cornell U. Ellen Riloff, U. Utah. Overview. Rapidly re-trainable, robust components for: Information extraction of facts and entities related to events from text





Presentation Transcript


  1. CERATOPS: Center for Extraction and Summarization of Events and Opinions in Text Janyce Wiebe, U. Pittsburgh; Claire Cardie, Cornell U.; Ellen Riloff, U. Utah

  2. Overview Rapidly re-trainable, robust components for: • Information extraction of facts and entities related to events from text • Extraction of opinions and motivations expressed in text • Tracking, linking, and summarizing events and opinions and their progressions over time

  3. Motivation for Event IE Systems • Rapid semantic processing of large volumes of unstructured text • Automatic merging of facts and entity relationships across sets of documents • Automatic population of large databases with factual information from many text sources

  4. Information Extraction from Text After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms. The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis. OUTBREAK Disease: avian flu / bird flu; Victims: poultry / 36 commercial premises; Location: Fraser Valley poultry farms; Country: Canada; Status: confirmed; Containment: depopulation

  5. Information Extraction of Events Extracting facts and entity relations associated with events of interest. Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location. Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures. Keywords and named entity recognition are not sufficient: "Troops were vaccinated against anthrax, cholera, …" "Researchers have discovered how anthrax toxin destroys cells and rapidly causes death ..."

  6. Syntactic Analysis: 3 chickens died from avian flu. (SUBJ VP PP) Extraction: 3 chickens died from avian flu. Fact: DEATH; Victim: 3 chickens; Disease: avian flu. Coreference Resolution: 3 chickens died from avian flu. The birds were found in Canada. Template Generation: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada
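The stages on slide 6 can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the regular expression is a hypothetical stand-in for a learned pattern such as "<subj> died from <np>", the coreferent mention and the country are supplied by hand rather than resolved automatically, and the sketch does not normalize "avian flu" to "bird flu" as the slide's final template does.

```python
import re

# Hypothetical surface pattern standing in for "<subj> died from <np>".
DEATH_PATTERN = re.compile(r"^(?P<victim>.+?) died from (?P<disease>.+?)\.?$")


def extract_death_fact(sentence):
    """Extraction step: return a DEATH fact for one sentence, or None."""
    match = DEATH_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return {
        "fact": "DEATH",
        "victim": match.group("victim"),
        "disease": match.group("disease"),
    }


def build_outbreak_template(fact, coreferent_mentions, country=None):
    """Template generation step: merge a fact with mentions that a
    coreference resolver (not modelled here) has linked to the victim."""
    template = {
        "event": "Outbreak",
        "victim": [fact["victim"]] + list(coreferent_mentions),
        "disease": fact["disease"],
    }
    if country is not None:
        template["country"] = country
    return template


fact = extract_death_fact("3 chickens died from avian flu.")
template = build_outbreak_template(fact, ["the birds"], country="Canada")
```

Keeping extraction and template generation as separate functions mirrors the slide's pipeline: the fact is local to one sentence, while the event template collects information from several sentences.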

  7. New Approach: Role-Identifying Nouns Lexically role-identifying nouns are defined by the role that the noun plays in an event. kidnapper, arsonist, assassin → agent (perpetrator); casualty, fatality, victim → theme (victim). Semantically role-identifying nouns strongly evoke one event role in a domain based on semantics. Disease Reports: toddler, girl, boy → victim; Crime Reports: restaurant, store, hotel → location. (Intuition from Grice’s Maxim of Relevance)

  8. Bootstrapped Learning of Role-Identifying Nouns [Diagram labels: Unannotated Texts; Best Extraction Patterns (Ex: <subj> was arrested; killed by <np>); Best Extractions (Nouns). Noun examples shown: murderer, sniper, criminal; assassin, arsonist, kidnapper]

  9. Role-Identifying Expressions • Typically, a verb refers to an event and the verb’s arguments identify the role players: <subject> was kidnapped by <np> in <np> (victim, perpetrator, location) • But sometimes, a verb identifies a role player in an event without identifying the event! <subject> participated; <subject> was implicated (perpetrator)

  10. Bootstrapped Learning of Role-Identifying Expressions [Diagram labels: STEP 1; STEP 2; event; relevant event nouns; AutoSlog; Basilisk; event extraction patterns; event nouns; Candidate RIE Pattern Generator; candidate RIE patterns]

  11. Learning to Extract Perpetrators [Phillips & Riloff, RANLP-07] Role-Identifying Nouns: assailants, attackers, cell, culprits, extremists, hitmen, kidnappers, militiamen, MRTA, narco-terrorists, sniper. Event-Specific Patterns: was kidnapped by <np>; was killed by <np>. Role-Identifying Patterns: EVENT was perpetrated by <np>; <subject> was involved in EVENT

  12. Decoupling Relevant Region Identification and Extraction Local pattern matching has two drawbacks: • Facts can be missed if they do not occur with the event description. • False hits can be generated from irrelevant contexts. …the explosion ripped through the busy neighborhood in New Delhi. A bomb was found under a parked car… • Solution: • Identify relevant text regions. • Apply general, but semantically appropriate patterns
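The two-step solution on slide 12 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' system: the keyword set is a hypothetical stand-in for a trained relevant-region classifier, the one-sentence window is an arbitrary choice, and the "was found" regular expression is a hypothetical example of a general pattern that is only trustworthy inside a relevant region.

```python
import re

# Hypothetical event cues; a trained classifier would replace this set.
EVENT_KEYWORDS = {"explosion", "exploded", "bombing", "attack"}

# Hypothetical general pattern: "a/an <noun> was found".
WEAPON_PATTERN = re.compile(r"\b[Aa]n? (?P<weapon>\w+) was found\b")


def is_relevant(sentence):
    """Step 1 stand-in: a sentence is relevant if it contains an event cue."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(tokens & EVENT_KEYWORDS)


def relevant_region(sentences, window=1):
    """Indices of relevant sentences plus their neighbours within `window`."""
    region = set()
    for i, sentence in enumerate(sentences):
        if is_relevant(sentence):
            lo = max(0, i - window)
            hi = min(len(sentences), i + window + 1)
            region.update(range(lo, hi))
    return region


def extract_weapons(sentences):
    """Step 2: apply the general pattern only inside the relevant region."""
    region = relevant_region(sentences)
    return [
        match.group("weapon")
        for i in sorted(region)
        for match in WEAPON_PATTERN.finditer(sentences[i])
    ]


story = [
    "The explosion ripped through the busy neighborhood in New Delhi.",
    "A bomb was found under a parked car.",
]
unrelated = ["A wallet was found under a parked car."]
```

On `story`, the second sentence has no event cue of its own but falls inside the region opened by the first, so the pattern fires. On `unrelated`, the same pattern is never applied, which avoids the false hit.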

  13. IE Pattern Learning with Relevant Regions and Semantic Affinity [Patwardhan & Riloff, EMNLP-07] [Diagram labels: relevant & irrelevant texts; Self-training SVM Classifier; Relevant Region Classifier; Relevant Sentences; Semantic Affinity Pattern Learner; IE Patterns; IE System; Extractions]

  14. Learned Extraction Patterns

  15. CERATOPS Text Extraction and Data Visualization for Animal Health Surveillance • Collaborative project between CERATOPS, PURVAC, and the Veterinary Information Network (VIN), with funding from LLNL. • Goal: proof-of-concept of an end-to-end NLP-based visual analytics system for unstructured text.

  16. Animal Health Surveillance Monitoring animal health is important to DHS’ mission: • 73% of emerging infectious diseases are zoonotic in origin. • Pets can provide early warning signs of disease outbreaks and exposures to toxic substances. • Adverse pet reactions can be early indicators of food chain contamination.

  17. The Veterinary Information Network • VIN is the largest on-line community, information resource, and on-line continuing education source for veterinarians. Over half of all veterinarians in the U.S. use VIN! • VIN hosts message boards where veterinarians discuss what they are seeing in their practices. 15 years of message board data have been archived! • VIN built a database of semantic information associated with pet health to support search. • Paul Pion, DVM, President and co-founder of VIN, served as our consultant.

  18. CERATOPS NLP-based Visual Analytics [Diagram labels: NLP; fact, fact, fact…]

  19. Prototype System for We produced a prototype IE system to extract and visualize diseases, victims, dates, and locations from ProMed-mail disease outbreak reports. • Used the VIN database (248,108 entries) to create 3 new dictionaries for text analysis: • syntactic and semantic lexicon • phrasal lexicon • synonym dictionary • Enhanced the template generation process to use new types of semantic information. • Converted our IE templates into a format appropriate for Purdue’s visualization system.

  20. ProMed-mail Visualization Output

  21. NLP-based Visual Analytics for Animal Health Surveillance Future Goals: • Rapid identification of new disease outbreaks. • Trends or spikes in disease outbreaks. • Unusual symptoms or clusters of symptoms. • Statistical associations between foods & adverse pet reactions. • Improved diagnostic tools to associate symptoms with diseases and external events.

  22. CERATOPS Semantic Class Learning from the Web [Kozareva, Riloff, & Hovy, ACL-08] • Goal: automatically create semantic dictionaries • Use a doubly-anchored hyponym pattern: <class name> such as <class member> and * • Construct pattern linkage graphs to capture the popularity and productivity of candidate terms and rank them. • Produces very accurate results with truly minimal supervision (class name and one seed)
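A rough Python sketch of the doubly-anchored pattern and a linkage graph follows. It is a simplification under stated assumptions: `snippets` is a hypothetical list of strings standing in for web search results, the wildcard is restricted to capitalized word sequences, and candidates are ranked by in-degree only as a simple proxy for popularity. The ranking measures used in the cited paper may differ.

```python
import re
from collections import Counter


def doubly_anchored_pattern(class_name, member):
    """Regex for '<class name> such as <class member> and *'.
    Assumption: the wildcard is one or more capitalized words."""
    return re.compile(
        re.escape(class_name)
        + r" such as "
        + re.escape(member)
        + r" and (?P<new>[A-Z][\w-]*(?: [A-Z][\w-]*)*)"
    )


def harvest(class_name, seed, snippets, rounds=2):
    """Build linkage-graph edges (anchor member -> newly found member)."""
    edges = Counter()
    seen = {seed}
    frontier = [seed]
    for _ in range(rounds):
        next_frontier = []
        for anchor in frontier:
            pattern = doubly_anchored_pattern(class_name, anchor)
            for snippet in snippets:
                for match in pattern.finditer(snippet):
                    found = match.group("new")
                    edges[(anchor, found)] += 1
                    if found not in seen:
                        seen.add(found)
                        next_frontier.append(found)
        frontier = next_frontier
    return edges


def rank_by_popularity(edges):
    """Rank candidates by weighted in-degree in the linkage graph."""
    in_degree = Counter()
    for (_, found), count in edges.items():
        in_degree[found] += count
    return in_degree.most_common()


snippets = [
    "countries such as France and Germany signed the treaty.",
    "countries such as Germany and Italy agreed.",
    "countries such as France and Italy objected.",
    "fruits such as France and Apple",
]
ranking = rank_by_popularity(harvest("countries", "France", snippets))
```

Anchoring on both the class name and a known member is what keeps the last snippet out: it contains the seed but not the class name, so the pattern never matches it.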

  23. Semantic Class Learning Results

  24. Coreference Resolution • Links entities, events, and opinions within and across documents [Figure labels: Chain1, Chain2, Chain3, Chain4; U.S. State Dept. President Bush NIH Inspector General]

  25. Build on Prior Work in NP Coreference Resolution • Classification • given a description of two noun phrases, NPi and NPj, classify the pair as coreferent or not coreferent • Clustering • coordinates pairwise coreference decisions E.g., Ng & Cardie, ACL [2002] Example: [Queen Elizabeth] set about transforming [her] [husband], [King George VI], … [Diagram: pairwise "coref?" decisions among Queen Elizabeth, her, husband, and King George VI feed a Clustering Algorithm]
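The classify-then-cluster scheme on slide 25 can be sketched in Python. This is a minimal illustration under assumptions: the pairwise classifier is replaced by a hand-written table of decisions, and the clustering step simply takes the transitive closure of positive decisions with union-find. That is one of the simplest ways to coordinate pairwise decisions; the cited work may use a different clustering strategy.

```python
from itertools import combinations


def cluster_mentions(mentions, is_coreferent):
    """Merge mentions into chains using pairwise coreference decisions."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the chain representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())


# Hand-written stand-in for a trained pairwise classifier (assumption).
POSITIVE_PAIRS = {
    ("Queen Elizabeth", "her"),
    ("husband", "King George VI"),
}


def toy_classifier(np_i, np_j):
    return (np_i, np_j) in POSITIVE_PAIRS


mentions = ["Queen Elizabeth", "her", "husband", "King George VI"]
chains = cluster_mentions(mentions, toy_classifier)
```

Transitive closure means a single wrong positive decision can merge two chains, which is one reason the clustering step matters as much as the classifier.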

  26. Partially Supervised Clustering for Source Coreference Resolution [Stoyanov & Cardie, EMNLP 2006] Labels for non-source NPs are unavailable. Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given.

  27. State-of-the-Art Coreference Resolution • Cornell, Utah, & LLNL are collaboratively building a state-of-the-art coreference resolver based on the best features identified in prior work. • We plan to make the system publicly available. • On-going work and future plans include: • systematic evaluations of coreference subproblems • incorporating external knowledge about entities • non-anaphoric NP identification • unsupervised, automatic training • topic coreference for opinion analysis

  28. THE END

  29. Overview • Text analysis to support a broad range of knowledge discovery tasks • Automatic annotators that assign semantic and conceptual labels to words, phrases, and documents • Automatically extracting, summarizing and tracking information about events and opinions

  30. [Diagram labels: NLP; Images; repeated "fact, fact, fact…" outputs]

  31. headlines

  32. Event-oriented IE Goal of IE system: extract facts associated with events from unstructured text. Document Text: One person was killed when a small bomb exploded at a police station in Basra town in Iraq's politically volatile southern region on Wednesday, residents said. The bomb was the first to hit an urban area since the riots in the southern city on May 16. The terrorist group Al Qaeda claimed responsibility for the attack, and… Event: Weapon: a small bomb; Location: Basra; Victim: One person; Perpetrator Org: Al Qaeda; Physical Target: police station

  33. Full

  34. Information Extraction for Events Extracting facts and entity relations associated with events of interest. Terrorist incidents: perpetrators, victims, physical targets, weapons, date, location. Disease outbreaks: disease, organisms, victim, symptoms, location, country, date, containment measures. Keywords and named entity recognition are not sufficient: "Troops were vaccinated against anthrax, cholera, …" "Researchers have discovered how anthrax toxin destroys cells and rapidly causes death ..."

  35. Fact Extraction: Example After a brief lull, the avian flu is on the march again through Fraser Valley poultry farms. The Canadian Food Inspection Agency says ongoing surveillance efforts have led to the detection of bird flu on 36 commercial premises. The agency says it is continuing depopulation efforts on infected farms on a priority basis. OUTBREAK Disease: avian flu / bird flu; Victims: poultry / 36 commercial premises; Location: Fraser Valley poultry farms; Country: Canada; Status: confirmed; Containment: depopulation

  36. Text: 3 chickens died from avian flu. Syntactic Analysis: 3 chickens died from avian flu. (SUBJ VP PP) Semantic Extraction: Fact: DEATH; Victim: 3 chickens; Disease: avian flu. Coreference Resolution: 3 chickens died from avian flu. The birds were found in Canada. Relation and Event Analysis: Event: Outbreak; Victim: 3 chickens / the birds; Disease: bird flu; Country: Canada

  37. Extraction of Opinions

  38. IE Pattern Bootstrapping Process [Diagram labels: keywords; Pattern Learner; Patterns & Statistics; Relevant Region Identifier; Pattern Learner; (More) Patterns & Statistics]

  39. Semantic Dictionary Bootstrapping [Diagram labels: Unannotated Texts; Ex: anthrax, ebola, cholera, flu, plague; Best Extraction Pattern(s), Ex: outbreak of <NP>; Extractions (Nouns), Ex: smallpox, tularemia, botulism]
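The bootstrapping loop on slide 39 can be sketched in Python. Several things here are assumptions, not taken from the slides: the input is a hypothetical list of (pattern, noun) pairs already harvested from unannotated text, patterns are scored with an RlogF-style measure (hits / total × log2(hits)), and candidate nouns are ranked by how many selected patterns extract them. The actual Basilisk scoring is more elaborate.

```python
import math
from collections import defaultdict


def bootstrap_lexicon(extractions, seeds, iterations=2,
                      patterns_per_iter=1, words_per_iter=2):
    """Grow a semantic lexicon from seed words and (pattern, noun) pairs."""
    by_pattern = defaultdict(set)
    for pattern, noun in extractions:
        by_pattern[pattern].add(noun)

    lexicon = set(seeds)
    for _ in range(iterations):
        # Score patterns by how reliably they extract known lexicon members.
        scores = {}
        for pattern, nouns in by_pattern.items():
            hits = len(nouns & lexicon)
            if hits > 0:
                scores[pattern] = (hits / len(nouns)) * math.log2(hits)
        best = sorted(scores, key=lambda p: (-scores[p], p))[:patterns_per_iter]

        # Collect nouns extracted by the best patterns that are still unknown.
        votes = defaultdict(int)
        for pattern in best:
            for noun in by_pattern[pattern] - lexicon:
                votes[noun] += 1
        new_words = sorted(votes, key=lambda n: (-votes[n], n))[:words_per_iter]
        if not new_words:
            break
        lexicon.update(new_words)
    return lexicon


extractions = [
    ("outbreak of <NP>", "anthrax"),
    ("outbreak of <NP>", "cholera"),
    ("outbreak of <NP>", "smallpox"),
    ("outbreak of <NP>", "tularemia"),
    ("visited <NP>", "cholera"),
    ("visited <NP>", "london"),
    ("visited <NP>", "paris"),
    ("visited <NP>", "rome"),
]
diseases = bootstrap_lexicon(extractions, seeds={"anthrax", "cholera"})
```

In this toy run "outbreak of <NP>" outranks "visited <NP>" because half of its extractions are already known diseases, so its remaining extractions are added to the lexicon while the place names are not.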

  40. Semantic Learning Case Study • Input to Basilisk: 10 common disease names • Of the top 200 words hypothesized to be diseases: 89 were already in the UMLS metathesaurus (32,000 names of diseases and organisms), but 111 were not! Including: adenomatosis, tularaemia, tularamia, diarrhoea, diphtheriae, enterovirus-71, fibropapillomas, gastroeneteritis, h5n1, h7n3, ev71, yf, jyf, nvcjd, pepmv, wsmv, flu, kawasaki, mad-cow-disease, smut, pertussis, pleuro-pneumonia, polioencephalomyelitis, poliovirus

  41. Learning Subjective Phrases • Using Information Extraction techniques

  42. Extractions • expressed <dobj>: condolences, hope, grief, views, worries • indicative of <np>: compromise, desire, thinking • inject <dobj>: vitality, hatred • reaffirmed <dobj>: resolve, position, commitment • voiced <dobj>: outrage, support, skepticism, opposition, gratitude, indignation • show of <np>: support, strength, goodwill, solidarity • <subj> was shared: anxiety, view, niceties, feeling

  43. Subjective Expressions as IE Patterns
      PATTERN                   FREQ   P(Subj | Pattern)
      <subj> asked              128    0.63
      <subj> was asked          11     1.00
      <subj> was expected       45     0.42
      was expected from <np>    5      1.00
      <subj> put                187    0.67
      <subj> put end            10     0.90
      <subj> talk               28     0.71
      talk of <np>              10     0.90
      <subj> is talk            5      1.00
      <subj> is fact            38     1.00
      fact is <dobj>            12     1.00
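The two statistics on slide 43, FREQ and P(Subj | Pattern), can be computed from labeled pattern occurrences as in the Python sketch below. The input format, the toy observations, and the selection thresholds (`min_freq`, `min_prob`) are hypothetical choices for illustration. The slides do not state what thresholds, if any, were used.

```python
from collections import Counter


def pattern_subjectivity(observations):
    """observations: iterable of (pattern, is_subjective) pairs, one per
    occurrence of a pattern in a sentence labeled subjective or not.
    Returns {pattern: (freq, P(subjective | pattern))}."""
    freq = Counter()
    subjective = Counter()
    for pattern, is_subjective in observations:
        freq[pattern] += 1
        if is_subjective:
            subjective[pattern] += 1
    return {p: (freq[p], subjective[p] / freq[p]) for p in freq}


def select_subjective_patterns(stats, min_freq=5, min_prob=0.9):
    """Keep patterns that are both frequent enough and reliably subjective."""
    return sorted(
        p for p, (f, prob) in stats.items()
        if f >= min_freq and prob >= min_prob
    )


# Toy observations, not the counts from the slide.
observations = (
    [("<subj> was asked", True)] * 5
    + [("<subj> asked", True)] * 3
    + [("<subj> asked", False)] * 2
)
stats = pattern_subjectivity(observations)
selected = select_subjective_patterns(stats)
```

The contrast in the toy data follows the table: the passive form is a stronger subjectivity cue than the active form, so only the passive pattern passes a high probability threshold.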

  44. Conclusions Rapidly re-trainable, robust components for • Information extraction of facts and entities • Extraction of opinions • Tracking, linking, and summarizing events and opinions and their progressions over time

  45. Current Work: Topics • Topic coreference resolution • Treat as an NP coreference resolution task • Modify our existing NP coref approach • Initial results look promising
