
Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction

Kiyoshi Sudo, Ph.D. Research Proposal, New York University. Committee: Ralph Grishman, Satoshi Sekine, I. Dan Melamed.


Presentation Transcript


  1. Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction • Kiyoshi Sudo • Ph.D. Research Proposal, New York University • Committee: Ralph Grishman, Satoshi Sekine, I. Dan Melamed

  2. Outline • Introduction • Research Proposal • Problem Setting • Approach • Application to Information Extraction • Discussion Kiyoshi Sudo Thesis Proposal Presentation

  3. MUC Scenario Template Task • MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.

  4. MUC Scenario Template Task • Template fillers: Monday; Masked gunmen; six people; Kalashnikov rifles; a Christian school; three • MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.
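The filled template on slide 4 can be pictured as a record whose slots hold text spans copied from the article. The slot names below (`date`, `perpetrator`, `instrument`, `target`, `killed`, `wounded`) are hypothetical: the slide highlights the fillers but does not name the slots. This is a sketch of the idea, not the official MUC-3 template definition.

```python
from dataclasses import asdict, dataclass


# Hypothetical slot names: the slide highlights the fillers but does not name the slots.
@dataclass
class AttackTemplate:
    date: str
    perpetrator: str
    instrument: str
    target: str
    killed: str
    wounded: str


SENTENCE = (
    "MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through "
    "the front gates of a Christian school Monday, killing six people and wounding "
    "three in the latest attack against Western interests since Pakistan joined the "
    "war against terrorism."
)

# The fillers highlighted on the slide, one per hypothetical slot.
FILLED = AttackTemplate(
    date="Monday",
    perpetrator="Masked gunmen",
    instrument="Kalashnikov rifles",
    target="a Christian school",
    killed="six people",
    wounded="three",
)


def fillers_are_spans(template: AttackTemplate, text: str) -> bool:
    """True if every slot filler is copied verbatim from the source text."""
    return all(value in text for value in asdict(template).values())


print(fillers_are_spans(FILLED, SENTENCE))  # True
```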

  5. High Cost for Acquiring a Knowledge Base • Find extraction patterns • Find relevant documents • Find relevant events • Analyze sentences • Find domain-specific lexicon • Find existing KB (e.g. thesaurus, gazetteers)

  6. Prior Work: Automatic Knowledge Acquisition (Lexical Acquisition, Pattern Acquisition) • Mutual Bootstrapping (Riloff and Jones 1999) • Pattern Discovery with Document Re-ranking (Yangarber et al. 2000) • Simultaneous Multi-Semantic Class (Thelen and Riloff 2002; Yangarber et al. 2002) • Pattern Acquisition for QA (Ravichandran and Hovy 2002)

  7. Challenge (MUC-3: Terrorism Event) [Diagram components: User, Seed Lexicon, Seed Pattern, Expanded Lexicon, Expanded Pattern Set, Knowledge Base]

  8. Meeting the Challenge [Diagram components: Scenario Description, Semantic Clustering, Semantic Cluster; User, Seed Lexicon, Seed Pattern, Expanded Lexicon, Expanded Pattern Set, Knowledge Base]

  9. Semantic Clustering [Diagram components: Scenario Description, Semantic Cluster, Semantic Lexicon, Extraction Patterns] • Input: a description specific enough to define the scenario (terrorism, bombing, kidnapping), e.g. “Tell me about the terrorism action, such as bombing and kidnapping.” • Goal: find scenario-specific semantic clusters, each of which consists of • a Semantic Lexicon • Extraction Patterns

  10. Benefit for User [Diagram components: Scenario Description, Semantic Clustering, Semantic Cluster] • Simplify Domain Analysis • Low-cost Knowledge-base Acquisition for IE systems

  11. Extraction Patterns • Definition: a pattern is a context c, where c unifies with the context that is defined by semantic class L (cf. Sudo et al. 2001) • Example contexts for x in “x bombs himself”: • Sequential: (x, bombs, himself) • Case Frame: (bomb (v), x (subj), himself (obj)) • Dependency: bomb with dependents x (V:subj) and himself (V:obj)
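The three context types on slide 11 can be sketched as follows, together with a simple unification of a case-frame pattern against a clause. This is a minimal illustration under stated assumptions: the clause arrives already parsed (a real system would use a dependency parser), and the names `SLOT`, `CLAUSE`, `match`, and the `semantic_class` check are hypothetical, not the proposal's implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

SLOT = "<x>"  # hypothetical marker for the argument a pattern extracts

# Hypothetical pre-parsed clause for "x bombs himself"; a real system would
# get the verb lemma and grammatical roles from a dependency parser.
CLAUSE = {"subj": "the attacker", "verb": "bomb", "verb_form": "bombs", "obj": "himself"}


def sequential(clause: Dict[str, str]) -> Tuple[str, str, str]:
    """Sequential context: tokens in surface order."""
    return (clause["subj"], clause["verb_form"], clause["obj"])


def case_frame(clause: Dict[str, str]) -> Dict[str, str]:
    """Case-frame context: verb lemma plus arguments labelled by grammatical role."""
    return {"v": clause["verb"], "subj": clause["subj"], "obj": clause["obj"]}


def dependency(clause: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Dependency context: (head, label, dependent) edges rooted at the verb."""
    return [(clause["verb"], "V:subj", clause["subj"]),
            (clause["verb"], "V:obj", clause["obj"])]


def match(pattern: Dict[str, str], context: Dict[str, str],
          semantic_class: Optional[Set[str]] = None) -> Optional[str]:
    """Unify a case-frame pattern with a case-frame context.

    Non-slot fields must be equal; the SLOT field binds to the context value,
    which is returned as the extracted item. If semantic_class is given, the
    extracted item must also belong to it.
    """
    if pattern.keys() != context.keys():
        return None
    filler: Optional[str] = None
    for role, value in pattern.items():
        if value == SLOT:
            filler = context[role]
        elif value != context[role]:
            return None
    if filler is not None and semantic_class is not None and filler not in semantic_class:
        return None
    return filler


pattern = {"v": "bomb", "subj": SLOT, "obj": "himself"}
print(sequential(CLAUSE))                  # ('the attacker', 'bombs', 'himself')
print(match(pattern, case_frame(CLAUSE)))  # the attacker
```

Matching on the case frame rather than the surface sequence lets one pattern cover inflected forms ("bombs", "bombed"), since only the lemma is compared.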

  12. Outline • Introduction • Research Proposal • Problem Setting • Approach • Information Extraction • Evaluation

  13. Overview: Semantic Clustering [Diagram components: Source, Scenario Description, Information Retrieval, Bootstrapping, Query Expansion, Semantic Cluster]


  15. Information Retrieval • Get the relevant document set R • Get lists of lexical items and extraction patterns ordered by relevance to the scenario • TF/IDF scoring
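Slide 15 names TF/IDF scoring but not the exact weighting. The sketch below assumes one common variant: an item's frequency inside the retrieved set R multiplied by log(N / df), with document frequency df counted over the whole collection of N documents. Items can be words or pattern strings; the toy documents are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def tfidf_rank(relevant: Sequence[List[str]],
               collection: Sequence[List[str]]) -> List[Tuple[str, float]]:
    """Rank items (words or pattern strings) by tf(R) * log(N / df).

    relevant   -- documents in the retrieved set R (assumed to be part of collection)
    collection -- all N documents, each given as a list of items
    """
    n_docs = len(collection)
    df: Counter = Counter()
    for doc in collection:
        df.update(set(doc))  # count each item once per document
    tf: Counter = Counter()
    for doc in relevant:
        tf.update(doc)       # raw frequency inside R
    scores = {item: count * math.log(n_docs / df[item])
              for item, count in tf.items() if df[item] > 0}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented toy collection; documents 0 and 3 play the role of R.
docs = [
    ["succeed", "president", "said"],
    ["said", "market"],
    ["said", "stock", "president"],
    ["name", "president", "said"],
    ["market", "stock"],
]
ranked = tfidf_rank([docs[0], docs[3]], docs)
print([item for item, _ in ranked])  # ['name', 'succeed', 'president', 'said']
```

The IDF factor pushes down items such as "said" that occur throughout the collection, even though they are frequent in R.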

  16. Example of TF/IDF Scoring (Management Succession: Business) • 300 documents retrieved from WSJ (7/94 - 8/94) • Extracted by MINIPAR (Lin 1998)

  17. Overview: Semantic Clustering [Diagram components: Source, Scenario Description, Information Retrieval, extraction patterns, lexicon, Bootstrapping, Query Expansion, Semantic Cluster]

  18. Bootstrapping • Assumption: • Patterns provide Lexical Classes. • Lexicon provides contextual information. • Find one cluster that consists of a Lexicon and Extraction Patterns (Riloff and Jones 1999; Agichtein and Gravano 2000)

  19. Bootstrapping (Cont.) • Algorithm (cf. Riloff and Jones 1999) • Given: • the ordered list of terms • the ordered list of extraction patterns • Lexicon = (), Pattern = () • w ← the most relevant term in the list; add it to Lexicon • p ← the most relevant pattern among those that extract w • Add p to Pattern • w ← the most relevant term among those that are extracted by p • Add w to Lexicon • Go to 1
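The alternating loop on slide 19 can be sketched as below. The slide leaves some details open, so this version makes assumptions that are not from the proposal: items already chosen are skipped, the loop jumps back to the pattern-selection step, and it stops when no new pattern or term is available or after `max_iter` rounds. The `extracts` mapping and the toy terms and patterns are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def bootstrap(terms: List[str], patterns: List[str],
              extracts: Dict[str, Set[str]],
              max_iter: int = 10) -> Tuple[List[str], List[str]]:
    """Alternate between terms and patterns, in the spirit of Riloff and Jones (1999).

    terms, patterns -- ordered by relevance, most relevant first
    extracts[p]     -- terms that pattern p extracts (hypothetical input format)
    """
    lexicon: List[str] = [terms[0]]  # seed: the most relevant term
    chosen: List[str] = []
    w = terms[0]
    for _ in range(max_iter):
        # Most relevant not-yet-chosen pattern that extracts w.
        p = next((q for q in patterns
                  if q not in chosen and w in extracts.get(q, set())), None)
        if p is None:
            break
        chosen.append(p)
        # Most relevant new term extracted by p.
        nxt = next((t for t in terms
                    if t not in lexicon and t in extracts[p]), None)
        if nxt is None:
            break
        w = nxt
        lexicon.append(w)
    return lexicon, chosen


# Hypothetical relevance-ordered lists for a management-succession scenario.
TERMS = ["president", "stock", "chairman", "executive"]
PATTERNS = ["(x, rise)", "(company, appoint, x)", "(x, resign)"]
EXTRACTS = {
    "(x, rise)": {"stock"},
    "(company, appoint, x)": {"president", "chairman"},
    "(x, resign)": {"chairman", "executive"},
}

lexicon, patterns_found = bootstrap(TERMS, PATTERNS, EXTRACTS)
print(lexicon)         # ['president', 'chairman', 'executive']
print(patterns_found)  # ['(company, appoint, x)', '(x, resign)']
```

Although "stock" ranks second by relevance, it never enters the lexicon because no chosen pattern extracts it. This is how the mutual constraint keeps a single cluster coherent.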

  20. Example of Bootstrapping (Management Succession: Business) • From WSJ (7/94 - 8/94) • Extracted by MINIPAR (Lin 1998)


  22. Problem: Polysemous Lexicon and Patterns • Lexicon can be ambiguous • e.g. Clinton (Person, Organization, Location …) • Extraction patterns can be ambiguous • e.g. be killed in <x> (x: Location, Date …) • Needs more study • More restrictions? • A probabilistic model?

  23. Overview: Semantic Clustering [Diagram components: Source, Scenario Description, Information Retrieval, Bootstrapping, Query Expansion, Semantic Cluster; edge labels: lexicon (lex), pattern (pt)]

  24. Query Expansion • Generalize terms in a query with a newly discovered cluster • cf. Rocchio 1971 (vector model); Zhai and Lafferty 2001 (language modeling)
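Slide 24 cites Rocchio (1971) for vector-model query expansion. Below is a sketch of the standard Rocchio update, q' = α·q + β·centroid(relevant) − γ·centroid(non-relevant), with negative weights dropped. Two things here are assumptions rather than details from the proposal: the newly discovered cluster is treated as a single pseudo-relevant vector, and the defaults α = 1.0, β = 0.75, γ = 0.15 are common textbook values. The cluster terms "bomb" and "explode" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def centroid(vectors: List[Vector]) -> Vector:
    """Average of sparse term-weight vectors (empty dict for an empty list)."""
    if not vectors:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant), negatives dropped."""
    pos, neg = centroid(relevant), centroid(nonrelevant)
    expanded: Vector = {}
    for term in set(query) | set(pos) | set(neg):
        weight = (alpha * query.get(term, 0.0) + beta * pos.get(term, 0.0)
                  - gamma * neg.get(term, 0.0))
        if weight > 0:
            expanded[term] = weight
    return expanded


query = {"terrorism": 1.0, "bombing": 1.0, "kidnapping": 1.0}
cluster = {"bombing": 1.0, "bomb": 1.0, "explode": 1.0}  # hypothetical cluster lexicon
expanded = rocchio(query, [cluster], [], beta=0.5)
print(sorted(expanded.items()))
# [('bomb', 0.5), ('bombing', 1.5), ('explode', 0.5), ('kidnapping', 1.0), ('terrorism', 1.0)]
```

Terms shared by the query and the cluster ("bombing") are reinforced, and cluster-only terms enter the query with a smaller weight.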


  26. Outline • Introduction • Research Proposal • Problem Setting • Approach • Application to Information Extraction • Discussion

  27. Application to Information Extraction [Diagram components: Scenario Description, Semantic Clustering, Semantic Lexicon, Extraction Patterns, Preprocessing, Entity Recognition, Event Recognition, Role Assignment, Pattern Matching, Merging]

  28. Human Intervention • Extraction patterns • Event pattern • Context contains a verb or a nominalization of a verb • Used for event extraction and role assignment • e.g. (terrorist, fire, x) • Local pattern • Context contains only enough information to recognize the semantic class • Used for entity recognition only • e.g. (x, Inc.) • Association of event patterns to roles • e.g. (company, hire, x) → PersonIn and (company, fire, x) → PersonOut
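The event/local distinction and the human-supplied role association on slide 28 can be sketched as below. The `VERBS` word list is a hypothetical stand-in for part-of-speech information, and `ROLE_OF` contains only the two associations given on the slide. The function names are illustrative.

```python
from typing import Dict, Optional, Set, Tuple

Pattern = Tuple[str, ...]

# Hypothetical word list standing in for POS tagging: verbs and nominalizations.
VERBS: Set[str] = {"fire", "hire", "bomb", "kill", "appointment", "resignation"}

# Human-supplied association of event patterns to roles (examples from the slide).
ROLE_OF: Dict[Pattern, str] = {
    ("company", "hire", "x"): "PersonIn",
    ("company", "fire", "x"): "PersonOut",
}


def is_event_pattern(pattern: Pattern, verbs: Set[str] = VERBS) -> bool:
    """Event pattern: its context contains a verb or a nominalized verb."""
    return any(token in verbs for token in pattern)


def assign_role(pattern: Pattern, filler: str) -> Optional[Tuple[str, str]]:
    """Return (role, filler) for an event pattern with a known role, else None.

    Local patterns such as ("x", "Inc.") are used for entity recognition only,
    so they never produce a role.
    """
    if not is_event_pattern(pattern):
        return None
    role = ROLE_OF.get(pattern)
    return (role, filler) if role is not None else None


print(assign_role(("company", "hire", "x"), "Jane Doe"))  # ('PersonIn', 'Jane Doe')
print(assign_role(("x", "Inc."), "Acme"))                 # None
```

Keying roles on the full pattern, rather than on the verb alone, keeps (terrorist, fire, x) from being read as a PersonOut event. This is one concrete form of the polysemy problem raised on slide 22.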

  29. Outline • Introduction • Research Proposal • Problem Setting • Approach • Application to Information Extraction • Discussion

  30. Discussion • Domain Portability • User only needs to specify the scenario • Language Portability • Language-dependent Tools • Segmentation (Lemmatization) • Dependency Parsing

  31. Evaluation • MUC-style (Scenario Template task) • Slot-based • Precision, Recall, F-measure • Domain Portability • Several predefined tasks that differ in difficulty • Language Portability • Japanese • English
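Slot-based scoring on slide 31 compares the system's filled slots against an answer key. This simplified sketch counts only exact (slot, filler) matches. The official MUC scorer's partial-credit rules are not modelled, and the slot names and fillers in the example are hypothetical.

```python
from typing import Set, Tuple

Slot = Tuple[str, str]  # (slot name, filler)


def slot_prf(response: Set[Slot], key: Set[Slot],
             beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over exactly matching (slot, filler) pairs."""
    correct = len(response & key)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f


# Hypothetical answer key and system response for the MURREE example.
KEY = {("perpetrator", "Masked gunmen"), ("instrument", "Kalashnikov rifles"),
       ("target", "a Christian school"), ("date", "Monday")}
RESPONSE = {("perpetrator", "Masked gunmen"), ("target", "a Christian school"),
            ("date", "Tuesday")}

p, r, f = slot_prf(RESPONSE, KEY)
print(tuple(round(x, 3) for x in (p, r, f)))  # (0.667, 0.5, 0.571)
```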

  32. Contribution • Tool for Domain Analysis • Low-cost Knowledge-base Acquisition • Towards Open-domain Information Extraction

  33. Conclusion • Proposed New Approach for Knowledge-base Acquisition (Semantic Clustering) • Discussed Application of Acquired KB to Information Extraction (Human Intervention and Local vs. Event patterns) • Discussed Evaluation with several predefined MUC-style tasks that differ in difficulty and across languages (Domain portability and Language portability)

  34. ToDo • Implementation • Preparation for Evaluation • Evaluation

  35. Time for Questions (Conclusion)
