Distant Supervision for Knowledge Base Population

Distant Supervision for Knowledge Base Population. Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning. Definition and Approach. We took part in TAC KBP 2010 this year (both tasks)

Presentation Transcript


  1. Distant Supervision for Knowledge Base Population Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning

  2. Definition and Approach
     • We took part in TAC KBP 2010 this year (both tasks)
     • Slot filling task: learning a pre-defined set of relations and attributes for target entities based on documents in a collection
     • “Warren Buffett began studying at the Wharton School of Finance at the University of Pennsylvania, but transferred to the University of Nebraska where he graduated.”
       • (per:schools_attended, Warren Buffett, University of Pennsylvania)
       • (per:schools_attended, Warren Buffett, University of Nebraska)
     • Distant supervision approach: generate training data automatically from Wikipedia infoboxes
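A minimal sketch of how distant supervision can turn infobox facts into training examples, assuming plain substring matching of entity names and slot values. The `KB` dictionary and the `label_sentences` helper are hypothetical names for illustration, not the system described in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical infobox-derived facts: (entity, slot) -> known slot values.
KB: Dict[Tuple[str, str], List[str]] = {
    ("Warren Buffett", "per:schools_attended"): [
        "University of Pennsylvania",
        "University of Nebraska",
    ],
}

SENTENCE = (
    "Warren Buffett began studying at the Wharton School of Finance at the "
    "University of Pennsylvania, but transferred to the University of "
    "Nebraska where he graduated."
)


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], List[str]]
) -> List[Tuple[str, str, str, str]]:
    """Emit a (slot, entity, value, sentence) example whenever a sentence
    mentions both an entity and one of its known slot values.

    Assumption: exact substring matching stands in for real mention detection.
    """
    examples = []
    for sent in sentences:
        for (entity, slot), values in kb.items():
            if entity not in sent:
                continue
            for value in values:
                if value in sent:
                    examples.append((slot, entity, value, sent))
    return examples


examples = label_sentences([SENTENCE], KB)
for slot, entity, value, _ in examples:
    print((slot, entity, value))
```

Because a sentence is labeled only from co-occurrence, some generated examples will not actually express the relation, which is the data-quality issue raised under Challenges.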

  3. Training and Evaluation
     • Training: Infobox KB → Map infobox fields to KBP slots (one-to-many mapping) → IR: find relevant sentences (query: entity name + slot value) → Extract +/- slot candidates → Train multiclass classifier
     • Evaluation: KBP query: entity name → IR: find relevant sentences (query: entity name + trigger words) → Extract slot candidates (map KBP slots to fine-grained NE labels) → Classify candidates → Inference (greedy, local) → Extracted slots
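The two retrieval queries and the final inference step can be sketched roughly as follows. This is an illustration under stated assumptions: the quoting syntax of the queries, the score threshold, and the rule that a single-valued slot keeps only its best candidate are guesses at what "greedy, local" might mean, and all function names and scores are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def training_query(entity: str, slot_value: str) -> str:
    # Training side: the slot value is already known from the infobox.
    return f'"{entity}" "{slot_value}"'


def evaluation_query(entity: str, trigger_words: List[str]) -> str:
    # Evaluation side: the value is unknown, so trigger words stand in for it.
    return f'"{entity}" ' + " ".join(trigger_words)


def greedy_local_inference(
    scored: List[Tuple[str, str, float]],
    single_valued: Set[str],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Pick slot values from classifier-scored (slot, value, score) candidates.

    Assumed rule: visit candidates from highest to lowest score, drop those
    below `threshold`, and let a single-valued slot keep only its first
    (best) value. Each slot is decided independently of the others.
    """
    best: Dict[Tuple[str, str], float] = {}
    for slot, value, score in scored:
        if score >= threshold:
            key = (slot, value)
            best[key] = max(score, best.get(key, 0.0))

    output: Dict[str, List[str]] = {}
    for (slot, value), _ in sorted(best.items(), key=lambda kv: -kv[1]):
        if slot in single_valued and slot in output:
            continue
        output.setdefault(slot, []).append(value)
    return output


# Made-up classifier scores, for illustration only.
candidates = [
    ("per:date_of_birth", "1930", 0.9),
    ("per:date_of_birth", "1931", 0.6),
    ("per:schools_attended", "University of Nebraska", 0.8),
    ("per:schools_attended", "University of Pennsylvania", 0.7),
    ("per:schools_attended", "Omaha", 0.2),
]
slots = greedy_local_inference(candidates, single_valued={"per:date_of_birth"})
print(slots)
```

A local, per-slot decision like this is cheap but cannot use constraints across slots, such as a date of death having to follow a date of birth.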

  4. Results
     • Training on 2/3 of infoboxes, evaluating on 1/3
     • Evaluating only on sentences that contain at least one valid slot
     • Results table (values not preserved in this transcript): top 10 most common slots; total for all slots

  5. Challenges
     • Improve quality of data generated through distant supervision
     • Improve IR recall
     • Use relation-specific trigger words (or n-grams, dependency paths, etc.) to boost sentences likely to contain answers to the top of the ranking
       • How to acquire these automatically?
     • Better classifiers for noisy text (e.g., web snippets)
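The trigger-word idea above can be illustrated with a small re-ranking sketch. It assumes a hand-written trigger set and a simple count-based score; `rank_sentences` and the trigger list are hypothetical, and the slides leave open how such triggers would be acquired automatically.

```python
from typing import List, Set


def rank_sentences(sentences: List[str], triggers: Set[str]) -> List[str]:
    """Order sentences by how many trigger words they contain, highest first.

    Assumption: whitespace tokenization with punctuation stripped is enough
    for illustration; ties keep their original order because sorted() is stable.
    """

    def score(sentence: str) -> int:
        tokens = (tok.strip('.,;:"()') for tok in sentence.lower().split())
        return sum(1 for tok in tokens if tok in triggers)

    return sorted(sentences, key=score, reverse=True)


# Hypothetical triggers for per:schools_attended.
SCHOOL_TRIGGERS = {"graduated", "studied", "studying", "attended", "transferred"}

ranked = rank_sentences(
    [
        "Warren Buffett lives in Omaha.",
        "Warren Buffett transferred to the University of Nebraska where he graduated.",
    ],
    SCHOOL_TRIGGERS,
)
print(ranked[0])
```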
