Distant Supervision for Knowledge Base Population

Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning

Definition and Approach
• We took part in TAC KBP 2010 this year (both tasks)
• Slot filling task: learn a predefined set of relations and attributes for target entities from the documents in a collection
• Example: “Warren Buffett began studying at the Wharton School of Finance at the University of Pennsylvania, but transferred to the University of Nebraska where he graduated.”
• (per:schools_attended, Warren Buffett, University of Pennsylvania)
• (per:schools_attended, Warren Buffett, University of Nebraska)
• Distant supervision approach: generate training data automatically from Wikipedia infoboxes
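The distant supervision idea can be illustrated with a minimal sketch. It assumes a toy in-memory knowledge base and plain substring matching. The names `INFOBOX_KB` and `label_sentences` are hypothetical, and the real system's matching and retrieval are not described on the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (entity, slot) -> known slot values.
# In the slides this information comes from Wikipedia infoboxes.
INFOBOX_KB: Dict[Tuple[str, str], List[str]] = {
    ("Warren Buffett", "per:schools_attended"): [
        "University of Pennsylvania",
        "University of Nebraska",
    ],
}


def label_sentences(sentences: List[str]) -> List[Tuple[str, str, str, str]]:
    """Return (slot, entity, value, sentence) tuples for every sentence
    that mentions both an entity and one of its known slot values."""
    examples = []
    for sentence in sentences:
        for (entity, slot), values in INFOBOX_KB.items():
            if entity not in sentence:
                continue
            for value in values:
                # Assumption: exact substring match stands in for real
                # mention detection.
                if value in sentence:
                    examples.append((slot, entity, value, sentence))
    return examples


sentences = [
    "Warren Buffett began studying at the Wharton School of Finance at the "
    "University of Pennsylvania, but transferred to the University of "
    "Nebraska where he graduated.",
    "Warren Buffett lives in Omaha.",
]
examples = label_sentences(sentences)
for slot, entity, value, _ in examples:
    print((slot, entity, value))
```

Only the first sentence yields training examples, because the second mentions the entity without any known slot value. Labels produced this way are noisy: a sentence can mention both strings without expressing the relation.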

Training
• Infobox KB
• Map infobox fields to KBP slots (one-to-many mapping)
• IR: find relevant sentences (query: entity name + slot value)
• Extract +/- slot candidates
• Train multiclass classifier

Evaluation
• KBP query: entity name
• IR: find relevant sentences (query: entity name + trigger words)
• Extract slot candidates (map KBP slots to fine-grained NE labels)
• Classify candidates
• Inference (greedy, local)
• Extracted slots
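The slide does not spell out the "Inference (greedy, local)" step, so the following is only one plausible reading. Each candidate is labeled independently with its highest-scoring class, and a single-valued slot keeps only its best-scoring candidate. The function name, the `SINGLE_VALUED` set, the `NONE` label, and all scores are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: slots that accept only one value (hypothetical selection).
SINGLE_VALUED = {"per:date_of_birth"}


def greedy_local_inference(
    candidates: List[Tuple[str, Dict[str, float]]],
    no_relation: str = "NONE",
) -> Dict[str, List[str]]:
    """Turn per-candidate classifier scores into filled slots.

    Each candidate is a (value, {label: score}) pair. Decisions are local:
    a candidate's label depends only on its own scores.
    """
    filled: Dict[str, List[str]] = {}
    best_score: Dict[str, float] = {}
    for value, scores in candidates:
        label = max(scores, key=scores.get)  # best label for this candidate
        if label == no_relation:
            continue
        score = scores[label]
        if label in SINGLE_VALUED:
            # Greedy: keep only the highest-scoring value seen so far.
            if score > best_score.get(label, float("-inf")):
                filled[label] = [value]
                best_score[label] = score
        elif value not in filled.setdefault(label, []):
            filled[label].append(value)
    return filled


# Made-up scores and values, for illustration only.
candidates = [
    ("University of Pennsylvania", {"per:schools_attended": 0.8, "NONE": 0.2}),
    ("University of Nebraska", {"per:schools_attended": 0.7, "NONE": 0.3}),
    ("date A", {"per:date_of_birth": 0.6, "NONE": 0.4}),
    ("date B", {"per:date_of_birth": 0.9, "NONE": 0.1}),
    ("Omaha", {"per:schools_attended": 0.1, "NONE": 0.9}),
]
slots = greedy_local_inference(candidates)
print(slots)
```

A local scheme like this is cheap but cannot resolve conflicts across slots, such as the same value being assigned to two incompatible slots.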

Results
• Training on 2/3 of the infoboxes, evaluating on the remaining 1/3
• Evaluating only on sentences that contain at least one valid slot
• [Results table: rows "Top 10 most common slots" and "Total for all slots"; values not preserved in the transcript]

Challenges
• Improve quality of data generated through distant supervision
• Improve IR recall
• Use relation-specific trigger words (or n-grams, dependency paths, etc.) to boost sentences likely to contain answers to the top
• How to acquire these automatically?
• Better classifiers for noisy text (e.g., web snippets)
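The trigger-word idea can be sketched as a simple re-ranking of retrieved sentences. This is a minimal illustration, assuming a hand-written trigger list and a count-based score. `TRIGGERS` and `rank_sentences` are hypothetical names, and the slides leave open how such triggers would be acquired automatically.

```python
from typing import Dict, List

# Hypothetical, hand-picked trigger words per slot.
TRIGGERS: Dict[str, List[str]] = {
    "per:schools_attended": ["graduated", "studying", "transferred"],
}


def rank_sentences(sentences: List[str], slot: str) -> List[str]:
    """Order sentences so those with more trigger words for `slot` come first."""
    triggers = TRIGGERS.get(slot, [])

    def score(sentence: str) -> int:
        lowered = sentence.lower()
        return sum(1 for trigger in triggers if trigger in lowered)

    # sorted() is stable, so sentences with equal scores keep retrieval order.
    return sorted(sentences, key=score, reverse=True)


retrieved = [
    "Warren Buffett lives in Omaha.",
    "He transferred to the University of Nebraska where he graduated.",
]
ranked = rank_sentences(retrieved, "per:schools_attended")
print(ranked[0])
```

The same scoring function could take n-grams or dependency paths instead of single words; only the matching inside `score` would change.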
