This document gives an overview of the linguistic resources used in the 2012 TAC KBP Entity Linking Evaluations. It covers the source data in English, Chinese, and Spanish, the annotator and assessor guidelines, and the labeled training and evaluation data. The annotation methodology has three tasks: namestring selection, knowledge base linking, and NIL coreference. A revised annotation GUI and pipeline improved efficiency and made it possible to select more challenging queries. The presentation, given at the TAC KBP Evaluation Workshop at NIST on November 5-6, 2012, closes with the 2012 achievements and the 2013 goals for the annotation process.
Linguistic Resources for the 2012 TAC KBP Entity Linking Evaluations
Joe Ellis (presenter), Xuansong Li, Brendan Callahan, Stephanie Strassel
Linguistic Data Consortium, University of Pennsylvania, USA
Outline
• Linguistic Resources for 2012 Entity Linking
  • English, Chinese and Spanish source data
  • Annotator and assessor guidelines
  • Labeled training and evaluation data
• Annotation Tasks and Methodologies
  • Namestring Selection
  • KB Linking
  • NIL Coreference
TAC KBP Evaluation Workshop – NIST, November 5-6, 2012
Source Corpus – 2012
KB and Guidelines
Knowledge Base Corpus Guidelines
• Annotator GUI and pipeline revised to improve efficiency and quality over previous years
• Enhanced ability to select ambiguous and varied queries
• Resulted in more challenging queries
• Available at: http://www.nist.gov/tac/2012/KBP/task_guidelines/index.html
Existing EL Training Data
New EL Training & Eval Data
Entity Linking Overview
• Stage 1: Select namestrings and reference documents
• Stage 2: Link namestrings to KB or mark as NIL
• Stage 3: Co-reference NIL entities
Entity Linking – Stage 1
• Run named entity taggers over source corpora*
  • Provides guided search through the corpus
• Namestring Selection
  • Confusable, ambiguous, varied
  • Balance NIL and non-NIL queries (target an even distribution)
  • Balance by entity type (1/3 each of GPEs, PERs, and ORGs)
  • Genre: 2/3 NW, 1/3 Web for English & Chinese; all NW for Spanish
  • For cross-lingual tasks, especially target non-English queries with entities mentioned in English documents
*Thank you to the track coordinators for providing tagger output
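The balancing targets above can be checked mechanically once a candidate query set exists. The following is a minimal sketch, not the LDC pipeline: the `Query` record, its field names, the `ENGLISH_TARGETS` table, and the sample queries are all hypothetical and chosen only to mirror the targets stated on the slide (even NIL/non-NIL split, one third per entity type, 2/3 NW and 1/3 Web for English).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical record for one selected namestring (not an LDC format)."""
    name: str
    entity_type: str  # "GPE", "PER", or "ORG"
    genre: str        # "NW" (newswire) or "WEB"
    is_nil: bool      # True if the entity has no KB node


# Targets taken from the slide for English; the dictionary layout is an assumption.
ENGLISH_TARGETS = {
    "entity_type": {"GPE": 1 / 3, "PER": 1 / 3, "ORG": 1 / 3},
    "genre": {"NW": 2 / 3, "WEB": 1 / 3},
    "is_nil": {True: 0.5, False: 0.5},
}


def shares(queries, attribute):
    """Return the fraction of queries holding each value of `attribute`."""
    counts = Counter(getattr(q, attribute) for q in queries)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def max_deviation(observed, targets):
    """Largest absolute gap between an observed share and its target share."""
    return max(abs(observed.get(value, 0.0) - goal) for value, goal in targets.items())


# Invented sample set, balanced on all three dimensions.
SAMPLE = [
    Query("Springfield", "GPE", "NW", True),
    Query("Georgia", "GPE", "WEB", False),
    Query("J. Smith", "PER", "NW", False),
    Query("Smith", "PER", "NW", True),
    Query("ABC", "ORG", "NW", False),
    Query("ABC Corp", "ORG", "WEB", True),
]

for attribute, targets in ENGLISH_TARGETS.items():
    gap = max_deviation(shares(SAMPLE, attribute), targets)
    print(f"{attribute}: max deviation from target = {gap:.3f}")
```

A check like this could run while annotators are still selecting namestrings, so that an under-represented entity type or genre is noticed before the query set is finalized.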
Entity Linking – Stages 2 & 3
• KB Linking
  • Review reference document and search KB for matching node
  • Multiple entities viewed together for quicker linking
  • Time-limited quality control pass enhanced completeness and accuracy
• NIL Coreference
  • NIL queries (no KB match) require manual co-reference annotation
  • Time-limited quality control pass enhanced completeness and accuracy
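The NIL coreference step ends with each NIL query assigned to a cluster of queries that refer to the same unlisted entity. As an illustration only, the sketch below turns pairwise "same entity" judgments into cluster labels with a small union-find. The function name, the input shape, the query IDs, and the `NIL0001`-style label format are assumptions made for this example; the slides do not describe how the annotation tool stores these decisions.

```python
def cluster_nil_queries(query_ids, same_entity_pairs):
    """Group NIL queries into clusters from pairwise coreference judgments.

    query_ids: ordered list of NIL query IDs (hypothetical IDs).
    same_entity_pairs: (id_a, id_b) pairs an annotator marked as the same entity.
    Returns a dict mapping each query ID to a cluster label.
    """
    parent = {q: q for q in query_ids}

    def find(q):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    for a, b in same_entity_pairs:
        parent[find(a)] = find(b)

    labels = {}      # representative -> cluster label
    assignment = {}  # query ID -> cluster label
    for q in query_ids:
        root = find(q)
        if root not in labels:
            # Label format is illustrative, numbered in order of first appearance.
            labels[root] = f"NIL{len(labels) + 1:04d}"
        assignment[q] = labels[root]
    return assignment


# Invented example: Q1, Q3 and Q4 were judged to be the same entity; Q2 stands alone.
clusters = cluster_nil_queries(
    ["Q1", "Q2", "Q3", "Q4"],
    [("Q1", "Q3"), ("Q3", "Q4")],
)
print(clusters)
```

Because the judgments are merged transitively, an annotator only needs to link each new NIL query to one existing member of a cluster rather than to all of them.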
Conclusions
• 2012 Achievements
  • Source corpus expansion
  • 5 new EL corpora developed (1 fewer than in 2009-2011 combined)
  • New annotation pipeline/GUI supports creation of more challenging queries in less time
• 2013 Goals
  • Further enhance annotation GUI and pipeline; address lingering inefficiencies and bugs
  • Further discussion of desired query qualities to fully utilize new capabilities