Overview of the KBP 2013 Slot Filler Validation Track. Hoa Trang Dang National Institute of Standards and Technology. Slot Filler Validation (SFV). Track Goals Allow teams without a full slot-filling system to participate, focus on answer validation rather than document retrieval
 
                
Overview of the KBP 2013 Slot Filler Validation Track Hoa Trang Dang National Institute of Standards and Technology
Slot Filler Validation (SFV) • Track Goals • Allow teams without a full slot-filling system to participate and focus on answer validation rather than document retrieval • Evaluate the contribution of RTE systems to KBP slot filling • Allow teams to experiment with system voting and global • SFV input: • Candidate slot filler • Possibly additional information about candidate slot fillers • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Can only improve precision, not recall, of full slot-filling systems • Evaluation metric depends on SFV use case and availability of additional information about candidate fillers • TAC RTE KBP Validation task (2011) • TAC KBP Slot Filler Validation task (2012)
TAC RTE KBP Validation task (2011) • Each slot filler returned by SF systems • 1 RTE evaluation pair, where: • T is the entire document supporting the slot filler • H is a set of synonymous sentences, representing different realizations of the slot filler
Use Case 1: SFV as Textual Entailment (2011) • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance) • Local Approach: • Generic textual entailment: H is the relation implied by the candidate slot filler (e.g., “Barack Obama has lived in Chicago”), T is the provenance (entire document, or smaller regions defined by justification offsets) • Tailored textual entailment: train on different slot types; could be a validation module for a full slot filling system • Evaluation: • F score on entire pool of candidate slot fillers (unique slot filler, provenance) • Baseline: All T’s classified as entailing the corresponding H: P=R=percentage of entailing pairs in the pooled SF responses • Weak baseline, easily beaten by all SFV systems; not a direct measure of the utility of SFV to SF
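The pooled F score used in this use case can be sketched as follows. This is a minimal illustration, not the official TAC scorer: the `Pair` key, the function name `precision_recall_f1`, and the toy data are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical key for one pooled candidate: (slot filler, provenance).
Pair = Tuple[str, str]


def precision_recall_f1(
    gold: Dict[Pair, bool], predicted: Dict[Pair, bool]
) -> Tuple[float, float, float]:
    """Score binary entailment decisions against gold labels over the pool."""
    true_pos = sum(1 for pair, label in predicted.items() if label and gold.get(pair, False))
    predicted_pos = sum(1 for label in predicted.values() if label)
    gold_pos = sum(1 for label in gold.values() if label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy pool (made-up data): two of four pairs are truly entailing.
gold = {
    ("Chicago", "doc1"): True,
    ("Honolulu", "doc2"): True,
    ("Boston", "doc3"): False,
    ("Paris", "doc4"): False,
}
# A hypothetical SFV system accepts one correct and one incorrect pair.
predicted = {
    ("Chicago", "doc1"): True,
    ("Honolulu", "doc2"): False,
    ("Boston", "doc3"): True,
    ("Paris", "doc4"): False,
}
p, r, f1 = precision_recall_f1(gold, predicted)
```

Because every pooled pair is scored the same way regardless of which SF run produced it, a number like this says little about how much a validator would help any particular run.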
Use Case 2: SFV impact on single SF systems • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • Global Approach: • System Voting, leveraging features across multiple SF runs • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
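The filter-then-rescore evaluation can be sketched as below. It is a simplified stand-in for the regular English SF scorer (which also handles details such as redundant fillers); the `Filler` key, the function names, and the toy run are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical key for one response: (query_id, slot, filler).
Filler = Tuple[str, str, str]


def score_run(run: List[Filler], correct: Set[Filler], num_gold: int) -> Tuple[float, float, float]:
    """Simplified SF scoring: precision, recall and F1 of one run."""
    num_correct = sum(1 for f in run if f in correct)
    precision = num_correct / len(run) if run else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def apply_sfv(run: List[Filler], sfv_labels: Dict[Filler, bool]) -> List[Filler]:
    """Drop fillers the validator labeled Incorrect; unlabeled fillers are kept."""
    return [f for f in run if sfv_labels.get(f, True)]


# Toy run (made-up data): 4 responses, 2 correct, 4 correct fillers in the key.
run = [
    ("Q1", "per:spouse", "A"),
    ("Q1", "per:age", "50"),
    ("Q2", "org:founded", "1990"),
    ("Q2", "org:website", "x.org"),
]
correct = {run[0], run[2]}
sfv_labels = {run[1]: False, run[3]: False}  # validator rejects the two wrong fillers

before = score_run(run, correct, num_gold=4)
after = score_run(apply_sfv(run, sfv_labels), correct, num_gold=4)
```

In this toy case precision rises from 0.5 to 1.0 while recall stays at 0.5, which illustrates why filtering can only help precision: a validator never adds fillers the run did not already return.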
Slot Filler Validation (SFV) 2012 • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • System profile for each SF run • Preliminary assessment of 10% of KBP 2013 Slot Filling queries • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run • Result: the one SFV submission decreased F1 of almost all SF runs, except the poorest-performing ones
Slot Filler Validation (SFV) 2013 • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • System profile for each SF run • Preliminary assessment of 10% of KBP 2013 Slot Filling queries • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run • Score only on the 90% of KBP 2013 slot filling queries that didn’t have preliminary assessments released as part of SFV input
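Restricting scoring to the queries without released assessments might look like the sketch below. The query IDs, the `Candidate` layout, and the function name `held_out_candidates` are hypothetical; the point is only that the 10% of queries released as SFV input are excluded before scoring.

```python
from typing import List, Set, Tuple

# Hypothetical candidate layout: (query_id, slot, filler).
Candidate = Tuple[str, str, str]


def held_out_candidates(candidates: List[Candidate], released_queries: Set[str]) -> List[Candidate]:
    """Keep only candidates whose query had no preliminary assessment released."""
    return [c for c in candidates if c[0] not in released_queries]


# Toy data (made-up): 10 queries, one of which (10%) was released with assessments.
all_queries = [f"Q{i:02d}" for i in range(1, 11)]
released_queries = {"Q03"}
candidates = [(q, "per:title", "filler-" + q) for q in all_queries]

scored = held_out_candidates(candidates, released_queries)
```

Scoring on the remaining queries keeps the evaluation from rewarding a validator for labels it was handed as input.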
SF System Profile • SF Team ranks in KBP 2009-2012 • Did the system extract fillers from the KBP 2013 source corpus? • Do the Confidence Values have meaning? • Is the Confidence Value a probability? • Tools or methods for: • Query expansion • Document retrieval • Sentence retrieval • NER nominal tagging • Coreference resolution • Third-party relation/event extraction • Dependency/Constituent parsing • POS tagging • Chunking • Main slot filling algorithm • Learning algorithm • Ensemble model • External resources
Slot Filler Validation Teams and Approaches • BIT: Beijing Institute of Technology [local] • Generic RTE approach based on word overlap, cosine similarity, and token edit distance • Stanford: Stanford University [local] • Based on Stanford’s full slot-filling system, especially component for checking consistency and validity of candidate fillers • UI_CCG: University of Illinois at Urbana-Champaign [local] • Tailored RTE approach; check candidate for slot-specific constraints • jhuapl: Johns Hopkins University Applied Physics Laboratory [weak global] • Consider only the confidence value associated with each candidate filler and aggregate confidence values across systems. • RPI_BLENDER: Rensselaer Polytechnic Institute [strong global] • Based on RPI_BLENDER full slot-filling system (like Stanford), but also leveraged full set of SFV input (including SF system profile and preliminary assessments) to rank systems and apply tier-specific filtering.
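A weak global approach of the kind described for confidence values can be sketched as follows. This is not the jhuapl system: summing confidences and applying a fixed threshold is just one simple aggregation rule chosen for illustration, and the run names, keys, and threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical key for one candidate: (query_id, slot, filler).
Key = Tuple[str, str, str]


def aggregate_confidence(runs: Dict[str, Dict[Key, float]]) -> Dict[Key, float]:
    """Sum the confidence each SF run assigned to the same candidate filler."""
    totals: Dict[Key, float] = defaultdict(float)
    for run in runs.values():
        for key, confidence in run.items():
            totals[key] += confidence
    return dict(totals)


def validate(runs: Dict[str, Dict[Key, float]], threshold: float) -> Dict[Key, bool]:
    """Label a candidate Correct when its summed confidence reaches the threshold."""
    return {key: total >= threshold for key, total in aggregate_confidence(runs).items()}


# Toy input (made-up data): three runs with partially overlapping candidates.
k1 = ("Q1", "per:spouse", "A")
k2 = ("Q1", "per:age", "50")
k3 = ("Q2", "org:founded", "1990")
runs = {
    "run_a": {k1: 0.9, k2: 0.2},
    "run_b": {k1: 0.7},
    "run_c": {k2: 0.1, k3: 0.6},
}
labels = validate(runs, threshold=1.0)
```

Here only `k1`, returned with high confidence by two runs, is accepted. A rule like this treats every run's confidence as equally trustworthy, which is exactly what the system profile questions ("Do the Confidence Values have meaning?") call into doubt.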
Impact of RPI_BLENDER2 SFV on SF Runs • [Figure panels: Top 10 SF runs; Negatively impacted SF runs]
Conclusion • Leveraging global features boosts scores of individual SF runs, if done discriminately • Don’t treat all slot filling systems the same • Even weak global features (e.g., raw confidence values) may help in some cases • Caveat: other evaluation metrics are also valid, depending on use case • The RTE KBP validation (2011) metric may be appropriate if the goal is to make assessment more efficient