Relational Entity Linking with Cross Document Coreference. Xiao Cheng, Bingling Chen, Rajhans Samdani , Kai-Wei Chang, Zhiye Fei and Dan Roth University of Illinois at Urbana-Champaign (UI_CCG). Talk Outline. Introduction Architecture Entity Linking Approach Preprocessing
 
                
Relational Entity Linking with Cross Document Coreference Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth University of Illinois at Urbana-Champaign (UI_CCG)
Talk Outline • Introduction • Architecture • Entity Linking Approach • Preprocessing • Wikification • Formulation • Relational Analysis • Cross Document Coreference • Reconciliation • Evaluation
Entity Linking Specification
Query:
<query id="EL13_ENG_0015">
  <docid>bolt-eng-DF-170-181137-9030298</docid>
  <name>Lightning Bolts</name>
  <beg>15959</beg>
  <end>15973</end>
</query>
Output: [figure not captured in the transcript]
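A minimal sketch of reading such a query, using Python's standard `xml.etree.ElementTree`. The helper name `parse_query` and the returned dictionary layout are illustrative assumptions. In this example the offsets appear to be inclusive, since `end - beg + 1` equals the length of the name.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """<query id="EL13_ENG_0015">
  <docid>bolt-eng-DF-170-181137-9030298</docid>
  <name>Lightning Bolts</name>
  <beg>15959</beg>
  <end>15973</end>
</query>"""


def parse_query(xml_text):
    """Turn one <query> element into a plain dict (hypothetical helper)."""
    node = ET.fromstring(xml_text)
    return {
        "id": node.get("id"),
        "docid": node.findtext("docid"),
        "name": node.findtext("name"),
        "beg": int(node.findtext("beg")),  # offset of the first character
        "end": int(node.findtext("end")),  # offset of the last character
    }


query = parse_query(QUERY_XML)
# With inclusive offsets the span length should match the mention string.
span_length = query["end"] - query["beg"] + 1
```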
Entity Linking using Wikification and Cross-Doc Coref [Figure label: Cross Document Coreference]
Wikification Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Wikification Challenges Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State. • Ambiguity • Concepts outside of KB (NIL) • Blumenthal? • Variability • Scale • Millions of labels [Figure labels: The New York Times, The Times, Times; CT, The Nutmeg State, Connecticut]
Key Innovation • Improved Wikification for Structured EL • Relational Inference for Linking (Cheng and Roth, EMNLP’13) • No retraining • Non-trivial cross-document clustering • Best Latent Left-Linking approach (Samdani et al. ’12)
Entity Linking Architecture [Diagram labels: TAC Query; Preprocessing (Purposeful Coreference, Query Normalization, Document Transformation); Linking Problem; Wikification; Linking; Reconcile Linking; Cross-Doc Coreference; Clusters; Supervise]
Preprocessing • Query normalization • Handling spelling mistakes and slang (e.g. Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a): one of the reasons we did not achieve the expected performance • In-document coreference: some coreferent mentions are easier to link than the query mention
Preprocessing • Document transformation • A document can be as long as 100k characters for a single query • Need to truncate documents while minimizing the loss of critical context [Figure labels: Original Opening, Coreferent Context, Query Context]
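One way to picture the truncation step is as a sketch that keeps the document opening plus a character window around the query mention and around each mention coreferent with it. This reading of the figure labels is an assumption. The function name, window sizes and the `" ... "` joiner are hypothetical, not the system's actual settings.

```python
def truncate_document(text, anchor_offsets, window=200, opening=300):
    """Return (kept_spans, truncated_text) for one long source document.

    anchor_offsets are character offsets of the query mention and of
    mentions coreferent with it. All names and default sizes here are
    illustrative assumptions.
    """
    # Always keep the opening of the document.
    spans = [(0, min(opening, len(text)))]
    # Keep a symmetric window around every anchor, clipped to the text.
    for offset in anchor_offsets:
        spans.append((max(0, offset - window), min(len(text), offset + window)))
    # Merge overlapping or touching spans so no text is duplicated.
    spans.sort()
    merged = [spans[0]]
    for start, end in spans[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged, " ... ".join(text[s:e] for s, e in merged)


doc = "x" * 1000
kept, shortened = truncate_document(doc, [500, 550], window=50, opening=100)
```

Merging the windows first keeps the output short when the query mention and its coreferent mentions sit close together.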
Wikification Bottleneck Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State. • State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features • Performance plateaus at around 70%-85% F1 on non-Wikipedia datasets • What is missing?
Motivating Example Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … [Figure: the same sentence as an unordered bag of words: Egyptian / President / Hosni / Mubarak / the / of / deposed / Mubarak / wife] • What are we missing with Bag of Words (BOW) models? • Who is Mubarak? • Constraining interaction between concepts • (Mubarak, wife, Hosni Mubarak)
Relational Inference for Wikification Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … • (Mubarak, wife, Hosni Mubarak) • Our contribution • Identify key textual relations for Wikification • A global inference framework to incorporate relational knowledge • Significant improvement over state-of-the-art Wikification systems
Traditional Wikification 1 - Mention Segmentation ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • Sub noun phrase chunks • NER • Capitalized phrases
Traditional Wikification 1 - Mention Segmentation ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • Obtains nested mentions
Traditional Wikification 2 - Candidate Generation ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • Approach • Collect known mappings from Wikipedia page titles, hyperlinks… • Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
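The top-K step can be sketched as a lookup in a table of anchor-text statistics. The table `ANCHOR_COUNTS`, its titles and all counts below are made-up illustrative data, and `generate_candidates` is a hypothetical helper, not the Wikifier's API.

```python
from collections import Counter

# Hypothetical statistics: surface form -> how often it links to each title.
# The titles and counts are invented for illustration only.
ANCHOR_COUNTS = {
    "socialist party": Counter({
        "Socialist Party (France)": 120,
        "Socialist Party of Serbia": 45,
        "Socialist Party of America": 30,
        "Socialist Party (Netherlands)": 25,
    }),
}


def generate_candidates(surface, k=3):
    """Return up to k (title, prior) pairs, most frequently linked first."""
    counts = ANCHOR_COUNTS.get(surface.lower(), Counter())
    total = sum(counts.values())
    # most_common(k) is empty for unseen surfaces, so no division by zero.
    return [(title, n / total) for title, n in counts.most_common(k)]


top2 = generate_candidates("Socialist Party", k=2)
unseen = generate_candidates("no such surface")
```

With a small K the frequency prior can cut off the title the text actually refers to, which is the failure mode a more robust candidate generator has to address.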
Traditional Wikification 3 - Candidate Ranking ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • Local and global statistical features
Traditional Wikification 4 - Determine NILs ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • This answer is wrong • We did not generate the correct candidate based on top-K prior • Is the top candidate really what the text referred to? • Binary classifier
Formulation (0) Mubarak, the wife of deposed Egyptian President Hosni Mubarak, … • (Mubarak, wife, Hosni Mubarak) • Intuition • Promote pairs of candidate concepts coherent with textual relations
Formulation (1) Formulate as an Integer Linear Program (ILP) • $s_i^k$: weight to output $e_i^k$ • $e_i^k$: whether to output the $k$-th candidate of the $i$-th mention • $w_{ij}^{(k,l)}$: weight of a relation • $r_{ij}^{(k,l)}$: whether a relation exists between $e_i^k$ and $e_j^l$ • If no relation exists, the problem collapses to the unstructured decision
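The slide's equation did not survive in the transcript. The following is a reconstruction that is consistent with the symbol definitions given here and, as far as can be told, follows the formulation published in Cheng and Roth (EMNLP'13). Treat the exact constraint set as an assumption.

```latex
\begin{aligned}
\Gamma = \arg\max_{e,\,r}\quad
  & \sum_{i}\sum_{k} s_i^k \, e_i^k
    \;+\; \sum_{i,j}\sum_{k,l} w_{ij}^{(k,l)} \, r_{ij}^{(k,l)} \\
\text{s.t.}\quad
  & e_i^k \in \{0,1\}, \qquad \sum_{k} e_i^k = 1 \quad \forall i, \\
  & r_{ij}^{(k,l)} \in \{0,1\}, \qquad
    2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l \quad \forall i,j,k,l .
\end{aligned}
```

The last constraint lets a relation variable be active only when both of its candidate concepts are selected. When every $w_{ij}^{(k,l)}$ is zero the second sum vanishes and each mention simply takes its highest-scoring candidate.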
Formulation (2) ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… [Figure labels: $r_{34}^{(1,2)}$, $r_{34}^{(4,3)}$] • $e_i^k$: whether a concept is chosen • $s_i^k$: score of a concept • $r_{ij}^{(k,l)}$: whether a relation is present • $w_{ij}^{(k,l)}$: score of a relation
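To make the roles of the four symbols concrete, here is a brute-force sketch that maximizes concept scores plus relation scores over all candidate assignments. It stands in for an ILP solver and is only practical for toy inputs. The function name, the data layout and every number below are illustrative assumptions.

```python
from itertools import product


def relational_inference(scores, relation_weights):
    """Pick one candidate per mention, maximizing concept plus relation scores.

    scores[i][k] plays the role of s_i^k.
    relation_weights[(i, k, j, l)] plays the role of w_ij^(k,l).
    Exhaustive search: exponential in the number of mentions.
    """
    best_value, best_choice = float("-inf"), None
    for choice in product(*(range(len(s)) for s in scores)):
        # Concept term: one selected candidate per mention.
        value = sum(scores[i][k] for i, k in enumerate(choice))
        # Relation term: a positive weight counts only if both ends are chosen.
        for (i, k, j, l), w in relation_weights.items():
            if w > 0 and choice[i] == k and choice[j] == l:
                value += w
        if value > best_value:
            best_value, best_choice = value, choice
    return best_choice, best_value


# Mention 0 "Slobodan Milošević": one candidate.
# Mention 1 "Socialist Party": [Socialist Party (France), Socialist Party of Serbia].
scores = [[0.9], [0.6, 0.3]]
# Made-up weight for a relation between mention 0's candidate 0 and mention 1's candidate 1.
relations = {(0, 0, 1, 1): 0.5}

with_relations, _ = relational_inference(scores, relations)
without_relations, _ = relational_inference(scores, {})
```

With the relation weight in place the lower-scored but related candidate wins. With no relations the result falls back to the per-mention best candidate.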
Relation Identification • ACE style in-document coreference (Chang et al. ’13) • Extract named entity-only coreference relations with high precision • Syntactico-Semantic relations (Chan & Roth ’10) • Easy to extract with high precision • Aim for high recall, as false positives will be filtered • Sparse, but covers ~80% of relation instances in ACE2004
Relation Identification ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Relation Retrieval ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… • What concepts can “Socialist Party” refer to? • More robust candidate generation • Identified relations are verified against a knowledge base (DBPedia)
Relation Retrieval ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party… q1 = (Socialist Party of France, ?, *Milošević*) q2 = (Slobodan Milošević, ?, *Socialist Party*) • Query Pruning • Only 2 queries per pair necessary due to strong baseline.
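The two queries in the example seem to follow one pattern: fix the baseline's top-ranked concept for one mention and leave the other argument as a wildcard match on the other mention's surface string. A sketch under that assumption follows. The helper name and tuple layout are hypothetical, and no real knowledge-base client is involved.

```python
def build_relation_queries(mention_a, mention_b):
    """Build the two pruned queries for a related mention pair.

    Each mention is (surface_string, top_candidate_title), where the top
    candidate comes from the baseline ranker. "?" marks the unknown
    predicate and *...* marks a wildcard match on a surface string.
    """
    surface_a, top_a = mention_a
    surface_b, top_b = mention_b
    return [
        (top_b, "?", f"*{surface_a}*"),  # trust b's top candidate, search for a
        (top_a, "?", f"*{surface_b}*"),  # trust a's top candidate, search for b
    ]


q1, q2 = build_relation_queries(
    ("Milošević", "Slobodan Milošević"),
    ("Socialist Party", "Socialist Party of France"),
)
```

Using only each mention's top candidate keeps the number of knowledge-base lookups at two per pair instead of one per candidate combination.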
Relation Retrieval ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Relational Inference - coreference ...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Determine unknown concepts (NILs) Dorothy Byrne, a state coordinator for the Florida Green Party,… [Figure label on "a state coordinator for the Florida Green Party": nominal mention] • How to capture the fact that “Dorothy Byrne” does not refer to any concept in Wikipedia? • Identify coreferent nominal mention relations • Generate better features for NIL classifier
Determine unknown concepts (NILs) Dorothy Byrne, a state coordinator for the Florida Green Party,… [Figure label: nominal mention] • Create NIL candidate for structured inference • e.g. corrects other coreferent “Dorothy” later in the document
Cross Document Coreference • NILs can be viewed as KB entries with partial information • A uniform model for entity representation • Shared features with Entity Linking system • Can be supervised using existing EL systems • Cross document coreference cluster example: • Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman. • Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.
Cross Document Coreference Approach • Run document-level coreference • Aggregate all features in a document-level coreferent cluster • Use both mention-level features and document-level features • String similarity features (NESim, Do et al. ‘09) • Context TF-IDF similarity features • Document-level cluster features • Training: using both TAC data and Wikifier generated data
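A rough sketch of clustering in the spirit of left-linking: each item links to its best-scoring earlier item, or starts a new cluster when nothing scores above a threshold. The token-overlap similarity is only a stand-in for the learned scorer over string, TF-IDF and cluster features listed above. All names, the threshold and the example strings are assumptions.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for a learned pairwise scorer."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def best_left_link(items, similarity=jaccard, threshold=0.3):
    """Assign a cluster id to each item by linking to the best earlier item.

    A cluster id is the index of the cluster's first member.
    """
    cluster_of = []
    for i, item in enumerate(items):
        best_j, best_score = None, threshold
        for j in range(i):  # only look at items to the left
            score = similarity(item, items[j])
            if score > best_score:
                best_j, best_score = j, score
        # No earlier item beats the threshold: start a new cluster.
        cluster_of.append(cluster_of[best_j] if best_j is not None else i)
    return cluster_of


contexts = [
    "Naomi Campbell Charles Taylor trial",
    "Supermodel Campbell Taylor trial testimony",
    "Seattle Seahawks ended the game",
]
clusters = best_left_link(contexts)
```

Here the two Campbell contexts share three of seven distinct tokens and end up in one cluster, while the unrelated third context starts its own.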
Query mapping reconciliation • Example mentions and their links: “[Seattle] has won…” → Seattle (0.7); “[Seattle] Seahawks ended the game…” → Seattle Seahawks (0.8); “… cheered for [Seattle]…” → Seattle (0.2) • Max • {0.8, 0.7, 0.2} = Seattle Seahawks • Sum • {0.8, 0.7+0.2} = Seattle • No Threshold • NIL classifier always outputs “non-NIL” • Same as Max otherwise
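The Max and Sum strategies can be sketched directly from the Seattle example. The function name and the `(title, confidence)` input layout are illustrative assumptions, and the No Threshold variant is left out because it depends on the NIL classifier.

```python
from collections import defaultdict


def reconcile(links, strategy="max"):
    """Choose one title for a query from its coreferent mentions' links.

    links is a list of (title, confidence) pairs, one per mention.
    """
    if strategy == "max":
        # Take the title of the single most confident link.
        return max(links, key=lambda link: link[1])[0]
    if strategy == "sum":
        # Add up confidence per title, then take the largest total.
        totals = defaultdict(float)
        for title, score in links:
            totals[title] += score
        return max(totals, key=totals.get)
    raise ValueError(f"unknown strategy: {strategy}")


links = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
by_max = reconcile(links, "max")
by_sum = reconcile(links, "sum")
```

Max favors one strong link, while Sum rewards titles that several mentions agree on, which is why the two strategies disagree on this example.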
Evaluation – TAC KBP 2011 Entity Linking *Median of top 14 systems • Run Relational Inference (RI) Wikifier “as-is”: • No retraining using TAC data