Unsupervised Models for Coreference Resolution

Vincent Ng
Human Language Technology Research Institute
University of Texas at Dallas

Plan for the Talk
• Supervised learning for coreference resolution
  • how and when supervised coreference research started
  • standard machine learning approach
• Unsupervised learning for coreference resolution
  • self-training
  • EM clustering (Ng, 2008)
  • nonparametric Bayesian modeling (Haghighi and Klein, 2007)
    • three modifications

Machine Learning for Coreference Resolution
• started in the mid-1990s
  • Connolly et al. (1994), Aone and Bennett (1995), McCarthy and Lehnert (1995)
• propelled by the availability of annotated corpora produced by
  • Message Understanding Conferences (MUC-6/7: 1995, 1998)
    • English only
  • Automatic Content Extraction (ACE 2003, 2004, 2005, 2008)
    • English, Chinese, Arabic
• identified as an important task for information extraction
• identity coreference only

Identity Coreference
• Identify the noun phrases (or mentions) that refer to the same real-world entity
• Lots of prior work on supervised coreference resolution

Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...

Standard Supervised Learning Approach
• Classification
  • a classifier is trained to determine whether two mentions are coreferent or not coreferent

[Figure: candidate mention pairs in "[Queen Elizabeth] set about transforming [her] [husband], ..." each marked "coref?" or "not coref?"]
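
A minimal sketch of where the classifier's inputs come from, assuming the mentions are already extracted and sorted by position, and that every mention is paired with every earlier mention (real systems often restrict which pairs are generated). `candidate_pairs` is a hypothetical helper name.

```python
from itertools import combinations


def candidate_pairs(mentions):
    """Return every (earlier mention, later mention) pair.

    Assumes `mentions` is already sorted by position in the document.
    """
    # combinations() preserves input order, so in each pair the first
    # mention always precedes the second one in the text.
    return list(combinations(mentions, 2))


mentions = ["Queen Elizabeth", "her", "husband"]
for antecedent, anaphor in candidate_pairs(mentions):
    print(f"coref? ({antecedent}, {anaphor})")
```

Each printed pair is one instance the classifier labels as coref or not coref; n mentions yield n(n-1)/2 pairs.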

Standard Supervised Learning Approach
• Clustering
  • coordinates possibly contradictory pairwise classification decisions

[Figure: pairwise coref / not coref decisions for "[Queen Elizabeth], set about transforming [her] [husband] ..." (e.g., Queen Elizabeth / her: coref) are passed to a clustering algorithm, which groups the mentions into {Queen Elizabeth, her}, {husband, King George VI, the King, his}, and {Logue, a renowned speech therapist}]
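
To make "possibly contradictory" concrete, here is a sketch of one simple clustering choice, transitive closure over the COREF decisions (single-link, implemented with union-find). This is an illustrative assumption, not the Bell-tree method the talk adopts. In the toy input, the classifier says Queen Elizabeth/her and her/husband are coreferent but (implicitly) Queen Elizabeth/husband is not; transitive closure silently overrides that NOT COREF decision.

```python
def transitive_closure(mentions, coref_pairs):
    """Merge mentions into clusters by taking the transitive closure of
    the pairs labeled COREF; NOT COREF decisions are simply ignored."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the cluster representative,
        # halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coref_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


mentions = ["Queen Elizabeth", "her", "husband"]
coref = [("Queen Elizabeth", "her"), ("her", "husband")]  # hypothetical output
print(transitive_closure(mentions, coref))
# all three mentions end up in one cluster despite the NOT COREF pair
```

A clustering algorithm that weighs all pairwise decisions jointly, rather than only the positive ones, is one way to avoid this kind of error.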

Standard Supervised Learning Approach
• typically relies on a large amount of labeled data

What if we only have a small amount of annotated data?

First Attempt: Supervised Learning
• train on whatever annotated data we have
• need to specify
  • learning algorithm (Bayes)
  • feature set
  • clustering algorithm (Bell-tree)

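The Bayes Classifier (COREF or NOT COREF)
• finds the class value y that is the most probable given the feature vector x1, ..., xn
• finds y* such that
  $$y^* = \arg\max_{y \in \{\text{COREF}, \text{NOT COREF}\}} P(y \mid x_1, \ldots, x_n) = \arg\max_{y} P(y)\, P(x_1, \ldots, x_n \mid y)$$

What features to use in the feature representation?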
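
The decision rule can be sketched in a few lines. The probabilities below are made up, and the feature vector is shortened to a single hypothetical feature value for readability; `likelihood` stands in for whatever model supplies P(x | y).

```python
def bayes_classify(x, prior, likelihood):
    """Return y* = argmax_y P(y) * P(x | y).

    prior: dict mapping each class to P(y)
    likelihood: function (x, y) -> P(x | y)
    """
    # P(x) is the same for every class, so it can be dropped from the argmax.
    return max(prior, key=lambda y: prior[y] * likelihood(x, y))


# Toy numbers, made up for illustration only.
prior = {"COREF": 0.1, "NOT COREF": 0.9}
table = {
    (("Name", "Nominal"), "COREF"): 0.3,
    (("Name", "Nominal"), "NOT COREF"): 0.02,
}
x = ("Name", "Nominal")
print(bayes_classify(x, prior, lambda x, y: table[(x, y)]))  # COREF
```

Here 0.1 × 0.3 = 0.03 beats 0.9 × 0.02 = 0.018, so the pair is labeled COREF even though the prior favors NOT COREF.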

Linguistic Features
• Use 7 linguistic features, divided into 3 groups

E.g., for the mention pair (Barack Obama, president-elect), the feature value is (Name, Nominal)

The Bayes Classifier (COREF or NOT COREF)
• finds y* such that
  $$y^* = \arg\max_{y} P(y)\, P(x_1, \ldots, x_7 \mid y)$$

But we may have a data sparseness problem: the joint term P(x1, ..., x7 | y) needs one entry for every combination of all seven feature values, which is too many to estimate reliably from a small amount of annotated data.

Let’s simplify this term!
• assume that feature values from different groups are independent of each other given the class
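
A quick count shows how much the independence assumption shrinks the model. The sketch assumes, purely for illustration, that each of the 7 features has 3 possible values (the talk does not give the value sets), and uses the grouping x1-x3, x4-x6, x7 from the generative story.

```python
from math import prod

# Assumption for illustration: every feature takes 3 possible values.
VALUES_PER_FEATURE = 3

# One table for the full joint P(x1, ..., x7 | y) ...
full_joint = prod([VALUES_PER_FEATURE] * 7)

# ... versus three smaller tables P(x1,x2,x3 | y), P(x4,x5,x6 | y), P(x7 | y).
grouped = (
    prod([VALUES_PER_FEATURE] * 3)
    + prod([VALUES_PER_FEATURE] * 3)
    + prod([VALUES_PER_FEATURE] * 1)
)

print(full_joint, grouped)  # 2187 57  (entries per class)
```

Under these assumed cardinalities, each class needs 57 table entries instead of 2187, so each entry is backed by far more training instances.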

The Bayes Classifier (COREF or NOT COREF)
• finds y* such that
  $$y^* = \arg\max_{y} P(y)\, P(x_1, x_2, x_3 \mid y)\, P(x_4, x_5, x_6 \mid y)\, P(x_7 \mid y)$$

These are the model parameters: P(y), P(x1, x2, x3 | y), P(x4, x5, x6 | y), and P(x7 | y), to be estimated from annotated data using maximum likelihood estimation.
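
Maximum likelihood estimation here amounts to relative-frequency counting. The sketch below is a minimal version under stated assumptions: feature values are encoded as small integers, the toy training data is invented, and no smoothing is applied (so unseen value combinations get probability 0; a practical system would typically smooth).

```python
from collections import Counter

# Feature groups from the factorization (0-based indices into x1..x7):
# (x1, x2, x3), (x4, x5, x6), (x7)
GROUPS = [(0, 1, 2), (3, 4, 5), (6,)]


def project(x, group):
    """Pick out the values of one feature group from a feature vector."""
    return tuple(x[i] for i in group)


def estimate(instances):
    """Relative-frequency (maximum likelihood) estimates, no smoothing.

    instances: list of (feature_vector, class_label) pairs.
    Returns P(y) and, per group, a dict (group_values, y) -> P(group_values | y).
    """
    class_counts = Counter(y for _, y in instances)
    prior = {y: c / len(instances) for y, c in class_counts.items()}
    group_tables = []
    for g in GROUPS:
        counts = Counter((project(x, g), y) for x, y in instances)
        group_tables.append(
            {(v, y): c / class_counts[y] for (v, y), c in counts.items()}
        )
    return prior, group_tables


def score(x, y, prior, group_tables):
    """P(y) * P(x1,x2,x3 | y) * P(x4,x5,x6 | y) * P(x7 | y)."""
    p = prior.get(y, 0.0)
    for g, table in zip(GROUPS, group_tables):
        # Unseen combinations get probability 0 under pure MLE.
        p *= table.get((project(x, g), y), 0.0)
    return p


# Toy labeled data with binary feature values (made up for illustration).
data = [
    ((1, 1, 0, 1, 0, 0, 1), "COREF"),
    ((1, 1, 0, 0, 0, 0, 1), "COREF"),
    ((0, 0, 0, 0, 0, 0, 0), "NOT COREF"),
    ((0, 1, 0, 0, 0, 0, 0), "NOT COREF"),
]
prior, group_tables = estimate(data)
x = (1, 1, 0, 0, 0, 0, 1)
print(score(x, "COREF", prior, group_tables))      # 0.25
print(score(x, "NOT COREF", prior, group_tables))  # 0.0
```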

The Bayes Classifier (COREF or NOT COREF)
Generative model: specifies how an instance is generated
• Generate the class y with P(y)
• Given y, generate x1, x2, and x3 with P(x1, x2, x3 | y)
• Given y, generate x4, x5, and x6 with P(x4, x5, x6 | y)
• Given y, generate x7 with P(x7 | y)
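
The generative story above translates directly into a sampler. This is a sketch with hypothetical, made-up parameters; `draw` and `sample_instance` are illustrative helper names, and the dictionaries play the role of the model parameters P(y) and P(group values | y).

```python
import random

# Hypothetical parameters, made up for illustration.
prior = {"COREF": 0.2, "NOT COREF": 0.8}
# One distribution per feature group: group_dists[g][y][values] = P(values | y)
group_dists = [
    {"COREF": {(1, 1, 0): 0.7, (0, 1, 0): 0.3}, "NOT COREF": {(0, 0, 0): 1.0}},
    {"COREF": {(1, 0, 0): 1.0}, "NOT COREF": {(0, 0, 0): 0.6, (0, 1, 0): 0.4}},
    {"COREF": {(1,): 0.9, (0,): 0.1}, "NOT COREF": {(0,): 1.0}},
]


def draw(dist, rng):
    """Sample one key of `dist` with probability proportional to its value."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def sample_instance(prior, group_dists, rng):
    """Follow the generative story: y ~ P(y), then each group ~ P(group | y)."""
    y = draw(prior, rng)
    x = []
    for dist in group_dists:
        x.extend(draw(dist[y], rng))  # x1-x3, then x4-x6, then x7
    return tuple(x), y


rng = random.Random(0)
for _ in range(3):
    print(sample_instance(prior, group_dists, rng))
```

Because every group is drawn given y alone, the groups are conditionally independent given the class, which is exactly the assumption that licenses the factorized decision rule.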

Bell-Tree Clustering (Luo et al., 2004)
• searches for the most probable partition of a set of mentions
• structures the search space as a Bell tree
  • leaves contain all the possible partitions of all of the mentions
  • computationally infeasible to expand all nodes in the Bell tree

Bell tree for three mentions (1, 2, 3):
[1]
  [12]
    [123]
    [12][3]
  [1][2]
    [13][2]
    [1][23]
    [1][2][3]
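
The sketch below grows the Bell tree one mention at a time: each new mention either joins an existing cluster or starts its own. `leaves` expands every node (5 leaves for three mentions, matching the tree above; the count grows as the Bell numbers), which is why full expansion becomes infeasible. `beam_search` instead keeps only the `beam_width` best nodes per level, a pruning strategy in the spirit of Luo et al. (2004). The scorer `toy_score` and the `LIKES` set are hypothetical stand-ins for a learned model, which is not reproduced here.

```python
from itertools import combinations


def expand(partition, mention):
    """Children of a Bell-tree node: the new mention joins one of the
    existing clusters, or starts a new cluster of its own."""
    children = []
    for i in range(len(partition)):
        child = [list(c) for c in partition]
        child[i].append(mention)
        children.append(child)
    children.append([list(c) for c in partition] + [[mention]])
    return children


def leaves(mentions):
    """Expand every node: returns all partitions of `mentions`."""
    level = [[[mentions[0]]]]  # root: first mention in its own cluster
    for m in mentions[1:]:
        level = [child for node in level for child in expand(node, m)]
    return level


def beam_search(mentions, score, beam_width):
    """Keep only the `beam_width` best nodes at each level of the tree."""
    level = [[[mentions[0]]]]
    for m in mentions[1:]:
        children = [child for node in level for child in expand(node, m)]
        level = sorted(children, key=score, reverse=True)[:beam_width]
    return level[0]


# Hypothetical scorer: +1 for each same-cluster pair we "believe" is
# coreferent, -1 for every other same-cluster pair.
LIKES = {frozenset({1, 2})}


def toy_score(partition):
    total = 0
    for cluster in partition:
        for a, b in combinations(cluster, 2):
            total += 1 if frozenset({a, b}) in LIKES else -1
    return total


print(leaves([1, 2, 3]))
print(beam_search([1, 2, 3], toy_score, beam_width=2))  # [[1, 2], [3]]
```

With a beam of width k, each level keeps at most k nodes, so the work per mention stays bounded instead of growing with the number of partitions.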
