This work explores a multi-criteria approach to active learning aimed at improving Named Entity Recognition (NER). Traditional supervised learning for NER faces significant challenges because it requires large annotated corpora, which are time-consuming to create. Our research introduces a framework that uses multiple criteria, specifically informativeness, representativeness, and diversity, to select the most useful examples for annotation. The paper presents several active learning strategies, experiments, and results that demonstrate the effectiveness of the approach, which minimizes human annotation effort while maintaining performance.
Multi-Criteria-based Active Learning for Named Entity Recognition Dan Shen†‡, Jie Zhang†‡, Jian Su†, Guodong Zhou†, Chew-Lim Tan‡ †Institute for Infocomm Research ‡National University of Singapore
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion ACL 2004
Motivation • Named Entity Recognition (NER) • most current work: supervised learning • requires a large annotated corpus • MUC-6 / MUC-7 corpus (newswire domain) • GENIA corpus (biomedical domain) • Limitations of supervised NER • corpus annotation: tedious and time-consuming • adaptability: limited • Target of our work • explore active learning in NER • minimize the human annotation effort • without degrading performance ACL 2004
Active Learning Framework • Given • a small labeled data set L • a large unlabeled data set U • Repeat • Train a model M on L • Use M to test U • Select the most useful example from U (our research focus) • Require a human expert to label it • Add the labeled example to L • Until M achieves a certain performance level ACL 2004
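The loop above can be written down compactly. Below is a minimal sketch in Python, assuming an SVM learner, a margin-based uncertainty score, and a fixed number of rounds instead of a performance-based stopping test; none of these details are prescribed by the slide.

```python
# Minimal sketch of pool-based active learning (assumed SVM learner and
# margin-based uncertainty; the stopping condition is simplified to a round count).
import numpy as np
from sklearn.svm import SVC

def active_learning_loop(X_labeled, y_labeled, X_unlabeled, oracle,
                         rounds=20, batch_size=10):
    """Repeatedly train on L, score U, query the oracle for the most uncertain examples."""
    X_l, y_l = list(X_labeled), list(y_labeled)
    pool = list(X_unlabeled)
    model = SVC(kernel="linear")
    for _ in range(rounds):
        model.fit(np.array(X_l), np.array(y_l))                     # train M on L
        margins = np.abs(model.decision_function(np.array(pool)))   # use M to test U
        picked = np.argsort(margins)[:batch_size]                   # smallest margin = most uncertain
        for i in sorted(picked, reverse=True):
            x = pool.pop(i)
            X_l.append(x)
            y_l.append(oracle(x))                                   # human expert labels it
    return model
```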
Related Work • Active Learning Criteria • Most current work: informativeness • A few works: representativeness, e.g. [McCallum and Nigam 1998] and [Tang et al. 2002], or diversity, e.g. [Brinker 2003] • NO work has explored multiple criteria in active learning • Active Learning in NLP • E.g. POS tagging / text classification / statistical parsing • NO work has explored active learning in NER ACL 2004
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion ACL 2004
Active Learning in NER • SVM-based NER system • Recognize one class of NEs at a time • Features: different from supervised learning! • cannot be derived statistically from the training data set • no gazetteers or dictionaries • Active Learning in the NER system • Select the most useful batch of examples • An example = a word sequence • e.g. a named entity and its context • Measurements: from words to NEs • only word-based scores are available from the SVM ACL 2004
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion ACL 2004
1. Informativeness Criterion Most informative example: the one the existing model is most uncertain about. Most previous work is based only on this criterion.
Informativeness Measurement • Informativeness for a Word • change / induce support vectors in the SVM • distance of the word's feature vector to the separating hyperplane • Informativeness for an NE • an NE is a sequence of words: NE = w1w2…wN, where wi is the ith word of the NE • Heuristic scoring functions combine the word scores • e.g. by averaging them: Info(NE) = (1/N) Σi Info(wi) ACL 2004
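As an illustration, here is a hedged sketch of how the word-level margin could be turned into an NE-level informativeness score; the mapping from hyperplane distance to an informativeness value is an assumed monotone transform, not the paper's exact formula, and only the average and min aggregations are shown.

```python
# Sketch: word informativeness from the SVM margin, aggregated over the words of an NE.
import numpy as np

def word_informativeness(svm_model, word_vec):
    """Smaller distance to the separating hyperplane = more uncertain = more informative."""
    dist = abs(svm_model.decision_function(word_vec.reshape(1, -1))[0])
    return 1.0 / (1.0 + dist)   # assumed monotone mapping, not the paper's formula

def ne_informativeness(svm_model, ne_word_vecs, how="avg"):
    """Aggregate word scores over NE = w1 w2 ... wN (average or min)."""
    scores = [word_informativeness(svm_model, w) for w in ne_word_vecs]
    return min(scores) if how == "min" else sum(scores) / len(scores)
```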
2. Representativeness Criterion Most representative example: the one that represents the most examples Only a few works [McCallum and Nigam 1998; Tang et al. 2002] consider this criterion.
Similarity Measurement • Similarity between Words • cosine-similarity measurement • adapted to the SVM • Similarity between NEs • alignment of the two word sequences • Dynamic Time Warping (DTW) algorithm • given a point-by-point distance, find an optimal path that minimizes the accumulated distance along the path ACL 2004
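A sketch of the two similarity levels, assuming word feature vectors as input; the conversion of cosine similarity into a point-by-point distance and the normalization of the DTW path cost are assumptions, not the paper's exact formulation.

```python
# Sketch: cosine similarity between word vectors, and a DTW alignment that
# accumulates word-level distances over two NE word sequences.
import numpy as np

def word_sim(u, v):
    """Cosine similarity between two word feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def ne_sim_dtw(ne_a, ne_b):
    """DTW over two NE word sequences; similarity = 1 - normalized accumulated distance."""
    n, m = len(ne_a), len(ne_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - word_sim(ne_a[i - 1], ne_b[j - 1])   # point-by-point distance
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return 1.0 - acc[n, m] / (n + m)    # assumed path-length normalization
```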
Representativeness Measurement • Representativeness of NEi in NESet • NESet = {NE1, …, NEi, …, NEN} • quantified by its density • the average similarity between NEi and all other NEj (j ≠ i) in NESet • Most representative NE • the one with the largest density among all NEs in NESet • the centroid of NESet ACL 2004
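The density and centroid computations follow directly from the definitions above; in this sketch `sim` stands for any NE-level similarity, e.g. the DTW-based one sketched earlier.

```python
# Sketch: density = average similarity to all other NEs; the centroid is the densest NE.
def density(i, ne_set, sim):
    others = [sim(ne_set[i], ne_set[j]) for j in range(len(ne_set)) if j != i]
    return sum(others) / len(others) if others else 0.0

def centroid(ne_set, sim):
    scores = [density(i, ne_set, sim) for i in range(len(ne_set))]
    return max(range(len(ne_set)), key=lambda i: scores[i])   # index of the most representative NE
```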
3. Diversity Criterion Maximize the training utility of a batch: the members of the batch should differ from each other as much as possible Only one work [Brinker 2003] has considered this criterion.
Global Consideration • Consider the examples over the whole sample space • K-means clustering • cluster all named entities in NESet • assumption: the examples in one cluster are quite similar to each other • select examples from different clusters for a batch • Drawback: time-consuming • for efficiency, filter out NEs before clustering ACL 2004
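One possible rendering of this global step, assuming the candidate NEs are represented as fixed-length vectors so that scikit-learn's KMeans can be applied (the slide clusters NEs by their pairwise similarity, so this is only an approximation):

```python
# Sketch: cluster candidate NE vectors into k clusters and keep one example per cluster.
import numpy as np
from sklearn.cluster import KMeans

def one_per_cluster(candidate_vecs, k):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(np.array(candidate_vecs))
    chosen = {}
    for idx, lab in enumerate(labels):
        chosen.setdefault(lab, idx)   # keep the first member seen in each cluster
    return sorted(chosen.values())
```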
Local Consideration • Consider only the examples in the current batch • For an example candidate: • compare it with all previously selected examples in the batch one by one • add it to the batch only if its similarity to each of them is below a threshold • More efficient! ACL 2004
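The local check reduces to a one-line predicate; the threshold value used here is an illustrative assumption.

```python
# Sketch: a candidate enters the batch only if it is sufficiently dissimilar
# to every example already selected.
def is_diverse(candidate, batch, sim, threshold=0.8):
    return all(sim(candidate, member) < threshold for member in batch)
```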
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion ACL 2004
Strategy 1 (sketched in code below): Unlabeled Data Set → select the M most informative examples (Informativeness Criterion) → Intermediate Set → clustering into K clusters (Diversity Criterion) → select the centroid of each cluster (Representativeness Criterion) → Batch ACL 2004
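A pipeline-level sketch of Strategy 1 under the same assumptions as before: NEs are represented as fixed-length vectors, and the informativeness scores and similarity function are supplied externally. The values of M and K are left as parameters.

```python
# Sketch of Strategy 1: informativeness pre-selection -> K-means clustering ->
# the densest member (centroid) of each cluster goes into the batch.
import numpy as np
from sklearn.cluster import KMeans

def strategy1(ne_vecs, info_scores, sim, m, k):
    """ne_vecs: 2-D array of NE vectors; info_scores: per-NE informativeness."""
    # 1) Informativeness: keep the M most informative NEs as an intermediate set.
    inter = np.argsort(info_scores)[::-1][:m]
    # 2) Diversity: cluster the intermediate set into K clusters.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(ne_vecs[inter])
    # 3) Representativeness: from each cluster, pick the member with the highest
    #    total similarity to the rest of its cluster (the cluster centroid).
    batch = []
    for c in range(k):
        members = [inter[i] for i in range(len(inter)) if labels[i] == c]
        if not members:
            continue
        dens = lambda a: sum(sim(ne_vecs[a], ne_vecs[b]) for b in members if b != a)
        batch.append(max(members, key=dens))
    return batch
```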
Strategy 2 (sketched in code below): From the Unlabeled Data Set, select the example candidate with the maximum combined score λ·Info + (1 − λ)·Rep (Informativeness & Representativeness Criteria); compare the candidate with each example already in the Batch: IF any of the similarity values > threshold THEN reject ELSE add to Batch (Diversity Criterion) ACL 2004
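A corresponding sketch of Strategy 2; the values of λ and the similarity threshold are illustrative, not the ones used in the experiments.

```python
# Sketch of Strategy 2: rank candidates by lam*Info + (1-lam)*Rep, then accept a
# candidate only if it is not too similar to any example already in the batch.
import numpy as np

def strategy2(ne_vecs, info_scores, rep_scores, sim, batch_size,
              lam=0.6, threshold=0.8):
    combined = lam * np.asarray(info_scores) + (1.0 - lam) * np.asarray(rep_scores)
    batch = []
    for idx in np.argsort(combined)[::-1]:            # best combined score first
        if all(sim(ne_vecs[idx], ne_vecs[b]) < threshold for b in batch):
            batch.append(idx)                         # diverse enough -> accept
        if len(batch) == batch_size:
            break
    return batch
```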
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion ACL 2004
Experimental Setting 1 • Corpus • MUC-6 Corpus • To recognize Person, Location and Organization • GENIA Corpus V1.1 • To recognize Protein • Corpus Split • Initial training data set • Test data set • Unlabeled data set • Size of each data set • Batch size K • = 50 in biomedical domain • = 10 in newswire domain ACL 2004
Experimental Setting 2 • Supervised learning • trained on the entire annotated corpus • newswire: 408 WSJ articles • biomedical: 590 MEDLINE abstracts • Random selection (baseline) • a batch of examples is randomly selected in each round • Evaluation: F-measure ACL 2004
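For reference, the F-measure used here is the standard harmonic mean of precision P and recall R: F = 2 · P · R / (P + R).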
Experimental Results 1 • Effectiveness of Single-Criterion-based Active Learning • [Learning-curve figure; legend: Supervised (223K words), Info_based (52K words, i.e. 62% of Random and 23% of Supervised), Random (83K words)] ACL 2004
Experimental Results 2 • Overall Results of Multi-Criteria-based Active Learning • [Annotations from the results table:] • Avg(3.5K/11.5K, 2.1K/13.6K, 7.8K/20.2K) = 28% • Avg(3.5K/157K, 2.1K/157K, 7.8K/157K) = 5% • 31K/83K = 37% • 31K/223K = 14% • Strategy 1 needs 9K more words than Strategy 2 ACL 2004
Experimental Results 3 • Effectiveness of Multi-Criteria-based Active Learning • [Learning-curve figure; legend: Supervised (223K), Info_based (52K), Strategy 1 (40K, 77% of Info_based), Strategy 2 (31K, 60% of Info_based)] ACL 2004
Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion ACL 2004
Conclusion • The first work on multiple criteria in active learning • informativeness, representativeness & diversity • two active learning strategies • they outperform the single-criterion-based method (Strategy 2 needs only 60% of its annotated words) • The first work on active learning in NER • measurements for the multiple criteria • greatly reduce the annotation cost • General measurements and strategies • can be easily adapted to other NLP tasks ACL 2004
Future Work • How to automatically decide the optimal values of the parameters? • batch size K • linear interpolation parameter λ • When to stop the active learning process? • e.g. monitor the change of support vectors ACL 2004
Acknowledgement Thanks to International Post-Graduate College (IGK) at Saarland University for the generous travel support. The first author is now a Ph.D. student in this program. ACL 2004
The End Thank You !