
Importance of Semantic Representation: Dataless Classification


Presentation Transcript


  1. Importance of Semantic Representation: Dataless Classification Ming-Wei Chang Lev Ratinov Dan Roth Vivek Srikumar University of Illinois, Urbana-Champaign

  2. Text Categorization Classify the following sentence: Syd Millar was the chairman of the International Rugby Board in 2003. Pick a label: Class1 vs. Class2 • Traditionally, we need annotated data to train a classifier

  3. Text Categorization • Humans don’t seem to need labeled data Syd Millar was the chairman of the International Rugby Board in 2003. Pick a label: Sports vs. Finance • Label names carry a lot of information!

  4. Text Categorization • Do we really always need labeled data?

  5. Contributions • We can often go quite far without annotated data • … if we “know” the meaning of text • This works for text categorization • … and is consistent across different domains

  6. Outline • Semantic Representation • On-the-fly Classification • Datasets • Exploiting unlabeled data • Robustness to different domains

  7. Outline • Semantic Representation • On-the-fly Classification • Datasets • Exploiting unlabeled data • Robustness to different domains

  8. Semantic Representation • One common representation is the Bag of Words representation • All text is a vector in the space of words.
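
A minimal Python sketch of the bag-of-words view, assuming a plain whitespace tokenizer and raw term counts (the talk does not fix a particular weighting, so TF-IDF or similar could be substituted):

    from collections import Counter

    def bag_of_words(text):
        # Sparse vector in the space of words: word -> count.
        return Counter(text.lower().split())

    doc = "Syd Millar was the chairman of the International Rugby Board in 2003."
    print(bag_of_words(doc))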

  9. Semantic Representation • Explicit Semantic Analysis • [Gabrilovich & Markovitch, 2006, 2007] • Text is a vector in the space of concepts • Concepts are defined by Wikipedia articles

  10. Explicit Semantic Analysis: Example • “Monetary Policy” → ESA representation (Wikipedia article titles): International Monetary Fund, Monetary policy, Economic and Monetary Union, Hong Kong Monetary Authority, Monetarism, Central bank • “Apple IPod” → ESA representation (Wikipedia article titles): IPod mini, IPod photo, IPod nano, Apple Computer, IPod shuffle, ITunes
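
A minimal sketch of the ESA mapping, assuming a precomputed index from words to weighted Wikipedia concepts; in real ESA these weights come from TF-IDF scores over a Wikipedia dump, and the tiny hand-made index below is purely illustrative:

    from collections import defaultdict

    # Toy, hand-made index: word -> {Wikipedia article title: weight}.
    concept_weights = {
        "monetary": {"Monetary policy": 2.1, "International Monetary Fund": 1.7, "Central bank": 1.2},
        "policy":   {"Monetary policy": 1.5, "Monetarism": 0.9},
        "ipod":     {"IPod nano": 2.3, "IPod mini": 2.0, "ITunes": 1.1, "Apple Computer": 0.8},
    }

    def esa_vector(text):
        # A text becomes a weighted vector over Wikipedia concepts.
        vec = defaultdict(float)
        for word in text.lower().split():
            for concept, weight in concept_weights.get(word, {}).items():
                vec[concept] += weight
        return dict(vec)

    print(esa_vector("Monetary Policy"))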

  11. Semantic Representation • Two semantic representations • Bag of words • ESA

  12. Outline • Semantic Representation • On-the-fly Classification • Datasets • Exploiting unlabeled data • Robustness to different domains

  13. Traditional Text Categorization • [Diagram: a labeled corpus with labels Sports and Finance is mapped into the semantic space and used to train a classifier]

  14. Dataless Classification • [Diagram: the labeled corpus is taken away, leaving only the labels Sports and Finance] • What can we do using just the labels?

  15. But labels are text too!

  16. Dataless Classification • [Diagram: a new unlabeled document and the labels Sports and Finance are all mapped into the semantic space]

  17. What is Dataless Classification? • Humans don’t need training for classification • Annotated training data not always needed • Look for the meaning of words

  18. What is Dataless Classification? • Humans don’t need training for classification • Annotated training data not always needed • Look for the meaning of words

  19. On-the-fly Classification • [Diagram: a new unlabeled document and the labels Sports and Finance are all mapped into the semantic space]

  20. On-the-fly Classification • No training data needed • We know the meaning of label names • Pick the label that is closest in meaning to the document • Nearest neighbors
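
A sketch of this nearest-neighbour rule, assuming a represent function (bag of words or ESA, as sketched above) that returns a sparse dict vector; cosine similarity is used here as a natural choice of closeness, though the slides do not pin down the exact measure:

    import math

    def cosine(u, v):
        # Cosine similarity between two sparse dict vectors.
        dot = sum(w * v.get(k, 0.0) for k, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    def on_the_fly_classify(document, label_names, represent):
        # Pick the label whose representation is closest in meaning to the document.
        doc_vec = represent(document)
        return max(label_names, key=lambda label: cosine(doc_vec, represent(label)))

    # e.g. on_the_fly_classify("Syd Millar chaired the International Rugby Board.",
    #                          ["Sports", "Finance"], esa_vector)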

  21. On-the-fly Classification • [Diagram: a new unlabeled document and a new pair of labels, Hockey and Baseball, mapped into the semantic space]

  22. On-the-fly Classification • No need to even know the labels beforehand • Compare with traditional classification: annotated training data needed for each label

  23. Outline • Semantic Representation • On-the-fly Classification • Datasets • Exploiting unlabeled data • Robustness to different domains

  24. Dataset 1: Twenty Newsgroups • Posts to newsgroups • Newsgroups have descriptive names: sci.electronics = Science Electronics; rec.motorcycles = Motorcycles

  25. Dataset 2: Yahoo Answers • Posts to Yahoo! Answers • Posts categorized into a two-level hierarchy • 20 top-level categories • 280 categories in total at the second level • Examples: Arts and Humanities → Theater Acting; Sports → Rugby League

  26. Experiments • 20 Newsgroups: 10 binary problems (from [Raina et al., ’06]), e.g. Religion vs. Politics.guns; Motorcycles vs. MS Windows • Yahoo! Answers: 20 binary problems, e.g. Health → Diet fitness vs. Health → Allergies; Consumer Electronics → DVRs vs. Pets → Rodents

  27. Results: On-the-fly classification • [Chart comparing a Naïve Bayes classifier (uses annotated data, ignores the labels) against the nearest-neighbour on-the-fly classifier (uses the labels, no annotated data)]

  28. Outline • Semantic Representation • On-the-fly Classification • Datasets • Exploiting unlabeled data • Robustness to different domains

  29. Using Unlabeled Data • Knowing the data collection helps • We can learn specific biases of the dataset • Potential for semi-supervised learning

  30. Bootstrapping • Each label name is a “labeled” document • One “example” in word or concept space • Train initial classifier • Same as the on-the-fly classifier • Loop: • Classify all documents with current classifier • Retrain classifier with highly confident predictions
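
A hedged sketch of this bootstrapping loop, using scikit-learn's Naive Bayes as a stand-in classifier and assuming a hypothetical vectorize function that maps a list of texts to a dense, non-negative feature matrix; the classifier, confidence threshold, and number of rounds here are illustrative, not the paper's exact choices:

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    def bootstrap(label_names, unlabeled_docs, vectorize, rounds=5, threshold=0.9):
        # Each label name is treated as one "labeled" document.
        X_seed = vectorize(label_names)
        y_seed = np.arange(len(label_names))
        X_pool = vectorize(unlabeled_docs)

        clf = MultinomialNB().fit(X_seed, y_seed)        # initial classifier from the labels alone
        for _ in range(rounds):
            probs = clf.predict_proba(X_pool)            # classify all documents
            confident = probs.max(axis=1) >= threshold   # keep highly confident predictions
            X_train = np.vstack([X_seed, X_pool[confident]])
            y_train = np.concatenate([y_seed, probs[confident].argmax(axis=1)])
            clf = MultinomialNB().fit(X_train, y_train)  # retrain on seed + confident self-labels
        return clf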

  31. Co-training • Words and concepts are two independent “views” • Each view is a teacher for the other [Blum & Mitchell ‘98]

  32. Co-training • Train initial classifiers in word space and concept space • Loop • Classify documents with current classifiers • Retrain with highly confident predictions of both classifiers
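
A sketch of the co-training loop under the same assumptions as the bootstrapping sketch, with one vectorizer per view (words and concepts); each view is retrained on the other view's confident predictions, following the usual Blum & Mitchell convention, though the slide's "both classifiers" phrasing may correspond to a slightly different selection rule:

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    def co_train(label_names, docs, vectorize_words, vectorize_concepts, rounds=5, threshold=0.9):
        views = [vectorize_words, vectorize_concepts]
        X_seed = [v(label_names) for v in views]   # label names are the only seed "examples"
        X_pool = [v(docs) for v in views]
        y_seed = np.arange(len(label_names))

        # Initial classifiers in word space and concept space.
        clfs = [MultinomialNB().fit(X, y_seed) for X in X_seed]
        for _ in range(rounds):
            probs = [clf.predict_proba(X) for clf, X in zip(clfs, X_pool)]
            new_clfs = []
            for i in (0, 1):
                j = 1 - i                              # the other view acts as the teacher
                confident = probs[j].max(axis=1) >= threshold
                X_train = np.vstack([X_seed[i], X_pool[i][confident]])
                y_train = np.concatenate([y_seed, probs[j][confident].argmax(axis=1)])
                new_clfs.append(MultinomialNB().fit(X_train, y_train))
            clfs = new_clfs
        return clfs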

  33. Using unlabeled data • Three approaches • Bootstrapping with labels using Bag of Words • Bootstrapping with labels using ESA • Co-training

  34. More Results • [Chart: co-training using just the label names, with no annotated data, does as well as supervised learning with 100 labeled examples]

  35. Outline • Semantic Representation • On-the-fly Classification • Datasets • Exploiting unlabeled data • Robustness to different domains

  36. Domain Adaptation • Classifiers trained on one domain and tested on another • Performance usually decreases across domains

  37. But the label names are the same • Label names don’t depend on the domain • Label names are robust across domains • On-the-fly classifiers are domain independent

  38. Example: Baseball vs. Hockey

  39. Conclusion • Sometimes, label names tell us more about a class than annotated examples do • The standard learning practice of treating labels as unique identifiers loses information • The right semantic representation helps • What is the right one?
