Supervised Clustering

Pranjal Awasthi, Carnegie Mellon University
Reza Bosagh Zadeh, Stanford University

Supervised Clustering
• Clustering is usually unsupervised.
• Full supervision renders the task meaningless; find a middle ground by interacting with a teacher.
• Remove subjective ambiguities according to the teacher.
• [Balcan, Blum '08]: a PAC-style query model for clustering.

The Model
• Limited interaction with the teacher
• Only query allowed: "Here's what I think the clustering should be."
• The teacher responds with one of:
  • Split this cluster: c
  • Merge these two clusters: c1 and c2
• How many queries can we get away with in the worst case?
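The query protocol can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm from the talk: the teacher knows a hidden target labeling (the hypothetical `target` dict), requests a split when a proposed cluster mixes points from two or more target clusters, and, as an assumption, requests a merge only when two proposed clusters are both pure and belong to the same target cluster. The learner `naive_learner` is a hypothetical, deliberately naive strategy that breaks any cluster it is told to split into singletons.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Tuple

Cluster = FrozenSet[Hashable]


def teacher_feedback(proposal: List[Cluster],
                     target: Dict[Hashable, int]) -> Optional[Tuple]:
    """Return ("split", c), ("merge", c1, c2), or None if proposal == target."""
    # Split request: some proposed cluster mixes points from 2+ target clusters.
    for c in proposal:
        if len({target[p] for p in c}) > 1:
            return ("split", c)
    # Every cluster is pure here; request a merge if two share a target label.
    seen: Dict[int, Cluster] = {}
    for c in proposal:
        label = target[next(iter(c))]  # any point works since c is pure
        if label in seen:
            return ("merge", seen[label], c)
        seen[label] = c
    return None  # the proposal matches the target clustering


def naive_learner(points: Iterable[Hashable],
                  target: Dict[Hashable, int]) -> Tuple[List[Cluster], int]:
    """Hypothetical naive learner: split into singletons, merge on request."""
    proposal: List[Cluster] = [frozenset(points)]
    queries = 0
    while True:
        queries += 1  # each proposal shown to the teacher counts as one query
        fb = teacher_feedback(proposal, target)
        if fb is None:
            return proposal, queries
        if fb[0] == "split":
            c = fb[1]
            proposal.remove(c)
            proposal.extend(frozenset([p]) for p in c)
        else:
            _, c1, c2 = fb
            proposal.remove(c1)
            proposal.remove(c2)
            proposal.append(c1 | c2)


final, q = naive_learner(range(6), {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2})
print(sorted(sorted(c) for c in final), q)
```

On six points in three target pairs, starting from one big cluster, this uses 1 split + 3 merges + 1 final confirming query = 5 queries. For k ≥ 2 target clusters the naive strategy needs n − k + 2 queries, growing with the number of points n rather than with log |C|.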

Main Results
• Previous query bound of O(k^3 log |C|) known for any concept class C.
• We improve the bound to O(k log |C|).
• Give algorithms for clustering geometric concept classes.
• Present noisy versions of the model and give query bounds.
• What if we knew about separation properties of the dataset?
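To make the two bounds concrete (ignoring hidden constants), their ratio is k^2. As an illustrative assumption, if C is taken to be the unrestricted class of all partitions of n points into at most k clusters, then |C| ≤ k^n, since every such partition arises from at least one assignment of a label in {1, ..., k} to each point; this bounds log |C| as follows:

```latex
\frac{k^{3}\log|C|}{k\log|C|} = k^{2},
\qquad
|C| \le k^{n} \;\Longrightarrow\; k\log|C| \le k\,n\log k .
```

For k = 10 the ratio k^2 is 100, so the improved bound is smaller by roughly two orders of magnitude up to constants. Restricted classes, such as the geometric ones mentioned above, can have a much smaller |C| than this worst case.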

Dataset Separation
• Worst-case number of queries under some "separation" properties:
  • The better separated the dataset, the fewer queries required.
• Lots of open problems!
