
Distance metric learning, with application to clustering with side-information


Presentation Transcript


  1. Distance metric learning, with application to clustering with side-information
  Eric P. Xing, Andrew Y. Ng, Michael I. Jordan and Stuart Russell
  University of California, Berkeley
  {epxing,ang,jordan,russell}@cs.berkeley.edu
  Neural Information Processing Systems 2002

  2. Abstract
  • Many algorithms rely critically on being given a good metric over the input space.
  • We provide a more systematic way for users to indicate what they consider "similar".
  • If a clustering algorithm fails to find clusters that are meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found.
  • In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in $\mathbb{R}^n$, learns a distance metric over $\mathbb{R}^n$ that respects these relationships.

  3. Introduction
  • Many learning and data-mining algorithms, such as K-means, nearest-neighbors, and SVMs, depend on being given a good metric that reflects reasonably well the important relationships between the data.
  • This problem is particularly acute in unsupervised settings such as clustering.
  • One important family of algorithms that learn metrics are the unsupervised ones that take an input dataset and find an embedding of it in some space, such as:
    • Multidimensional Scaling (MDS)
    • Locally Linear Embedding (LLE)
  • Similar to Principal Components Analysis (PCA), these methods have a "no right answer" problem: without side-information there is no single correct embedding.

  4. Introduction
  • For clustering with similarity information, where certain pairs are known to be "similar" or "dissimilar", these methods search for a clustering that puts the similar pairs into the same cluster and the dissimilar pairs into different clusters.
  • This gives a way of using similarity side-information to find clusters that reflect a user's notion of meaningful clusters.

  5. Learning Distance Metrics
  • Suppose we have some set of points $\{x_i\}_{i=1}^m \subseteq \mathbb{R}^n$, and are given information that certain pairs of them are "similar": $(x_i, x_j) \in S$ if $x_i$ and $x_j$ are similar.
  • How can we learn a distance metric $d(x, y)$ between points $x$ and $y$ that respects this; specifically, so that "similar" points end up close to each other?
  • Consider learning a distance metric of the form $d(x, y) = d_A(x, y) = \|x - y\|_A = \sqrt{(x - y)^{\top} A (x - y)}$.

  6. Learning Distance Metrics
  • For $d_A$ to be a metric (satisfying non-negativity and the triangle inequality), we require that $A$ be positive semi-definite, $A \succeq 0$.
  • Setting $A = I$ gives Euclidean distance.
  • If we restrict $A$ to be diagonal, this corresponds to learning a metric in which the different axes are given different "weights".
  • More generally, $A$ parameterizes a family of Mahalanobis distances over $\mathbb{R}^n$.
  • Learning such a distance metric is also equivalent to finding a rescaling of the data that replaces each point $x$ with $A^{1/2}x$ and applying the standard Euclidean metric to the rescaled data (see the check below).
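A quick numerical check of this equivalence (a minimal NumPy sketch, not part of the original slides): the $A$-distance between two points equals the ordinary Euclidean distance after mapping each point through $A^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

B = rng.normal(size=(3, 3))
A = B.T @ B + 1e-3 * np.eye(3)        # a positive definite A (so A >= 0 holds)

# ||x - y||_A computed directly from the definition
d_A = np.sqrt((x - y) @ A @ (x - y))

# Matrix square root A^{1/2} via eigendecomposition, then plain Euclidean distance
w, V = np.linalg.eigh(A)
A_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
d_euclid = np.linalg.norm(A_half @ x - A_half @ y)

assert np.isclose(d_A, d_euclid)      # the two distances agree
```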

  7. Learning Distance Metrics
  • A simple way of defining a criterion for the desired metric is to demand that pairs of points $(x_i, x_j)$ in $S$ have, say, small squared distance between them: minimize $\sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2$.
  • This is trivially solved with $A = 0$, which is not useful, so we add the constraint $\sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \ge 1$ to ensure that $A$ does not collapse the dataset into a single point.
  • Here, $D$ can be a set of pairs of points known to be "dissimilar"; if none is given explicitly, it can be taken to be all pairs not in $S$.

  8. Learning Distance Metrics
  • This gives the optimization problem:
    $\min_A \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2$
    s.t. $\sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \ge 1$, $A \succeq 0$.
  • The optimization problem is convex, which enables us to derive efficient, local-minima-free algorithms to solve it (a solver-based prototype is sketched below).
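The slides give no code, but the convex program above can be prototyped with an off-the-shelf solver. Below is a sketch using cvxpy on hypothetical toy data; the paper's own specialized algorithms (next slides) avoid a generic solver. Note that $\|x_i - x_j\|_A$ is the square root of a function linear in $A$, hence concave in $A$, so the "sum of distances $\ge 1$" constraint is convex.

```python
import cvxpy as cp
import numpy as np

# Hypothetical toy data: rows are points in R^2; S / D index similar / dissimilar pairs
X = np.array([[0.0, 0.0], [0.1, 1.0], [3.0, 0.1], [3.1, 1.1]])
S = [(0, 1), (2, 3)]                  # similar pairs
D = [(0, 2), (1, 3)]                  # dissimilar pairs

A = cp.Variable((2, 2), PSD=True)     # enforce A >= 0 (positive semi-definite)

def sq_dist(i, j):
    d = X[i] - X[j]
    # (x_i - x_j)^T A (x_i - x_j), written as a function affine in A
    return cp.sum(cp.multiply(A, np.outer(d, d)))

objective = cp.Minimize(sum(sq_dist(i, j) for i, j in S))
# sqrt of an affine expression is concave, so this ">= 1" constraint is convex
constraints = [sum(cp.sqrt(sq_dist(i, j)) for i, j in D) >= 1]

cp.Problem(objective, constraints).solve()
print(A.value)
```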


  10. Note. Newton-Raphson Method: to minimize a twice-differentiable function $g(\theta)$, iterate $\theta \leftarrow \theta - \alpha \, (\nabla^2 g(\theta))^{-1} \nabla g(\theta)$ until convergence ($\alpha = 1$ gives the classical update).
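As a toy illustration of that update (not from the slides), a scalar Newton-Raphson minimizer:

```python
# Newton-Raphson for minimizing a 1-D function g: theta <- theta - g'(theta) / g''(theta)
def newton_1d(grad, hess, theta0, iters=20, tol=1e-10):
    theta = theta0
    for _ in range(iters):
        step = grad(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Minimize g(theta) = (theta - 3)^2 + 1: gradient 2(theta - 3), Hessian 2
print(newton_1d(lambda t: 2 * (t - 3), lambda t: 2.0, theta0=0.0))  # -> 3.0
```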


  12. Learning Distance Metrics
  • The case of diagonal $A$:
  • In the case that we want to learn a diagonal $A = \mathrm{diag}(A_{11}, A_{22}, \ldots, A_{nn})$, we can derive an efficient algorithm using Newton-Raphson. Define
    $g(A) = g(A_{11}, \ldots, A_{nn}) = \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2 - \log \Big( \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \Big)$
  • Newton-Raphson efficiently optimizes $g$ with the update $A \leftarrow A - \alpha \, (\nabla^2 g)^{-1} \nabla g$, where $\alpha$ is a step-size parameter optimized via a line-search to give the largest downhill step subject to $A_{ii} \ge 0$ (see the sketch below).
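A minimal NumPy sketch of the diagonal case, assuming an analytic gradient, a finite-difference Hessian, and a backtracking line-search that keeps $A_{ii} \ge 0$; the names (g_diag, grad_g, learn_diag_metric) are illustrative, not the authors' code.

```python
import numpy as np

def g_diag(a, S_pairs, D_pairs):
    """g(a) = sum_S ||xi - xj||_a^2 - log(sum_D ||xi - xj||_a) for diagonal a."""
    s_term = sum(np.dot(a, (xi - xj) ** 2) for xi, xj in S_pairs)
    d_term = sum(np.sqrt(max(np.dot(a, (xi - xj) ** 2), 1e-12)) for xi, xj in D_pairs)
    return s_term - np.log(max(d_term, 1e-12))

def grad_g(a, S_pairs, D_pairs):
    """Analytic gradient of g with respect to the diagonal entries a."""
    s_grad = sum((xi - xj) ** 2 for xi, xj in S_pairs)
    dists = [np.sqrt(max(np.dot(a, (xi - xj) ** 2), 1e-12)) for xi, xj in D_pairs]
    d_grad = sum(((xi - xj) ** 2) / (2 * d) for (xi, xj), d in zip(D_pairs, dists))
    return s_grad - d_grad / sum(dists)

def learn_diag_metric(X, S, D, iters=50, eps=1e-5):
    """Newton-Raphson on g; Hessian estimated by differencing the gradient."""
    n = X.shape[1]
    S_pairs = [(X[i], X[j]) for i, j in S]
    D_pairs = [(X[i], X[j]) for i, j in D]
    a = np.ones(n)
    for _ in range(iters):
        g0, gr = g_diag(a, S_pairs, D_pairs), grad_g(a, S_pairs, D_pairs)
        H = np.zeros((n, n))
        for k in range(n):                       # finite-difference Hessian
            e = np.zeros(n); e[k] = eps
            H[:, k] = (grad_g(a + e, S_pairs, D_pairs)
                       - grad_g(a - e, S_pairs, D_pairs)) / (2 * eps)
        step = np.linalg.solve(H + 1e-8 * np.eye(n), gr)
        alpha = 1.0                              # line-search keeping a >= 0
        while alpha > 1e-8:
            a_new = np.maximum(a - alpha * step, 0.0)
            if g_diag(a_new, S_pairs, D_pairs) < g0:
                break
            alpha *= 0.5
        else:
            break                                # no downhill step found
        a = a_new
    return a                                     # diagonal entries of the learned A
```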

  13. Learning Distance Metrics
  • The case of full $A$:
  • Newton's method often becomes prohibitively expensive, requiring $O(n^6)$ time to invert the Hessian over the $n^2$ parameters of $A$.
  • Instead, the problem is solved with gradient ascent combined with iterated projections onto the constraint sets (next slide).

  14. Learning Distance Metrics
  • [Figure: the iterative full-$A$ algorithm. Gradient-ascent steps on $g(A) = \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A$, interleaved with alternating projections onto $C_1 = \{A : \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2 \le 1\}$ and $C_2 = \{A : A \succeq 0\}$.]
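A sketch of this projected gradient-ascent scheme, under the assumptions that $C_1$ is a half-space (its defining constraint is linear in $A$, so projection onto it is closed-form) and that projection onto the PSD cone $C_2$ clips negative eigenvalues; the step size and iteration counts are illustrative choices, not the paper's.

```python
import numpy as np

def full_A_metric(X, S, D, alpha=0.1, iters=200):
    """Gradient ascent on g(A) = sum_D ||xi - xj||_A with alternating
    projections onto C1 = {A : sum_S ||xi - xj||_A^2 <= 1} and the PSD cone."""
    n = X.shape[1]
    A = np.eye(n)
    # C1 is the half-space {A : <A, M> <= 1} with M = sum_S outer(xi - xj, xi - xj)
    M = sum(np.outer(X[i] - X[j], X[i] - X[j]) for i, j in S)

    def project(A):
        for _ in range(100):                       # alternate C1 / C2 projections
            v = np.sum(A * M)                      # Frobenius inner product <A, M>
            if v > 1:                              # closed-form half-space projection
                A = A - ((v - 1) / np.sum(M * M)) * M
            w, V = np.linalg.eigh((A + A.T) / 2)   # PSD projection: clip eigenvalues
            A = V @ np.diag(np.clip(w, 0, None)) @ V.T
            if np.sum(A * M) <= 1 + 1e-6:
                break
        return A

    for _ in range(iters):
        grad = np.zeros((n, n))                    # gradient of sum_D ||xi - xj||_A
        for i, j in D:
            P = np.outer(X[i] - X[j], X[i] - X[j])
            grad += P / (2 * np.sqrt(max(np.sum(A * P), 1e-12)))
        A = project(A + alpha * grad)              # ascent step, then project
    return A
```

After learning, the data can be rescaled with $A^{1/2}$ as in slide 6 and fed to any standard clustering algorithm.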

  15. Experiments and Examples
  • In the experiments with synthetic data, $S$ was a randomly sampled 1% of all pairs of similar points.
  • We can use the fact discussed earlier that learning $\|\cdot\|_A$ is equivalent to finding a rescaling of the data, $x \to A^{1/2}x$.

  16. Experiments and Examples: Examples of learned distance metrics

  17. Experiments and Examples: Examples of learned distance metrics

  18. Experiments and Examples: Application to clustering
  • Clustering with side information: we are given $S$, and told that each pair $(x_i, x_j) \in S$ means $x_i$ and $x_j$ belong to the same cluster.
  • Let $\hat{c}_i$ be the cluster to which point $x_i$ is assigned by an automatic clustering algorithm, and let $c_i$ be some "correct" or desired clustering of the data; clusterings are scored by how often the two agree on pairs of points (see the helper below).
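A small scoring helper (hypothetical code, but implementing the pair-counting accuracy described above): two clusterings agree on a pair of points when both put the pair in the same cluster or both put it in different clusters.

```python
import numpy as np

def pair_accuracy(c_hat, c):
    """Fraction of point pairs (i, j), i > j, on which the predicted
    clustering c_hat and the desired clustering c agree about
    same-cluster membership; invariant to relabeling of clusters."""
    c_hat, c = np.asarray(c_hat), np.asarray(c)
    m = len(c)
    agree = sum((c_hat[i] == c_hat[j]) == (c[i] == c[j])
                for i in range(m) for j in range(i))
    return agree / (0.5 * m * (m - 1))

print(pair_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))  # relabeled but identical -> 1.0
```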

  19. Experiments and Examples: Application to clustering

  20. Experiments and Examples: Application to clustering

  21. Experiments and Examples: Application to clustering

  22. Experiments and Examples: Application to clustering

  23. Conclusions
  • We have presented an algorithm that, given examples of similar pairs of points in $\mathbb{R}^n$, learns a distance metric that respects these relationships.
  • The method is based on posing metric learning as a convex optimization problem, which allowed us to derive efficient, local-optima-free algorithms.
  • We also showed examples of diagonal and full metrics learned from simple artificial examples, and demonstrated on artificial and UCI datasets how our methods can be used to improve clustering performance.
