
Similarities, Distances and Manifold Learning


Presentation Transcript


  1. Similarities, Distances and Manifold Learning Prof. Richard C. Wilson Dept. of Computer Science University of York

  2. Background • Typically objects are characterised by features • Face images • SIFT features • Object spectra • ... • If we measure n features → n-dimensional space • The arena for our problem is an n-dimensional vector space

  3. Background • Example: Eigenfaces • Raw pixel values: n by m gives nm features • Feature space is space of all n by m images

  4. Background • The space of all face-like images is smaller than the space of all images • Assumption is faces lie on a smaller manifold embedded in the global space

  5. Manifold: a space which locally looks Euclidean • Manifold learning: finding the manifold representing the objects we are interested in • All objects should be on the manifold, non-objects outside

  6. Part I: Euclidean Space • Position, Similarity and Distance • Manifold Learning in Euclidean space • Some famous techniques Part II: Non-Euclidean Manifolds • Assessing Data • Nature and Properties of Manifolds • Data Manifolds • Learning some special types of manifolds Part III: Advanced Techniques • Methods for intrinsically curved manifolds Thanks to Edwin Hancock, Eliza Xu, Bob Duin for contributions, and support from the EU SIMBAD project

  7. Part I: Euclidean Space

  8. Position • The main arena for pattern recognition and machine learning problems is a vector space • A set of n well-defined features collected into a vector in ℝn • Also defined are addition of vectors and multiplication by a scalar • Feature vector → position

  9. Similarity • To make meaningful progress, we need a notion of similarity • Inner product: the inner product ⟨x, y⟩ can be considered to be a similarity between x and y

  10. Induced norm • The self-similarity ⟨x, x⟩ is the square of the ‘size’ of x and gives rise to the induced norm, or length, of x: ‖x‖ = √⟨x, x⟩ • Finally, the length of x allows the definition of a distance in our vector space as the length of the vector joining x and y: d(x, y) = ‖x − y‖ • Inner product also gets us distance
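
A few lines of numpy (an illustration of my own, not from the slides) show how the inner product supplies all three quantities:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])

similarity = np.dot(x, y)                  # <x, y>
norm_x = np.sqrt(np.dot(x, x))             # ||x|| = sqrt(<x, x>)
distance = np.sqrt(np.dot(x - y, x - y))   # d(x, y) = ||x - y||

# Sanity check: the same distance comes from the norm of the difference vector
assert np.isclose(distance, np.linalg.norm(x - y))
```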

  11. Euclidean space • If we have a vector space for features, and the usual inner product, all three are connected: similarity is the inner product of positions, and distance follows from d(x, y)² = ⟨x, x⟩ + ⟨y, y⟩ − 2⟨x, y⟩

  12. Non-Euclidean Inner Product • If the inner product has the form ⟨x, y⟩ = xᵀy = Σi xiyi, then the vector space is Euclidean • Note we recover all the expected results for Euclidean space, i.e. ‖x‖² = Σi xi² and d(x, y)² = Σi (xi − yi)² • The inner product doesn’t have to be like this; for example in Einstein’s special relativity, the inner product of spacetime is the Minkowski form, in which the time component enters with the opposite sign: ⟨x, y⟩ = x1y1 + x2y2 + x3y3 − x4y4 (up to sign convention)

  13. The Golden Trio • In Euclidean space, the concepts of position, similarity and distance are elegantly connected: Position X, Similarity K, Distance D

  14. Point position matrix • In a normal manifold learning problem, we have a set of samples X = {x1, x2, ..., xm} • These can be collected together in a matrix X = [x1 x2 ... xm]ᵀ with one point per row • I use this convention, but others may write them vertically (one point per column)

  15. Centreing • A common and important operation is centreing – moving the mean to the origin • Centred points behave better • X̄ = (1/m) J X is the mean matrix, so X − X̄ is the centred matrix • J is the all-ones matrix • This can be done with C: Xc = C X, where C = I − (1/m) J • C is the centreing matrix (and is symmetric, C = Cᵀ)
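
A minimal numpy sketch of the centreing matrix and its use (the function name is mine, not the author's):

```python
import numpy as np

def centreing_matrix(m):
    """C = I - (1/m) J, where J is the m-by-m all-ones matrix."""
    return np.eye(m) - np.ones((m, m)) / m

# Usage: centre a point matrix X with one sample per row
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
C = centreing_matrix(X.shape[0])
Xc = C @ X                                  # subtract the mean row from every row
print(np.allclose(Xc.mean(axis=0), 0.0))    # True: the mean is now at the origin
print(np.allclose(C, C.T))                  # True: C is symmetric
```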

  16. Position-Similarity • The similarity matrix K is defined as Kij = ⟨xi, xj⟩ • From the definition of X, we simply get K = X Xᵀ • The Gram matrix Kc = Xc Xcᵀ is the similarity matrix of the centred points; from the definition of Xc it is Kc = C K Cᵀ, i.e. a centring operation on K • Kc is really a kernel matrix for the points (linear kernel)

  17. Position-Similarity • To go from K to X, we need to consider the eigendecomposition of K: K = U Λ Uᵀ • As long as we can take the square root of Λ (no negative eigenvalues), we can find X as X = U Λ^(1/2)

  18. Kernel embedding First manifold learning method – kernel embedding Finds a Euclidean manifold from object similarities • Embeds a kernel matrix into a set of points in Euclidean space (the points are automatically centred) • K must have no negative eigenvalues, i.e. it is a kernel matrix (Mercer condition)
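
A sketch of kernel embedding along these lines, assuming a symmetric kernel matrix; eigenvalues at or below a small tolerance are simply dropped (a pragmatic choice of mine):

```python
import numpy as np

def kernel_embedding(K, tol=1e-10):
    """Embed a PSD kernel matrix into points X such that K ≈ X X^T."""
    K = 0.5 * (K + K.T)                           # enforce symmetry
    evals, evecs = np.linalg.eigh(K)              # K = U Λ U^T
    keep = evals > tol                            # Mercer condition: keep positive eigenvalues
    return evecs[:, keep] * np.sqrt(evals[keep])  # X = U Λ^(1/2)

# Usage: the linear kernel of a set of centred points is reproduced exactly
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
X -= X.mean(axis=0)
Y = kernel_embedding(X @ X.T)
print(np.allclose(Y @ Y.T, X @ X.T))              # True
```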

  19. Similarity-Distance • We can easily determine the matrix of squared distances Ds from K: Ds,ij = Kii + Kjj − 2Kij
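
In code this conversion is a one-liner (a sketch; the function name is mine):

```python
import numpy as np

def squared_distances_from_kernel(K):
    """Ds_ij = K_ii + K_jj - 2 K_ij."""
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K

# Usage: for a linear kernel this recovers the ordinary squared Euclidean distances
rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))
Ds = squared_distances_from_kernel(X @ X.T)
print(np.allclose(Ds[0, 1], np.sum((X[0] - X[1]) ** 2)))   # True
```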

  20. Similarity-Distance • What about finding K from Ds? Looking at the previous equation, we might imagine that K = −½ Ds is a suitable choice • Not centred; the relationship is actually Kc = −½ C Ds Cᵀ

  21. Classic MDS • Classic Multidimensional Scaling embeds a (squared) distance matrix into Euclidean space • Using what we have so far, the algorithm is simple: form Kc = −½ C Ds Cᵀ, take the eigendecomposition Kc = U Λ Uᵀ, and set X = U Λ^(1/2) • This is MDS
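
Putting the last few slides together, a compact sketch of classic MDS (my own function, assuming the input holds squared distances):

```python
import numpy as np

def classical_mds(Ds, ndim=2):
    """Embed a matrix of squared distances Ds into ndim dimensions."""
    m = Ds.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m
    Kc = -0.5 * C @ Ds @ C.T                    # centred kernel from squared distances
    evals, evecs = np.linalg.eigh(Kc)
    order = np.argsort(evals)[::-1][:ndim]      # keep the largest eigenvalues
    lam = np.clip(evals[order], 0.0, None)      # guard against tiny negative values
    return evecs[:, order] * np.sqrt(lam)       # X = U Λ^(1/2)

# Usage: the embedding reproduces genuinely Euclidean squared distances
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
Ds = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Y = classical_mds(Ds, ndim=3)
Ds_hat = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
print(np.allclose(Ds, Ds_hat))                  # True
```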

  22. The Golden Trio • Kernel embedding takes us from Similarity K to Position X • MDS takes us from Distance D to Position X

  23. Kernel methods • A kernel is a function k(i, j) which computes an inner product • But without needing to know the actual points (the space is implicit) • Using a kernel function we can directly compute K without knowing X

  24. Kernel methods • The implied space may be very high dimensional, but a true kernel will always produce a positive semidefinite K and the implied space will be Euclidean • Many (most?) PR algorithms can be kernelized • Made to use K rather than X or D • The trick is to note that any interesting vector should lie in the space spanned by the examples we are given • Hence it can be written as a linear combination u = Σi αi xi = Xᵀα • Look for α instead of u

  25. Kernel PCA • What about PCA? PCA solves the eigenproblem of the covariance matrix: (1/m) XᵀX u = λ u • Let’s kernelize: substituting u = Xᵀα and premultiplying by X gives (1/m) K² α = λ K α

  26. Kernel PCA • K² has the same eigenvectors as K, so the eigenvectors of PCA are the same as the eigenvectors of K • The eigenvalues of PCA are related to the eigenvalues of K by λPCA = λK / m • Kernel PCA is a kernel embedding with an externally provided kernel matrix
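
A sketch illustrating the equivalence: kernel PCA written as kernel embedding of the centred kernel, checked against ordinary PCA for a linear kernel (the function and test data are mine):

```python
import numpy as np

def kernel_pca(K, ndim=2):
    """Kernel PCA as a kernel embedding of the centred kernel matrix."""
    m = K.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m
    Kc = C @ K @ C.T                                # centre the kernel
    evals, evecs = np.linalg.eigh(Kc)
    order = np.argsort(evals)[::-1][:ndim]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))

# With a linear kernel K = X X^T this agrees (up to sign) with ordinary PCA
rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))
Y_kpca = kernel_pca(X @ X.T, ndim=2)

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_pca = Xc @ Vt[:2].T                               # project onto the top 2 principal axes
print(np.allclose(np.abs(Y_kpca), np.abs(Y_pca)))   # True (columns may differ in sign)
```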

  27. Kernel PCA • So kernel PCA gives the same solution as kernel embedding • The eigenvalues are modified a bit • They are essentially the same thing in Euclidean space • MDS computes a kernel from the distances and then applies kernel embedding • MDS and PCA are essentially the same thing in Euclidean space • Kernel embedding, MDS and PCA all give the same answer for a set of points in Euclidean space

  28. Some useful observations • Your similarity matrix is Euclidean iff it has no negative eigenvalues (i.e. it is a kernel matrix and PSD) • By similar reasoning, your distance matrix is Euclidean iff the similarity matrix derived from it is PSD • If the feature space is small but the number of samples is large, then the covariance matrix is small and it is better to do normal PCA (on the covariance matrix) • If the feature space is large and the number of samples is small, then the kernel matrix will be small and it is better to do kernel embedding
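
A small sketch of the PSD check described above, taking a squared-distance matrix as input (the name and tolerance are my own choices):

```python
import numpy as np

def is_euclidean(Ds, tol=1e-8):
    """A (squared) distance matrix is Euclidean iff the derived centred
    similarity matrix Kc = -1/2 C Ds C^T has no negative eigenvalues."""
    m = Ds.shape[0]
    C = np.eye(m) - np.ones((m, m)) / m
    Kc = -0.5 * C @ Ds @ C.T
    evals = np.linalg.eigvalsh(Kc)
    return bool(evals.min() >= -tol * max(abs(evals).max(), 1.0))

# Usage: distances measured between actual points in a vector space pass the test
rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3))
Ds = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
print(is_euclidean(Ds))                             # True
```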

  29. Part II: Non-Euclidean Manifolds

  30. Non-linear data • Much of the data in computer vision lies in a high-dimensional feature space but is constrained in some way • The space of all images of a face is a subspace of the space of all possible images • The subspace is highly non-linear but low dimensional (described by a few parameters)

  31. Non-linear data • This cannot be exploited by the linear subspace methods like PCA • These assume that the subspace is a Euclidean space as well • A classic example is the ‘swiss roll’ dataset: a two-dimensional sheet rolled up inside three-dimensional space

  32. ‘Flat’ Manifolds • Data like the swiss roll is of a fundamentally different type • The embedding of this data into the high-dimensional space is highly curved • This is called extrinsic curvature, the curvature of the manifold with respect to the embedding space • Now imagine that this manifold was a piece of paper; you could unroll the paper into a flat plane without distorting it • No intrinsic curvature; in fact it is homeomorphic to Euclidean space

  33. Curved manifold • This manifold is different (the surface of a sphere, for example): it must be stretched to map it onto a plane • It has non-zero intrinsic curvature • A flatlander living on this manifold can tell that it is curved, for example by measuring the ratio of the radius to the circumference of a circle • In the first case, we might still hope to find a Euclidean embedding • We can never find a distortion-free Euclidean embedding of the second (in the sense that the distances will always have errors)

  34. Intrinsically Euclidean Manifolds • We cannot use the previous methods on the second type of manifold, but there is still hope for the first • The manifold is embedded in Euclidean space, but Euclidean distance is not the correct way to measure distance • The Euclidean distance ‘shortcuts’ the manifold • The geodesic distance calculates the shortest path along the manifold

  35. Geodesics • The geodesic generalizes the concept of distance to curved manifolds • It is the shortest path joining two points which lies completely within the manifold • If we can correctly compute the geodesic distances, and the manifold is intrinsically flat, we should get Euclidean distances which we can plug into our Euclidean geometry machine (geodesic distances → Distance D → Similarity K → Position X)

  36. ISOMAP • ISOMAP is exactly such an algorithm • Approximate geodesic distances are computed for the points from a graph • Nearest neighbours graph • For neighbours, Euclidean distance ≈ geodesic distance • For non-neighbours, geodesic distance is approximated by the shortest path in the graph • Once we have distances D, we can use MDS to find the Euclidean embedding

  37. ISOMAP • ISOMAP: • Neighbourhood graph • Shortest path algorithm • MDS • ISOMAP is distance-preserving – embedded distances should be close to geodesic distances
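
A sketch of these three steps using scipy's shortest-path routine; it assumes the k-nearest-neighbour graph is connected, and the neighbourhood size is an arbitrary choice:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def isomap(X, n_neighbours=8, ndim=2):
    """ISOMAP sketch: kNN graph -> graph shortest paths -> classic MDS."""
    D = squareform(pdist(X))                        # Euclidean distances
    m = D.shape[0]
    G = np.full((m, m), np.inf)                     # inf marks 'no edge'
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbours + 1]
    for i in range(m):
        G[i, nbrs[i]] = D[i, nbrs[i]]
    G = np.minimum(G, G.T)                          # symmetrise the neighbourhood graph
    geo = shortest_path(G, method="D", directed=False)   # approximate geodesic distances
    # Classic MDS on the squared geodesic distances
    C = np.eye(m) - np.ones((m, m)) / m
    Kc = -0.5 * C @ (geo ** 2) @ C.T
    evals, evecs = np.linalg.eigh(Kc)
    order = np.argsort(evals)[::-1][:ndim]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))
```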

  38. Laplacian Eigenmap • The Laplacian Eigenmap is another graph-based method of embedding non-linear manifolds into Euclidean space • As with ISOMAP, form a neighbourhood graph for the datapoints • Find the graph Laplacian as follows • The adjacency matrix A has Aij = 1 if i and j are neighbours and 0 otherwise (weighted variants are also common) • The ‘degree’ matrix D is the diagonal matrix with Dii = Σj Aij • The normalized graph Laplacian is L = D^(−1/2) (D − A) D^(−1/2)

  39. Laplacian Eigenmap • We find the Laplacian eigenmap embedding using the eigendecomposition of L = U Λ Uᵀ • The embedded positions are given by the eigenvectors in U, taking those with the smallest non-zero eigenvalues • Similar to ISOMAP • Structure preserving, not distance preserving
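
A sketch with a binary, symmetrised kNN adjacency (the weighting and parameters are my assumptions, not prescribed by the slides):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def laplacian_eigenmap(X, n_neighbours=8, ndim=2):
    """Laplacian eigenmap sketch: normalised Laplacian of a kNN graph."""
    Dist = squareform(pdist(X))
    m = Dist.shape[0]
    nbrs = np.argsort(Dist, axis=1)[:, 1:n_neighbours + 1]
    A = np.zeros((m, m))
    for i in range(m):
        A[i, nbrs[i]] = 1.0
    A = np.maximum(A, A.T)                          # symmetrise the adjacency
    deg = A.sum(axis=1)                             # diagonal of the degree matrix
    Dinv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(m) - Dinv_sqrt @ A @ Dinv_sqrt       # normalised graph Laplacian
    evals, evecs = np.linalg.eigh(L)
    # discard the trivial eigenvector (eigenvalue ~ 0); keep the next ndim
    return evecs[:, 1:ndim + 1]
```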

  40. Locally-Linear Embedding • Locally-linear Embedding is another classic method which also begins with a neighbourhood graph • We make point i (in the original data) from a weighted sum of the neighbouring points: xi ≈ Σj Wij xj • Wij is 0 for any point j not in the neighbourhood (and for i = j) • We find the weights by minimising the reconstruction error E(W) = Σi ‖xi − Σj Wij xj‖² • Subject to the constraints that the weights are non-negative and sum to 1 • Gives a relatively simple closed-form solution

  41. Locally-Linear Embedding • These weights encode how well a point j represents a point i and can be interpreted as the adjacency between i and j • A low-dimensional embedding is then found by choosing points yi to minimise the error Σi ‖yi − Σj Wij yj‖² • In other words, we find a low-dimensional embedding which preserves the adjacency relationships • The solution to this embedding problem turns out to be simply the (bottom) eigenvectors of the matrix M = (I − W)ᵀ(I − W) • LLE is scale-free: the final points have the covariance matrix I • Unit scale
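
A sketch of both stages; it enforces only the sum-to-one constraint on the weights, and the regularisation of the local Gram matrix is a common numerical safeguard rather than something stated on the slides:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def lle(X, n_neighbours=8, ndim=2, reg=1e-3):
    """LLE sketch: local reconstruction weights, then the bottom
    eigenvectors of M = (I - W)^T (I - W) give the embedding."""
    m = X.shape[0]
    Dist = squareform(pdist(X))
    nbrs = np.argsort(Dist, axis=1)[:, 1:n_neighbours + 1]
    W = np.zeros((m, m))
    for i in range(m):
        Z = X[nbrs[i]] - X[i]                       # neighbours relative to x_i
        G = Z @ Z.T                                 # local Gram matrix
        G += reg * np.trace(G) * np.eye(len(G))     # regularise for stability
        w = np.linalg.solve(G, np.ones(len(G)))
        W[i, nbrs[i]] = w / w.sum()                 # weights sum to 1
    I = np.eye(m)
    M = (I - W).T @ (I - W)
    evals, evecs = np.linalg.eigh(M)
    return evecs[:, 1:ndim + 1]                     # drop the constant eigenvector
```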

  42. Comparison • LLE might seem like quite a different process to the previous two, but it is actually very similar • We can interpret the process as producing a kernel matrix followed by a scale-free kernel embedding

  43. Comparison • ISOMAP is the only method which directly computes and uses the geodesic distances • The other two depend indirectly on the distances through local structure • LLE is scale-free, so the original distance scale is lost, but the local structure is preserved • Computing the necessary local dimensionality to find the correct nearest neighbours is a problem for all such methods

  44. Non-Euclidean data • Data is Euclidean iff K is psd • Unless you are using a kernel function, this is often not true • Why does this happen?

  45. What type of data do I have? • Starting point: distance matrix • However, we do not know a priori if our measurements are representable on a manifold • We will call them dissimilarities • Our starting point to answer the question “What type of data do I have?” will be a matrix of dissimilarities D between objects • Types of dissimilarities • Euclidean (no intrinsic curvature) • Non-Euclidean, metric (curved manifold) • Non-metric (no point-like manifold representation)

  46. Causes • Example: Chicken pieces data • Distance by alignment • Global alignment of everything could find Euclidean distances • Only local alignments are practical

  47. Causes • Dissimilarities may also be non-metric • The data is metric if it obeys the metric conditions • 1. Dij ≥ 0 (non-negativity) • 2. Dij = 0 iff i = j (identity of indiscernibles) • 3. Dij = Dji (symmetry) • 4. Dij ≤ Dik + Dkj (triangle inequality) • Reasonable dissimilarities should meet conditions 1 and 2
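
A small sketch checking the four conditions for a dissimilarity matrix (function name and tolerance are mine):

```python
import numpy as np

def check_metric(D, tol=1e-12):
    """Report which of the four metric conditions a dissimilarity matrix obeys."""
    off_diag = ~np.eye(len(D), dtype=bool)
    nonnegative = bool((D >= -tol).all())
    identity = np.allclose(np.diag(D), 0.0) and bool((D[off_diag] > tol).all())
    symmetric = bool(np.allclose(D, D.T))
    # triangle inequality: D[i, j] <= D[i, k] + D[k, j] for all i, j, k
    triangle = bool((D[:, None, :] <= D[:, :, None] + D[None, :, :] + tol).all())
    return {"non-negativity": nonnegative, "identity": identity,
            "symmetry": symmetric, "triangle": triangle}

# Usage: Euclidean distances satisfy all four conditions
rng = np.random.default_rng(6)
X = rng.normal(size=(6, 2))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(check_metric(D))
```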

  48. Causes • Symmetry: Dij = Dji • May not be symmetric by definition • Alignment: i→j may find a better solution than j→i

  49. Causes • Triangle violations: Dij ≤ Dik + Dkj • ‘Extended objects’ • Finally, noise in the measurement of D can cause all of these effects
