This talk surveys metric spaces and their embeddings, exploring various embedding methods and notions of distortion. Applications span data representation, network communication, and algorithmic design. It covers the tradeoffs between distortion and dimensionality, algorithms for quality embeddings, and local embedding techniques using the Lovász Local Lemma, as well as doubling embeddings in normed spaces and dimension reduction into a constant dimension.
UCLA IPAM 07 • Advances in Metric Embedding Theory • Yair Bartal, Hebrew University & Caltech
Metric Spaces • Metric space: (X,d), d: X² → R⁺ • d(u,v) = d(v,u) • d(v,w) ≤ d(v,u) + d(u,w) • d(u,u) = 0 • Data representation: pictures (e.g. faces), web pages, DNA sequences, … • Networks: communication distance
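The axioms above can be checked mechanically on any finite point set. A minimal sketch (the helper name `is_metric` is ours):

```python
import itertools

def is_metric(X, d, tol=1e-9):
    """Check the metric-space axioms on a finite set X."""
    # Identity: d(u,u) = 0
    if any(abs(d(u, u)) > tol for u in X):
        return False
    # Symmetry and non-negativity: d(u,v) = d(v,u) >= 0
    for u, v in itertools.combinations(X, 2):
        if d(u, v) < -tol or abs(d(u, v) - d(v, u)) > tol:
            return False
    # Triangle inequality: d(v,w) <= d(v,u) + d(u,w)
    return all(d(v, w) <= d(v, u) + d(u, w) + tol
               for u, v, w in itertools.permutations(X, 3))
```

For instance, the line metric |a−b| passes, while squared distance (a−b)² fails the triangle inequality.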
Metric Embedding • Simple Representation: Translate metric data into easy to analyze form, gain geometric structure: e.g. embed in low-dimensional Euclidean space • Algorithmic Application: Apply algorithms for a “nice” space to solve problem on “problematic” metric spaces
Embedding Metric Spaces • Metric spaces (X,dX), (Y,dY) • An embedding is a function f: X → Y • For an embedding f and u,v in X, let distf(u,v) = dY(f(u),f(v)) / dX(u,v) • Distortion c = max{u,v ∈ X} distf(u,v) / min{u,v ∈ X} distf(u,v)
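For a finite space this definition computes directly: take the ratio of the largest to the smallest expansion factor over all pairs. A small sketch (function names are ours):

```python
import itertools

def distortion(X, dX, dY, f):
    """max over pairs of dist_f divided by min over pairs of dist_f."""
    ratios = [dY(f(u), f(v)) / dX(u, v)
              for u, v in itertools.combinations(X, 2)]
    return max(ratios) / min(ratios)

# Example: flatten the 4-cycle onto a line by cutting it open.
def d_cycle(a, b, n=4):
    return min(abs(a - b), n - abs(a - b))

c = distortion(range(4), d_cycle, lambda a, b: abs(a - b), lambda x: x)
```

Here the pair (0,3) is adjacent in the cycle but lands at distance 3 on the line, so the distortion is 3.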
Special Metric Spaces • Euclidean space • lp metric in Rn: • Planar metrics • Tree metrics • Ultrametrics • Doubling
Embedding in Normed Spaces • [Fréchet Embedding]: Any n-point metric space embeds isometrically in L∞ • Proof: map x to f(x) = (d(x,w)) over all w ∈ X; then ‖f(x)−f(y)‖∞ = max_w |d(x,w)−d(y,w)| ≤ d(x,y) by the triangle inequality, with equality attained at the coordinate w = y
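The Fréchet embedding is directly implementable: each point gets one coordinate per point of the space. A sketch verifying the isometry (helper names are ours):

```python
import itertools

def frechet(X, d):
    """f(x) = (d(x,w)) for all w in X; an isometry into (R^n, L_inf)."""
    return {x: tuple(d(x, w) for w in X) for x in X}

def linf(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))
```

On any valid finite metric, the L∞ distance between images equals the original distance exactly.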
Embedding in Normed Spaces • [Bourgain 85]: Any n-point metric space embeds in Lp with distortion Θ(log n) • [Johnson-Lindenstrauss 85]: Any n-point subset of Euclidean space embeds with distortion (1+ε) in dimension Θ(ε⁻² log n) • [ABN 06, B 06]: Dimension Θ(log n); in fact Θ*(log n / loglog n)
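The Johnson-Lindenstrauss map itself is just multiplication by a random Gaussian matrix scaled by 1/√k. A self-contained sketch (parameters chosen for illustration, not tuned to the tight Θ(ε⁻² log n) bound; names are ours):

```python
import math
import random
import itertools

def jl_project(points, k, seed=0):
    """Map d-dim points to k dims via a random Gaussian matrix / sqrt(k)."""
    rng = random.Random(seed)
    d = len(points[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [tuple(sum(r * x for r, x in zip(row, p)) for row in R)
            for p in points]

def l2(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

With k large relative to ε⁻² log n, all pairwise distances concentrate within a (1+ε) factor.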
Embedding Metrics in their Intrinsic Dimension • Definition: A metric space X has doubling constant λ if any ball with radius r > 0 can be covered with λ balls of half the radius • Doubling dimension: dim(X) = log λ • [ABN 07b]: Any n-point metric space X can be embedded into Lp with distortion O(log^{1+θ} n) and dimension O(dim(X)) • Same embedding, using: nets, Lovász Local Lemma • Distortion-dimension tradeoff
Average Distortion • Practical measure of the quality of an embedding • Network embedding, multi-dimensional scaling, biology, vision, … • Given a non-contracting embedding f: (X,dX) → (Y,dY): • [ABN 06]: Every n-point metric space embeds into Lp with average distortion O(1), worst-case distortion Θ(log n), and dimension Θ(log n).
The lq-Distortion • lq-distortion: distq(f) = (E[distf(u,v)^q])^{1/q}, expectation over uniformly chosen pairs u,v • [ABN 06]: lq-distortion is bounded by Θ(q)
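Under this definition the lq-distortion interpolates between average distortion (q = 1) and worst-case distortion (q = ∞). A sketch computing it for a finite embedding, with a pair's distortion taken as the larger of its expansion and contraction (names are ours):

```python
import itertools
import math

def lq_distortion(X, dX, dY, f, q):
    """(E[dist_f(u,v)^q])^(1/q) over uniformly chosen pairs u, v."""
    dists = []
    for u, v in itertools.combinations(X, 2):
        r = dY(f(u), f(v)) / dX(u, v)
        dists.append(max(r, 1.0 / r))       # pair distortion dist_f(u,v)
    if q == math.inf:
        return max(dists)                   # worst-case distortion
    return (sum(t ** q for t in dists) / len(dists)) ** (1.0 / q)
```

On the cut-open 4-cycle from before, five pairs have distortion 1 and one pair has distortion 3, so the l2-distortion is √(14/6) ≈ 1.53 while the l∞-distortion is 3.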
Dimension Reduction into Constant Dimension • [B 07]: Any finite subset of Euclidean space embeds in dimension h with lq-distortion e^{O(q/h)} ≈ 1 + O(q/h) • Corollary: Every finite metric space embeds into Lp in dimension h with lq-distortion
Local Embeddings • Def: A k-local embedding has distortion D(k) if for every pair of k-nearest neighbors x,y: distf(x,y) ≤ D(k) • [ABN 07c]: For fixed k, a k-local embedding into Lp with distortion Θ(log k) and dimension Θ(log k) (under a very weak growth bound condition) • [ABN 07c]: A k-local embedding into Lp with distortion Õ(log k) on neighbors, for all k simultaneously, and dimension Θ(log n) • Same embedding method • Lovász Local Lemma
Local Dimension Reduction • [BRS 07]: For fixed k, any finite set of points in Euclidean space has a k-local embedding with distortion (1+ε) in dimension Θ(ε⁻² log k) (under a very weak growth bound condition) • New embedding ideas • Lovász Local Lemma
Metric Ramsey Problem • Given a metric space, what is the largest subspace with some special structure, e.g. close to Euclidean? • Graph theory: Every graph on n vertices contains either a clique or an independent set of size Θ(log n) • Dvoretzky's theorem… • [BFM 86]: Every n-point metric space contains a subspace of size Ω(c(ε) log n) which embeds in Euclidean space with distortion (1+ε)
Basic Structures: Ultrametric, k-HST [B 96] • In a k-HST each internal node v carries a label Δ(v); labels drop by a factor of at least k along every root-leaf path (children of v have label at most Δ(v)/k), and leaves have label 0 • The tree distance is d(x,z) = Δ(lca(x,z)) • An ultrametric k-embeds in a k-HST (moreover this can be done so that labels are powers of k).
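A k-HST is easy to represent directly: a rooted tree with a label on each internal node, distances read off the least common ancestor. A minimal sketch (class and function names are ours):

```python
class Node:
    """An HST node: internal nodes carry a label, leaves carry a point name."""
    def __init__(self, label=0, point=None, children=()):
        self.label, self.point, self.children = label, point, children

def path_to(node, point):
    """Root-to-leaf path ending at the leaf holding `point`, or None."""
    if node.point == point:
        return [node]
    for c in node.children:
        p = path_to(c, point)
        if p:
            return [node] + p
    return None

def hst_dist(root, x, y):
    """In an HST, d(x,y) = label of lca(x,y); leaves have label 0."""
    lca = root
    for a, b in zip(path_to(root, x), path_to(root, y)):
        if a is b:
            lca = a
        else:
            break
    return lca.label
```

In the example below, labels 9 and 3 decrease by a factor of 3, so the tree is a 3-HST.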
Hierarchically Well-Separated Trees • [Figure: a k-HST whose internal labels Δ1, Δ2, Δ3 decrease by a factor of k at each level]
Properties of Ultrametrics • An ultrametric is a tree metric • Ultrametrics embed isometrically in l2 • [BM 04]: Any n-point ultrametric (1+ε)-embeds in lp^d, where d = O(ε⁻² log n).
A Metric Ramsey Phenomenon • Consider n equally spaced points on the line • Choose a "Cantor-like" set of points, and construct a binary tree over them • The resulting tree is a 3-HST, and the original subspace embeds in this tree with distortion 3 • Size of subspace: n^{log₃ 2}.
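This construction is easy to reproduce: keep the first and last thirds recursively, and read the ultrametric off the smallest aligned triadic interval containing both points. A sketch (helper names are ours) checking the distortion-3 bound:

```python
import itertools

def cantor(k):
    """Cantor-like subset of {0, ..., 3^k - 1}: keep outer thirds recursively."""
    pts = [0]
    for j in range(k):
        pts = pts + [p + 2 * 3 ** j for p in pts]
    return sorted(pts)

def du(x, y):
    """Ultrametric: size of smallest aligned triadic interval containing both."""
    if x == y:
        return 0
    D = 1
    while x // D != y // D:
        D *= 3
    return D
```

For k = 2 the subset is {0, 2, 6, 8}; the ultrametric only expands distances, by a factor strictly below 3.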
Metric Ramsey Phenomena • [BLMN 03, MN 06, B 06]: Any n-point metric space contains a subspace of size n^{1−ε} which embeds in an ultrametric with distortion Θ(1/ε) • [B 06]: Any n-point metric space contains a subspace of linear size which embeds in an ultrametric with lq-distortion bounded by Õ(q)
Metric Ramsey Theorems • Key Ingredient:Partitions
Complete Representation via Ultrametrics? • Goal: Given an n-point metric space, we would like to embed it into an ultrametric with low distortion • Lower bound: Ω(n); in fact this holds even for embedding the n-cycle into arbitrary tree metrics [RR 95]
Probabilistic Embedding • [Karp 89]: The n-cycle probabilistically embeds into the n line spaces obtained by deleting one edge uniformly at random, with distortion 2 • If u,v are adjacent in the cycle C then E[dL(u,v)] = (n−1)/n · 1 + 1/n · (n−1) = 2(n−1)/n < 2 = 2dC(u,v)
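The calculation above extends to every pair by enumerating which edge is deleted; a sketch verifying that each pair's expected stretch stays below 2 (names are ours):

```python
import itertools

def cycle_dist(u, v, n):
    return min(abs(u - v), n - abs(u - v))

def expected_line_dist(u, v, n):
    """Delete each of the n cycle edges in turn; average the line distance."""
    total = 0
    for e in range(n):                      # delete edge (e, e+1 mod n)
        pos = lambda x: (x - (e + 1)) % n   # positions along the opened cycle
        total += abs(pos(u) - pos(v))
    return total / n
```

For a pair at cycle-distance k the expectation works out to 2k(n−k)/n, which matches 2(n−1)/n for adjacent pairs and is always below 2k.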
Probabilistic Embedding • [B 96, 98, 04, FRT 03]: Any n-point metric space probabilistically embeds into an ultrametric with distortion Θ(log n) • [ABN 05, 06, CDGKS 05]: lq-distortion is Θ(q)
Probabilistic Embedding • Key Ingredient:Probabilistic Partitions
Probabilistic Partitions • P = {S1, S2, …, St} is a partition of X; P(x) is the cluster containing x • P is Δ-bounded if diam(Si) ≤ Δ for all i • A probabilistic partition P is a distribution over a set of partitions • P is (η,δ)-padded if Pr[B(x, ηΔ) ⊆ P(x)] ≥ δ • Call P η-padded if δ = 1/2 • [B 96]: η = Θ(1/log n) • [CKR 01 + FRT 03, ABN 06]: η(x) = Ω(1/log ρ(x,Δ))
Partitions and Embedding • [B 96, Rao 99, …] • Let Δi = 4^i be the scales • For each scale i, create a probabilistic Δi-bounded partition Pi that is η-padded • For each cluster choose σi(S) ~ Ber(½) i.i.d.; fi(x) = σi(Pi(x))·d(x, X∖Pi(x)) • Repeat O(log n) times • Distortion: O(η⁻¹·log^{1/p} Δ) • Dimension: O(log n·log Δ)
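A simplified version of this scheme can be coded directly, using CKR-style random ball carving for the Δ-bounded partitions (this is a sketch of the general framework, not the tuned construction from the talk; all names are ours). Each coordinate σ(P(x))·d(x, X∖P(x)) is 1-Lipschitz by the triangle inequality, which the test checks:

```python
import random
import itertools

def ckr_partition(X, d, Delta, rng):
    """Delta-bounded partition: random center order, one random radius."""
    order = list(X)
    rng.shuffle(order)
    r = rng.uniform(Delta / 4, Delta / 2)   # cluster diameter <= 2r <= Delta
    return {x: next(c for c in order if d(x, c) <= r) for x in X}

def embed(X, d, scales, reps=3, seed=0):
    """Concatenate sigma(P(x)) * d(x, X \\ P(x)) over scales and repetitions."""
    rng = random.Random(seed)
    coords = {x: [] for x in X}
    for _ in range(reps):
        for Delta in scales:
            P = ckr_partition(X, d, Delta, rng)
            sigma = {c: rng.random() < 0.5 for c in set(P.values())}
            for x in X:
                out = [d(x, y) for y in X if P[y] != P[x]]
                fx = min(out + [Delta]) if sigma[P[x]] else 0.0
                coords[x].append(fx)
    return coords
```

The hard part of the analysis is the lower bound, which needs the padding property; the Lipschitz upper bound holds coordinate-by-coordinate regardless of the random choices.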
Uniform Probabilistic Partitions • In a uniform probabilistic partition, η: X → [0,1] assigns all points in a cluster the same padding parameter • [ABN 06] (uniform partition lemma): There exists a uniform probabilistic Δ-bounded partition such that for any cluster, η(x) = log⁻¹ ρ(v,Δ), where ρ(x,r), the local growth rate of x at radius r, is the ratio of the sizes of the balls around x at radius r and at a proportionally smaller radius
Embedding into a Single Dimension • Let Δi = 4^i • For each scale i, create uniformly padded probabilistic Δi-bounded partitions Pi • For each cluster choose σi(S) ~ Ber(½) i.i.d.; fi(x) = σi(Pi(x))·ηi⁻¹(x)·d(x, X∖Pi(x)) • Upper bound: |f(x)−f(y)| ≤ O(log n)·d(x,y) • Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y)) • Replicate D = Θ(log n) times to get high probability.
Upper Bound: |f(x)−f(y)| ≤ O(log n)·d(x,y) • For all x,y ∈ X: Pi(x) ≠ Pi(y) implies fi(x) ≤ ηi⁻¹(x)·d(x,y); Pi(x) = Pi(y) implies fi(x)−fi(y) ≤ ηi⁻¹(x)·d(x,y) • Use uniform padding in the cluster
Lower Bound • Take a scale i such that Δi ≈ d(x,y)/4 • It must be that Pi(x) ≠ Pi(y) • With probability ½: ηi⁻¹(x)·d(x, X∖Pi(x)) ≥ Δi
Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y)) • Let R denote the contribution of the remaining scales; two cases: • R < Δi/2: with prob. ⅛, σi(Pi(x)) = 1 and σi(Pi(y)) = 0; then fi(x) ≥ Δi, fi(y) = 0, so |f(x)−f(y)| ≥ Δi/2 = Ω(d(x,y)) • R ≥ Δi/2: with prob. ¼, σi(Pi(x)) = 0 and σi(Pi(y)) = 0; then fi(x) = fi(y) = 0, so |f(x)−f(y)| ≥ Δi/2 = Ω(d(x,y)).
Partial Embedding & Scaling Distortion • Definition: A (1−ε)-partial embedding has distortion D(ε) if at least a 1−ε fraction of the pairs satisfy distf(u,v) ≤ D(ε) • Definition: An embedding has scaling distortion D(·) if it is a (1−ε)-partial embedding with distortion D(ε), for all ε > 0 • [KSW 04] • [ABN 05, CDGKS 05]: Partial distortion and dimension Θ(log(1/ε)) • [ABN 06]: Scaling distortion Θ(log(1/ε)) for all metrics
lq-Distortion vs. Scaling Distortion • Upper bound D(ε) = c·log(1/ε) on scaling distortion: • ½ of the pairs have distortion ≤ c·log 2 = c • + ¼ of the pairs have distortion ≤ c·log 4 = 2c • + ⅛ of the pairs have distortion ≤ c·log 8 = 3c • … • Average distortion = O(1) • Worst-case distortion = O(log n) • lq-distortion = O(min{q, log n})
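The geometric series above can be checked numerically: giving the worst 2⁻ʲ fraction of pairs distortion c·j sums to Σⱼ 2⁻ʲ·c·j = 2c, a constant average. A quick check with c = 1:

```python
# Fraction 2^-j of the pairs has distortion at most c*j (here c = 1);
# the weighted sum converges to 2c, i.e. average distortion O(1).
avg = sum(2 ** -j * j for j in range(1, 60))
assert abs(avg - 2.0) < 1e-9
```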
Coarse Scaling Embedding into Lp • Definition: For u ∈ X, rε(u) is the minimal radius such that |B(u, rε(u))| ≥ εn • Coarse scaling embedding: for each u ∈ X, preserves distances to every v s.t. d(u,v) ≥ rε(u).
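For a finite space, rε(u) is just an order statistic of the distances from u; a sketch (the name `r_eps` is ours):

```python
import math

def r_eps(u, X, d, eps):
    """Minimal radius r with |B(u,r)| >= eps * n (closed balls, u included)."""
    dists = sorted(d(u, v) for v in X)
    k = max(1, math.ceil(eps * len(X)))
    return dists[k - 1]
```

For ten equally spaced points on a line and ε = 0.3, the ball around the endpoint needs radius 2 to capture three points.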
Scaling Distortion • Claim: If d(x,y) ≥ rε(x) then 1 ≤ distf(x,y) ≤ O(log 1/ε) • Let l be the scale with d(x,y) ≤ Δl < 4d(x,y) • Lower bound: E[|f(x)−f(y)|] ≥ d(x,y) • Upper bound for high-diameter terms • Upper bound for low-diameter terms • Replicate D = Θ(log n) times to get high probability.
Upper Bound for high-diameter terms: |f(x)−f(y)| ≤ O(log 1/ε)·d(x,y) • Scale l such that rε(x) ≤ d(x,y) ≤ Δl < 4d(x,y).
Upper Bound for low-diameter terms: |f(x)−f(y)| = O(1)·d(x,y) • Scale l such that d(x,y) ≤ Δl < 4d(x,y) • All lower levels i ≤ l are bounded by Δi.
Embedding into Trees with Constant Average Distortion • [ABN 07a]: An embedding of any n-point metric into a single ultrametric; an embedding of any graph on n vertices into a spanning tree of the graph • Average distortion = O(1) • L2-distortion = • Lq-distortion = Θ(n^{1−2/q}), for 2 < q ≤ ∞
Conclusion • Developing mathematical theory of embedding of finite metric spaces • Fruitful interaction between computer science and pure/applied mathematics • New concepts of embedding yield surprisingly strong properties
Summary • Unified framework for embedding finite metrics • Probabilistic embedding into ultrametrics • Metric Ramsey theorems • New measures of distortion • Embeddings with strong properties: • Optimal scaling distortion • Constant average distortion • Tight distortion-dimension tradeoff • Embedding metrics in their intrinsic dimension • Embeddings that strongly preserve locality.