
Distance Metric

Distance Metric. Measures the dissimilarity between two data points. A metric is a function $d$ of two points $X$ and $Y$ such that $d(X, Y)$ is positive definite (if $X \neq Y$, $d(X, Y) > 0$; if $X = Y$, $d(X, Y) = 0$), symmetric ($d(X, Y) = d(Y, X)$), and satisfies the triangle inequality.

alaire

Presentation Transcript


  1. Distance Metric. Measures the dissimilarity between two data points. A metric is a function $d$ of two points $X$ and $Y$ such that $d(X, Y)$ is positive definite: if $X \neq Y$, $d(X, Y) > 0$, and if $X = Y$, $d(X, Y) = 0$; $d(X, Y)$ is symmetric: $d(X, Y) = d(Y, X)$; and $d(X, Y)$ satisfies the triangle inequality: $d(X, Y) + d(Y, Z) \geq d(X, Z)$.
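The three axioms can be spot-checked numerically on a finite sample of points. The sketch below is only an illustration: `check_metric_axioms`, `manhattan`, and `sample` are hypothetical names introduced here, and passing on a sample does not prove that a function is a metric; it can only expose violations.

```python
from itertools import product

def check_metric_axioms(d, points, tol=1e-12):
    """Return the names of the axioms that d violates on the sample points."""
    violated = set()
    for x, y in product(points, repeat=2):
        # Positive definite: zero exactly when the two points coincide.
        if x == y and abs(d(x, y)) > tol:
            violated.add("positive definite")
        if x != y and d(x, y) <= tol:
            violated.add("positive definite")
        # Symmetric: d(X, Y) = d(Y, X).
        if abs(d(x, y) - d(y, x)) > tol:
            violated.add("symmetric")
    for x, y, z in product(points, repeat=3):
        # Triangle inequality: d(X, Y) + d(Y, Z) >= d(X, Z).
        if d(x, y) + d(y, z) < d(x, z) - tol:
            violated.add("triangle inequality")
    return violated

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

sample = [(0, 0), (2, 1), (6, 4), (6, 1)]
print(check_metric_axioms(manhattan, sample))  # prints set()
```

A function such as the squared difference on the line, by contrast, is reported as violating the triangle inequality on the points 0, 1, 2.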

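  2. Standard Distance Metrics. Minkowski distance or $L_p$ distance: $d_p(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$. Manhattan distance ($p = 1$): $d_1(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$. Euclidean distance ($p = 2$): $d_2(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$. Max distance ($p = \infty$): $d_\infty(X, Y) = \max_{i=1}^{n} |x_i - y_i|$.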

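  3. An Example. A two-dimensional space with $X = (2, 1)$, $Y = (6, 4)$, and $Z$ the right-angle corner of the triangle with hypotenuse $XY$, so that $XZ = 4$ and $ZY = 3$. Manhattan: $d_1(X, Y) = XZ + ZY = 4 + 3 = 7$. Euclidean: $d_2(X, Y) = XY = 5$. Max: $d_\infty(X, Y) = \max(XZ, ZY) = XZ = 4$. Hence $d_1 \geq d_2 \geq d_\infty$. For any positive integer $p$, $d_p \geq d_{p+1}$.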
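The example can be reproduced with a small Minkowski distance function. This is a minimal sketch with unit weights; the name `minkowski` and the use of `math.inf` to select the Max distance are choices made here, not part of the slides.

```python
import math

def minkowski(x, y, p):
    """L_p distance between equal-length coordinate sequences; p may be math.inf."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # Max (chessboard) distance
    return sum(d ** p for d in diffs) ** (1.0 / p)

X, Y = (2, 1), (6, 4)
d1 = minkowski(X, Y, 1)           # Manhattan: 4 + 3
d2 = minkowski(X, Y, 2)           # Euclidean: sqrt(16 + 9)
dmax = minkowski(X, Y, math.inf)  # Max: max(4, 3)
print(d1, d2, dmax)
```

The three values come out as 7, 5, and 4, matching the ordering $d_1 \geq d_2 \geq d_\infty$ for this pair of points.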

  4. HOBbit Similarity. Higher Order Bit (HOBbit) similarity: $\mathrm{HOBbitS}(A, B) = \max \{ s : 0 \leq s \leq m \text{ and } a_i = b_i \text{ for all } 1 \leq i \leq s \}$, where $A$, $B$ are two scalars (integers), $a_i$, $b_i$ are the $i$th bits of $A$ and $B$ (left to right), and $m$ is the number of bits. Example:

     Bit position:  1 2 3 4 5 6 7 8
     x1:            0 1 1 0 1 0 0 1
     y1:            0 1 1 1 1 1 0 1
     x2:            0 1 0 1 1 1 0 1
     y2:            0 1 0 1 0 0 0 0

     HOBbitS(x1, y1) = 3 and HOBbitS(x2, y2) = 4. These notes contain NDSU confidential & proprietary material. Patents pending on bSQ, Ptree technology.

  5. HOBbit Distance (High Order Bifurcation bit). HOBbit distance between two scalar values $A$ and $B$: $d_v(A, B) = m - \mathrm{HOBbitS}(A, B)$. HOBbit distance for points $X$ and $Y$: $d_h(X, Y) = \max_{i=1}^{n} d_v(x_i, y_i)$. Example:

     Bit position:  1 2 3 4 5 6 7 8
     x1:            0 1 1 0 1 0 0 1
     y1:            0 1 1 1 1 1 0 1
     x2:            0 1 0 1 1 1 0 1
     y2:            0 1 0 1 0 0 0 0

     HOBbitS(x1, y1) = 3 and HOBbitS(x2, y2) = 4, so $d_v(x_1, y_1) = 8 - 3 = 5$ and $d_v(x_2, y_2) = 8 - 4 = 4$. In our example (considering 2-dimensional data, $X = (x_1, x_2)$ and $Y = (y_1, y_2)$): $d_h(X, Y) = \max(5, 4) = 5$.
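The HOBbit example can be checked with integer bit operations. The sketch below assumes non-negative integers that fit in `m` bits; the function names are hypothetical, and the XOR-plus-`bit_length` shortcut is one possible way to count matching leading bits, not necessarily how the authors implemented it.

```python
def hobbit_similarity(a, b, m=8):
    """Number of leading bits (out of m) on which integers a and b agree."""
    # a ^ b has its highest set bit at the first position where a and b differ.
    return m - (a ^ b).bit_length()

def hobbit_distance_scalar(a, b, m=8):
    """d_v(A, B) = m - HOBbitS(A, B)."""
    return m - hobbit_similarity(a, b, m)

def hobbit_distance(x, y, m=8):
    """d_h(X, Y): the maximum per-dimension scalar HOBbit distance."""
    return max(hobbit_distance_scalar(a, b, m) for a, b in zip(x, y))

# Bit patterns taken from the example on the slide.
x1, y1 = 0b01101001, 0b01111101
x2, y2 = 0b01011101, 0b01010000

print(hobbit_similarity(x1, y1), hobbit_similarity(x2, y2))            # 3 4
print(hobbit_distance_scalar(x1, y1), hobbit_distance_scalar(x2, y2))  # 5 4
print(hobbit_distance((x1, x2), (y1, y2)))                             # 5
```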

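  6. HOBbit Distance Is a Metric. HOBbit distance is positive definite: if $X = Y$, $d_h(X, Y) = 0$; if $X \neq Y$, $d_h(X, Y) > 0$. HOBbit distance is symmetric: $d_h(X, Y) = d_h(Y, X)$. HOBbit distance satisfies the triangle inequality: $d_h(X, Y) + d_h(Y, Z) \geq d_h(X, Z)$.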

  7. Neighborhood of a Point. The neighborhood of a target point $T$ is a set of points $S$ such that $X \in S$ if and only if $d(T, X) \leq r$. If $X$ is a point on the boundary, $d(T, X) = r$. [Figure: neighborhoods of $T$, each of width $2r$, under Manhattan, Euclidean, Max, and HOBbit distance.]
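The neighborhood definition translates directly into a filter over candidate points. The following sketch is illustrative only: the integer grid, the radius, and all function names are assumptions made here to show how the neighborhood grows from Manhattan to Euclidean to Max distance.

```python
import math

def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def max_distance(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

def neighborhood(target, points, r, d):
    """All points X with d(T, X) <= r."""
    return [x for x in points if d(target, x) <= r]

T, r = (0, 0), 3
grid = [(i, j) for i in range(-3, 4) for j in range(-3, 4)]
sizes = {
    name: len(neighborhood(T, grid, r, d))
    for name, d in [("Manhattan", manhattan),
                    ("Euclidean", euclidean),
                    ("Max", max_distance)]
}
print(sizes)  # {'Manhattan': 25, 'Euclidean': 29, 'Max': 49}
```

Because $d_1 \geq d_2 \geq d_\infty$, every point in the Manhattan neighborhood is also in the Euclidean one, which in turn lies inside the Max neighborhood.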

  8. Decision Boundary. The decision boundary between points $A$ and $B$ is the locus of points $X$ satisfying $d(A, X) = d(B, X)$. The decision boundary for HOBbit distance is perpendicular to the axis that makes the max distance. [Figure: decision boundaries for Manhattan, Euclidean, and Max distance between $A$ and $B$, with regions R1 and R2, drawn for an angle > 45° and for an angle < 45°.]
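Which side of the decision boundary a point falls on depends on the metric. The sketch below uses made-up points `A`, `B`, and `X` (they are not from the slides) to show a point that lies exactly on the Manhattan boundary while being nearer to `B` under the Euclidean and Max distances.

```python
import math

def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def max_distance(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

def nearer(a, b, x, d):
    """Return 'A', 'B', or 'boundary' by comparing d(A, X) with d(B, X)."""
    da, db = d(a, x), d(b, x)
    if math.isclose(da, db):
        return "boundary"
    return "A" if da < db else "B"

A, B, X = (0, 0), (4, 2), (3, 0)  # illustrative points only
sides = {name: nearer(A, B, X, d)
         for name, d in [("Manhattan", manhattan),
                         ("Euclidean", euclidean),
                         ("Max", max_distance)]}
print(sides)  # {'Manhattan': 'boundary', 'Euclidean': 'B', 'Max': 'B'}
```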

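  9. Minkowski Metrics. $L_p$-metrics (aka Minkowski metrics): $d_p(X, Y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}$ (weights $w_i$ assumed $= 1$). [Figure: unit disk boundaries for $p = 1$ (Manhattan), $p = 2$ (Euclidean), $p = 3, 4, \ldots$, $p = \infty$ (chessboard), and $p = 1/2, 1/3, 1/4, \ldots$] $d_{\max} \equiv \max_i |x_i - y_i| = d_\infty \equiv \lim_{p \to \infty} d_p(X, Y)$. Proof (sort of): $\lim_{p \to \infty} \left( \sum_{i=1}^{n} a_i^p \right)^{1/p} = \max(a_i) \equiv b$. For $p$ large enough, the other $a_i^p \ll b^p$, since $y = x^p$ is increasingly convex (it grows ever more steeply as $p$ increases), so $\sum_{i=1}^{n} a_i^p \approx k \, b^p$ ($k$ = multiplicity of $b$ in the sum), so $\left( \sum_{i=1}^{n} a_i^p \right)^{1/p} \approx k^{1/p} \, b$ and $k^{1/p} \to 1$.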
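The limit argument can be observed numerically. The sketch below factors out $b = \max(a_i)$ before raising to the power $p$, which mirrors the proof and also keeps large exponents from overflowing; `lp_distance` is a hypothetical helper with unit weights, not code from the presentation.

```python
def lp_distance(x, y, p):
    """L_p distance computed as b * (sum((a_i / b) ** p)) ** (1 / p)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    b = max(diffs)
    if b == 0:
        return 0.0
    # Each ratio a_i / b lies in [0, 1], so the powers cannot overflow.
    return b * sum((a / b) ** p for a in diffs) ** (1.0 / p)

x, y = (0.5, 0.5), (0.0, 0.0)
for p in (2, 4, 9, 100, 1000):
    print(p, lp_distance(x, y, p))
```

For this pair the maximum 0.5 occurs twice ($k = 2$), so the printed values are $0.5 \cdot 2^{1/p}$ and shrink toward 0.5 as $p$ grows.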

  10. $L_p$ metrics for $p > 1$.

     q      x1    y1    x2    y2    L_q distance x to y

     2      .5    0     .5    0     .7071067812
     4      .5    0     .5    0     .5946035575
     9      .5    0     .5    0     .5400298694
     100    .5    0     .5    0     .503477775
     MAX    .5    0     .5    0     .5

     2      .71   0     .71   0     1.0
     3      .71   0     .71   0     .8908987181
     7      .71   0     .71   0     .7807091822
     100    .71   0     .71   0     .7120250978
     MAX    .71   0     .71   0     .7071067812

     2      .99   0     .99   0     1.4000714267
     8      .99   0     .99   0     1.0796026553
     100    .99   0     .99   0     .9968859946
     1000   .99   0     .99   0     .9906864536
     MAX    .99   0     .99   0     .99

     2      1     0     1     0     1.4142135624
     9      1     0     1     0     1.0800597389
     100    1     0     1     0     1.0069555501
     1000   1     0     1     0     1.0006933875
     MAX    1     0     1     0     1

     2      .9    0     .1    0     .9055385138
     9      .9    0     .1    0     .9000000003
     100    .9    0     .1    0     .9
     1000   .9    0     .1    0     .9
     MAX    .9    0     .1    0     .9

     2      3     0     3     0     4.2426406871
     3      3     0     3     0     3.7797631497
     8      3     0     3     0     3.271523198
     100    3     0     3     0     3.0208666502
     MAX    3     0     3     0     3

     6      90    0     45    0     90.232863532
     9      90    0     45    0     90.019514317
     100    90    0     45    0     90
     MAX    90    0     45    0     90

  11. $L_p$ metrics for $p < 1$.

     q      x1    y1    x2    y2    L_q distance x to y

     1      .1    0     .1    0     .2
     .8     .1    0     .1    0     .238
     .4     .1    0     .1    0     .566
     .2     .1    0     .1    0     3.2
     .1     .1    0     .1    0     102
     .04    .1    0     .1    0     3355443
     .02    .1    0     .1    0     112589990684263
     .01    .1    0     .1    0     1.2676E+29
     2      .1    0     .1    0     .141421356

     1      .5    0     .5    0     1
     .8     .5    0     .5    0     1.19
     .4     .5    0     .5    0     2.83
     .2     .5    0     .5    0     16
     .1     .5    0     .5    0     512
     .04    .5    0     .5    0     16777216
     .02    .5    0     .5    0     5.63E+14
     .01    .5    0     .5    0     6.34E+29
     2      .5    0     .5    0     .7071

     1      .9    0     0.1   0     1
     .8     .9    0     0.1   0     1.098
     .4     .9    0     0.1   0     2.1445
     .2     .9    0     0.1   0     10.82
     .1     .9    0     0.1   0     326.27
     .04    .9    0     0.1   0     10312196.962
     .02    .9    0     0.1   0     341871052443154
     .01    .9    0     0.1   0     3.8E+29
     2      .9    0     0.1   0     .906

     For exponents less than 1, written as $1/p$: $d_{1/p}(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{1/p} \right)^{p}$. For $p = 0$ (the limit as $p \to 0$), $L_p$ does not exist (it does not converge).
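The blow-up for small exponents is easy to reproduce. The sketch below recomputes a few table entries for $x = (.1, .1)$, $y = (0, 0)$ with a hypothetical helper `lq_distance`. It also checks one triangle-inequality counterexample for $q = 1/2$; that failure is a well-known property of exponents below 1 and is added here as context, since the slide itself does not state it.

```python
def lq_distance(x, y, q):
    """(sum |x_i - y_i| ** q) ** (1 / q) for any exponent q > 0."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

x, y = (0.1, 0.1), (0.0, 0.0)
for q in (1, 0.8, 0.4, 0.2, 0.1, 0.04):
    # For this pair the value equals 0.1 * 2 ** (1 / q), which explodes as q -> 0.
    print(q, lq_distance(x, y, q))

# Triangle inequality check for q = 1/2 on three illustrative points.
X, Y, Z = (0, 0), (1, 0), (1, 1)
direct = lq_distance(X, Z, 0.5)                          # (1 + 1) ** 2 = 4
detour = lq_distance(X, Y, 0.5) + lq_distance(Y, Z, 0.5)  # 1 + 1 = 2
print(direct > detour)  # True: the direct "distance" exceeds the detour
```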

  12. Min dissimilarity function. The $d_{\min}$ function, $d_{\min}(X, Y) = \min_{i=1}^{n} |x_i - y_i|$, is strange. It is not even a pseudo-metric. [Figure: unit disk of $d_{\min}$.] The neighborhood of the blue point relative to the red point (the set of points closer to the blue point than to the red one) is strangely shaped! [Figure: neighborhood of the blue point relative to the red point.] http://www.cs.ndsu.nodak.edu/~serazi/research/Distance.html
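A concrete counterexample shows why $d_{\min}$ fails to be even a pseudo-metric. The three points below are chosen here for illustration: two distinct points can be at distance 0, and a detour through such a point breaks the triangle inequality.

```python
def d_min(x, y):
    """Smallest coordinate-wise absolute difference."""
    return min(abs(a - b) for a, b in zip(x, y))

X, Y, Z = (0, 0), (0, 1), (1, 1)

print(d_min(X, Y))                # 0, although X != Y
print(d_min(X, Y) + d_min(Y, Z))  # 0
print(d_min(X, Z))                # 1, larger than the detour through Y
```

A pseudo-metric may assign distance 0 to distinct points, but it must still satisfy the triangle inequality, and here $d_{\min}(X, Y) + d_{\min}(Y, Z) = 0 < 1 = d_{\min}(X, Z)$.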

  13. Other Interesting Metrics. Canberra metric: $d_c(X, Y) = \sum_{i=1}^{n} |x_i - y_i| / (x_i + y_i)$, a normalized Manhattan distance. Squared chord metric: $d_{sc}(X, Y) = \sum_{i=1}^{n} (\sqrt{x_i} - \sqrt{y_i})^2$ (compare $L_p$ with $p = 1/2$, already discussed). Squared chi-squared metric: $d_{chi}(X, Y) = \sum_{i=1}^{n} (x_i - y_i)^2 / (x_i + y_i)$. Scalar product metric: $d_{sp}(X, Y) = X \cdot Y = \sum_{i=1}^{n} x_i y_i$. Hyperbolic metrics (which map infinite space 1-1 onto a sphere). Which are rotationally invariant? Translation invariant? Other? Some notes on distance functions can be found at http://www.cs.ndsu.NoDak.edu/~datasurg/distance_similarity.pdf
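The first four functions are short enough to sketch directly. Several points here are assumptions rather than content of the slides: coordinates are taken to be non-negative, terms whose denominator $x_i + y_i$ is zero are skipped, the squared chord distance uses square roots of the coordinates (its usual definition, which the transcript appears to have lost), and the function names and sample vectors are invented for illustration.

```python
import math

def canberra(x, y):
    # Terms with x_i + y_i == 0 are skipped (an assumed convention).
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b != 0)

def squared_chord(x, y):
    # Assumes non-negative coordinates so the square roots are real.
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

def squared_chi_squared(x, y):
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)

def scalar_product(x, y):
    return sum(a * b for a, b in zip(x, y))

X, Y = (1, 4), (4, 1)  # illustrative vectors only
print(canberra(X, Y))             # 3/5 + 3/5
print(squared_chord(X, Y))        # (1 - 2)^2 + (2 - 1)^2
print(squared_chi_squared(X, Y))  # 9/5 + 9/5
print(scalar_product(X, Y))       # 1*4 + 4*1
```

Note that the scalar product of a vector with itself is generally non-zero, so it behaves as a similarity score rather than as a distance in the sense of the axioms on the first slide.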
