Abstract

FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. Describes a fast algorithm to map objects into points in some k-dimensional space, such that the dissimilarities are preserved.

megan-kirk

Presentation Transcript


  1. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets

  2. Abstract • We describe a fast algorithm to map objects into points in some k-dimensional space, such that the dissimilarities are preserved.

  3. Abstract • Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as “query by example” or “all pairs query”.

  4. Introduction • It is not always easy to design k feature-extraction functions that map objects to k-dimensional points • For instance, for typed English words a distance function is natural (e.g., the number of edits needed to transform one string into the other), but it is not clear what the features should be

  5. Solutions • Old: Multi-Dimensional Scaling (MDS) • Unsuitable for indexing • Proposed: FastMap • Much faster • Allows indexing
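
As an illustration of a distance function that exists even when no obvious features do, here is a minimal sketch of the edit (Levenshtein) distance between two strings. The slide does not give an implementation; the function name `edit_distance` and the unit cost for every operation are assumptions made for this example.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to transform string s into string t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]
```

A function like this can serve as the distance D(*, *) handed to a mapping algorithm, with no feature vectors required.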

  6. Applications • Image and multimedia databases • Medical databases

  7. Applications • String databases, e.g. OCR • Time series, e.g. financial data

  8. Applications • Data mining and visualization applications

  9. Desirable types of queries • Query-by-example: search a collection of objects to find the ones that are within a user-defined distance ε from the query object • All-pairs query: find the pairs of objects that are within distance ε of each other
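
The two query types can be written down directly as brute-force scans, which is the baseline that a spatial access method is meant to beat. This is only a sketch: the function names, the `eps` parameter for the distance threshold, and the use of `<=` for "within" are assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def query_by_example(objects: Sequence[T], query: T,
                     dist: Callable[[T, T], float], eps: float) -> List[T]:
    """Return every object within distance eps of the query object."""
    return [o for o in objects if dist(o, query) <= eps]

def all_pairs(objects: Sequence[T],
              dist: Callable[[T, T], float], eps: float) -> List[Tuple[T, T]]:
    """Return every unordered pair of objects within distance eps of each other."""
    return [(a, b) for a, b in combinations(objects, 2) if dist(a, b) <= eps]
```

The first scan costs N distance computations per query and the second costs N(N-1)/2, which is what motivates mapping objects to points and indexing them.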

  10. Benefit of mapping objects • Accelerate the search time for queries, by employing SAMs like R*-trees and z-ordering • Help with visualization, clustering and data-mining

  11. Ideal mapping fulfills… • Fast to compute: O(N) or O(N log N), but not O(N^2) • Preserves distances with little discrepancy • Very fast to map a new object

  12. MDS • Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information • Map objects to a k-dimensional space, so as to minimize the stress function

  13. MDS • Stress function • It measures the average relative discrepancy between the distances of the "images" and the actual distances.

  14. Drawbacks of MDS • Requires O(N^2) time, which is impractical for large databases • Fast retrieval is questionable, as MDS is not designed for "query-by-example" operations
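
The slide does not spell out the formula. A commonly used form, assumed here, is stress = sqrt( Σ (d̂_ij − d_ij)^2 / Σ d_ij^2 ), where d_ij is the original dissimilarity and d̂_ij the Euclidean distance between the images. The sketch below computes that quantity; the function name and argument layout are illustrative.

```python
import math
from itertools import combinations
from typing import List, Sequence

def stress(points: Sequence[Sequence[float]],
           dist_matrix: List[List[float]]) -> float:
    """Relative error between image distances and original dissimilarities.

    points[i] is the image of object i in the target space;
    dist_matrix[i][j] is the original dissimilarity between objects i and j.
    Assumes at least one non-zero original dissimilarity.
    """
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(points)), 2):
        d_hat = math.dist(points[i], points[j])  # distance between the images
        d = dist_matrix[i][j]                    # original dissimilarity
        num += (d_hat - d) ** 2
        den += d ** 2
    return math.sqrt(num / den)
```

A stress of 0 means every pairwise distance is reproduced exactly; larger values mean more distortion.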

  15. Definitions • The k-d point P_i that corresponds to object O_i is called the ‘image’ of O_i; that is, P_i = (x_{i,1}, x_{i,2}, …, x_{i,k}) • The k-d space containing the ‘images’ is called the target space

  16. Proposed algorithm • Assumption: a domain expert has provided us only with a distance/dissimilarity function D (*, *) • For instance, D could be the Euclidean distance between the feature vectors of two objects

  17. Proposed algorithm • Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points onto k mutually orthogonal directions • The challenge is to compute these projections from the distance matrix only

  18. Proposed algorithm • Project the objects onto a carefully selected “line” • Choose two objects O_a and O_b as “pivot objects”; the line passes through them

  19. Proposed algorithm • Compute the projection of each object onto the line through the pivots, using only the information we know, i.e., the distances between objects

  20. Proposed algorithm • [Figure: triangle O_a O_i O_b; x_i is the projection of O_i onto the line O_a O_b, measured from O_a]

  21. Proposed algorithm • By the cosine law, in any triangle O_a O_i O_b: d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 - 2 x_i d_{a,b} • d_{i,j} is shorthand for the distance D (O_i, O_j)

  22. Proposed algorithm • Solving for x_i: x_i = (d_{a,i}^2 + d_{a,b}^2 - d_{b,i}^2) / (2 d_{a,b}) • We can thus map objects into points on a line, preserving some of the distance information
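
The projection formula needs only three distances, so it translates to a one-line function. The sketch below is illustrative: the name `project` and its argument order are assumptions, and it requires the two pivots to be distinct (non-zero distance between them).

```python
def project(d_ai: float, d_bi: float, d_ab: float) -> float:
    """Coordinate of object O_i on the line through pivots O_a and O_b.

    d_ai, d_bi: distances from O_i to each pivot; d_ab: distance between pivots.
    Follows from the cosine law: d_bi^2 = d_ai^2 + d_ab^2 - 2 * x_i * d_ab.
    """
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2.0 * d_ab)
```

As a check, take O_a = (0, 0), O_b = (4, 0) and O_i = (1, 2): the squared distances are 5, 13 and 16, so x_i = (5 + 16 - 13) / 8 = 1, which is the x-coordinate of O_i. The pivots themselves land at 0 and d_ab.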

  23. Proposed algorithm • This solves the problem for one dimension (a line) • Extend to 2-d and higher dimensions by repeating the projection on the distances that remain after removing the pivot-line component

  24. Proposed algorithm • Each of the k recursive calls determines the coordinates of the N objects on a new axis • The “pivot objects” of each recursive call are recorded to facilitate queries • Pivot objects are chosen by a heuristic algorithm

  25. Proposed algorithm • All steps are linear in N • Complexity is O(N k)
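
The steps above can be sketched end to end as follows. This is a hedged illustration, not the authors' code: it is written iteratively rather than recursively, it assumes the residual distance after h axes is D_h(i, j)^2 = D(i, j)^2 - Σ_{c<h} (x_{i,c} - x_{j,c})^2, and it assumes a pivot heuristic that repeatedly jumps to the farthest object. The names `fastmap` and `pivot_iters` are made up for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def fastmap(objects: Sequence[T], dist: Callable[[T, T], float],
            k: int, pivot_iters: int = 5
            ) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Map each object to a k-d point; also return the pivot pair used per axis."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots: List[Tuple[int, int]] = []

    def d2(i: int, j: int, h: int) -> float:
        # Squared distance left over after removing the first h axes.
        r = dist(objects[i], objects[j]) ** 2
        for c in range(h):
            r -= (coords[i][c] - coords[j][c]) ** 2
        return max(r, 0.0)  # clamp: rounding or non-Euclidean input can go negative

    for h in range(k):
        # Heuristic pivot choice (assumed): start anywhere, jump to the farthest.
        b = 0
        for _ in range(pivot_iters):
            a = max(range(n), key=lambda i: d2(i, b, h))
            b = max(range(n), key=lambda i: d2(i, a, h))
        dab2 = d2(a, b, h)
        if dab2 == 0.0:
            break  # nothing left to separate; remaining coordinates stay 0
        pivots.append((a, b))
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection onto the line through the two pivots.
            coords[i][h] = (d2(i, a, h) + dab2 - d2(i, b, h)) / (2.0 * dab)
    return coords, pivots
```

For a fixed `pivot_iters`, each axis calls `dist` a number of times proportional to N, so the sketch uses on the order of N·k distance computations. The recorded pivots are what would let a new query object be projected onto the same axes with two distance computations per axis.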

  26. Experiments • Compare FastMap with MDS • speed and quality • Illustrate the visualization and clustering abilities • real and synthetic datasets

  27. Comparison with MDS • Response time vs. database size N

  28. Comparison with MDS • Response time vs. number of dimensions k

  29. Comparison with MDS • Response time vs. stress

  30. Clustering/visualization properties of FastMap

  31. Clustering/visualization properties of FastMap

  32. Conclusion • A fast algorithm to map objects into points in k-d space • Accelerates searching through highly optimized SAMs, e.g. R-trees, R*-trees, etc. • Applications of the algorithm include multimedia databases, data mining, clustering and document retrieval

  33. References • Christos Faloutsos and King-Ip (David) Lin, "FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets" • Joseph B. Kruskal and Myron Wish, "Multidimensional Scaling"
