
SIGMOD 08, June 10th 2008, Vancouver, Canada

Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction. SIGMOD 08, June 10th 2008, Vancouver, Canada. Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl.


Presentation Transcript


  1. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction • SIGMOD 08, June 10th 2008, Vancouver, Canada • Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl

  2. Outline • Introduction • Similarity Search • The Earth Mover’s Distance • Dimensionality Reduction • Dimensionality Reduction for the EMD • Reduction Matrixes • Data-independent Reduction • Data-dependent Reduction • Experimental Results • Conclusion & Outlook

  3. Introduction – Similarity Search • Objective: Find similar objects in a database • Applications: • Medical images, edutainment, engineering, etc. • Requires: • Object feature extraction (here: feature histograms) • Similarity measure (here: Earth Mover’s Distance) • Efficient retrieval technique for similar objects

  4. Introduction – The Earth Mover’s Distance [1] • Transform the features of one object to match those of another object • EMD: minimum total “cost × flow” of this transformation • [Figure: flows between histogram x and histogram y] [1] Rubner, Tomasi, Perceptual Metrics for Image Database Navigation, Kluwer, 2001.
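
The "cost × flow" minimization on this slide can be written as a transportation problem. The formulation below is the standard one from [1], not taken from the slides. It assumes that both histograms $x$ and $y$ have $d$ bins and the same total mass, normalized to 1, so the usual division by the total flow is omitted. $c_{ab}$ denotes the ground cost between bins $a$ and $b$, and $f_{ab}$ the mass moved from bin $a$ of $x$ to bin $b$ of $y$.

```latex
\mathrm{EMD}_C(x, y) \;=\; \min_{F = [f_{ab}]} \; \sum_{a=1}^{d} \sum_{b=1}^{d} c_{ab}\, f_{ab}
\quad \text{subject to} \quad
f_{ab} \ge 0, \qquad
\sum_{b=1}^{d} f_{ab} = x_a \;\; \forall a, \qquad
\sum_{a=1}^{d} f_{ab} = y_b \;\; \forall b
```

The number of flow variables grows quadratically with $d$, which is why the following slides reduce the dimensionality before evaluating the EMD.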

  5. Introduction – Dimensionality Reduction • Challenge for similarity search: high computational complexity at high dimensionalities • Approach: • Reduce dimensionality of query & DB • Filter DB using the lower dimensionality • Refine using the original dimensionality • Filter quality criteria • Selectivity (few refinements) • No false dismissals (lower bound property)
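
The filter-and-refine scheme above can be sketched as a range query. This is only an illustration under stated assumptions: the L1 distance stands in for the EMD, summing adjacent pairs of bins stands in for the reduction, and all names (`range_query`, `aggregate_pairs`, `l1`) are hypothetical. The stand-in filter is a valid lower bound because |(x1 + x2) - (y1 + y2)| <= |x1 - y1| + |x2 - y2|.

```python
def l1(x, y):
    """L1 distance; a stand-in for the (much costlier) exact EMD."""
    return sum(abs(a - b) for a, b in zip(x, y))


def aggregate_pairs(x):
    """Stand-in reduction: merge bins (0, 1) and (2, 3) of a 4-bin histogram."""
    return [x[0] + x[1], x[2] + x[3]]


def range_query(query, database, reduce_fn, filter_dist, exact_dist, epsilon):
    """Return all objects within epsilon of the query, plus the refinement count."""
    q_red = reduce_fn(query)
    results, refinements = [], 0
    for obj in database:
        # Filter step: the cheap distance never exceeds the exact one, so an
        # object with filter distance > epsilon cannot be a result.
        # (A real system would precompute the reduced database vectors.)
        if filter_dist(q_red, reduce_fn(obj)) > epsilon:
            continue
        # Refinement step: exact distance in the original dimensionality.
        refinements += 1
        if exact_dist(query, obj) <= epsilon:
            results.append(obj)
    return results, refinements


database = [[2, 4, 3, 6], [3, 3, 3, 6], [6, 0, 0, 9], [0, 0, 7, 8]]
query = [2, 4, 3, 6]
results, refinements = range_query(query, database, aggregate_pairs, l1, l1, 2)
print(results, refinements)
```

In this toy run the filter discards one object without an exact computation, one false positive is removed during refinement, and no true result is lost. Selectivity measures how few objects reach the refinement step.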

  6. Dimensionality Reduction for the EMD • Both the feature vectors and the cost matrix have to be reduced • General linear dimensionality reduction techniques (PCA, ICA, etc.) fail the quality criteria for the EMD • Discarding dimensions destroys the lower bound (LB) property • Splitting dimensions causes poor selectivity • Aggregating dimensionality reductions can work well • Original dimensions are not split up • Each reduced dimension consists of a set of original dimensions

  7. Reduction Matrixes • Aggregating dimensionality reductions are characterized by a reduction matrix $R = [r_{aa'}] \in \{0,1\}^{d \times d'}$ with $\sum_{a'} r_{aa'} = 1$ for every original dimension $a$ (each original dimension is assigned to exactly one reduced dimension) • Example: $x = (2, 4, 3, 6)$, $R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, $x' = x \cdot R = (6, 9)$ • Lower-bounding reduced cost matrix $C' = [c'_{a'b'}]$ given $R$: $c'_{a'b'} = \min \{ c_{ab} \mid r_{aa'} = 1 \wedge r_{bb'} = 1 \}$, as given by [2] • There is no larger lower bound (see paper) • Main question: Which dimensions to aggregate? [2] Ljosa, Bhattacharya, Singh, Indexing Spatially Sensitive Distance Measures using Multi-Resolution Lower Bounds, EDBT 2006.
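
A minimal sketch of applying a reduction matrix, assuming plain Python lists for vectors and matrices. The function names `reduce_vector` and `reduce_cost` are hypothetical. The reduced cost takes the minimum original cost between the two groups of aggregated dimensions, following the lower bound attributed to [2]. The 4×4 cost matrix is the one from the data-independent reduction slide.

```python
def reduce_vector(x, R):
    """Compute x' = x . R, i.e. sum the entries of x that share a reduced dimension."""
    d, d_red = len(R), len(R[0])
    return [sum(x[a] * R[a][j] for a in range(d)) for j in range(d_red)]


def reduce_cost(C, R):
    """Reduced cost c'[i][j] = min original cost between groups i and j."""
    d, d_red = len(R), len(R[0])
    # Each row of R holds exactly one 1; its column is the reduced dimension.
    group = [row.index(1) for row in R]
    C_red = [[float("inf")] * d_red for _ in range(d_red)]
    for a in range(d):
        for b in range(d):
            i, j = group[a], group[b]
            C_red[i][j] = min(C_red[i][j], C[a][b])
    return C_red


R = [[1, 0], [1, 0], [0, 1], [0, 1]]
x = [2, 4, 3, 6]
C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]

print(reduce_vector(x, R))  # [6, 9]
print(reduce_cost(C, R))    # [[0, 2], [2, 0]]
```

Taking the minimum never overestimates the cost of moving mass between two groups, which is what keeps the reduced EMD a lower bound of the original one.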

  8. Data-Independent Reduction • Goal: Tight lower bound (large reduced EMD values) • Large cost between reduced dimensions • Small loss of cost for each reduced dimension • Matches clustering goal: low intra-cluster dissimilarity / high inter-cluster dissimilarity • kMedoid clustering based on the cost matrix • Example: $C = \begin{pmatrix} 0 & 1 & 3 & 4 \\ 1 & 0 & 2 & 3 \\ 3 & 2 & 0 & 1 \\ 4 & 3 & 1 & 0 \end{pmatrix}$, $R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, $C' = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$ • [Figure label: lost cost information]
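
The clustering step can be sketched with a simple alternating k-medoid procedure that treats the cost matrix as the dissimilarity between original dimensions. This is an assumption-laden illustration: the slides do not say which k-medoid variant or initialization was used, and the function name `k_medoid_reduction` and the naive "first k dimensions" initialization are hypothetical.

```python
def k_medoid_reduction(C, k, max_iter=100):
    """Cluster the d original dimensions into k groups; return a d x k 0/1 matrix R."""
    d = len(C)

    def assign_to(medoids):
        # Each dimension joins the medoid with the lowest cost (ties: first medoid).
        return [min(range(k), key=lambda j: C[a][medoids[j]]) for a in range(d)]

    medoids = list(range(k))  # assumption: naive deterministic initialization
    for _ in range(max_iter):
        assign = assign_to(medoids)
        new_medoids = []
        for j in range(k):
            members = [a for a in range(d) if assign[a] == j]
            if not members:  # keep the old medoid if a cluster runs empty
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total cost to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(C[m][a] for a in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids

    assign = assign_to(medoids)
    return [[1 if assign[a] == j else 0 for j in range(k)] for a in range(d)]


C = [[0, 1, 3, 4],
     [1, 0, 2, 3],
     [3, 2, 0, 1],
     [4, 3, 1, 0]]
R = k_medoid_reduction(C, 2)
print(R)  # [[1, 0], [1, 0], [0, 1], [0, 1]]
```

For the example cost matrix this reproduces the reduction matrix shown on the slide: dimensions 1 and 2 form one reduced dimension, dimensions 3 and 4 the other.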

  9. Data-Dependent Reduction based on flows • Idea: Incorporate knowledge about the data for a better reduction • In data-independent reduction, only $C$ is used • Problem: Ensuring a large $c'_{a'b'}$ is pointless if $f'_{a'b'}$ is small • Now: Also include information on $F$

  10. Data-Dependent Reduction: Algorithm • Add preprocessing step analyzing the data • Collect information about flows in unreduced EMD • Use information to improve initial / intermediate reduction matrix • Iterate until no improvement is made • [Flowchart elements: original data; sample data S; initial R; calculate EMD / collect flows; flows; improve R; improved? (yes: intermediate R, no: final R)]

  11. Data-Dependent Reduction: Preprocessing • Calculate average flow matrix $\bar{F} = [\bar{f}_{ab}]$ for sample S of DB • Approximate the flows in the reduced EMD with $\tilde{\bar{F}}' = R^T \bar{F} R$ • Maximize approximate average reduced EMD • Example: average flows $\bar{F} = \begin{pmatrix} 2 & 1 & 2 & 3 \\ 0 & 1 & 2 & 1 \\ 3 & 2 & 3 & 1 \\ 1 & 3 & 0 & 1 \end{pmatrix}$, $R = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, approximate average reduced flows $\tilde{\bar{F}}' = \begin{pmatrix} 4 & 8 \\ 9 & 5 \end{pmatrix}$
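
The flow approximation can be checked with a few lines of plain Python. The sketch below computes R^T F R for the example matrices on this slide. The objective function `approx_reduced_emd` is an assumption: the slide names the quantity but its formula did not survive extraction, so the sketch uses the sum of reduced cost times aggregated flow. Pairing these flows with the reduced cost matrix from the data-independent reduction example is likewise only for illustration.

```python
def aggregate_flows(F, R):
    """Compute F' = R^T F R: sum all flows between each pair of dimension groups."""
    d, d_red = len(R), len(R[0])
    return [[sum(R[a][i] * F[a][b] * R[b][j] for a in range(d) for b in range(d))
             for j in range(d_red)]
            for i in range(d_red)]


def approx_reduced_emd(C_red, F_red):
    """Assumed objective: total reduced cost times aggregated average flow."""
    n = len(C_red)
    return sum(C_red[i][j] * F_red[i][j] for i in range(n) for j in range(n))


F = [[2, 1, 2, 3],
     [0, 1, 2, 1],
     [3, 2, 3, 1],
     [1, 3, 0, 1]]
R = [[1, 0], [1, 0], [0, 1], [0, 1]]
C_red = [[0, 2], [2, 0]]  # reduced cost matrix from the data-independent example

F_red = aggregate_flows(F, R)
print(F_red)                             # [[4, 8], [9, 5]]
print(approx_reduced_emd(C_red, F_red))  # 34
```

Flows that stay inside one group (the diagonal of the aggregated matrix) contribute nothing because the reduced cost there is zero, so a good reduction keeps heavy flows between groups rather than inside them.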

  12. Data-Dependent Reduction: Optimization • Global optimization of the approximate average reduced EMD requires assessment of all possible reduction matrices • Find local optimum via reassignment of dimensions • FB-All: Choose best reassignment in each iteration • FB-Mod: Choose first profitable reassignment in each iteration • Initial reduction matrices • Base: assign all original dimensions to the first reduced dimension • KMed: reduction matrix from data-independent reduction
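
The reassignment search can be sketched as a hill climber over group assignments. This is a simplified illustration, not the authors' implementation: the objective (sum of reduced cost times aggregated flow), the function names, and the `mode` parameter are assumptions, with `mode="all"` loosely following FB-All and `mode="first"` loosely following FB-Mod. The example reuses the cost and flow matrices from the earlier slides and starts from the Base initialization.

```python
def approx_reduced_emd(assign, C, F, k):
    """Assumed objective: sum over group pairs of (min cost) x (total flow)."""
    inf = float("inf")
    c_red = [[inf] * k for _ in range(k)]
    f_red = [[0.0] * k for _ in range(k)]
    for a in range(len(assign)):
        for b in range(len(assign)):
            i, j = assign[a], assign[b]
            c_red[i][j] = min(c_red[i][j], C[a][b])
            f_red[i][j] += F[a][b]
    # Pairs that involve an empty group keep an infinite cost and are skipped.
    return sum(c_red[i][j] * f_red[i][j]
               for i in range(k) for j in range(k) if c_red[i][j] < inf)


def optimize_reduction(assign, C, F, k, mode="all"):
    """Move one dimension per iteration to another group while the objective improves."""
    assign = list(assign)
    best = approx_reduced_emd(assign, C, F, k)
    while True:
        best_move = None
        for a in range(len(assign)):
            old = assign[a]
            for j in range(k):
                if j == old:
                    continue
                assign[a] = j  # try the reassignment, then undo it
                value = approx_reduced_emd(assign, C, F, k)
                assign[a] = old
                if value > best:
                    best, best_move = value, (a, j)
                    if mode == "first":
                        break
            if mode == "first" and best_move is not None:
                break
        if best_move is None:
            return assign, best  # local optimum reached
        assign[best_move[0]] = best_move[1]


C = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
F = [[2, 1, 2, 3], [0, 1, 2, 1], [3, 2, 3, 1], [1, 3, 0, 1]]
base = [0, 0, 0, 0]  # "Base": every original dimension in the first reduced dimension

print(optimize_reduction(base, C, F, 2, mode="all"))
print(optimize_reduction(base, C, F, 2, mode="first"))
```

Each accepted move strictly increases the objective and the number of assignments is finite, so the loop terminates, though only at a local optimum.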

  13. Experimental Results • Data-independent vs. data-dependent aggregation • [Figure panels: sample image [2]; data-independent (kMedoid); data-dependent (FB-All-Mod); costliest flows]

  14. Experimental Results • Efficiency vs. reduced dimensionality (Retina DB)

  15. Experimental Results • Efficiency vs. reduced dimensionality (IRMA DB)

  16. Experimental Results • Filter & Refinement times and filter selectivity (IRMA DB)

  17. Conclusion & Outlook • Conclusion • Earth Mover’s Distance as a similarity measure • High quality, but computationally expensive in high dimensions • Dimensionality reduction for the EMD • Data-independent reduction: Clustering in feature space • Data-dependent reduction: Analyze flow information • Outlook • Local reductions • Different reduction for query and DB • Index reduced histograms using [3] [3] Assent, Wichterich, Meisen, Seidl, Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases, ICDE 2008.
