Indexing and Data Mining in Multimedia Databases

  1. Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos

  2. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources C. Faloutsos

  3. Problem: Given a large collection of (multimedia) records, find similar/interesting things, i.e.: • allow fast, approximate queries, and • find rules/patterns

  4. Sample queries • Similarity search: • find pairs of branches with similar sales patterns • find medical cases similar to Smith's • find pairs of sensor series that move in sync

  5. Sample queries (cont’d) • Rule discovery • Clusters (of patients; of customers; ...) • Forecasting (total sales for next year?) • Outliers (e.g., fraud detection)

  6. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources

  7. Indexing - Multimedia Problem: • given a set of (multimedia) objects, • find the ones similar to a given query object (quickly!)

  8. [Figure: three daily series, $price vs. day (1 to 365)] Distance function: given by an expert

  9. ‘GEMINI’ - Pictorially [Figure: sequences S1, ..., Sn ($price over days 1 to 365) mapped to feature points F(S1), ..., F(Sn), e.g., (avg, std), which are indexed with off-the-shelf S.A.M.s (Spatial Access Methods)]

  10. ‘GEMINI’: • fast; ‘correct’ (= no false dismissals) • used for images (e.g., QBIC) (2x, 10x faster), shapes (27x faster), video (e.g., InforMedia), time sequences ([Rafiei+Mendelzon], ++)
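
A minimal sketch of a GEMINI-style range query, assuming Euclidean distance between 365-day price sequences and the two features from the figure, avg and std. Scaling both by sqrt(n) makes the feature distance a lower bound of the true distance (||a - b||^2 = n(Δavg)^2 + ||a_c - b_c||^2 ≥ n(Δavg)^2 + n(Δstd)^2, where a_c, b_c are the mean-centered sequences and std is the population standard deviation), so the filter step cannot cause false dismissals. The linear scan stands in for an off-the-shelf spatial access method such as an R-tree; the function names and the random-walk data are hypothetical.

```python
import numpy as np

def features(seq):
    """Map a length-n sequence to the 2-d point sqrt(n) * (avg, std).

    With the population std (NumPy's default ddof=0) the Euclidean distance
    between feature points never exceeds the distance between the sequences.
    """
    return np.sqrt(len(seq)) * np.array([seq.mean(), seq.std()])

def range_query(db, query, eps):
    """GEMINI-style range query: cheap filter on features, then exact refinement."""
    qf = features(query)
    feats = np.array([features(s) for s in db])  # a real system would keep these in a SAM
    # Filter: by the lower bound, every true answer survives (no false dismissals).
    candidates = np.flatnonzero(np.linalg.norm(feats - qf, axis=1) <= eps)
    # Refine: drop false alarms using the full 365-day distance.
    answers = [int(i) for i in candidates if np.linalg.norm(db[i] - query) <= eps]
    return candidates, answers

rng = np.random.default_rng(42)
db = np.cumsum(rng.normal(size=(1000, 365)), axis=1)  # 1000 hypothetical random-walk price series
query = db[0] + rng.normal(scale=0.5, size=365)       # a noisy copy of series 0
candidates, answers = range_query(db, query, eps=15.0)
print(f"{len(candidates)} candidates after filtering, {len(answers)} true answers")
```

The refine step is what keeps the answer exact: the filter may let through false alarms, but it never discards a qualifying sequence.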

  11. Remaining issues • how to extract features automatically? • how to merge similarity scores from different media?

  12. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

  13. FastMap [Figure: objects known only through pairwise distances (~1 or ~100); how should they be placed as points? ??]

  14. FastMap • Multi-dimensional scaling (MDS) can do that, but in O(N^2) time • We want a linear algorithm: FastMap [SIGMOD95]
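
A minimal sketch of the FastMap idea, written from its published description rather than from the authors' code: pick two far-apart pivot objects a and b with a farthest-point heuristic, place every object on the line through them with the cosine law x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)), then repeat on the residual distances d'(i,j)^2 = d(i,j)^2 - (x_i - x_j)^2. Each of the k passes needs O(N) distance evaluations, which is where the linear cost comes from. The function name `fastmap`, its signature and the toy 3-d data are illustrative assumptions.

```python
import numpy as np

def fastmap(dist, n, k, seed=0):
    """Map objects 0..n-1 to points in k-d space using only dist(i, j).

    Each of the k passes makes O(n) calls to dist, so the total cost is O(n * k).
    """
    rng = np.random.default_rng(seed)
    X = np.zeros((n, k))

    def d2(i, j, col):
        # Squared distance left after removing the first `col` coordinates.
        return max(dist(i, j) ** 2 - float(np.sum((X[i, :col] - X[j, :col]) ** 2)), 0.0)

    for col in range(k):
        # Pivot heuristic: from a random object jump to the farthest object, twice.
        a = int(rng.integers(n))
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # every residual distance is zero; remaining coordinates stay 0
        dab = np.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of object i along the line through pivots a and b.
            X[i, col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return X

rng = np.random.default_rng(1)
P = rng.normal(size=(50, 3))  # hypothetical objects; FastMap sees only their distances

def dist(i, j):
    return float(np.linalg.norm(P[i] - P[j]))

X = fastmap(dist, n=len(P), k=2)  # 2-d coordinates, e.g. for a scatter plot
print(X[:3])
```

Only `dist` is ever consulted, so the same function works for any distance an expert supplies; for genuinely Euclidean data, as here, the k-d image never stretches a distance.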

  15. Applications: time sequences • given n co-evolving time sequences • visualize them + find rules [ICDE00] [Figure: exchange rates of DEM, JPY and HKD vs. time]

  16. Applications - financial • currency exchange rates [ICDE00] [Figure: map of the series FRF, GBP, JPY, HKD, USD(t), USD(t-5)]

  17. Applications - financial • currency exchange rates [ICDE00] [Figure: map of the series FRF, DEM, HKD, JPY, USD, GBP, USD(t), USD(t-5)]

  18. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

  19. Merging similarity scores • e.g., video: text, color, motion, audio • weights change with the query! • solution 1: user specifies weights • solution 2: user gives examples • and we ‘learn’ what he/she wants: rel. feedback (Rocchio, MARS, MindReader) • but: how about disjunctive queries?

  20. Demo (demo server)

  21. ‘FALCON’: Vs and inverted Vs (the trader wants only ‘unstable’ stocks)

  22. ‘FALCON’: Vs and inverted Vs; their average is flat!

  23. “Single query point” methods [Figure: relevant examples (+) in the (avg, std) feature plane and Rocchio's single query point (x)]

  24. “Single query point” methods [Figure: relevant examples (+) and the single query points (x) chosen by Rocchio, MindReader and MARS] The averaging effect in action...
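
The averaging problem from the ‘FALCON’ slides can be reproduced with toy data. The sequences below are hypothetical V-shaped and inverted-V-shaped ‘unstable’ stocks; the query point is simply the centroid of the relevant examples, which is the relevant-examples term of Rocchio's formula (the full formula also weights the original query and subtracts non-relevant examples). The centroid is a flat line, so a flat, stable stock ranks first: the opposite of what the trader asked for.

```python
import numpy as np

# Hypothetical "unstable" stock shapes over 21 time steps.
t = np.linspace(-1.0, 1.0, 21)
v_shape = np.abs(t)            # price dips, then recovers: a "V"
inverted_v = 1.0 - np.abs(t)   # price rises, then falls: an inverted "V"
flat = np.full_like(t, 0.5)    # a stable stock the trader does NOT want

# Single-query-point feedback: replace the relevant examples by their centroid
# (the relevant-examples term of Rocchio's formula).
query_point = (v_shape + inverted_v) / 2

def distance(a, b):
    return float(np.linalg.norm(a - b))

candidates = {"V": v_shape, "inverted V": inverted_v, "flat": flat}
ranking = sorted(candidates, key=lambda name: distance(query_point, candidates[name]))
print(ranking)  # 'flat' comes first
```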

  25. Main idea: FALCON contours [Wu+, vldb2000] [Figure: contours around the relevant examples (+); axes feature1 (e.g., avg) and feature2 (e.g., std)]

  26. A: Aggregate Dissimilarity [Figure: good examples g1, g2 (+) and a candidate point x] • α: parameter (≈ -5, i.e., a ‘soft OR’)
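
A sketch of the aggregate dissimilarity, assuming the form I understand [Wu+, vldb2000] to use: for good examples g1, ..., gk and distance d, D(x) = ((1/k) Σ d(x, g_i)^α)^(1/α), with D(x) = 0 when x coincides with a good example. With α negative (e.g., -5) the smallest distances dominate, so a point close to any one good example scores well (the ‘soft OR’); with α = 1 it is just the average distance. The function name, the toy points and the α values are illustrative.

```python
import math

def aggregate_dissimilarity(x, good, dist, alpha=-5.0):
    """Generalized (power) mean of the distances from x to each good example.

    alpha < 0 lets the smallest distance dominate ('soft OR');
    alpha = 1 gives the plain average distance.
    """
    ds = [dist(x, g) for g in good]
    if alpha < 0 and min(ds) == 0.0:
        return 0.0  # x coincides with a good example (avoids 0 ** negative)
    return (sum(d ** alpha for d in ds) / len(ds)) ** (1.0 / alpha)

good = [(0.0, 0.0), (10.0, 0.0)]   # hypothetical good examples g1, g2 in a 2-d feature space
near_g1 = (0.5, 0.0)               # close to g1, far from g2
midpoint = (5.0, 0.0)              # where an averaging method would center the query

for alpha in (-5.0, 1.0):
    print(alpha,
          aggregate_dissimilarity(near_g1, good, math.dist, alpha),
          aggregate_dissimilarity(midpoint, good, math.dist, alpha))
```

With α = -5 the point near g1 scores about 0.57 and the midpoint scores 5, so the disjunctive query is answered correctly; with α = 1 both score 5 and the distinction is lost.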

  27. FALCON: • converges quickly (~5 iterations) • good precision/recall • is fast (can use off-the-shelf ‘spatial/metric access methods’)

  28. Conclusions for indexing + visualization • GEMINI: fast indexing, exploiting off-the-shelf SAMs • FastMap: automatic feature extraction in O(N) time • FALCON: relevance feedback for disjunctive queries

  29. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources

  30. Data mining & fractals – Road map • Motivation – problems / case study • Definition of fractals and power laws • Solutions to posed problems • More examples

  31. Problem #1 - spatial d.m. Galaxies (Sloan Digital Sky Survey, w/ B. Nichol) • ‘spiral’ and ‘elliptical’ galaxies • (cf. stores & households; healthy & ill subjects) • patterns? (not Gaussian; not uniform) • attraction/repulsion? • separability??

  32. Problem #2: dim. reduction [Figure: mpg vs. engine size] • given attributes x1, ..., xn • possibly non-linearly correlated • drop the useless ones (Q: why? A: to avoid the ‘dimensionality curse’)

  33. Answer: • Fractals / self-similarities / power laws

  34. What is a fractal? = self-similar point set, e.g., Sierpinski triangle: zero area; infinite length! ...

  35. Definitions (cont’d) • Paradox: infinite perimeter; zero area! • ‘dimensionality’: between 1 and 2 • actually: log(3)/log(2) = 1.58… (long story)
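
A worked version of the numbers on this slide, using the standard self-similarity dimension D = log N / log(1/s) for a set made of N copies of itself, each scaled by a factor s. The Sierpinski triangle is 3 copies of itself at half size, and each subdivision step keeps 3 of the 4 half-size triangles, so the total perimeter of the remaining triangles grows by a factor 3/2 per step while their total area shrinks by a factor 3/4 (P_0 and A_0 denote the starting triangle's perimeter and area):

```latex
D = \frac{\log N}{\log(1/s)} = \frac{\log 3}{\log 2} \approx 1.585,
\qquad
P_k = \left(\tfrac{3}{2}\right)^{k} P_0 \to \infty,
\qquad
A_k = \left(\tfrac{3}{4}\right)^{k} A_0 \to 0 \quad (k \to \infty)
```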

  36. Q: fractal dimension of a line? Intrinsic (‘fractal’) dimension [Figure: e.g., #cylinders vs. miles/gallon]

  37. Q: fractal dimension of a line? A: nn(<= r) ~ r^1, where nn(<= r) is the number of neighbors within distance r. Intrinsic (‘fractal’) dimension

  38. Q: fractal dimension of a line? A: nn(<= r) ~ r^1 • Q: fd of a plane? A: nn(<= r) ~ r^2 • fd == slope of log(nn) vs. log(r). Intrinsic (‘fractal’) dimension

  39. Sierpinski triangle: [Figure: log(#pairs within <= r) vs. log(r), slope 1.58] == ‘correlation integral’
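
The correlation-integral recipe on these slides (count the pairs within distance r, plot log(#pairs) against log(r), read off the slope) can be sketched in a few lines of NumPy. This is a brute-force O(N^2) illustration, not the fast algorithms used in the cited work; the radius range 0.01 to 0.1, the sample size and the chaos-game construction of the Sierpinski triangle are my own choices. On these inputs the fitted slopes should come out roughly 1, 2 and 1.58, though edge effects and sampling noise make them approximate.

```python
import numpy as np

def pair_counts(points, radii):
    """Number of distinct pairs of points at distance <= r, for each r in radii."""
    sq = (points ** 2).sum(axis=1)
    # Squared distances via |p|^2 + |q|^2 - 2 p.q, clipped to remove tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * points @ points.T, 0.0)
    i, j = np.triu_indices(len(points), k=1)  # each unordered pair once
    d = np.sort(np.sqrt(d2[i, j]))
    return np.searchsorted(d, radii, side="right")

def correlation_dimension(points, r_min=0.01, r_max=0.1, n_radii=10):
    """Slope of log(#pairs within <= r) versus log(r) over [r_min, r_max]."""
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_radii)
    counts = pair_counts(points, radii)
    slope, _intercept = np.polyfit(np.log(radii), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
n = 2000
t = rng.random(n)
line = np.column_stack([t, t])   # points on a line segment
plane = rng.random((n, 2))       # points filling the unit square

# Sierpinski triangle via the "chaos game": repeatedly jump halfway to a random vertex.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
burn_in = 20
p = rng.random(2)
sierpinski = np.empty((n, 2))
for step in range(n + burn_in):
    p = (p + vertices[rng.integers(3)]) / 2
    if step >= burn_in:
        sierpinski[step - burn_in] = p

for name, pts in [("line", line), ("plane", plane), ("Sierpinski", sierpinski)]:
    print(f"{name:>10}: fd ~ {correlation_dimension(pts):.2f}")
```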

  40. Observations: self-similarity • <=> fractals • <=> scale-free • <=> power laws (y = x^a, F = C*r^(-2)) [Figure: log(#pairs within <= r) vs. log(r), slope 1.58]

  41. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • Conclusions

  42. Solution #1: spatial d.m. Galaxies (Sloan Digital Sky Survey, w/ B. Nichol; ‘BOPS’ plot [sigmod2000]) • clusters? • separable? • attraction/repulsion? • data ‘scrubbing’: duplicates?

  43. Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi and spi-ell pairs] • 1.8 slope • plateau! • repulsion!

  44. Solution #1: spatial d.m. [w/ Seeger, Traina, Traina, SIGMOD00] [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi and spi-ell pairs] • 1.8 slope • plateau! • repulsion!

  45. Spatial d.m.: heuristic on choosing # of clusters [Figure: radii r1 and r2 marked on the plot]

  46. Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi and spi-ell pairs] • 1.8 slope • plateau! • repulsion!

  47. Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi and spi-ell pairs; annotation: duplicates] • 1.8 slope • plateau! • repulsion!!
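
The pair-counting reading of these plots can be illustrated with a brute-force count. This is not the box-counting BOPS algorithm of [sigmod2000], and the ‘spiral’/‘elliptical’ point sets are hypothetical: two classes confined to separate regions have no cross pairs below their gap (here 0.2), which is the repulsion signature, while records loaded twice produce pairs at distance exactly 0, so the self-pair count never drops to zero however small r gets.

```python
import numpy as np

def cross_pair_counts(A, B, radii):
    """Number of pairs (a, b), a in A and b in B, at distance <= r, for each r."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)).ravel()
    return np.searchsorted(np.sort(d), radii, side="right")

def self_pair_counts(P, radii):
    """Number of distinct pairs within one set at distance <= r, for each r."""
    d = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2))
    i, j = np.triu_indices(len(P), k=1)
    return np.searchsorted(np.sort(d[i, j]), radii, side="right")

rng = np.random.default_rng(7)
radii = np.array([1e-6, 1e-3, 1e-2, 1e-1, 1.0])

# Hypothetical classes confined to separate strips: x < 0.4 versus x >= 0.6.
spirals = rng.random((300, 2)) * [0.4, 1.0]
ellipticals = rng.random((300, 2)) * [0.4, 1.0] + [0.6, 0.0]
print("spi-ell cross pairs:", cross_pair_counts(spirals, ellipticals, radii))

# 'Dirty' data: the first 20 spiral records were loaded twice.
dirty = np.vstack([spirals, spirals[:20]])
print("spi-spi pairs, clean:", self_pair_counts(spirals, radii))
print("spi-spi pairs, dirty:", self_pair_counts(dirty, radii))
```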

  48. Problem #2: Dim. reduction

  49. Solution: • drop the attributes that don’t increase the ‘partial f.d.’ (PFD) • definition: the PFD of attribute set A is the f.d. of the cloud of points projected onto A [w/ Traina, Traina, Wu, SBBD00]

  50. Problem #2: dim. reduction [Figure: a point cloud with global FD = 1 and its projections, labeled PFD = 1, PFD ≈ 1, PFD = 0, PFD = 1, PFD ≈ 1]
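
A simplified sketch of the PFD idea, assuming the correlation-integral estimate of the fractal dimension and a single forward pass over the attributes with a hypothetical threshold `min_gain = 0.5`; the algorithm in [w/ Traina, Traina, Wu, SBBD00] may differ in its search order and stopping rule. In the toy data, x2 = x1^2 adds nothing to the PFD (the projected points still lie on a curve), so it should be dropped, while the independent x3 raises the PFD from about 1 to about 2.

```python
import numpy as np

def correlation_dimension(points, r_min=0.01, r_max=0.1, n_radii=10):
    """Fractal (correlation) dimension: slope of log(#pairs within <= r) vs log(r)."""
    sq = (points ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * points @ points.T, 0.0)
    i, j = np.triu_indices(len(points), k=1)
    d = np.sort(np.sqrt(d2[i, j]))
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_radii)
    counts = np.searchsorted(d, radii, side="right")
    return np.polyfit(np.log(radii), np.log(counts), 1)[0]

def select_attributes(data, min_gain=0.5):
    """Forward pass: keep attribute a only if it raises the PFD by at least min_gain.

    The PFD of an attribute set is the fractal dimension of the data
    projected onto those attributes (columns).
    """
    kept, pfd = [], 0.0
    for a in range(data.shape[1]):
        candidate = kept + [a]
        new_pfd = correlation_dimension(data[:, candidate])
        if new_pfd - pfd >= min_gain:
            kept, pfd = candidate, new_pfd
    return kept, pfd

# Hypothetical data, all attributes already scaled to [0, 1]:
rng = np.random.default_rng(0)
x1 = rng.random(1500)
x3 = rng.random(1500)
data = np.column_stack([x1, x1 ** 2, x3])  # x2 = x1**2 is non-linearly redundant
kept, pfd = select_attributes(data)
print(kept, round(float(pfd), 2))
```

Note that a linear method such as PCA would not flag x2 as redundant, since x1 and x1^2 are not linearly dependent; the PFD only sees that the projected cloud is still one-dimensional.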