Chapter 12 Multimedia IR: Indexing and Searching
Date: 11/17/2005
Introduction
• Feature extraction, feature indexing, distance, similarity query
• Distance
    • Editing distance: the smallest number of insertions, deletions, and substitutions needed to transform the first string into the second
    • Euclidean distance: $D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
• Similarity query
    • Whole match: the query and the objects are of the same type
    • Sub-pattern match: the query matches only part of an object, e.g., a 16×16 sub-pattern in 512×512 grey-scale images
    • Nearest neighbors: find the objects closest to the query
    • All pairs: find all pairs of objects within a given distance of each other
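As a rough illustration of the two distances listed above, here is a minimal Python sketch: edit distance computed with the standard dynamic-programming (Levenshtein) recurrence, and Euclidean distance over two equal-length numeric sequences. The function names `edit_distance` and `euclidean` are hypothetical, chosen only for this example.

```python
import math
from typing import Sequence


def edit_distance(s: str, t: str) -> int:
    """Smallest number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] = distance between the prefix of s processed so far and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # transforming s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute cs by ct (or match)
        prev = curr
    return prev[-1]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two sequences of the same length."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(edit_distance("kitten", "sitting"))  # 3
print(euclidean([0.0, 0.0], [3.0, 4.0]))   # 5.0
```

The two-row table keeps memory at O(len(t)) instead of storing the full matrix.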
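Correctness of query results
• False dismissal (a qualifying object is not returned): unacceptable
• False alarm (a non-qualifying object is returned): acceptable, because it can be discarded in post-processing by computing the actual distance
• A spatial access method (SAM), e.g., the R-tree, speeds up the search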
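To make the two error types concrete, the sketch below compares a filter's candidate set with the exact answer set, and shows that post-processing with the actual distance removes false alarms but cannot recover a false dismissal. The one-dimensional toy data and the names `classify`, `post_process`, and `dist` are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple


def classify(candidates: Set[int], answers: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (false_alarms, false_dismissals) of a candidate set."""
    false_alarms = candidates - answers      # returned, but do not qualify
    false_dismissals = answers - candidates  # qualify, but were never returned
    return false_alarms, false_dismissals


def post_process(candidates: Set[int], objects: Dict[int, float], query: float,
                 eps: float, dist: Callable[[float, float], float]) -> Set[int]:
    """Keep only candidates whose actual distance to the query is <= eps."""
    return {i for i in candidates if dist(query, objects[i]) <= eps}


def dist(a: float, b: float) -> float:
    return abs(a - b)


objects = {1: 1.0, 2: 2.0, 3: 5.0}
query, eps = 1.5, 1.0

answers = {i for i, o in objects.items() if dist(query, o) <= eps}  # {1, 2}
candidates = {2, 3}  # output of some imperfect quick-and-dirty filter
print(classify(candidates, answers))                        # ({3}, {1})
print(post_process(candidates, objects, query, eps, dist))  # {2}
```

Object 3 (a false alarm) is removed by post-processing, but object 1 (a false dismissal) is lost for good, which is why false dismissals are the error that the filter must never make.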
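A generic indexing approach
• The whole match problem
    • Given a collection of objects $O_1, O_2, \ldots, O_n$, a distance function $D(O_i, O_j)$, a query object $Q$, and a tolerance $\varepsilon$, find $\{O_i \mid D(Q, O_i) \le \varepsilon\}$
• Basic idea of the approach
    • A quick-and-dirty test discards non-qualifying objects quickly
    • False alarms are allowed (they are removed by post-processing)
    • A SAM avoids a sequential scan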
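The generic approach can be sketched as a filter-and-refine range query. This is a minimal Python sketch under stated assumptions: a linear scan over the feature points stands in for the SAM (a real system would query an R-tree or similar), and `whole_match`, `first_value`, `feature`, `d_feature`, and `d_true` are hypothetical names. The filter causes no false dismissals only if `d_feature` never exceeds `d_true`, the condition that the lower bounding lemma formalizes.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Point = Sequence[float]


def whole_match(objects: Sequence[T], query: T, eps: float,
                feature: Callable[[T], Point],
                d_feature: Callable[[Point, Point], float],
                d_true: Callable[[T, T], float]) -> List[int]:
    """Indices i with d_true(query, objects[i]) <= eps, via filter and refine."""
    fq = feature(query)
    # Quick-and-dirty test in feature space; a linear scan stands in for a SAM.
    candidates = [i for i, o in enumerate(objects)
                  if d_feature(feature(o), fq) <= eps]
    # Post-processing with the actual distance discards the false alarms.
    return [i for i in candidates if d_true(query, objects[i]) <= eps]


def euclid(x: Point, y: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def first_value(s: Point) -> Point:
    # One-number feature: |x[0] - y[0]| never exceeds euclid(x, y).
    return (s[0],)


series = [[1, 2, 3], [1, 5, 9], [4, 4, 4]]
print(whole_match(series, [1, 2, 4], 1.5, first_value, euclid, euclid))  # [0]
```

Here series 1 passes the one-number filter (same first value as the query) but is a false alarm, which the exact distance removes.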
An example (yearly stock-price movements)
• Average as the quick-and-dirty test
    • Large difference in averages: the sequences can't be similar
    • Small difference in averages: possibly similar, but may be a false alarm
• f features reduce false alarms; each object can then be mapped into a point in f-dimensional space
• No need to test every f-d point: a SAM indexes them
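One way to see why the average works as a quick-and-dirty test: for two sequences of length n, the Cauchy-Schwarz inequality gives $\sqrt{n}\,|\bar{x} - \bar{y}| \le D(x, y)$ for the Euclidean distance, so filtering on the scaled difference of averages never dismisses a qualifying sequence. The sketch below checks this on simulated data; the random walks are a hypothetical stand-in for yearly stock prices, the tolerance `eps = 20.0` is arbitrary, and `avg_distance` and `random_walk` are illustrative names.

```python
import math
import random
from typing import List, Sequence


def euclid(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def avg_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """sqrt(n) * |mean(x) - mean(y)|, a lower bound on euclid(x, y)."""
    n = len(x)
    return math.sqrt(n) * abs(sum(x) / n - sum(y) / n)


def random_walk(n: int, rng: random.Random) -> List[float]:
    # Hypothetical stand-in for one year of daily closing prices.
    price, out = 100.0, []
    for _ in range(n):
        price += rng.gauss(0.0, 1.0)
        out.append(price)
    return out


rng = random.Random(0)
walks = [random_walk(365, rng) for _ in range(200)]
query, eps = walks[0], 20.0

# Quick-and-dirty test on the average, then the exact distance on survivors.
candidates = [w for w in walks if avg_distance(w, query) <= eps]
answers = [w for w in candidates if euclid(w, query) <= eps]
print(f"{len(walks)} series, {len(candidates)} pass the average test, "
      f"{len(answers)} truly match")
```

The gap between the second and third numbers is the count of false alarms left by a single feature; adding more features (more coefficients, as in the DFT approach) is what narrows it.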
Preservation of distances by the mapping
• Exact preservation
    • No false alarms, no false dismissals
    • Difficult to find such features
    • Dimensionality curse: exact preservation usually needs many features, and SAMs degrade as dimensionality grows
• No false dismissal if $D_{feature}(F(O_1), F(O_2)) \le D(O_1, O_2)$
    • With potential false alarms
    • The lower bounding lemma
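The argument behind the lower bounding lemma is short enough to write out. Assume the feature distance never exceeds the actual distance; then any object that qualifies for a range query with tolerance ε also qualifies in feature space:

```latex
\begin{align*}
&\text{Assume } D_{feature}(F(O_1), F(O_2)) \le D(O_1, O_2) \text{ for all objects } O_1, O_2.\\
&\text{If } D(Q, O) \le \varepsilon, \text{ then } D_{feature}(F(Q), F(O)) \le D(Q, O) \le \varepsilon,\\
&\text{so } F(O) \text{ is returned by the range query around } F(Q) \text{ with tolerance } \varepsilon.
\end{align*}
```

The feature-space query therefore returns a superset of the true answers: false alarms are possible, false dismissals are not.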
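Feature selection
• Preserve distances (or at least lower-bound them, so that there are no false dismissals)
• Carry much information about the corresponding objects, to reduce false alarms
• Nearest neighbor query
    • Find the point F(P) that is the nearest neighbor of the query point F(Q)
    • Issue a range query with Q and $\varepsilon = D(Q, P)$, then compute actual distances for the results and keep the closest object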
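The nearest neighbor procedure can be sketched in Python as follows, assuming a feature distance that lower-bounds the actual one. A linear scan again stands in for the SAM, and `nearest_neighbor`, `first_value`, and the parameter names are hypothetical. The true nearest neighbor O* satisfies D(Q, O*) ≤ D(Q, P) = ε, so by the lower bound it survives the range query.

```python
import math
from typing import Callable, Sequence

Vec = Sequence[float]


def nearest_neighbor(objects: Sequence[Vec], query: Vec,
                     feature: Callable[[Vec], Vec],
                     d_feature: Callable[[Vec, Vec], float],
                     d_true: Callable[[Vec, Vec], float]) -> int:
    """Index of the object nearest to query under d_true.

    Assumes d_feature(feature(a), feature(b)) <= d_true(a, b) for all a, b."""
    feats = [feature(o) for o in objects]
    fq = feature(query)
    # 1. Nearest neighbor F(P) of F(Q) in feature space.
    p = min(range(len(objects)), key=lambda i: d_feature(feats[i], fq))
    # 2. Range query in feature space with eps = D(Q, P); P itself always passes.
    eps = d_true(query, objects[p])
    candidates = [i for i in range(len(objects)) if d_feature(feats[i], fq) <= eps]
    # 3. Exact distances on the candidates only.
    return min(candidates, key=lambda i: d_true(query, objects[i]))


def euclid(x: Vec, y: Vec) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def first_value(v: Vec) -> Vec:
    return (v[0],)


points = [[0, 0, 9], [3, 0, 0], [1, 1, 1]]
print(nearest_neighbor(points, [0, 0, 0], first_value, euclid, euclid))  # 2
```

In this example F(P) picks the first object (same first value as the query), yet the range query with ε = 9 also admits the third object, which is the true nearest neighbor.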
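One-dimensional time series
• The first day's value is a bad feature: it carries too little information about the sequence
• Using all 365 values leads to the dimensionality curse
• The average is better
• Discrete Fourier Transform
    • For a signal $x = [x_i]$, $i = 0, 1, \ldots, n-1$, let $X_F$ denote the DFT coefficient at the F-th frequency, $F = 0, 1, \ldots, n-1$: $X_F = \frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} x_i \exp(-j 2\pi F i / n)$, where $j = \sqrt{-1}$
    • By Parseval's theorem this DFT preserves Euclidean distance, so the distance over a subset of coefficients lower-bounds the actual distance
    • Keep the first f coefficients of the DFT as the features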
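A small pure-Python sketch of DFT feature extraction, assuming the unitary definition $X_F = \frac{1}{\sqrt{n}} \sum_{i} x_i e^{-j 2\pi F i / n}$, under which the full transform preserves Euclidean distance (Parseval) and truncation yields a lower bound. It computes only the f coefficients it needs, in O(nf) time; a production system would more likely use an FFT library. `dft_features` and `feature_distance` are hypothetical names, and the Gaussian test signals are simulated.

```python
import cmath
import math
import random
from typing import List, Sequence


def dft_features(x: Sequence[float], f: int) -> List[complex]:
    """First f coefficients X_0 .. X_{f-1} of the unitary DFT of x."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(xi * cmath.exp(-2j * math.pi * F * i / n)
                        for i, xi in enumerate(x))
            for F in range(f)]


def feature_distance(fx: Sequence[complex], fy: Sequence[complex]) -> float:
    # Euclidean distance between (truncated) coefficient vectors.
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx, fy)))


def euclid(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(365)]
y = [rng.gauss(0.0, 1.0) for _ in range(365)]

fx, fy = dft_features(x, 3), dft_features(y, 3)
print(feature_distance(fx, fy) <= euclid(x, y))  # True: truncation lower-bounds D
# X_0 equals sqrt(n) times the average, so the average is a scaled DFT feature.
print(abs(fx[0] - math.sqrt(365) * sum(x) / 365) < 1e-9)  # True
```

The second check connects the DFT back to the earlier average feature: keeping f = 1 coefficient is the same test as comparing scaled averages.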
• The fewer the coefficients that contain most of the energy, the fewer the false alarms and the faster the response time
• The dimensionality curse is avoided with the lower bounding lemma and the energy-concentrating property of the DFT
• In practice, f = 1 to 3 coefficients
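To see the energy-concentrating property, the sketch below measures what fraction of a signal's energy $\sum_i x_i^2$ falls in the first f DFT coefficients, for a random walk versus the white noise it was built from. It assumes the unitary DFT (so, by Parseval's theorem, total energy is the same in both domains). The signals are simulated and hypothetical; a random walk is used because its spectrum falls off roughly as $1/F^2$, while white noise spreads its energy evenly over all frequencies. No particular percentages are claimed here; running it prints the measured fractions.

```python
import cmath
import math
import random
from typing import List, Sequence


def dft_coefficient(x: Sequence[float], F: int) -> complex:
    """Coefficient X_F of the unitary DFT of x."""
    n = len(x)
    return sum(xi * cmath.exp(-2j * math.pi * F * i / n)
               for i, xi in enumerate(x)) / math.sqrt(n)


def energy_fraction(x: Sequence[float], f: int) -> float:
    """Share of the total energy captured by X_0 .. X_{f-1}."""
    total = sum(xi * xi for xi in x)  # equals sum of |X_F|^2 by Parseval
    kept = sum(abs(dft_coefficient(x, F)) ** 2 for F in range(f))
    return kept / total


rng = random.Random(7)
n, f = 256, 3
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
walk: List[float] = []
level = 0.0
for step in noise:
    level += step  # cumulative sum of the noise: a random walk
    walk.append(level)

print(f"random walk: {energy_fraction(walk, f):.3f}")
print(f"white noise: {energy_fraction(noise, f):.3f}")
```

For white noise the expected share of any f coefficients is only about f/n, so a small f would leave many false alarms; for signals whose energy sits in the low frequencies, a few coefficients capture most of it.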