Chapter 12 Multimedia IR: Indexing and Searching Date: 11/17/2005
Introduction • Feature extraction, feature indexing, distance, similarity query • Distance • Editing distance: the smallest number of insertions, deletions, and substitutions needed to transform the first string into the second • Euclidean distance: D(x, y) = sqrt(Σi (xi − yi)²) for two equal-length sequences x and y • Similarity query • Whole match: the query and the objects are of the same type • Sub-pattern match: the query is a small part of the object, e.g. a 16×16 sub-pattern in 512×512 grey-scale images • Nearest neighbors • All pairs
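The two distance functions named on this slide can be sketched in a few lines. This is a minimal illustration, not code from the chapter; the function names `edit_distance` and `euclidean` are chosen here for the example.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Smallest number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca by cb, or match
        prev = curr
    return prev[-1]

def euclidean(x, y) -> float:
    """Euclidean distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

For instance, `edit_distance("kitten", "sitting")` needs two substitutions and one insertion, and `euclidean([0, 0], [3, 4])` is the length of the 3-4-5 triangle's hypotenuse.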
Correctness of query results • False dismissal (a qualifying object is missed): unacceptable • False alarm (a non-qualifying object is returned): can be discarded via post-processing • Spatial access method, e.g. the R-tree
A generic indexing approach • The whole match problem • Given objects O1, O2, …, On, a distance function D(Oi, Oj), a query Q, and a tolerance ε, find {Oi | D(Q, Oi) ≤ ε} • Basic idea of the approach • A quick-and-dirty test • Discard non-qualifying objects • Allow false alarms • Use of a spatial access method (SAM)
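The two-step idea (cheap test first, exact check afterwards) might look as follows. This is a sketch under stated assumptions: the feature map `prefix_feature` is a hypothetical stand-in chosen because it can never overestimate the Euclidean distance, and a linear scan takes the place of a real SAM lookup.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def prefix_feature(obj, f=2):
    """Hypothetical feature map F: keep only the first f values."""
    return obj[:f]

def filter_step(objects, query, eps, feature=prefix_feature):
    """Quick-and-dirty test: keep objects whose feature distance is within eps."""
    q_feat = feature(query)
    # A linear scan stands in for the SAM lookup here.
    return [o for o in objects if euclidean(feature(o), q_feat) <= eps]

def refine_step(candidates, query, eps):
    """Post-processing: drop false alarms using the true distance."""
    return [o for o in candidates if euclidean(o, query) <= eps]

objects = [(1, 2, 3, 4), (1, 2, 9, 9), (8, 8, 8, 8)]
query = (1, 2, 3, 5)
candidates = filter_step(objects, query, eps=1.5)   # two objects survive
answers = refine_step(candidates, query, eps=1.5)   # the false alarm is removed
```

Here `(1, 2, 9, 9)` passes the cheap test because its first two values match the query, but its true distance exceeds the tolerance, so the refinement step discards it.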
An example (yearly stock-price movements) • Average as the quick-and-dirty test • Large difference in averages → the series can’t be similar • Small difference → either similar or a false alarm • Using f features reduces false alarms → each object can be mapped into a point in f-dimensional space • No need to test all f-dimensional points?
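A small sketch of the average as a one-dimensional feature. The scaling by sqrt(n) is an assumption added here (it is not on the slide): by the Cauchy-Schwarz inequality, sqrt(n) · |avg(x) − avg(y)| never exceeds the Euclidean distance between x and y, so thresholding it cannot dismiss a qualifying series. Four-value series are used instead of a full year to keep the example short.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average(series):
    return sum(series) / len(series)

def average_distance(x, y):
    """sqrt(n) * |avg(x) - avg(y)|, a value that never exceeds euclidean(x, y)."""
    return math.sqrt(len(x)) * abs(average(x) - average(y))

rising = [10, 11, 12, 13]
falling = [13, 12, 11, 10]   # same average as `rising`, opposite trend
far = [50, 51, 52, 53]       # very different average

# Same average: the cheap test cannot tell these apart (a potential false alarm).
same_avg_feature = average_distance(rising, falling)
same_avg_true = euclidean(rising, falling)

# Large average difference: the series cannot be similar.
far_feature = average_distance(rising, far)
far_true = euclidean(rising, far)
```

With a tolerance of 1, `falling` passes the average test against `rising` even though the two series are not within that distance, which is exactly the false alarm the slide describes; `far` is rejected by the average alone.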
Distance preservation by the mapping • Exact preservation • No false alarms, no false dismissals • Difficult to find such features • Dimensionality curse • The lower-bounding lemma: no false dismissals if Dfeature(F(O1), F(O2)) ≤ D(O1, O2) for every pair of objects • False alarms remain possible
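The reasoning behind the lemma fits in one line. Writing ε for the query tolerance (a symbol assumed here for the range-query radius), any object O that truly qualifies also passes the feature-space test:

```latex
D(Q, O) \le \varepsilon
\quad\Longrightarrow\quad
D_{\text{feature}}\bigl(F(Q), F(O)\bigr) \;\le\; D(Q, O) \;\le\; \varepsilon
```

The converse does not hold: an object can be within ε in feature space while its true distance is larger, which is why false alarms remain and a post-processing step is still needed.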
Feature selection • Preserve distance • Carry much information about the corresponding objects to reduce false alarms • Nearest neighbor query • Find the point F(P) that is the nearest neighbor of the query point F(Q) • Issue a range query with Q and ε = D(Q, P)
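The nearest-neighbor procedure can be sketched as below. Assumptions made for the example: the hypothetical `prefix_feature` map (a lower-bounding feature for Euclidean distance), linear scans in place of a SAM, and a final step that picks the closest object among the range-query results using the true distance.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def prefix_feature(obj, f=2):
    """Hypothetical lower-bounding feature map F: the first f values."""
    return obj[:f]

def nearest_neighbor(objects, query, feature=prefix_feature):
    q_feat = feature(query)
    # Step 1: find P, the object whose feature point is nearest to F(Q).
    p = min(objects, key=lambda o: euclidean(feature(o), q_feat))
    # Step 2: the true distance to P becomes the range-query radius.
    eps = euclidean(p, query)
    # Step 3: range query in feature space; by the lower-bounding lemma
    # the true nearest neighbor cannot be excluded.
    candidates = [o for o in objects if euclidean(feature(o), q_feat) <= eps]
    # Step 4: resolve the candidates with the true distance.
    return min(candidates, key=lambda o: euclidean(o, query))

objects = [(1, 2, 9, 9), (2, 3, 3, 5), (8, 8, 8, 8)]
query = (1, 2, 3, 5)
best = nearest_neighbor(objects, query)
```

In this data the feature-space nearest neighbor is `(1, 2, 9, 9)`, but the range query it triggers also retrieves `(2, 3, 3, 5)`, which turns out to be the true nearest neighbor.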
One-dimensional time series • The first day’s value is a bad feature • 365 values → dimensionality curse • Average is better • Discrete Fourier Transform • For a signal x = [xi], i = 0, 1, …, n−1, let XF denote the DFT coefficient at the F-th frequency, F = 0, 1, …, n−1 • Keep the first f coefficients of the DFT as the features
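A minimal pure-Python sketch of DFT-based feature extraction. The 1/sqrt(n) normalization is an assumption made here; with it the transform preserves Euclidean distance (Parseval's theorem), so the distance over the first f coefficients can only underestimate the true distance. The direct O(n²) sum is used for clarity rather than an FFT.

```python
import cmath
import math

def dft(x):
    """Unitary DFT: X_F = (1/sqrt(n)) * sum_i x_i * exp(-2*pi*j*F*i/n)."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * F * i / n) for i in range(n)) / math.sqrt(n)
        for F in range(n)
    ]

def dft_features(x, f):
    """Keep the first f DFT coefficients as the feature vector."""
    return dft(x)[:f]

def complex_distance(a, b):
    """Euclidean distance between two vectors of complex numbers."""
    return math.sqrt(sum(abs(ai - bi) ** 2 for ai, bi in zip(a, b)))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 2, 4, 4, 6, 6, 8, 9]
feature_dist = complex_distance(dft_features(x, 2), dft_features(y, 2))
true_dist = euclidean(x, y)
```

Under this normalization the coefficient at frequency 0 equals sqrt(n) times the average, so the "average" feature from the stock-price example is the special case f = 1.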
The fewer the coefficients that contain most of the energy, the fewer the false alarms and the faster the response time • The dimensionality curse is avoided thanks to the lower-bounding lemma and the energy-concentrating property of the DFT • f = 1 to 3 coefficients
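One way to inspect energy concentration is to measure what share of a signal's energy falls in its first few DFT coefficients. The sketch below does this for a synthetic series; the random-walk model, the starting value of 100, and the seed are all assumptions made for illustration, not data from the chapter.

```python
import cmath
import math
import random

def dft(x):
    """Unitary DFT computed by the direct O(n^2) sum."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * F * i / n) for i in range(n)) / math.sqrt(n)
        for F in range(n)
    ]

def energy_fractions(x, max_f):
    """Share of total energy held by the first f coefficients, for f = 1..max_f."""
    energies = [abs(c) ** 2 for c in dft(x)]
    total = sum(energies)
    return [sum(energies[:f]) / total for f in range(1, max_f + 1)]

# Synthetic "yearly" series: a random walk of 365 values (assumed model).
random.seed(0)
walk = [100.0]
for _ in range(364):
    walk.append(walk[-1] + random.uniform(-1.0, 1.0))

fractions = energy_fractions(walk, 3)
```

The closer these fractions are to 1, the less distance information is lost by truncating to f coefficients, and the fewer false alarms the feature-space test should produce.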