
Approximate Nearest Neighbor - Applications to Vision & Matching






Presentation Transcript


  1. Approximate Nearest Neighbor - Applications to Vision & Matching Lior Shoval Rafi Haddad

  2. Approximate Nearest Neighbor - Applications to Vision & Matching • Object matching in 3D • Recognizing cars in cluttered scanned images • A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik • Video Google • A Text Retrieval Approach to Object Matching in Videos • Sivic, J. and Zisserman, A.

  3. Object Matching • Input: an object and a dataset of models • Output: the most “similar” model • Two methods will be presented • Voting-based method • Cost-based method [Figure: query object Sq compared against models S1, S2, …, Sn]

  4. Descriptor-based object matching - Voting • Every object descriptor votes for the model that gave the closest descriptor • Choose the model with the most votes • Problem • The hard vote discards the relative distances between descriptors [Figure: query object Sq voting among models S1, S2, …, Sn]
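The hard-voting scheme above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Euclidean distance between fixed-length descriptor vectors, and the function name and data layout are hypothetical.

```python
import numpy as np

def vote_match(object_descs, model_descs_list):
    """Hard-voting matcher sketch: each object descriptor casts one
    vote for the model containing its single nearest reference
    descriptor; the model with the most votes wins."""
    votes = np.zeros(len(model_descs_list), dtype=int)
    for q in object_descs:
        best_model, best_dist = -1, np.inf
        for m, descs in enumerate(model_descs_list):
            # nearest descriptor of model m to the query descriptor q
            d = np.min(np.linalg.norm(descs - q, axis=1))
            if d < best_dist:
                best_model, best_dist = m, d
        votes[best_model] += 1
    return int(np.argmax(votes))
```

Note how the vote is "hard": a descriptor that barely prefers one model counts exactly as much as one that matches it perfectly, which is the weakness the slide points out.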

  5. Descriptor-based object matching - Cost • Compare all object descriptors to all target model descriptors [Figure: query object Sq compared against models S1, S2, …, Sn]
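The cost-based alternative can be sketched the same way. Again this is a hypothetical illustration: a model's cost here is simply the sum of each object descriptor's distance to its nearest descriptor in that model, which preserves the relative distances the hard vote throws away.

```python
import numpy as np

def cost_match(object_descs, model_descs_list):
    """Cost-based matcher sketch: lowest total nearest-neighbor
    distance over all object descriptors wins."""
    costs = []
    for descs in model_descs_list:
        total = 0.0
        for q in object_descs:
            total += np.min(np.linalg.norm(descs - q, axis=1))
        costs.append(total)
    return int(np.argmin(costs))
```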

  6. Application to cars matching

  7. Matching - Nearest Neighbor • In order to match the object to the right model, an NN algorithm is implemented • Every descriptor in the object is compared to all descriptors in the model • The computational cost is very high

  8. Experiment 1 – Model matching

  9. Experiment 2 – Clutter scenes

  10. Matching - Nearest Neighbor • E.g.: • Q - 160 descriptors in the object • N - 83,640 [ref. desc.] X 12 [rotations] ~ 1E6 descriptors in the models • Exact NN takes 7.4 sec on a 2.2 GHz processor per single object descriptor

  11. Speeding search with LSH • Fast search techniques such as LSH (locality-sensitive hashing) can reduce the search space by an order of magnitude • Tradeoff between speed and accuracy • LSH - dividing the high-dimensional feature space into hypercubes, divided by a set of k randomly-chosen axis-parallel hyperplanes, with l different sets of hypercubes
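The hypercube bucketing described above can be sketched as follows. This is a toy illustration of the idea (k random axis-parallel thresholds per table, l independent tables), with names and details that are assumptions, not the authors' implementation.

```python
import numpy as np

def build_lsh(points, k=4, l=3, seed=0):
    """Build l hash tables; each hashes a point to a k-bit bucket id
    by thresholding k randomly chosen coordinates (axis-parallel
    hyperplanes)."""
    rng = np.random.default_rng(seed)
    dim = points.shape[1]
    tables = []
    for _ in range(l):
        axes = rng.integers(0, dim, size=k)           # coordinate cut by each hyperplane
        thresholds = rng.uniform(points.min(), points.max(), size=k)
        buckets = {}
        for idx, p in enumerate(points):
            key = tuple((p[axes] > thresholds).astype(int))
            buckets.setdefault(key, []).append(idx)
        tables.append((axes, thresholds, buckets))
    return tables

def lsh_candidates(tables, q):
    """Candidate set for query q: the union of its buckets across
    all l tables; only these candidates are checked exactly."""
    cands = set()
    for axes, thresholds, buckets in tables:
        key = tuple((q[axes] > thresholds).astype(int))
        cands.update(buckets.get(key, []))
    return cands
```

Increasing l raises the chance that a true near neighbor shares at least one bucket with the query, at the price of checking more candidates; this is the speed/accuracy tradeoff on the slide.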

  12. LSH – k=4; l=1

  13. LSH – k=4; l=2

  14. LSH – k=4; l=3

  15. LSH - Results • Taking the best 80/160 descriptors • Achieving close results with fewer descriptors

  16. Descriptor based Object matching – Reducing Complexity • Approximate nearest neighbor • Dividing the problem into two stages • Preprocessing • Querying • Locality-Sensitive Hashing (LSH) • Or...

  17. Video Google • A Text Retrieval Approach to Object Matching in Videos

  18. Query Results

  19. Interesting facts on Google The most used search engine on the web

  20. Who wants to be a Millionaire?

  21. How many pages does Google search? a. Around half a billion b. Around 4 billion c. Around 10 billion d. Around 50 billion

  22. How many machines does Google use? a. 10 b. A few hundred c. A few thousand d. Around a million

  23. Video Google: On-line Demo Samples • Run Lola Run: Supermarket logo (Bolle) - frame/shot 72325 / 824 • Red cube logo - entry frame/shot 15626 / 174 • Roulette #20 - frame/shot 94951 / 988 • Groundhog Day: Bill Murray's ties - frame/shot 53001 / 294, frame/shot 40576 / 208 • Phil's home - entry frame/shot 34726 / 172

  24. Query

  25. Occluded !!!

  26. Video Google • Text Google • Analogy from text to video • Video Google processes • Experimental results • Summary and analysis

  27. Text retrieval overview • Word & Document • Vocabulary • Weighting • Inverted file • Ranking

  28. Words & Documents • Documents are parsed into words • Common words are ignored (the, an, etc.) • This is called a ‘stop list’ • Words are represented by their stems • ‘walk’, ‘walking’, ‘walks’ → ‘walk’ • Each word is assigned a unique identifier • A document is represented by a vector with components given by the frequency of occurrence of the words it contains
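The parsing pipeline above (stop-list removal plus stemming) might look like this in outline. This is a toy sketch: a real system would use a proper stemmer such as Porter's, and the stop list here is a tiny illustrative subset.

```python
# Minimal parsing sketch: stop-list removal plus a toy
# suffix-stripping "stemmer" (illustrative only).
STOP_WORDS = {"the", "an", "a", "and", "are", "that", "to", "be", "in", "for"}

def toy_stem(word):
    """Strip a few common suffixes, e.g. 'walking' -> 'walk'."""
    for suffix in ("ation", "ion", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def parse(text):
    """Tokenize, drop stop words, and stem the remainder."""
    words = [w.strip(".,").lower() for w in text.split()]
    return [toy_stem(w) for w in words if w and w not in STOP_WORDS]
```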

  29. Vocabulary • The vocabulary contains K words • Each document is represented by a K-component vector of word frequencies (0,0, … 3,… 4,…. 5, 0,0)

  30. Example: “…… Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories …….”

  31. Parse and clean • Original: “Representation, detection and learning are the main issues that need to be tackled in designing a visual system for recognizing object categories.” • After stop-list removal and stemming: represent detect learn main issue tackle design visual system recognize category …

  32. Creating a document vector • Assign a unique ID to each word • Create a document vector of size K with word frequencies: • (3,7,2,………)/789 • Or compactly, with the original order and position

  33. Weighting • The vector components are weighted in various ways: • Naive - frequency of each word • Binary - 1 if the word appears, 0 if not • tf-idf - ‘Term Frequency - Inverse Document Frequency’

  34. tf-idf Weighting • n_id - number of occurrences of word i in document d • n_d - total number of words in the document • N - the number of documents in the whole database • n_i - the number of occurrences of term i in the whole database • t_i = (n_id / n_d) log(N / n_i) => “word frequency” X “inverse document frequency” • Dividing by n_d normalizes for length, so long and short documents are weighted equally
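Under the standard tf-idf form this slide describes (word frequency within the document times log inverse document frequency), the weight of a word is a one-liner. The function name and argument names are illustrative.

```python
import math

def tfidf(n_id, n_d, N, n_i):
    """tf-idf weight: word frequency (n_id / n_d) times
    inverse document frequency log(N / n_i)."""
    return (n_id / n_d) * math.log(N / n_i)
```

A word occurring in every document gets log(N/N) = 0: it carries no discriminative weight, no matter how often it appears.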

  35. Inverted File - Index • Crawling stage • Parsing all documents to create document-representing vectors • Creating word indices • An entry for each word in the corpus, followed by a list of all documents (and positions in them) where it occurs
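A minimal inverted file along these lines can be sketched as a dictionary from word to (document id, position) pairs. This is an illustrative sketch, not a production index structure.

```python
def build_inverted_index(docs):
    """Inverted file sketch: one entry per word, listing every
    (document id, position) where that word occurs.
    `docs` is a list of documents, each a list of (stemmed) words."""
    index = {}
    for doc_id, words in enumerate(docs):
        for pos, w in enumerate(words):
            index.setdefault(w, []).append((doc_id, pos))
    return index
```

At query time, only the posting lists of the query's words need to be touched, which is what makes retrieval fast even over a large corpus.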

  36. Querying • Parsing the query to create a query vector • Query: “Representation learning” → query doc ID = (1,0,1,0,0,…) • Retrieve all document IDs containing one of the query word IDs (using the inverted file index) • Calculate the distance between the query and document vectors (angle between vectors) • Rank the results
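Ranking by the angle between vectors is usually done with cosine similarity. A minimal sketch (real systems work on sparse, tf-idf-weighted vectors rather than dense lists):

```python
import math

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted by decreasing cosine
    similarity to the query (larger cosine = smaller angle)."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    scores = [(cos(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]
```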

  37. Ranking the query results • Page Rank (PR) • Assume page A has pages T1, T2 … Tn linking to it • Define C(X) as the number of links in page X • d is a weighting factor (0 ≤ d ≤ 1) • PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) • Word order • Font size, font type and more
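The PR recurrence can be iterated to a fixed point. This is a power-iteration sketch; the dictionary-of-outlinks representation is an assumption for illustration, and it assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iters=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T
    linking to A, where C(T) = number of links in page T.
    `links[x]` is the list of pages that page x links to."""
    pr = {p: 1.0 for p in links}
    for _ in range(iters):
        new = {}
        for p in links:
            incoming = [q for q in links if p in links[q]]
            new[p] = (1 - d) + d * sum(pr[q] / len(links[q]) for q in incoming)
        pr = new
    return pr
```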

  38. The Visual Analogy • Text → Visual • Word → ??? • Stem → ??? • Document → Frame • Corpus → Film

  39. Detecting “Visual Words” • “Visual word” → descriptor • What is a good descriptor? • Invariant to different viewpoints, scale, illumination, shift and transformation • Local versus global • How to build such a descriptor? • Finding invariant regions in the frame • Representation by a descriptor

  40. Finding invariant regions • Two types of ‘viewpoint covariant regions’ are computed for each frame • SA - Shape Adapted • MS - Maximally Stable

  41. SA - Shape Adapted • Finding interest points using the Harris corner detector • Iteratively determining the ellipse center, scale and shape around the interest point • Reference - Baumberg

  42. MS - Maximally Stable • Intensity watershed image segmentation • Iteratively determining the ellipse center, scale and shape • Reference - Matas

  43. Why two types of detectors? • They are complementary representations of a frame • SA regions tend to be centered at corner-like features • MS regions correspond to blobs of high contrast (such as a dark window on a gray wall) • Each detector describes a different “vocabulary” (e.g. the building design and the building specification)

  44. MS - SA example • MS - yellow • SA - cyan • Zoom
