Visual Grouping and Recognition


Presentation Transcript


  1. Visual Grouping and Recognition Jitendra Malik University of California at Berkeley

  2. Collaborators • Grouping: Jianbo Shi (CMU), Serge Belongie, Thomas Leung (Compaq CRL) • Ecological Statistics: Charless Fowlkes, David Martin, Xiaofeng Ren • Recognition: Serge Belongie, Jan Puzicha

  3. From images to objects • Labeled sets: tiger, grass, etc.

  4. What enables us to parse a scene? • Low-level cues • Color/texture • Contours • Motion • Mid-level cues • T-junctions • Convexity • High-level cues • Familiar object • Familiar motion

  5. Grouping factors

  6. But is segmentation a meaningful problem? • Difficult to define formally, but humans are remarkably consistent…

  7. Human Segmentations (1)

  8. Human Segmentations (2)

  9. Consistency [Figure: three human segmentations A, B, C of a bird image, and the segmentation tree: Image → BG, L-bird, R-bird, bush, far grass; each bird → body, beak, eye, head] • Perceptual organization forms a tree. Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization). • A, C are refinements of B • A, C are mutual refinements • A, B, C represent the same percept • Attention accounts for differences

  10. Ecological Statistics of Image Segmentation • Measure the conditional probability distributions of various grouping cues in human-segmented images (Brunswik 1950) • Design an algorithm for incorporating multiple cues into image segmentation
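
A minimal sketch of how such conditional statistics could be tabulated, assuming the human segmentations are available as integer label maps and taking brightness difference as the example cue; the sampling scheme, bin edges, and function name are illustrative rather than the authors' actual code:

```python
import numpy as np

def cue_statistics(label_map, brightness, max_dist=20, n_bins=32, n_pairs=100_000):
    """Estimate P(same human segment | brightness difference) from sampled
    pairs of nearby pixels in one human-segmented image (illustrative sketch)."""
    h, w = label_map.shape
    rng = np.random.default_rng(0)
    # Sample pixel pairs no farther than max_dist apart (in x and y).
    y1 = rng.integers(0, h, n_pairs)
    x1 = rng.integers(0, w, n_pairs)
    y2 = np.clip(y1 + rng.integers(-max_dist, max_dist + 1, n_pairs), 0, h - 1)
    x2 = np.clip(x1 + rng.integers(-max_dist, max_dist + 1, n_pairs), 0, w - 1)

    same = label_map[y1, x1] == label_map[y2, x2]          # same human segment?
    cue = np.abs(brightness[y1, x1] - brightness[y2, x2])  # the grouping cue

    edges = np.linspace(0.0, cue.max() + 1e-9, n_bins + 1)
    idx = np.digitize(cue, edges) - 1
    p_same = np.array([same[idx == b].mean() if np.any(idx == b) else np.nan
                       for b in range(n_bins)])
    return edges, p_same   # empirical P(same segment | cue falls in each bin)
```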

  11. Proximity

  12. Similarity of brightness (cf. Coughlan & Yuille, Geman & Jedynak)

  13. Convexity

  14. Region Area • Compare to Alvarez, Gousseau, Morel • Power-law fit: y = K x^(-γ), with γ = 1.008

  15. Lengths of curves

  16. Image Segmentation as Graph Partitioning • Build a weighted graph G = (V, E) from the image: V = image pixels, E = connections between pairs of nearby pixels • Partition the graph so that similarity within a group is large and similarity between groups is small -- Normalized Cuts [Shi & Malik 97]
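
A rough sketch of this construction (not the released Normalized Cuts code): each pixel becomes a node, and edge weights decay with both intensity difference and spatial distance; the radius and sigma values are illustrative choices:

```python
import numpy as np
from scipy.sparse import lil_matrix

def pixel_affinity(img, radius=5, sigma_i=0.1, sigma_x=4.0):
    """Sparse affinity matrix W for a grayscale image (values in [0, 1]):
    W[i, j] combines intensity similarity and spatial proximity, and is
    zero for pixels more than `radius` apart."""
    h, w = img.shape
    W = lil_matrix((h * w, h * w))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    d2 = dy * dy + dx * dx
                    if 0 <= yy < h and 0 <= xx < w and d2 <= radius * radius:
                        di = img[y, x] - img[yy, xx]
                        W[i, yy * w + xx] = (np.exp(-di * di / sigma_i ** 2) *
                                             np.exp(-d2 / sigma_x ** 2))
    return W.tocsr()
```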

  17. Normalized Cut, a measure of dissimilarity • Minimum cut is not appropriate since it favors cutting off small pieces. • Normalized Cut: Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut(A, B) is the total edge weight between the two groups and assoc(A, V) is the total edge weight from group A to all nodes.

  18. Normalized Cut as a Generalized Eigenvalue Problem • After simplification, we get the generalized eigenvalue problem (D - W) y = λ D y, where W is the affinity matrix and D is the diagonal matrix of its row sums; the eigenvector with the second-smallest eigenvalue yields the partition.
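
A minimal dense sketch of that computation: form the degree matrix D and Laplacian D - W, solve the generalized eigenproblem, and split on the second-smallest eigenvector (the median split below is a simplification of the thresholding discussed in the paper):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Two-way split from the generalized eigenproblem (D - W) y = lambda D y."""
    W = np.asarray(W.todense()) if hasattr(W, "todense") else np.asarray(W)
    d = W.sum(axis=1)
    D = np.diag(d)
    vals, vecs = eigh(D - W, D)     # eigenvalues returned in ascending order
    y = vecs[:, 1]                  # second-smallest generalized eigenvector
    return y > np.median(y)         # boolean segment labels for each pixel
```

In the Normalized Cuts framework the resulting pieces can then be re-partitioned recursively (or several eigenvectors used at once) to obtain more than two segments.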

  19. Cue-Integration for Image Segmentation [Malik, Belongie, Shi, Leung 1999]

  20. On image segmentation.. • Humans are quite consistent, so model the goal as emulating their behavior. • The ecological statistics of grouping cues can be learned from image data. • We now have a generic image segmentation algorithm (code available) that can be applied to MPEG-4/7 compression and object recognition.

  21. Framework for Recognition • (1) Segmentation: Pixels → Segments • (2) Association: Segments → Regions • (3) Matching: Regions → Prototypes (~10 views/object). Matching is tolerant to pose/illumination changes, intra-category variation, and errors in the previous steps. • Over-segmentation necessary; under-segmentation fatal • Enumerate: the number of size-k regions in an image with n segments is roughly 4^k · n / k
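
For concreteness, the enumeration estimate quoted on the slide can be evaluated directly; the example numbers below are illustrative, not from the talk:

```python
def approx_region_count(n_segments: int, k: int) -> float:
    """Slide's rough estimate of the number of size-k regions that can be
    assembled from an over-segmentation into n segments: ~4^k * n / k."""
    return (4 ** k) * n_segments / k

# e.g. an image over-segmented into 40 segments, regions of up to 3 segments:
candidates = sum(approx_region_count(40, k) for k in range(1, 4))
```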

  22. Matching regions to views • GOAL: obtain small misclassification error using few views • Matching allowing deformations of prototype views makes this possible

  23. Matching with original and deformed prototypes [Figure columns: Prototype, Test, Error]

  24. Deforming Biological Shapes • D’Arcy Thompson: On Growth and Form, 1917 • studied transformations between shapes of organisms

  25. ... [Figure: model and target shapes] • Find correspondences between points on the shapes • Estimate the transformation • Measure similarity

  26. Finding correspondences between shapes • Each shape is represented by a set of sample points • Each sample point has a descriptor – the shape context • Define a cost Cij for matching point i on the first shape with point j on the second shape • Solve for the correspondence as an optimal assignment

  27. Shape Context • Compact representation of the distribution of points relative to each point • Count the number of points inside each bin (e.g., Count = 4, ..., Count = 10 in the figure)
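
A minimal sketch of the descriptor: for every sample point, histogram the relative positions of all other points in log-polar bins (the bin counts and radii here are illustrative choices, not necessarily the parameters used in the talk):

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Shape context at each sample point of a shape (points: (N, 2) array):
    a log-polar histogram of where the other N-1 points fall."""
    diff = points[None, :, :] - points[:, None, :]      # pairwise offsets
    dist = np.hypot(diff[..., 0], diff[..., 1])
    dist = dist / dist[dist > 0].mean()                  # normalize for scale
    theta = np.arctan2(diff[..., 1], diff[..., 0])       # angles in [-pi, pi]

    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.digitize(dist, r_edges) - 1               # -1: too close, n_r: too far
    t_bin = np.floor((theta + np.pi) / (2 * np.pi / n_theta)).astype(int) % n_theta

    n = len(points)
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i != j and 0 <= r_bin[i, j] < n_r:
                hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return hists
```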

  28. Comparing Shape Contexts • Compute matching costs with the chi-squared test statistic: Cij = ½ Σk [hi(k) - hj(k)]² / [hi(k) + hj(k)], where hi and hj are the normalized shape-context histograms of the two points • Recover correspondences by solving the linear assignment problem with costs Cij [Jonker & Volgenant 1987]
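
A sketch of that matching step, using SciPy's linear assignment solver in place of the Jonker & Volgenant implementation cited on the slide; the histograms are assumed to come from a routine like the shape-context sketch above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_shapes(h1, h2, eps=1e-10):
    """Chi-squared cost between all pairs of shape-context histograms,
    then one-to-one correspondences via the linear assignment problem."""
    p = h1 / (h1.sum(axis=1, keepdims=True) + eps)   # normalize histograms
    q = h2 / (h2.sum(axis=1, keepdims=True) + eps)
    # C[i, j] = 0.5 * sum_k (p_ik - q_jk)^2 / (p_ik + q_jk)
    C = 0.5 * (((p[:, None, :] - q[None, :, :]) ** 2) /
               (p[:, None, :] + q[None, :, :] + eps)).sum(axis=-1)
    rows, cols = linear_sum_assignment(C)            # optimal correspondence
    return rows, cols, C[rows, cols].sum()           # matched pairs + total cost
```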

  29. Matching Example [Figure: model and target shapes]

  30. Synthetic Test Results [Plots: fish shapes under deformation + noise and deformation + outliers, comparing ICP, Shape Context, and RPM]

  31. Measuring Shape Similarity • Image appearance around matched points • color or gray-level window • orientation • Shape context differences at matched points • Bending Energy

  32. COIL Object Database

  33. Editing: Prototypes • Human Shape Perception • Computational Needs for K-NN

  34. Prototype Selection: Coil-20

  35. MNIST Handwritten Digits

  36. Handwritten Digit Recognition • MNIST 600,000 (with distortions): LeNet-5 0.8%; SVM 0.8%; boosted LeNet-4 0.7% • MNIST 60,000: linear 12.0%; 40 PCA + quadratic 3.3%; 1,000 RBF + linear 3.6%; K-NN 5%; K-NN (deskewed) 2.4%; K-NN (tangent distance) 1.1%; SVM 1.1%; LeNet-5 0.95% • MNIST 20,000: K-NN with shape context matching 0.63%

  38. Results: Digit Recognition • 1-NN classifier using the combined distance: shape context + 0.3 × bending + 1.6 × image appearance
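
A sketch of that classification rule, assuming the three per-pair distance terms (shape-context distance, bending energy, and image-appearance difference) have already been computed as test-by-train matrices; the 0.3 and 1.6 weights are the ones quoted on the slide:

```python
import numpy as np

def one_nn_digits(d_shape, d_bend, d_app, train_labels, w_bend=0.3, w_app=1.6):
    """1-NN digit classification with the slide's weighted distance:
    shape context + 0.3 * bending energy + 1.6 * image appearance.
    Each d_* is an (n_test, n_train) matrix of distances to the prototypes."""
    d_total = d_shape + w_bend * d_bend + w_app * d_app
    nearest = np.argmin(d_total, axis=1)     # index of the closest training digit
    return np.asarray(train_labels)[nearest]
```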

  39. Results: Digit Recognition (Detail)

  40. Trademark Similarity

  41. Future work.. • Indexing based on color/texture/shape features before correspondence matching • Integrate segmentation and recognition

  42. Computing cost on a Pentium PC • Segmentation: 2 minutes/image (200×100) • Matching: 0.2 sec/match (100 points)

  43. Given a 10^4 speedup.. • ~5K object categories/sec (at 0.2 sec/match and ~10 views/object, matching one object takes about 2 sec today; a 10^4 speedup gives roughly 5,000 objects/sec) • Humans can recognize 10K-100K objects, so we could be in the ballpark of human-level vision by 2020.
