
Learning Local Affine Representations for Texture and Object Recognition



Presentation Transcript


  1. Learning Local Affine Representations for Texture and Object Recognition Svetlana Lazebnik Beckman Institute, University of Illinois at Urbana-Champaign (joint work with Cordelia Schmid, Jean Ponce)

  2. Overview • Goal: • Recognition of 3D textured surfaces, object classes • Our contribution: • Texture and object representations based on local affine regions • Advantages of proposed approach: • Distinctive, repeatable primitives • Robustness to clutter and occlusion • Ability to approximate 3D geometric transformations

  3. The Scope • Recognition of single-texture images (CVPR 2003) • Recognition of individual texture regions in multi-texture images (ICCV 2003) • Recognition of object classes (BMVC 2004, work in progress)

  4. 1. Recognition of Single-Texture Images

  5. Affine Region Detectors Harris detector (H) Laplacian detector (L) Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)

  6. Affine Rectification Process Patch 1 Patch 2 Rectified patches (rotational ambiguity)
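The rectification step above can be sketched in a few lines of numpy: an affinely adapted region with shape (second-moment) matrix M is normalized by the matrix square root of M, which maps its ellipse onto the unit circle and leaves only a rotational ambiguity. The matrix M below is illustrative, not from the talk.

```python
import numpy as np

def rectifying_transform(M):
    # Rectification maps the ellipse {x : x^T M x = 1} to the unit circle.
    # The transform is A = M^{1/2}; it is defined only up to a residual
    # rotation, which is why rotation-invariant descriptors come next.
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

M = np.array([[4.0, 0.0], [0.0, 1.0]])   # illustrative elongated region
A = rectifying_transform(M)
x = np.array([0.5, 0.0])                 # satisfies x^T M x = 1
y = A @ x                                # lands on the unit circle
```

The residual rotation is resolved not by normalizing the patch further but by making the descriptors themselves rotation-invariant, as the next two slides describe.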

  7. Rotation-Invariant Descriptors 1: Spin Images • Based on range spin images (Johnson & Hebert 1998) • Two-dimensional histogram: distance from center × intensity value
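A spin image as described above is a two-dimensional histogram over (distance from the patch center, intensity). A minimal sketch with hard binning (the talk may use soft binning; bin counts here are arbitrary):

```python
import numpy as np

def spin_image(patch, n_dist=5, n_int=5):
    # 2D histogram: normalized distance from center x intensity value.
    # Depends only on radial distance, hence rotation-invariant.
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    d = np.hypot(yy - cy, xx - cx)
    d = d / d.max()                      # distances normalized to [0, 1]
    hist, _, _ = np.histogram2d(d.ravel(), patch.ravel(),
                                bins=[n_dist, n_int],
                                range=[[0, 1], [0, 1]])
    return hist / hist.sum()             # normalize to a distribution

patch = np.random.default_rng(0).random((32, 32))  # stand-in for a patch
desc = spin_image(patch)
```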

  8. Rotation-Invariant Descriptors 2: RIFT • Based on SIFT (Lowe 1999) • Two-dimensional histogram: distance from center × gradient orientation • Gradient orientation is measured w.r.t. the direction pointing from the center of the patch
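Measuring each gradient orientation relative to the outward radial direction is what makes RIFT rotation-invariant. A hedged sketch (bin counts and the gradient-magnitude weighting are illustrative choices, not taken from the talk):

```python
import numpy as np

def rift(patch, n_dist=4, n_orient=8):
    # 2D histogram: distance from center x gradient orientation,
    # with orientation taken relative to the radial direction.
    gy, gx = np.gradient(patch.astype(float))
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    d = np.hypot(yy - cy, xx - cx)
    d = d / d.max()
    grad_angle = np.arctan2(gy, gx)
    radial_angle = np.arctan2(yy - cy, xx - cx)
    rel = np.mod(grad_angle - radial_angle, 2 * np.pi)
    mag = np.hypot(gx, gy)               # weight votes by gradient strength
    hist, _, _ = np.histogram2d(d.ravel(), rel.ravel(),
                                bins=[n_dist, n_orient],
                                range=[[0, 1], [0, 2 * np.pi]],
                                weights=mag.ravel())
    return hist / max(hist.sum(), 1e-12)

desc = rift(np.random.default_rng(1).random((32, 32)))
```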

  9. Signatures and EMD • Signatures: S = {(m1, w1), …, (mk, wk)}, where mi is a cluster center and wi its relative weight • Earth Mover’s Distance (Rubner et al. 1998) • Computed from ground distances d(mi, m'j) • Can compare signatures of different sizes • Insensitive to the number of clusters
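The EMD between two signatures is the optimum of a transportation linear program over the ground distances d(mi, m'j). A sketch using `scipy.optimize.linprog` (Rubner et al. use a dedicated solver; the Euclidean ground distance and unit-mass signatures below are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def emd(w1, m1, w2, m2):
    # Ground distances d(mi, m'j) between cluster centers
    d = np.linalg.norm(m1[:, None, :] - m2[None, :, :], axis=2)
    k1, k2 = len(w1), len(w2)
    # Transportation LP: flow f[i, j] >= 0, row sums w1, column sums w2.
    # Signatures of different sizes are handled naturally (k1 != k2 is fine).
    A_eq = []
    for i in range(k1):
        row = np.zeros((k1, k2)); row[i, :] = 1.0; A_eq.append(row.ravel())
    for j in range(k2):
        col = np.zeros((k1, k2)); col[:, j] = 1.0; A_eq.append(col.ravel())
    res = linprog(d.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([w1, w2]), bounds=(0, None))
    return res.fun

# Two one-cluster signatures whose centers are 3 apart
m1, w1 = np.array([[0.0, 0.0]]), np.array([1.0])
m2, w2 = np.array([[3.0, 0.0]]), np.array([1.0])
dist = emd(w1, m1, w2, m2)
```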

  10. Database: Textured Surfaces 25 textures, 40 sample images each (640x480)

  11. Evaluation • Channels: HS, HR, LS, LR • Combined through addition of EMD matrices • Classification results • 10 training images per class, rates averaged over 200 random training subsets
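The channel combination on this slide amounts to summing the per-channel EMD matrices and classifying each test image by its nearest training neighbor in the combined distance. A toy sketch (the matrices and labels below are synthetic, not the talk's data):

```python
import numpy as np

# Hypothetical EMD distance matrices between 4 test and 3 training images,
# one per channel (standing in for HS, HR, LS, LR).
rng = np.random.default_rng(2)
channels = [rng.random((4, 3)) for _ in range(4)]

combined = np.sum(channels, axis=0)          # combine by addition
train_labels = np.array([0, 1, 2])
pred = train_labels[np.argmin(combined, axis=1)]  # nearest-neighbor labels
```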

  12. Comparative Evaluation

  13. Results of Evaluation: classification rate vs. number of training samples, for (H+L)(S+R), VZ-Joint, and VZ-MRF • Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set

  14. Summary • A sparse texture representation based on local affine regions • Two novel descriptors (spin images, RIFT) • Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity • A flexible approach to invariance

  15. 2. Recognition of Individual Regions in Multi-Texture Images • A two-layer architecture: • Local appearance + neighborhood relations • Learning: • Represent the local appearance of each texture class using a mixture-of-Gaussians model • Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods • Recognition: • Obtain initial class membership probabilities from the generative model • Use relaxation to refine these probabilities
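The first layer's initial class membership probabilities p(c|x) come from a mixture-of-Gaussians appearance model, as in Bayes' rule over the mixture components. A minimal numpy sketch with illustrative, hand-set parameters (in the talk the parameters would be learned by EM):

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    # Multivariate normal density, evaluated directly for clarity.
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def class_posteriors(x, means, covs, weights):
    # p(c|x) proportional to weight_c * N(x; mean_c, cov_c)
    lik = np.array([w * gaussian_pdf(x, m, c)
                    for m, c, w in zip(means, covs, weights)])
    return lik / lik.sum()

means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]   # two sub-classes
covs = [np.eye(2), np.eye(2)]
weights = [0.5, 0.5]
p = class_posteriors(np.array([0.1, -0.2]), means, covs, weights)
```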

  16. Two Learning Scenarios • Fully supervised: every region in the training image is labeled with its texture class (e.g., brick) • Weakly supervised: each training image is labeled only with the classes occurring in it (e.g., brick, marble, carpet)

  17. Neighborhood Statistics • Estimate: • probability p(c,c') • correlation r(c,c') Neighborhood definition
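Estimating p(c,c') and a correlation-style compatibility r(c,c') from observed neighbor label pairs can be sketched as below. The centered definition of r (co-occurrence minus the product of marginals) is a common choice for relaxation compatibilities; the talk's exact definition may differ.

```python
import numpy as np

def neighborhood_stats(pairs, n_labels):
    # p: joint probability that neighboring regions carry labels (c, c')
    p = np.zeros((n_labels, n_labels))
    for c, cp in pairs:
        p[c, cp] += 1
        p[cp, c] += 1          # the neighbor relation is symmetric
    p /= p.sum()
    marg = p.sum(axis=1)
    # r > 0 where labels co-occur more often than chance, r < 0 otherwise
    r = p - np.outer(marg, marg)
    return p, r

pairs = [(0, 0), (0, 0), (1, 1), (0, 1)]   # toy neighbor label pairs
p, r = neighborhood_stats(pairs, 2)
```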

  18. Relaxation (Rosenfeld et al. 1976) • Iterative process: • Initialized with posterior probabilities p(c|xi) obtained from the generative model • For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c') • Shortcomings: • No formal guarantee of convergence • After the initialization, the updates to the probability values do not depend on the image data
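One Rosenfeld-style update can be sketched as follows: each region's label probabilities are scaled by a support term computed from its neighbors' current probabilities and the compatibilities r(c,c'), then renormalized. The compatibilities and probabilities below are toy values; note that, as the slide says, after initialization the image data no longer enters the update.

```python
import numpy as np

def relaxation_step(P, neighbors, r):
    # P[i, c]: current probability of label c for region i.
    # Support q_i(c) averages r(c, c') * p_j(c') over neighbors j.
    n, k = P.shape
    Q = np.zeros_like(P)
    for i in range(n):
        for j in neighbors[i]:
            Q[i] += r @ P[j]
        Q[i] /= max(len(neighbors[i]), 1)
    new = P * (1 + Q)                       # classic multiplicative update
    return new / new.sum(axis=1, keepdims=True)

# Two mutually neighboring regions; r favors label agreement.
r = np.array([[0.5, -0.5], [-0.5, 0.5]])
P = np.array([[0.6, 0.4], [0.9, 0.1]])
neighbors = [[1], [0]]
P1 = relaxation_step(P, neighbors, r)       # region 0 pulled toward label 0
```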

  19. Experiment 1: 3D Textured Surfaces Single-texture images T1 (brick) T2 (carpet) T3 (chair) T4 (floor 1) T5 (floor 2) T6 (marble) T7 (wood) Multi-texture images 10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images

  20. Effect of Relaxation on Labeling Original image Top: before relaxation, bottom: after relaxation

  21. Retrieval (single-texture training images) T1 (brick) T2 (carpet) T3 (chair) T4 (floor 1) T5 (floor 2) T6 (marble) T7 (wood)

  22. Successful Segmentation Examples

  23. Unsuccessful Segmentation Examples

  24. Experiment 2: Animals • No manual segmentation • Training data: 10 sample images per class • Test data: 20 samples per class + 20 negative images cheetah, background zebra, background giraffe, background

  25. Cheetah Results

  26. Zebra Results

  27. Giraffe Results

  28. Summary Future Work • A two-level representation (local appearance + neighborhood relations) • Weakly supervised learning of texture models • Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty 2001, Kumar & Hebert 2003) • Develop a procedure for weakly supervised learning of random field parameters • Apply method to recognition of natural texture categories

  29. 3. Recognition of Object Classes The approach: • Represent objects using multiple composite semi-local affine parts • More expressive than individual regions • Not globally rigid • Correspondence search is key to learning and detection

  30. Correspondence Search • Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation • Implementation: greedy search based on geometric and photometric consistency constraints • Returns multiple correspondence hypotheses • Automatically determines the number of regions in correspondence • Works on unsegmented, cluttered images (weakly supervised learning)
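The geometric-consistency check inside such a greedy search boils down to fitting a single affine transformation to the candidate region correspondences and inspecting the residuals. A least-squares sketch (the point sets below are synthetic; a real implementation would also use the regions' photometric descriptors):

```python
import numpy as np

def fit_affine(src, dst):
    # Solve [src | 1] T = dst in the least-squares sense; T stacks the
    # 2x2 linear part (transposed) over the translation.
    X = np.hstack([src, np.ones((len(src), 1))])
    T, *_ = np.linalg.lstsq(X, dst, rcond=None)
    resid = np.linalg.norm(X @ T - dst, axis=1)   # per-region consistency
    return T, resid

rng = np.random.default_rng(3)
src = rng.random((6, 2))                          # region centers, image 1
A_true = np.array([[1.2, 0.1], [-0.2, 0.9]])      # hidden affine map
t = np.array([0.3, -0.1])
dst = src @ A_true.T + t                          # centers in image 2
T, resid = fit_affine(src, dst)                   # tiny residuals: consistent
```

A greedy search would grow a correspondence hypothesis by repeatedly adding the region pair whose residual under the current fit stays below a threshold, refitting after each addition.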

  31. Matching: 3D Objects

  32. Matching: 3D Objects closeup closeup

  33. Matching: Faces spurious match ???

  34. Finding Symmetries

  35. Finding Repeated Patterns and Symmetries

  36. Learning Object Models for Recognition • Match multiple pairs of training images to produce a set of candidate parts • Use additional validation images to evaluate repeatability of parts and individual regions • Retain a fixed number of parts having the best repeatability score
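The selection step above (score candidate parts on validation images, keep a fixed number of the best) can be sketched with the relative repeatability score from slide 39, i.e. regions re-detected divided by total part size. All numbers below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
n_parts, n_val = 20, 10
# Regions of each candidate part re-detected in each validation image
detected = rng.integers(0, 30, size=(n_parts, n_val))
part_size = rng.integers(10, 40, size=n_parts)    # regions per part

# Relative repeatability: total regions detected / total part size
repeatability = detected.sum(axis=1) / (part_size * n_val)
keep = np.argsort(repeatability)[::-1][:10]       # retain top 10 parts
```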

  37. Recognition Experiment: Butterflies Admiral Swallowtail Machaon Monarch 1 Monarch 2 Peacock Zebra • 16 training images (8 pairs) per class • 10 validation images per class • 437 test images • 619 images total

  38. Butterfly Parts

  39. Recognition • Top 10 parts per class used for recognition • Relative repeatability score: (total number of regions detected) / (total part size) • Classification results • Total part size (smallest/largest)

  40. Classification Rate vs. Number of Parts

  41. Detection Results (ROC Curves) Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)

  42. Successful Detection Examples Training images Test images (blue: occluded regions) All ellipses found in the test images

  43. Unsuccessful Detection Examples Training images Test images (blue: occluded regions) All ellipses found in the test image

  44. Summary • Semi-local affine parts for describing the structure of 3D objects • Finding a part vocabulary: • Correspondence search between pairs of images • Validation • Additional application: finding symmetry and repetition Future Work • Find a better affine region detector • Represent and learn inter-part relations • Evaluation: CalTech database, harder classes, etc.

  45. Birds Egret Puffin Snowy Owl Mandarin Duck Wood Duck

  46. Birds: Candidate Parts Mandarin Duck Puffin

  47. Objects without Characteristic Texture (LeCun’04)

  48. Summary of Talk • Recognition of single-texture images • Distribution of local appearance descriptors • Recognition of individual regions in multi-texture images • Local appearance + loose statistical neighborhood relations • Recognition of object categories • Local appearance + strong geometric relations For more information: http://www-cvr.ai.uiuc.edu/ponce_grp

  49. Issues, Extensions • Weakly supervised learning • Evaluation methods? • Learning from contaminated data? • Probabilistic vs. geometric approaches to invariance • EM vs. direct correspondence search • Training set size • Background modeling • Strengthening the representation • Heterogeneous local features • Automatic feature selection • Inter-part relations
