Learning Shared Body Plans
This presentation explores how to represent and detect multiple related object categories through joint learning. Rather than relying on independent detectors for each animal category and part, it trains shared body-plan models together with a joint spatial model, so that familiar and unfamiliar objects can be detected and localized with the same representation. Using latent max-margin structured learning and mixed supervision, the authors report improved detection of parts (head, legs) and better generalization to new categories. They conclude that joint training is needed to get the full benefit of the spatial model.
Presentation Transcript
Learning Shared Body Plans. Ian Endres, University of Illinois. Work with Derek Hoiem, Vivek Srikumar, and Ming-Wei Chang.
How should we represent multiple related object categories? We want to detect, localize, and estimate the pose of a broad range of objects, including new ones.
One option: independent detectors. Basic-level categories: Cat Detector, Dog Detector, … Broad categories: 4-Legged Animal Detector. Parts: Head Detector.
Our previous work: train separate detectors, then apply a joint spatial model (Farhadi, Endres, Hoiem 2010). [Figure labels: Wheel, Vehicle, Animal, Four-legged, Mammal, Head, Leg, Can run, Can jump, Facing right, Moves on road]
Jointly trained multi-category models • Train part/category detectors to jointly predict object structure • Only need to perform well in context defined by others • Spatial model encodes likely part positions, number of parts, likely categories, etc. • Generalizes Felzenszwalb et al.: cross-category sharing, multiple parts with one model, variable size
Deformable Part Models From Felzenszwalb et al.
Detection with Deformable Part Models From Felzenszwalb et al.
Shared mixture of deformable parts: Body Plans Include a body plan for background patches: No appearance models, just a bias
Body Plan Overview [Figure: high-scoring detections, object center, head anchors]
Anchor Point Score: HOG-based deformable part model (Felzenszwalb et al.) with a quadratic penalty in position and scale. Sa = bias + appearance score - deformation cost. Overall score must be greater than 0 to be detected.
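The anchor score on this slide can be illustrated with a small sketch. It assumes the deformation cost is a weighted sum of squared offsets in position and log-scale. The `Anchor` class, its field names, and the default weights are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """Expected placement of a part relative to the object center (hypothetical layout)."""
    x: float
    y: float
    log_scale: float
    bias: float
    w_pos: float = 1.0    # assumed weight on squared position offset
    w_scale: float = 1.0  # assumed weight on squared log-scale offset


def anchor_score(anchor, appearance, x, y, log_scale):
    """Sa = bias + appearance score - deformation cost (quadratic penalty)."""
    dx = x - anchor.x
    dy = y - anchor.y
    ds = log_scale - anchor.log_scale
    deformation = anchor.w_pos * (dx * dx + dy * dy) + anchor.w_scale * ds * ds
    return anchor.bias + appearance - deformation


def is_detected(overall_score):
    """The slide states that the overall score must exceed 0 for a detection."""
    return overall_score > 0
```

With a bias of -1 and an appearance score of 3, a part placed one unit from its anchor scores 1 and would pass the threshold, while the same part two units away scores -2 and would not.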
Inference: Head [Figure: candidate anchor positions marked +, selected head marked ✓]
Inference: Leg [Figure sequence: candidate anchor positions marked +, with legs marked ✓ as they are selected one at a time] Search Constraints: Count, Pairwise Exclusion
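The leg inference slides show parts being added one at a time under count and pairwise exclusion constraints. The sketch below is one way such a search could look, assuming a greedy pass over candidates sorted by score and assuming that two detections exclude each other when their centers are closer than `min_dist`. Both the greedy strategy and the distance-based exclusion rule are illustrative assumptions, and `select_parts` is a hypothetical name.

```python
import math


def select_parts(candidates, max_count, min_dist):
    """Greedily pick part detections.

    candidates: list of (score, x, y) tuples.
    max_count:  count constraint, the most parts of this type allowed.
    min_dist:   assumed pairwise exclusion radius between part centers.
    """
    selected = []
    for score, x, y in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Stop once remaining candidates would not raise the total score
        # or the count constraint is reached.
        if score <= 0 or len(selected) >= max_count:
            break
        # Pairwise exclusion: skip candidates too close to a selected part.
        if all(math.hypot(x - sx, y - sy) >= min_dist for _, sx, sy in selected):
            selected.append((score, x, y))
    return selected
```

For example, with candidates at x = 0, 1, 10, and 20 scoring 2.0, 1.5, 1.0, and -0.5, a limit of four legs, and an exclusion radius of 3, the sketch keeps the detections at x = 0 and x = 10.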
Inference Score for each body plan: Overall score for an object hypothesis:
Benefits of Joint Learning Only consider structures with:
Benefits of Joint Learning No structures have
(Latent) Max Margin Structured Learning [Equation labels: loss, highest scoring valid structure, invalid structure, soft margin slack]
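The equation on this slide did not survive in the transcript. A standard latent structured SVM objective that matches the slide's labels is sketched below. The symbols are assumptions: w is the weight vector, Φ the joint feature map, Δ_i the loss, ξ_i the soft margin slack, C the regularization trade-off, and the two sets are the valid and invalid structures for example x_i. The exact form used in the talk may differ.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\qquad \text{s.t.} \qquad
\underbrace{\max_{y \in \mathcal{Y}_{\text{valid}}(x_i)} w^\top \Phi(x_i, y)}_{\text{highest scoring valid structure}}
\;\ge\;
\underbrace{w^\top \Phi(x_i, \hat{y})}_{\text{invalid structure}}
+ \underbrace{\Delta_i(\hat{y})}_{\text{loss}}
- \underbrace{\xi_i}_{\text{slack}}
\qquad \forall i,\ \forall \hat{y} \in \mathcal{Y}_{\text{invalid}}(x_i)
```

The maximization on the left-hand side is what makes the problem latent: the best valid structure for each example depends on the current weights.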
Valid Structures. Positive examples: [Figure labels: Four-legged, Elk, Head, LEG ×4]. Negative examples: must select the BG body plan. Object detectors: 50% overlap with ground truth. Part detectors: 25% overlap with ground truth.
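The overlap thresholds above can be made concrete with a short sketch. It assumes "overlap" means intersection over union between axis-aligned boxes given as (x1, y1, x2, y2), and that a detection is valid when it meets or exceeds the threshold. The slide does not state either detail, and the function names are hypothetical.

```python
# Thresholds from the slide: 50% for object detectors, 25% for part detectors.
OVERLAP_THRESHOLD = {"object": 0.5, "part": 0.25}


def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_valid_match(kind, detection, ground_truth):
    """True if the detection overlaps the ground truth enough for its kind."""
    return iou(detection, ground_truth) >= OVERLAP_THRESHOLD[kind]
```

A box covering 40% of a ground-truth box it lies entirely within has an overlap of 0.4 under this definition, which would count as a valid part detection but not a valid object detection.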
Loss. Positive examples [Figure labels: Four-legged, Elk, Head, LEG]: false positives +1, duplicate detections +1, missed detections +1. Negative examples: non-BG body plan +1, false positives +1.
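The loss terms on this slide can be tallied as in the sketch below. It assumes each predicted part has already been matched to a ground-truth part index, or to `None` when it matches nothing, and that every predicted part on a negative example counts as a false positive. The function name and argument layout are hypothetical.

```python
def structure_loss(is_positive, chose_background, matches, num_gt_parts):
    """Count the loss for one predicted structure.

    matches: one entry per predicted part, holding the index of the
             ground-truth part it overlaps or None if it overlaps none.
    """
    if not is_positive:
        # Negative example: +1 for a non-BG body plan, +1 per predicted part.
        return (0 if chose_background else 1) + len(matches)
    false_positives = sum(1 for m in matches if m is None)
    matched = [m for m in matches if m is not None]
    covered = set(matched)
    duplicates = len(matched) - len(covered)  # extra hits on an already matched part
    missed = num_gt_parts - len(covered)      # ground-truth parts with no detection
    return false_positives + duplicates + missed
```

For a positive example with four labeled parts and predictions matched as `[0, 0, None, 2]`, the sketch counts one false positive, one duplicate, and two missed parts, for a loss of 4.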
Optimization • Latent Structured SVM • Non-convex: optimized with CCCP (the concave-convex procedure) • Stochastic gradient descent based cutting plane optimization
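As a rough illustration of the stochastic gradient part of this optimization, the sketch below takes one subgradient step on a single margin constraint with the valid structure held fixed, as it would be within one CCCP round. It is a generic structured hinge update, not the authors' implementation. The function name, the learning rate `lr`, and the regularization weight `reg` are assumptions.

```python
def sgd_step(w, phi_valid, phi_invalid, loss, lr=0.1, reg=0.01):
    """One subgradient step on a structured hinge constraint.

    w:           current weights (list of floats).
    phi_valid:   features of the highest scoring valid structure (held fixed).
    phi_invalid: features of a mined invalid structure.
    loss:        loss assigned to the invalid structure.
    Returns the updated weights and the hinge violation before the update.
    """
    def score(phi):
        return sum(wi * pi for wi, pi in zip(w, phi))

    violation = loss + score(phi_invalid) - score(phi_valid)
    if violation > 0:
        # Margin violated: push the valid structure up and the invalid one down.
        grad = [reg * wi + pi - pv for wi, pv, pi in zip(w, phi_valid, phi_invalid)]
    else:
        # Margin satisfied: only the regularizer contributes.
        grad = [reg * wi for wi in w]
    new_w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return new_w, max(0.0, violation)
```

Starting from zero weights with `phi_valid = [1, 0]`, `phi_invalid = [0, 1]`, and a loss of 1, the constraint is violated by 1 and the step moves the weights toward `[0.1, -0.1]`.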
Optimization Challenges • Expensive search for violated constraints • Mine many violated constraints at once • Speeds convergence • Large feature vectors (100k+) • Can’t store every mined violated constraint • Requires careful caching
Experimental Setup • CORE (train + test): familiar categories Camel, Dog, Elephant, Elk; parts Head, Leg, Torso; unfamiliar categories Cat, Cow • Pascal 2008 (test): unfamiliar categories Cat, Cow, Horse, Sheep
[Results figures: Familiar Objects, Unfamiliar Objects]
Mixed Supervision [Figure: examples labeled with Four-legged, Dog, Head, and Leg boxes, and examples labeled only with Four-legged and Dog, are combined for learning]
Mixed Supervision - Learning • Unlabeled boxes become latent variables • Compute most likely position • No loss for missed detections [Equation labels: loss, highest scoring valid structure]
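Treating unlabeled boxes as latent variables can be sketched as filling in each missing part with the candidate the current model scores highest. The data layout here (a dict of labeled boxes and a dict of scored candidates per part) and the name `fill_latent_parts` are hypothetical, and the talk's actual procedure may differ.

```python
def fill_latent_parts(labeled, candidates):
    """Complete a partially labeled example.

    labeled:    dict mapping part name to its annotated box.
    candidates: dict mapping part name to a list of (score, box) pairs
                produced by the current model.
    Annotated boxes are kept; each unlabeled part takes its top-scoring candidate.
    """
    completed = dict(labeled)
    for part, options in candidates.items():
        if part not in completed and options:
            _, best_box = max(options, key=lambda o: o[0])
            completed[part] = best_box
    return completed
```

Parts that are neither labeled nor proposed stay absent from the result, which is consistent with the slide's point that missed detections on unlabeled boxes incur no loss.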
Conclusions • Jointly representing related categories leads to better performance and better generalization to unfamiliar categories • Joint training is important to get the full benefit of the spatial model