This presentation describes an approach to object recognition and segmentation that combines object-level and pixel-level reasoning. Each object is described by properties such as bounding box, viewpoint, color, and pose, while parts and occlusions are reasoned about at the pixel level. Object-level and part-level information are combined in a single model so that occlusion and context can be taken into account. Preliminary results are shown on the UIUC and PASCAL'06 datasets; on UIUC the slides report performance quantitatively comparable to the best.
3D LayoutCRF • Derek Hoiem, Carsten Rother, John Winn
Goal 1: Object Description • Bounding Box • Viewpoint • Color • Pose • Subclass
Key Idea • Combine object-level and pixel-level reasoning
Recognition Requires Object-Level Reasoning • Position • Shape/Size • Viewpoint/Pose • Style/Color
Solution: Window Detector? • 45 degree range of viewpoints • Minor scale/position variation
Recognition Requires Part-Level Reasoning • Propose good global model • Occlusions
Context Requires Both Object and Part-Level Info • Size relationships require object model
Context Requires Both Object and Part-Level Info • Surface relationships require occlusion info • [Figure panels: Not visibly sitting on ground; Visibly sitting on ground]
Our Object/Part Model • Object instances T1 … Tm, where Ti = { bounding box, viewpoint, color model, instance cost } • Part labels h1 … hn (object parts), linked by part consistency and occlusion terms • Image x • Extension from [Winn Shotton 2006]
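The instance variable Ti listed on this slide can be pictured as a small record. The sketch below is only an illustration: the class name `ObjectInstance`, the field types, and the bounding-box layout are assumptions, not the authors' representation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ObjectInstance:
    """Hypothetical container for one instance variable T_i from the slide."""

    # (x_min, y_min, x_max, y_max) in pixels; this layout is an assumption.
    bounding_box: Tuple[int, int, int, int]
    # Index of a discrete viewpoint range; the encoding is an assumption.
    viewpoint: int
    # A mean RGB color stands in for the color model here (assumed form).
    color_model: Tuple[float, float, float]
    # Penalty paid for adding this instance to the explanation of the image.
    instance_cost: float

    def size(self) -> Tuple[int, int]:
        """Return (width, height) of the bounding box, inclusive of both edges."""
        x_min, y_min, x_max, y_max = self.bounding_box
        return (x_max - x_min + 1, y_max - y_min + 1)
```

For example, `ObjectInstance((10, 20, 49, 39), 3, (0.5, 0.1, 0.1), 2.0).size()` gives the box width and height in pixels.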
Modeling Viewpoint • Parameterized by Bounding Box and Corner
Assigning Parts from Model • [Figure panels: Training Annotation; 3D Parts Model (labels L, F); Training Image; Assigned Parts]
Relabeling • Allowing slight deformations, relabel training data • [Figure panels: Training Image; Original Labels; New Labels]
Eight Viewpoint/Scale Ranges • Appearance (but not location) constant within each range • [Figure axis: Height Range]
Modeling Part Appearance • Template patches (normalized cross-correlation) • Intensity / color image • Edges (DT, distance transform)
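The template-patch feature on this slide relies on normalized cross-correlation. A minimal sketch of that score for two equal-size patches follows; the function name `normalized_xcorr`, the list-of-rows patch format, and the choice to return 0 for a flat patch are assumptions made for illustration, not details from the slides.

```python
import math


def normalized_xcorr(patch, template):
    """Normalized cross-correlation of two equal-size 2D patches.

    Both arguments are lists of rows of numbers. The result lies in [-1, 1]
    and is unchanged by adding a constant to, or positively scaling, either patch.
    """
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    if not a or len(a) != len(b):
        raise ValueError("patch and template must be non-empty and the same size")

    # Subtract each patch's mean so the score ignores brightness offsets.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]

    # Divide by the product of the norms so the score ignores contrast scaling.
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # A constant patch has no defined correlation; 0 is an assumed convention.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom
```

A patch compared with a brighter, higher-contrast copy of itself scores close to 1, which is why the measure suits template matching under lighting changes.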
Modeling Part Appearance • Randomized decision trees: 25 trees, 250 leaf nodes • Once: learn structure on 50,000 object / 50,000 background pixels • For each appearance model: learn parameters on all pixels (850 LabelMe images)
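The split between learning tree structure once and learning parameters per appearance model can be sketched as follows. This is a hedged illustration, not the authors' code: each tree is assumed to be a fixed function from a feature vector to a leaf index, the "parameters" are assumed to be smoothed class distributions at the leaves, and the names `fit_leaf_posteriors` and `predict` and the smoothing constant `alpha` are hypothetical.

```python
from collections import defaultdict


def fit_leaf_posteriors(trees, pixels, labels, num_classes, alpha=1.0):
    """Estimate a class distribution for every leaf of every fixed tree.

    trees:  list of callables mapping a feature vector to a leaf index
            (the structure, assumed to have been learned beforehand).
    pixels: list of feature vectors; labels: matching class indices.
    alpha:  additive smoothing count per class (an assumed choice).
    """
    counts = [defaultdict(lambda: [alpha] * num_classes) for _ in trees]
    for features, label in zip(pixels, labels):
        for tree, tree_counts in zip(trees, counts):
            tree_counts[tree(features)][label] += 1.0
    # Normalize the counts in each leaf into a probability distribution.
    return [
        {leaf: [c / sum(cs) for c in cs] for leaf, cs in tree_counts.items()}
        for tree_counts in counts
    ]


def predict(trees, posteriors, features, num_classes):
    """Average the leaf distributions reached in each tree."""
    uniform = [1.0 / num_classes] * num_classes  # fallback for unseen leaves
    total = [0.0] * num_classes
    for tree, leaf_dists in zip(trees, posteriors):
        dist = leaf_dists.get(tree(features), uniform)
        total = [t + d for t, d in zip(total, dist)]
    return [t / len(trees) for t in total]
```

Calling `fit_leaf_posteriors` again with the same `trees` but a different set of labeled pixels yields a new appearance model without re-learning any splits, which is the saving the slide points to.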
Inference • Input Image • Proposals: one per appearance model; objects proposed by connected components
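Proposing objects by connected components can be sketched as below. The slide does not say what is being grouped or which connectivity is used, so this example assumes a binary foreground mask (for instance, pixels assigned any object part) and 4-connectivity; the function name `propose_objects` is hypothetical.

```python
from collections import deque


def propose_objects(mask):
    """Return one bounding box per 4-connected component of truthy cells.

    mask is a list of rows; each box is (x_min, y_min, x_max, y_max),
    and boxes are listed in row-major order of their first pixel.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

Each returned box would then serve as one object proposal to be passed to the refinement stage.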
Proposal Stage Model • CRF Inference (TRW-BP) • [Figure: part labels h1 … hn (object parts) with part consistency and occlusion terms, over image x]
Inference • Input Image • Proposals • Refinement: one per proposal; incorporates viewpoint, size information
Refinement Stage Model • Ti = { bounding box, viewpoint } • [Figure: instance node T1 above part labels h1 … hn (object parts) with part consistency and occlusion terms, over image x]
Inference • Input Image • Proposals • Refinement • Arbitration: includes color model, instance penalty (graph cuts)
Preliminary Results on UIUC • Trained on 20, tested on rest • Quantitatively comparable to best
Preliminary Results on UIUC • [Figure panels: Without Instance Cost; With Instance Cost]
Preliminary Results on PASCAL’06 • 25 images • One proposal (viewpoint within 45 degrees, scale of 26-38 pixels)
Preliminary Results on PASCAL’06 • [Figure panels: Without Color Model; With Color Model]
Conclusion • Combined object-level and pixel-level reasoning • Object-level: Position/Size, Viewpoint, Color • Pixel-level: Part appearance, Occlusion reasoning • Good preliminary results