Context-based vision system for place and object recognition
Context-based vision system for place and object recognition. Antonio Torralba, Kevin Murphy, Bill Freeman, Mark Rubin. Presented by David Lee. Some slides borrowed from Kevin Murphy.
Feature extraction (diagram): 24 filtered images -> downsample each to 4x4 -> 4x4x24 = 384 dimensions -> 80 dimensions
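The pipeline on this slide (24 filtered images, each downsampled to 4x4, giving 4x4x24 = 384 dimensions, then reduced to 80) can be sketched roughly as follows. The slide does not say how the 384-to-80 reduction is done; PCA is assumed here. Block averaging as the downsampling method, the function names, and the random stand-in data are also illustrative assumptions, not the presenters' code.

```python
import numpy as np

def block_average(img, grid=4):
    """Downsample a 2-D array to grid x grid by averaging equal blocks (assumed method)."""
    h, w = img.shape
    bh, bw = h // grid, w // grid
    img = img[:bh * grid, :bw * grid]  # crop so the blocks divide evenly
    return img.reshape(grid, bh, grid, bw).mean(axis=(1, 3))

def gist_384(filtered):
    """filtered: array of shape (24, H, W) -> vector of 4*4*24 = 384 values."""
    return np.concatenate([block_average(f).ravel() for f in filtered])

def fit_pca(X, k=80):
    """Return (mean, components) of a k-dimensional PCA fitted to the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, components):
    """Project one 384-vector onto the k principal components."""
    return components @ (x - mean)

# Hypothetical stand-in for real filter-bank outputs: 100 frames, 24 channels each.
rng = np.random.default_rng(0)
frames = rng.random((100, 24, 32, 32))
X = np.stack([gist_384(f) for f in frames])
mean, comps = fit_pca(X, k=80)
v = project(X[0], mean, comps)  # 80-dimensional feature for the first frame
```

In a real system the PCA basis would be fitted once on training frames and then reused for every new frame.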
Visualizing the filter bank output (figure): images and their 80-dimensional representations
Hidden Markov Model • Hidden states = location (63 values) • Observations = v_t^G ∈ R^80 • Transition model encodes topology of environment • Observation model is a mixture of Gaussians (100 views per place)
Hidden Markov Model (equation slide; only the term labels survive): observation likelihood (mixture of Gaussians), prediction prior, transition matrix (MLE, i.e. counting)
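The terms labelled on this slide fit the standard HMM filtering recursion, P(Q_t = q | v_1:t) ∝ p(v_t | Q_t = q) Σ_q' A(q', q) P(Q_t-1 = q' | v_1:t-1). The sketch below is a minimal version of that recursion, not the presenters' implementation. The pseudo-count used to smooth the counted transitions, the equal-weight spherical Gaussians, and all function names are assumptions made for illustration.

```python
import numpy as np

def transition_mle(state_seq, n_states, pseudo=1e-3):
    """Estimate A[i, j] = P(Q_t = j | Q_t-1 = i) by counting labelled transitions.
    The small pseudo-count (an assumption) avoids zero-probability rows."""
    counts = np.full((n_states, n_states), pseudo)
    for prev, cur in zip(state_seq[:-1], state_seq[1:]):
        counts[prev, cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def mog_log_likelihood(v, prototypes, sigma):
    """log p(v | place) under an equal-weight mixture of spherical Gaussians
    centred on stored views of that place (assumed form of the mixture)."""
    d = v.shape[0]
    sq_dist = ((prototypes - v) ** 2).sum(axis=1)
    log_comp = -0.5 * sq_dist / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    m = log_comp.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.exp(log_comp - m).mean())

def forward_step(belief, A, obs_lik):
    """One filtering update: predict with A, weight by the likelihood, normalise."""
    prior = belief @ A       # prediction prior over the current place
    post = obs_lik * prior   # multiply by the observation likelihood
    return post / post.sum()

# Toy example with 3 places and a hypothetical labelled training sequence.
A = transition_mle([0, 0, 1, 1, 2, 2, 0], n_states=3)
belief = np.full(3, 1.0 / 3.0)
obs_lik = np.array([0.1, 0.8, 0.1])  # made-up likelihoods for one frame
belief = forward_step(belief, A, obs_lik)
```

With 63 places, `obs_lik` would hold one likelihood per place, for example `np.exp` of the per-place log-likelihoods after subtracting their maximum.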
Scene Categorization • 17 categories (Office, Corridor, Street, etc.) • Train a separate HMM on category labels
Performance on known environment (figure): ground truth vs. system estimate for specific location, location category, and indoor/outdoor
Comparison of features (figure): categorization and recognition
Effect of HMM on recognition (figure): without HMM (but with temporal smoothing) vs. with HMM
Object priming • Predict object properties based on context (top-down signals): • Visual gist, v_t^G • Specific location, Q_t • Kind of location, C_t
Object Priming (equation slide; only the term labels survive) • Probability of object i in the image given the entire video sequence, computed from: • Probability of object i given current observation and place • Estimate of current place (output of HMM) • Probability of object i again, computed from: • Observation likelihood (mixture of Gaussians) • Prior probability of object i being in place q (MLE)
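One reading of the labels on this slide is a two-step computation: Bayes' rule gives the probability of object i for each place, and the HMM's place estimate then averages over places, P(O_i | v_1:t) = Σ_q P(O_i | Q_t = q, v_t) P(Q_t = q | v_1:t). The sketch below follows that reading. The function names, the explicit "object absent" likelihood, and the numbers are illustrative assumptions.

```python
import numpy as np

def object_posterior_given_place(lik_present, lik_absent, prior_present):
    """Bayes' rule per place q: P(O_i = 1 | Q_t = q, v_t).
    lik_present[q] = p(v_t | O_i = 1, q), lik_absent[q] = p(v_t | O_i = 0, q),
    prior_present[q] = P(O_i = 1 | q), e.g. counted from labelled data."""
    num = lik_present * prior_present
    den = num + lik_absent * (1.0 - prior_present)
    return num / den

def prime_object(place_belief, lik_present, lik_absent, prior_present):
    """Marginalise over places using the HMM output P(Q_t = q | v_1:t)."""
    per_place = object_posterior_given_place(lik_present, lik_absent, prior_present)
    return float(per_place @ place_belief)

# Toy example with 2 places; equal likelihoods make the gist uninformative,
# so the result is just the place-weighted prior.
place_belief = np.array([0.9, 0.1])
prior_present = np.array([0.8, 0.05])
lik = np.ones(2)
p_obj = prime_object(place_belief, lik, lik, prior_present)
```

Here `p_obj` equals 0.9 * 0.8 + 0.1 * 0.05 = 0.725, which shows how a confident place estimate dominates the object prediction.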
Predicting object position and scale (equation slide; only the term labels survive) • Estimate of mask, computed from: • Probability of object i being present and location being q (output of previous system) • Estimate of mask given current gist, place, and object • Distributions labelled on the slide: delta, Gaussian
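Read as an expectation, the mask estimate on this slide is a weighted sum over places: each place's predicted mask is weighted by the joint probability that the object is present and the location is q. The sketch below assumes that when the object is absent the mask contributes nothing, which is only one possible reading of the slide's "delta" label. The per-place mask predictions are hypothetical inputs.

```python
import numpy as np

def expected_mask(joint_present_place, masks_given_place):
    """E[mask] = sum_q P(O_i = 1, Q_t = q | v_1:t) * E[mask | v_t, q, O_i = 1].
    joint_present_place: shape (n_places,)
    masks_given_place:   shape (n_places, H, W), per-place mask predictions.
    Assumption: an absent object contributes an all-zero mask."""
    return np.tensordot(joint_present_place, masks_given_place, axes=1)

# Toy example: 2 places, 3x3 masks with made-up values.
joint = np.array([0.5, 0.25])
masks = np.ones((2, 3, 3))
masks[1] *= 2.0
mask = expected_mask(joint, masks)  # every entry is 0.5*1 + 0.25*2 = 1.0
```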
Conclusion • Real-world problem (and it works!) • Uses only global features (context) • How much did {HMM / place prior} affect {place recognition / object detection}? Can we really say "context" did the job?