
Learning the Appearance and Motion of People in Video

Presentation Transcript


  1. Learning the Appearance and Motion of People in Video Hedvig Sidenbladh, KTH hedvig@nada.kth.se, www.nada.kth.se/~hedvig/ Michael Black, Brown University black@cs.brown.edu, www.cs.brown.edu/people/black/

  2. Collaborators • David Fleet, Xerox PARC fleet@parc.xerox.com • Dirk Ormoneit, Stanford University ormoneit@stat.stanford.edu • Jan-Olof Eklundh, KTH joe@nada.kth.se

  3. Goal • Tracking and reconstruction of human motion in 3D • Articulated 3D model • Monocular sequence • Pinhole camera model • Unknown, cluttered environment

  4. Why is it Important? • Human-machine interaction • Robots • Intelligent rooms • Video search • Animation, motion capture • Surveillance

  5. Why is it Hard? • People appear in many ways - how to find a model that fits them all? • Structure is unobservable - inference from visible parts

  6. Why is it Hard? • People move fast and non-linearly • 3D to 2D projection ambiguities • Large occlusion • Similar appearance of different limbs • Large search space Extreme case

  7. Bayesian Inference Exploit cues in the images. Learn likelihood models: p(image cue | model) Build models of human form and motion. Learn priors over model parameters: p(model) Represent the posterior distribution: p(model | cue) ∝ p(cue | model) p(model)
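The posterior on this slide can be sketched numerically. Below is a minimal illustration (not code from the talk) of weighting a discrete set of candidate poses by likelihood times prior and normalizing; the candidate values are hypothetical.

```python
import numpy as np

def posterior_weights(likelihoods, priors):
    """Normalized posterior p(model | cue) ∝ p(cue | model) p(model),
    evaluated over a discrete set of candidate models."""
    w = np.asarray(likelihoods) * np.asarray(priors)
    return w / w.sum()

# Hypothetical example: three candidate poses.
w = posterior_weights([0.2, 0.5, 0.1], [0.3, 0.3, 0.4])
```

With these numbers the second candidate dominates (weight 0.6), even though its prior is not the largest.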

  8. Human Model • Limbs = truncated cones in 3D • Pose determined by a vector of model parameters

  9. State of the Art. Bregler and Malik ‘98 • Brightness constancy cue • Insensitive to appearance • Full-body tracking required multiple cameras • Single hypothesis

  10. Brightness Constancy I(x, t+1) = I(x+u, t) + η Image motion of the foreground as a function of the 3D motion of the body. Problem: no fixed model of appearance (drift).
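The brightness constancy assumption can be checked directly: warp the frame at time t by the predicted image motion and look at the residual against the frame at t+1. A small sketch under simplifying assumptions (integer, spatially constant flow; `np.roll` as a crude stand-in for real warping):

```python
import numpy as np

def brightness_residual(I_t, I_t1, u, v):
    """Residual η of the brightness constancy assumption
    I(x, t+1) = I(x+u, t) + η, for a constant integer flow (u, v)."""
    # Warp I_t by (u, v); np.roll is a toy warp for illustration only.
    warped = np.roll(np.roll(I_t, v, axis=0), u, axis=1)
    return I_t1 - warped
```

When the predicted motion matches the true motion, the residual is (up to noise) zero; tracking searches for the pose whose induced flow makes this residual small.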

  11. State of the Art. Cham and Rehg ‘99 • Single camera, multiple hypotheses • 2D templates (no drift but view dependent) I(x, t) = I(x+u, 0) + η

  12. Multiple Hypotheses • Posterior distribution over model parameters often multi-modal (due to ambiguities) • Represent whole distribution: • sampled representation • each sample is a pose • predict over time using a particle filtering approach
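The sampled representation described above can be sketched as follows: each sample is one candidate pose, and resampling in proportion to the weights keeps the set concentrated on the modes of a possibly multi-modal posterior. This is an illustrative sketch, not the talk's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(poses, weights):
    """Sampled representation of a (possibly multi-modal) posterior:
    each row of `poses` is one candidate pose; draw a new, equally
    weighted sample set with probability proportional to the weights."""
    idx = rng.choice(len(poses), size=len(poses), p=weights)
    return poses[idx]
```

Samples with negligible weight die out, while samples near posterior modes are duplicated; prediction over time is then applied to each surviving sample.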

  13. State of the Art. Deutscher, North, Bascle, & Blake ‘00 • Multiple hypotheses • Multiple cameras • Simplified clothing, lighting and background

  14. State of the Art. Sidenbladh, Black, & Fleet ‘00 • Multiple hypotheses • Monocular • Brightness constancy • Activity-specific prior • With significant changes in view and depth, template-based methods will fail

  15. How to Address the Problems • Need a constraining likelihood model that is also invariant to variations in human appearance • Need a good model of how people move • Need an effective way to explore the model space (very high dimensional) Bayesian formulation: p(model | cue) ∝ p(cue | model) p(model)

  16. Edge Detection? • Probabilistic model? • Under/over-segmentation, thresholds, …

  17. Key Idea #1 • Use the 3D model to predict the location of limb boundaries in the scene. • Compute various filter responses steered to the predicted orientation of the limb. • Compute likelihood of filter responses using a statistical model learned from examples.
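Steering a filter response to the predicted limb orientation can be illustrated with first derivatives of the image (by the steerability of Gaussian derivatives, the response at angle θ is a cosine/sine combination of the x and y responses). A minimal sketch, assuming simple finite-difference gradients rather than the Gaussian derivative filters used in the actual system:

```python
import numpy as np

def image_gradients(I):
    """Finite-difference gradients; np.gradient returns axis-0 (y) first."""
    Iy, Ix = np.gradient(I.astype(float))
    return Ix, Iy

def steered_edge_response(Ix, Iy, theta):
    """First-derivative response steered to the predicted limb-boundary
    orientation theta, via steerability: R(theta) = cos(theta) Ix + sin(theta) Iy."""
    return np.cos(theta) * Ix + np.sin(theta) * Iy
```

For an image whose intensity ramps along x, the response is maximal at θ = 0 and vanishes at θ = π/2, which is exactly the orientation selectivity the likelihood model exploits.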

  18. Key Idea #2 “Explain” the entire image: p(image | foreground, background), where the foreground is the person and the background is generic and unknown.

  19. Key Idea #2 p(image | foreground, background) ∝ p(foreground part of image | foreground) / p(foreground part of image | background) • Do not look in parts of the image considered background
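Because the background-only pixels cancel, the likelihood reduces to a ratio over foreground pixels, which is cheap to compute in the log domain. A minimal sketch (per-pixel log-probabilities and the foreground mask are assumed inputs, e.g. from the learned edge/ridge/motion distributions):

```python
import numpy as np

def log_likelihood_ratio(fg_logp, bg_logp, mask):
    """Sum over predicted-foreground pixels of
    log p(pixel | foreground) - log p(pixel | background);
    pixels outside the mask (background) contribute nothing."""
    return float(np.sum((fg_logp - bg_logp)[mask]))
```

A positive value means the foreground model explains the masked region better than the generic background model, i.e. the hypothesized limb placement is supported by the image.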

  20. Training Data Points on limbs Points on background

  21. Edge Distributions Edge response steered to model edge: Similar to Konishi et al., CVPR 99

  22. Ridge Distributions Ridge response steered to limb orientation Ridge response only on certain image scales!

  23. Motion Training Data Motion response = I(x, t+1) - I(x+u, t): the temporal brightness change given the model of motion, i.e. the noise term in the brightness constancy assumption

  24. Motion distributions Different underlying motion models

  25. Likelihood Formulation • Independence assumptions: • Cues: p(image | model) = p(cue1 | model) p(cue2 | model) • Spatial: p(image | model) = ∏_{x ∈ image} p(image(x) | model) • Scales: p(image | model) = ∏_{σ = 1,...} p(image(σ) | model) • Combines cues and scales! • Simplification, in reality there are dependencies
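Under these independence assumptions, the products over cues, scales, and pixels become a single sum of log-likelihoods. A small sketch (the cue/scale keys and values are hypothetical):

```python
import numpy as np

def combined_log_likelihood(cue_scale_logps):
    """Log of the factored likelihood: sum over cues, scales and pixels.
    `cue_scale_logps` maps (cue, scale) -> array of per-pixel log p."""
    return float(sum(np.sum(lp) for lp in cue_scale_logps.values()))

# Hypothetical per-pixel log-likelihoods for two cue/scale combinations.
logps = {("edge", 1): np.array([-1.0, -1.0]),
         ("ridge", 2): np.array([-0.5])}
```

Working in the log domain also avoids numerical underflow when many pixels and scales are multiplied together.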

  26. Likelihood Foreground pixels Background pixels

  27. Step One Discussed… • Need a constraining likelihood model that is also invariant to variations in human appearance • Need a good model of how people move Bayesian formulation: p(model | cue) ∝ p(cue | model) p(model)

  28. Models of Human Dynamics • A model of dynamics is used to propagate the sampled distribution in time • Constant velocity model • All DOF in the model parameter space are assumed independent • Angles are assumed to change with constant speed • Speed and position changes are randomly sampled from a normal distribution
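The constant velocity model above can be sketched in a few lines: each joint angle is propagated independently by its current velocity, with normally distributed perturbations to both velocity and position. The noise scales are illustrative, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def constant_velocity_step(angles, velocities, sigma_pos, sigma_vel):
    """One time step of the constant velocity dynamics: perturb each
    angular velocity, then advance each angle by its velocity plus
    positional noise. All DOF are treated as independent."""
    velocities = velocities + rng.normal(0.0, sigma_vel, size=velocities.shape)
    angles = angles + velocities + rng.normal(0.0, sigma_pos, size=angles.shape)
    return angles, velocities
```

Setting both sigmas to zero recovers pure constant-velocity extrapolation, which is the deterministic part of the prediction.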

  29. Models of Human Dynamics • Action-specific model - Walking • Training data: 3D motion capture data • From training set, learn mean cycle and common modes of deviation (PCA) Mean cycle Small noise Large noise
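The walking prior above (mean cycle plus PCA deviation modes) can be sketched as a linear generative model. The array shapes and coefficient values are assumptions for illustration, not the trained model from the talk.

```python
import numpy as np

def sample_walk_pose(mean_cycle, modes, phase, coeffs):
    """Pose at a given phase of the gait cycle: the learned mean cycle
    plus a linear combination of PCA deviation modes.
    mean_cycle: (T, D) joint angles over one cycle
    modes:      (K, T, D) principal deviation modes
    coeffs:     (K,) mode coefficients (drawn from the learned prior)."""
    pose = mean_cycle[phase].copy()
    for k, c in enumerate(coeffs):
        pose += c * modes[k][phase]
    return pose
```

Drawing the coefficients from a small-variance distribution gives walks close to the mean cycle ("small noise" on the slide); larger variance gives more individual variation ("large noise").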

  30. Step Two Also Discussed… • Need a constraining likelihood model that is also invariant to variations in human appearance • Need a good model of how people move • Need an effective way to explore the model space (very high dimensional) Bayesian formulation: p(model | cue) ∝ p(cue | model) p(model)

  31. Particle Filter Posterior → temporal dynamics → sample → weight by likelihood → normalize → new posterior • Problem: expensive representation of the posterior! • Approaches to solve the problem: • Lower the number of samples (Deutscher et al., CVPR00) • Represent the space in other ways (Choo and Fleet, ICCV01)
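The propagate–weight–normalize–resample loop on this slide can be sketched as one function. The `dynamics` and `log_likelihood` callables stand in for the human dynamics model and the learned image likelihood; they are assumptions supplied by the caller, not the system's actual components.

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_filter_step(particles, dynamics, log_likelihood):
    """One particle filter step: propagate each sampled pose through the
    temporal dynamics, weight by the image likelihood, normalize, and
    resample in proportion to the weights."""
    particles = np.array([dynamics(p) for p in particles])
    logw = np.array([log_likelihood(p) for p in particles])
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```

The cost of each step is dominated by the likelihood evaluations, which is why the slide's two remedies (fewer samples, or a different representation of the space) both target the size of the sample set.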

  32. Tracking an Arm 1500 samples ~2 min/frame Moving camera, constant velocity model

  33. Self Occlusion 1500 samples ~2 min/frame Constant velocity model

  34. Walking Person 2500 samples (reduced from 15000 by using the learned likelihood) ~10 min/frame Walking model

  35. Ongoing and Future Work • Learned dynamics • Correlation across scale • Estimate background motion • Statistical models of color and texture • Automatic initialization

  36. Lessons Learned • Probabilistic (Bayesian) framework allows • Integration of information in a principled way • Modeling of priors • Particle filtering allows • Multi-modal distributions • Tracking with ambiguities and non-linear models • Learning image statistics and combining cues improves robustness and reduces computation

  37. Conclusions • Generic, learned, model of appearance • Combines multiple cues • Exploits work on image statistics • Use the 3D model to predict features • Model of foreground and background • Exploits the ratio between foreground and background likelihood • Improves tracking

  38. Other Related Work J. Sullivan, A. Blake, M. Isard, and J. MacCormick. Object localization by Bayesian correlation. ICCV99. J. Sullivan, A. Blake, and J. Rittscher. Statistical foreground modelling for object localisation. ECCV00. J. Rittscher, J. Kato, S. Joga, and A. Blake. A Probabilistic Background Model for Tracking. ECCV00. S. Wachter and H. Nagel. Tracking of persons in monocular image sequences. CVIU, 74(3), 1999.
