
Stereo Person Tracking with Adaptive Plan-View Statistical Templates

This paper presents a method for detecting and tracking people in stereo images using adaptive plan-view statistical templates. The method provides accurate physical locations in real units and is suitable for use in arbitrary environments. The paper also discusses the advantages of using plan-view images over real overhead camera views.


Presentation Transcript


  1. Stereo Person Tracking with Adaptive Plan-View Statistical Templates Michael Harville HP Laboratories Palo Alto, CA, United States

  2. Person Detection and Tracking: Motivation • Fundamental technology enabling many apps in pervasive computing and intelligent environments • Automatic personal diary / memory aid • Computer/phone/speakers/lights moving with person • HCI/PUI • Usually, need to find person before analyzing face, gestures, etc. • Activity-monitoring and surveillance • Security • Shopper behavior in retail stores • Video coding, indexing, compression • Special treatment for the people in the scene

  3. Why Vision? • No special equipment, clothing, or behaviors required of user • People are passive participants, not active drivers. No special effort needed. • Works on everyone, not just the “special” ones • Video is a rich (the richest?) source of information for tasks beyond person tracking • Provides information not just for detection and tracking, but also for identification, activity analysis, mood, etc. • How many active sensors can the world stand?

  4. Goals of Method • Detect people and track their locations in space • Provide physical locations in real units (e.g. meters) • Handle multiple people, complex behavior • Arbitrary environments • Compact tracking unit, easy setup • Real-time

  5. Key Contributions • New substrate of image statistics on which to do tracking • Transformations and refinement of raw, dense “camera-view” depth images to “plan-view” • Suitable for use with many different tracking techniques • Tracking framework based on adaptive templates • Better use of plan-view features • Can be used with other plan-view image substrates • Methods for avoiding typical adaptive template problems

  6. Outline • Introduction and Motivation for Plan-View Maps • Plan-View Map Construction • Tracking Method • Implementation and Results

  7. Outline • Introduction and Motivation for Plan-View Maps • Plan-View Map Construction • Tracking Method • Implementation and Results

  8. Input: Color and Depth from Stereo Unit • Spatially- and temporally-registered color+depth

  9. Real-time Stereo Becoming Practical • Tyzx ( www.tyzx.com; from Interval ) • ASIC costs <$5 in volume, uses little power • Point Grey Digiclops ( www.ptgrey.com ) • SRI Small Vision System ( www.videredesign.com ) • 3DV Systems Zcam ( www.3dvsystems.com ) • Canesta ( www.canesta.com ) • Sarnoff Acadia vision processor

  10. Tracking in “Camera View” with Depth • Depth helpful in many ways: • Powerful cue for foreground segmentation • Gives physical size and shape information • Allows for better occlusion detection and handling • Provides new types of features for tracking • Provides third dimension of prediction in tracking • Several recent papers have illustrated this: • Eveland & Konolige (1997): depth only, single person • Darrell et al. (1998): color+depth, multi-person • Haritaoglu et al. (1998): W4S • Beymer & Konolige (1999); Krumm et al. (2000): multi-camera

  11. Problem: Depth Images Very Noisy! • Unreliable depth in areas of low visual texture • Poor depth contour accuracy • For static scene: std. dev. of depth at a pixel typically 10% of mean or more

  12. A Solution: Use Depth to Render New Views • Depth image coordinate and value (u, v, D) + camera calibration params → 3D scene location (X, Y, Z) • Construct 3D point cloud of “interesting” part of image (e.g. foreground, people) • Render images of statistics of this point cloud, from new viewpoints and with arbitrary projection models
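
The mapping from (u, v, D) to (X, Y, Z) on this slide can be sketched as below. This is a minimal illustration, not the presenter's implementation: it assumes a pinhole camera with focal lengths `fu`, `fv` in pixels and principal point `(u0, v0)`, treats D as distance along the optical axis, and returns camera-frame coordinates. A further rigid transform to a world frame with a vertical axis would be needed for plan-view maps. All function and parameter names are hypothetical.

```python
def back_project(u, v, depth, fu, fv, u0, v0):
    """Map a depth pixel (u, v, depth) to camera-frame coordinates (X, Y, Z).

    Assumes a pinhole model: fu, fv are focal lengths in pixels and
    (u0, v0) is the principal point. `depth` is taken to be the distance
    along the optical axis, in whatever unit the output should have.
    """
    z = depth
    x = (u - u0) * z / fu
    y = (v - v0) * z / fv
    return (x, y, z)


def point_cloud(depth_image, foreground, fu, fv, u0, v0):
    """Back-project every foreground pixel that has a positive depth."""
    points = []
    for v, row in enumerate(depth_image):
        for u, d in enumerate(row):
            if foreground[v][u] and d > 0:
                points.append(back_project(u, v, d, fu, fv, u0, v0))
    return points
```

Restricting the cloud to foreground pixels corresponds to keeping only the “interesting” part of the image.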

  13. “Plan-View” Statistical Images • Virtual overhead view, with orthographic projection • Easier, more reliable separation of people

  14. “Plan-View” Statistical Images • Stereo camera → color, depth

  15. “Plan-View” Statistical Images • Stereo camera → color, depth • Background model → foreground

  16. “Plan-View” Statistical Images • Stereo camera → color, depth • Background model → foreground • Use depth + camera calibration to do 3D back-projection → 3D point cloud

  17. “Plan-View” Statistical Images • Stereo camera → color, depth • Background model → foreground • Use depth + camera calibration to do 3D back-projection → 3D point cloud • Quantize space into 3D vertical bins

  18. “Plan-View” Statistical Images • Stereo camera → color, depth • Background model → foreground • Use depth + camera calibration to do 3D back-projection → 3D point cloud • Quantize space into 3D vertical bins • Plan-view projection: image of one statistic per vertical bin

  19. Why Not Just Use a Real Overhead Camera? • Sometimes, there is no “ceiling”! • For example, outdoors • Cannot see faces easily • desirable in many applications that employ person tracking • Also…

  20. Advantages Over a Real Overhead Camera • Real camera perspective projection • Along image periphery (most of image), projection axis far from parallel to ground normal; much inter-person occlusion

  21. Advantages Over a Real Overhead Camera • Real camera perspective projection • Along image periphery (most of image), projection axis far from parallel to ground normal; much inter-person occlusion • Orthographic projection better

  22. Advantages Over a Real Overhead Camera • Overhead camera typically sacrifices on ground coverage (particularly when ceiling is low)

  23. Advantages Over a Real Overhead Camera • Overhead camera typically sacrifices on ground coverage (particularly when ceiling is low)

  24. Outline • Introduction and Motivation for Plan-View Maps • Plan-View Map Construction • Tracking Method • Implementation and Results

  25. What (Vertical Bin) Statistic to Image? • Count of 3D points in each bin → plan-view “occupancy” or “density” maps

  26. Scaling Occupancy to Get Surface Area • Scale increments to occupancy map by Z^2 / (f_u f_v) • Occupancy map now represents object surface area visible to camera, in real units (e.g. in cm^2) • Occupancy map representations now invariant to distance from camera (except for noise) [Figure: an imager pixel, at focal length f from the camera center of projection, and the area it subtends in the real world at a distance Z from the camera]
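
A scaled occupancy map of this kind might be accumulated as in the following sketch. It assumes each 3D point is already expressed as ground-plane coordinates `(x, y)` plus its distance `z_cam` from the camera, and that the plan-view grid is defined by a hypothetical origin `(x_min, y_min)` and square bin size. Plain Python lists are used only to keep the example self-contained.

```python
def occupancy_map(points, fu, fv, x_min, y_min, bin_size, width, height):
    """Accumulate a plan-view occupancy map scaled to surface area.

    `points` is an iterable of (x, y, z_cam): ground-plane coordinates of a
    3D point plus its distance from the camera. Instead of adding 1 per
    point, each point adds z_cam**2 / (fu * fv), the approximate real-world
    area subtended by its pixel at that distance.
    """
    occ = [[0.0] * width for _ in range(height)]
    for x, y, z_cam in points:
        col = int((x - x_min) // bin_size)
        row = int((y - y_min) // bin_size)
        if 0 <= row < height and 0 <= col < width:
            occ[row][col] += (z_cam * z_cam) / (fu * fv)
    return occ
```

With this weighting, a point twice as far from the camera contributes four times the area, which offsets the smaller number of pixels a distant person covers.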

  27. Plan-View Occupancy Maps • Applied to person tracking by • Interval researchers (1999) - unpublished • Beymer (2000) • Darrell et al. (2001) • Advantages • Good indicator of where people are likely to be • Disadvantages • Discards shape information in dimension normal to ground • Occupancy statistical representations of people are very sensitive to partial occlusions

  28. An Alternative Statistic: Maximum Height • Z-coordinate (height above ground) of highest point in each bin → plan-view height maps

  29. Height Map Computation Notes • Can be done in a single pass through depth image data • Ignore data at heights above some Hmax that is reasonable for people • Scene ground need not be planar • Add in height offset map Ho constructed from background model depth
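
The single-pass computation with a height cutoff could look like the sketch below. It assumes points are given as `(x, y, h)` with `h` already measured as height above the ground, and it leaves out the height offset map for non-planar ground. The grid parameters are hypothetical.

```python
def height_map(points, h_max, x_min, y_min, bin_size, width, height):
    """Build a plan-view map of the maximum height seen in each vertical bin.

    `points` is an iterable of (x, y, h), where h is height above ground.
    Points above `h_max` are ignored, since they are unlikely to belong to
    a person. One pass over the data is enough.
    """
    hmap = [[0.0] * width for _ in range(height)]
    for x, y, h in points:
        if h > h_max:
            continue
        col = int((x - x_min) // bin_size)
        row = int((y - y_min) // bin_size)
        if 0 <= row < height and 0 <= col < width and h > hmap[row][col]:
            hmap[row][col] = h
    return hmap
```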

  30. Plan-View Height Maps • Not previously applied to person-tracking • But used in other contexts: path-planning for Mars rover, military target recognition • Advantages • Preserves about as much 3D shape as possible in a 2D image • Fast computation (e.g. compared to 90th percentile height) • For high camera mounts and typical environments, height map statistical representations of people are less affected by partial occlusions. • Disadvantages • Very sensitive to depth noise • Easy to confuse person upper body with small foreground objects placed at the same height

  31. Example Height Map Data

  32. Can We Combine Them, Get Best of Both? Idea: Restrict use of height data to map locations where we believe something “significant” is present, as indicated by the local occupancy data. + ?

  33. Plan-View Map Refinement • Occupancy: Oraw → smooth → Osm → threshold → Othresh • Height: Hraw → smooth → Hsm → mask with Othresh → Hmasked
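
A rough sketch of this refinement follows. The slide does not specify the smoothing kernel or the threshold value, so the 3x3 zero-padded mean filter and the `occ_threshold` parameter here are assumptions chosen for illustration.

```python
def box_smooth(img):
    """3x3 mean filter with zero padding (an assumed kernel choice)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for rr in range(max(0, r - 1), min(rows, r + 2)):
                for cc in range(max(0, c - 1), min(cols, c + 2)):
                    total += img[rr][cc]
            out[r][c] = total / 9.0
    return out


def refine(o_raw, h_raw, occ_threshold):
    """Smooth both maps, threshold occupancy, and mask height by the result."""
    o_sm = box_smooth(o_raw)
    h_sm = box_smooth(h_raw)
    # Zero out bins whose smoothed occupancy is too low to be significant.
    o_thresh = [[v if v >= occ_threshold else 0.0 for v in row] for row in o_sm]
    # Keep height data only where significant occupancy remains.
    h_masked = [[h if o > 0.0 else 0.0 for h, o in zip(h_row, o_row)]
                for h_row, o_row in zip(h_sm, o_thresh)]
    return o_thresh, h_masked
```

Masking height by occupancy suppresses isolated height spikes caused by depth noise, since those bins rarely hold much surface area.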

  34. Height Map: Before and After Raw height map Masked, smoothed height map

  35. Example Plan-View Map Data Oraw Hraw Othresh Hmasked

  36. Statistical Substrate for Tracking • Occupancy: Oraw → smooth → Osm → threshold → Othresh • Height: Hraw → smooth → Hsm → mask with Othresh → Hmasked

  37. Statistical Substrate for Tracking smooth threshold threshold Oraw Osm Othresh mask smooth Hraw Hsm Hmasked

  38. Statistical Substrate for Tracking smooth threshold threshold Oraw Osm Othresh mask smooth Hraw Hsm Hmasked

  39. Statistical Substrate for Tracking • Occupancy: Oraw → smooth → Osm → threshold → Othresh (object surface area visible to camera) • Height: Hraw → smooth → Hsm → mask with Othresh → Hmasked (object shape, as viewed from above)

  40. Outline • Introduction and Motivation for Plan-View Maps • Plan-View Map Construction • Tracking Method • Implementation and Results

  41. How to Track People in this Feature Space? Two important choices to make: • Person model • Tracking method

  42. How to Track People in this Feature Space? Two important choices to make: • Person model • Tracking method

  43. Options for a Person Model • “Blob” / connected component • Not very descriptive; pray for good person separation and/or an excellent tracking framework. • Good ol’ Gaussian • Tried and true, lots of techniques and algorithms based on it from which to draw ideas • But not a very good use of our feature data • Fixed template(s) • For instance, use common shape(s) of head+shoulders in a height map • People of shapes or in poses inconsistent with template(s) will not be tracked well

  44. Our Person Model: Adaptive Templates • Use patches of the plan-view statistical image data itself as the model TH (height template)

  45. Our Person Model: Adaptive Templates • Use patches of the plan-view statistical image data itself as the model TH (height template) TO (occupancy template)

  46. Adaptive Templates (continued) • Allow model to evolve as person changes pose or becomes (dis)occluded, by updating it from the image data • Still need initialization criterion to decide that a patch of plan-view image data is a person • Currently: • significant occupancy (at least half a person’s worth) • max height above some reasonable minimum for people • not a completely static object (according to inter-frame diffs) • Future: Compare plan-view data to “person-like” templates learned from training
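
The three current initialization criteria can be expressed as a simple predicate, sketched below. The slide gives no numeric thresholds, so `person_area`, `min_height`, and `min_moving_fraction` are hypothetical parameters, and `moving_fraction` stands in for whatever inter-frame difference measure is actually used.

```python
def is_person_candidate(occ_patch, height_patch, moving_fraction,
                        person_area, min_height, min_moving_fraction=0.05):
    """Decide whether a plan-view patch should start a new person track.

    occ_patch / height_patch: 2D lists cut from the occupancy and height maps.
    moving_fraction: share of the patch that changed between frames
    (an assumed stand-in for the inter-frame difference test).
    """
    total_area = sum(sum(row) for row in occ_patch)
    peak_height = max(max(row) for row in height_patch)
    enough_occupancy = total_area >= 0.5 * person_area  # half a person's worth
    tall_enough = peak_height >= min_height
    not_static = moving_fraction >= min_moving_fraction
    return enough_occupancy and tall_enough and not_static
```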

  47. How to Track People in this Feature Space? Two important choices to make: • Person model • Tracking method

  48. Tracking Method: Simple Kalman Filter-Based Approach • Prediction: constant velocity; no template change • Measurement: find image location that minimizes match energy; measurements are data from match location • Update: standard Kalman update for position, velocity; update templates directly from image data (faster) • Do loop for each person individually on each frame, in order of tracking confidence (equal to inverse of Kalman variance in location estimate) • Fast frame-rate desirable!
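
For reference, a constant-velocity Kalman predict/update step for one coordinate axis is sketched below. This is the textbook filter, not necessarily the exact formulation used in the talk: the state is (position, velocity), only position is measured, and the process noise `q` and measurement noise `r` are assumed scalar tuning values. Template updating is not shown.

```python
def predict(state, cov, dt, q):
    """Constant-velocity prediction for state (position, velocity)."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    new_state = (p + v * dt, v)
    # cov' = F cov F^T + Q, with F = [[1, dt], [0, 1]] and Q = diag(q, q).
    n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return new_state, ((n00, n01), (n10, n11))


def update(state, cov, z, r):
    """Standard Kalman update with a position-only measurement z."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    s = p00 + r                # innovation variance
    k0, k1 = p00 / s, p10 / s  # Kalman gain
    y = z - p                  # innovation
    new_state = (p + k0 * y, v + k1 * y)
    new_cov = (((1 - k0) * p00, (1 - k0) * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_state, new_cov


def confidence(cov):
    """Tracking confidence as the inverse of the positional variance."""
    return 1.0 / cov[0][0]
```

Running one such filter per axis gives a 2D plan-view track, and sorting people by `confidence` gives the processing order described on the slide.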

  49. Match Energy Minimization • Match energy for ith person has three terms: surface area difference, shape difference, and distance from predicted location • Search in restricted image area • centered around predicted location • size determined by positional uncertainty • Do not match multiple people to the same place
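
The exact energy formula is not preserved in this transcript, so the sketch below is only one plausible form. It assumes a weighted sum of absolute differences between the occupancy template and the candidate patch, the same for the height template, and the squared distance from the predicted location. The weights, the square search window, and the use of top-left patch corners as locations are all assumptions. The rule against matching multiple people to the same place is not implemented here.

```python
def patch(img, r0, c0, size):
    """Cut a size x size patch whose top-left corner is (r0, c0)."""
    return [row[c0:c0 + size] for row in img[r0:r0 + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_match(occ, hgt, t_occ, t_hgt, predicted, radius, weights=(1.0, 1.0, 1.0)):
    """Search near `predicted` for the patch location with the lowest energy.

    Returns (energy, (row, col)), or None if the search window is empty.
    """
    size = len(t_occ)
    w_o, w_h, w_d = weights
    pr, pc = predicted
    best = None
    for r0 in range(max(0, pr - radius), min(len(occ) - size, pr + radius) + 1):
        for c0 in range(max(0, pc - radius), min(len(occ[0]) - size, pc + radius) + 1):
            energy = (w_o * sad(patch(occ, r0, c0, size), t_occ)
                      + w_h * sad(patch(hgt, r0, c0, size), t_hgt)
                      + w_d * ((r0 - pr) ** 2 + (c0 - pc) ** 2))
            if best is None or energy < best[0]:
                best = (energy, (r0, c0))
    return best
```

In a full tracker, `radius` would grow with the positional uncertainty from the Kalman filter.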

  50. “Lost” People • Set a maximum on tracking match energy • If maximum exceeded, report Kalman prediction as person location • Put person on “lost people” list • Only use prediction in absence of data for limited time • Attempt to match “new” and “lost” people • For now, just check temporal and spatial nearness • Future: compare shape and color features
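
The current “temporal and spatial nearness” check for re-identifying lost people might be sketched as follows. The record layout and the `max_dist` and `max_gap` limits are hypothetical; the slide does not give values.

```python
import math


def match_lost(new_pos, new_time, lost, max_dist, max_gap):
    """Return the id of the nearest recently lost person, or None.

    lost: list of (person_id, (x, y), time_lost) records.
    A lost person qualifies only if they were lost no more than `max_gap`
    time units ago and their last position is within `max_dist`.
    """
    best_id, best_dist = None, max_dist
    for person_id, (x, y), time_lost in lost:
        if new_time - time_lost > max_gap:
            continue
        dist = math.hypot(new_pos[0] - x, new_pos[1] - y)
        if dist <= best_dist:
            best_id, best_dist = person_id, dist
    return best_id
```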
