
Sensor, Motion & Temporal Planning



  1. Sensor, Motion & Temporal Planning PhD Defense for Ser-Nam Lim Department of Computer Science University of Maryland, College Park

  2. Outline • Two-camera background subtraction: • Invariant to shadows, lighting changes. • Multi-camera background subtraction and tracking: • Occlusions. • Active camera: • Predictive tracking. • Motion, temporal planning. • Camera scheduling. • Abandoned package detection: • Severe occlusions. • Temporal analysis in a statistical framework to minimize reliance on thresholding.

  3. 1. Two-camera Background Subtraction Details given during proposal. "Fast Illumination-invariant Background Subtraction using Two Views: Error Analysis, Sensor Placement and Applications", IEEE CVPR 2005.

  4. Problem Description • Single-camera background subtraction: • Shadows, • Illumination changes, and • Specularities. • Disparity-based background subtraction: • Can overcome many of these problems, BUT • Slow and • Inaccurate online matches.

  5. Two-Camera Algorithm • Real-time, two-camera background subtraction: • Develop a fast two-camera background subtraction algorithm that doesn’t require solving the correspondence problem online. • Analyze the advantages of various camera configurations with respect to the robustness of background subtraction.

  6. Fast Illumination-invariant Two-camera Approach • Clever idea due to Ivanov et al.: • Yuri A. Ivanov, Aaron F. Bobick and John Liu, “Fast Lighting Independent Background Subtraction”, IEEE Workshop on Visual Surveillance, ICCV'98, Bombay, India, January 1998. • Intuition: • Establish background conjugate pixels offline. • Detect foreground from color differences between conjugate pixels (a sketch follows below). • What are the problems? • False and missed detections caused by homogeneous objects.
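
A minimal sketch of the conjugate-pixel test, assuming offline stereo matching has already produced a per-pixel conjugate map; the names and the simple color-distance threshold are illustrative, not the exact classifier from the dissertation:

```python
import numpy as np

def two_camera_foreground(img1, img2, conjugate_map, tau=30.0):
    """Flag pixels whose color disagrees with their background conjugate.

    img1, img2    : H x W x 3 float arrays (the two synchronized views).
    conjugate_map : H x W x 2 int array; conjugate_map[y, x] holds the
                    (y, x) coordinates in img2 of the background conjugate
                    of pixel (y, x) in img1, established offline.
    tau           : illustrative color-distance threshold.
    """
    cy, cx = conjugate_map[..., 0], conjugate_map[..., 1]
    # Shadows and lighting changes affect both views consistently at true
    # background conjugates, so their color difference stays small.
    diff = np.linalg.norm(img1 - img2[cy, cx], axis=-1)
    return diff > tau
```

Homogeneous objects defeat this test: a pixel and its conjugate can land on the same uniformly colored object, producing the missed detections noted above.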

  7. Advantages • FAST!! No online stereo matching. • Invariant to shadows, lighting changes. • Invariant to specularities: • Through a height-inferring process. • Detects near-background objects: • A difficult problem for disparity-based background subtraction. • Accurate: • Offline stereo matching can afford to be computationally intensive. • Human intervention can be used.

  8. Experiments – Lighting Changes

  9. 2. Multi-camera Detection and Tracking Under Occlusions Preparing for submission.

  10. Problem • Severe occlusions make detection and tracking difficult. • We often need to observe highly occluded places!! • Partial and full occlusions.

  11. Algorithm Outline • Silhouette detection on a per-camera basis. • Count people in a top view. • Constrained stereo. • Sensor fusion – particle filter.

  12. Silhouette Detection – Background Subtraction

  13. People Counting • Project the foreground silhouettes onto a common ground plane – do this for every available camera. • Intersect the projections from different cameras. • This yields a set of polygons that possibly contain valid objects. • The number of polygons is a rough estimate of the number of people in the scene (see the sketch below).
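
A rough sketch of the counting step, using shapely for the polygon operations; the projection onto the ground plane (via each camera's homography) is assumed to have happened already:

```python
from shapely.geometry import MultiPolygon
from shapely.ops import unary_union

def count_people(projected_silhouettes):
    """Rough people count from ground-plane silhouette projections.

    projected_silhouettes : one list of shapely Polygons per camera,
        each polygon the ground-plane projection of a foreground
        silhouette detected by that camera.
    """
    # Union each camera's projections, then intersect across cameras.
    region = unary_union(projected_silhouettes[0])
    for polys in projected_silhouettes[1:]:
        region = region.intersection(unary_union(polys))
    # Each connected polygon of the intersection is a candidate object
    # (some are phantoms -- see the next slides).
    polys = list(region.geoms) if isinstance(region, MultiPolygon) else [region]
    polys = [p for p in polys if not p.is_empty]
    return len(polys), polys
```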

  14. [Figure: a phantom polygon arising at the intersection of projections from Camera 1 and Camera 2]

  15. Constrained Stereo [Figure: Camera 1 and Camera 2 views showing candidate ground plane pixels, the pixels mapped along epipolar lines, and vertical lines through foreground pixels; a correct vertical line gives good color matching, a wrong vertical line gives bad color matching and indicates a phantom polygon]
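
A simplified sketch of the color-consistency test behind this figure: sample colors along a candidate's vertical line in one view and at the epipolar-mapped locations in the other, and keep the polygon only if the matches are mostly good. The sampling, the mapping, and both thresholds are assumptions of this sketch:

```python
import numpy as np

def is_real_object(colors_cam1, colors_cam2, tau=25.0, min_good_frac=0.5):
    """Accept or reject a candidate polygon via constrained stereo.

    colors_cam1, colors_cam2 : N x 3 arrays of colors sampled at
        corresponding points along the candidate's vertical line in
        camera 1 and at the mapped locations in camera 2.
    A real object yields consistent colors at true correspondences;
    a phantom polygon yields mostly bad matches.
    """
    dist = np.linalg.norm(colors_cam1.astype(float) -
                          colors_cam2.astype(float), axis=-1)
    return np.mean(dist < tau) >= min_good_frac
```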

  16. Note that only the visible foreground pixels are successfully segmented by selective stereo with one camera pair. Partial and full occlusions need to be handled with multi-camera fusion. How??

  17. Additional Consideration – Sensor Fusion Choosing the best stereo pairs for performing stereo matching – guided by a particle filter.
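
One plausible reading of this step, sketched as a standard particle filter over ground-plane positions; `likelihood_fn` stands in for the stereo-matching score of whichever camera pair is chosen, and the random-walk motion model is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood_fn, motion_std=0.1):
    """One predict/update/resample step for ground-plane tracking.

    particles     : N x 2 hypothesized ground-plane positions.
    likelihood_fn : maps N x 2 positions to per-particle likelihoods,
                    e.g. color-matching scores from the best stereo pair.
    """
    # Predict with a simple random-walk motion model (an assumption).
    particles = particles + rng.normal(scale=motion_std, size=particles.shape)
    # Update weights with the fused measurement likelihood.
    weights = weights * likelihood_fn(particles)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```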

  18. Final Results

  19. 3. Active Camera Submitted to ACM Multimedia System Journal. Submitted to ACM Multimedia 2006. “Constructing Task Visibility Intervals for Surveillance Systems”, VSSN Workshop, ACM Multimedia 2005. “A Scalable Image-based Multi-camera Visual Surveillance System”, AVSS 2003.

  20. Problem Description • Given: • Collection of calibrated PTZ cameras and • Large surveillance site. • How to control cameras to acquire surveillance videos? • Why collect surveillance videos? • Collect k secs of unobstructed video from as close to a side angle as possible for gait recognition. • Collect unobstructed video of person near any ROI.

  21. Project Goals - Visual Requirements • Targets have to be unobstructed in the collected videos during useful video collection: • Involves predicting object trajectories in the field of regard based on tracking. • Targets have to be in the field of view in the collected videos: • Constrains the PT parameters for cameras as a function of time during periods of visibility. • Targets have to satisfy some task-specific minimum resolutions in the collected videos: • Constrains the Z parameter.

  22. Project Goals - Performance Requirements • Scheduling cameras to maximize task coverage. • Determine future time intervals within which the visual requirements of tasks are satisfied: • We first do this for each (camera, task) pair. • We then combine these solutions across tasks and then cameras to schedule tasks.

  23. System Timeline For every (camera, task, object) tuple: • Detection and tracking – using existing methods. • Predict future locations of objects. • Visibility analysis, to predict the periods during which objects are visible – visibility intervals. • Determine allowable camera settings over time within these visibility intervals to form Task Visibility Intervals (TVI’s). • Combine TVI’s to form Multiple Task Visibility Intervals (MTVI’s) – scalability. • Scheduling – scalability.

  24. Predicting Future Location • Represent each object as a sphere. • For computational efficiency, each sphere is represented as a triplet of circular shadows on the projection planes for visibility analysis: • Extrapolate the motion of each shadow to predict its future locations. • Each shadow moves in a straight line along the predicted path, and its radius grows linearly to capture the positional uncertainty (see the sketch below).
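
As a sketch, the per-shadow prediction reduces to a couple of lines; the linear growth rate of the radius is a parameter assumed here:

```python
import numpy as np

def predict_shadow(center0, velocity, radius0, growth, t):
    """Predicted circular shadow on a projection plane at future time t.

    The center moves in a straight line at constant velocity; the radius
    grows linearly to absorb positional uncertainty.
    """
    center = np.asarray(center0, dtype=float) + t * np.asarray(velocity, dtype=float)
    radius = radius0 + growth * t
    return center, radius
```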

  25. Predictive Tracking Experiments

  26. Visibility Analysis • With the predicted locations, we can represent the extremal angle trajectories of each shadow over time in closed form: • Extremal angles are the angles subtended by the pair of tangent points. [Figure: a shadow on a straight-line trajectory whose radius increases over time, with the extremal angle of one tangent point drawn from the camera center]

  27. The extremal angle trajectories of two different objects are equated to find the time intervals (intersections) during which occlusion occurs – occlusion intervals: • The complements of the occlusion intervals are the visibility intervals. • This can be done for every object pair, but it can be made more efficient using an optimal segment intersection algorithm (details given in dissertation).
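
The dissertation solves the intersection of extremal angle trajectories in closed form; the sketch below conveys the same idea numerically, sampling the predicted shadows (e.g. `predict_shadow` above) and flagging the times at which the angular extents of two shadows overlap:

```python
import numpy as np

def extremal_angles(center, radius, cam=np.zeros(2)):
    """Angles of the two tangent points of a circular shadow seen from cam."""
    v = np.asarray(center, dtype=float) - cam
    d = np.linalg.norm(v)
    theta = np.arctan2(v[1], v[0])            # angle to the shadow's center
    alpha = np.arcsin(min(radius / d, 1.0))   # half-angle subtended
    return theta - alpha, theta + alpha

def occlusion_times(shadow_a, shadow_b, times):
    """Sampled occlusion intervals between two shadows.

    shadow_a, shadow_b : functions t -> (center, radius).
    Returns the sampled times at which the angular extents intersect;
    the complement over `times` gives the visibility intervals.
    """
    occluded = []
    for t in times:
        lo_a, hi_a = extremal_angles(*shadow_a(t))
        lo_b, hi_b = extremal_angles(*shadow_b(t))
        if lo_a <= hi_b and lo_b <= hi_a:     # angular intervals overlap
            occluded.append(t)
    return occluded
```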

  28. Efficient Segment Intersection vs Brute Force

  29. Task Visibility Intervals (TVI’s) • Combine allowable camera settings over time with visibility intervals to form TVI’s. • Allowable camera settings are determined at each future time step in the visibility interval: • Iterate through the range of pan, tilt and zoom settings, and determine the time intervals during which PTZ ranges exist that satisfy the task-specific resolution. • For efficiency, use a piecewise approximation to the PTZ range. • These TVI’s must also satisfy the required length of collected video.
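
A sketch of this trimming step, with the PTZ feasibility search abstracted into a black-box predicate; the time step and names are assumptions:

```python
def task_visibility_intervals(vis_interval, ptz_feasible, min_duration, dt=0.5):
    """Sub-intervals of a visibility interval with valid camera settings.

    vis_interval : (t_start, t_end) from the visibility analysis.
    ptz_feasible : t -> bool; True when some pan/tilt/zoom setting keeps
                   the target in view at the task's minimum resolution
                   (in practice a search over the PTZ ranges).
    min_duration : required length of collected video.
    """
    t_start, t_end = vis_interval
    tvis, start = [], None
    t = t_start
    while t <= t_end:
        if ptz_feasible(t):
            if start is None:
                start = t
        else:
            if start is not None and t - start >= min_duration:
                tvis.append((start, t))
            start = None
        t += dt
    if start is not None and t_end - start >= min_duration:
        tvis.append((start, t_end))
    return tvis
```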

  30. Multiple Task Visibility Intervals (MTVI’s) • TVI’s can be combined if: • Common time intervals exist that are at least as long as the maximum required processing times among all the tasks involved. • Common camera settings exist in these common time intervals. • For efficiency, TVI’s can be combined with a plane-sweep algorithm.
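
The pairwise merge test underneath that plane sweep might look like the following sketch; representing a TVI as a time interval plus per-axis PTZ ranges is an assumption here:

```python
def merge_tvis(tvi_a, tvi_b, processing_times):
    """Merge two TVIs of the same camera into an MTVI if possible.

    Each TVI is (start, end, ptz), with ptz a tuple of (lo, hi) ranges
    for pan, tilt and zoom. Returns the merged MTVI or None.
    """
    (sa, ea, ptz_a), (sb, eb, ptz_b) = tvi_a, tvi_b
    s, e = max(sa, sb), min(ea, eb)
    # Common time interval must cover the longest processing time involved.
    if e - s < max(processing_times):
        return None
    # Common camera settings must exist on every axis.
    common = tuple((max(l1, l2), min(h1, h2))
                   for (l1, h1), (l2, h2) in zip(ptz_a, ptz_b))
    if any(lo > hi for lo, hi in common):
        return None
    return (s, e, common)
```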

  31. Zoom

  32. Camera Scheduling • Scheduling based on the constructed (M)TVI’s. • Two methods are compared: • Branch and bound. • Greedy.

  33. Define slack φ as: φ = [t⁻, t⁺] = [r, d − p], where d is the deadline, r is the earliest release time and p is the processing time (duration of the task). • Let |φ| be t⁺ − t⁻. • It can be shown that if |φ|max < pmin, then in any feasible schedule, the (M)TVI’s must be ordered by r.
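
In code, the slack test is immediate; representing each (M)TVI as a bare (release, deadline, processing) triple is an assumption of this sketch:

```python
def slack(release, deadline, processing):
    """Slack phi = [t-, t+] = [r, d - p] and its width |phi|."""
    t_minus, t_plus = release, deadline - processing
    return (t_minus, t_plus), t_plus - t_minus

def release_ordered(tvis):
    """If |phi|max < pmin, any feasible schedule is ordered by release time."""
    widths = [slack(r, d, p)[1] for r, d, p in tvis]
    if max(widths) < min(p for _, _, p in tvis):
        return sorted(tvis, key=lambda tvi: tvi[0])   # order by r
    return None  # the ordering property is not guaranteed
```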

  34. Each camera can then be modeled with an acyclic graph with a source and a sink, with the nodes being the (M)TVI’s and the edges weighted by the number of tasks covered when moving from one node to another. • The sink of one camera’s graph is linked to the source of the next camera’s graph – cascading.

  35. Example [Figure: a cascaded three-camera graph with sources s1, s2, s3 and sinks t1, t2, t3; nodes are numbered (M)TVI’s and edge weights give the number of tasks covered]

  36. Dynamic Programming (DP) is run on the multi-camera graph: • Equivalent to the greedy algorithm, BUT • Branch & Bound also looks at which tasks the other cameras in the graph can potentially cover while running the DP backtracking.
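
A sketch of the DP pass on the cascaded graph, as a most-tasks-covered path computation over nodes in topological order; the Branch & Bound lookahead during backtracking is not shown:

```python
def dp_schedule(topo_nodes, edges, source, sink):
    """Most-tasks-covered path from source to sink in the camera graph.

    topo_nodes : all nodes in topological order.
    edges      : dict node -> list of (next_node, tasks_covered).
    """
    best = {n: float("-inf") for n in topo_nodes}
    best[source], parent = 0, {}
    for u in topo_nodes:
        if best[u] == float("-inf"):
            continue
        for v, tasks in edges.get(u, []):
            if best[u] + tasks > best[v]:
                best[v], parent[v] = best[u] + tasks, u
    # Backtrack from the sink to recover the chosen (M)TVI sequence.
    path, n = [], sink
    while n != source:
        path.append(n)
        n = parent[n]
    return best[sink], list(reversed(path))
```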

  37. Approximation Factors – Branch & Bound vs Greedy • Given k cameras, the approximation factor for multi-camera scheduling using the greedy algorithm is 2α + kβ, where α and β are variables representing the distribution of tasks among the cameras. • Proof in dissertation. • Important – depends on the number of cameras, i.e., does not scale well to large camera networks!!

  38. For k cameras, the approximation factor of the branch and bound algorithm is: [formula given in dissertation] • Proof in dissertation – the bound involves two task distribution factors. • Important – insensitive to the number of cameras!!

  39. Performance – Simulations

  40. Experiments – Face Capture

  41. Experiments – Full Body Video

  42. Experiments – Lower Resolution

  43. Experiments – Higher Resolution

  44. 4. Abandoned Package Detection under Severe Occlusions A short overview. Refer to dissertation for details. Preparing for submission.

  45. Constraints • No background frame available. • Constant foreground motion. • Constant occlusion. • Single camera.

  46. Algorithm • PDF for motion detection, Pd: • Observe successive frame differences. • Assume the pdf is zero-mean – extract the zero-centered mode. • PDF for background model, Pb: • Histogram frequencies computed based on the joint probability with Pd. • Intuition – true background pixels should observe no motion.
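
A toy sketch of both estimates on a grayscale frame stack; the binning and the zero-mode test are stand-ins for the dissertation's estimators:

```python
import numpy as np

def motion_pdf(frames, nbins=64):
    """Per-pixel motion evidence P_d from successive frame differences.

    frames : T x H x W grayscale stack (an assumption of this sketch).
    The frame-difference pdf is assumed zero-mean; mass outside the
    zero-centered mode is read as motion.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    in_zero_mode = (diffs < 256.0 / nbins).mean(axis=0)
    return 1.0 - in_zero_mode          # high value -> frequently moving

def background_pdf(frames, p_motion, nbins=64):
    """Histogram background model P_b, weighted jointly with P_d.

    True background pixels should observe no motion, so each frame's
    histogram contribution at a pixel is weighted by (1 - P_d) there.
    """
    t, h, w = frames.shape
    hist = np.zeros((h, w, nbins))
    bins = (frames.astype(np.int64) * nbins // 256).clip(0, nbins - 1)
    ys, xs = np.mgrid[0:h, 0:w]
    for k in range(t):
        hist[ys, xs, bins[k]] += 1.0 - p_motion
    return hist / (hist.sum(axis=-1, keepdims=True) + 1e-9)
```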

  47. PDF of static pixels that are foreground, conditioned on ~Pb and Pd: • Intuition – pixels belonging to abandoned packages are static foreground pixels. • Use an MRF to label these pixels, avoiding thresholding. • Evaluate the clusters based on temporal persistence of shape (Hausdorff distance) and intensities.
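
A sketch of the final two checks, assuming per-pixel background likelihoods `p_b` and motion probabilities `p_d` from the previous step; the MRF labeling itself is omitted:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def static_foreground_prob(p_b, p_d):
    """P(static foreground): unlikely under the background model AND static."""
    return (1.0 - p_b) * (1.0 - p_d)

def shape_persists(contour_t0, contour_t1, tol=5.0):
    """Temporal persistence of a cluster's shape via the symmetric
    Hausdorff distance between its contours (N x 2 points) at two times."""
    d = max(directed_hausdorff(contour_t0, contour_t1)[0],
            directed_hausdorff(contour_t1, contour_t0)[0])
    return d < tol
```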

  48. Experiments
