Computer Vision

Computer Vision – Lecture 18 Motion and Optical Flow 24.01.2012 Bastian Leibe RWTH Aachen http://www.mmp.rwth-aachen.de leibe@umic.rwth-aachen.de Many slides adapted from K. Grauman, S. Seitz, R. Szeliski, M. Pollefeys, S. Lazebnik TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAA

Announcements (2) • Exercise sheet 7 out! • Optical flow • Tracking with a Kalman filter • Application: Tracking billiard balls • Exercise will be on Thu, 02.02. • Submit your results until 01.02. night...

Course Outline • Image Processing Basics • Segmentation & Grouping • Object Recognition • Local Features & Matching • Object Categorization • 3D Reconstruction • Motion and Tracking • Motion and Optical Flow • Tracking with Linear Dynamic Models • Repetition

Xj x1j x3j x2j P1 P3 P2 Recap: Structure from Motion • Given: m images of n fixed 3D points xij = Pi Xj , i = 1, … , m, j = 1, … , n • Problem: estimate m projection matrices Piand n 3D points Xjfrom the mn correspondences xij B. Leibe Slide credit: Svetlana Lazebnik

Recap: Structure from Motion Ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same. • More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change B. Leibe Slide credit: Svetlana Lazebnik

Recap: Hierarchy of 3D Transformations • With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. • Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. Projective 15dof Preserves intersection and tangency Preserves parallellism, volume ratios Affine 12dof Similarity 7dof Preserves angles, ratios of length Euclidean 6dof Preserves angles, lengths B. Leibe Slide credit: Svetlana Lazebnik

Recap: Affine Structure from Motion • Let’s create a 2m×n data (measurement) matrix: • The measurement matrix D = MS must have rank 3! Points (3× n) Cameras(2m ×3) C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method.IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik

Recap: Affine Factorization • Obtaining a factorization from SVD: This decomposition minimizes|D-MS|2 Slide credit: Martial Hebert

Projective Structure from Motion • Given: m images of n fixed 3D points • zijxij = Pi Xj , i = 1,… , m, j = 1, … , n • Problem: estimate m projection matrices Pi and n 3D points Xj from the mn correspondences xij • With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: X → QX, P → PQ-1 • We can solve for structure and motion when 2mn >= 11m +3n – 15 • For two cameras, at least 7 points are needed. B. Leibe Slide credit: Svetlana Lazebnik

Projective Factorization • If we knew the depths z, we could factorize D to estimate M and S. • If we knew M and S, we could solve for z. • Solution: iterative approach (alternate between above two steps). Points (4× n) Cameras(3m ×4) D = MS has rank 4 B. Leibe Slide credit: Svetlana Lazebnik

Sequential Structure from Motion • Initialize motion from two images using fundamental matrix • Initialize structure • For each additional view: • Determine projection matrixof new camera using all the known 3D points that are visible in its image – calibration Points Cameras B. Leibe Slide credit: Svetlana Lazebnik

Sequential Structure from Motion • Initialize motion from two images using fundamental matrix • Initialize structure • For each additional view: • Determine projection matrixof new camera using all the known 3D points that are visible in its image – calibration • Refine and extend structure:compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation Points Cameras B. Leibe Slide credit: Svetlana Lazebnik

Sequential Structure from Motion • Initialize motion from two images using fundamental matrix • Initialize structure • For each additional view: • Determine projection matrixof new camera using all the known 3D points that are visible in its image – calibration • Refine and extend structure:compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation • Refine structure and motion: bundle adjustment Points Cameras B. Leibe Slide credit: Svetlana Lazebnik

Bundle Adjustment • Non-linear method for refining structure and motion • Minimizing mean-square reprojection error Xj P1Xj x3j x1j P3Xj P2Xj x2j P1 P3 B. Leibe P2 Slide credit: Svetlana Lazebnik

Projective Ambiguity • If we don’t know anything about the camera or the scene, the best we can get with this is a reconstruction up to a projective ambiguity Q. • This can already be useful. • E.g. we can answer questions like “at what point does a line intersect a plane”? • If we want to convert this to a “true” reconstruction, we need a Euclidean upgrade. • Need to put in additional knowledgeabout the camera (calibration) orabout the scene (e.g. from markers). • Several methods available (see F&P Chapter 13.5 or H&Z Chapter 19) B. Leibe Images from Hartley & Zisserman

Practical Considerations (1) • Role of the baseline • Small baseline: large depth error • Large baseline: difficult search problem • Solution • Track features between frames until baseline is sufficient. Small Baseline Large Baseline B. Leibe Slide adapted from Steve Seitz

Practical Considerations (2) • There will still be many outliers • Incorrect feature matches • Moving objects  Apply RANSAC to get robust estimates based on the inlier points. • Estimation quality depends on the point configuration • Points that are close together in the image produce less stablesolutions.  Subdivide image into a grid and tryto extract about the same number offeatures per grid cell. B. Leibe

Some Commercial Software Packages • boujou (http://www.2d3.com/) • PFTrack (http://www.thepixelfarm.co.uk/) • MatchMover (http://www.realviz.com/) • SynthEyes (http://www.ssontech.com/) • Icarus (http://aig.cs.man.ac.uk/research/reveal/icarus/) • Voodoo Camera Tracker (http://www.digilab.uni-hannover.de/) B. Leibe

Applications: Matchmoving • Putting virtual objects into real-world videos • Original sequence • Tracked features • SfM results • Final video B. Leibe Videos from Stefan Hafeneger

Applications: Large-Scale SfM from Flickr S. Agarwal, N. Snavely, I. Simon, S.M. Seitz, R. Szeliski, Building Rome in a Day, ICCV’09, 2009. (Video from http://grail.cs.washington.edu/rome/) B. Leibe

Topics of This Lecture • Introduction to Motion • Applications, uses • Motion Field • Derivation • Optical Flow • Brightness constancy constraint • Aperture problem • Lucas-Kanade flow • Iterative refinement • Global parametric motion • Coarse-to-fine estimation • Motion segmentation • KLT Feature Tracking B. Leibe

Video • A video is a sequence of frames captured over time • Now our image data is a function of space (x, y) and time (t) B. Leibe Slide credit: Svetlana Lazebnik

Applications of Segmentation to Video • Background subtraction • A static camera is observing a scene. • Goal: separate the static background from the moving foreground. How to come up with background frame estimate without access to “empty” scene? B. Leibe Slide credit: Svetlana Lazebnik, Kristen Grauman

Applications of Segmentation to Video • Background subtraction • Shot boundary detection • Commercial video is usually composed of shots or sequences showing the same objects or scene. • Goal: segment video into shots for summarization and browsing (each shot can be represented by a single keyframe in a user interface). • Difference from background subtraction: the camera is not necessarily stationary. B. Leibe Slide credit: Svetlana Lazebnik

Applications of Segmentation to Video • Background subtraction • Shot boundary detection • For each frame, compute the distance between the current frame and the previous one: • Pixel-by-pixel differences • Differences of color histograms • Block comparison • If the distance is greater than some threshold, classify the frame as a shot boundary. B. Leibe Slide credit: Svetlana Lazebnik

Applications of Segmentation to Video • Background subtraction • Shot boundary detection • Motion segmentation • Segment the video into multiple coherently moving objects B. Leibe Slide credit: Svetlana Lazebnik

Motion and Perceptual Organization • Sometimes, motion is the only cue… B. Leibe Slide credit: Svetlana Lazebnik

Motion and Perceptual Organization • Sometimes, motion is foremost cue B. Leibe Slide credit: Kristen Grauman

Motion and Perceptual Organization • Even “impoverished” motion data can evoke a strong percept B. Leibe Slide credit: Svetlana Lazebnik

Uses of Motion • Estimating 3D structure • Directly from optic flow • Indirectly to create correspondences for SfM • Segmenting objects based on motion cues • Learning dynamical models • Recognizing events and activities • Improving video quality (motion stabilization) B. Leibe Slide adapted from Svetlana Lazebnik

Motion Estimation Techniques • Direct methods • Directly recover image motion at each pixel from spatio-temporal image brightness variations • Dense motion fields, but sensitive to appearance variations • Suitable for video and when image motion is small • Feature-based methods • Extract visual features (corners, textured areas) and track them over multiple frames • Sparse motion fields, but more robust tracking • Suitable when image motion is large (10s of pixels) B. Leibe Slide credit: Steve Seitz

Motion Field • The motion field is the projection of the 3D scene motion into the image B. Leibe Slide credit: Svetlana Lazebnik

Motion Field and Parallax P(t+dt) • P(t) is a moving 3D point • Velocity of scene point: V = dP/dt • p(t) = (x(t),y(t))is the projection of P in the image. • Apparent velocity v in the image: given by components vx = dx/dtand vy = dy/dt • These components are known as the motion field of the image. V P(t) v p(t+dt) p(t) B. Leibe Slide credit: Svetlana Lazebnik

Quotient rule: D(f/g) = (g f’ – g’ f)/g^2 Motion Field and Parallax P(t+dt) To find image velocity v, differentiate p with respect to t (using quotient rule): • Image motion is a function of both the 3D motion (V) and the depth of the 3D point (Z). V P(t) v p(t+dt) p(t) B. Leibe Slide credit: Svetlana Lazebnik

Motion Field and Parallax • Pure translation: V is constant everywhere B. Leibe Slide credit: Svetlana Lazebnik

Motion Field and Parallax • Pure translation: V is constant everywhere • Vzis nonzero: • Every motion vector points toward (or away from) v0, the vanishing point of the translation direction. B. Leibe Slide credit: Svetlana Lazebnik

Motion Field and Parallax • Pure translation: V is constant everywhere • Vzis nonzero: • Every motion vector points toward (or away from) v0, the vanishing point of the translation direction. • Vz is zero: • Motion is parallel to the image plane, all the motion vectors are parallel. • The length of the motion vectors is inversely proportional to the depth Z. B. Leibe Slide credit: Svetlana Lazebnik

Optical Flow • Definition: optical flow is the apparent motion of brightness patterns in the image. • Ideally, optical flow would be the same as the motion field. • Have to be careful: apparent motion can be caused by lighting changes without any actual motion. • Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere under moving illumination. B. Leibe Slide credit: Svetlana Lazebnik

Apparent Motion  Motion Field B. Leibe Figure from Horn book Slide credit: Kristen Grauman

Estimating Optical Flow • Given two subsequent frames, estimate the apparent motion field u(x,y) and v(x,y) between them. • Key assumptions • Brightness constancy: projection of the same point looks the same in every frame. • Small motion: points do not move very far. • Spatial coherence: points move like their neighbors. I(x,y,t–1) I(x,y,t) B. Leibe Slide credit: Svetlana Lazebnik

The Brightness Constancy Constraint • Brightness Constancy Equation: • Linearizing the right hand side using Taylor expansion: • Hence, I(x,y,t–1) I(x,y,t) Spatial derivatives Temporal derivative B. Leibe Slide credit: Svetlana Lazebnik

The Brightness Constancy Constraint • How many equations and unknowns per pixel? • One equation, two unknowns • Intuitively, what does this constraint mean? • The component of the flow perpendicular to the gradient (i.e., parallel to the edge) is unknown gradient (u,v) • If (u,v) satisfies the equation, so does (u+u’, v+v’) if (u’,v’) (u+u’,v+v’) edge B. Leibe Slide credit: Svetlana Lazebnik

The Aperture Problem Perceived motion B. Leibe Slide credit: Svetlana Lazebnik

The Aperture Problem Actual motion B. Leibe Slide credit: Svetlana Lazebnik

The Barber Pole Illusion http://en.wikipedia.org/wiki/Barberpole_illusion B. Leibe Slide credit: Svetlana Lazebnik

Computer Vision – Lecture 18

Computer Vision – Lecture 18

Presentation Transcript

CAP 5415 Computer Vision Fall 2004

COMP 304 Computer Graphics II

Lecture 9-10:

Intel ® OPEN SOURCE COMPUTER VISION LIBRARY

COMPSCI 210S1C 2014 Computer Systems 1 Introduction

Lecture: Tunnel FET

Computer and Robot Vision I

Computer and Robot Vision I

Advanced Computer Vision Chapter 8

Foundations of Computer Vision Image Texture Analysis

IT 300: Computer Graphics

Introduction to Computer Science

Foundations of Computer Vision Lecture 13

Image analysis and computer vision

Lecture 25: Multi-view stereo, continued

LECTURE THREE

Introduction to Computer Vision Image Texture Analysis

Introduction to Computer Vision

Stanford CS223B Computer Vision, Winter 2006 Lecture 3 More On Features

Algorithms & Applications in Computer Vision

Training Discriminative Computer Vision Models with Weak Supervision

Computer Vision – Lecture 18

Computer Vision – Lecture 18

Presentation Transcript

CAP 5415 Computer Vision Fall 2004

COMP 304 Computer Graphics II

Lecture 9-10:

Intel ® OPEN SOURCE COMPUTER VISION LIBRARY

COMPSCI 210S1C 2014 Computer Systems 1 Introduction

Computer Vision

Lecture: Tunnel FET

Computer and Robot Vision I

Computer and Robot Vision I

Advanced Computer Vision Chapter 8

Foundations of Computer Vision Image Texture Analysis

IT 300: Computer Graphics

Introduction to Computer Science

Foundations of Computer Vision Lecture 13

Image analysis and computer vision

Lecture 25: Multi-view stereo, continued

LECTURE THREE

Introduction to Computer Vision Image Texture Analysis

Introduction to Computer Vision

Stanford CS223B Computer Vision, Winter 2006 Lecture 3 More On Features

Algorithms &amp; Applications in Computer Vision

Training Discriminative Computer Vision Models with Weak Supervision

Algorithms & Applications in Computer Vision