Computer Vision – Lecture 17. Structure-from-Motion 21.01.2009. Bastian Leibe RWTH Aachen http://www.umic.rwth-aachen.de/multimedia leibe@umic.rwth-aachen.de. Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz.
Course Outline • Image Processing Basics • Segmentation & Grouping • Object Recognition • Local Features & Matching • Object Categorization • 3D Reconstruction • Epipolar Geometry and Stereo Basics • Camera calibration & Uncalibrated Reconstruction • Structure-from-Motion • Motion and Tracking
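Recap: A General Point • Equations of the form $Ax = 0$ • How do we solve them? (always!) • Apply SVD: $A = U D V^T$, where the diagonal of $D$ holds the singular values and the columns of $U$ and $V$ hold the singular vectors. • Singular values of $A$ = square roots of the eigenvalues of $A^T A$. • The solution of $Ax = 0$ is the nullspace vector of $A$. • This corresponds to the singular vector belonging to the smallest singular value of $A$ (the last column of $V$).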
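The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture; the helper name `nullspace_vector` and the example matrix are assumptions made for the demonstration.

```python
import numpy as np

def nullspace_vector(A):
    """Unit vector x minimizing ||A x||: the right singular vector
    that belongs to the smallest singular value of A."""
    _, _, Vt = np.linalg.svd(A)   # rows of Vt are the right singular vectors
    return Vt[-1]                 # singular values are sorted in descending order

# Hypothetical rank-2 matrix whose null space is spanned by (1, 1, 1).
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, -1.0]])
x = nullspace_vector(A)
print(np.allclose(A @ x, 0.0))
```

With noisy data `A` has no exact null space, and the same vector is then the least-squares solution of $Ax = 0$ under the constraint $|x| = 1$.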
Properties of SVD • Frobenius norm: generalization of the Euclidean norm to matrices, $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$ • Partial reconstruction property of SVD • Let $\sigma_i$, $i = 1, \dots, N$, be the singular values of $A$. • Let $A_p = U_p D_p V_p^T$ be the reconstruction of $A$ when we set $\sigma_{p+1}, \dots, \sigma_N$ to zero. • Then $A_p$ is the best rank-$p$ approximation of $A$ in the sense of the Frobenius norm (i.e. the best least-squares approximation).
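A small numerical sketch of the partial reconstruction property, assuming a random test matrix. The function name `best_rank_p` is hypothetical. The Frobenius error of the truncated reconstruction should equal the root of the sum of the squared discarded singular values.

```python
import numpy as np

def best_rank_p(A, p):
    """Rank-p reconstruction A_p = U_p D_p V_p^T from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :p] * s[:p]) @ Vt[:p]   # scale the first p columns of U by sigma_i

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # arbitrary example matrix
A2 = best_rank_p(A, 2)

s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A2, "fro")      # Frobenius norm of the residual
print(np.isclose(err, np.sqrt(np.sum(s[2:] ** 2))))
```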
Recap: Camera Parameters • Intrinsic parameters • Principal point coordinates • Focal length • Pixel magnification factors • Skew (non-rectangular pixels) • Radial distortion • Extrinsic parameters • Rotation R • Translation t (both relative to world coordinate system) • Camera projection matrix • General pinhole camera: 9 DoF • CCD camera with square pixels: 10 DoF • General camera: 11 DoF
Recap: Calibrating a Camera • Goal: compute intrinsic and extrinsic parameters using observed camera data. • Main idea: place a “calibration object” with known geometry in the scene • Get correspondences between 3D points $X_i$ and image points $x_i$ • Solve for the mapping from scene to image: estimate $P = P_{int} P_{ext}$ Slide credit: Kristen Grauman
Recap: Camera Calibration (DLT Algorithm) • P has 11 degrees of freedom. • Two linearly independent equations per independent 2D/3D correspondence. • Solve with SVD (similar to homography estimation) • Solution corresponds to the singular vector of the smallest singular value. • 5 ½ correspondences needed for a minimal solution (11 unknowns, 2 equations each), i.e. at least 6 points in practice. Slide adapted from Svetlana Lazebnik
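A hedged sketch of the DLT step on synthetic, noise-free data. The camera (`K`, `R`, `t`), the point cloud and the helper names are all assumptions for illustration. Each correspondence contributes the two rows $(X^T, 0, -uX^T)$ and $(0, X^T, -vX^T)$ acting on the stacked rows of $P$.

```python
import numpy as np

def homogeneous(pts):
    return np.hstack([pts, np.ones((pts.shape[0], 1))])

def calibrate_dlt(X, x):
    """X: (n, 3) world points, x: (n, 2) image points, n >= 6.
    Returns a 3x4 projection matrix, defined up to scale."""
    Xh = homogeneous(X)
    n = Xh.shape[0]
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh                    # p1 . X - u (p3 . X) = 0
    A[0::2, 8:12] = -x[:, [0]] * Xh
    A[1::2, 4:8] = Xh                    # p2 . X - v (p3 . X) = 0
    A[1::2, 8:12] = -x[:, [1]] * Xh
    return np.linalg.svd(A)[2][-1].reshape(3, 4)

def project(P, X):
    xh = homogeneous(X) @ P.T
    return xh[:, :2] / xh[:, 2:]

# Synthetic ground truth (assumed values, not from the lecture).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (8, 3))       # non-coplanar calibration points
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[0.1], [-0.2], [5.0]])
P_true = K @ np.hstack([R, t])
x = project(P_true, X)

P_est = calibrate_dlt(X, x)
print(np.allclose(project(P_est, X), x, atol=1e-5))
```

With real measurements the points should be normalized first, as in the normalized eight-point algorithm, because the unnormalized system is poorly conditioned.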
Recap: Triangulation – Linear Algebraic Approach • From $x_1 \times P_1 X = 0$ and $x_2 \times P_2 X = 0$ we get two independent equations each in terms of the three unknown entries of $X$. • Stack equations and solve with SVD. • This approach nicely generalizes to multiple cameras. Slide credit: Svetlana Lazebnik
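The stacking step could look as follows. This is a sketch that solves for the homogeneous 4-vector of $X$ rather than its three inhomogeneous entries. The two cameras and the test point are assumed values, and `triangulate` is a hypothetical helper name.

```python
import numpy as np

def triangulate(Ps, xs):
    """Linear triangulation from any number of views.
    Ps: list of 3x4 camera matrices, xs: list of (u, v) image points."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])     # u (p3 . X) - p1 . X = 0
        rows.append(v * P[2] - P[1])     # v (p3 . X) - p2 . X = 0
    Xh = np.linalg.svd(np.asarray(rows))[2][-1]
    return Xh[:3] / Xh[3]

# Two assumed cameras: identity pose and a camera shifted by one unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

def project(P, X):
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(np.allclose(X_est, X_true))
```

Adding a third camera only appends two more rows to the stacked system, which is why the approach generalizes to multiple views.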
Recap: Epipolar Geometry – Calibrated Case • Scene point $X = (u, v, w, 1)^T$, seen by the first camera with matrix $[I | 0]$ as $x = (u, v, w)^T$. • Second camera matrix: $[R^T | -R^T t]$. • A vector $x'$ in the second coordinate system has coordinates $Rx'$ in the first one. • The vectors $x$, $t$, and $Rx'$ are coplanar. Slide credit: Svetlana Lazebnik
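Recap: Epipolar Geometry – Calibrated Case • Coplanarity of $x$, $t$, and $Rx'$ gives $x \cdot [t \times (Rx')] = 0$, i.e. $x^T E x' = 0$ with $E = [t_\times] R$. • $E$ is the Essential Matrix (Longuet-Higgins, 1981). Slide credit: Svetlana Lazebnik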
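The calibrated epipolar constraint can be checked numerically. The sketch below assumes the camera convention of the previous slide (first camera $[I | 0]$, second camera $[R^T | -R^T t]$). The rotation, translation and scene point are arbitrary example values.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

a = 0.2                                   # assumed rotation angle about the z axis
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.0, 0.2])             # assumed position of the second camera
X = np.array([0.4, -0.3, 6.0])            # scene point in the first camera frame

x = X                                     # projection by [I | 0]
xp = R.T @ (X - t)                        # projection by [R^T | -R^T t]

E = skew(t) @ R                           # essential matrix E = [t_x] R
residual = x @ E @ xp
print(abs(residual) < 1e-12)
```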
Recap: Epipolar Geometry – Uncalibrated Case • The calibration matrices $K$ and $K'$ of the two cameras are unknown • We can write the epipolar constraint in terms of unknown normalized coordinates: $\hat{x}^T E \hat{x}' = 0$ with $x = K\hat{x}$, $x' = K'\hat{x}'$ Slide credit: Svetlana Lazebnik
Recap: Epipolar Geometry – Uncalibrated Case • Substituting $\hat{x} = K^{-1}x$ and $\hat{x}' = K'^{-1}x'$ gives $x^T F x' = 0$ with $F = K^{-T} E K'^{-1}$. • $F$ is the Fundamental Matrix (Faugeras and Luong, 1992). Slide credit: Svetlana Lazebnik
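Recap: The Eight-Point Algorithm • $x = (u, v, 1)^T$, $x' = (u', v', 1)^T$ • Each correspondence gives one linear equation $x^T F x' = 0$ in the nine entries of $F$. • Minimize $\sum_{i=1}^{N} (x_i^T F x_i')^2$ under the constraint $|F|^2 = 1$. • Problem: poor numerical conditioning Slide credit: Svetlana Lazebnik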
Recap: Normalized Eight-Point Algorithm • Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels. • Use the eight-point algorithm to compute F from the normalized points. • Enforce the rank-2 constraint using SVD: with $F = U D V^T$, set $d_{33}$ to zero and reconstruct $F$. • Transform the fundamental matrix back to original units: if $T$ and $T'$ are the normalizing transformations in the two images, then the fundamental matrix in original coordinates is $T^T F T'$. [Hartley, 1995] Slide credit: Svetlana Lazebnik
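The four steps above might be implemented roughly as follows. This is a sketch under the slide's convention $x^T F x' = 0$, tested on synthetic, noise-free correspondences. The cameras, the point cloud and the function names are assumptions, not part of the lecture.

```python
import numpy as np

def normalizing_transform(pts):
    """Similarity T that centres pts and makes the mean squared distance equal 2."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0 / np.mean(np.sum((pts - c) ** 2, axis=1)))
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])

def eight_point_normalized(x, xp):
    """x, xp: (n, 2) matched points, n >= 8. Returns F with x^T F x' = 0."""
    n = x.shape[0]
    xh = np.hstack([x, np.ones((n, 1))])
    xph = np.hstack([xp, np.ones((n, 1))])
    T, Tp = normalizing_transform(x), normalizing_transform(xp)
    xn, xpn = xh @ T.T, xph @ Tp.T
    # One row per match: coefficients of the row-major entries of F.
    A = np.einsum("ni,nj->nij", xn, xpn).reshape(n, 9)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, d, Vt = np.linalg.svd(F)
    d[2] = 0.0                            # enforce rank 2
    F = T.T @ (U @ np.diag(d) @ Vt) @ Tp  # back to original units
    return F / np.linalg.norm(F)

# Synthetic two-view setup (assumed values).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (20, 3)) + np.array([0.0, 0.0, 5.0])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])

def pixels(Xc):
    p = Xc @ K.T
    return p[:, :2] / p[:, 2:]

x = pixels(X)                 # first camera K [I | 0]
xp = pixels(X @ R.T + t)      # second camera K [R | t]
F = eight_point_normalized(x, xp)
```

On real matches the algebraic residuals $x_i^T F x_i'$ are nonzero, and the estimate is usually refined or wrapped in a robust scheme. That goes beyond what this sketch shows.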
Recap: Comparison of Estimation Algorithms B. Leibe Slide credit: Svetlana Lazebnik
Recap: Epipolar Transfer • Assume the epipolar geometry is known • Given projections of the same point in two images, how can we compute the projection of that point in a third image? • Compute the epipolar lines $l_{31} = F_{13}^T x_1$ and $l_{32} = F_{23}^T x_2$ in the third image; $x_3$ is their intersection. Slide credit: Svetlana Lazebnik
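A sketch of epipolar transfer with three assumed cameras. In homogeneous coordinates the intersection of two lines is their cross product, so $x_3 = l_{31} \times l_{32}$. The helper `fundamental_from_cameras` builds $F_{ab}$ from known camera matrices purely to generate test data; in practice the fundamental matrices would be estimated from matches.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def fundamental_from_cameras(Pa, Pb):
    """F_ab such that x_a^T F_ab x_b = 0 for x_a = Pa X and x_b = Pb X."""
    Cb = np.linalg.svd(Pb)[2][-1]         # centre of camera b: Pb @ Cb = 0
    ea = Pa @ Cb                          # epipole in image a
    return skew(ea) @ Pa @ np.linalg.pinv(Pb)

def transfer(F13, F23, x1, x2):
    l31 = F13.T @ x1                      # epipolar line of x1 in image 3
    l32 = F23.T @ x2                      # epipolar line of x2 in image 3
    x3 = np.cross(l31, l32)               # intersection of the two lines
    return x3 / x3[2]

def camera(tx, ty):
    return np.hstack([np.eye(3), np.array([[tx], [ty], [0.0]])])

# Assumed cameras with non-collinear centres, and a point off the trifocal plane.
P1, P2, P3 = camera(0.0, 0.0), camera(-1.0, 0.0), camera(0.0, -1.0)
X = np.array([0.3, 0.4, 5.0, 1.0])
x1, x2, x3_true = P1 @ X, P2 @ X, P3 @ X

x3 = transfer(fundamental_from_cameras(P1, P3),
              fundamental_from_cameras(P2, P3), x1, x2)
print(np.allclose(x3, x3_true / x3_true[2]))
```

The construction breaks down when the two epipolar lines coincide, which happens for points on the plane through the three camera centres.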
Recap: Active Stereo with Structured Light • Optical triangulation • Project a single stripe of laser light • Scan it across the surface of the object • This is a very precise version of structured light scanning • Digital Michelangelo Project • http://graphics.stanford.edu/projects/mich/ B. Leibe Slide credit: Steve Seitz
Topics of This Lecture • Structure from Motion (SfM) • Motivation • Ambiguity • Affine SfM • Affine cameras • Affine factorization • Euclidean upgrade • Dealing with missing data • Projective SfM • Two-camera case • Projective factorization • Bundle adjustment • Practical considerations • Applications B. Leibe
Structure from Motion • Given: m images of n fixed 3D points, $x_{ij} = P_i X_j$, $i = 1, \dots, m$, $j = 1, \dots, n$ • Problem: estimate the m projection matrices $P_i$ and the n 3D points $X_j$ from the mn correspondences $x_{ij}$ Slide credit: Svetlana Lazebnik
What Can We Use This For? • E.g. movie special effects • Video B. Leibe Video Credit: Stefan Hafeneger
Structure from Motion Ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same: $x = PX = (\frac{1}{k} P)(kX)$ • It is impossible to recover the absolute scale of the scene! Slide credit: Svetlana Lazebnik
Structure from Motion Ambiguity • More generally: if we transform the scene using a transformation $Q$ and apply the inverse transformation to the camera matrices, then the images do not change: $x = PX = (PQ^{-1})(QX)$ Slide credit: Svetlana Lazebnik
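The ambiguity is easy to confirm numerically. The sketch below uses a random camera, a random point and a random $4 \times 4$ matrix `Q` (all assumed values) and compares the projections before and after the transformation.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.standard_normal((3, 4))            # arbitrary camera matrix
X = np.append(rng.standard_normal(3), 1.0) # arbitrary homogeneous scene point
Q = rng.standard_normal((4, 4))            # a random 4x4 matrix is almost surely invertible

x_before = P @ X
x_after = (P @ np.linalg.inv(Q)) @ (Q @ X) # transformed camera sees transformed point

print(np.allclose(x_before, x_after))
```

Since image measurements alone cannot distinguish the two configurations, extra constraints are needed to narrow $Q$ down from a general projective transformation.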
Reconstruction Ambiguity: Similarity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
Reconstruction Ambiguity: Affine B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
Reconstruction Ambiguity: Projective B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
Projective Ambiguity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
From Projective to Affine B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
From Affine to Similarity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
Hierarchy of 3D Transformations • Projective (15 dof): preserves intersection and tangency • Affine (12 dof): preserves parallelism, volume ratios • Similarity (7 dof): preserves angles, ratios of length • Euclidean (6 dof): preserves angles, lengths • With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. • Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. Slide credit: Svetlana Lazebnik
Structure from Motion • Let’s start with affine cameras (the math is easier): the camera center is at infinity. Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman
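Orthographic Projection • Special case of perspective projection • Distance from center of projection to image plane is infinite • Projection matrix (world to image): $(x, y, 1)^T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} (X, Y, Z, 1)^T$ Slide credit: Steve Seitz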
Affine Cameras Orthographic Projection Parallel Projection B. Leibe Slide credit: Svetlana Lazebnik
Affine Cameras • A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image. • Affine projection is a linear mapping + translation in inhomogeneous coordinates: $x = AX + b$, where $A$ is a $2 \times 3$ matrix with rows $a_1^T$, $a_2^T$ and $b$ is the projection of the world origin. Slide credit: Svetlana Lazebnik
Affine Structure from Motion • Given: m images of n fixed 3D points: $x_{ij} = A_i X_j + b_i$, $i = 1, \dots, m$, $j = 1, \dots, n$ • Problem: use the mn correspondences $x_{ij}$ to estimate m projection matrices $A_i$ and translation vectors $b_i$, and n points $X_j$ • The reconstruction is defined up to an arbitrary affine transformation $Q$ (12 degrees of freedom): $\begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix} Q^{-1}$, $\begin{pmatrix} X \\ 1 \end{pmatrix} \to Q \begin{pmatrix} X \\ 1 \end{pmatrix}$ • We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity). • Thus, we must have $2mn \ge 8m + 3n - 12$. • For two views, we need four point correspondences. Slide credit: Svetlana Lazebnik
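For reference, the two-view count works out as follows. This is a worked version of the inequality on the slide; the split of the 8 unknowns per camera into the six entries of $A_i$ and the two entries of $b_i$ follows from their sizes ($2 \times 3$ and $2 \times 1$).

```latex
2mn \;\ge\; 8m + 3n - 12
\quad\xrightarrow{\;m = 2\;}\quad
4n \;\ge\; 16 + 3n - 12 \;=\; 3n + 4
\quad\Longrightarrow\quad
n \;\ge\; 4
```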
Affine Structure from Motion • Centering: subtract the centroid of the image points: $\hat{x}_{ij} = x_{ij} - \frac{1}{n}\sum_{k=1}^{n} x_{ik} = A_i \left( X_j - \frac{1}{n}\sum_{k=1}^{n} X_k \right)$ • For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points. • After centering, each normalized point $\hat{x}_{ij}$ is related to the 3D point $X_j$ by $\hat{x}_{ij} = A_i X_j$ Slide credit: Svetlana Lazebnik
Affine Structure from Motion • Let’s create a $2m \times n$ data (measurement) matrix, with one pair of rows per camera and one column per point: $D = \begin{bmatrix} \hat{x}_{11} & \cdots & \hat{x}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{x}_{m1} & \cdots & \hat{x}_{mn} \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} X_1 & \cdots & X_n \end{bmatrix} = MS$, with cameras $M$ ($2m \times 3$) and points $S$ ($3 \times n$). • The measurement matrix $D = MS$ must have rank 3! C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. Slide credit: Svetlana Lazebnik
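To make the rank-3 claim concrete, the sketch below builds the centred measurement matrix from synthetic affine projections and inspects its singular values. The array layout `(m, n, 2)` for the tracks and the function name `measurement_matrix` are assumptions for illustration.

```python
import numpy as np

def measurement_matrix(tracks):
    """tracks: (m, n, 2) image coordinates of n points in m views.
    Returns the centred 2m x n matrix D (rows 2i, 2i+1 belong to view i)."""
    m, n, _ = tracks.shape
    centred = tracks - tracks.mean(axis=1, keepdims=True)  # subtract per-view centroid
    return centred.transpose(0, 2, 1).reshape(2 * m, n)

# Synthetic affine cameras x_ij = A_i X_j + b_i (assumed random values).
rng = np.random.default_rng(3)
m, n = 4, 10
A = rng.standard_normal((m, 2, 3))
b = rng.standard_normal((m, 2))
X = rng.standard_normal((3, n))
tracks = np.einsum("ikl,lj->ijk", A, X) + b[:, None, :]

D = measurement_matrix(tracks)
s = np.linalg.svd(D, compute_uv=False)
print(D.shape, s[3] < 1e-10 * s[0])   # only three singular values are non-negligible
```

With noisy tracks the remaining singular values are small but nonzero.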
Factorizing the Measurement Matrix • Singular value decomposition of $D$: $D = U W V^T$. Because $D$ has rank 3, only the first three singular values are significant, so $D \approx U_3 W_3 V_3^T$ with $U_3$ ($2m \times 3$), $W_3$ ($3 \times 3$) and $V_3^T$ ($3 \times n$). Slide credit: Martial Hebert
Factorizing the Measurement Matrix • Obtaining a factorization from SVD: with $D \approx U_3 W_3 V_3^T$, set $M = U_3 W_3^{1/2}$ and $S = W_3^{1/2} V_3^T$. • This decomposition minimizes $|D - MS|^2$. Slide credit: Martial Hebert
Affine Ambiguity • The decomposition is not unique. We get the same $D$ by using any invertible $3 \times 3$ matrix $C$ and applying the transformations $M \to MC$, $S \to C^{-1}S$. • That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example). We need a Euclidean upgrade. Slide credit: Martial Hebert
Estimating the Euclidean Upgrade • Orthographic assumption: image axes are perpendicular and scale is 1: $a_1 \cdot a_2 = 0$, $|a_1|^2 = |a_2|^2 = 1$, where $a_1$, $a_2$ are the rows of a camera matrix. • With m cameras, this can be converted into a system of 3m equations. Slide adapted from S. Lazebnik, M. Hebert
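Estimating the Euclidean Upgrade • This can be converted into a system of 3m equations: for the rows $a_{i1}^T$, $a_{i2}^T$ of camera $i$ in the affine $M$, require $(a_{i1}^T C)(C^T a_{i2}) = 0$ and $(a_{i1}^T C)(C^T a_{i1}) = (a_{i2}^T C)(C^T a_{i2}) = 1$, $i = 1, \dots, m$ • Let $L = CC^T$ • Then this translates to 3m equations that are linear in $L$: $a_{i1}^T L a_{i2} = 0$, $a_{i1}^T L a_{i1} = 1$, $a_{i2}^T L a_{i2} = 1$ • Solve for $L$ • Recover $C$ from $L$ by Cholesky decomposition: $L = CC^T$ • Update $M$ and $S$: $M \leftarrow MC$, $S \leftarrow C^{-1}S$ Slide adapted from S. Lazebnik, M. Hebert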
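One possible implementation of the upgrade is sketched below. It solves the $3m$ linear equations for the six distinct entries of the symmetric matrix $L$ by least squares and then applies a Cholesky factorization. The synthetic orthographic cameras, the distortion `C0` and the function names are assumptions for the test.

```python
import numpy as np

def sym_coeffs(a, b):
    """Coefficients of (L11, L12, L13, L22, L23, L33) in a^T L b for symmetric L."""
    return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1], a[1] * b[2] + a[2] * b[1], a[2] * b[2]])

def euclidean_upgrade(M):
    """M: (2m, 3) affine motion matrix. Returns C with L = C C^T."""
    rows, rhs = [], []
    for a1, a2 in zip(M[0::2], M[1::2]):
        rows += [sym_coeffs(a1, a1), sym_coeffs(a2, a2), sym_coeffs(a1, a2)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]], [l[1], l[3], l[4]], [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)          # lower-triangular C; fails if L is not positive definite

# Synthetic test: orthographic cameras distorted by an unknown matrix C0.
rng = np.random.default_rng(4)
m = 5
M_true = np.vstack([np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(m)])
C0 = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
M_affine = M_true @ np.linalg.inv(C0)

C = euclidean_upgrade(M_affine)
M_metric = M_affine @ C                   # the matching shape update is S <- inv(C) @ S
```

With noisy data the estimated $L$ may fail to be positive definite, in which case the Cholesky step raises an error. Handling that case is outside this sketch.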
Algorithm Summary • Given: m images and n features $x_{ij}$ • For each image i, center the feature coordinates. • Construct a $2m \times n$ measurement matrix $D$: • Column j contains the projection of point j in all views • Each row contains one coordinate (u or v) of the projections of all the n points in one image • Factorize $D$: • Compute SVD: $D = U W V^T$ • Create $U_3$ by taking the first 3 columns of $U$ • Create $V_3$ by taking the first 3 columns of $V$ • Create $W_3$ by taking the upper left $3 \times 3$ block of $W$ • Create the motion and shape matrices: • $M = U_3 W_3^{1/2}$ and $S = W_3^{1/2} V_3^T$ (or $M = U_3$ and $S = W_3 V_3^T$) • Eliminate affine ambiguity Slide credit: Martial Hebert
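The factorization step of the summary could be sketched as follows, assuming a measurement matrix that is already centred. The test matrix is a random rank-3 product plus a small amount of noise, and `factorize` is a hypothetical name. The affine ambiguity is left unresolved here.

```python
import numpy as np

def factorize(D):
    """Rank-3 factorization D ~ M S with M = U3 W3^(1/2), S = W3^(1/2) V3^T."""
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    root = np.sqrt(w[:3])
    M = U[:, :3] * root                   # (2m, 3) motion matrix
    S = root[:, None] * Vt[:3]            # (3, n) shape matrix
    return M, S

# Assumed test data: exact rank-3 matrix plus small measurement noise.
rng = np.random.default_rng(5)
D_clean = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))
D = D_clean + 1e-3 * rng.standard_normal((8, 10))

M, S = factorize(D)
residual = np.linalg.norm(D - M @ S)      # Frobenius norm by default for matrices
print(M.shape, S.shape, residual <= np.linalg.norm(D - D_clean))
```

Because the truncated SVD is the best rank-3 approximation in the Frobenius norm, the residual can never exceed the distance from `D` to any other rank-3 matrix, including the noise-free `D_clean`.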
Reconstruction Results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method.IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik Image Source: Tomasi & Kanade
Dealing with Missing Data • So far, we have assumed that all points are visible in all views • In reality, the measurement matrix typically looks something like this: Cameras Points B. Leibe Slide credit: Svetlana Lazebnik
Dealing with Missing Data • Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results • Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) • Incremental bilinear refinement • Perform factorization on a dense sub-block F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007. Slide credit: Svetlana Lazebnik