1 / 75

Computer Vision – Lecture 17

Computer Vision – Lecture 17. Structure-from-Motion 21.01.2009. Bastian Leibe RWTH Aachen http://www.umic.rwth-aachen.de/multimedia leibe@umic.rwth-aachen.de. Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz. TexPoint fonts used in EMF.

leiko
Télécharger la présentation

Computer Vision – Lecture 17

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computer Vision – Lecture 17 Structure-from-Motion 21.01.2009 Bastian Leibe RWTH Aachen http://www.umic.rwth-aachen.de/multimedia leibe@umic.rwth-aachen.de Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAA

  2. Course Outline • Image Processing Basics • Segmentation & Grouping • Object Recognition • Local Features & Matching • Object Categorization • 3D Reconstruction • Epipolar Geometry and Stereo Basics • Camera calibration & Uncalibrated Reconstruction • Structure-from-Motion • Motion and Tracking

  3. Recap: A General Point • Equations of the form • How do we solve them? (always!) • Apply SVD • Singular values of A = square roots of the eigenvalues of ATA. • The solution of Ax=0 is the nullspace vector of A. • This corresponds to the smallest singular vector of A. SVD Singular values Singular vectors B. Leibe

  4. Properties of SVD • Frobenius norm • Generalization of the Euclidean norm to matrices • Partial reconstruction property of SVD • Let i i=1,…,N be the singular values of A. • Let Ap = UpDpVpTbe the reconstruction of A when we set p+1,…, Nto zero. • Then Ap = UpDpVpTis the best rank-p approximation of Ain the sense of the Frobenius norm (i.e. the best least-squares approximation). B. Leibe

  5. Recap: Camera Parameters • Intrinsic parameters • Principal point coordinates • Focal length • Pixel magnification factors • Skew (non-rectangular pixels) • Radial distortion • Extrinsic parameters • Rotation R • Translation t(both relative to world coordinate system) • Camera projection matrix  General pinhole camera: 9 DoF  CCD Camera with square pixels: 10 DoF  General camera: 11 DoF s s B. Leibe

  6. Xi xi Recap: Calibrating a Camera Goal • Compute intrinsic and extrinsic parameters using observed camera data. Main idea • Place “calibration object” with known geometry in the scene • Get correspondences • Solve for mapping from scene to image: estimate P=PintPext B. Leibe Slide credit: Kristen Grauman

  7. Recap: Camera Calibration (DLT Algorithm) • P has 11 degrees of freedom. • Two linearly independent equations per independent 2D/3D correspondence. • Solve with SVD (similar to homography estimation) • Solution corresponds to smallest singular vector. • 5 ½ correspondences needed for a minimal solution. • Solve with …? B. Leibe Slide adapted from Svetlana Lazebnik

  8. X? x2 x1 O2 O1 Recap: Triangulation – Linear Algebraic Approach • Two independent equations each in terms of three unknown entries of X. • Stack equations and solve with SVD. • This approach nicely generalizes to multiple cameras. R1 R2 • Stack equations and solve with …? B. Leibe Slide credit: Svetlana Lazebnik

  9. Recap: Epipolar Geometry – Calibrated Case X x x’ t R Camera matrix: [I|0]X = (u, v, w, 1)T x = (u, v, w)T Camera matrix: [RT | –RTt] Vector x’in second coord. system has coordinates Rx’in the first one. The vectors x, t, and Rx’ are coplanar B. Leibe Slide credit: Svetlana Lazebnik

  10. Recap: Epipolar Geometry – Calibrated Case X x x’ Essential Matrix (Longuet-Higgins, 1981) B. Leibe Slide credit: Svetlana Lazebnik

  11. Recap: Epipolar Geometry – Uncalibrated Case X • The calibration matrices K and K’ of the two cameras are unknown • We can write the epipolar constraint in terms of unknown normalized coordinates: x x’ B. Leibe Slide credit: Svetlana Lazebnik

  12. Recap: Epipolar Geometry – Uncalibrated Case X x x’ Fundamental Matrix (Faugeras and Luong, 1992) B. Leibe Slide credit: Svetlana Lazebnik

  13. Minimize: under the constraint|F|2 = 1 Recap: The Eight-Point Algorithm x = (u, v, 1)T, x’ = (u’, v’, 1)T • Problem: poor numerical conditioning B. Leibe Slide credit: Svetlana Lazebnik

  14. Recap: Normalized Eight-Point Algorithm • Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels. • Use the eight-point algorithm to compute F from the normalized points. • Enforce the rank-2 constraint using SVD. • Transform fundamental matrix back to original units: if T and T’ are the normalizing transformations in the two images, than the fundamental matrix in original coordinates is TT F T’. Set d33 to zero and reconstruct F SVD B. Leibe [Hartley, 1995] Slide credit: Svetlana Lazebnik

  15. Recap: Comparison of Estimation Algorithms B. Leibe Slide credit: Svetlana Lazebnik

  16. Recap: Epipolar Transfer • Assume the epipolar geometry is known • Given projections of the same point in two images, how can we compute the projection of that point in a third image? x1 x2 x3 l31 l32 l31 = FT13 x1 l32 = FT23 x2 B. Leibe Slide credit: Svetlana Lazebnik

  17. Recap: Active Stereo with Structured Light • Optical triangulation • Project a single stripe of laser light • Scan it across the surface of the object • This is a very precise version of structured light scanning • Digital Michelangelo Project • http://graphics.stanford.edu/projects/mich/ B. Leibe Slide credit: Steve Seitz

  18. Topics of This Lecture • Structure from Motion (SfM) • Motivation • Ambiguity • Affine SfM • Affine cameras • Affine factorization • Euclidean upgrade • Dealing with missing data • Projective SfM • Two-camera case • Projective factorization • Bundle adjustment • Practical considerations • Applications B. Leibe

  19. Xj x1j x3j x2j P1 P3 P2 Structure from Motion • Given: m images of n fixed 3D points xij = Pi Xj , i = 1, … , m, j = 1, … , n • Problem: estimate m projection matrices Piand n 3D points Xjfrom the mn correspondences xij B. Leibe Slide credit: Svetlana Lazebnik

  20. What Can We Use This For? • E.g. movie special effects • Video B. Leibe Video Credit: Stefan Hafeneger

  21. Structure from Motion Ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same:  It is impossible to recover the absolute scale of the scene! B. Leibe Slide credit: Svetlana Lazebnik

  22. Structure from Motion Ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same. • More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change B. Leibe Slide credit: Svetlana Lazebnik

  23. Reconstruction Ambiguity: Similarity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  24. Reconstruction Ambiguity: Affine B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  25. Reconstruction Ambiguity: Projective B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  26. Projective Ambiguity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  27. From Projective to Affine B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  28. From Affine to Similarity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  29. Hierarchy of 3D Transformations • With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. • Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. Projective 15dof Preserves intersection and tangency Preserves parallellism, volume ratios Affine 12dof Similarity 7dof Preserves angles, ratios of length Euclidean 6dof Preserves angles, lengths B. Leibe Slide credit: Svetlana Lazebnik

  30. Topics of This Lecture • Structure from Motion (SfM) • Motivation • Ambiguity • Affine SfM • Affine cameras • Affine factorization • Euclidean upgrade • Dealing with missing data • Projective SfM • Two-camera case • Projective factorization • Bundle adjustment • Practical considerations • Applications B. Leibe

  31. Structure from Motion • Let’s start with affine cameras (the math is easier) center atinfinity B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

  32. Orthographic Projection • Special case of perspective projection • Distance from center of projection to image plane is infinite • Projection matrix: Image World B. Leibe Slide credit: Steve Seitz

  33. Affine Cameras Orthographic Projection Parallel Projection B. Leibe Slide credit: Svetlana Lazebnik

  34. x a2 X a1 Affine Cameras • A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image: • Affine projection is a linear mapping + translation in inhomogeneous coordinates Projection ofworld origin B. Leibe Slide credit: Svetlana Lazebnik

  35. Affine Structure from Motion • Given: m images of n fixed 3D points: • xij = Ai Xj + bi , i = 1,… , m, j = 1, … , n • Problem: use the mn correspondences xij to estimate m projection matrices Ai and translation vectors bi, and n points Xj • The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom): • We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity). • Thus, we must have 2mn >= 8m + 3n – 12. • For two views, we need four point correspondences. B. Leibe Slide credit: Svetlana Lazebnik

  36. Affine Structure from Motion • Centering: subtract the centroid of the image points • For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points. • After centering, each normalized point xij is related to the 3D point Xi by B. Leibe Slide credit: Svetlana Lazebnik

  37. Affine Structure from Motion • Let’s create a 2m×n data (measurement) matrix: Cameras(2m) Points (n) C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method.IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik

  38. Affine Structure from Motion • Let’s create a 2m×n data (measurement) matrix: • The measurement matrix D = MS must have rank 3! Points (3× n) Cameras(2m ×3) C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method.IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik

  39. Factorizing the Measurement Matrix B. Leibe Slide credit: Martial Hebert

  40. Factorizing the Measurement Matrix • Singular value decomposition of D: Slide credit: Martial Hebert

  41. Factorizing the Measurement Matrix • Singular value decomposition of D: Slide credit: Martial Hebert

  42. Factorizing the Measurement Matrix • Obtaining a factorization from SVD: Slide credit: Martial Hebert

  43. Factorizing the Measurement Matrix • Obtaining a factorization from SVD: This decomposition minimizes|D-MS|2 Slide credit: Martial Hebert

  44. Affine Ambiguity • The decomposition is not unique. We get the same D by using any 3×3 matrix C and applying the transformations M → MC, S →C-1S. • That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axis to be perpendicular, for example). We need a Euclidean upgrade. B. Leibe Slide credit: Martial Hebert

  45. Estimating the Euclidean Upgrade • Orthographic assumption: image axes are perpendicular and scale is 1. • This can be converted into a system of 3m equations: a1 · a2 = 0 x |a1|2 = |a2|2= 1 a2 X a1 B. Leibe Slide adapted from S. Lazebnik, M. Hebert

  46. Estimating the Euclidean Upgrade • This can be converted into a system of 3m equations: • Let • Then this translates to 3m equations in L • Solve for L • Recover C from L by Cholesky decomposition: L = CCT • Update M and S: M = MC, S = C-1S B. Leibe Slide adapted from S. Lazebnik, M. Hebert

  47. Algorithm Summary • Given: m images and n features xij • For each image i, center the feature coordinates. • Construct a 2m ×n measurement matrix D: • Column j contains the projection of point j in all views • Row i contains one coordinate of the projections of all the n points in image i • Factorize D: • Compute SVD: D = U W VT • Create U3 by taking the first 3 columns of U • Create V3 by taking the first 3 columns of V • Create W3 by taking the upper left 3 × 3 block ofW • Create the motion and shape matrices: • M = U3W3½ and S = W3½ V3T (or M = U3 and S = W3V3T) • Eliminate affine ambiguity Slide credit: Martial Hebert

  48. Reconstruction Results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method.IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik Image Source: Tomasi & Kanade

  49. Dealing with Missing Data • So far, we have assumed that all points are visible in all views • In reality, the measurement matrix typically looks something like this: Cameras Points B. Leibe Slide credit: Svetlana Lazebnik

  50. Dealing with Missing Data • Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results • Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) • Incremental bilinear refinement • Perform factorization on a dense sub-block F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007. Slide credit: Svetlana Lazebnik

More Related