Structure from Motion

Structure from Motion Introduction to Computer VisionCS223B, Winter 2005Richard Szeliski

Today’s lecture • Calibration • estimating focal length and optic center • triangulation and pose • Structure from Motion • two-frame methods • factorization • bundle adjustment (non-linear least squares) • robust statistics and RANSAC • correspondence Structure from Motion

Camera Calibration

Camera calibration • Determine camera parameters from known 3D points or calibration object(s) • internal or intrinsic parameters such as focal length, optical center, aspect ratio:what kind of camera? • external or extrinsic (pose)parameters:where is the camera? • How can we do this? Structure from Motion

Camera calibration – approaches • Possible approaches: • linear regression (least squares) • non-linear optimization • vanishing points • multiple planar patterns • panoramas (rotational motion) Structure from Motion

(Xc,Yc,Zc) uc f u Image formation equations Structure from Motion

Calibration matrix • Is this form of K good enough? • non-square pixels (digital video) • skew • radial distortion Structure from Motion

Camera matrix • Fold intrinsic calibration matrix K and extrinsic pose parameters (R,t) together into acamera matrix • M = K [R | t ] • (put 1 in lower r.h. corner for 11 d.o.f.) Structure from Motion

Camera matrix calibration • Directly estimate 11 unknowns in the M matrix using known 3D points (Xi,Yi,Zi) and measured feature positions (ui,vi) Structure from Motion

Camera matrix calibration • Linear regression: • Bring denominator over, solve set of (over-determined) linear equations. How? • Least squares (pseudo-inverse) • Is this good enough? Structure from Motion

Optimal estimation • Feature measurement equations • Likelihood of M given {(ui,vi)} Structure from Motion

Optimal estimation • Log likelihood of M given {(ui,vi)} • How do we minimize C? • Non-linear regression (least squares), because ûi and vi are non-linear functions of M Structure from Motion

Levenberg-Marquardt • Iterative non-linear least squares [Press’92] • Linearize measurement equations • Substitute into log-likelihood equation: quadratic cost function in Dm Structure from Motion

Levenberg-Marquardt • Iterative non-linear least squares [Press’92] • Solve for minimumHessian:error: • Does this look familiar…? Structure from Motion

Levenberg-Marquardt • What if it doesn’t converge? • Multiply diagonal by (1 + l), increase l until it does • Halve the step size Dm (my favorite) • Use line search • Other ideas? • Uncertainty analysis: covariance S = A-1 • Is maximum likelihood the best idea? • How to start in vicinity of global minimum? Structure from Motion

Camera matrix calibration • Advantages: • very simple to formulate and solve • can recover K [R | t] from M using QR decomposition [Golub & VanLoan 96] • Disadvantages: • doesn't compute internal parameters • more unknowns than true degrees of freedom • need a separate camera matrix for each new view Structure from Motion

Separate intrinsics / extrinsics • New feature measurement equations • Use non-linear minimization • Standard technique in photogrammetry, computer vision, computer graphics • [Tsai 87] – also estimates k1 (freeware @ CMU)http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html • [Bogart 91] – View Correlation Structure from Motion

Separate intrinsics / extrinsics • How do we parameterize R and ΔR? • Euler angles: bad idea • quaternions: 4-vectors on unit sphere • use incremental rotation R(I + DR) • update with Rodriguez formula Structure from Motion

Intrinsic/extrinsic calibration • Advantages: • can solve for more than one camera pose at a time • potentially fewer degrees of freedom • Disadvantages: • more complex update rules • need a good initialization (recover K [R | t] from M) Structure from Motion

u0 u1 u2 Vanishing Points • Determine focal length f and optical center (uc,vc) from image of cube’s(or building’s) vanishing points[Caprile ’90][Antone & Teller ’00] Structure from Motion

(Xc,Yc,Zc) uc f u u0 u1 u2 Vanishing Points • X, Y, and Z directions, Xi = (1,0,0,0) … (0,0,1,0) correspond to vanishing points that are scaled version of the rotation matrix: Structure from Motion

u0 u1 u2 Vanishing Points • Orthogonality conditions on rotation matrix R, • ri¢rj = dij • Determine (uc,vc) from orthocenter of vanishing point triangle • Then, determine f2 from twoequations(only need 2 v.p.s if (uc,vc) known) Structure from Motion

Vanishing point calibration • Advantages: • only need to see vanishing points(e.g., architecture, table, …) • Disadvantages: • not that accurate • need rectahedral object(s) in scene Structure from Motion

Single View Metrology • A. Criminisi, I. Reid and A. Zisserman (ICCV 99) • Make scene measurements from a single image • Application: 3D from a single image • Assumptions • 3 orthogonal sets of parallel lines • 4 known points on ground plane • 1 height in the scene • Can still get an affine reconstruction without 2 and 3 Structure from Motion

Criminisi et al., ICCV 99 • Complete approach • Load in an image • Click on parallel lines defining X, Y, and Z directions • Compute vanishing points • Specify points on reference plane, ref. height • Compute 3D positions of several points • Create a 3D model from these points • Extract texture maps • Output a VRML model Structure from Motion

3D Modeling from a Photograph Structure from Motion

Multi-plane calibration • Use several images of planar target held at unknown orientations [Zhang 99] • Compute plane homographies • Solve for K-TK-1 from Hk’s • 1plane if only f unknown • 2 planes if (f,uc,vc) unknown • 3+ planes for full K • Code available from Zhang and OpenCV Structure from Motion

f=468 f=510 Rotational motion • Use pure rotation (large scene) to estimate f • estimate f from pairwise homographies • re-estimate f from 360º “gap” • optimize over all {K,Rj} parameters[Stein 95; Hartley ’97; Shum & Szeliski ’00; Kang & Weiss ’99] • Most accurate way to get f, short of surveying distant points Structure from Motion

Pose estimation and triangulation

Pose estimation • Once the internal camera parameters are known, can compute camera pose [Tsai87] [Bogart91] • Application: superimpose 3D graphics onto video • How do we initialize (R,t)? Structure from Motion

Pose estimation • Previous initialization techniques: • vanishing points [Caprile 90] • planar pattern [Zhang 99] • Other possibilities • Through-the-Lens Camera Control [Gleicher92]: differential update • 3+ point “linear methods”: [DeMenthon 95][Quan 99][Ameller 00] Structure from Motion

(Xc,Yc,Zc) uc f u Pose estimation • Solve orthographic problem, iterate [DeMenthon 95] • Use inter-point distance constraints [Quan 99][Ameller 00] Solve set of polynomial equations in xi2p Structure from Motion

Triangulation • Problem: Given some points in correspondence across two or more images (taken from calibrated cameras), {(uj,vj)}, compute the 3D location X Structure from Motion

Triangulation • Method I: intersect viewing rays in 3D, minimize: • X is the unknown 3D point • Cj is the optical center of camera j • Vj is the viewing ray for pixel (uj,vj) • sj is unknown distance along Vj • Advantage: geometrically intuitive X Vj Cj Structure from Motion

Triangulation • Method II: solve linear equations in X • advantage: very simple • Method III: non-linear minimization • advantage: most accurate (image plane error) Structure from Motion

Structure from Motion

Structure from motion • Given many points in correspondence across several images, {(uij,vij)}, simultaneously compute the 3D location xi and camera (or motion) parameters (K, Rj, tj) • Two main variants: calibrated, and uncalibrated (sometimes associated with Euclidean and projective reconstructions) Structure from Motion

Structure from motion • How many points do we need to match? • 2 frames: (R,t): 5 dof + 3n point locations  4n point measurements  n 5 • k frames: 6(k–1)-1 + 3n  2kn • always want to use many more Structure from Motion

Two-frame methods • Two main variants: • Calibrated: “Essential matrix” Euse ray directions (xi, xi’ ) • Uncalibrated: “Fundamental matrix” F • [Hartley & Zisserman 2000] Structure from Motion

Essential matrix • Co-planarity constraint: • x’ ≈ R x + t • [t]x’ ≈[t] R x • x’T[t]x’ ≈x’ T[t] R x • x’ T E x = 0 withE =[t] R • Solve for E using least squares (SVD) • t is the least singular vector of E • R obtained from the other two s.v.s Structure from Motion

Fundamental matrix • Camera calibrations are unknown • x’ F x = 0 withF = [e] H = K’[t] R K-1 • Solve for F using least squares (SVD) • re-scale (xi, xi’ ) so that |xi|≈1/2 [Hartley] • e (epipole) is still the least singular vector of F • H obtained from the other two s.v.s • “plane + parallax” (projective) reconstruction • use self-calibration to determine K [Pollefeys] Structure from Motion

Three-frame methods • Trifocal tensor • [Hartley & Zisserman 2000] Structure from Motion

Multi-frame Structure from Motion

Factorization [Tomasi & Kanade, IJCV 92]

Structure [from] Motion • Given a set of feature tracks,estimate the 3D structure and 3D (camera) motion. • Assumption: orthographic projection • Tracks: (ufp,vfp), f: frame, p: point • Subtract out mean 2D position… • ufp = ifT spif: rotation,sp: position • vfp = jfT sp Structure from Motion

Measurement equations • Measurement equations • ufp = ifT spif: rotation,sp: position • vfp = jfT sp • Stack them up… • W = R S • R = (i1,…,iF, j1,…,jF)T • S = (s1,…,sP) Structure from Motion

Factorization • W = R2F3 S3P • SVD • W = U Λ V Λmust be rank 3 • W’ = (U Λ 1/2)(Λ1/2 V) = U’ V’ • MakeRorthogonal • R = QU’ , S = Q-1V’ • ifTQTQif = 1 … Structure from Motion

Results • Look at paper figures… Structure from Motion

Extensions • Paraperspective • [Poelman & Kanade, PAMI 97] • Sequential Factorization • [Morita & Kanade, PAMI 97] • Factorization under perspective • [Christy & Horaud, PAMI 96] • [Sturm & Triggs, ECCV 96] • Factorization with Uncertainty • [Anandan & Irani, IJCV 2002] Structure from Motion

Structure from Motion