Structure from motion

Structure from motion (Tomasi and Kanade)
• Input:
  • a set of point tracks
• Output:
  • 3D location of each point (shape)
  • camera parameters (motion)

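Orthographic SFM: Setup
• $I_1, \dots, I_m$: a collection of $m$ images (video frames) depicting a rigid scene
• Orthographic projection (no scale)
• $n$ point tracks in those frames
• Unknown 3D locations: $P_1, \dots, P_n \in \mathbb{R}^3$
• Projected locations: denote by $p_{ij} \in \mathbb{R}^2$ the location of $P_j$ at frame $i$, then $p_{ij} = R_i P_j + t_i$, where $t_i \in \mathbb{R}^2$ and the rows of the $2 \times 3$ matrix $R_i$ are the two top rows of a rotation matrix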

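Orthographic SFM: Objective
Find rotations $R_i$, translations $t_i$ and 3D points $P_j$ that minimize
$$\sum_{i=1}^{m} \sum_{j=1}^{n} \| p_{ij} - (R_i P_j + t_i) \|^2$$
Subject to $R_i R_i^T = I_{2 \times 2}$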

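Eliminate translation
• We can eliminate translation by representing the location of each point relative to the centroid of all points:
• Assume without loss of generality that the centroid of $P_1, \dots, P_n$ coincides with the origin, $\sum_j P_j = 0$
• Translate each image point by setting $\tilde{p}_{ij} = p_{ij} - \bar{p}_i$, where $\bar{p}_i = \frac{1}{n} \sum_{j=1}^{n} p_{ij}$ denotes the centroid of $p_{i1}, \dots, p_{in}$
• In the noise-free case $\bar{p}_i = t_i$, so $\tilde{p}_{ij} = R_i P_j$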

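Objective (w/o translation)
Find rotations $R_i$ and 3D points $P_j$ that minimize, for the centered image points $\tilde{p}_{ij}$,
$$\sum_{i=1}^{m} \sum_{j=1}^{n} \| \tilde{p}_{ij} - R_i P_j \|^2$$
Subject to $R_i R_i^T = I_{2 \times 2}$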

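Measurement matrix
• Stack the centered image points $\tilde{p}_{ij}$ (point $j$ in frame $i$) into the $2m \times n$ measurement matrix
$$M = \begin{pmatrix} \tilde{p}_{11} & \cdots & \tilde{p}_{1n} \\ \vdots & \ddots & \vdots \\ \tilde{p}_{m1} & \cdots & \tilde{p}_{mn} \end{pmatrix}$$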
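The centering step and the stacking into a measurement matrix can be sketched in NumPy as follows. This is a minimal illustration, not code from the lecture: the array layout `(m, n, 2)` for the tracks and the function names `center_tracks` and `measurement_matrix` are assumptions made for this example, and all tracks are assumed complete.

```python
import numpy as np

def center_tracks(tracks):
    """tracks[i, j] is the 2D location of point j in frame i; shape (m, n, 2).

    Returns the centered tracks and the per-frame centroids (assumed layout).
    """
    centroids = tracks.mean(axis=1, keepdims=True)      # (m, 1, 2)
    return tracks - centroids, centroids[:, 0, :]

def measurement_matrix(centered):
    """Stack centered tracks into a (2m, n) matrix.

    Rows 2i and 2i+1 hold the x and y coordinates of frame i.
    """
    m, n, _ = centered.shape
    return centered.transpose(0, 2, 1).reshape(2 * m, n)
```

Adding a different offset to every frame leaves the centered tracks unchanged, which is the sense in which translation is eliminated.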

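Transformation and shape matrices
• Transformation: $R = \begin{pmatrix} R_1 \\ \vdots \\ R_m \end{pmatrix}$, of size $2m \times 3$
• Shape: $P = \begin{pmatrix} P_1 & \cdots & P_n \end{pmatrix}$, of size $3 \times n$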

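Objective: matrix notation
Find $R$ and $P$ that minimize
$$\| M - R P \|_F^2$$
Subject to $R_i R_i^T = I_{2 \times 2}$
• $M$ is $2m \times n$, $R$ is $2m \times 3$, $P$ is $3 \times n$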

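TK-Factorization
Step 1: find rank 3 approximation to $M$ using SVD
• $M = U \Sigma V^T$ where
• $U$ is $2m \times 2m$, $U^T U = I$,
• $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots)$ with $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$, size $2m \times n$, and
• $V$ is $n \times n$, $V^T V = I$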

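TK-Factorization
• $\hat{M} = U_3 \Sigma_3 V_3^T$ where $\Sigma_3 = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ and $U_3$, $V_3$ hold the first three columns of $U$, $V$ from the SVD $M = U \Sigma V^T$
• Note: this is a relaxation, only noise components outside the 3D space are annihilated
Step 2: factorization
• $\hat{M} = \hat{R} \hat{P}$, for example with $\hat{R} = U_3 \Sigma_3^{1/2}$ ($2m \times 3$) and $\hat{P} = \Sigma_3^{1/2} V_3^T$ ($3 \times n$)
• Ambiguity: $\hat{M} = \hat{R} \hat{P} = (\hat{R} A)(A^{-1} \hat{P})$ for any non-singular $3 \times 3$ matrix $A$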
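Steps 1 and 2 can be sketched in NumPy as below. Splitting the singular values evenly between the two factors is one common choice and is an assumption of this example; the function name `rank3_factorize` is hypothetical.

```python
import numpy as np

def rank3_factorize(M):
    """Factor the best rank-3 approximation of M as R_hat @ P_hat.

    M is (2m, n); R_hat is (2m, 3) and P_hat is (3, n). The factors are
    only defined up to a non-singular 3x3 matrix A.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(s[:3])
    R_hat = U[:, :3] * root            # scale the three leading columns
    P_hat = root[:, None] * Vt[:3]     # scale the three leading rows
    return R_hat, P_hat
```

For a noise-free measurement matrix the product `R_hat @ P_hat` reproduces `M`; with noise, the residual is exactly the energy in the discarded singular values.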

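TK-Factorization
Step 3: resolve ambiguity
• Let $R = \hat{R} A$, note that $R_i R_i^T = I_{2 \times 2}$
• Let $\hat{R}_i$ be the corresponding rows in $\hat{R}$, then $\hat{R}_i A A^T \hat{R}_i^T = I_{2 \times 2}$
• Find a symmetric $3 \times 3$ matrix $Q = A A^T$ satisfying $\hat{R}_i Q \hat{R}_i^T = I_{2 \times 2}$ for all $i$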

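TK-Factorization
• The equation $\hat{R}_i Q \hat{R}_i^T = I_{2 \times 2}$ is linear in the symmetric matrix $Q = A A^T$
• There are $3m$ equations in 6 unknowns
• Find $A$ by eigen-decomposition so that $Q = A A^T$
• Solution is obtained up to a rotation ambiguity: $A O$ gives the same $Q$ for any $O$ such that $O O^T = I$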
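A possible NumPy sketch of step 3 follows. It assumes that rows `2i` and `2i+1` of the input hold the two rows of frame `i`, solves the linear system by least squares, and clips eigenvalues so the square root is defined when noise makes the estimate indefinite. The names `solve_metric_upgrade` and `_row` are hypothetical, and the clipping is a choice made for this example rather than something stated in the slides.

```python
import numpy as np

def _row(u, v):
    """Coefficients of u^T Q v in the 6 unknowns (Q11, Q12, Q13, Q22, Q23, Q33)."""
    return np.array([u[0] * v[0],
                     u[0] * v[1] + u[1] * v[0],
                     u[0] * v[2] + u[2] * v[0],
                     u[1] * v[1],
                     u[1] * v[2] + u[2] * v[1],
                     u[2] * v[2]])

def solve_metric_upgrade(R_hat):
    """Estimate A (3x3) such that the row pairs of R_hat @ A are orthonormal."""
    m = R_hat.shape[0] // 2
    rows, rhs = [], []
    for i in range(m):
        a, b = R_hat[2 * i], R_hat[2 * i + 1]
        rows += [_row(a, a), _row(b, b), _row(a, b)]    # 3 equations per frame
        rhs += [1.0, 1.0, 0.0]
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])
    w, V = np.linalg.eigh(Q)                            # Q = V diag(w) V^T
    A = V * np.sqrt(np.clip(w, 1e-12, None))            # A A^T = Q when Q is positive definite
    return A, Q
```

The recovered `A` is one representative; multiplying it on the right by any orthogonal matrix yields the same `Q`, which is the remaining rotation and reflection ambiguity.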

TK-Factorization: Summary
• Eliminate translation, construct $M$
• SVD to get rank 3 $\hat{M}$ and factorize $\hat{M} = \hat{R} \hat{P}$ (ambiguity remains)
• Resolve ambiguity: estimate $Q = A A^T$ from orthonormality and factorize to obtain $A$
• Solution up to rotation and reflection

Incomplete tracks
• Tracks are often incomplete
• Factorization with missing data
• Rank is difficult to enforce
• Surrogate: minimize the nuclear norm, the sum of singular values, $\| M \|_* = \sum_k \sigma_k$
• Nuclear norm is convex, minimization often achieves low rank
• Accurate reconstruction usually requires accounting for perspective distortion
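To make the nuclear norm concrete, the sketch below computes it and applies singular value soft-thresholding, a standard building block of nuclear norm minimization. It is an illustration only and not necessarily the solver the lecture has in mind; the function names and the threshold `tau` are assumptions of this example.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def singular_value_threshold(M, tau):
    """Shrink every singular value of M by tau, setting negative results to zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Shrinking sets the small singular values to exactly zero, which is why this kind of step tends to produce low-rank matrices.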

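Perspective projection
• A point $P = (X, Y, Z)^T$ is projected to $p = (X/Z, Y/Z)^T$
• A point rotated by $R$ and translated by $t$ projects to $p = \left( \frac{r_1^T P + t_1}{r_3^T P + t_3}, \frac{r_2^T P + t_2}{r_3^T P + t_3} \right)^T$, where $r_1^T, r_2^T, r_3^T$ denote the rows of $R$
• We call $K [R \mid t]$ a camera matrix (the formula above corresponds to $K = I$)
  • $K$: calibration matrix,
  • $R$: camera orientation,
  • $t$: camera location (with this convention the camera center is $-R^T t$)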
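The projection can be sketched as below, assuming the common convention that a world point $P$ maps to camera coordinates $R P + t$ and that the camera matrix is $K [R \mid t]$. The function name `project` and the row-per-point array layout are choices made for this example.

```python
import numpy as np

def project(K, R, t, P):
    """Project world points P (n, 3) with the 3x4 camera matrix K [R | t].

    Returns (n, 2) image coordinates. Assumes every point has non-zero depth.
    """
    C = K @ np.hstack([R, np.reshape(t, (3, 1))])     # 3x4 camera matrix
    Ph = np.hstack([P, np.ones((len(P), 1))])         # homogeneous coordinates
    q = Ph @ C.T                                      # row k of C dotted with each point
    return q[:, :2] / q[:, 2:3]                       # divide by depth
```

With `K`, `R` set to the identity and `t` to zero this reduces to dividing the first two coordinates by the depth.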

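Bundle adjustment
• Given $n$ points in $m$ frames, $p_{ij}$, $i = 1, \dots, m$, $j = 1, \dots, n$, find camera matrices $C_i = K_i [R_i \mid t_i]$ and positions $P_j$ that minimize the reprojection error
$$\sum_{i=1}^{m} \sum_{j=1}^{n} \| p_{ij} - \pi(C_i, P_j) \|^2$$
  where $\pi(C_i, P_j)$ denotes the perspective projection of $P_j$ by camera $C_i$
• Alternate optimization
  • Given $R_i$ and $t_i$, solve for $P_j$
  • Given $P_j$, solve for $R_i$ and $t_i$
• Very good initial guess is required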
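The quantity being minimized, and one way to carry out the "given cameras, solve for points" step, can be sketched as follows. The triangulation here is the linear (DLT) method, which minimizes an algebraic error rather than the reprojection error itself, so it is a stand-in suitable for initialization and not the exact sub-problem. Cameras are assumed to be given as 3x4 matrices; all function names are hypothetical.

```python
import numpy as np

def project_all(cameras, points):
    """cameras: (m, 3, 4), points: (n, 3). Returns (m, n, 2) projections."""
    Ph = np.hstack([points, np.ones((len(points), 1))])     # (n, 4)
    q = np.einsum('mrk,nk->mnr', cameras, Ph)               # (m, n, 3)
    return q[..., :2] / q[..., 2:3]

def reprojection_error(cameras, points, observations):
    """Sum of squared distances between observed and predicted image points."""
    return np.sum((observations - project_all(cameras, points)) ** 2)

def triangulate_linear(cameras, obs_j):
    """Linear (DLT) estimate of one 3D point from its m observations obs_j (m, 2)."""
    rows = []
    for C, (x, y) in zip(cameras, obs_j):
        rows.append(x * C[2] - C[0])    # x * (c3 . P) - (c1 . P) = 0
        rows.append(y * C[2] - C[1])    # y * (c3 . P) - (c2 . P) = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]                          # null vector of the stacked constraints
    return X[:3] / X[3]
```

In a full bundle adjustment the reprojection error would be handed to a non-linear least squares solver, which is why a good initial guess matters.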

Bundler (photo-tourism) (Snavely et al.)

Bundler (photo-tourism)
• Given images, identify feature points, describe them with SIFT descriptors
• Match SIFT descriptors, accept each match whose score is at least twice that of any other match
• For every pair of images with sufficiently many matches use RANSAC to recover essential matrices
• Starting with two images and adding one image at a time: use the essential matrix to recover depth and apply bundle adjustment
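The match acceptance rule can be sketched as a ratio test on descriptor distances. This example interprets "score at least twice that of any other match" as "best distance at most half the second-best distance", which is an assumption, as are the Euclidean distance, the brute-force search, and the function name `ratio_test_matches`.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.5):
    """Match rows of desc_a to rows of desc_b, keeping only unambiguous matches.

    A match is kept when the nearest distance is at most `ratio` times the
    second-nearest distance. desc_b must contain at least two descriptors.
    """
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)   # (na, nb)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    idx = np.arange(len(desc_a))
    keep = d[idx, best] <= ratio * d[idx, second]
    return [(int(i), int(best[i])) for i in idx[keep]]
```

A descriptor that lies equally close to two candidates is rejected, which removes many of the wrong matches before RANSAC is run.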

Simultaneous solutions
• $E_{ij}$: essential matrix between images $I_i$ and $I_j$ (available on a subset of image pairs)
• Objective: recover camera orientation $R_i$ and location $t_i$ relative to a global coordinate system
• This can be solved in various ways, for example a least squares solution if we ignore the orthonormality constraints for $R_i$

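Essential in global coordinates
• Corresponding points, $p_i$ and $p_j$, satisfy the following relation: $p_i^T E_{ij} p_j = 0$, where $E_{ij}$ is expressed through the global orientations $R_i$, $R_j$ and locations $t_i$, $t_j$
• This generalizes the formula for the essential matrix (plug in $R_i = I$, $t_i = 0$)
• Once camera orientations are known we can solve for camera locations
• Solution suffers from shrinkage problems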
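One way to write the essential matrix in global coordinates is sketched below. It assumes a convention in which camera $i$ sits at location $t_i$ with orientation $R_i$, so that a world point $X$ has camera coordinates $R_i^T (X - t_i)$; under that assumption $E_{ij} = R_i^T ([t_i]_\times - [t_j]_\times) R_j$, up to sign and scale. The slides may use a different convention, and the function names are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_global(R_i, t_i, R_j, t_j):
    """Essential matrix between cameras i and j from global orientations and locations."""
    return R_i.T @ (skew(t_i) - skew(t_j)) @ R_j

def to_camera(R, t, X):
    """Coordinates of world points X (n, 3) in a camera at location t with orientation R."""
    return (X - t) @ R      # row form of R^T (X - t)
```

Setting `R_i` to the identity and `t_i` to zero gives `-skew(t_j) @ R_j`, the familiar two-view form up to sign, which matches the "plug in" remark above.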

Reconstruction example
