Structure-from-EgoMotion (based on notes from David Jacobs, CS-Maryland)

• Determining the 3D structure of the world, and/or the motion of a camera, from a sequence of images taken by a moving camera.
• Equivalently, we can think of the world as moving and the camera as fixed.
• Like stereo, but the position of the camera isn’t known (and it’s more natural to use many images with little motion between them, not just two with a lot of motion).
• We may or may not assume we know the parameters of the camera, such as its focal length.
Computer Vision

… Structure-from-EgoMotion
• As with stereo, we can divide the problem into:
  • Correspondence.
  • Reconstruction.
• We will focus on reconstruction.
• So we assume that each image contains some points, and that we know which points match which.

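Representation
• We’ll talk about a fixed camera and a moving object.
• We use scaled orthographic projection (weak perspective):
  • we remove the z coordinate and scale all x and y coordinates by the same amount.
• Key point:
  Points: $P = \begin{pmatrix} x_1 & \cdots & x_n \\ y_1 & \cdots & y_n \\ z_1 & \cdots & z_n \\ 1 & \cdots & 1 \end{pmatrix}$ ($4 \times n$)
  Some matrix: $S$ ($2 \times 4$)
  The image: $I = \begin{pmatrix} u_1 & \cdots & u_n \\ v_1 & \cdots & v_n \end{pmatrix}$ ($2 \times n$)
  Then: $I = SP$.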
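
A minimal sketch, assuming NumPy (the helper names `homogeneous` and `weak_perspective` are hypothetical, not from the notes): with no motion, dropping z and scaling by s is a linear map on the homogeneous points, i.e. a $2 \times 4$ matrix S applied to P.

```python
import numpy as np

def homogeneous(points_3d):
    """Hypothetical helper: append a row of ones to a 3 x n array, giving the 4 x n matrix P."""
    return np.vstack([points_3d, np.ones((1, points_3d.shape[1]))])

def weak_perspective(points_3d, s):
    """Hypothetical helper: scaled orthographic projection, drop z then scale x and y by s."""
    return s * points_3d[:2, :]

s = 0.5
S = s * np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])   # 2 x 4: keep x and y, ignore z and the row of ones
X = np.array([[1.0, 2.0, -1.0],
              [0.0, 4.0, 3.0],
              [5.0, -2.0, 7.0]])            # three example points, one per column
I = S @ homogeneous(X)                      # the image, 2 x n
print(np.allclose(I, weak_perspective(X, s)))  # True
```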

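Rotation
• R can represent a 3D rotation of the points in P. What are the constraints on R?
• First, look at 2D rotation (easier): $R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$.
• $RR^T$ is the identity matrix. $R^T$ is also a rotation matrix, rotating by the same angle in the opposite direction to R.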
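
The two 2D facts above can be checked numerically. This is a small sketch assuming NumPy; `rot2d` is a hypothetical helper for a counterclockwise rotation by θ radians.

```python
import numpy as np

def rot2d(theta):
    """Hypothetical helper: 2D counterclockwise rotation by theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

theta = 0.3
R = rot2d(theta)
print(np.allclose(R @ R.T, np.eye(2)))   # True: R R^T is the identity
print(np.allclose(R.T, rot2d(-theta)))   # True: R^T rotates by -theta
```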

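Full 3D Rotation
• Rotation about the z axis: $R_z = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$. It rotates the x, y coordinates and leaves the z coordinates fixed.
• Any rotation can be expressed as a combination of three rotations about three axes.
• Rows (and columns) of R are orthonormal vectors.
• R has determinant 1 (not -1; an orthonormal matrix with determinant -1 also includes a reflection).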
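
A sketch of these constraints, assuming NumPy; `rot_x`, `rot_y`, `rot_z` and `is_rotation` are hypothetical helpers. It composes rotations about three axes and checks orthonormal rows plus determinant +1, and shows that a mirror passes the first test but fails the second.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def is_rotation(R, tol=1e-9):
    """Hypothetical check: orthonormal rows (R R^T = identity) and determinant +1."""
    return np.allclose(R @ R.T, np.eye(3), atol=tol) and abs(np.linalg.det(R) - 1.0) < tol

R = rot_z(0.7) @ rot_y(-0.2) @ rot_x(1.1)      # combination of rotations about three axes
print(is_rotation(R))                           # True
print(rot_z(0.7) @ np.array([0.0, 0.0, 1.0]))   # the z axis is left fixed
mirror = np.diag([1.0, 1.0, -1.0])              # orthonormal rows, but determinant -1
print(is_rotation(mirror))                      # False
```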

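S: Putting it Together
Rotate by the 3D rotation R (rows $r_1, r_2, r_3$), apply the 3D translation $(t_x, t_y, t_z)$, project by dropping z, and scale by s. For a point $p = (x, y, z)^T$:
$\begin{pmatrix} u \\ v \end{pmatrix} = s \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} p + s \begin{pmatrix} t_x \\ t_y \end{pmatrix}$, so $S = \begin{pmatrix} s\,r_1 & s\,t_x \\ s\,r_2 & s\,t_y \end{pmatrix}$ ($t_z$ drops out under projection).
We can just write $s t_x$ as $t_x$ and $s t_y$ as $t_y$.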
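
A sketch, assuming NumPy, that builds the $2 \times 4$ motion matrix and checks it against the step-by-step pipeline (rotate, translate, drop z, scale). `motion_matrix` is a hypothetical helper, and the order "rotate then translate" is the assumption made here.

```python
import numpy as np

def motion_matrix(s, R, t):
    """Hypothetical helper: 2 x 4 weak-perspective motion S = [s*r1 t_x; s*r2 t_y].

    t = (t_x, t_y) is assumed to be already multiplied by s (s*t_x written as t_x).
    """
    return np.hstack([s * R[:2, :], np.reshape(t, (2, 1))])

a = 0.4
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])   # rotation about z
T = np.array([1.0, -2.0, 3.0])                  # 3D translation; t_z drops out
s = 2.0
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 5.0]])                 # points, one per column
step_by_step = s * (R @ X + T[:, None])[:2, :]  # rotate, translate, drop z, scale
S = motion_matrix(s, R, s * T[:2])
P = np.vstack([X, np.ones((1, X.shape[1]))])    # homogeneous 4 x n points
print(np.allclose(S @ P, step_by_step))         # True
```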

Affine Structure from Motion

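Affine Structure-from-Motion: Two Frames (1)
To simplify, suppose the first four points are the origin and the unit points on the three axes:
$P_{1:4} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$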

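Affine Structure-from-Motion: Two Frames (2)
With two frames, I is $4 \times n$ and S is $4 \times 4$. Writing $(u^k_j, v^k_j)$ for the image of point j in frame k, and looking at the first four points, we get:
$\begin{pmatrix} u^1_1 & u^1_2 & u^1_3 & u^1_4 \\ v^1_1 & v^1_2 & v^1_3 & v^1_4 \\ u^2_1 & u^2_2 & u^2_3 & u^2_4 \\ v^2_1 & v^2_2 & v^2_3 & v^2_4 \end{pmatrix} = \begin{pmatrix} s_{11} & s_{12} & s_{13} & t^1_x \\ s_{21} & s_{22} & s_{23} & t^1_y \\ s_{31} & s_{32} & s_{33} & t^2_x \\ s_{41} & s_{42} & s_{43} & t^2_y \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$
We can solve for the motion by inverting the matrix of points. Or, explicitly, we see that the first column on the left (the images of the first point, the origin) gives the translations. After solving for these, we can solve for each column of the s components of the motion using the images of each remaining point in turn: column j of the s components is the image of point j+1 minus the translations.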
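
A minimal sketch of the explicit recovery, assuming NumPy. `motion_from_canonical_points` and the random `S_true` are hypothetical illustrations (an arbitrary affine motion, no orthonormality imposed). Rows of the image block are $(u^1, v^1, u^2, v^2)$ and columns are the four canonical points.

```python
import numpy as np

# Canonical first four points as homogeneous columns: origin, then unit points on x, y, z.
P4 = np.array([[0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [1, 1, 1, 1]], dtype=float)

def motion_from_canonical_points(I4):
    """Hypothetical helper: recover the 4 x 4 two-frame motion S from I4 = S @ P4."""
    t = I4[:, 0]                    # image of the origin = translations (t_x^1, t_y^1, t_x^2, t_y^2)
    L = I4[:, 1:4] - t[:, None]     # image of unit point j minus translations = column j of s components
    return np.hstack([L, t[:, None]])

rng = np.random.default_rng(0)
S_true = rng.normal(size=(4, 4))    # hypothetical affine motion for two frames
I4 = S_true @ P4
S = motion_from_canonical_points(I4)
print(np.allclose(S, S_true))                    # True
print(np.allclose(S, I4 @ np.linalg.inv(P4)))    # True: same as inverting the matrix of points
```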

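Affine Structure-from-Motion: Two Frames (5)
But what if the first four points aren’t so simple? Then we define an affine transformation A so that:
$\begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \\ z_1 & z_2 & z_3 & z_4 \\ 1 & 1 & 1 & 1 \end{pmatrix} = A \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$, with $A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}$.
Note that A corresponds to a translation of the points (its last column) plus a linear transformation (its upper-left 3 x 3 block). This is always possible as long as the four points aren’t coplanar.
Then, given $I = SP$: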
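
A sketch of constructing A, assuming NumPy; `canonical_affine` is a hypothetical helper. The linear part has columns $p_2 - p_1$, $p_3 - p_1$, $p_4 - p_1$ and the translation is $p_1$; the coplanarity test is a determinant check on the linear part.

```python
import numpy as np

# Canonical first four points as homogeneous columns: origin, then unit points on x, y, z.
P4 = np.array([[0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [1, 1, 1, 1]], dtype=float)

def canonical_affine(first_four, tol=1e-12):
    """Hypothetical helper: 4 x 4 affine A with A @ P4 == [first_four; 1 1 1 1].

    first_four is 3 x 4 with columns p1..p4.
    """
    p1 = first_four[:, 0]
    L = first_four[:, 1:4] - p1[:, None]    # linear part: p2 - p1, p3 - p1, p4 - p1
    if abs(np.linalg.det(L)) < tol:
        raise ValueError("the first four points are (nearly) coplanar")
    A = np.eye(4)                           # last row stays (0, 0, 0, 1)
    A[:3, :3] = L
    A[:3, 3] = p1                           # translation
    return A

q = np.array([[1.0, 3.0, 1.0, 1.0],
              [2.0, 2.0, 5.0, 2.0],
              [0.0, 0.0, 0.0, 4.0]])        # four non-coplanar points, one per column
A = canonical_affine(q)
print(np.allclose(A @ P4, np.vstack([q, np.ones((1, 4))])))   # True
```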

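Affine Structure-from-Motion: Two Frames (6)
We have: $I = SP = (SA)(A^{-1}P)$.
And: the first four columns of $A^{-1}P$ are the simple points above, so the two-frame method recovers $SA$, which is our motion, and from it the structure $A^{-1}P$.
Thus, we can never determine the exact 3D structure of the scene. We can only determine it up to some affine transformation, A.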
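
The ambiguity can be seen numerically. A sketch assuming NumPy, with a hypothetical random motion S and a random affine A: the pair $(SA, A^{-1}P)$ produces exactly the same images as $(S, P)$, and because A's last row is $(0, 0, 0, 1)$, $A^{-1}P$ is still a valid homogeneous point set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
P = np.vstack([rng.normal(size=(3, n)), np.ones((1, n))])   # 4 x n homogeneous points
S = rng.normal(size=(4, 4))                                  # hypothetical two-frame affine motion
A = np.eye(4)
A[:3, :] = rng.normal(size=(3, 4))    # arbitrary affine map; last row stays (0, 0, 0, 1)

S_alt = S @ A                         # a different motion ...
P_alt = np.linalg.inv(A) @ P          # ... and a different structure ...
print(np.allclose(S_alt @ P_alt, S @ P))   # ... give the same images: True
print(np.allclose(P_alt[3], 1.0))          # True: the row of ones is preserved
```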

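Affine Structure-from-Motion: Many frames (1)
With f frames, stack the images into one matrix: $I = SP$, where I is $2f \times n$ (two rows per frame), S is $2f \times 4$, and P is $4 \times n$.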
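
A sketch of the stacking, assuming NumPy; `measurement_matrix` and the random per-frame motions are hypothetical. Stacking the per-frame images row-wise is the same as multiplying P by the stacked motion matrices.

```python
import numpy as np

def measurement_matrix(images):
    """Hypothetical helper: stack f per-frame 2 x n images into the 2f x n matrix I.

    Frame k (0-based) occupies rows 2k (u) and 2k+1 (v).
    """
    return np.vstack(images)

rng = np.random.default_rng(2)
f, n = 3, 5
P = np.vstack([rng.normal(size=(3, n)), np.ones((1, n))])   # 4 x n
S_frames = [rng.normal(size=(2, 4)) for _ in range(f)]       # hypothetical per-frame motions
I = measurement_matrix([S_k @ P for S_k in S_frames])        # 2f x n
S = np.vstack(S_frames)                                      # 2f x 4
print(I.shape, np.allclose(I, S @ P))                        # (6, 5) True
```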

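First Step: Solve for Translation (1)
• We pick the center of mass, i.e., the average of all the 3D points, as the origin. Averaging also reduces the effect of noise in the point locations.
• Rotation doesn’t move the origin, which is now the center of mass. Neither does scaled orthographic projection.
• So in each frame the image of the center of mass, which is the average of the image points, is just the translation $(t_x, t_y)$.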

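First Step: Solve for Translation (2)
Subtracting, in each frame, the average image point from every image point gives $\tilde I = \tilde S \tilde P$, where $\tilde I$ is $2f \times n$, $\tilde S$ (the first three columns of S) is $2f \times 3$, and $\tilde P$ (the centered points, without the row of ones) is $3 \times n$. Thus, translation can be eliminated.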
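
A sketch of this step, assuming NumPy; `center_measurements` and the random motion are hypothetical. With the origin at the 3D center of mass, each row's mean of the measurement matrix is that frame's translation, and subtracting it leaves $\tilde I = \tilde S \tilde P$.

```python
import numpy as np

def center_measurements(I):
    """Hypothetical helper: subtract each row's mean from a 2f x n matrix.

    Returns the centered matrix and the row means (the per-frame translations).
    """
    t = I.mean(axis=1, keepdims=True)
    return I - t, t.ravel()

rng = np.random.default_rng(3)
f, n = 4, 10
X = rng.normal(size=(3, n))
X -= X.mean(axis=1, keepdims=True)          # choose the center of mass as the origin
S_lin = rng.normal(size=(2 * f, 3))         # hypothetical linear part of the motion
t_true = rng.normal(size=(2 * f, 1))        # per-frame translations
I = S_lin @ X + t_true                      # I = S P with P homogeneous
I_c, t_est = center_measurements(I)
print(np.allclose(t_est, t_true.ravel()))   # True: the average image point is the translation
print(np.allclose(I_c, S_lin @ X))          # True: translation eliminated
```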

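Rank Theorem
• $\tilde I$ has rank 3 (at most 3; less only in degenerate cases such as a planar scene). This means there are 3 vectors such that every row of $\tilde I$ is a linear combination of these vectors. These vectors are the rows of $\tilde P$, and the coefficients are the rows of $\tilde S$.
• SVD is made to do this: $\tilde I = UDV$. D is diagonal with non-increasing values; select the first/top three values, i.e., make D a 3 x 3 matrix. U has orthonormal columns and V has orthonormal rows; they are 2f x 3 and 3 x n respectively. (Here V is the transposed right factor; with MATLAB’s [U,D,V] = svd(I), use V' in its place.)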
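
A sketch of the rank-3 factorization, assuming NumPy; `factor_rank3` is a hypothetical helper. `numpy.linalg.svd` returns the singular values in non-increasing order and the right factor already transposed, so `Vt[:3, :]` plays the role of V(1:3,:) above.

```python
import numpy as np

def factor_rank3(I_c):
    """Hypothetical helper: factor a centered 2f x n matrix into 2f x 3 and 3 x n factors."""
    U, d, Vt = np.linalg.svd(I_c, full_matrices=False)
    S_hat = U[:, :3]                      # 2f x 3, orthonormal columns
    P_hat = np.diag(d[:3]) @ Vt[:3, :]    # 3 x n, i.e. D(1:3,1:3) * V(1:3,:)
    return S_hat, P_hat, d

rng = np.random.default_rng(4)
f, n = 5, 12
S_lin = rng.normal(size=(2 * f, 3))       # hypothetical linear motion
X = rng.normal(size=(3, n))
X -= X.mean(axis=1, keepdims=True)        # centered 3D points
I_c = S_lin @ X                           # noise-free centered measurements
S_hat, P_hat, d = factor_rank3(I_c)
print(d[3] / d[0] < 1e-10)                # True: only three non-negligible singular values
print(np.allclose(S_hat @ P_hat, I_c))    # True
```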

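Linear Ambiguity (as before)
$\tilde I$ = U(:,1:3) * D(1:3,1:3) * V(1:3,:) = (U(:,1:3) * A) * (inv(A) * D(1:3,1:3) * V(1:3,:)) for any invertible 3 x 3 matrix A.
Noise
• With noise, $\tilde I$ has full rank.
• The best solution is to estimate the rank-3 matrix that’s as near to the measured $\tilde I$ as possible.
• Our current method does this: keeping only the top three singular values gives the closest rank-3 matrix in the least-squares (Frobenius norm) sense, by the Eckart-Young theorem.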
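
A sketch with synthetic noise, assuming NumPy (the noise level 0.01 and the random motion are hypothetical). It shows that the noisy matrix is full rank, that the truncation error equals the discarded singular values (Eckart-Young), and that any invertible A gives an equally good factorization.

```python
import numpy as np

rng = np.random.default_rng(5)
f, n = 5, 20
S_lin = rng.normal(size=(2 * f, 3))
X = rng.normal(size=(3, n))
X -= X.mean(axis=1, keepdims=True)
I_noisy = S_lin @ X + 0.01 * rng.normal(size=(2 * f, n))   # hypothetical noisy measurements
I_noisy -= I_noisy.mean(axis=1, keepdims=True)             # re-center after adding noise

U, d, Vt = np.linalg.svd(I_noisy, full_matrices=False)
I3 = U[:, :3] @ np.diag(d[:3]) @ Vt[:3, :]                 # best rank-3 estimate

print(np.linalg.matrix_rank(I_noisy))                      # full rank: min(2f, n)
# Eckart-Young: Frobenius error of the truncation equals the norm of the discarded values.
print(np.isclose(np.linalg.norm(I_noisy - I3), np.sqrt(np.sum(d[3:] ** 2))))

# Linear ambiguity: any invertible 3 x 3 A reproduces the same rank-3 estimate.
A = rng.normal(size=(3, 3))
S_alt = U[:, :3] @ A
P_alt = np.linalg.inv(A) @ np.diag(d[:3]) @ Vt[:3, :]
print(np.allclose(S_alt @ P_alt, I3))
```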

Weak Perspective Motion
• The two rows of S belonging to the same frame (rows 2k and 2k+1) should be orthogonal.
• All rows should be unit vectors. (Push all scale into P.)
• Choose A so that (U(:,1:3) * A) satisfies these conditions. Then
  $\tilde I$ = (U(:,1:3)*A) * (inv(A)*D(1:3,1:3)*V(1:3,:)), with S = U(:,1:3)*A and P = inv(A)*D(1:3,1:3)*V(1:3,:).
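
One common way to choose A (assumed here; the notes don't say which solver they use) is to solve linearly for the symmetric matrix $Q = AA^T$, since each condition is linear in Q: $a^T Q a = 1$, $b^T Q b = 1$, $a^T Q b = 0$ for each frame's two rows a, b of U(:,1:3). A is then taken as the Cholesky factor of Q. This sketch assumes NumPy and a noise-free orthographic example with scale 1; `metric_upgrade` is a hypothetical helper. A is only determined up to a 3D rotation (and possibly a reflection), and with noise Q may fail to be positive definite.

```python
import numpy as np

def _coeffs(a, b):
    """Coefficients of a^T Q b in the unknowns (q11, q12, q13, q22, q23, q33) of symmetric Q."""
    return [a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[2] * b[0],
            a[1] * b[1],
            a[1] * b[2] + a[2] * b[1],
            a[2] * b[2]]

def metric_upgrade(S_hat):
    """Hypothetical helper: 3 x 3 A so each frame's two rows of S_hat @ A are orthonormal."""
    rows, rhs = [], []
    for k in range(0, S_hat.shape[0], 2):          # rows k and k+1 belong to one frame
        a, b = S_hat[k], S_hat[k + 1]
        rows += [_coeffs(a, a), _coeffs(b, b), _coeffs(a, b)]
        rhs += [1.0, 1.0, 0.0]                     # unit length, unit length, orthogonal
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])
    return np.linalg.cholesky(Q)                   # raises LinAlgError if Q is not positive definite

# Noise-free example: each frame's motion is two rows of a random orthonormal matrix.
rng = np.random.default_rng(6)
f, n = 6, 15
S_true = np.vstack([np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(f)])
X = rng.normal(size=(3, n))
X -= X.mean(axis=1, keepdims=True)
I_c = S_true @ X                                   # centered 2f x n measurements

U, d, Vt = np.linalg.svd(I_c, full_matrices=False)
A = metric_upgrade(U[:, :3])
S = U[:, :3] @ A                                   # S = U(:,1:3) * A
P = np.linalg.inv(A) @ np.diag(d[:3]) @ Vt[:3, :]  # P = inv(A) * D(1:3,1:3) * V(1:3,:)
```

The recovered P matches the true points only up to that remaining rotation, but S @ P reproduces the measurements.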

Related problems we won’t cover
• Missing data.
• Points with different, known noise.
• Multiple moving objects.

Final Messages
• Structure-from-egomotion for points can be reduced to linear algebra.
• The epipolar constraint re-emerges.
• SVD is useful.
• The Rank Theorem says the images a scene produces aren’t complicated: the centered measurement matrix of a rigid scene has rank at most 3 (also important for recognition).
