Structure and Motion from Line Segments in Multiple Images

Camillo J. Taylor, David J. Kriegman. Presented by David Lariviere.

Primary Goal Given a series of images with known corresponding line segments, calculate the relative locations of the cameras imaging the scene and the three-dimensional locations of the line segments.

Some Previous Work • (1981) Longuet-Higgins. “A computer algorithm for reconstructing a scene from two projections.” • (1990) Vieville. “Estimation of 3D-motion and structure from tracking 2D-lines in a sequence of images.” • (1992) Tomasi, Kanade. “Shape and motion from image streams under orthography.”

Problem Characterization • Instead of using generalized scenes and points, focus on rigid scenes with clear edges as features. • Advantages of lines as features: • Occur frequently in man-made environments. • Easily located and tracked. • More accurately localized than points, because many edge pixels corroborate the position of each line.

Algorithm Overview • Determine a non-linear objective function whose minimization leads to an estimate of scene structure. • In this case, estimate the 3D camera locations/orientations and the 3D locations of the line segments, then reproject the lines onto the estimated image planes. • The difference between the predicted (reprojected) lines and the observed lines is the error function to minimize.

Objective Function • p_i: the i-th 3D line • q_j: the j-th camera position/orientation • u_ij: the observed edge i in image j • m: number of images • n: number of lines • F: reprojection of line p_i onto the image plane of camera q_j.
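The equation on this slide did not survive in the transcript. Using the symbols listed above, and writing Error for the per-edge discrepancy (a name assumed here for illustration), the objective plausibly has the form:

```latex
O = \sum_{j=1}^{m} \sum_{i=1}^{n} \mathrm{Error}\bigl( F(p_i, q_j),\; u_{ij} \bigr)
```

Each term compares the reprojection of line p_i in camera q_j with the edge u_ij actually observed in image j.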

Notation – Line Representation • Represent a line in 3D space by (v, d) • v: unit vector pointing in the direction of the line • d: vector from the origin to the closest point on the line • m: normal vector of the plane defined by the camera center and the line • Edge in the image plane defined by m_x x + m_y y + m_z = 0
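As a minimal sketch of this notation, the following computes m for a line (v, d) seen by a camera and evaluates the resulting image edge. It assumes a pinhole camera with unit focal length and the convention that a world point X maps to camera coordinates R (X - t); the function names are illustrative and not taken from the paper.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mat_vec(R, x):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * x[c] for c in range(3)) for r in range(3)]

def plane_normal(R, t, v, d):
    """Normal m, in the camera frame, of the plane through the camera
    center t and the 3D line (v, d). Assumed convention: X_cam = R (X - t)."""
    to_line = [d[k] - t[k] for k in range(3)]
    return mat_vec(R, cross(v, to_line))

def image_line_y(m, x):
    """Solve m_x x + m_y y + m_z = 0 for y (requires m_y != 0)."""
    return -(m[0] * x + m[2]) / m[1]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A line parallel to the x-axis, one unit above and five units in front of the camera.
m = plane_normal(identity, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 5.0])
y_on_edge = image_line_y(m, 0.3)
```

In this example every point (s, 1, 5) on the line projects to image height 1/5, which is the value the edge equation returns for any x.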

Notation – Reference Frames • Relate location/orientation of each camera to some world base frame.

Summary of Parameters • Camera Location (t_j): 3 DOF • Camera Orientation (R_j): 3 DOF • Line Location/Orientation (v, d): 4 DOF • Requires at least 6 edge correspondences in 3 images.
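The minimum of 6 edges in 3 images can be checked with a simple counting argument. The sketch below assumes that each observed image edge contributes 2 measurements, that every line is visible in every image, and that 7 degrees of freedom (the choice of world frame and the overall scale) cannot be recovered; these assumptions are the usual ones for this kind of problem and are not spelled out on the slide.

```python
def unknowns(num_images, num_lines):
    """6 DOF per camera plus 4 DOF per line, minus 7 unrecoverable DOF
    (6 for the arbitrary world frame, 1 for overall scale; assumed)."""
    return 6 * num_images + 4 * num_lines - 7

def measurements(num_images, num_lines):
    """Each image edge is a 2-DOF line in the image plane (assumed)."""
    return 2 * num_images * num_lines

def is_determined(num_images, num_lines):
    """True when there are at least as many measurements as unknowns."""
    return measurements(num_images, num_lines) >= unknowns(num_images, num_lines)
```

With 3 images and 6 lines there are 36 measurements against 35 unknowns; with 5 lines there are 30 against 31. With only 2 images the count never balances, whatever the number of lines.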

Reprojection Error • Visible endpoints (x_1, y_1) and (x_2, y_2) of the observed edge. • Integrate the shortest distance from each point on the observed edge to the predicted line over the interval between the endpoints. • Normalize the error by dividing by the length of the observed edge.
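A minimal sketch of this error measure follows. It assumes the integrand is the squared point-to-line distance, which gives a closed form because the distance varies linearly along the observed edge; whether the slide intends the squared distance is an assumption, and the function names are illustrative.

```python
import math

def signed_distance(m, x, y):
    """Signed distance from image point (x, y) to the line m_x x + m_y y + m_z = 0."""
    return (m[0] * x + m[1] * y + m[2]) / math.hypot(m[0], m[1])

def edge_error(m, p1, p2):
    """Squared distance to the predicted line m, integrated along the observed
    edge p1-p2 and divided by the edge length (assumed form of the error)."""
    h1 = signed_distance(m, p1[0], p1[1])
    h2 = signed_distance(m, p2[0], p2[1])
    length = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    # Closed form of the integral of h(s)^2 for h linear between h1 and h2.
    integral = length / 3.0 * (h1 * h1 + h1 * h2 + h2 * h2)
    return integral / length
```

For a predicted line y = 0 (m = (0, 1, 0)), an observed edge from (0, 1) to (2, 1) lies one unit away everywhere and gives an error of 1; an edge from (0, 0) to (3, 3) gives 3, the mean of y squared over the edge.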

Algorithm • Primary algorithm for minimizing the non-linear function: minimize the line reprojection error through gradient descent to find a local minimum: • Randomly generate initial values. • Iteratively follow the function along the direction of steepest descent to reach a local minimum. • If the error at the local minimum is below a certain threshold, accept. • Else, generate new initial values and try again. • The quality of the initial values heavily influences the number of iterations required before the function converges.
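The accept-or-restart loop above can be sketched on a toy problem. The objective below is a one-dimensional function with one shallow and one deep minimum, standing in for the reprojection error; the step size, iteration count, sampling range, and threshold are all illustrative assumptions.

```python
import random

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of f at the point x (a list of floats)."""
    grad = []
    for k in range(len(x)):
        hi = x[:]
        lo = x[:]
        hi[k] += eps
        lo[k] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def gradient_descent(f, x0, step=0.01, iters=2000):
    """Follow the negative gradient with a fixed step size."""
    x = list(x0)
    for _ in range(iters):
        g = numerical_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def minimize_with_restarts(f, dim, threshold, max_restarts=50, seed=0):
    """Restart from random initial values until a local minimum is good enough."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        x0 = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        x = gradient_descent(f, x0)
        if f(x) < threshold:
            return x
    return None

def toy_objective(x):
    """Shallow minimum near x = 0.93, deep minimum near x = -1.06."""
    return x[0] ** 4 - 2 * x[0] ** 2 + 0.5 * x[0]

best = minimize_with_restarts(toy_objective, dim=1, threshold=-1.0)
```

Starts that fall into the shallow basin are rejected by the threshold test, which is why the number of restarts depends so strongly on where the initial values are drawn from.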

Initial Value Estimation To decrease computational cost, additional steps produce acceptable starting values for gradient descent: • (1) The user inputs a range for the camera orientations (R_j), and values of R_j within that range are chosen at random. • (2) Holding the estimates from (1) constant, estimate v_i subject to a constraint equation. • (3) Improve the estimate from (2) by minimizing the same constraint equation with both v_i and R_j as free parameters. • (4) Generate initial estimates of d_i and t_j using a second constraint equation. • (5) Provide the estimates from (3) and (4) as starting values for gradient descent.
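To illustrate the step that estimates v_i with the orientations held fixed, here is a sketch for the simplest case of two views. It assumes the constraint has the form m^T (R v) = 0, meaning the line direction is perpendicular to each plane normal once that normal is rotated back into the world frame; with two views the direction is then the cross product of the two rotated normals. The constraint form and the function names are assumptions; with more views one would solve the stacked constraints in a least-squares sense instead.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def transpose_mat_vec(R, x):
    """Multiply the transpose of a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * x[r] for r in range(3)) for c in range(3)]

def estimate_direction(R1, m1, R2, m2):
    """Unit line direction v satisfying m1^T R1 v = 0 and m2^T R2 v = 0.
    The sign of v is arbitrary."""
    n1 = transpose_mat_vec(R1, m1)  # plane normal of view 1 in the world frame
    n2 = transpose_mat_vec(R2, m2)  # plane normal of view 2 in the world frame
    v = cross(n1, n2)
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
# Normals of a line with direction (1, 0, 0), as seen by two hypothetical cameras.
v_est = estimate_direction(identity, [0.0, -5.0, 1.0],
                           quarter_turn, [-5.0, 0.0, -1.0])
```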

Constraint Equations • From the defined relations: • One can derive: • Which provides two constraint equations:
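The equations on this slide were not captured in the transcript. A reconstruction consistent with the notation above, assuming a world point X maps to camera coordinates R_j (X - t_j) and writing a superscript c for camera-frame quantities (a notation introduced here), is:

```latex
\begin{align*}
v^{c}_{ij} &= R_j\, v_i, \qquad
d^{c}_{ij} = R_j \bigl( d_i - t_j + (t_j \cdot v_i)\, v_i \bigr) \\
m_{ij} &\propto v^{c}_{ij} \times d^{c}_{ij} = R_j \bigl( v_i \times (d_i - t_j) \bigr) \\
m_{ij}^{\top} (R_j\, v_i) &= 0, \qquad
m_{ij}^{\top} \bigl( R_j (d_i - t_j) \bigr) = 0
\end{align*}
```

Under this reading, the first constraint involves only R_j and v_i, and the second brings in d_i and t_j, which matches the order in which the initial estimates are produced.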

Results • Simulation Results: • Measuring tolerance to noise, the returns from increasing the number of images/features, and the rate of convergence of the global minimization. • Comparing the proposed method to previous linear methods. • Real-world Results

Simulation Results: • Main Results: • The algorithm is much more sensitive to errors in edge endpoints than to errors in the calibrated camera center. • Holding the maximum baseline constant, increasing the number of images beyond 6 or the number of lines beyond 50 does not improve accuracy. • A small number of large-baseline images is superior to many small-baseline images. • The rate of convergence of the global minimization algorithm is highly dependent on the initial range of theta.

Simulation Results Continued

Comparison to Linear Method • This method is significantly less sensitive to noise than the leading linear algorithm [1]. • [1] J. Weng, Y. Liu, T. S. Huang, and N. Ahuja, “Estimating motion/structure from line correspondences.”

Real-world Results

Real-world Results…

Real-world Results: Hallway

Discussion • Initial estimation optimizations improve calculation speed. • The algorithm is highly robust to noise. • Future improvements: • Automate edge correspondence tracking by using video. • Impose edge-intersection and other geometric constraints (coplanarity, parallelism, etc.).

Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach Paul E. Debevec, Camillo J. Taylor, Jitendra Malik

Overview • Apply the previous paper’s methods to modeling architectural scenes with restricted geometry. • Utilize model-based stereo to extract precise geometry from a sparse set of large-baseline photographs. • Utilize 3D models and view-dependent photographs to construct photorealistic computer-generated views.

Architectural Models: Blocks • The user starts by choosing geometric primitives (blocks) to represent the basic geometry of the building. • Block: “hierarchical model of a parametric polyhedral primitive” • Parametrized by a base vertex P_o and various other properties (width, height, length, etc.).

Block Relations • A hierarchy of blocks is used to describe the various geometric primitives that make up the basic architecture. • The user manually maps corresponding edges in the images to the edges of the blocks. • Blocks are related by constraints on their relative location and orientation: • For example, ensure that the bottom of one block sits on the top of another block. • Values of blocks are stored symbolically, meaning that if one specifies a series of blocks to be parallel, only one variable is used to enforce this restriction across all blocks. • g_i(X): rigid transformation mapping one block to the adjacent block. • P_w(X): block vertex in world coordinates • v_w(X): line orientation in world coordinates
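A minimal sketch of how a chain of rigid transformations g_i could carry a block vertex and an edge direction into world coordinates. It assumes each g_i is a rotation plus a translation, that the chain is listed from the root block down to the block of interest, and that directions are affected by the rotations only; the data layout and names are illustrative.

```python
def mat_vec(R, x):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * x[c] for c in range(3)) for r in range(3)]

def apply_to_point(g, p):
    """Apply the rigid transformation g = (R, t) to a point: R p + t."""
    R, t = g
    rotated = mat_vec(R, p)
    return [rotated[k] + t[k] for k in range(3)]

def apply_to_direction(g, v):
    """Directions are rotated but not translated."""
    R, _ = g
    return mat_vec(R, v)

def vertex_in_world(chain, p):
    """P_w = g_1(g_2(... g_n(P))), with the chain listed root first."""
    for g in reversed(chain):
        p = apply_to_point(g, p)
    return p

def direction_in_world(chain, v):
    """v_w: the same chain, using the rotations only."""
    for g in reversed(chain):
        v = apply_to_direction(g, v)
    return v

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
# The child block is turned 90 degrees about z and shifted by 1 in x within
# its parent; the parent sits on top of a base of height 3.
chain = [(identity, [0.0, 0.0, 3.0]), (turn_z, [1.0, 0.0, 0.0])]
p_world = vertex_in_world(chain, [1.0, 0.0, 0.0])
v_world = direction_in_world(chain, [1.0, 0.0, 0.0])
```

Because each transformation is shared by everything below it in the hierarchy, a constraint such as "sits on top of" is expressed once, in the translation of a single g_i.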

Block Relations Continued…

Advantages of Blocks • Model most architectural scenes well. • Implicitly contain features commonly found in architecture (e.g., parallel edges, right angles). • Manipulation by the user is easier due to the reduced number of parameters. • Surfaces are pre-defined by the model, removing the need to calculate them from edges. • The number of parameters is greatly reduced when performing minimization of the cost function.

Single Image Examples:

Estimation of 3D Structure • Very similar to the previous paper: estimate the parameters of the cameras (R, t) and edges (v, d) which minimize the reprojection error. • Differences: • Many edges are defined in relation to one another, meaning fewer variables. • Apply horizontal/vertical constraints on v_i to more accurately estimate R_j. • Instead of using gradient descent, the authors use the Newton-Raphson method to minimize the non-linear error function.
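For readers unfamiliar with the difference, here is a sketch of a Newton-Raphson minimization on a toy one-dimensional function. The real problem has many parameters and uses the gradient vector and Hessian matrix; the scalar function, starting point, and tolerance below are illustrative assumptions only.

```python
def newton_minimize(grad, hess, x0, tol=1e-12, max_iters=50):
    """Find a stationary point of a scalar function by Newton-Raphson:
    repeatedly step by the first derivative divided by the second."""
    x = x0
    for i in range(max_iters):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            return x, i + 1
    return x, max_iters

def toy_grad(x):
    """First derivative of f(x) = x^4 - 2 x^2 + 0.5 x."""
    return 4 * x ** 3 - 4 * x + 0.5

def toy_hess(x):
    """Second derivative of the same function."""
    return 12 * x ** 2 - 4

x_min, iterations = newton_minimize(toy_grad, toy_hess, x0=-1.5)
```

Newton-Raphson uses curvature to choose the step length, so near a minimum it typically needs far fewer iterations than fixed-step gradient descent, at the cost of evaluating second derivatives and of needing a starting point where the curvature is positive.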

View-Dependent Texture Mapping • Once camera and edge locations/orientations are known, project the images onto the block models. • If multiple images of the same area exist, apply weighted averaging to fuse them. • Weights are inversely proportional to the angle between the virtual view being synthesized and the view of the camera that took the particular image. • It is possible to divide planes into faces, calculate the weights once per face, and apply them to the entire face.
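The weighting rule can be sketched as follows. The code assumes each view is summarized by a viewing direction toward the surface point, that weights are normalized to sum to one, and that a small constant guards against division by zero when the virtual view coincides with a camera; all three are assumptions added for illustration.

```python
import math

def angle_between(a, b):
    """Angle in radians between two 3-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

def blend_weights(virtual_dir, camera_dirs, eps=1e-6):
    """Weights inversely proportional to the angle to each camera, normalized."""
    raw = [1.0 / (angle_between(virtual_dir, c) + eps) for c in camera_dirs]
    total = sum(raw)
    return [r / total for r in raw]

def blend_colors(colors, weights):
    """Weighted average of one color per source image."""
    channels = len(colors[0])
    return [sum(w * col[k] for w, col in zip(weights, colors))
            for k in range(channels)]

ten = math.radians(10.0)
thirty = math.radians(30.0)
weights = blend_weights([0.0, 0.0, 1.0],
                        [[math.sin(ten), 0.0, math.cos(ten)],
                         [math.sin(thirty), 0.0, math.cos(thirty)]])
color = blend_colors([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0]], weights)
```

A camera 10 degrees from the virtual view receives about three times the weight of one 30 degrees away, so the blend is dominated by the photograph taken from the most similar direction.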

Example of Texture-Mapping

Model-based Stereopsis • Use the known scene geometry and camera locations to rectify large-baseline images before performing stereo. • This avoids foreshortening problems, which can be very large when images are taken far apart. • Maintain the epipolar constraint by projecting the offset image onto the model and then reprojecting it onto the key image’s image plane to create a rectified image for use in stereopsis.
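The warping step can be sketched per pixel: for a pixel of the key image, intersect its viewing ray with the model surface, then project the hit point into the offset camera to find where to sample. The sketch assumes a single planar model facet written as n · X = offset, pinhole cameras with unit focal length, and the convention X_cam = R (X - t); these details and all names are illustrative.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def mat_vec(R, x):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * x[c] for c in range(3)) for r in range(3)]

def transpose_mat_vec(R, x):
    """Multiply the transpose of a 3x3 matrix by a 3-vector."""
    return [sum(R[r][c] * x[r] for r in range(3)) for c in range(3)]

def pixel_to_plane(R, t, pixel, n, offset):
    """Intersect the viewing ray of an image pixel with the plane n . X = offset."""
    direction = transpose_mat_vec(R, [pixel[0], pixel[1], 1.0])
    s = (offset - dot(n, t)) / dot(n, direction)
    return [t[k] + s * direction[k] for k in range(3)]

def project(R, t, X):
    """Pinhole projection with unit focal length."""
    xc = mat_vec(R, [X[k] - t[k] for k in range(3)])
    return (xc[0] / xc[2], xc[1] / xc[2])

def key_to_offset_pixel(key_cam, offset_cam, pixel, plane):
    """Where in the offset image to sample for a given key-image pixel."""
    X = pixel_to_plane(key_cam[0], key_cam[1], pixel, plane[0], plane[1])
    return project(offset_cam[0], offset_cam[1], X)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
key_cam = (identity, [0.0, 0.0, 0.0])
offset_cam = (identity, [1.0, 0.0, 0.0])
facade = ([0.0, 0.0, 1.0], 5.0)  # the plane z = 5
sample_at = key_to_offset_pixel(key_cam, offset_cam, (0.2, 0.0), facade)
```

Filling the key image's pixel grid with samples taken this way yields the warped offset image; wherever the model is accurate the two images then agree, and stereo only has to recover the remaining deviations from the model.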

Model-based Stereopsis Example

Discussion • For architectural scenes that generally fit the allowed geometric primitives, the approach works quite well. • Possible future improvements: • Additional models: surfaces of revolution. • Estimate the BRDF. • Devise a method of selecting the best images to use for rendering novel views.

Questions?
