## Camera Calibration & Stereo Reconstruction

**Camera Calibration & Stereo Reconstruction**
Jinxiang Chai

**3D Computer Vision**
• The main goal here is to reconstruct the geometry of 3D worlds.

**How can we estimate the camera parameters?**
- Where is the camera located?
- Which direction is the camera looking in?
- Focal length, projection center, aspect ratio?

**Stereo reconstruction**
• Given two or more images of the same scene or object, compute a representation of its shape
• How can we estimate camera parameters?
[Figure: known camera viewpoints]

**Camera calibration**
• Augmented pin-hole camera
  - focal point, orientation
  - focal length, aspect ratio, center, lens distortion
• Classical calibration
  - known 3D-to-2D correspondence
• Camera calibration online resources

**Classical camera calibration**
• Known 3D coordinates and 2D coordinates
  - known 3D points on calibration targets
  - find corresponding 2D points in the image using a feature detection algorithm

**Camera parameters**
• Known 3D coords and 2D coords

$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim
\underbrace{\begin{bmatrix} s_x & a & u_0 \\ 0 & -s_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{viewport proj.}}
\cdot [\text{perspective proj.}] \cdot [\text{view trans.}] \cdot
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

• Intrinsic camera parameters (5 parameters)
• Extrinsic camera parameters (6 parameters)

**Camera matrix**
• Fold the intrinsic calibration matrix K and the extrinsic pose parameters (R, t) together into a camera matrix
• M = K [R | t]
• (put 1 in the lower right-hand corner for 11 d.o.f.)

**Camera matrix calibration**
• Directly estimate the 11 unknowns in the M matrix using known 3D points (Xi, Yi, Zi) and measured feature positions (ui, vi)

$$
u_i = \frac{m_{00}X_i + m_{01}Y_i + m_{02}Z_i + m_{03}}{m_{20}X_i + m_{21}Y_i + m_{22}Z_i + 1},
\qquad
v_i = \frac{m_{10}X_i + m_{11}Y_i + m_{12}Z_i + m_{13}}{m_{20}X_i + m_{21}Y_i + m_{22}Z_i + 1}
$$

• Linear regression:
• Bring the denominator over, then solve the resulting set of (over-determined) linear equations. How?
• Least squares (pseudo-inverse)
  - 11 unknowns (up to scale)
  - 2 equations per point (homogeneous coordinates)
  - 6 points are sufficient

**Nonlinear camera calibration**
• Perspective projection: the 2D coordinates of a point are just a nonlinear function of its 3D coordinates P and the camera parameters K, R, T

**Multiple calibration images**
• Find camera parameters which satisfy the constraints from M images and N points (for j = 1, …, M and i = 1, …, N)
• This can be formulated as a nonlinear optimization problem
• Solve the optimization using nonlinear optimization techniques:
  - Gauss-Newton
  - Levenberg-Marquardt

**Nonlinear approach**
• Advantages:
  - can solve for more than one camera pose at a time
  - fewer degrees of freedom than the linear approach
  - standard technique in photogrammetry, computer vision, computer graphics
  - [Tsai 87] also estimates lens distortions (freeware @ CMU): http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-source.html
• Disadvantages:
  - more complex update rules
  - need a good initialization (recover K [R | t] from M)

**Application: camera calibration for sports video**
[Figure: images and court model, Farin et al.]

**Stereo matching**
• Given two or more images of the same scene or object, as well as their camera parameters, how do we compute a representation of its shape?
• What are some possible representations for shapes?
  - depth maps
  - volumetric models
  - 3D surface models
  - planar (or offset) layers

**Outline**
• Stereo matching
  - Traditional stereo
  - Active stereo
• Volumetric stereo
  - Visual hull
  - Voxel coloring
  - Space carving

**Readings**
• Stereo matching
  - Sections 11.1, 11.2, 11.3, 11.5 in the Szeliski book
  - D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1/2/3):7-42, April-June 2002.

**Stereo**
[Figure: scene point, image plane, optical center]
• Basic principle: triangulation
  - gives the reconstruction as the intersection of two rays
  - requires calibration and point correspondence

**Stereo correspondence**
[Figure: epipolar plane and epipolar lines]
• Determine pixel correspondence
  - pairs of points that correspond to the same scene point
• Epipolar constraint
  - reduces the correspondence problem to a 1D search along conjugate epipolar lines
  - Java demo: http://www.ai.sri.com/~luong/research/Meta3DViewer/EpipolarGeo.html

**Stereo image rectification**
• Reproject the image planes onto a common plane parallel to the line between the optical centers
• Pixel motion is horizontal after this transformation
• Two homographies (3x3 transforms), one for each input image reprojection
• C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

**Rectification**
[Figure: original image pairs and rectified image pairs]

**Stereo matching algorithms**
• Match pixels in conjugate epipolar lines
  - assume brightness constancy
  - this is a tough problem
  - numerous approaches
  - a good survey and evaluation: http://www.middlebury.edu/stereo/

**Your basic stereo algorithm**
• For each epipolar line
  - for each pixel in the left image
    - compare with every pixel on the same epipolar line in the right image
    - pick the pixel with the minimum matching cost
• Improvement: match windows
  - this should look familiar (cross correlation or SSD)
  - can use Lucas-Kanade or discrete search (the latter is more common)

**Window size**
[Figure: results for W = 3 and W = 20]
• Effect of window size
  - smaller window
  - larger window

**More constraints?**
• We can enforce more constraints to reduce matching ambiguity
  - smoothness constraints: the computed disparity at a pixel should be consistent with its neighbors in a surrounding window
  - uniqueness constraints: the matching needs to be bijective
  - ordering constraints: e.g., the computed disparity at a pixel should not be larger than the disparity of its right neighbor pixel by more than one pixel

**Stereo results**
• Data from University of Tsukuba
• Similar results on other images without ground truth
[Figure: scene and ground truth]

**Results with window search**
[Figure: window-based matching (best window size) and ground truth]

**Better methods exist...**
• A better method: Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999.
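
The basic window-based algorithm described earlier (for each pixel in the left image, compare windows along the same epipolar line in the right image and keep the minimum-cost match) can be sketched roughly as follows. This is a minimal illustration, not the method behind any of the results shown: it assumes the pair is already rectified so that epipolar lines are image rows, that images are plain lists of rows of grayscale intensities, and that a scene point at column `c` in the left image appears at column `c - d` in the right image. The names `ssd` and `block_match` are hypothetical.

```python
def ssd(left, right, row, col_l, col_r, half):
    """Sum of squared differences between two square windows.

    The windows have side 2*half + 1 and are centred on (row, col_l)
    in the left image and (row, col_r) in the right image.
    """
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[row + dy][col_l + dx] - right[row + dy][col_r + dx]
            total += diff * diff
    return total


def block_match(left, right, max_disp, half=1):
    """Brute-force window matching on a rectified image pair.

    Returns a disparity map as a list of rows. Border pixels where the
    window does not fit are left at 0 (an arbitrary choice for this sketch).
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for row in range(half, height - half):
        for col in range(half, width - half):
            best_cost, best_d = None, 0
            # 1D search along the same row (the conjugate epipolar line).
            for d in range(max_disp + 1):
                col_r = col - d
                if col_r - half < 0:
                    break  # window would fall off the right image
                cost = ssd(left, right, row, col, col_r, half)
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[row][col] = best_d
    return disparity
```

Calling `block_match(left, right, max_disp=4, half=1)` uses 3x3 windows; increasing `half` enlarges the window, which is the parameter varied in the window-size comparison. None of the smoothness, uniqueness, or ordering constraints are enforced here.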
[Figure: ground truth]

**More recent development**
• High-Quality Single-Shot Capture of Facial Geometry [SIGGRAPH 2010, project website]
  - capture high-fidelity facial geometry from multiple cameras
  - pairwise stereo reconstruction between neighboring cameras
  - hallucinate facial details

**More recent development**
• High Resolution Passive Facial Performance Capture [SIGGRAPH 2010, project website]
  - capture dynamic facial geometry from multiple video cameras
  - spatial stereo reconstruction for every frame
  - building temporal correspondences across the entire sequence

**Stereo reconstruction pipeline**
• Steps
  - Calibrate cameras
  - Rectify images
  - Compute disparity
  - Estimate depth
• What will cause errors?
  - Camera calibration errors
  - Poor image resolution
  - Occlusions
  - Violations of brightness constancy (specular reflections)
  - Large motions
  - Low-contrast image regions

**Active stereo with structured light**
[Figure: camera 1, projector, camera 2; Li Zhang's one-shot stereo]
• Project "structured" light patterns onto the object
  - simplifies the correspondence problem

**Laser scanning**
• Optical triangulation
  - project a single stripe of laser light
  - scan it across the surface of the object
  - this is a very precise version of structured light scanning
• Digital Michelangelo Project: http://graphics.stanford.edu/projects/mich/

**Laser scanned models**
[Figure: The Digital Michelangelo Project, Levoy et al.]

**Recent development**
• Capturing dynamic facial movement using active stereo [project website]
  - use synchronized video cameras and structured light projectors to capture dynamic facial geometry
  - use a generic 3D model to build temporal correspondences across the entire sequence
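
For the last two steps of the stereo reconstruction pipeline (compute disparity, estimate depth), the slides do not spell out the conversion. For a rectified pair, the standard triangulation relation is Z = f · B / d, where f is the focal length in pixels, B is the baseline between the optical centers, and d is the disparity in pixels. The sketch below assumes that relation; the function name `disparity_to_depth` and its parameters are hypothetical, and marking non-positive disparities as `None` is just one possible convention.

```python
def disparity_to_depth(disparity, focal_px, baseline, min_disp=1e-6):
    """Convert a disparity map to a depth map via Z = f * B / d.

    disparity : list of rows of disparities in pixels
    focal_px  : focal length in pixels (assumed equal for both cameras)
    baseline  : distance between the optical centers; depth comes out
                in the same unit
    Disparities at or below min_disp have no finite depth and are
    returned as None.
    """
    depth = []
    for row in disparity:
        depth.append([
            focal_px * baseline / d if d > min_disp else None
            for d in row
        ])
    return depth
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small d) than for nearby ones, which is one reason calibration errors and poor image resolution appear in the list of error sources.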