
Zucheul Lee, Ramsin Khoshabeh, Jason Juang, and Truong Q. Nguyen (UCSD)

Local Stereo Matching Using Motion Cue and Modified Census in Video Disparity Estimation. Zucheul Lee, Ramsin Khoshabeh, Jason Juang, and Truong Q. Nguyen (UCSD). 20th European Signal Processing Conference (EUSIPCO 2012). Outline. Introduction Framework


Presentation Transcript


  1. Local Stereo Matching Using Motion Cue and Modified Census in Video Disparity Estimation • Zucheul Lee, Ramsin Khoshabeh, Jason Juang, and Truong Q. Nguyen (UCSD) • 20th European Signal Processing Conference (EUSIPCO 2012)

  2. Outline • Introduction • Framework • Proposed Algorithm • Experimental Results • Conclusion

  3. Introduction

  4. Background • Disparity estimation has been thoroughly studied, but the focus has been strictly on still images • Video disparity estimation faces two problems: • (1) Lack of video datasets with ground-truth disparity maps • (2) Temporal inconsistency: flickering that results from simply applying image-based algorithms to video

  5. Background • Fundamental attributes that group objects together locally: • Proximity and similarity (used in image disparity estimation) • Motion (important for accurate depth estimation near the edges of moving objects) • Objects grouped by these attributes are most likely to have the same depth

  6. Objective • Propose a more accurate and noise-tolerant method for video disparity estimation • More accurate than other methods on edges and in flat (textureless) areas • Using: • Motion cues (edges) • Modified census transform (flat areas) • Spatio-temporal consistency (refinement)

  7. Related Work • Adaptive Weight [6] • Cost-volume filtering [7] • Guided filter • Spatio-Temporal Consistency [3] • These methods do not provide a reliable solution for disparity estimation in textureless (flat) areas • [3] R. Khoshabeh, S. H. Chan, and T. Q. Nguyen, “Spatio-Temporal Consistency in Video Disparity Estimation,” ICASSP, pp. 885-888, 2011. • [6] K.-J. Yoon and I.-S. Kweon, “Adaptive Support-Weight Approach for Correspondence Search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 650-656, 2006. • [7] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast Cost-Volume Filtering for Visual Correspondence and Beyond,” in Proc. IEEE Intl. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3017-3024, 2011.

  8. Framework

  9. Framework

  10. Proposed Algorithm

  11. Support Weight Using Correlated Color and Motion • The support weight is computed from: • the color difference • the motion difference • γm: motion parameter • γs: similarity parameter

  12. Support Weight Using Correlated Color and Motion • Color difference: computed between the color coordinates of pixel c and neighbor pixel q in the CIELab color space • Truncated motion difference: computed between the flow vectors [10] of pixel c and neighbor pixel q • τ: truncation value • [10] D. Sun, S. Roth, and M. J. Black, “Secrets of Optical Flow Estimation and Their Principles,” CVPR, pp. 2432-2439, 2010.
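The slide's formulas did not survive in the transcript, so the following is only a minimal sketch of how a color difference and a truncated motion difference could be turned into a support weight. It assumes the exponential form of the adaptive support-weight approach [6], Euclidean distances for both differences, and it leaves out the proximity term; the function names are hypothetical, and the default values of γs, γm, and τ are the ones quoted in the experimental setup.

```python
import math

def color_difference(lab_c, lab_q):
    """Euclidean distance between two CIELab triples (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab_c, lab_q)))

def truncated_motion_difference(flow_c, flow_q, tau=1.0):
    """Euclidean distance between two flow vectors, truncated at tau (assumed metric)."""
    distance = math.hypot(flow_c[0] - flow_q[0], flow_c[1] - flow_q[1])
    return min(distance, tau)

def support_weight(lab_c, lab_q, flow_c, flow_q,
                   gamma_s=17.0, gamma_m=1.0, tau=1.0):
    """Weight of neighbor q for center pixel c.

    ASSUMPTION: differences are combined as exp(-(dc/gamma_s + dm/gamma_m)),
    borrowed from the adaptive support-weight form; the slide's own formula
    is not available.
    """
    dc = color_difference(lab_c, lab_q)
    dm = truncated_motion_difference(flow_c, flow_q, tau)
    return math.exp(-(dc / gamma_s + dm / gamma_m))

# A neighbor with the same color but a different motion gets a lower weight.
same_motion = support_weight((50, 0, 0), (50, 0, 0), (1.0, 0.0), (1.0, 0.0))
other_motion = support_weight((50, 0, 0), (50, 0, 0), (1.0, 0.0), (4.0, 4.0))
```

Under these assumptions, truncation caps the penalty that a large flow difference can impose, so outliers in the optical flow cannot drive a weight arbitrarily close to zero.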

  13. Benefits of a Motion Cue • The “car” video frames (480×270, 15 disparity levels) • Panels compared: Proximity; Proximity + Similarity; Proximity + Similarity + Motion

  14. Modified Census Transform • Problem: it is difficult to find the correct correspondences in flat areas, because the census matching cost is extremely sensitive to image noise when all pixels in a flat area have similar intensities • Solution: a three-mode census transform with a noise buffer

  15. Modified Census Transform • Using two bits to implement three modes • α: noise buffer threshold • Set 10 if (neighbor pixel intensity) − (center pixel intensity) > α • Set 01 if (neighbor pixel intensity) − (center pixel intensity) < −α • Set 00 otherwise • Intensity value 0~50: α = 0 • Intensity value 50~100: α = 1 • Intensity value 100~150: α = 2 • Intensity value 150~200: α = 3 • Intensity value 200~255: α = 4
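The three-mode encoding can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the dark case is meant as a difference below −α (a symmetric noise buffer), that α is chosen from the center pixel's intensity, and that range boundaries such as 50 fall into the upper range. All function names are hypothetical.

```python
def noise_threshold(center):
    """Noise buffer alpha for a center intensity in 0..255.

    ASSUMPTION: alpha follows the center pixel, and boundary values
    (50, 100, ...) belong to the upper range.
    """
    return min(center // 50, 4)

def census_code(center, neighbor, alpha):
    """Two-bit code for one neighbor: 10 brighter, 01 darker, 00 within the buffer."""
    diff = neighbor - center
    if diff > alpha:
        return 0b10
    if diff < -alpha:
        return 0b01
    return 0b00

def modified_census(window):
    """Codes for all neighbors of the center of a square window, in row-major order."""
    size = len(window)
    r = size // 2
    center = window[r][r]
    alpha = noise_threshold(center)
    return [census_code(center, window[y][x], alpha)
            for y in range(size) for x in range(size)
            if (y, x) != (r, r)]

# Center 100 gives alpha = 2, so differences of at most 2 are treated as noise.
codes = modified_census([[100, 103, 98],
                         [97, 100, 102],
                         [110, 100, 90]])
```

In this example the neighbors 98 and 102 fall inside the buffer and get code 00, which is the behavior that makes flat areas less sensitive to small intensity noise.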

  16. Modified Census Transform • The raw matching cost combines two terms: • Intensity difference: compares the two center pixels • Hamming distance: compares the spatial structure; calculated by the bitwise XOR operation on the census transform bit strings
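A possible way to combine the two terms is sketched below. The Hamming distance via XOR follows the slide; the combination rule is an assumption (each term is mapped to [0, 1) with 1 − exp(−x/γ) and the two are summed, as is common in census-plus-intensity costs), since the slide's formula is not available. The defaults for γI and γH are the values quoted in the experimental setup, and the function names are hypothetical.

```python
import math

def hamming_distance(codes_left, codes_right):
    """Number of differing bits between two lists of two-bit census codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes_left, codes_right))

def raw_matching_cost(center_left, center_right, codes_left, codes_right,
                      gamma_i=3.0, gamma_h=20.0):
    """Raw cost from the center-pixel intensity difference and the Hamming distance.

    ASSUMPTION: the terms are combined as
    (1 - exp(-dI/gamma_i)) + (1 - exp(-dH/gamma_h)); this is not taken
    from the slide.
    """
    d_intensity = abs(center_left - center_right)
    d_hamming = hamming_distance(codes_left, codes_right)
    return ((1.0 - math.exp(-d_intensity / gamma_i))
            + (1.0 - math.exp(-d_hamming / gamma_h)))

# 10 vs 01 differs in two bits, 00 vs 10 in one bit.
distance = hamming_distance([0b10, 0b01, 0b00], [0b01, 0b01, 0b10])
```

With two-bit codes, a brighter/darker mismatch (10 vs 01) costs two bits while a mismatch against the noise-buffer mode (00) costs only one, so the distance is graded rather than binary.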

  17. Aggregation and Disparity Computation • Aggregated matching cost: computed over the left and right support windows • w(cd, qd): support weight of pixel qd in the right window • Winner-take-all (WTA): selects the disparity with the lowest aggregated cost • D: the set of all possible disparities
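The aggregation and WTA steps can be illustrated with a small sketch. It assumes the normalized weighted sum used in the adaptive support-weight approach [6], where each raw cost is weighted by the product of the left and right support weights; the slide's own formula is not available, and the function names and flat-list layout are hypothetical.

```python
def aggregated_cost(weights_left, weights_right, raw_costs):
    """Weighted average of raw costs over one pair of support windows.

    ASSUMPTION: weights from the left and right windows are multiplied and
    the sum is normalized, as in the adaptive support-weight approach.
    All three arguments are flat lists over the window pixels.
    """
    products = [wl * wr for wl, wr in zip(weights_left, weights_right)]
    numerator = sum(p * e for p, e in zip(products, raw_costs))
    return numerator / sum(products)

def winner_take_all(cost_by_disparity):
    """Disparity in D with the smallest aggregated cost."""
    return min(cost_by_disparity, key=cost_by_disparity.get)

# Aggregated costs for three candidate disparities of one pixel.
best = winner_take_all({0: 0.9, 1: 0.2, 2: 0.5})
```

Because the weights are normalized, a pixel whose neighbors all have low support is not automatically favored by having a small summed cost.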

  18. Aggregation and Disparity Computation • Panels compared: left view; original census; modified census (without intensity difference); modified census

  19. Spatio-temporal Consistency [3] • Problem: simply applying image-based algorithms to individual frames gives temporally inconsistent results (even with the best methods) • Solution: consider the sequence of disparity maps as a space-time volume • A three-dimensional function f(x,y,t) with • (x,y): spatial coordinates • t: temporal coordinate • Piecewise smooth solution: • has less temporal noise • preserves the disparity information as much as possible

  20. Spatio-temporal Consistency [3] • ℓ1-minimization problem: • f: unknown disparity map • g: initial disparity map from the previous step • D: forward difference operator • Piecewise smooth solution • Total variation norm • Video restoration problem: • g = Hf + η • f: unknown image (M×N) • g: observed image (M×N) • H: linear transformation representing the convolution operator • η: noise

  21. Spatio-temporal Consistency [3] • ℓ1-minimization problem: • f: unknown disparity map f(x,y,t) • Each frame of the video: M rows, N columns • Total: K frames • Stack the entries of f(x,y,t) into a column vector of size MNK × 1 • Axes: x (M rows), y (N columns), t (K frames)
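The stacking step can be made concrete with a short sketch. The ordering is an assumption (each frame is read column by column, and frames are concatenated in time); the slide only states the final size MNK × 1. The function names are hypothetical.

```python
def stack_volume(frames):
    """Flatten K frames of M rows by N columns into one list of length M*N*K.

    ASSUMPTION: column-major order within a frame, frames appended in time order.
    """
    return [frame[r][c]
            for frame in frames
            for c in range(len(frame[0]))
            for r in range(len(frame))]

def stacked_index(r, c, t, M, N):
    """Position of f(r, c, t) in the stacked vector under the same ordering."""
    return t * M * N + c * M + r

# Two frames (K = 2) of size M = 2 rows by N = 3 columns.
frames = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [10, 11, 12]]]
vector = stack_volume(frames)
```

Any fixed ordering works as long as the difference operator D is built with the same convention.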

  22. Spatio-temporal Consistency [3] • ℓ1-minimization problem: • D: forward difference operator • Its parameters are constants
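A forward difference operator on the space-time volume can be sketched as below. The per-axis weights `beta_x`, `beta_y`, and `beta_t` are assumed names for the constant parameters mentioned on the slide, the zero difference at the last index of each axis is an assumed boundary rule, and the isotropic total variation built from the three differences is the usual definition rather than something recovered from the slide.

```python
import math

def forward_differences(f, beta_x=1.0, beta_y=1.0, beta_t=1.0):
    """Weighted forward differences of a volume indexed as f[t][row][column].

    x runs over rows and y over columns, following the slide's axis labels.
    ASSUMPTION: the difference is zero at the last index of each axis.
    """
    K, M, N = len(f), len(f[0]), len(f[0][0])
    dx = [[[0.0] * N for _ in range(M)] for _ in range(K)]
    dy = [[[0.0] * N for _ in range(M)] for _ in range(K)]
    dt = [[[0.0] * N for _ in range(M)] for _ in range(K)]
    for t in range(K):
        for r in range(M):
            for c in range(N):
                value = f[t][r][c]
                if r + 1 < M:
                    dx[t][r][c] = beta_x * (f[t][r + 1][c] - value)
                if c + 1 < N:
                    dy[t][r][c] = beta_y * (f[t][r][c + 1] - value)
                if t + 1 < K:
                    dt[t][r][c] = beta_t * (f[t + 1][r][c] - value)
    return dx, dy, dt

def total_variation(f, **betas):
    """Isotropic total variation: sum over voxels of the difference magnitude."""
    dx, dy, dt = forward_differences(f, **betas)
    return sum(math.sqrt(a * a + b * b + c * c)
               for fx, fy, ft in zip(dx, dy, dt)
               for rx, ry, rt in zip(fx, fy, ft)
               for a, b, c in zip(rx, ry, rt))

# A single pixel whose disparity jumps from 0 to 3 between two frames.
flicker = [[[0.0]], [[3.0]]]
tv_flicker = total_variation(flicker)
```

Raising `beta_t` relative to the spatial weights penalizes frame-to-frame changes more heavily, which is one way such constants could trade spatial detail against temporal smoothness.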

  23. Spatio-temporal Consistency [3] • Solve the minimization problem: • Piecewise smooth solution • Total variation norm • Solve the sub-problems for f, u, and r iteratively [1] • [1] S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, “An Augmented Lagrangian Method for Total Variation Video Restoration,” in ICASSP, May 2011.
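For orientation, the problem described on these slides can be written out as follows. This is a sketch assuming the TV/ℓ1 formulation of [3] and the variable splitting of [1], not a transcription of the slide, whose equations were not preserved; the symbols μ, β_x, β_y, and β_t are assumed names for the constants.

```latex
\begin{aligned}
&\min_{f}\ \mu\,\lVert f - g\rVert_1 + \lVert Df\rVert_2,
\qquad
D = \begin{bmatrix} \beta_x D_x \\ \beta_y D_y \\ \beta_t D_t \end{bmatrix}, \\
&\lVert Df\rVert_2 = \sum_i
\sqrt{\beta_x^2\,[D_x f]_i^2 + \beta_y^2\,[D_y f]_i^2 + \beta_t^2\,[D_t f]_i^2}, \\
&\text{split as}\quad
\min_{f,\,r,\,u}\ \mu\,\lVert r\rVert_1 + \lVert u\rVert_2
\quad \text{subject to}\quad r = f - g,\ \ u = Df .
\end{aligned}
```

Under this splitting, the f, u, and r sub-problems are the ones that would be solved in turn inside an augmented Lagrangian iteration.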

  24. Experimental Results

  25. Experimental Results • 5 synthetic videos with ground truth [14] (400×300, 64 disparity range) • Compare LASW, Cost-filter, and the proposed method • Without post-processing • γs = 17, γm = 1, γI = 3, γH = 20, and τ = 1 • Support window: 11×11, Census window: 7 • [14] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodgson, “Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid,” ECCV, 2010.

  26. Experimental Results • Jamie1 from Microsoft i2i database

  27. Experimental Results • Ilkay from Microsoft i2i database

  28. Experimental Results • Tunnel

  29. Experimental Results • Performance comparison of methods • Metric: the average percentage of bad pixels (threshold of 1)
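The bad-pixel metric can be sketched as below. It assumes the common convention that a pixel is bad when its absolute disparity error is strictly greater than the threshold, and that all pixels are evaluated (no occlusion mask); the function name and nested-list layout are hypothetical.

```python
def bad_pixel_percentage(estimated, ground_truth, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds the threshold.

    ASSUMPTION: strict inequality and no occlusion mask.
    Both arguments are 2D lists of the same shape.
    """
    total = 0
    bad = 0
    for row_est, row_gt in zip(estimated, ground_truth):
        for d_est, d_gt in zip(row_est, row_gt):
            total += 1
            if abs(d_est - d_gt) > threshold:
                bad += 1
    return 100.0 * bad / total

# Two of the four pixels are off by more than one disparity level.
score = bad_pixel_percentage([[10, 12], [5, 7]],
                             [[10, 10], [5, 9]])
```

Averaging this value over all frames of a sequence gives one number per method and video, which is the kind of figure a comparison table would report.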

  30. Experimental Results • 19 s to compute the disparity map • Can be adapted to a real-time application (by using a GPU) • Refinement using the TV method [3] reduces errors in the background (spatial noise and temporal inconsistencies)

  31. Experimental Results • Spatio-Temporal Consistency[3]

  32. Experimental Results • Spatio-Temporal Consistency[3]

  33. Experimental Results • Spatio-Temporal Consistency[3]

  34. Conclusion

  35. Conclusion • Propose an accurate local stereo matching method for video disparity estimation • Motion cue • To obtain a more accurate support weight • Modified census transform • To obtain more reliable raw matching costs in flat areas • Spatio-temporal volume • To improve spatial and temporal consistency • The method shows the possibility of directly extending current image-based disparity algorithms to the video domain
