Auditory and Visual Spatial Sensing

Stan Birchfield, Department of Electrical and Computer Engineering, Clemson University

Presentation Transcript


  1. Auditory and Visual Spatial Sensing. Stan Birchfield, Department of Electrical and Computer Engineering, Clemson University

  2. Human Spatial Sensing. The five senses: seeing, f(x, y, λ, t); hearing, f(t); taste; smell; touch

  3. Visual and Auditory Pathways

  4. Two Problems in Spatial Sensing: stereo vision; acoustic localization

  5. Clemson Vision Laboratory: head tracking, highway monitoring, root detection, reconstruction, motion segmentation

  6. Clemson Vision Lab (cont.): microphone position calibration, speaker localization

  7. Stereo Vision. Input: left and right images, related by the epipolar constraint. Output: disparity map, depth discontinuities

  8. Epipolar Constraint. Figure labels: world point, epipolar line, epipolar plane, center of projection (left camera, right camera)

  9. Energy Minimization. Figure labels: left and right scanlines, occluded pixels. The intensity constraint alone is underconstrained, so minimize: dissimilarity and discontinuity penalty

  10. History of Stereo Correspondence
      DYNAMIC PROGRAMMING (1D): Baker & Binford 1981; Ohta & Kanade 1985; Belhumeur & Mumford 1992; Intille & Bobick 1994; Geiger et al. 1995; Birchfield & Tomasi 1998
      MULTIWAY-CUT (2D): Roy & Cox 1998; Boykov, Veksler, and Zabih 1998; Birchfield & Tomasi 1999; Kolmogorov & Zabih 2001, 2002; Lin & Tomasi 2002

  11. Dynamic Programming: 1D Search
      String editing ("cat" versus "cart"); penalties: mismatch = 1, insertion = 1, deletion = 1

              c  a  r  t
           0  1  2  3  4
        c  1  0  1  2  3
        a  2  1  0  1  2
        t  3  2  1  1  1

      Stereo matching: LEFT and RIGHT scanlines. Figure labels: occlusion, depth discontinuity, disparity map
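The string-editing table on this slide can be reproduced with a short dynamic program. The sketch below is only an illustration of the 1D search idea, not the stereo matcher itself; the function name `edit_distance_table` is hypothetical, and the three penalties are the ones listed on the slide.

```python
def edit_distance_table(a, b, mismatch=1, insertion=1, deletion=1):
    """Fill the dynamic-programming table for editing string a into string b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First column: delete every character of a. First row: insert every character of b.
    for i in range(1, rows):
        table[i][0] = i * deletion
    for j in range(1, cols):
        table[0][j] = j * insertion
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = table[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            delete = table[i - 1][j] + deletion
            insert = table[i][j - 1] + insertion
            table[i][j] = min(substitute, delete, insert)
    return table


table = edit_distance_table("cat", "cart")
for row in table:
    print(row)
```

The bottom-right entry is the total editing cost. In the stereo analogy, insertions and deletions play the role of occlusions along a pair of scanlines.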

  12. Multiway-Cut: 2D Search. Figure axes: pixels, labels [Boykov, Veksler, Zabih 1998]

  13. Multiway-Cut Algorithm. Figure labels: source label, sink label, minimum cut; axes: pixels, labels. Minimizes: (cost of assigning label to pixel) + (cost of label discontinuity)
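As a rough illustration of the quantity being minimized, the sketch below only evaluates the energy of a given labeling on a 4-connected pixel grid; it does not perform the graph cut. The constant (Potts-style) discontinuity penalty, the function name `labeling_energy`, and the toy costs are all assumptions made for the example.

```python
def labeling_energy(labels, data_cost, discontinuity_penalty):
    """Energy of a labeling: per-pixel assignment costs plus a penalty
    for every pair of 4-connected neighbors with different labels."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            # Cost of assigning this label to this pixel.
            energy += data_cost[r][c][labels[r][c]]
            # Cost of a label discontinuity with the right and lower neighbors.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += discontinuity_penalty
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += discontinuity_penalty
    return energy


# Toy 2x2 image with two labels; data_cost[r][c][label] is hypothetical.
data_cost = [[[0, 5], [0, 5]],
             [[5, 0], [4, 0]]]
split = labeling_energy([[0, 0], [1, 1]], data_cost, discontinuity_penalty=2)
flat = labeling_energy([[0, 0], [0, 0]], data_cost, discontinuity_penalty=2)
print(split, flat)
```

Here the split labeling pays two discontinuity penalties but no assignment cost, so it has lower energy than the flat one.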

  14. Sampling-Insensitive Pixel Dissimilarity. Figure: intensity functions IL and IR near pixels xL and xR, with the one-sided measure d̄(xL,xR). Our dissimilarity measure: d(xL,xR) = min{d̄(xL,xR), d̄(xR,xL)} [Birchfield & Tomasi 1998]
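A minimal sketch of such a measure on a pair of 1D scanlines is given below, assuming the one-sided term compares a pixel in one scanline against the range of linearly interpolated intensities within half a pixel of the candidate match in the other. The function names and the border handling (clamping to the scanline ends) are assumptions for the example.

```python
def one_sided(i_a, x_a, i_b, x_b):
    """How far i_a[x_a] lies outside the intensity range that i_b spans
    over [x_b - 1/2, x_b + 1/2] under linear interpolation."""
    center = i_b[x_b]
    left = 0.5 * (center + i_b[max(x_b - 1, 0)])
    right = 0.5 * (center + i_b[min(x_b + 1, len(i_b) - 1)])
    lo = min(left, right, center)
    hi = max(left, right, center)
    return max(0.0, i_a[x_a] - hi, lo - i_a[x_a])


def dissimilarity(i_left, x_left, i_right, x_right):
    """Symmetric measure: the smaller of the two one-sided terms."""
    return min(one_sided(i_left, x_left, i_right, x_right),
               one_sided(i_right, x_right, i_left, x_left))


# The same linear ramp sampled at integer and at half-pixel positions.
left_line = [0, 10, 20, 30, 40]
right_line = [5, 15, 25, 35, 45]
matched = dissimilarity(left_line, 2, right_line, 2)
mismatched = dissimilarity(left_line, 2, right_line, 0)
print(matched, mismatched)
```

The plain absolute difference at the matched pixels would be 5 because of the half-pixel sampling shift, whereas this measure ignores that shift and still penalizes the wrong match.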

  15. Dissimilarity Measure Theorems
      Given: an interval A such that [xL – ½, xL + ½] ⊆ A and [xR – ½, xR + ½] ⊆ A
      Theorem 1 (when the intensity on A is convex or concave): if |xL – xR| ≤ ½, then d(xL,xR) = 0
      Theorem 2 (when the intensity on A is linear): |xL – xR| ≤ ½ iff d(xL,xR) = 0

  16. Correspondence as Segmentation
      • Problem: disparities (fronto-parallel): O(D); surfaces (slanted): O(Ds2 n) => computationally intractable!
      • Solution: iteratively determine which labels to use
        • label pixels: multiway-cut (Expectation)
        • find affine parameters of regions: Newton-Raphson (Maximization)

  17. Stereo Results (Dynamic Programming)

  18. Stereo Results (Multiway-Cut)

  19. Stereo Results on Middlebury Database. Columns: image, Birchfield-Tomasi 1999, Hong-Chen 2004

  20. Multiway-Cut Challenges Dynamic programming Multiway-cut

  21. Acoustic Localization
      Problem: use microphone signals to determine sound source location (microphone arrays: distributed, compact)
      • Traditional solutions:
        • Delay-and-sum beamforming !
        • Time-delay estimation (TDE) !
      • Recent solutions:
        • Hemisphere sampling !!
        • Accumulated correlation !!
        • Bayesian !
        • Zero-energy !
      Legend: each ! marks one of two properties, efficient or accurate

  22. Localization Geometry. Figure labels: sound source, microphones, arrival times t1 and t2 on a time axis. The time delay t2 – t1 = τ constrains the source to a surface (one-half hyperboloid)
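The geometry can be made concrete with a small sketch that computes the delay τ between two microphones for a given source position. The speed of sound (343 m/s), the coordinates, and the function name `time_delay` are assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value


def time_delay(source, mic1, mic2, c=SPEED_OF_SOUND):
    """Return tau = t2 - t1, the difference in arrival time at the two microphones."""
    return (math.dist(source, mic2) - math.dist(source, mic1)) / c


mic1 = (-0.5, 0.0, 0.0)
mic2 = (0.5, 0.0, 0.0)

# A source on the perpendicular bisector reaches both microphones simultaneously.
tau_center = time_delay((0.0, 1.0, 0.0), mic1, mic2)

# Two sources related by a rotation about the microphone axis share the same delay,
# which is why a single delay only constrains the source to a surface.
tau_a = time_delay((-1.0, 1.0, 0.0), mic1, mic2)
tau_b = time_delay((-1.0, 0.0, 1.0), mic1, mic2)
print(tau_center, tau_a, tau_b)
```

Because one microphone pair leaves this ambiguity, the localization methods on the following slides combine several pairs.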

  23. Principle of Least Commitment: "Delay decisions as long as possible". Example: (figure) [Marr 1982; Russell & Norvig 1995]

  24. Localization by Beamforming. Pipeline: mic 1 to mic 4 signals -> prefilter -> delay -> sum -> energy -> find peak -> (θ, φ). Delays (shifts) each signal for each candidate location; makes decision late in pipeline ("principle of least commitment"). Accurate, NOT efficient [Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]
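A toy version of the delay, sum, energy, find-peak chain is sketched below. It assumes integer sample delays, omits the prefilter, and uses a hypothetical lookup `candidate_delays` from candidate location to per-microphone delays; none of these details come from the slides.

```python
def steered_energy(signals, delays):
    """Advance each signal by its delay (in samples), sum across microphones,
    and return the energy of the summed signal."""
    n = len(signals[0])
    energy = 0.0
    for t in range(n):
        total = 0.0
        for signal, delay in zip(signals, delays):
            index = t + delay
            if 0 <= index < n:
                total += signal[index]
        energy += total * total
    return energy


def beamform(signals, candidate_delays):
    """Return the candidate location whose delays give the highest energy."""
    return max(candidate_delays,
               key=lambda location: steered_energy(signals, candidate_delays[location]))


pulse = [0, 0, 1, 2, 1, 0, 0, 0]
late_pulse = [0, 0, 0, 0, 1, 2, 1, 0]  # same pulse arriving 2 samples later
signals = [pulse, late_pulse]
candidate_delays = {"A": (0, 2), "B": (0, 0), "C": (0, -2)}
energies = {loc: steered_energy(signals, d) for loc, d in candidate_delays.items()}
best = beamform(signals, candidate_delays)
print(energies, best)
```

Every candidate requires a pass over the raw signals, which illustrates why this approach is accurate but costly.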

  25. Localization by Time-Delay Estimation (TDE). Pipeline: mic signals -> prefilter -> correlate (per microphone pair) -> find peak -> intersect (may be no intersection) -> (θ, φ). Cross-correlation computed once for each microphone pair; decision is made early. Efficient, NOT accurate [Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]
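The correlate and find-peak steps for a single microphone pair might look like the sketch below. It assumes equal-length signals, integer lags, and no prefilter; the function names are hypothetical. The intersection step across pairs is not shown.

```python
def cross_correlation(a, b, max_lag):
    """Return {lag: sum over t of a[t] * b[t + lag]} for lags in [-max_lag, max_lag]."""
    n = len(a)
    return {lag: sum(a[t] * b[t + lag] for t in range(n) if 0 <= t + lag < n)
            for lag in range(-max_lag, max_lag + 1)}


def estimate_delay(a, b, max_lag):
    """Commit to the single lag at which the cross-correlation peaks."""
    correlation = cross_correlation(a, b, max_lag)
    return max(correlation, key=correlation.get)


pulse = [0, 0, 1, 2, 1, 0, 0, 0]
late_pulse = [0, 0, 0, 0, 1, 2, 1, 0]  # same pulse arriving 2 samples later
correlation = cross_correlation(pulse, late_pulse, max_lag=3)
delay = estimate_delay(pulse, late_pulse, max_lag=3)
print(correlation, delay)
```

Keeping only the peak discards the rest of the correlation function, which is the early decision the slide refers to.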

  26. Localization by Hemisphere Sampling. Pipeline: mic signals -> prefilter -> correlate (per microphone pair) -> map to common coordinate system (sampled locus) -> sum (final sampled locus) -> temporal smoothing -> find peak -> (θ, φ). Efficient, accurate (but restricted to compact arrays) [Birchfield & Gillmor 2001]

  27. Localization by Accumulated Correlation. Pipeline: mic signals -> prefilter -> correlate (per microphone pair) -> map to common coordinate system (sampled locus) -> sum (final sampled locus) -> temporal smoothing -> find peak -> (θ, φ). Efficient, accurate [Birchfield & Gillmor 2002]

  28. Accumulated Correlation Algorithm. Figure labels: microphone, candidate location. pair 1 + pair 2 + ... = likelihood
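One way to read the sum over pairs is sketched below: each pair's cross-correlation is computed once, and every candidate location then looks up the value at the lag it predicts for that pair and adds the results. The integer lags, the hypothetical `pair_delays` table, and the omission of prefiltering and temporal smoothing are simplifying assumptions.

```python
def correlation_table(a, b, max_lag):
    """Cross-correlation of one microphone pair, computed once for all lags."""
    n = len(a)
    return {lag: sum(a[t] * b[t + lag] for t in range(n) if 0 <= t + lag < n)
            for lag in range(-max_lag, max_lag + 1)}


def accumulated_correlation(signals, pair_delays, max_lag):
    """pair_delays maps candidate -> {(i, j): predicted lag in samples}.
    Returns candidate -> sum over pairs of the correlation at the predicted lag."""
    pairs = {pair for delays in pair_delays.values() for pair in delays}
    tables = {(i, j): correlation_table(signals[i], signals[j], max_lag)
              for (i, j) in pairs}
    return {candidate: sum(tables[pair][lag] for pair, lag in delays.items())
            for candidate, delays in pair_delays.items()}


signals = [
    [0, 0, 1, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 1, 0, 0],  # 1 sample later
    [0, 0, 0, 0, 1, 2, 1, 0],  # 2 samples later
]
pair_delays = {
    "near": {(0, 1): 1, (0, 2): 2, (1, 2): 1},
    "far": {(0, 1): 0, (0, 2): 0, (1, 2): 0},
}
scores = accumulated_correlation(signals, pair_delays, max_lag=3)
best = max(scores, key=scores.get)
print(scores, best)
```

The per-candidate work is only a table lookup and a sum, while the peak is still chosen after all pairs have been combined.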

  29. Comparison. Axes: accurate, efficient. Labels: energy, similarity. Methods: Beamforming, Bayesian, Zero energy, Acc corr, Hem samp, TDE (formulas not included in transcript)

  30. Unifying framework. Axes: accurate, efficient

  31. Integration limits: Beamforming, Bayesian, Zero energy, Accumulated correlation, Hemisphere sampling, Time-delay estimation

  32. Compact Microphone Array. Figure labels: microphone, sampled hemisphere, d = 15 cm

  33. Results on compact array: pan and tilt, without PHAT prefilter and with PHAT prefilter

  34. More Comparison: Beamforming; Accumulated Correlation [Birchfield & Gillmor 2002]; Hemisphere Sampling [Birchfield & Gillmor 2001]

  35. Results on distributed array

  36. Computational efficiency Computing time per window (ms) (600x faster) (50x faster)

  37. Simultaneous Speakers + =

  38. Detecting Noise Sources background noise source

  39. Connection with Stereo “Multi-baseline stereo” [Okutomi & Kanade 1993]

  40. Conclusion
      • Spatial sensing achieved by arrays of visual and auditory sensors
      • Stereo vision
        • match visual signals from multiple cameras
        • recent breakthrough: multiway-cut
        • limitations of multiway-cut
      • Acoustic localization
        • match acoustic signals from multiple microphones
        • recent breakthrough: accumulated correlation
        • connection with multi-baseline stereo
