
“Pixels that Sound”: Find pixels that correspond (correlate!?) to sound

Kidron, Schechner, Elad, CVPR 2005.


Presentation Transcript


  1. “Pixels that Sound”: find pixels that correspond (correlate!?) to sound. Kidron, Schechner, Elad, CVPR 2005

  2. Audio-Visual Analysis: Applications
     • Lip reading, detection of lips (or person): Slaney, Covell (2000); Bregler, Konig (1994)
     • Analysis and synthesis of music from motion: Murphy, Andersen, Jensen (2003)
     • Source separation based on vision: Li, Dimitrova, Li, Sethi (2003); Smaragdis, Casey (2003); Nock, Iyengar, Neti (2002); Fisher, Darrell, Freeman, Viola (2001); Hershey, Movellan (1999)
     • Tracking: Vermaak, Gangnet, Blake, Pérez (2001)
     • Biological systems: Gutfreund, Zheng, Knudsen (2002)

  3. Problem: Different Modalities (audio-visual analysis with a microphone and a camera)
     • Audio data: 44.1 kHz, few bands, not stereophonic
     • Visual data: 25 frames/sec, each frame 576 x 720 pixels
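To make the mismatch between the two modalities concrete, the figures quoted on this slide can be turned into per-frame counts. This is a minimal sketch; the variable names are illustrative and do not come from the talk.

```python
# Figures quoted on the slide.
AUDIO_RATE_HZ = 44_100               # audio sampling rate
VIDEO_FPS = 25                       # video frame rate
FRAME_HEIGHT, FRAME_WIDTH = 576, 720

# Audio samples that arrive during one video frame.
audio_samples_per_frame = AUDIO_RATE_HZ // VIDEO_FPS

# Pixel values in one video frame.
pixels_per_frame = FRAME_HEIGHT * FRAME_WIDTH

print(audio_samples_per_frame)  # 1764
print(pixels_per_frame)         # 414720
```

Each video frame therefore carries a few hundred thousand pixel values, while the audio is summarized by only a few bands per frame.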

  4. Previous Work
     • Pointwise correlation (not typical): Nock, Iyengar, Neti (2002); Hershey, Movellan (1999)
     • Cluster of pixels, linear superposition:
       • Canonical Correlation Analysis (CCA): Smaragdis, Casey (2003); Li, Dimitrova, Li, Sethi (2003); Slaney, Covell (2000). Ill-posed (lack of data)
       • Mutual Information (MI): Fisher et al. (2001); Cutler, Davis (2000); Bregler, Konig (1994). Highly complex

  5. CCA [diagram: Video (Pixel #1, Pixel #2, Pixel #3) and Audio (Band #1, Band #2), each with an optimal projection; optimal visual components]

  6. Visual Projection (v)
     • Video features: pixel intensities, transform coefficients (wavelet), image differences
     • Projection to a 1D variable [figure: example pixel values]

  7. Audio Projection (a)
     • Audio features: average energy per frame, transform coefficients per frame
     • Projection to a 1D variable
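One of the audio features listed on this slide, the average energy per frame, could be computed along the following lines. This is a sketch under stated assumptions: the function name, the synthetic test signal and the choice of mean squared amplitude as "energy" are illustrative, and the sampling rate and frame rate are the ones quoted earlier in the talk.

```python
import numpy as np

def average_energy_per_frame(signal, sample_rate=44_100, fps=25):
    """Return the mean squared amplitude of `signal` within each video frame."""
    samples_per_frame = sample_rate // fps
    n_frames = len(signal) // samples_per_frame
    # Drop the incomplete trailing frame, then put one frame per row.
    frames = np.asarray(signal[: n_frames * samples_per_frame], dtype=float)
    frames = frames.reshape(n_frames, samples_per_frame)
    return np.mean(frames ** 2, axis=1)

# Synthetic one-second signal: silence, then a 440 Hz tone in the second half.
t = np.arange(44_100) / 44_100
signal = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
energy = average_energy_per_frame(signal)
print(energy.shape)  # (25,)
```

The result is one number per video frame, which is the kind of time-aligned 1D audio variable the projection step needs.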

  8. Canonical Correlation (audio and video)
     • Representation: random variables (time dependent)
     • Projections (per time window)
     • Correlation coefficient

  9. CCA Formulation
     • CCA yields an eigenvalue problem: Knutsson, Borga, Landelius (1995)
     • Canonical correlation: equivalent to the largest eigenvalue
     • Projections: equivalent to the corresponding eigenvectors
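The equations on this slide did not survive in the transcript. For orientation, the textbook form of CCA is sketched below; the notation (w_v, w_a, C_vv, C_aa, C_va) is assumed here and may differ from the symbols used on the original slide.

```latex
% Assumed notation: v and a are zero-mean video and audio feature vectors,
% w_v and w_a are the projection weights, and C_{vv}, C_{aa}, C_{va} = C_{av}^{T}
% are the corresponding covariance matrices.
\[
\rho = \max_{w_v,\, w_a}
  \frac{w_v^{T} C_{va}\, w_a}
       {\sqrt{\left(w_v^{T} C_{vv}\, w_v\right)\left(w_a^{T} C_{aa}\, w_a\right)}}
\]
% Setting the derivatives to zero gives the eigenvalue problems
\[
C_{vv}^{-1} C_{va} C_{aa}^{-1} C_{av}\, w_v = \rho^{2} w_v ,
\qquad
C_{aa}^{-1} C_{av} C_{vv}^{-1} C_{va}\, w_a = \rho^{2} w_a
\]
```

In this form the largest eigenvalue is the squared canonical correlation, and the projections are the matching eigenvectors. The form also requires inverting the covariance matrices.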

  10. Visual Data [figure axes: t (frames), spatial location (pixel intensities)]

  11. Rank Deficiency [figure axes: t (frames), spatial location (pixel intensities)]

  12. Estimation of Covariance: rank deficient
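The rank deficiency can be checked numerically: a covariance matrix estimated from fewer frames than pixels cannot have full rank. The sketch below uses random data as a stand-in for video; the sizes are hypothetical and much smaller than a real frame.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 200   # hypothetical sizes: far fewer frames than pixels

# Rows are frames, columns are pixel features.
V = rng.standard_normal((n_frames, n_pixels))
V_centered = V - V.mean(axis=0)

# Empirical covariance of the pixel features (n_pixels x n_pixels).
C_vv = V_centered.T @ V_centered / (n_frames - 1)

rank = np.linalg.matrix_rank(C_vv)
print(C_vv.shape, rank)  # the rank cannot exceed n_frames - 1
```

Because the rank is bounded by the number of frames, the 200 x 200 matrix here is singular and has no ordinary inverse.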

  13. Ill-Posedness: impossible to invert!
      • Prior solutions:
        • Use many more frames → poor temporal resolution.
        • Aggressive spatial pruning → poor spatial resolution.
        • Trivial regularization

  14. A General Problem
      • Large number of weights, small amount of data
      • The problem is ILL-POSED
      • Overfitting is likely

  15. An Equivalent Problem: maximizing (the correlation) is equivalent to minimizing (an objective function G)

  16. Single Audio Band
      • A has a single column, and …
      • Known data
      • Minimizing … (the denominator is non-zero)
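The expressions on these two slides were lost in the transcript. One objective that is consistent with the surviving text (a ratio whose denominator is non-zero, and whose minimum corresponds to full correlation) is sketched below. It is a reconstruction under assumed notation, not a quotation of the slides.

```latex
% Assumed notation: the rows of V and A hold the zero-mean video and audio
% features of each frame; w_v and w_a are the projection weights.
\[
G(w_v, w_a) =
  \frac{\lVert V w_v - A w_a \rVert_2^{2}}
       {\lVert V w_v \rVert_2^{2} + \lVert A w_a \rVert_2^{2}}
\]
% Single audio band (assumption: A = a has one column and w_a = 1):
\[
G(w_v) =
  \frac{\lVert V w_v - a \rVert_2^{2}}
       {\lVert V w_v \rVert_2^{2} + \lVert a \rVert_2^{2}},
\qquad
G(w_v) = 0 \iff V w_v = a
\]
```

Expanding the numerator shows that G is at least 1 minus the correlation of the two projections, with equality when the projections have equal norm, so minimizing G and maximizing the correlation select the same directions. The denominator cannot vanish as long as a is not identically zero.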

  17. Full correlation if a = V · (weights), where a = [a(1), a(2), …, a(t_i), …, a(30)] is the audio over time: an underdetermined system!
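Why the system is underdetermined can be seen on a toy example: with 30 frames (as on the slide) and more pixel weights than frames, many different weight vectors reproduce the audio exactly. The pixel count and the random data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_pixels = 30, 100   # 30 frames as on the slide; pixel count is hypothetical
V = rng.standard_normal((n_frames, n_pixels))
a = rng.standard_normal(n_frames)

# One exact solution of V w = a (least squares returns the minimum-norm one).
w0, *_ = np.linalg.lstsq(V, a, rcond=None)

# Rows of Vt beyond the rank of V span its null space.
_, _, Vt = np.linalg.svd(V)
null_vector = Vt[-1]

# Adding any null-space vector gives another exact solution.
w1 = w0 + 5.0 * null_vector

print(np.allclose(V @ w0, a), np.allclose(V @ w1, a))  # True True
```

Since both weight vectors give full correlation, an extra criterion is needed to choose among the solutions.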

  18. “Out of clutter, find simplicity. From discord, find harmony.” (Albert Einstein) [figure: detected correlated pixels]

  19. Sparse Solution: minimum ℓ0-norm
      • Non-convex
      • Exponential complexity

  20. The ℓ1-norm criterion
      • Minimum ℓ1-norm: sparse in common situations, Donoho, Elad (2005)
      • Convex
      • Polynomial complexity
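A minimum ℓ1-norm solution of an underdetermined system can be found by linear programming, which is what makes the criterion convex and polynomial. The sketch below assumes the constraint has the form V w = a and uses SciPy's `linprog`; the function name, sizes and synthetic sparse weights are illustrative and not taken from the talk.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(V, a):
    """Solve min ||w||_1 subject to V w = a as a linear program."""
    n = V.shape[1]
    # Split w = p - q with p, q >= 0; at the optimum ||w||_1 = sum(p) + sum(q).
    cost = np.ones(2 * n)
    A_eq = np.hstack([V, -V])
    result = linprog(cost, A_eq=A_eq, b_eq=a, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    p, q = result.x[:n], result.x[n:]
    return p - q

rng = np.random.default_rng(2)
n_frames, n_pixels = 30, 100            # hypothetical sizes
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[7, 42, 90]] = [1.5, -2.0, 0.8]  # only three "sounding" pixels
a = V @ w_true

w_l1 = min_l1_solution(V, a)
print(np.sum(np.abs(w_l1) > 1e-6))      # number of non-negligible weights
```

The split into non-negative parts p and q is a standard way to express an absolute-value objective with linear constraints only.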

  21. The Minimum Norm Solution
      • Solving using the minimum ℓ2-norm (pseudo-inverse, SVD, QR)
      • Energy spread
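The energy spread of the minimum ℓ2-norm solution shows up on a small synthetic case: even when the audio is generated by only three pixels, the pseudo-inverse distributes weight over many of them. Sizes and data are hypothetical, and the system is assumed to have the form V w = a.

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_pixels = 30, 100            # hypothetical sizes
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[7, 42, 90]] = [1.5, -2.0, 0.8]  # only three "sounding" pixels
a = V @ w_true

# Minimum l2-norm solution of V w = a via the pseudo-inverse.
w_l2 = np.linalg.pinv(V) @ a

print(np.allclose(V @ w_l2, a))               # True
print(np.count_nonzero(np.abs(w_l2) > 1e-6))  # many more than 3 weights are active
```

The solution fits the audio exactly and has a smaller ℓ2-norm than the true sparse weights, but it no longer localizes the source.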

  22. Audio-visual events: no parameters to tweak
      • Maximum correlation: eigenproblem
      • Minimum objective function G
      • Fully correlated, sparse
      • Linear programming: polynomial

  23. Multiple Audio Bands: Solution
      • The optimization problem: …
      • Convex
      • Linear
      • Non-convex constraint: ℓ1-ball

  24. Multiple Audio Bands
      • Optimization over each face (S1, S2, S3, S4) is: …
      • Each face: linear programming
      • No parameters to tweak
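The face-by-face strategy can be sketched as follows. The exact optimization problem is not preserved in the transcript, so this example assumes it has the form: minimize the ℓ1-norm of the video weights subject to V w_v = A w_a, with the audio weights on the unit ℓ1-ball. On each face of that ball the signs of the audio weights are fixed, which turns the problem into a linear program; with two bands there are four faces, matching S1 to S4 on the slide. All names, sizes and data are hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def solve_over_faces(V, A):
    """Assumed problem: min ||w_v||_1  s.t.  V w_v = A w_a,  ||w_a||_1 = 1."""
    n_frames, n_pix = V.shape
    n_bands = A.shape[1]
    # Only the video weights are penalized.
    cost = np.concatenate([np.ones(2 * n_pix), np.zeros(n_bands)])
    best = None
    # Each sign pattern of the audio weights is one face of the l1-ball.
    for signs in itertools.product([1.0, -1.0], repeat=n_bands):
        s = np.array(signs)
        # Variables: p, q >= 0 with w_v = p - q, and u >= 0 with w_a = s * u.
        top = np.hstack([V, -V, -A * s])                  # V w_v - A w_a = 0
        bottom = np.concatenate([np.zeros(2 * n_pix), np.ones(n_bands)])  # sum(u) = 1
        A_eq = np.vstack([top, bottom])
        b_eq = np.concatenate([np.zeros(n_frames), [1.0]])
        res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
        if res.success and (best is None or res.fun < best[0]):
            w_v = res.x[:n_pix] - res.x[n_pix:2 * n_pix]
            w_a = s * res.x[2 * n_pix:]
            best = (res.fun, w_v, w_a)
    return best

rng = np.random.default_rng(4)
V = rng.standard_normal((30, 100))   # 30 frames, 100 pixels
A = rng.standard_normal((30, 2))     # 2 audio bands, so 4 faces
best_cost, w_v, w_a = solve_over_faces(V, A)
print(np.abs(w_a).sum())             # close to 1: w_a lies on the l1-ball
```

The number of faces doubles with every added band, which stays manageable because the audio has only a few bands.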

  25. Sharp & Dynamic, Despite Distraction [figure: Frames 9, 42, 68, 115, 146, 169]

  26. Performing in Audio Noise [figure: Frames 51, 106, 83, 177]
      • Sparse
      • Localization on the proper elements
      • False alarm: temporally inconsistent
      • Handling dynamics

  27. ℓ2-norm: Energy Spread [figure: Frame 146, Frame 83; Movie #1, Movie #2]

  28. ℓ1-norm: Localization [figure: Frame 146, Frame 83; Movie #1, Movie #2]

  29. The “Chorus Ambiguity”
      • Synchronized talk: who’s talking?
      • Possible solutions: left, right, both
      • Not unique (ambiguous)

  30. The “Chorus Ambiguity” [figure: two panels, one per norm; axes: feature 1, feature 2; label: Both]
