Audiovisual Display and Audiovisual Recognition in Free Field Environments: Caves, Cars, and Critical Bands

Mark Hasegawa-Johnson, January 29, 2003
Collaborators: Bowon Lee, Camille Goudeseune, Zhinian Jing, Danfeng Li, Thomas Huang, Stephen Levinson


Presentation Transcript


  1. Audiovisual Display and Audiovisual Recognition in Free Field Environments: Caves, Cars, and Critical Bands Mark Hasegawa-Johnson January 29, 2003 Collaborators: Bowon Lee, Camille Goudeseune, Zhinian Jing, Danfeng Li, Thomas Huang, Stephen Levinson

  2. What is “free-field audio?”

  3. What is “free-field audio?”

  4. Problems of free-field audio • CONTROL: acoustics of the room • CONTROL: where the user stands/sits • CONTROL: background noise

  5. Topic #1: Audio Display for a Six-Walled Virtual Reality Theater (Beckman CUBE)

  6. Audio Demo: the Plywood CUBE

  7. What does the User Hear?

  8. Measuring the Impulse Response of the Room: 2×2 Locations

  9. Measurement of the Impulse Response: Starter Pistol SNR

  10. Measurements: Gun-Related Transfer Function

  11. What do we WANT the User to Hear?

  12. Solution: Regularized Semi-Inversion of Matrix Frequency Response
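
The slide names the inversion but not its exact form. A minimal sketch, assuming "regularized semi-inversion" means a Tikhonov-regularized least-squares inverse G(ω) = (H(ω)ᴴH(ω) + βI)⁻¹H(ω)ᴴ computed independently at each frequency bin of a 2 × 2 matrix response, where H[i][j] is taken (hypothetically) as the transfer function from loudspeaker j to listening position i. The function names and the value of β are illustrative assumptions, not the talk's code.

```python
# Hypothetical sketch: per-frequency-bin Tikhonov-regularized inverse of a
# 2x2 matrix frequency response H(w). H[i][j] is assumed to be the transfer
# function from loudspeaker j to listening position i at one bin.

def conj_transpose(m):
    """Hermitian (conjugate) transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def regularized_inverse(h, beta):
    """G = (H^H H + beta I)^-1 H^H; beta > 0 caps the gain near spectral nulls."""
    hh = conj_transpose(h)
    a = matmul(hh, h)
    a = [[a[i][j] + (beta if i == j else 0.0) for j in range(2)] for i in range(2)]
    return matmul(inv2(a), hh)


def filters_for_all_bins(h_bins, beta):
    """Apply the regularized inverse independently at every frequency bin."""
    return [regularized_inverse(h, beta) for h in h_bins]
```

With β = 0 and an invertible H this reduces to the exact inverse. The regularization matters at bins where one path is weak: for H = diag(0.01, 1), the exact inverse would apply a gain of 100 to the first channel, while β = 0.01 limits it to 0.01/0.0101 ≈ 0.99.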

  13. Image Source Method for Simulating the Room Response
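
The image source method replaces each wall reflection by a mirrored copy of the source, so the room response becomes a sum of delayed, attenuated impulses. A minimal sketch for an empty rectangular room, assuming one frequency-independent reflection coefficient `beta` for all six walls, nearest-sample delays, and 1/(4πd) spherical spreading; the function name and parameters are illustrative, not taken from the talk.

```python
import itertools
import math


def image_source_rir(room, src, mic, beta, fs, c=343.0, max_index=2):
    """Impulse response of a rectangular (shoebox) room by the image source method.

    room: (Lx, Ly, Lz) in metres; src, mic: (x, y, z) inside the room, src != mic.
    beta: one frequency-independent reflection coefficient for all six walls.
    max_index: image indices n run over -max_index..max_index on each axis;
    images outside that range are dropped, so the late tail is incomplete.
    """
    taps = {}  # sample index -> summed amplitude
    indices = range(-max_index, max_index + 1)
    for ns in itertools.product(indices, repeat=3):
        for qs in itertools.product((0, 1), repeat=3):
            dist2 = 0.0
            reflections = 0
            for n, q, length, s, m in zip(ns, qs, room, src, mic):
                # Image coordinate: q = 1 mirrors the source in the wall at 0,
                # and each unit of n shifts it by one room period 2 * length.
                image = (1 - 2 * q) * s + 2 * n * length
                dist2 += (image - m) ** 2
                # Number of wall bounces along this axis for this image.
                reflections += abs(n - q) + abs(n)
            dist = math.sqrt(dist2)
            k = round(dist / c * fs)  # nearest-sample arrival time
            taps[k] = taps.get(k, 0.0) + beta ** reflections / (4 * math.pi * dist)
    h = [0.0] * (max(taps) + 1)
    for k, amp in taps.items():
        h[k] += amp
    return h
```

For a 5 × 4 × 3 m room with source and microphone 1 m apart at fs = 16 kHz, the first nonzero tap is the direct path at sample round(16000/343) = 47, with amplitude 1/(4π).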

  14. Comparison of Measured and Simulated Room Responses

  15. One More Problem: Image Source Method only accurate for t < 100 ms

  16. Heuristic solution: Window the simulation with a decaying window
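
The slide does not give the window shape. One plausible choice, assumed here, leaves the simulation untouched for t ≤ t0 (for example the first 100 ms, where the preceding slide says the image source method is accurate) and tapers it by exp(−(t − t0)/τ) afterwards. Tying τ to a reverberation time uses exp(−RT60/τ) = 10⁻³ (a 60 dB amplitude drop), so τ = RT60 / ln 1000. The names `decay_window` and `tau_from_rt60` are hypothetical.

```python
import math


def tau_from_rt60(rt60):
    """Amplitude time constant whose exponential decay drops 60 dB in rt60 s.

    exp(-rt60 / tau) = 10**-3  =>  tau = rt60 / ln(1000).
    """
    return rt60 / math.log(1000.0)


def decay_window(h, fs, t0, tau):
    """Keep h unchanged up to t0 seconds, then taper it by exp(-(t - t0) / tau)."""
    out = []
    for n, x in enumerate(h):
        t = n / fs
        w = 1.0 if t <= t0 else math.exp(-(t - t0) / tau)
        out.append(x * w)
    return out
```

For an arbitrary example RT60 of 0.5 s, `tau_from_rt60` gives τ ≈ 72 ms.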

  17. Results: 12 dB Dereverberation, 15 dB Early Echo Suppression

  18. Topic #2: Audiovisual Speech Recognition in a Moving Car

  19. Solution: Two Cameras, Two Microphones

  20. Lip Motion Tracking (Tao and Huang, 1999)

  21. Audiovisual Speech Recognition: Audio and Visual Integration (Chu and Huang, 2002)
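
The slide cites Chu and Huang (2002) for audio-visual integration but the rule itself is not in this transcript. As an illustration only, not their method, a common late-integration baseline scores each vocabulary word by a weighted sum of its audio and video log-likelihoods, λ log p_A + (1 − λ) log p_V, lowering λ as the audio gets noisier. The function names and the toy scores are hypothetical.

```python
def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Weighted sum of per-word audio and video log-likelihoods.

    audio_weight (lambda) is in [0, 1]; the video stream gets 1 - lambda.
    How lambda is chosen (e.g. from an SNR estimate) is left as an assumption.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    return {word: audio_weight * audio_ll[word] + (1.0 - audio_weight) * video_ll[word]
            for word in audio_ll}


def recognize(audio_ll, video_ll, audio_weight):
    """Return the word with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, video_ll, audio_weight)
    return max(fused, key=fused.get)
```

With audio scores {"one": −10, "two": −12} and video scores {"one": −9, "two": −5}, audio alone (λ = 1) picks "one", while equal weighting (λ = 0.5) lets the video evidence flip the decision to "two".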

  22. AVSR Results

  23. Two-Microphone Beamforming using an LSE Beamformer
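
The slide does not say what the LSE beamformer is fitted against. One standard least-squares-error design, assumed here, fits an FIR filter to each microphone so that the filter-and-sum output y[n] = Σₖ w₁[k]x₁[n−k] + w₂[k]x₂[n−k] best matches a reference d[n] (for example, a close-talking recording made during calibration). The sketch solves the normal equations in pure Python; in practice `numpy.linalg.lstsq` would do the same job. All names and the small `ridge` term are illustrative assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lse_beamformer(x1, x2, d, taps, ridge=1e-9):
    """FIR weights (w1, w2) minimizing sum_n (d[n] - y[n])**2, where
    y[n] = sum_k w1[k] x1[n-k] + w2[k] x2[n-k]; samples before n = 0 count as 0.
    """
    rows = []
    for n in range(len(d)):
        rows.append([x1[n - k] if n >= k else 0.0 for k in range(taps)] +
                    [x2[n - k] if n >= k else 0.0 for k in range(taps)])
    p = 2 * taps
    # Normal equations (A^T A + ridge * I) w = A^T d; ridge guards against singularity.
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    atd = [sum(r[i] * dn for r, dn in zip(rows, d)) for i in range(p)]
    w = solve(ata, atd)
    return w[:taps], w[taps:]


def apply_beamformer(x1, x2, w1, w2):
    """Filter-and-sum output y[n] with the same zero initial conditions."""
    taps = len(w1)
    return [sum(w1[k] * x1[n - k] + w2[k] * x2[n - k] for k in range(taps) if n >= k)
            for n in range(len(x1))]
```

If the reference happens to be exactly 0.7·x₁[n] + 0.3·x₂[n−1], two taps per channel recover w₁ ≈ [0.7, 0] and w₂ ≈ [0, 0.3]; with real recordings the fit is only approximate, and the filter length trades residual error against overfitting the calibration data.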

  24. Recognition Features based on Auditory Scene Analysis: 1. Bandpass Filters on a Semilog Freq Scale (Simulate the Inner Ear)
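
The slide specifies only "bandpass filters on a semilog frequency scale". As an assumption, this sketch spaces the center frequencies evenly on the mel scale, m = 2595 log₁₀(1 + f/700), which is roughly linear at low frequencies and logarithmic at high frequencies, and uses one second-order bandpass per band with coefficients from the RBJ Audio EQ Cookbook (0 dB peak gain). A gammatone filterbank would be the more usual inner-ear model; the band count, range, and Q are hypothetical.

```python
import math


def hz_to_mel(f):
    """Mel scale: roughly linear at low frequencies, logarithmic at high ones."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def semilog_centers(n_bands, f_lo, f_hi):
    """n_bands center frequencies equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_bands - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands)]


def bandpass(x, f0, fs, q):
    """Second-order bandpass (RBJ cookbook, 0 dB peak gain) centered at f0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 = 0 for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0  # previous inputs/outputs, zero initial state
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def filterbank(x, fs, n_bands=16, f_lo=100.0, f_hi=4000.0, q=4.0):
    """One bandpass-filtered copy of x per mel-spaced center frequency."""
    return [bandpass(x, fc, fs, q) for fc in semilog_centers(n_bands, f_lo, f_hi)]
```

A 1 kHz tone through the 1 kHz band passes with unit gain once the start-up transient dies out, while a constant (DC) input is rejected, since b0 + b2 = 0.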

  25. Recognition Features based on Auditory Scene Analysis: 2. Correlogram

  26. Recognition Features based on Auditory Scene Analysis: Correlogram (figure axes: band center frequency, sub-band pitch period)
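
The correlogram slide plots band center frequency against autocorrelation lag, with a pitch period read off each sub-band. A minimal sketch, assuming each band signal comes from some bandpass filterbank, using half-wave rectification as a crude hair-cell stand-in, and taking each band's pitch period as the lag of its largest autocorrelation peak within a plausible pitch range. These choices and all names are assumptions, since the slides do not give the details.

```python
def half_wave_rectify(x):
    """Crude hair-cell stand-in: keep only positive excursions."""
    return [v if v > 0.0 else 0.0 for v in x]


def autocorrelation(x, max_lag):
    """Unnormalized short-time autocorrelation r[k] = sum_n x[n] x[n-k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def correlogram(bands, max_lag):
    """One autocorrelation row per band (rows: bands, columns: lag)."""
    return [autocorrelation(half_wave_rectify(b), max_lag) for b in bands]


def subband_pitch_period(acf, min_lag, max_lag):
    """Lag of the largest autocorrelation value in [min_lag, max_lag] and its
    normalized height r[k] / r[0] (close to 1 for a strongly periodic band)."""
    best = max(range(min_lag, max_lag + 1), key=lambda k: acf[k])
    strength = acf[best] / acf[0] if acf[0] > 0.0 else 0.0
    return best, strength
```

For a band that repeats every 40 samples (200 Hz at fs = 8 kHz), searching lags 20 to 100 returns a period of 40; because the autocorrelation is unnormalized, the peak at lag 40 outweighs the one at lag 80.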

  27. Recognition Features based on Auditory Scene Analysis: 3. Periodicity-Weighted Spectrum (Jing and Hasegawa-Johnson, 2001)

  28. Digit Recognition Results

  29. Conclusions • Image source simulated room response • 12 dB dereverberation, 15 dB early echo suppression • Speech recognition in a car: Two Cameras, Two Microphones. • Speech Rec w/ Auditory Scene Analysis: error rate halved at 0 dB.
