Leakage Problems in Array Speech Processing Julien Bourgeois Martigny - September 2003


Presentation Transcript


  1. Leakage Problems in Array Speech Processing. Julien Bourgeois, Martigny, September 2003.

  2. Context of the work. Several simultaneous speakers (sources s1(t), s2(t)) are spatially located, while the road noise is spatially diffuse. The microphone array (x1(t), ..., x4(t)) gets mixtures of the sources and the noise. The array processor must recover clean individual speech flows, i.e. separate and denoise the sources.

  3. Beamforming: minimization of the output power with unit gain in the direction of arrival (DOA) of the target. (+) Robust against noise; the sources do not have to be active. (-) The array geometry and the target location must be known, and far-field propagation is assumed.
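The slide does not name a specific beamformer. Assuming it refers to the standard minimum-variance formulation with a single unit-gain constraint, the problem and its well-known closed-form solution can be written as follows. The symbols are assumptions introduced here: w is the weight vector, R_xx the spatial covariance of the microphone signals, and d(θ0) the steering vector of the assumed target DOA.

```latex
\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}_{xx} \mathbf{w}
\quad \text{subject to} \quad
\mathbf{w}^{H} \mathbf{d}(\theta_0) = 1,
\qquad
\mathbf{w}_{\mathrm{opt}} =
\frac{\mathbf{R}_{xx}^{-1} \mathbf{d}(\theta_0)}
     {\mathbf{d}(\theta_0)^{H} \mathbf{R}_{xx}^{-1} \mathbf{d}(\theta_0)}
```

The constraint protects only the exact direction θ0; every other direction, including the true target direction if it differs from θ0, is treated as something to minimize.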

  4. Leakage Problem (Beamforming). With echo or a source location error, the source signal arrives from a direction other than the constrained DOA. The beamformer can then produce a zero output while still satisfying the constraint, and it does so because it minimizes the output power. [Diagram: inputs x1 ... xN, weights +1 and -1 under the constraint, output 0.] In a reverberant environment, or with a target location error, beamforming can cancel the target signal.
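A minimal two-microphone sketch of this cancellation, under assumptions not stated on the slide: a single frequency, an assumed steering vector (1, 1), and an actual steering vector that differs by a small phase error on the second microphone. The function names and the value of `phase_error` are hypothetical. The weights satisfy the unit-gain constraint exactly and still null the real target.

```python
import cmath

def leaking_weights(phase_error):
    """Conjugated weights (c1, c2) such that
    c1 + c2 = 1                           (unit gain toward the assumed DOA)
    c1 + c2 * exp(-1j * phase_error) = 0  (zero gain toward the actual DOA).
    Requires a nonzero phase error."""
    actual = cmath.exp(-1j * phase_error)
    c2 = 1.0 / (1.0 - actual)
    c1 = 1.0 - c2
    return c1, c2

def gain(weights, steering):
    """Beamformer response to a plane wave with the given steering vector."""
    return sum(c * a for c, a in zip(weights, steering))

phase_error = 0.3  # radians, hypothetical mismatch on microphone 2
weights = leaking_weights(phase_error)

assumed_steering = (1.0, 1.0)
actual_steering = (1.0, cmath.exp(-1j * phase_error))

constraint_gain = gain(weights, assumed_steering)  # gain at the constrained DOA
target_gain = gain(weights, actual_steering)       # gain at the true target DOA
```

Because zero output power is the global minimum, a power-minimizing adaptation that runs while the target is active is driven toward exactly this kind of solution.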

  5. Solution to the Leakage Problem. Do not adapt the beamformer when the target is active (the speaker is speaking). When the target is off, minimizing the output power cancels the noise sources, and the constraint (gain +1 toward the target) should preserve good behavior for the target. A beamformer therefore needs a voice activity detector (VAD) to control its adaptation.

  6. VAD in an unknown noise field (M = 4 microphones). Estimate the target power PT with a delay-sum beam. Estimate the noise power PN with M-1 orthogonal beams. Voice activity detector: VAD(t) = PN(t)/PT(t), computed frame-wise.
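A rough sketch of this frame-wise ratio, under several assumptions the slide does not confirm: the microphone signals are already time-aligned toward the target (so the delay-sum beam is a plain average), the M-1 orthogonal beams are taken from the remaining rows of a 4x4 Hadamard matrix, and PN is the average power over those beams. The names `beam_power` and `vad_ratio` are hypothetical.

```python
import math
import random

# Row 0 is the delay-sum beam; rows 1-3 are orthogonal to it, so they block
# a signal that is identical on all four (time-aligned) microphones.
BEAMS = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def beam_power(frame, beam):
    """Mean output power of one unit-norm beam over a frame.
    frame: list of samples, each sample a list of 4 microphone values."""
    total = 0.0
    for sample in frame:
        y = sum(b * x for b, x in zip(beam, sample)) / 2.0  # 1/2 gives unit norm
        total += y * y
    return total / len(frame)

def vad_ratio(frame):
    """PN / PT for one frame: small when the target dominates."""
    p_target = beam_power(frame, BEAMS[0])
    p_noise = sum(beam_power(frame, b) for b in BEAMS[1:]) / 3.0
    return p_noise / max(p_target, 1e-12)

rng = random.Random(0)
# Target active: same sinusoid on every microphone plus weak independent noise.
speech_frame = [[math.sin(0.2 * n) + 0.1 * rng.gauss(0, 1) for _ in range(4)]
                for n in range(512)]
# Target silent: independent noise only.
noise_frame = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(512)]

speech_ratio = vad_ratio(speech_frame)
noise_ratio = vad_ratio(noise_frame)
```

With this convention a low ratio suggests target activity (freeze the adaptation) and a ratio near one suggests noise only (adapt). The decision threshold is not given in the slides and would have to be tuned.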

  7. Realistic scenario (road noise always present). Prior: the DOA of the target speaker. Three situations: noisy speech (freeze the adaptation), noisy jammer (adapt), noisy double talk (freeze). It can be difficult to discriminate between double-talk and talk situations.

  8. Leakage Problem (Beamforming). It is caused by: an echoic environment (such as a car); target location error; calibration error; a wrong propagation model (far-field). A solution: no adaptation during target activity (speech). This requires a voice activity detector and is a trade-off between noise tracking and robustness.

  9. Blind Source Separation: minimization of a dependence measure. (+) Only a statistical assumption on the sources (independence). (+) No prior on the array geometry or the source locations. (-) Ambiguities: permutation and scaling at each frequency. (-) Not robust against noise; all sources need to be active.

  10. Robust Blind Source Separation: Multiple Decorrelation. Find W such that the components of s = W^H x are decorrelated at several times t1, t2, ..., tK, i.e. such that Rss(tk) = W^H Rxx(tk) W is diagonal for k = 1,...,K. W is found by gradient descent and is constrained to unity gain.
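One common way to turn this joint-diagonalization requirement into a gradient-descent problem is to penalize the off-diagonal energy of the output covariances. The cost below is a sketch under that assumption; the slides do not state the exact criterion, step size, or the form of the unity-gain constraint. Here off(·) zeroes the diagonal of a matrix, ‖·‖_F is the Frobenius norm, and μ is a hypothetical step size.

```latex
J(\mathbf{W}) = \sum_{k=1}^{K}
\left\| \operatorname{off}\!\left( \mathbf{W}^{H} \mathbf{R}_{xx}(t_k)\, \mathbf{W} \right) \right\|_{F}^{2},
\qquad
\mathbf{W} \leftarrow \mathbf{W} - \mu \, \nabla_{\mathbf{W}^{*}} J(\mathbf{W})
```

Note that W = 0 trivially minimizes J, which is why some constraint or a suitable initialization is needed to keep the separator away from the zero solution.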

  11. Leakage Problem (BSS). [Figures: results for 2 microphones and 3 microphones, W initialized to identity.]

  12. Leakage Problem (BSS). [Figures: results for 4 microphones and 8 microphones, W initialized to identity.]

  13. A solution with a prior on the source locations. [Figure: 8 microphones, W initialized to delay-sum beams at the source locations.]

  14. Conclusion & Future Plans. Leakage problem: beamformers need to detect who speaks and when (VAD). Double talk is difficult to detect because of low directivity at low frequencies, where speech has more power. For source separation, an unbiased spatial prior (source locations) prevents the separator from converging to zero. Future work: (1) Set a spatial constraint at low frequencies, where location errors have little effect. (2) Estimate the location of the source at higher frequencies. (3) Is it possible to use the early reflections constructively (multiple beamforming, matched filtering)?
