Leakage Problems in Array Speech Processing

Julien Bourgeois, Martigny, September 2003

Context of the work
Several simultaneous speakers (sources s1(t), s2(t)) are spatially located, and road noise is spatially diffuse. A microphone array records mixtures of the sources and the noise (x1(t), ..., x4(t)), which the array processor turns into individual speech flows. Goal: recover clean individual speech flows, i.e. separate and denoise the sources.

Beamforming
Beamforming: minimization of the output power with unit gain in the direction of arrival (DOA) of the target.
+ robust against noise; the sources do not have to be active
- the array geometry and the target location must be known, and the target must be in the far field
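The criterion above (minimum output power, unit gain at the target DOA) has the well-known closed-form solution w = R^{-1} a / (a^H R^{-1} a), usually called the MVDR beamformer. The sketch below assumes a uniform linear array with half-wavelength spacing, a target at broadside and a hypothetical jammer at 40 degrees; it is an illustration of the criterion, not the adaptive scheme used in the talk.

```python
import numpy as np


def steering_vector(theta_deg, num_mics):
    # Far-field steering vector of a uniform linear array with half-wavelength spacing.
    m = np.arange(num_mics)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))


def mvdr_weights(R, a):
    # Minimize w^H R w subject to w^H a = 1; solution w = R^{-1} a / (a^H R^{-1} a).
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)


M = 4
a_target = steering_vector(0.0, M)   # assumed target DOA: broadside
a_jammer = steering_vector(40.0, M)  # hypothetical interferer at 40 degrees
# Spatial covariance: jammer (power 100) plus white sensor noise (power 1).
R = 100.0 * np.outer(a_jammer, a_jammer.conj()) + np.eye(M)

w = mvdr_weights(R, a_target)
w_ds = a_target / M  # delay-and-sum beam, also with unit gain toward the target

print("gain toward target:", abs(w.conj() @ a_target))
print("output power, MVDR vs delay-and-sum:",
      np.real(w.conj() @ R @ w), np.real(w_ds.conj() @ R @ w_ds))
```

Both beams satisfy the unit-gain constraint; minimizing the output power is what lets the MVDR weights place a deep null on the jammer.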

Leakage Problem (Beamforming)
With echoes or a source location error, the source signal arrives from a direction other than the constrained DOA. The beamformer can then produce a zero output, and indeed this minimizes the output power.
[Figure: array inputs x1 ... xN; gains +1 (constraint) and -1; output 0]
In a reverberant environment, or with a target location error, beamforming can cancel the target signal.
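A minimal numerical sketch of this cancellation, assuming a 4-microphone uniform linear array (half-wavelength spacing), a constraint at 0 degrees and a target that actually arrives from 5 degrees. The beamformer adapts on a covariance that contains the target, and the gain on the true target is reported as the target power grows; all angles and powers are illustrative assumptions.

```python
import numpy as np


def steering_vector(theta_deg, num_mics):
    # Far-field ULA steering vector, half-wavelength spacing (assumed geometry).
    m = np.arange(num_mics)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))


def mvdr_weights(R, a):
    # Minimum output power subject to unit gain along the *assumed* steering vector a.
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)


def target_gain(target_power, M=4, assumed_deg=0.0, true_deg=5.0, noise_power=1.0):
    """|w^H a_true| when the beamformer adapts on data that contains the target."""
    a_assumed = steering_vector(assumed_deg, M)
    a_true = steering_vector(true_deg, M)
    # Covariance estimated while the (mislocated) target is active.
    R = target_power * np.outer(a_true, a_true.conj()) + noise_power * np.eye(M)
    w = mvdr_weights(R, a_assumed)
    return abs(w.conj() @ a_true)


for snr_db in (-10, 0, 10, 20, 30):
    g = target_gain(10 ** (snr_db / 10))
    print(f"target SNR {snr_db:+3d} dB -> gain on true target {20 * np.log10(g):6.1f} dB")
```

With unit noise power this model gives the gain |a_assumed^H a_true| / (M + P (M^2 - |a_assumed^H a_true|^2)), so it falls toward zero as the target power P grows whenever the assumed and true steering vectors differ.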

Solution to the Leakage Problem
Do not adapt the beamformer while the target is active (the speaker is speaking). When the target is silent, minimizing the output power cancels the noise sources, and the unit-gain constraint should preserve good behavior for the target.
[Figure: silent target ("Do not speak"); array inputs x1 ... xN; constraint +1; output 0]
A beamformer therefore needs a voice activity detector (VAD) to control its adaptation.
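The sketch below contrasts freezing and adapting, under the same illustrative assumptions as a simple MVDR model (4-microphone ULA, constraint at 0 degrees, target actually at 5 degrees, hypothetical jammer at 40 degrees). Weights computed from a noise-only covariance, as if a VAD had gated the adaptation, are compared with weights computed while the target speaks.

```python
import numpy as np


def steering_vector(theta_deg, num_mics):
    # Far-field ULA steering vector, half-wavelength spacing (assumed geometry).
    m = np.arange(num_mics)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))


def mvdr_weights(R, a):
    # Minimum output power subject to w^H a = 1.
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)


M = 4
a_assumed = steering_vector(0.0, M)   # constrained DOA
a_true = steering_vector(5.0, M)      # actual, slightly mislocated target
a_jammer = steering_vector(40.0, M)   # hypothetical interfering source

# Adaptation gated by the VAD: covariance from frames where the target is silent.
R_noise_only = 100.0 * np.outer(a_jammer, a_jammer.conj()) + np.eye(M)
w_frozen = mvdr_weights(R_noise_only, a_assumed)

# Ungated adaptation: the target (power 100) is present in the covariance.
R_with_target = R_noise_only + 100.0 * np.outer(a_true, a_true.conj())
w_adapted = mvdr_weights(R_with_target, a_assumed)

print("frozen : target", abs(w_frozen.conj() @ a_true), "jammer", abs(w_frozen.conj() @ a_jammer))
print("adapted: target", abs(w_adapted.conj() @ a_true))
```

The frozen weights still null the jammer while keeping the mislocated target close to unit gain; the ungated weights largely cancel it.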

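VAD in an unknown noise field
With M = 4 microphones, the target power $P_T$ is estimated with a delay-and-sum beam steered at the target, and the noise power $P_N$ with M-1 beams orthogonal to it. The voice activity detector is computed frame-wise as $\mathrm{VAD}(t) = P_N(t)/P_T(t)$. Because the orthogonal beams have zero gain toward the target, target speech raises $P_T$ but not $P_N$, so the ratio drops while the target speaks.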
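A minimal sketch of this detector for one frequency bin, assuming a half-wavelength ULA, spatially white noise as a stand-in for the unknown noise field, and an orthonormal set of M-1 orthogonal beams obtained from the SVD of $a^H$. Averaging $P_N$ over the M-1 beams (rather than summing) is an assumption made so that both powers are on the same scale.

```python
import numpy as np


def steering_vector(theta_deg, num_mics):
    # Far-field ULA steering vector, half-wavelength spacing (assumed geometry).
    m = np.arange(num_mics)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))


def vad_ratio(X, a_target):
    """Frame-wise VAD = P_N / P_T for one frequency bin; X has shape (M, T)."""
    M = a_target.shape[0]
    w_t = a_target / np.sqrt(M)  # unit-norm delay-and-sum beam toward the target
    # M-1 unit-norm beams spanning the null space of a_target^H (so B^H a_target = 0).
    _, _, vh = np.linalg.svd(a_target.conj()[None, :])
    B = vh[1:].conj().T
    p_t = np.mean(np.abs(w_t.conj() @ X) ** 2)
    p_n = np.mean(np.abs(B.conj().T @ X) ** 2)  # averaged over the M-1 beams
    return p_n / p_t


rng = np.random.default_rng(0)
M, T = 4, 256                       # 4 microphones, 256 snapshots per frame
a0 = steering_vector(0.0, M)        # known target DOA (the prior)
noise = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
speech = np.sqrt(10.0) * (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)

print("noise-only frame:", vad_ratio(noise, a0))
print("target speaking :", vad_ratio(noise + np.outer(a0, speech), a0))
```

A hypothetical threshold on this ratio would then decide whether to freeze or adapt the beamformer.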

Realistic scenario (road noise always present)
Prior: DOA of the target speaker.
Noisy speech (freeze)
Noisy jammer (adapt)
Noisy double talk (freeze)
It can be difficult to discriminate between double-talk and single-talk situations.
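One reason, stated in the conclusion of this talk, is the low directivity of a small array at low frequencies. The sketch below evaluates the VAD ratio on model covariances, assuming a hypothetical 4-microphone ULA with 5 cm spacing, the target at 0 degrees, a jammer at 60 degrees and white sensor noise. At low frequency a jammer-only frame yields a ratio close to that of target speech, so the detector cannot tell who is talking.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s
SPACING = 0.05       # hypothetical microphone spacing in m
M = 4


def steering_vector(theta_deg, freq_hz):
    # Far-field ULA steering vector at one frequency bin.
    delays = np.arange(M) * SPACING * np.sin(np.deg2rad(theta_deg)) / SOUND_SPEED
    return np.exp(-2j * np.pi * freq_hz * delays)


def expected_vad(R, a_target):
    # P_N / P_T evaluated on a covariance matrix instead of on snapshots.
    w_t = a_target / np.sqrt(M)                  # unit-norm delay-and-sum beam
    _, _, vh = np.linalg.svd(a_target.conj()[None, :])
    B = vh[1:].conj().T                          # M-1 unit-norm beams orthogonal to the target
    p_t = np.real(w_t.conj() @ R @ w_t)
    p_n = np.real(np.trace(B.conj().T @ R @ B)) / (M - 1)
    return p_n / p_t


def vad_for_frame(freq_hz, target_on, jammer_on, target_deg=0.0, jammer_deg=60.0, power=10.0):
    a_t = steering_vector(target_deg, freq_hz)
    a_j = steering_vector(jammer_deg, freq_hz)
    R = np.eye(M, dtype=complex)                 # white sensor noise, power 1
    if target_on:
        R += power * np.outer(a_t, a_t.conj())
    if jammer_on:
        R += power * np.outer(a_j, a_j.conj())
    return expected_vad(R, a_t)


for f in (200.0, 3000.0):
    print(f"{f:6.0f} Hz: target only {vad_for_frame(f, True, False):.3f}, "
          f"jammer only {vad_for_frame(f, False, True):.3f}")
```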

Leakage Problem (Beamforming)
Leakage is caused by an echoic environment (such as a car), a target location error, a calibration error, or a wrong propagation model (far-field assumption). One solution is not to adapt during target activity (speech); this requires a voice activity detector and is a trade-off between noise tracking and robustness.

Blind Source Separation
Blind source separation: minimization of a dependence measure.
+ only a statistical assumption on the sources (independence)
+ no prior on the array geometry or the source locations
- ambiguities: permutation and scaling at each frequency
- not robust against noise; all sources need to be active
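The permutation and scaling ambiguity can be checked numerically. This sketch assumes a real-valued 2x2 instantaneous mixture (standing in for a single frequency bin) with hypothetical source powers in three segments, and uses the off-diagonal energy of the output covariances as the dependence measure: any row permutation and rescaling of a separator scores exactly as well as the separator itself.

```python
import numpy as np

A = np.array([[1.0, 0.6], [0.4, 1.0]])          # hypothetical 2x2 mixing matrix
powers = [(1.0, 0.1), (0.1, 1.0), (0.5, 0.5)]   # source powers in K = 3 segments
R_xx = [A @ np.diag(p) @ A.T for p in powers]    # mixture covariances


def off_diagonal_energy(W):
    # Sum over segments of the squared off-diagonal entries of W R_xx W^T.
    total = 0.0
    for R in R_xx:
        C = W @ R @ W.T
        off = C - np.diag(np.diag(C))
        total += np.sum(off ** 2)
    return total


W = np.linalg.inv(A)                        # an exact separator
P = np.array([[0.0, 1.0], [1.0, 0.0]])      # permutation of the outputs
D = np.diag([3.0, -0.5])                    # arbitrary scaling of the outputs
print(off_diagonal_energy(np.eye(2)), off_diagonal_energy(W), off_diagonal_energy(P @ D @ W))
```

In frequency-domain separation this holds independently in every bin, which is why the permutations must be aligned across frequencies afterwards.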

Robust Blind Source Separation: Multiple Decorrelation
[Figure: time axis divided into segments t1, t2, t3, t4, ..., tK]
Find W such that the components of $s = W x$ are decorrelated at several times, i.e. such that $R_{ss}(t_k) = W R_{xx}(t_k) W^H$ is diagonal for $k = 1, \dots, K$. W is found by gradient descent and is constrained to unit gain.
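A minimal sketch of multiple decorrelation by gradient descent, under several assumptions: real-valued data (one frequency bin would use complex $W$ and $W^H$), exact segment covariances from a hypothetical 2x2 mixture with nonstationary source powers, the cost $J(W) = \sum_k \|\mathrm{offdiag}(W R_{xx}(t_k) W^T)\|_F^2$, and "unit gain" interpreted as keeping diag(W) = 1. Step size and iteration count are illustrative choices.

```python
import numpy as np

A = np.array([[1.0, 0.6], [0.4, 1.0]])          # hypothetical mixing matrix
powers = [(1.0, 0.1), (0.1, 1.0), (0.5, 0.5)]   # nonstationary source powers per segment
R_xx = [A @ np.diag(p) @ A.T for p in powers]    # covariance of x in each segment t_k


def off_diag(C):
    # Zero the diagonal, keep the cross-correlations.
    return C - np.diag(np.diag(C))


def multiple_decorrelation(R_list, mu=0.05, iters=2000):
    """Gradient descent on J(W) = sum_k ||offdiag(W R_k W^T)||_F^2 with diag(W) = 1."""
    n = R_list[0].shape[0]
    W = np.eye(n)                      # identity initialization
    for _ in range(iters):
        grad = np.zeros_like(W)
        for R in R_list:
            E = off_diag(W @ R @ W.T)
            grad += 4.0 * E @ W @ R    # dJ/dW for symmetric R and E
        np.fill_diagonal(grad, 0.0)    # keep the unit-gain constraint diag(W) = 1
        W -= mu * grad
    return W


W = multiple_decorrelation(R_xx)
print(np.round(W, 3))
print(np.round(W @ A, 3))  # diagonal when separation succeeds
```

For this mixture the unique unit-diagonal separator near the identity is [[1, -0.6], [-0.4, 1]], which is what the iteration converges to.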

Leakage Problem (BSS)
[Figure: 2 microphones and 3 microphones, W initialized to identity]

Leakage Problem (BSS)
[Figure: 4 microphones and 8 microphones, W initialized to identity]

A solution with a prior on source locations
[Figure: 8 microphones, W initialized to delay-and-sum beams at the source locations]
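One way to build such an initialization, as a sketch: each row of the initial separator is a delay-and-sum beam steered at one source, so it has unit gain toward that source. The half-wavelength ULA and the source angles of -30 and 20 degrees are hypothetical; the talk does not give its geometry.

```python
import numpy as np


def steering_vector(theta_deg, num_mics):
    # Far-field ULA steering vector, half-wavelength spacing (assumed geometry).
    m = np.arange(num_mics)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))


def delay_sum_init(source_degs, num_mics):
    """One row per source: a delay-and-sum beam a^H / M steered at that source."""
    rows = [steering_vector(theta, num_mics).conj() / num_mics for theta in source_degs]
    return np.array(rows)


sources = (-30.0, 20.0)                          # hypothetical source locations
W0 = delay_sum_init(sources, num_mics=8)
A = np.column_stack([steering_vector(t, 8) for t in sources])
print(np.round(np.abs(W0 @ A), 3))               # ~1 on the diagonal, small cross-gains
```

W0 would then replace the identity as the starting point of the decorrelation gradient descent.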

Conclusion & Future Plans
Leakage problem: beamformers need to detect who speaks and when (VAD). Double talk is difficult to detect because of the low directivity at low frequencies, where speech has more power. For source separation, an unbiased spatial prior (source locations) prevents the separator from converging to zero.
Future work:
1. Set a spatial constraint at low frequencies, where location errors have little effect.
2. Estimate the location of the source at higher frequencies.
3. Is it possible to use the early reflections constructively (multiple beamforming, matched filtering)?
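The first future-work item relies on location errors mattering little at low frequencies. A quick check, assuming a hypothetical 4-microphone ULA with 5 cm spacing: the gain of a delay-and-sum beam steered at 0 degrees toward a source actually at 10 degrees, across frequency.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s
SPACING = 0.05       # hypothetical inter-microphone spacing in m


def steering_vector(theta_deg, freq_hz, num_mics=4):
    # Far-field ULA steering vector at one frequency bin.
    delays = np.arange(num_mics) * SPACING * np.sin(np.deg2rad(theta_deg)) / SOUND_SPEED
    return np.exp(-2j * np.pi * freq_hz * delays)


def mismatch_gain(freq_hz, assumed_deg=0.0, true_deg=10.0, num_mics=4):
    """Gain of a delay-and-sum beam steered at assumed_deg toward a source at true_deg."""
    w = steering_vector(assumed_deg, freq_hz, num_mics) / num_mics
    return abs(w.conj() @ steering_vector(true_deg, freq_hz, num_mics))


for f in (200.0, 500.0, 1000.0, 2000.0, 3000.0):
    print(f"{f:6.0f} Hz: gain toward the true source = {mismatch_gain(f):.3f}")
```

The phase error caused by a fixed angular error grows linearly with frequency, so the same mismatch is nearly harmless at 200 Hz but noticeably attenuates the source at a few kHz.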
