This presentation explores adaptive methods for speaker separation in cars, where the goal is to provide an individual speech input for each passenger despite road noise and several simultaneous speakers. It compares blind separation with informed techniques and reviews existing approaches: Computational Auditory Scene Analysis (CASA) and multi-microphone techniques, including beamforming and blind source separation (BSS). The discussion emphasizes the use of prior spatial information to improve separation and to address common issues such as target cancellation and permutation ambiguity.
Adaptive Methods for Speaker Separation in Cars DaimlerChrysler Research and Technology Julien Bourgeois Julien.Bourgeois@daimlerchrysler.com
General context • Several simultaneous speakers (sources), spatially located: s1(t), s2(t) • Road noise, spatially diffuse • Mixing system • Microphones: x1(t), x4(t) • Separation algorithm (software): individual speech flows • Goal: provide individual speech input for each passenger
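The mixing system in this context is commonly written as a convolutive mixture. The formula below is a sketch under assumptions not stated on the slide: two speakers, four microphones, and linear time-invariant acoustic paths. The symbols h_{mn} (impulse response from speaker n to microphone m) and v_m (road noise at microphone m) are introduced here for illustration only.

```latex
x_m(t) = \sum_{n=1}^{2} (h_{mn} * s_n)(t) + v_m(t), \qquad m = 1, \dots, 4
```

Here * denotes convolution. The separation algorithm then looks for filters that recover each s_n(t) from the microphone signals x_m(t).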
Plan of the presentation • Overview of existing methods • Supervised/Informed separation vs. Blind separation • Blind separation and prior spatial information • Conclusion and future work
Existing methods: CASA vs. multichannel techniques • CASA (Computational Auditory Scene Analysis): • Single-microphone separation • Heuristics based on an analysis of the human auditory system • Requires a lot of data (training of parameters) • Multi-microphone techniques: • Speech moves much faster than… • the coherence relating two (or more) microphones.
Existing methods: Beamforming • Beamforming: • Prior information on target position • Constrain the response in the direction of interest • Minimize the output power • Problem of target cancellation if the prior spatial information is not perfect. (Diagram: direction of interest, filters, output)
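The two beamforming steps above (constrain the response in the direction of interest, minimize the output power) can be sketched with the classical minimum-variance distortionless-response (MVDR) solution. This is an illustrative sketch, not necessarily the beamformer used in the presentation. The array geometry, frequency, angles, and the helper names steering_vector and mvdr_weights are all assumptions made for the example.

```python
import numpy as np

def steering_vector(theta_rad, freq_hz, mic_positions_m, c=343.0):
    """Far-field, free-field steering vector for a linear array (assumed model)."""
    delays = mic_positions_m * np.sin(theta_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mvdr_weights(R, d):
    """Minimize the output power w^H R w subject to the constraint w^H d = 1."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Hypothetical setup: 4 microphones spaced 5 cm apart, one frequency bin at 1 kHz.
mics = np.array([0.0, 0.05, 0.10, 0.15])
freq = 1000.0
d_target = steering_vector(np.deg2rad(20.0), freq, mics)   # direction of interest
d_interf = steering_vector(np.deg2rad(-40.0), freq, mics)  # competing speaker

# Covariance of the interferer plus weak sensor noise (target absent).
R = 10.0 * np.outer(d_interf, d_interf.conj()) + 0.01 * np.eye(4)

w = mvdr_weights(R, d_target)
gain_target = abs(w.conj() @ d_target)  # fixed to 1 by the constraint
gain_interf = abs(w.conj() @ d_interf)  # driven down by the power minimization
print(gain_target, gain_interf)
```

The constraint only protects the direction given by d_target. If the true speaker position differs from it and the speaker is active while R is estimated, the power minimization is free to attenuate the speaker as well, which is the target cancellation problem named on the slide.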
Existing methods: Blind Source Separation • Blind Source Separation (BSS) • First applications to speech separation at the end of the 1990s • Only requirement: statistically independent sources • Difficult optimization problem: maximizing a nonlinear function (independence measure). • With many microphones, target cancellation can also appear. • Permutation ambiguity. (Diagram: sources, acoustic mixing, dependent observations, BSS, independent outputs)
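To make the BSS principle concrete, here is a toy sketch that separates two sources by maximizing a nonlinear independence measure. It relies on strong simplifying assumptions that do not hold in a car: the mixing is instantaneous (no reverberation), there is no noise, and the sources are synthetic Laplacian signals. The contrast function (sum of absolute kurtoses after whitening) and the function names are choices made for this example, not the method of the presentation.

```python
import numpy as np

def kurtosis(y):
    """Excess kurtosis of a signal, used here as a simple non-Gaussianity measure."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

def separate_two_sources(X, n_angles=180):
    """Whiten the observations, then pick the rotation maximizing total |kurtosis|."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    V = np.diag(eigval**-0.5) @ eigvec.T  # whitening matrix
    Z = V @ Xc
    best_score, best_Q = -np.inf, None
    # The contrast repeats every 90 degrees, so a quarter turn covers all cases.
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        Q = np.array([[c, s], [-s, c]])
        Y = Q @ Z
        score = abs(kurtosis(Y[0])) + abs(kurtosis(Y[1]))
        if score > best_score:
            best_score, best_Q = score, Q
    return best_Q @ V  # unmixing matrix

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 20000))        # independent, speech-like (super-Gaussian) sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])  # hypothetical instantaneous mixing matrix
X = A @ S                               # dependent observations
W = separate_two_sources(X)
G = W @ A                               # global system: mixing followed by unmixing
print(np.round(G, 2))
```

Each row of G should have one dominant entry, but nothing in the independence criterion decides which output carries which speaker, nor its sign or scale. That is the permutation ambiguity listed above.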
The question is… • Is it possible to merge beamforming and BSS, and combine their advantages? • In cars, the speaker positions are known in advance, so separating blindly is suboptimal.
Blind separation and prior spatial information • Prior info: speaker positions • Initialising BSS according to the speakers' positions helps the optimisation procedure considerably. • Permutation problem solved • Target cancellation problem solved (Diagram: sources, acoustic mixing, dependent observations, BSS, independent outputs)
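One way such a position-based initialisation could look is sketched below. It assumes a frequency-domain BSS algorithm with one unmixing matrix per frequency bin and a free-field, far-field propagation model; neither is stated on the slide. The seat angles, microphone spacing, and the helper steering_matrix are hypothetical.

```python
import numpy as np

def steering_matrix(angles_rad, freq_hz, mic_positions_m, c=343.0):
    """Free-field steering vectors of all speakers, one column per speaker (assumed model)."""
    delays = np.outer(mic_positions_m, np.sin(angles_rad)) / c  # shape (mics, speakers)
    return np.exp(-2j * np.pi * freq_hz * delays)

# Hypothetical car setup: 2 microphones 5 cm apart, driver at -30 degrees,
# co-driver at +30 degrees, three example frequency bins.
mics = np.array([0.0, 0.05])
angles = np.deg2rad([-30.0, 30.0])
freqs = np.array([500.0, 1000.0, 2000.0])

# Initial unmixing matrix per bin: the inverse of the assumed mixing matrix.
# Row 0 passes the driver and nulls the co-driver; row 1 does the opposite.
W0 = np.stack([np.linalg.inv(steering_matrix(angles, f, mics)) for f in freqs])
print(W0.shape)
```

Because output 0 is tied to the driver in every frequency bin from the start, the adaptation begins with a consistent output order across bins, which is one way to read the claim that the permutation problem is solved.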
BSS is not that blind… • BSS performance depends dramatically on the type of mixing: • Strictly causal • Non-strictly causal
Beamforming is not that informed… • Perfect prior spatial information is actually not necessary: the target cancellation problem can be solved if one can detect the activity and silences of each speaker. • This detection problem is strongly related to the IDIAP smart meeting room projects. (Diagram: direction of interest, filters, output)
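The idea of using detected silences can be sketched as follows: the statistics that drive the power minimization are estimated only from frames in which the target speaker is silent, so the adaptation cannot learn to cancel the target. The energy-threshold detector, the availability of a per-speaker reference signal, and all function names below are assumptions for illustration; the slide does not specify a detector.

```python
import numpy as np

def frame_energies(x, frame_len):
    """Mean energy of consecutive, non-overlapping frames."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames**2, axis=1)

def target_silent_mask(target_ref, frame_len, threshold):
    """True for frames where the (assumed) target reference is below the threshold."""
    return frame_energies(target_ref, frame_len) < threshold

def gated_covariance(X, silent, frame_len):
    """Spatial covariance estimated only from samples in target-silent frames."""
    keep = np.repeat(silent, frame_len)
    Xs = X[:, : keep.size][:, keep]
    return Xs @ Xs.T / Xs.shape[1]

# Synthetic demo: the target speaks only in the second half, the interferer throughout.
rng = np.random.default_rng(1)
n, frame_len = 8000, 400
target = np.zeros(n)
target[n // 2 :] = rng.standard_normal(n // 2)
interf = 0.5 * rng.standard_normal(n)
X = np.vstack([target + interf, 0.8 * target + interf])  # two hypothetical microphones

# The clean target is used as the reference here purely to keep the demo short.
silent = target_silent_mask(target, frame_len, threshold=0.1)
R_interf = gated_covariance(X, silent, frame_len)
print(silent.astype(int), np.round(R_interf, 2))
```

In practice no clean reference exists, which is why the detection itself is the hard part and why the slide points to related work on speaker activity detection.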
Conclusion and future work • Combining BSS with a beamformer brings no gain. • BSS may be informed efficiently in the case of non-causal mixings (algorithmic rotation of the microphone array)