
Approaches of Interest in Blind Source Separation of Speech

Julien Bourgeois, DaimlerChrysler AG Research and Technology, RIC/AD.


Presentation Transcript


  1. Approaches of Interest in Blind Source Separation of Speech. Julien Bourgeois, DaimlerChrysler AG Research and Technology, RIC/AD

  2. Background. - Need for a speech-based human-machine interface in cars. - Road noise and passengers' speech create adverse conditions for automatic speech recognition (ASR).

  3. 4 Approaches to the Cocktail Party Problem 1 - Computational Auditory Scene Analysis (CASA) 2 - Sparse Decomposition Approach 3 - Statistical Blind Source Separation 4 - Beamforming Conclusion & Future plans

  4. Computational Auditory Scene Analysis (CASA) - Generalities. Aim: obtain an algorithmic description of higher auditory functions. Strong biological inspiration. One or two sensors (microphones) are considered. The microphone signal is filtered as in the human ear. Methods are variations on a segmentation-grouping scheme.
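
The slide does not say which ear model is used. A common choice in CASA front ends is a gammatone filterbank with bandwidths tied to the equivalent rectangular bandwidth (ERB); the minimal sketch below assumes that model (Glasberg and Moore ERB formula, bandwidth factor 1.019, 4th-order filters truncated to 25 ms and scaled to unit gain at each centre frequency). The function names are hypothetical, and direct convolution is used for clarity rather than speed.

```python
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs, duration=0.025, order=4):
    """Truncated FIR gammatone impulse response, scaled to unit gain at fc_hz."""
    b = 1.019 * erb(fc_hz)  # decay rate of the gammatone envelope
    h = []
    for k in range(int(duration * fs)):
        t = k / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        h.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    # Magnitude of the frequency response at the centre frequency.
    w = 2.0 * math.pi * fc_hz / fs
    re = sum(hk * math.cos(w * k) for k, hk in enumerate(h))
    im = sum(hk * math.sin(w * k) for k, hk in enumerate(h))
    gain = math.hypot(re, im)
    return [hk / gain for hk in h]

def fir_filter(x, h):
    """Causal FIR filtering y[n] = sum_k h[k] x[n - k], same length as x."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]

def gammatone_filterbank(x, fs, centre_freqs_hz):
    """Split x into one band-pass channel per centre frequency (ear-like analysis)."""
    return [fir_filter(x, gammatone_ir(fc, fs)) for fc in centre_freqs_hz]
```

For example, a 1 kHz tone fed through channels centred at 250, 500, 1000, 2000 and 4000 Hz should leave most of its output energy in the 1000 Hz channel.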

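  5. CASA - Segmentation. [Figure: time-frequency representation, frequency index vs. time] Segmentation is based on temporal continuity: time-frequency components that evolve continuously over time are gathered into segments.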
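
As an illustration of segmentation by temporal continuity, here is a deliberately simple sketch: a time-frequency cell is marked active when its energy exceeds a threshold, and each run of consecutive active frames within one frequency channel becomes a segment. The threshold, the minimum run length and the helper names are assumptions; published CASA systems use richer cues (e.g. linking neighbouring channels), which this sketch omits.

```python
def activity_map(energy, threshold):
    """Mark a time-frequency cell active when its energy exceeds the threshold."""
    return [[e > threshold for e in row] for row in energy]

def segment_by_continuity(active, min_frames=2):
    """Group temporally contiguous active cells of each channel into segments.

    active[ch][t] is True when channel ch is active in frame t.
    Returns (channel, start_frame, end_frame) tuples, end_frame exclusive.
    Runs shorter than min_frames are discarded as spurious.
    """
    segments = []
    for ch, row in enumerate(active):
        start = None
        for t, on in enumerate(list(row) + [False]):  # sentinel closes a final run
            if on and start is None:
                start = t
            elif not on and start is not None:
                if t - start >= min_frames:
                    segments.append((ch, start, t))
                start = None
    return segments
```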

  6. CASA - Grouping. [Figure: segments in the time-frequency plane, frequency index vs. time] Segments are grouped using two rules: (1) harmonicity and (2) synchronous onset or offset. These rules agree with certain psychoacoustic phenomena.
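
The two grouping rules can be sketched as follows, assuming each segment has already been summarised by a dominant frequency and an onset time, and that the fundamental frequency F0 of the target source is known. The tolerances, the median-onset reference and the function names are illustrative assumptions, and offsets (synchronous end) are ignored for brevity.

```python
def is_harmonic(freq_hz, f0_hz, rel_tol=0.03):
    """True if freq_hz lies within rel_tol of an integer multiple of f0_hz."""
    h = round(freq_hz / f0_hz)
    return h >= 1 and abs(freq_hz / (h * f0_hz) - 1.0) <= rel_tol

def group_segments(segments, f0_hz, onset_tol_s=0.02, rel_tol=0.03):
    """Select the segments attributed to the source with fundamental f0_hz.

    segments: list of (dominant_freq_hz, onset_s) pairs.
    Rule 1 (harmonicity): keep segments near a harmonic of f0_hz.
    Rule 2 (common onset): of those, keep segments starting close to the median onset.
    Returns the indices of the grouped segments.
    """
    harmonic = [i for i, (f, _) in enumerate(segments) if is_harmonic(f, f0_hz, rel_tol)]
    if not harmonic:
        return []
    onsets = sorted(segments[i][1] for i in harmonic)
    ref = onsets[len(onsets) // 2]
    return [i for i in harmonic if abs(segments[i][1] - ref) <= onset_tol_s]
```

Here a segment at 730 Hz is rejected for F0 = 200 Hz by the harmonicity rule, while a segment at 800 Hz starting much later is rejected by the onset rule.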

  7. CASA - Audio example. [Audio: mixture, separated]

  8. 4 Approaches to the Cocktail Party Problem 1 - Computational Auditory Scene Analysis (CASA) 2 - Sparse Decomposition Approach 3 - Statistical Blind Source Separation 4 - Beamforming Conclusion & Future plans

  9. Sparse Decomposition - Generalities. Two sensor signals $x_1$ and $x_2$, mixtures of $N$ acoustic sources $s_i$, are given. Aim: find an invertible transform $T$ such that the $N$ sources are disjoint in the transformed domain. DUET (Degenerate Unmixing Estimation Technique): $T$ = STFT (windowed short-time Fourier transform) works! Indeed, statistically, the product $S_1(\omega,t)\,S_2(\omega,t)$ is small.
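
In the DUET papers listed in the bibliography, the two-sensor setting is written as an anechoic mixing model in the STFT domain, where source $j$ reaches the second microphone with a relative attenuation $a_j$ and a relative delay $\delta_j$ (in samples), and W-disjoint orthogonality states that distinct sources rarely overlap:

$$
\begin{aligned}
X_1(\omega,t) &= \sum_{j=1}^{N} S_j(\omega,t), \\
X_2(\omega,t) &= \sum_{j=1}^{N} a_j\, e^{-i\omega\delta_j}\, S_j(\omega,t), \\
S_j(\omega,t)\, S_k(\omega,t) &\approx 0 \quad \text{for } j \neq k.
\end{aligned}
$$

If only source $j$ is active at $(\omega,t)$, then $X_1/X_2 = e^{i\omega\delta_j}/a_j$, so $\angle(X_1/X_2)/\omega = \delta_j$ (as long as $|\omega\delta_j| < \pi$) and $|X_2/X_1| = a_j$.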

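  10. Sparse Decomposition - DUET. Assumption: “At each point $(\omega,t)$ of the spectrogram, only one source is active.” Which source $S_i$ is active at $(\omega,t)$? Look at the phase difference between $X_1(\omega,t)$ and $X_2(\omega,t)$, i.e. the group delay $\angle\big(X_1(\omega,t)/X_2(\omega,t)\big)/\omega$. [Figure: group delay over the time-frequency plane (frequency index vs. time), with regions at group delay 1 and group delay 2] Then set $S_i(\omega,t) = X_1(\omega,t)$ at the points assigned to source $i$, and $0$ elsewhere.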
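
A minimal sketch of the masking step, assuming the STFTs are already computed and that the relative delay of each source is known (in DUET these come from peaks of a histogram of the per-bin estimates, and the amplitude ratio is used as well; both are omitted here). `duet_separate` is a hypothetical name. Each non-empty bin is assigned to the source whose delay is closest to $\angle(X_1/X_2)/\omega$, and that source receives $X_1$ at that bin.

```python
import cmath

def duet_separate(X1, X2, omegas, source_delays):
    """Binary time-frequency masking in the spirit of DUET.

    X1, X2: STFTs of the two microphones, indexed [freq_bin][frame].
    omegas: angular frequency (rad/sample, non-zero) of each freq_bin row.
    source_delays: assumed-known relative delays (samples) of the sources.
    Returns one masked STFT per source: S_j = X1 where source j wins, else 0.
    """
    n_src = len(source_delays)
    outputs = [[[0j] * len(row) for row in X1] for _ in range(n_src)]
    for f, w in enumerate(omegas):
        for t, (a, b) in enumerate(zip(X1[f], X2[f])):
            if a == 0 or b == 0:
                continue  # empty bin: nothing to assign
            # Group delay estimate angle(X1/X2)/omega, valid while |omega*delay| < pi.
            delay = cmath.phase(a / b) / w
            j = min(range(n_src), key=lambda k: abs(delay - source_delays[k]))
            outputs[j][f][t] = a
    return outputs
```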

  11. Sparse Decomposition - Audio example. [Audio: Mix 1, Mix 2, Out 1, Out 2]

  12. 4 Approaches to the Cocktail Party Problem 1 - Computational Auditory Scene Analysis (CASA) 2 - Sparse Decomposition Approach 3 - Statistical Blind Source Separation 4 - Beamforming Conclusion & Future plans

  13. Statistical Blind Source Separation. Assumption: “The sources are decorrelated.” or “The sources are independent.” ICA = Independent Component Analysis. Generally needs (at least) as many sensors as sources. Permutation and scale ambiguities: if $s_1$ and $s_2$ are independent, so are $s_2$ and $b\,s_1$ (for any scalar $b \neq 0$), so the sources can only be recovered up to order and scale.
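
The ambiguity can be written compactly for the instantaneous, square case ($N$ sources, $N$ sensors, $x = A s$): any separating matrix of the following form yields independent outputs, where $P$ is a permutation matrix and $D$ a non-singular diagonal matrix:

$$
y = W x = W A s, \qquad W = P\,D\,A^{-1} \;\Longrightarrow\; y = P\,D\,s .
$$

Because the components of $P D s$ are independent whenever those of $s$ are, an independence criterion alone cannot tell such a $W$ apart from $A^{-1}$.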

  14. Statistical Blind Source Separation. Mixture model (convolutive): $x(n) = A(0)s(n) + \dots + A(K)s(n-K) = (A * s)(n)$; in the time-frequency (TF) domain, $X(\omega,t) \approx A(\omega)S(\omega,t)$. Separation filters $W$: find $W(\omega)$ such that the components of $Y(\omega,t) = W(\omega)X(\omega,t)$ are independent or decorrelated ($Y$ estimates the sources $S$). For a decorrelation criterion, the output $Y$ should be decorrelated at each $t$: one can find $W$ minimizing the off-diagonal terms of $R_{YY}(\omega,t) = E[Y(\omega,t)Y^H(\omega,t)]$ jointly for all $t$. Using several $t$ exploits the non-stationarity of speech.
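
A sketch of the decorrelation criterion at a single frequency bin $\omega$, assuming the covariances $R_{XX}(\omega,t)$ of several time blocks $t$ have been estimated: for a candidate $W$, it sums the squared magnitudes of the off-diagonal entries of $W R_{XX}(t) W^H$ over the blocks. It only evaluates the cost; the iterative optimisation over $W$ (as in the Parra and Spence approach cited in the bibliography) is not shown, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def hermitian(A):
    """Conjugate transpose A^H."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def covariance(frames):
    """Sample estimate of R_XX = E[X X^H] from a list of vectors X(omega, t)."""
    n = len(frames[0])
    return [[sum(x[i] * x[j].conjugate() for x in frames) / len(frames) for j in range(n)]
            for i in range(n)]

def offdiag_cost(W, R_blocks):
    """Sum over blocks t of the squared off-diagonal magnitudes of W R_XX(t) W^H."""
    cost = 0.0
    for R in R_blocks:
        Ryy = matmul(matmul(W, R), hermitian(W))
        n = len(Ryy)
        cost += sum(abs(Ryy[i][j]) ** 2 for i in range(n) for j in range(n) if i != j)
    return cost
```

With $R_{XX}(t) = A R_{SS}(t) A^H$ and diagonal $R_{SS}(t)$, the cost vanishes at $W = A^{-1}$ and is positive for $W = I$ whenever $A$ actually mixes the sources.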

  15. Statistical Blind Source Separation. Very few assumptions on the sources. But: in the frequency domain, the permutation and scale ambiguities occur independently at each frequency bin $\omega$, so the bins must be re-aligned before resynthesis. Can be CPU-expensive because of iterative optimization.
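
One common way to resolve the permutation ambiguity across bins (swapped in here for illustration; the slide does not prescribe a method) exploits the fact that the amplitude envelopes of the same speaker tend to be correlated across neighbouring frequencies. The sketch below handles two outputs, compares each bin with the previously aligned one, and swaps the outputs when that improves the envelope correlation. Names are hypothetical, and the scale ambiguity is not addressed.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length sequences (0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def align_permutations(envelopes):
    """Fix the per-bin output order of a 2-source frequency-domain separation.

    envelopes[f] = [env_out0, env_out1]: amplitude envelopes |Y_k(omega_f, t)| over frames.
    Returns the reordered envelopes and a list telling which bins were swapped.
    """
    aligned, swapped = [list(envelopes[0])], [False]
    for f in range(1, len(envelopes)):
        ref0, ref1 = aligned[-1]
        cur0, cur1 = envelopes[f]
        keep = corr(cur0, ref0) + corr(cur1, ref1)
        swap = corr(cur1, ref0) + corr(cur0, ref1)
        if swap > keep:
            aligned.append([cur1, cur0])
            swapped.append(True)
        else:
            aligned.append([cur0, cur1])
            swapped.append(False)
    return aligned, swapped
```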

  16. Statistical Blind Source Separation - Audio example. [Audio: Mix 1, Mix 2, Out 1, Out 2]

  17. 4 Approaches to the Cocktail Party Problem 1 - Computational Auditory Scene Analysis (CASA) 2 - Sparse Decomposition Approach 3 - Statistical Blind Source Separation 4 - Beamforming Conclusion & Future plans

  18. Beamforming - Array signal processing. The spatial locations of the sources (direction of arrival, DOA) are mapped onto delays between sensors. Array signal processing addresses three estimation problems: 1) the number of sources, 2) their spatial locations, 3) spatial filtering. It can require more sensors than sources, depending on the required spatial resolution. [Figure: sources $s_1$, $s_2$ and microphones $x_1, \dots, x_i, \dots, x_N$] $x_i(t) = s_1(t - d_{1,i}) + s_2(t - d_{2,i})$
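
The mapping from direction of arrival to inter-sensor delays can be sketched for a uniform linear array under a far-field (plane-wave) assumption, together with the slide's delay-only mixing model $x_i(t) = s_1(t - d_{1,i}) + s_2(t - d_{2,i})$. The array geometry, the speed of sound (343 m/s), the restriction to non-negative integer sample delays and the function names are all assumptions of this sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def ula_delays(theta_deg, n_mics, spacing_m, c=SPEED_OF_SOUND):
    """Far-field delays (seconds) of a plane wave at each mic of a uniform linear array.

    theta_deg is the direction of arrival measured from broadside; mic 0 is the reference.
    """
    return [i * spacing_m * math.sin(math.radians(theta_deg)) / c for i in range(n_mics)]

def mix_delayed(sources, delays_samples):
    """x_i(t) = sum_k s_k(t - d_{k,i}) with non-negative integer sample delays."""
    n = len(sources[0])
    n_mics = len(delays_samples[0])
    x = [[0.0] * n for _ in range(n_mics)]
    for s, d in zip(sources, delays_samples):
        if min(d) < 0:
            raise ValueError("delays must be non-negative integers")
        for i in range(n_mics):
            for t in range(d[i], n):
                x[i][t] += s[t - d[i]]
    return x
```

A broadside source ($\theta = 0$) reaches all microphones simultaneously, while an endfire source ($\theta = 90°$) with spacing $c/f_s$ arrives one sample later at each successive microphone.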

  19. Beamforming - Source Location. 1/ Energy-based: search for the delays $d_i$ that maximize the output power $E[y^2(t)]$, where $y(t) = x_1(t+d_1) + \dots + x_N(t+d_N)$ is the output of a delay-and-sum beamformer. 2/ Correlation-based: search for the delay $d$ that maximizes $E[x_i(t)\,x_j(t-d)]$, for some pairs $(i,j)$. 3/ High resolution: with $X(\omega,t) = A(\omega)S(\omega,t)$, the eigendecomposition of $R_{XX} = A R_{SS} A^H$ (where $R_{SS}$ is diagonal if the sources are decorrelated) provides information on $A$, i.e. on the source locations.
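
The correlation-based criterion (2/) can be sketched as an exhaustive search over integer lags, replacing the expectation by a sum over the overlapping samples. With this sign convention, if $x_j(t) = x_i(t - \tau)$ the maximum lies at $d = -\tau$. The name `xcorr_delay` and the lag range are assumptions; practical systems often use a weighted variant such as GCC-PHAT, which is not shown.

```python
def xcorr_delay(xi, xj, max_lag):
    """Return the lag d in [-max_lag, max_lag] maximising sum_t xi[t] * xj[t - d].

    This is the criterion E[x_i(t) x_j(t - d)] with the expectation replaced
    by a (biased) sum over the samples where both signals are defined.
    """
    n = len(xi)
    best_d, best_val = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        val = sum(xi[t] * xj[t - d] for t in range(max(0, d), min(n, len(xj) + d)))
        if val > best_val:
            best_d, best_val = d, val
    return best_d
```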

  20. Beamforming - Spatial Filtering. [Figure: filter-and-sum beamformer: microphone signals $x_1, \dots, x_N$ delayed by $d_1, \dots, d_N$, filtered by $F_1, \dots, F_N$ and summed, steered to the direction of interest] 1/ Data-independent: e.g. delay-and-sum beamforming. 2/ Statistically optimal: constrain the response in the direction of interest and minimize the output power.
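
Both spatial filters can be sketched at one frequency $\omega$, assuming narrowband steering vectors $a_i(\omega) = e^{-i\omega d_i}$ and a known (or estimated) covariance $R$ of the microphone signals. The statistically optimal filter here is the minimum-variance distortionless-response (MVDR) solution $w = R^{-1}a / (a^H R^{-1} a)$, which minimizes $w^H R w$ subject to $w^H a = 1$; it is one standard instance of "constrain the response and minimize the output power", not necessarily the one used in the talk. Function names are hypothetical.

```python
import cmath

def steering_vector(delays_samples, omega):
    """a_i(omega) = exp(-j * omega * d_i) for the delays of the direction of interest."""
    return [cmath.exp(-1j * omega * d) for d in delays_samples]

def solve(R, b):
    """Solve R x = b by Gauss-Jordan elimination with partial pivoting (complex-safe)."""
    n = len(R)
    M = [list(row) + [b[i]] for i, row in enumerate(R)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def delay_and_sum_weights(a):
    """Data-independent beamformer: align the channels and average them."""
    return [ai / len(a) for ai in a]

def mvdr_weights(R, a):
    """Minimise w^H R w subject to w^H a = 1:  w = R^{-1} a / (a^H R^{-1} a)."""
    Ra = solve(R, a)
    denom = sum(ai.conjugate() * v for ai, v in zip(a, Ra))
    return [v / denom for v in Ra]

def response(w, a):
    """Beamformer response w^H a to a plane wave with steering vector a."""
    return sum(wi.conjugate() * ai for wi, ai in zip(w, a))
```

With spatially white noise ($R = I$) the MVDR weights reduce to the delay-and-sum weights $a/N$, whereas a strong interferer present in $R$ is attenuated much more by MVDR than by delay-and-sum.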

  21. Beamforming - Audio example. [Audio: Mix 1, Mix 2, Out 1, Out 2]

  22. 4 Approaches to the Cocktail Party Problem 1 - Computational Auditory Scene Analysis (CASA) 2 - Sparse Decomposition Approach 3 - Statistical Blind Source Separation 4 - Beamforming Conclusion & Future plans

  23. Conclusion & Questions. Different definitions of a “source”: perceptual (CASA), topological (sparse decomposition), statistical (BSS) and spatial (beamforming). These are complementary approaches. There is no perfect solution to the cocktail party problem.

  24. Future plans in Hoarse. Combination of existing methods: DUET if the sources are disjoint; ICA or beamforming if they overlap. Investigation of specific open questions: estimation of the number of sources at each $(\omega,t)$ point; sparse decomposition: optimal transform $T$? Extension to more than 2 mics? Theoretical bounds? Equivalences between these approaches (e.g. second-order BSS and beamforming)?

  25. Short Bibliography - CASA. G. J. Brown and M. Cooke, "Computational auditory scene analysis," Computer Speech and Language, vol. 8, no. 4, pp. 297-336, 1994. A. S. Bregman, Auditory Scene Analysis, MIT Press, Cambridge, MA, 1990. G. Hu and D. Wang, "Monaural speech separation," NIPS 2002.

  26. Short Bibliography - Sparse Decomposition (DUET). M. Zibulevsky, B. A. Pearlmutter, P. Bofill, and P. Kisilev, "Blind source separation by sparse decomposition," in S. J. Roberts and R. M. Everson (Eds.), Independent Component Analysis: Principles and Practice, Cambridge, 2001. O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," submitted to IEEE Transactions on Signal Processing, Nov. 4, 2002. Jourjine, S. Rickard, and O. Yilmaz, "Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures," Proc. IEEE ICASSP 2000, vol. 5, pp. 2985-2988, Istanbul, Turkey, June 2000.

  27. Short Bibliography - Statistical Blind Source Separation (ICA). L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Transactions on Speech and Audio Processing, pp. 320-327, May 2000. T.-W. Lee, Independent Component Analysis: Theory and Applications, Kluwer Academic Publishers, September 1998.

  28. Short Bibliography - Beamforming. B. D. Van Veen and K. M. Buckley, "Beamforming: a versatile approach to spatial filtering," IEEE ASSP Magazine, vol. 5, pp. 4-24, Apr. 1988. M. Brandstein and H. Silverman, "A practical methodology for speech source localization with microphone arrays," Computer Speech and Language, vol. 11, no. 2, pp. 91-126, 1997. D. Ward and M. Brandstein (Eds.), Microphone Arrays: Techniques and Applications, Springer, Berlin, 2001, pp. 231-256.
