
Update

Update. May 3rd, 2010. Outline: Audio spatialization • Performance evaluation (source separation) • Source separation • System overview • Demonstration (system) • Concentration measure and W-disjoint orthogonality • Adaptive time-frequency representation (TFR) • Demonstration (adaptive TFR).

bruis

Presentation Transcript


  1. Update May 3rd, 2010

  2. Outline • Audio spatialization • Performance evaluation (source separation) • Source separation • System overview • Demonstration (system) • Concentration measure and W-disjoint orthogonality • Adaptive time-frequency representation (TFR) • Demonstration (adaptive TFR)

  3. Audio spatialization • Audio spatialization – a spatial rendering technique for converting the available audio into the desired listening configuration • Analysis – separating the individual sources • Re-synthesis – re-creating the desired configuration at the listener end

  4. Performance evaluation [1] • ISR = Image to Spatial-distortion Ratio • SIR = Source to Interference Ratio • SAR = Source to Artifacts Ratio • SDR = Source to Distortion Ratio

  5. Performance evaluation • The estimated source image can be decomposed into • the true source image • error components: – spatial distortion – interference – artifacts
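The four ratios listed on slide 4 follow from this decomposition as energy ratios. A minimal sketch, assuming the four components are already available (computing them requires the projections described in [1], which are not shown) and that energies are summed over channels and time; `bss_ratios` is a hypothetical name.

```python
import numpy as np

def bss_ratios(s_img, e_spat, e_interf, e_artif):
    """Hypothetical helper: ISR, SIR, SAR and SDR in dB from a decomposed estimate.

    Each argument has shape (n_channels, n_samples). Assumes the estimated
    source image equals s_img + e_spat + e_interf + e_artif.
    """
    def energy(x):
        # Total energy summed over channels and time.
        return float(np.sum(np.abs(x) ** 2))

    def db(num, den):
        return 10.0 * np.log10(num / den)

    isr = db(energy(s_img), energy(e_spat))
    sir = db(energy(s_img + e_spat), energy(e_interf))
    sar = db(energy(s_img + e_spat + e_interf), energy(e_artif))
    sdr = db(energy(s_img), energy(e_spat + e_interf + e_artif))
    return {"ISR": isr, "SIR": sir, "SAR": sar, "SDR": sdr}
```

Each ratio compares the part of the estimate considered "wanted" at that stage with the next error component, so ISR, SIR and SAR isolate one error type each while SDR lumps all three together.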

  6. Performance evaluation

  7. Source separation [2,3] • Source separation – obtaining estimates of the underlying sources from a set of sensor observations • Time-frequency transform • Source analysis – estimation of the mixing parameters • Source synthesis – estimation of the sources • Inverse time-frequency transform

  8. Mixing model • Figure: Anechoic mixing model – audio is observed at the microphones with differing intensities and arrival times (because of propagation delays) but without reverberation. Source: P. O'Grady, B. Pearlmutter and S. Rickard, “Survey of sparse and non-sparse methods in source separation,” International Journal of Imaging Systems and Technology, 2005 • Anechoic mixing model • Mixtures, x_i • Sources, s_j • Under-determined (M < N) • M = number of mixtures • N = number of sources
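Written out, the anechoic model is usually expressed as follows, with a_ij the attenuation and δ_ij the delay from source j to microphone i (symbols assumed here, since the slide's equations did not survive extraction). The second line is the short-time narrowband approximation that DUET [2] relies on; it holds when the delays are small compared with the analysis window. For two mixtures, DUET typically normalizes the first channel so that a_1j = 1 and δ_1j = 0.

```latex
\begin{aligned}
x_i(t) &= \sum_{j=1}^{N} a_{ij}\, s_j(t - \delta_{ij}), \qquad i = 1, \dots, M, \quad M < N \\
X_i(\tau, \omega) &\approx \sum_{j=1}^{N} a_{ij}\, e^{-\mathrm{i}\omega\delta_{ij}}\, S_j(\tau, \omega)
\end{aligned}
```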

  9. Mixtures (figure: stereo mixtures; Source 1, Source 2, Source 3)

  10. function – TFRStereo
      Inputs: • Mixture (stereo) • Sampling frequency • DFT size • Window size • Hop size
      Outputs: • Mixture TFRs
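As an illustration of what TFRStereo computes, here is a minimal short-time Fourier transform applied channel by channel. The function name `tfr_stereo`, the periodic Hann window and the rounding of window and hop lengths to whole samples are assumptions; the slides list only the inputs and outputs.

```python
import numpy as np

def tfr_stereo(mixture, fs, dft_size, window_size, hop_size):
    """Hypothetical sketch of TFRStereo: one STFT per channel of a stereo mixture.

    mixture     : array of shape (n_channels, n_samples), e.g. (2, n) for stereo
    fs          : sampling frequency in Hz
    dft_size    : DFT length in samples (frames are zero-padded up to this length)
    window_size : analysis window length in seconds
    hop_size    : hop between frames in seconds
    Returns a complex array of shape (n_channels, n_frames, dft_size // 2 + 1).
    """
    win_len = int(round(window_size * fs))
    hop = int(round(hop_size * fs))
    n_channels, n_samples = mixture.shape
    if win_len > dft_size:
        raise ValueError("window is longer than the DFT size")
    if n_samples < win_len:
        raise ValueError("signal is shorter than one window")
    # Assumption: periodic Hann window (the slides do not name the window type).
    n = np.arange(win_len)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)
    n_frames = 1 + (n_samples - win_len) // hop
    tfrs = np.empty((n_channels, n_frames, dft_size // 2 + 1), dtype=complex)
    for ch in range(n_channels):
        for k in range(n_frames):
            frame = mixture[ch, k * hop : k * hop + win_len] * window
            tfrs[ch, k] = np.fft.rfft(frame, n=dft_size)
    return tfrs
```

With the demonstration settings (22050 Hz, 50 ms window, 25 ms hop, DFT size 2048), each frame yields 1025 frequency bins.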

  11. Time-frequency transform

  12. function – SourceAnalysis
      Inputs: • Mixture TFRs
      Outputs: • 2-D histogram • Mixing parameters

  13. Source analysis (estimation of mixing parameters)
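One way to realize SourceAnalysis is the DUET estimator of [2]. At every time-frequency point, it takes the ratio of the two mixture TFRs, maps that ratio to a symmetric attenuation alpha = a - 1/a and a delay delta, and accumulates a weighted 2-D histogram whose peaks give the mixing parameters. The sketch assumes that, for each source, channel 2 equals a * exp(-i*omega*delta) times channel 1, that there is no phase wrapping, and that only the single largest peak is needed. `duet_histogram` and `strongest_peak` are hypothetical names, and real use needs a peak picker that finds N peaks.

```python
import numpy as np

def duet_histogram(X1, X2, omega, n_bins=50, alpha_range=(-2.0, 2.0),
                   delta_range=(-3.0, 3.0), p=1.0, q=0.0):
    """DUET-style weighted 2-D histogram of (symmetric attenuation, delay).

    X1, X2 : complex STFTs of the two mixtures, shape (n_frames, n_freqs);
             assumes X1 has no exact zeros
    omega  : angular frequency in rad/sample for each column; must exclude
             omega = 0, and |omega * delay| must stay below pi
    p, q   : weighting exponents for |X1 X2|^p |omega|^q
    """
    ratio = X2 / X1
    a = np.abs(ratio)
    alpha = a - 1.0 / a                                 # symmetric attenuation
    delta = -np.angle(ratio) / omega[np.newaxis, :]     # relative delay in samples
    weights = np.abs(X1 * X2) ** p * np.abs(omega)[np.newaxis, :] ** q
    hist, alpha_edges, delta_edges = np.histogram2d(
        alpha.ravel(), delta.ravel(), bins=n_bins,
        range=[alpha_range, delta_range], weights=weights.ravel())
    return hist, alpha_edges, delta_edges


def strongest_peak(hist, alpha_edges, delta_edges):
    """Return (a, delta) at the largest histogram bin (single-peak sketch)."""
    i, j = np.unravel_index(np.argmax(hist), hist.shape)
    alpha = 0.5 * (alpha_edges[i] + alpha_edges[i + 1])
    delta = 0.5 * (delta_edges[j] + delta_edges[j + 1])
    a = (alpha + np.sqrt(alpha ** 2 + 4.0)) / 2.0       # invert alpha = a - 1/a
    return a, delta
```

The symmetric attenuation is used because it treats a and 1/a alike, so sources louder in either channel spread evenly around zero in the histogram.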

  14. function – SourceSynthesis
      Inputs: • Mixing parameters • Mixture TFRs • Estimation technique (DUET/LQBP)
      Outputs: • Estimated source masks • Estimated source TFRs
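Given the estimated parameters, a DUET-style synthesis assigns each time-frequency point to the source whose parameters explain it best (a binary mask), then combines both channels for that source. This sketch covers only the DUET branch; the LQBP option of [3] is not shown. It assumes that, for source j, channel 2 equals a_j * exp(-i*omega*delta_j) times channel 1. `duet_masks` is a hypothetical name.

```python
import numpy as np

def duet_masks(X1, X2, omega, amps, delays):
    """Binary-mask demixing in the spirit of DUET (hypothetical helper).

    X1, X2       : complex mixture STFTs, shape (n_frames, n_freqs)
    omega        : angular frequency (rad/sample) of each column
    amps, delays : per-source attenuation a_j and delay delta_j of channel 2
                   relative to channel 1 (e.g. histogram peaks)
    Returns (masks, source_tfrs), each of shape (n_sources, n_frames, n_freqs).
    """
    w = omega[np.newaxis, :]
    # Cost of explaining each TF point with source j's mixing parameters.
    costs = np.stack([
        np.abs(a * np.exp(-1j * w * d) * X1 - X2) ** 2 / (1.0 + a ** 2)
        for a, d in zip(amps, delays)
    ])
    labels = np.argmin(costs, axis=0)
    masks = np.stack([(labels == j).astype(float) for j in range(len(amps))])
    # Combine both channels for each source, then keep only its masked points.
    source_tfrs = np.stack([
        masks[j] * (X1 + a * np.exp(1j * w * d) * X2) / (1.0 + a ** 2)
        for j, (a, d) in enumerate(zip(amps, delays))
    ])
    return masks, source_tfrs
```

If the sources are perfectly W-disjoint orthogonal and the parameters are exact, every point gets the right label and the masked TFRs equal the true source TFRs. Overlap between sources is what turns this into an approximation.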

  15. Source synthesis (estimation of sources)

  16. Source synthesis (estimation of sources)

  17. Source synthesis (estimation of sources)

  18. function – InverseTFR
      Inputs: • Estimated source TFRs • Sampling frequency
      Outputs: • Estimated sources
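InverseTFR can be realized with weighted overlap-add. Each frame is inverse-transformed, windowed again, and added into place, and the result is divided by the summed squared window. The sketch pairs a matching forward transform with the inverse so that a round trip can be checked. The window type and the helper names (`hann`, `stft`, `inverse_tfr`) are assumptions.

```python
import numpy as np

def hann(win_len):
    """Periodic Hann window (assumed; the slides do not name the window)."""
    n = np.arange(win_len)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)

def stft(x, win_len, hop, dft_size):
    """Forward transform matching the synthesis below (one channel)."""
    w = hann(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    return np.stack([np.fft.rfft(x[k * hop : k * hop + win_len] * w, n=dft_size)
                     for k in range(n_frames)])

def inverse_tfr(X, win_len, hop, dft_size, n_samples):
    """Weighted overlap-add inverse of stft() (hypothetical InverseTFR sketch)."""
    w = hann(win_len)
    y = np.zeros(n_samples)
    norm = np.zeros(n_samples)
    for k, spectrum in enumerate(X):
        # Undo the DFT and drop the zero padding beyond the window length.
        frame = np.fft.irfft(spectrum, n=dft_size)[:win_len]
        start = k * hop
        y[start : start + win_len] += frame * w
        norm[start : start + win_len] += w ** 2
    covered = norm > 1e-8          # samples not covered by any window stay 0
    y[covered] /= norm[covered]
    return y
```

For masked source TFRs, which are generally no longer the STFT of any signal, this normalization gives the least-squares signal estimate of Griffin and Lim rather than an exact inverse.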

  19. Inverse time-frequency transform (figure: estimated Sources 1–3 alongside original Sources 1–3)

  20. Demonstration (system) • DFT size = 2048 • Window size = 50 ms • Hop size = 25 ms • Sampling frequency = 22050 Hz • All values are in dB
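For orientation, the demonstration parameters translate into samples and bin spacing as follows. The slide does not say how the non-integer lengths were rounded in the actual system. Because the window is shorter than the DFT, each frame is zero-padded to 2048 points.

```latex
\begin{aligned}
N_{\text{win}} &= 0.050\,\text{s} \times 22050\,\text{Hz} = 1102.5 \text{ samples} \\
N_{\text{hop}} &= 0.025\,\text{s} \times 22050\,\text{Hz} = 551.25 \text{ samples} \quad (50\%\ \text{overlap}) \\
\Delta f &= \frac{22050\,\text{Hz}}{2048} \approx 10.77\,\text{Hz per DFT bin}
\end{aligned}
```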

  21. Concentration measure • Requirement for source separation: W-disjoint orthogonality (WDO) • Sparsity is an indicator of WDO [4] • Thus a sparser TFR is expected to satisfy the WDO criterion to a greater extent • Commonly used sparsity measures [5]: kurtosis, Gini index
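The two measures named on this slide can be computed directly on TFR magnitudes. This sketch follows the definitions compared in [5]. The kurtosis variant shown is the normalized fourth-moment form sum|c|^4 / (sum|c|^2)^2, and the function names are hypothetical. For both measures, larger values indicate a sparser set of coefficients.

```python
import numpy as np

def gini_index(c):
    """Gini index of the magnitudes in c (0 = flat, approaching 1 = very sparse)."""
    c = np.sort(np.abs(np.ravel(c)))          # magnitudes in ascending order
    n = c.size
    total = c.sum()
    if total == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return float(1.0 - 2.0 * np.sum((c / total) * (n - k + 0.5) / n))

def kurtosis_measure(c):
    """Normalized fourth moment sum|c|^4 / (sum|c|^2)^2 (kurtosis-type sparsity)."""
    c = np.abs(np.ravel(c))
    return float(np.sum(c ** 4) / np.sum(c ** 2) ** 2)
```

A single nonzero coefficient among N gives a Gini index of 1 - 1/N, and N equal coefficients give 0. This scale invariance is why the Gini index can compare TFRs computed with different windows.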

  22. Adaptive TFR • Source separation demands WDO, i.e. a sparse time-frequency representation (TFR) • Some observations: – in music and speech signals, different frequency components are present at different time instants – different analysis window lengths provide different sparsity [4] • Therefore, to obtain a sparser TFR, use at each time instant the analysis window length that gives the highest sparsity [6]
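The window-selection rule can be sketched for a single time instant. Compute the spectrum with each candidate window length centred on that instant, then keep the length that gives the highest sparsity. The sketch assumes the Gini index as the concentration measure and Hann windows. `select_window` is hypothetical, and a full adaptive TFR must also keep the resulting frame sequence invertible.

```python
import numpy as np

def gini_index(c):
    """Gini index sparsity measure; larger means more concentrated."""
    c = np.sort(np.abs(np.ravel(c)))
    n, total = c.size, c.sum()
    if total == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return float(1.0 - 2.0 * np.sum((c / total) * (n - k + 0.5) / n))

def select_window(x, center, candidate_lengths, dft_size):
    """Pick the window length (in samples) whose spectrum at `center` is sparsest.

    Hypothetical helper: candidates are periodic Hann windows centred on sample
    `center`; candidates that do not fit inside x or exceed dft_size are skipped.
    """
    best_len, best_score = None, -np.inf
    for win_len in candidate_lengths:
        start = center - win_len // 2
        if start < 0 or start + win_len > len(x) or win_len > dft_size:
            continue
        n = np.arange(win_len)
        window = 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)
        spectrum = np.abs(np.fft.rfft(x[start : start + win_len] * window, n=dft_size))
        score = gini_index(spectrum)
        if score > best_score:
            best_len, best_score = win_len, score
    return best_len, best_score
```

For a steady sinusoid, the longest candidate gives the sparsest spectrum because its main lobe is narrowest. Running the selection frame by frame produces the "adapted window sequence" that the modified TFRStereo returns.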

  23. Adaptive TFR

  24. Adaptive TFR

  25. function – TFRStereo (modified)
      Inputs: • Mixture (stereo) • Sampling frequency • DFT size • Window size • Default window size • Concentration measure
      Outputs: • Mixture TFRs • Adapted window sequence

  26. Inverse adaptive TFR • Constraint: the TFR should be invertible • Solution: select analysis windows that satisfy the constant overlap-add (COLA) criterion [7]
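The COLA condition says that shifted copies of the analysis window sum to a constant, so overlap-add resynthesis reproduces the signal up to a known gain. This sketch checks the condition numerically for one fixed window and hop. The adaptive scheme of [7] must also preserve it across changes in window length, which this sketch does not cover. `is_cola` and `periodic_hann` are hypothetical names.

```python
import numpy as np

def is_cola(window, hop, tol=1e-10):
    """Check the constant overlap-add (COLA) property for a fixed window and hop.

    Overlap-adds enough shifted copies of `window` to reach a steady state and
    tests whether the sum is constant there.
    """
    window = np.asarray(window, dtype=float)
    win_len = len(window)
    n_frames = 3 * int(np.ceil(win_len / hop)) + 2
    total = np.zeros(win_len + (n_frames - 1) * hop)
    for k in range(n_frames):
        total[k * hop : k * hop + win_len] += window
    steady = total[win_len : len(total) - win_len]   # every sample fully overlapped
    return bool(np.ptp(steady) < tol)

def periodic_hann(win_len):
    """Periodic Hann window: 0.5 - 0.5 cos(2 pi n / L), n = 0..L-1."""
    n = np.arange(win_len)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / win_len)
```

A periodic Hann window with a hop of L/2 or L/4 passes the check. The symmetric Hann window (`np.hanning`) with the same hop does not, which is why the exact window definition matters when choosing analysis windows.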

  27. Analysis windows (COLA)

  28. function – InverseTFR (modified)
      Inputs: • Estimated source TFRs • Sampling frequency • Adapted window sequence • Default window size
      Outputs: • Estimated sources

  29. Demonstration (adaptive TFR) all the values are in dB

  30. Demonstration (adaptive TFR) all the values are in dB

  31. References
      [1] E. Vincent, R. Gribonval and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech and Language Processing, 2006.
      [2] A. Jourjine, S. Rickard and O. Yilmaz, “Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2000.
      [3] R. Saab, O. Yilmaz, M. J. McKeown and R. Abugharbieh, “Underdetermined anechoic blind source separation via ℓq-basis-pursuit with q < 1,” IEEE Transactions on Signal Processing, 2007.

  32. References
      [4] S. Rickard, “Sparse sources are separated sources,” European Signal Processing Conference, 2006.
      [5] N. Hurley and S. Rickard, “Comparing measures of sparsity,” IEEE Transactions on Information Theory, 2009.
      [6] D. L. Jones and T. Parks, “A high resolution data-adaptive time-frequency representation,” IEEE Transactions on Acoustics, Speech and Signal Processing, 1990.
      [7] P. Basu, P. J. Wolfe, D. Rudoy, T. F. Quatieri and B. Dunn, “Adaptive short-time analysis-synthesis for speech enhancement,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

  33. Thank you Questions ?
