A Tutorial on Bayesian Speech Feature Enhancement

SCALE Workshop, January 2010. A Tutorial on Bayesian Speech Feature Enhancement. Friedrich Faubel. I. Motivation. Speech Recognition System Overview. A speech recognition system converts speech to text. It basically consists of two components:

Presentation Transcript


  1. SCALE Workshop, January 2010 A Tutorial on Bayesian Speech Feature Enhancement Friedrich Faubel

  2. I Motivation

  3. Speech Recognition System: Overview • A speech recognition system converts speech to text. It consists of two components: • Front End: extracts speech features from the audio signal • Decoder: finds the sentence (sequence of acoustic states) that is the most likely explanation for the observed sequence of speech features • [Diagram: Speech → Front End → Decoder → Text]

  4-7. Speech Feature Extraction: Windowing

  8-9. Speech Feature Extraction: Time-Frequency Analysis • Performing spectral analysis separately for each frame yields a time-frequency representation
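The windowing and per-frame spectral analysis described on these slides can be sketched as follows. This is a minimal illustration, not the tutorial's own front end: the Hamming window, the frame length, the hop size and all function names are assumptions, and a naive DFT is used only to keep the example free of dependencies (a real front end would use an FFT library).

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (window choice is an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Power spectrum |DFT|^2 of one real frame, bins 0 .. len(frame) // 2."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        out.append(abs(coeff) ** 2)
    return out

def power_spectrogram(signal, frame_len=64, hop=32):
    """Cut the signal into overlapping frames, window each frame,
    and analyse every frame separately."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        frames.append(power_spectrum(frame))
    return frames  # frames[t][k]: power at frame t, frequency bin k

# Usage: a tone with exactly 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * i / 64) for i in range(256)]
spec = power_spectrogram(tone)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row of `spec` is one column of the time-frequency representation; for the test tone every frame has its maximum in the same frequency bin.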

  10. Speech Feature Extraction: Perceptual Representation • Emulation of the logarithmic frequency and intensity perception of the human auditory system
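One common way to emulate both effects is a mel-spaced filterbank followed by a logarithm. The slide does not say which frequency warping or filter shape the tutorial uses, so the mel formula, the triangular filters and every name below are assumptions made for illustration.

```python
import math

def hz_to_mel(f_hz):
    """A widely used mel-scale formula (assumed; the slide names none)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale.
    Returns n_filters rows of n_bins weights; bins span 0 .. sample_rate / 2."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies: left edge, filter centres, right edge.
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel(power_frame, bank, floor=1e-10):
    """Log filterbank energies for one power-spectrum frame."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_frame)), floor))
            for row in bank]

# Usage: 10 filters over a 129-bin power spectrum sampled at 16 kHz.
bank = mel_filterbank(10, 129, 16000)
features = log_mel([1.0] * 129, bank)
narrow = sum(1 for w in bank[0] if w > 0)    # bins covered by the lowest filter
wide = sum(1 for w in bank[-1] if w > 0)     # bins covered by the highest filter
```

The filters widen with frequency, which is the logarithmic frequency resolution; the final `log` is the logarithmic intensity compression.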

  11. Background Noise • Background noise distorts speech features • Result: features don’t match the features used during training • Consequence: severely degraded recognition performance

  12. Overview of the Tutorial • I - Motivation • II - The effect of noise on speech features • III - Transforming probabilities • IV - The MMSE solution to speech feature enhancement • V - Model-based speech feature enhancement • VI - Experimental results • VII - Extensions

  13. II Interaction Function: The Effect of Noise

  14. Interaction Function • Principle of Superposition: signals are additive • [Figure: noisy speech = clean speech + noise]

  15-32. Interaction Function • In the signal domain we have the following relationship: noisy speech = clean speech + noise • After Fourier transformation, this becomes: [equation] • Taking the magnitude square on both sides, we get: [equation]

  33-36. Interaction Function • Taking the magnitude square on both sides, we get: [equation] • Hence, in the power spectral domain we have: [equation; one term is annotated as the phase term, governed by the relative phase]
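The derivation on these slides can be written out as follows. The equation images are not part of the transcript, so the notation is an assumption: $y$, $x$ and $n$ denote noisy speech, clean speech and noise in the time domain, capitals denote their Fourier transforms, and $\theta$ is the relative phase between clean speech and noise. This is a reconstruction from the slide text, not a copy of the original formulas.

```latex
\begin{align}
y(t) &= x(t) + n(t)
  && \text{signal domain} \\
Y(\omega) &= X(\omega) + N(\omega)
  && \text{Fourier transform is linear} \\
|Y(\omega)|^2 &= \bigl(X(\omega) + N(\omega)\bigr)\,
                 \overline{\bigl(X(\omega) + N(\omega)\bigr)}
  && \text{magnitude square} \\
 &= |X(\omega)|^2 + |N(\omega)|^2
    + 2\,\mathrm{Re}\bigl(X(\omega)\,\overline{N(\omega)}\bigr) \\
 &= |X(\omega)|^2 + |N(\omega)|^2
    + \underbrace{2\,|X(\omega)|\,|N(\omega)|\cos\theta(\omega)}_{\text{phase term}},
  && \theta(\omega) = \arg X(\omega) - \arg N(\omega)
\end{align}
```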

  37. Interaction Function • The relative phase between two waves describes their relative offset in time (delay) • [Figure: two waves along a time axis, with the relative phase marked]

  38. Interaction Function • When two sound sources are present, the following can happen: • [Figure: four examples of summed waves, labelled amplification, amplification, cancellation, attenuation]
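A small numerical sketch of these cases, assuming each source contributes a single spectral component with a given magnitude and that `theta` is their relative phase (the function name and the example values are illustrative, not from the slides):

```python
import cmath
import math

def combined_power(amp_x, amp_n, theta):
    """Power of the sum of two components whose relative phase is theta."""
    return abs(amp_x + amp_n * cmath.exp(1j * theta)) ** 2

in_phase = combined_power(1.0, 1.0, 0.0)            # amplification: 4x one source
quadrature = combined_power(1.0, 1.0, math.pi / 2)  # powers simply add
opposite = combined_power(1.0, 1.0, math.pi)        # cancellation
weaker = combined_power(1.0, 0.5, math.pi)          # attenuation, not full cancellation
```

With equal magnitudes the combined power ranges from zero (opposite phase) to four times the power of one source (in phase); with unequal magnitudes, opposite phase only attenuates.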

  39-40. Interaction Function • Taking the magnitude square on both sides, we get: [equation] • Hence, in the power spectral domain we have: [equation; the relative phase term is annotated as zero on average]
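The annotation that the phase term is zero on average can be checked numerically. The sketch below assumes the relative phase is uniformly distributed over a full cycle, which the transcript does not state explicitly, and uses illustrative magnitudes.

```python
import math

def mean_power_over_phase(amp_x, amp_n, n_phases=360):
    """Average |X|^2 + |N|^2 + 2|X||N|cos(theta) over evenly spaced phases.
    Assumes the relative phase is uniform on [0, 2*pi)."""
    total = 0.0
    for k in range(n_phases):
        theta = 2 * math.pi * k / n_phases
        total += amp_x ** 2 + amp_n ** 2 + 2 * amp_x * amp_n * math.cos(theta)
    return total / n_phases

# Usage: the cosine term averages out, leaving the sum of the two powers.
average = mean_power_over_phase(2.0, 1.0)
sum_of_powers = 2.0 ** 2 + 1.0 ** 2
```

Under this assumption the average noisy power equals the clean speech power plus the noise power.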

  41-46. Interaction Function • Taking the magnitude square on both sides, we get: [equation] • Hence, in the power spectral domain we have: [equation] • In the log power spectral domain that becomes: [equation] (Acero, 1990) • But is that really right?
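If the phase term is dropped and x, n and y denote the log power of clean speech, noise and noisy speech (assumed symbols; the slide's equation is not in the transcript), then exp(y) = exp(x) + exp(n), which gives y = x + log(1 + exp(n - x)). A minimal sketch of that relationship, written so that large values do not overflow:

```python
import math

def noisy_log_power(x, n):
    """y = log(exp(x) + exp(n)) = x + log(1 + exp(n - x)),
    evaluated around the larger argument so exp() cannot overflow."""
    hi, lo = (x, n) if x >= n else (n, x)
    return hi + math.log1p(math.exp(lo - hi))

# Usage: equal speech and noise power raises the log power by log(2).
equal = noisy_log_power(0.0, 0.0)
# When speech dominates, noise barely changes the feature.
dominated = noisy_log_power(1000.0, 0.0)
```

The function is symmetric in its arguments, so it also covers the case where noise dominates and the noisy feature is approximately n.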

  47-48. Interaction Function • The mean of a nonlinearly transformed random variable is not necessarily equal to the nonlinear transform of the random variable’s mean. • [Figure label: nonlinear transform]

  49. Interaction Function • Phase-averaged relationship between clean and noisy speech:
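The point about nonlinear transforms can be illustrated for the logarithm. The formula on this slide is not preserved in the transcript, so the sketch below is only a numerical comparison under an assumed uniform relative phase: it averages the log power over the phase and compares that with the log of the phase-averaged power. Function names and magnitudes are illustrative.

```python
import math

def log_of_mean_power(amp_x, amp_n):
    """Log of the phase-averaged power (the cosine term averages to zero)."""
    return math.log(amp_x ** 2 + amp_n ** 2)

def mean_of_log_power(amp_x, amp_n, n_phases=10000):
    """Phase average of the log power, using midpoint samples of theta.
    Requires amp_x != amp_n, otherwise the power reaches zero at theta = pi."""
    total = 0.0
    for k in range(n_phases):
        theta = 2 * math.pi * (k + 0.5) / n_phases
        power = amp_x ** 2 + amp_n ** 2 + 2 * amp_x * amp_n * math.cos(theta)
        total += math.log(power)
    return total / n_phases

# Usage: clean speech magnitude 1.0, noise magnitude 0.5.
averaged_after_log = mean_of_log_power(1.0, 0.5)
log_after_average = log_of_mean_power(1.0, 0.5)   # log(1.25)
```

For these values the two results differ by about 0.22: the phase average of the log power equals the log power of the stronger component (a standard integral identity), whereas the log of the averaged power is log(1.25).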

  50. III Transforming Probabilities
