
On Properties of Modulation Spectrum for Robust Automatic Speech Recognition

On Properties of Modulation Spectrum for Robust Automatic Speech Recognition. Noboru Kanedera, Hynek Hermansky, Takayuki Arai ICASSP 1998 2012/07/17 汪逸婷. Outline. Introduction Relative importance of modulation frequencies Automatic speech recognition experiments Effect of filter





Presentation Transcript


  1. On Properties of Modulation Spectrum for Robust Automatic Speech Recognition Noboru Kanedera, Hynek Hermansky, Takayuki Arai ICASSP 1998 2012/07/17 汪逸婷

  2. Outline • Introduction • Relative importance of modulation frequencies • Automatic speech recognition experiments • Effect of filter • 2-D cepstrum using selected part of the modulation spectrum • Effect of phase • Multi-resolution ASR • Conclusions

  3. Introduction • Perceptual experiments indicate that some components of the modulation spectrum are more important for the intelligibility of speech than others. • Drullman et al. [5,6] concluded that removing modulation components above 16Hz (low-pass filtering) or below 4Hz (high-pass filtering) does not appreciably reduce speech intelligibility. • Kanedera et al. [14] indicate that most of the useful linguistic information is in modulation frequency components in the range between 1Hz and 16Hz, with the dominant components at around 4Hz. • [5] R. Drullman, J. M. Festen, and R. Plomp (1994), “Effect of temporal envelope smearing on speech perception,” J. Acoust. Soc. Amer., Vol. 95, pp. 1053-1064. • [6] R. Drullman, J. M. Festen, and R. Plomp (1994), “Effect of reducing slow temporal modulations on speech perception,” J. Acoust. Soc. Amer., Vol. 95, pp. 2670-2680. • [14] N. Kanedera, T. Arai, H. Hermansky, and M. Pavel (1997), “On the importance of various modulation frequencies for speech recognition,” Proc. Eurospeech ’97, Rhodes, Greece, pp. 1079-1082.

  4. Introduction • In noisy environments, the range below 1Hz does not contribute useful information, and recognition performance can typically be improved by eliminating it from the recognition process. • These trends in the relative importance of modulation frequencies remained unchanged across different recognizers and features.
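
The idea of discarding modulation frequencies outside a useful band can be illustrated with a minimal sketch. It is not the filter used in the paper: the function name `bandpass_modulation`, the 100 Hz frame rate (10 ms frame shift), and the FFT-masking approach are all assumptions chosen for brevity. The sketch zeroes every modulation component of a feature time trajectory outside a chosen band, here 1 to 16Hz.

```python
import numpy as np

def bandpass_modulation(traj, frame_rate=100.0, low_hz=1.0, high_hz=16.0):
    """Keep only modulation frequencies in [low_hz, high_hz].

    traj: array of shape (num_frames, num_features), one time trajectory
          per feature (e.g. log energies or cepstral coefficients).
    frame_rate: frames per second (assumed 100 Hz here, not from the slides).
    """
    traj = np.asarray(traj, dtype=float)
    n = traj.shape[0]
    # Modulation spectrum: Fourier transform along the time (frame) axis.
    spec = np.fft.rfft(traj, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / frame_rate)
    # Zero every component outside the pass band, including DC.
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    spec[~mask] = 0.0
    return np.fft.irfft(spec, n=n, axis=0)

# Toy trajectory: a slow drift (0.25 Hz plus offset), a 4 Hz component,
# and a 30 Hz component, sampled for 4 seconds at 100 frames per second.
t = np.arange(400) / 100.0
slow = 2.0 + np.sin(2 * np.pi * 0.25 * t)
speech_like = np.sin(2 * np.pi * 4.0 * t)
fast = 0.5 * np.sin(2 * np.pi * 30.0 * t)
traj = (slow + speech_like + fast)[:, None]

filtered = bandpass_modulation(traj)
```

In this toy case only the 4Hz component survives, since the drift and the 30Hz component fall outside the pass band. A practical system would more likely use a causal FIR or IIR filter on the trajectories rather than masking the FFT of a whole utterance.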

  5. Relative importance of modulation frequencies

  6. Experiments- Effect of filter

  7. Experiments- 2-D cepstrum using selected part of the modulation spectrum • The Fourier transform of the time trajectory of the cepstrum has become known as the 2-D cepstrum. • We extract much lower frequency components of the modulation spectrum, and use only some selected ones.
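
A minimal sketch of this computation follows, under stated assumptions: the function name `two_d_cepstrum`, the 100 Hz frame rate, the Hamming window, the 1-second segment, and the 2 to 10Hz selection are illustrative choices, not details taken from the slides. It takes the Fourier transform of each cepstral coefficient's time trajectory and keeps only the selected modulation-frequency bins.

```python
import numpy as np

def two_d_cepstrum(cepstra, frame_rate=100.0, low_hz=2.0, high_hz=10.0):
    """Fourier transform of cepstral time trajectories, restricted to a band.

    cepstra: array of shape (num_frames, num_coeffs) for one segment.
    Returns (selected modulation frequencies, complex coefficients of
    shape (num_selected_bins, num_coeffs)).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n = cepstra.shape[0]
    # Taper the segment in time to reduce spectral leakage (an assumption).
    window = np.hamming(n)[:, None]
    spec = np.fft.rfft(cepstra * window, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / frame_rate)
    # Use only the selected part of the modulation spectrum.
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return freqs[keep], spec[keep]

# Toy segment: 100 frames (1 s) of 13 coefficients, where coefficient 0
# carries a 4 Hz modulation and the others are constant.
t = np.arange(100) / 100.0
cepstra = np.ones((100, 13))
cepstra[:, 0] += np.sin(2 * np.pi * 4.0 * t)

sel_freqs, coeffs = two_d_cepstrum(cepstra)
peak_hz = sel_freqs[np.argmax(np.abs(coeffs[:, 0]))]
```

The returned coefficients are complex, so a recognizer front end would still have to decide how to turn them into real-valued features.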

  8. Experiments- Effect of phase • Experiments using only the absolute values of the modulation spectrum give worse results than experiments using both the real and imaginary parts.
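
One way to see what the absolute value discards is the toy sketch below. It is an illustration only, not the paper's feature extraction; the function name `modulation_features` and the random test trajectory are hypothetical. Two trajectories that differ only by a circular time shift have identical magnitude spectra, so magnitude-only features cannot tell them apart, while features built from the real and imaginary parts can.

```python
import numpy as np

def modulation_features(traj, keep_phase=True):
    """Features from the modulation spectrum of one time trajectory.

    keep_phase=True  -> real and imaginary parts, concatenated
    keep_phase=False -> absolute values only (phase discarded)
    """
    spec = np.fft.rfft(np.asarray(traj, dtype=float))
    if keep_phase:
        return np.concatenate([spec.real, spec.imag])
    return np.abs(spec)

rng = np.random.default_rng(0)
a = rng.standard_normal(64)
b = np.roll(a, 5)  # same content, circularly shifted by 5 frames

# Magnitude-only features coincide for the two trajectories.
same_without_phase = np.allclose(
    modulation_features(a, keep_phase=False),
    modulation_features(b, keep_phase=False),
)
# Real/imaginary features differ, because the shift changes the phase.
same_with_phase = np.allclose(
    modulation_features(a, keep_phase=True),
    modulation_features(b, keep_phase=True),
)
```

Here `same_without_phase` is true and `same_with_phase` is false: the timing of events within the analysis segment lives in the phase, which is consistent with the slide's observation that phase information should be preserved.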

  9. Experiments- Multi-resolution ASR • Select the range between 2 and 10Hz.

  10. Conclusions • Results indicate that most of the useful linguistic information is in modulation frequency components in the range between 1 and 16Hz, with dominant contributions coming from the range between 2 and 8Hz. • It is important to preserve the phase information of the modulation spectrum. • It may be useful to use some selected parts of the modulation spectrum in multi-stream ASR.
