HIWIRE MEETING Athens, November 3-4, 2005

Presentation Transcript


  1. HIWIRE MEETING Athens, November 3-4, 2005 José C. Segura, Ángel de la Torre

  2. Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

  3. HIWIRE database evaluations • PARAMETERS: MFCC_0_D_A_Z (39 components) • MODELS: • TIMIT: 46 phone models / 3 states / 128 Gaussians (17,664 Gaussians in total) • WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians (21,648 Gaussians in total) • WSJ16kFon: 40 phone models / 3 states / 128 Gaussians (15,360 Gaussians in total) • ADAPTATION: • MLLR: 32 regression classes / 50 adaptation utterances • GRAMMAR: • LORIA & Word-Loop • MODIFICATIONS: Some transcriptions have been modified to match the grammar definition
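The totals given in parentheses on this slide follow from multiplying the number of emitting states by the number of Gaussians per state. A minimal check, assuming every state carries the full mixture size listed on the slide (the function and dictionary names are illustrative only):

```python
def total_gaussians(n_states: int, mix_per_state: int) -> int:
    """Total Gaussians in an acoustic model with a fixed mixture size per state."""
    return n_states * mix_per_state


# Phone models: states = phones x states per phone.
# Triphone model: the slide gives the tied-state count directly.
TOTALS = {
    "TIMIT": total_gaussians(46 * 3, 128),
    "WSJ16k": total_gaussians(3608, 6),
    "WSJ16kFon": total_gaussians(40 * 3, 128),
}

for name, total in TOTALS.items():
    print(f"{name}: {total} Gaussians")
```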

  4. Transcription modifications

         BEGIN {
             lista = LISTA;
             nfrase = 0;
         }
         {
             linea = $0;
             gsub("-", "_", linea);
             gsub("Due_to_", "Due_to ", linea);
             gsub("Mayday_Mayday", "Mayday Mayday", linea);
             gsub("Pan_Pan", "Pan Pan", linea);
             gsub("three hundred twenty", "three_hundred_twenty", linea);
             gsub("one hundred sixty", "one_hundred_sixty", linea);
             printf("%s\n", tolower(linea));
             nfrase = nfrase + 1;
         }
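For readers who do not use awk, the same transcription normalization can be sketched in Python. This is an illustrative port, not part of the original deck: the names `REPLACEMENTS` and `normalize_transcription` are hypothetical, and the unused awk variables (`lista`, `nfrase`) are omitted. The substitutions are applied in the same order as in the awk script, since the later rules depend on hyphens having already been turned into underscores.

```python
# Ordered (old, new) pairs mirroring the gsub() calls of the awk script.
REPLACEMENTS = [
    ("-", "_"),
    ("Due_to_", "Due_to "),
    ("Mayday_Mayday", "Mayday Mayday"),
    ("Pan_Pan", "Pan Pan"),
    ("three hundred twenty", "three_hundred_twenty"),
    ("one hundred sixty", "one_hundred_sixty"),
]


def normalize_transcription(line: str) -> str:
    """Apply the substitutions in order, then lowercase the line."""
    for old, new in REPLACEMENTS:
        line = line.replace(old, new)
    return line.lower()


if __name__ == "__main__":
    print(normalize_transcription("Due-to-fog Mayday-Mayday"))
```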

  5. HIWIRE database results • RESULTS WITHOUT ADAPTATION (WER) • RESULTS WITH MLLR (WER)

  6. Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

  7. ECDF segmental implementation • ECDF segmental implementation • Provided LOQUENDO with a reference “C” implementation of segmental Gaussian transformation to be tested within LOQUENDO recognizer • Current work • Nonlinear feature transformation with a clean reference to avoid the problem of system retraining

  8. Parametric Equalization (1) PARAMETRIC NONLINEAR FEATURE EQUALIZATION FOR ROBUST SPEECH RECOGNITION (submitted ICASSP’06) • HEQ limitations • Influence of relative amount of silence in utterances • With a parametric model, a more robust equalization can be obtained

  9. Parametric Equalization (2) • CLASS-DEPENDENT LINEAR EQUALIZATION • SOFT DECISION VAD (two-class Gaussian classifier on C0) • NONLINEAR INTERPOLATION
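The three ingredients named on this slide can be combined as in the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the per-class means and standard deviations (silence = class 0, speech = class 1) are assumed to be given for both the noisy utterance and the clean reference, covariances are diagonal, and the function name `peq_sketch` and its arguments are hypothetical. Each class gets its own linear mean/variance mapping, and the two mappings are interpolated with the class posteriors from a two-class Gaussian classifier on C0, which makes the overall transformation nonlinear.

```python
import numpy as np


def peq_sketch(feats, noisy_mean, noisy_std, clean_mean, clean_std, priors=(0.5, 0.5)):
    """Posterior-weighted interpolation of two class-dependent linear equalizations.

    feats: (T, D) features, column 0 is C0.
    *_mean, *_std: (2, D) arrays, row 0 = silence class, row 1 = speech class.
    Returns the equalized features (T, D) and the class posteriors (T, 2).
    """
    feats = np.asarray(feats, dtype=float)
    c0 = feats[:, 0]

    # Soft-decision VAD: class posteriors from a two-class Gaussian model on C0.
    log_post = np.empty((len(c0), 2))
    for k in range(2):
        m, s = noisy_mean[k, 0], noisy_std[k, 0]
        log_post[:, k] = np.log(priors[k]) - np.log(s) - 0.5 * ((c0 - m) / s) ** 2
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Class-dependent linear equalization, interpolated with the posteriors.
    out = np.zeros_like(feats)
    for k in range(2):
        equalized_k = clean_mean[k] + (feats - noisy_mean[k]) * (clean_std[k] / noisy_std[k])
        out += post[:, [k]] * equalized_k
    return out, post
```

When the noisy and clean class statistics coincide, both linear mappings reduce to the identity and the features are returned unchanged, which is a convenient sanity check.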

  10. Parametric Equalization (3)

  11. Parametric Equalization (4) • In comparison with HEQ, PEQ transformations are smoother • For C0 a monotonic transformation is obtained • For other coefficients, the interpolated transformation is not monotonic

  12. Parametric Equalization (5) • BASE • MFCC_0_D_A_Z (39 components) • HEQ • Quantile-based CDF transformation • Clean reference • Implemented over MFCC_0 / CMS and regressions computed after HEQ • AFE • Standard implementation • PEQ • Clean reference • Implemented over MFCC_0 / CMS and regressions computed after PEQ
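A quantile-based CDF transformation against a clean reference, as used for the HEQ baseline, can be sketched as follows. This is a generic illustration rather than the configuration evaluated in the deck: the number of quantiles, the use of whole-array statistics instead of a segmental window, and the function name `heq_quantile` are all assumptions.

```python
import numpy as np


def heq_quantile(feats, ref, n_quantiles=100):
    """Map each feature component so that its CDF matches the reference CDF.

    feats: (T, D) features to equalize.
    ref:   (R, D) clean reference features defining the target distribution.
    """
    feats = np.asarray(feats, dtype=float)
    ref = np.asarray(ref, dtype=float)
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    out = np.empty_like(feats)
    for d in range(feats.shape[1]):
        q_src = np.quantile(feats[:, d], probs)  # quantiles of the data to equalize
        q_ref = np.quantile(ref[:, d], probs)    # matching quantiles of the reference
        # Piecewise-linear, monotonic mapping between matching quantiles;
        # values beyond the outermost quantiles are clamped by np.interp.
        out[:, d] = np.interp(feats[:, d], q_src, q_ref)
    return out
```

Each component is processed independently, which is the component-by-component behaviour that the deck later contrasts with VTS.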

  13. Parametric Equalization (6) • Current work • Development of an on-line version • Relax the diagonal covariance assumption • Investigate the normalization of dynamic features • Use a more detailed model of speech frames (i.e., more than one Gaussian)

  14. Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation (LOQ) • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

  15. Bispectrum-based VAD (1) • Motivations: • Ability of higher order statistics to detect signals in noise • Polyspectra methods rely on an a priori knowledge of the input processes • Issues to be addressed: • Computationally expensive • Variance of the bispectrum estimators is much higher than that of power spectral estimators for identical data record size • Solution: Integrated bispectrum • J. K. Tugnait, “Detection of non-Gaussian signals using integrated polyspectrum,” IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994. • Computationally efficient and reduced variance statistical test based on the integrated polyspectra • Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise

  16. Bispectrum-based VAD (2) • Integrated bispectrum: • Defined as a cross spectrum between the signal and its square, and therefore a function of a single frequency variable • Benefits: • Its computation as a cross spectrum leads to significant computational savings • The variance of the estimator is of the same order as that of the power spectrum estimator • Properties • The bispectrum of a Gaussian process is identically zero, so its integrated bispectrum is zero as well
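The definition as a cross spectrum between the signal and its square can be illustrated with a block-averaged estimator. This is a sketch under assumptions: the block length, the per-block mean removal, the normalization by the block length, and the conjugation convention are choices made here, and `integrated_bispectrum` is a hypothetical name.

```python
import numpy as np


def integrated_bispectrum(x, block_len=256):
    """Block-averaged cross spectrum between the squared signal and the signal.

    Returns a complex array of length block_len // 2 + 1 (one-sided spectrum).
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_len
    blocks = x[: n_blocks * block_len].reshape(n_blocks, block_len)
    blocks = blocks - blocks.mean(axis=1, keepdims=True)  # zero-mean signal
    squared = blocks ** 2
    squared = squared - squared.mean(axis=1, keepdims=True)  # zero-mean square
    X = np.fft.rfft(blocks, axis=1)
    Y = np.fft.rfft(squared, axis=1)
    # Averaging the cross periodogram over blocks reduces the estimator variance.
    return (Y * np.conj(X)).mean(axis=0) / block_len


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = rng.standard_normal(256 * 64)
    skewed = rng.exponential(1.0, 256 * 64) - 1.0  # non-Gaussian, nonzero third moment
    print(np.abs(integrated_bispectrum(gaussian)).mean())
    print(np.abs(integrated_bispectrum(skewed)).mean())
```

For the Gaussian input the estimate fluctuates around zero, while the skewed input yields a clearly larger magnitude, which is the property the detector exploits.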

  17. Bispectrum-based VAD (3) • Two alternatives explored for formulating the decision rule: • Estimation by block averaging: • MO-LRT • Given a set of N= 2m+1 consecutive observations:
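The formulas of this slide were not recovered in the transcript. As a hedged illustration of the multiple-observation idea only, the sketch below assumes that a per-frame log-likelihood ratio between the speech and non-speech hypotheses is already available and sums it over the N = 2m+1 frames centred on the current one. The function names, the edge padding at the utterance boundaries, and the fixed threshold are assumptions, not details taken from the deck.

```python
import numpy as np


def mo_lrt_scores(frame_llr, m):
    """Sum per-frame log-likelihood ratios over a sliding window of 2m+1 frames."""
    frame_llr = np.asarray(frame_llr, dtype=float)
    padded = np.pad(frame_llr, m, mode="edge")  # repeat the boundary frames
    window = np.ones(2 * m + 1)
    return np.convolve(padded, window, mode="valid")  # same length as the input


def mo_lrt_decide(frame_llr, m, threshold):
    """Flag a frame as speech when the windowed statistic exceeds the threshold."""
    return mo_lrt_scores(frame_llr, m) > threshold
```

Using several neighbouring observations smooths the decision and delays it by m frames, since the window needs m future frames.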

  18. Bispectrum-based VAD (4) • Likelihoods • Variances

  19. Bispectrum-based VAD results (1)

  20. Bispectrum-based VAD results (2)

  21. Bispectrum-based VAD results (3)

  22. Schedule • HIWIRE database evaluations • Non-linear feature normalization • ECDF segmental implementation (LOQ) • Parametric equalization • Robust VAD • Bispectrum-based VAD • Model-based feature compensation • VTS results on AURORA4 • Including uncertainty caused by noise

  23. Schedule • Model-based feature compensation • VTS: results on AURORA4 • VTS formulation • VTS vs non linear feature normalization procedures • VTS results on AURORA 4 • Including uncertainty caused by noise • Including uncertainty in noise compensation • Wiener filtering + uncertainty: results on Aurora 2 • Wiener filtering + uncertainty: results on Aurora 4 • VTS + uncertainty: formulation • Numerical integration of probabilities: formulation

  24. VTS formulation • VTS: Vector Taylor Series approach to remove additive (and channel) noise • References: • P.J. Moreno. “Speech recognition in noisy environments” Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996. • A. de la Torre. “Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla” [Techniques for improving the representation in automatic speech recognition systems] Ph.D. Thesis, University of Granada, Spain, Apr. 1999.

  25. VTS formulation • VTS provides an estimation of the clean speech in a statistical framework: • Log-FBO domain, assumed additive noise: • Effect of noise described using the “correction function” g():

  26. VTS formulation • Auxiliary functions f() and h(): 1st and 2nd derivatives: • VTS provides estimation of noisy-speech Gaussian given the clean-speech and the noise Gaussians: • Noisy-speech Gaussian obtained with the expected values:

  27. VTS formulation • Noisy-speech Gaussian: formulas: • Models for noise and clean speech:
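The equations of the VTS slides were not recovered in the transcript. The sketch below uses the standard log-domain relation for additive noise, y = log(exp(x) + exp(n)) = x + g(n - x) with g(d) = log(1 + exp(d)), and takes f and h as the first and second derivatives of g. The sign convention of the argument, the second-order correction to the mean, and the diagonal (per-component) treatment are assumptions that may differ from the authors' formulation.

```python
import numpy as np


def g(d):
    """Correction function: log(1 + exp(d))."""
    return np.logaddexp(0.0, d)


def f(d):
    """First derivative of g (logistic function)."""
    return 1.0 / (1.0 + np.exp(-d))


def h(d):
    """Second derivative of g."""
    fd = f(d)
    return fd * (1.0 - fd)


def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n, second_order=True):
    """Approximate mean and variance of y = x + g(n - x) for independent
    Gaussian x (clean speech) and n (noise), expanding around (mu_x, mu_n)."""
    d = mu_n - mu_x
    fd = f(d)
    mu_y = mu_x + g(d)
    if second_order:
        mu_y = mu_y + 0.5 * h(d) * (var_x + var_n)
    # dy/dx = 1 - f and dy/dn = f at the expansion point.
    var_y = (1.0 - fd) ** 2 * var_x + fd ** 2 * var_n
    return mu_y, var_y
```

When the noise mean lies far below the speech mean, f tends to 0 and the noisy Gaussian coincides with the clean one; when the noise dominates, f tends to 1 and it coincides with the noise Gaussian.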

  28. VTS formulation • Model for clean speech provides the model for noisy speech, and also P(k|y) (posterior probability of each Gaussian): • Estimation of clean speech:
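The posterior P(k|y) and the clean-speech estimate can be sketched as follows. The formulas on this slide were not recovered, so this uses a common zeroth-order form, in which the correction g(mu_n - mu_x,k) of each Gaussian is subtracted from y with weight P(k|y). The first-order noisy-speech Gaussians, the diagonal covariances, and the name `vts_clean_estimate` are assumptions.

```python
import numpy as np


def vts_clean_estimate(y, weights, mu_x, var_x, mu_n, var_n):
    """MMSE-style clean-speech estimate from one noisy log-FBO vector.

    y: (D,) noisy observation.
    weights: (K,) mixture weights of the clean-speech model.
    mu_x, var_x: (K, D) means and variances of the clean-speech Gaussians.
    mu_n, var_n: (D,) mean and variance of the single noise Gaussian.
    Returns the estimate (D,) and the posteriors P(k|y) (K,).
    """
    d = mu_n - mu_x                      # (K, D)
    corr = np.logaddexp(0.0, d)          # g(mu_n - mu_x) for each Gaussian
    fd = 1.0 / (1.0 + np.exp(-d))        # derivative of g at the expansion point
    mu_y = mu_x + corr                   # noisy-speech means
    var_y = (1.0 - fd) ** 2 * var_x + fd ** 2 * var_n  # noisy-speech variances

    # Posterior of each Gaussian given y, computed in the log domain.
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var_y) + (y - mu_y) ** 2 / var_y, axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()

    # Subtract the expected correction from the observation.
    x_hat = y - post @ corr
    return x_hat, post
```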

  29. VTS vs non-linear feature normalization • VTS: • Statistical framework: • Model for noise in log-FBO domain: 1 Gaussian PDF • Model for clean speech in log-FBO domain: Gaussian mixture • Noise assumed to be additive in FBO domain • Accurate description of noise process → ACCURATE COMPENSATION • Non-linear feature normalization: • No a priori assumption • Component-by-component → MORE FLEXIBLE, LESS ACCURATE

  30. VTS results on AURORA 4

  31. Including uncertainty in noise compensation • Noise is a random process: we do not know n, only p(n) • Then, from an observation y we cannot find x, only p(x|y,λx,λn), where λx and λn denote the clean-speech and noise models • Usually, compensation procedures provide E[x|y,λx,λn] • What about the uncertainty of x? • Mean and variance of x:

  32. Including uncertainty in noise compensation

  33. Including uncertainty in noise compensation • An approach for the estimation of the variance: • Evaluation of HMM Gaussians:
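The formula for evaluating the HMM Gaussians was not recovered in the transcript. A common way to account for the uncertainty of the estimate, shown here as an assumption rather than as the authors' method, is to integrate the HMM Gaussian over a Gaussian posterior of the clean feature. For diagonal covariances this amounts to adding the variance of the estimate to the variance of the model Gaussian.

```python
import numpy as np


def log_gaussian_with_uncertainty(x_hat, var_hat, mean, var):
    """Log of the integral of N(x; x_hat, var_hat) * N(x; mean, var) over x,
    which equals log N(x_hat; mean, var + var_hat) for diagonal covariances."""
    x_hat, var_hat = np.asarray(x_hat, dtype=float), np.asarray(var_hat, dtype=float)
    mean, var = np.asarray(mean, dtype=float), np.asarray(var, dtype=float)
    total = var + var_hat  # model variance inflated by the estimation variance
    return -0.5 * np.sum(np.log(2.0 * np.pi * total) + (x_hat - mean) ** 2 / total)
```

With zero estimation variance this reduces to the usual Gaussian evaluation. For unreliable components the inflated variance flattens the likelihood, so a poorly estimated feature that falls far from the mean is penalized less.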

  34. Wiener filt. + uncertainty: results on AURORA 2 • Preliminary results with Wiener filtering: • Results on Aurora 2 with Wiener filtering + uncertainty

  35. Wiener filter + uncertainty: results on AURORA 4

  36. VTS + uncertainty: formulation • VTS based estimation of clean speech: • VTS based estimation of variance:

  37. Numerical integration of probabilities: formulation • Computation of expected values: • Numerical integration of expected values:

  38. HIWIRE MEETING Athens, November 3-4, 2005 José C. Segura, Ángel de la Torre