Robust Speech Recognition

Presentation Transcript


  1. Robust Speech Recognition V. Barreaud, LORIA

  2. Mismatch Between Training and Testing • mismatch degrades recognition scores • causes of mismatch • Speech Variation • Inter-Speaker Variation

  3. Robust Approaches • three categories • noise-resistant features (speech var.) • speech enhancement (speech var. + inter-speaker var.) • model adaptation for noise (speech var. + inter-speaker var.) [Diagram: recognition system with feature encoding, model training and testing paths for speakers A and B, producing a word sequence]

  4. Contents • Overview • Noise-resistant features • Speech enhancement • Model adaptation • Stochastic Matching • Our current work

  5. Noise-Resistant Features • Acoustic representation • Emphasis on less affected evidence • Models inspired by the auditory system • Filter banks, loudness curve, lateral inhibition • Slow variation removal • Cepstral Mean Normalization, time derivatives • Linear Discriminant Analysis • Searches for the best parameterization
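
A minimal numpy sketch of the two slow-variation-removal techniques named on this slide, Cepstral Mean Normalization and time derivatives; the array layout (frames x coefficients) and the regression-window width are illustrative assumptions, not details taken from the presentation.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance cepstral mean, removing slowly varying,
    channel-like components from a (frames x coefficients) array."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def delta_features(cepstra, width=2):
    """Append simple time derivatives (delta coefficients) computed with a
    regression window of +/- `width` frames; edges are padded by repetition."""
    padded = np.pad(cepstra, ((width, width), (0, 0)), mode="edge")
    T = len(cepstra)
    num = sum(k * (padded[width + k:T + width + k] - padded[width - k:T + width - k])
              for k in range(1, width + 1))
    den = 2 * sum(k * k for k in range(1, width + 1))
    return np.hstack([cepstra, num / den])
```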

  6. Speech Enhancement • Parameter mapping • stereo data • observation subspace • Bayesian estimation • stochastic modeling of speech and noise • Template-based estimation • restriction to a subspace • output is noise-free • various templates and combination methods • Spectral Subtraction • noise and speech uncorrelated • slowly varying noise
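
A hedged sketch of magnitude-domain spectral subtraction under the assumptions listed on the slide (noise uncorrelated with the speech and slowly varying); the spectral floor, the over-subtraction factor and the use of leading frames as a noise estimate are illustrative choices, not the presentation's.

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.01, over=1.0):
    """Subtract an estimated noise magnitude spectrum from a noisy
    magnitude spectrogram (frames x bins), flooring the result so no
    bin goes negative (the usual source of "musical noise" artifacts)."""
    cleaned = noisy_mag - over * noise_mag
    return np.maximum(cleaned, floor * noisy_mag)

# Usage sketch: if the first 10 frames of the spectrogram S contain no
# speech, they can provide the slowly varying noise estimate.
# noise = S[:10].mean(axis=0)
# S_clean = spectral_subtraction(S, noise)
```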

  7. Model Adaptation for Noise • Decomposition of HMMs or PMC • Viterbi algorithm searches an N×M-state HMM • noise and speech recognized simultaneously • complex noises recognized • State-dependent Wiener filtering • Wiener filtering in the spectral domain struggles with non-stationarity • HMMs divide speech into quasi-stationary segments • Wiener filters specific to each state • Discriminative training • the classical technique trains models independently • error-corrective training • minimum classification error training • Training data contamination • training set corrupted with noisy speech • depends on the test environment • lower discriminative scores
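
The last bullet, training data contamination, can be illustrated with a small mixing routine; the target-SNR interface and the way the noise recording is looped to the speech length are assumptions for the sketch, since the presentation does not specify them.

```python
import numpy as np

def contaminate(speech, noise, snr_db):
    """Mix a noise recording into clean training speech at a target SNR,
    so that models are trained on data resembling the test environment."""
    noise = np.resize(noise, speech.shape)          # loop/crop noise to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise
```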

  8. Stochastic Matching: Introduction • General framework • in feature space • in model space

  9. Stochastic Matching: General Framework • HMM models Λ_X estimated on the training space X • Y = {y_1, …, y_T}, the observations in the testing space • a transformation G_ν relates the two spaces; the word sequence W and the transformation parameters ν are estimated jointly as (Ŵ, ν̂) = argmax_{W,ν} p(Y | ν, W, Λ_X) P(W)

  10. Stochastic Matching: In Feature Space • Estimation step: auxiliary function • Maximization step
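
The slide leaves the formulas implicit; a hedged reconstruction of the two EM steps, writing S for the state sequence, C for the mixture-component sequence and ν for the transformation parameters, is:

$$
Q(\nu' \mid \nu) = \sum_{S,C} p(S, C \mid Y, \nu, \Lambda_X)\,\log p(Y, S, C \mid \nu', \Lambda_X),
\qquad
\nu \leftarrow \arg\max_{\nu'} Q(\nu' \mid \nu).
$$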

  11. Stochastic Matching: In Feature Space (2) • Simple distortion function: x_t = y_t - b • Computation of the simple bias
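
For this additive bias with diagonal Gaussian mixtures, the maximization step has a closed form: a precision-weighted average of the residuals between the observations and the Gaussian means. A numpy sketch, assuming the occupation probabilities have already been gathered into one (frames x Gaussians) matrix:

```python
import numpy as np

def estimate_bias(Y, gamma, means, variances):
    """One re-estimate of the additive bias b in x_t = y_t - b.

    Y:         (T x D) test observations
    gamma:     (T x G) occupation probabilities of each Gaussian, from a
               forward-backward or Viterbi pass with the current models
    means:     (G x D) Gaussian means
    variances: (G x D) diagonal Gaussian variances
    """
    residuals = (Y[:, None, :] - means[None, :, :]) / variances[None, :, :]
    num = np.einsum('tg,tgd->d', gamma, residuals)     # precision-weighted sum
    den = gamma.sum(axis=0) @ (1.0 / variances)        # accumulated precisions
    return num / den
```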

  12. Stochastic Matching: In Model Space • random additive bias sequence B = {b_1, …, b_T}, independent of the speech • stochastic process with mean μ_b and diagonal covariance Σ_b

  13. On-Line Frame-Synchronous Noise Compensation • Relies on the stochastic matching method • Transformation parameters estimated along with the optimal path • Uses forward probabilities [Diagram: each incoming observation y_t is transformed into z_t with the current bias, decoded, and the bias is then re-estimated for the next frame]

  14. Theoretical Framework and Issue • On-line frame-synchronous processing • cascade of errors • Classical stochastic matching: 1. Initialize the bias of the first frame, b_0 = 0 2. Compute the occupation probabilities and then b 3. Transform the next frame with b 4. Go to the next frame
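
A compact sketch of the frame-synchronous loop described on slides 13 and 14, assuming a single diagonal Gaussian per state so the normalized forward probabilities can serve directly as weights; the model quantities (means, variances, transitions, init) are placeholders, not the models used in the presentation.

```python
import numpy as np

def frame_synchronous_compensation(Y, means, variances, transitions, init):
    """On-line bias compensation: the bias estimated from the frames seen
    so far (weighted by forward probabilities) is applied to each incoming
    frame before decoding, then re-estimated."""
    T, D = Y.shape
    b = np.zeros(D)                               # step 1: initial bias is zero
    num, den = np.zeros(D), np.zeros(D)
    alpha = init.copy()
    Z = np.empty_like(Y)
    for t in range(T):
        z = Y[t] - b                              # step 3: transform incoming frame
        Z[t] = z
        # Diagonal-Gaussian likelihood of the compensated frame in each state.
        lik = np.exp(-0.5 * np.sum((z - means) ** 2 / variances
                                   + np.log(2 * np.pi * variances), axis=1))
        alpha = (alpha @ transitions) * lik if t else init * lik
        alpha /= alpha.sum() + 1e-12              # normalized forward probabilities
        # Step 2: accumulate precision-weighted residuals and update b.
        num += alpha @ ((Y[t] - means) / variances)
        den += alpha @ (1.0 / variances)
        b = num / den
    return Z, b
```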

  15. Viterbi Hypothesis vs. Linear Combination • The Viterbi hypothesis takes into account only the "most probable" state and Gaussian component • The linear combination weights all states [Diagram: states at frames t and t+1]
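
The difference between the two weightings can be shown on a single frame's contribution to the bias statistics; the function name and the array shapes are illustrative only.

```python
import numpy as np

def frame_bias_statistics(y, posteriors, means, variances, viterbi=True):
    """Contribution of one frame y to the bias numerator/denominator.

    viterbi=True keeps only the most probable Gaussian (Viterbi hypothesis);
    viterbi=False combines all components linearly, weighted by posteriors.
    """
    if viterbi:
        k = int(np.argmax(posteriors))
        num = (y - means[k]) / variances[k]
        den = 1.0 / variances[k]
    else:
        num = posteriors @ ((y - means) / variances)
        den = posteriors @ (1.0 / variances)
    return num, den
```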

  16. Experiments • Phone numbers recorded in a moving car • Forced Align • transcription + optimum path • Free Align • optimum path • Wild Align • no data

  17. Perspectives • Error recovery problem • a forgetting process • a model of the distortion function • environmental cues • More elaborate transformations
