
Noise Reduction in Speech Recognition






Presentation Transcript


  1. Noise Reduction in Speech Recognition Professor: Jian-Jiun Ding Student: Yung Chang 2011/05/06

  2. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  3. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  4. Mel Frequency Cepstral Coefficients (MFCC) • The most commonly used feature in speech recognition • Advantages: high accuracy and low complexity • 39 dimensions

  5. Mel Frequency Cepstral Coefficients (MFCC) • The framework of feature extraction: speech signal x(n) → Pre-emphasis → x'(n) → Window → xt(n) → DFT → At(k) → Mel filter-bank → Yt(m) → Log(| |²) → Y't(m) → IDFT → yt(j) → MFCC, with the frame energy and derivatives appended

  6. Pre-emphasis • Pre-emphasis of the spectrum at higher frequencies: x[n] → Pre-emphasis → x'[n]
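A minimal sketch of the pre-emphasis step as a first-order high-pass filter; the coefficient 0.97 is a common choice assumed for illustration, not a value taken from the slides:

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    """x'[n] = x[n] - a * x[n-1]: boosts high frequencies before the DFT."""
    x = np.asarray(x, dtype=float)
    # Keep the first sample as-is, difference the rest against the previous sample.
    return np.append(x[0], x[1:] - a * x[:-1])
```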

  7. End-point Detection (Voice Activity Detection) • Separate the speech segments from the noise (silence) segments
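End-point detection can be sketched with a simple short-time-energy threshold; the frame length and threshold below are illustrative assumptions rather than values from the slides:

```python
import numpy as np

def detect_endpoints(x, frame_len=256, threshold=0.01):
    """Return a boolean flag per frame: True where short-time energy
    exceeds the threshold (speech), False otherwise (silence/noise)."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)   # average power per frame
    return energy > threshold
```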

  8. Windowing • Rectangular window vs. Hamming window
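The two windows compared on the slide can be sketched directly from their defining formulas (frame length 256 is an assumed example):

```python
import numpy as np

N = 256                       # frame length in samples (assumed)
n = np.arange(N)
rect = np.ones(N)             # rectangular window: sharp edges, strong spectral leakage
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))   # Hamming: tapered edges

frame = np.sin(2 * np.pi * 0.05 * n)   # a toy frame of signal
windowed = frame * hamming             # applied to each frame before the DFT
```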

  9. Mel-filter bank • After the DFT we get the spectrum (amplitude vs. frequency)

  10. Mel-filter bank • Triangular shape in frequency (overlapped) • Uniformly spaced below 1 kHz • Logarithmic scale above 1 kHz
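One common construction of such a triangular mel filter bank; the filter count, FFT size, and sampling rate below are assumptions for illustration:

```python
import numpy as np

def hz_to_mel(f):
    """Mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_filters=26, n_fft=512, fs=16000):
    """Triangular, 50%-overlapping filters uniformly spaced on the mel axis."""
    mel_pts = np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2)
    hz_pts = 700.0 * (10 ** (mel_pts / 2595.0) - 1.0)      # back to Hz
    bins = np.floor((n_fft + 1) * hz_pts / fs).astype(int)  # DFT bin indices
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):          # rising edge of the triangle
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling edge of the triangle
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    return fbank
```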

  11. Delta Coefficients • 1st/2nd-order differences extend the 13 static dimensions to 39 (13 static + 13 first-order + 13 second-order)
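Delta coefficients are often computed as a regression over neighboring frames; a sketch assuming a ±2-frame window (the window size is an assumption):

```python
import numpy as np

def delta(feat, M=2):
    """First-order dynamic features: weighted differences over +/-M frames.
    feat has shape (frames, dims); edges are handled by repeating boundary frames."""
    T = feat.shape[0]
    padded = np.pad(feat, ((M, M), (0, 0)), mode='edge')
    denom = 2 * sum(m * m for m in range(1, M + 1))
    return np.array([
        sum(m * (padded[t + M + m] - padded[t + M - m]) for m in range(1, M + 1))
        for t in range(T)
    ]) / denom
```

Applying `delta` twice gives the second-order (delta-delta) coefficients, so 13 static dimensions grow to 39.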

  12. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  13. Mismatch in Statistical Speech Recognition • The original speech x[n] reaches the recognizer as a distorted signal y[n]: convolutional noise h[n] (microphone distortion, phone/wireless channel) plus additive noises n1(t) and n2(t) • Feature extraction turns y[n] into feature vectors O = o1 o2 … oT; the search stage, using acoustic models, a lexicon, and a language model (trained from a speech corpus and a text corpus), outputs the sentence W = w1 w2 … wR • Possible approaches for acoustic environment mismatch: speech enhancement on y[n], feature-based approaches on the feature vectors, and model-based approaches on the acoustic models (applied in both training and recognition)

  14. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  15. Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN) • Cepstral Mean Subtraction (CMS) targets convolutional noise • Convolutional noise in the time domain becomes additive in the cepstral domain: y[n] = x[n]*h[n] ⇒ y = x + h, where x, y, h are cepstra • Most convolutional noise changes only very slightly over a reasonable time interval, so x = y − h • Assuming E[x] = 0, we have E[y] = h • xCMS = y − E[y]

  16. Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN) • CMVN: the variance is normalized as well • xCMVN = xCMS / [Var(xCMS)]^(1/2)
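The two normalizations above can be sketched directly from their formulas, applied per utterance and per cepstral dimension:

```python
import numpy as np

def cms(c):
    """Cepstral Mean Subtraction: remove the per-utterance mean of each
    cepstral dimension, cancelling (slowly varying) convolutional noise h."""
    return c - c.mean(axis=0)

def cmvn(c):
    """Cepstral Mean and Variance Normalization: CMS plus unit-variance scaling."""
    x = cms(c)
    return x / np.maximum(x.std(axis=0), 1e-8)   # guard against zero variance
```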

  17. Feature-based Approach: HEQ (Histogram Equalization) • The whole distribution is equalized, not just its first two moments • y = CDFy⁻¹[CDFx(x)] • e.g. the value x = 3 at cumulative probability P = 0.2 maps to y = 3.5 at the same cumulative probability
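A minimal quantile-mapping sketch of HEQ for a single feature dimension; the rank-based empirical CDF estimate is one of several possible choices:

```python
import numpy as np

def heq(x, ref):
    """Map each value of x to the reference value at the same empirical
    CDF position: y = CDF_ref^{-1}[CDF_x(x)]. Assumes len(x) > 1."""
    ranks = np.argsort(np.argsort(x))            # CDF position (rank) of each x
    ref_sorted = np.sort(ref)
    idx = np.round(ranks / (len(x) - 1) * (len(ref) - 1)).astype(int)
    return ref_sorted[idx]                       # reference quantiles at those positions
```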

  18. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  19. Feature-based Approach: RASTA • Each feature dimension, viewed as a signal evolving over time, has its own spectrum over modulation frequency • Perform filtering on these signals (temporal filtering)

  20. Feature-based Approach: RASTA (Relative Spectral temporal filtering) • Assumption: the rate of change of noise often lies outside the typical rate of change of the vocal tract shape • A specially designed temporal filter (over modulation frequency, in Hz) emphasizes speech
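A sketch of such a temporal filter applied to each feature trajectory; the coefficients below follow the commonly cited RASTA band-pass filter and should be treated as an assumption rather than the slides' exact design:

```python
import numpy as np

def rasta_filter(c):
    """Band-pass each feature trajectory with an IIR filter of the RASTA
    form: FIR numerator 0.1*[2, 1, 0, -1, -2], single pole at 0.98.
    Constant components (e.g. a fixed channel) are suppressed over time."""
    num = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    pole = 0.98
    T, D = c.shape
    out = np.zeros((T, D))
    for j in range(D):                 # filter each dimension independently
        x = c[:, j]
        y = 0.0
        for t in range(T):
            fir = sum(num[k] * x[t - k] for k in range(5) if t - k >= 0)
            y = fir + pole * y         # one-pole feedback
            out[t, j] = y
    return out
```

On a constant (DC) trajectory the output decays toward zero, which is exactly the suppression of slowly varying convolutional effects the slide describes.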

  21. Data-driven Temporal Filtering • PCA (Principal Component Analysis): project the data onto the direction e of maximum variance
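A sketch of deriving temporal filter taps by PCA over windowed feature trajectories; the function name and the one-window-per-row layout are assumptions for illustration:

```python
import numpy as np

def pca_filters(segments, n_components=1):
    """Each row of `segments` is one L-frame window of a feature trajectory.
    The leading eigenvectors of the covariance matrix serve as data-derived
    temporal filters (directions of maximum variance)."""
    X = segments - segments.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # re-sort descending
    return eigvecs[:, order[:n_components]].T  # each row is one filter
```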

  22. Data-driven Temporal Filtering • We should not guess our filter, but obtain it from data • The original feature stream yt is convolved (over a window of L frames, indexed by frame index) with a bank of filters B1(z), B2(z), …, Bn(z), producing filtered streams zk(1), zk(2), zk(3), …

  23. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  24. Speech Enhancement: Spectral Subtraction (SS) • Produce a better signal by trying to remove the noise • Useful for listening purposes or recognition purposes • Noise n[n] changes quickly and unpredictably in the time domain, but its spectrum N(ω) changes relatively slowly, so an average noise spectrum can be estimated and subtracted
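A minimal power-spectral-subtraction sketch: estimate the average noise power spectrum from a noise-only stretch, subtract it per frame with a floor at zero, and resynthesize with the noisy phase. Frame length and the rectangular framing are simplifying assumptions:

```python
import numpy as np

def spectral_subtraction(y, noise, n_fft=256):
    """|X(w)|^2 = max(|Y(w)|^2 - E[|N(w)|^2], 0), keeping the noisy phase."""
    # Average noise power spectrum from a noise-only recording.
    n_frames = len(noise) // n_fft
    N = np.mean(np.abs(np.fft.rfft(
        noise[:n_frames * n_fft].reshape(n_frames, n_fft), axis=1)) ** 2, axis=0)
    # Frame the noisy signal and subtract in the power domain.
    frames = y[:len(y) // n_fft * n_fft].reshape(-1, n_fft)
    Y = np.fft.rfft(frames, axis=1)
    mag2 = np.maximum(np.abs(Y) ** 2 - N, 0.0)       # floor negatives at zero
    X = np.sqrt(mag2) * np.exp(1j * np.angle(Y))     # reuse the noisy phase
    return np.fft.irfft(X, n=n_fft, axis=1).reshape(-1)
```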

  25. Outline • Mel Frequency Cepstral Coefficient (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

  26. Conclusions • We gave a general framework of how to extract speech features • We introduced the mainstream robustness techniques • There are still numerous noise reduction methods (left to the references)

  27. References

  28. Q & A
