Speech Enhancement for ASR

  1. Speech Enhancement for ASR by Hans Hwang, 8/23/2000. References: 1. Alan V. Oppenheim et al., "Multi-Channel Signal Separation by Decorrelation," IEEE Trans. on ASSP, pp. 405-413, 1993. 2. Yunxin Zhao et al., "Adaptive Co-channel Speech Separation and Recognition," IEEE Trans. on SAP, pp. 138-151, 1999. 3. Ing Yang Soon et al., "Noisy Speech Enhancement Using Discrete Cosine Transform," Speech Communication, pp. 249-257, 1998.

  2. Outline • Signal Separation by S-ADF/LMS • Speech Enhancement by DCT • Residual Signal Reduction • Experimental Results

  3. Speech Signal Separation • Introduction -To recover the desired signals and identify the unknown system from the observed signals -Speech signals recovered by SSS have a higher SNR, which improves speech recognition accuracy -Specifically, we consider the two-channel case

  4. SSS (cont’d) • Two-channel model description -A and B are the cross-coupling effects between the channels; the transfer function of each channel itself is ignored. -xi(t) is the i-th source signal and yi(t) is the i-th acquired signal.
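The two-channel model can be made concrete with a short simulation. This is a minimal sketch, assuming the common co-channel form y1 = x1 + A*x2 and y2 = x2 + B*x1, where * is convolution with FIR cross-coupling filters A and B. The function name `mix` and the choice of coupling taps starting at lag 1 (no instantaneous coupling) are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def mix(x1, x2, A, B):
    """Two-channel cross-coupling model (assumed form):
        y1(t) = x1(t) + sum_k A[k] * x2(t - 1 - k)
        y2(t) = x2(t) + sum_k B[k] * x1(t - 1 - k)
    Per-channel transfer functions are ignored, as on the slide.
    Taps starting at lag 1 is an illustrative assumption."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    hA = np.concatenate(([0.0], A))  # zero lag-0 tap: no instantaneous coupling
    hB = np.concatenate(([0.0], B))
    y1 = x1 + np.convolve(x2, hA)[: len(x2)]
    y2 = x2 + np.convolve(x1, hB)[: len(x1)]
    return y1, y2


# Usage: an impulse in source 1 with source 2 silent.
x1 = np.array([1.0, 0.0, 0.0, 0.0])
x2 = np.zeros(4)
y1, y2 = mix(x1, x2, A=[0.4, 0.2], B=[0.3, -0.1])
# y1 equals x1; y2 contains the taps of B, delayed by one sample.
```

When one source is silent, the other channel is just a filtered copy of the active source, which is why only one of the two cross-coupling filters can be observed in that situation.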

  5. SSS (cont’d) • Source separation system (separates the source signals from the acquired signals): its filters are called decoupling filters and are modeled as FIR filters

  6. SSS by ADF • Calculate the FIR coefficients with the adaptive decorrelation filter (ADF) proposed by A. V. Oppenheim in 1993 -The objective is to design the decoupling filters such that the estimated signals are uncorrelated. -The decoupling filter coefficients are estimated iteratively from the previously estimated coefficients and the current observations.
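The iterative decorrelation idea can be sketched as a stochastic-gradient loop. This is a minimal sketch, not the exact update from the cited papers: it assumes feedforward decoupling filters v1 = y1 - a*y2 and v2 = y2 - b*y1, strictly causal taps (lags 1..L), a fixed adaptation gain `mu`, and white test sources. The names `mix` and `adf_separate` are hypothetical. Each step nudges a and b in the direction that drives the cross-correlations E[v1(t) v2(t-k)] and E[v2(t) v1(t-k)] toward zero, using only the previous coefficients and the current samples.

```python
import numpy as np


def mix(x1, x2, A, B):
    # Assumed model: y1 = x1 + A*x2, y2 = x2 + B*x1, coupling taps at lags 1..L.
    hA = np.concatenate(([0.0], A))
    hB = np.concatenate(([0.0], B))
    return (x1 + np.convolve(x2, hA)[: len(x2)],
            x2 + np.convolve(x1, hB)[: len(x1)])


def adf_separate(y1, y2, L, mu):
    """Adaptive decorrelation filtering (sketch).
    v1(t) = y1(t) - sum_k a[k] y2(t-1-k),  v2(t) = y2(t) - sum_k b[k] y1(t-1-k).
    a and b are updated so that v1(t), v2(t-k) (and v2(t), v1(t-k)) decorrelate."""
    a, b = np.zeros(L), np.zeros(L)
    v1, v2 = np.zeros(len(y1)), np.zeros(len(y2))
    for t in range(L, len(y1)):
        y1_past = y1[t - L:t][::-1]  # [y1(t-1), ..., y1(t-L)]
        y2_past = y2[t - L:t][::-1]
        v1[t] = y1[t] - a @ y2_past
        v2[t] = y2[t] - b @ y1_past
        # Step along the instantaneous cross-correlation estimates.
        a += mu * v1[t] * v2[t - L:t][::-1]
        b += mu * v2[t] * v1[t - L:t][::-1]
    return v1, v2, a, b


rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(40000), rng.standard_normal(40000)
A, B = np.array([0.4, 0.2]), np.array([0.3, -0.1])
y1, y2 = mix(x1, x2, A, B)
v1, v2, a, b = adf_separate(y1, y2, L=2, mu=0.001)
print(np.round(a, 2), np.round(b, 2))  # a should approach A and b should approach B
```

With a = A and b = B, v1 = (1 - A*B)*x1 and v2 = (1 - B*A)*x2, so the outputs depend on one source each and are uncorrelated. Real speech is far from white, so a practical system would also need a better gain choice.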

  7. SSS by ADF (cont’d) • The closed form of the decoupling filters, where

  8. SSS by ADF (cont’d) • Choice of adaptation gain -As time goes to infinity, the adaptation gain goes to zero so that the system stays stable. -Optimal choice of the adaptation gain for system stability and convergence.
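The slide's gain formula did not survive in this transcript, so here is one purely illustrative schedule that has the two stated properties. It is a hypothetical choice, not the one from the slides. It normalizes by the combined channel power so the step size does not depend on signal level, and it decays like 1/t so that mu(t) goes to zero as t goes to infinity. The names `track_power` and `adaptation_gain` and the constants `mu0`, `tau`, and `lam` are assumptions.

```python
def track_power(p, sample, lam=0.99):
    """Exponentially weighted running estimate of a channel's power."""
    return lam * p + (1.0 - lam) * sample * sample


def adaptation_gain(t, p1, p2, mu0=0.01, tau=2000.0, eps=1e-12):
    """Hypothetical gain schedule mu(t):
    - divided by the combined power p1 + p2 (level-independent step size),
    - decaying like 1/t so that mu(t) -> 0 as t -> infinity."""
    return mu0 / ((p1 + p2 + eps) * (1.0 + t / tau))


# Usage: the gain halves after tau samples and keeps shrinking.
for t in (0, 2000, 200000):
    print(t, adaptation_gain(t, 1.0, 1.0))
```

A decaying gain trades tracking ability for a small steady-state error. If the mixing changes over time, a gain that levels off at a small floor is a common alternative.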

  9. SSS by ADF (cont’d) • The experiment of:

  10. Source Signal Detection (SSD) • Introduction -If one of the two sources is inactive, the signals estimated by ADF will be poor and will cause recognition errors. -So ASR and ADF are performed within the active region of each target signal.

  11. SSD (cont’d)

  12. SSD (cont’d) • SSD by coherence function If then If then

  13. SSD (cont’d) - decision variable -Decision Rule:

  14. SSD (cont’d) -Implementation using DFT and Result
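The coherence-based detector can be sketched with averaged DFTs. If only one source is active, one channel is (nearly) a linear filtering of the other, so the magnitude-squared coherence is close to 1 at all frequencies. With both sources active, it drops well below 1. This is a minimal sketch that assumes a Welch-style estimate (Hann window, 50% overlap). The decision variable, the mean coherence over frequency, and the `threshold=0.8` rule are hypothetical, because the slides' exact variable and rule are not preserved. This sketch only decides "one source" versus "both sources" and does not identify which source is active.

```python
import numpy as np


def msc(y1, y2, nperseg=256):
    """Magnitude-squared coherence |S12|^2 / (S11 * S22), estimated by averaging
    Hann-windowed DFTs over 50%-overlapping segments."""
    win = np.hanning(nperseg)
    s11 = np.zeros(nperseg // 2 + 1)
    s22 = np.zeros(nperseg // 2 + 1)
    s12 = np.zeros(nperseg // 2 + 1, dtype=complex)
    for start in range(0, len(y1) - nperseg + 1, nperseg // 2):
        Y1 = np.fft.rfft(win * y1[start:start + nperseg])
        Y2 = np.fft.rfft(win * y2[start:start + nperseg])
        s11 += np.abs(Y1) ** 2
        s22 += np.abs(Y2) ** 2
        s12 += Y1 * np.conj(Y2)
    return np.abs(s12) ** 2 / (s11 * s22)


def single_source_active(y1, y2, threshold=0.8):
    """Hypothetical rule: mean coherence above threshold -> only one source active."""
    return float(np.mean(msc(y1, y2))) > threshold


rng = np.random.default_rng(1)
n = 8192
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
hA, hB = np.array([0.0, 0.4, 0.2]), np.array([0.0, 0.3, -0.1])
only_x1 = (x1, np.convolve(x1, hB)[:n])  # source 2 inactive: y1 = x1, y2 = B*x1
both = (x1 + np.convolve(x2, hA)[:n], x2 + np.convolve(x1, hB)[:n])
print(single_source_active(*only_x1), single_source_active(*both))
```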

  15. SSD (cont’d)

  16. Improved Filter Estimation • Widrow’s LMS algorithm, proposed in 1975 -If A or B cannot be observed (i.e., one of the source signals is inactive), the filter estimates deviate substantially from the actual filters. -If SSD indicates that source signal 2 is inactive, only filter B is estimated and filter A is kept unchanged.

  17. Improved Filter Estimation • LMS algorithm and result
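The gated update can be sketched as follows. When source 2 is silent, y1 = x1 and y2 = B*x1, so B becomes an ordinary system-identification problem for LMS, with input y1 and desired signal y2. Meanwhile a, the estimate of A, is left untouched. This is a minimal sketch with hypothetical names (`lms_identify`, `improved_filter_update`) that reuses the lag-1..L tap convention assumed earlier. The `inactive == 1` branch follows by symmetry and is an assumption, since the slide states only the source-2 case.

```python
import numpy as np


def lms_identify(u, d, L, mu, w=None):
    """Widrow's LMS: adapt FIR taps w (lags 1..L) so that w applied to past u tracks d."""
    w = np.zeros(L) if w is None else np.array(w, dtype=float)
    e = np.zeros(len(d))
    for t in range(L, len(d)):
        u_past = u[t - L:t][::-1]  # [u(t-1), ..., u(t-L)]
        e[t] = d[t] - w @ u_past   # a priori estimation error
        w += mu * e[t] * u_past    # LMS tap update
    return w, e


def improved_filter_update(a, b, y1, y2, inactive, mu=0.01):
    """Update only the filter that is observable, given the SSD decision."""
    if inactive == 2:    # y1 = x1, y2 = B*x1: refine b, keep a unchanged
        b, _ = lms_identify(y1, y2, len(b), mu, w=b)
    elif inactive == 1:  # by symmetry: y2 = x2, y1 = A*x2: refine a, keep b
        a, _ = lms_identify(y2, y1, len(a), mu, w=a)
    return a, b


rng = np.random.default_rng(2)
x1 = rng.standard_normal(5000)
B = np.array([0.3, -0.1])
y1, y2 = x1, np.convolve(x1, np.concatenate(([0.0], B)))[:5000]
a0 = np.array([0.4, 0.2])
a, b = improved_filter_update(a0.copy(), np.zeros(2), y1, y2, inactive=2)
print(a, np.round(b, 3))  # a unchanged, b close to B
```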

  18. Experimental Results -Evaluated in terms of word recognition accuracy (WRA) and signal-to-interference ratio (SIR)
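SIR compares the energy of the target component in a signal with the energy of the interfering component. This is a minimal sketch using that standard energy-ratio definition. It assumes the target and interference components are available separately, which is possible in simulation because the mixing and separation systems are linear. The helper names are hypothetical. WRA comes from the recognizer's scoring and is not sketched here.

```python
import numpy as np


def sir_db(target, interference, eps=1e-12):
    """Signal-to-interference ratio in dB: 10 log10(sum target^2 / sum interference^2)."""
    target = np.asarray(target, dtype=float)
    interference = np.asarray(interference, dtype=float)
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(interference ** 2) + eps))


def sir_improvement_db(target_in, interf_in, target_out, interf_out):
    """SIR gain of the separation system: output SIR minus input SIR."""
    return sir_db(target_out, interf_out) - sir_db(target_in, interf_in)


# Usage: interference at one tenth of the target amplitude gives 20 dB.
print(sir_db(np.ones(4), 0.1 * np.ones(4)))
```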

  19. Experimental Results (cont’d) *717 TIMIT sentences were used to train 62 phone units. The front-end features are PLP coefficients and their dynamic features. The grammar perplexity is 105. After acoustic normalization

  20. Speech Enhancement Using the Discrete Cosine Transform • Motivation -The DCT provides significantly higher energy compaction than the DFT

  21. SE Using DCT (cont’d) -The DCT provides higher spectral resolution than the DFT -The DCT is a real transform, so its coefficients have only binary phases (0 or π, i.e., a sign). The phase does not change unless the added noise is strong.
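Both DCT claims can be checked numerically on a toy frame. This is a minimal sketch that builds the orthonormal DCT-II matrix explicitly and compares how much energy the K largest coefficients capture for the DCT and for the orthonormal DFT. The ramp test frame, K = 3, and the helper names are illustrative assumptions. For a smooth frame, the DFT's implicit periodic extension creates a jump at the frame edge that spreads energy over many coefficients. The DCT's even extension avoids that jump. The DCT coefficients are also real, so their "phase" is only a sign.

```python
import numpy as np


def dct_matrix(N):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis vector."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C


def top_k_energy_fraction(coeffs, K):
    """Fraction of the total energy held by the K largest-magnitude coefficients."""
    energy = np.sort(np.abs(coeffs) ** 2)[::-1]
    return energy[:K].sum() / energy.sum()


N = 64
x = np.arange(N, dtype=float)          # a smooth test frame (a ramp)
X_dct = dct_matrix(N) @ x              # real coefficients: phase is 0 or pi
X_dft = np.fft.fft(x, norm="ortho")    # complex coefficients, same total energy
print(top_k_energy_fraction(X_dct, 3), top_k_energy_fraction(X_dft, 3))
```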

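  22. Estimating the Signal by MMSE • Introduction -y(t)=x(t)+n(t) and Y(k)=X(k)+N(k), where Y(k), X(k), and N(k) are DCT coefficients. Assume the DCT coefficients are statistically independent, and choose the estimate that minimizes the mean-square error with respect to the original signal. -X̂(k)=E[X(k)|Y(k)], evaluated by Bayes’ rule and the signal model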

  23. MMSE (cont’d) • Estimating the source signal by decision-directed estimation (DDE), proposed by Ephraim & Malah in 1984; the weighting factor is set to 0.98 in computer simulation
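The MMSE step with decision-directed estimation can be sketched frame by frame. This is a minimal sketch under an assumption that the slides do not state: each DCT coefficient is modeled as a real, zero-mean Gaussian. Under that model, E[X|Y] is exactly the Wiener gain xi/(1+xi) times Y, where xi is the a priori SNR. The paper's own estimator may differ. The a priori SNR follows the Ephraim & Malah decision-directed rule with weighting factor 0.98. The floor `xi_min` and the name `dd_mmse` are hypothetical.

```python
import numpy as np


def dd_mmse(Y_frames, noise_var, alpha=0.98, xi_min=1e-3):
    """Frame-by-frame MMSE estimate of DCT coefficients (Gaussian-prior sketch).
    Y_frames: (num_frames, N) noisy DCT coefficients; noise_var: scalar or (N,)."""
    X_hat = np.zeros_like(Y_frames, dtype=float)
    prev = np.zeros(Y_frames.shape[1])  # clean-signal estimate from the previous frame
    for l, Y in enumerate(Y_frames):
        gamma = Y ** 2 / noise_var  # a posteriori SNR
        # Decision-directed a priori SNR: previous estimate plus current excess power.
        xi = alpha * prev ** 2 / noise_var + (1 - alpha) * np.maximum(gamma - 1.0, 0.0)
        xi = np.maximum(xi, xi_min)
        X_hat[l] = xi / (1.0 + xi) * Y  # E[X | Y] for Gaussian X and N
        prev = X_hat[l]
    return X_hat


rng = np.random.default_rng(3)
noise = rng.standard_normal((200, 64))
out_noise = dd_mmse(noise, 1.0)                        # noise only: strongly attenuated
speech = 10.0 * rng.standard_normal((200, 64))
noisy = speech + rng.standard_normal((200, 64))
out_speech = dd_mmse(noisy, 1.0)                       # high SNR: mostly passed through
print(np.sum(out_noise ** 2) / np.sum(noise ** 2), np.sum(out_speech ** 2) / np.sum(noisy ** 2))
```

The large weight on the previous frame's estimate smooths xi over time. This is what makes the decision-directed rule suppress musical noise better than using the instantaneous SNR alone.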

  24. Reduction of Residual Signal • Introduction -The more likely the source signal is present, the more reliable its estimate. -Two input states: H0: speech absent, H1: speech present -Modified filter output:

  25. Reduction of Residual Signal - where
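One way to realize the two-state idea is to scale each estimate by the posterior probability that speech is present. This is a minimal sketch under assumptions that are not from the slides. Each real coefficient is taken as Y ~ N(0, noise_var) under H0 and Y ~ N(0, (1 + xi) * noise_var) under H1, and q = P(H1)/P(H0) is a hypothetical prior ratio. The likelihood ratio is then exp(gamma * xi / (2 * (1 + xi))) / sqrt(1 + xi), with gamma = Y^2 / noise_var. The names `speech_presence_prob` and `reduce_residual` are illustrative.

```python
import numpy as np


def speech_presence_prob(Y, xi, noise_var, q=1.0):
    """P(H1 | Y) for a real coefficient under zero-mean Gaussian H0/H1 models."""
    gamma = np.asarray(Y, dtype=float) ** 2 / noise_var
    log_lr = 0.5 * gamma * xi / (1.0 + xi) - 0.5 * np.log1p(xi)  # log likelihood ratio
    z = np.clip(np.log(q) + log_lr, -50.0, 50.0)                   # keep exp() finite
    return 1.0 / (1.0 + np.exp(-z))


def reduce_residual(X_hat, Y, xi, noise_var, q=1.0):
    """Modified filter output: weight each estimate by its speech-presence probability."""
    return speech_presence_prob(Y, xi, noise_var, q) * X_hat


# Usage: a weak observation is damped, a strong one passes almost unchanged.
print(speech_presence_prob(0.0, 1.0, 1.0), speech_presence_prob(10.0, 1.0, 1.0))
```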

  26. Experimental Results • Measured in segmental SNR -White noise added -Fan noise added
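Segmental SNR averages a per-frame SNR rather than computing one global ratio, so quiet frames count as much as loud ones. This is a minimal sketch of the usual definition. The frame length of 256 samples and the per-frame clamping to [-10, 35] dB are common conventions assumed here, not values taken from the slides. The name `segmental_snr` is illustrative.

```python
import numpy as np


def segmental_snr(clean, processed, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean over non-overlapping frames of 10 log10(sum s^2 / sum (s - s_hat)^2),
    with each frame's value clamped to [lo, hi] dB."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        err = s - processed[start:start + frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        values.append(min(max(snr, lo), hi))
    return float(np.mean(values))


rng = np.random.default_rng(4)
s = rng.standard_normal(1024)
print(segmental_snr(s, 0.5 * s))  # halving the signal leaves an error of 0.5*s: about 6.02 dB
```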

  27. Experimental Results
