
Presentation Transcript


  1. Using Linear Interpolation to Improve Histogram Equalization for Speech Recognition Filipp Korkmazsky, Dominique Fohr, Irina Illina LORIA, France, ICSLP 2004 Presented by Chen-Wei Liu, 2004/11/10

  2. Outline • Introduction • The Effect of Noise • What’s Interpolated HEQ? • Linear Interpolation for HEQ • Experimental Results • Observations • Conclusions

  3. Introduction (1/4) • Histogram equalization (HEQ) is a signal normalization technique • It adjusts the statistical parameters of the test data • So that the CDF of the test data matches a target CDF estimated from the training data • Positive results with HEQ have been achieved either alone or in combination with other normalization methods • Ex. CMS, CN, SS
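The basic HEQ operation described on this slide can be sketched as a quantile mapping: each test value is passed through the test CDF and then the inverse of the training (target) CDF. This is a minimal illustrative sketch, not the paper's implementation; all names are assumptions, and the paper applies the mapping per cepstral dimension.

```python
import numpy as np

def histogram_equalize(test_feat, train_feat):
    """Map each test value through F_test, then through the inverse of
    F_train, so the equalized test CDF matches the target (training) CDF."""
    order = np.sort(test_feat)
    # Empirical test CDF value F_test(x) for every test frame
    ranks = np.searchsorted(order, test_feat, side="right") / len(test_feat)
    # Invert the training CDF at those probabilities
    return np.quantile(train_feat, ranks)
```

For example, test features drawn from a shifted, scaled distribution are mapped back onto the statistics of the training data.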

  4. Introduction (2/4)

  5. Introduction (3/4) • Recently, a new approach was suggested • Instead of considering a single target histogram, two target histograms, one for silence and one for speech, are estimated • Then an adapted target histogram is computed for each speaker • By linear interpolation between the speech and silence histograms • With a weighting coefficient "a" for the silence histogram and "1-a" for the speech histogram

  6. Introduction (4/4) • The coefficient "a" is estimated for each test speaker separately, as the fraction of silence frames in that speaker's data • This is speaker-wise • It is assumed that the global statistics of the test data do not change rapidly from one test sentence to the next • So interpolation can combine the local statistics estimated for test sentence i with the global statistics of the test sentences that precede it

  7. Linear Interpolation for HEQ (1/5) • It is assumed that the environmental conditions do not change rapidly from one test sentence to another • Combining information from multiple test sentences could therefore provide a more accurate estimate of the test conditions

  8. Linear Interpolation for HEQ (2/5) • For histogram equalization, the interpolated test histogram is matched against a target histogram • The target histogram is estimated using all frames of the training data • Linear interpolation was proposed to obtain a unique target histogram for each test sentence • By separating the silence frames from the speech frames • Forced alignment for training; two-pass decoding for testing

  9. Linear Interpolation for HEQ (3/5) • According to the silence/speech ratio, a target cumulative histogram for this sentence can be estimated as follows
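The equation on this slide did not survive the transcript. From the interpolation scheme described in the Introduction (weight "a" for the silence histogram, "1-a" for the speech histogram, with "a" the fraction of silence frames), the adapted target cumulative histogram for sentence i plausibly has the form below; the symbol names are assumptions, not the paper's notation:

```latex
F^{\mathrm{target}}_{i}(x) \;=\; a_i \, F^{\mathrm{sil}}(x) \;+\; (1 - a_i)\, F^{\mathrm{speech}}(x),
\qquad
a_i = \frac{\#\,\text{silence frames in sentence } i}{\#\,\text{frames in sentence } i}
```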

  10. Linear Interpolation for HEQ (4/5) • The interpolated test histogram for test sentence i is estimated as follows
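This slide's equation is also missing from the transcript. Following the description above (local statistics of the current sentence combined with global statistics of the preceding sentences), the interpolated test cumulative histogram is presumably of the form below; the weight symbol λ is an assumption:

```latex
F^{\mathrm{test}}_{i}(x) \;=\; \lambda \, F^{\mathrm{local}}_{i}(x) \;+\; (1 - \lambda)\, F^{\mathrm{global}}_{1..\,i-1}(x)
```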

  11. Linear Interpolation for HEQ (5/5) • For example, for frame F in dimension D of sentence S: [diagram combining the global silence histogram (sentences 1 to S-1) with the local statistics of frame F, and the global speech histogram (sentences 1 to S-1) with the local statistics of frame F]
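The two interpolations walked through on slides 8 to 11 can be sketched as simple convex combinations of cumulative histograms. This is an illustrative sketch only; the function and variable names are assumptions, not the paper's.

```python
import numpy as np

def adapted_target_cdf(sil_cdf, speech_cdf, silence_fraction):
    # Per-sentence target CDF: weight the silence and speech target CDFs
    # by the fraction of silence frames in the sentence (coefficient "a").
    return silence_fraction * sil_cdf + (1.0 - silence_fraction) * speech_cdf

def interpolated_test_cdf(local_cdf, global_cdf, weight):
    # Combine the current sentence's CDF ("local") with the CDF
    # accumulated over the preceding test sentences ("global").
    return weight * local_cdf + (1.0 - weight) * global_cdf
```

Setting the weight to 1 keeps only the current sentence's statistics, and setting it to 0 keeps only the global statistics; intermediate values trade off the two, as the paper's interpolation parameter does.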

  12. Experiments (1/4) • The experiments were conducted on VODIS • Speech data was recorded in a moving car • Recordings from 200 speakers • 22600 clean sentences for training • 6100 noisy sentences for testing • CMS was used to normalize all training and testing data • Each of the 39 phones was modeled by a 3-state HMM • Each state was represented by a mixture of 32 Gaussians

  13. Experiments (2/4) • Interpolation of HEQ with only one kind of frame

  14. Experiments (3/4) • Interpolation of HEQ with silence/speech frames

  15. Experiments (4/4) • Interpolation of HEQ with multiple speech classes

  16. Observations • Using interpolation for HEQ always gives better recognition results • Better results are obtained with a smaller number of classes when the interpolation parameter is set to zero • A larger number of classes could lead to better performance, but only a small amount of data is available for test histogram estimation

  17. Conclusions • It was found that • Weighted interpolation between the histograms of a test sentence and of past test sentences improved speech recognition performance • 49.42% → 48.59% → 44.85% • It is not certain that • More speech classes yield further improvements • Possible choices of the weighting factor • Could be a subject of future research
