
Presentation Transcript


  1. Using Linear Interpolation to Improve Histogram Equalization for Speech Recognition Filipp Korkmazsky, Dominique Fohr, Irina Illina LORIA, France, ICSLP 2004 Presented by Chen-Wei Liu, 2004/11/10

  2. Outline • Introduction • The Effect of Noise • What’s Interpolated HEQ? • Linear Interpolation for HEQ • Experimental Results • Observations • Conclusions

  3. Introduction (1/4) • Histogram equalization (HEQ) is a signal normalization technique • It adjusts the statistical parameters of the test data • So that the CDF of the test data matches a target CDF estimated from the training data • Positive results with HEQ have been achieved either alone or in combination with other normalization methods • Ex. CMS, CN, SS
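The basic HEQ operation described on this slide can be sketched as a quantile mapping: each test value is passed through the test CDF and then the inverse of the training (target) CDF. This is a minimal illustrative sketch, not the paper's implementation; all names are assumptions, and the paper applies the mapping per cepstral dimension.

```python
import numpy as np

def histogram_equalize(test_feat, train_feat):
    """Map each test value through F_test, then through the inverse of
    F_train, so the equalized test CDF matches the target (training) CDF."""
    order = np.sort(test_feat)
    # Empirical test CDF value F_test(x) for every test frame
    ranks = np.searchsorted(order, test_feat, side="right") / len(test_feat)
    # Invert the training CDF at those probabilities
    return np.quantile(train_feat, ranks)
```

For example, test features drawn from a shifted, scaled distribution are mapped back onto the statistics of the training data.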

  4. Introduction (2/4)

  5. Introduction (3/4) • Recently, a new approach was suggested • Instead of considering a single target histogram, two target histograms, one for silence and one for speech, are estimated • Then an adapted target histogram is computed for each speaker • By linear interpolation between the speech and silence histograms • With a weighting coefficient "a" for the silence histogram and "1-a" for the speech histogram

  6. Introduction (4/4) • The coefficient "a" is estimated for each test speaker separately, as the fraction of silence frames in that speaker's data • This is speaker-wise • It is assumed that the global statistics of the test data do not change rapidly from one test sentence to the next • So interpolation can combine the local statistics estimated for test sentence i with the global statistics of the test sentences that precede it

  7. Linear Interpolation for HEQ (1/5) • It is assumed that the environmental conditions do not change rapidly from one test sentence to another • Combining information from multiple test sentences could therefore provide a more accurate estimate of the test conditions

  8. Linear Interpolation for HEQ (2/5) • For histogram equalization, the interpolated test histogram is matched against a target histogram • The target histogram is estimated using all frames of the training data • Linear interpolation was proposed to obtain a unique target histogram for each test sentence • By separating the silence frames from the speech frames • Forced alignment for training; two-pass decoding for testing

  9. Linear Interpolation for HEQ (3/5) • According to the silence/speech ratio, a target cumulative histogram for this sentence can be estimated as follows
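The equation on this slide did not survive the transcript. From the interpolation scheme described in the Introduction (weight "a" for the silence histogram, "1-a" for the speech histogram, with "a" the fraction of silence frames), the adapted target cumulative histogram for sentence i plausibly has the form below; the symbol names are assumptions, not the paper's notation:

```latex
F^{\mathrm{target}}_{i}(x) \;=\; a_i \, F^{\mathrm{sil}}(x) \;+\; (1 - a_i)\, F^{\mathrm{speech}}(x),
\qquad
a_i = \frac{\#\,\text{silence frames in sentence } i}{\#\,\text{frames in sentence } i}
```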

  10. Linear Interpolation for HEQ (4/5) • The interpolated test histogram for test sentence i is estimated as follows
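This slide's equation is also missing from the transcript. Following the description above (local statistics of the current sentence combined with global statistics of the preceding sentences), the interpolated test cumulative histogram is presumably of the form below; the weight symbol λ is an assumption:

```latex
F^{\mathrm{test}}_{i}(x) \;=\; \lambda \, F^{\mathrm{local}}_{i}(x) \;+\; (1 - \lambda)\, F^{\mathrm{global}}_{1..\,i-1}(x)
```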

  11. Linear Interpolation for HEQ (5/5) • For example, for frame F in dimension D of sentence S: [diagram combining the global silence histogram (sentences 1 to S-1) with the local statistics of frame F, and the global speech histogram (sentences 1 to S-1) with the local statistics of frame F]
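The two interpolations walked through on slides 8 to 11 can be sketched as simple convex combinations of cumulative histograms. This is an illustrative sketch only; the function and variable names are assumptions, not the paper's.

```python
import numpy as np

def adapted_target_cdf(sil_cdf, speech_cdf, silence_fraction):
    # Per-sentence target CDF: weight the silence and speech target CDFs
    # by the fraction of silence frames in the sentence (coefficient "a").
    return silence_fraction * sil_cdf + (1.0 - silence_fraction) * speech_cdf

def interpolated_test_cdf(local_cdf, global_cdf, weight):
    # Combine the current sentence's CDF ("local") with the CDF
    # accumulated over the preceding test sentences ("global").
    return weight * local_cdf + (1.0 - weight) * global_cdf
```

Setting the weight to 1 keeps only the current sentence's statistics, and setting it to 0 keeps only the global statistics; intermediate values trade off the two, as the paper's interpolation parameter does.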

  12. Experiments (1/4) • The experiments were conducted on VODIS • Speech data was recorded in a moving car • Recordings from 200 speakers • 22600 clean sentences for training • 6100 noisy sentences for testing • CMS was used to normalize all training and testing data • Each of the 39 phones was modeled by a 3-state HMM • Each state was represented by a mixture of 32 Gaussians

  13. Experiments (2/4) • Interpolation of HEQ with only one kind of frame

  14. Experiments (3/4) • Interpolation of HEQ with silence/speech frames

  15. Experiments (4/4) • Interpolation of HEQ with multiple speech classes

  16. Observations • Using interpolation for HEQ always gives better recognition results • Better results are obtained with a smaller number of classes when the interpolation parameter is set to zero • A larger number of classes could lead to better performance, but only a small amount of data is available for test histogram estimation

  17. Conclusions • It was found that • Weighted interpolation between the histograms of a test sentence and of past test sentences improved speech recognition performance • 49.42% → 48.59% → 44.85% • It is not certain that • More speech classes yield further improvements • Possible choices of the weighting factor • Could be a subject of future research
