
Covariation and weighting of harmonically decomposed streams for ASR


Presentation Transcript


  1. Covariation and weighting of harmonically decomposed streams for ASR
     • Introduction
     • Pitch-scaled harmonic filter
     • Recognition experiments
     • Results
     • Conclusion
     [Figure: production of /z/, with periodic and aperiodic components]

  2. Motivation and aims
     • Most speech sounds are either voiced or unvoiced, and the two classes have very different properties:
        • voiced: quasi-periodic signal from phonation
        • unvoiced: aperiodic signal from turbulence noise
     • Do these properties allow humans to recognize speech in noise? Perhaps we can use this information to help ASR, by computing separate features for the two parts.
     • Are the two contributions complementary?
     INTRODUCTION http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/

  3. Voiced and unvoiced parts of a speech signal
     [Figure: production of /z/, showing the periodic contribution and the aperiodic contribution]
     INTRODUCTION

  4. Pitch-scaled harmonic filter
     [Block diagram]
     • Input: speech waveform s(n)
     • Blocks: pitch extraction, pitch optimisation, time shifting, PSHF (applied repeatedly), re-splicing
     • Intermediate values: f0raw, f0opt (optimised pitch), Nopt
     • Outputs: periodic waveform v̂(n), aperiodic waveform û(n)
     METHOD
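The core idea behind a pitch-scaled filter can be sketched for a single frame. This is a simplified illustration, not the PSHF as used in the experiments: it assumes a rectangular analysis frame that spans exactly four pitch periods (the value four and the function name `harmonic_split` are assumptions made here), so that the harmonics of f0 fall on every fourth DFT bin. It omits the windowing, pitch optimisation, time shifting and re-splicing stages shown in the diagram.

```python
import numpy as np

def harmonic_split(frame, periods=4):
    """Split one pitch-scaled frame into periodic and aperiodic estimates.

    Assumes len(frame) equals `periods` pitch periods, so harmonics of f0
    lie on DFT bins 0, periods, 2*periods, ... (hypothetical simplification).
    """
    spectrum = np.fft.rfft(frame)
    harmonic_bins = np.zeros(spectrum.shape, dtype=bool)
    harmonic_bins[::periods] = True  # bins aligned with multiples of f0
    # Keep only the harmonic bins for the periodic estimate.
    periodic = np.fft.irfft(np.where(harmonic_bins, spectrum, 0), n=len(frame))
    # Everything else is attributed to the aperiodic estimate.
    aperiodic = frame - periodic
    return periodic, aperiodic

# Toy example: a 40-sample pitch period, 160-sample frame, plus white noise.
n = np.arange(160)
clean = np.sin(2 * np.pi * n / 40) + 0.5 * np.sin(2 * np.pi * 2 * n / 40)
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(160)
periodic_est, aperiodic_est = harmonic_split(noisy)
```

In this toy case the two estimates always sum back to the input frame, and the periodic estimate retains only the part of the noise that happens to fall on the harmonic bins.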

  5. Decomposition example (waveforms)
     [Figure panels: Original, Periodic, Aperiodic]
     METHOD

  6. Decomposition example (spectrograms)
     [Figure panels: Original, Periodic, Aperiodic]
     METHOD

  7. Decomposition example (MFCC spectra)
     [Figure panels: Original, Periodic, Aperiodic]
     METHOD

  8. Speech database: Aurora 2.0
     • Derived from the TIdigits database of connected English digit strings (male and female speakers), filtered with G.712 at 8 kHz.
     [Tables: TRAIN and TEST sets]
     METHOD

  9. Description of the experiments
     • Baseline experiment: [base]
        • standard parameterisation of the original waveforms (i.e., MFCC, +Δ, +ΔΔ)
     • PCA experiments: [pca26, pca78, pca13 and pca39]
        • decorrelation of the feature vectors, and reduction of the number of coefficients
     • Split experiments: [split, split1]
        • adjustment of stream weights (periodic vs. aperiodic)
     Caveat: pitch values were derived from the clean speech files, for the entire database!
     METHOD
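The "+Δ, +ΔΔ" step of the baseline parameterisation can be sketched as follows. The slides do not state how the dynamic coefficients were computed, so this uses the common linear-regression formula over ±2 frames as an assumption. The 13 static coefficients are also an assumption, suggested by the experiment names (pca13, pca39), and the function names are hypothetical.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based deltas over +/- `width` frames.

    `features` has shape (num_frames, num_coeffs); edge frames are repeated
    for padding (an assumed convention, not taken from the slides).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(static):
    """Append delta and delta-delta coefficients to static features."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])

# Toy input: 10 frames of 13 coefficients that rise by 1 per frame.
static = np.outer(np.arange(10.0), np.ones(13))
full = add_deltas(static)  # shape (10, 39)
```

For this linear ramp the interior delta coefficients equal the slope (1 per frame) and the central delta-delta coefficients are zero.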

  10. Parameterisations
     [Block diagrams, each from waveform to features]
     • BASE: MFCC; +Δ, +ΔΔ
     • SPLIT, SPLIT1: PSHF; MFCC; +Δ, +ΔΔ; cat
     • PCA26, PCA78, PCA13, PCA39: PSHF; MFCC; +Δ, +ΔΔ; cat; PCA
     METHOD
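The "cat" and "PCA" blocks can be illustrated with a short sketch: the periodic and aperiodic feature vectors are concatenated, then projected onto the leading principal components to decorrelate them and reduce their number. The random data, the 13-coefficient streams and the function names are assumptions for illustration only; the slides do not give the implementation.

```python
import numpy as np

def fit_pca(features, num_components):
    """Estimate a PCA basis from (num_frames, num_coeffs) features."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    return mean, eigvecs[:, order], eigvals[order]

def apply_pca(features, mean, basis):
    """Project features onto the retained principal components."""
    return (features - mean) @ basis

# Synthetic, deliberately correlated stand-ins for the two streams.
rng = np.random.default_rng(0)
periodic = rng.standard_normal((500, 13))
aperiodic = 0.5 * periodic + 0.1 * rng.standard_normal((500, 13))

combined = np.hstack([periodic, aperiodic])       # "cat": 26 coefficients
mean, basis, variances = fit_pca(combined, 13)    # keep 13 components
projected = apply_pca(combined, mean, basis)      # decorrelated, shape (500, 13)
```

The returned `variances` are the eigenvalues of the covariance matrix in descending order, which is the quantity a plot of principal-component variance would show.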

  11. Full-sized PCA results
     RESULTS

  12. Variance of principal components
     [Plots: PCA39 and PCA26, clean + multi]
     RESULTS

  13. PCA26 experiment results
     [Panels: CLEAN, MULTI]

  14. Summary of best PCA results
     RESULTS

  15. Split experiment results

  16. Sample Split results
     Note: for Split, the same stream-weight values were used in training as in testing.
     RESULTS
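Stream weighting in a multi-stream recogniser is often implemented by scaling each stream's log-likelihood before summing. The slides do not describe the exact scheme used, so the sketch below assumes that convention with a single diagonal-covariance Gaussian per stream; all names and numbers are hypothetical.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def weighted_stream_loglik(obs_streams, models, weights):
    """Sum per-stream log-likelihoods, each scaled by its stream weight."""
    return sum(
        w * diag_gauss_loglik(x, mean, var)
        for x, (mean, var), w in zip(obs_streams, models, weights)
    )

# Toy state model: one Gaussian per stream (periodic, aperiodic).
rng = np.random.default_rng(1)
models = [
    (np.zeros(13), np.ones(13)),        # periodic stream
    (np.zeros(13), 2.0 * np.ones(13)),  # aperiodic stream
]
obs = [rng.standard_normal(13), rng.standard_normal(13)]

equal = weighted_stream_loglik(obs, models, weights=(1.0, 1.0))
periodic_only = weighted_stream_loglik(obs, models, weights=(1.0, 0.0))
```

With both weights set to 1 the score equals that of a single Gaussian over the concatenated vector; moving weight towards one stream makes the decision rely more heavily on that stream.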

  17. Split1 experiment results

  18. Summary of PCA and Split results
     RESULTS

  19. Conclusions
     • The PSHF module split the Aurora speech waveforms into two synchronous streams (periodic and aperiodic)
        • large improvements over the single-stream Baseline
     • Split was better than all PCA combinations:
        • PCA26/13 better than PCA78/39, and PCA13 best
        • Split1 marginally better than Split
     • Periodic speech segments give robustness to noise.
     • Further work
        • Modeling: how best to combine the streams?
        • LVCSR: evaluate the front end on TIMIT (phone recognition).
        • Robust pitch tracking
     CONCLUSION

  20. COLUMBO PROJECT: Harmonic decomposition applied to ASR
     Philip J.B. Jackson¹ <p.jackson@surrey.ac.uk>
     David M. Moreno² <davidm@talp.upc.es>
     Javier Hernando² <javier@talp.upc.es>
     Martin J. Russell³ <m.j.russell@bham.ac.uk>
     http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/
