Covariation and weighting of harmonically decomposed streams for ASR

• Introduction
• Pitch-scaled harmonic filter
• Recognition experiments
• Results
• Conclusion
[Figure: production of /z/, periodic and aperiodic]

Motivation and aims
• Most speech sounds are either voiced or unvoiced, and the two have very different properties:
  • voiced: quasi-periodic signal from phonation
  • unvoiced: aperiodic signal from turbulence noise
• Do these properties allow humans to recognize speech in noise? Maybe we can use this information to help ASR, by computing separate features for the two parts.
• Are their two contributions complementary?
http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/

Voiced and unvoiced parts of a speech signal
[Figure: production of /z/, showing the periodic contribution and the aperiodic contribution]

Pitch-scaled harmonic filter
[Block diagram: the speech waveform s(n) passes through pitch extraction (f0raw) and pitch optimisation (f0opt, giving the optimised pitch and Nopt); the PSHF is applied repeatedly with time shifting, and re-splicing yields the periodic waveform v̂(n) and the aperiodic waveform û(n).]
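The idea behind a pitch-scaled harmonic split can be illustrated with a deliberately simplified sketch. It assumes a rectangular analysis window that spans exactly four pitch periods, so that the harmonics of f0 fall on every fourth DFT bin; the function name `harmonic_split` and the parameter `periods_per_window` are hypothetical, and the pitch optimisation, time shifting and re-splicing stages of the diagram are not reproduced.

```python
import numpy as np

def harmonic_split(frame, periods_per_window=4):
    """Split one frame into periodic and aperiodic estimates.

    Assumes len(frame) is exactly `periods_per_window` pitch periods,
    so harmonics of f0 sit on DFT bins 4, 8, 12, ... (for the default).
    """
    spectrum = np.fft.rfft(frame)
    bins = np.arange(spectrum.size)
    # Harmonic bins: non-zero multiples of the number of periods per window.
    harmonic = (bins % periods_per_window == 0) & (bins > 0)
    periodic = np.fft.irfft(np.where(harmonic, spectrum, 0.0), n=frame.size)
    aperiodic = frame - periodic  # everything that is not on a harmonic bin
    return periodic, aperiodic

# Synthetic check: three harmonics with a 40-sample period, plus noise.
period = 40
n = np.arange(4 * period)
voiced = sum(np.sin(2 * np.pi * k * n / period) / k for k in (1, 2, 3))
noise = 0.1 * np.random.default_rng(0).normal(size=n.size)
periodic, aperiodic = harmonic_split(voiced + noise)
```

The two outputs always sum back to the input frame. On real speech the split is only as good as the pitch estimate that sets the window length.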

Decomposition example (waveforms)
[Figure panels: Original, Periodic, Aperiodic]

Decomposition example (spectrograms)
[Figure panels: Original, Periodic, Aperiodic]

Decomposition example (MFCC spectrograms)
[Figure panels: Original, Periodic, Aperiodic]

Speech database: Aurora 2.0
• Derived from the TIdigits database of connected English digit strings (male & female speakers), filtered with G.712 at 8 kHz.
[Figure panels: TRAIN, TEST]

Description of the experiments
• Baseline experiment: [base]
  • standard parameterisation of the original waveforms (i.e., MFCC, +Δ, +ΔΔ)
• PCA experiments: [pca26, pca78, pca13 and pca39]
  • decorrelation of the feature vectors, and reduction of the number of coefficients
• Split experiments: [split, split1]
  • adjustment of stream weights (periodic vs. aperiodic)
Caveat: pitch values were derived from the clean speech files, for the entire database!
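The "+Δ, +ΔΔ" part of the baseline parameterisation can be sketched with the usual regression formula for delta coefficients, d_t = Σ_k k (c_{t+k} − c_{t−k}) / (2 Σ_k k²). The window half-width `theta=2`, the edge-replication padding and the function names are assumptions made for illustration; the slides do not state the settings that were used.

```python
import numpy as np

def deltas(features, theta=2):
    """Regression deltas over a (frames, coeffs) array.

    Edge frames are replicated so the output has the same shape as the input.
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((theta, theta), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, theta + 1))
    num = np.zeros_like(features, dtype=float)
    for k in range(1, theta + 1):
        ahead = padded[theta + k : theta + k + num_frames]    # c_{t+k}
        behind = padded[theta - k : theta - k + num_frames]   # c_{t-k}
        num += k * (ahead - behind)
    return num / denom

def add_deltas(static):
    """Stack static, delta and delta-delta coefficients side by side."""
    d = deltas(static)
    dd = deltas(d)
    return np.hstack([static, d, dd])

# 13 static coefficients per frame become 39 after adding deltas.
static = np.arange(10.0)[:, None] * np.ones((1, 13))
full = add_deltas(static)
```

With 13 static coefficients this gives a 39-dimensional vector per frame, which matches the sizes implied by the experiment names (13, 26, 39, 78).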

Parameterisations
[Block diagrams, waveform to features:
BASE: MFCC, +Δ, +Δ2
SPLIT, SPLIT1: PSHF, MFCC, +Δ, +Δ2 (per stream), cat
PCA26, PCA78, PCA13, PCA39: PSHF, MFCC, +Δ, +Δ2, cat, PCA]
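The "cat" and "PCA" blocks can be sketched as concatenating the periodic and aperiodic feature vectors and projecting the result onto the leading eigenvectors of its covariance matrix. The data below are random stand-ins, not Aurora features, and the function names are hypothetical. The sketch only shows a 26-to-13 reduction of static coefficients; where the delta features enter relative to the PCA step in each experiment is not modelled here.

```python
import numpy as np

def fit_pca(train, n_components):
    """Return the mean, the top eigenvectors and the variance ratios."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()         # variance per component
    return mean, eigvecs[:, :n_components], explained

def apply_pca(x, mean, basis):
    """Centre the data and project it onto the retained components."""
    return (x - mean) @ basis

# Stand-in data: two correlated 13-dimensional streams (NOT real features).
rng = np.random.default_rng(0)
periodic = rng.normal(size=(500, 13))
aperiodic = 0.5 * periodic + 0.1 * rng.normal(size=(500, 13))

joint = np.hstack([periodic, aperiodic])        # "cat": 26 coefficients
mean, basis, explained = fit_pca(joint, n_components=13)
reduced = apply_pca(joint, mean, basis)         # decorrelated, 13 coefficients
```

The `explained` vector is the kind of quantity one would plot to see how quickly the variance of the principal components falls off, and hence how many components are worth keeping.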

Full-sized PCA results

Variance of Principal Components
[Figure panels: PCA39, PCA26]
• clean + multi

PCA26 experiment’s results
[Figure panels: CLEAN, MULTI]

Summary of best PCA results

Split experiment’s results

Sample Split results
Note: for Split, the same stream-weight values were used in training as in testing.
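Stream weighting in a two-stream recogniser is commonly implemented by raising each stream's output probability to the power of its weight, i.e. by scaling the per-stream log-likelihoods before adding them. The sketch below assumes a single diagonal-covariance Gaussian per stream and uses hypothetical function names; it illustrates the weighting only and is not the model configuration used in these experiments.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def two_stream_loglik(o_per, o_aper, model_per, model_aper, w_per, w_aper):
    """Weighted sum of periodic and aperiodic stream log-likelihoods.

    Each model is a (mean, var) pair; the weights act as exponents on the
    stream probabilities, so they scale the log-likelihoods.
    """
    return (w_per * diag_gauss_loglik(o_per, *model_per)
            + w_aper * diag_gauss_loglik(o_aper, *model_aper))

# Toy observation and models (illustrative values only).
o_per, o_aper = np.array([0.2, -0.1]), np.array([1.0, 0.5])
model_per = (np.zeros(2), np.ones(2))
model_aper = (np.ones(2), 2.0 * np.ones(2))

periodic_only = two_stream_loglik(o_per, o_aper, model_per, model_aper, 1.0, 0.0)
equal_weights = two_stream_loglik(o_per, o_aper, model_per, model_aper, 1.0, 1.0)
```

Setting one weight to zero makes the score depend on the other stream alone, so sweeping the weights shows how much each stream contributes to recognition.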

Split1 experiment’s results

Summary of PCA & Split results

Conclusions
• PSHF module split Aurora’s speech waveforms into two synchronous streams (periodic and aperiodic)
  • large improvements over the single-stream Baseline
• Split was better than all PCA combinations:
  • PCA26/13 better than PCA78/39, and PCA13 best
  • Split1 marginally better than Split
• Periodic speech segments give robustness to noise.
Further work
• Modelling: how best to combine the streams?
• LVCSR: evaluate front end on TIMIT (phone recognition).
• Robust pitch tracking

COLUMBO PROJECT: Harmonic decomposition applied to ASR
Philip J.B. Jackson 1 <p.jackson@surrey.ac.uk>
David M. Moreno 2 <davidm@talp.upc.es>
Javier Hernando 2 <javier@talp.upc.es>
Martin J. Russell 3 <m.j.russell@bham.ac.uk>
http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/
