This document explores cutting-edge methodologies in acoustic modeling, including frequency and wavelet filtering, supervised-predictive compensation, and language modeling with Dynamic Bayesian Networks (DBNs). It discusses the application of Hidden Markov Trees and presents a novel approach to spectral feature extraction via quasi-decorrelation techniques. Also highlighted are innovative solutions for achieving noise robustness in speech recognition systems, addressing challenges posed by varying acoustic environments. The findings promise enhancements in model performance under diverse conditions.
Excuse me! Murat Deviren Reunion Bayestic / Murat Deviren
Contents • Frequency and wavelet filtering • Supervised-predictive compensation • Language modeling with DBNs • Hidden Markov Trees for acoustic modeling
Frequency Filtering • Proposed by Nadeu '95 and Paliwal '99 • Goal: spectral features comparable with MFCCs • Properties: quasi-decorrelation of logFBEs, cepstral weighting effect, emphasis on spectral variations • Filter: H(z) = 1 - az^-1 [Figure: simplified block diagrams for the MFCC (logFBEs, DCT) and FF (logFBEs, H(z)) parameterizations; typical derivative-type frequency filters]
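As a minimal sketch (not from the slides; the filter coefficient and input values are illustrative), frequency filtering replaces the DCT step of MFCC extraction by running the filter H(z) = 1 - az^-1 along the frequency axis of the log filterbank energies:

```python
import numpy as np

def frequency_filter(logfbe, a=0.95):
    """Apply H(z) = 1 - a*z^-1 along the frequency axis of
    log filterbank energies (array of shape frames x bands)."""
    filtered = np.empty_like(logfbe)
    filtered[:, 0] = logfbe[:, 0]          # no previous band for k = 0
    filtered[:, 1:] = logfbe[:, 1:] - a * logfbe[:, :-1]
    return filtered

# Illustrative input: 2 frames x 5 mel bands of log energies
x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 2.0, 2.0, 2.0, 2.0]])
y = frequency_filter(x, a=1.0)   # pure first difference across bands
```

With a = 1 the filter is a pure first difference, which flattens constant spectra and emphasizes spectral variations, matching the "emphasis on spectral variations" property above.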
Evaluation of FF on Aurora-3 • Significant performance decrease for FF2 and FF3 in the high-mismatch case
Wavelets and Frequency Filtering • FF1 = Haar wavelet • Reformulate FF as wavelet filtering • Use higher-order Daubechies wavelets • Promising results, published in ICANN 2003
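The link between FF1 and the Haar wavelet can be sketched as follows (illustrative code, not from the slides): the single-level Haar detail coefficients are scaled differences of adjacent samples, i.e., a two-tap derivative-type filter. Higher-order Daubechies wavelets would replace the two-tap pair with longer filters (e.g., via the PyWavelets library).

```python
import numpy as np

def haar_detail(x):
    """Single-level Haar wavelet detail coefficients of a 1-D signal:
    scaled differences of adjacent sample pairs."""
    x = np.asarray(x, dtype=float)
    pairs = x.reshape(-1, 2)                  # input length must be even
    return (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)

# Illustrative log filterbank energies for one frame (6 bands)
logfbe = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
d = haar_detail(logfbe)   # differences of adjacent band pairs
```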
Perspectives • BUT • These results could not be verified on other subsets of the Aurora-3 database. • To Do • Detailed analysis of FF and wavelet filtering • Develop models that exploit frequency-localized features. • Exploit statistical properties of the wavelet transform.
Contents • Frequency and wavelet filtering • Supervised-predictive compensation • Language modeling with DBNs • Hidden Markov Trees for acoustic modeling
Noise Robustness • Signal processing techniques: CMN, RASTA, enhancement techniques • Compensation schemes • Adaptive: MLLR, MAP • Requires adaptation data and a canonical model • Predictive: PMC • Hypothetical errors in the mismatch function • Strong dependence on front-end parameterization • Multi-condition training
Supervised-predictive compensation • Goal: exploit available data to devise a tool for robustness. • Available data: speech databases recorded in different acoustic environments. • Principles: • Train matched models for each condition. • Train noise models. • Construct a parametric model that describes how the matched models vary with the noise model.
Supervised-predictive compensation • Advantages: • No mismatch function • Independent of the front-end • No canonical model required • Computationally efficient • The model can be trained incrementally, i.e., updated with new databases
Deterministic model • Databases: D1, …, DK • Noise conditions: n1, …, nK • Sw(k): matched speech model for acoustic unit w ∈ W trained on noise condition nk • N ∈ {1, …, K}: noise variable • For each w ∈ W, there exists a parametric function fw such that ||Sw(k) − fw(N)|| ≈ 0 for some given norm ||·||
Probabilistic model • Given • S: speech model parameterization • N: noise model parameterization • Learn the joint probability density P(S, N) • Given the noise model N, what is the best set of speech models to use? S* = argmaxS P(S | N) [Figure: P(S, N) as a static Bayesian network over nodes S and N, with instances (N1, S1), (N2, S2), (N3, S3)]
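The model-selection step S* = argmax P(S | N) can be sketched for the simplest discrete case (all condition names, means, and variances below are invented for illustration): one trained speech-model set per noise condition, each condition's noise summarized by a 1-D Gaussian, and the set whose posterior best explains the observed noise estimate is selected.

```python
import math

# Hypothetical noise conditions: name -> (noise mean, noise std dev)
conditions = {
    "car":     (-5.0, 1.0),
    "babble":  ( 0.0, 2.0),
    "factory": ( 4.0, 1.5),
}

def log_gauss(x, mu, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def best_model_set(noise_obs, prior=None):
    """argmax over conditions of P(condition | noise), flat prior by default."""
    prior = prior or {c: 1.0 / len(conditions) for c in conditions}
    scores = {c: math.log(prior[c]) + log_gauss(noise_obs, mu, sd)
              for c, (mu, sd) in conditions.items()}
    return max(scores, key=scores.get)

choice = best_model_set(-4.2)   # noise estimate closest to "car"
```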
A simple linear model • Speech model: mixture-density HMM • Noise model: single Gaussian • μwls(nk) = Awls μnk + Bwls • μwls(nk): mean vector for mixture component l of state s • μnk: mean vector of the noise model • fw is parameterized by Awls and Bwls • Supervised training using MMSE minimization
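The supervised MMSE training of the linear map can be sketched with ordinary least squares (a hedged illustration: the toy dimensions, the noiseless synthetic data, and the names A_hat/B_hat are assumptions, not the slides' setup). Stacking one matched-model mean per training condition gives an overdetermined linear system that least squares solves directly.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 20, 3                       # 20 noise conditions, 3-dim features
noise_means = rng.normal(size=(K, d))            # one noise mean per condition
A_true = rng.normal(size=(d, d))                 # ground-truth linear map
B_true = rng.normal(size=d)
speech_means = noise_means @ A_true.T + B_true   # matched-model means (noiseless toy data)

# MMSE / least-squares estimate of [A | B] via an augmented design matrix
X = np.hstack([noise_means, np.ones((K, 1))])    # append 1 for the bias term
W, *_ = np.linalg.lstsq(X, speech_means, rcond=None)
A_hat, B_hat = W[:d].T, W[d]

# Predict the matched-model mean for an unseen noise condition
n_new = rng.normal(size=d)
mu_pred = A_hat @ n_new + B_hat
```

Because the map is trained per mixture component, the same fit is repeated for every (w, l, s) triple in practice; the sketch shows a single component.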
Experiments • Connected-digit recognition on TiDigits • 15 different noise sources from NOISEX • volvo, destroyer engine, buccaneer, … • Evaluations: • Model performance in training conditions • Robustness comparison with multi-condition training: • under new SNR conditions, • under new noise types.
Results • Even a simple linear model can almost recover matched-model performance. • The proposed technique generalizes to new SNR conditions and new noise types. • Results submitted to EUROSPEECH 2003
Contents • Frequency and wavelet filtering • Supervised-predictive compensation • Language modeling with DBNs • Hidden Markov Trees for acoustic modeling
Classical n-grams • Word probability based on the word history: P(W) = ∏i P(wi | wi−1, wi−2, …, wi−n) [Figure: word chain wi−n … wi−2 wi−1 wi]
Class-based n-grams • Class-based word probability for a given class history: P(W) = ∏i P(wi | ci) P(ci | ci−1, ci−2, …, ci−n) [Figure: class chain ci−n … ci−2 ci−1 ci, each class emitting its word wi]
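A toy sketch of the class-based factorization above (the corpus, class map, and probabilities are invented for illustration; real values would be estimated from counts): the sentence probability is the product of an emission term P(wi | ci) and a class-transition term, here a class bigram P(ci | ci−1).

```python
import math

# Hypothetical word -> class map
word_class = {"the": "DET", "a": "DET", "cat": "N", "dog": "N", "runs": "V"}

def sentence_logprob(words, p_word_given_class, p_class_given_class):
    """log P(W) = sum_i log P(w_i | c_i) + log P(c_i | c_{i-1})."""
    logp = 0.0
    prev = "<s>"                               # start-of-sentence class
    for w in words:
        c = word_class[w]
        logp += math.log(p_word_given_class[(w, c)])
        logp += math.log(p_class_given_class[(c, prev)])
        prev = c
    return logp

# Hand-set toy probabilities
p_w_c = {("the", "DET"): 0.6, ("a", "DET"): 0.4,
         ("cat", "N"): 0.5, ("dog", "N"): 0.5, ("runs", "V"): 1.0}
p_c_c = {("DET", "<s>"): 1.0, ("N", "DET"): 1.0, ("V", "N"): 1.0}

lp = sentence_logprob(["the", "cat", "runs"], p_w_c, p_c_c)
```

Because classes are far fewer than words (198 classes vs. a 500-word vocabulary in the experiments below), the transition table is much smaller than a word n-gram's.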
Class-based LM with DBNs • Class-based word probability in a given class context: P(W) = ∏i P(wi | ci−n, …, ci, …, ci+n) P(ci | ci−1, ci−2, …, ci−n) [Figure: class chain ci−n … ci+2 with the word wi conditioned on past and future classes]
Initial results • Training corpus: 11 months of Le Monde, ~20 million words • Test corpus: ~1.5 million words • Vocabulary size: 500 • Number of class labels: 198 [Figure: compared model topologies, from the word bigram to class-based models with increasing class context]
Perspectives • Initial results are promising. • To Do • Structure learning with an appropriate scoring metric, e.g., one based on perplexity • Appropriate back-off schemes • Efficient CPT representations for computational constraints, e.g., noisy-OR gates.
Contents • Frequency and wavelet filtering • Supervised-predictive compensation • Language modeling with DBNs • Hidden Markov Trees for acoustic modeling
Speech recognition using hidden Markov models on wavelet trees Sanaa GHOUZALI DESA Infotelecom Université Med V - RABAT
Problems in speech recognition • Parameterization: • Need to localize the speech-signal parameters in the time-frequency domain • Need to achieve performance as good as MFCCs • Modeling: • Need to build statistical models that are robust to noise • Need to model the frequency dynamics of the speech signal as well as its temporal dynamics
Parameterization • The wavelet transform has many interesting properties that allow a finer analysis than the Fourier transform: • Locality • Multi-resolution • Compression • Clustering • Persistence
Modeling • Several types of statistical model account for the properties of the wavelet transform: • Independent Mixtures (IM): treat each coefficient independently of the others (primary properties) • Markov chains: consider only the correlations between coefficients over time (clustering) • Hidden Markov Trees (HMT): consider the correlations across scales (persistence)
Statistical models for the wavelet transform [Figure: the model structures drawn on the time-frequency (t, f) plane]
Description of the chosen model • The chosen model, WHMT: • captures the clustering and persistence properties of the wavelet transform • models the complex dependencies between wavelet coefficients • Modeling of the wavelet transform is done in two steps: • model each coefficient individually with a Gaussian mixture model • capture the dependencies between the coefficients via the HMT model
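The two steps above can be sketched generatively (a minimal illustration with invented parameters, not the slides' trained model): each wavelet coefficient is drawn from a 2-state zero-mean Gaussian mixture ("small"/"large" state), and the hidden states are linked parent-to-child on a binary tree so that large coefficients persist across scales.

```python
import random

random.seed(1)
P_ROOT = [0.5, 0.5]                 # P(state) at the root node
P_CHILD = [[0.9, 0.1],              # P(child state | parent state):
           [0.1, 0.9]]              # states tend to persist across scales
SIGMA = [0.1, 2.0]                  # std dev per state (small, large)

def sample_hmt(depth, parent_state=None):
    """Sample (state, coefficient) pairs from a binary wavelet HMT."""
    if depth == 0:
        return []
    if parent_state is None:
        s = 0 if random.random() < P_ROOT[0] else 1
    else:
        s = 0 if random.random() < P_CHILD[parent_state][0] else 1
    coeff = random.gauss(0.0, SIGMA[s])          # step 1: per-coefficient GMM
    node = [(s, coeff)]
    # step 2: two children per node, one scale finer, states tied to parent
    return node + sample_hmt(depth - 1, s) + sample_hmt(depth - 1, s)

tree = sample_hmt(depth=4)          # 1 + 2 + 4 + 8 = 15 nodes
```

Training such a model would invert this process with an EM-style upward-downward recursion, as in the Crouse, Nowak, and Baraniuk reference below.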
References • M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-Based Statistical Signal Processing Using Hidden Markov Models", IEEE Trans. Signal Proc., vol. 46, no. 4, pp. 886-902, Apr. 1998 • M. Crouse, H. Choi, and R. Baraniuk, "Multiscale Statistical Image Processing Using Tree-Structured Probability Models", IT Workshop, Feb. 1999 • K. Keller, S. Ben-Yacoub, and C. Mokbel, "Combining Wavelet-Domain Hidden Markov Trees With Hidden Markov Models", IDIAP-RR 99-14, Aug. 1999 • M. Jaber Borran and R. D. Nowak, "Wavelet-Based Denoising Using Hidden Markov Models"