10 likes | 125 Vues
Principle of the Time-Frequency Principal Components (TFPC) Analysis:. Results:. Expanded vectors: Contextual Covariance Matrix: covariance matrix of the expanded vectors. Time-Frequency principal Components: principal components of the contextual covariance matrix.
E N D
Principle of the Time-Frequency Principal Components (TFPC) Analysis: Results: Expanded vectors: Contextual Covariance Matrix: covariance matrix of the expanded vectors. Time-Frequency principal Components: principal components of the contextual covariance matrix. Examples of Vector Filtering of Spectral Trajectories: Conclusion: The TFPC analysis performs better when applied to spectral coefficients. The use of 1 TFPC filtering for each speaker gives better results than the use of 1 TFPC for all speakers. The best result is obtained by calculating 1 TFPC for each speaker from spectral coefficients with a context of 3 vectors. This score outperforms the cepstral coefficients augmented by their parameters by roughly 20 %. Future directions: Choice of the components. Application of the TFPC filtering to speaker verification. Application of the TFPC filtering to other pattern recognition problems. *Rice University, Houston, Texas - **Faculté Polytechnique de Mons, Mons, Belgiumivan@ieee.org - durou@tcts.fpms.ac.be TIME-FREQUENCY PRINCIPAL COMPONENTS OF SPEECH: APPLICATION TO SPEAKER IDENTIFICATION Introduction: Principle of the Vector Filtering of Spectral Trajectories: Goal: filtering of the spectral vectors in order to extract dynamic speaker characteristic information. Data-driven approach: the filtering is learned on the training data. Class-dependent filtering: one filtering for each speaker. Principle: principal component analysis applied to spectral vectors augmented by their context Time-Frequency Principal Components (TFPC) Ivan Magrin-Chagnolleau * and Geoffrey Durou ** Experiments: Task: closed-set text-independent speaker identification. Database: subset of the POLYCOST database - 112 speakers (64 females and 48 males) - 90 seconds of training (free text through the telephone) - 560 test utterances of 5 second in average. Spectrum: 13 Mel-scale filterbank coefficients. Cepstrum: 12 cepstral coefficients (the first one is discarded) augmented by their parameters. TFPC Filtering: 1 TFPC filtering for each speaker using several sizes of context - 1 TFPC for all the speakers. Modeling: Gaussian mixture models with 8 components and diagonal covariance matrices.