
Fast Scoring for Mixture of PLDA in I-Vector/PLDA Speaker Verification

Man-Wai Mak. APSIPA 2015. Department of Electronic and Information Engineering, The Hong Kong Polytechnic University.


Presentation Transcript


  1. Fast Scoring for Mixture of PLDA in I-Vector/PLDA Speaker Verification. Man-Wai Mak, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. APSIPA 2015.

  2. Contents • Motivation of Work • Conventional PLDA vs. Mixture of PLDA • Fast Scoring for Mixture of PLDA • Experiments on NIST 2012 SRE • Conclusions

  3. Motivation. Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions. [Figure: block diagram labeled "Enrollment i-vectors", "PLDA Model", "PLDA Score".]

  4. Motivation. We argue that a PLDA model should focus on a small range of SNR. [Figure: diagram labeled "PLDA Model 1", "PLDA Model 2", "PLDA Model 3", each with a "PLDA Score".]

  5. Proposed Solution. The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance’s SNR (Mak, Interspeech 2014; Mak et al., T-ASLP 2016). [Figure: diagram labeled "SNR Estimator", "SNR Posterior Estimator", "PLDA Model 1", "PLDA Model 2", "PLDA Model 3", "PLDA Score".] Reference: M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.

  6. Key Features of Proposed Solution • Mixture of PLDA was found to perform much better than conventional PLDA when the test utterances exhibit a wide range of SNR. • However, its scoring function is significantly more complex than that of conventional PLDA. • This paper proposes a method that reduces the scoring time by up to 60%.

  7. Contents • Motivation of Work • Conventional iVector-PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

  8. I-Vectors • A low-dimensional representation of the entire utterance. • Factor analysis model: $\boldsymbol{\mu}_s = \boldsymbol{\mu} + \mathbf{T}\mathbf{x}_s$, where $\boldsymbol{\mu}_s$ is the speaker- and channel-dependent supervector, $\boldsymbol{\mu}$ is the UBM supervector, $\mathbf{T}$ is the low-rank total variability matrix, and $\mathbf{x}_s$ is the speaker- and channel-dependent latent factor. • Given T and an utterance of speaker s, the posterior mean of the latent factor x_s is the i-vector representing speaker s. • The same is done for test speakers. • Training is totally unsupervised. • I-vectors contain both speaker and channel information.
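For reference, the posterior mean mentioned on this slide has a standard closed form in total-variability modeling. The notation below is not from the slides and is assumed here: C is the number of UBM components, N_{s,c} is the zeroth-order (occupancy) statistic of the utterance for component c, \tilde{f}_{s,c} is the first-order statistic centered on the UBM mean of component c, Σ_c is the UBM covariance of component c, and T_c is the block of T belonging to component c.

```latex
\mathbf{L}_s = \mathbf{I} + \sum_{c=1}^{C} N_{s,c}\,\mathbf{T}_c^{\top}\boldsymbol{\Sigma}_c^{-1}\mathbf{T}_c,
\qquad
\mathbf{x}_s = \mathbf{L}_s^{-1}\sum_{c=1}^{C}\mathbf{T}_c^{\top}\boldsymbol{\Sigma}_c^{-1}\tilde{\mathbf{f}}_{s,c}
```

Here L_s is the posterior precision of the latent factor, and x_s is the i-vector of the utterance.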

  9. Probabilistic LDA (PLDA) • In PLDA, the i-vectors x are modeled by a factor analyzer of the form $\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{z}_i + \boldsymbol{\epsilon}_{ij}$, where $\mathbf{x}_{ij}$ is the i-vector extracted from the j-th session of the i-th speaker, $\mathbf{m}$ is the global mean of all i-vectors, $\mathbf{V}$ is the low-rank speaker factor loading matrix, $\mathbf{z}_i$ is the speaker factor, and $\boldsymbol{\epsilon}_{ij}$ is residual noise with covariance Σ. • V is trained using the i-vectors of many speakers, each with multiple sessions. • Speaker labels are used in training. • The aim is to suppress channel effects on the verification scores.

  10. PLDA Scoring
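The slide's scoring equation is not reproduced in this transcript. As a rough illustration, the sketch below computes a log-likelihood ratio under the common Gaussian PLDA formulation, assuming one enrollment and one test i-vector, a global mean `m`, a speaker loading matrix `V` and a residual covariance `Sigma`. The function names `log_gauss` and `plda_llr` are hypothetical, and evaluating the two Gaussians directly is a choice made for clarity, not necessarily the author's implementation. Under the same-speaker hypothesis the two i-vectors share one speaker factor, so their cross-covariance is V Vᵀ; under the different-speaker hypothesis they are independent.

```python
import numpy as np

def log_gauss(x, cov):
    """Log density of a zero-mean Gaussian with covariance `cov` at point x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = x @ np.linalg.solve(cov, x)  # Mahalanobis term x^T cov^{-1} x
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def plda_llr(x_s, x_t, m, V, Sigma):
    """Log-likelihood ratio: same-speaker vs. different-speaker hypothesis."""
    xs, xt = x_s - m, x_t - m            # remove the global mean
    S_ac = V @ V.T                       # across-speaker covariance
    S_tot = S_ac + Sigma                 # total covariance of one i-vector
    # Same speaker: the stacked vector [xs; xt] is jointly Gaussian with
    # cross-covariance S_ac because both share the same speaker factor.
    joint_cov = np.block([[S_tot, S_ac], [S_ac, S_tot]])
    log_same = log_gauss(np.concatenate([xs, xt]), joint_cov)
    # Different speakers: the two i-vectors are independent.
    log_diff = log_gauss(xs, S_tot) + log_gauss(xt, S_tot)
    return log_same - log_diff
```

Practical systems usually precompute the matrix terms so that each trial reduces to a few quadratic forms; this sketch favors readability over speed.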

  11. Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

  12. Mixture of PLDA (mPLDA) • Generative Model: one part models the SNR (dB) of utterances; the other models the SNR-dependent i-vectors. • Model Parameters:
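The part of the model that handles utterance SNR can be illustrated with a short sketch. It assumes a standard one-dimensional Gaussian mixture over SNR in dB, where each component k has a weight, a mean and a standard deviation, and it returns the posterior probability of each component given a measured SNR. The function name `snr_posteriors` and the numbers in the usage lines are hypothetical and are not taken from the slides.

```python
import math

def snr_posteriors(snr_db, weights, means, stds):
    """Posterior probability of each mixture component given an SNR in dB."""
    # Log of weight * Gaussian density; the shared constant -0.5*log(2*pi)
    # cancels in the normalization and is omitted.
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * ((snr_db - mu) / s) ** 2
        for w, mu, s in zip(weights, means, stds)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    mx = max(log_terms)
    exps = [math.exp(t - mx) for t in log_terms]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-component model centered at 6, 15 and 30 dB.
posteriors = snr_posteriors(6.0, [1 / 3, 1 / 3, 1 / 3], [6.0, 15.0, 30.0], [3.0, 3.0, 3.0])
print(posteriors)
```

When the components are well separated relative to their spread, most of the posterior mass falls on a single component.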

  13. Graphical Model of mPLDA. [Figure: graphical model; one part models the SNR of utterances (the SNR of the j-th utterance from the i-th speaker), the other models SNR-dependent i-vectors.]

  14. Likelihood-Ratio Scores of mPLDA • Verification score = same-speaker likelihood / different-speaker likelihood. • For the full derivation, see http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

  15. Complexity Analysis (annotation: dimension of i-vectors)

  16. Sparseness Analysis of SNR Posteriors • Key idea: If the posterior probabilities of SNR are sparse, we may drop the combinations that lead to small posteriors.
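The key idea can be sketched as follows, assuming the likelihood is a sum over pairs of mixture indices (one for the target utterance, one for the test utterance), each term weighted by the product of the two SNR posteriors. The exact mPLDA likelihood is in the supplementary material and is not reproduced here; `pair_likelihood` is a hypothetical callback standing in for the expensive per-pair computation, and `threshold` is an illustrative parameter.

```python
from itertools import product

def pruned_weighted_sum(post_s, post_t, pair_likelihood, threshold=1e-3):
    """Posterior-weighted sum over (k_s, k_t) pairs, skipping small weights.

    post_s, post_t  : SNR posteriors of the target and test utterances
    pair_likelihood : callable (k_s, k_t) -> likelihood term (expensive)
    threshold       : pairs whose posterior product is below this are dropped
    Returns (approximate sum, number of pairs actually evaluated).
    """
    total = 0.0
    evaluated = 0
    for ks, kt in product(range(len(post_s)), range(len(post_t))):
        w = post_s[ks] * post_t[kt]
        if w < threshold:
            continue  # skip the expensive likelihood for negligible weights
        total += w * pair_likelihood(ks, kt)
        evaluated += 1
    return total, evaluated

# Toy usage with sparse posteriors and a made-up likelihood function.
toy_likelihood = lambda ks, kt: 1.0 + ks + 10.0 * kt
approx, n_used = pruned_weighted_sum([0.98, 0.02, 0.0], [0.01, 0.99, 0.0], toy_likelihood, threshold=0.05)
exact, n_all = pruned_weighted_sum([0.98, 0.02, 0.0], [0.01, 0.99, 0.0], toy_likelihood, threshold=0.0)
print(n_used, "of", n_all, "pairs evaluated")
```

With K mixture components there are K × K pairs per trial, so the savings grow with K when the posteriors are sparse. The threshold trades accuracy for speed.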

  17. Sparseness Analysis of SNR Posteriors. [Plot: combinations of target-speaker and test utterance pairs, sorted by SNR posterior probability.]

  18. Fast mPLDA Scoring

  19. Fast mPLDA Scoring

  20. PLDA vs. Fast mPLDA Scoring • PLDA: • Complexity: • Fast mPLDA: • Complexity:

  21. Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

  22. Experiments • Evaluation dataset: Common evaluation conditions 3 and 4 of the NIST SRE 2012 core set. • Parameterization: 19 MFCCs together with energy plus their 1st and 2nd derivatives → 60-dim. • UBM: gender-dependent, 1024 mixtures. • Total variability matrix: gender-dependent, 500 total factors. • I-vector preprocessing: whitening by WCCN, then length normalization, followed by LDA (500-dim → 200-dim) and WCCN. • PLDA and mPLDA with 150 speaker factors.
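Two of the preprocessing steps listed above can be sketched briefly. This is a minimal illustration, assuming i-vectors are stored as rows of a NumPy array with one speaker label per row. The function names are hypothetical, the LDA step is omitted, and the slides do not say how WCCN was implemented. (As a side note on the parameterization, 19 MFCCs plus energy give 20 static features, and adding first and second derivatives gives 20 × 3 = 60 dimensions.)

```python
import numpy as np

def within_class_cov(X, labels):
    """Average of the per-speaker covariance matrices of the rows of X."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    D = X.shape[1]
    W = np.zeros((D, D))
    for c in classes:
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)        # center each speaker's i-vectors
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)

def wccn_matrix(X, labels):
    """Matrix B with B B^T = W^{-1}; rows of X @ B have identity
    within-class covariance."""
    W = within_class_cov(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def length_normalize(X):
    """Scale every row of X to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)
```

A typical use would be `X_w = X @ wccn_matrix(X, labels)` followed by `length_normalize(X_w)`. The within-class covariance must be invertible, which requires more training i-vectors than dimensions.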

  23. Evaluation Conditions CC3 CC4

  24. Comparing Scoring Time: Common Condition 3. [Plot: EER (%) and scoring time (sec.) for K = 4, K = 3, K = 2.]

  25. Comparing Scoring Time: Common Condition 4. [Plot: EER (%) and scoring time (sec.) for K = 4, K = 3, K = 2.]

  26. Conclusions • Mixture of SNR-dependent PLDA (mPLDA) is a flexible model that can handle noisy speech with a wide range of SNR • This paper reduces the scoring time of mPLDA by half with minor degradation in performance. • This is achieved by omitting the computation of likelihood terms whose corresponding SNR posterior probabilities are small. • Further information: • http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

  27. Performance on SRE12

  28. Performance on SRE12

  29. Distribution of SNR in SRE12. Each SNR region is handled by a PLDA model.

  30. Graphical Model of PLDA

  31. Likelihood-Ratio Scores of mPLDA • Same-speaker likelihood: [equation annotated with "SNR of target and test utterances" and "i-vectors of target and test speakers"]

  32. Training Data • In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy. • We used the FaNT tool to add babble noise to the clean training utterances. [Figure: FaNT pipeline with labels "Babble noise", "Utterances from microphone channels", "From telephone channels".]
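The idea of adding noise at a chosen SNR can be illustrated with a simple sketch. It is not FaNT's algorithm: it assumes the SNR is defined from the mean power of the whole speech and noise signals, with samples held in plain Python lists. The function name `mix_at_snr` is hypothetical.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the target SNR (dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]                       # match lengths
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Repeating this at several target SNRs for each clean utterance yields training data that spans a range of noise levels.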
