
Speaker identification and verification using EigenVoices O. Thyes, R. Kuhn, P. Nguyen, and J.-C. Junqua in ICSLP2000

Presented by 王瑞璋 (Nick Wang), Philips Research East Asia-Taipei, Speech processing laboratory, NTU, 25 October 2000.





Presentation Transcript


  1. Speaker identification and verification using EigenVoices. O. Thyes, R. Kuhn, P. Nguyen, and J.-C. Junqua, in ICSLP 2000. Presented by 王瑞璋 (Nick Wang), Philips Research East Asia-Taipei, Speech processing laboratory, NTU, 25 October 2000.

  2. Speaker identification and verification
     • Speaker identification: identify the speaker, as one of the clients, from speech input
     • Speaker verification: verify that the speaker is the claimed client, from speech input
     • Problem definition: the amount of data available for each speaker is limited
       • 60 seconds ==> enough to train a GMM
       • 5 seconds ==> not enough to train a GMM, but enough to estimate EigenVoices coefficients
     • Aim: to incorporate EigenVoices into GMM speaker modeling

  3. When GMM meets EigenVoices
     • GMM
       • one Gaussian mixture p.d.f. per client
       • for example, 32 multivariate Gaussian p.d.f.s in a GMM
       • given an acoustic feature vector of 26 components (13+13)
       • model size: 32 x 26 = 832 variables
     • EigenVoices: principal axes of the GMM parameter supervectors
       • reduce the dimensionality of the GMM model by PCA, LDA, or MLES
       • eliminate the effect of estimation error (noise) by removing the axes with lower variation (signal) ==> subspace selection with SNR > threshold (Nick)
       • or use a fixed dimension for the EigenVoices space: 20 to 70 EigenVoices (the higher-variation axes)
       • speaker location in EigenVoices space ==> reconstruct the adapted GMM
       • model size: 20 to 70 variables
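The model-size arithmetic on this slide can be checked with a short sketch. It assumes, since the slide does not say so explicitly, that the supervector is built by concatenating one 26-dimensional vector per Gaussian (the mean vectors); the function name `supervector` and the zero-valued toy means are illustrative only.

```python
def supervector(gaussian_means):
    """Concatenate per-Gaussian vectors into one flat supervector.

    Assumption: only the mean vectors are stacked, which matches the
    slide's count of 32 x 26 = 832 variables.
    """
    return [x for mean in gaussian_means for x in mean]


N_GAUSSIANS = 32   # Gaussians per GMM (from the slide)
FEATURE_DIM = 26   # acoustic feature components, 13 + 13 (from the slide)

# Placeholder means; a real system would take these from a trained GMM.
means = [[0.0] * FEATURE_DIM for _ in range(N_GAUSSIANS)]
full_size = len(supervector(means))

# Compare with the 20 to 70 eigen-coefficients quoted on the slide.
for k in (20, 70):
    print(f"{k} EigenVoices: {k} coefficients instead of {full_size} variables")
```

The point of the comparison is that a client is described by a few dozen coefficients rather than by all 832 supervector entries, which is why a few seconds of speech can suffice.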

  4. When GMM meets EigenVoices
     • Benefits of principal axes
       • robust and fast: keeping the higher-variation axes produces less estimation error and shows improvement immediately
       • explicit and compact representation of the speaker distribution (vs. MAP or MLLR)
       • more applications: e.g. SPID, telephony, embedded systems, ...
     • Corpora
       • Extra training data (to train the SI model and/or EigenVoices): large amounts of data from a large and diverse set of speakers
       • Client data (to train client models): small amounts of data per speaker
       • Test set: small amounts of data per speaker (from clients or impostors)

  5. When GMM meets EigenVoices
     • Training procedure
       • train a GMM for each speaker in the extra training data (large amounts of data per speaker)
       • train EigenVoices (principal axes of the GMMs) using PCA, LDA, or MLES on the model-parameter supervectors
       • apply environmental adaptation to all EigenVoices using MLLR on all client data
       • apply MLED to estimate the eigen-coefficients for each client (small amounts of data per speaker)
       • compose a client model for each client from the EigenVoices and that client's coefficients
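The last two steps of the procedure (estimate coefficients, then compose the client model) can be illustrated with a toy sketch. Note that this is not MLED: MLED estimates the coefficients directly from the speech observations by maximum likelihood, whereas the sketch below substitutes a plain least-squares projection and assumes that a rough client supervector is already available and that the eigenvoices are orthonormal. All names and numbers are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate_coefficients(client_sv, mean_sv, eigenvoices):
    """Project the mean-centered supervector onto each eigenvoice.

    Stand-in for MLED (an assumption of this sketch): valid as a
    least-squares estimate only when the eigenvoices are orthonormal.
    """
    centered = [c - m for c, m in zip(client_sv, mean_sv)]
    return [dot(centered, e) for e in eigenvoices]


def compose_supervector(mean_sv, eigenvoices, coeffs):
    """Rebuild a client supervector as mean + sum_k coeffs[k] * eigenvoices[k]."""
    out = list(mean_sv)
    for w, e in zip(coeffs, eigenvoices):
        for i, v in enumerate(e):
            out[i] += w * v
    return out


# Toy 4-dimensional example with 2 orthonormal eigenvoices.
mean_sv = [1.0, 1.0, 1.0, 1.0]
eigenvoices = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]]
noisy_client_sv = [3.0, 0.0, 5.0, 1.0]

coeffs = estimate_coefficients(noisy_client_sv, mean_sv, eigenvoices)
client_sv = compose_supervector(mean_sv, eigenvoices, coeffs)
print(coeffs, client_sv)
```

In this toy case the third component of the noisy supervector lies outside the eigenspace, so the composed model falls back to the mean there. That is the "confined subspace" behavior the slides describe: variation along discarded axes is treated as noise.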

  6. Speaker identification/verification
     • Measurement
       • eigenDistance decoding: eigenDist(test, client), the test speaker's distance from the client speaker in eigenspace
       • eigenGMM decoding: eigenGMM_client(test), the test speaker's likelihood under the client speaker's eigen-adapted GMM
     • Speaker identification
       • decision(test) = argmin_client eigenDist(test, client)
       • or decision(test) = argmax_client eigenGMM_client(test)
     • Speaker verification
       • decision(test, claim) = accept if eigenDist(test, claim) < thr, otherwise reject
       • or decision(test, claim) = accept if eigenGMM_claim(test) > thr, otherwise reject
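The eigenDistance decision rules above can be sketched in a few lines. The slide does not specify which distance is used in eigenspace, so Euclidean distance is an assumption here, and the client names, coefficient vectors, and threshold are made up for illustration.

```python
import math


def eigen_dist(a, b):
    """Distance between two speakers' eigen-coefficient vectors.

    Assumption: Euclidean distance; the slide only says "distance in eigenspace".
    """
    return math.dist(a, b)


def identify(test_coeffs, clients):
    """decision(test) = argmin over clients of eigenDist(test, client)."""
    return min(clients, key=lambda name: eigen_dist(test_coeffs, clients[name]))


def verify(test_coeffs, claim, clients, thr):
    """Accept the claimed identity if eigenDist(test, claim) < thr."""
    return eigen_dist(test_coeffs, clients[claim]) < thr


# Hypothetical enrolled clients, each located by 2 eigen-coefficients.
clients = {"alice": [0.0, 1.0], "bob": [3.0, -1.0]}
test = [0.5, 0.8]

print(identify(test, clients))
print(verify(test, "alice", clients, thr=1.0))
print(verify(test, "bob", clients, thr=1.0))
```

The eigenGMM variants have the same shape, with the distance replaced by a likelihood, `min` replaced by `max`, and the comparison direction reversed.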

  7. Experiments
     • Setup: corpora
       • TIMIT: mismatched extra training data, 630 speakers x 10 sentences
       • YOHO: extra training, client, and test data, 82 speakers x 96 sentences
     • Results for abundant (360 sec) enrollment data in SPID
       • 82 clients with 360 seconds of enrollment speech
       • 5 seconds of test speech
       • GMM: 98.8% correct identification
       • No eigenGMM model is better than the GMM under the constraint of at most 71 EigenVoices, since there is enough enrollment data and the eigenspace is constrained to 71 of 832 axes.
       • The best EigenVoices result is 98.0%, with LDA EigenVoices, 71 axes (the maximum), and eigenGMM decoding.

  8. Experiments
     • Results for sparse (10 sec) enrollment data in SPID

  9. Experiments
     • Results for sparse (10 sec) enrollment data in speaker verification
       • SI impostor model for eigenGMM decoding
       • 40 EigenVoices on 64-GMMs supervectors over 72 speakers
       • EigenVoices helps
       • LDA-EigenVoices
       • eigenDistance

  10. Experiments
      • Results for matched/mismatched extra training data in SPID
        • MLLR adaptation helps to reduce environment mismatch.
        • TIMIT is not suitable for LDA-EigenVoices because it has:
          • only 10 sentences per speaker
          • more allophonic variability

  11. Conclusions
      • EigenVoices provides a confined subspace.
        • For abundant client data, it is worse than a conventional GMM because of the loss of degrees of freedom.
        • For sparse client data, it performs better than a conventional GMM.
      • In eigenDistance speaker verification, no impostor model is needed to normalize for utterance likelihood dependencies.
        • The eigenspace itself implicitly normalizes for utterance likelihood: two utterances with very different likelihoods may map to the same point in the eigenspace.
      • Environment mismatch hurts the client models, even when MLLR adaptation is applied.
      • LDA for EigenVoices generation will not work if
        • there are few utterances per speaker
        • or there is strong allophonic variability

  12. Comments & my future work
      • Since EigenVoices is a confinement, can we enlarge the speaker models before applying it?
        • GMM: makes no use of fine speech structure
        • LVCSR (segmentation => adaptation => difference between the SA score and the SI one): using speech structure information hurts speaker recognition performance
        • Sequential Non-Parametric (SNP) or DTW distances: SNP+GMM works best of all
      • To try EigenMLLR speaker recognition
        • 1/1000 of the memory requirement of EigenVoices
      • Split the test data into several fragments, each very small
        • eigenspace decoding
        • joint decision
