
Speaker Verification: Is it Industrial Strength?



1. Speaker Verification: Is it Industrial Strength?
   Fergus McInnes - CCIR

2. Outline of Presentation
   • Introduction to speaker verification:
     • what it is
     • how it works
   • Large-scale evaluation of Nuance Verifier on British speakers
   • Conclusions and recommendations

3. Speaker Verification: What is it?
   • Test of claimed identity based on voice
   • Aims to answer the question: “Is the speaker who he/she claims to be?”
   • Can be used in combination with other security checks
   • Compare and contrast (see the sketch below):
     • speaker identification - no prior identity claim
     • other biometrics - not usable over the phone
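A minimal sketch of the identification/verification contrast, assuming (purely for illustration) unit-normalised voice embeddings scored by dot product; the slides do not describe the Verifier's internal representation, and all names here are hypothetical. Identification searches all enrolled models with no prior claim; verification tests one claimed identity against a threshold:

    import numpy as np

    def identify(utterance, enrolled_models):
        # Speaker identification: no prior identity claim -- return the
        # name of the best-matching enrolled model.
        scores = {name: float(np.dot(utterance, model))
                  for name, model in enrolled_models.items()}
        return max(scores, key=scores.get)

    def verify(utterance, claimed_model, threshold):
        # Speaker verification: test the single claimed identity and accept
        # only if the match score clears the decision threshold.
        return float(np.dot(utterance, claimed_model)) >= threshold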

4. Speaker Verification: How does it work?
   • Enrolment phase: client provides speech to build a speaker model
   • Verification phase:
     • compare the caller’s speech with the client’s model
     • if it matches well enough, accept the caller as being the client (assuming other security checks are OK)
     • otherwise reject the caller’s identity claim
   (both phases are sketched below)
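The two phases can be sketched in a few lines, again under the illustrative embedding assumption above (pooling utterances into a renormalised mean is my stand-in for the Verifier's actual modelling, which the slides do not describe):

    import numpy as np

    def enrol(utterance_embeddings):
        # Enrolment phase: pool the client's enrolment utterances into one
        # speaker model (renormalised mean embedding -- an assumption).
        model = np.mean(utterance_embeddings, axis=0)
        return model / np.linalg.norm(model)

    def verify_call(caller_embedding, client_model, threshold, other_checks_ok):
        # Verification phase: accept only if the voice matches well enough
        # AND the application's other security checks have passed.
        score = float(np.dot(caller_embedding, client_model))
        return bool(other_checks_ok and score >= threshold)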

5. Tuning a Verifier
   [Figure: false acceptance (FA) and false rejection (FR) rates plotted against the decision threshold. A lower-security setting gives low FR (high FA); a higher-security setting gives low FA (high FR); the two curves cross at the Equal Error Rate (EER) threshold.]
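The EER in the figure can be found numerically: sweep the threshold over the observed scores and take the point where the two error rates are closest. A minimal sketch (function and variable names are mine, not from the evaluation):

    import numpy as np

    def equal_error_rate(client_scores, impostor_scores):
        client_scores = np.asarray(client_scores, dtype=float)
        impostor_scores = np.asarray(impostor_scores, dtype=float)
        best = (np.inf, None, None)   # (|FR - FA|, threshold, EER)
        for t in np.sort(np.concatenate([client_scores, impostor_scores])):
            fr = np.mean(client_scores < t)      # genuine speakers rejected
            fa = np.mean(impostor_scores >= t)   # impostors accepted
            if abs(fr - fa) < best[0]:
                best = (abs(fr - fa), t, (fr + fa) / 2)
        return best[1], best[2]   # EER threshold, EER

Raising the threshold above this point trades towards higher security (low FA, high FR); lowering it trades towards lower security (low FR, high FA).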

6. Enrolment Phase
   • How many enrolment sessions? - a single session per client for convenience, or multiple sessions for improved modelling
   • What words or phrases to use? - digits, application keywords, passwords etc.
   • How many utterances per word or phrase? - should be more than one, to ensure consistency and representativeness

7. Verification Phase
   • What words or phrases? - for best results, use the same words as in enrolment
   • How many utterances per verification bid? - more for accuracy, or fewer for speed
   • How many verification bids to allow?
   • What threshold setting? - tradeoff between false acceptances and false rejections
   • What to do on rejection? - pass to an agent?
   (a minimal dialogue loop is sketched below)
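How these choices combine at run time can be sketched as a small dialogue loop; the bid limit of 3 and the agent fallback are illustrative assumptions, not settings reported in the evaluation:

    def verification_dialogue(score_next_bid, threshold, max_bids=3):
        # score_next_bid() prompts the caller for one bid (one or more
        # utterances) and returns its match score against the claimed
        # client's model.
        for _ in range(max_bids):
            if score_next_bid() >= threshold:
                return "accept"
        return "pass to agent"   # rejection handling, e.g. a human agent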

8. Large-Scale Evaluation of Nuance Verifier
   • Speech collected over the telephone network from >1000 employees of participating companies
   • Tests run at CCIR using Nuance Verifier Version 6.2.4-pre (1999/2000):
     • main series: 779 speakers with sufficient data
     • comparative tests on excluded speakers
     • tests on identical twins and other related speakers
   • Edinburgh impostor study (deliberate mimicry)

  9. Data Collection

10. Main Series of Tests
   • 779 speakers (445 male, 334 female)
   • Digits, digit strings, names, banking words
   • Enrolment data: 1, 2 or 3 utterances per word or phrase, from a registration call
   • Test data: 1 utterance per word or phrase from each of 3 simulated banking service calls
   • Equal numbers of client and impostor bids
   • Random or same-sex impostors

11. Test Procedure
   [Diagram: speakers are split into two sets, X and Y. Each set goes through enrolment, then client and impostor bids are scored; a threshold is estimated from the scores and applied to make verification decisions, giving FA and FR rates and hence HTER(X) and HTER(Y). These are averaged into an overall Half Total Error Rate (HTER).]
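HTER itself is simply the mean of the two error rates at a fixed threshold. A sketch (the diagram residue does not make clear whether each set's threshold was estimated on its own scores or on the other set's; cross-estimation is a common protocol, so take the pairing below as an assumption):

    import numpy as np

    def hter(client_scores, impostor_scores, threshold):
        # Half Total Error Rate: average of the false rejection and false
        # acceptance rates at one operating threshold.
        fr = np.mean(np.asarray(client_scores) < threshold)
        fa = np.mean(np.asarray(impostor_scores) >= threshold)
        return (fa + fr) / 2

    # Overall figure as in the diagram: average the two sets' HTERs, e.g.
    # overall = (hter(cx, ix, tx) + hter(cy, iy, ty)) / 2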

12. Results on Digits (HTER with random impostors)

13. False Rejection / False Acceptance Tradeoff (all digit phrases, pooled ×3 enrolment)

14. Results on Non-Digit Words (HTER with random impostors)

15. Results on All Words (HTER with random impostors)

16. Results with Adaptation (HTER, with pooled ×3 enrolment)
   • After adaptation on client bids: 1.2% (no ad.) → 0.9% (1 ad.) → 0.9% (2 ad.) (using all digit phrases)
   • After adaptation on client and same-impostor bids: 1.2% → 1.4% → 1.7%
   • Adapted digit + unadapted non-digit models:
     • client ad.: 0.9% → 0.8% → 0.7%
     • client + same-imp. ad.: 0.9% → 1.3% → 1.5%
   (a simple adaptation rule is sketched below)
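One simple form of adaptation consistent with these results is a running-mean update of the speaker model on accepted bids; the update rule and weight below are my assumptions, not the Verifier's documented algorithm:

    import numpy as np

    def adapt(model, accepted_bid_embedding, weight=0.1):
        # Fold an accepted verification bid back into the speaker model.
        # If the accepted bid was really an impostor's, the model drifts
        # toward the impostor -- consistent with the slide's finding that
        # adapting on client + same-impostor bids raises the error rate.
        updated = (1 - weight) * model + weight * accepted_bid_embedding
        return updated / np.linalg.norm(updated)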

17. Related Impostors (FA on all digits, with pooled ×3 enrolment and standard same-sex EER threshold)
   • Identical twin: 58.7% [19 speaker pairs]
   • Other sibling: 0.0% [10]
   • Parent: 15.5% [11]
   • Child: 11.1% [10]
   • (Unrelated same-sex impostor: 1.4% [779])
   HTER on identical twins with adjusted threshold: 17.1%

18. Impostor Study - Deliberate Mimicry (combined digit/non-digit bids, pooled ×3 enrolment, standard same-sex threshold)

19. Summary of Results
   • Accuracy improved by using digits, by increasing the amount of enrolment data, and by increasing the amount of speech for verification
   • EER ~1% for verification on 10 s of speech, after enrolment on 3 utterances per word or phrase
   • False Acceptance / False Rejection tradeoff: 0.1% FR ↔ 5% FA
   • Adaptation helps if there are no persistent impostors
   • Identical twins not reliably distinguished

20. Speaker Verification - Conclusions
   • Recommend using speaker verification in addition to other security checks
     - SV with 1% false acceptance reduces an impostor’s chance from 1% to 0.01% (PIN not known) or from 100% to 1% (PIN known) (worked out below)
   • Adding verification will always increase security, at the cost of some rate of false rejection for genuine speakers
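The reduction follows from assuming the PIN check and the voice check fail independently:

    P(impostor accepted) = P(passes PIN check) × P(passes SV at 1% FA)
      PIN not known:  0.01 × 0.01 = 0.0001   (1% → 0.01%)
      PIN known:      1.00 × 0.01 = 0.01     (100% → 1%)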

  21. Fergus.McInnes@ccir.ed.ac.uk
