
Fusion

Fusion. Gérard CHOLLET, chollet@tsi.enst.fr, GET-ENST/CNRS-LTCI, 46 rue Barrault, 75634 PARIS cedex 13, http://www.tsi.enst.fr/~chollet. Outline: motivations and applications, pattern recognition, multi-sensor fusion, signal enhancement, features, scores, decisions, conclusions, perspectives.

Presentation Transcript


  1. Fusion • Gérard CHOLLET • chollet@tsi.enst.fr • GET-ENST/CNRS-LTCI • 46 rue Barrault, 75634 PARIS cedex 13 • http://www.tsi.enst.fr/~chollet

  2. Outline • Motivations, applications • Pattern recognition • Multi-sensor fusion • Signal enhancement • Features • Scores • Decisions • Conclusions • Perspectives

  3. Introduction • Pattern recognition • Why fuse? • What to fuse? • Signals from various sensors, • Features measured on these signals, • Scores computed by classifiers, • Decisions made by classifiers • How to fuse?

  4. Pattern recognition

  5. Signal fusion • Number of sensors • Types of sensors • Identical? • Number of sources • Examples: • Microphone arrays • Stereo vision • Seismograph

  6. Feature fusion • From a single sensor • From several sensors • Multi-stream models • Examples: • Speech recognition • Bayesian networks

  7. Score fusion
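
One common way to fuse scores from several classifiers is to bring each score onto a shared scale and then take a weighted sum. The slide itself does not say which rule the author used, so the following is only a minimal sketch under that assumption; the function names, the min-max bounds, the weights, and the 0.6 threshold are all hypothetical.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw score to [0, 1] using bounds assumed to come from development data."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Clip so that scores outside the development range stay in [0, 1].
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(scores, bounds, weights):
    """Weighted sum of normalized scores (weights are assumed to sum to 1)."""
    if not (len(scores) == len(bounds) == len(weights)):
        raise ValueError("scores, bounds and weights must have the same length")
    return sum(
        w * min_max_normalize(s, lo, hi)
        for s, (lo, hi), w in zip(scores, bounds, weights)
    )


# Hypothetical example: one speech score and one face score.
fused = fuse_scores(
    scores=[2.0, 0.8],
    bounds=[(-4.0, 4.0), (0.0, 1.0)],
    weights=[0.5, 0.5],
)
accept = fused >= 0.6  # hypothetical decision threshold
```

In practice the weights and the threshold would be tuned on held-out data, and other normalizations (for example z-normalization) could replace the min-max step.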

  8. Decision fusion
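
When each classifier outputs only a hard decision, a simple fusion rule is majority voting. The slide does not state which rule was used, so this is an illustrative sketch only; `majority_vote` and the label strings are hypothetical names.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the most frequent decision, or None when the top two are tied."""
    if not decisions:
        raise ValueError("at least one decision is required")
    ranked = Counter(decisions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the caller has to choose a fallback policy
    return ranked[0][0]


# Hypothetical example: three classifiers vote on one access attempt.
result = majority_vote(["accept", "reject", "accept"])
```

Returning `None` on a tie keeps the fallback policy (for example, rejecting by default) explicit at the call site.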

  9. Vector Quantization (VQ), SOONG, ROSENBERG 1987 • “Bonjour” from test speaker Y • Dictionaries of speakers 1, 2, …, X, …, n • Best quantization
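
The VQ approach on this slide can be read as follows: the test utterance is quantized against each speaker's dictionary (codebook), and the speaker whose codebook gives the best quantization is selected. The sketch below assumes that "best" means lowest average squared Euclidean distortion; the function names and the toy 2-D codebooks are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(
        min(squared_distance(frame, codeword) for codeword in codebook)
        for frame in frames
    ) / len(frames)


def identify_speaker(frames, codebooks):
    """Pick the speaker whose codebook quantizes the frames with least distortion."""
    return min(codebooks, key=lambda spk: average_distortion(frames, codebooks[spk]))


# Toy data: one small codebook per enrolled speaker.
codebooks = {
    "speaker_1": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_2": [(5.0, 5.0), (6.0, 6.0)],
}
test_frames = [(0.9, 1.1), (0.1, -0.1)]
best = identify_speaker(test_frames, codebooks)
```

Real systems would use cepstral feature vectors and codebooks trained with a clustering algorithm; the toy values here only show the selection rule.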

  10. Hidden Markov Models (HMM), ROSENBERG 1990, TSENG 1992 • “Bonjour” from test speaker Y • “Bonjour” of speakers 1, 2, …, X, …, n • Best path

  11. Ergodic HMM, PORITZ 1982, SAVIC 1990 • “Bonjour” from test speaker Y • HMMs of speakers 1, 2, …, X, …, n • Best path

  12. Gaussian Mixture Models (GMM) REYNOLDS 1995

  13. HMM structure depends on the application

  14. Gaussian Mixture Model • Parametric representation of the probability distribution of observations:
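
The formula on this slide did not survive extraction. For reference, the standard form of a Gaussian mixture density is given below; the symbols (M components, weights w_i, means μ_i, covariances Σ_i, model λ) are the usual textbook notation and are an assumption about what the slide showed.

```latex
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0
```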

  15. Gaussian Mixture Models 8 Gaussians per mixture
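
To make the scoring step concrete, here is a minimal sketch of how the log-likelihood of one feature vector can be evaluated under a GMM. It assumes diagonal covariances (a common choice, not stated on the slide) and uses hypothetical function names; the log-sum-exp step is there only for numerical stability.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed here)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log of the weighted sum of component densities, via log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_log_likelihood(frames, weights, means, variances):
    """Utterance-level score: mean per-frame log-likelihood."""
    return sum(
        gmm_log_likelihood(f, weights, means, variances) for f in frames
    ) / len(frames)


# Toy check: a one-component, 1-D standard normal evaluated at 0.
single = gmm_log_likelihood((0.0,), [1.0], [(0.0,)], [(1.0,)])
```

A mixture with 8 Gaussians, as on the slide, would simply pass lists of length 8 for `weights`, `means`, and `variances`.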

  16. Support Vector Machines and Speaker Verification • Diagram labels: GMM, Modeling, Scoring, SVM • A hybrid GMM-SVM system is proposed • An SVM scoring model is trained on development data to classify true-target speaker accesses and impostor accesses, using a new feature representation based on GMMs

  17. SVM principles • Separating hyperplanes H, with the optimal hyperplane Ho • Diagram labels: input space (X), feature space (y(X)), H, Ho, Class(X)

  18. Results

  19. Combining Speech Recognition and Speaker Verification • Speaker-independent phone HMMs • Selection of segments or segment classes which are speaker-specific • Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker) • Some developments were done during a six-week workshop (SuperSID) in summer 2002

  20. SuperSID experiments

  21. GMM with cepstral features

  22. Selection of nasals in words in -ing • being • everything • getting • anything • thing • something • things • going

  23. Fusion

  24. Fusion results

  25. Audio-Visual Identity Verification • A person speaking in front of a camera offers 2 modalities for identity verification (speech and face). • The sequence of face images and the synchronisation of speech and lip movements could be exploited. • Imposture is much more difficult than with single modalities. • Many PCs, PDAs, and mobile phones are equipped with a camera. • Audio-Visual Identity Verification will offer non-intrusive security for e-commerce, e-banking, …

  26. Examples of Speaking Faces Sequence of digits (PIN code) Free text

  27. Fusion of Speech and Face (from the thesis of Conrad Sanderson, Aug. 2002)

  28. An illustration • Insecure network • Distant server: access to private data, secured transactions • Acquisition of biometric signals for each modality • Scores are computed for each modality • Fusion of scores and decision

  29. Conclusions and Perspectives • Speech is often the only usable biometric modality (over the telephone network). • Interactive Voice Servers may use both text-dependent and text-independent approaches for improved verification accuracy. • Evaluation campaigns and research workshops are efficient means to stimulate progress. • Most PCs, PDAs and mobile phones will be equipped with cameras. • Audio-Visual Identity Verification should find applications in e-banking, e-commerce, …
