Fusion. Gérard CHOLLET, chollet@tsi.enst.fr, GET-ENST/CNRS-LTCI, 46 rue Barrault, 75634 PARIS cedex 13, http://www.tsi.enst.fr/~chollet. Outline: Motivations, Applications; Pattern recognition; Multi-sensor; Signal enhancement; Features; Scores; Decisions; Conclusions; Perspectives.
Fusion • Gérard CHOLLET • chollet@tsi.enst.fr • GET-ENST/CNRS-LTCI • 46 rue Barrault, 75634 PARIS cedex 13 • http://www.tsi.enst.fr/~chollet
Outline • Motivations, Applications • Pattern recognition • Multi-sensor • Signal enhancement • Features • Scores • Decisions • Conclusions • Perspectives
Introduction • Pattern recognition • Why fuse? • What to fuse? • Signals from various sensors, • Features measured on these signals, • Scores computed by classifiers, • Decisions made by classifiers • How to fuse?
Signal fusion • Number of sensors • Types of sensors • Identical? • Number of sources • Examples: • Microphone arrays • Stereo vision • Seismograph
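As one illustration of signal-level fusion with a microphone array, here is a minimal delay-and-sum sketch. It assumes the per-microphone arrival delays are already known and are whole numbers of samples; the function name `delay_and_sum` and the toy signals are illustrative and do not come from the slides.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   arrival delay of the source at each microphone, in samples
              (assumed known here; estimating them is a separate problem).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # sample of this channel that matches reference time t
            if 0 <= src < n:
                out[t] += samples[src]
    return [v / len(channels) for v in out]


# Toy example: the same pulse reaches microphone 2 one sample later.
mic1 = [0.0, 1.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0]
fused = delay_and_sum([mic1, mic2], delays=[0, 1])
```

After alignment the pulse adds coherently, whereas uncorrelated noise on the two channels would be attenuated by the averaging.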
Feature fusion • From a single sensor • From several sensors • Multi-stream models • Examples: • Speech recognition • Bayesian networks
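Two common ways to fuse at the feature level can be sketched as follows: concatenating frame-synchronous feature vectors, and weighting per-stream log-likelihoods in a multi-stream model. Both functions and the stream weights are assumptions made for illustration, not the method used in the presentation.

```python
def fuse_features(stream_a, stream_b):
    """Concatenate frame-synchronous feature vectors from two streams."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(stream_a, stream_b)]


def multistream_log_likelihood(stream_log_likelihoods, weights):
    """Weighted sum of per-stream log-likelihoods.

    The weights express how much each stream is trusted; how to choose
    them is left open here.
    """
    return sum(w * ll for w, ll in zip(weights, stream_log_likelihoods))


fused_frames = fuse_features([[1.0, 2.0]], [[3.0]])
combined = multistream_log_likelihood([-2.0, -4.0], weights=[0.5, 0.5])
```

Concatenation requires the streams to share a frame rate, while the multi-stream form keeps one model per stream and only combines their outputs.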
Vector Quantization (VQ), SOONG, ROSENBERG 1987 • [Figure: “Bonjour” from test speaker Y is quantized against the codebooks of speaker 1, speaker 2, speaker X and speaker n; the codebook giving the best quantization is selected.]
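The codebook comparison in the figure can be sketched as follows: each test frame is mapped to its nearest codeword, and the speaker whose codebook yields the lowest average distortion is selected. The squared Euclidean distance, the function names and the toy codebooks are assumptions for illustration.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames, codebooks):
    """Return the speaker whose codebook quantizes the frames best."""
    return min(codebooks, key=lambda spk: average_distortion(frames, codebooks[spk]))


# Toy codebooks with two codewords each (made-up values).
codebooks = {
    "speaker_1": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_2": [[5.0, 5.0], [6.0, 6.0]],
}
test_frames = [[0.1, 0.0], [0.9, 1.1]]
best = identify_speaker(test_frames, codebooks)
```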
Hidden Markov Models (HMM), ROSENBERG 1990, TSENG 1992 • [Figure: “Bonjour” from test speaker Y is aligned against the “Bonjour” models of speaker 1, speaker 2, speaker X and speaker n; the best path is selected.]
Ergodic HMM, PORITZ 1982, SAVIC 1990 • [Figure: “Bonjour” from test speaker Y is scored against the HMMs of speaker 1, speaker 2, speaker X and speaker n; the best path is selected.]
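The "best path" selection in the two HMM figures can be sketched with a log-domain Viterbi score per speaker model. For brevity this sketch assumes discrete observation symbols and tiny two-state models with made-up probabilities; real systems use continuous acoustic features.

```python
import math


def viterbi_log_score(obs, log_init, log_trans, log_emit):
    """Log-probability of the best state path of a discrete-output HMM.

    obs:       list of observation symbol indices
    log_init:  log_init[i]     = log P(state i at t = 0)
    log_trans: log_trans[i][j] = log P(state j | state i)
    log_emit:  log_emit[i][o]  = log P(symbol o | state i)
    """
    n_states = len(log_init)
    delta = [log_init[i] + log_emit[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        delta = [
            max(delta[i] + log_trans[i][j] for i in range(n_states)) + log_emit[j][o]
            for j in range(n_states)
        ]
    return max(delta)


def best_speaker(obs, speaker_models):
    """Pick the speaker whose HMM gives the highest best-path score."""
    return max(speaker_models, key=lambda s: viterbi_log_score(obs, *speaker_models[s]))


def logs(matrix):
    """Element-wise natural log of a matrix of non-zero probabilities."""
    return [[math.log(p) for p in row] for row in matrix]


init = [math.log(0.5), math.log(0.5)]
trans = logs([[0.9, 0.1], [0.1, 0.9]])
speaker_models = {
    "speaker_1": (init, trans, logs([[0.9, 0.1], [0.1, 0.9]])),
    "speaker_2": (init, trans, logs([[0.5, 0.5], [0.5, 0.5]])),
}
observations = [0, 0, 0]
best = best_speaker(observations, speaker_models)
```

The same scoring loop applies whether the models are left-to-right word models or ergodic ones; only the transition matrix differs.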
Gaussian Mixture Models (GMM) REYNOLDS 1995
Gaussian Mixture Model • Parametric representation of the probability distribution of observations:
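The equation shown on the original slide did not survive extraction. The standard form of a GMM density is given below as a reference; the notation (M components, weights w_i, means μ_i, covariances Σ_i, dimension D, parameter set λ) is assumed here and may differ from the slide.

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0,

\mathcal{N}(x; \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)
```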
Gaussian Mixture Models • 8 Gaussians per mixture
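Scoring an utterance against a GMM amounts to averaging the per-frame log-likelihoods. A minimal sketch follows, assuming diagonal covariances and a model stored as a `(weights, means, variances)` tuple; both choices are illustrative conventions, not taken from the slides.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under the mixture, using log-sum-exp for numerical stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def average_log_likelihood(frames, gmm):
    """Utterance-level score: mean per-frame log-likelihood."""
    return sum(gmm_log_likelihood(x, *gmm) for x in frames) / len(frames)


# One-dimensional toy model with two identical components.
toy_gmm = ([0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]])
score = average_log_likelihood([[0.0], [0.0]], toy_gmm)
```

For verification, such a score is typically compared with the score of a competing model, and the difference is thresholded.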
Support Vector Machines and Speaker Verification • A hybrid GMM-SVM system is proposed (GMM modeling followed by SVM scoring) • The SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs
SVM principles • [Figure: an input X is mapped from the input space to y(X) in the feature space; separating hyperplanes H, with the optimal hyperplane Ho, determine Class(X).]
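The idea of mapping inputs to a feature space where a hyperplane separates the classes can be sketched on a toy problem. The feature map and the hyperplane below are hand-picked assumptions for illustration; no training is performed, so this is not the optimal hyperplane Ho an SVM would find.

```python
def feature_map(x):
    """Hypothetical map y(X): append the product term x1 * x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def svm_class(x, w, b):
    """Class(X) = sign of w . y(X) + b in the feature space."""
    score = sum(wi * yi for wi, yi in zip(w, feature_map(x))) + b
    return 1 if score >= 0 else -1


# XOR-like points: not linearly separable in the 2-D input space,
# but separable in the 3-D feature space by the plane y3 = 0.
w, b = (0.0, 0.0, 1.0), 0.0
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [svm_class(p, w, b) for p in points]
```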
Combining Speech Recognition and Speaker Verification • Speaker-independent phone HMMs • Selection of segments or segment classes that are speaker-specific • Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker) • Some developments were carried out during a six-week workshop (SuperSID) in summer 2002
Selection of nasals in words ending in -ing • being • everything • getting • anything • thing • something • things • going
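Segment selection of this kind can be sketched as a filter over a phone-level alignment produced by the recognizer. The alignment format `(phone, start_frame, end_frame)`, the phone labels and the frame indices below are hypothetical.

```python
NASALS = {"m", "n", "ng"}  # assumed phone label set


def select_segments(alignment, wanted=NASALS):
    """Keep (phone, start, end) segments whose phone label is in `wanted`."""
    return [seg for seg in alignment if seg[0] in wanted]


# Hypothetical alignment of the word "going".
alignment = [("g", 0, 5), ("ow", 5, 12), ("ih", 12, 18), ("ng", 18, 27)]
selected = select_segments(alignment)
```

Only the frames inside the selected segments would then be passed to the speaker verification models.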
Audio-Visual Identity Verification • A person speaking in front of a camera offers two modalities for identity verification (speech and face). • The sequence of face images and the synchronisation of speech and lip movements could be exploited. • Imposture is much more difficult than with a single modality. • Many PCs, PDAs and mobile phones are equipped with a camera. • Audio-Visual Identity Verification will offer non-intrusive security for e-commerce, e-banking,…
Examples of Speaking Faces • Sequence of digits (PIN code) • Free text
Fusion of Speech and Face (from the thesis of Conrad Sanderson, Aug. 2002)
An illustration • Acquisition of biometric signals for each modality • Scores are computed for each modality • Fusion of scores and decision • Insecure network • Distant server: access to private data, secured transactions
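The last step of this pipeline, fusing per-modality scores and taking a decision, can be sketched with a weighted sum of normalized scores followed by a threshold. The min-max normalization, the score ranges, the equal weights and the threshold are all assumptions for illustration; they are not taken from the slides or from the cited thesis.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw score from the range [lo, hi] to [0, 1]."""
    return (score - lo) / (hi - lo)


def fuse_scores(scores, weights):
    """Weighted average of normalized per-modality scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def decide(fused_score, threshold):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fused_score >= threshold


# Hypothetical raw scores and score ranges for the two modalities.
speech = min_max_normalize(2.0, lo=-4.0, hi=4.0)
face = min_max_normalize(0.5, lo=0.0, hi=1.0)
fused = fuse_scores([speech, face], weights=[0.5, 0.5])
accepted = decide(fused, threshold=0.5)
```

In practice the weights and the threshold would be tuned on development data, for example to balance false acceptances against false rejections.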
Conclusions and Perspectives • Speech is often the only usable biometric modality (over the telephone network). • Interactive Voice Servers may use both text-dependent and text-independent approaches for improved verification accuracy. • Evaluation campaigns and research workshops are efficient means to stimulate progress. • Most PCs, PDAs and mobile phones will be equipped with cameras. • Audio-Visual Identity Verification should find applications in e-Banking, e-Commerce, …