SPEAKER RECOGNITION

SPEAKER RECOGNITION A PRESENTATION BY SHAMALEE DESHPANDE

INTRODUCTION • Speaker Recognition * Automatically recognizing who is speaking * Uses speaker-specific information contained in the speaker’s speech waves

INTRODUCTION • Two Approaches * Text-Dependent Recognition: uses keywords or sentences with the same text for both the templates and the recognition * Text-Independent Recognition: does not rely on a specific text being spoken

INTRODUCTION • Classes of Sound: Voiced, Unvoiced, Plosive • Production of Pitch Frequency and Formants • Glottal Waveform

BLOCK DIAGRAM OF A SPEAKER RECOGNITION SYSTEM

DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM • Features should occur naturally and frequently in speech • Be easily measurable • Not change over time or with the speaker’s health • Not be affected by background noise • Not be subject to mimicry

SOURCES OF VARIABILITY IN SPEECH • Phonetic Identity: two samples may correspond to different phonetic segments, e.g. a vowel and a fricative • Pitch: pitch and other features such as breathiness and amplitude can be varied • Speaker: differences due to source physiology and emotions • Microphone • Environment

Possible Acoustic Parameters * Formant Frequencies * LPC * Pitch * Nasal Coarticulation * Gain

COMMON SPEAKER RECOGNITION TECHNIQUES • DISCRETE FOURIER TRANSFORM • LINEAR PREDICTIVE CODING • CEPSTRAL ANALYSIS • DYNAMIC TIME WARPING • HIDDEN MARKOV MODELS

DISCRETE / FAST FOURIER TRANSFORM • Changes time-domain signals into frequency-domain representations • The FFT computes the DFT with reduced computational complexity (O(N log N) instead of O(N^2)), which lowers the load on the processor • Flowchart blocks: Read N speech samples from input, Append N-L zeros to the input data, Calculation of DFT, Windowing
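The flowchart blocks on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`hamming`, `dft`, `frame_spectrum`) are hypothetical, the Hamming window is one common choice that the slide does not specify, the window is applied before zero-padding (the usual order), and L is read as the frame length and N as the transform length. A plain O(N^2) DFT is used for clarity; a real system would use an FFT routine.

```python
import cmath
import math


def hamming(length):
    """Hamming window (assumed window type; the slide does not name one)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def dft(x):
    """Direct O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def frame_spectrum(samples, n_fft):
    """Window a frame of L samples, append n_fft - L zeros, return |DFT|."""
    if n_fft < len(samples):
        raise ValueError("n_fft must be at least the frame length")
    windowed = [s * w for s, w in zip(samples, hamming(len(samples)))]
    padded = windowed + [0.0] * (n_fft - len(windowed))
    return [abs(c) for c in dft(padded)]


# Usage: a 1000 Hz tone sampled at 8000 Hz, 32-sample frame padded to 64.
fs = 8000
frame = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(32)]
mag = frame_spectrum(frame, 64)
peak_bin = max(range(33), key=lambda k: mag[k])  # search 0 .. fs/2 only
peak_hz = peak_bin * fs / 64
```

Zero-padding does not add information; it only samples the same spectrum on a finer frequency grid.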

LINEAR PREDICTIVE CODING • LPC model of the speech-producing organs of the body • BUZZER: glottal excitation, characterized by intensity and pitch • TUBE: vocal tract, characterized by formants
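One way to make the buzzer-and-tube model concrete is to estimate the tube (an all-pole filter) from a signal. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion, a standard way to compute LPC coefficients that the slide itself does not name. The function names and the two-pole test resonator are illustrative assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    return [sum(x[t] * x[t + lag] for t in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Estimate predictor coefficients a[1..order] such that
    x[t] is approximated by sum_k a[k] * x[t - k] (Levinson-Durbin).
    Returns (coefficients, residual_energy). Assumes x is not all zeros."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Usage: one "buzz" pulse through a two-pole "tube"
# x[t] = pulse[t] + 1.3 * x[t-1] - 0.8 * x[t-2]  (hypothetical resonator).
signal = []
for t in range(400):
    pulse = 1.0 if t == 0 else 0.0
    prev1 = signal[t - 1] if t >= 1 else 0.0
    prev2 = signal[t - 2] if t >= 2 else 0.0
    signal.append(pulse + 1.3 * prev1 - 0.8 * prev2)

coeffs, residual = lpc(signal, order=2)  # coeffs is close to [1.3, -0.8]
```

The recovered coefficients describe the resonances (formants) of the tube; the residual carries the excitation.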

CEPSTRAL ANALYSIS • Disadvantage of DFT/FFT: formant frequencies may shift or overlap the pitch harmonics in the spectrum • In cepstral analysis, the pitch (excitation) and formant (vocal tract) contributions are separated • Defined as the inverse Fourier transform of the log of the power spectrum • s(n) = p(n) * v(n), where * is convolution of the excitation p(n) with the vocal tract response v(n) • x(n) = w(n) · s(n), windowing with w(n) • S’(ω) = p’(ω) · v’(ω) after the Fourier transform • log S’(ω) = log p’(ω) + log v’(ω) • C(q) = IDFT{log S’(ω)} = IDFT{log p’(ω)} + IDFT{log v’(ω)} • q: quefrency, C(q): cepstrum

CEPSTRAL ANALYSIS • Speech → Window → DFT → Log → IDFT → Cepstrum
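The Window, DFT, Log, IDFT chain can be sketched as follows. This is a minimal illustration, not a production implementation: the names `dft`, `hamming` and `real_cepstrum` are hypothetical, the Hamming window is an assumed choice, a small constant is added before the log to avoid log(0), and the log is taken of the magnitude spectrum (the log of the power spectrum differs only by a factor of 2). The `apply_window` flag is an addition for experimenting with the unwindowed case.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT; with inverse=True, the inverse DFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def hamming(length):
    """Hamming window (assumed window type)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def real_cepstrum(frame, apply_window=True):
    """Speech frame -> (window) -> DFT -> log|.| -> IDFT -> cepstrum."""
    if apply_window:
        frame = [s * w for s, w in zip(frame, hamming(len(frame)))]
    log_spectrum = [math.log(abs(c) + 1e-12) for c in dft(frame)]
    return [c.real for c in dft(log_spectrum, inverse=True)]


# Usage: a decaying "vocal tract" response, 16 samples long.
v = [0.8 ** t for t in range(16)]
cep = real_cepstrum(v)
```

Because the log turns the product of spectra into a sum, the cepstrum of a (circular) convolution is the sum of the individual cepstra when no window is applied. That additivity is what lets low-quefrency (vocal tract) and high-quefrency (pitch) components be separated.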

DYNAMIC TIME WARPING • Incoming speech is usually compared frame by frame with a stored template • Achieved via a pairwise comparison of feature vectors from each sequence • Disadvantage of a direct frame-by-frame comparison: corresponding phonemes vary in length • DTW takes into account the nonlinear relation between the lengths of the two signals • Used as a matching algorithm • Figure: example DTW grid
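The DTW grid can be filled in with a short dynamic-programming routine. The sketch below is the textbook form with unit step weights and Euclidean distance between feature vectors; both are assumptions, since the slide does not specify a local distance or path constraints, and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment between two
    sequences of feature vectors (tuples of equal dimension)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b frame
                                     cost[i][j - 1],      # stretch seq_a frame
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Usage: the same "utterance" spoken twice as slowly still matches exactly.
template = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
slow = [(1.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 1.0), (3.0, 0.0), (3.0, 0.0)]
other = [(3.0, 0.0), (2.0, 1.0), (1.0, 0.0)]

same_score = dtw_distance(template, slow)    # 0.0: only timing differs
other_score = dtw_distance(template, other)  # positive: content differs
```

In a recognizer, the incoming utterance would be scored against each stored template and the lowest accumulated cost taken as the match.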

HIDDEN MARKOV MODELS • Speech signal is identified during the search process rather than explicitly • Comprises: a hidden Markov chain representing temporal variability, and an observable process representing spectral variability • Portrayed as a stochastic pair (X,Y) • An HMM is a finite state machine in which a probability density function p(x|s) is associated with each state s
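To illustrate how a state machine with a density p(x|s) per state scores an utterance, here is a minimal sketch of the forward algorithm. The one-dimensional Gaussian densities, the function names and the toy model parameters are all assumptions made for illustration; practical systems use multivariate features and log or scaled probabilities to avoid underflow on long sequences.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density p(x|s) of a 1-D Gaussian state (assumed emission model)."""
    return (math.exp(-(x - mean) ** 2 / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))


def forward_likelihood(observations, initial, transition, means, variances):
    """Likelihood of an observation sequence under an HMM, summed over
    all hidden state paths (forward algorithm, no scaling)."""
    if not observations:
        raise ValueError("need at least one observation")
    states = range(len(initial))
    alpha = [initial[s] * gaussian_pdf(observations[0], means[s], variances[s])
             for s in states]
    for x in observations[1:]:
        alpha = [sum(alpha[r] * transition[r][s] for r in states)
                 * gaussian_pdf(x, means[s], variances[s])
                 for s in states]
    return sum(alpha)


# Usage: two hypothetical left-to-right speaker models scored on one sequence.
obs = [0.1, -0.2, 4.9, 5.2]
initial = [1.0, 0.0]
transition = [[0.5, 0.5],
              [0.0, 1.0]]
score_a = forward_likelihood(obs, initial, transition, [0.0, 5.0], [1.0, 1.0])
score_b = forward_likelihood(obs, initial, transition, [5.0, 0.0], [1.0, 1.0])
best = "A" if score_a > score_b else "B"
```

The speaker whose model gives the higher likelihood is chosen; here model A, whose state means follow the observations, wins.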

FUTURE RESEARCH • To extract and apply all levels of information from the speech signal that convey speaker identity • Acoustic: use spectral features conveying vocal tract information • Prosodic: use features derived from pitch and energy tracks to classify information • Phonetic: use phone sequences to characterize speaker-specific pronunciations • Idiolect: use words to characterize user-specific word patterns • Linguistic: use linguistic patterns to characterize speaker-specific conversation style

APPLICATIONS • Access Control: physical facilities, computer networks and websites • PC Login and Password Reset • Secured Transactions: remote banking and online credit card purchase authentication • Time and Attendance: workplaces • Law Enforcement: forensics, parole
