1 / 79

Speaker Verification

Alexandros Xafopoulos. Speaker Verification. Presentation Outline. Framework Preprocessing - Features (Extraction, Noise Compensation-Channel Equalization, Selection) (Pattern) Matching-Modeling Decision Making - Performance Evaluation Experimental Results References. Introduction.

sidney
Télécharger la présentation

Speaker Verification

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Alexandros Xafopoulos Speaker Verification Digital Days

  2. Presentation Outline • Framework • Preprocessing - • Features (Extraction, Noise Compensation-Channel Equalization, Selection) • (Pattern) Matching-Modeling • Decision Making - • Performance Evaluation • Experimental Results • References Digital Days

  3. Introduction Framework • Motivation • Speech contains speaker specific characteristics (physiological-behavioral) • vocal tract • pitch range - vocal cords • articulator movement • Mouth • Nasal cavity • Lips • Voiceprint as a biometric (distinguishing trait) • Natural & economical way Digital Days

  4. Introduction(2) Framework • Objective • Discriminate betw. a given speaker & all others • Definitions • Verification < Latin verus (true) • Claim: Speaker identity • Proof: Speech utterance • Binary decision to establish the truth • Client: speaker registered on the system • Impostor: speaker who claims a false identity • Model: set of parameters that represents a speaker or a group of speakers Digital Days

  5. Related Research Areas Framework • Signal Processing Signal processing Analog Signal Processing Digital Signal Processing Speech Processing Other Signals Analysis Recognition Coding / Synthesis Enhancement Storage / Transmission Speech Recognition Speaker Recognition Language Identification Speaker Identification Speaker Detection / Tracking Speaker Verification Digital Days

  6. Related Research Areas(2) Framework • (Statistical) Pattern Recognition Class Label Data Feature Extractor Features Trained Classifier Digital Days

  7. Related Research Areas(3) Framework • Biometrics Technology • def: automatic identification of a person based on his/her physiological or behavioral characteristics (=biometrics) • desirable properties of biometrics [Jain_bk] • universality (found in every person) • uniqueness (different ”value” for each person) • permanence (invariant with time) • collectability (quantitatively measurable) • performance ( accuracy vs.  resources) • high acceptability (person’s willingness) • low circumvention (not easy to ”fool”) Digital Days

  8. Related Research Areas(4) Framework • Speech Science Communication by speech [Somervuo] Digital Days

  9. Related Research Areas(5) Framework Speech Production Physiology • Speech Science(2) [Picone] Block Diagram of Human Speech Production [Picone] Digital Days

  10. Related Research Areas(6) Framework [Morgan] (corrected) • Speech Science(3) General Discrete-Time Model for Speech Production Gain for voice source G(z) H(z) R(z) Gain for noise source Digital Days

  11. Generic Speaker Verification Process Framework • Enrollment (Training) module  Speaker ”A” N utterances Known Identity: ”Speaker is ”A”” Speech Pressure Wave of ”A” N Sets of Feature Vectors Digital Speech Feature Vectors Model Registration Digital Signal Acquisition Feature Creation Channel to transfer signal Speaker Model of ”A” Digital Days

  12. Generic Speaker Verification Process(2) Framework • Enrollment module(2) • Digital signal acquisition • Sampling frequency: Speech Pressure Wave Analog Voltage Signal Conditioned Analog Signal Antialiasing low-pass filter Sampling & Quantization (A/D converter) Microphone Digital Speech Digital Days

  13. Generic Speaker Verification Process(3) Framework • Enrollment module(3): Feature creation Digital Speech Preprocessing Noise Compensation & Channel Equalization Preprocessed Digital Speech Plain Feature Vectors Feature Extraction (Clean) Feature Vectors Feature Selection Digital Days

  14. Generic Speaker Verification Process(4) Framework • Verification module Claimed Identity B Threshold of ”B” |P| Speaker Models Speaker Model of ”B” Speech Pressure Wave of ”A” Model Selection Digital Speech Feature Vectors Matching Results Digital Signal Aquisition Pattern Matching Feature Creation Decision Making Acceptance (A=B) or Rejection (AB) Digital Days

  15. Generic Speaker Verification Process(5) Framework • Threshold setting module Speaker Models of Threshold Setting Speaker Model of ”A” Threshold of ”A” • Cohort model: competitive clients only • World model: all the clients Digital Days

  16. Corpus Parameters Framework • Text-dependency [Nedic] • Text dependent: verification done on a fixed phrase, predetermined by the recognizer (fixed phrase) • Text prompted: verification done on system generated sequence of predetermined words (fixed vocabulary) • User customized: verification done on user requested phrase • Text independent: verification done on any phrase • Language independent: verification done on any language • Vocabulary • Fixed or not • Size (|V|) Digital Days

  17. Corpus Parameters(2) Framework • Population (Speakers) • Size (|P|) • Similarity • Speech Flow • Discrete Utterance (pauses betw. words) • Continuous • Spontaneous (natural) • Quantity (#sessions, #phrases, phrase duration) • Quality of speech (Problems) Digital Days

  18. Problems under real conditions Framework • Microphone / Communication channel / Digitizer quality • Channel - Environmental mismatch (different channels - environments for enrollment & verification request) • Mimicry by humans & tape recorders • Bad pronunciation • Extreme emotional states (e.g. anger) • Sickness / Allergies / Tiredness / Thirst • Aging • Environmental noise / Poor room acoustics Digital Days

  19. Errors Framework • False Rejection • A client makes a request to be verified as himself/herself & the request is rejected • High rate client: goat [Koolwaaij] • Low rate client: sheep • False Acceptance • An impostor makes a request to be verified as a client & the request is accepted • High rate client: lamb • Low rate client: ram • High rate impostor: wolf • Low rate impostor: badger Digital Days

  20. Applications Framework • Access control to databases / facilities • Electronic commerce • Remote access to computer networks • Forensic • Telephone banking [James] Digital Days

  21. Preemphasis-Frame Blocking Preprocessing • Preemphasis: Low order digital system to • spectrally flatten the signal (in favour of vocal tract parameters) • make it less susceptible to later finite precision effects • usually (order=1): • Frame blocking (short-term(st) processing) • L successive overlapping (by M samples) frames • window size - length: N samples = N/ sec • frame rate-shift-period: M samples = M/ sec Digital Days

  22. Frame Windowing Preprocessing • Used to minimize the singal discontinuities at the beg. & end of each frame • Time (long window)<->freq. (short) resolution • Window type: • Corrections: [Picone] Digital Days

  23. Speech Activity Detection Preprocessing • Silence-speech detection • Voiced-unvoiced discrimination • Endpoint detection [Deller_bk] • Can be applied afterward Digital Days

  24. Signal Measures & Graphs Preprocessing Speech waveform Zerocrossing rate Time-frequency plot (Spectrogram) Energy plot [Weingessel] Digital Days

  25. Features - General Feature Extraction • Maps each speech interval-frame to a multidimensional feature space • Order : number of coefficients in each feature vector (dimensionality) • Several kinds of coefficients have been proposed Digital Days

  26. Linear Prediction (LP) Feature Extraction • Speech sample as a linear combination of previous samples (autoregressive mdl): • : LP coefficients (LPC) • : normalized excitation source • G : scale factor • : stLPC of frame l Digital Days

  27. Linear Prediction (LP)(2) Feature Extraction • Calculation of stLPC • Mean squared error minimization • Autocorrelation method • Levinson-Durbin (L-D) recursion • Covariance method • Cholesky (LU) decomposition L-D recursion (l is implied, R: autocorrelation matrix) [Picone2] Digital Days

  28. Linear Prediction (LP)(3) Feature Extraction • LPC • highly correlated • not orthonormal • Distance: Itakura-Saito • Computationally expensive • LPC processor [Rabiner_bk] Digital Days

  29. Cepstrum (Complex-Real) Feature Extraction • Special case of homomorphic signal proc. • Focuses on voiced segments • Short-term complex cepstrum (stCC): • Short-term real cepstrum (stRC): • Distance of cepstrum based coefficients • Euclidean: vectors defined in an orthonormal space Digital Days

  30. Mel Cepstrum Feature Extraction Mel-cepstral feature generation (frame l) • Mel • unit of measure of perceived frequency of a tone • non-linear correspondance to the physical freq. (like the human ear) • mel freq. cepstral coefficients (MFCC): • generalized case [Vergin] [Young] Digital Days

  31. LP derived Cepstrum Feature Extraction • LP Cepstral Coefficients (LPCC): Digital Days

  32. Other cepstral variants Feature Extraction • Linear Freq. Cepstral Coefficients (LFCC) • Like MFCC but: • filters are uniformally spaced on the Hz scale • Mel-warped LPCC (MLPCC) [Kuitert] • CC not directly derived from LPC • 1st compute the log magn. spectrum of LPC • then warp the freq. axis to correspond to the mel axis Digital Days

  33. Variants Feature Extraction • Discrete Wavelet Transform (DWT) instead of FFT [Krishnan] • Application of other type than triangular filters • Application of the logarithm before the triangular filters Digital Days

  34. Delta Cepstrum Feature Extraction • [Milner]: • Inclusion of temporal information Digital Days

  35. PLP - Auditory Features Feature Extraction • Perceptual Linear Prediction (PLP) [Hermansky] • Spectral scale: non-linear Bark scale • Spectral features smoothed within freq. bands • Auditory Features [Kumar] • Imitates signal proc. performed by the ear • cochlear modeling Digital Days

  36. Intra-frame cepstral proc. Noise Compensation-Channel Equalization [Mammone] • Liftering-weighting • low order coeffs: sensitive to overall spectral slope • high order: sensitive to noise • =>tapered window (bandpass liftering) • Adaptive Component Weighting (ACW) • motivation: all frames don't have same distortion Digital Days

  37. Inter-frame cepstral proc. Noise Compensation-Channel Equalization • Cepstral Mean Subtraction (CMS) • mean (over a num of frames) subtraction (tackles training-testing discrepancy) • lowpass filtering • eliminates communication channel spectral shaping • Pole Filtered CMS (PFCMS): cepstrum poles modification Digital Days

  38. RASTA proc. Noise Compensation-Channel Equalization • Relative Spectral Filtering (RASTA) [Hermansky] • bandpass filtering in the log-spectral domain • suppresses spectral components that change more slowly or quickly than in typical speech • RASTA-PLP • Microphone (type, position) robustness Digital Days

  39. Feature Selection Introduction Feature Selection • Goal • find a transformation to a relatively low-dimensional feature space that preserves the information pertinent to the application while enabling meaningful comparisons to be performed using measures of similarity • Processing of features • Principal Component Analysis (PCA) (or Karhunen Loève Expansion-KLE) • seeks a lower dimensional representation that accounts for variance of the features • not necessarily optimum for class discrimination • Linear Discriminant Analysis (LDA) [Jin] • Non LDA (NLDA) (using MLP) [Konig] Digital Days

  40. Matching-Modeling Introduction Matching-Modeling • Modeling: creation of (speaker) models • Model: Can be considered as the output of a proper proc. of a speaker’s set of feature vectors • Matching: computation of a match score betw. the input feature vectors & some speaker model • Methods [Wassner] • Template Matching • deterministic • score: distance betw. a test speaker (feature vectors of an) utterance & a reference speaker model • better score: min distance Digital Days

  41. Matching-Modeling Introduction(2) Matching-Modeling • Methods(2) • Stochastic Approach • probabilistic matching • score: prob. of generation of a speech utterance by the claimed speaker • better score: max probability • Parametric speaker model: specific pdf is assumed & its appropriate parameters (e.g. mean vector, covariance matrix) can be estimated using the Maximum Likelihood Estimation (MLE) e.g. multivariate normal model Digital Days

  42. Template Matching Methods Matching-Modeling • Dynamic Time Warping (DTW) • dynamic comparison betw. a test & a reference (model) matrix (set of feature vectors) • computes a distance betw. the test & ref. patterns • allows time alignment at different costs • uses Dynamic Programming (DP) Digital Days

  43. Template Matching Methods(2) Matching-Modeling • Dynamic Time Warping (DTW)(2) The DP grid with test (t) & reference (r) feature vectors at respective frame indices [Picone] Digital Days

  44. Template Matching Methods(3) Matching-Modeling • Dynamic Time Warping (DTW)(3) • distances-costs on the DP grid (i,j frame indices, k step index) • Node • e.g. • Transition e.g. • Both • e.g. • Global • K: number of transitions (Type 4) Digital Days

  45. Template Matching Methods(4) Matching-Modeling • Dynamic Time Warping (DTW)(4) • DTW search constraints • Endpoint Constraints (bottom left(S) - top right(E) corners) • endpoint relaxation: max points allowed in each direction • Monotonicity (going up & right) • Global Path Constraints (global movement area) • permissible slope or • permissible window Digital Days

  46. Template Matching Methods(5) Matching-Modeling • Dynamic Time Warping (DTW)(5) • DTW search constraints(2) • Local Path Constraints (local movement area) Sakoe & Shiba local constraints on DTW path search [Picone] Digital Days

  47. Template Matching Methods(6) Matching-Modeling • Dynamic Time Warping (DTW)(6) • The minimum cost final endpoint provides the distance betw. a test & a reference phrase • Training-Modeling [Deller_bk] • Casual: Unaltered feature strings form models • Averaging feature strings of utterances • The stochastic techniques possess superior training methods Digital Days

  48. Template Matching Methods(7) Matching-Modeling • Vector Quantization (VQ) • Uses intra-vector dependencies to break-up a vector space in cells (unsupervised) • follows Linde-Buzo-Gray (LBG) algorithm • speaker model: codebook • codebook: set of prototype vectors used to represent vector spaces • goal: data structure "discovery" by finding how the data is clustered Digital Days

  49. Template Matching Methods(8) Matching-Modeling • Learning Vector Quantization (LVQ) • Predefined classes, labeled data • defines the class borders according to the nearest neighbor rule • supervised version of VQ • set of variants (e.g. LVQ1,2,3) • goal: to determine a set of prototypes that best represent each class. Digital Days

  50. Statistical Measures Matching-Modeling • Second Order Statistical Measures (SOSM) [Bimbot] • E.g. Arithmetic-Harmonic-Sphericity (AHS) • speaker model: covariance matrix of feature vectors • Distance=min(=0) iff all eigenvalues of test & ref covar matrices are equal Digital Days

More Related