
Ala’a Spaih, Abeer Abu-Hantash. Directed by Dr. Allam Mousa.


Presentation Transcript


  1. Text-Independent Speaker Identification System. Ala’a Spaih, Abeer Abu-Hantash. Directed by Dr. Allam Mousa.

  2. Outline for Today: (1) Speaker Recognition Field; (2) System Overview; (3) MFCC & VQ; (4) Experimental Results; (5) Live Demo.

  3. Speaker Recognition Field. Speaker recognition is divided into speaker verification and speaker identification; each of these can be text dependent or text independent.

  4. System Overview. Training mode: speech input → feature extraction → speaker modeling → speaker model database. Testing mode: speech input → feature extraction → feature matching against the speaker model database → decision logic → speaker ID.

  5. Feature Extraction • Feature extraction is a special form of dimensionality reduction. • The aim is to extract the formants.

  6. Feature Extraction • The extracted features must have specific characteristics: • They are easily measurable and occur naturally and frequently in speech. • They do not change over time. • They vary as much as possible among speakers while remaining consistent for each speaker. • They are not affected by the speaker's health or by background noise. • Many algorithms exist to extract them: LPC, LPCC, HFCC, MFCC. • We used the Mel Frequency Cepstral Coefficients (MFCC) algorithm.

  7. Feature Extraction Using MFCC. Input speech → framing and windowing → fast Fourier transform → absolute value → mel scaled-filter bank → log → discrete cosine transform → feature vectors.

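  8. Framing and Windowing. Each windowed frame is passed through the FFT to obtain its spectrum, which combines the glottal pulse (excitation) with the vocal-tract response.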
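A minimal sketch of the framing, windowing and spectrum steps, using only the Python standard library. The Hamming window, the function names and the frame/hop parameters are assumptions for illustration, not the authors' code; a naive O(N²) DFT stands in for a real FFT so the example has no dependencies. For 20 ms frames at an assumed 16 kHz sampling rate, frame_len would be 320 samples.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window coefficients: 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def magnitude_spectrum(frame):
    """|DFT| of a real frame for bins 0..N/2 (naive DFT standing in for an FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def windowed_spectra(signal, frame_len, hop):
    """Frame the signal, apply a Hamming window, return one magnitude spectrum per frame."""
    w = hamming(frame_len)
    return [magnitude_spectrum([x * wk for x, wk in zip(frame, w)])
            for frame in frame_signal(signal, frame_len, hop)]
```

Windowing tapers each frame toward zero at its edges, which reduces the spectral leakage that abrupt frame boundaries would otherwise cause.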

  9. Mel Scaled-Filter Bank. The spectrum is passed through a mel-scaled filter bank to give the mel spectrum, where mel(f) = 2595 · log10(1 + f/700) and f is in Hz.
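The mel formula on this slide, plus a triangular filter bank built on it, can be sketched as follows. The triangular filter shape, the bin-mapping rule, and the 512-point FFT at an assumed 16 kHz sampling rate are illustrative assumptions; the 29 filters match the count reported in the results slide, but this is not claimed to be the authors' exact filter design.

```python
import math

def hz_to_mel(f):
    """mel(f) = 2595 * log10(1 + f / 700), as on the slide."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale (a common design, assumed here).

    Returns n_filters rows, each with n_fft // 2 + 1 weights (one per FFT bin).
    """
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: filter j spans points j-1, j, j+1.
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):    # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling edge, peak 1.0 at the centre bin
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank

def apply_filter_bank(spectrum, bank):
    """Mel spectrum: weighted sum of the spectrum bins under each filter."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in bank]

bank = mel_filter_bank(29, 512, 16000)  # 29 filters, assumed 512-point FFT at 16 kHz
```

Because the filters are equally spaced in mel rather than in Hz, they are narrow at low frequencies and progressively wider at high frequencies.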

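  10. Cepstrum. The MFCC coefficients are obtained from the mel spectrum. By taking the DCT of the logarithm of the magnitude spectrum, the glottal pulse and the vocal-tract impulse response can be separated, because the logarithm turns their product into a sum.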
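The log-and-DCT step can be sketched as below, assuming an unnormalized DCT-II and a small floor to avoid log(0); the function name and the floor value are hypothetical. Whether the authors keep or discard the 0th coefficient among their 12 MFCCs is not stated, so this sketch simply keeps the first n_coeffs.

```python
import math

def mfcc_from_mel(mel_energies, n_coeffs, floor=1e-10):
    """Cepstral coefficients: unnormalized DCT-II of the log mel energies.

    Keeps the first n_coeffs coefficients; `floor` avoids taking log(0).
    """
    logs = [math.log(max(e, floor)) for e in mel_energies]
    m = len(logs)
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
            for i in range(n_coeffs)]
```

For example, a perfectly flat mel spectrum yields a nonzero 0th coefficient and zeros everywhere else, since the higher DCT basis vectors are orthogonal to a constant.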

  11. Classification • Classification means building a unique model for each speaker in the database. • There are two major types of models for classification: stochastic models (GMM, HMM, ANN) and template models (VQ, DTW). • We used the VQ algorithm.

  12. VQ Algorithm • The VQ technique consists of extracting a small number of representative feature vectors. • The first step is to build a speaker database consisting of N codebooks, one for each speaker in the database. • Each speaker's feature vectors are clustered into codewords, which together form that speaker's model (codebook). This is done by the K-means clustering algorithm.

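  13. K-means Clustering. Start → choose the number of clusters k → set the centroids → compute the distance of each object to the centroids → group the objects based on minimum distance → if no object changes group, end; otherwise recompute the centroids and repeat.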
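The flowchart above corresponds to plain k-means, sketched here in standard-library Python. The function names, the max_iter cap, and the choice to leave an empty cluster's centroid unchanged are assumptions for illustration; the initial centroids are supplied by the caller, as in the VQ example that follows.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, max_iter=100):
    """Plain k-means starting from the given initial centroids.

    Repeats: assign each point to its nearest centroid, then recompute each
    centroid as the mean of its group, until no assignment changes.
    Returns (centroids, assignments).
    """
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(len(centroids)),
                              key=lambda j: sq_dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # no object changed group: converged
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep an empty cluster's centroid where it was
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

Applied to one speaker's MFCC vectors with k set to the codebook size, the returned centroids would serve as that speaker's codebook.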

  14. VQ Example • Given a set of data points, split them into 4 codebook vectors with initial values at (2,2), (4,6), (6,5), and (8,8).

  15. VQ Example • Once there’s no more change, the feature space will be partitioned into 4 regions. Any input feature can be classified as belonging to one of the 4 regions. The entire codebook can be specified by the 4 centroid points.
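Classifying an input feature into one of the 4 regions amounts to finding its nearest codeword. The data points and the converged centroids are not given in the transcript, so this sketch uses the slide's initial values as an illustrative codebook only; the function name is hypothetical.

```python
def nearest_codeword(x, codebook):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])))

# Illustrative codebook only: the slide's *initial* values, not the converged centroids.
codebook = [(2, 2), (4, 6), (6, 5), (8, 8)]
print(nearest_codeword((3, 3), codebook))  # 0
print(nearest_codeword((7, 7), codebook))  # 3
```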

  16. K-means Clustering • If we set the codebook size to 8, the clustering (VQ) maps a speaker's MFCCs (1000 × 12, i.e. 1000 feature vectors of 12 coefficients each) to a speaker codebook of size 8 × 12.

  17. Feature Matching • For each codebook, a distortion measure is computed. • The speaker with the lowest distortion is chosen. • The distortion measure is defined using the Euclidean distance.
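A minimal sketch of this matching step, assuming the distortion for a codebook is the average Euclidean distance from each test feature vector to its nearest codeword (the slide only names the Euclidean distance, so the averaging is an assumption). The function names and the speaker labels in the usage line are hypothetical.

```python
import math

def vq_distortion(features, codebook):
    """Average Euclidean distance from each feature vector to its nearest codeword."""
    total = 0.0
    for x in features:
        total += min(math.dist(x, c) for c in codebook)
    return total / len(features)

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest distortion, plus all scores."""
    scores = {speaker: vq_distortion(features, cb) for speaker, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical two-speaker database of 2-D codebooks.
speaker, scores = identify([(0, 0), (1, 1)], {"spk_a": [(0, 0), (1, 1)], "spk_b": [(10, 10)]})
```

Averaging rather than summing keeps the scores comparable across test utterances of different lengths.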

  18. The system operates in two modes: offline and online. Processing chain: monitoring microphone inputs → MFCC feature extraction → calculate VQ distortion → make decision & display.

  19. Applications • Speaker recognition for authentication, e.g. banking applications. • Forensic speaker recognition: proving the identity of a recorded voice can help convict a criminal or exonerate an innocent person in court. • Speaker recognition for surveillance: electronic eavesdropping on telephone and radio conversations.

  20. Results • 12 MFCCs, 29 filter banks, codebook size 64 … ELSDSR database. • This shows how the system identifies the speaker according to the Euclidean distance calculation.

  21. Results • Number of MFCCs vs. ID rate. • Frame size vs. ID rate: frame sizes of 10 to 30 ms give good results; above 30 ms the results are poor.

  22. Results • The effect of the codebook size on the ID rate & VQ distortion.

  23. Results • Number of filter banks vs. ID rate & VQ distortion.

  24. Results • The performance of the system on different test shot lengths.

  25. Summary • We studied the effect of changing some parameters of the MFCC and VQ algorithms. • Our system identifies the speaker regardless of the language and the text. • Results are satisfactory when the training and testing environments are the same and the test data is several tens of seconds long.

  26. Thank You
