Speaker Recognition

Searching & Decoding in SR, ECE5527, Wilson Burgos.

Presentation Transcript


  1. Speaker Recognition • Searching & Decoding in SR • ECE5527 • Wilson Burgos

  2. Outline • Introduction • Objective • Implementation • Simulation Test and Result • Conclusion

  3. Introduction • Speaker recognition aims to recognize speakers from their voices. • It is divided into identification and verification. • Identification determines which registered speaker provided the input. • Verification determines whether the speaker is the person they claim to be. • Speaker recognition can be text-dependent or text-independent.

  4. Introduction • In text-dependent recognition, speakers say the same key words for training and recognition. • In text-independent recognition, the system identifies the speaker regardless of what is said. • The goal of the study is a real-time, text-dependent identification system that compares signals from unknown speakers against a database of known speakers.

  5. Objective • Real-time, text-dependent tool for speaker identification using Sphinx4 • The tool has two modes: • Training • Detection (recognition) • During training mode, feature models are created from the users' voices • The detection phase uses those models to identify the speaker

  6. Concept of Operation • The system uses Mel Frequency Cepstral Coefficients (MFCC) as features and Vector Quantization (VQ) to model each speaker. • The K-means clustering algorithm is used to build the VQ codebooks.

  7. Concept of Operation • Feature Extraction using MFCC
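One step of MFCC extraction is a bank of filters spaced evenly on the mel scale. As a minimal sketch, assuming the common conversion mel = 2595 * log10(1 + f / 700), the filter center frequencies could be computed as below. The class name `MelScaleSketch`, the method names, and the 0 to 8000 Hz range are illustrative assumptions, not values taken from the presentation or from Sphinx4.

```java
final class MelScaleSketch {

    /** Converts a frequency in Hz to mels (common 2595 * log10(1 + f / 700) form). */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Inverse of hzToMel. */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Center frequencies (Hz) of numFilters filters equally spaced on the mel scale. */
    static double[] filterCenters(int numFilters, double lowHz, double highHz) {
        double lowMel = hzToMel(lowHz);
        double highMel = hzToMel(highHz);
        double step = (highMel - lowMel) / (numFilters + 1);
        double[] centers = new double[numFilters];
        for (int i = 0; i < numFilters; i++) {
            centers[i] = melToHz(lowMel + (i + 1) * step);
        }
        return centers;
    }

    public static void main(String[] args) {
        // Hypothetical setup: 40 filters over 0..8000 Hz (Nyquist for 16 kHz audio).
        double[] centers = filterCenters(40, 0.0, 8000.0);
        System.out.printf("first=%.1f Hz, last=%.1f Hz%n", centers[0], centers[39]);
    }
}
```

The centers are dense at low frequencies and sparse at high ones, which is the property the mel scale is meant to capture.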

  8. Concept of Operation • Feature Matching • Training enrolls the speaker, creating a unique model based on its features • Testing computes a matching score against each enrolled model and selects the speaker with the minimum score.

  9. Concept of Operation • Vector Quantization • A large number of feature vectors is reduced to a small set while maintaining their characteristics • A codebook is generated for each speaker • K-means partitions the feature vectors into k clusters, each represented by its centroid.

  10. Concept of Operation • Demonstration of the standard algorithm • 1) k initial "means" (in this case k = 3) are randomly selected from the data set (shown in color).

  11. Concept of Operation 2) k clusters are created by associating every observation with the nearest mean.

  12. Concept of Operation 3) The centroid of each of the k clusters becomes the new mean.

  13. Concept of Operation 4) Steps 2 and 3 are repeated until convergence has been reached.
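The four steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the presentation's actual code: the class name `KMeansSketch`, the `cluster` signature, the fixed random seed, and the choice to leave an empty cluster's mean unchanged are all hypothetical. Feature vectors are passed as `float[][]`, and the returned centroids play the role of one speaker's codebook.

```java
import java.util.Arrays;
import java.util.Random;

final class KMeansSketch {

    /** Returns k centroids (a codebook) for the given feature vectors. */
    static double[][] cluster(float[][] data, int k, int maxIter, long seed) {
        if (k < 1 || k > data.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of vectors");
        }
        int dim = data[0].length;
        Random rng = new Random(seed);

        // Step 1: pick k distinct observations at random as the initial means.
        int[] order = new int[data.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        double[][] means = new double[k][dim];
        for (int c = 0; c < k; c++) {
            int j = c + rng.nextInt(order.length - c); // partial Fisher-Yates shuffle
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            for (int d = 0; d < dim; d++) means[c][d] = data[order[c]][d];
        }

        int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            // Step 2: associate every observation with its nearest mean.
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {
                int best = nearest(means, data[i]);
                if (best != assignment[i]) { assignment[i] = best; changed = true; }
            }
            // Step 4: stop once no assignment changes (convergence).
            if (!changed) break;

            // Step 3: the centroid of each cluster becomes its new mean.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += data[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // assumption: an empty cluster keeps its old mean
                for (int d = 0; d < dim; d++) means[c][d] = sums[c][d] / counts[c];
            }
        }
        return means;
    }

    /** Index of the mean closest to x (squared Euclidean distance). */
    static int nearest(double[][] means, float[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < means.length; c++) {
            double dist = 0;
            for (int d = 0; d < x.length; d++) {
                double diff = means[c][d] - x[d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] pts = {{0f, 0f}, {0f, 1f}, {1f, 0f}, {10f, 10f}, {10f, 11f}, {11f, 10f}};
        System.out.println(Arrays.deepToString(cluster(pts, 2, 100, 7L)));
    }
}
```

Because the initial means are random, different seeds can give different codebooks on real data; a fixed seed is used here only to make runs repeatable.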

  14. Concept of Operation • In the detection phase, the unknown speaker's feature vectors are compared to all the codebook vectors in the database. • The speaker with the lowest score is chosen. • The score is defined as the average of the Euclidean distances.
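The detection step could look roughly like the sketch below. It assumes the usual VQ reading of the score: for each feature vector, take the Euclidean distance to its nearest codebook vector, then average over all feature vectors. The slide does not spell this out, so treat the nearest-vector choice as an assumption. The names `VqScoringSketch`, `averageDistortion`, and `identify`, and the use of speaker names as map keys, are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VqScoringSketch {

    /** Average Euclidean distance from each feature vector to its nearest codebook vector. */
    static double averageDistortion(float[][] features, double[][] codebook) {
        double total = 0;
        for (float[] f : features) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : codebook) {
                double sum = 0;
                for (int d = 0; d < f.length; d++) {
                    double diff = c[d] - f[d];
                    sum += diff * diff;
                }
                best = Math.min(best, Math.sqrt(sum));
            }
            total += best;
        }
        return total / features.length;
    }

    /** Returns the key of the codebook with the lowest score for the given features. */
    static String identify(float[][] features, Map<String, double[][]> codebooks) {
        String bestSpeaker = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[][]> e : codebooks.entrySet()) {
            double score = averageDistortion(features, e.getValue());
            if (score < bestScore) { bestScore = score; bestSpeaker = e.getKey(); }
        }
        return bestSpeaker;
    }

    public static void main(String[] args) {
        // Toy two-dimensional codebooks; real MFCC vectors have many more dimensions.
        Map<String, double[][]> db = new LinkedHashMap<>();
        db.put("alice", new double[][] {{0, 0}, {1, 1}});
        db.put("bob", new double[][] {{10, 10}, {12, 12}});
        float[][] unknown = {{0f, 1f}, {1f, 0f}};
        System.out.println(identify(unknown, db));
    }
}
```

A pure minimum-score rule always returns some enrolled speaker, which matches identification as described here; rejecting unknown voices would need an extra threshold, which the presentation does not discuss.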

  15. Concept of Operation • Sphinx4 DataProcessor • Sphinx4 uses front ends to perform the signal processing • The base class implements a DataProcessor to get the MFCC coefficients from the processing chain

  16. Concept of Operation • The configuration file was updated to reflect this: <item>dct</item> <item>liveCMN</item> <item>featureExtraction</item> <item>featureStore</item>

  17. Implementation • The MFCC data gets stored in a Vector<float[]> • The array of all the MFCC coefficients is used as the input to the K-means algorithm to generate the clusters. • The generated codebook is stored in a Hashtable<Integer,Speaker> that gets serialized to a file for later retrieval
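Storing and reloading the speaker table can be sketched with standard Java object serialization. The presentation names a `Hashtable<Integer,Speaker>` but does not show the `Speaker` class, so the version below (a name plus a codebook) is a hypothetical stand-in, as are `CodebookStoreSketch`, `save`, and `load`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Hashtable;

final class CodebookStoreSketch {

    /** Hypothetical speaker record: a display name and the K-means codebook. */
    static final class Speaker implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double[][] codebook;

        Speaker(String name, double[][] codebook) {
            this.name = name;
            this.codebook = codebook;
        }
    }

    /** Writes the whole speaker table to a file. */
    static void save(Hashtable<Integer, Speaker> db, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(db);
        }
    }

    /** Reads a speaker table previously written by save. */
    @SuppressWarnings("unchecked")
    static Hashtable<Integer, Speaker> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (Hashtable<Integer, Speaker>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Hashtable<Integer, Speaker> db = new Hashtable<>();
        db.put(1, new Speaker("speaker-1", new double[][] {{0.5, 1.5}, {2.0, 3.0}}));
        File file = File.createTempFile("speakers", ".ser");
        save(db, file);
        System.out.println(load(file).get(1).name);
        file.delete();
    }
}
```

Java serialization ties the file to the class definition, so changing the `Speaker` fields later can make old files unreadable; only files written by the tool itself should be loaded this way.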

  18. Concept of Operation • [Block diagram; labels: Sphinx4, featureStore, KMeans, Codebook, Disk, Speaker Clusters]

  19. Simulation Test and Result • Results use wave files of recorded speech from 10 speakers • The MFCC features are the 38 coefficients produced by Sphinx4 • The number of filter banks is set automatically by Sphinx4 (40 for 16 kHz audio) • The codebook size was set to 62

  20. Simulation Test and Result • Average Euclidean distance of speakers (K = 8, cb = 62)

  21. Simulation Test and Result • Number of Clusters vs. Identification Rate

  22. References • http://cmusphinx.sourceforge.net/sphinx4/ • International Journal an EE Independent Speaker Identification, Vol. 3, 2011
