Speaker Recognition

Speaker Recognition Searching & Decoding in SR ECE5527 Wilson Burgos

Outline • Introduction • Objective • Implementation • Simulation Test and Result • Conclusion

Introduction • Speaker recognition aims to recognize speakers from their voices. • It is divided into identification and verification. • Identification determines which registered speaker provided the input. • Verification determines whether the speaker is the person they claim to be. • Speaker recognition can be text-dependent or text-independent.

Introduction • Text-dependent: speakers say the same key words for training and recognition. • Text-independent: the system identifies the speaker regardless of what is said. • The goal of the study is a real-time, text-dependent identification system that compares signals from unknown speakers against a database of known speakers.

Objective • Real-time, text-dependent tool for speaker identification using Sphinx4 • The tool will have two modes • Training • Detection (recognition) • During training mode, feature models are created from user voices • The detection phase uses those models to identify the speaker

Concept of Operation • The system uses Mel Frequency Cepstral Coefficients (MFCC) as features and Vector Quantization (VQ) to model each speaker. • The K-means clustering algorithm was used.

Concept of Operation • Feature Extraction using MFCC
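MFCC extraction places its triangular filters evenly on the mel scale rather than in Hz. The sketch below shows only that warping step, using the common formula mel = 2595 log10(1 + f/700). The class name, the method names, and the 130 Hz to 6800 Hz range in `main` are illustrative assumptions, and the spacing shown here is not necessarily how Sphinx4 lays out its own filter bank.

```java
// Mel-scale helper sketch. All names and example values are hypothetical.
class MelScaleSketch {

    // Convert a frequency in Hz to mels.
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    // Inverse of hzToMel: convert mels back to Hz.
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    // Center frequencies (Hz) of numFilters filters spaced evenly in mels
    // strictly between minHz and maxHz.
    static double[] filterCenters(double minHz, double maxHz, int numFilters) {
        double minMel = hzToMel(minHz);
        double maxMel = hzToMel(maxHz);
        double step = (maxMel - minMel) / (numFilters + 1);
        double[] centers = new double[numFilters];
        for (int i = 0; i < numFilters; i++) {
            centers[i] = melToHz(minMel + (i + 1) * step);
        }
        return centers;
    }

    public static void main(String[] args) {
        // Assumed example range for 16 kHz audio with 40 filters.
        double[] centers = filterCenters(130.0, 6800.0, 40);
        for (double c : centers) {
            System.out.printf("%.1f Hz%n", c);
        }
    }
}
```

The centers come out closely spaced at low frequencies and widely spaced at high frequencies, which is the property that makes the mel scale useful for speech features.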

Concept of Operation • Feature Matching • Training enrolls the speaker, creating a unique model based on its features • Testing computes a score against each enrolled model and matches the input to the speaker with the minimum score.

Concept of Operation • Vector Quantization • A large number of feature vectors is reduced to a small set while maintaining their characteristics • A codebook is generated for each speaker • K-means partitions the feature vectors into k clusters, each represented by its centroid.

Concept of Operation • Demonstration of the standard algorithm • 1) k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).

Concept of Operation 2) k clusters are created by associating every observation with the nearest mean.

Concept of Operation 3) The centroid of each of the k clusters becomes the new mean.

Concept of Operation 4) Steps 2 and 3 are repeated until convergence has been reached.
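The four steps above can be sketched in Java as follows. This is a minimal illustration, not the code used in the study: the class name, the method names, the fixed random seed, and the choice to keep an empty cluster's previous centroid are all assumptions made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Minimal K-means sketch for building a codebook from feature vectors.
// Names are hypothetical and not taken from the original tool.
class KMeansSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to v (first one wins on ties).
    static int nearest(float[] v, float[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(v, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Returns k centroids (the codebook) for the given data.
    static float[][] cluster(List<float[]> data, int k, int maxIterations, long seed) {
        if (k < 1 || k > data.size()) {
            throw new IllegalArgumentException("k must be between 1 and data.size()");
        }
        int dim = data.get(0).length;

        // Step 1: pick k distinct observations at random as the initial means.
        List<float[]> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        float[][] centroids = new float[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = shuffled.get(c).clone();
        }

        int[] assignment = new int[data.size()];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Step 2: associate every observation with its nearest mean.
            boolean changed = false;
            for (int i = 0; i < data.size(); i++) {
                int c = nearest(data.get(i), centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            // Step 4: stop once no observation changes cluster.
            if (!changed) {
                break;
            }
            // Step 3: the centroid of each cluster becomes its new mean.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < data.size(); i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += data.get(i)[j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // assumption: an empty cluster keeps its old centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = (float) (sums[c][j] / counts[c]);
                }
            }
        }
        return centroids;
    }
}
```

Because the starting means are random, K-means can settle in a local optimum, so different seeds may give different codebooks for the same data.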

Concept of Operation • In the detection phase, the unknown speaker's feature vectors are compared to all the codebook vectors in the database. • The speaker with the lowest score is chosen. • The score is defined as the average of the Euclidean distances between the feature vectors and the codebook vectors.
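A minimal Java sketch of this scoring rule follows. It assumes that each feature vector is measured against its nearest codeword in a speaker's codebook, which is the usual VQ distortion measure but is not spelled out in the slides. The class name, the method names, and the use of a `Map` from speaker ID to codebook are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Sketch of minimum-average-distance speaker identification.
// Names are hypothetical and not taken from the original tool.
class SpeakerScoreSketch {

    // Euclidean distance between two vectors of equal length.
    static double distance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Average distance from each feature vector to its nearest codeword.
    // Assumption: "nearest codeword" is the pairing used for the average.
    static double score(List<float[]> features, float[][] codebook) {
        if (features.isEmpty() || codebook.length == 0) {
            throw new IllegalArgumentException("features and codebook must be non-empty");
        }
        double total = 0.0;
        for (float[] f : features) {
            double best = Double.MAX_VALUE;
            for (float[] codeword : codebook) {
                best = Math.min(best, distance(f, codeword));
            }
            total += best;
        }
        return total / features.size();
    }

    // Returns the ID of the speaker with the lowest score, or null if
    // no codebooks are enrolled.
    static Integer identify(List<float[]> features, Map<Integer, float[][]> codebooks) {
        Integer bestSpeaker = null;
        double bestScore = Double.MAX_VALUE;
        for (Map.Entry<Integer, float[][]> entry : codebooks.entrySet()) {
            double s = score(features, entry.getValue());
            if (s < bestScore) {
                bestScore = s;
                bestSpeaker = entry.getKey();
            }
        }
        return bestSpeaker;
    }
}
```

An identification system of this kind always returns some enrolled speaker; rejecting unknown voices would need an additional threshold on the score, which the slides do not describe.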

Concept of Operation • Sphinx4 DataProcessor • Sphinx uses front ends to perform the signal processing • The base class implements a DataProcessor to get the MFCC coefficients from the chain

Concept of Operation • The configuration file was updated to reflect this: <item>dct</item> <item>liveCMN</item> <item>featureExtraction</item> <item>featureStore</item>

Implementation • The MFCC data is stored in a Vector<float[]> • The array of all the MFCC coefficients is used as the input to the K-means algorithm to generate the clusters. • The generated codebook is stored in a Hashtable<Integer,Speaker> that is serialized to a file for later retrieval
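Saving and reloading such a table can be sketched with standard Java serialization. The `Speaker` class below is a hypothetical stand-in with assumed fields (a name and a codebook); the slides do not show the real class, and the method names are illustrative.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Hashtable;

// Sketch of persisting speaker codebooks with Java serialization.
class CodebookStoreSketch {

    // Hypothetical stand-in for the Speaker class named in the slides.
    static class Speaker implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final float[][] codebook;

        Speaker(String name, float[][] codebook) {
            this.name = name;
            this.codebook = codebook;
        }
    }

    // Write the whole speaker table to a file.
    static void save(Hashtable<Integer, Speaker> speakers, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(speakers);
        }
    }

    // Read the speaker table back from a file written by save().
    @SuppressWarnings("unchecked")
    static Hashtable<Integer, Speaker> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (Hashtable<Integer, Speaker>) in.readObject();
        }
    }
}
```

Java serialization ties the file to the class layout, so changing the fields of `Speaker` later can make previously saved files unreadable.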

Concept of Operation • [Diagram with the labels: Sphinx4, featureStore, KMeans, Codebook, Disk, Speaker Clusters]

Simulation Test and Result • Results using wave files of recorded speech from 10 speakers • The MFCC features used are 38 coefficients from Sphinx4 • The number of filter banks is set automatically by Sphinx (40 for 16 kHz) • The codebook size was set to 62

Simulation Test and Result • Average Euclidean distance of speakers (K=8, cb=62)

Simulation Test and Result • Number of Clusters vs. Identification Rate

References • http://cmusphinx.sourceforge.net/sphinx4/ • International Journal an EE Independent Speaker Identification, Vol 3 2011
