
Text Independent Speaker Identification Using Gaussian Mixture Model


Presentation Transcript


  1. Text Independent Speaker Identification Using Gaussian Mixture Model Chee-Ming Ting, Sh-Hussain Salleh, Tian-Swee Tan, A. K. Ariff. International Conference on Intelligent and Advanced Systems 2007. Jain-De Lee

  2. OUTLINE • INTRODUCTION • GMM SPEAKER IDENTIFICATION SYSTEM • EXPERIMENTAL EVALUATION • CONCLUSION

  3. INTRODUCTION • Speaker recognition is generally divided into two tasks • Speaker Verification (SV) • Speaker Identification (SI) • Speaker model • Text-dependent (TD) • Text-independent (TI)

  4. INTRODUCTION • Many approaches have been proposed for TI speaker recognition • VQ-based method • Hidden Markov Models • Gaussian Mixture Model • The VQ-based method represents each speaker by a codebook of spectral feature vectors obtained by clustering that speaker's training data

  5. INTRODUCTION • Hidden Markov Models • State probability • Transition probability • Acoustic events, corresponding to HMM states, are classified to characterize each speaker in the TI task • TI performance is unaffected by discarding the transition probabilities in the HMM models

  6. INTRODUCTION • Gaussian Mixture Model • Corresponds to a single-state continuous ergodic HMM, i.e., an HMM with the transition probabilities discarded • The use of GMM for speaker identity modeling is motivated by • The Gaussian components representing general speaker-dependent spectral shapes • The capability of Gaussian mixtures to model arbitrary densities

  7. GMM SPEAKER IDENTIFICATION SYSTEM • The GMM speaker identification system consists of the following elements • Speech processing • Gaussian mixture model • Parameter estimation • Identification

  8. Speech Processing • Mel-scale frequency cepstral coefficient (MFCC) extraction is used in front-end processing (sketched below) • Pipeline: Input Speech Signal → Pre-Emphasis → Framing → Hamming Window → FFT → Triangular Mel-scale Band-Pass Filters → Logarithm → DCT → MFCC features
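A minimal sketch of this front end, assuming the librosa library; the frame length, hop size, and number of coefficients are illustrative choices, not parameters given on the slide:

```python
# Sketch of the MFCC front end above (assumed parameters, not the authors' exact setup).
import librosa

def extract_mfcc(wav_path, n_mfcc=16, frame_ms=25, hop_ms=10):
    y, sr = librosa.load(wav_path, sr=None)       # input speech signal
    y = librosa.effects.preemphasis(y)            # pre-emphasis
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(sr * frame_ms / 1000),          # framing + FFT size
        hop_length=int(sr * hop_ms / 1000),
        window="hamming",                         # Hamming window
    )                                             # Mel filter bank, log, and DCT happen inside
    return mfcc.T                                 # shape: (frames, n_mfcc)
```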

  9. Gaussian mixture model • The Gaussian mixture density is a weighted linear combination of M unimodal Gaussian component densities: $p(\mathbf{x}|\lambda) = \sum_{i=1}^{M} w_i\, b_i(\mathbf{x})$, where $\mathbf{x}$ is a D-dimensional feature vector, $b_i(\mathbf{x}),\ i = 1, \dots, M$ are the component densities, and $w_i,\ i = 1, \dots, M$ are the mixture weights • The mixture weights satisfy the constraint $\sum_{i=1}^{M} w_i = 1$

  10. Gaussian mixture model • Each component density is a D-variate Gaussian function of the form $b_i(\mathbf{x}) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)' \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right\}$, where $\boldsymbol{\mu}_i$ is the mean vector and $\Sigma_i$ is the covariance matrix • The Gaussian mixture density model is denoted as $\lambda = \{ w_i, \boldsymbol{\mu}_i, \Sigma_i \},\ i = 1, \dots, M$
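A small sketch of evaluating this mixture density with scipy, assuming full covariance matrices:

```python
# Sketch: evaluate the GMM density p(x|λ) = Σ_i w_i b_i(x) from the slides above.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covariances):
    """x: (D,) vector; weights: (M,); means: (M, D); covariances: (M, D, D)."""
    return sum(
        w * multivariate_normal.pdf(x, mean=mu, cov=cov)   # w_i * b_i(x)
        for w, mu, cov in zip(weights, means, covariances)
    )
```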

  11. Parameter estimation • Conventional GMM training process: • Input training vectors • Initialize the model with the LBG algorithm • Run the EM algorithm • If not converged (N), repeat EM; if converged (Y), end

  12. LBG Algorithm (sketched below) • Input training vectors; initialize the codebook as the overall average (a single centroid) • While the number of centroids m < M: • Split each centroid into two; set D′ = D • Clustering: assign each vector to its nearest centroid, then recompute each cluster's average • Calculate the distortion D; repeat the clustering until (D − D′)/D < δ
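A NumPy sketch of this loop; the split perturbation eps and distortion threshold delta are assumed values not given on the slide:

```python
# Sketch of the LBG codebook-splitting flow above (eps and delta are assumptions).
import numpy as np

def lbg(vectors, M, eps=0.01, delta=1e-3):
    centroids = vectors.mean(axis=0, keepdims=True)    # overall average
    while centroids.shape[0] < M:                      # m < M: keep splitting
        centroids = np.vstack([centroids * (1 + eps),  # split each centroid in two
                               centroids * (1 - eps)])
        prev_dist = np.inf                             # D' = D
        while True:
            # clustering: assign each vector to its nearest centroid
            d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            # cluster's average: recompute each non-empty centroid
            for i in range(len(centroids)):
                if (labels == i).any():
                    centroids[i] = vectors[labels == i].mean(axis=0)
            # calculate distortion and test the relative change against delta
            dist = d2[np.arange(len(vectors)), labels].mean()
            if (prev_dist - dist) / dist < delta:
                break
            prev_dist = dist
    return centroids
```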

  13. EM Algorithm • Speaker model training estimates the GMM parameters via maximum likelihood (ML) estimation • The expectation-maximization (EM) algorithm is used to iteratively refine the parameters (sketched below)
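Conventional EM training can be sketched with scikit-learn's GaussianMixture; the diagonal covariances and k-means initialization here stand in for the LBG-initialized setup and are assumptions, not the authors' code:

```python
# Sketch: conventional EM training of one speaker's GMM via scikit-learn
# (k-means initialization stands in for LBG; an assumption).
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(features, n_components=16, n_iter=8):
    """features: (n_frames, D) MFCC matrix for one speaker."""
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="diag",      # diagonal covariances (assumption)
        max_iter=n_iter,
        init_params="kmeans",        # stands in for LBG initialization
    )
    return gmm.fit(features)
```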

  14. Parameter estimation • This paper proposes a training algorithm consisting of two steps

  15. Parameter estimation • Step 1: cluster the training vectors to the mixture component with the highest likelihood • Step 2: re-estimate the parameters of each component (sketched below) • $w_i$ = number of vectors classified in cluster i / total number of training vectors • $\boldsymbol{\mu}_i$ = sample mean of the vectors classified in cluster i • $\Sigma_i$ = sample covariance matrix of the vectors classified in cluster i
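A sketch of this two-step update; including the mixture weight in the assignment likelihood is an assumption, since the slide only says "highest likelihood":

```python
# Sketch of the proposed highest-likelihood clustering re-estimation.
import numpy as np
from scipy.stats import multivariate_normal

def reestimate(vectors, weights, means, covariances):
    N, M = len(vectors), len(weights)
    # step 1: log-likelihood of every vector under every weighted component
    # (including the weight here is an assumption)
    ll = np.stack([
        np.log(weights[i]) + multivariate_normal.logpdf(vectors, means[i], covariances[i])
        for i in range(M)
    ], axis=1)
    labels = ll.argmax(axis=1)                  # cluster to the highest-likelihood component
    # step 2: re-estimate each component from its hard cluster
    for i in range(M):
        cluster = vectors[labels == i]
        if len(cluster) > 1:                    # skip near-empty clusters
            weights[i] = len(cluster) / N       # n_i / N
            means[i] = cluster.mean(axis=0)     # sample mean
            covariances[i] = np.cov(cluster.T)  # sample covariance
    return weights, means, covariances
```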

  16. Identification • A test feature sequence is classified to the speaker whose model likelihood is the highest: $\hat{S} = \arg\max_{1 \le k \le S} p(X|\lambda_k)$ • The above can be formulated in logarithmic terms as $\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\mathbf{x}_t|\lambda_k)$ (sketched below)
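A sketch of this decision rule, scoring a test utterance against each enrolled model (reusing the scikit-learn GMMs from the earlier sketch):

```python
# Sketch: pick the speaker whose model maximizes the summed frame log-likelihood.
def identify(features, speaker_gmms):
    """features: (n_frames, D); speaker_gmms: {speaker_id: fitted GaussianMixture}."""
    scores = {
        spk: gmm.score_samples(features).sum()   # Σ_t log p(x_t | λ_spk)
        for spk, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get)           # arg max over speakers
```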

  17. EXPERIMENTAL EVALUATION • Database and Experiment Conditions • 7 male and 3 female speakers • Each speaker recorded 40 sentence utterances with different texts • The average sentence duration is approximately 3.5 s • Performance Comparison between EM and Highest Mixture Likelihood Clustering Training • Number of Gaussian components: 16 • 16-dimensional MFCCs • 20 utterances are used for training

  18. EXPERIMENTAL EVALUATION • Convergence condition

  19. EXPERIMENTAL EVALUATION • The comparison between EM and highest likelihood clustering training on identification rate • 10 sentences were used for training • 25 sentences were used for testing • 4 Gaussian components • 8 iterations

  20. EXPERIMENTAL EVALUATION • Effect of Different Numbers of Gaussian Mixture Components and Amounts of Training Data • MFCC feature dimension is fixed at 12 • 25 sentences are used for testing

  21. EXPERIMENTAL EVALUATION • Effect of Feature Set on Performance for Different Numbers of Gaussian Mixture Components • Combinations with first- and second-order difference coefficients were tested • 10 sentences are used for training • 30 sentences are used for testing

  22. CONCLUSION • The proposed highest-likelihood clustering training performs comparably to conventional EM training but with less computation time • First-order difference coefficients are sufficient to capture the transitional information with reasonable dimensional complexity • A 12-dimensional, 16th-order GMM using 5 training sentences achieved a 98.4% identification rate
