
Speaker recognition Phase 1: Detecting speech

Yannick Thimister, Han van Venrooij, Bob Verlinden. Project 3.1, 21-10-2010, DKE, Maastricht University. Contents: Speaker recognition, Problem description, Speech samples, Voice activity detection, Experiments and results, Conclusion.

Presentation Transcript


  1. Speaker recognition Phase 1: Detecting speech. Yannick Thimister, Han van Venrooij, Bob Verlinden. Project 3.1, 21-10-2010, DKE, Maastricht University

  2. Contents • Speaker recognition • Problem description • Speech samples • Voice activity detection • Experiments and results • Conclusion • Next steps

  3. Speaker Recognition • Speech contains several layers of information • Spoken words • Speaker identity • Speaker-related differences are a combination of anatomical differences and learned speaking habits • Speaker recognition can be divided into 2 types • Speaker verification (accept or reject a claimed identity) • Speaker identification (determine which of a set of known speakers is talking)

  4. Problem description • This phase: • Collect audio samples (.wav format) of both single- and multi-speaker conversations • Develop a program to determine when an audio sample contains a conversation • Project: • Develop a program that is able to analyse a multi-speaker conversation and identify as many speakers as possible

  5. Speech samples • Recorded samples from 12 people • 5 samples per person: • 2x pre-defined sentence 1 (Chuck) • 2x pre-defined sentence 2 (Shakespeare) • 1x random sentence • Professional microphone • As little noise as possible

  6. Voice activity detection • Power-based • Entropy-based • Spectral divergence • Frames • Initial frames are noise • Hangover
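
The framing and hangover steps shared by all three detectors can be sketched as follows. This is a minimal Python illustration, not the project's code: the function names `split_frames` and `apply_hangover`, and the choice to drop a trailing partial frame, are assumptions, and the slides give no frame length or hangover duration.

```python
def split_frames(signal, frame_len, hop):
    """Cut a sequence into frames of frame_len samples, hop samples apart.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def apply_hangover(decisions, hangover):
    """Keep labelling frames as speech for `hangover` frames after the raw
    detector last reported speech, so quiet word endings are not cut off."""
    smoothed = []
    remaining = 0  # hangover frames still available
    for is_speech in decisions:
        if is_speech:
            remaining = hangover
            smoothed.append(True)
        elif remaining > 0:
            remaining -= 1
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed
```

For example, with a hangover of 2 the raw per-frame decisions F T F F F T F become F T T T F T T.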

  7. Voice activity detection • Adaptive noise estimation • Power-based • Entropy-based • Spectral divergence • Frames • Initial frames are noise • Hangover
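
Slide 7 adds adaptive noise estimation but does not say how the estimate adapts. One common choice, assumed here, is recursive averaging: seed the estimate from the initial (noise-only) frames, then blend in each frame the detector labels as non-speech. The names `frame_energy` and `track_noise_level` and the parameters `n_init` and `alpha` are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def track_noise_level(frames, is_speech, n_init=5, alpha=0.95):
    """Return the noise-energy estimate in effect at each frame.

    Assumptions (not from the slides): the first n_init frames are pure
    noise and seed the estimate; after that, every frame labelled as
    non-speech is blended in by recursive averaging with weight 1 - alpha.
    """
    energies = [frame_energy(f) for f in frames]
    level = sum(energies[:n_init]) / n_init
    levels = [level] * n_init  # the initial frames all see the seed estimate
    for energy, speech in zip(energies[n_init:], is_speech[n_init:]):
        levels.append(level)
        if not speech:
            level = alpha * level + (1 - alpha) * energy
    return levels
```

A larger `alpha` makes the estimate adapt more slowly, which protects it from short misclassified speech frames at the cost of tracking changing noise less quickly.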

  8. Voice activity detection • Entropy-based • Scale DFT coefficients • Entropy equals
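
The entropy formula itself is not preserved in this transcript. A standard spectral-entropy formulation, assumed here, scales the power spectrum into a probability distribution p_k = |X(k)|² / Σ_i |X(i)|² and computes H = −Σ_k p_k log p_k: a flat, noise-like spectrum gives high entropy and a peaky, voiced spectrum gives low entropy. The sketch uses a naive DFT so it needs only the standard library; `dft_power` and `spectral_entropy` are hypothetical names.

```python
import cmath
import math


def dft_power(frame):
    """Power |X(k)|^2 for bins k = 0..N/2 of a naive DFT of a real frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def spectral_entropy(frame, eps=1e-12):
    """Entropy of the power spectrum after scaling it to sum to 1."""
    power = dft_power(frame)
    total = sum(power) + eps  # eps guards against an all-zero frame
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For a 16-sample frame the sketch uses 9 bins, so a unit impulse (flat spectrum) gives the maximum entropy ln 9 ≈ 2.197, while a pure tone at a bin frequency gives an entropy close to 0.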

  9. Voice activity detection • Entropy-based • Calculate the contour tracker (CT) • For each frame j • Calculate the entropy H(j) • Speech if H(j) differs from CT by at least 0.4 • CT’ = mean(CT, H(j))
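
The contour-tracker rule can be sketched as below. The slide gives the 0.4 threshold and the update CT’ = mean(CT, H(j)), but not how CT is initialized or on which frames it is updated; this sketch assumes CT starts as the mean entropy of the first `n_init` frames (since the initial frames are noise) and is updated only on frames classified as non-speech. `entropy_vad` and `n_init` are hypothetical names.

```python
def entropy_vad(entropies, n_init=5, threshold=0.4):
    """Label frames as speech (True) when their entropy H(j) differs from
    the contour tracker CT by at least `threshold`.

    Assumptions: CT starts as the mean entropy of the first n_init frames,
    and CT' = mean(CT, H(j)) is applied only on non-speech frames.
    """
    ct = sum(entropies[:n_init]) / n_init
    decisions = []
    for h in entropies:
        if abs(h - ct) >= threshold:
            decisions.append(True)
        else:
            decisions.append(False)
            ct = (ct + h) / 2  # CT' = mean(CT, H(j))
    return decisions
```

Updating only on non-speech frames keeps CT following the noise floor; updating on every frame would let long stretches of speech drag CT toward speech entropies.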

  10. Voice activity detection • Power-based • Assumes that the noise is normally distributed • Calculate mean, standard deviation • For each sample n • Calculate • For each frame j • The majority of the samples
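
The per-sample formula on this slide is not preserved. A minimal sketch, assuming the usual normalized distance |x(n) − μ| / σ with a hypothetical threshold of 3 standard deviations, μ and σ estimated from the initial noise-only frames, and a frame labelled speech when more than half of its samples exceed the threshold. `power_vad`, `n_noise_frames` and `k` are hypothetical names.

```python
import statistics


def power_vad(frames, n_noise_frames=5, k=3.0):
    """Label frames as speech (True) using a Gaussian noise model.

    Assumptions: mu and sigma come from the first n_noise_frames frames
    (taken to be noise); a sample counts as speech when |x - mu| / sigma >= k;
    a frame is speech when the majority of its samples are speech.
    """
    noise = [x for frame in frames[:n_noise_frames] for x in frame]
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise) or 1e-12  # avoid dividing by zero
    decisions = []
    for frame in frames:
        speech_samples = sum(1 for x in frame if abs(x - mu) / sigma >= k)
        decisions.append(speech_samples > len(frame) / 2)
    return decisions
```

Under a normal noise model only about 0.3% of noise samples fall more than 3σ from the mean, so the per-frame majority vote rarely fires on noise alone.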

  11. Voice activity detection • Spectral divergence • L-frame window • Estimation • Divergence
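
The estimation and divergence formulas on this slide are not preserved. Assuming the project follows the long-term spectral divergence (LTSD) of Ramírez et al., which the "LTSD" on the next slide suggests, they would read as follows, with X(k, l) the DFT magnitude of frame l at bin k, \hat{N}(k) the average noise magnitude spectrum, K the number of bins, and an L-frame window of L = 2M + 1 frames centered on frame l:

```latex
\mathrm{LTSE}(k, l) = \max_{-M \le j \le M} X(k, l + j), \qquad L = 2M + 1

\mathrm{LTSD}(l) = 10 \log_{10}\!\left( \frac{1}{K} \sum_{k=0}^{K-1} \frac{\mathrm{LTSE}^2(k, l)}{\hat{N}^2(k)} \right)
```

The long-term envelope takes the largest magnitude in each bin over the window, so the divergence rises as soon as speech energy appears anywhere in the L frames around frame l.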

  12. Voice activity detection • Spectral divergence • Estimate the noise spectrum • Averages of the DFT coefficients • Calculate the mean (μ) LTSD of the noise frames • For each frame f • Calculate the LTSD; speech if LTSD > 1.25 μ • Update
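
The spectral-divergence procedure on this slide can be sketched end to end as below, using the common LTSD formulation (long-term envelope over a window of 2m + 1 frames, divergence in dB). This is an illustration under stated assumptions, not the project's implementation: the first `n_init` frames are taken to be noise and supply both the noise spectrum and μ, the window is clipped at the signal edges, and the slide's "Update" is read as recursive averaging of the noise spectrum on non-speech frames. All function and parameter names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(frame):
    """|X(k)| for bins k = 0..N/2 of a naive DFT of a real frame."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def ltsd(spectra, l, noise, m):
    """Long-term spectral divergence (dB) of frame l over frames l-m..l+m,
    with the window clipped at the start and end of the signal."""
    lo, hi = max(0, l - m), min(len(spectra), l + m + 1)
    total = 0.0
    for k, noise_k in enumerate(noise):
        ltse = max(spectra[j][k] for j in range(lo, hi))  # long-term envelope
        total += ltse ** 2 / max(noise_k ** 2, 1e-12)
    return 10 * math.log10(total / len(noise) + 1e-12)


def spectral_divergence_vad(frames, n_init=5, m=3, factor=1.25, alpha=0.95):
    """Label frames as speech (True) when LTSD > factor * mu.

    Assumptions: the first n_init frames are noise and give both the noise
    spectrum (average DFT magnitude) and mu (their mean LTSD); "Update" is
    read as recursive averaging of the noise spectrum on non-speech frames.
    """
    spectra = [dft_magnitudes(f) for f in frames]
    n_bins = len(spectra[0])
    noise = [sum(s[k] for s in spectra[:n_init]) / n_init for k in range(n_bins)]
    mu = sum(ltsd(spectra, l, noise, m) for l in range(n_init)) / n_init
    decisions = []
    for l in range(len(frames)):
        speech = ltsd(spectra, l, noise, m) > factor * mu
        decisions.append(speech)
        if not speech:
            noise = [alpha * nk + (1 - alpha) * sk
                     for nk, sk in zip(noise, spectra[l])]
    return decisions
```

Because the envelope looks m frames ahead and behind, frames just before and after a speech burst are also labelled speech, which acts like a built-in hangover.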

  13. Experiments • Hand-labeled samples • Labeled with the algorithms • Percentage correctly classified • Variance
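
The evaluation compares each algorithm's labels with the hand labels. A minimal sketch, assuming frame-level boolean labels, per-sample accuracy as the fraction of matching frames, and the population variance of those per-sample accuracies (the slides do not say which variance was used). `frame_accuracy` and `summarize` are hypothetical names.

```python
import statistics


def frame_accuracy(predicted, labelled):
    """Fraction of frames whose predicted label matches the hand label."""
    if len(predicted) != len(labelled):
        raise ValueError("predicted and hand-labelled sequences differ in length")
    return sum(p == t for p, t in zip(predicted, labelled)) / len(labelled)


def summarize(samples):
    """Mean and population variance of per-sample accuracy.

    `samples` is a list of (predicted, labelled) pairs, one per recording.
    """
    scores = [frame_accuracy(p, t) for p, t in samples]
    return statistics.fmean(scores), statistics.pvariance(scores)
```

Multiplying the mean by 100 gives a percentage correctly classified of the kind reported in the results.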

  14. Results • Entropy-based • Correctly classified: 63.3% • Variance: 1.64% • Power-based • Correctly classified: 73.9% • Variance: 1.42% • Spectral divergence • Correctly classified: 79.6% • Variance: 0.90%

  15. Results Entropy-based output Labeled data

  16. Results Power-based output Labeled data

  17. Results Spectral divergence output Labeled data

  18. Conclusions • Entropy-based detection classifies almost the whole sample as speech. • Power-based (energy-based) detection seems to be more sensitive, but this does not improve results. • Spectral divergence scores best. • Variance is low because each sample contains roughly the same amount of speech.

  19. Next steps • Gaussian Mixture Model • Different microphones • Different noise levels • Difference between text-dependent and text-independent recognition
