
Speaker Change Detection using Support Vector Machines

V. Kartik, D. Srikrishna Satish and C. Chandra Sekhar. Speech and Vision Laboratory, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India.


Presentation Transcript


  1. Speaker Change Detection using Support Vector Machines
     V. Kartik, D. Srikrishna Satish and C. Chandra Sekhar
     Speech and Vision Laboratory, Department of Computer Science and Engineering
     Indian Institute of Technology Madras, Chennai, India

  2. Speaker Change Detection
     • Automatic segmentation of multispeaker speech data into segments that each contain a single speaker
     • Dissimilarity of the distributions of the data before and after a speaker change point
     • Proposal: speaker change detection as a pattern classification problem
     • Patterns extracted from the data around the speaker change points serve as positive examples
     • Patterns extracted from the data between the speaker change points serve as negative examples
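The labeling of positive and negative examples described on this slide might be sketched as follows. The function name `extract_patterns`, the choice to center the positive window on the change point, and the rule of skipping windows that straddle a change point off-center are all assumptions made for illustration; the presentation does not specify them.

```python
def extract_patterns(frames, change_points, n):
    """Build labeled pattern vectors from windows of n consecutive frames.

    frames        : list of per-frame feature vectors (lists of floats)
    change_points : frame indices at which a new speaker starts
    n             : window length in frames (assumed parameter)
    """
    half = n // 2
    change_set = set(change_points)
    patterns, labels = [], []
    for start in range(len(frames) - n + 1):
        # Concatenate the n frame vectors into one segmental pattern vector.
        pattern = [x for frame in frames[start:start + n] for x in frame]
        center = start + half
        # Change points strictly inside the window (both speakers present).
        straddled = [c for c in change_points if start < c < start + n]
        if center in change_set:
            patterns.append(pattern)
            labels.append(+1)   # window centered on a change point
        elif not straddled:
            patterns.append(pattern)
            labels.append(-1)   # window lies within a single speaker turn
        # Off-center straddling windows are skipped (assumption).
    return patterns, labels


frames = [[float(i)] for i in range(10)]
patterns, labels = extract_patterns(frames, change_points=[5], n=4)
print(labels)
```

Each pattern has dimension n times the per-frame feature dimension, which is why the next slide lists the large dimension of segmental pattern vectors as an issue.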

  3. Speaker Change Detection using SVMs
     • An SVM is trained using the positive and negative examples of speaker change points
     • The trained SVM scans the multispeaker data to hypothesize speaker change points
     • Main issues:
       - Speaker-independent detection of the change points
       - Silence regions before speaker change points
       - Varying durations of speaker turns
       - Length of the window used for extraction of patterns
       - Large dimension of segmental pattern vectors
       - Large number of false alarms

  4. Speaker Change Detection System

  5. Fixed Duration Window based Patterns

  6. Speaker Change Point Hypothesization using Fixed Duration Window based Patterns
     • Input: the continuous speech signal of multispeaker speech data, with silence regions removed
     • The SVM is trained with pattern vectors extracted from fixed-length windows of n frames
     • Sliding window method: a test pattern is extracted from every window of n frames, with a shift of one frame
     • The test patterns for which the SVM output is positive are hypothesized as speaker change points
     • Several hypotheses may be spurious
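The sliding window scan above could be sketched like this. The names `hypothesize_change_points` and `toy_decision` are hypothetical, and `toy_decision` is only a stand-in for the output of a trained SVM so that the sketch runs without any machine learning library; reporting the window center as the hypothesized frame is also an assumption.

```python
def hypothesize_change_points(frames, n, decision_fn):
    """Slide a window of n frames over the data with a one-frame shift.

    decision_fn maps a pattern vector to a real-valued score; a positive
    score marks the window as a hypothesized speaker change point.
    """
    hypotheses = []
    for start in range(len(frames) - n + 1):
        pattern = [x for frame in frames[start:start + n] for x in frame]
        if decision_fn(pattern) > 0:
            hypotheses.append(start + n // 2)  # report the window center
    return hypotheses


def toy_decision(pattern):
    """Stand-in for an SVM: compares the means of the two window halves."""
    half = len(pattern) // 2
    first = sum(pattern[:half]) / half
    second = sum(pattern[half:]) / half
    return abs(second - first) - 0.25


# Six frames of one "speaker" followed by six frames of another.
frames = [[0.0]] * 6 + [[1.0]] * 6
hypotheses = hypothesize_change_points(frames, n=4, decision_fn=toy_decision)
print(hypotheses)  # [5, 6, 7]
```

The single true change at frame 6 produces three adjacent hypotheses here, a small illustration of why spurious hypotheses need to be pruned afterwards.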

  7. False Alarm Reduction
     • Two methods are considered for reducing spurious hypotheses (false alarms)
     • First method: a threshold of 5 frames on the duration of speaker turns
     • Second method: the false hypotheses on validation data are used as negative examples in training an SVM for false alarm reduction
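A minimal sketch of both ideas, under stated assumptions. For the first method, the sketch assumes that a hypothesis is dropped when it would create a speaker turn shorter than the threshold after the last accepted hypothesis; the slide gives only the 5-frame threshold, not this exact rule. For the second method, the matching tolerance `tol` and the function names are hypothetical.

```python
def filter_short_turns(hypotheses, min_turn=5):
    """Drop hypotheses that would create a speaker turn shorter than min_turn frames."""
    kept = []
    for h in sorted(hypotheses):
        if not kept or h - kept[-1] >= min_turn:
            kept.append(h)
    return kept


def collect_false_alarms(hypotheses, reference, tol=2):
    """Return hypotheses farther than tol frames from every marked change point.

    On validation data, the patterns at these frames could serve as negative
    examples for a second, false-alarm-reduction SVM.
    """
    return [h for h in hypotheses
            if all(abs(h - r) > tol for r in reference)]


print(filter_short_turns([5, 6, 7, 20, 22, 30]))                 # [5, 20, 30]
print(collect_false_alarms([11, 30, 52, 70], [10, 50, 90]))      # [30, 70]
```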

  8. Studies on Speaker Change Detection
     • Extended data of the NIST 2003 speaker recognition evaluation database
     • Two-speaker conversations, each about 5 minutes long, with 3 conversations each of the male-male (M-M), male-female (M-F) and female-female (F-F) types
     • Speaker change points are manually marked
     • Data divided into training, validation and test datasets
     • Each dataset includes one conversation each of M-M, M-F and F-F
     • Training dataset: used to train the SVM
     • Validation dataset: used to derive the negative examples for the false alarm reduction SVM
     • Test dataset: used to evaluate the performance of the speaker change detection system

  9. Performance of Speaker Change Detection System
     • Number of actual speaker change points in the test dataset: 282
     • Number of frames in the test dataset: about 16000
     • Number of speaker change points missed (not detected): denoted M
     • Number of false alarms: denoted FA
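The two counts M and FA could be computed along the following lines. The presentation does not state how a hypothesis is matched to a marked change point, so the tolerance `tol` (in frames) and the function name `score_detection` are assumptions made for illustration, and the numbers below are toy values, not results from the study.

```python
def score_detection(hypotheses, reference, tol=2):
    """Count missed change points (M) and false alarms (FA).

    A reference point counts as detected if some hypothesis lies within
    tol frames of it; a hypothesis counts as a false alarm if no reference
    point lies within tol frames of it.
    """
    missed = sum(
        1 for r in reference
        if not any(abs(h - r) <= tol for h in hypotheses)
    )
    false_alarms = sum(
        1 for h in hypotheses
        if not any(abs(h - r) <= tol for r in reference)
    )
    return missed, false_alarms


# Toy example: the change at frame 90 is missed; frames 30 and 70 are false alarms.
M, FA = score_detection(hypotheses=[11, 30, 52, 70], reference=[10, 50, 90])
print(M, FA)  # 1 2
```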

  10. Summary
     • Speaker change detection as a pattern classification problem
     • Fixed duration window method
     • SVMs to hypothesize the speaker change points
     • Methods for reducing the number of false alarms
     • Performance of the proposed method on the NIST 2003 speaker recognition evaluation database

  11. Thank You
