FYP0202 Advanced Audio Information Retrieval System

  1. FYP0202 Advanced Audio Information Retrieval System. By Alex Fok, Shirley Ng

  2. Outline • Overview • Read in the raw speech • MFCC processing • Detect the audio scene change • Audio Clustering • Interleave Audio Clustering • Conclusion

  3. Overview • Automatic segmentation of an audio stream and automatic clustering of audio segments have received considerable attention recently. • For example, in automatic transcription of broadcast news, the data contains clean speech, telephone speech, music segments, and speech corrupted by music or noise.

  4. Overview (cont'd) • We would like to segment the audio stream into homogeneous regions according to speaker identity. • We would like to cluster the speech segments into homogeneous clusters according to speaker identity.

  5. Step 1: Read in the raw speech • Read in an MPEG file as input • Convert the file from .mpeg format to .wav format, because the MFCC library only processes .wav files
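The slides do not say which tool performed the conversion. A minimal sketch, assuming the `ffmpeg` command-line tool is installed, might look like the following; the function names, the 16 kHz sample rate, and the mono output are illustrative assumptions, not details from the presentation.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that extracts mono 16-bit PCM audio to a WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input MPEG file
        "-vn",                    # drop any video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        "-ac", "1",               # mono (assumed)
        "-ar", str(sample_rate),  # sample rate in Hz (assumed value)
        str(dst),
    ]


def mpeg_to_wav(src, sample_rate=16000):
    """Convert src to a .wav file next to it and return the new path."""
    dst = Path(src).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

Keeping the command construction separate from the call to `subprocess.run` makes the arguments easy to inspect without running the conversion.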

  6. Step 2: MFCC processing • A WAV file is viewed as a sequence of frames, each with its own features • We use the MFCC library to convert the WAV audio into MFCC (mel-frequency cepstral coefficient) features for processing • We extract 24 features for each frame • The results are stored in feature vectors, one per frame (Frame 1, Frame 2, Frame 3, ...)
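To make "a WAV file is viewed as frames" concrete, here is a small NumPy sketch of the framing step only. The 25 ms frame length and 10 ms hop are common choices assumed for illustration; the slides do not state the values used, and `frame_signal` is a hypothetical helper, not part of the MFCC library mentioned above.

```python
import numpy as np


def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    signal = np.asarray(signal)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row k holds samples [k * hop_len, k * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


# One second of a synthetic 440 Hz tone stands in for real speech.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(signal, sample_rate)
```

An MFCC routine would then map each row of `frames` to one feature vector, giving a matrix with one row per frame and 24 columns in the setup described on this slide.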

  7. Step 3: Detect the audio scene change • Use the feature vectors to detect audio scene changes • The input audio stream is modeled as a Gaussian process • A model selection criterion, BIC (Bayesian Information Criterion), is used to detect the change point

  8. Step 3: Detect the audio scene change • Denote $x_i$ ($i = 1, \dots, N$) as the feature vector of frame $i$ • $N$ is the total number of frames • $\mu_i$: mean vector of frame $i$ • $\Sigma_i$: full covariance matrix of frame $i$ • $R(i) = N \log|\Sigma| - N_1 \log|\Sigma_1| - N_2 \log|\Sigma_2|$ • $\Sigma$, $\Sigma_1$, $\Sigma_2$ are the sample covariance matrices from all the data, from $\{x_1, \dots, x_i\}$, and from $\{x_{i+1}, \dots, x_N\}$ respectively, so $N_1 = i$ and $N_2 = N - i$

  9. Step 3: Detect the audio scene change • BIC(i) = R(i) - constant, where the constant is a model-complexity penalty that does not depend on i • If there is only one change point, the frame with the highest BIC score is the change point • If there is more than one change point, the algorithm can simply be extended
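The single-change-point search can be sketched in NumPy as follows. The slides only say "constant" for the penalty; the form used here, 0.5 * (d + d(d+1)/2) * log N scaled by a tunable weight, is the usual BIC penalty for full-covariance Gaussians and is an assumption, as are the function names, the `margin` parameter, and the rule that a change is accepted only when the best score is positive.

```python
import numpy as np


def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def bic_scores(x, penalty_weight=1.0, margin=10):
    """BIC(i) for every candidate split i; i is the number of frames before the split."""
    n, d = x.shape
    # Assumed penalty: number of extra Gaussian parameters times log N / 2.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    total = n * log_det_cov(x)
    scores = np.full(n, -np.inf)
    # Keep at least `margin` frames on each side so both covariances are well defined.
    for i in range(margin, n - margin + 1):
        r = total - i * log_det_cov(x[:i]) - (n - i) * log_det_cov(x[i:])
        scores[i] = r - penalty_weight * penalty
    return scores


def find_change_point(x, penalty_weight=1.0, margin=10):
    """Return the split with the highest BIC score, or None if no score is positive."""
    scores = bic_scores(x, penalty_weight, margin)
    best = int(np.argmax(scores))
    return best if scores[best] > 0 else None


# Synthetic features: 100 frames from one source, then 100 from a shifted one.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
change_point = find_change_point(x)
```

For several change points, one common extension is to restart the search from each detected point; the slides do not say which variant was used.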

  10. Step 4: Audio Clustering • To speed up detection, we locate the change points only roughly. • As a result, some change points may be detected incorrectly. • In this step, we merge neighboring segments that were split incorrectly. • Each segment is compared with its neighbor; if both are speech from the same person, they are combined.
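The slides do not specify how two neighboring segments are compared. One plausible sketch reuses the BIC idea from Step 3: a boundary is kept only if modelling its two sides separately scores better than one shared Gaussian. The function names, the penalty form, and the `penalty_weight` parameter are assumptions made for illustration.

```python
import numpy as np


def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(a, b, penalty_weight=1.0):
    """Positive when two separate Gaussians fit a and b better than one shared Gaussian."""
    both = np.vstack([a, b])
    n, d = both.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    r = n * log_det_cov(both) - len(a) * log_det_cov(a) - len(b) * log_det_cov(b)
    return r - penalty_weight * penalty


def merge_neighbors(features, boundaries, penalty_weight=1.0):
    """Return the boundaries that survive after merging similar neighbors.

    boundaries: sorted frame indices where a new segment starts (0 excluded).
    """
    kept = []
    start = 0
    edges = list(boundaries) + [len(features)]
    for boundary, next_edge in zip(edges[:-1], edges[1:]):
        left = features[start:boundary]
        right = features[boundary:next_edge]
        if delta_bic(left, right, penalty_weight) > 0:
            kept.append(boundary)
            start = boundary
        # Otherwise the boundary is dropped and the left segment keeps growing.
    return kept


# Frames 0-199 come from one source but were wrongly split at 100; 200-299 differ.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (100, 2))])
merged = merge_neighbors(features, [100, 200], penalty_weight=2.0)
```

A larger `penalty_weight` makes merging more aggressive, which suits a first pass that deliberately over-segments.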

  11. Step 5: Interleave Audio Clustering • Group all the segments of the same speaker into one node. • [Figure: the segment sequence before and after grouping; separate Speaker 1 segments become one combined Speaker 1 node, alongside Speaker 2]
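Grouping non-adjacent segments of the same speaker can be sketched as a greedy pass over the segments. The `same_speaker` predicate is a placeholder for whatever comparison the system uses (for example a BIC test); the toy version below, which compares a single made-up number per segment, is purely hypothetical.

```python
def group_by_speaker(segments, same_speaker):
    """Greedily place each segment in the first cluster whose first member matches it."""
    clusters = []
    for segment in segments:
        for cluster in clusters:
            if same_speaker(cluster[0], segment):
                cluster.append(segment)
                break
        else:
            # No existing cluster matched, so this segment starts a new node.
            clusters.append([segment])
    return clusters


def close_means(a, b, threshold=1.0):
    """Toy stand-in for a real speaker comparison: compare one summary value."""
    return abs(a[1] - b[1]) < threshold


# Each segment is ((start_frame, end_frame), summary_value); values are invented.
segments = [((0, 100), 0.1), ((100, 180), 5.2), ((180, 300), -0.2), ((300, 350), 4.9)]
clusters = group_by_speaker(segments, close_means)
spans = [[segment[0] for segment in cluster] for cluster in clusters]
```

Here the first and third segments end up in one node and the second and fourth in another, mirroring the before-and-after picture on this slide.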

  12. Conclusion • We would like to build a precise and fast engine that recognizes the identity of the speaker in a wave file. • We would like to group the segments of the same speaker within the wave file.

  13. Conclusion (cont'd) • Instead of making local decisions based on the distance between fixed-size samples, we expand the decision window as widely as possible • We avoid repeated calculation by using dynamic programming. • The detection algorithm can detect acoustic change points with reasonable detectability.
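The slides do not describe the dynamic-programming scheme. One common way to avoid recomputing statistics for every candidate split, shown here only as an assumed illustration, is to store running sums of the feature vectors and of their outer products, so the covariance of any frame range follows from two table lookups. The class name and layout are hypothetical.

```python
import numpy as np


class SegmentStats:
    """Prefix sums that give the covariance of any frame range without rescanning it."""

    def __init__(self, x):
        n, d = x.shape
        # sums[k] is the sum of the first k frames; outers[k] sums their outer products.
        self.sums = np.zeros((n + 1, d))
        self.outers = np.zeros((n + 1, d, d))
        self.sums[1:] = np.cumsum(x, axis=0)
        self.outers[1:] = np.cumsum(x[:, :, None] * x[:, None, :], axis=0)

    def covariance(self, start, stop):
        """Maximum-likelihood covariance of frames start .. stop - 1."""
        count = stop - start
        mean = (self.sums[stop] - self.sums[start]) / count
        second_moment = (self.outers[stop] - self.outers[start]) / count
        return second_moment - np.outer(mean, mean)


rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))
stats = SegmentStats(x)
cov_10_40 = stats.covariance(10, 40)
```

With the tables built once, each candidate split costs a fixed amount of work in the feature dimension rather than a pass over all frames in the segment.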