
MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION







Jérôme FARINAS¹, François PELLEGRINO², Jean-Luc ROUAS¹ and Régine ANDRÉ-OBRECHT¹
{farinas, rouas, obrecht}@irit.fr; pellegrino@univ-lyon2.fr

¹ Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS - Université Paul Sabatier - INP, 31062 Toulouse Cedex 4 - France
² Laboratoire Dynamique du Langage, UMR 5596 CNRS - Université Lumière Lyon 2, 69363 Lyon Cedex 7 - France

Vowel / Non Vowel Segmentation

• Speech segmentation: statistical segmentation (André-Obrecht, 1988)
  • Short segments (bursts and transient parts of sounds)
  • Longer segments (steady parts of sounds)
• Speech Activity Detection and Vowel detection
  • Spectral analysis of the signal
  • Vowel detection (Pellegrino & Obrecht, 2000)
  • Language and speaker independent algorithm

[Figure: speech signal (Amplitude) and spectrogram (Frequency, 0 to 8 kHz) over Time (0 to 1.0 s), with segments labeled Vowel, Pause and Non Vowel.]

Pseudo-syllable Segmentation

The speech signal is parsed into patterns matching the structure C^n V, where n is an integer that can be 0. For the example above: CCVV.CCV.CV.CCCV.CV

Acoustic Modeling

Each vowel segment is represented by a set of 8 Mel-Frequency Cepstral Coefficients (MFCC) and 8 delta-MFCC, augmented with the energy and delta energy of the segment. This parameter vector is extended with the duration of the underlying segment, giving 19 parameters per vowel segment.

Pseudo-syllable Modeling

Three parameters are computed for each pseudo-syllable:

• Global consonant cluster duration
• Global vowel duration
• Complexity of the consonantal cluster

[Figure: worked example for a .CCV. pseudo-syllable; the values are not preserved in the transcript.]

Rhythm Modeling and Vowel System Modeling

[Diagram labels: Rhythm Modeling, Rhythm Models, Rhythm Likelihoods; Vowel System Modeling, Vowel System Models, Vowel System Likelihoods; Merging; Decision Rule; L*.]

For each language, a Gaussian Mixture Model (GMM) is trained using the EM algorithm. The number of components of the model is computed using the LBG-Rissanen algorithm. During testing, the decision relies on a Maximum Likelihood procedure.

Merging

A simple statistical merging is performed by adding, for each language, the log-likelihoods of the Rhythm model and of the Vowel System Model (VSM). The decision rule then selects L*, the language with the highest merged log-likelihood.

[Result panels (False rejection (%) vs. False Alarm (%)) for Rhythm Modeling, Vowel System Modeling and Merging. Mean Identification Rate values shown: 79%, 83%, 70%.]

Discussion

We propose two algorithms dedicated to Automatic Language Identification. Experiments, performed with cross-validation, show that efficient rhythmic modeling (78% correct identification) is achievable without any a priori knowledge of the rhythmic structure of the processed languages. The Vowel System Model reaches 70% correct identification. On these read-speech data, merging the two approaches raises the identification rate to 83%.
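The pseudo-syllable parsing described in the transcript (patterns of the form C^n V, each summarized by three rhythm parameters) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: segments are assumed to arrive already labeled "C" or "V" with a duration, pauses are assumed to have been filtered out beforehand, consecutive vowel segments are kept in the same pattern (as in the CCVV example), trailing consonants with no following vowel are dropped, and the "complexity" of the cluster is taken to be the number of consonantal segments. The names `pseudo_syllables`, `pattern` and `rhythm_features` are hypothetical.

```python
from typing import List, Tuple

# (label, duration in seconds); label is "C" (non vowel) or "V" (vowel).
Segment = Tuple[str, float]


def pseudo_syllables(segments: List[Segment]) -> List[List[Segment]]:
    """Group labeled segments into C^n V patterns (n may be 0)."""
    syllables: List[List[Segment]] = []
    current: List[Segment] = []
    for label, duration in segments:
        if label not in ("C", "V"):
            raise ValueError(f"unexpected label: {label!r}")
        # A consonant arriving right after a vowel closes the current pattern.
        if label == "C" and current and current[-1][0] == "V":
            syllables.append(current)
            current = []
        current.append((label, duration))
    # Assumption: a trailing consonant run without a vowel is discarded.
    if current and current[-1][0] == "V":
        syllables.append(current)
    return syllables


def pattern(syllable: List[Segment]) -> str:
    """Return the label string of a pseudo-syllable, e.g. 'CCV'."""
    return "".join(label for label, _ in syllable)


def rhythm_features(syllable: List[Segment]) -> Tuple[float, float, int]:
    """Return (consonant cluster duration, vowel duration, complexity).

    Complexity is assumed here to be the count of consonantal segments.
    """
    consonant_duration = sum(d for label, d in syllable if label == "C")
    vowel_duration = sum(d for label, d in syllable if label == "V")
    complexity = sum(1 for label, _ in syllable if label == "C")
    return consonant_duration, vowel_duration, complexity
```

Applied to the label sequence of the poster's example, `pseudo_syllables` would yield the patterns CCVV, CCV, CV, CCCV and CV, and `rhythm_features` would turn each of them into one three-dimensional vector that a per-language rhythm model could be trained on.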
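The merging step and the maximum-likelihood decision can be illustrated with a short sketch. It assumes that each model has already produced one log-likelihood per candidate language for the test utterance; the function name `merge_and_decide` and the language labels in the usage note are hypothetical, and no score weighting or normalization is applied since the transcript mentions none.

```python
from typing import Dict, Tuple


def merge_and_decide(
    rhythm_loglik: Dict[str, float],
    vowel_loglik: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Add per-language log-likelihoods and pick the most likely language.

    Both dictionaries map a language label to the log-likelihood of the
    test utterance under that language's model.
    """
    if not rhythm_loglik or rhythm_loglik.keys() != vowel_loglik.keys():
        raise ValueError("both models must score the same non-empty language set")
    # Adding log-likelihoods is equivalent to multiplying the likelihoods,
    # i.e. treating the two feature streams as independent.
    merged = {
        lang: rhythm_loglik[lang] + vowel_loglik[lang]
        for lang in rhythm_loglik
    }
    # Maximum-likelihood decision: L* is the argmax of the merged score.
    best = max(merged, key=lambda lang: merged[lang])
    return best, merged
```

For instance, with made-up scores `{"lang_a": -10.0, "lang_b": -12.0}` for rhythm and `{"lang_a": -9.0, "lang_b": -5.0}` for the vowel system, the merged scores are -19.0 and -17.0, so the sketch selects `lang_b` even though the rhythm model alone favors `lang_a`.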
