MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION

8 Frequency (kHz) 4 False rejection (%) False Alarm (%) 0 el a m E  E t  e b n Amplitude 0 0.2 0.4 0.6 0.8 1.0 Time (s) Vowel Pause Non Vowel Rhythm Modeling Vowel System Modeling Vowel System Modeling Vowel System Models Mean Identification Rate: 79% Discussion Jérôme FARINAS1, François PELLEGRINO2, Jean-Luc ROUAS1 and Régine ANDRÉ-OBRECHT1 {farinas, rouas, obrecht}@irit.fr; pellegrino@univ-lyon2.fr MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION 1Institut de Recherche en Informatique de Toulouse UMR 5505 CNRS - Université Paul Sabatier - INP 31062 Toulouse Cedex 4 - France 2Laboratoire Dynamique du Langage UMR 5596 CNRS - Université Lumière Lyon 2 69363 Lyon Cedex 7 - France Vowel / Non Vowel Segmentation • Speech segmentation: statistical segmentation (André-Obrecht, 1988) • Shorts segments (bursts and transient parts of sounds) • Longer segments (steady parts of sounds) • Speech Activity Detection and Vowel detection • Spectral analysis of the signal • Vowel detection (Pellegrino & Obrecht, 2000) • Language and speaker independent algorithm Vowel / Non Vowel Segmentation signal The speech signal is parsed in patterns matching the structure: Cn V (n integer, can be 0). (For the above example: CCVV.CCV.CV.CCCV.CV) Pseudo-syllable Segmentation Acoustic Modeling Each vowel segment is represented with a set of 8 Mel-Frequency Cepstral Coefficients and 8 delta-MFCC, augmented with the Energy and delta Energy of the segment. This parameter vector is extended with the duration of the underlying segment. Example for a .CCV. syllable: • 3 parameters are computed: • Global consonant cluster duration • Global vowel duration • Complexity of the consonantal cluster • With the same .CCV. example: Pseudo-syllable Modeling Rhythm Models Vowel System Likelihoods Rhythm Likelihoods For each language, a Gaussian Mixture Model (GMM) is trained using the EM algorithm. The number of components of the model is computed using the LBG-Rissanen algorithm. During the test, the decision lays on a Maximum Likelihood procedure. Merging A simple statistical merging is performed by adding the log-likelihoods of both the Rhythm model and the VSM for each language. Decision Rule L* Rhythm Modeling Merging Mean Identification Rate: 83% Mean Identification Rate: 70% We propose two algorithms dedicated to Automatic Language Identification. Experiments, performed with cross-validation, show that it is possible to achieve an efficient rhythmic modeling (78% of correct identification) in a way that requires no a priori knowledge of the rhythmic structure of the processed languages. Besides, the Vowel System Model reaches 70% of correct identification. With these read data, merging the two approaches improves the identification rate up to 83%.

MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION

MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION

Presentation Transcript

Automatic Identification and Classification of Words using Phonetic and Prosodic Features

Automatic Identification System

Automatic Identification and Data Capture

Automatic Term Identification for Bibliometric Mapping

Segmental HMMs: Modelling Dynamics and Underlying Structure for Automatic Speech Recognition

Language Features

Language Features

Automatic Domain Identification

Using N-gram and Word Network Features for Native Language Identification

Language Features for JDK7

Supra-segmental Features

Automatic Identification System

Automatic Modelling of Rhythm and Intonation for Language Identification

Supra-segmental Features

AUTOMATIC IDENTIFICATION TECHNOLOGY

Language Identification

Landmarks for Routing – Automatic Identification, Extraction and Visualization

FISH Fast Identification of Segmental Homology

Automatic Merging of Pedigree Information

Language Identification and IT

Language use and identification

Automatic Identification System