Design of Tree-based Context Clustering for an HMM-based Thai Speech Synthesis System

Mr. Suphattharachai Chomphan, Tokyo Institute of Technology, 23 August 2007

Outline
• Objectives
• Study of Thai tones
  • Characteristics of Thai tones
  • Categorizations of Thai tones
• Tree-based context clustering
  • Construction of contextual factors
  • Design of decision-tree structures
  • Design of context clustering styles
• Experiments
  • Evaluation of overall tone correctness
  • Evaluation of tone correctness for each tone type
  • Evaluation of syllable duration distortion
• Conclusions

Objectives
• To implement an HMM-based speech synthesis system for the Thai language that achieves the highest possible tone correctness

Study of Thai tones
• Syllable structure [Nakasakul2002]
• Thai: a tonal language
• Characteristics of Thai tones (examples: Thai word, phonemic transcription ending in the tone number, English gloss)
  • เครียด khr-ia-t^-2 (stress)
  • เคร่ง khr-e-ng^-2 (strict)
  • เพลีย phl-iia-0 (exhausted)
  • เรื่อย r-va-j^-2 (always)
  • เสีย s-iia-4 (spoil)
  • รัก r-a-k^-3 (love)
  • และ l-x-3 (and)
  • ปริ pr-i-1 (break)
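The transcriptions above encode each syllable as hyphen-separated fields. The sketch below parses them, assuming the layout initial-vowel[-final^]-tone, where a trailing "^" marks the final consonant and the last field is the tone number 0 to 4; this layout is inferred from the examples, and the `Syllable` and `parse_syllable` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Syllable:
    initial: str
    vowel: str
    final: Optional[str]  # None for syllables without a final consonant
    tone: int             # 0 middle, 1 low, 2 falling, 3 high, 4 rising


def parse_syllable(label: str) -> Syllable:
    """Parse a label such as 'khr-ia-t^-2'.

    Assumed layout: initial-vowel[-final^]-tone, where '^' marks the
    final consonant and the last field is the tone number 0-4.
    """
    parts = label.split("-")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected syllable label: {label!r}")
    tone = int(parts[-1])
    if not 0 <= tone <= 4:
        raise ValueError(f"tone number out of range: {tone}")
    final = None
    if len(parts) == 4:
        if not parts[2].endswith("^"):
            raise ValueError(f"final consonant not marked with '^': {label!r}")
        final = parts[2][:-1]  # strip the '^' marker
    return Syllable(parts[0], parts[1], final, tone)


for label in ("khr-ia-t^-2", "phl-iia-0", "l-x-3"):
    print(parse_syllable(label))
```

Open syllables such as "phl-iia-0" simply have three fields, so the parser distinguishes them by field count rather than by a placeholder.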

Study of Thai tones
• Characteristics of Thai tones
  • Figure: F0 contours of the standard Thai tones (normalized duration) [Luksaneeyanawin1992]
  • สามัญ Middle (0), เอก Low (1), โท Falling (2), ตรี High (3), จัตวา Rising (4)

Study of Thai tones
• Categorizations of Thai tones
  • Abramson divided the tones into two groups:
    • static group
    • dynamic group
  • According to the final trend of the contours, the tones can also be divided into:
    • upward trend group
    • downward trend group
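The tone numbers and Abramson's constancy grouping can be encoded as a small lookup, which is what a group-based question or tree split needs. This is a sketch under one assumption: the static group is taken to be middle, low and high and the dynamic group falling and rising, as Abramson's split is usually cited. The membership of the upward and downward trend groups is not listed on these slides, so it is left out rather than guessed.

```python
from enum import IntEnum


class Tone(IntEnum):
    """Thai tone numbers as used in the transcriptions."""
    MIDDLE = 0   # สามัญ
    LOW = 1      # เอก
    FALLING = 2  # โท
    HIGH = 3     # ตรี
    RISING = 4   # จัตวา


# Assumption: Abramson's split as usually cited in the literature,
# static = middle, low, high; dynamic = falling, rising.
STATIC_TONES = frozenset({Tone.MIDDLE, Tone.LOW, Tone.HIGH})
DYNAMIC_TONES = frozenset({Tone.FALLING, Tone.RISING})


def constancy_group(tone: int) -> str:
    """Return 'static' or 'dynamic' for a tone number (0-4)."""
    t = Tone(tone)  # raises ValueError for numbers outside 0-4
    return "static" if t in STATIC_TONES else "dynamic"


print({t.name: constancy_group(t) for t in Tone})
```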

HMM-based speech synthesizer
• In 1994, K. Tokuda et al. proposed an HMM-based speech synthesizer for Japanese
• Phoneme-based speech unit modeling
• Flexible models that allow efficient adaptation:
  • Speaker adaptation
  • Speaking style conversion

Tree-based context clustering
Context clustering addresses the problem of limited training data: models of contexts with similar acoustic characteristics are tied so that they share training data, and contexts unseen in training can still be assigned a model.
• Construction of contextual factors
  • Phoneme level
    • {preceding, current, succeeding} phonetic type
    • {preceding, current, succeeding} part of syllable structure
  • Syllable level
    • {preceding, current, succeeding} tone type
    • the number of phones in the {preceding, current, succeeding} syllable
    • current phone position in the current syllable
  • Word level
    • current syllable position in the current word
    • part of speech
    • the number of syllables in the {preceding, current, succeeding} word
  • Phrase level
    • current word position in the current phrase
    • the number of syllables in the {preceding, current, succeeding} phrase
  • Utterance level
    • current phrase position in the current sentence
    • the number of syllables in the current sentence
    • the number of words in the current sentence
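A simplified sketch of how a decision tree clusters contexts: at each node, every yes/no question about the contextual factors is tried, the one that most increases the log-likelihood of the data is kept, and splitting stops when the best gain falls below a threshold, so each leaf becomes one shared model. This is the generic likelihood-gain splitting criterion, not necessarily the exact criterion used in this work. It assumes 1-D observations with one Gaussian per node, whereas real systems cluster multi-dimensional HMM state statistics; `min_gain`, `min_leaf`, `VAR_FLOOR` and the `C-Tone==k` question names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, int]
Sample = Tuple[Context, float]           # (context factors, observation)
Question = Tuple[str, Callable[[Context], bool]]

VAR_FLOOR = 1e-4  # hypothetical floor to avoid log(0) for constant nodes


def log_likelihood(values: List[float]) -> float:
    """Log-likelihood of values under their own ML 1-D Gaussian."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, VAR_FLOOR)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


@dataclass
class Node:
    samples: List[Sample]
    question: Optional[str] = None   # None means leaf (one shared model)
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    @property
    def mean(self) -> float:
        return sum(v for _, v in self.samples) / len(self.samples)


def best_split(samples: List[Sample], questions: List[Question], min_leaf: int):
    """Return (gain, name, yes, no) for the best question, or None."""
    parent = log_likelihood([v for _, v in samples])
    best = None
    for name, ask in questions:
        yes = [s for s in samples if ask(s[0])]
        no = [s for s in samples if not ask(s[0])]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue  # too little data on one side
        gain = (log_likelihood([v for _, v in yes])
                + log_likelihood([v for _, v in no]) - parent)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best


def grow(samples: List[Sample], questions: List[Question],
         min_gain: float = 1.0, min_leaf: int = 2) -> Node:
    """Greedily split until no question gains at least min_gain."""
    node = Node(samples)
    split = best_split(samples, questions, min_leaf)
    if split is None or split[0] < min_gain:
        return node
    node.question = split[1]
    node.yes = grow(split[2], questions, min_gain, min_leaf)
    node.no = grow(split[3], questions, min_gain, min_leaf)
    return node


# Hypothetical question set on the current syllable's tone type.
QUESTIONS = [(f"C-Tone=={t}", lambda c, t=t: c["tone"] == t) for t in range(5)]

# Toy data: falling-tone contexts have much higher values than middle-tone ones.
samples = ([({"tone": 2}, 200.0 + d) for d in range(4)]
           + [({"tone": 0}, 120.0 + d) for d in range(4)])
root = grow(samples, QUESTIONS)
print(root.question, root.yes.mean, root.no.mean)  # C-Tone==0 121.5 201.5
```

Questions on the other contextual factors listed above would be added to `QUESTIONS` in the same way; a larger `min_gain` yields fewer leaves, each trained on more data.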

Tree-based context clustering
• Design of decision-tree structures
  • Problem: misshapen F0 contours
  • Figure: F0 contours of (a) speech synthesized with the single-binary-tree clustering style without tone type questions and (b) natural speech

Tree-based context clustering
• Design of decision-tree structures

Tree-based context clustering
• Design of 8 context clustering styles (a)-(h)
  • Figure: the eight clustering styles; styles (e)-(h) add tone type questions
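The eight styles combine the four decision-tree structures named in the conclusions with or without tone type questions. The sketch below routes a context to a tree by its current tone and enumerates the eight combinations. Assumptions: the simple tone-separated structure uses one tree per tone type, the constancy-based structure uses the static group {middle, low, high} as usually attributed to Abramson, and the trend-based mapping must be supplied by the caller because its membership is not given on these slides; `tree_key` and `clustering_styles` are hypothetical names, and listing the styles without tone questions first is assumed to mirror (a)-(d) versus (e)-(h).

```python
from typing import Dict, List, Optional, Tuple

STRUCTURES = ("single", "simple", "constancy", "trend")
STATIC_TONES = {0, 1, 3}  # assumed: middle, low, high (Abramson's static group)


def tree_key(structure: str, tone: int,
             trend_groups: Optional[Dict[int, str]] = None) -> str:
    """Name of the tree that clusters contexts whose current tone is `tone`."""
    if structure == "single":
        return "all"                      # one binary tree for every tone
    if structure == "simple":
        return f"tone{tone}"              # assumed: one tree per tone type
    if structure == "constancy":
        return "static" if tone in STATIC_TONES else "dynamic"
    if structure == "trend":
        if trend_groups is None:
            raise ValueError("trend structure needs a tone -> upward/downward map")
        return trend_groups[tone]
    raise ValueError(f"unknown structure: {structure!r}")


def clustering_styles() -> List[Tuple[str, bool]]:
    """4 structures x {without, with} tone type questions = 8 styles."""
    return [(s, with_tone_q) for with_tone_q in (False, True) for s in STRUCTURES]


for structure in STRUCTURES[:3]:
    print(structure, sorted({tree_key(structure, t) for t in range(5)}))
```

Separating trees by tone means each tree sees fewer samples, which is one plausible reason the grouped structures (two trees) hold up better with little training data than one tree per tone.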

System Preparations
• Inputs: VAJA speech corpus (wav, label, and XML files) and ORCHID text corpus
• Preparation steps:
  • Sentence structure analysis
  • Word structure analysis
  • Full-context labeling
  • Construction of the question set for context clustering
  • Feature extraction
• Feature extraction (mcep, F0) and full-context labeling produce, for each utterance, a parameter file (.cmp), a label file (.lab), and a full-context label file (.lab)
• HMM training and synthesis produce the synthetic speech
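The parameter files (.cmp) consumed by HTK/HTS-style training are HTK-format parameter files: a 12-byte header followed by one vector per frame. The sketch below packs frames into that layout. Assumptions: a 5 ms frame shift, the USER parameter kind (code 9), float32 values, and big-endian byte order (HTK's default unless configured otherwise); what goes into each frame (mel-cepstrum, log F0, dynamic features) is left to the caller, and `encode_cmp`/`write_cmp` are hypothetical names.

```python
import struct
from typing import Sequence

HTK_USER = 9  # HTK parameter kind code for user-defined feature vectors


def encode_cmp(frames: Sequence[Sequence[float]], frame_shift_s: float = 0.005,
               big_endian: bool = True) -> bytes:
    """Pack feature frames into an HTK-format parameter file (.cmp)."""
    if not frames:
        raise ValueError("no frames")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    order = ">" if big_endian else "<"
    samp_period = round(frame_shift_s * 1e7)  # HTK time unit is 100 ns
    samp_size = 4 * dim                       # bytes per frame of float32
    # 12-byte header: nSamples, sampPeriod (int32), sampSize, parmKind (int16)
    out = [struct.pack(order + "iihh", len(frames), samp_period, samp_size, HTK_USER)]
    out += [struct.pack(f"{order}{dim}f", *f) for f in frames]
    return b"".join(out)


def write_cmp(path: str, frames: Sequence[Sequence[float]], **kwargs) -> None:
    """Write the encoded parameter file to `path`."""
    with open(path, "wb") as fh:
        fh.write(encode_cmp(frames, **kwargs))
```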

Experiments
• Evaluation of overall tone correctness
  • Figure 5: F0 contours of speech synthesized with the 8 clustering styles, and the F0 contour of natural speech

Experiments
• Evaluation of overall tone correctness
  • Figure 6: Tone error percentages of speech synthesized with 4 clustering styles

Experiments
• Evaluation of overall tone correctness
  • Figure 7: Tone error percentages of speech synthesized with the 8 clustering styles

Experiments
• Evaluation of tone correctness for each tone type
  • Figure 8: Tone error percentages of speech synthesized with the 8 clustering styles, categorized by tone type
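Tone error percentages, overall and per tone type, reduce to counting syllables whose perceived tone differs from the intended one. The sketch below shows only that arithmetic; how the perceived tones were obtained (listening test or automatic recognition) is not stated on these slides, and `tone_error_percentages` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Sequence

TONE_NAMES = ("middle", "low", "falling", "high", "rising")


def tone_error_percentages(intended: Sequence[int],
                           perceived: Sequence[int]) -> Dict[str, float]:
    """Overall and per-tone error percentages over aligned syllables."""
    if len(intended) != len(perceived) or not intended:
        raise ValueError("need two non-empty sequences of equal length")
    totals = Counter(intended)
    errors = Counter(t for t, p in zip(intended, perceived) if t != p)
    result = {"overall": 100.0 * sum(errors.values()) / len(intended)}
    for tone, n in sorted(totals.items()):
        # per-tone rate: errors on syllables whose intended tone is `tone`
        result[TONE_NAMES[tone]] = 100.0 * errors[tone] / n
    return result


print(tone_error_percentages([0, 0, 1, 2, 2, 2, 3, 4],
                             [0, 1, 1, 2, 4, 2, 3, 4]))
```

Per-tone rates are normalized by the number of syllables of that tone, so tones that occur rarely in the test text are not hidden by the overall figure.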

Experiments
• Evaluation of syllable duration distortion
  • Figure 9: Scores of a paired-comparison test on the naturalness of duration among 4 clustering styles
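In a paired-comparison test, listeners hear two versions of an utterance and pick the one with more natural duration. One common way to turn those choices into a per-style score is the percentage of presentations in which the style was preferred; the sketch below assumes that scoring rule, which may differ from the one behind Figure 9, and `preference_scores` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def preference_scores(judgments: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Percentage of presentations in which each style was preferred.

    Each judgment is (style_a, style_b, winner), with winner in {style_a, style_b}.
    """
    wins: Counter = Counter()
    shown: Counter = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} not in pair ({a!r}, {b!r})")
        shown[a] += 1
        shown[b] += 1
        wins[winner] += 1
    return {style: 100.0 * wins[style] / shown[style] for style in shown}


print(preference_scores([("a", "b", "a"), ("a", "c", "a"),
                         ("b", "c", "c"), ("a", "b", "b")]))
```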

Examples of synthesized speech

Conclusions
• This paper analyzed tree-based context clustering for an HMM-based Thai speech synthesis system.
• Four decision-tree structures were designed according to tone groups and tone types to improve the tone correctness of synthesized speech.
• The results show that the tone-separated tree structures significantly reduce the tone error percentage of synthesized speech compared with the single binary tree structure.
• Using contextual tone information at the syllable level improves tone correctness for all decision-tree structures.
• Some syllable duration distortion appears when the simple tone-separated tree is used with a small amount of training data; it is alleviated by the constancy-based or trend-based tone-separated trees.
• Future work includes analyzing the tone correctness of the average-voice-based speech model and studying intonation.