
Design of Tree-based Context Clustering for an HMM-based Thai Speech Synthesis System

Design of Tree-based Context Clustering for an HMM-based Thai Speech Synthesis System. Mr. Suphattharachai Chomphan. Tokyo Institute of Technology. 23 August 2007. Outlines. Objectives Study of Thai tones. Characteristics of Thai tones Categorizations of Thai tones.


Presentation Transcript


  1. Design of Tree-based Context Clustering for an HMM-based Thai Speech Synthesis System • Mr. Suphattharachai Chomphan • Tokyo Institute of Technology • 23 August 2007

  2. Outline • Objectives • Study of Thai tones • Characteristics of Thai tones • Categorizations of Thai tones • Tree-based context clustering • Construction of contextual factors • Design of decision-tree structures • Design of context clustering styles • Experiments • Evaluation of overall tone correctness • Evaluation of tone correctness for each tone type • Evaluation of syllable duration distortion • Conclusions

  3. Objectives • To implement an HMM-based speech synthesis system for the Thai language with the highest possible tone correctness.

  4. Study of Thai tones • Syllable structure [Nakasakul2002] • Thai: a tonal language • Characteristics of Thai tones
     • เครียด khr-ia-t^-2 (stress)
     • เคร่ง khr-e-ng^-2 (strict)
     • เพลีย phl-iia-0 (exhausted)
     • เรื่อย r-va-j^-2 (always)
     • เสีย s-iia-4 (spoil)
     • รัก r-a-k^-3 (love)
     • และ l-x-3 (and)
     • ปริ pr-i-1 (break)
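The transcriptions on this slide appear to follow the pattern initial-vowel-(final^)-tone, with the tone written as a digit from 0 to 4. The following is a minimal sketch of a parser for that pattern. The notation is inferred from the eight examples only, and `Syllable` and `parse_syllable` are hypothetical names, not part of the original system.

```python
from typing import NamedTuple, Optional

# Tone digits as used in this deck: 0 mid, 1 low, 2 falling, 3 high, 4 rising.
TONE_NAMES = {0: "mid", 1: "low", 2: "falling", 3: "high", 4: "rising"}


class Syllable(NamedTuple):
    initial: str
    vowel: str
    final: Optional[str]  # None when the syllable has no final consonant
    tone: int


def parse_syllable(text: str) -> Syllable:
    """Parse a transcription such as 'khr-ia-t^-2' or 'phl-iia-0'.

    Assumption: fields are separated by '-', the last field is the tone
    digit, and an optional final consonant is marked with a trailing '^'.
    """
    parts = text.split("-")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected transcription: {text!r}")
    tone = int(parts[-1])
    if tone not in TONE_NAMES:
        raise ValueError(f"unknown tone digit in {text!r}")
    final = parts[2].rstrip("^") if len(parts) == 4 else None
    return Syllable(parts[0], parts[1], final, tone)


if __name__ == "__main__":
    for word in ["khr-ia-t^-2", "phl-iia-0", "r-a-k^-3"]:
        syl = parse_syllable(word)
        print(word, "->", syl, TONE_NAMES[syl.tone])
```

Keeping the tone as a separate field makes it easy to use later as a syllable-level contextual factor.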

  5. Study of Thai tones • Characteristics of Thai tones • F0 contours of Standard Thai tones (normalized duration) [Luksaneeyanawin1992]: สามัญ Middle (0), เอก Low (1), โท Falling (2), ตรี High (3), จัตวา Rising (4)

  6. Study of Thai tones • Categorizations of Thai tones • Abramson divided the tones into two groups: • static group • dynamic group • According to the final trend of contours: • upward trend group • downward trend group

  7. HMM-based speech synthesizer • In 1994, K. Tokuda et al. proposed an HMM-based speech synthesizer for Japanese • Phoneme-based speech unit modeling • Provides flexible models and efficient adaptation • Speaker adaptation • Speaking style conversion

  8. Tree-based context clustering • Context clustering addresses the problem of limited training data. • Construction of contextual factors
     • Phoneme level
       • {preceding, current, succeeding} phonetic type
       • {preceding, current, succeeding} part of syllable structure
     • Syllable level
       • {preceding, current, succeeding} tone type
       • the number of phones in {preceding, current, succeeding} syllable
       • current phone position in current syllable
     • Word level
       • current syllable position in current word
       • part of speech
       • the number of syllables in {preceding, current, succeeding} word
     • Phrase level
       • current word position in current phrase
       • the number of syllables in {preceding, current, succeeding} phrase
     • Utterance level
       • current phrase position in current sentence
       • the number of syllables in current sentence
       • the number of words in current sentence
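To make the idea concrete, here is a minimal sketch of how contextual factors can drive a decision-tree split: each node asks a yes/no question about the context and sends every sample to one of two children. Only a handful of the factors listed above are modeled. The `Context` fields, the question names, and the `split` helper are all hypothetical, and the criterion used to choose the best question at each node is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Context:
    """A small, illustrative subset of the contextual factors."""
    phonetic_type: str           # e.g. "vowel" or "consonant" (assumed values)
    prev_tone: Optional[int]     # None at an utterance boundary
    cur_tone: int                # 0 mid, 1 low, 2 falling, 3 high, 4 rising
    next_tone: Optional[int]
    phone_pos_in_syllable: int   # 1-based position


Question = Callable[[Context], bool]

# Hypothetical question set; the tone type questions are the ones that a
# clustering style may include or leave out.
QUESTIONS: Dict[str, Question] = {
    "C-Type==vowel": lambda c: c.phonetic_type == "vowel",
    "C-Tone==falling": lambda c: c.cur_tone == 2,
    "L-Tone==rising": lambda c: c.prev_tone == 4,
    "Pos-in-Syl==1": lambda c: c.phone_pos_in_syllable == 1,
}


def split(samples: List[Context], question: Question) -> Tuple[List[Context], List[Context]]:
    """Partition samples into (yes, no) lists according to one question."""
    yes = [s for s in samples if question(s)]
    no = [s for s in samples if not question(s)]
    return yes, no


if __name__ == "__main__":
    data = [
        Context("vowel", None, 2, 0, 2),
        Context("consonant", 2, 0, 4, 1),
        Context("vowel", 4, 3, None, 2),
    ]
    yes, no = split(data, QUESTIONS["C-Tone==falling"])
    print(len(yes), "yes /", len(no), "no")
```

Samples that end up in the same leaf share one set of model parameters, which is how rare contexts borrow statistics from similar ones.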

  9. Tree-based context clustering • Design of decision-tree structures • Problem of misshaped F0 contour [Figure: F0 contours of (a) synthesized speech from the single-binary-tree clustering style without tone type questions and (b) natural speech.]

  10. Tree-based context clustering • Design of decision-tree structures

  11. Tree-based context clustering • Design of 8 context clustering styles (a)-(h) [Figure: styles (e), (f), (g), and (h) add tone type questions.]
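The eight styles can be read as four tree structures, each used with or without tone type questions. The sketch below enumerates them and shows how a tone-separated structure would partition training syllables before a separate tree is grown for each partition. The structure names follow the Conclusions slide. The static/dynamic membership follows Abramson's usual grouping, the upward/downward membership is an assumption not stated in the slides, and no mapping from structures to the letters (a)-(h) is implied.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, List

# Tone digits: 0 mid, 1 low, 2 falling, 3 high, 4 rising.
CONSTANCY = {0: "static", 1: "static", 3: "static", 2: "dynamic", 4: "dynamic"}
# Assumption: final F0 trend per tone; not given explicitly in the slides.
TREND = {0: "downward", 1: "downward", 2: "downward", 3: "upward", 4: "upward"}

# Each structure maps a tone to the key of the tree that models it.
ROOT_KEY: Dict[str, Callable[[int], Hashable]] = {
    "single binary tree": lambda tone: "all",
    "simple tone-separated": lambda tone: tone,
    "constancy-based tone-separated": lambda tone: CONSTANCY[tone],
    "trend-based tone-separated": lambda tone: TREND[tone],
}


def partition(tones: List[int], structure: str) -> Dict[Hashable, List[int]]:
    """Group sample indices by the tree that would model them."""
    key = ROOT_KEY[structure]
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for index, tone in enumerate(tones):
        groups[key(tone)].append(index)
    return dict(groups)


# 4 structures x {without, with} tone type questions = 8 clustering styles.
STYLES = list(product(ROOT_KEY, (False, True)))

if __name__ == "__main__":
    for structure, with_tone_questions in STYLES:
        print(structure, "| tone type questions:", with_tone_questions)
    print(partition([0, 2, 4, 1, 3], "constancy-based tone-separated"))
```

Splitting by tone group rather than by individual tone leaves more training data per tree, which may be why the grouped structures are reported to reduce the duration distortion seen with the simple tone-separated structure.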

  12. System Preparations [Diagram] • Corpora: VAJA speech corpus, ORCHID text corpus • Input files: wav file, label file, XML file • Preparation steps: sentence structure analysis, word structure analysis, full context labeling, construction of question set for context clustering, feature extraction • Feature extraction (mcep, f0) and full context labeling produce: parameter file (.cmp), label file (.lab), full context label file (.lab) • HMM training and synthesis • Output: synthetic speech

  13. Experiments • Evaluation of overall tone correctness Figure 5: F0 contours of synthesized speech from 8 different clustering styles, and F0 contour of natural speech.

  14. Experiments • Evaluation of overall tone correctness Figure 6: Tone error percentages of synthesized speech from 4 different clustering styles

  15. Experiments • Evaluation of overall tone correctness Figure 7: Tone error percentages of synthesized speech from 8 different clustering styles

  16. Experiments • Evaluation of tone correctness for each tone type Figure 8: Tone error percentages of synthesized speech from 8 different clustering styles, categorized by tone type.
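The slides do not say how the tone error percentages were measured. As a hedged illustration, the sketch below assumes each synthesized syllable has an intended tone and a perceived tone (for example from a listening test) and reports the share of mismatches overall and per intended tone. The function name and the input format are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tone_error_percentages(
    intended: List[int], perceived: List[int]
) -> Tuple[float, Dict[int, float]]:
    """Return (overall error %, error % per intended tone).

    Assumption: an error is any syllable whose perceived tone differs
    from its intended tone; tones are digits 0-4.
    """
    if not intended or len(intended) != len(perceived):
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(intended)
    errors = Counter(i for i, p in zip(intended, perceived) if i != p)
    overall = 100.0 * sum(errors.values()) / len(intended)
    per_tone = {tone: 100.0 * errors[tone] / count for tone, count in totals.items()}
    return overall, per_tone


if __name__ == "__main__":
    intended = [0, 0, 1, 2, 2, 2, 3, 4]
    perceived = [0, 1, 1, 2, 2, 4, 3, 4]
    overall, per_tone = tone_error_percentages(intended, perceived)
    print(f"overall: {overall:.1f}%")
    for tone in sorted(per_tone):
        print(f"tone {tone}: {per_tone[tone]:.1f}%")
```

Reporting the per-tone figures alongside the overall figure shows whether a clustering style helps every tone or only some of them.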

  17. Experiments • Evaluation of syllable duration distortion Figure 9: Scores of a paired-comparison test for natural duration among 4 different clustering styles.

  18. Examples of synthesized speech

  19. Conclusions • This work analyzed tree-based context clustering for an HMM-based Thai speech synthesis system. • Four decision-tree structures were designed according to tone groups and tone types to improve the tone correctness of synthesized speech. • The results show that the tone-separated tree structures significantly reduce the tone error percentage of synthesized speech compared with the single binary tree structure. • Using contextual tone information at the syllable level improves tone correctness for all decision-tree structures. • Some distortion of syllable duration appears when the simple tone-separated tree context clustering is used with a small amount of training data; it is relieved by using the constancy-based-tone-separated or the trend-based-tone-separated tree context clustering. • Analysis of the tone correctness of the average-voice-based speech model and intonation analysis are left for future study.
