Data-driven approach to rapid prototyping Xhosa speech synthesis

Presentation Transcript

  1. Data-driven approach to rapid prototyping Xhosa speech synthesis Albert Visagie Justus Roux Centre for Language and Speech Technology Stellenbosch University South Africa

  2. Introduction • Japan-South African Intergovernmental Science and Technology Cooperation Programme. • Goals: • Understand what is needed from a linguistic and technology standpoint. • Build a text-analysis front-end. • Experimental platform.

  3. Outline • Xhosa: • orthography, • phonetics, • tone • Approach: • Text analysis, • HTS.

  4. Xhosa • Xhosa is spoken in South Africa by about 8 million people. • It is one of the official languages of South Africa. • The writing system is relatively young and based on English letters. • Many dialects. • Borrowed clicks from Khoisan languages.

  5. Xhosa: Orthography Agglutinative language. Nouns: • 15 classes (including plural & singular). • Nouns are affixed for the diminutive. Verbs: • Verbs are affixed according to subject, tense, negation, etc. Examples: • teach: -fund- • preacher (teacher): umfundisi → u + m(u) + fund + is + i • small preacher: umfundisana → u + m(u) + fund + is + ana • He/she will teach them: uzakubafundisa → u + za + ku + ba + fund + is + a
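
The decompositions on this slide can be checked mechanically. The sketch below is only an illustration, not the project's morphological analyser: it assumes that a parenthesised segment such as the `(u)` in `m(u)` marks underlying material that does not appear in the written word, and the function name `join_morphemes` is hypothetical.

```python
import re


def join_morphemes(analysis: str) -> str:
    """Rebuild the written word from a 'a + b(c) + d' style analysis.

    Assumption: anything in parentheses is underlying material that
    is dropped in the surface (written) form.
    """
    parts = [part.strip() for part in analysis.split("+")]
    surface = [re.sub(r"\(.*?\)", "", part) for part in parts]
    return "".join(surface)


# The three examples from the slide.
for analysis in (
    "u + m(u) + fund + is + i",
    "u + m(u) + fund + is + ana",
    "u + za + ku + ba + fund + is + a",
):
    print(analysis, "->", join_morphemes(analysis))
```

Going the other way (from written word to morphemes) is the hard, ambiguous direction that a real analyser has to solve.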

  6. Xhosa: Phonetics Consonants: • Implosive /b/. • Ejective and aspirated versions of stops. • 15 clicks. Vowels: • Five basic vowels, including long versions.

  7. Xhosa: Tone • According to the literature, Xhosa is a tone language. • High, Low, and Falling tones. • A recent dictionary has tone marked for root morphemes; rules can be constructed to predict tone movement under morphological composition. • Recent work: • Downing and Roux argue for accent. • Kuun: a statistical experiment suggests highly regular structure. • The observed regularity in pitch rises and duration increases gives a simple method to use in a first prototype.

  8. Approach Focus on language-dependent components: • Build the text analyser, • use an existing synthesiser. Choice: HTS 2.0 • Model-driven, trainable synthesiser. • Contains language-independent F0 and duration models. • Makes good use of the synthesis database by predicting spectrum, F0 and segment duration separately.

  9. HTS

  10. HTS: Symbolic Features Each segment of audio (HMM state) is labelled according to its linguistic context Examples: • Phonetic context: labels of preceding and following phones. • Parts-of-speech. • Stress or canonical tone. • Counting.
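
As a rough illustration of the phonetic-context feature, the sketch below attaches the preceding and following phone to each phone in a sequence. The `left-current+right` layout and the `sil` padding symbol are simplifying assumptions made for this example; real HTS full-context labels carry many more fields (position counts, stress or tone, part of speech), and `context_labels` is a hypothetical name.

```python
def context_labels(phones, pad="sil"):
    """Return one 'left-current+right' label per phone.

    Assumption: utterance edges are padded with a silence symbol.
    """
    padded = [pad] + list(phones) + [pad]
    labels = []
    for i in range(1, len(padded) - 1):
        labels.append(f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}")
    return labels


print(context_labels(["l", "a", "l", "i"]))
```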

  11. Text Analyser Components Components: • Orthographic to phonetic • Morphological analysis • Parts-of-speech • Canonical tone marks

  12. Orthographic to Phonetic • The orthography is very young, and highly consistent with the pronunciation. • Hand-written letter-to-sound rewrite rules. • Lexicon for loan words.
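
Hand-written rewrite rules of this kind are often applied greedily, longest match first. The sketch below shows that mechanism only: the rule table is a tiny illustrative subset chosen for this example (aspirated stops written as digraphs, and the click letters c, q, x), the phone names on the right-hand side are made-up labels, and none of it is taken from the project's actual rule set.

```python
# Illustrative subset only; phone names are hypothetical labels.
RULES = {
    "ph": "p_h",            # aspirated stops written as digraphs
    "th": "t_h",
    "kh": "k_h",
    "c": "click_dental",    # click letters
    "q": "click_alveolar",
    "x": "click_lateral",
}


def letter_to_sound(word):
    """Greedy longest-match rewrite; unknown letters map to themselves."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for length in (2, 1):          # try digraphs before single letters
            chunk = word[i:i + length]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            phones.append(word[i])     # default: letter is its own phone
            i += 1
    return phones


print(letter_to_sound("waqalisa"))
```

Loan words that break the regular spelling would bypass these rules through a lexicon lookup before the rules are applied.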

  13. Morphology • Specially bootstrapped from a Zulu version for this project. • Requires a lexicon of root morphemes. • Works with isolated words. • Ambiguous! • Ideal: root morpheme boundaries, affix types, POS tagger for disambiguation. • Implemented: None

  14. Parts-of-Speech • Morphological analysis. • Ideal: POS tagger. • Implemented: Exhaustive lists of closed sets – pronouns, conjunctions, prepositions, etc.
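
A closed-set lookup of the kind described as implemented can be sketched as a plain dictionary. The entries and tag names below are illustrative placeholders chosen for this example, not the project's lists, and should be checked against a proper lexicon; every word not in the table falls into a single hypothetical `OPEN` class.

```python
# Illustrative entries only; a real system would list each closed
# class exhaustively.
CLOSED_CLASS = {
    "kwaye": "CONJ",   # and
    "okanye": "CONJ",  # or
    "kodwa": "CONJ",   # but
    "mna": "PRON",     # I / me
    "wena": "PRON",    # you
}


def tag_tokens(tokens):
    """Tag closed-class words by lookup; everything else is 'OPEN'."""
    return [(tok, CLOSED_CLASS.get(tok.lower(), "OPEN")) for tok in tokens]


print(tag_tokens(["Kodwa", "amadoda", "okanye", "wena"]))
```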

  15. Tone • A printed dictionary with canonical tone markings for root morphemes is available. • Rules can be constructed to determine movement of at least High tones, under morphological composition. • Highly regular structure: 3rd-from-last syllable starts high pitch excursion, 2nd-from-last syllable lengthened. • Ideal: Exhaustive specification of set tones • Implemented: Word-level syllable counts (3-1, 2-2, 1-3)
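
One possible reading of the implemented feature is that each syllable is labelled with its position counted from the start and from the end of the word, so a three-syllable word yields 1-3, 2-2, 3-1. The sketch below follows that reading, which is an assumption, along with a deliberately naive syllabifier: every vowel letter closes a syllable, so syllabic nasals and long vowels are not handled. The names `syllabify` and `position_features` are hypothetical.

```python
import re


def syllabify(word):
    """Naive split: zero or more consonant letters plus one vowel."""
    return re.findall(r"[^aeiou]*[aeiou]", word.lower())


def position_features(word):
    """Return (syllable, index_from_start, index_from_end, note)."""
    syllables = syllabify(word)
    total = len(syllables)
    features = []
    for i, syl in enumerate(syllables, start=1):
        from_end = total - i + 1
        if from_end == 3:
            note = "pitch rise starts"   # 3rd-from-last syllable
        elif from_end == 2:
            note = "lengthened"          # 2nd-from-last syllable
        else:
            note = ""
        features.append((syl, i, from_end, note))
    return features


for row in position_features("amadoda"):
    print(row)
```

With these two counts in the context labels, the data-driven models can pick up the regular pitch and duration pattern without explicit tone marks.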

  16. Tests • Basic intelligibility test: listeners were asked to transcribe what they heard. • Incomplete phrases. • Two versions of the question set, and natural utterances (recoded). • Mother-tongue and second-language speakers. • Impressions: • “He’s from the townships.” • “That’s perfect, there’s nothing wrong with that.” • Also frowns and requests to repeat.

  17. Next Steps • Comprehension test? • Impressions. • Baseline comparative/preference test. • Improvements • Question phrases. • Information from morphological analysis. • Canonical tone markings. • Zulu

  18. Conclusion • The system worked very well, considering the bare minimum of linguistic knowledge currently incorporated. • A data-driven approach with HTS is well suited to bootstrapping a new language. • An experimental platform is now in place.

  19. Demos • “Ubangele amadoda amaninzi kule lali.” (natural and synthesised versions) • “Waqalisa ukunqwenela ukuba nomzi.” (natural and synthesised versions) • Click song.