Data-driven approach to rapid prototyping Xhosa speech synthesis

Albert Visagie, Justus Roux. Centre for Language and Speech Technology, Stellenbosch University, South Africa

Introduction • Japan-South African Intergovernmental Science and Technology Cooperation Programme. • Goals: • Understand what is needed from a linguistic and technology standpoint. • Build a text-analysis front-end. • Experimental platform.

Outline • Xhosa: • orthography, • phonetics, • tone • Approach: • Text analysis, • HTS.

Xhosa • Xhosa is spoken in South Africa, by about 8 million people. • One of the official languages of South Africa • Writing system is relatively young, and based on English letters. • Many dialects. • Borrowed clicks from Khoisan.

Xhosa: Orthography Agglutinative language. Nouns: • 15 classes (including plural & singular). • Nouns affixed for diminutive. Verbs: • Verbs affixed according to subject, tense, negative etc. Examples:
teach: -fund-
preacher (teacher): umfundisi = u + m(u) + fund + is + i
small preacher: umfundisana = u + m(u) + fund + is + ana
He/she will teach them: uzakubafundisa = u + za + ku + ba + fund + is + a
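The decompositions on this slide can be checked mechanically. The sketch below is not part of the presented system; it only joins the listed morphemes back into a surface form, assuming that a parenthesised segment such as the (u) in m(u) is dropped. The function name `compose` is hypothetical.

```python
import re

def compose(morphemes):
    """Join morphemes into a surface form.

    Assumption: a parenthesised segment, e.g. the (u) in "m(u)",
    is not realised in the surface form and is simply dropped.
    """
    return "".join(re.sub(r"\([^)]*\)", "", m) for m in morphemes)

# The three decompositions given on the slide.
EXAMPLES = {
    "umfundisi": ["u", "m(u)", "fund", "is", "i"],
    "umfundisana": ["u", "m(u)", "fund", "is", "ana"],
    "uzakubafundisa": ["u", "za", "ku", "ba", "fund", "is", "a"],
}

if __name__ == "__main__":
    for surface, parts in EXAMPLES.items():
        print(surface, "<-", " + ".join(parts), "|", compose(parts) == surface)
```

Going the other way (surface form to morphemes) is the hard, ambiguous direction, which is what a morphological analyser has to solve.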

Xhosa: Phonetics Consonants: • Implosive /b/ • Ejectives and aspirated versions of stops. • 15 Clicks Vowels • Five basic vowels, including long versions.

Xhosa: Tone • According to the literature, it’s a tone language. • High, Low, and Falling tones. • Recent dictionary: has tone marked for root morphemes, rules can be constructed to predict movement under morphological composition. • Recent work: • Downing, Roux, argue for accent. • Kuun: Statistical experiment suggests highly regular structure. • Observed regularity on pitch rises and duration increase gives a simple method to use in a first prototype.

Approach Focus on language dependent components: • Build the text analyser, • use an existing synthesiser. Choice: HTS 2.0 • Model driven, trainable synthesiser. • Contains language independent F0 and duration models • Good use of synthesis database by predicting spectrum, F0 and segment duration separately.

HTS

HTS: Symbolic Features Each segment of audio (HMM state) is labelled according to its linguistic context Examples: • Phonetic context: labels of preceding and following phones. • Parts-of-speech. • Stress or canonical tone. • Counting.
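As a rough illustration of what such symbolic features look like, the sketch below builds one context label per phone from its neighbours, a part-of-speech tag and forward/backward position counts. The label layout (`prev-cur+next/POS:.../POSN:...`), the function name `context_labels` and the `sil` boundary symbol are assumptions made for this example; they are only loosely modelled on the style of HTS context labels and are not the format used in the presented system.

```python
def context_labels(phones, pos_tag="x", boundary="sil"):
    """Return one context-dependent label string per phone.

    Hypothetical label layout: prev-cur+next/POS:tag/POSN:fwd_bwd
    where fwd and bwd count the phone's position from the start
    and from the end of the sequence (both 1-based).
    """
    padded = [boundary] + list(phones) + [boundary]
    n = len(phones)
    labels = []
    for i in range(1, n + 1):
        prev_p, cur, next_p = padded[i - 1], padded[i], padded[i + 1]
        fwd = i            # position counted from the start
        bwd = n - i + 1    # position counted from the end
        labels.append(f"{prev_p}-{cur}+{next_p}/POS:{pos_tag}/POSN:{fwd}_{bwd}")
    return labels

if __name__ == "__main__":
    for label in context_labels(["l", "a", "l", "i"], pos_tag="noun"):
        print(label)
```

In a trainable synthesiser, labels of this kind are what the decision-tree questions are asked about, so every feature the text analyser can supply becomes a possible clustering criterion.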

Text Analyser Components Components: • Orthographic to phonetic • Morphological analysis • Parts-of-speech • Canonical tone marks

Orthographic to Phonetic • The orthography is very young, and highly consistent with the pronunciation. • Hand-written letter-to-sound rewrite rules. • Lexicon for loan words.
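A minimal sketch of hand-written letter-to-sound rewriting, assuming a greedy longest-match strategy over a rule table. The table here is a tiny illustrative subset (the plain and aspirated clicks written c, q, x, ch, qh, xh, and the aspirated stops ph, th, kh), not the authors' rule set; letters without a rule pass through unchanged, and the names `RULES` and `to_phones` are hypothetical.

```python
# Illustrative subset only: grapheme -> approximate IPA symbol.
RULES = {
    "ch": "ǀʰ", "qh": "ǃʰ", "xh": "ǁʰ",   # aspirated clicks
    "c": "ǀ", "q": "ǃ", "x": "ǁ",          # plain clicks
    "ph": "pʰ", "th": "tʰ", "kh": "kʰ",    # aspirated stops
}

# Try longer graphemes first so that "xh" wins over "x".
_KEYS = sorted(RULES, key=len, reverse=True)

def to_phones(word):
    """Rewrite an orthographic word into a list of phone symbols."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for key in _KEYS:
            if word.startswith(key, i):
                phones.append(RULES[key])
                i += len(key)
                break
        else:
            # No rule matched: assume the letter maps to itself.
            phones.append(word[i])
            i += 1
    return phones

if __name__ == "__main__":
    print(to_phones("Xhosa"))
```

Loan words that do not follow the native spelling conventions would bypass these rules through a lexicon lookup, as the slide notes.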

Morphology • Specially bootstrapped from a Zulu version for this project. • Requires a lexicon of root morphemes. • Works with isolated words. • Ambiguous! • Ideal: root morpheme boundaries, affix types, POS tagger for disambiguation. • Implemented: None

Parts-of-Speech • Morphological analysis. • Ideal: POS tagger. • Implemented: Exhaustive lists of closed sets – pronouns, conjunctions, prepositions, etc.
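The implemented approach amounts to a lookup in closed word lists. A sketch under stated assumptions: the word lists below are a few illustrative entries chosen for this example, not the exhaustive lists used in the system, and the tag names and the `tag` function are hypothetical.

```python
# Tiny illustrative samples; a real system would use exhaustive lists.
CLOSED_CLASSES = {
    "conjunction": {"kodwa", "okanye", "ngoba"},
    "pronoun": {"mna", "wena", "yena", "thina"},
}

def tag(word, default="open"):
    """Return the closed-class tag for a word, or `default` if it is in no list."""
    w = word.lower()
    for tag_name, members in CLOSED_CLASSES.items():
        if w in members:
            return tag_name
    return default

if __name__ == "__main__":
    for w in ["kodwa", "Wena", "umfundisi"]:
        print(w, tag(w))
```

Everything outside the lists falls into a single open class, which is why a real POS tagger or morphological analysis is listed as the ideal.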

Tone • A printed dictionary with canonical tone markings for root morphemes is available. • Rules can be constructed to determine movement of at least High tones, under morphological composition. • Highly regular structure: 3rd-from-last syllable starts high pitch excursion, 2nd-from-last syllable lengthened. • Ideal: Exhaustive specification of set tones • Implemented: Word-level syllable counts (3-1, 2-2, 1-3)
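One reading of the "3-1, 2-2, 1-3" notation is that each syllable of a word is labelled with its position counted from the end and from the start, so a three-syllable word yields exactly those three labels. The sketch below follows that reading, which is an assumption. Its syllabifier is deliberately crude: every vowel letter is taken to close a syllable, so syllabic nasals and long vowels are not handled. All names are hypothetical.

```python
import re

VOWELS = "aeiou"

def syllabify(word):
    """Split a word into open syllables: consonants followed by one vowel.

    Simplification: syllabic nasals and long vowels are not treated specially.
    """
    word = word.lower()
    sylls = re.findall(rf"[^{VOWELS}]*[{VOWELS}]", word)
    tail = word[sum(len(s) for s in sylls):]
    if tail:
        # Attach trailing consonants to the last syllable (or keep them alone).
        if sylls:
            sylls[-1] += tail
        else:
            sylls = [tail]
    return sylls

def position_features(word):
    """Label each syllable with 'from_end-from_start' counts and prosody flags."""
    sylls = syllabify(word)
    n = len(sylls)
    feats = []
    for i, syll in enumerate(sylls, start=1):
        from_end = n - i + 1
        feats.append({
            "syllable": syll,
            "count": f"{from_end}-{i}",
            "pitch_rise_start": from_end == 3,  # 3rd-from-last syllable
            "lengthened": from_end == 2,        # 2nd-from-last syllable
        })
    return feats

if __name__ == "__main__":
    for f in position_features("amadoda"):
        print(f)
```

With counts like these in the context labels, the trained F0 and duration models can pick up the regular pitch rise and lengthening from data, without explicit tone rules.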

Tests • Basic intelligibility test: listeners asked to transcribe what they hear. • Incomplete phrases. • Two versions of the question set, and natural utterances (recoded) • Mother-tongue and second language speakers. • Impressions: • “He’s from the townships.” • “That’s perfect, there’s nothing wrong with that.” • Also frowns and repeats.

Next Steps • Comprehension test? • Impressions. • Baseline comparative/preference test. • Improvements • Question phrases. • Information from morphological analysis. • Canonical tone markings. • Zulu

Conclusion • The system worked very well, considering the bare minimum of knowledge currently incorporated. • Data-driven approach with HTS is well suited to bootstrapping a new language. • An experimental platform is now in place.

Demos “Ubangele amadoda amaninzi kule lali,” • Natural: • Synthesised: “waqalisa ukunqwenela ukuba nomzi.” • Natural: • Synthesised: Click song:
