This document outlines the process of building a diphone voice for Catalan. It details the phoneset, which includes 34 Catalan and 2 Spanish phones, the generation of the diphone schema, the mapping of Catalan phones to English, and the recording and labeling of the prompts. The project faced challenges such as ambient noise during recording and variations in power, pitch and duration. Labeling combined automatic tools with manual correction, pitchmark and LPC extraction were automated, and the result was checked by testing phone synthesis. The document also covers the added tokenization and lexical entries, while phrasing, duration and intonation were still unfinished.
Building a Catalan diphone voice Ariadna Font Llitjos May 10, 2001
Defining the phoneset • Most Catalan phones (34) plus 2 Spanish phones (th and jj) • Reason: all Catalan speakers also have the Spanish phones, and many Spanish loanwords are part of most Catalan speakers’ lexicons • Left out phones that would need a much finer classification than the one used for English phones (beta, gamma, etc.)
Generating Diphone Schema • Mostly the same as for Spanish, but with the new set of phones • Catalan has 8 vowels (without considering stress), whereas Spanish has only 5 -> had to add a level of vheight (high, mid-high, mid-low, low) (draw graph on the board) • Mapping Catalan phones to a predefined set of phones • Overgenerative: the voice is better suited to pronouncing foreign or nonsense words that contain phones of the language in combinations that are not legal in Catalan
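Why an exhaustive schema overgenerates can be seen with a minimal sketch that simply takes every ordered pair of phones, plus transitions to and from silence (`pau`). This is a simplification, not the festvox schema generator: the real schema was adapted from the Spanish one and is more selective. The function name `all_diphones` and the toy phoneset are hypothetical.

```python
from itertools import product

def all_diphones(phones, silence="pau"):
    """Return every ordered phone pair as 'X-Y', including pairs with silence.

    Deliberately overgenerative: no phonotactic filtering is applied, so
    pairs that never occur inside a Catalan word are kept as well.
    """
    inventory = [silence] + list(phones)
    return [
        f"{a}-{b}"
        for a, b in product(inventory, repeat=2)
        if not (a == silence and b == silence)  # pau-pau carries no transition
    ]

# Hypothetical toy phoneset, far smaller than the real 36-phone set.
toy = ["a", "e", "p"]
schema = all_diphones(toy)
print(len(schema))  # (3 + 1)**2 - 1 = 15
```

With 36 phones plus `pau`, this exhaustive approach would ask for 37² − 1 = 1368 units. Pairs such as `p-p` are included even though they may never occur within a word, and this is exactly what makes the voice able to say odd foreign or nonsense words.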
Mapping Catalan phones to a predefined set of phones • Options: Spanish and English • My choice: English • Reasons: • English has more vowel phones, so it is more appropriate than Spanish • Spanish phones have already been mapped to English phones, so it is better to map Catalan phones directly to English rather than indirectly through Spanish
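The vowel argument for a direct mapping can be made concrete. The sketch below compares the 8 Catalan vowels mapped straight to English with the same vowels routed through a 5-vowel Spanish set. All three tables are hypothetical illustrations, not the tables used in the project. Catalan vowels use SAMPA-like symbols (`E` open e, `O` open o, `@` schwa), and English targets use Festival radio-style phone names.

```python
# Hypothetical mapping tables, for illustration only.
CAT_TO_ENG = {"i": "iy", "e": "ey", "E": "eh", "a": "aa",
              "O": "ao", "o": "ow", "u": "uw", "@": "ax"}

# Indirect route: Catalan -> Spanish (5 vowels) -> English.
CAT_TO_SPA = {"i": "i", "e": "e", "E": "e", "a": "a",
              "O": "o", "o": "o", "u": "u", "@": "a"}
SPA_TO_ENG = {"i": "iy", "e": "ey", "a": "aa", "o": "ow", "u": "uw"}

def map_direct(phones):
    """Map Catalan phones straight to English phones."""
    return [CAT_TO_ENG[p] for p in phones]

def map_via_spanish(phones):
    """Map Catalan phones to Spanish first, then Spanish to English."""
    return [SPA_TO_ENG[CAT_TO_SPA[p]] for p in phones]

vowels = list(CAT_TO_ENG)
print(len(set(map_direct(vowels))))       # 8 distinct English targets
print(len(set(map_via_spanish(vowels))))  # 5: e/E, o/O and a/@ collapse
```

Going through Spanish throws away the mid-high/mid-low distinction before English is ever reached. The direct table keeps all 8 vowels apart.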
Generating and recording the prompts • 1109 prompts (recorded on festvox0) • Lots of room noise (typing, door, talking, etc.) • Microphone not always in the same position • Power, and even intonation and duration, varied throughout the whole recording process
Labeling nonsense words • Automatically: • make_labs • make_diph_index • Manually: • Find diphones that are wrongly labeled and look them up in dic/afldiph.est • Edit and correct the corresponding label file with emulabel • Rerun make_diph_index (etc.)
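One way to build the list of candidate bad diphones is to scan the diphone index for implausible boundaries before opening files in emulabel. The sketch assumes that the index has header lines ending in `EST_Header_End`, followed by one `diphone fileid start mid end` entry per line (times in seconds). The thresholds and file ids are hypothetical.

```python
def parse_diphone_index(text):
    """Parse an EST-style diphone index (assumed layout, see lead-in)."""
    entries, in_body = [], False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            in_body = line == "EST_Header_End"
            continue
        if line:
            name, fileid, start, mid, end = line.split()
            entries.append((name, fileid, float(start), float(mid), float(end)))
    return entries

def suspicious(entries, min_half=0.01, max_total=0.5):
    """Flag entries with out-of-order boundaries or implausible durations.

    min_half and max_total are arbitrary thresholds in seconds.
    """
    flagged = []
    for name, fileid, start, mid, end in entries:
        if not (start < mid < end):
            flagged.append((name, fileid, "boundaries out of order"))
        elif min(mid - start, end - mid) < min_half:
            flagged.append((name, fileid, "half too short"))
        elif end - start > max_total:
            flagged.append((name, fileid, "too long"))
    return flagged

# Hypothetical index excerpt.
SAMPLE = """EST_File index
DataType ascii
NumEntries 3
IndexName afl_diphone
EST_Header_End
pau-o afl_0001 0.100 0.150 0.210
o-l afl_0001 0.210 0.212 0.300
l-a afl_0002 0.400 0.380 0.450
"""
for item in suspicious(parse_diphone_index(SAMPLE)):
    print(item)
```

The flagged file ids point at the label files that are worth opening in emulabel first. After they are corrected, the index is regenerated as before.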
Extracting pitchmarks and LPC coefficients • Automatically: • make_pm_wav (edit it to set the speaker’s pitch range) • find_powerfactors (estimates the general power difference between files and computes a table of power modifiers, one per file) • make_lpc
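A table of per-file power modifiers can be illustrated with a deliberately simplified sketch. It measures each file's RMS level and picks a gain that brings every file to the corpus mean. This is not the algorithm inside festvox's `find_powerfactors`. The function names, file ids and sample values are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def power_factors(files):
    """Map each file id to a gain that brings its RMS to the mean RMS of all files.

    files: dict of file id -> non-empty list of samples (silent files not handled).
    """
    levels = {fid: rms(samples) for fid, samples in files.items()}
    target = sum(levels.values()) / len(levels)
    return {fid: target / level for fid, level in levels.items()}

# Hypothetical corpus in arbitrary sample units: the second file is 3x louder.
corpus = {"afl_0001": [1, -1, 1, -1], "afl_0002": [3, -3, 3, -3]}
factors = power_factors(corpus)
print(factors)  # afl_0001 gets 2.0, afl_0002 about 0.667
```

Multiplying each file by its factor evens out the loudness drift across recording sessions. Without this, it would show up as jumps in level between concatenated diphones.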
Testing phone synthesis • (SayPhones '(pau o l a pau s o k l a r i a d n a pau)) • Catalan voice • Spanish voice • English voice (modifying the phones)
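A diphone synthesizer speaks a phone list by concatenating one unit per adjacent phone pair. Listing those units shows which diphones a test such as the SayPhones call must find in the index. The sketch below is illustrative: `to_diphones`, `missing_diphones` and the inventory used in the usage note are hypothetical.

```python
def to_diphones(phones):
    """Adjacent phones X Y become the diphone unit 'X-Y'."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]

def missing_diphones(phones, inventory):
    """Diphones the utterance needs that are absent from the voice's inventory."""
    return [d for d in to_diphones(phones) if d not in inventory]

utterance = "pau o l a pau s o k l a r i a d n a pau".split()
units = to_diphones(utterance)
print(len(units))  # 16 units for 17 phones
print(units[:4])   # ['pau-o', 'o-l', 'l-a', 'a-pau']
```

With an inventory that lacks, say, `d-n`, the call `missing_diphones(utterance, inventory)` names exactly that unit. This makes it easier to tell a missing diphone from a mislabeled one when a test utterance sounds wrong.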
Catalan voice is still quite bad • Bad example • But it does have a basic Spanish phone… and without it, it would sound like this • And here is how kal_diphone sounds
Added tokenization • To be able to say numbers in Catalan (followed the Spanish tokenizer) • Show file
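The core job of a number tokenizer is to expand digit strings into words. The following sketch shows that job for Catalan, covering 0 to 999 with the masculine counting forms (u, dos, dos-cents). It is not the project's tokenizer, which was written for Festival following the Spanish one, and it ignores context-dependent forms such as un/una and dues. All names are hypothetical.

```python
import re

UNITS = ["zero", "u", "dos", "tres", "quatre", "cinc", "sis", "set", "vuit", "nou",
         "deu", "onze", "dotze", "tretze", "catorze", "quinze", "setze", "disset",
         "divuit", "dinou"]
TENS = {2: "vint", 3: "trenta", 4: "quaranta", 5: "cinquanta",
        6: "seixanta", 7: "setanta", 8: "vuitanta", 9: "noranta"}

def below_100(n):
    """Spell 0-99 in Catalan."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # Twenties join with "-i-" (vint-i-dos); higher tens use a hyphen (trenta-dos).
    joiner = "-i-" if tens == 2 else "-"
    return TENS[tens] + joiner + UNITS[unit]

def number_to_catalan(n):
    """Spell 0-999 in Catalan using masculine counting forms."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0-999")
    if n < 100:
        return below_100(n)
    hundreds, rest = divmod(n, 100)
    head = "cent" if hundreds == 1 else UNITS[hundreds] + "-cents"
    return head if rest == 0 else f"{head} {below_100(rest)}"

def expand_numbers(text):
    """Replace standalone 1-3 digit tokens with their Catalan spelling."""
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_catalan(int(m.group())), text)

print(expand_numbers("Tinc 25 anys"))  # Tinc vint-i-cinc anys
```

Longer digit strings such as `1234` are left untouched by the regular expression. A real tokenizer would also need rules for thousands, decimals, dates and gender agreement.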
Added some lexical entries • Letters of the alphabet, symbols, punctuation, some content words…
Phrasing, duration and intonation • Not there yet • I cannot get it to SayText yet either
Summary: building a diphone voice • Define phoneset • Generate diphone schema • Generate prompts • Record prompts • Label prompts • Extract pitchmarks and LPC coefficients • Test phone synthesis • Hand correct labels • Add tokenizer • Add lexicon • Add prosody, durations and intonation • Test and evaluate voice • Package for distribution