Building a Limited Domain Voice Using Festvox (Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)

Kishore Prahallad (kishore@iiit.ac.in), International Institute of Information Technology (IIIT) Hyderabad, India & Language Technologies Institute, Carnegie Mellon University.

Presentation Transcript


  1. Building a Limited Domain Voice Using Festvox (Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)
     Kishore Prahallad (kishore@iiit.ac.in), International Institute of Information Technology (IIIT) Hyderabad, India & Language Technologies Institute, Carnegie Mellon University

  2. Objective
     • Objective: to provide an introduction to the inner workings of the Festival speech synthesis system
     • Best resources: the documentation of Festival, Festvox and Speech Tools, and their mailing lists
     • Topics:
       • Festival, Festvox and Speech Tools
       • Modules and data structures in Festival
       • Synthesis flow
       • Building a limited domain voice

  3. Festival & Speech Tools
     • Festival
       • A full text-to-speech system
       • Multi-lingual
       • A general framework for building new voices in existing and new languages
       • APIs: shell level, C++ library, Emacs interface
     • Speech Tools
       • A set of modules for common speech processing tasks
       • Example: feature extraction
       • Interface: stand-alone executables and a set of library calls that can be linked into user programs

  4. Festvox
     • A voice building tool
     • An interface built on top of Festival and Speech Tools for building voices

  5. How Festival, Festvox & Speech Tools are Related
     [Diagram: Speech Tools; Festival (multi-lingual synthesis engine); Festvox (environment to build voices)]

  6. Output of Festvox
     • Festvox uses Speech Tools and Festival to create a new voice
     • The voice created is put back into the Festival framework to synthesize text
     [Diagram: Speech Tools; Festival (multi-lingual synthesis engine); Festvox (environment to build voices); Voice]

  7. User Interface with Festival
     [Diagram: User World; Speech Tools; Festival (multi-lingual synthesis engine); Festvox (environment to build voices); Voice]

  8. Some Festival-Specific Terminology
     • Utterance: the *name* of a data structure used in Festival
     • Segment: a phone is referred to as a segment

  9. Basic Modules of Festival TTS System
     There are many modules in the Festival system. The basic modules used for text-to-speech are:
     • Token_POS: basic token identification
     • Token: applies the token-to-word rules (handles non-standard words)
     • POS: a standard part-of-speech tagger
     • Phrasify: a chunker that detects phrase boundaries
     • Word: implements letter-to-sound rules
     Tokens:
     • White-space separated
       • European languages: space, CR, newline, tab, vertical tab, etc.
       • Asian languages without white-space separators: use dictionaries
     • Punctuation:
       • The boy----was usually late-----but arrived on time!!
       • We have orange/apple/banana flavors
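To make the tokenization notes on this slide concrete, here is a minimal Python sketch of white-space tokenization that peels leading and trailing punctuation off each token. It is an illustration only, not Festival's tokenizer: the function name `tokenize` and the dictionary keys `name`, `prepunc` and `punc` are assumptions chosen for this example.

```python
import string


def tokenize(text):
    """Split text on white space and separate outer punctuation from each token.

    Illustrative model only; the keys below are hypothetical, not Festival's API.
    """
    tokens = []
    # str.split() with no argument splits on any run of white space:
    # space, tab, CR, newline and vertical tab are all treated alike.
    for raw in text.split():
        core = raw.strip(string.punctuation)
        if not core:
            # The token is pure punctuation: keep it whole.
            tokens.append({"name": raw, "prepunc": "", "punc": ""})
            continue
        start = raw.index(core)
        tokens.append({
            "name": core,
            "prepunc": raw[:start],            # punctuation before the token
            "punc": raw[start + len(core):],   # punctuation after the token
        })
    return tokens


names = [t["name"] for t in tokenize("The boy----was usually late-----but arrived on time!!")]
print(names)
# ['The', 'boy----was', 'usually', 'late-----but', 'arrived', 'on', 'time']
```

Note that punctuation inside a token ("boy----was", "orange/apple/banana") survives white-space splitting, which is why the slide's examples need further rules after basic token identification.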

  10. Basic Modules of Festival TTS System (contd.)
     • Pauses: prediction of pauses, inserting silences
     • Intonation: prediction of accents, i.e. which syllables carry an accent (stress)
     • PostLex: post-lexical rules that can modify segments based on their context; used for things like vowel reduction, contractions, etc.
     • Duration: prediction of the durations of segments
     • Int_Targets: realization of the F0 contour; given the accents/tones, generate an F0 contour
     • Wave_Synth: a general function that in turn calls the appropriate method to actually generate the waveform

  11. Data Structure in Festival
     • Utterance: a "dashboard" data structure (all modules read from and write to a common memory)
     • An *Utterance* is the input and the output of every module in Festival
     [Diagram: Utterance -> Module -> Utterance]

  12. What Does an Utterance Consist Of?
     • *Items* and *Relations*
     • Item: an object that stores strings representing a word, a segment, etc.
     • Relation: a graph which links the items
       • For example, "syllable" is a relation which links together the items storing segment names
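The items-and-relations idea on this slide can be sketched with a few small Python classes. This is a simplified model under stated assumptions, not Festival's C++ or Scheme API: the class names `Item`, `Relation` and `Utterance`, and the methods `append` and `link`, are hypothetical. The phone names are taken from the "June" example later in the talk.

```python
class Item:
    """A bag of features, for example {"name": "jh"}."""

    def __init__(self, **features):
        self.features = features


class Relation:
    """An ordered collection of items plus parent -> daughter links (a small graph)."""

    def __init__(self, name):
        self.name = name
        self.items = []
        self.daughters = {}  # parent Item -> list of daughter Items

    def append(self, item):
        self.items.append(item)
        return item

    def link(self, parent, daughter):
        self.daughters.setdefault(parent, []).append(daughter)


class Utterance:
    """Holds all relations; every module reads and writes this one object."""

    def __init__(self):
        self.relations = {}

    def relation(self, name):
        if name not in self.relations:
            self.relations[name] = Relation(name)
        return self.relations[name]


utt = Utterance()
# Segment items, one per phone.
segments = [utt.relation("Segment").append(Item(name=p)) for p in ["jh", "uu", "n"]]
# One syllable item that links the three segment items together.
syl = utt.relation("Syllable").append(Item(name="syl", stress=1))
for seg in segments:
    utt.relation("Syllable").link(syl, seg)

print([s.features["name"] for s in utt.relation("Syllable").daughters[syl]])
# ['jh', 'uu', 'n']
```

The same `Item` object appears in both the Segment relation and the Syllable links, which is the point of the design: relations are different views over shared items.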

  13. What Each Module Does to an Utterance
     • Each module accesses *items* and *relations* in an utterance and generates new features, items and relations in the same utterance
     • For example, Token_POS:
       • Input: an utterance with one item, a string representing a sentence
       • Output: an utterance with multiple items, where each item represents a token
     • The synthesis process in Festival is viewed as applying a set of modules to an utterance
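The view of synthesis as "a set of modules applied to one utterance" can be sketched as a chain of functions that each take and return the same structure. This is a toy model: the utterance is assumed to be a plain dictionary mapping relation names to lists of items, and `token_module`, `word_module` and `run_modules` are hypothetical names, not Festival functions.

```python
def token_module(utt):
    """Read the single Text item and add a Token relation (one item per token)."""
    text = utt["Text"][0]["name"]
    utt["Token"] = [{"name": tok} for tok in text.split()]
    return utt


def word_module(utt):
    """Read the Token relation and add a Word relation (here simply lower-cased)."""
    utt["Word"] = [{"name": tok["name"].lower()} for tok in utt["Token"]]
    return utt


def run_modules(text, modules):
    """Create an utterance holding one text item, then apply each module in turn."""
    utt = {"Text": [{"name": text}]}
    for module in modules:
        utt = module(utt)  # each module's input and output is the utterance
    return utt


utt = run_modules("June 25", [token_module, word_module])
print([item["name"] for item in utt["Token"]])
# ['June', '25']
```

Each module only adds to the shared structure, so later modules can rely on whatever relations earlier ones created; the order of the module list is therefore significant.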

  14. Synthesis Flow
     [Diagram with two columns, Relations and Modules]
     • Text: June 25

  15. Synthesis Flow
     [Diagram with two columns, Relations and Modules]
     • Text: June 25
     • Module Tokenize -> relation Token: June | 25
     • Module Token2Word -> relation Word: June | Twenty | Fifth
     • Module POS -> part-of-speech tags: Noun | Num | Num
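The Token2Word step in this flow ("June 25" becomes "June twenty fifth") can be sketched as a small rule: a day number that follows a month name is read as an ordinal. The code below is a hedged illustration covering only numbers 1 to 31; the names `MONTHS`, `cardinal`, `ordinal` and `token_to_words` are assumptions for this example and are not Festival's token-to-word rules.

```python
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty"}
IRREGULAR = {"one": "first", "two": "second", "three": "third", "five": "fifth",
             "eight": "eighth", "nine": "ninth", "twelve": "twelfth",
             "twenty": "twentieth", "thirty": "thirtieth"}


def cardinal(n):
    """Spell out 1..31 as a list of words."""
    if not 1 <= n <= 31:
        raise ValueError("this sketch only covers 1..31")
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] + ([ONES[ones]] if ones else [])


def ordinal(n):
    """Spell out 1..31 as an ordinal: only the last word changes."""
    words = cardinal(n)
    words[-1] = IRREGULAR.get(words[-1], words[-1] + "th")
    return words


def token_to_words(tokens):
    """Expand day numbers that follow a month name; pass other tokens through."""
    words = []
    previous = ""
    for tok in tokens:
        if tok.isdigit() and 1 <= int(tok) <= 31 and previous.lower() in MONTHS:
            words.extend(ordinal(int(tok)))
        else:
            words.append(tok)
        previous = tok
    return words


print(token_to_words(["June", "25"]))
# ['June', 'twenty', 'fifth']
```

One token can expand to several words, which is why the Token and Word relations are kept separate in the flow shown on the slide.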

  16. Synthesis Flow
     [Diagram, continued from the previous slide]
     • Relation Word: June | Twenty | Fifth, with POS tags Noun | Num | Num
     • Module Word -> relation Syllable: 1 1 1 0
     • Relation Segment: jh uu n | t w e n t ii | f i f th
     • Module Synthesize -> relation Wave

  17. Installation of Festival & Festvox
     • Step 1: Install Speech Tools
     • Step 2: Install Festival
       • Synthesize some English text to check the sound card, rate of speech, etc.
     • Step 3: Install Festvox
     • Detailed notes are available from the course web site

  18. Building Limited Domain
     • Unit selection is applied to a limited domain with a restricted vocabulary
     • Yields high quality speech systems
     • Conceptually, the units are words
     • Implementation in Festival:
       • The units are still phones, but each is restricted to come from a specific word
       • /p/ from "Pennsylvania" is differentiated from /p/ from "Pittsburgh"
       • To synthesize "Pittsburgh", all the phones should come from the word "Pittsburgh" (there may be many examples of the same word)
     • Applications: talking clock, weather prediction, rail/air inquiry systems
     • http://www.cs.cmu.edu/~awb/papers/ICSLP2000_ldom/index.html
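The idea that a unit is still a phone, but tagged with the word it came from, can be sketched as a catalogue keyed by a combined phone-and-word label. This is an assumption-laden illustration: the `phone_word` label format, the function names, the utterance ids and the phone strings for the two words are all made up for the example and are not taken from the voice built in the talk.

```python
from collections import defaultdict


def unit_type(phone, word):
    """Label a phone with the word it belongs to, e.g. 'p_pittsburgh' (assumed format)."""
    return f"{phone}_{word.lower()}"


def build_catalogue(recordings):
    """Index every recorded phone by its word-tagged unit type.

    recordings: list of (utterance id, [(word, [phones]), ...]).
    """
    catalogue = defaultdict(list)
    for utt_id, words in recordings:
        for word_pos, (word, phones) in enumerate(words):
            for phone_pos, phone in enumerate(phones):
                catalogue[unit_type(phone, word)].append((utt_id, word_pos, phone_pos))
    return catalogue


def candidates(catalogue, word, phones):
    """For each phone of the target word, list units recorded inside that same word."""
    return [catalogue.get(unit_type(phone, word), []) for phone in phones]


# Hypothetical recordings with rough, made-up phone strings.
recordings = [
    ("utt0001", [("Pittsburgh", ["p", "i", "t", "s", "b", "er", "g"])]),
    ("utt0002", [("Pennsylvania", ["p", "e", "n", "s", "i", "l", "v", "ey", "n", "y", "ax"])]),
]
catalogue = build_catalogue(recordings)
print(candidates(catalogue, "Pittsburgh", ["p"]))
# [[('utt0001', 0, 0)]]
```

Because /p/ from "Pennsylvania" is stored under a different key, it is never offered as a candidate for "Pittsburgh". A word that was never recorded yields empty candidate lists, which reflects the restricted vocabulary of a limited domain voice.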

  19. Limited Domain Setup (http://festvox.org/bsv/bsv-ldom-ch.html)
     1. Set up the environment:
          $FESTVOXDIR/src/ldom/setup_ldom iiit time pra
        • This gives a talking clock setup.
        • To change it to another domain, all you have to do is replace "etc/time.data" with the domain-specific training sentences.
        • For non-English languages, these sentences are transliterated into the English (Roman) alphabet.
     2. Generate prompts:
          festival -b festvox/build_ldom.scm '(build_prompts "etc/txt.done.data")'
        • This synthesizes the sentences which *you* are going to speak.
        • How can you synthesize them? This is mostly applicable to English only.
        • Why synthesize at all? To *prompt* you with what to speak.
     3. Record the prompts:
          bin/prompt_them etc/txt.done.data
        • For new languages, switch off the *playing of the prompt* by commenting out na_play in bin/prompt_them.
     4. Label automatically:
          bin/make_labs prompt-wav/*.wav
        • Uses dynamic programming to label the speech.
        • Labeling builds the correspondence between the text and the speech.
     4.1 Manually correct the labeling errors:
          emulabel etc/emu_lab time0001

  20. Limited Domain Setup (contd.)
     5. Generate pitch markers:
          bin/make_pm_wave wav/*.wav
     6. Correct the pitch markers:
          bin/make_pm_fix pm/*.pm
     7. Generate Mel cepstral coefficients:
          bin/make_mcep wav/*.wav
     8. Generate the utterance structures:
          festival -b festvox/build_ldom.scm '(build_utts "etc/txt.done.data")'
     9. Cluster the units:
          festival -b festvox/build_ldom.scm '(build_clunits "etc/txt.done.data")'
     10. Test the voice:
          festival festvox/iiit_time_pra_ldom '(voice_iiit_time_pra_ldom)'
        • To see the units selected:
          (set! utt (SayText "abhii samaya hai...."))
          (clunits::units_selected utt "-")
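Several of the build steps above read the prompt list (etc/txt.done.data), so a malformed line there tends to surface late in the build. The sketch below checks such a file before recording starts. It assumes the usual Festvox prompt layout of one parenthesized entry per line, an utterance id followed by the quoted text; that layout, the function name `parse_prompts` and the sample sentences are assumptions for illustration and are not taken from the slides.

```python
import re

# Assumed entry layout: ( utterance_id "prompt text" )
PROMPT_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(lines):
    """Return {utterance id: text}, raising ValueError on malformed or duplicate entries."""
    prompts = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = PROMPT_LINE.match(line)
        if match is None:
            raise ValueError(f"line {number}: not a prompt entry: {line!r}")
        utt_id, text = match.groups()
        if utt_id in prompts:
            raise ValueError(f"line {number}: duplicate utterance id {utt_id!r}")
        prompts[utt_id] = text
    return prompts


# Made-up sample entries; a real run would read the lines of the prompt file instead.
sample = [
    '( time0001 "The time is now a little after five." )',
    '( time0002 "The time is now exactly ten." )',
]
prompts = parse_prompts(sample)
print(sorted(prompts))
# ['time0001', 'time0002']
```

The sketch does not handle escaped quotes inside the prompt text; for a transliterated prompt list that restriction is usually harmless, but it is a simplification.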

  21. References
     • http://festvox.org
     • 11-752 CMU course slides: http://festvox.org/festtut/
     • 11-752 CMU course lecture notes: http://festvox.org/festtut/notes/festtut_toc.html
     • Building Synthetic Voices: http://www.festvox.org/bsv/
     • The Festival Speech Synthesis System: http://www.festvox.org/docs/manual-1.4.3/festival_toc.html
     • Edinburgh Speech Tools Library: http://www.festvox.org/docs/speech_tools-1.2.0/book1.htm
