
Tuning the Unit Selection Voices (Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)

Kishore Prahallad (Email: kishore@iiit.ac.in), International Institute of Information Technology (IIIT) Hyderabad, India & Language Technologies Institute, Carnegie Mellon University.


Presentation Transcript


  1. Tuning the Unit Selection Voices (Workshop Talk at IIT Kharagpur, Mar 4-5, 2009)
     Kishore Prahallad, Email: kishore@iiit.ac.in
     International Institute of Information Technology (IIIT) Hyderabad, India & Language Technologies Institute, Carnegie Mellon University

  2. Better Labels
     • Research:
       • Automatic segmentation models such as HMMs or neural networks could be tuned to obtain better labels.
     • Practical:
       • Use an existing state-of-the-art speech segmentation algorithm.
       • Manually verify and correct the misaligned labels.
       • For small databases, manual correction is more apt.
       • Emulabel and Wavesurfer are tools suited to this purpose.

  3. Pitch Marks
     • Why pitch marks:
       • In speech synthesis, pitch-synchronous processing (as opposed to block processing) is commonly used to extract features and during concatenation.
       • Pitch-synchronous processing leads to smoother concatenation of two speech segments, and thus better quality.
     • Pitch extraction is done with an autocorrelation-based algorithm.
       • Knowing the implementation details may be necessary to tune the pitch extraction.
       • Tune the pitch extraction parameters to the specific speaker (your voice talent).
     • Reference: http://festvox.org/bsv/bsv-pitchmarks-sect.html
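To make the autocorrelation idea concrete, here is a minimal sketch of period estimation on a synthetic tone. It is an illustration only, not the Festvox pitch mark implementation: the function name `estimate_pitch_period`, the 16 kHz sample rate, and the 125 Hz test tone are all assumptions made for the example.

```python
import math


def estimate_pitch_period(samples, sample_rate, min_period=0.005, max_period=0.012):
    """Return the lag (in seconds) with the highest autocorrelation.

    Only lags between min_period and max_period are searched, which mirrors
    the role of a minimum and maximum expected pitch period.
    """
    min_lag = int(round(min_period * sample_rate))
    max_lag = int(round(max_period * sample_rate))
    # Use the same number of products for every lag so scores are comparable.
    window = len(samples) - max_lag
    if window <= 0:
        raise ValueError("frame is too short for the requested max_period")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Synthetic test signal (hypothetical): a 125 Hz sine, 40 ms at 16 kHz.
sample_rate = 16000
f0 = 125.0
frame = [math.sin(2 * math.pi * f0 * n / sample_rate) for n in range(640)]

period = estimate_pitch_period(frame, sample_rate)
print(f"estimated period: {period:.4f} s, F0: {1.0 / period:.1f} Hz")
```

Narrowing the search range to the speaker's plausible pitch periods keeps the estimator from locking onto a multiple or a fraction of the true period, which is presumably why the tool's range settings are worth tuning per speaker.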

  4. What do you need to do to extract better pitch marks?
     • Read: http://festvox.org/bsv/bsv-pitchmarks-sect.html
     • STEP 1:
       • Open bin/make_pm_wave
       • Edit the line PM_ARGS
       • min and max correspond to the *expected* time difference between two major peaks in the autocorrelation sequence
       • -min 0.005 (-min 0.0016 for female)
       • -max 0.012 (-max 0.007 for female)
       • -lx_lf 200 (400 for female, depending on the speaker)
       • -lx_hf 40 (200 for female, depending on the speaker)
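Since the -min and -max values are time differences in seconds, they can be derived from a speaker's expected pitch range by taking reciprocals. The helper below is a small sketch of that arithmetic; the function name `period_bounds` and the 83 to 200 Hz range are hypothetical values chosen for illustration, not measurements of any particular speaker.

```python
def period_bounds(f0_min_hz, f0_max_hz):
    """Convert an expected F0 range in Hz to (min, max) pitch periods in seconds."""
    if not 0 < f0_min_hz < f0_max_hz:
        raise ValueError("expected 0 < f0_min_hz < f0_max_hz")
    # The highest pitch gives the shortest period, and vice versa.
    return 1.0 / f0_max_hz, 1.0 / f0_min_hz


# Hypothetical pitch range for a male speaker.
min_s, max_s = period_bounds(83.0, 200.0)
print(f"-min {min_s:.4f} -max {max_s:.4f}")  # prints "-min 0.0050 -max 0.0120"
```

If the speaker's pitch range has been measured some other way, this gives a starting point for the PM_ARGS values, which can then be refined by inspecting the output.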

  5. Step 2: Check the output
     • Modify the script to your approximate needs,
     • run it on a single file,
     • then run the script that translates the pitch mark file into a label file suitable for emulabel:
         bin/make_pm_wave wav/awb_0001.wav
         bin/make_pm_pmlab pm/awb_0001.pm
     • You can then display the pitch marks with:
         emulabel etc/emu_pm awb_0001

  6. Step 2
     • Good pitch marks look like those in the slide's figure (not reproduced in this transcript): red lines at the maximum-amplitude positions.
     • If they do not, repeat Step 1.
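The visual check can be complemented by a rough numerical one. The sketch below flags pitch marks that do not sit near the largest sample in their neighbourhood. It is an assumption-laden illustration: the function name `off_peak_marks`, the window and tolerance values, and the synthetic sawtooth-like signal are all hypothetical, and it works on plain lists of samples and mark times rather than on the actual pitch mark file format.

```python
def off_peak_marks(samples, sample_rate, mark_times, half_window=0.002, tolerance=0.0005):
    """Return the mark times that are not close to a local amplitude maximum.

    For each mark, find the largest sample within +/- half_window seconds and
    flag the mark if that peak is more than `tolerance` seconds away.
    """
    half = int(round(half_window * sample_rate))
    tol = int(round(tolerance * sample_rate))
    flagged = []
    for t in mark_times:
        idx = int(round(t * sample_rate))
        lo = max(0, idx - half)
        hi = min(len(samples), idx + half + 1)
        if lo >= hi:
            flagged.append(t)  # mark lies outside the waveform
            continue
        peak = max(range(lo, hi), key=lambda i: samples[i])
        if abs(peak - idx) > tol:
            flagged.append(t)
    return flagged


# Synthetic signal (hypothetical): a sharp peak every 128 samples at 16 kHz,
# decaying linearly until the next peak.
sample_rate = 16000
samples = [1.0 - (n % 128) / 128 for n in range(640)]

marks = [0.008, 0.011, 0.016, 0.024]  # 0.011 s is deliberately misplaced
bad = off_peak_marks(samples, sample_rate, marks)
print(bad)
```

A check like this does not replace inspection in emulabel, since real speech has unvoiced stretches with no meaningful peaks, but it can point to the files worth opening first.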

  7. Step 3: Rebuild the Voice
     Once new labels and new pitch marks have been extracted, repeat the following steps.
     • 6. Smooth the pitch marks
         bin/make_pm_fix pm/*.pm
     • 7. Generate Mel cepstral coefficients
         bin/make_mcep wav/*.wav
     • 8. Generate the utterance structure
         festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")'
     • 9. Cluster the units
         festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")'
     • 10. Test the voice
         festival festvox/iiit_time_pra_ldom '(voice_iiit_time_pra_ldom)'
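Because steps 6 to 9 are repeated every time labels or pitch marks change, it can help to script them. The following is a minimal sketch of such a wrapper, assuming it is run from the voice directory; the function names `rebuild_commands` and `rebuild` are hypothetical, and the commands themselves are copied from the steps above. It defaults to a dry run that only prints what would be executed.

```python
import glob
import subprocess


def rebuild_commands(pm_files, wav_files,
                     build_script="festvox/build_ldom.scm",
                     prompts="etc/time.data"):
    """Return the rebuild commands (steps 6 to 9) as argument lists."""
    return [
        ["bin/make_pm_fix", *pm_files],                                   # step 6
        ["bin/make_mcep", *wav_files],                                    # step 7
        ["festival", "-b", build_script, f'(build_utts "{prompts}")'],    # step 8
        ["festival", "-b", build_script, f'(build_clunits "{prompts}")'], # step 9
    ]


def rebuild(dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    # The shell would expand pm/*.pm and wav/*.wav; here glob does it.
    commands = rebuild_commands(sorted(glob.glob("pm/*.pm")),
                                sorted(glob.glob("wav/*.wav")))
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands


if __name__ == "__main__":
    rebuild(dry_run=True)
```

Step 10 is left out on purpose, since testing the voice is interactive. Calling `rebuild(dry_run=False)` would actually run the tools, stopping at the first command that fails.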

  8. Evaluation
     • Compare the voice samples synthesized:
       • before and after changing the pitch marks (with no change in the labels)
       • with better labels and better pitch marks

  9. References
     • http://festvox.org
     • 11-752 CMU course slides: http://festvox.org/festtut/
     • 11-752 CMU course lecture notes: http://festvox.org/festtut/notes/festtut_toc.html
     • http://festvox.org/bsv/bsv-pitchmarks-sect.html
