
Topic: Tuning the Pitch Markers. Prerequisite: Pitch Extraction

Topic: Tuning the Pitch Markers Prerequisite: Pitch Extraction. Kishore Prahallad Email: skishore@cs.cmu.edu Carnegie Mellon University & International Institute of Information Technology Hyderabad. Objective of this Lecture. To tune the pitch markers for better quality synthesis.


Presentation Transcript


  1. Topic: Tuning the Pitch Markers. Prerequisite: Pitch Extraction
     Kishore Prahallad
     Email: skishore@cs.cmu.edu
     Carnegie Mellon University & International Institute of Information Technology Hyderabad
     Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)

  2. Objective of this Lecture
     • To tune the pitch markers for better quality synthesis

  3. Pitch Marks
     • Why pitch marks:
       • In speech synthesis, pitch-synchronous processing (as opposed to fixed-size block processing) is commonly used both to extract features and during concatenation.
       • Pitch-synchronous processing leads to smoother concatenation of two speech segments, and thus to better quality.
     • Pitch extraction is done with an autocorrelation-based algorithm.
     • Knowing the implementation details may be necessary to tune the pitch extraction.
     • Tune the parameters of pitch extraction to the specific speaker (your voice talent).
     • Reference: http://festvox.org/bsv/bsv-pitchmarks-sect.html
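The idea of autocorrelation-based pitch extraction can be illustrated with a small sketch. This is not the algorithm used by the Festvox scripts; it is a minimal normalized-correlation search on a synthetic frame, and the function name `estimate_pitch_period`, the 16 kHz sample rate, and the test signal are assumptions made for illustration only.

```python
import math


def estimate_pitch_period(samples, sample_rate, min_period, max_period):
    """Return the lag (in seconds) between min_period and max_period
    at which the frame correlates best with a shifted copy of itself."""
    min_lag = max(1, round(min_period * sample_rate))
    max_lag = round(max_period * sample_rate)
    window = len(samples) - max_lag  # same window length for every lag
    if window <= 0:
        raise ValueError("frame is too short for the requested max_period")

    ref = samples[:window]
    ref_energy = sum(v * v for v in ref)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        shifted = samples[lag:lag + window]
        energy = sum(v * v for v in shifted)
        if ref_energy == 0 or energy == 0:
            continue  # silent frame: no meaningful correlation
        # Normalized correlation: 1.0 means the shifted copy matches exactly.
        score = sum(a * b for a, b in zip(ref, shifted)) / math.sqrt(ref_energy * energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Synthetic "voiced" frame (assumption): 100 Hz fundamental plus one harmonic.
SAMPLE_RATE = 16000
F0 = 100.0
frame = [
    math.sin(2 * math.pi * F0 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * F0 * n / SAMPLE_RATE)
    for n in range(800)
]

period = estimate_pitch_period(frame, SAMPLE_RATE, min_period=0.005, max_period=0.012)
print(f"estimated period: {period:.4f} s, F0: {1.0 / period:.1f} Hz")
```

The search range plays the same role as the minimum and maximum peak spacing that the lecture tunes per speaker: a range that excludes the speaker's true pitch period cannot return the right lag.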

  4. What do you need to do to extract better pitch marks?
     • Read: http://festvox.org/bsv/bsv-pitchmarks-sect.html
     • STEP 1:
       • Open bin/make_pm_wave
       • Edit the line PM_ARGS
       • -min and -max correspond to the *expected* minimum and maximum time difference, in seconds, between two major peaks in the autocorrelation sequence (that is, the expected range of the pitch period)
       • -min 0.005 (-min 0.0016 for female)
       • -max 0.012 (-max 0.007 for female)
       • -lx_lf 200 (400 for female, depending on the speaker)
       • -lx_hf 40 (200 for female, depending on the speaker)
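Because a pitch period is the reciprocal of the fundamental frequency, the -min and -max values can be read as an F0 range, and an F0 range can be turned back into candidate values. The sketch below shows that arithmetic; the helper names `f0_range` and `period_bounds` and the 90 to 220 Hz example speaker are hypothetical, and the result is only a starting point to check against the pitch-mark display.

```python
def f0_range(min_period, max_period):
    """Convert a pitch-period range in seconds to an F0 range in Hz."""
    return 1.0 / max_period, 1.0 / min_period


def period_bounds(f0_low_hz, f0_high_hz):
    """Convert an expected F0 range in Hz to (min_period, max_period) in seconds."""
    if not 0 < f0_low_hz < f0_high_hz:
        raise ValueError("expected 0 < f0_low_hz < f0_high_hz")
    return 1.0 / f0_high_hz, 1.0 / f0_low_hz


# F0 ranges implied by the values on the slide.
male_low, male_high = f0_range(0.005, 0.012)       # roughly 83 Hz to 200 Hz
female_low, female_high = f0_range(0.0016, 0.007)  # roughly 143 Hz to 625 Hz

# Hypothetical speaker whose F0 stays between 90 Hz and 220 Hz.
min_period, max_period = period_bounds(90.0, 220.0)
print(f"-min {min_period:.4f} -max {max_period:.4f}")
```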

  5. Step 2: Check the output
     • Modify the script to your approximate needs,
     • run it on a single file,
     • then run the script that translates the pitchmark file into a labeled file suitable for emulabel:
         bin/make_pm_wave wav/awb_0001.wav
         bin/make_pm_pmlab pm/awb_0001.pm
     • You can then display the pitchmarks with:
         emulabel etc/emu_pm awb_0001

  6. Step 2
     • Good pitch marks look like those in the slide's figure: red lines at the maximum-amplitude positions of the waveform.
     • If they do not, repeat Step 1.
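The visual check can be complemented by a rough numeric one. The sketch below assumes the pitch-mark times (in seconds) and the waveform samples have already been loaded by some other means; reading the pitchmark file is not shown. The function name `fraction_on_peaks` and the 2 ms search window and 0.5 ms tolerance are assumptions, not values from the lecture.

```python
import math


def fraction_on_peaks(samples, sample_rate, mark_times, search=0.002, tolerance=0.0005):
    """Fraction of pitch marks lying within `tolerance` seconds of the
    largest sample found within `search` seconds on either side of the mark."""
    if not mark_times:
        raise ValueError("no pitch marks given")
    search_n = round(search * sample_rate)
    tol_n = round(tolerance * sample_rate)
    on_peak = 0
    for t in mark_times:
        center = round(t * sample_rate)
        lo = max(0, center - search_n)
        hi = min(len(samples), center + search_n + 1)
        if lo >= hi:
            continue  # mark lies outside the waveform
        window = samples[lo:hi]
        peak = lo + max(range(len(window)), key=window.__getitem__)
        if abs(peak - center) <= tol_n:
            on_peak += 1
    return on_peak / len(mark_times)


# Synthetic check (assumption): a 100 Hz sine at 16 kHz peaks at samples 40, 200, 360, ...
SAMPLE_RATE = 16000
wave = [math.sin(2 * math.pi * 100.0 * n / SAMPLE_RATE) for n in range(800)]
marks = [40 / SAMPLE_RATE, 200 / SAMPLE_RATE, 360 / SAMPLE_RATE, 600 / SAMPLE_RATE]
score = fraction_on_peaks(wave, SAMPLE_RATE, marks)
print(f"{score:.2f} of the marks sit on amplitude peaks")
```

In this synthetic case the last mark is deliberately placed on a trough, so three of the four marks count as well placed. A low fraction on real data would be a hint to revisit the parameters in Step 1, not a replacement for looking at the display.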

  7. Step 3: Rebuild the Voice
     Once new labels and new pitch marks are extracted, repeat the following steps.
     • 6. Smooth the pitch markers
         bin/make_pm_fix pm/*.pm
     • 7. Generate Mel Cepstral coefficients
         bin/make_mcep wav/*.wav
     • 8. Generate Utterance Structure
         festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")'
     • 9. Cluster the units
         festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")'
     • 10. Test the voice.
         festival festvox/iiit_time_pra_ldom '(voice_iiit_time_pra_ldom)'
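Since steps 6 to 9 are repeated every time the pitch marks change, they can be wrapped in a small driver. The sketch below only assembles the commands listed on the slide and, by default, prints them without running anything. The function names `rebuild_commands` and `rebuild_voice` are hypothetical, and the script assumes it is started from the voice directory that contains bin/, pm/, wav/ and festvox/.

```python
import glob
import subprocess


def rebuild_commands(pm_files, wav_files, prompt_file="etc/time.data"):
    """Return the argument lists for rebuild steps 6 to 9, in order."""
    return [
        ["bin/make_pm_fix", *pm_files],                                   # 6. smooth pitch marks
        ["bin/make_mcep", *wav_files],                                    # 7. Mel cepstral coefficients
        ["festival", "-b", "festvox/build_ldom.scm",
         f'(build_utts "{prompt_file}")'],                                # 8. utterance structure
        ["festival", "-b", "festvox/build_ldom.scm",
         f'(build_clunits "{prompt_file}")'],                             # 9. cluster the units
    ]


def rebuild_voice(dry_run=True):
    """Print each command; run it as well when dry_run is False."""
    commands = rebuild_commands(sorted(glob.glob("pm/*.pm")), sorted(glob.glob("wav/*.wav")))
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
    return commands
```

Calling `rebuild_voice()` lists the commands, and `rebuild_voice(dry_run=False)` executes them. Step 10 is left out because testing the voice is an interactive listening step.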

  8. Evaluation
     • Compare the voice samples synthesized:
       • before and after changing the pitch marks (no change in the labels)
       • with better labels + better pitch marks

  9. Additional Reading for the lecture
     • http://festvox.org
     • 11-752 CMU Course Lecture Notes
     • http://festvox.org/festtut/notes/festtut_toc.html
     • http://festvox.org/bsv/bsv-pitchmarks-sect.html
