Vocal Conversion from Speaking voice to Singing voice Using STRAIGHT

Takeshi SAITOU (1), Masataka GOTO (1), Masashi UNOKI (2) and Masato AKAGI (2). (1) National Institute of Advanced Industrial Science and Technology (AIST)

Presentation Transcript


  1. Vocal Conversion from Speaking voice to Singing voice Using STRAIGHT. Takeshi SAITOU (1), Masataka GOTO (1), Masashi UNOKI (2) and Masato AKAGI (2). (1) National Institute of Advanced Industrial Science and Technology (AIST); (2) Japan Advanced Institute of Science and Technology (JAIST).

  2. Introduction. Our research approach focuses not on text-to-singing (lyric-to-singing) synthesis but on speech-to-singing synthesis (vocal conversion). The aims are to clarify the acoustic differences between singing and speaking voices and to develop novel applications for computer music production.

  3. Outline of the vocal conversion system. The vocal conversion system is based on the speech manipulation system STRAIGHT (Kawahara et al., 1998) and comprises three models:
     • F0 control model
     • Duration control model
     • Spectral control model

  4. Inputs.
     • Speaking voice: a reading of the lyrics of a song.
     • Synchronization information (segments labeled c and v in the slide figure).
     • Musical score.

  5. 1st step: F0 control. The musical score provides the musical notes (melody contour). F0 control model: adding four types of F0 fluctuation to the musical notes to generate the F0 contour of the singing voice.
     • Overshoot: deflection exceeding the target note after a note change.
     • Vibrato: quasi-periodic frequency modulation of 4-7 Hz.
     • Preparation: deflection in the direction opposite to the note change, observed just before the note change.
     • Fine fluctuation: irregular fluctuation higher than 10 Hz, present throughout the contour.
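A minimal Python sketch of the idea on this slide, under stated assumptions: the stepwise melody contour is smoothed with an underdamped second-order system to produce overshoot, a sinusoid in the 4-7 Hz range stands in for vibrato, and high-passed noise stands in for fine fluctuation. The function name `f0_contour`, the filter, and every parameter value (`omega`, `zeta`, `vib_depth`, `fine_depth`) are illustrative choices, not values from the presentation. Preparation is left out to keep the sketch short.

```python
import numpy as np

def f0_contour(notes, frame_rate=200, omega=30.0, zeta=0.5,
               vib_rate=5.5, vib_depth=0.3, fine_depth=0.05, seed=0):
    """notes: list of (midi_note, duration_s). Returns F0 in Hz per frame.

    All parameter defaults are hypothetical, chosen only for illustration.
    """
    dt = 1.0 / frame_rate
    # Stepwise melody contour in semitones (MIDI note numbers).
    target = np.concatenate(
        [np.full(int(round(d * frame_rate)), float(n)) for n, d in notes])

    # Overshoot: underdamped (zeta < 1) second-order response to the steps,
    # integrated with a semi-implicit Euler scheme.
    x, v = target[0], 0.0
    smooth = np.empty_like(target)
    for i, u in enumerate(target):
        a = omega ** 2 * (u - x) - 2.0 * zeta * omega * v
        v += a * dt
        x += v * dt
        smooth[i] = x

    # Vibrato: sinusoidal modulation, rate inside the 4-7 Hz range.
    t = np.arange(len(target)) * dt
    vibrato = vib_depth * np.sin(2.0 * np.pi * vib_rate * t)

    # Fine fluctuation: white noise minus its ~0.1 s moving average,
    # a crude way to keep mostly components above roughly 10 Hz.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(target))
    win = max(1, min(len(target), frame_rate // 10))
    slow = np.convolve(noise, np.ones(win) / win, mode="same")
    fine = fine_depth * (noise - slow)

    semitones = smooth + vibrato + fine
    return 440.0 * 2.0 ** ((semitones - 69.0) / 12.0)

# Example: C4 for 0.5 s followed by E4 for 0.5 s.
f0 = f0_contour([(60, 0.5), (64, 0.5)])
```

Setting `vib_depth=0` and `fine_depth=0` isolates the overshoot after the note change, which makes it easier to inspect each component on its own.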

  6. 2nd step: analysis. The speaking voice is analyzed by STRAIGHT (analysis part).

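  7. 3rd step: duration control. Input: spectral sequence and AP sequence. Duration control model (segment names in brackets are reconstructed; the slide's figure labels did not survive extraction):
     • [Consonant part] is lengthened according to a fixed rate.
     • [Boundary part between consonant and vowel] is not lengthened.
     • [Vowel part] is lengthened so that the duration of the whole combination corresponds to the note duration.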
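The three duration rules can be sketched in Python as follows. This assumes that the three segments are the consonant, the consonant-to-vowel boundary, and the vowel, which is a reading of the slide rather than something its text states. The fixed rates are the ones listed on slide 13; the function name `lengthen_segments` and its arguments are hypothetical.

```python
# Fixed lengthening rates per manner of articulation (from slide 13).
FIX_RATES = {
    "fricative": 1.28,
    "plosive": 1.00,
    "semivowel": 2.37,
    "nasal": 1.43,
    "/y/": 1.22,
}

def lengthen_segments(consonant_s, boundary_s, manner, note_s):
    """Return (consonant, boundary, vowel) durations in seconds.

    consonant_s, boundary_s: durations measured in the speaking voice.
    note_s: duration of the target musical note.
    """
    new_consonant = consonant_s * FIX_RATES[manner]  # lengthened by fixed rate
    new_boundary = boundary_s                        # not lengthened
    # The vowel fills whatever remains of the note.
    new_vowel = note_s - new_consonant - new_boundary
    if new_vowel < 0:
        raise ValueError("note is too short for the consonant and boundary")
    return new_consonant, new_boundary, new_vowel

# Example: a 50 ms nasal and a 40 ms boundary placed on a 0.5 s note.
consonant, boundary, vowel = lengthen_segments(0.05, 0.04, "nasal", 0.5)
```

In this sketch the vowel absorbs all the slack, so the three durations always sum to the note duration.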

  8. 4th step: spectral control 1. Input: the lengthened spectral and AP sequences (spectral envelope and AP of the vowel part). Spectral control model 1: adding the singing formant by emphasizing the peak of the spectral envelope and the dip of the AP. Output: the modified spectral envelope and AP.
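One way to picture "emphasizing the peak of the spectral envelope and the dip of the AP" is a frequency-dependent gain in dB centered near 3 kHz, the singing-formant region named on slides 14 and 15. The Python sketch below uses a Gaussian weighting for this; the Gaussian shape, the width, the gain values, and the name `emphasize_band` are assumptions for illustration, not details from the presentation.

```python
import numpy as np

def emphasize_band(values_db, freqs_hz, gain_db,
                   center_hz=3000.0, width_hz=500.0):
    """Add a Gaussian-shaped gain (in dB) around center_hz.

    A positive gain_db raises a peak (spectral envelope); a negative
    gain_db deepens a dip (AP). The shape and defaults are hypothetical.
    """
    weight = np.exp(-0.5 * ((freqs_hz - center_hz) / width_hz) ** 2)
    return values_db + gain_db * weight

# Example on flat (0 dB) curves sampled at 513 bins from 0 to 8 kHz.
freqs = np.linspace(0.0, 8000.0, 513)
envelope = emphasize_band(np.zeros(513), freqs, gain_db=12.0)  # raise peak
ap = emphasize_band(np.zeros(513), freqs, gain_db=-6.0)        # deepen dip
```

In a real system the same weighting would be applied frame by frame to the vowel parts only, as the slide indicates.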

  9. 5th step: spectral control 2. The modified spectral envelope and AP, together with the generated F0 contour, are passed to STRAIGHT (synthesis part) to produce the synthesized singing voice. Spectral control model 2: adding amplitude modulation (AM) of formants synchronized with vibrato, by applying AM to the amplitude envelope of the synthesized singing voice during vibrato. Output: the synthesized singing voice (final version).
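The amplitude modulation step can be illustrated with a short Python sketch that multiplies the waveform by a slowly varying gain inside the vibrato interval only. It assumes a sinusoidal modulator at the vibrato rate and a fixed depth; the name `apply_vibrato_am`, the depth, and the phase alignment are hypothetical, since the slide does not give them.

```python
import numpy as np

def apply_vibrato_am(signal, sample_rate, start_s, end_s,
                     vib_rate=5.5, depth=0.2):
    """Apply sinusoidal AM to `signal` between start_s and end_s (seconds).

    vib_rate should match the vibrato rate used for the F0 contour so the
    two modulations stay synchronized. The defaults are illustrative only.
    """
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal)) / sample_rate
    in_vibrato = (t >= start_s) & (t < end_s)
    gain = np.ones(len(signal))
    # Modulate only the vibrato interval; leave the rest at unit gain.
    gain[in_vibrato] += depth * np.sin(
        2.0 * np.pi * vib_rate * (t[in_vibrato] - start_s))
    return signal * gain

# Example: a constant-amplitude 1 s signal with vibrato in the second half.
sr = 16000
out = apply_vibrato_am(np.ones(sr), sr, start_s=0.5, end_s=1.0)
```

Restarting the modulator phase at `start_s` keeps the gain continuous at the onset of vibrato, which avoids an audible step in amplitude.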

  10. Demonstration • Speaking voice (input): (male → female) • Synthesized singing voice: (male → female → chorus)

  11. Thank you!!

  12. F0 control model (Saitou et al., 2004)

  13. Fixed rates for lengthening the consonant part. The rates are calculated as the ratio of the duration of each consonant in singing voices to its duration in read speech. Phoneme duration can be controlled by manner of articulation rather than place of articulation: fricative 1.28, plosive 1.00, semivowel 2.37, nasal 1.43, /y/ 1.22.

  14. Spectral characteristics of the singing voice.
     • Singers' formant: a remarkable spectral peak at around 3 kHz (Sundberg, 1974).
     • Amplitude modulation of formants synchronized with vibrato (Hirano, 1985).
     Both features are clearly present in professional singing voices.

  15. Spectral control model. Spectral control 1: the singing formant, a remarkable peak of the spectrum at around 3 kHz. Spectral control 2: amplitude modulation of formants synchronized with vibrato in F0.
