
Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model

Didier Cadic (1), engineering student, supervised by Olivier Cappé (1), Maurice Charbit (1), Gérard Chollet (1), Eric Moulines (1); presented here by Guido Aversano (1, 2). (1) Département TSI, ENST, Paris, France. (2) IIASS, Vietri sul Mare (SA), Italy.


Presentation Transcript


  1. Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model. Didier Cadic (1), engineering student, supervised by Olivier Cappé (1), Maurice Charbit (1), Gérard Chollet (1), Eric Moulines (1); presented here by Guido Aversano (1, 2). (1) Département TSI, ENST, Paris, France. (2) IIASS, Vietri sul Mare (SA), Italy.

  2. Plan of the presentation • Text-to-speech: classic methods • HNM model • Analysis • Synthesis • Analysis-Synthesis examples • Conclusions

  3. Text-To-Speech by concatenation. Examples realized on the AT&T web site: English, male; English, female (vocal server example); English, female (another vocal server example); German, male; French, female.

  4. Text-To-Speech by concatenation. Two major challenges: • smooth connection between acoustic units • flexible prosody

  5. TD-PSOLA method. Analysis: • Pitch estimation • Pitch-synchronous windowing. Synthesis: • Rearrangement of frames
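
The analysis and synthesis steps listed on this slide can be illustrated with a minimal Python sketch. It is not the toolbox's code: the function name `psola_time_stretch`, the `stretch` parameter (values above 1 slow the signal down) and the assumption that pitch marks are already available are all hypothetical. Each frame is a two-period, raised-cosine-windowed segment centred on a pitch mark, and synthesis simply re-places frames at new instants spaced by the local period.

```python
import numpy as np

def psola_time_stretch(x, marks, stretch):
    """Sketch of TD-PSOLA time-scaling; `marks` are pitch-mark sample indices (assumed given)."""
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    out = np.zeros(int(round(len(x) * stretch)))
    ts = float(marks[1])                      # first synthesis instant
    while True:
        # Analysis mark closest to the synthesis instant mapped back in time.
        i = int(np.argmin(np.abs(marks - ts / stretch)))
        i = min(max(i, 1), len(marks) - 2)
        P = int(marks[i + 1] - marks[i])      # local pitch period in samples
        lo, hi = marks[i] - P, marks[i] + P
        c = int(round(ts))
        if c + P > len(out):
            break
        if lo >= 0 and hi <= len(x) and c - P >= 0:
            j = np.arange(-P, P)
            w = 0.5 * (1.0 + np.cos(np.pi * j / P))   # windows spaced by P sum to 1
            out[c - P:c + P] += x[lo:hi] * w          # overlap-add the frame
        ts += P                               # keep the original pitch period
    return out

# Toy example: a sinusoid with a 50-sample period, slowed down by a factor 2.
P0 = 50
x = np.sin(2 * np.pi * np.arange(2000) / P0)
marks = np.arange(0, len(x), P0)
y = psola_time_stretch(x, marks, stretch=2.0)
```

On this perfectly periodic toy signal the output is twice as long and keeps the same period. Real speech additionally needs a pitch-mark estimator and a special treatment of unvoiced segments, which is where the artifacts mentioned in the following slides arise.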

  6. TD-PSOLA method. Some very good-quality results: • Time-scaling: Singing, original; Singing, modified • Pitch-shifting: Cello, original; Cello, modified

  7. TD-PSOLA method. Artifacts appearing in non-voiced sounds: "ss", original; "ss", slowed down (classic method); "ss", slowed down (improved); "rain", original; "rain", 0.5 rate

  8. Phase Vocoder method. Intuitive description: compression/stretching of the (narrow-band) spectrogram’s time-frequency scales… (time-scaling, pitch-shifting)

  9. Phase Vocoder method. Main problem: • phase coherence is lost in the synthesized signal. Examples: "rain", male voice, slow-motion by Vocoder (PSOLA version for comparison); "The quick fox …", female voice, slow-motion by Vocoder

  10. We need a parametric model • TD-PSOLA and Vocoder allow basic prosodic modifications. • The problem of unit concatenation for TTS is not solved. • Other kinds of modifications (timbre, denoising, …) should be considered.

  11. Harmonic plus Noise Model (HNM) • Main assumption: stationary segments of a speech signal can always be seen as the superposition of a periodic part and a noisy part

  12. HNM Model. Modelling: $S(t) = H(t) + B(t)$, where $H(t) = \sum_k A_k \cos(2\pi k f_0 t + \varphi_k)$ and $B(t)$ = white noise passed through an AR filter
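
To make the model concrete, here is a minimal Python sketch that builds one stationary frame as a harmonic sum plus AR-filtered white noise. The function name `hnm_frame`, its parameters, the numerical values and the normalisation of the AR filter to a leading coefficient of 1 are assumptions made for illustration only.

```python
import numpy as np

def hnm_frame(n_samples, fs, f0, amps, phases, ar_coeffs, noise_gain, rng):
    """Return (S, H, B) for one frame: S = H + B (hypothetical helper)."""
    amps = np.asarray(amps, dtype=float)
    phases = np.asarray(phases, dtype=float)
    t = np.arange(n_samples) / fs
    k = np.arange(1, len(amps) + 1)
    # Harmonic part: sum over k of A_k cos(2 pi k f0 t + phi_k).
    H = (amps[:, None] * np.cos(2 * np.pi * k[:, None] * f0 * t + phases[:, None])).sum(axis=0)
    # Noisy part: Gaussian white noise through the all-pole filter
    # 1 / (1 + a_1 z^-1 + ... + a_N z^-N), written as a direct recursion.
    e = noise_gain * rng.standard_normal(n_samples)
    B = np.zeros(n_samples)
    for n in range(n_samples):
        acc = e[n]
        for i, a in enumerate(ar_coeffs, start=1):
            if n - i >= 0:
                acc -= a * B[n - i]
        B[n] = acc
    return H + B, H, B

rng = np.random.default_rng(0)
amps = np.array([1.0, 0.5, 0.25])
phases = np.array([0.0, 0.5, -1.0])
S, H, B = hnm_frame(256, 8000.0, 150.0, amps, phases,
                    ar_coeffs=[-0.9], noise_gain=0.1, rng=rng)
```

The explicit recursion keeps the sketch dependency-free; a library filtering routine would normally replace the inner loop.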

  13. HNM analysis of a frame • Pitch estimation → Spectral comb method

  14. HNM analysis of a frame • Pitch estimation (example: "aka…aga") • Good results are obtained • In some cases the method erroneously returns $f_0/2$ • Possibility of tracking…
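
The slides do not detail the spectral comb, so the following Python sketch shows only one simple variant, under stated assumptions: each candidate $f_0$ is scored by the mean spectral magnitude at its multiples below a band limit. The function name `comb_pitch`, the 1 Hz candidate grid, the 2 kHz band and the mean-based score are all illustrative choices, not the authors' method.

```python
import numpy as np

def comb_pitch(frame, fs, f_min=80.0, f_max=350.0, band_hz=2000.0, n_fft=4096):
    """Pick the candidate f0 whose harmonic comb collects the largest mean magnitude."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    bin_hz = fs / n_fft
    best_f, best_score = None, -np.inf
    for f in np.arange(f_min, f_max, 1.0):
        # Comb teeth at f, 2f, 3f, ... up to the band limit.
        teeth = f * np.arange(1, int(band_hz // f) + 1)
        score = spec[np.rint(teeth / bin_hz).astype(int)].mean()
        if score > best_score:
            best_f, best_score = float(f), score
    return best_f

# Toy frame: 10 harmonics of 200 Hz with amplitudes 1/k.
fs = 8000.0
t = np.arange(1024) / fs
frame = sum(np.cos(2 * np.pi * k * 200.0 * t) / k for k in range(1, 11))
f0_est = comb_pitch(frame, fs)
```

Averaging over the teeth penalises the $f_0/2$ candidate, half of whose teeth fall between harmonics. On real speech, with weak or missing harmonics, such subharmonic errors can still occur, which is consistent with the remark on this slide.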

  15. HNM analysis of a frame • Harmonic part: extraction of amplitudes → Least squares method: $\min_{a_k, b_k} \| s(t) - H(t) \|^2$, with $H(t) = \sum_k \left[ a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right]$
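
Because $H(t)$ is linear in $a_k$ and $b_k$ once $f_0$ is known, the minimisation is an ordinary linear least-squares problem. A minimal Python sketch follows; the function name `harmonic_least_squares` and the test values are assumptions, and no windowing or weighting is applied.

```python
import numpy as np

def harmonic_least_squares(s, fs, f0, n_harm):
    """Fit a_k, b_k minimising ||s - H||^2 for a known f0; returns (a, b, H)."""
    t = np.arange(len(s)) / fs
    k = np.arange(1, n_harm + 1)
    arg = 2 * np.pi * f0 * np.outer(t, k)
    M = np.hstack([np.cos(arg), np.sin(arg)])   # one cos and one sin column per harmonic
    coef, *_ = np.linalg.lstsq(M, s, rcond=None)
    return coef[:n_harm], coef[n_harm:], M @ coef

# Toy frame with known coefficients, to check that they are recovered.
fs, f0 = 8000.0, 120.0
t = np.arange(400) / fs
true_a = np.array([1.0, 0.5, 0.25])
true_b = np.array([0.3, -0.2, 0.1])
s = sum(true_a[i] * np.cos(2 * np.pi * (i + 1) * f0 * t)
        + true_b[i] * np.sin(2 * np.pi * (i + 1) * f0 * t) for i in range(3))
a, b, H = harmonic_least_squares(s, fs, f0, 3)
residual = s - H                                # R(t) = s(t) - H(t)
```

The harmonic amplitude and phase follow from $A_k = \sqrt{a_k^2 + b_k^2}$ and $\varphi_k = \operatorname{atan2}(-b_k, a_k)$.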

  16. HNM analysis of a frame • Extraction of amplitudes. Problem: the noisy part gives a non-null contribution to the spectral power • Gain correction for the harmonics (using a heuristic formula $g(DV)$, where $DV$ is the estimated voicing degree)

  17. HNM analysis of a frame • Extraction of amplitudes → Residual: $R(t) = s(t) - H(t)$

  18. HNM analysis of a frame • Extraction of amplitudes → Possibility of improving harmonic estimation

  19. HNM analysis of a frame. AR filter estimation for the residual: $R(t) = (B_g * F)(t)$, where $B_g$ = Gaussian white noise and $F(t)$ = AR filter, $F(z) = \dfrac{1}{a_0 + a_1 z^{-1} + \dots + a_N z^{-N}}$ → Linear prediction method
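
The slide names linear prediction without specifying the variant. The Python sketch below uses the autocorrelation (Yule-Walker) method as one common choice; this is an assumption, as are the function name `lpc_autocorrelation` and the normalisation $a_0 = 1$ with a separate gain.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Estimate [1, a_1, ..., a_N] and the excitation gain from a residual signal x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    acf = np.array([np.dot(x[:n - lag], x[lag:]) for lag in range(order + 1)]) / n
    # Yule-Walker equations: sum_i a_i r[|k - i|] = -r[k] for k = 1..N.
    R = np.array([[acf[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -acf[1:])
    gain = np.sqrt(acf[0] + np.dot(a, acf[1:]))   # standard deviation of the excitation
    return np.concatenate(([1.0], a)), gain

# Toy check: synthesise an AR(2) noise with known coefficients, then re-estimate them.
rng = np.random.default_rng(0)
true_a = np.array([1.0, -1.5, 0.7])
e = rng.standard_normal(20000)
r = np.zeros_like(e)
for n in range(len(e)):
    r[n] = e[n] - sum(true_a[i] * r[n - i] for i in (1, 2) if n - i >= 0)
a_est, gain = lpc_autocorrelation(r, order=2)
```

A Levinson-Durbin recursion solves the same Toeplitz system more efficiently; the direct solve is used here only for brevity.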

  20. HNM Synthesis • Interpolation for each harmonic between two successive frames: $H(t) = \sum_k \left[ a_k(t) \cos(2\pi k f_0(t)\, t) + b_k(t) \sin(2\pi k f_0(t)\, t) \right] = \sum_k A_k(t) \cos \varphi_k(t)$. $A_k(t_a)$ and $\varphi_k(t_a)$ are known at the analysis instants $t_a$; $\dot{\varphi}_k(t_a) = 2\pi k f_0(t_a)$ is known by pitch analysis

  21. HNM Synthesis. Erroneous pitch (usually $f_0/2$): • the harmonic correspondence problem is solved by introducing fictitious harmonics

  22. HNM Synthesis. For each term $A_k(t) \cos \varphi_k(t)$: amplitude $A_k(t)$ → linear interpolation; phase $\varphi_k(t)$ → unwrapping + cubic interpolation
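
One well-known way to combine unwrapping with cubic interpolation is the cubic phase polynomial used in sinusoidal modelling. The slides do not say which formula the toolbox uses, so the Python sketch below is only an assumed instance; the function name `cubic_phase_track` and the example values are hypothetical. The polynomial matches the phase and the angular frequency at both frame boundaries, with the integer `M` chosen to give the smoothest trajectory.

```python
import numpy as np

def cubic_phase_track(theta0, omega0, theta1, omega1, T, n):
    """Phase and instantaneous angular frequency on n points of [0, T]."""
    # Unwrapping: integer number of 2*pi turns giving the smoothest trajectory.
    M = np.round(((theta0 + omega0 * T - theta1) + (omega1 - omega0) * T / 2) / (2 * np.pi))
    D = theta1 + 2 * np.pi * M - theta0 - omega0 * T
    alpha = 3 * D / T**2 - (omega1 - omega0) / T
    beta = -2 * D / T**3 + (omega1 - omega0) / T**2
    t = np.linspace(0.0, T, n)
    phase = theta0 + omega0 * t + alpha * t**2 + beta * t**3
    freq = omega0 + 2 * alpha * t + 3 * beta * t**2
    return phase, freq

# One harmonic moving from 200 Hz to 210 Hz over a 10 ms frame interval.
theta0, omega0 = 0.3, 2 * np.pi * 200.0
theta1, omega1 = -1.0, 2 * np.pi * 210.0
phase, freq = cubic_phase_track(theta0, omega0, theta1, omega1, T=0.01, n=81)
amp = np.linspace(1.0, 0.8, 81)          # linear amplitude interpolation
harmonic = amp * np.cos(phase)
```

The end phase equals `theta1` only up to a multiple of $2\pi$, which is all that matters inside the cosine.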

  23. HNM Synthesis. Noisy part: • Generation of normally distributed random numbers • AR filtering (abrupt changes of coefficients between two windows have no effect…)

  24. HNM Synthesis Results (audio examples, each as original and synthesized): "wazi"; a-e-i-o-u; Tuba; "Carottes"; singing; "Lawyer"

  25. HNM Synthesis Results (audio examples, each as original and synthesized): "coiffe"; Discours; "aka aga"; Andie. Dussolier: original, synthesized, noisy part

  26. Synthesis with time-stretching. Synthesis instants ($t_s$) ≠ analysis instants ($t_a$). The following parameters remain unchanged: • Noisy part parameters • The pitch • The amplitudes $A_k$ of the harmonics

  27. Synthesis with time-stretching. Phase adaptation: • Simple resampling of the phase trajectories, or • "harmonic" rephasing. Examples (a-e-i-o-u): original; slow-motion with phase "stretching"; slow-motion with "harmonic" rephasing

  28. Final results. Audio examples, each given as the original and as synthesized with rates 1, 0.4, 0.5, 0.6, 0.7, 0.8, 1.2, 1.5 and 2: "carottes"; "lawyer"; tuba; "wazi"; singing; "a-e-i-o-u"; Dussolier; Discours; Andie; "aka aga"; "coiffe"

  29. Conclusions • Good results, showing the method’s potential for different applications, including TTS • Future work will include other kinds of modifications (pitch shifting, timbre, etc.)
