
Speaking Style Conversion

Dr. Elizabeth Godoy. Speech Processing Guest Lecture, December 11, 2012.


Presentation Transcript


  1. Speaking Style Conversion. Dr. Elizabeth Godoy, Speech Processing Guest Lecture, December 11, 2012

  2. Apply voice conversion (VC) principles to a different problem… E.Godoy, Speaking Style Conversion

  3. Speech Intelligibility Context • Speech is often heard in adverse conditions • Noisy environments • Listener has difficulty hearing/understanding • How can speech be transformed to make it more intelligible? • To make speech synthesis systems more effective • [Audio example, with and without noise: with environmental barriers, the speech is not very intelligible]

  4. Intelligible Speaking Styles • Lombard speech • Speaker is immersed in noise • Human reflex to increase the speech loudness • Clear speech • Listener faces barrier (noise, hearing, language,…) • Speaker adapts strategy to increase speech clarity • [Audio samples: normal vs. Lombard; casual vs. clear]

  5. VC to improve speech intelligibility? • Voice Conversion • Modify speech to change the speaker identity • Learn transformation from source-to-target speaker • Speaking Style Conversion • Modify speech to improve intelligibility • Determine transformation from normal-to-intelligible style • Spectral Envelope: still very important!

  6. Overview: Analyses-to-Modifications • Acoustic analyses to identify (mainly spectral) characteristics of Lombard & Clear styles • Average Spectra • Vowel Spaces • Results of the analyses inspire spectral modifications to improve intelligibility • Spectral energy band boosting (corrective filters) • Formant shifting (frequency warping)

  7. Corpora • Lombard-normal: Grid • 8 speakers (4 male, 4 female) • 50 sentences each • LombardNinf96: most extreme (Lu & Cooke) • Clear-casual: LUCID read sentences • 8 speakers (4 male, 4 female) • 50 sentences each • Read speech: most exaggerated (Baker & Hazan)

  8. Average Relative Spectra • Recall Amplitude Scaling in DFWA • The average relative spectrum is similar: • Difference between the normal (X) and intelligible (Y) style spectra • Averaged across all frames
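
The slide's equation did not survive in the transcript, so the following is only a minimal sketch of one plausible reading: take the mean log-magnitude spectrum of each style over its frames and subtract normal (X) from intelligible (Y). The function names, the dB scale, and the toy magnitudes are assumptions for illustration, not the lecture's definition.

```python
import math

def mean_log_spectrum(frames, eps=1e-12):
    """Per-bin mean of 20*log10(magnitude) over a list of frames.

    Each frame is a list of linear magnitudes, one per frequency bin.
    eps guards against log10(0).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_bins = len(frames[0])
    totals = [0.0] * n_bins
    for frame in frames:
        for k in range(n_bins):
            totals[k] += 20.0 * math.log10(frame[k] + eps)
    return [t / len(frames) for t in totals]

def average_relative_spectrum(normal_frames, intelligible_frames):
    """Mean log spectrum of style Y minus mean log spectrum of style X (dB)."""
    x_mean = mean_log_spectrum(normal_frames)
    y_mean = mean_log_spectrum(intelligible_frames)
    return [y - x for x, y in zip(x_mean, y_mean)]

# Toy magnitudes (made up): bin 0 is 10x larger in the intelligible style.
normal = [[1.0, 1.0], [1.0, 1.0]]
intelligible = [[10.0, 1.0], [10.0, 1.0], [10.0, 1.0]]
relative_db = average_relative_spectrum(normal, intelligible)
```

Averaging each style separately means the two recordings do not need to be time-aligned or to have the same number of frames; a frame-paired difference would need alignment first.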

  9. Average Relative Spectra (by Speaker) • [Figure panels: Clear-casual; Lombard-normal]

  10. Average Relative Spectra (Overall) • Lombard speech: Spectral energy boosting “where formants are” (~500-4500 Hz) • Clear speech: Varies depending on speaker strategy; differences are mild overall

  11. Vowel Spaces (average for all speakers) • Lombard speech: Vowel Space Translation • Clear speech: Vowel Space Expansion
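
To make the translation/expansion distinction concrete, here is a small sketch that treats a vowel space as a polygon of (F1, F2) vowel centroids and compares areas. The formant values, the shift, and the 1.2 scale factor are hypothetical numbers chosen for illustration, not measurements from the lecture.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def centroid(points):
    """Mean of the vertices."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def translate(points, d_f1, d_f2):
    """Shift every vowel by the same (F1, F2) offset: area is unchanged."""
    return [(f1 + d_f1, f2 + d_f2) for f1, f2 in points]

def expand(points, factor):
    """Scale every vowel away from the centroid: area grows by factor**2."""
    c1, c2 = centroid(points)
    return [(c1 + factor * (f1 - c1), c2 + factor * (f2 - c2)) for f1, f2 in points]

# Hypothetical (F1, F2) centroids in Hz for /i/, /a/, /u/.
baseline = [(300.0, 2300.0), (750.0, 1300.0), (350.0, 800.0)]
translated = translate(baseline, 50.0, 100.0)   # Lombard-like: space moves
expanded = expand(baseline, 1.2)                # clear-like: space grows

translation_ratio = polygon_area(translated) / polygon_area(baseline)
expansion_ratio = polygon_area(expanded) / polygon_area(baseline)
```

Under these assumptions the translated space keeps its area while the expanded one grows, which is one simple way to quantify the two patterns named on the slide.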

  12. Inspiration for Speech Modifications • Spectral energy band boosting (Lombard) • Vowel space expansion (Clear) • Features associated with increased speech intelligibility • Though not observed together in human speech production… • Signal processing algorithms can accomplish both!

  13. Spectral Energy Band Boosting • Corrective filters: Lombard-inspired & enhanced (high SII) • [Figure: corrective filter with varying gain]
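
The filter shapes themselves are not recoverable from the transcript. As a rough sketch of the idea only, the code below applies a flat gain to a magnitude spectrum inside one band, using the ~500-4500 Hz region from the Lombard analysis as the default. The flat shape, the 6 dB default, and the function names are assumptions; the lecture's filters have a varying gain.

```python
def band_boost_gains(freqs_hz, low_hz=500.0, high_hz=4500.0, gain_db=6.0):
    """Linear gain per bin: 10**(gain_db/20) inside [low_hz, high_hz], 1 elsewhere."""
    boost = 10.0 ** (gain_db / 20.0)
    return [boost if low_hz <= f <= high_hz else 1.0 for f in freqs_hz]

def apply_corrective_filter(magnitudes, gains):
    """Multiply each magnitude bin by its gain."""
    if len(magnitudes) != len(gains):
        raise ValueError("magnitudes and gains must have the same length")
    return [m * g for m, g in zip(magnitudes, gains)]

# Toy spectrum (made up): three bins at 0 Hz, 1 kHz, and 5 kHz.
freqs = [0.0, 1000.0, 5000.0]
spectrum = [1.0, 1.0, 1.0]
gains = band_boost_gains(freqs, gain_db=20.0)
boosted = apply_corrective_filter(spectrum, gains)
```

A gain that varies with frequency would replace the single `boost` value with a per-bin curve; whether the output is then renormalized to the original energy is a separate design choice that this sketch leaves open.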

  14. Frequency Warping for Vowel Space (VS) Expansion • Curve fitting of the formant shifts inspires the warping function…
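
The fitted warping curve is not given in the transcript. The sketch below swaps in a generic piecewise-linear warp defined by anchor frequencies, which shows the mechanics of moving spectral-envelope peaks; the anchor values and function names are hypothetical, and both anchor lists must be strictly increasing.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x, clamped at the ends.

    xs must be strictly increasing.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def warp_envelope(freqs_hz, envelope, src_anchors, dst_anchors):
    """Move envelope content from src_anchors to dst_anchors.

    For each output frequency, find the source frequency that maps to it
    (inverse warp) and read the original envelope there.
    """
    warped = []
    for f in freqs_hz:
        f_src = interp(f, dst_anchors, src_anchors)
        warped.append(interp(f_src, freqs_hz, envelope))
    return warped

# Toy envelope (made up) with a single peak at 1 kHz, shifted to 2 kHz.
freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
envelope = [0.0, 1.0, 0.0, 0.0, 0.0]
warped = warp_envelope(freqs, envelope,
                       src_anchors=[0.0, 1000.0, 4000.0],
                       dst_anchors=[0.0, 2000.0, 4000.0])
```

Keeping the first and last anchors fixed preserves the band edges, so only the region between them is stretched or compressed.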

  15. Sound Samples • With noise (SSN, 0 dB): Original, Warp, Boost, BW • No noise: Original, WarpE, Boost, BW

  16. Want more? • See Maria’s presentation for more details…

  17. Voice & Speaking Style Conversion Parallels • Voice Conversion • Dynamic Frequency Warping + Amplitude Scaling (based on acoustic-phonetic spaces of source & target speakers) • Speaking Style Conversion • Frequency Warping + Corrective Filter • Clear-speech inspired frequency warping for vowel space expansion • Lombard-speech inspired corrective filters to increase loudness

  18. Thank you! More Questions?

  19. Extras…

  20. Objective Metrics for Evaluation • Loudness • Energy in frequency bands weighted based on human hearing • Speech Intelligibility Index (SII) • Energy & modulations in frequency bands relative to a noise masker
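
As a heavily simplified illustration of the band-audibility idea behind the SII, the sketch below maps each band's speech-to-noise ratio to an audibility between 0 and 1 (linear from -15 dB to +15 dB) and sums the audibilities weighted by band importance. It ignores masking spread, level distortion, and the modulation handling mentioned on the slide, and the SNR values and importance weights are made up, so it should not be read as the standard's procedure or as the extended SII used in the lecture.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to [0, 1]: 0 at -15 dB or below, 1 at +15 dB or above."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def simplified_sii(band_snrs_db, band_importance):
    """Importance-weighted sum of band audibilities.

    band_importance is assumed to sum to 1 so the result stays in [0, 1].
    """
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("one importance weight is required per band")
    return sum(w * band_audibility(snr)
               for snr, w in zip(band_snrs_db, band_importance))

# Hypothetical three-band example: fully audible, fully masked, half audible.
snrs = [15.0, -15.0, 0.0]
weights = [0.5, 0.25, 0.25]
index = simplified_sii(snrs, weights)
```

In this toy setting, boosting speech energy in a band raises that band's SNR and therefore the index, which is consistent with a boost-based style scoring higher on such a metric than a style that mainly shifts formants.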

  21. Loudness Distributions • Lombard speech: “louder” for voiced (bi-modal) • Clear speech: not “louder” than casual speech • Transients: neither style distinguishes on average

  22. Extended SII Distributions • extSII highly correlated with average loudness • Lombard speech objectively more intelligible • Clear speech intelligibility gain not captured by extSII • Limitations of objective intelligibility metrics

  23. Observations from Analyses • Lombard Speech • Spectral boosting in inclusive formant region • Increase in Loudness (also extSII) • Vowel space translation, but no expansion • Clear Speech • Small changes in average spectra (slight spectral “flattening”) • Consistent vowel space expansion • Greater vowel discrimination • Comparison between styles • Acoustic differences • translate into perceptual distinctions • linked to intelligibility gains • Spectral boosting & Vowel space expansion: mutually exclusive
