5-Text To Speech (TTS) Speech Synthesis



Presentation Transcript


  1. 5-Text To Speech (TTS) Speech Synthesis • Speech Synthesis Concept • Phone Units • Phone Sequence To Speech • Speech Naturalness • Concatenative Approaches • Rule-Based Approaches

  2. Speech Synthesis Concept • Text → Text to Phone Sequence (Natural Language Processing, NLP) → Phone Sequence to Speech (Speech Processing) → Speech

  3. Phone Units • Paragraph ( ) • Sentence ( ) • Word (number depends on the language; usually more than 100,000) • Syllable • Diphone & Triphone • Phoneme (between 10 and 100)

  4. Phone Units (Cont’d) • Diphone: models the transition between two phonemes. [Figure: phoneme sequence p1, p2, p3, p4, p5 with the "Phoneme" and "Diphone" units marked]
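A minimal sketch of how a phoneme sequence maps to diphone units, assuming each diphone is simply the pair of two adjacent phonemes. The function name `to_diphones` and the optional silence symbol `"_"` are illustrative assumptions, not taken from the slides.

```python
def to_diphones(phonemes, pad=None):
    """Return the transitions between adjacent phonemes as (left, right) pairs.

    `pad` is an assumed, optional silence symbol added at both ends so that
    the transitions into the first and out of the last phoneme are covered.
    """
    seq = list(phonemes)
    if pad is not None:
        seq = [pad] + seq + [pad]
    return list(zip(seq, seq[1:]))


if __name__ == "__main__":
    print(to_diphones(["p1", "p2", "p3", "p4", "p5"]))
```

A sequence of n phonemes yields n - 1 diphones without padding and n + 1 with it.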

  5. Phone Units (Cont’d) • Farsi has 30 phonemes, so in theory there are 30 × 30 = 900 diphones. • In practice, the only diphone that does not occur in Farsi is /zho/. • In theory there are 30 × 30 × 30 = 27,000 triphones, but in practice only about 15,000 occur in Farsi.
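The counts above follow from simple combinatorics, which the short sketch below reproduces. The helper name `theoretical_units` is an assumption for illustration; the 30 phonemes and the roughly 15,000 observed triphones are the figures quoted on the slide.

```python
def theoretical_units(n_phonemes, span):
    """Number of possible units made of `span` consecutive phonemes."""
    return n_phonemes ** span


if __name__ == "__main__":
    n = 30                                  # Farsi phonemes, per the slide
    diphones = theoretical_units(n, 2)      # 900
    triphones = theoretical_units(n, 3)     # 27000
    observed_triphones = 15000              # approximate figure from the slide
    print(diphones, triphones, observed_triphones / triphones)
```

Under these figures, a little over half of the theoretical triphones actually occur.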

  6. Phone Units (Cont’d) • Syllable = Onset (Consonant) + Rhyme • A syllable is a set of phonemes that contains exactly one vowel • Syllables in Farsi: CV, CVC, CVCC • Farsi has about 4,000 syllables • Syllables in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, . . . • The number of syllables in English is very large
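The syllable patterns can be checked mechanically. The sketch below maps a phoneme sequence to its consonant/vowel pattern and tests it against the three Farsi patterns listed on the slide. The vowel set and the transliterated phoneme symbols are hypothetical placeholders, not a real Farsi phoneme inventory.

```python
FARSI_PATTERNS = {"CV", "CVC", "CVCC"}      # from the slide
VOWELS = {"a", "e", "o", "i", "u", "aa"}    # hypothetical vowel symbols


def cv_pattern(phonemes, vowels=VOWELS):
    """Map each phoneme to 'V' (vowel) or 'C' (consonant)."""
    return "".join("V" if p in vowels else "C" for p in phonemes)


def is_farsi_syllable(phonemes, vowels=VOWELS):
    """True if the sequence matches one of the listed Farsi patterns."""
    return cv_pattern(phonemes, vowels) in FARSI_PATTERNS


if __name__ == "__main__":
    print(cv_pattern(["d", "a", "s", "t"]), is_farsi_syllable(["d", "a", "s", "t"]))
    print(cv_pattern(["s", "t", "a"]), is_farsi_syllable(["s", "t", "a"]))
```

Every accepted pattern contains exactly one V, which matches the one-vowel definition of a syllable given above.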

  7. Phone Sequence To Speech • Concatenative approaches: a trade-off among naturalness, memory usage, and the variety of desired functions • Rule-based approaches: the most important rule-based approach is the Klatt method

  8. Phone Sequence To Speech (Cont’d) • Text → Text to Phone Sequence (NLP) → Phone Sequence to Primitive Utterance → Primitive Utterance to Natural Speech → Speech • The two stages after Text to Phone Sequence make up Speech Processing

  9. Speech Naturalness • Elimination of undesirable noise, distortion, and discontinuities in the speech • Prosody generation: speech energy, duration, pitch, intonation, stress

  10. Speech Naturalness (Cont’d) • Intonation and stress strongly affect speech naturalness • Intonation: variation of the pitch frequency over the course of an utterance • Stress: a rise in the pitch frequency at a specific time
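One way to picture these two definitions is as a pitch (F0) contour: a slowly varying baseline for intonation plus a local rise for stress. The sketch below is only an illustration under that assumption. The linear baseline, the triangular stress bump, and all parameter names and values are hypothetical choices, not a model given in the slides.

```python
def f0_contour(n_frames, start_hz, end_hz,
               stress_frame=None, stress_boost_hz=0.0, stress_width=5):
    """Per-frame pitch values: linear intonation plus an optional stress bump."""
    contour = []
    for i in range(n_frames):
        # Intonation: pitch varies (here linearly) along the utterance.
        frac = i / (n_frames - 1) if n_frames > 1 else 0.0
        f0 = start_hz + (end_hz - start_hz) * frac
        # Stress: pitch is raised around one specific frame.
        if stress_frame is not None:
            distance = abs(i - stress_frame)
            if distance < stress_width:
                f0 += stress_boost_hz * (1.0 - distance / stress_width)
        contour.append(f0)
    return contour


if __name__ == "__main__":
    print(f0_contour(11, 120.0, 100.0, stress_frame=5,
                     stress_boost_hz=20.0, stress_width=3))
```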

  11. Concatenative Approaches • In these approaches, units of natural speech are stored and used to reconstruct the desired speech • The appropriate phone unit can be selected for speech synthesis • Compressed parameters can be stored instead of the main waveform

  12. Concatenative Approaches (Cont’d) • Benefits of storing compressed parameters instead of the main waveform: • Less memory use • A general representation instead of a specific stored utterance • Easier prosody generation

  13. Concatenative Approaches (Cont’d) Phone Unit: Type of Storing • Paragraph: Main Waveform • Sentence: Main Waveform • Word: Main Waveform • Syllable: Coded/Main Waveform • Diphone: Coded Waveform • Phoneme: Coded Waveform

  14. Concatenative Approaches (Cont’d) • Pitch Synchronous Overlap-Add (PSOLA) is a well-known method for smoothing phoneme transitions • Overlap-add is a standard DSP method • PSOLA is a basic operation in voice conversion • In the analysis stage of this method, frames are selected synchronously with the pitch markers
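The idea of pitch-synchronous overlap-add can be sketched in a heavily simplified form. The code below assumes a constant, known pitch period, places pitch markers one period apart, cuts Hann-windowed frames two periods long around each marker, and overlap-adds them at a new spacing. Real PSOLA estimates the markers from the signal and handles varying pitch; the function names and the normalization step here are illustrative assumptions.

```python
import math


def hann(length):
    """Symmetric Hann window (length is expected to be odd)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def psola_constant_pitch(signal, period, factor):
    """Scale the pitch of a constant-period signal by `factor` (toy sketch)."""
    window = hann(2 * period + 1)
    # Analysis stage: markers assumed to sit exactly one period apart.
    analysis_marks = list(range(period, len(signal) - period, period))
    new_period = max(1, round(period / factor))
    output = [0.0] * len(signal)
    weight = [0.0] * len(signal)
    synth_mark = period
    while synth_mark < len(signal) - period:
        # Reuse the analysis frame whose marker is closest in time.
        nearest = min(analysis_marks, key=lambda m: abs(m - synth_mark))
        for k in range(-period, period + 1):
            w = window[k + period]
            output[synth_mark + k] += signal[nearest + k] * w
            weight[synth_mark + k] += w
        synth_mark += new_period
    # Normalize by the summed window weight to keep the amplitude stable.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(output, weight)]


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * n / 50) for n in range(500)]
    y = psola_constant_pitch(x, period=50, factor=2.0)
    print(len(y))
```

With `factor=1.0` the frames are added back at their original positions, so the interior of the signal is reconstructed unchanged; other factors change the spacing between frames and therefore the pitch period of the output.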

  15. Rule-Based Approach Stages • Determine the speech model and its parameters • Determine the type of phone units • Determine the parameter values for each phone unit • Substitute the sequence of phone units with its equivalent parameter sequence • Feed the parameter sequence into the speech model
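The stages above can be sketched end to end with a toy formant model. This is an illustration under stated assumptions, not the Klatt synthesizer itself: the speech model is a cascade of the second-order digital resonators commonly used in Klatt-style formant synthesis, the source is a plain impulse train, and the per-phone formant frequencies and bandwidths in `PHONE_PARAMS` are illustrative placeholder values that do not come from the slides.

```python
import math


def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Second-order resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c                      # gives unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Stage 3: parameter values per phone unit, as (formant Hz, bandwidth Hz).
# Hypothetical placeholder values for two vowel-like units.
PHONE_PARAMS = {
    "a": [(730.0, 90.0), (1090.0, 110.0)],
    "i": [(270.0, 60.0), (2290.0, 100.0)],
}


def impulse_train(n_samples, f0_hz, sample_rate):
    """Very simple voicing source: one impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def synthesize(phones, sample_rate=10000, f0_hz=100.0, duration_s=0.1):
    """Stages 4-5: replace phones by parameters and run the speech model."""
    out = []
    n = round(duration_s * sample_rate)
    for phone in phones:
        frame = impulse_train(n, f0_hz, sample_rate)
        for freq, bw in PHONE_PARAMS[phone]:
            frame = resonate(frame, freq, bw, sample_rate)  # cascade of formants
        out.extend(frame)   # simplification: filter state resets per phone
    return out


if __name__ == "__main__":
    print(len(synthesize(["a", "i"])))
```

Resetting the filter state at each phone boundary keeps the sketch short but would produce audible discontinuities; a fuller implementation would interpolate the parameters between units.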

  16. KLATT 80 Model

  17. KLATT 88 Model

  18. The KLSYN88 Cascade/Parallel Formant Synthesizer [Block diagram. Labeled blocks: glottal sound sources; laryngeal sound sources; cascade vocal tract model; parallel vocal tract model, laryngeal sound sources (normally not used); parallel vocal tract model, frication sound sources; bypass path. Parameter labels in the diagram include F0, AV, AH, AF, AB, TL, F1 to F6, B1 to B5, FNP, FNZ, FTP, FTZ, BNP, BNZ, BTP, BTZ, DF1, DB1, A2F to A6F, B2F to B6F, ANV, ATV, and A1V to A4V.]

  19. Three Voicing Source Models in KLATT 88 • The old KLSYN impulsive source • The KLGLOTT88 model • The modified LF model
