A Text-to-Speech Synthesis System

A Text-to-Speech Synthesis System. Presented By: Michael Beddaoui Abdel-Aziz El-Solh. Presentation Outline. Introduction Background 3 Components of TTS System Text Pre-processing Aziz Prosody Mike Concatenation Mike Summary What has been done / Future Work Conclusion Questions.

Presentation Transcript


  1. A Text-to-Speech Synthesis System Presented By: Michael Beddaoui and Abdel-Aziz El-Solh

  2. Presentation Outline • Introduction • Background • 3 Components of TTS System • Text Pre-processing (Aziz) • Prosody (Mike) • Concatenation (Mike) • Summary • What has been done / Future Work • Conclusion • Questions

  3. What is a TTS System? Definition: • A system which takes as input a sequence of words and converts them to speech Applications: • Services for the hearing impaired • Reading email aloud Commercial TTS Systems: • Festival • Bell Labs TTS

  4. Different TTS Systems Phoneme-Based TTS System • Phonemes are: • The minimal distinctive phonetic units • Relatively small in number (39 phonemes in English) Disadvantage: Phonemes ignore transitional sounds!

  5. Different TTS Systems (cont’d) Diphone-Based TTS System • Diphones are: • Made up of 2 phonemes • Incorporate transitional sound • Make for better sounding speech Disadvantage: Over 1500 diphones in the English language !!!
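
To make the phoneme/diphone distinction concrete, here is a minimal sketch that pairs each phoneme with its successor. The underscore naming follows the ‘C_A’ label used later in the slides; the function name and the ARPAbet-style symbols in the example are assumptions for illustration, not the authors' code. It also shows where a figure of "over 1500" can come from: 39 phonemes give at most 39 × 39 = 1521 ordered pairs.

```python
def phonemes_to_diphones(phonemes):
    """Pair each phoneme with the one that follows it.

    Hypothetical helper: the slides do not show how diphone symbols are built.
    """
    return [f"{first}_{second}" for first, second in zip(phonemes, phonemes[1:])]


# Upper bound on the diphone inventory for a 39-phoneme set:
# every ordered pair of phonemes.
MAX_DIPHONES = 39 * 39

if __name__ == "__main__":
    print(phonemes_to_diphones(["K", "AE", "T"]))  # ['K_AE', 'AE_T']
    print(MAX_DIPHONES)  # 1521
```

Each diphone spans the transition between two sounds, which is the information a phoneme-only inventory lacks.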

  6. Fundamental Components TTS System: words → Text Pre-processing → Prosody → Concatenation

  7. Text Pre-Processing • Input • String of characters (sentence) • Output • String of diphone symbols • Objective • Perform sentence-level analysis • Punctuation marks • Pauses between words • Convert all input to corresponding diphones

  8. Text Pre-Processing (Block Diagram) [current stage: Number Converter] Number Converter • Acronym Converter • Word Segmenter • Word to Diphone Translator (Phonetization) • MLDS • Diphone Dictionary

  9. Number Converter • Replace numerals with their textual versions: 100 → one hundred • Handle fractional and decimal numbers: 0.25 → point two five
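
The two rules on this slide can be sketched as follows. This is an illustrative version, not the authors' converter: the function names are hypothetical, the input is assumed to be a plain digit string with an optional decimal point, and a leading zero before the decimal point is dropped only because the slide's own example renders 0.25 as "point two five".

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n):
    """Spell out a non-negative integer below one million."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return int_to_words(thousands) + " thousand" + tail
    raise ValueError("number too large for this sketch")


def number_to_words(token):
    """Convert a numeral such as '100' or '0.25' to words."""
    whole, _, fraction = token.partition(".")
    words = []
    # Following the slide's example, a zero before the decimal point is silent.
    if whole and not (fraction and int(whole) == 0):
        words.append(int_to_words(int(whole)))
    if fraction:
        words.append("point")
        words.extend(ONES[int(digit)] for digit in fraction)  # digit by digit
    return " ".join(words)


if __name__ == "__main__":
    print(number_to_words("100"))   # one hundred
    print(number_to_words("0.25"))  # point two five
```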

  10. Text Pre-Processing (Block Diagram) [current stage: Acronym Converter] Number Converter • Acronym Converter • Word Segmenter • Word to Diphone Translator (Phonetization) • MLDS • Diphone Dictionary

  11. Acronym Converter • Replace acronyms with single letter components: A.B.C. → A B C • Change abbreviations to full textual format: Mr. → Mister
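
A minimal sketch of both replacements, assuming acronyms are written as capital letters each followed by a period. The abbreviation table is hypothetical apart from the "Mr." entry taken from the slide, and the function name is made up for illustration.

```python
import re

# Hypothetical lookup table; only "Mr." comes from the slide.
ABBREVIATIONS = {"Mr.": "Mister", "Dr.": "Doctor"}

# Two or more capital letters, each followed by a period, e.g. "A.B.C."
ACRONYM = re.compile(r"\b(?:[A-Z]\.){2,}")


def expand_acronyms(text):
    """Expand known abbreviations, then spell acronyms letter by letter."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # "A.B.C." -> "ABC" -> "A B C"
    return ACRONYM.sub(lambda m: " ".join(m.group(0).replace(".", "")), text)


if __name__ == "__main__":
    print(expand_acronyms("A.B.C."))     # A B C
    print(expand_acronyms("Mr. Smith"))  # Mister Smith
```

Note that this sketch also consumes the final period of an acronym, so a sentence-ending acronym loses its full stop; a fuller version would need to handle that case.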

  12. Text Pre-Processing (Block Diagram) [current stage: Word Segmenter] Number Converter • Acronym Converter • Word Segmenter • Word to Diphone Translator (Phonetization) • MLDS • Diphone Dictionary

  13. Word Segmenter • Divide sentence into word segments • Special delimiter to separate segments (i.e. ‘||’) • Segments can be: • A single word • An acronym • A numeral • Identify punctuation marks
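
One way to picture the segmenter is a tokenizer that labels each segment and then joins the segments with the ‘||’ delimiter. The regular expression, the segment labels, and the function names below are assumptions for this sketch; the slides do not specify how segments are recognized.

```python
import re

# Alternatives are tried left to right, so acronyms and numerals win over words.
TOKEN = re.compile(
    r"(?P<acronym>(?:[A-Z]\.){2,})"
    r"|(?P<numeral>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z']+)"
    r"|(?P<punctuation>[.,;:!?])"
)


def segment(sentence):
    """Return (text, kind) pairs for every segment found in the sentence."""
    return [(m.group(0), m.lastgroup) for m in TOKEN.finditer(sentence)]


def delimited(sentence, delimiter="||"):
    """Join the segments with the special delimiter."""
    return delimiter.join(text for text, _ in segment(sentence))


if __name__ == "__main__":
    print(segment("Call 911 now!"))
    print(delimited("The cat sat, then left."))  # The||cat||sat||,||then||left||.
```

Keeping punctuation as its own segment lets a later stage decide where pauses belong.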

  14. Text Pre-Processing (Block Diagram) [current stage: Word to Diphone Translator (Phonetization)] Number Converter • Acronym Converter • Word Segmenter • Word to Diphone Translator (Phonetization) • MLDS • Diphone Dictionary

  15. Word To Diphone Converter (Phonetization) • Purpose • Translate words to their diphone representations • Resource • Dictionary of words and their diphones (derived from CMU phoneme database) • Over 175,000 words supported

  16. W-to-D Converter Cont’d • Implementation • Binary Search Algorithm in C • Start with whole dictionary as search range • start index, end index, middle index • If target word is alphabetically less than middle word, • then ignore second half (i.e. end index = middle index) • else ignore first half (i.e. start index = middle index + 1) • Repeat until word found or range contains zero words
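
The search described on this slide can be sketched as below. The slides say the original was written in C; this Python version is only an illustration of the same steps. The three-entry dictionary and its diphone symbols are made up for the example, and the search range is treated as half-open (end index excluded), which is an assumption about the original.

```python
# Tiny illustrative dictionary, sorted alphabetically by word.
# The real resource is described as covering over 175,000 words.
DICTIONARY = [
    ("cat", ["K_AE", "AE_T"]),
    ("dog", ["D_AO", "AO_G"]),
    ("sat", ["S_AE", "AE_T"]),
]


def lookup(dictionary, target):
    """Binary search over a sorted list of (word, diphones) entries."""
    start, end = 0, len(dictionary)      # range is [start, end)
    while start < end:                   # stop when the range is empty
        middle = (start + end) // 2
        word, diphones = dictionary[middle]
        if word == target:
            return diphones
        if target < word:
            end = middle                 # ignore the second half
        else:
            start = middle + 1           # ignore the first half
    return None                          # word not in the dictionary


if __name__ == "__main__":
    print(lookup(DICTIONARY, "dog"))    # ['D_AO', 'AO_G']
    print(lookup(DICTIONARY, "zebra"))  # None
```

Moving the start index past the middle entry guarantees the range shrinks on every iteration, so the loop always terminates.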

  17. W-to-D Converter Cont’d • Advantages • Fast search times • Search range halves with each iteration, so 175,000 words need at most about 18 comparisons (max of 1 sec currently) • Less complicated to implement than • Indexing the dictionary or • Importing the dictionary into an internal structure

  18. Text Pre-Processing (Block Diagram) [current stage: MLDS] Number Converter • Acronym Converter • Word Segmenter • Word to Diphone Translator (Phonetization) • MLDS • Diphone Dictionary

  19. The Multi-Level Data Structure • Contains all necessary data for the next sub-system: • Word • Diphone representation • Prosodic parameters for each diphone • This reflects both word-level and sentence-level prosody • Allows for modularization
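
A possible shape for such a structure is sketched below: one entry per word, each holding its diphones, each diphone carrying its own prosodic parameters. All class and field names are hypothetical, and the choice of pitch, duration, and amplitude scale factors is an assumption based on the three quantities the later slides manipulate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    symbol: str
    pitch_scale: float = 1.0      # 1.0 means "leave unchanged"
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    word: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


def build_mlds(words_with_diphones):
    """Build the word level and the diphone level with neutral prosody."""
    return [
        WordEntry(word, [DiphoneEntry(symbol) for symbol in symbols])
        for word, symbols in words_with_diphones
    ]


def flatten(mlds):
    """Diphone sequence in speaking order, as the next sub-system would read it."""
    return [diphone for entry in mlds for diphone in entry.diphones]


if __name__ == "__main__":
    mlds = build_mlds([("cat", ["K_AE", "AE_T"])])
    mlds[0].diphones[0].pitch_scale = 1.1  # example word-level adjustment
    print([d.symbol for d in flatten(mlds)])  # ['K_AE', 'AE_T']
```

Because the structure is the only thing passed between stages, pre-processing and synthesis can be developed separately, which is the modularization the slide mentions.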

  20. Prosody [flowchart: MLDS • Diphone Retrieval (from Diphone Database) • Acoustic Manipulation • decision "done?" (yes / no) • Concatenation]

  21. Diphone Retrieval • Database of recorded diphones • Every diphone matched with txt file • Distinguished by type (CC, CV, VC, VV) • References to specific components within waveform • Store diphone waveform and prosodic parameters in variables
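
The four diphone types can be derived from whether each half is a consonant (C) or a vowel (V). The sketch below assumes ARPAbet-style phoneme symbols as used by the CMU dictionary and underscore-separated diphone names; neither detail is confirmed by the slides, and the function name is hypothetical.

```python
# The 15 vowel symbols of the ARPAbet set used by the CMU dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def diphone_type(symbol):
    """Classify a diphone such as 'K_AE' as CC, CV, VC, or VV."""
    first, second = symbol.split("_")
    return ("V" if first in VOWELS else "C") + ("V" if second in VOWELS else "C")


if __name__ == "__main__":
    for name in ["K_AE", "AE_T", "S_T", "IY_AE"]:
        print(name, diphone_type(name))
```

The type matters downstream because the slides alter pitch on vowels only, so the vowel half of a CV or VC diphone is the part that receives pitch marks.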

  22. Properties of Speech Signals e.g. cat.wav: c (non-periodic) • a (periodic) • t (non-periodic)

  23. Acoustic Manipulation - MATLAB • Recognizes wave files (.WAV) • load, play, write • Vast array of signal processing tools • Built-in functions • Ease of debugging • GUI-capable

  24. Pitch/Duration/Amplitude Alteration • As pitch increases, pitch period shrinks • As pitch decreases, pitch period expands • Need to alter length between pitch marks in order to alter pitch of speech signal Pitch – vowels only

  25. Altering Pitch [figure: extracted pitch period (from original diphone ‘C_A’) × Hanning window = Hanned pitch period]

  26. Altering Pitch Cont’d PSOLA – Pitch Synchronous Overlap and Add • 50% overlap + add • Pitch up: overlap > 50% • Pitch down: overlap < 50%
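
The overlap-add step can be sketched in a few lines. This is a simplified illustration, not the authors' MATLAB code: it repeats one windowed segment instead of extracting a segment at every pitch mark, it omits pitch-mark detection, and it assumes the segment spans two pitch periods so that 50% overlap reproduces the original spacing. All function names are hypothetical.

```python
import math


def hann(n):
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add(segment, copies, overlap):
    """Overlap-add `copies` Hann-windowed copies of `segment`.

    `overlap` is the fraction of the segment shared by neighbouring copies:
    0.5 keeps the spacing, above 0.5 packs copies closer (pitch up),
    below 0.5 spreads them apart (pitch down).
    """
    if not 0.0 <= overlap < 1.0 or copies < 1:
        raise ValueError("need 0 <= overlap < 1 and copies >= 1")
    n = len(segment)
    windowed = [s * w for s, w in zip(segment, hann(n))]
    hop = max(1, round(n * (1.0 - overlap)))   # spacing between copies
    out = [0.0] * (hop * (copies - 1) + n)
    for k in range(copies):
        for i, value in enumerate(windowed):
            out[k * hop + i] += value
    return out


def pitch_hz(sample_rate, hop):
    """Pitch implied by a spacing of `hop` samples between copies."""
    return sample_rate / hop


if __name__ == "__main__":
    segment = [math.sin(2.0 * math.pi * i / 100.0) for i in range(200)]
    print(len(overlap_add(segment, 5, 0.50)))  # 600
    print(len(overlap_add(segment, 5, 0.75)))  # 400
    print(pitch_hz(16000, 100), pitch_hz(16000, 50))  # 160.0 320.0
```

Raising the overlap from 50% to 75% halves the spacing between copies, which doubles the pitch in this example while also shortening the output.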

  27. Altering Pitch Cont’d [figure: signal × Kaiser window, × 12] • Naturally spoken vowels contain 12-18 pitch marks

  28. Altering Duration • Increase number of PSOLA iterations (overlaps) to increase duration • Decrease number of PSOLA iterations (overlaps) to decrease duration Altering Amplitude • Multiply the signal by a constant • If constant > 1, amplitude increases • If constant < 1, amplitude decreases
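
Both adjustments reduce to simple arithmetic, sketched here under stated assumptions: each PSOLA iteration is taken to add one windowed segment placed a fixed number of samples (the hop) after the previous one, and the sample values are plain floats. The function names and the example numbers are hypothetical.

```python
def scale_amplitude(samples, constant):
    """Multiply every sample by a constant (>1 louder, <1 quieter)."""
    return [constant * s for s in samples]


def output_length(segment_len, hop, iterations):
    """Samples produced by overlap-adding `iterations` segments `hop` apart."""
    return hop * (iterations - 1) + segment_len


if __name__ == "__main__":
    print(scale_amplitude([0.5, -0.25], 2.0))  # [1.0, -0.5]
    # 12 versus 18 iterations, the range the slides give for natural vowels:
    print(output_length(200, 100, 12))  # 1300
    print(output_length(200, 100, 18))  # 1900
```

With the hop held fixed, the iteration count changes duration without changing pitch, which is why the two can be controlled independently.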

  29. Concatenation Diphones → Words • Using PSOLA at the joining ends • Ensures smooth transition Words → Sentence • Straight joining at the end points due to presence of pauses
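
The two joining strategies can be contrasted with a short sketch. Note the substitution: a linear crossfade stands in for the PSOLA-based smoothing at diphone boundaries, which the slides do not describe in enough detail to reproduce. The function names, the overlap length, and the pause length are all assumptions.

```python
def crossfade_join(left, right, overlap):
    """Join two sample lists, blending `overlap` samples at the boundary.

    Stand-in for PSOLA smoothing; requires overlap <= len(left), len(right).
    """
    if overlap == 0:
        return left + right
    head, tail = left[:-overlap], left[-overlap:]
    blended = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)   # ramps from `left` towards `right`
        blended.append((1.0 - weight) * tail[i] + weight * right[i])
    return head + blended + right[overlap:]


def join_words(words, pause_samples):
    """Straight joining: words are separated by silence, so no smoothing."""
    sentence = []
    for index, word in enumerate(words):
        if index > 0:
            sentence.extend([0.0] * pause_samples)
        sentence.extend(word)
    return sentence


if __name__ == "__main__":
    word = crossfade_join([1.0] * 4, [1.0] * 4, 2)
    print(len(word))  # 6
    print(join_words([[1.0, 1.0], [2.0]], 3))  # [1.0, 1.0, 0.0, 0.0, 0.0, 2.0]
```

Smoothing matters inside a word because adjacent diphones meet mid-sound; between words the pause hides any discontinuity.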

  30. Summary TTS System: words → Text Pre-processing → Prosody → Concatenation • System modularized

  31. Progress • Work Completed / Current Status • Text pre-processing and prosodic manipulation for a multi-syllable word • Diphone concatenation • 200+ diphones in database • Fully functional GUI implemented • Work To Be Done • Sentence level synthesis • Expand diphone database • Fine-tuning and enhancing • Prepare for Poster Fair • Write final report

  32. Questions? Contact Information Michael Beddaoui mich121212@hotmail.com Abdel-Aziz El-Solh zizo01@hotmail.com
