A Text-to-Speech Synthesis System

Presented By: Michael Beddaoui, Abdel-Aziz El-Solh

Presentation Outline
• Introduction
• Background
• 3 Components of TTS System
  • Text Pre-processing (Aziz)
  • Prosody (Mike)
  • Concatenation (Mike)
• Summary
• What has been done / Future Work
• Conclusion
• Questions

What is a TTS System?
Definition:
• A system which takes a sequence of words as input and converts it to speech
Applications:
• Services for the hearing impaired
• Reading email aloud
Existing TTS Systems:
• Festival
• Bell Labs TTS

Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
  • The minimal distinctive phonetic units
  • Relatively small in number (39 phonemes in English)
Disadvantage: Phonemes ignore the transitional sound between neighbouring units!!!

Different TTS Systems (cont’d)
Diphone-Based TTS System
• Diphones are:
  • Made up of 2 phonemes
  • Incorporate transitional sound
  • Make for better sounding speech
Disadvantage: Over 1500 diphones in the English language!!!
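As a rough illustration of the phoneme/diphone relationship, the sketch below pairs each phoneme with its successor. The phoneme labels, the underscore naming, and the function name are assumptions for illustration, not the presenters' format. It also shows that 39 phonemes give 39 × 39 = 1521 ordered pairs, which is in line with the "over 1500" figure, although not every pair need occur in practice.

```python
def to_diphones(phonemes, sep="_"):
    """Pair each phoneme with its successor so every unit spans one transition."""
    return [f"{a}{sep}{b}" for a, b in zip(phonemes, phonemes[1:])]


# Hypothetical CMU-style phoneme labels for the word "cat".
cat = ["K", "AE", "T"]
cat_diphones = to_diphones(cat)

# Upper bound on ordered phoneme pairs for a 39-phoneme inventory.
max_pairs = 39 * 39

print(cat_diphones, max_pairs)
```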

Fundamental Components
TTS System: words → Text Pre-processing → Prosody → Concatenation

Text Pre-Processing
• Input
  • String of characters (sentence)
• Output
  • String of diphone symbols
• Objective
  • Perform sentence level analysis
    • Punctuation marks
    • Pauses between words
  • Convert all input to corresponding diphones

Text Pre-Processing (Block Diagram)
Number Converter → Acronym Converter → Word Segmenter → Word to Diphone Translator (Phonetization, uses Diphone Dictionary) → MLDS
Current stage: Number Converter

Number Converter
• Replace numerals with their textual versions
  • 100 → one hundred
• Handle fractional and decimal numbers
  • 0.25 → point two five
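A minimal sketch of such a converter, assuming whole parts up to 999,999 and the slide's convention of dropping a leading zero before the decimal point. The function names are hypothetical and the original implementation is not shown in the slides.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n):
    """Spell out an integer in the range 0..999999."""
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only covers 0..999999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + int_to_words(rest)
    return int_to_words(thousands) + " thousand" + tail


def number_to_words(token):
    """Convert a numeral string such as '100' or '0.25' to words."""
    whole, _, frac = token.partition(".")
    words = []
    # Speak the whole part unless it is a bare zero in front of a fraction.
    if (whole and int(whole) != 0) or not frac:
        words.append(int_to_words(int(whole or "0")))
    if frac:
        words.append("point")
        words.extend(ONES[int(digit)] for digit in frac)  # digit by digit
    return " ".join(words)


print(number_to_words("100"))
print(number_to_words("0.25"))
```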

Text Pre-Processing (Block Diagram)
Number Converter → Acronym Converter → Word Segmenter → Word to Diphone Translator (Phonetization, uses Diphone Dictionary) → MLDS
Current stage: Acronym Converter

Acronym Converter
• Replace acronyms with single letter components
  • A.B.C. → A B C
• Change abbreviations to full textual format
  • Mr. → Mister
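One possible way to sketch both steps with regular expressions. The abbreviation table is a hypothetical stand-in (only "Mr." comes from the slide), and the plain string replacement is deliberately naive.

```python
import re

# Hypothetical abbreviation table; only "Mr." appears in the slides.
ABBREVIATIONS = {"Mr.": "Mister", "Dr.": "Doctor"}

# Two or more single capital letters, each followed by a period, e.g. "A.B.C."
ACRONYM = re.compile(r"\b(?:[A-Z]\.){2,}")


def expand_acronyms(text):
    # Expand known abbreviations first: "Mr." -> "Mister".
    for abbreviation, full in ABBREVIATIONS.items():
        text = text.replace(abbreviation, full)
    # Split dotted acronyms into their letters: "A.B.C." -> "A B C".
    return ACRONYM.sub(
        lambda m: " ".join(m.group(0).replace(".", " ").split()), text
    )


print(expand_acronyms("Mr. Smith joined A.B.C. today"))
```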

Text Pre-Processing (Block Diagram)
Number Converter → Acronym Converter → Word Segmenter → Word to Diphone Translator (Phonetization, uses Diphone Dictionary) → MLDS
Current stage: Word Segmenter

Word Segmenter
• Divide sentence into word segments
• Special delimiter to separate segments (i.e. ‘||’)
• Segments can be:
  • A single word
  • An acronym
  • A numeral
• Identify punctuation marks
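A small sketch of segmentation with the '||' delimiter. The tokenising pattern and the punctuation set are assumptions; the slides do not say how segments are detected.

```python
import re

# A token is a run of word characters (internal dots or apostrophes allowed,
# so "0.25" and "don't" stay whole) or a single non-space symbol.
TOKEN = re.compile(r"\w+(?:[.']\w+)*|[^\w\s]")
PUNCTUATION = set(".,;:!?")


def segment(sentence, delimiter="||"):
    """Return the delimiter-joined word segments and the punctuation marks found."""
    tokens = TOKEN.findall(sentence)
    words = [t for t in tokens if t not in PUNCTUATION]
    marks = [t for t in tokens if t in PUNCTUATION]
    return delimiter.join(words), marks


print(segment("Hello, world. It costs 0.25!"))
```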

Text Pre-Processing (Block Diagram)
Number Converter → Acronym Converter → Word Segmenter → Word to Diphone Translator (Phonetization, uses Diphone Dictionary) → MLDS
Current stage: Word to Diphone Translator (Phonetization)

Word To Diphone Converter (Phonetization)
• Purpose
  • Translate words to their diphone representations
• Resource
  • Dictionary of words and their diphones (derived from CMU phoneme database)
  • Over 175,000 words supported

W-to-D Converter Cont’d
• Implementation
  • Binary Search Algorithm in C
  • Start with whole dictionary as search range
    • start index, end index, middle index
  • If target word is alphabetically less than middle word,
    • then ignore second half (i.e. end index = middle index)
    • else ignore first half (i.e. start index = middle index)
  • Repeat until word found or range contains zero words
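The search above, sketched in Python rather than the C of the original implementation. The dictionary excerpt and the diphone labels are hypothetical. This version uses a half-open range and moves the start index one past the middle word, a detail the slide does not spell out but which guarantees that the range shrinks on every step. For roughly 175,000 sorted words this needs at most 18 comparisons, since 2^18 = 262,144.

```python
def find_diphones(entries, target):
    """Binary search over (word, diphones) pairs sorted alphabetically by word."""
    start, end = 0, len(entries)      # half-open search range [start, end)
    while start < end:                # stop when the range holds zero words
        middle = (start + end) // 2
        word, diphones = entries[middle]
        if target == word:
            return diphones
        if target < word:
            end = middle              # ignore the second half
        else:
            start = middle + 1        # ignore the first half and the middle word
    return None


# Hypothetical excerpt of a sorted word-to-diphone dictionary.
DICTIONARY = [
    ("bat", ["B_AE", "AE_T"]),
    ("cat", ["K_AE", "AE_T"]),
    ("dog", ["D_AO", "AO_G"]),
    ("hat", ["HH_AE", "AE_T"]),
]

print(find_diphones(DICTIONARY, "cat"))
print(find_diphones(DICTIONARY, "cow"))
```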

W-to-D Converter Cont’d
• Advantages
  • Fast search times
    • Search range is halved with each iteration (maximum search time currently 1 sec)
  • Less complicated to implement
    • Compared to indexing the dictionary or
    • Importing the dictionary to an internal structure

Text Pre-Processing (Block Diagram)
Number Converter → Acronym Converter → Word Segmenter → Word to Diphone Translator (Phonetization, uses Diphone Dictionary) → MLDS
Current stage: MLDS

The Multi-Level Data Structure
• Contains all necessary data for the next sub-system:
  • Word
  • Diphone representation
  • Prosodic parameters for each diphone
    • This reflects both word-level and sentence-level prosody
• Allows for modularization
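One way such a structure could be laid out, as a sketch only. All class and field names are hypothetical; the three prosodic parameters are taken to be pitch, duration, and amplitude, the quantities altered later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    symbol: str                    # diphone label, e.g. "K_AE" (hypothetical)
    pitch_scale: float = 1.0       # prosodic parameters for this diphone
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    text: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class Sentence:
    words: List[WordEntry] = field(default_factory=list)
    punctuation: List[str] = field(default_factory=list)


sentence = Sentence(
    words=[WordEntry("cat", [DiphoneEntry("K_AE"),
                             DiphoneEntry("AE_T", pitch_scale=1.2)])],
    punctuation=["."],
)
print(sentence.words[0].diphones[1].pitch_scale)
```

Keeping the word, its diphones, and the per-diphone parameters in one object is what lets the prosody stage run without knowing how the text was pre-processed.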

Prosody
[Flowchart: MLDS, Diphone Retrieval (from Diphone Database), Acoustic Manipulation, "done?" decision (yes / no), Concatenation]

Diphone Retrieval
• Database of recorded diphones
• Every diphone matched with txt file
  • Distinguished by type (CC, CV, VC, VV)
  • References to specific components within waveform
• Store diphone waveform and prosodic parameters in variables

Properties of Speech Signals
e.g. cat.wav
• c: non-periodic
• a: periodic
• t: non-periodic

Acoustic Manipulation - MATLAB
• Recognizes wave files (.WAV)
  • load, play, write
• Vast array of signal processing tools
  • Built-in functions
• Ease of debugging
• GUI-capable

Pitch/Duration/Amplitude Alteration
• As pitch increases, pitch period shrinks
• As pitch decreases, pitch period expands
• Need to alter length between pitch marks in order to alter pitch of speech signal
• Pitch: vowels only
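The inverse relation between pitch and pitch period can be made concrete with a short calculation. The 16 kHz sample rate and the pitch values are assumptions chosen for the example, not figures from the project.

```python
def pitch_period_samples(f0_hz, sample_rate_hz):
    """Number of samples between pitch marks for a given fundamental frequency."""
    return round(sample_rate_hz / f0_hz)


# Doubling the pitch halves the spacing between pitch marks.
low = pitch_period_samples(100, 16000)
high = pitch_period_samples(200, 16000)
print(low, high)
```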

Altering Pitch
Original diphone ‘C_A’: extracted pitch period × Hanning window = Hanned pitch period
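A minimal sketch of the windowing step using only the standard library. The project itself used MATLAB; the function names here are hypothetical, and the window is the usual symmetric raised-cosine (Hann) shape.

```python
import math


def hann_window(n):
    """Symmetric Hann window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_hann(segment):
    """Taper an extracted pitch period so that both ends fade to zero."""
    window = hann_window(len(segment))
    return [s * w for s, w in zip(segment, window)]


# A constant segment makes the window shape visible in the result.
print(apply_hann([1.0] * 5))
```

Tapering the ends to zero is what allows neighbouring periods to be overlapped and summed without clicks at the boundaries.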

Altering Pitch Cont’d
PSOLA: Pitch Synchronous Overlap and Add
• Overlap + Add with 50% overlap
• Pitch Up: overlap > 50%
• Pitch Down: overlap < 50%
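The overlap-and-add step can be sketched as follows, assuming equally long windowed frames. The function names and the toy frames are hypothetical. A larger overlap means a smaller hop between frames, so the pitch marks move closer together and the pitch rises.

```python
def hop_for_overlap(frame_len, overlap):
    """Hop size in samples for a fractional overlap (0 <= overlap < 1)."""
    return max(1, round(frame_len * (1.0 - overlap)))


def overlap_add(frames, hop):
    """Sum equally long frames, each starting `hop` samples after the previous one."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for i, sample in enumerate(frame):
            out[k * hop + i] += sample
    return out


frames = [[1.0, 1.0, 1.0, 1.0]] * 3
hop = hop_for_overlap(4, 0.5)         # 50% overlap
print(overlap_add(frames, hop))
```

With the same frames, an overlap above 50% also yields a shorter output, so the number of frames has to change if the duration is to stay the same.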

Altering Pitch Cont’d
[Figure: waveform × Kaiser window, × 12]
• Naturally spoken vowels contain 12-18 pitch marks

Altering Duration
• Increase number of PSOLA iterations (overlaps) to increase duration
• Decrease number of PSOLA iterations (overlaps) to decrease duration

Altering Amplitude
• Multiply the signal by a constant
  • If constant > 1, amplitude increases
  • If constant < 1, amplitude decreases
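A sketch of both alterations under simple assumptions. Duration is changed here by repeating or dropping frames before they are overlapped and added, which is one plausible reading of "number of PSOLA iterations"; the function names are hypothetical.

```python
def scale_amplitude(signal, gain):
    """Multiply every sample by a constant gain."""
    return [gain * s for s in signal]


def change_frame_count(frames, factor):
    """Repeat or drop frames so that the frame count scales by roughly `factor`."""
    if not frames:
        return []
    count = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    return [frames[min(last, int(i / factor))] for i in range(count)]


frames = ["a", "b", "c", "d"]          # stand-ins for windowed pitch periods
print(change_frame_count(frames, 2))   # longer: more overlaps
print(change_frame_count(frames, 0.5)) # shorter: fewer overlaps
print(scale_amplitude([1.0, -2.0], 0.5))
```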

Concatenation
Diphones → Words
• Using PSOLA at the joining ends
• Ensures smooth transition
Words → Sentence
• Straight joining at the end points due to presence of pauses

Summary
TTS System: words → Text Pre-processing → Prosody → Concatenation
• System modularized

Progress
• Work Completed / Current Status
  • Text pre-processing and prosodic manipulation for a multi-syllable word
  • Diphone concatenation
  • 200+ diphones in database
  • Fully functional GUI implemented
• Work To Be Done
  • Sentence level synthesis
  • Expand diphone database
  • Fine-tuning and enhancing
  • Prepare for Poster Fair
  • Write final report

Questions?
Contact Information
• Michael Beddaoui: mich121212@hotmail.com
• Abdel-Aziz El-Solh: zizo01@hotmail.com
