Dirk Van Compernolle, Kris Demuynck, Oscar Garcia (compi@esat.kuleuven.be)

Listening to Normalized Speech: Mimicking the Normalization Processes of Automatic Speech Recognition


Presentation Transcript


  1. Listening to Normalized Speech: Mimicking the Normalization Processes of Automatic Speech Recognition
     Dirk Van Compernolle, Kris Demuynck, Oscar Garcia (compi@esat.kuleuven.be)
     Katholieke Universiteit Leuven, Dept. ESAT, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
     www.esat.kuleuven.be/~spch

  2. ASR Preprocessing (block diagram)
     • signal → Fourier Transform → Magnitude (Spectrogram) and Phase Spectrum
     • Magnitude (Spectrogram) → pitch removal → Envelope (cepstra) and Excitation (pitch)
     • Envelope and Excitation → speaker normalization → normalized cepstra and normalized pitch → to ASR
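The split into magnitude, phase, envelope and excitation can be sketched for a single frame as follows. This is a minimal illustration, not the authors' front end: it assumes the envelope is obtained by keeping the low-quefrency part of the real cepstrum, and the names `split_frame` and `n_lifter` are hypothetical. A naive DFT is used so the example needs only the standard library.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def split_frame(frame, n_lifter=8, floor=1e-10):
    """Return magnitude, phase, log-envelope and log-excitation of one frame.

    Assumption: the envelope is the low-quefrency part of the real cepstrum
    (first n_lifter coefficients plus their mirror images).
    """
    spectrum = dft(frame)
    magnitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    log_mag = [math.log(max(m, floor)) for m in magnitude]

    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = [c.real for c in idft(log_mag)]
    n = len(cepstrum)

    # Symmetric low-quefrency lifter keeps the smooth spectral envelope.
    low = [c if (q < n_lifter or q > n - n_lifter) else 0.0
           for q, c in enumerate(cepstrum)]
    envelope = [c.real for c in dft(low)]

    # Whatever the envelope does not explain is treated as excitation (pitch).
    excitation = [lm - e for lm, e in zip(log_mag, envelope)]
    return magnitude, phase, envelope, excitation
```

By construction the log-envelope and log-excitation add up to the log-magnitude spectrum, and magnitude plus phase still reconstruct the original frame.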

  3. Normalized Speech

  4. Speech Normalization (block diagram)
     • original signal → Magnitude Spectrum and Phase Spectrum
     • Magnitude Spectrum → Envelope (spectrum) and Excitation (pitch)
     • Envelope and Excitation → normalized spectrum and normalized excitation → enhanced spectrum
     • enhanced spectrum → Magnitude Spectrum, with a resynthesized Phase Spectrum (Griffin & Lim, 1984) → normalized signal
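The phase resynthesis step can be illustrated with a small Griffin & Lim style iteration: start from random phases, then alternate between imposing the target magnitudes and re-analysing the resynthesized signal. This is a sketch under stated assumptions, not the authors' implementation: the frame length `N`, hop `HOP`, Hann window, iteration count and function names are illustrative choices, and a naive DFT keeps it standard-library only.

```python
import cmath
import math
import random

N, HOP = 32, 8  # frame length and hop (4x overlap); illustrative values
WIN = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def stft(x):
    """Windowed short-time Fourier transform as a list of frames."""
    return [dft([x[s + n] * WIN[n] for n in range(N)])
            for s in range(0, len(x) - N + 1, HOP)]

def istft(frames, length):
    """Least-squares overlap-add signal estimate from (modified) frames."""
    num = [0.0] * length
    den = [1e-12] * length  # small constant avoids division by zero
    for i, X in enumerate(frames):
        y = [c.real / N for c in dft(X, sign=+1)]
        for n in range(N):
            num[i * HOP + n] += WIN[n] * y[n]
            den[i * HOP + n] += WIN[n] ** 2
    return [a / b for a, b in zip(num, den)]

def spectral_error(mags, x):
    """Squared distance between target magnitudes and those of signal x."""
    return sum((m - abs(c)) ** 2
               for M, X in zip(mags, stft(x)) for m, c in zip(M, X))

def griffin_lim(mags, length, n_iter=30, seed=0):
    """Estimate a signal whose STFT magnitudes approximate `mags`."""
    rng = random.Random(seed)
    frames = [[m * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for m in M]
              for M in mags]
    x = istft(frames, length)
    for _ in range(n_iter):
        est = stft(x)
        # Keep the estimated phase, impose the target magnitude.
        frames = [[m * (c / abs(c)) if abs(c) > 1e-12 else m
                   for m, c in zip(M, X)]
                  for M, X in zip(mags, est)]
        x = istft(frames, length)
    return x
```

The iteration works because heavily overlapping frames make the magnitude spectrogram redundant, so only some phase choices are consistent with it. The signal length should satisfy `(length - N) % HOP == 0` so that every sample is covered by a frame.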

  5. Speech Normalization - Ingredients
     • Spectral normalization
       • concept: remove vocal tract length effect
       • method: utterance-based VTLN (vocal tract length normalization) by linear frequency warping
     • Pitch normalization
       • concept: remove pitch effect
       • method: scale utterance-based average and variance to global cross-speaker averages
     • Phase resynthesis
       • concept: exploit redundancy in over-sampled spectral envelope
       • method: iterative algorithm (Griffin & Lim, 1984)
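The first two ingredients can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function names, the warping convention (output bin `k` reads the input at `k * alpha`), the use of 0 to mark unvoiced frames, and normalizing pitch in Hz rather than in a log domain are all assumptions made for the example. The warp factor and the global targets would have to be estimated elsewhere.

```python
import math
import statistics

def warp_spectrum(envelope, alpha):
    """Linear frequency warping of one spectral envelope.

    Output bin k takes the linearly interpolated input value at k * alpha
    (assumed convention); positions beyond the last bin are clamped.
    """
    n = len(envelope)
    out = []
    for k in range(n):
        src = min(k * alpha, n - 1)
        lo = int(math.floor(src))
        hi = min(lo + 1, n - 1)
        frac = src - lo
        out.append((1 - frac) * envelope[lo] + frac * envelope[hi])
    return out

def normalize_pitch(f0, target_mean, target_std):
    """Map an utterance's voiced pitch values to a target mean and std.

    Frames with f0 == 0 are treated as unvoiced and left untouched
    (assumed convention).
    """
    voiced = [f for f in f0 if f > 0]
    mean = statistics.mean(voiced)
    std = statistics.pstdev(voiced) or 1.0  # guard against constant pitch
    return [(f - mean) / std * target_std + target_mean if f > 0 else 0.0
            for f in f0]
```

For example, `normalize_pitch(track, 150.0, 20.0)` would move every utterance to the same hypothetical global pitch statistics, while `warp_spectrum(envelope, alpha)` would be applied to every frame with one `alpha` per utterance.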

  6. [Figure: original vs. normalized]

  7. [Figure: original vs. normalized]

  8. [Figure: original vs. inverted]
