TRACHEOESOPHAGEAL SPEECH REPAIR

Arantza del Pozo, CUED Machine Intelligence Laboratory, November 20th 2006


Presentation Transcript


  1. TRACHEOESOPHAGEAL SPEECH REPAIR. Arantza del Pozo, CUED Machine Intelligence Laboratory. MIL Speech Seminar, November 20th 2006

  2. OUTLINE
     • Speech repair
     • Tracheoesophageal (TE) speech
        • Laryngectomy
        • Acoustic properties
        • Main limitations
     • Excitation repair
        • Previous attempts
        • Adopted approach
        • Baseline system
        • Enhanced system
        • Results
     • Duration repair
        • Preliminary experiments
        • Regression tree modelling
        • Improving TE recognition
        • Fixing recognition artifacts
        • Results
     • Conclusions and future work
     Arantza del Pozo @ CUED Machine Intelligence Laboratory

  3. OUTLINE (repeat of slide 2)

  4. SPEECH REPAIR SYSTEM
     [Diagram with blocks labelled Speech model, Deviant features and Correction algorithms]

  5. OUTLINE (repeat of slide 2)

  6. Laryngectomy
     • Laryngectomy is a surgical procedure that involves removing the larynx (including the vocal cords and the epiglottis) along with the upper tracheal rings
     • Speech rehabilitation options after laryngectomy:
        • Esophageal speech
        • TE speech
        • Electrolaryngeal speech
     • TE speech is the most frequently used voice restoration technique after laryngectomy

  7. Acoustic properties of TE speech
     • The voicing source is highly variable and deviant
     • Lower F0 (in female speakers) and higher jitter and shimmer
     • More high-frequency noise, and lower harmonic-to-noise ratio (HNR), glottal-to-noise excitation ratio (GNE) and band-energy difference (BED)
     • Some evidence of higher formant values in Spanish and Dutch TE speech
     • Shorter maximum phonation time, longer vowel duration and slower speaking rates
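The slides compare jitter and shimmer without giving formulas. As a rough illustration, the sketch below uses the common "local" definitions (the mean absolute difference between consecutive pitch periods, or consecutive peak amplitudes, divided by their mean, as in Praat). It assumes the periods and amplitudes have already been extracted; the function names are hypothetical and are not taken from the presentation.

```python
import numpy as np


def local_perturbation(values):
    """Mean absolute difference between consecutive values divided by their mean."""
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("need at least two consecutive cycles")
    return float(np.mean(np.abs(np.diff(v))) / np.mean(v))


def local_jitter(periods):
    """Cycle-to-cycle variation of pitch periods (dimensionless)."""
    return local_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of per-period peak amplitudes (dimensionless)."""
    return local_perturbation(amplitudes)


if __name__ == "__main__":
    regular = [10.0, 10.0, 10.0, 10.0]      # pitch periods in ms
    irregular = [10.0, 12.0, 10.0, 12.0]
    print(local_jitter(regular))    # 0.0
    print(local_jitter(irregular))  # 2 / 11, about 0.18
```

Higher values indicate a less regular voicing source, which is the direction reported for TE speech.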

  8. Main limitations of TE speech
     • Inability to properly control the EXCITATION
        • deviant glottal waveforms
        • irregular pitch and amplitude contours
        • higher turbulence noise
        • spectral envelope deviations caused by coupling
     • DURATION deviations caused by the disconnection between the lungs and the vocal tract
        • more pauses
        • longer vowels
        • slower rates
        • rushes before breaks

  9. OUTLINE (repeat of slide 2)

  10. Previous excitation repair attempts
     • Qi et al.
        • Resynthesis of female TE words with a synthetic glottal waveform and with smoothed and raised F0
        • Replacement of the voice source and conversion of spectral envelopes
     • Limitations of previous repair attempts
        • Only the most obvious deviant features have been tackled
        • Evaluation has been limited to sustained vowels and words
        • Only a small number of TE speakers and voice qualities have been tested
        • The degree of perceptual enhancement has not been quantified

  11. Adopted approach
     [Diagram with blocks labelled DEVIANT FEATURES (voice source; jitter & shimmer; spectral envelope), Feature correction and Perceptual evaluation]
     • DATA
        • 13 TE speakers (11 male, 2 female)
        • Patients of the Speech and Language Therapy Department of Addenbrooke's Hospital, Cambridge
        • Control group of 11 normal speakers (8 male, 3 female)
     • BASELINE SYSTEM
        • Glottal resynthesis
        • Jitter and shimmer reduction
     • ENHANCED SYSTEM
        • Spectral envelope smoothing and tilt reduction

  12. Baseline system
     [Diagram with blocks labelled VT (vocal tract) and Lip radiation]
     • Glottal resynthesis (breathiness reduction)
     • Jitter and shimmer reduction (roughness reduction)
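The slides list the baseline steps but not how they are implemented. The sketch below is one hypothetical way to realise them with a source-filter model: per-period pitch and amplitude tracks are smoothed with a moving average (reducing jitter and shimmer, hence roughness), a noise-free excitation is rebuilt from Rosenberg glottal pulses (a stand-in for the unspecified glottal model, removing the breathy noise component), and the result is passed through an all-pole vocal tract (VT) filter and a first-difference lip-radiation filter. The single fixed set of LPC coefficients `a`, the smoothing width and the pulse-shape fractions are all simplifying assumptions.

```python
import numpy as np


def smooth(track, width=5):
    """Moving-average smoothing of a per-period track (edge-padded, same length)."""
    x = np.asarray(track, dtype=float)
    padded = np.pad(x, (width // 2, width - 1 - width // 2), mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


def rosenberg_pulse(n_samples, open_frac=0.4, close_frac=0.16):
    """One period of a Rosenberg glottal flow pulse with peak value 1."""
    n = np.arange(n_samples)
    tp = max(1, int(round(open_frac * n_samples)))   # opening phase length
    tn = max(1, int(round(close_frac * n_samples)))  # closing phase length
    g = np.zeros(n_samples)
    rise = n < tp
    g[rise] = 0.5 * (1.0 - np.cos(np.pi * n[rise] / tp))
    fall = (n >= tp) & (n < tp + tn)
    g[fall] = np.cos(np.pi * (n[fall] - tp) / (2.0 * tn))
    return g                                          # closed phase stays at 0


def all_pole(x, a):
    """Vocal tract filter: y[n] = x[n] - sum_k a[k] * y[n - k], with a[0] = 1."""
    y = np.zeros(len(x))
    for i in range(len(x)):
        acc = x[i]
        for k in range(1, min(len(a) - 1, i) + 1):
            acc -= a[k] * y[i - k]
        y[i] = acc
    return y


def resynthesise(periods, amplitudes, a, width=5):
    """Rebuild voiced speech from smoothed pitch periods (in samples) and amplitudes."""
    p = np.maximum(2, np.round(smooth(periods, width)).astype(int))
    amp = smooth(amplitudes, width)
    excitation = np.concatenate([g * rosenberg_pulse(n) for n, g in zip(p, amp)])
    vt_out = all_pole(excitation, a)
    return np.diff(vt_out, prepend=0.0)               # lip radiation as a differentiator
```

For example, at a 16 kHz sampling rate a period of 100 samples corresponds to F0 = 160 Hz, so `resynthesise([100] * 20, [1.0] * 20, [1.0, -0.9])` yields 20 smooth, regular cycles. A real system would update the vocal tract filter frame by frame.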

  13. Enhanced system (1/2)
     • Resynthesised speech still has a harsh quality caused by deviations in TE spectral envelopes (SE)
     • Spectral envelope analysis (TE relative to normal speech)
        • Higher standard deviation of formant gains, frequencies and bandwidths, and higher spectral distortion
        • Lower relative gain difference between the 1st and 3rd formants, and lower spectral tilt

  14. Enhanced system (2/2)
     • Enhancement algorithm
        • To reduce differences between consecutive estimated SEs: line spectral frequency (LSF) median smoothing
        • To decrease spectral tilt: low-pass filtering
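The two enhancement steps can be sketched as follows, assuming the LSF vectors have already been computed for every analysis frame (the LPC-to-LSF conversion is omitted). The median window width and the one-pole low-pass coefficient `alpha` are hypothetical; the slides specify neither, nor the type of low-pass filter used.

```python
import numpy as np


def median_smooth_lsf(lsf, width=3):
    """Median-filter each LSF track across consecutive frames.

    lsf: array of shape (n_frames, order), each row strictly increasing in (0, pi).
    With an odd width, element-wise medians keep every row strictly increasing,
    so each smoothed row is still a valid (stable) LSF set.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    lsf = np.asarray(lsf, dtype=float)
    half = width // 2
    padded = np.pad(lsf, ((half, half), (0, 0)), mode="edge")
    return np.stack([np.median(padded[t:t + width], axis=0)
                     for t in range(lsf.shape[0])])


def one_pole_lowpass(x, alpha=0.5):
    """y[n] = (1 - alpha) * x[n] + alpha * y[n - 1]: unity gain at DC,
    attenuates high frequencies."""
    y = np.zeros(len(x))
    prev = 0.0
    for i, sample in enumerate(x):
        prev = (1.0 - alpha) * sample + alpha * prev
        y[i] = prev
    return y
```

A median filter is used here rather than a moving average because it discards an isolated outlier frame entirely instead of spreading it over its neighbours.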

  15. Results
     • Perceptual tests

  16. OUTLINE (repeat of slide 2)

  17. Preliminary experiments
     • Duration deviations
        • more pauses
        • longer vowels
        • slower rates
        • rushes before breaks
     • Possible duration repair approaches
        • Rule-based
           • Reduce pauses, shorten vowels, increase speech rate, lengthen phones before breaks, etc.
           • Adequate reduction/increase rates are difficult to obtain
           • Breaks sentence rhythm
        • Transplantation of average normal phone durations
           • Phone durations obtained with forced alignment (FA)
           • Overall improvement that increased the naturalness of TE sentences
           • Sentence rhythm was preserved
     • The duration repair algorithm automates the transplantation experiment
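The transplantation experiment can be summarised in a few lines: average each phone's duration over forced alignments of normal speech, then compute, for every FA segment of a TE sentence, the time-scale factor that maps its duration onto that average. The sketch below assumes durations in seconds and leaves phones unseen in the normal data unchanged; the function names are hypothetical, and the time-scale modification itself (for example a PSOLA-style algorithm) is not shown.

```python
from collections import defaultdict
from statistics import mean


def mean_phone_durations(alignments):
    """alignments: utterances from normal speakers, each a list of (phone, seconds)."""
    durations = defaultdict(list)
    for utterance in alignments:
        for phone, dur in utterance:
            durations[phone].append(dur)
    return {phone: mean(ds) for phone, ds in durations.items()}


def transplant(te_segments, normal_means):
    """Return (phone, original, target, scale) for each TE forced-alignment segment."""
    result = []
    for phone, dur in te_segments:
        target = normal_means.get(phone, dur)   # unseen phones keep their duration
        result.append((phone, dur, target, target / dur))
    return result


if __name__ == "__main__":
    normal = mean_phone_durations([[("ah", 0.08), ("t", 0.05)], [("ah", 0.10)]])
    for row in transplant([("ah", 0.18), ("t", 0.05)], normal):
        print(row)   # a scale below 1 shortens the (longer) TE vowel
```

Because each phone is scaled independently towards a normal average, the relative timing of the sentence is kept, which is consistent with the observation that sentence rhythm was preserved.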

  18. Regression tree modelling (1/2)
     • Classification and regression trees (CART) are widely used for duration modelling in TTS systems
     • The features employed are extracted from text
        • Phone identity
        • Identities of previous and next phones
        • Position of syllable in word
        • Position of word in sentence
        • Number of syllables before/after a break
        • Type of lexical stress
        • Lexical stress type of previous and next syllables
        • ...
     • A speech repair framework constrains the feature space to features that can be recognised from speech
     • For TE speech repair, it was assumed that only phone recognition is viable
        • Features relying on word, syllable or lexical stress information cannot be used

  19. Regression tree modelling (2/2)
     • Several CART trees were built with different features
     • Explored features
        • Phone identity
        • Identities of previous and next phones
        • Positions of phones in the sentence
        • Pitch and energy (as an attempt to incorporate some stress information)
        • Short pauses (SP) not regarded as phones, but modelled independently
     • Trees
        • T1 (F1): phone identity
        • T2 (F2): F1 + previous & next phone identities (broad class)
        • T3 (F3): F2 + position of phone in sentence
        • T4 (F4): F3 + pitch (positive/negative/no slope)
        • T5 (F5): F4 + energy (positive/negative/no slope)
        • TSP: number of phones since the previous SP and until the next SP
     • Performance measured as the mean squared error (MSE) between the normal mean durations used for transplantation and the predicted values
        • Ranking (best first): T3 > T2 > T1 > T5 > T4
     • Substituting T3+TSP predicted durations into TE sentences with FA phone segmentation was almost indistinguishable from transplantation
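To make the tree comparison concrete, the sketch below grows a small regression tree over categorical features, using binary "feature == value" questions chosen to minimise the squared error, and scores it by MSE. It is a simplified stand-in for a full CART implementation (no pruning, no numeric thresholds); the feature names, `max_depth` and `min_leaf` are hypothetical. Restricting the feature dictionaries to the sets F1, F2, ... gives the T1, T2, ... variants.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, depth=0, max_depth=4, min_leaf=2):
    """X: list of dicts of categorical features; y: list of phone durations."""
    best = None
    if depth < max_depth and len(y) >= 2 * min_leaf:
        base = sse(y)
        for feat in sorted(X[0]):
            for val in sorted({x[feat] for x in X}):
                yes = [i for i, x in enumerate(X) if x[feat] == val]
                no = [i for i, x in enumerate(X) if x[feat] != val]
                if len(yes) < min_leaf or len(no) < min_leaf:
                    continue
                cost = sse([y[i] for i in yes]) + sse([y[i] for i in no])
                if cost < base and (best is None or cost < best[0]):
                    best = (cost, feat, val, yes, no)
    if best is None:
        return {"leaf": mean(y)}       # predict the mean duration of the node
    _, feat, val, yes, no = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": feat, "value": val, "yes": child(yes), "no": child(no)}


def predict(tree, x):
    """Follow the questions down to a leaf and return its mean duration."""
    while "leaf" not in tree:
        tree = tree["yes"] if x[tree["feature"]] == tree["value"] else tree["no"]
    return tree["leaf"]


def mse(tree, X, y):
    """Mean squared error between predicted and target durations."""
    return mean((predict(tree, x) - t) ** 2 for x, t in zip(X, y))
```

Comparing `mse` on held-out data for trees built from progressively larger feature sets mirrors the T1 to T5 comparison above.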

  20. Improving TE recognition (1/2)
     [Diagram with blocks labelled FA and REC]
     • Little work exists on automatic TE speech recognition
        • Haderlein et al. (2004) adapted a speech recogniser trained on normal speech to single TE speakers by unsupervised HMM interpolation and obtained an average word accuracy of 36.4%
     • Focus on improving TE phone recognition
     • Novel performance measures that take recognition (r), segmentation (s) and duration prediction (p) errors into account

  21. Improving TE recognition (2/2)
     • Explored systems
        • Baseline (BL): monophone HMMs trained on WSJCAM0
        • R1: BL + CMN + CMLLR
        • R2: R1 + MAP
        • R3: R1 + bigram LM
        • R4: R1 + trigram LM
        • R5: CU-HTK 2003 BN LVCSR system + CMLLR, with phone-level output
     • Results (best first): R5 > R4 > R3 > R1 > R2
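Of the adaptation steps listed, cepstral mean normalisation (CMN) is simple enough to show directly: a fixed convolutional channel adds a constant offset to every cepstral frame, so subtracting the per-utterance mean removes it. CMLLR (constrained maximum likelihood linear regression) and MAP (maximum a posteriori) adaptation need the HMM statistics and are not sketched here. The array layout (frames by coefficients) is an assumption.

```python
import numpy as np


def cmn(features):
    """Per-utterance cepstral mean normalisation.

    features: array of shape (n_frames, n_coefficients), e.g. MFCCs.
    Returns the features with the utterance mean of each coefficient removed.
    """
    f = np.asarray(features, dtype=float)
    return f - f.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 13))
    channel = rng.normal(size=13)              # constant cepstral offset
    print(np.allclose(cmn(clean + channel), cmn(clean)))   # True
```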

  22. Fixing recognition artifacts
     • Using the best recognised labels for duration repair still produced artifacts
     • Method for robust duration modification (RM)
        • Takes recognition confidence into account
        • Confidence computed from
           • TE phone duration probability distributions
           • recogniser confidence scores
           • takes phone confusions into account in R4
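The slides name the inputs to the confidence measure but not how they are combined. Purely as an illustration, the sketch below assumes a Gaussian TE duration model per phone, scales its likelihood to the range 0 to 1, multiplies it by the recogniser's confidence score, and uses the product to interpolate between the original TE duration and the predicted duration, so that doubtful labels are modified less. Every name, the Gaussian model and the product rule are hypothetical.

```python
import math


def duration_confidence(duration, phone, te_stats):
    """Gaussian likelihood of the observed duration, scaled so the mode scores 1.

    te_stats: hypothetical {phone: (mean_s, std_s)} estimated from TE speech.
    """
    mu, sigma = te_stats[phone]
    return math.exp(-0.5 * ((duration - mu) / sigma) ** 2)


def robust_durations(segments, predicted, te_stats):
    """segments: list of (recognised_phone, duration_s, recogniser_confidence in [0, 1]).
    predicted: list of target durations (e.g. from the regression tree).
    Returns the durations to impose; low confidence keeps the original value."""
    targets = []
    for (phone, dur, rec_conf), pred in zip(segments, predicted):
        c = duration_confidence(dur, phone, te_stats) * rec_conf
        targets.append(c * pred + (1.0 - c) * dur)
    return targets
```

A segment whose duration is implausible for its recognised label (for instance a misrecognised phone) gets a confidence near zero and is left almost untouched, which is the kind of artifact the robust method is meant to avoid.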

  23. Results
     • Objective evaluation: MSE between the durations of repaired sentences and the target transplanted durations
        • Ranking (best first): R5+RM > R5 > R4+RM > R4 > original TE durations
     • Subjective evaluation: perceptual test

  24. OUTLINE (repeat of slide 2)

  25. CONCLUSIONS AND FUTURE WORK
     • Deviant TE excitation and duration features have been identified and repaired
     • The synthetic quality of excitation-repaired speech cancels out the improvements in some cases
     • Future work
        • Improve excitation resynthesis quality
        • Improve the TE speech recognition step
        • Attempt text-based features for duration modelling
