
Advances in WP2








  1. Chania Meeting – May 2007 Advances in WP2 www.loquendo.com

  2. Summary • Unsupervised Adaptation • Adaptation on Hiwire DB

  3. Chania Meeting – May 2007 Supervised vs Unsupervised Adaptation www.loquendo.com

  4. Supervised Adaptation [diagram]: adaptation-set speech parameters and transcriptions → ASR forced segmentation → forced segmentations → Adaptation Module (starting from general models) → adapted models

  5. Unsupervised Adaptation [diagram]: adaptation-set speech parameters → ASR recognition (general models) → ASR segmentations and transcriptions → confidence-based selection → ASR forced segmentation → forced segmentations → Adaptation Module → adapted models
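The confidence-based selection step in the unsupervised scheme can be sketched as follows. This is a minimal illustration: the `Utterance` class, the confidence values, and the 0.9 threshold are assumptions for the example, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    """One adaptation utterance with its ASR output (illustrative structure)."""
    speech_params: list      # acoustic feature vectors (placeholder)
    asr_hypothesis: str      # recognized word sequence, used as pseudo-transcription
    confidence: float        # ASR confidence score in [0, 1]

def select_adaptation_set(utterances, threshold=0.9):
    """Keep only utterances recognized with high enough confidence,
    so incorrectly labeled data does not make the adaptation diverge."""
    return [u for u in utterances if u.confidence >= threshold]

# Example: only the confidently recognized utterance survives selection.
batch = [
    Utterance([], "four two seven", 0.95),
    Utterance([], "for to seven", 0.55),
]
selected = select_adaptation_set(batch)
print(len(selected))  # 1
```

The retained hypotheses then play the role of the transcriptions in the supervised pipeline.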

  6. Chania Meeting – May 2007 Adaptation on HIWIRE DB www.loquendo.com

  7. Kinds of Adaptation Two kinds of adaptation were performed: • Multi-Condition: the adaptation data of all speakers and all noise conditions are pooled. The models are adapted to the channel, the noise conditions, and aspects common to non-native speakers. • Speaker-Dependent: adaptation and tests are performed for each speaker separately, and all results are finally averaged. The models are adapted mainly to the speaker’s voice, but also to the channel and noise conditions.
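The two regimes differ only in how the data is grouped, which can be sketched in a few lines; the speaker IDs and accuracy values below are hypothetical, chosen purely for illustration.

```python
def multi_condition_pool(data_by_speaker):
    """Multi-condition: pool all speakers and noise conditions into one adaptation set."""
    return [utt for utts in data_by_speaker.values() for utt in utts]

def speaker_dependent_average(score_by_speaker):
    """Speaker-dependent: adapt and test per speaker, then average the results."""
    return sum(score_by_speaker.values()) / len(score_by_speaker)

# Hypothetical data: two speakers, pooled vs. per-speaker evaluation.
data = {"spk01": ["utt1", "utt2"], "spk02": ["utt3"]}
print(len(multi_condition_pool(data)))                                   # 3
print(round(speaker_dependent_average({"spk01": 0.80, "spk02": 0.90}), 2))  # 0.85
```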

  8. Adaptation Types Two types of adaptation were experimented with: • Supervised: the transcriptions of the sentences available in HDB are used to perform forced segmentation of the adaptation utterances, providing the labels needed by the adaptation process, which is intrinsically supervised. • Unsupervised: to simulate “in-the-field” adaptation, the transcriptions of the sentences are not used; they are approximated by the ASR outputs. Only the adaptation utterances recognized with a sufficient degree of confidence are used in the adaptation process, to avoid divergence due to incorrectly labeled data.

  9. Multi-Condition Adaptation • Adaptation is done with all the speakers and noise conditions together • It adapts to channel, noise conditions, and non-native common aspects


  11. Comments • Supervised multi-condition adaptation gives a good performance improvement. It works well even without denoising, since it incorporates information about the channel, the noise, and non-native accents into the models. • The best average results are obtained with supervised adaptation in conjunction with denoising (60.7% E.R.) • As expected, unsupervised adaptation is inferior to supervised adaptation (51.7% vs. 60.7% E.R.), but it proves to be an effective technique for real-life applications, where transcriptions of the vocal material are not available.
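The E.R. figures quoted above are relative error reductions. A minimal sketch of the computation follows; the baseline and adapted error rates used here are hypothetical, chosen only so that the result matches the quoted 60.7%, and are not the HIWIRE numbers.

```python
def relative_error_reduction(baseline_err, adapted_err):
    """Relative error reduction (E.R.): fraction of baseline errors removed."""
    return (baseline_err - adapted_err) / baseline_err

# Hypothetical rates: a 10.0% baseline error dropping to 3.93% after
# adaptation corresponds to roughly a 60.7% error reduction.
er = relative_error_reduction(10.0, 3.93)
print(f"{100 * er:.1f}%")  # 60.7%
```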

  12. Speaker Adaptation • Adaptation is done speaker by speaker • Starting models: microphone, 16 kHz • Denoising method: SNR-dependent Ephraim-Malah spectral attenuation
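The flavor of SNR-dependent spectral attenuation can be sketched with a simplified Wiener-style gain; note this is an illustrative stand-in, not the actual Ephraim-Malah rule, which uses a considerably more involved MMSE short-time spectral amplitude estimator.

```python
import numpy as np

def spectral_attenuation(noisy_mag, noise_mag, floor=0.1):
    """Simplified SNR-dependent spectral attenuation (Wiener-style gain).
    noisy_mag / noise_mag: magnitude spectra of the noisy signal and
    the noise estimate; floor limits the maximum attenuation."""
    # A-posteriori SNR minus one, clamped at zero for noise-dominated bins.
    snr = np.maximum(noisy_mag**2 / (noise_mag**2 + 1e-12) - 1.0, 0.0)
    gain = snr / (snr + 1.0)              # Wiener gain from the estimated SNR
    return np.maximum(gain, floor) * noisy_mag

# Strong spectral components are kept; weak (noise-dominated) ones are floored.
noisy = np.array([10.0, 1.0])
noise = np.array([1.0, 1.0])
clean = spectral_attenuation(noisy, noise)
```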

  13. Comments • Speaker adaptation is very effective on HDB: the error reduction achieved by supervised adaptation plus Ephraim-Malah noise reduction is quite large • The main improvements are in noisy conditions • As expected, unsupervised adaptation is inferior to supervised adaptation, due to the errors introduced by the ASR transcriptions, but it still yields a very relevant improvement.

  14. Workplan • Selection of suitable benchmark databases (m6) • Baseline set-up for the selected databases (m8) • LIN adaptation method implemented and evaluated on the benchmarks (m12) • Experimental results on the Hiwire database with LIN (m18) • Innovative NN adaptation methods and algorithms for acoustic modeling, with experimental results (m21) • Further advances on new adaptation methods (m24) • Unsupervised adaptation: algorithms and experimentation (m33)
