Models of speech dynamics for ASR, using intermediate linear representations

Philip Jackson, Boon-Hooi Lo and Martin Russell. Electronic, Electrical and Computer Engineering. http://web.bham.ac.uk/p.jackson/balthasar/

Presentation Transcript


  1. Models of speech dynamics for ASR, using intermediate linear representations Philip Jackson, Boon-Hooi Lo and Martin Russell Electronic, Electrical and Computer Engineering http://web.bham.ac.uk/p.jackson/balthasar/

  2. INTRODUCTION Abstract

  3. INTRODUCTION Speech dynamics into ASR • dynamics of speech production to constrain the recognizer, for: • noisy environments • conversational speech • speaker adaptation • efficient, complete and trainable models: • for recognition • for analysis • for synthesis

  4. INTRODUCTION Articulatory trajectories from West (2000)

  5. INTRODUCTION Articulatory-trajectory model

  6. INTRODUCTION Articulatory-trajectory model (figure: model levels labelled surface, source-dependent, intermediate and finite-state)

  7. INTRODUCTION Multi-level Segmental HMM • segmental finite-state process • intermediate “articulatory” layer • linear trajectories • mapping required • linear transformation • radial basis function network

  8. INTRODUCTION Linear-trajectory model (figure: segmental HMM with states 1 to 5, an intermediate layer, an articulatory-to-acoustic mapping and an acoustic layer)
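As a rough illustration of how the layers fit together, the Python/NumPy sketch below generates one segment: a straight-line trajectory in the intermediate layer, passed through a linear articulatory-to-acoustic mapping, plus Gaussian noise. The parametrisation f(t) = m + (t - t̄)c, the names `midpoint`, `slope`, `W`, `w` and `noise_std`, and the i.i.d. noise are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def linear_trajectory(midpoint, slope, duration):
    """Intermediate-layer trajectory f(t) = midpoint + (t - t_bar) * slope, t = 1..duration."""
    t = np.arange(1, duration + 1)
    t_bar = (duration + 1) / 2.0  # segment centre (assumed parametrisation)
    return midpoint[None, :] + (t - t_bar)[:, None] * slope[None, :]

def generate_segment(midpoint, slope, duration, W, w, noise_std, rng):
    """Sample acoustic frames y_t = W f(t) + w + e_t, with e_t i.i.d. Gaussian (assumed)."""
    f = linear_trajectory(midpoint, slope, duration)  # (duration, p) intermediate layer
    mean = f @ W.T + w                                # (duration, q) mapped to acoustics
    y = mean + rng.normal(scale=noise_std, size=mean.shape)
    return f, y

# Example: 3 intermediate dimensions (e.g. F1-3) mapped to 13 acoustic ones (e.g. MFCC13).
rng = np.random.default_rng(0)
W = rng.normal(size=(13, 3))  # hypothetical mapping matrix
w = rng.normal(size=13)       # hypothetical offset
f, y = generate_segment(np.array([500.0, 1500.0, 2500.0]),
                        np.array([10.0, -20.0, 0.0]), 5, W, w, 0.1, rng)
```

Because the centred times sum to zero, `midpoint` is the mean of the intermediate trajectory over the segment.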

  9. THEORY Linear-trajectory equations Defined as …, where … Segment probability: …
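The equations on this slide are not reproduced in the transcript. Assuming one common form, in which each frame of a segment is Gaussian around the mapped trajectory W f(t) + w with diagonal covariance, the segment log-probability could be computed as in this sketch. The names and the diagonal covariance `var` are assumptions, the duration model is omitted, and setting `slope` to zero reduces the trajectory to a constant one.

```python
import numpy as np

def linear_trajectory(midpoint, slope, duration):
    """f(t) = midpoint + (t - t_bar) * slope for t = 1..duration (assumed form)."""
    t = np.arange(1, duration + 1)
    t_bar = (duration + 1) / 2.0
    return midpoint[None, :] + (t - t_bar)[:, None] * slope[None, :]

def segment_log_prob(y, midpoint, slope, W, w, var):
    """log prod_t N(y_t; W f(t) + w, diag(var)) for one segment (no duration term)."""
    f = linear_trajectory(midpoint, slope, len(y))
    resid = y - (f @ W.T + w)  # (T, q) residuals about the mapped trajectory
    # The log-normaliser broadcasts over all T frames, so it is counted once per frame.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + resid**2 / var)
```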

  10. THEORY Linear mapping Objective function …, with matched sequences … and …
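A minimal sketch of fitting a linear mapping, assuming the objective is the summed squared error between matched intermediate frames x_t (e.g. formants) and acoustic frames y_t, with an offset w. The function name is hypothetical, and solving via `numpy.linalg.lstsq` on a bias-augmented input is an illustrative choice.

```python
import numpy as np

def fit_linear_mapping(X, Y):
    """Least-squares W, w minimising sum_t ||y_t - (W x_t + w)||^2 over matched frames.

    X: (N, p) intermediate frames; Y: (N, q) acoustic frames from the same time instants.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column of ones estimates w
    theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)  # theta has shape (p + 1, q)
    W = theta[:-1].T                                   # (q, p)
    w = theta[-1]                                      # (q,)
    return W, w
```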

  11. THEORY Trajectory parameters Utterance probability …, and, for the optimal (ML) state sequence, …
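For a fixed segmentation, such as the one given by the optimal state sequence, and under an assumed Gaussian model y_t = W(m + s_t c) + w + e_t with diagonal covariance Σ and centred time s_t = t - t̄, the ML midpoint m and slope c have a closed form. Because the s_t sum to zero, the normal equations decouple: (W^T Σ^-1 W) m = W^T Σ^-1 (ȳ - w) and (W^T Σ^-1 W) c = W^T Σ^-1 Σ_t s_t (y_t - w) / Σ_t s_t². The sketch below implements this assumed formulation; it is not taken from the slide.

```python
import numpy as np

def fit_segment_trajectory(y, W, w, var):
    """ML midpoint m and slope c of f(t) = m + (t - t_bar) c for one segment.

    Assumes y_t = W f(t) + w + e_t, e_t ~ N(0, diag(var)), and W of full column rank.
    """
    T = len(y)
    s = np.arange(1, T + 1) - (T + 1) / 2.0  # centred time, sums to zero
    r = y - w                                # remove the mapping offset
    P = 1.0 / var                            # diagonal precision
    A = W.T @ (P[:, None] * W)               # W^T Sigma^-1 W, shape (p, p)
    m = np.linalg.solve(A, W.T @ (P * r.mean(axis=0)))
    if T < 2:                                # one frame carries no slope information
        return m, np.zeros_like(m)
    c = np.linalg.solve(A, W.T @ (P * (s @ r))) / np.sum(s**2)
    return m, c
```

The utterance probability would then accumulate the segment scores along the chosen state sequence, which is typically found by dynamic programming (not shown).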

  12. THEORY Non-linear (RBF) mapping (figure: RBF network mapping formant trajectories to the acoustic layer)
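One possible form of the RBF mapping, sketched under stated assumptions: Gaussian basis functions with a shared width, fixed centres (for instance, chosen from the training data) and a linear output layer with a bias. With the centres and width held fixed, the output weights can be fitted by linear least squares. All names here are hypothetical.

```python
import numpy as np

def rbf_design(X, centres, width):
    """Gaussian activations exp(-||x - c_j||^2 / (2 width^2)) plus a bias column of ones."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)  # (N, J) squared distances
    phi = np.exp(-d2 / (2.0 * width**2))
    return np.hstack([phi, np.ones((len(X), 1))])                   # (N, J + 1)

def fit_rbf_mapping(X, Y, centres, width):
    """Least-squares output weights theta, shape (J + 1, q), with centres and width fixed."""
    theta, *_ = np.linalg.lstsq(rbf_design(X, centres, width), Y, rcond=None)
    return theta

def rbf_map(X, centres, width, theta):
    """Map intermediate frames X (N, p), e.g. formants, to acoustic frames (N, q)."""
    return rbf_design(X, centres, width) @ theta
```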

  13. THEORY Trajectory parameters With the RBF, the least-squares solution is sought by gradient descent: …
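A sketch of that gradient descent, assuming a hypothetical Gaussian RBF network g held fixed and the objective E(m, c) = Σ_t ||y_t - g(m + s_t c)||² with centred time s_t. By the chain rule, ∂E/∂m = Σ_t ∂E/∂f_t and ∂E/∂c = Σ_t s_t ∂E/∂f_t, where ∂E/∂f_t = -2 J_g(f_t)^T (y_t - g(f_t)). The learning rate and iteration count are arbitrary placeholders.

```python
import numpy as np

def rbf_map_with_grad(F, centres, width, theta):
    """RBF outputs G (T, q) and basis derivatives dphi (T, J, p) at inputs F (T, p)."""
    diff = F[:, None, :] - centres[None, :, :]           # (T, J, p)
    phi = np.exp(-(diff**2).sum(-1) / (2.0 * width**2))  # (T, J)
    G = phi @ theta[:-1] + theta[-1]                     # last row of theta is the bias
    dphi = -phi[:, :, None] * diff / width**2            # d phi_j / d x
    return G, dphi

def fit_trajectory_rbf(Y, centres, width, theta, m0, c0, lr=5e-4, n_iter=2000):
    """Gradient descent on E(m, c) = sum_t ||y_t - g(m + s_t c)||^2."""
    T = len(Y)
    s = np.arange(1, T + 1) - (T + 1) / 2.0
    m, c = np.array(m0, dtype=float), np.array(c0, dtype=float)
    for _ in range(n_iter):
        F = m[None, :] + s[:, None] * c[None, :]
        G, dphi = rbf_map_with_grad(F, centres, width, theta)
        R = Y - G
        # dE/df_t = -2 sum_j (R theta_w^T)_tj dphi_tjp, i.e. -2 J_g^T residual
        grad_F = -2.0 * np.einsum("tj,tjp->tp", R @ theta[:-1].T, dphi)
        m -= lr * grad_F.sum(axis=0)
        c -= lr * (s @ grad_F)
    return m, c
```

Unlike the linear case, this objective is generally non-convex, so the result can depend on the initial `m0` and `c0`.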

  14. METHOD Tests on TIMIT • N. American English, at 8 kHz • MFCC13 acoustic features (including the zeroth coefficient) • F1-3: formants F1, F2 and F3, estimated by the Holmes formant tracker • F1-3+BE5: F1-3 with five band energies added • PFS12: synthesiser control parameters

  15. RESULTS TIMIT baseline performance • Constant-trajectory SHMM (ID_0) • Linear-trajectory SHMM (ID_1)

  16. RESULTS Performance across feature sets

  17. METHOD Phone categorisation

  18. METHOD Discrete articulatory regions

  19. RESULTS Performance across groupings

  20. RESULTS Results across groupings

  21. METHOD Tests on MOCHA • S. British English, at 16 kHz • MFCC13 acoustic features (including the zeroth coefficient) • articulatory x- and y-coordinates from 7 EMA coils • PCA9+Lx: first nine articulatory modes plus the laryngograph log energy
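The PCA9+Lx feature set can be sketched as follows, assuming PCA computed over all frames of the 14 EMA coordinates (7 coils × x and y) and a laryngograph log-energy value per frame that has already been computed. The function name and array layout are hypothetical.

```python
import numpy as np

def pca9_plus_lx(ema, lx_log_energy, n_modes=9):
    """Project EMA frames (N, 14) onto their first n_modes principal components
    and append the laryngograph log energy (N,) as a final column."""
    centred = ema - ema.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    modes = centred @ Vt[:n_modes].T  # (N, n_modes) articulatory modes
    return np.hstack([modes, lx_log_energy[:, None]])
```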

  22. RESULTS MOCHA baseline performance

  23. RESULTS Performance across mappings

  24. DISCUSSION Model visualisation (figure panels: original acoustic data; constant-trajectory model; linear-trajectory model, PFS12)

  25. SUMMARY Conclusions • Theory of Multi-level Segmental HMMs • Benefits of linear trajectories • Results show near-optimal performance with linear mappings • Progress towards unified models of the speech production process • What next? • unsupervised (embedded) training, to derive pseudo-articulatory representations • implement the non-linear (i.e., RBF) mapping • include a biphone language model and segment duration models
