
Automatic Speech Recognition System


Presentation Transcript


  1. Automatic Speech Recognition System: An Experimental Study of the Effect of Parameter Variation on WER Performance. Sanjay Patil, Jun-Won Suh, Human and Systems Engineering

  2. Details of the experiment • Details of the system: • HMM Speech Recognition System • TIDigits Database • (41,300 utterances, 12,547 sentences), 11 words: the digits zero through nine, plus "oh" • Cross-word, loop grammar • Objective: • To study ASR performance as a function of frame duration, window duration, insertion penalty, and state-tying: • WER = f(frame, window, IP, state-tying) • Frame = 5 ms to 50 ms • Window = 5 ms to 50 ms • IP = -10 to -200 • State-tying = {split, merge, occupancy} thresholds => total number of tied states • (a sketch of the resulting parameter grid follows this slide)
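The slide defines the sweep only by its ranges, so here is a minimal sketch of the resulting parameter grid. The 5 ms and -10 step sizes and the window >= frame constraint are assumptions for illustration, not taken from the deck.

```python
# Enumerate the (frame, window, IP) grid from slide 2.
# Step sizes are assumed; the deck gives only the endpoints.
import itertools

frames_ms  = range(5, 55, 5)        # frame duration: 5 ms to 50 ms
windows_ms = range(5, 55, 5)        # analysis window: 5 ms to 50 ms
ips        = range(-10, -210, -10)  # insertion penalty: -10 to -200

# Keep only configurations where the analysis window covers at least
# one full frame advance (window >= frame), an assumed constraint.
grid = [(f, w, ip)
        for f, w, ip in itertools.product(frames_ms, windows_ms, ips)
        if w >= f]
print(len(grid), "(frame, window, IP) configurations to evaluate")
```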

  3. Test Results: Effect of Frame-Window Variation on WER

  4. Test Results: Effect of Frame-Window Variation on Time

  5. Training Schedule

  6. Command line to run the experiment • tidigit_decode -model_type xwrd_triphone -train_mode baum_welch -decode_mode loop_grammar • Options: • -model_type [what type of model to build]: xwrd_triphone builds context-dependent cross-word triphone models • -train_mode [the training algorithm to use]: baum_welch is the standard Baum-Welch forward-backward algorithm • -decode_mode [the type of decoding to perform]: loop_grammar decodes using a grammar in which any digit can follow any other digit with equal probability • (a sketch of invoking this command programmatically follows this slide)
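For repeated runs it can help to wrap the command above in a small driver. This is a hedged sketch using only the three flags documented on this slide; how per-run parameters such as frame and window sizes are passed to tidigit_decode is not shown in the deck, so the sketch assumes they are configured elsewhere (e.g., in a configuration file read by the tool).

```python
# Minimal wrapper around the documented tidigit_decode invocation.
import subprocess

def decode_run():
    """Run one cross-word triphone train/decode cycle; returns exit status."""
    cmd = [
        "tidigit_decode",
        "-model_type", "xwrd_triphone",  # context-dependent cross-word triphones
        "-train_mode", "baum_welch",     # standard forward-backward training
        "-decode_mode", "loop_grammar",  # any digit may follow any digit
    ]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    print("tidigit_decode exited with status", decode_run())
```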

  7. Language Model • Combining acoustic and language models • Language model contribution = P(W)^LM × IP^N(W), where N(W) is the number of words in hypothesis W • LM: the language model scale [we did not observe a change in WER when varying it] • IP: the insertion penalty, i.e., the cost of inserting a new word • IP is determined empirically to optimize recognition performance • (a scoring sketch follows this slide)
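In practice this combination is applied in the log domain. Below is a minimal sketch consistent with the formula above; since the deck's IP values (-10 to -200) look like log-domain quantities, the penalty is added once per word rather than exponentiated. The lm_scale default and all example numbers are illustrative assumptions, not the toolkit's settings.

```python
# Combine acoustic and language model scores in the log domain:
# log P(O|W) + LM * log P(W) + N(W) * IP
import math

def combined_log_score(log_p_acoustic, log_p_lm, num_words,
                       lm_scale=15.0, ip=-10.0):
    """Total hypothesis score; lm_scale and ip are assumed example values."""
    return log_p_acoustic + lm_scale * log_p_lm + num_words * ip

# Example: a 3-digit hypothesis under the loop grammar, where each of the
# 11 digits is equally likely, so log P(W) = 3 * log(1/11).
score = combined_log_score(log_p_acoustic=-250.0,
                           log_p_lm=3 * math.log(1 / 11),
                           num_words=3)
print(f"combined log score: {score:.2f}")
```

A more negative IP suppresses insertion errors at the risk of causing deletions, which is why, as the slide notes, it is tuned empirically.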

  8. Test Results: Effect of Insertion Penalty on WER • The same trend holds for the other (frame, window) combinations; the remaining two pairs tested are (10, 25) and (15, 25)

  9. State-Tying Results • These results are reproduced from Naveen's thesis

  10. References • J. Picone, "Lecture." [Online]. Available: http://www.isip.msstate.edu/publications/courses • X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice Hall, 2001. • F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, 1999.

  11. Questions

  12. State-Tying (Reference 3)
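Slide 2 parameterizes state-tying by split, merge, and occupancy thresholds. As a rough illustration of how such thresholds trade model size against trainability, here is a toy, self-contained sketch: real systems split on phonetic questions in a decision tree (see Reference 3), whereas this sketch clusters scalar state means just to stay runnable, and the merge step is omitted for brevity. All names and numbers are assumptions.

```python
# Toy threshold-driven clustering in the spirit of split/occupancy tying.
from dataclasses import dataclass

@dataclass
class State:
    mean: float       # toy 1-D Gaussian mean for this HMM state
    occupancy: float  # frames assigned to the state during training

def tie_states(states, split_thresh=1.0, occupancy_thresh=100.0):
    """Greedily split a cluster at its widest mean gap while the gap exceeds
    split_thresh and both halves keep at least occupancy_thresh frames."""
    clusters, work = [], [sorted(states, key=lambda s: s.mean)]
    while work:
        c = work.pop()
        gaps = [(c[i + 1].mean - c[i].mean, i) for i in range(len(c) - 1)]
        if not gaps:                      # single-state cluster: keep as-is
            clusters.append(c)
            continue
        gap, i = max(gaps)                # widest gap between adjacent means
        left, right = c[:i + 1], c[i + 1:]
        if (gap > split_thresh
                and sum(s.occupancy for s in left) >= occupancy_thresh
                and sum(s.occupancy for s in right) >= occupancy_thresh):
            work += [left, right]         # split survives both thresholds
        else:
            clusters.append(c)            # otherwise the cluster is tied
    return clusters

states = [State(0.1, 80), State(0.2, 120), State(2.5, 90), State(2.7, 150)]
print(len(tie_states(states)), "tied states")  # lower thresholds => more splits
```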
