
7- Speech Recognition (Cont’d)

Explore the various approaches and concepts in speech recognition, including HMM calculating approaches and neural components. Learn about the bottom-up, top-down, and blackboard approaches, as well as the Viterbi algorithm and state duration modeling.

berner




Presentation Transcript


  1. 7-Speech Recognition (Cont’d) • HMM Calculating Approaches • Neural Components • Three Basic HMM Problems • Viterbi Algorithm • State Duration Modeling • Training In HMM

  2. Speech Recognition Concepts • Speech recognition is the inverse of speech synthesis [Diagram: blocks labeled Text, NLP, Phone Sequence, Speech Processing, Speech, and Understanding, arranged as a Speech Synthesis path and a Speech Recognition path]

  3. Speech Recognition Approaches • Bottom-Up Approach • Top-Down Approach • Blackboard Approach

  4. Bottom-Up Approach [Diagram: Signal Processing, Feature Extraction, Segmentation, Voiced/Unvoiced/Silence, Sound Classification Rules, Phonotactic Rules, Lexical Access, Language Model, Knowledge Sources, Recognized Utterance]

  5. Top-Down Approach [Diagram: Inventory of Speech Recognition Units, Word Dictionary, Grammar, Task Model, Feature Analysis, Unit Matching System, Lexical Hypothesis, Syntactic Hypothesis, Semantic Hypothesis, Utterance Verifier/Matcher, Recognized Utterance]

  6. Blackboard Approach [Diagram: a Blackboard connected to Acoustic, Lexical, Environmental, Semantic, and Syntactic Processes]

  7. An overall view of a speech recognition system, with top-down and bottom-up processing [Figure from Ladefoged 2001]

  8. Recognition Theories • Articulatory-Based Recognition • Uses the articulatory system for recognition • This theory has been the most successful so far • Auditory-Based Recognition • Uses the auditory system for recognition • Hybrid-Based Recognition • A hybrid of the two theories above • Motor Theory • Models the speaker's intended gestures

  9. Recognition Problem • We have a sequence of acoustic symbols and want to find the words expressed by the speaker • Solution: find the most probable word sequence given the acoustic symbols

  10. Recognition Problem • $A$: acoustic symbols • $W$: word sequence • We should find $\hat{W}$ so that $\hat{W} = \arg\max_{W} P(W \mid A)$

  11. Bayes Rule

  12. Bayes Rule (Cont’d)
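
The formulas for these two slides are not preserved in the transcript. As a sketch only, and assuming the slides showed the standard form of Bayes' rule for the recognition problem stated earlier (with $A$ the acoustic symbols, $W$ a word sequence, and $\hat{W}$ introduced here to denote the most probable word sequence), the derivation would read:

```latex
\begin{align}
P(W \mid A) &= \frac{P(A \mid W)\, P(W)}{P(A)} \\
\hat{W} &= \arg\max_{W} P(W \mid A)
         = \arg\max_{W} P(A \mid W)\, P(W)
\end{align}
```

The denominator $P(A)$ can be dropped in the last step because it does not depend on $W$. This leaves two terms to model: $P(W)$, the language model, and $P(A \mid W)$, the acoustic model.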

  13. Simple Language Model • Computing the probability of a whole word sequence directly is very difficult and requires a very large database, so we use trigram and bigram models instead.

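  14. Simple Language Model (Cont’d) • Trigram: $P(W) \approx \prod_{i} P(W_i \mid W_{i-2} W_{i-1})$ • Bigram: $P(W) \approx \prod_{i} P(W_i \mid W_{i-1})$ • Monogram: $P(W) \approx \prod_{i} P(W_i)$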

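  15. Simple Language Model (Cont’d) • Computing method: $P(W_3 \mid W_1 W_2) = \dfrac{\text{number of occurrences of } W_3 \text{ after } W_1 W_2}{\text{total number of occurrences of } W_1 W_2}$ • Ad hoc method: (formula not preserved in the transcript)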
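
The counting method on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names `train_trigram` and `trigram_prob` and the toy corpus are assumptions made for the example, and no smoothing is applied (the slide's ad hoc method is not reproduced here).

```python
from collections import Counter

def train_trigram(tokens):
    """Count trigrams and their two-word histories in a token list."""
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count only the W1 W2 pairs that are followed by a third word,
    # so the estimates for each history sum to 1.
    history_counts = Counter(zip(tokens, tokens[1:-1]))
    return trigram_counts, history_counts

def trigram_prob(w1, w2, w3, trigram_counts, history_counts):
    """Estimate P(w3 | w1 w2) as count(w1 w2 w3) / count(w1 w2)."""
    history = history_counts[(w1, w2)]
    if history == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return trigram_counts[(w1, w2, w3)] / history

# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat the cat ran on the mat".split()
tri, hist = train_trigram(corpus)
p_sat = trigram_prob("the", "cat", "sat", tri, hist)
p_mat = trigram_prob("on", "the", "mat", tri, hist)
```

In this toy corpus "the cat" occurs twice and is followed once by "sat", so the estimate for "sat" after "the cat" is one half. Unseen histories get probability zero here, which is one reason a large database or some form of smoothing is needed in practice.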

  16. 7-Speech Recognition • Speech Recognition Concepts • Speech Recognition Approaches • Recognition Theories • Bayes Rule • Simple Language Model • P(A|W) Network Types

  17. From Ladefoged 2001

  18. P(A|W) Computing Approaches • Dynamic Time Warping (DTW) • Hidden Markov Model (HMM) • Artificial Neural Network (ANN) • Hybrid Systems

  19. Dynamic Time Warping Method (DTW) • To obtain a global distance between two speech patterns, a time alignment must be performed • Example: a time alignment path between a template pattern “SPEECH” and a noisy input “SsPEEhH”
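
The time alignment in this example can be sketched with the usual dynamic-programming recurrence. The code below is a simplified illustration under stated assumptions: it aligns symbol strings rather than acoustic feature vectors, the local cost is 0 for identical symbols and 1 otherwise (case-sensitive), and the function name `dtw_distance` is made up for this sketch.

```python
def dtw_distance(template, observed, cost=lambda a, b: 0 if a == b else 1):
    """Global distance between two sequences under the best time alignment."""
    n, m = len(template), len(observed)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of template[:i] with observed[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(template[i - 1], observed[j - 1])
            # Allowed moves: repeat a template symbol, repeat an observed
            # symbol, or advance both sequences together.
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

distance = dtw_distance("SPEECH", "SsPEEhH")
```

With this cost function, the inserted "s" and the "h" that replaces "C" each contribute one unit, so the global distance for the slide's example is 2. A real recognizer would replace the symbol comparison with a distance between feature vectors.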

  20. Recognition Tasks • Isolated Word Recognition (IWR) And Continuous Speech Recognition (CSR) • Speaker Dependent And Speaker Independent • Vocabulary Size • Small <20 • Medium >100 , <1000 • Large >1000, <10000 • Very Large >10000

  21. Error-Producing Factors • Prosody (recognition should be prosody-independent) • Noise (noise should be prevented) • Spontaneous speech

  22. Artificial Neural Network [Figure: simple computation element of a neural network]
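
A simple computation element of this kind is commonly described as a weighted sum of inputs passed through a nonlinearity. The sketch below assumes a sigmoid nonlinearity and a bias term; the slide's figure is not preserved, so both choices, and the name `neuron`, are assumptions for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid nonlinearity."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical inputs and weights, chosen so the weighted sum is zero.
y = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
```

When the weighted sum is zero the sigmoid returns 0.5, and the output moves toward 1 or 0 as the sum becomes strongly positive or negative.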

  23. Artificial Neural Network (Cont’d) • Neural Network Types • Perceptron • Time Delay • Time Delay Neural Network Computational Element (TDNN)

  24. Artificial Neural Network (Cont’d) [Figure: single-layer perceptron]

  25. Artificial Neural Network (Cont’d) [Figure: three-layer perceptron]
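
As a rough sketch of how a three-layer perceptron computes its output, the code below chains three fully connected layers of sigmoid units. It assumes "three layers" means three layers of weighted connections (two hidden layers and one output layer); the layer sizes, the weights, and the function names are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each weight row feeds one output unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def three_layer_perceptron(x, params):
    """Forward pass through a list of (weights, biases) pairs."""
    h = x
    for weights, biases in params:
        h = layer(h, weights, biases)
    return h

# Hypothetical parameters: 2 inputs -> 3 units -> 2 units -> 1 output.
params = [
    ([[0.5, -0.5], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.4, 0.7], [0.5, 0.5, -0.5]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
output = three_layer_perceptron([1.0, 0.0], params)
```

A single-layer perceptron corresponds to calling `layer` once. Training the weights is not shown here.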

  26. Hybrid Methods • Hybrid Neural Network and Matched Filter for Recognition [Diagram: Speech, Delays, Acoustic Features, Pattern Classifier, Output Units]

  27. Neural Network Properties • The system is simple but highly iterative • It does not prescribe a specific structure • Despite its simplicity, the results are good • The training set is large, so training should be done offline • Accuracy is relatively good

  28. Hidden Markov Model • Observations: $O_1, O_2, \ldots$ • States in time: $q_1, q_2, \ldots$ • All states: $s_1, s_2, \ldots$ [Diagram: states $S_i$ and $S_j$]
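
To make this notation concrete, the sketch below generates a state sequence $q_1, q_2, \ldots$ and an observation sequence $O_1, O_2, \ldots$ from a small HMM. Everything numeric here is an assumption for illustration: the three states, the two observation symbols, the left-to-right transition matrix, and the probabilities are invented, and `sample_hmm` is a hypothetical helper name.

```python
import random

states = ["s1", "s2", "s3"]            # all states
symbols = ["a", "b"]                   # possible observation symbols
initial = [1.0, 0.0, 0.0]              # always start in s1
transition = [                         # transition[i][j] = P(next = sj | now = si)
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
emission = [                           # emission[i][k] = P(symbol k | state si)
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
]

def sample_hmm(length, rng):
    """Draw states q_1..q_T and observations O_1..O_T from the model."""
    q, O = [], []
    state = rng.choices(range(len(states)), weights=initial)[0]
    for _ in range(length):
        q.append(states[state])
        O.append(rng.choices(symbols, weights=emission[state])[0])
        state = rng.choices(range(len(states)), weights=transition[state])[0]
    return q, O

q, O = sample_hmm(5, random.Random(0))
```

Only the observations `O` would be visible to a recognizer; the state sequence `q` is the hidden part that the model has to infer.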
