
Speech Recognition


drea

Presentation Transcript


  1. Speech Recognition

  2. What makes speech recognition hard?

  3. Speech Recognition
     • Task: Identify the sequence of words uttered by a speaker, given the acoustic waveform.
     • Uncertainty is introduced by noise, speaker error, variation in pronunciation, homonyms, etc.
     • Thus speech recognition is viewed as a problem of probabilistic inference.
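A minimal sketch of what “probabilistic inference” means here, assuming a handful of candidate transcriptions have already been scored. Each candidate W gets an acoustic log-likelihood $\log P(O \mid W)$ and a language-model log-probability $\log P(W)$, and the recognizer picks the candidate with the largest sum, which equals $\log P(W \mid O)$ up to a constant. The candidate strings and all numbers are hypothetical, chosen only to show how a strong language-model score can outweigh a better acoustic fit.

```python
import math

# Hypothetical candidate transcriptions for one utterance, with made-up scores:
#   acoustic: log P(O | W), how well the audio fits the words
#   lm:       log P(W), how plausible the word sequence is on its own
candidates = {
    "can I have something to drink": {"acoustic": math.log(0.010), "lm": math.log(0.020)},
    "can I have some thing to drink": {"acoustic": math.log(0.012), "lm": math.log(0.001)},
    "can eye half some fin two dwink": {"acoustic": math.log(0.020), "lm": math.log(1e-9)},
}

def posterior_score(scores):
    """log P(O|W) + log P(W), i.e. log P(W|O) up to the constant -log P(O)."""
    return scores["acoustic"] + scores["lm"]

acoustic_only = max(candidates, key=lambda w: candidates[w]["acoustic"])
best = max(candidates, key=lambda w: posterior_score(candidates[w]))
print(acoustic_only)  # can eye half some fin two dwink
print(best)           # can I have something to drink
```

Working in log space turns products of small probabilities into sums, which avoids numerical underflow on long utterances.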

  4. Example (from Russell and Norvig, Artificial Intelligence): “I’m firsty, um, can I hafsomefingto dwink?”

  5. Speech Recognition System Architecture (from Buchsbaum & Giancarlo paper)
     • Acoustic feature extraction
     • Acoustic features → phones model
     • Phones → word pronunciation model
     • Language model
     Here, “lattice” means “Hidden Markov Model”.

  6. Acoustic feature extraction (figure from Russell and Norvig, Artificial Intelligence)
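Acoustic feature extraction typically starts by cutting the waveform into short overlapping frames and computing features per frame. The sketch below assumes a 16 kHz signal, 25 ms frames with a 10 ms hop, and a Hamming window; these are common choices, not values given in the slides. It computes only one feature per frame (log energy) as a stand-in for the richer spectral features, such as MFCCs, that a real front end would produce.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (assumed 25 ms window, 10 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Log energy of a Hamming-windowed frame: one simple per-frame acoustic feature."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return math.log(sum(x * x for x in windowed) + 1e-10)  # floor avoids log(0) on silence

# Toy input: 0.5 s of silence, then 0.5 s of a 440 Hz tone, sampled at 16 kHz.
sr = 16000
signal = [0.0] * (sr // 2) + [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 2)]
frames = frame_signal(signal, sr)
features = [log_energy(f) for f in frames]
print(len(frames))  # 98 frames, one feature each
```

The silent frames get a very low log energy and the tone frames a much higher one, which is the kind of contrast later models use to tell speech sounds apart.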

  7. From Russell and Norvig, Artificial Intelligence

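  8. Hidden Markov Models
     • Markov model: Given state $X_t$, what is the probability of transitioning to the next state $X_{t+1}$?
     • E.g., word bigram probabilities give $P(\text{word}_{t+1} \mid \text{word}_t)$.
     • Hidden Markov model: There are observations (e.g., the acoustic signal S) and “hidden” states (e.g., words). The HMM specifies transition probabilities between hidden states and the probability of each observation given the hidden state; from these we infer the probabilities of the hidden states given the observations.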
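The word-bigram example can be made concrete. With maximum-likelihood estimation, $P(\text{word}_{t+1} \mid \text{word}_t)$ is the number of times the pair occurs divided by the number of times $\text{word}_t$ occurs as a history. The three-sentence corpus below is a made-up toy, and the `<s>`/`</s>` sentence-boundary markers are an added convention; a practical language model would also smooth these counts so that unseen pairs do not get probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(next | prev) = count(prev, next) / count(prev)."""
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]  # add sentence-boundary markers
        for prev, nxt in zip(words, words[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1

    def prob(nxt, prev):
        """P(nxt | prev); 0.0 for an unseen history (no smoothing)."""
        if history_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, nxt)] / history_counts[prev]

    return prob

# Made-up toy corpus.
corpus = ["can I have something to drink",
          "can I have a drink",
          "I have something to eat"]
p = train_bigram(corpus)
print(p("have", "I"))          # 1.0: "I" is always followed by "have"
print(p("something", "have"))  # 2/3: "have" precedes "something" twice and "a" once
```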

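  9. Phone model
     $P(\text{phone} \mid \text{frame features}) \propto P(\text{frame features} \mid \text{phone})\, P(\text{phone})$
     (Bayes' rule; the normalizing factor $1/P(\text{frame features})$ is the same for every phone.)
     $P(\text{frame features} \mid \text{phone})$ is often represented by a Gaussian mixture model.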
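A minimal sketch of this phone model, assuming one-dimensional frame features for readability (real systems use feature vectors, e.g. MFCCs, usually with diagonal-covariance Gaussians). Each phone's likelihood $P(\text{frame features} \mid \text{phone})$ is a weighted sum of Gaussians, and Bayes' rule turns likelihoods and priors into posteriors by normalizing over phones. The phone labels, mixture parameters, and priors are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_likelihood(x, mixture):
    """P(x | phone) for a 1-D Gaussian mixture given as (weight, mean, variance) triples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)

# Hypothetical phones with made-up mixture parameters and priors P(phone).
phone_gmms = {
    "[iy]": [(0.6, 1.0, 0.5), (0.4, 2.0, 0.5)],
    "[aa]": [(0.5, -1.0, 0.8), (0.5, -2.0, 0.8)],
}
priors = {"[iy]": 0.5, "[aa]": 0.5}

def phone_posterior(x):
    """P(phone | x): likelihood times prior, normalized over all phones."""
    joint = {ph: gmm_likelihood(x, gmm) * priors[ph] for ph, gmm in phone_gmms.items()}
    total = sum(joint.values())
    return {ph: j / total for ph, j in joint.items()}

posterior = phone_posterior(1.5)
print(max(posterior, key=posterior.get))  # [iy]
```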

  10. Acoustic features → phones model (figure from Russell and Norvig, Artificial Intelligence)

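  11. Word pronunciation model
     • Now we want $P(\text{words} \mid \text{phones}_{1:t}) \propto P(\text{phones}_{1:t} \mid \text{words})\, P(\text{words})$.
     • Represent $P(\text{phones}_{1:t} \mid \text{words})$ as an HMM.
     (Diagram label: Phones → word pronunciation model)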
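To illustrate representing $P(\text{phones}_{1:t} \mid \text{word})$ as an HMM, here is a hypothetical pronunciation model for the word “tomato” with two branch points: a vowel choice after the first [t] and another after [m]. To keep it short, each state emits exactly one phone, so the model behaves like a probabilistic automaton; the forward algorithm then sums the probability of every state path that produces the observed phone sequence. The state names, phones, and transition probabilities are made up for the example.

```python
# Hypothetical pronunciation model for "tomato": each state emits exactly its phone.
STATES = {"t1": "t", "ow1": "ow", "ah": "ah", "m": "m",
          "ey": "ey", "aa": "aa", "t2": "t", "ow2": "ow"}
START = {"t1": 1.0}
TRANS = {  # made-up transition probabilities; "END" marks the end of the word
    "t1": {"ow1": 0.2, "ah": 0.8},
    "ow1": {"m": 1.0},
    "ah": {"m": 1.0},
    "m": {"ey": 0.5, "aa": 0.5},
    "ey": {"t2": 1.0},
    "aa": {"t2": 1.0},
    "t2": {"ow2": 1.0},
    "ow2": {"END": 1.0},
}

def pronunciation_prob(phones):
    """Forward algorithm: P(phones | word), summed over all state paths."""
    if not phones:
        return 0.0
    # alpha[s]: probability of producing the phones seen so far and being in state s
    alpha = {s: p for s, p in START.items() if STATES[s] == phones[0]}
    for phone in phones[1:]:
        nxt = {}
        for s, a in alpha.items():
            for s2, p in TRANS.get(s, {}).items():
                if s2 != "END" and STATES[s2] == phone:
                    nxt[s2] = nxt.get(s2, 0.0) + a * p
        alpha = nxt
    # The sequence must also end the word, so multiply by the END transition.
    return sum(a * TRANS.get(s, {}).get("END", 0.0) for s, a in alpha.items())

print(pronunciation_prob(["t", "ah", "m", "ey", "t", "ow"]))  # 0.4
print(pronunciation_prob(["t", "ow", "m", "aa", "t", "ow"]))  # 0.1
```

With these numbers the four possible pronunciations get probabilities 0.4, 0.4, 0.1, and 0.1, summing to 1; any phone sequence the model cannot produce gets 0.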

  12. Example of a phones → word pronunciation model (from Russell and Norvig, Artificial Intelligence)

  13. Language model (from Russell and Norvig, Artificial Intelligence)

  14–16. To build a speech recognition system, we need:
     • Lots of data
     • Acoustic signal processing tools
     • Methods for learning various probability models
     • Methods for “maximum likelihood” calculation (i.e., search or “decoding”): Suppose we have observations (features from the acoustic signal) $O = (o_1 o_2 o_3 \ldots o_n)$. We want to find $W^* = (w_1 w_2 w_3 \ldots w_m)$ such that
       $W^* = \arg\max_W P(W \mid O) = \arg\max_W P(O \mid W)\, P(W)$
       The second equality drops $P(O)$, which does not depend on W. The $\arg\max$ is the search or “decoding” method, $P(W)$ is the language model, and $P(O \mid W)$ combines phone models, segmentation models, and word pronunciation models.
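The search or “decoding” step computes an argmax over hidden sequences. For an HMM, the standard dynamic-programming method is the Viterbi algorithm, which finds the single most likely hidden-state sequence; recognizers commonly use that best state path as an approximation to the best word sequence. The sketch below runs Viterbi in log space on a hypothetical two-state HMM whose states stand for phones and whose observations are made-up quantized frame labels; in a full recognizer the state space would come from composing phone, pronunciation, and language models.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, its log joint probability) for a discrete HMM."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s]: highest log-probability of any state path ending in s at the current step
    best = {s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0)) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor r for state s, given the transition r -> s.
            prev = max(states, key=lambda r: best[r] + log(trans_p[r][s]))
            new_best[s] = best[prev] + log(trans_p[prev][s]) + log(emit_p[s].get(obs, 0.0))
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

# Hypothetical toy HMM: two phone states, two quantized observation labels.
states = ["[s]", "[iy]"]
start_p = {"[s]": 0.7, "[iy]": 0.3}
trans_p = {"[s]": {"[s]": 0.6, "[iy]": 0.4},
           "[iy]": {"[s]": 0.1, "[iy]": 0.9}}
emit_p = {"[s]": {"hiss": 0.8, "voiced": 0.2},
          "[iy]": {"hiss": 0.1, "voiced": 0.9}}

path, log_prob = viterbi(["hiss", "hiss", "voiced", "voiced"], states, start_p, trans_p, emit_p)
print(path)  # ['[s]', '[s]', '[iy]', '[iy]']
```

Replacing `max` with a sum over predecessors would turn this into the forward algorithm, which computes the total probability of the observations instead of the single best path.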

  17. Emotion recognition in speech (by OES high-school students!) http://www.youtube.com/watch?v=NnbsGyViN3Y
