
CS 621 Artificial Intelligence Lecture 15 - 06/09/05 Prof. Pushpak Bhattacharyya


Presentation Transcript


1. CS 621 Artificial Intelligence, Lecture 15 - 06/09/05, Prof. Pushpak Bhattacharyya • Application of Noisy Channel, Channel Entropy

2. Noisy Channel: S → R • S = {s1, s2, …, sq}, R = {t1, t2, …, tq} • SPEECH RECOGNITION (ASR – Automatic Speech Recognition) – Signal processing (low level). – Cognitive processing (higher-level categories).

3. Noisy Channel Metaphor • Due to Jelinek (IBM) – 1970s • Main field of study – speech. • Problem definition: S = {speech signals} = {s1, s2, …, ss}, R = {w1, w2, …, wr}; the task is the mapping {s1, s2, …, sp} → {w1, w2, …, wq}

4. A Special and Easier Case • Isolated Word Recognition (IWR) • The complexity due to word-boundary detection does not arise. • Example: "I got a plate" vs. "I got up late"

5. Homophones and Homographs • Homophones: words with the same pronunciation. Example: bear, beer • Homographs: words with the same spelling but different meanings. Example: bank – river bank and finance bank

6. World of Sounds • World of sounds – speech signals → Phonetics, Phonology • World of words → Orthography • Letters: consonants, vowels

7. The alphabet-to-sound mapping is not one to one • Vowels • Example: Tomato → 'tomaeto' / 'tomaato'

8. Sound Variations • Lexical variations: 'because' → ''cause' or 'because' • Allophonic variations: 'because' → 'because' or 'becase'

9. Allophonic variations: a more remarkable example • Do → [d][u] • Go → [g][o]

10. Socio-cultural variations • 'something' → 'somethin'' (formal → informal) • Dialectal variation • 'very' – 'bheri' in Bengal • 'apple' – 'ieple' in the south, 'eple' in the north, 'aapel' in Bengal

11. Orthography – Phonology • A complex problem • Very difficult to model using a 'rule-governed' system.

12. Noisy Channel: S → W* • Probabilistic approach • W* = best estimate of the word, given S • W* = ARGMAX over w ∈ {words} of P(w|s)
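A minimal sketch (not from the lecture) of the ARGMAX step, assuming we already had posterior estimates P(w|s) for a few candidate words; the words and numbers below are made up for illustration:

```python
# Hypothetical posterior estimates P(w | s) for one observed signal s.
# In a real recognizer these would come from trained models, not a hand-made table.
posterior = {"plate": 0.42, "late": 0.35, "plait": 0.03}

# W* = ARGMAX over candidate words w of P(w | s)
w_star = max(posterior, key=posterior.get)
print(w_star)  # -> plate
```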

13. P(w|s) is called the 'parameter' of the system. • Estimation → Training • The probability values need to be estimated from "SPEECH CORPORA" – recordings of the speech of many speakers.

14. Look of Speech Corpora • Annotation – unique pronunciation. • [Figure: speech signal annotated with the word 'apple']

15. Repositories of Standard Sound Symbols • IPA – International Phonetic Alphabet (International Phonetic Association). • ARPABET – the American phonetic standard.

16. The letter 't': top → [t] (IPA), tool → [θ] (IPA) • Augment the Roman alphabet with Greek symbols • The letter 'e': [ɛ] as in 'ebb', [i] as in 'need'

17. Speech corpora are annotated with IPA/ARPABET symbols. • Indian scenario: Hindi – TIFR, Marathi – IITB, Tamil – IITM

18. How to estimate P(w|s) from speech corpora? • Direct relative count: count(w, s) / count(s) • It is not done this way.
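For illustration only, the naive relative-count estimate would look like the sketch below; the toy (word, signal) pairs are invented, and the slide's point is that this is not done in practice, since real signals essentially never repeat exactly, so the counts would be hopelessly sparse.

```python
from collections import Counter

# Toy (word, signal-label) pairs standing in for an annotated speech corpus.
pairs = [("apple", "ae p l"), ("apple", "aa p l"), ("maple", "ae p l")]

count_ws = Counter(pairs)                  # count(w, s)
count_s = Counter(s for _, s in pairs)     # count(s)

def p_w_given_s(w, s):
    """Naive estimate: count(w, s) / count(s)."""
    return count_ws[(w, s)] / count_s[s]

print(p_w_given_s("apple", "ae p l"))  # -> 0.5
```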

19. Apply Bayes' Theorem • P(w|s) = P(w) · P(s|w) / P(s) • W* = ARGMAX over w of P(w) · P(s|w) / P(s)

20. W* = ARGMAX over w ∈ Words of P(w) · P(s|w) • P(s) is dropped since it does not depend on w. • P(w) = prior = the language model. • P(s|w) = likelihood of w being pronounced as 's' = the acoustic model.
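A small sketch of this decomposition with made-up numbers for the two models; P(s) is omitted because it is the same for every candidate word. Note how the prior can overturn the acoustic model's preference:

```python
# Toy language model P(w) (prior) and acoustic model P(s | w) (likelihood)
# for one observed signal s. All numbers are illustrative, not from the lecture.
language_model = {"late": 0.020, "plate": 0.005}   # P(w)
acoustic_model = {"late": 0.30, "plate": 0.90}     # P(s | w)

# W* = ARGMAX over w of P(w) * P(s | w); P(s) is constant across candidates.
scores = {w: language_model[w] * acoustic_model[w] for w in language_model}
w_star = max(scores, key=scores.get)
print(w_star)  # -> late (0.006 beats plate's 0.0045 despite the weaker acoustic score)
```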

21. Acoustic Model • Pronunciation dictionary (finite state automata). • Manually built – a costly resource. • [Figure: example pronunciation FSA with states 0–6 and phone arcs t, aa/ae, m, t, s]
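The automaton on the slide is a figure; below is a hedged sketch of how such a pronunciation-dictionary entry could be encoded as a transition table. The states and ARPABET-style phones (for the two 'tomaeto'/'tomaato' pronunciations) are guesses for illustration, not the lecture's exact machine:

```python
# Pronunciation FSA as a transition table: state -> {phone: next_state}.
# The branch at state 3 ('aa' vs 'ey') encodes the two vowel pronunciations.
transitions = {
    0: {"t": 1},
    1: {"ah": 2},
    2: {"m": 3},
    3: {"aa": 4, "ey": 4},
    4: {"t": 5},
    5: {"ow": 6},
}
final_states = {6}

def accepts(phones):
    """Return True if the phone sequence is a valid pronunciation."""
    state = 0
    for p in phones:
        if p not in transitions.get(state, {}):
            return False
        state = transitions[state][p]
    return state in final_states

print(accepts(["t", "ah", "m", "aa", "t", "ow"]))  # True  ('tomaato')
print(accepts(["t", "ah", "m", "ey", "t", "ow"]))  # True  ('tomaeto')
```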

22. W* is obtained from P(w) and P(s|w) • Language model? • Relative frequency of w in the corpus • Relative frequency ≡ unigram model • Example: if P(knee) > P(need), then even in the context "I _ _ _ _ _" the unigram model gives 'knee' high probability and 'need' low probability, because it ignores the context.
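A minimal sketch of the relative-frequency (unigram) estimate on a made-up toy corpus; the estimate depends only on how often each word occurs, so the context "I ___" cannot influence it, which is exactly the limitation the example points at:

```python
from collections import Counter

# Made-up corpus; real language models are estimated from large text collections.
corpus = "my knee hurts the knee is swollen i need rest".split()

counts = Counter(corpus)
total = sum(counts.values())

def p_unigram(w):
    """Relative frequency of w: count(w) / N."""
    return counts[w] / total

print(p_unigram("knee"), p_unigram("need"))  # 0.2 0.1 -> P(knee) > P(need)
```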

23. Language Modelling by N-grams • N = 2 – bigrams. • N = 3 – trigrams (empirically the best for English).
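A short sketch of the bigram case, P(w | w_prev) = count(w_prev, w) / count(w_prev), on another made-up corpus; conditioning on the previous word 'i' now prefers 'need' over 'knee', which is what the unigram model above could not do:

```python
from collections import Counter

corpus = "i need rest my knee hurts i need tea the knee is fine".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    """Bigram estimate: count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("need", "i"), p_bigram("knee", "i"))  # 1.0 0.0
```

A trigram model conditions on the two previous words in the same way.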
