CS 621 Artificial Intelligence
Lecture 15 - 06/09/05
Prof. Pushpak Bhattacharyya
Application of Noisy Channel, Channel Entropy
Noisy Channel
• A noisy channel maps source symbols to received symbols: S → R
• S = {s1, s2, …, sq}, R = {t1, t2, …, tq}
SPEECH RECOGNITION (ASR – Automatic Speech Recognition)
• Signal processing (low level).
• Cognitive processing (higher-level categories).
Noisy Channel Metaphor
• Due to Jelinek (IBM), 1970s.
• Main field of study – speech.
Problem Definition
• S = {speech signals} = {s1, s2, …, sp}
• R = {words} = {w1, w2, …, wq}
Special and Easier Case
• Isolated Word Recognition (IWR).
• The complexity of detecting word boundaries does not arise.
• Example: "I got a plate" vs. "I got up late".
Homophones and Homographs
• Homophones: words with the same pronunciation but different spelling or meaning.
• Example: bear, beer.
• Homographs: words with the same spelling but different meanings.
• Example: bank – river bank vs. finance bank.
World of Sounds
• World of sounds – speech signals: Phonetics, Phonology.
• World of words – Orthography.
• Letters: consonants and vowels.
• The alphabet-to-sound mapping is not one-to-one, especially for vowels.
• Example: 'tomato' may be pronounced 'tomaeto' or 'tomaato'.
Sound Variations
• Lexical variations: 'because' → 'cause.
• Allophonic variations: 'because' → becase.
Allophonic variations: a more remarkable example
• Do – [d][u]
• Go – [g][o]
Socio-cultural Variations
• 'something' → somethin (formal → informal).
Dialectal Variation
• 'very' – bheri in Bengal.
• 'apple' – ieple in the south, eple in the north, aapel in Bengal.
Orthography → Phonology
• A complex problem.
• Very difficult to model with a rule-governed system.
Probabilistic Approach
• S → Noisy Channel → W*
• W* = best estimate for the word, given the signal s:
  W* = ARGMAX [ P(w|s) ], w belongs to the set of words.
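The ARGMAX above can be sketched directly for one observed signal. The word list and the posterior values below are made-up illustrative numbers, not from the lecture:

```python
# Hypothetical posterior P(w|s) for one fixed signal s (illustrative values).
posterior = {
    "knee": 0.15,
    "need": 0.60,
    "kneed": 0.25,
}

# W* = ARGMAX over w of P(w|s): pick the word with the highest posterior.
w_star = max(posterior, key=posterior.get)
print(w_star)  # → need
```

The same one-liner works however the posterior table is obtained; the later slides show it is computed via Bayes' theorem rather than estimated directly.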
• P(w|s) is called the 'parameter' of the system.
• Estimation ≡ Training.
• The probability values must be estimated from SPEECH CORPORA: record the speech of many speakers.
Look of Speech Corpora
• Annotation – each signal segment is labelled with a unique pronunciation.
• Example: a signal segment annotated with the word 'Apple'.
Repository of Standard Sound Symbols
• IPA – the International Phonetic Alphabet, from the International Phonetic Association.
• ARPABET – the American phonetic standard.
• Example: top – [t] (IPA); tool – [θ] (IPA): different symbols for the letter 't'.
• IPA augments the Roman alphabet with Greek symbols.
• Example: letter 'e' – [ɛ] in 'ebb', [i] in 'need'.
• Speech corpora are annotated with IPA/ARPABET symbols.
Indian Scenario
• Hindi – TIFR
• Marathi – IITB
• Tamil – IITM
How to Estimate P(w|s) from Speech Corpora
• Naive approach: count(w, s) / count(s).
• It is not done this way: acoustic signals are continuous and essentially never repeat exactly, so the counts are almost always zero.
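The failure of the naive counting estimate can be seen on a toy corpus. The signal labels and pairs below are invented for illustration; real signals would not repeat even this often:

```python
from collections import Counter

# Toy "corpus" of (signal, word) pairs. Real acoustic signals are
# continuous and essentially never recur, so this estimator starves.
corpus = [("s1", "need"), ("s1", "need"), ("s2", "knee")]

pair_counts = Counter(corpus)
signal_counts = Counter(s for s, _ in corpus)

def p_w_given_s(w, s):
    # Naive relative-frequency estimate: count(w, s) / count(s).
    if signal_counts[s] == 0:
        return 0.0  # unseen signal: no estimate at all
    return pair_counts[(s, w)] / signal_counts[s]

print(p_w_given_s("need", "s1"))  # → 1.0 (seen pair)
print(p_w_given_s("need", "s3"))  # → 0.0 (any unseen signal)
```

Every new utterance is an unseen "s3", which is why the next slide switches to Bayes' theorem.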
Apply Bayes' Theorem
• P(w|s) = P(w) . P(s|w) / P(s)
• W* = ARGMAX [ P(w) . P(s|w) / P(s) ]
• P(s) does not depend on w, so it can be dropped from the ARGMAX.
• W* = ARGMAX [ P(w) . P(s|w) ], w belongs to Words.
• P(w) = Prior = Language model.
• P(s|w) = Likelihood of w being pronounced as s = Acoustic model.
Acoustic Model
• Pronunciation dictionary (Finite State Automaton).
• Manually built – a costly resource.
• Example: [figure] an FSA over states 0–6 whose arcs are labelled with phones (t, aa/ae, m, …); the alternative arcs aa/ae encode alternative pronunciations of a word such as 'tomato'.
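Such a pronunciation FSA can be sketched as a transition table. The states, phone labels, and the word 'tomato' below are an assumption reconstructed from the figure, with the aa/ae branch covering the 'tomaato'/'tomaeto' variants mentioned earlier:

```python
# Pronunciation dictionary as a finite-state automaton:
# (state, phone) -> next state. States and phone symbols are illustrative.
transitions = {
    (0, "t"): 1, (1, "o"): 2, (2, "m"): 3,
    (3, "aa"): 4, (3, "ae"): 4,   # two arcs: 'tomaato' vs 'tomaeto'
    (4, "t"): 5, (5, "o"): 6,
}
accepting = {6}

def accepts(phones):
    # Run the phone sequence through the automaton; accept iff we
    # consume everything and land in an accepting state.
    state = 0
    for p in phones:
        state = transitions.get((state, p))
        if state is None:
            return False
    return state in accepting

print(accepts(["t", "o", "m", "aa", "t", "o"]))  # → True
print(accepts(["t", "o", "m", "ae", "t", "o"]))  # → True
print(accepts(["t", "o", "m", "t", "o"]))        # → False
```

One automaton per word keeps alternative pronunciations compact, which is why building the dictionary by hand is the expensive part.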
• W* is obtained from P(w) and P(s|w).
Language Model?
• Relative frequency of w in the corpora; relative frequency ≡ unigram model.
• Flaw: suppose P(knee) > P(need) as unigrams. Then in "I _ _ _ _ _" the model assigns 'knee' high probability and 'need' low probability, even though 'need' is the far likelier continuation – the unigram model ignores context.
Language Modelling by N-grams
• N = 2: bigrams.
• N = 3: trigrams (empirically best for English).
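A bigram model fixes the context-blindness of the unigram model above. The toy corpus below is invented for illustration:

```python
from collections import Counter

# Estimate a bigram language model from a toy corpus.
corpus = ["i need a break", "i need help", "my knee hurts"]
tokens = [w for line in corpus for w in ["<s>"] + line.split()]

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    # P(w | prev) = count(prev, w) / count(prev)
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_bigram("need", "i"))   # → 1.0  ('need' follows 'i' in 2 of 2 cases)
print(p_bigram("knee", "i"))   # → 0.0  ('knee' never follows 'i')
```

Conditioning on the previous word is exactly what lets the model prefer "I need" over "I knee", regardless of the words' overall frequencies.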