CS 621 Artificial Intelligence
Lecture 15 - 06/09/05
Prof. Pushpak Bhattacharyya
Application of Noisy Channel, Channel Entropy
Noisy Channel
• A noisy channel maps source symbols to received symbols: S → R
• S = {s1, s2, …, sq}, R = {t1, t2, …, tq}
SPEECH RECOGNITION (ASR – Automatic Speech Recognition)
• Signal processing (low level).
• Cognitive processing (higher-level categories).
Noisy Channel Metaphor
• Due to Jelinek (IBM), 1970s.
• Main field of study – speech.
Problem Definition
• S = {speech signals} = {s1, s2, …, sp}
• R = {words} = {w1, w2, …, wq}
Special and Easier Case
• Isolated Word Recognition (IWR).
• The complexity of detecting word boundaries does not arise.
• Example: "I got a plate" vs. "I got up late".
Homophones and Homographs
• Homophones: words with the same pronunciation but different spelling or meaning.
• Example: bear, beer.
• Homographs: words with the same spelling but different meanings.
• Example: bank – river bank vs. finance bank.
World of Sounds
• World of sounds – speech signals: Phonetics, Phonology.
• World of words – Orthography.
• Letters: consonants and vowels.
• The alphabet-to-sound mapping is not one-to-one, especially for vowels.
• Example: 'tomato' may be pronounced 'tomaeto' or 'tomaato'.
Sound Variations
• Lexical variations: 'because' → 'cause.
• Allophonic variations: 'because' → becase.
Allophonic variations: a more remarkable example
• Do – [d][u]
• Go – [g][o]
Socio-cultural Variations
• 'something' → somethin (formal → informal).
Dialectal Variation
• 'very' – bheri in Bengal.
• 'apple' – ieple in the south, eple in the north, aapel in Bengal.
Orthography → Phonology
• A complex problem.
• Very difficult to model with a rule-governed system.
Probabilistic Approach
• S → Noisy Channel → W*
• W* = best estimate for the word, given the signal s:
  W* = ARGMAX [ P(w|s) ], w belongs to the set of words.
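The ARGMAX above can be sketched directly for one observed signal. The word list and the posterior values below are made-up illustrative numbers, not from the lecture:

```python
# Hypothetical posterior P(w|s) for one fixed signal s (illustrative values).
posterior = {
    "knee": 0.15,
    "need": 0.60,
    "kneed": 0.25,
}

# W* = ARGMAX over w of P(w|s): pick the word with the highest posterior.
w_star = max(posterior, key=posterior.get)
print(w_star)  # → need
```

The same one-liner works however the posterior table is obtained; the later slides show it is computed via Bayes' theorem rather than estimated directly.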
• P(w|s) is called the 'parameter' of the system.
• Estimation ≡ Training.
• The probability values must be estimated from SPEECH CORPORA: record the speech of many speakers.
Look of Speech Corpora
• Annotation – each signal segment is labelled with a unique pronunciation.
• Example: a signal segment annotated with the word 'Apple'.
Repository of Standard Sound Symbols
• IPA – the International Phonetic Alphabet, from the International Phonetic Association.
• ARPABET – the American phonetic standard.
• Example: top – [t] (IPA); tool – [θ] (IPA): different symbols for the letter 't'.
• IPA augments the Roman alphabet with Greek symbols.
• Example: letter 'e' – [ɛ] in 'ebb', [i] in 'need'.
• Speech corpora are annotated with IPA/ARPABET symbols.
Indian Scenario
• Hindi – TIFR
• Marathi – IITB
• Tamil – IITM
How to Estimate P(w|s) from Speech Corpora
• Naive approach: count(w, s) / count(s).
• It is not done this way: acoustic signals are continuous and essentially never repeat exactly, so the counts are almost always zero.
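The failure of the naive counting estimate can be seen on a toy corpus. The signal labels and pairs below are invented for illustration; real signals would not repeat even this often:

```python
from collections import Counter

# Toy "corpus" of (signal, word) pairs. Real acoustic signals are
# continuous and essentially never recur, so this estimator starves.
corpus = [("s1", "need"), ("s1", "need"), ("s2", "knee")]

pair_counts = Counter(corpus)
signal_counts = Counter(s for s, _ in corpus)

def p_w_given_s(w, s):
    # Naive relative-frequency estimate: count(w, s) / count(s).
    if signal_counts[s] == 0:
        return 0.0  # unseen signal: no estimate at all
    return pair_counts[(s, w)] / signal_counts[s]

print(p_w_given_s("need", "s1"))  # → 1.0 (seen pair)
print(p_w_given_s("need", "s3"))  # → 0.0 (any unseen signal)
```

Every new utterance is an unseen "s3", which is why the next slide switches to Bayes' theorem.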
Apply Bayes' Theorem
• P(w|s) = P(w) . P(s|w) / P(s)
• W* = ARGMAX [ P(w) . P(s|w) / P(s) ]
• P(s) does not depend on w, so it can be dropped from the ARGMAX.
• W* = ARGMAX [ P(w) . P(s|w) ], w belongs to Words.
• P(w) = Prior = Language model.
• P(s|w) = Likelihood of w being pronounced as s = Acoustic model.
Acoustic Model
• Pronunciation dictionary (Finite State Automaton).
• Manually built – a costly resource.
• Example: [figure] an FSA over states 0–6 whose arcs are labelled with phones (t, aa/ae, m, …); the alternative arcs aa/ae encode alternative pronunciations of a word such as 'tomato'.
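Such a pronunciation FSA can be sketched as a transition table. The states, phone labels, and the word 'tomato' below are an assumption reconstructed from the figure, with the aa/ae branch covering the 'tomaato'/'tomaeto' variants mentioned earlier:

```python
# Pronunciation dictionary as a finite-state automaton:
# (state, phone) -> next state. States and phone symbols are illustrative.
transitions = {
    (0, "t"): 1, (1, "o"): 2, (2, "m"): 3,
    (3, "aa"): 4, (3, "ae"): 4,   # two arcs: 'tomaato' vs 'tomaeto'
    (4, "t"): 5, (5, "o"): 6,
}
accepting = {6}

def accepts(phones):
    # Run the phone sequence through the automaton; accept iff we
    # consume everything and land in an accepting state.
    state = 0
    for p in phones:
        state = transitions.get((state, p))
        if state is None:
            return False
    return state in accepting

print(accepts(["t", "o", "m", "aa", "t", "o"]))  # → True
print(accepts(["t", "o", "m", "ae", "t", "o"]))  # → True
print(accepts(["t", "o", "m", "t", "o"]))        # → False
```

One automaton per word keeps alternative pronunciations compact, which is why building the dictionary by hand is the expensive part.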
• W* is obtained from P(w) and P(s|w).
Language Model?
• Relative frequency of w in the corpora; relative frequency ≡ unigram model.
• Flaw: suppose P(knee) > P(need) as unigrams. Then in "I _ _ _ _ _" the model assigns 'knee' high probability and 'need' low probability, even though 'need' is the far likelier continuation – the unigram model ignores context.
Language Modelling by N-grams
• N = 2: bigrams.
• N = 3: trigrams (empirically best for English).
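A bigram model fixes the context-blindness of the unigram model above. The toy corpus below is invented for illustration:

```python
from collections import Counter

# Estimate a bigram language model from a toy corpus.
corpus = ["i need a break", "i need help", "my knee hurts"]
tokens = [w for line in corpus for w in ["<s>"] + line.split()]

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    # P(w | prev) = count(prev, w) / count(prev)
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_bigram("need", "i"))   # → 1.0  ('need' follows 'i' in 2 of 2 cases)
print(p_bigram("knee", "i"))   # → 0.0  ('knee' never follows 'i')
```

Conditioning on the previous word is exactly what lets the model prefer "I need" over "I knee", regardless of the words' overall frequencies.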