
Improving Speech Recognition with SVM


Presentation Transcript


  1. Improving Speech Recognition with SVM Jerry Zhu CALD KDD Lab 2001/2/23 (Many thanks to all my reviewers!)

  2. What’s inside a speech recognizer?

  3. Language Model
  Reference: It cites class size quality of life as problems.
  Recognizer hypotheses, ranked by log-score P(A|S)·P(S) (higher is better):
  -5522539  it sites class eyes quality of life as problems
  -5556088  it sites class size quality of life has problems
  -5556088  it cites class size quality of life has problems
  -5622228  it sites klas eyes quality of life has problems
  -5653812  it sites class size quality of life as problems
  -5653812  it cites class size quality of life as problems
  ........  (many, many other hypotheses)
  • S* = argmaxS P(S|A) = argmaxS P(A|S)·P(S)
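The decoding rule on this slide picks the highest-scoring hypothesis from the recognizer's n-best list. A minimal sketch, using the illustrative log-scores from the slide; `nbest` and `decode` are hypothetical names, and the single score stands in for the combined log P(A|S) + log P(S):

```python
# Hypothetical n-best list: (log-score, sentence) pairs,
# where the score stands in for log P(A|S) + log P(S).
nbest = [
    (-5522539, "it sites class eyes quality of life as problems"),
    (-5556088, "it sites class size quality of life has problems"),
    (-5556088, "it cites class size quality of life has problems"),
    (-5653812, "it cites class size quality of life as problems"),
]

def decode(nbest):
    """S* = argmax_S P(A|S) * P(S), computed in log space."""
    _, sentence = max(nbest, key=lambda pair: pair[0])
    return sentence
```

Here decoding picks the "sites … eyes" hypothesis, which is exactly the failure mode the following slides address.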

  4. Trigram Language Model
  • Trigram language model: P(S) = P(w1…wn) = ∏i P(wi | wi-1, wi-2)
  • Widely used, but short-sighted: it gives high P(S) to bad sentences, e.g. this trigram-generated text:
  "he took over or is it by all human beings but the abortion debate would develop of mark you take the fifth time on foreign relations committee senator dole does have knowledge of forty years so that's one reason i came in for purposes of this i mean the emergency workers working like little crimes the defense lawyers and they have them yesterday do we told you earlier on inside politics weekend"
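A maximum-likelihood trigram model of the form on this slide can be sketched as follows; `train_trigram` and `logprob` are hypothetical helper names, and no smoothing is used beyond a small floor for unseen trigrams:

```python
import math
from collections import Counter

def train_trigram(corpus):
    """Count trigrams and their bigram histories:
    P(w_i | w_{i-2}, w_{i-1}) = c(w_{i-2}, w_{i-1}, w_i) / c(w_{i-2}, w_{i-1})."""
    tri, bi = Counter(), Counter()
    for sent in corpus:
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(2, len(words)):
            tri[tuple(words[i - 2:i + 1])] += 1
            bi[tuple(words[i - 2:i])] += 1
    return tri, bi

def logprob(sentence, tri, bi, eps=1e-12):
    """log P(S) = sum_i log P(w_i | w_{i-2}, w_{i-1}),
    with log(eps) as a crude floor for unseen trigrams."""
    words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for i in range(2, len(words)):
        t, b = tuple(words[i - 2:i + 1]), tuple(words[i - 2:i])
        lp += math.log(tri[t] / bi[b]) if tri[t] else math.log(eps)
    return lp
```

Because each factor sees only two words of history, any sentence stitched together from locally frequent trigrams scores well, even when it is globally incoherent — the short-sightedness the slide's generated text illustrates.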

  5. Why is it bad?
  • We may pick the wrong sentence during decoding.
  • Idea: penalize P(S) if S looks like a bad sentence: S* = argmax P(A|S)·P(S)·P(S is bad)
  • We need a classifier! An SVM trained to separate natural sentences from trigram-generated sentences supplies P(S is bad) for a sentence S.
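Read as a penalty, the extra factor must shrink the score of bad-looking sentences. A minimal rescoring sketch under that reading (all names are hypothetical; the classifier's P(S is bad) enters here as log(1 − P(S is bad)), i.e. the log-probability the sentence is good):

```python
import math

def rescore(nbest, p_bad):
    """Rescore an n-best list with a sentence 'badness' classifier.

    nbest: (log-score, sentence) pairs, log-score = log P(A|S) + log P(S).
    p_bad: classifier mapping a sentence to P(S is bad) in [0, 1).
    The penalty log(1 - P(S is bad)) lowers the score of bad-looking sentences.
    """
    return max(nbest, key=lambda pair: pair[0] + math.log(1.0 - p_bad(pair[1])))[1]
```

With a good classifier, a hypothesis that looks trigram-generated can lose to a slightly lower-scoring but natural-looking one.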

  6. Why SVM?
  • It’s cool.
  • It performs very well for document classification.
  • Its kernel trick allows sentence-level interactions.
  • But: how do we represent a sentence as a vector?
  Bag-of-words vector (doesn’t work):
  S = “Let me stop you at that point”
  <a, aardvark, aardwolf, … at … let … me … that … zoo>
  <0, 0, 0, … 1 … 1 … 1 … 1 … 0>
  Values can be binary, raw counts, or frequencies.
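A minimal sketch of the bag-of-words representation described on this slide; the tiny vocabulary and the helper name `bag_of_words` are illustrative:

```python
from collections import Counter

def bag_of_words(sentence, vocab):
    """Map a sentence to a count vector over a fixed vocabulary.

    vocab: sorted list of known words (the <a, aardvark, ...> axis on the slide).
    Values here are raw counts; binary or frequency variants work the same way.
    """
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]

vocab = sorted({"at", "let", "me", "point", "stop", "that", "you", "zoo"})
vec = bag_of_words("Let me stop you at that point", vocab)
```

Every word in the sentence gets count 1 and "zoo" gets 0; word order is discarded entirely, which is why this representation cannot separate natural from trigram-generated sentences.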

  7. Part-of-speech sequence vector
  S = “Let me stop you at that point”
  POS = “VB PRP VB PRP IN DT NN”
  Vector space = all POS sequences of length k, say k = 3:
  <… PRP-IN-NN … VB-PRP-PRP … VB-PRP-VB …>
  <… 6 … 4 … 3 …>
  Intuition: sentences with similar sequences are in the same class.
  This doesn’t work (accuracy 58%).
  (Detail: excluding trigram influence actually hurts.)
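A minimal sketch of the POS sequence vector, counting contiguous length-k subsequences (the slide's illustrative counts suggest a richer, possibly gapped, variant may be intended; the function name is hypothetical):

```python
from collections import Counter

def pos_kgram_vector(pos_tags, k=3):
    """Count contiguous length-k POS tag sequences in a tagged sentence.

    pos_tags: space-separated POS tags, e.g. "VB PRP VB PRP IN DT NN".
    Returns a sparse vector (Counter) over the space of all length-k sequences.
    """
    tags = pos_tags.split()
    return Counter(tuple(tags[i:i + k]) for i in range(len(tags) - k + 1))

vec = pos_kgram_vector("VB PRP VB PRP IN DT NN")
```

Two sentences whose Counters overlap heavily share many POS sequences, which is the similarity the SVM kernel would exploit.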

  8. What to try next
  • POS with stopwords kept, e.g. “VB me VB you at that NN”
  • Parsing
  • Semantic coherence, e.g. “zip-lock” and “Japanese bank”
