
Improving Speech Recognition with SVM


Presentation Transcript


  1. Improving Speech Recognition with SVM Jerry Zhu CALD KDD Lab 2001/2/23 (Many thanks to all my reviewers!)

  2. What’s inside a speech recognizer?

  3. Language Model
  Reference: It cites class size quality of life as problems.
  Recognizer hypotheses, ranked by log-score P(A|S)·P(S) (higher is better):
  -5522539  it sites class eyes quality of life as problems
  -5556088  it sites class size quality of life has problems
  -5556088  it cites class size quality of life has problems
  -5622228  it sites klas eyes quality of life has problems
  -5653812  it sites class size quality of life as problems
  -5653812  it cites class size quality of life as problems
  ........  (many, many other hypotheses)
  • S* = argmaxS P(S|A) = argmaxS P(A|S)·P(S)
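The decoding rule on this slide picks the highest-scoring hypothesis from the recognizer's n-best list. A minimal sketch, using the illustrative log-scores from the slide; `nbest` and `decode` are hypothetical names, and the single score stands in for the combined log P(A|S) + log P(S):

```python
# Hypothetical n-best list: (log-score, sentence) pairs,
# where the score stands in for log P(A|S) + log P(S).
nbest = [
    (-5522539, "it sites class eyes quality of life as problems"),
    (-5556088, "it sites class size quality of life has problems"),
    (-5556088, "it cites class size quality of life has problems"),
    (-5653812, "it cites class size quality of life as problems"),
]

def decode(nbest):
    """S* = argmax_S P(A|S) * P(S), computed in log space."""
    _, sentence = max(nbest, key=lambda pair: pair[0])
    return sentence
```

Here decoding picks the "sites … eyes" hypothesis, which is exactly the failure mode the following slides address.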

  4. Trigram Language Model
  • Trigram language model: P(S) = P(w1…wn) = ∏i P(wi | wi-1, wi-2)
  • Widely used, but short-sighted: it gives high P(S) to bad sentences, e.g. this trigram-generated text:
  "he took over or is it by all human beings but the abortion debate would develop of mark you take the fifth time on foreign relations committee senator dole does have knowledge of forty years so that's one reason i came in for purposes of this i mean the emergency workers working like little crimes the defense lawyers and they have them yesterday do we told you earlier on inside politics weekend"
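A maximum-likelihood trigram model of the form on this slide can be sketched as follows; `train_trigram` and `logprob` are hypothetical helper names, and no smoothing is used beyond a small floor for unseen trigrams:

```python
import math
from collections import Counter

def train_trigram(corpus):
    """Count trigrams and their bigram histories:
    P(w_i | w_{i-2}, w_{i-1}) = c(w_{i-2}, w_{i-1}, w_i) / c(w_{i-2}, w_{i-1})."""
    tri, bi = Counter(), Counter()
    for sent in corpus:
        words = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(2, len(words)):
            tri[tuple(words[i - 2:i + 1])] += 1
            bi[tuple(words[i - 2:i])] += 1
    return tri, bi

def logprob(sentence, tri, bi, eps=1e-12):
    """log P(S) = sum_i log P(w_i | w_{i-2}, w_{i-1}),
    with log(eps) as a crude floor for unseen trigrams."""
    words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for i in range(2, len(words)):
        t, b = tuple(words[i - 2:i + 1]), tuple(words[i - 2:i])
        lp += math.log(tri[t] / bi[b]) if tri[t] else math.log(eps)
    return lp
```

Because each factor sees only two words of history, any sentence stitched together from locally frequent trigrams scores well, even when it is globally incoherent — the short-sightedness the slide's generated text illustrates.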

  5. Why is it bad?
  • We may pick the wrong sentence during decoding.
  • Idea: penalize P(S) if S looks like a bad sentence: S* = argmax P(A|S)·P(S)·P(S is bad)
  • We need a classifier! An SVM trained to separate natural sentences from trigram-generated sentences supplies P(S is bad) for a sentence S.
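Read as a penalty, the extra factor must shrink the score of bad-looking sentences. A minimal rescoring sketch under that reading (all names are hypothetical; the classifier's P(S is bad) enters here as log(1 − P(S is bad)), i.e. the log-probability the sentence is good):

```python
import math

def rescore(nbest, p_bad):
    """Rescore an n-best list with a sentence 'badness' classifier.

    nbest: (log-score, sentence) pairs, log-score = log P(A|S) + log P(S).
    p_bad: classifier mapping a sentence to P(S is bad) in [0, 1).
    The penalty log(1 - P(S is bad)) lowers the score of bad-looking sentences.
    """
    return max(nbest, key=lambda pair: pair[0] + math.log(1.0 - p_bad(pair[1])))[1]
```

With a good classifier, a hypothesis that looks trigram-generated can lose to a slightly lower-scoring but natural-looking one.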

  6. Why SVM?
  • It’s cool.
  • It performs very well for document classification.
  • Its kernel trick allows sentence-level interactions.
  • But: how do we represent a sentence as a vector?
  Bag-of-words vector (doesn’t work):
  S = “Let me stop you at that point”
  <a, aardvark, aardwolf, … at … let … me … that … zoo>
  <0, 0, 0, … 1 … 1 … 1 … 1 … 0>
  Values can be binary, raw counts, or frequencies.
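A minimal sketch of the bag-of-words representation described on this slide; the tiny vocabulary and the helper name `bag_of_words` are illustrative:

```python
from collections import Counter

def bag_of_words(sentence, vocab):
    """Map a sentence to a count vector over a fixed vocabulary.

    vocab: sorted list of known words (the <a, aardvark, ...> axis on the slide).
    Values here are raw counts; binary or frequency variants work the same way.
    """
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]

vocab = sorted({"at", "let", "me", "point", "stop", "that", "you", "zoo"})
vec = bag_of_words("Let me stop you at that point", vocab)
```

Every word in the sentence gets count 1 and "zoo" gets 0; word order is discarded entirely, which is why this representation cannot separate natural from trigram-generated sentences.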

  7. Part-of-speech sequence vector
  S = “Let me stop you at that point”
  POS = “VB PRP VB PRP IN DT NN”
  Vector space = all POS sequences of length k, say k = 3:
  <… PRP-IN-NN … VB-PRP-PRP … VB-PRP-VB …>
  <… 6 … 4 … 3 …>
  Intuition: sentences with similar sequences are in the same class.
  This doesn’t work (accuracy 58%).
  (Detail: excluding trigram influence actually hurts.)
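A minimal sketch of the POS sequence vector, counting contiguous length-k subsequences (the slide's illustrative counts suggest a richer, possibly gapped, variant may be intended; the function name is hypothetical):

```python
from collections import Counter

def pos_kgram_vector(pos_tags, k=3):
    """Count contiguous length-k POS tag sequences in a tagged sentence.

    pos_tags: space-separated POS tags, e.g. "VB PRP VB PRP IN DT NN".
    Returns a sparse vector (Counter) over the space of all length-k sequences.
    """
    tags = pos_tags.split()
    return Counter(tuple(tags[i:i + k]) for i in range(len(tags) - k + 1))

vec = pos_kgram_vector("VB PRP VB PRP IN DT NN")
```

Two sentences whose Counters overlap heavily share many POS sequences, which is the similarity the SVM kernel would exploit.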

  8. What to try next
  • POS with stopwords kept, e.g. “VB me VB you at that NN”
  • Parsing
  • Semantic coherence, e.g. “zip-lock” and “Japanese bank”
