
The 1980’s




  1. The 1980’s • Collection of large standard corpora • Front ends: auditory models, dynamics • Engineering: scaling to large vocabulary continuous speech • Second major (D)ARPA ASR project • HMMs become ready for prime time

  2. Standard Corpora Collection • Before 1984, chaos • TIMIT • RM (later WSJ) • ATIS • NIST, ARPA, LDC

  3. Front Ends in the 1980’s • Mel cepstrum (Bridle, Mermelstein) • PLP (Hermansky) • Delta cepstrum (Furui) • Auditory models (Seneff, Ghitza, others)

  4. Mel Frequency Scale
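The mel cepstrum and mel frequency scale of slides 3 and 4 can be made concrete with a small sketch. The hertz-to-mel mapping below is the commonly cited 2595 * log10(1 + f/700) form; the FFT size, filter count, and cepstral order are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common hertz-to-mel mapping: 2595 * log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_cepstrum(frame, sample_rate=16000, n_filters=20, n_ceps=13):
    """Minimal mel-cepstrum sketch: windowed power spectrum ->
    triangular mel-spaced filterbank -> log energies -> DCT.
    Parameter values are illustrative, not those of any specific
    1980s front end."""
    n_fft = 512
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)), n_fft)
    power = np.abs(spectrum) ** 2

    # Filter edges spaced evenly on the mel scale, mapped back to FFT bins.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):
            fbank[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fbank[i, k] = (hi - k) / max(hi - mid, 1)

    log_energy = np.log(fbank @ power + 1e-10)

    # Type-II DCT of the log filterbank energies gives the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return basis @ log_energy
```

By this formula hz_to_mel(1000.0) comes out close to 1000, so the scale is roughly linear below 1 kHz and logarithmic above.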

  5. Spectral vs Temporal Processing • Figure: spectral processing operates across frequency within each frame (analysis, e.g., cepstral); temporal processing operates across time on each frequency channel (processing, e.g., mean removal)
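Slide 5's temporal-processing example, mean removal, amounts to cepstral mean subtraction applied to each coefficient's trajectory over time. A minimal sketch follows; the frames-by-coefficients array layout is an assumption, not something stated on the slide.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Temporal processing: subtract each coefficient's long-term mean.

    cepstra: array of shape (n_frames, n_coeffs); each column is one
    cepstral coefficient's trajectory over the utterance. Removing its
    mean over time cancels a fixed convolutional (channel) component,
    since convolution becomes addition in the cepstral domain.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```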

  6. Dynamic Speech Features • temporal dynamics useful for ASR • local time derivatives of cepstra • “delta” features estimated over multiple frames (typically 5) • usually augment static features • can be viewed as a temporal filter

  7. “Delta” impulse response • Figure: impulse response of the delta filter over frames -2 to +2, with values stepping from 0.2 and 0.1 through 0 down to -0.1 and -0.2
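Slides 6 and 7 describe delta features estimated over five frames and sketch the filter's impulse response. Below is a minimal sketch of the common regression form; the exact normalization is not given on the slides, so the division by 10 is an assumption, chosen because it yields the 0.2 and 0.1 weight magnitudes shown in the impulse response above.

```python
import numpy as np

def delta_features(cepstra, N=2):
    """Delta ("dynamic") features: a slope estimate of each cepstral
    coefficient over 2*N + 1 frames (N = 2 gives the typical 5-frame
    window). cepstra has shape (n_frames, n_coeffs); edges are handled
    by repeating the first and last frames."""
    T = len(cepstra)
    denom = 2.0 * sum(n * n for n in range(1, N + 1))   # = 10 for N = 2
    padded = np.pad(cepstra, ((N, N), (0, 0)), mode="edge")
    deltas = np.zeros_like(cepstra, dtype=float)
    for n in range(1, N + 1):
        deltas += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return deltas / denom
```

Viewed as a temporal filter on each coefficient trajectory, the five-frame case applies weights of magnitude 0.2, 0.1, 0, 0.1, 0.2 across frames -2 to +2, which is the shape sketched above; as slide 6 notes, the resulting delta stream usually augments the static features rather than replacing them.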

  8. HMMs for Continuous Speech • Using dynamic programming for continuous speech (Vintsyuk, Bridle, Sakoe, Ney, ...) • Application of Baker-Jelinek ideas to continuous speech (IBM, BBN, Philips, ...) • Multiple groups developing major HMM systems (CMU, SRI, Lincoln, BBN, AT&T) • Engineering development - coping with data, fast computers
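The dynamic-programming theme of slide 8 can be illustrated with the Viterbi recursion that HMM decoders share at their core. This is a generic, toy-sized sketch in the log domain with a dense transition matrix, not the decoder of any of the groups named above.

```python
import numpy as np

def viterbi(log_trans, log_obs, log_init):
    """Most likely HMM state sequence via dynamic programming.

    log_trans: (S, S) log transition probabilities, log_trans[i, j] = log P(j | i)
    log_obs:   (T, S) per-frame log observation likelihoods for each state
    log_init:  (S,)   log initial-state probabilities
    """
    T, S = log_obs.shape
    score = np.full((T, S), -np.inf)    # best log score ending in each state
    back = np.zeros((T, S), dtype=int)  # best predecessor for traceback

    score[0] = log_init + log_obs[0]
    for t in range(1, T):
        # For every destination state j, choose the best predecessor i.
        cand = score[t - 1][:, None] + log_trans      # cand[i, j]
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(S)] + log_obs[t]

    # Trace back from the best final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Large-vocabulary continuous-speech systems layered pruning and lexicon and language-model constraints on top of this basic recursion; much of the engineering work the slide mentions went into making that fast enough on the hardware of the day.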

  9. 2nd (D)ARPA Project • Common task • Frequent evaluations • Convergence to good, but similar, systems • Lots of engineering development - now up to 60,000 word recognition, in real time, on a workstation, with less than 10% word error • Competition inspired others not in project - Cambridge did HTK, now widely distributed

  10. Knowledge vs. Ignorance • Using acoustic-phonetic knowledge in explicit rules • Ignorance represented statistically • Ignorance-based approaches (HMMs) “won”, but • Knowledge (e.g., segments) becoming statistical • Statistics incorporating knowledge

  11. Some 1990’s Issues • Independence from the long-term spectrum • Adaptation • Effects of spontaneous speech • Information retrieval/extraction with broadcast material • Query-style systems (e.g., ATIS) • Applying ASR technology to related areas (language ID, speaker verification)

  12. Where Pierce Letter Applies • We still need science • Need language, intelligence • Acoustic robustness still poor • Perceptual research, models • Fundamentals of statistical pattern recognition for sequences • Robustness to accent, stress, rate of speech, ...

  13. Progress in 25 Years • From digits to 60,000 words • From single speakers to many • From isolated words to continuous speech • From no products to many products, some systems actually saving LOTS of money

  14. Real Uses • Telephone: phone company services (collect versus credit card) • Telephone: call centers for query information (e.g., stock quotes, parcel tracking) • Dictation products: continuous recognition, speaker dependent/adaptive

  15. But: • Still <97% accurate on “yes” for telephone • Unexpected rate of speech causes doubling or tripling of error rate • Unexpected accent hurts badly • Accuracy on unrestricted speech at 60% • Don’t know when we know • Few advances in basic understanding

  16. Confusion Matrix for Digit Recognition (rows: spoken digit, columns: recognized digit; 200 tokens per digit)

      Spoken     1    2    3    4    5    6    7    8    9    0   Error rate (%)
        1      191    0    0    5    1    0    1    0    2    0        4.5
        2        0  188    2    0    0    1    3    0    0    6        6.0
        3        0    3  191    0    1    0    2    0    3    0        4.5
        4        8    0    0  187    4    0    1    0    0    0        6.5
        5        0    0    0    0  193    0    0    0    7    0        3.5
        6        0    0    0    0    1  196    0    2    0    1        2.0
        7        2    2    0    2    0    1  190    0    1    2        5.0
        8        0    1    0    0    1    2    2  196    0    0        2.0
        9        5    0    2    0    8    0    3    0  179    3       10.5
        0        1    4    0    0    0    1    1    0    1  192        4.5
      Overall error rate: 4.85%
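The error-rate column in the matrix above is each row's off-diagonal count divided by its row total of 200 tokens. A small check in Python, using the digit-1 row from the table:

```python
def class_error_rate(row, true_index):
    """Per-class error rate from one confusion-matrix row:
    misrecognized tokens / total tokens for that spoken digit."""
    total = sum(row)
    errors = total - row[true_index]
    return 100.0 * errors / total

# Digit "1" row from the table above: 191 of 200 tokens correct.
digit_1_row = [191, 0, 0, 5, 1, 0, 1, 0, 2, 0]
print(class_error_rate(digit_1_row, true_index=0))  # -> 4.5
```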

  17. Large Vocabulary CSR Error Rate • Figure: word error rate (%) versus year, ’88 to ’94, comparing RM (1K words, PP 60, dashed curve) with WSJ0/WSJ1 (5K and 20-60K words, PP 100, solid curve); y-axis ticks at 3, 6, 9, and 12%
