Sphinx Recognizer Progress Q2 2004 Update

Sphinx Recognizer Progress Q2 2004

Speed
• Combined fast GMM computation techniques with various types of pruning
  • 0.48xRT on a 2k-word task (Communicator), 0.6xRT on a 5k-word task (WSJ)
• Phoneme look-ahead research completed
  • 15-20% additional gain even when fast GMM computation techniques and pruning were already applied
  • Details in Chan et al., ICSLP 2004
• Compilation optimization (~8% gain)
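The speed figures are real-time factors (xRT): decoding time divided by audio duration. At 0.48xRT, an hour of speech is decoded in about 29 minutes. The short Python sketch below shows the calculation. The function names are hypothetical. It also assumes that "gain" means a fractional reduction in decoding time, which the slide does not spell out.

```python
def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Return decoding time as a multiple of real time (xRT).

    Values below 1.0 mean the recognizer runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


def relative_speedup(before_xrt: float, after_xrt: float) -> float:
    """Fractional reduction in xRT, e.g. 0.15 for a 15% gain.

    Assumption: "gain" is read as a fractional reduction in decoding time;
    the slide does not define the term.
    """
    return (before_xrt - after_xrt) / before_xrt


# Hypothetical example: 600 s of audio decoded in 288 s
print(real_time_factor(288.0, 600.0))
```

The same ratio applies regardless of task size. That is why the xRT figures for Communicator (2k words) and WSJ (5k words) can be compared directly, even though the absolute decoding times differ.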

Accuracy
• S3.4 is much more accurate than S2
• Communicator task:
  • S2: 17% WER, ~0.3xRT
  • S3.4 (32 mixtures, not tuned for speed): 14% WER, 1.1xRT
  • S3.4 (64 mixtures, not tuned for speed): 12% WER, 1.6xRT
• Continuous HMMs perform much better than semi-continuous HMMs
  • ~30% improvement
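The sketch below puts the Communicator numbers on a common scale by computing the relative word error rate reduction against the S2 baseline. It uses only the figures on the slide. This is an accuracy-only comparison: both S3.4 configurations also run several times slower than S2 (1.1xRT and 1.6xRT versus ~0.3xRT). The function name is illustrative.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Communicator word error rates from the slide, in percent
s2 = 17.0
s34_32mix = 14.0
s34_64mix = 12.0

for name, wer in [("S3.4, 32 mix", s34_32mix), ("S3.4, 64 mix", s34_64mix)]:
    print(f"{name}: {relative_reduction(s2, wer):.1%} relative reduction vs. S2")
# prints:
# S3.4, 32 mix: 17.6% relative reduction vs. S2
# S3.4, 64 mix: 29.4% relative reduction vs. S2
```

The 64-mixture result comes to about 29% relative. That is in the same range as the ~30% improvement quoted for continuous over semi-continuous HMMs, although the slide does not say which configurations that figure compares.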

Interface
• Sphinx 3.4 Release Candidate II has been available since June 10
  • http://cmusphinx.sourceforge.net/downloads/sphinx3-0.4-rc2.tgz
• New features:
  • Fast Gaussian Mixture Model (GMM) computation; the following techniques are supported:
    a. Down-sampling
    b. CI-based GMM selection
    c. VQ-based and SVQ-based Gaussian selection
    d. Arbitrary configuration of sub-vectors in sub-vector quantization
  • Fast match based on phoneme look-ahead
  • Class-based LMs and dynamic LM selection
  • Command-line configuration of many front-end parameters and of the feature type
  • Bug fixes so that "make test" works on a wider range of platforms
  • Bug fixes in live-mode recognition
  • Instructions for compiling with Intel compilers (optional)
  • Better compilation support on Windows
  • More documentation
  • Batch-mode recognizer supported on Windows/Linux/FreeBSD/MacOS/Solaris
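CI-based GMM selection, one of the fast GMM techniques listed above, can be illustrated with a simplified Python sketch. The approach, as we understand it, works in three steps. First, score the small set of context-independent (CI) phone GMMs. Second, fully evaluate a context-dependent (CD) senone only when its parent CI phone scores within a beam of the best CI score. Third, for all other senones, reuse the parent's CI score as an approximation. This is not the Sphinx 3.4 code. The names `DiagGMM` and `ci_gmm_selection`, the dictionary layout, and the beam value in the usage line are all hypothetical, and details such as score scaling are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagGMM:
    """Diagonal-covariance Gaussian mixture (hypothetical container, not a Sphinx type)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Log density of each diagonal Gaussian plus its log mixture weight
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        # Log-sum-exp over components for numerical stability
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


def ci_gmm_selection(
    x: Sequence[float],
    ci_gmms: Dict[str, DiagGMM],
    cd_gmms: Dict[str, DiagGMM],
    cd_parent: Dict[str, str],
    beam: float,
) -> Dict[str, float]:
    """Simplified, hypothetical sketch of CI-based GMM selection for one frame."""
    # 1. Score every CI GMM (cheap: there are only a few dozen CI phones)
    ci_scores = {phone: gmm.log_likelihood(x) for phone, gmm in ci_gmms.items()}
    best = max(ci_scores.values())
    scores = {}
    for senone, gmm in cd_gmms.items():
        parent_score = ci_scores[cd_parent[senone]]
        if parent_score >= best - beam:
            # 2. Parent CI phone is within the beam: compute the full CD GMM score
            scores[senone] = gmm.log_likelihood(x)
        else:
            # 3. Otherwise back off to the parent CI score as an approximation
            scores[senone] = parent_score
    return scores
```

Most of the savings come from step 3. For any given frame, most CD senones belong to CI phones that score poorly, so their full mixtures are never evaluated. The beam trades speed against accuracy: a wider beam evaluates more senones exactly.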

Outlook for Sphinx this year (Sphinx 3.5)
• Sphinx 3.5
  • Full support for the live-mode recognition API (still in testing; not included in the S3.4 distribution)
    • ETA: end of July
  • Full support for speaker adaptation techniques such as MLLR and VTLN
    • ETA: mid-August
• Sphinx trainer (SphinxTrain)
  • Better packaging and improvements to the training algorithm
    • ETA: early July
• SphinxDoc
  • A document on using Sphinx to build speech recognition software
    • ETA: end of October

Meeting Corpus Transcription and Training
• ICSI transcription conversion completed
  • Processing the XML transcripts
  • Handling human transcribers' errors as well
• CHMM training completed
• We chose a very difficult condition to decode:
  • A transcriber meeting
  • Many speakers and a lot of cross-talk
  • Gives us a sense of worst-case performance
• 53% WER
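The meeting result is a word error rate: the minimum number of substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The minimal Python sketch below computes it with the standard edit-distance dynamic program. The function name and example sentences are hypothetical. Published results are normally scored with dedicated tools such as NIST's sclite, which also handle text normalization and alignment reports.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    n, m = len(reference), len(hypothesis)
    # d[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[n][m] / n


# Hypothetical example: one deletion ("we") and one substitution ("the" -> "a")
print(word_error_rate("so we start the meeting".split(), "so start a meeting".split()))
```

Insertions count against the recognizer, so WER can exceed 100%. In meetings with heavy cross-talk, words from overlapping speakers that get decoded as insertions are one reason error rates are so high.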
