Progress Report of Sphinx in Summer 2004 (July 1st to Aug 31st)

By Arthur Chan

New features in Sphinx 3.5
• Live-mode APIs
• Speaker adaptation using linear transformation
• Incorporation of Sphinx 3.0 tools into Sphinx 3.x
• SphinxTrain
  • Better support and documentation
  • (In progress) More support for training scripts
• Documentation of Sphinx 3.x and SphinxTrain

Live-mode APIs
• The live-mode API is now stable and officially released.
• It is the developer’s API for using the Sphinx 3.x recognizer.
  • The recognizer was originally a high-performance 10xRT system used in CMU’s evaluations.
  • It uses fully continuous HMMs (30% relative performance gain over SCHMM).
  • It now runs at close to 1xRT (measured on a >1 GHz CPU) on tasks of fewer than 10k words.
  • It is capable of speaker adaptation.
• Well documented and commented.
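The "xRT" figures above are real-time factors: decoding time divided by the duration of the audio decoded. A minimal sketch of that arithmetic follows; the function name `real_time_factor` is hypothetical and is not part of the Sphinx API.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return the real-time factor (xRT): decoding time / audio duration.

    A value of 1.0 means the recognizer keeps pace with the incoming audio;
    10.0 means it needs ten seconds to decode one second of speech.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the report.
    print(real_time_factor(50.0, 5.0))  # a 10xRT system
    print(real_time_factor(5.0, 5.0))   # a 1xRT system
```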

Speaker Adaptation
• Acoustic-level learning is now enabled.
• Incorporated from the speaker adaptation routines of CMU’s Robust group.
• Allows transformation-based speaker adaptation
  • Y = AX + b
• In SphinxTrain
  • mllr_solve: estimates the regression matrix or matrices.
  • mllr_transform: applies the mean transformation offline, given a set of regression matrices.
• In Sphinx 3.5
  • Allows mean transformation on-line.
  • Possible to support per-utterance speaker adaptation.
  • Interface not yet exposed (part of Q4 plan)
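The mean transformation Y = AX + b can be illustrated with a small sketch. It assumes a single regression class, so one matrix A and one bias b are shared by all Gaussian means. The names `transform_mean` and `adapt_means` are hypothetical and do not reflect the actual mllr_transform or Sphinx 3.5 code; estimating A and b (the job of mllr_solve) is not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def transform_mean(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Return y = A x + b for one Gaussian mean vector x."""
    if len(A) != len(b) or any(len(row) != len(x) for row in A):
        raise ValueError("dimension mismatch between A, b and x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def adapt_means(A: Matrix, b: Vector, means: List[Vector]) -> List[Vector]:
    """Apply the same affine transform to every mean (one regression class)."""
    return [transform_mean(A, b, m) for m in means]


if __name__ == "__main__":
    # Toy 2-dimensional example; real acoustic features have more dimensions.
    A = [[1.0, 0.5],
         [0.0, 2.0]]
    b = [0.5, -1.0]
    means = [[1.0, 2.0], [0.0, 0.0]]
    print(adapt_means(A, b, means))
```

Only the means are moved here; the variances and mixture weights of the acoustic model are left untouched, which matches the "mean transformation" wording above.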

Incorporation of s3.0 tools
• s3.0
  • Recognizer for research
  • Includes research tools built around the speech recognizer
    • align: word/phoneme-based aligner
    • astar: N-best hypotheses generator
    • allphone: phoneme recognizer
    • dag: best-path search in a lattice
• N-best rescoring is now viable
• Will benefit research on incorporating high-level information
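N-best rescoring can be sketched as follows: take the list of hypotheses produced by an N-best generator, add a weighted score from an additional knowledge source to each recognizer score, and re-rank. This is a generic illustration, not the astar tool's output format or interface. The names `rescore_nbest` and `length_penalty`, the log-score convention (higher is better), and the example scores are all assumptions.

```python
from typing import Callable, List, Tuple

# (word sequence, recognizer log-score); higher is better.
Hypothesis = Tuple[List[str], float]


def rescore_nbest(nbest: List[Hypothesis],
                  extra_score: Callable[[List[str]], float],
                  weight: float = 1.0) -> List[Hypothesis]:
    """Re-rank hypotheses by recognizer score + weight * extra_score(words)."""
    rescored = [(words, score + weight * extra_score(words))
                for words, score in nbest]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def length_penalty(words: List[str]) -> float:
    """Hypothetical knowledge source: penalize each word by 0.5."""
    return -0.5 * len(words)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    nbest = [
        (["recognize", "speech"], -10.0),
        (["wreck", "a", "nice", "beach"], -9.5),
    ]
    best_words, best_score = rescore_nbest(nbest, length_penalty)[0]
    print(" ".join(best_words), best_score)
```

In practice the extra score would come from a higher-level information source, such as a stronger language model, rather than a simple length penalty.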

SphinxTrain
• Now with better support and documentation
• Every tool now supports the options
  • -help: a help string
  • -example: a string that shows how to use the tool
• Eliminates possible mismatches between the feature extraction routines of Sphinx 3 and SphinxTrain.

Documentation of Sphinx: Project Hieroglyph
• Project Hieroglyph
  • Aims to build a comprehensive set of documentation for using Sphinx, SphinxTrain, and the CMU LM Toolkit.
  • 3 out of 11 chapters are now complete.
  • They can be found at
    • www.cs.cmu.edu/~archan/sphinxDoc.html

Q4 Outlook
• Three major goals
  • Better speaker adaptation support
    • MAP, multiple regression class support
  • Enable dynamic addition and deletion of language models
  • Further speed-up of the recognizer (we can still be faster)
• Other goals
  • Incorporating speaker normalization into feature extraction
