
Progress Report of Sphinx in Summer 2004 (July 1st to Aug 31st)



Presentation Transcript


  1. Progress Report of Sphinx in Summer 2004 (July 1st to Aug 31st), by Arthur Chan

  2. New features in Sphinx 3.5
     • Live-mode APIs
     • Speaker adaptation using linear transformation
     • Incorporation of Sphinx 3.0 tools into Sphinx 3.x
     • SphinxTrain
       • Better support and documentation
       • (In progress) more support for training scripts
     • Documentation of Sphinx 3.x and SphinxTrain

  3. Live-mode APIs
     • The live-mode API is now stable and officially released.
     • Developer's API for using the Sphinx 3.x recognizer
       • Originally built for high-performance 10xRT speech recognition, as used in CMU's evaluation
       • Uses fully continuous HMMs (30% relative performance gain over SCHMM)
       • Now runs at close to 1xRT on tasks of less than 10k (measured on a >1 GHz CPU)
       • Supports speaker adaptation
     • Well documented and commented.
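The 10xRT and 1xRT figures on this slide are real-time factors: processing time divided by the duration of the audio. A minimal sketch of that arithmetic follows; the function name and the timings are hypothetical illustrations, not measurements from the report.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (the "xRT" figure)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timings for a 5-second utterance (not taken from the report).
slow = real_time_factor(50.0, 5.0)  # decoder needs 50 s: a 10xRT system
fast = real_time_factor(5.0, 5.0)   # decoder needs 5 s: a 1xRT system
print(slow, fast)
```

A factor at or below 1 means the recognizer keeps up with incoming speech, which is what a live-mode API needs.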

  4. Speaker Adaptation
     • Acoustic-level learning is now enabled.
     • Incorporated from the speaker adaptation routines of CMU's Robust group.
     • Allows transformation-based speaker adaptation: Y = AX + b
     • In SphinxTrain:
       • mllr_solve: estimates the regression matrix or matrices.
       • mllr_transform: applies the mean transformation offline, given a set of regression matrices.
     • In Sphinx 3.5:
       • Allows on-line mean transformation.
       • Makes per-utterance speaker adaptation possible.
       • Interface not yet exposed (part of the Q4 plan).
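The transformation Y = AX + b on this slide can be illustrated by applying one regression matrix A and bias b to every Gaussian mean vector X. The sketch below is plain Python with made-up numbers; the function names `transform_mean` and `adapt_means` are hypothetical and are not the SphinxTrain or Sphinx 3.5 interfaces, and it assumes a single regression class shared by all means.

```python
def transform_mean(A, b, x):
    """Return y = A x + b for one mean vector x (A is a list of rows)."""
    if len(A) != len(b) or any(len(row) != len(x) for row in A):
        raise ValueError("dimension mismatch between A, b and x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def adapt_means(A, b, means):
    """Apply the same transform to every mean (single regression class assumed)."""
    return [transform_mean(A, b, m) for m in means]


# Toy 2-dimensional example; the values are illustrative only.
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [1.0, -1.0]
means = [[2.0, 4.0], [0.0, 1.0]]

adapted = adapt_means(A, b, means)
print(adapted)
```

Only the means move; variances and mixture weights are left untouched, which matches the slide's description of a mean transformation.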

  5. Incorporation of s3.0 tools
     • s3.0
       • Recognizer for research
       • Includes research tools for the speech recognizer
         • align: word/phoneme-based aligner
         • astar: N-best hypotheses generator
         • allphone: phoneme recognizer
         • dag: best-path search in a lattice
     • N-best rescoring is now viable
       • Will benefit research on incorporating high-level information
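As a rough illustration of what N-best rescoring means here: take a list of hypotheses with their recognizer scores, add a score from some higher-level knowledge source, and re-rank. The sketch below assumes log-domain scores where higher is better; the function names, the toy knowledge source, and the scores are all hypothetical and do not reflect the output format of astar or any Sphinx tool.

```python
def rescore_nbest(nbest, extra_score, weight=1.0):
    """Re-rank (hypothesis, base_score) pairs after adding a weighted extra score.

    Scores are assumed to be log-domain, higher is better.
    """
    rescored = [(hyp, base + weight * extra_score(hyp)) for hyp, base in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_knowledge_score(hyp):
    """Hypothetical stand-in for a high-level knowledge source."""
    return 0.0 if hyp.startswith("recognize") else -5.0


# Made-up 2-best list; the first entry has the better recognizer score.
nbest = [("wreck a nice beach", -100.0), ("recognize speech", -102.0)]

reranked = rescore_nbest(nbest, toy_knowledge_score)
print(reranked[0][0])
```

In this toy case the second hypothesis overtakes the first once the extra score is added, which is the effect rescoring is meant to achieve.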

  6. SphinxTrain
     • Now with better support and documentation
     • Every tool now supports the options
       • -help: a help string
       • -example: a string that shows how to use the tool
     • Eliminates possible mismatches between the feature extraction routines of Sphinx3 and SphinxTrain.

  7. Documentation of Sphinx: Project Hieroglyph
     • Project Hieroglyph:
       • Aims to build a comprehensive set of documentation for using Sphinx, SphinxTrain, and the CMU LM Toolkit.
       • 3 of 11 chapters are now complete.
       • They can be found at www.cs.cmu.edu/~archan/sphinxDoc.html

  8. Q4 Outlook
     • Three major goals
       • Better speaker adaptation support
         • MAP, multiple regression class support
       • Enable dynamic addition and deletion of language models
       • Further speed-up of the recognizer (we can still be faster)
     • Other goals
       • Incorporating speaker normalization into feature extraction
