Brief Overview of Different Versions of Sphinx

Presentation Transcript


  1. Brief Overview of Different Versions of Sphinx Arthur Chan

  2. Introduction • The software aspect of the recognizer is very important • Research always requires correct use of the software • Sphinx II + III + IV + SphinxTrain • ~= 100k lines of code • Each of them is fairly complex

  3. This presentation (30 pages) • Introduction (3 pages) • History of Sphinx (13 pages) • Sphinx I (2 pages) • Sphinx II (2 pages) • Sphinx III (3 pages) • SphinxTrain (3 pages) • Sphinx IV (3 pages) • How do I get the source code? (4 pages) • Versioning • Three rules for not getting lost among the different recognizers • Where can I get “official” information? (2 pages) • Outlook for each recognizer (3 pages) • Conclusion

  4. Brief history of Sphinx • Largely adapted from • Rita’s “The Sphinx Speech Recognition Systems” • www.cs.cmu.edu/~rsingh/ • Kevin et al’s “Speech Recognition: Past, Present and Future” • www.cs.cmu.edu/~msiegler/ASR/futureofcmu-final.html

  5. Before Sphinx • Dragon • One of the first uses of HMMs in speech recognition • One of the first uses of a “purely statistical model” in speech • Expresses the knowledge using an HMM network • Harpy • One of the first uses of beam search • Uses phonemes to represent words

  6. Sphinx I • Before Sphinx… • According to AT&T’s literature, the concept of speaker independence was proposed in 1979 • From 1979 to 1987, most systems were either • Speaker-dependent • Speaker-independent but in a very small domain (<100 words) • Sphinx I was therefore outstanding • Accuracy is 90% on Resource Management

  7. Sphinx I (1987) • By Kai-Fu Lee and Roberto Bisiani • Key developers included Hsiao-wuen Hon and Fil Alleva • Written in C • Continuous speech recognizer using discrete HMMs with 3 codebooks of size 256 • Uses a simple word-pair grammar • Generalized triphones • Real-time on Sun3 or Dec 3000 • Where is the source code? Good antique!
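
A discrete HMM does not score raw feature vectors. Each frame is first mapped to the index of its nearest codeword, one index per codebook. The sketch below illustrates that quantization step only. The function names, the toy codebooks (4 entries each instead of 256), the feature dimensions, and the squared Euclidean distance are all assumptions for illustration, not details taken from Sphinx I.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_frame(streams, codebooks):
    """Map one frame (one vector per feature stream) to a tuple of discrete symbols."""
    return tuple(nearest_codeword(v, cb) for v, cb in zip(streams, codebooks))


# Hypothetical toy data: 3 codebooks with 4 codewords each.
toy_codebooks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(0.0,), (0.5,), (1.0,), (1.5,)],
    [(-1.0,), (0.0,), (1.0,), (2.0,)],
]
frame = [(0.9, 0.2), (0.6,), (1.8,)]
symbols = quantize_frame(frame, toy_codebooks)  # one symbol per codebook
```

With three codebooks, each frame becomes three small integers, and the HMM output probabilities are simple table lookups on those integers.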

  8. Sphinx II (1992) • By Xuedong Huang • Hardwired to 5-state Bakis topology • 3-gram language models • Decision-tree tying of HMMs (by Mei-Yuh Hwang) • 90% in WSJ task (0 or 1?)
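
A Bakis topology is a left-to-right HMM in which each state may loop on itself, move to the next state, or skip ahead. The sketch below builds such a transition matrix for 5 emitting states plus an exit column. The function name, the one-state skip limit, and the uniform probabilities are assumptions for illustration. The slide does not give the exact arcs or trained values used in Sphinx II.

```python
def bakis_transitions(num_states=5, max_skip=1):
    """Build a left-to-right transition matrix with self-loops, next-state moves, and skips.

    Row i holds P(next state | state i). Column `num_states` is a non-emitting exit.
    Probabilities are uniform over the allowed arcs (an assumed initialization,
    not trained values).
    """
    size = num_states + 1  # extra column for the exit state
    matrix = [[0.0] * size for _ in range(num_states)]
    for i in range(num_states):
        # Allowed targets: stay (i), advance (i + 1), or skip up to `max_skip` states.
        targets = [j for j in range(i, i + 2 + max_skip) if j < size]
        for j in targets:
            matrix[i][j] = 1.0 / len(targets)
    return matrix


transitions = bakis_transitions()
for row in transitions:
    print(" ".join(f"{p:.2f}" for p in row))
```

Every entry below the diagonal is zero, which is what makes the model strictly left-to-right. Hardwiring the topology means the decoder can assume this arc pattern instead of reading it from the model.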

  9. Fast Beam Search v. X • FBS-6: flat-lexicon decoder • FBS-7: lexicon tree-based • FBS-8 decoder (written by Ravi Mosur, see thesis, 1996) • Supports multiple types of beam pruning • Lexical tree • Tricks in GMM computation • Machine optimization: loop unrolling • Predictive codebook computation • Phoneme lookahead • Best-path search
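
The idea behind beam pruning can be sketched in a few lines: at each frame, drop every hypothesis whose log score falls more than a fixed margin below the best one. This shows a single relative beam only. The decoders above apply several kinds of beams, and the function name, state labels, and scores here are hypothetical.

```python
def beam_prune(hypotheses, beam_width):
    """Keep hypotheses whose log score is within `beam_width` of the best score."""
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    threshold = best - beam_width
    return {state: score for state, score in hypotheses.items() if score >= threshold}


# Hypothetical active states with log scores at one frame.
active = {"s1": -10.0, "s2": -12.5, "s3": -40.0, "s4": -11.0}
survivors = beam_prune(active, beam_width=5.0)
```

A wider beam keeps more hypotheses alive, which trades speed for a lower risk of pruning the correct path.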

  10. Other facts about Sphinx II • We licensed it at the beginning (this seems to go back to around ’95) • In 2000, it was open-sourced on SourceForge under a Berkeley-style license • You can incorporate Sphinx’s source code • You don’t need to open your own source code (no recursive legal binding) • Similar to LGPL • In 2001, a major alpha release by Kevin ensured portability across several platforms

  11. Sphinx III flat lexicon decoder (“s3”, “s3flat”, “s3slow”) • Sphinx III (by Ravi Mosur) • Flat lexicon • Supports both CHMM and SCHMM • “Poor-man’s” trigram • Uses only the most likely first word; this avoids the D^2 expansion of the word lattice • Arbitrary topology • Very accurate, used in evaluations on BN and others • Derivatives of the search include • N-best generator • Aligner • Phone recognizer

  12. Sphinx III tree lexicon decoder (“s3.x”, “s3fast”, “s3inaccurate”) • What is s3.x actually? • A “spin-off” of the Sphinx III flat lexicon decoder’s source code • First used in the BN 10x RT evaluation in 1999 • From s3.0 -> s3.2 • Uses a tree lexicon with unigram lookahead • Lexical tree with approximations to avoid memory problems • One of the first in the world to use sub-vector quantization to speed up GMM computation

  13. (cont.) • From s3.2 -> s3.3 (Rita, Ricky) • Live-mode recognizer (livedecode) and simulator (livepretend) • From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb) • 4 levels of speed-up of GMM computation, phoneme lookahead • Bug fixes in live mode • From s3.4 -> s3.5 (Evandro, Arthur C, Yitao) • (Tentative) Speaker adaptation + documentation

  14. Facts about S3 • A Java version exists -> sphinx3j • Open-sourced around 2002 • Maintained continuously by Evandro from 2001 to now • s3.5 is the current active branch in S3 development

  15. SphinxTrain • Equally important and very complex • But not well understood • What is SphinxTrain? • A collection of ~40 tools for Sphinx 2, 3 and 4 acoustic model training • A set of Perl scripts to do training • Sphinx 2 and 3 have slightly different model formats

  16. Mini-history • A Baum-Welch trainer and a Viterbi trainer existed a very long time ago • Training tools in general were neither systematic nor structured • From the chaos, Eric Thayer first pulled everything together to create the package SphinxTrain • Rita did numerous bug fixes and modifications of the current trainer • Innovated the use of automatic question generation (make_quest) • Built a set of training scripts for RM (the 0*/ scripts) • Wrote the first set of systematic tutorials on training • Ricky refined the code and wrote the first set of Perl scripts for training • He made a PHD out of it too (PHD = Push Here Dummy!) • Alan and Kevin • Put the code on SourceForge • Alan built a set of training scripts that can “run through”

  17. Sphinx IV • Why Sphinx IV? • Too many limitations in SphinxTrain and Sphinx III • Only N-grams • Approximation of triphones • Fast GMM computation can be very troublesome to understand • Bw doesn’t skip silence; we rely heavily on forced alignment in training

  18. Sphinx IV (cont.) • (By no means complete…) • Lead designer: Bhiksha (MERL) • Lead team developer: Willie Walker (Sun) • Key developers: Evandro, Rita, Phillip Kwok and Paul Lamere • Many heavyweight speech advisors: Evandro, Rita, Ravi, Bhiksha, Pedro Moreno…

  19. Is Sphinx IV good? • Very accurate, very fast, very versatile and very nicely packaged Java-based speech recognizer • Some internal benchmarks on RM and WSJ 5k show it to be faster and more accurate than s3.3 (under 1xRT and 10% better) • Supports N-gram, FSM and FSG • Will provide facilities like confidence scoring • Still under development (only the first alpha release so far) • Trainer is not stable

  20. Summary of the recognizers and trainers • Sphinx I -> obsolete • Sphinx II -> we are using the fast recognizer now • Sphinx III: the following coexist • S3 flat • S3 fast (s3.4 stable, s3.5 devel) • SphinxTrain (0.92 in the CVS) • Sphinx IV • Recognizer is alpha-released • Trainer not yet stable

  21. How can I get version X of Sphinx? • Official web page of Sphinx • http://cmusphinx.sourceforge.net • Gives announcements and development news • Some documentation is there • For the tarballs • http://sourceforge.net/projects/cmusphinx • Releases: • sphinx2-0.4.tgz (s2) • sphinx3-0.1.tgz (s3.3) • sphinx3-0.4-rc2.tgz (s3.4 release candidate II) • sphinx4-0.1alpha-src.zip (s4)

  22. Rule 2: If it doesn’t exist in CVS, officially it doesn’t exist • Simply put, no one actually supports or maintains it. Software that falls into this category: • CMU LM Toolkit (we haven’t touched it for a while) • We may do it in the future • Phoenix (distributed somewhere else) • Training scripts in csh • Rita always actively supports them

  23. Rule 1: If there are no tarballs, they are in CVS • ANYONE can get the following modules through CVS by using the following command: • cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co modulename • modulename = • SphinxTrain -> SphinxTrain • archive_s3 -> s3 + s3.0 + s3.2 + s3.3 • sphinx2 -> devel ver. of sphinx2 • sphinx3 =~ s3.4 -> we will base the development of s3.5 on this • share =~ cepview + lm3g2dmp • sphinx3j = the Java version of sphinx3 • Sphinx4 = development version of sphinx4
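
For scripting several checkouts, the command on this slide can be assembled per module. The sketch below only builds the command line and does not run it. It assumes the repository root uses the usual colon-separated pserver form, and the helper name `checkout_command` is hypothetical. Whether the server still accepts anonymous connections is not something the slides address.

```python
import shlex

# Repository root from the slide, written in the standard pserver form (assumed).
CVS_ROOT = ":pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx"

# Module names listed on the slide.
MODULES = ["SphinxTrain", "archive_s3", "sphinx2", "sphinx3", "share", "sphinx3j", "Sphinx4"]


def checkout_command(module):
    """Return the argument list for an anonymous CVS checkout of `module`."""
    return ["cvs", "-z3", "-d" + CVS_ROOT, "co", module]


for module in MODULES:
    print(shlex.join(checkout_command(module)))
```

Returning an argument list rather than a single string means the result could be passed to `subprocess.run` without shell quoting concerns.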

  24. Rule 3: You may need other modules to complete your task • SphinxTrain relies heavily on forced alignment, so you also need s3-align • Using any s3 recognizer requires the LM in DMP format, so you need the tool lm3g2dmp, which can be found in sphinx2 or share

  25. Where can I get more information for the recognizer? • People to ask • s2: Evandro, Ravi • S3 flat: Evandro, Ravi, ArthurC • S3 tree: Evandro, Ravi, ArthurC • SphinxTrain: Rita, Evandro, Ravi, ArthurC, Rong, Ziad, Murali • S4: S4’s developers on SourceForge • Willie, Paul, Phillip, Bhiksha, Rita, Evandro

  26. Web pages to look up • Rita’s web page • www.cs.cmu.edu/~rsingh • Contains the training manual • Twiki web page for Sphinx 4 design • www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/WebHome/ • ArthurC’s web page • Risked his life to write a manual for Sphinx 3.4 • Also collects some information on each Sphinx

  27. Outlook for all recognizers • Sphinx II • Sorry, we won’t support it too much • Reason: s3.4 and s4 have proven to have very good speed and accuracy • Sphinx III • Only active branch is s3.5 • Moderate changes in s3flat • Motivated by project CALO • This quarter: make adaptation work • SphinxTrain • Write a set of scripts for continuous HMM training • Silence deletion problem will be fixed

  28. (cont.) • sphinxDoc • Chapters 1 and 2 completed (*sigh*, still 7 left) • Only being written when Arthur C is procrastinating and doesn’t want to read or play video games • Will be there around Sep or Oct • Sphinx IV • Alpha release • Trainer will be fixed • Argus • Incorporates the advantages of many speech recognizers • Not yet started

  29. Conclusion • This presentation • Summarizes the current code status of Sphinx and SphinxTrain • We still have a lot of work to do… • Next presentation • s3 or s3.4, from main to the search
