Sphinx 3.4 Development Progress Report in February. Arthur Chan, Jahanzeb Sherwani Carnegie Mellon University Mar 1, 2004. This Presentation. S3.4 Development Progress Speed-up Language Model facilities CALO and S3.5 Development Which features should be there to make CALO better?
Sphinx 3.4 Development Progress Report in February
Arthur Chan, Jahanzeb Sherwani
Carnegie Mellon University
Mar 1, 2004
This Presentation
• S3.4 Development Progress
  • Speed-up
  • Language Model facilities
• CALO and S3.5 Development
  • Which features should be there to make CALO better?
  • Schedule for next three months
Review of Last Month's Progress
• Last month
  • Wrote a speed-up version of s3.
  • Completed some coding of the s3.4 speed-up task.
• This month
  • Backbone of the s3.4 speed-up functionality completed and tested.
  • Basic LM facilities completed and smoke-tested.
Speed-up Facilities in s3.3
• GMM Computation
  • Frame-Level: Not implemented
  • Senone-Level: Not implemented
  • Gaussian-Level: SVQ-based GMM Selection (sub-vectors constrained to 3)
  • Component-Level: SVQ code removed
• Search
  • Lexicon Structure: Tree
  • Pruning: Standard
  • Heuristic Search Speed-up: Not implemented
Speed-up Facilities in s3.4
• GMM Computation
  • Frame-Level: (New) Naïve Down-Sampling; (New) Conditional Down-Sampling
  • Senone-Level: (New) CI-based GMM Selection
  • Gaussian-Level: (New) VQ-based GMM Selection; (New) Unconstrained no. of sub-vectors in SVQ-based GMM Selection
  • Component-Level: (New) SVQ code enabled
• Search
  • Lexicon Structure: Tree
  • Pruning: (New) Improved Word-end Pruning
  • Heuristic Search Speed-up: (New) Phoneme Look-ahead
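The two frame-level entries can be illustrated with a small Python sketch. It is not the s3.4 code: the function names, the `score_fn` callback, and the distance test used as the skip condition are all assumptions made for illustration. Naive down-sampling recomputes GMM scores only on every `ratio`-th frame; the conditional variant recomputes only when the current frame has moved far enough from the last frame that was actually scored.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]
Scores = List[float]


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def naive_downsample(frames: Sequence[Frame],
                     score_fn: Callable[[Frame], Scores],
                     ratio: int = 2) -> List[Scores]:
    """Score every `ratio`-th frame and reuse those scores in between."""
    out: List[Scores] = []
    last: Optional[Scores] = None
    for t, frame in enumerate(frames):
        if last is None or t % ratio == 0:
            last = score_fn(frame)  # the expensive GMM computation
        out.append(last)
    return out


def conditional_downsample(frames: Sequence[Frame],
                           score_fn: Callable[[Frame], Scores],
                           threshold: float) -> List[Scores]:
    """Rescore only when the frame differs enough from the last scored frame.

    The squared-distance test is a hypothetical condition chosen for
    this sketch; a real decoder may use a different criterion.
    """
    out: List[Scores] = []
    last_frame: Optional[Frame] = None
    last_scores: Optional[Scores] = None
    for frame in frames:
        if last_frame is None or sq_dist(frame, last_frame) > threshold:
            last_frame = frame
            last_scores = score_fn(frame)
        out.append(last_scores)
    return out
```

Both functions return one score list per input frame, so the search code downstream would not need to know that some frames were skipped.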
Issues in Speed Optimization
• Implementation issues:
  • Beams applied to the GMMs make many techniques hard to implement.
  • Some facilities were hardwired for a specific purpose.
• Performance issues:
  • Each technique on its own reduced computation by 40-50% with <5% degradation.
  • However, the gains did not add up.
  • Computation can only be reduced so far (usually a 75%-80% reduction is the maximum).
  • Overhead is huge in some techniques, e.g. VQ-based Gaussian Selection takes 0.25xRT.
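A back-of-the-envelope model shows why individually good techniques stack poorly. This is a sketch with made-up numbers, not measured s3.4 data: `floor` (the share of computation nothing can remove) and `overheads` (fixed cost each technique adds, as a fraction of the baseline) are assumed parameters, and the techniques are treated as independent, which is itself optimistic.

```python
from typing import Sequence


def combined_cost(reductions: Sequence[float],
                  overheads: Sequence[float] = (),
                  floor: float = 0.0) -> float:
    """Fraction of baseline computation left after stacking techniques.

    reductions: saving each technique achieves alone (0.5 means 50%)
    overheads:  fixed cost each technique adds, as a fraction of baseline
    floor:      fraction of baseline that no technique can remove
    """
    reducible = 1.0 - floor
    for r in reductions:
        reducible *= (1.0 - r)  # each technique only trims what is left
    return floor + reducible + sum(overheads)


# Two independent 50% techniques leave 25% of the work, not 0%.
print(combined_cost([0.5, 0.5]))
# A third technique helps less once a 20% floor is assumed.
print(combined_cost([0.5, 0.5, 0.5], floor=0.2))
# A fixed overhead eats into the remaining gain.
print(combined_cost([0.5, 0.5, 0.5], overheads=[0.125], floor=0.2))
```

Under these assumed numbers the third technique and its overhead leave more computation than two techniques alone, which matches the pattern of gains that fail to add up.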
Language Model Facilities
• S3.3 only accepts a single LM, without classes, in binary format.
• So far, S3.4 is able to accept multiple class-based LMs in binary format.
  • One major code modification.
  • Affects 6-7 files.
• Caveats:
  • Not a perfect implementation.
  • Text format is not yet supported. Backward compatibility is an issue.
  • Lack of test cases. Only lightly smoke-tested.
  • ~1 more week of work.
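For readers unfamiliar with class-based LMs, the usual factorization is P(w | h) = P(class(w) | h) x P(w | class(w)). The Python sketch below shows that lookup for a bigram model, plus a dictionary holding several named LMs. The class name `ClassBigramLM`, the data layout, and the LM names are hypothetical and do not reflect the s3.4 data structures or the DMP file format.

```python
import math
from typing import Dict, Tuple


class ClassBigramLM:
    """Toy class-based bigram LM working in log10 probabilities."""

    def __init__(self,
                 bigram_logprob: Dict[Tuple[str, str], float],
                 class_members: Dict[str, Dict[str, float]],
                 unseen_logprob: float = -99.0) -> None:
        self.bigram_logprob = bigram_logprob
        self.unseen_logprob = unseen_logprob
        # Map each class word to (class token, log10 P(word | class)).
        self.word_to_class: Dict[str, Tuple[str, float]] = {}
        for cls, members in class_members.items():
            for word, prob in members.items():
                self.word_to_class[word] = (cls, math.log10(prob))

    def _token(self, word: str) -> Tuple[str, float]:
        # Words outside every class stand for themselves at no extra cost.
        return self.word_to_class.get(word, (word, 0.0))

    def logprob(self, prev: str, word: str) -> float:
        prev_tok, _ = self._token(prev)
        tok, in_class_lp = self._token(word)
        lm_lp = self.bigram_logprob.get((prev_tok, tok), self.unseen_logprob)
        return lm_lp + in_class_lp


# Several LMs can be kept side by side and selected by name.
lms = {
    "travel": ClassBigramLM(
        {("to", "[CITY]"): math.log10(0.5)},
        {"[CITY]": {"pittsburgh": 0.25, "boston": 0.75}},
    ),
}
score = lms["travel"].logprob("to", "boston")  # log10(0.5 * 0.75)
```

The n-gram table only ever sees the class token, so adding a new city means editing the class membership list rather than retraining the LM.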
Problems with s3.4 (as of Feb 29th, 2004)
• Only accepts DMP files.
  • The text-format reader in Sphinx 2 is very complex.
  • A straight conversion is not clean.
• LMs are all loaded into memory.
  • We can work on this.
• Lexical trees are all built at the beginning.
  • We tried to avoid the overhead of rebuilding the trees for every utterance.
Summary of Sphinx 3.4 Development
• Derivative of s3.3
  • With speed optimization
  • Better LM facilities
• Algorithmic optimization is 90% completed.
  • Still need to reduce overhead. Tree-based GMM selection is desirable.
  • Individual techniques can still be improved.
• Got through the major hurdle of multiple LMs and class-based LMs.
  • Need more time to make it more stable.
• Expected internal release date: March 8, 2004
Sphinx 3.4 and CALO
• Which pieces are missing?
  • Sphinx 3.4’s decoding is still not streamlined => continuous listening is not yet enabled.
  • Sphinx’s speed may still not be ideal.
  • From s3 to s3.3, ~10% degradation.
  • Sphinx 3.4 doesn’t learn from data yet.
Sphinx 3.5: What should we do in the next 3 months?
• Expected release time: May – June
• Interfaces:
  • Streamlined front-end and decoding
  • (?) PortAudio-based audio routine
• Speed/Accuracy
  • Improved lexical tree search
  • Machine optimization of Gaussian computation
  • Combination of multiple recognizers
• Learning
  • Acoustic model adaptation
  • (?) Language model adaptation
  • (In Phoenix) Better semantic parsing
• Resource Acquisition and Load Balancing
Highlight I: Speed/Accuracy
• Improved lexical tree search
  • The current implementation uses a single lexical tree.
  • It may be desirable to create tree copies.
• Machine optimization of Gaussian computation
  • SIMD (Single Instruction, Multiple Data)
  • Requires help from assembly-language experts. (Jason/Thomas)
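To show what "machine optimization of Gaussian computation" would target, here is a Python sketch of the diagonal-covariance Gaussian log-likelihood. Python cannot express SIMD itself, and the function names are hypothetical. The point is the shape of the inner loop: after the per-Gaussian constants are precomputed, each dimension costs one subtract, one multiply, and one multiply-accumulate, and this is the pattern that SIMD instructions process several dimensions at a time.

```python
import math
from typing import List, Sequence, Tuple


def diag_gaussian_logpdf(x: Sequence[float],
                         mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Reference version: log N(x; mean, diag(var)) computed directly."""
    acc = 0.0
    for xd, md, vd in zip(x, mean, var):
        diff = xd - md
        acc += math.log(2.0 * math.pi * vd) + diff * diff / vd
    return -0.5 * acc


def precompute(var: Sequence[float]) -> Tuple[float, List[float]]:
    """Hoist everything that does not depend on x out of the frame loop."""
    const = -0.5 * sum(math.log(2.0 * math.pi * vd) for vd in var)
    half_inv_var = [0.5 / vd for vd in var]
    return const, half_inv_var


def fast_logpdf(x: Sequence[float],
                mean: Sequence[float],
                const: float,
                half_inv_var: Sequence[float]) -> float:
    """Inner loop reduced to subtract, multiply, multiply-accumulate."""
    acc = 0.0
    for xd, md, hd in zip(x, mean, half_inv_var):
        diff = xd - md
        acc += diff * diff * hd
    return const - acc
```

The precomputation runs once per Gaussian at model load time, so only `fast_logpdf` sits on the per-frame path.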
Highlight II: Multiple Recognizer Combination and Resource Acquisition
• Research by Rong suggests that combining multiple recognizers can improve accuracy.
  • Speed worsens by 100% if we run two recognizers.
• An interesting solution:
  • Computation can be shared with other machines in the meeting.
  • Inspired by routing implementations.
  • A very natural solution in a meeting scenario, because usually only one person is speaking.
• Challenges: bandwidth and load balancing
Highlight III: Learning
• Acoustic model
  • Maximum Likelihood Linear Regression (MLLR)
  • Jahanzeb will be responsible for this.
• (?) Language model
  • How?
  • Cache-based LM?
• (?) Improved robust parsing
  • Better parsing based on previous command history
  • Phoenix’s source code is not easy to trace.
  • Thomas Harris’s implementation may be a good place to start.
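As a reminder of what MLLR does at decode time: each Gaussian mean in a regression class is replaced by an affine transform of itself, mu' = A mu + b, where A and b are estimated to maximize the likelihood of the adaptation data. The Python sketch below covers only the application step. The function name is hypothetical, and estimating A and b (the hard part) is not shown.

```python
from typing import List, Sequence


def mllr_adapt_mean(mean: Sequence[float],
                    A: Sequence[Sequence[float]],
                    b: Sequence[float]) -> List[float]:
    """Return the adapted mean A @ mean + b for one Gaussian."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Illustrative values only: scale the first dimension and shift both.
A = [[2.0, 0.0],
     [0.0, 1.0]]
b = [1.0, -1.0]
adapted = mllr_adapt_mean([1.0, 2.0], A, b)
```

Because one (A, b) pair is shared by many Gaussians, a small amount of speaker data can move every mean in the class, including those of phones the speaker has not yet uttered.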