Sphinx3 Speech Recognition System: Training and Testing Analysis
Explore how Sphinx3 utilizes Hidden Markov Models and decision trees for training and testing speech recognition data, presenting results with statistics on error rates.
Presentation Transcript
SPEECH RECOGNITION • Presented to Dr. V. Kepuska • Presented by Lisa & Za • ECE 5526
How does Sphinx3 work? • Sphinx3 uses HMMs with continuous probability density functions • Flat initialization sets the starting values of the model parameters: • Mixture weights: the weights given to every Gaussian in the Gaussian mixture corresponding to a state • Transition matrices: the matrices of state transition probabilities • Means: the means of all Gaussians • Variances: the variances of all Gaussians
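The flat initialization listed above can be sketched roughly as follows. This is only an illustration, not Sphinx3's own code: the function name `flat_init`, the dictionary keys, and the simple left-to-right topology (self-loop or advance, no skips) are assumptions made for the example.

```python
def flat_init(n_states, n_mix, global_mean, global_var):
    """Return flat-initialised parameters for one HMM (illustrative only)."""
    # Mixture weights: every Gaussian in a state's mixture gets equal weight.
    mixw = [[1.0 / n_mix] * n_mix for _ in range(n_states)]
    # Means and variances: every Gaussian starts from the same global values.
    means = [[list(global_mean) for _ in range(n_mix)] for _ in range(n_states)]
    variances = [[list(global_var) for _ in range(n_mix)] for _ in range(n_states)]
    # Transition matrix (assumed topology): stay or advance, equally likely.
    # The extra last column stands for a non-emitting exit state.
    tmat = [[0.0] * (n_states + 1) for _ in range(n_states)]
    for i in range(n_states):
        tmat[i][i] = 0.5
        tmat[i][i + 1] = 0.5
    return {"mixw": mixw, "tmat": tmat, "means": means, "variances": variances}


params = flat_init(n_states=3, n_mix=2,
                   global_mean=[0.0, 0.0], global_var=[1.0, 1.0])
```

Starting every state from the same values means the first re-estimation pass, not the initial guess, is what begins to separate the states.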
How does Sphinx3 work? • Forward-backward re-estimation algorithm (Baum-Welch algorithm) • Run iteratively until the likelihood of the training data converges • Untied modeling: training models for all context-dependent phones (usually triphones) that are seen in the training corpus
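To illustrate how Baum-Welch re-estimation drives the likelihood toward convergence, here is a minimal sketch for a tiny HMM. It assumes discrete emission probabilities instead of the continuous Gaussian mixtures Sphinx3 uses, skips the scaling needed for long utterances, and uses made-up starting values; all names are hypothetical.

```python
def forward(obs, pi, A, B):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    n = len(A)
    beta = [[1.0] * n]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One re-estimation pass; also returns the likelihood of the old model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    like = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)] for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j between t and t + 1.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(m)]
             for i in range(n)]
    return new_pi, new_A, new_B, like


obs = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]          # made-up symbol sequence
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]]
likelihoods = []
for _ in range(5):
    pi, A, B, like = baum_welch_step(obs, pi, A, B)
    likelihoods.append(like)
```

Each entry in `likelihoods` should be at least as large as the one before it; in practice training stops once the improvement between passes falls below a chosen threshold.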
How does Sphinx3 work? • Building decision trees • Used to decide which of the HMM states of all the triphones (seen and unseen) are similar to each other, so that similar states can share parameters • Pruning the decision trees
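One common way to grow such a tree is to pick, at each node, the question about the phonetic context that gives the largest likelihood gain when the states are split. The sketch below shows that idea for one-dimensional Gaussian statistics. It is a generic illustration and not necessarily the criterion Sphinx3 applies; the phone names, statistics, and question sets are invented for the example.

```python
import math


def pooled_loglik(stats):
    """Log-likelihood of pooled data under one Gaussian.

    stats is a list of (count, mean, variance) tuples, one per triphone state.
    """
    n = sum(c for c, _, _ in stats)
    mean = sum(c * m for c, m, _ in stats) / n
    var = sum(c * (v + m * m) for c, m, v in stats) / n - mean * mean
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_question(states, questions):
    """Return (question, gain) for the split with the largest likelihood gain."""
    parent = pooled_loglik([s["stats"] for s in states])
    best = None
    for name, phone_set in questions.items():
        yes = [s["stats"] for s in states if s["left"] in phone_set]
        no = [s["stats"] for s in states if s["left"] not in phone_set]
        if not yes or not no:
            continue  # the question does not split this node
        gain = pooled_loglik(yes) + pooled_loglik(no) - parent
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


# Hypothetical states of one phone, keyed by left context.
states = [
    {"left": "M", "stats": (50, 1.0, 0.5)},
    {"left": "N", "stats": (40, 1.2, 0.5)},
    {"left": "S", "stats": (60, 3.0, 0.5)},
    {"left": "Z", "stats": (30, 3.1, 0.5)},
]
questions = {
    "left is nasal": {"M", "N"},
    "left is voiced": {"M", "N", "Z"},
}
best = best_question(states, questions)
```

Because the questions are about phonetic classes and not about specific triphones, an unseen triphone can still be routed down the tree to a leaf and reuse the states clustered there.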
Our project: Spelling Bees • Use Sphinx3 to train on the recorded data • Evaluate the trained models on the test data • Result: we used 224 training recordings and 73 test recordings • The dictionary has 46 words and uses 33 phones • 32.7% word error rate and 49.3% sentence error rate
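For reference, word error rate is usually computed as (substitutions + deletions + insertions) divided by the number of reference words, and sentence error rate as the fraction of sentences with at least one error. A minimal sketch using edit distance follows; the function names and the two sample sentence pairs are hypothetical and are not taken from the project data.

```python
def word_errors(ref, hyp):
    """Minimum number of substitutions, deletions and insertions."""
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[-1][-1]


def error_rates(pairs):
    """Return (word error rate, sentence error rate) for (ref, hyp) pairs."""
    total_errors = total_words = wrong_sentences = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        errors = word_errors(r, h)
        total_errors += errors
        total_words += len(r)
        wrong_sentences += 1 if errors else 0
    return total_errors / total_words, wrong_sentences / len(pairs)


# Hypothetical spelled words: one correct, one with a single substitution.
wer, ser = error_rates([("a m y", "a m y"), ("b o b", "b o p")])
```

In a spelling task each "word" is a single letter, so one wrong letter makes the whole sentence count as an error, which is why the sentence error rate is higher than the word error rate.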
The result:
id: (fash-cen2-fash-b)
Scores: (#C #S #D #I) 3 0 0 0
REF: a m y
HYP: a m y

Speaker sentences 1: moe #utts: 8
id: (moe-m_oses1)
Scores: (#C #S #D #I) 4 0 1 1
REF: * m o s e S
HYP: E m o s e *
Eval: I D

id: (moe-m_oses2)
Scores: (#C #S #D #I) 5 0 0 0
REF: m o s e s
HYP: m o s e s
Eval:
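The Scores lines above give, per utterance, the number of correct words (#C), substitutions (#S), deletions (#D) and insertions (#I). A small sketch of how such counts roll up into error rates, using only the three utterances shown here (the function name is hypothetical):

```python
def summarize(scores):
    """scores: list of (correct, substitutions, deletions, insertions)."""
    # Reference length per utterance is C + S + D (insertions are extra words).
    ref_words = sum(c + s + d for c, s, d, i in scores)
    errors = sum(s + d + i for c, s, d, i in scores)
    bad_sentences = sum(1 for c, s, d, i in scores if s + d + i > 0)
    return errors / ref_words, bad_sentences / len(scores)


excerpt = [(3, 0, 0, 0), (4, 0, 1, 1), (5, 0, 0, 0)]
wer, ser = summarize(excerpt)
print(f"WER {wer:.1%}, SER {ser:.1%}")  # WER 15.4%, SER 33.3%
```

These figures cover only the three-utterance excerpt, so they are not expected to match the error rates reported for the full test set.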
References: • http://www.speech.cs.cmu.edu/sphinxman/fr4.html • Lecture notes from Speech recognition class • http://www.ele.uri.edu/~hansenj/projects/ele585/ • makeraw.m • record.m