This educational project explores the implementation of a real-time voice recognition system based on Hidden Markov Models (HMMs). Voice recognition systems have diverse applications in engineering, and we use HMMs to model system states that cannot be observed directly, relying on key algorithms such as the Baum-Welch algorithm for training. Our methodology includes Voice Activity Detection (VAD) to isolate speech and Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. Future investigations aim to assess the impact of environmental factors on recognition rates and to expand the system's vocabulary.
PURE Research Symposium Spring 2009 VOICE RECOGNITION USING AN HMM BASED DESIGN Richard Muryanto and Nicholas Corso Mentored by: Sun Yu
Introduction • In engineering applications, voice recognition systems have many diverse uses. • Many schemes exist for implementing voice recognition systems, such as dynamic time warping (DTW) and Hidden Markov Models (HMM). • In this educational project we used Hidden Markov Models to implement a real-time voice recognition system.
System Overview http://labrosa.ee.columbia.edu/doc/HTKBook21/img15.gif
Hidden Markov Models • Hidden Markov Models (HMMs) are a way of modeling the probabilistic behavior of systems whose states cannot be observed directly. • An HMM can be characterized in terms of a few key parameters. http://www.info.ucl.ac.be/Research/Areas/Images/RT-Pict-HMM.png
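For reference, the "key parameters" are usually written in the notation of Rabiner's tutorial (listed in the references). The slides do not spell them out, so the symbols below are an assumption about what is meant, shown for a discrete HMM with N states S_1..S_N and M observation symbols v_1..v_M:

```latex
\lambda = (A, B, \pi), \qquad
\begin{aligned}
A   &= \{a_{ij}\},  & a_{ij} &= P(q_{t+1} = S_j \mid q_t = S_i), & 1 \le i, j \le N \\
B   &= \{b_j(k)\},  & b_j(k) &= P(o_t = v_k \mid q_t = S_j),     & 1 \le j \le N,\ 1 \le k \le M \\
\pi &= \{\pi_i\},   & \pi_i  &= P(q_1 = S_i),                    & 1 \le i \le N
\end{aligned}
```

Here q_t is the hidden state at time t and o_t is the observation at time t. Each row of A, each row of B, and the vector π sum to 1.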
Hidden Markov Models: Cont. • Classically, three main problems are associated with HMMs, each with its own algorithm: evaluation, decoding, and learning. • For an HMM-based voice recognition system, the Baum-Welch algorithm is pivotal to the training of the system.
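As an illustration of the evaluation problem, the sketch below computes P(observations | model) with the forward algorithm. It assumes a discrete HMM with made-up toy parameters (`pi`, `A`, `B` are hypothetical values, not the project's trained models); a speech system working on MFCC vectors would more likely use continuous emission densities and log-domain arithmetic.

```python
def forward_likelihood(pi, A, B, observations):
    """Return P(observations | model) for a discrete HMM (forward algorithm).

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[j][k] : probability of emitting symbol k while in state j
    """
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    # Termination: sum over all possible final states
    return sum(alpha)


# Hypothetical two-state model with three observation symbols (0, 1, 2).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

likelihood = forward_likelihood(pi, A, B, [0, 1, 2])
```

The forward recursion costs on the order of N²T operations for N states and T observations, instead of enumerating all N^T state sequences.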
System Implementation • Training: Pre-recorded Data → VAD → MFCC Feature Extraction → Training HMM (Baum-Welch) • Recognition: Recorded Data → VAD → MFCC → Compute Likelihood → ML Decision → Display Output
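The maximum-likelihood (ML) decision at the end of the recognition path can be sketched as follows: score the incoming features against every word model and pick the best-scoring word. The function and variable names are hypothetical, and the toy scoring function is only a stand-in for the HMM likelihood computation, which the slides do not detail.

```python
def recognize(features, models, log_likelihood):
    """Return (best_word, scores) using a maximum-likelihood decision.

    models         : dict mapping each vocabulary word to its trained model
    log_likelihood : callable(model, features) -> log P(features | model)
    """
    if not models:
        raise ValueError("at least one word model is required")
    scores = {word: log_likelihood(model, features)
              for word, model in models.items()}
    # ML decision: choose the word whose model explains the features best.
    best_word = max(scores, key=scores.get)
    return best_word, scores


# Toy stand-in for a trained HMM: each "model" is just a mean value and the
# score is a negative squared distance. A real system would evaluate the
# word's HMM on the MFCC frames here instead.
def toy_log_likelihood(model_mean, features):
    return -sum((x - model_mean) ** 2 for x in features)


toy_models = {"yes": 1.0, "no": -1.0}  # hypothetical two-word vocabulary
best_word, scores = recognize([0.9, 1.2, 0.8], toy_models, toy_log_likelihood)
```

Passing the scoring function in as an argument keeps the decision rule independent of how each word model computes its likelihood.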
VAD and MFCC • Voice Activity Detection (VAD) determines which parts of a voice signal contain actual speech and which are silence. • The VAD algorithm used here relies on the short-time energy and the zero-crossing rate to decide whether there is voice activity. • Mel-Frequency Cepstral Coefficients (MFCC) were used to extract characteristic information from the speech vectors.
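A minimal sketch of an energy and zero-crossing-rate VAD is given below. The frame length, thresholds, and the exact decision rule are assumptions chosen for illustration; the slides do not state which values or rule the project used.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split the signal into (possibly overlapping) fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame_len >= 2)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_voice(samples, frame_len=200, hop=200,
                 high_energy=0.01, low_energy=0.001, zcr_thresh=0.25):
    """Return one True/False voice-activity flag per frame.

    Assumed rule: a frame is active if its energy is clearly high, or if it
    has moderate energy together with a high zero-crossing rate (a common
    heuristic for weak, noisy consonants). All thresholds are hypothetical.
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        flags.append(energy > high_energy or
                     (energy > low_energy and zcr > zcr_thresh))
    return flags


# Synthetic test signal: silence, a sine burst standing in for speech, silence.
silence = [0.0] * 400
burst = [0.5 * math.sin(2 * math.pi * k / 40) for k in range(400)]
flags = detect_voice(silence + burst + silence)
```

In practice the thresholds are often estimated from a stretch of background noise at the start of each recording rather than fixed in advance.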
Observed Data • [Figure: ability of the system to recognize training data]
Possible Extensions • With more time, the effects of the environment on the recognition rate could be investigated. • With further investigation, the effects of the parameters in the Baum-Welch algorithm could be explored. • A larger word set could be implemented.
References • Ramírez, J.; Górriz, J. M.; Segura, J. C. (2007). "Voice Activity Detection. Fundamentals and Speech Recognition System Robustness". In M. Grimm and K. Kroschel (eds.), Robust Speech Recognition and Understanding, pp. 1–22. • Rabiner, Lawrence R. (1989). "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE, vol. 77, no. 2, February 1989. • Taoran Lu, Chao Zhang, Dan Zhu. "Recognition by HMM".