Ekapol Chuangsuwanich and James Glass

Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation frequency EkapolChuangsuwanich and James Glass MIT Computer Science and Artificial Intelligence Laboratory,Cambridge, Massachusetts 02139,USA 2012/07/2 汪逸婷

Outline • Introduction • Harmonicity • Modulation Frequency • Experiments and Discussions • Speech/Non-speech detection capabilities • Effect on ASR performance • Conclusion

Introduction • Voice Activity Detection(VAD) is the process of identifying segments of speech in a continuous audio stream. • First stage of a speech processing application. • Used to both reduce computation by eliminating unnecessary transmission and processing of non-speech segments, as well as reduce potential mis-recognition errors in such segments.(binary) • In this paper, we consider the task of giving commands to an autonomous forklift.

Introduction • In high quality recording conditions, energy-based methods perform well. • In noisy conditions, energy-based measures ofter produce a considerable number of false alarms. • Large variety of other features have been investigated for use in noisy environments. • Tuning parameters. • Difficulties dealing with non-stationary or instantaneous types of noises that are frequent in our work.

Introduction • Harmonicityis a basic property of any periodic signal, it is not useful by itself. • Some works shows good results even at very low SNR conditions. • Modulation frequencies which measure the temporal rate of change of energy across different frequency band. • Some studies about purely MF-inspired set of features for discriminating between speech and non-speech which gave good generalization to noise types not included in training.

Harmonicity • Figure 1: Example of distant speech from the forklift database.

Harmonicity • Autocorrelation ：as a function of the lag • Periodicity measure will yield a high value for pure tones.(use bandpasscepstralliftering)

Harmonicity

Modulation Frequency

Experiments-Speech/Non-speech detection capabilities • Database consisting of speech commands from 26 subjects with added noise to simulate a variety of SNR values ranging from -5 to 15 dB. • Recorded with an array microphone. • Classification was without any additional post-processing. • Transition between speech and non-speech were excluded. • Training 4 min of speech, 22 min of non-speech.

Experiments-Speech/Non-speech detection capabilities • For comparison, include results based on Relative Spectral Entropy(RSE), Long-Term Signal Variability(LTSV), and statistical model-based VADs using MFCCs and MF as features to Gaussian mixtrue models(GMMs). • EER: Equal error rate • FAR: False alarm rate

Experiments-Speech/Non-speech detection capabilities • Performance varied specific kinds of noise.

Experiments-Speech/Non-speech detection capabilities • EER and FAR for each noise type are usually obtained at different thresholds.

Experiments-Effect on ASR performance • The SNR values ranged from 5 to 25 dB. • Corpus: four microphone channels, 10 hours long, 400 command words. • Commands are sparse, forklift is mostly idle waiting for commands or taking the time to executed commands. • ASR was performed using PocketSUMMIT, had a vocabulary size of 57 words, that out of domain(OOD) command was modeled by a single GMM. • ASR model was trained on over 3600 utterances of commands from 18 talkers.

Experiments-Effect on ASR performance • Examined the influence of different VAD systems on the ASR results in the task of commanding an autonomous forklift in real world environments.

Conclusion • Described the task of VAD on distant speech in low SNR environments for an autonomous robotic forklift. • Designed a two-stage approach for speech /non-speech classification. • Parallel SVM outperformed classification based on the whole MF spectrum. Combination MF and simple harmonicity measure helped reduce false alarm rate by another 9% at low miss rates. • In ASR, VAD outperformed standard VADs and achieved a WER very close to that of hand labeled end point.

Ekapol Chuangsuwanich and James Glass

Ekapol Chuangsuwanich and James Glass

Presentation Transcript

James Joyce and Music

James Hilton james@insidemob.com

SELF-CLEANING GLASS and ELECTROCHROMIC GLASS

Soil and Glass Analysis

James and Galatians

James and Bob

The Presidencies of james Madison and james monroe

James Gosling and Java

Windows and Glass

Glass and Soil

Plastic and Glass

By A EKAPOL C HONGVILAIVAN Institute of Southeast Asian Studies (ISEAS), Singapore AND

Tempered Glass - Safety Glass – Glass-Rite

Laminated Glass and Tempered Glass a Quick Review

Heady Glass Pipe and Rigs - Fourward Glass Gallery

Glass and Crystal Awards

Professional Glass Polishing and Glass Repairs in Hampshire

Facade and Glass Cleaning

Glass And Mirrors Supplier

Glass Maufacturer company And Borosilicate glass manufacturer supplier

Difference between windshield glass and other glass

Embracing Stained Glass Artistry and Decorated Glass Work