In-car Speech Recognition Using Distributed Microphones

Tetsuya Shinde, Kazuya Takeda, Fumitada Itakura, Center for Integrated Acoustic Information Research, Nagoya University.

Presentation Transcript


  1. In-car Speech Recognition Using Distributed Microphones • Tetsuya Shinde, Kazuya Takeda, Fumitada Itakura • Center for Integrated Acoustic Information Research, Nagoya University

  2. Background • In-car speech recognition using multiple microphones • Because the positions of the speaker and the noise sources are not fixed, many sophisticated algorithms are difficult to apply. • A robust criterion for parameter optimization is necessary. • Multiple Regression of Log Spectra (MRLS) • Minimize the log spectral distance between the reference speech and the multiple regression result of the signals captured by the distributed microphones. • Filter parameter optimization for microphone arrays (M.L. Seltzer, 2002) • Maximize the likelihood of a reference utterance by adjusting the filter parameters of a microphone array system.

  3. Sample utterances (audio): expressway, city area, idling

  4. Block diagram of MRLS (Figure: each distant-microphone signal passes through spectrum analysis; multiple regression (MR) units apply the regression weights to the resulting log MFB (mel filter bank) outputs to produce an approximate log MFB output of the speech signal, which is passed to speech recognition.)
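The MR stage in the diagram combines the log MFB outputs of the microphones. Below is a minimal Python sketch of that stage, not the authors' code. It assumes one regression weight per microphone and filter bank channel plus a per-channel bias term. The function name `apply_regression` and the array layout are hypothetical.

```python
import numpy as np

def apply_regression(log_mfb, weights, bias):
    """Combine distant-microphone log MFB outputs into one approximate log MFB output.

    log_mfb: (n_mics, n_frames, n_channels) log mel filter bank outputs
    weights: (n_mics, n_channels) one weight per microphone and channel (assumed layout)
    bias:    (n_channels,) constant term per channel (an assumption)
    returns: (n_frames, n_channels)
    """
    # Weighted sum over the microphone axis, computed separately for every channel.
    return np.einsum("mtc,mc->tc", log_mfb, weights) + bias

# Illustrative sizes: 6 microphones, 100 frames, 24 channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 100, 24))
w = np.full((6, 24), 1.0 / 6)   # equal weights reduce MR to a per-channel average
b = np.zeros(24)
y = apply_regression(x, w, b)
```

With equal weights and zero bias, the output is simply the average over microphones. The regression is useful because the weights can instead be fitted so the output tracks the close-talking reference.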

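  5. Modified spectral subtraction • Assume that the power spectrum at each microphone position obeys the power sum rule: $X_i = H_i S + G_i N$, where $S$ and $N$ are the speech and noise power spectra, $H_i$ and $G_i$ are the transfer characteristics from the speaker and from the noise source to microphone $i$, and $X_i$ is the power spectrum observed at microphone $i$.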
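MRLS works on log spectra, so the power sum rule is most convenient when written in the log domain. The sketch below evaluates the model for one microphone and one frequency bin. The argument names are hypothetical, and all inputs are natural logarithms of the quantities in the power sum rule.

```python
import numpy as np

def observed_log_spectrum(s, n, h, g):
    """Log power spectrum at one microphone under the power sum rule X = H*S + G*N.

    Arguments are natural logs: s = log S, n = log N, h = log H, g = log G.
    np.logaddexp(a, b) returns log(exp(a) + exp(b)) without overflow.
    """
    return np.logaddexp(h + s, g + n)

# Example: S = 4, N = 1, H = 0.5, G = 2, so X = 0.5*4 + 2*1 = 4.
x = observed_log_spectrum(np.log(4.0), np.log(1.0), np.log(0.5), np.log(2.0))
```

The model is linear in the power domain but nonlinear in the log domain. This nonlinearity is why a linear regression on log spectra needs the approximation discussed next.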

  6. Taylor expansion of log spectrum
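The slide's equations did not survive extraction. What follows is a first-order expansion of the power sum model $X_i = H_i S + G_i N$, derived here for illustration rather than copied from the slide. The notation is introduced here: $x_i = \log X_i$, $s = \log S$, $n = \log N$, $h_i = \log H_i$, $g_i = \log G_i$, and the expansion point $(s_0, n_0)$.

$$
x_i = \log\left(e^{h_i + s} + e^{g_i + n}\right)
$$

$$
\frac{\partial x_i}{\partial s}\Big|_{(s_0, n_0)} = \frac{H_i S_0}{H_i S_0 + G_i N_0} \equiv \alpha_i,
\qquad
\frac{\partial x_i}{\partial n}\Big|_{(s_0, n_0)} = \frac{G_i N_0}{H_i S_0 + G_i N_0} = 1 - \alpha_i
$$

$$
x_i \approx \log\left(H_i S_0 + G_i N_0\right) + \alpha_i (s - s_0) + (1 - \alpha_i)(n - n_0)
$$

The two partial derivatives sum to one. Around the expansion point, each microphone's log spectrum therefore behaves like a weighted mix of the speech and noise log spectra. The mixing factor $\alpha_i$ is large when microphone $i$ is dominated by speech.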

  7. Multiple regression of log spectrum Minimum error is given when
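The slide's condition for minimum error is not recoverable. As a generic stand-in, the regression weights can be fitted by ordinary least squares. This minimizes the squared log spectral error between the regression output and the close-talking reference. The sketch below assumes per-channel weights with a bias term, and the name `fit_regression_weights` is hypothetical.

```python
import numpy as np

def fit_regression_weights(log_mfb, reference):
    """Fit per-channel multiple regression weights by least squares.

    log_mfb:   (n_mics, n_frames, n_channels) distant-microphone log MFB outputs
    reference: (n_frames, n_channels) close-talking log MFB output (the target)
    returns:   weights (n_mics, n_channels) and bias (n_channels,)
    """
    n_mics, n_frames, n_channels = log_mfb.shape
    weights = np.empty((n_mics, n_channels))
    bias = np.empty(n_channels)
    for c in range(n_channels):
        # Design matrix: one column per microphone plus a constant column for the bias.
        design = np.column_stack([log_mfb[:, :, c].T, np.ones(n_frames)])
        # lstsq minimizes ||design @ coef - reference[:, c]||^2 for this channel.
        coef, *_ = np.linalg.lstsq(design, reference[:, c], rcond=None)
        weights[:, c] = coef[:n_mics]
        bias[c] = coef[n_mics]
    return weights, bias

# Synthetic check: build a reference that is an exact linear mix of the microphones.
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 500, 24))
true_w = rng.uniform(0.0, 0.5, size=(6, 24))
true_b = rng.normal(size=24)
ref = np.einsum("mtc,mc->tc", x, true_w) + true_b
w, b = fit_regression_weights(x, ref)
```

On noiseless synthetic data the fit recovers the generating weights. On real recordings, the residual is the remaining log spectral distance.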

  8. Reduction of freedom in optimization • Optimal regression weights
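One way to see how the log-domain model constrains the weights is sketched below. This is an illustrative derivation, not necessarily the one on the slide. Assume the estimate is $\hat{s} = \sum_i \lambda_i x_i$ and that each log spectrum is linearized as $x_i \approx x_i^{(0)} + \alpha_i \Delta s + (1 - \alpha_i)\Delta n$, with $\alpha_i = H_i S_0 / (H_i S_0 + G_i N_0)$. The symbols $\lambda_i$, $\alpha_i$, $x_i^{(0)}$, $\Delta s = s - s_0$ and $\Delta n = n - n_0$ are notation introduced here.

$$
\hat{s} \approx \sum_i \lambda_i x_i^{(0)} + \Big(\sum_i \lambda_i \alpha_i\Big)\Delta s + \Big(\sum_i \lambda_i (1 - \alpha_i)\Big)\Delta n
$$

For $\hat{s}$ to track $s = s_0 + \Delta s$ for arbitrary $\Delta s$ and $\Delta n$, the weights must satisfy three conditions:

$$
\sum_i \lambda_i \alpha_i = 1, \qquad \sum_i \lambda_i (1 - \alpha_i) = 0, \qquad \sum_i \lambda_i x_i^{(0)} = s_0 .
$$

Adding the first two conditions gives $\sum_i \lambda_i = 1$. Under this model, the weights are therefore not free: they must sum to one.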

  9. Experimental Setup for Evaluation • Recorded with 6 microphones • Training data • Phonetically balanced sentences • 6,000 sentences while idling • 2,000 sentences while driving • 200 speakers • Test data • 50 isolated word utterances • 15 different driving conditions • road (idling / city area / expressway) • in-car (normal / fan-low / fan-hi / CD play / window open) • 18 speakers (Figure: distributed microphone positions, side view and top view)
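The setup can be written down as a small configuration. This sketch assumes the 15 driving conditions are the full cross product of the 3 road conditions and the 5 in-car conditions, which matches the count on the slide. The variable names are hypothetical.

```python
from itertools import product

# Recording conditions listed on the slide (variable names are our own).
ROAD = ["idling", "city area", "expressway"]
IN_CAR = ["normal", "fan-low", "fan-hi", "CD play", "window open"]

# Assumed: every road condition is combined with every in-car condition.
DRIVING_CONDITIONS = list(product(ROAD, IN_CAR))

N_MICROPHONES = 6
TRAIN_SENTENCES = {"idling": 6000, "driving": 2000}  # phonetically balanced sentences
TRAIN_SPEAKERS = 200
TEST_WORDS = 50       # isolated word utterances
TEST_SPEAKERS = 18
```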

  10. Recognition experiments • HMMs: • Close-talking: close-talking microphone speech. • Distant-mic.: speech from the nearest distant microphone (mic. #6). • MLLR: nearest distant-microphone speech after MLLR adaptation. • MRLS: MRLS results obtained with the optimal regression weights for each training utterance. • Test utterances • Close-talking speech (CLS-TALK) • Distant-microphone speech (DIST) • Distant-microphone speech after MLLR adaptation (MLLR) • MRLS results using the 6 regression weights optimized for: • each utterance (OPT) • each speaker (SPKER) • each driving condition (DR) • the whole training corpus (ALL)
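The four MRLS test settings differ only in how much data shares one set of weights:

* OPT: each utterance has its own weights.
* SPKER: each speaker shares one set.
* DR: each driving condition shares one set.
* ALL: the whole corpus shares one set.

The minimal sketch below expresses this as "group, then fit". It assumes a single filter bank channel, a bias term and ordinary least squares. The helpers `fit_weights` and `weights_by_group` and the utterance dictionary keys are hypothetical.

```python
import numpy as np
from collections import defaultdict

def fit_weights(xs, refs):
    """Least-squares weights (plus bias) for one filter bank channel.

    xs:   list of (n_mics, n_frames) log spectra, one per utterance
    refs: list of (n_frames,) close-talking log spectra
    """
    x = np.concatenate(xs, axis=1)   # (n_mics, total_frames)
    y = np.concatenate(refs)         # (total_frames,)
    design = np.column_stack([x.T, np.ones(x.shape[1])])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef                      # n_mics weights followed by the bias

def weights_by_group(utterances, key):
    """Fit one weight vector per group.

    key=None pools everything (ALL); "speaker" mirrors SPKER, "condition" mirrors DR,
    and "id" (one utterance per group) mirrors OPT.
    """
    groups = defaultdict(list)
    for u in utterances:
        groups["all" if key is None else u[key]].append(u)
    return {g: fit_weights([u["x"] for u in us], [u["ref"] for u in us])
            for g, us in groups.items()}

# Synthetic utterances whose reference is 0.2 * (sum over 6 mics) + 1.0.
rng = np.random.default_rng(2)
utterances = []
for speaker in ["A", "B"]:
    for condition in ["idling", "expressway"]:
        x = rng.normal(size=(6, 200))
        utterances.append({"id": f"{speaker}-{condition}", "speaker": speaker,
                           "condition": condition, "x": x,
                           "ref": 0.2 * x.sum(axis=0) + 1.0})

by_condition = weights_by_group(utterances, "condition")
by_utterance = weights_by_group(utterances, "id")
pooled = weights_by_group(utterances, None)
```

Finer groups fit their own data more closely, but each group then has fewer frames to estimate its weights from.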

  11. Performance Comparison (average over 15 different conditions) (Figure: recognition performance chart, including MRLS)

  12. Clustering the in-car sound environment • The in-car sound environment is clustered using a spectral feature formed by concatenating the distributed microphone signals. (Figure: clustering results)
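The slide does not say which clustering algorithm was used. The sketch below uses k-means as a stand-in. It assumes one feature vector per utterance, formed by averaging each microphone's log MFB output over time and concatenating the microphones. The names `environment_feature` and `kmeans` are hypothetical.

```python
import numpy as np

def environment_feature(log_mfb):
    """Average each microphone's log MFB output over time, then concatenate microphones.

    log_mfb: (n_mics, n_frames, n_channels) -> (n_mics * n_channels,)
    """
    return log_mfb.mean(axis=1).reshape(-1)

def kmeans(features, k, n_iter=50):
    """Plain k-means with farthest-point initialization (a stand-in clustering method)."""
    # Start from the first vector, then repeatedly add the vector farthest
    # from every centroid chosen so far.
    centroids = [features[0]]
    for _ in range(1, k):
        dist = np.min([((features - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(features[dist.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        dist = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Synthetic example: two clearly different sound environments, 10 utterances each.
rng = np.random.default_rng(3)
quiet = [rng.normal(0.0, 0.1, size=(6, 50, 24)) for _ in range(10)]
noisy = [rng.normal(3.0, 0.1, size=(6, 50, 24)) for _ in range(10)]
features = np.stack([environment_feature(u) for u in quiet + noisy])
labels, centroids = kmeans(features, k=2)
```

Concatenating the microphones keeps the spatial pattern of the noise in the feature. A fan, an open window and road noise reach the distributed microphones with different relative levels. Farthest-point initialization keeps the example deterministic, but it is sensitive to outliers.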

  13. Adapting weights to the sound environment • The regression weights are varied according to the classification result. • This gives the same performance as speaker- or condition-dependent weights.
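Adapting the weights amounts to two steps: classify the incoming utterance into a sound-environment cluster, then apply that cluster's regression weights. The minimal sketch below assumes nearest-centroid classification and one weight vector per cluster for a single filter bank channel. The function `select_weights` and the data layout are hypothetical.

```python
import numpy as np

def select_weights(feature, centroids, cluster_weights):
    """Pick the regression weights of the nearest sound-environment cluster.

    feature:         (dim,) environment feature of the incoming utterance
    centroids:       (k, dim) cluster centroids from the training data
    cluster_weights: list of k weight vectors, one per cluster (assumed layout)
    """
    distances = ((centroids - feature) ** 2).sum(axis=1)
    cluster = int(distances.argmin())
    return cluster, cluster_weights[cluster]

# Toy example with 2 clusters and 2 microphones.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
cluster_weights = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
cluster, w = select_weights(np.array([2.8, 3.1]), centroids, cluster_weights)

# Apply the selected weights to the utterance's per-microphone log spectra.
log_spectra = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])  # (n_mics, n_frames)
approx = w @ log_spectra
```

No speaker or condition label is needed at test time. The label comes from the classifier, which is why this setting can approach the performance of speaker- or condition-dependent weights.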

  14. Summary • Results • Log spectral multiple regression is effective for in-car speech recognition using multiple distributed microphones. • In particular, very high performance is obtained when the regression weights are trained for a particular driving condition. • Adapting the weights to the driving condition improves performance. • Future work • Combining with a microphone array.