
Mark D. Skowronski and John G. Harris Computational Neuro-Engineering Lab

Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition. Mark D. Skowronski and John G. Harris, Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA


Presentation Transcript


  1. Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition • Mark D. Skowronski and John G. Harris • Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA • November 19, 2004

  2. Overview • Motivations for bat acoustic research • Review bat call classification methods • Contrast with 1970s human ASR • Experiments • Conclusions

  3. Bat research motivations • Bats are among: • the most diverse, • the most endangered, • and the least studied mammals. • Close relationship with insects • agricultural impact • disease vectors • Acoustical research non-invasive, significant domain (echolocation) • Simplified biological acoustic communication system (compared to human speech)

  4. Echolocation calls • Features (holistic) • Frequency extrema • Duration • Shape • # harmonics • Call interval • [Figure: Mexican free-tailed calls, concatenated]

  5. Current classification methods • Expert spectrogram readers • Manual or automatic feature extraction • Comparison with exemplar spectrograms • Automatic classification • Decision trees • Discriminant function analysis • These methods parallel the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).

  6. Acoustic phonetics • Bottom-up paradigm • Frames, boundaries, groups, phonemes, words • Manual or automatic feature extraction • Determined by experts to be important for speech • Classification • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path • [Figure: phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]

  7. Acoustic phonetics limitations • Variability of conversational speech • Complex rules, difficult to implement • Feature estimates brittle • Variable noise robustness • Hard decisions, errors accumulate • Human ASR shifted to the information-theoretic (machine learning) paradigm, which is better able to account for the variability of speech and for noise.

  8. Information theoretic ASR • Data-driven models from computer science • Non-parametric: dynamic time warp (DTW) • Parametric: hidden Markov model (HMM) • Frame-based • Expert information in feature extraction • Models account for feature, temporal variability
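The non-parametric option on this slide, dynamic time warping, can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean frame distance, the unconstrained step pattern, the nearest-template rule, and the labels `"FM"` and `"CF"` are all assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats, e.g. (frequency_kHz,) or
    (frequency_kHz, log_energy). Frame distance is Euclidean (an assumption).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # frame of b is reused
                                 cost[i][j - 1],      # frame of a is reused
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def classify(call, templates):
    """Label of the template closest to `call`; templates is a list of (label, frames)."""
    return min(templates, key=lambda t: dtw_distance(call, t[1]))[0]

# Hypothetical one-dimensional frequency tracks in kHz.
templates = [
    ("FM", [(50.0,), (45.0,), (40.0,), (35.0,)]),  # downward sweep
    ("CF", [(40.0,), (40.0,), (40.0,), (40.0,)]),  # constant frequency
]
stretched_sweep = [(50.0,), (50.0,), (45.0,), (40.0,), (40.0,), (35.0,)]
print(classify(stretched_sweep, templates))
```

Because the warp absorbs the difference in duration, the stretched sweep aligns with the shorter sweep template at zero cost. Classification compares a call against every stored template, while "training" only stores templates, which is consistent with the fast-training, slow-classification trade-off noted in the conclusions.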

  9. Data collection • UF Bat House, home to 60,000 bats • Mexican free-tailed bat (vast majority) • Evening bat • Southeastern myotis • Continuous recording • 90 minutes around sunset • ~20,000 calls • Equipment: • B&K mic (4939), 100 kHz • B&K preamp (2670) • Custom amp/AA filter • NI 6036E 200kS/s A/D card • Laptop, Matlab

  10. Experiment design • Hand labels • 436 calls (2% of data) • Four classes, a priori probabilities: 34%, 40%, 20%, 6% • All experiments on hand-labeled data only • No hand-labeled calls excluded from experiments • [Figure: panels labeled 1, 2, 3, 4]
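As a rough aid for reading the accuracy figures later in the deck, the class priors above imply a chance-level reference: a classifier that always guesses the largest class would be right about 40% of the time. The short sketch below works this out. The per-class counts are approximations reconstructed from the rounded percentages, not numbers reported in the slides.

```python
N_CALLS = 436
PRIORS = [0.34, 0.40, 0.20, 0.06]  # classes 1..4, as listed on the slide

# Approximate calls per class; the priors are rounded to whole percent,
# so these counts need not sum exactly to N_CALLS.
approx_counts = [round(N_CALLS * p) for p in PRIORS]

# Accuracy of always predicting the most frequent class.
majority_baseline = max(PRIORS)

print(approx_counts, majority_baseline)
```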

  11. Experiments • Baseline • Features • Zero crossing • MUSIC super resolution frequency estimator • Classifier • Discriminant function analysis, quadratic boundaries • DTW and HMM • Features • Frequency (MUSIC), log energy, first derivatives (HMM only) • HMM • 5 states/model • 4 Gaussian mixtures/state • diagonal covariances
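Two of the features named on this slide can be sketched briefly: a zero-crossing frequency estimate and a first-derivative feature. This is a minimal illustration under stated assumptions, not the authors' code. The linear interpolation of crossing instants, the central-difference derivative, and the function names are choices made here. The MUSIC estimator is not shown.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz from the spacing of positive-going zero crossings."""
    crossings = []
    for k in range(1, len(samples)):
        s0, s1 = samples[k - 1], samples[k]
        if s0 < 0.0 <= s1:
            # Linearly interpolate the crossing instant, measured in samples.
            crossings.append(k - 1 + (-s0) / (s1 - s0))
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / mean_period

def delta(track):
    """First-derivative feature per frame: central differences, one-sided at the ends."""
    n = len(track)
    if n < 2:
        return [0.0] * n
    out = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        out.append((track[hi] - track[lo]) / (hi - lo))
    return out

# Synthetic 40 kHz tone sampled at 200 kS/s (the A/D rate listed under data collection).
FS = 200_000
tone = [math.sin(2 * math.pi * 40_000 * k / FS + 0.3) for k in range(400)]
print(zero_crossing_frequency(tone, FS))
print(delta([50.0, 45.0, 40.0]))
```

A zero-crossing estimate is cheap but follows only the dominant component of the signal, which is one plausible reason the slides also evaluate a spectral estimator.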

  12. Results • Baseline, zero crossing • Leave one out: 72.5% correct • Repeated trials: 72.5 ± 4% (mean ± std) • Baseline, MUSIC • Leave one out: 79.1% • Repeated trials: 77.5 ± 4% • DTW, MUSIC • Leave one out: 74.5% • Repeated trials: 74.1 ± 4% • HMM, MUSIC • Test on train: 85.3%

  13. Confusion matrices • [Figure: four confusion matrices, one each for Baseline, zero crossing; Baseline, MUSIC; DTW, MUSIC; HMM, MUSIC]

  14. Conclusions • Human ASR algorithms applicable to bat echolocation calls • Experiments • Weakness: accuracy of class labels • HMM most accurate, undertrained • MUSIC frequency estimate robust, slow • Machine learning • DTW: fast training, slow classification • HMM: slow training, fast classification

  15. Further information • http://www.cnel.ufl.edu/~markskow • markskow@cnel.ufl.edu • DTW reference: • L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993 • HMM reference: • L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296. Kaufmann, San Mateo, CA, 1990.