Mark D. Skowronski and John G. Harris Computational Neuro-Engineering Lab

Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab, Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
November 19, 2004

Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
• Experiments
• Conclusions

Bat research motivations
• Bats are among:
  • the most diverse,
  • the most endangered,
  • and the least studied mammals.
• Close relationship with insects
  • agricultural impact
  • disease vectors
• Acoustical research is non-invasive and covers a significant domain (echolocation)
• Simplified biological acoustic communication system (compared to human speech)

Echolocation calls
• Features (holistic)
  • Frequency extrema
  • Duration
  • Shape
  • Number of harmonics
  • Call interval
[Figure: Mexican free-tailed calls, concatenated]

Current classification methods
• Expert spectrogram readers
  • Manual or automatic feature extraction
  • Comparison with exemplar spectrograms
• Automatic classification
  • Decision trees
  • Discriminant function analysis
These methods parallel the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).

Acoustic phonetics
[Figure: phoneme labels DH AH F UH T B AO L G EY EM IH Z OW V ER]
• Bottom up paradigm
  • Frames, boundaries, groups, phonemes, words
• Manual or automatic feature extraction
  • Determined by experts to be important for speech
• Classification
  • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

Acoustic phonetics limitations
• Variability of conversational speech
• Complex rules, difficult to implement
• Feature estimates brittle
  • Variable noise robustness
• Hard decisions, errors accumulate
Human ASR shifted to the information theoretic (machine learning) paradigm, which is better able to account for the variability of speech and noise.

Information theoretic ASR
• Data-driven models from computer science
  • Non-parametric: dynamic time warp (DTW)
  • Parametric: hidden Markov model (HMM)
• Frame-based
• Expert information in feature extraction
• Models account for feature, temporal variability
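To make the non-parametric option concrete, here is a minimal sketch of dynamic time warping between two frame-based feature sequences, with nearest-template classification on top. The slides do not give the local cost, step pattern, or template selection used in the experiments, so the Euclidean frame distance, the symmetric three-way step, and the names `dtw_distance` and `classify_nearest_template` are all assumptions made for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of frames.

    a, b: arrays of shape (frames, features); 1-D input is treated
    as one feature per frame. Assumed: Euclidean local cost and
    steps (i-1, j), (i, j-1), (i-1, j-1).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # Pairwise frame-to-frame distances, shape (n, m).
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] = cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]

def classify_nearest_template(query, templates):
    """Return the label of the template with the smallest DTW cost.

    templates: list of (label, sequence) pairs (hypothetical format).
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```

Because every query is compared against every stored template, this kind of classifier needs essentially no training but does its work at classification time.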

Data collection
• UF Bat House, home to 60,000 bats
  • Mexican free-tailed bat (vast majority)
  • Evening bat
  • Southeastern myotis
• Continuous recording
  • 90 minutes around sunset
  • ~20,000 calls
• Equipment:
  • B&K mic (4939), 100 kHz
  • B&K preamp (2670)
  • Custom amp/AA filter
  • NI 6036E 200 kS/s A/D card
  • Laptop, Matlab

Experiment design
• Hand labels
  • 436 calls (2% of data)
  • Four classes, a priori probabilities: 34%, 40%, 20%, 6%
• All experiments on hand-labeled data only
• No hand-labeled calls excluded from experiments
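As a rough aid for reading the accuracy figures, the arithmetic below converts the stated priors into approximate per-class call counts and the accuracy of always guessing the largest class. It assumes the percentages are rounded class proportions of the 436 hand-labeled calls; the counts are therefore approximate, and the variable names are illustrative only.

```python
total_calls = 436                    # hand-labeled calls, from the slide
priors = [0.34, 0.40, 0.20, 0.06]    # a priori class proportions, from the slide

# Approximate calls per class (the slide gives rounded percentages,
# so these need not sum exactly to 436).
counts = [round(total_calls * p) for p in priors]

# Accuracy of a classifier that always predicts the largest class.
majority_baseline = max(priors)

print(counts, majority_baseline)
```

Under these assumptions the smallest class has only about 26 examples, and a trivial majority-class guess would already score about 40%.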

Experiments
• Baseline
  • Features
    • Zero crossing
    • MUSIC super resolution frequency estimator
  • Classifier
    • Discriminant function analysis, quadratic boundaries
• DTW and HMM
  • Features
    • Frequency (MUSIC), log energy, first derivatives (HMM only)
  • HMM
    • 5 states/model
    • 4 Gaussian mixtures/state
    • diagonal covariances
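The slides name zero crossing as a baseline frequency feature without describing the estimator. One common approach, sketched here as an assumption, measures the average spacing between successive sign changes in a frame and treats that spacing as half a period. The function name `zero_crossing_frequency` is hypothetical; the 200 kS/s rate in the usage lines matches the A/D card listed under data collection.

```python
import numpy as np

def zero_crossing_frequency(x, fs):
    """Estimate the dominant frequency (Hz) of one frame from its zero crossings.

    x: 1-D samples of a single frame; fs: sampling rate in Hz.
    Returns NaN when fewer than two crossings are found.
    """
    x = np.asarray(x, dtype=float)
    negative = np.signbit(x)
    # Indices i where the sign changes between sample i and sample i + 1.
    idx = np.flatnonzero(negative[1:] != negative[:-1])
    if len(idx) < 2:
        return float("nan")
    # Linearly interpolate the fractional sample position of each crossing.
    crossings = idx + x[idx] / (x[idx] - x[idx + 1])
    # Successive crossings are half a period apart.
    half_period = np.mean(np.diff(crossings)) / fs
    return 1.0 / (2.0 * half_period)

# Usage: a synthetic 40 kHz tone sampled at 200 kS/s.
fs = 200_000
t = np.arange(2000) / fs
estimate = zero_crossing_frequency(np.sin(2 * np.pi * 40_000 * t + 0.3), fs)
```

An estimator like this is cheap but sensitive to noise and to harmonics, since any extra sign change shifts the estimate.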

Results
• Baseline, zero crossing
  • Leave one out: 72.5% correct
  • Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  • Leave one out: 79.1%
  • Repeated trials: 77.5 ± 4%
• DTW, MUSIC
  • Leave one out: 74.5%
  • Repeated trials: 74.1 ± 4%
• HMM, MUSIC
  • Test on train: 85.3%
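For readers unfamiliar with the leave-one-out figures, the sketch below shows the general procedure: each labeled example is held out once, a model is fit on the rest, and the held-out example is classified. The nearest-class-mean classifier is only a simple stand-in chosen to keep the example short; it is not the quadratic discriminant analysis or DTW system from the slides, and all function names are hypothetical.

```python
import numpy as np

def leave_one_out_accuracy(X, y, fit, predict):
    """Fraction of examples classified correctly when each is held out in turn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i          # train on everything except example i
        model = fit(X[keep], y[keep])
        correct += int(predict(model, X[i]) == y[i])
    return correct / n

def fit_class_means(X, y):
    """Stand-in classifier: store one mean feature vector per class."""
    labels = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, means

def predict_nearest_mean(model, x):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    labels, means = model
    return labels[np.argmin(np.linalg.norm(means - x, axis=1))]

# Usage on toy data (two well-separated classes, one feature each).
X_toy = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y_toy = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(X_toy, y_toy, fit_class_means, predict_nearest_mean)
```

Unlike a test-on-train figure, a leave-one-out score never evaluates a model on an example it was fit to, which matters when comparing the HMM result with the other three.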

Confusion matrices
[Figure: four confusion matrices, one per system: Baseline, zero crossing; Baseline, MUSIC; DTW, MUSIC; HMM, MUSIC]

Conclusions
• Human ASR algorithms applicable to bat echolocation calls
• Experiments
  • Weakness: accuracy of class labels
  • HMM most accurate, undertrained
  • MUSIC frequency estimate robust, slow
• Machine learning
  • DTW: fast training, slow classification
  • HMM: slow training, fast classification
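The "fast classification" point for HMMs can be illustrated with a small sketch: once one model per class has been trained, classifying a call only requires one forward pass per model and picking the highest likelihood. This sketch simplifies the configuration in the slides by using a single diagonal-covariance Gaussian per state instead of four mixture components, and it omits training entirely; the function names and the `models` dictionary layout are assumptions.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of each frame under each state's diagonal Gaussian.

    x: (T, D) frames; mean, var: (S, D) per-state parameters. Returns (T, S).
    """
    diff = x[:, None, :] - mean[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff ** 2 / var[None], axis=2)

def hmm_log_likelihood(x, log_pi, log_A, mean, var):
    """Log p(x | model) via the forward algorithm in the log domain.

    log_pi: (S,) log initial probabilities; log_A: (S, S) log transition
    probabilities, with rows indexing the previous state.
    """
    x = np.asarray(x, dtype=float)
    log_b = log_gauss_diag(x, mean, var)
    alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # Sum over previous states in the log domain, then emit frame t.
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def classify(x, models):
    """Return the class name whose HMM gives x the highest log likelihood.

    models: dict mapping name -> (log_pi, log_A, mean, var) (hypothetical layout).
    """
    return max(models, key=lambda name: hmm_log_likelihood(x, *models[name]))

# Usage with two hand-set (untrained) 2-state models on one feature.
log_pi = np.log(np.array([0.9, 0.1]))
log_A = np.log(np.array([[0.8, 0.2], [0.1, 0.9]]))
var = np.ones((2, 1))
models = {
    "low": (log_pi, log_A, np.array([[0.0], [1.0]]), var),
    "high": (log_pi, log_A, np.array([[10.0], [11.0]]), var),
}
label = classify(np.array([[0.1], [0.9], [1.1]]), models)
```

The cost per call is one pass over its frames for each class model, independent of how many training calls were used, which is the contrast with template-based DTW.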

Further information
• http://www.cnel.ufl.edu/~markskow
• markskow@cnel.ufl.edu
• DTW reference:
  • L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
• HMM reference:
  • L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296. Morgan Kaufmann, San Mateo, CA, 1990.
