10 likes | 287 Vues
Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma , Dominic Heger , Ngoc Thang Vu, Mark Erhardt , Tim Schlippe , Matthias Janke , Christian Herff , Tanja Schultz.
E N D
Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, ChristophAmma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz Cognitive Systems Lab, Institute forAnthropomaticsandRobotics, Karlsruhe Institute of Technology (KIT) BioKIT - Real-time decoder for biosignal processing Experiments and Results • EMG-based speech recognition: • Electrical potentials of a user’s facial muscles are captured in order to recognize speech • Multi-stream setup consisting of 8 streams • Each stream corresponds to a phonetic feature and is modeled with GMMs • 20.88% WER on the EMG-UKA corpus (session-dependent, 108 word vocabulary) • Airwritingrecognition: • User’s hand is used as a stylus and text is written in the air and captured by a wearable device • Handwriting motion is measured by an accelerometer and a gyroscope • Corpus contains recordings of 9 subjects with 80 sentences each • 11% WER with an 8000 word vocabulary (leave-one-out cross-validation) • 3% WER for the user-dependent case • Automatic Speech Recognition – Real-time factor: • Real-time factor of BioKIT using a Kaldi trained DNN acoustic model on Vietnamese • DNN output layer size is 2,630 • Language model is 3/5-gram respectively • Test is run on an Intel Core i7-3770 with 3.4GHz • Executing 4 threads in parallel • Automatic Speech Recognition – Decoding: • Comparing error rates of BioKIT and Kaldi • Using Kaldi trained DNN acoustic models • Tested on Bulgarian(BG), Czech(CZ), German(GE), Mandarin(CH), and Vietnamese(VN) • With exception of Mandarin tested with two different language models each • Pruning parameters were similar • Same number of active nodes • Same global pruning beam • Both achieve similar performance • Differences in results not significant at asignificance level of 0.05 except for BGs and CH systems • Goals and Motivation • Fast setup of experiments and flexible code-base • Python scripting layer for fast setup of experiments • Modular C++ core to allow for expansion with new algorithms • Flexible processing and modeling • Matrices are represented as NumPy arrays in Python • Any functions in SciPy library are directly available • Processing of big amounts of data • Utterance-level parallelization and sharing of components between threads • Online-capability • Example scripts allow for easy adaptation to different applications • Error analysis • Integrated tool allows for analysis of decoding passes • Accessibility • General terminology • Tutorials for various use-cases Error Analysis • Capabilitiesofcurrenttoolaresimilartoworkpresentedby Lin Chase* • Statisticsforanalysiscanbedirectlycreatedwiththedecoding • Resultsyieldconfusiontablesaswellas a sentencebysentencelistingoferrors • *L. L. Chase, “Error-Responsive Feedback Mechanisms for Speech Recognizers,” Ph.D. dissertation, Pittsburgh, PA: Carnegie Mellon University, 1997. Conclusion • Toolkit is suitable for several human-machine interfaces • Can be used with a variety of different biosignals • Easily extendable due to two layer design • Comparable results to the Kaldi decoder • Error Analysis yields additional feedback