
Speech-to-Speech Infrastructure Based on UIMA

Speech-to-Speech Infrastructure Based on UIMA. Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners), Manager, Conversational Interactions and Architectures, IBM Prague. Overview: Challenges, Approach, The Resulting Infrastructure, Use Cases, Conclusion.


Presentation Transcript


  1. Speech-to-Speech Infrastructure Based on UIMA • Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners) • Manager, Conversational Interactions and Architectures, IBM Prague

  2. Overview • Challenges • Approach • The Resulting Infrastructure • Use Cases • Conclusion

  3. What is a speech-to-speech system? • An S2S system translates spoken input from a source language into a target language • Speech-to-speech systems typically consist of three main processing blocks: • Transcription (ASR) • Translation (MT) • Synthesis (TTS)

  4. Challenges • TC_STAR Project, 2004-2007, www.tc-star.org • Create an open technological infrastructure to support effective delivery of scientific results from the speech-to-speech research community • Online distributed speech-to-speech infrastructure for automatic performance evaluation of end-to-end systems as well as individual components • Open technological framework based on the open-source Unstructured Information Management Architecture (UIMA)

  5. Key Challenge: Support Online System Combinations and Automatic Evaluations • [Diagram: partner sites RWTH, IBM, ELDA, LIMSI, UKA, ITC-Irst, and UPC arranged around a central "?"]

  6. Approach: Pick an infrastructure which… (the matching element of the UIMA Component Model is given in parentheses) • …specifies a common data format understood by all speech-to-speech components (Common MUMA Type System) • …has well-defined APIs that let the engines pass data in and read it out (initialize(), process(), destroy(), …) • …transparently takes care of network and local connectivity options (Java/C++/… local calls or SOAP and Vinci) • …requires only minimal coding to plug proprietary engines into the infrastructure (concept of UIMA Annotators)

  7. Unstructured Information Management Architecture (UIMA) • [Diagram: unstructured information (inefficient search) → analysis bridge → structured information (efficient search)] • What is UIMA? • In business terms: the analysis bridge between unstructured and structured information • In technical terms: an infrastructure for integrating, processing, and managing all kinds of data-driven engines, including support for monitoring • Key features • UIMA is an emerging standard for text and media processing • The UIMA SDK is open source under the Apache license • The UIMA infrastructure supports interoperability between platforms, with component interfacing via Java, C++, Python, Perl, and remote/networked services • Offers simple XML-based integration with the UIMA APIs • Distributed data exchange that supports complex data structures

  8. How to make components UIMA-pluggable? • Step 1: Implement the required Annotator interface, i.e. the initialize() and process() methods • Step 2: Specify the Component Descriptor XML file for configuration and lifecycle • Step 3: Define the input and output data structures in the Type System • [Diagram: a CAS (data plus metadata) enters and leaves the UIMA Annotator, which consists of wrapper code and a component descriptor around the proprietary engine]
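
The three steps on this slide can be illustrated with a small, self-contained sketch. It does not use the real UIMA API: `Cas`, `ProprietaryEngine`, and `TranslationAnnotator` are hypothetical stand-ins, and the configuration dictionary only plays the role that the component descriptor XML would play. The point is the shape of the wrapper: a lifecycle of initialize/process/destroy around an engine that knows nothing about the infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Hypothetical stand-in for a UIMA CAS: the data plus its metadata."""
    data: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


class ProprietaryEngine:
    """Placeholder for a partner's existing engine (here: a toy word-by-word translator)."""

    def __init__(self, lexicon):
        self.lexicon = lexicon

    def translate(self, text):
        # Unknown words are passed through unchanged.
        return " ".join(self.lexicon.get(word, word) for word in text.split())


class TranslationAnnotator:
    """Wrapper code exposing the engine through an annotator-style lifecycle."""

    def initialize(self, config):
        # Step 2 analogue: in UIMA this configuration would come from the descriptor.
        self.engine = ProprietaryEngine(config["lexicon"])

    def process(self, cas):
        # Step 3 analogue: read the declared input type, write the declared output type.
        cas.data["target_text"] = self.engine.translate(cas.data["source_text"])
        cas.metadata["translated_by"] = type(self).__name__

    def destroy(self):
        # Release the engine when the component is shut down.
        self.engine = None


annotator = TranslationAnnotator()
annotator.initialize({"lexicon": {"hello": "hola", "world": "mundo"}})
cas = Cas(data={"source_text": "hello world"})
annotator.process(cas)
annotator.destroy()
print(cas.data["target_text"])
```

In this sketch the engine itself is untouched; only the thin wrapper knows about the CAS, which is the "minimum coding" property the approach slide asks for.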

  9. TC_STAR Speech-to-Speech Evaluation Infrastructure • [Diagram: a Collection Processing Engine passes a CAS through the ASR, SLT, TTS, and Evaluation engines, each attached through wrapper code behind the Annotator API; the CAS starts with pcm and successively gains source text, target text, target audio, and evaluation; engines are reached through the Vinci Name Service; evaluation data input is uploaded, and evaluation data results and evaluation reports are downloaded, over http]
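
The way the CAS grows as it moves through the chain can be sketched as follows. This is an illustration under stated assumptions, not the TC_STAR code: the CAS is modelled as a plain dictionary, the four engines are local stub functions rather than remote annotators, and the `reference` field and exact-match metric are invented here so that the evaluation step has something to compare against.

```python
def asr(cas):
    # Stub recognizer: pretends the PCM bytes decode to a fixed transcript.
    cas["source_text"] = "hello"


def slt(cas):
    # Stub spoken language translation using a one-entry lexicon.
    lexicon = {"hello": "hola"}
    cas["target_text"] = lexicon[cas["source_text"]]


def tts(cas):
    # Stub synthesizer: the "audio" is simply the encoded target text.
    cas["target_audio"] = cas["target_text"].encode("utf-8")


def evaluate(cas):
    # Stub metric (assumption): exact match against a reference translation.
    cas["evaluation"] = {"exact_match": cas["target_text"] == cas["reference"]}


def run_pipeline(cas, engines):
    """Pass one CAS through each engine in order and record which fields exist after each step."""
    trace = []
    for engine in engines:
        engine(cas)
        trace.append(sorted(cas))
    return trace


cas = {"pcm": b"\x00\x01", "reference": "hola"}
trace = run_pipeline(cas, [asr, slt, tts, evaluate])
for fields in trace:
    print(fields)
```

Each printed line contains one more field than the previous one, mirroring the growing CAS in the diagram.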

  10. TC_STAR Speech-2-Speech pan-European deployment • [Diagram: annotators (UIMA/other) for ASR, SLT, TTS, Punctuator, and ASR ROVER hosted across RWTH, IBM, ELDA, LIMSI, UKA, ITC-Irst, and UPC, together with a data Web server (upload/download), a control Web server, the Eval component, the CPE, and a Vinci name server] • Profile 1: ASR->SLT->TTS->EVAL (with ASR ROVER) • Profile 2: ASR->SLT->TTS->EVAL in a different setup
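
One possible way to think about the two profiles is as ordered lists of stages, where a stage may fan out to several annotators whose outputs are then combined. The sketch below is purely illustrative: the slide does not show how profiles were actually stored, and the annotator names (`asr@site-a` and so on) are hypothetical placeholders, since the transcript does not say which site hosted which engine in each profile.

```python
PROFILES = {
    # Profile 1: several ASR annotators combined by ROVER, then a linear chain.
    "profile-1": [
        {"stage": "ASR", "annotators": ["asr@site-a", "asr@site-b", "asr@site-c"], "combine": "ROVER"},
        {"stage": "SLT", "annotators": ["slt@site-d"]},
        {"stage": "TTS", "annotators": ["tts@site-e"]},
        {"stage": "EVAL", "annotators": ["eval@site-f"]},
    ],
    # Profile 2: the same chain in a different setup (one annotator per stage).
    "profile-2": [
        {"stage": "ASR", "annotators": ["asr@site-b"]},
        {"stage": "SLT", "annotators": ["slt@site-g"]},
        {"stage": "TTS", "annotators": ["tts@site-a"]},
        {"stage": "EVAL", "annotators": ["eval@site-f"]},
    ],
}


def validate(profile):
    """Return a list of problems; a stage with several annotators needs a combiner."""
    problems = []
    for stage in profile:
        if not stage["annotators"]:
            problems.append(f"{stage['stage']}: no annotator assigned")
        if len(stage["annotators"]) > 1 and "combine" not in stage:
            problems.append(f"{stage['stage']}: multiple annotators but no combiner")
    return problems


def describe(profile):
    """Render a profile in the arrow notation used on the slide."""
    labels = []
    for stage in profile:
        label = stage["stage"]
        if "combine" in stage:
            label += f"[{stage['combine']} x{len(stage['annotators'])}]"
        labels.append(label)
    return "->".join(labels)


for name, profile in PROFILES.items():
    print(name, describe(profile), validate(profile))
```

Keeping the profile as data rather than code is what would let a web console switch between setups without redeploying any engine.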

  11. UIMA Web Control Console • Current user and status • Annotator combination in use for the experiment • Experiment ID and the set of input data • Distributed logging and monitoring • AJAX infrastructure • Links to graphical speech-to-speech evaluation results

  12. UIMA Web Control Console • Processing engine • Indication of the active engine • Path of completed processing • Engine where the data are currently being processed

  13. Lessons learned… • Pain in placing machines on public IPs: firewall configuration for all participating machines, local IT people ;-) • Need to support a variety of Linux distributions to host UIMA …: partially eliminated by the UIMA school development warm-up • Variety of programming languages for writing Annotators: Java, C++, Perl, Python, … • Broad requirements on the Common Type System: punctuation, casing, lattices • Support for individual secure data download/upload on the data server: authentication, HTTPS, firewall rules • Web console for controlling the evaluation lifecycle: concept of profiles, experiment IDs, monitoring • Remote logging and debugging: distributed logging capabilities, logging to the Web console • Reliability of components and networks

  14. Speech-to-Speech Showcases • UIMA S2S Evaluation Web Portal • The video demonstrates how S2S portal users (e.g. S2S researchers) set up, test, and evaluate speech-to-speech chains consisting of individual text and media processing components such as ASR, machine translation, TTS, etc. These components, called Annotators in UIMA jargon, are exported as Web services on the public Internet and glued together by UIMA. More than 15 annotators are currently exported by IBM and EU institutes and universities. • http://www.tc-star.org/Demo/ibm/web_console_batch.swf • UIMA S2S Translation Video Console • The individual Web service components can be assembled online into remote services that provide direct value to citizens. We show a video console that translates from English to Spanish (EU parliamentary domain). Note that the three Web services involved (ASR, MT, TTS) are hosted by three different sites hundreds of kilometers apart and glued together by UIMA. • http://www.tc-star.org/Demo/ibm/video_console_near_real_time.swf

  15. Conclusion • First-of-a-kind online multi-partner speech-to-speech system demonstrated on UIMA (Jun 06-May 07) • Remote speech-to-speech components are dynamically combined via the UIMA infrastructure to support different combinations, e.g. ROVER • Annotators are hosted on public IPs at the partners' sites • The framework is controlled via the UIMA Web AJAX infrastructure • The open infrastructure is used to automatically set up and evaluate individual components as well as end-to-end systems • Designed to support various use cases, from research experiments to technology showcasing
