
Speech Service Creation

Speech Service Creation. NY / NJ Chapter December, 2006. An Overview of Speech Service Creation Tools. K. W. (Bill) Scholz. Agenda. Speech Applications – where we were and where we are Building speech applications today Methodologies and Tools Reusable components & packaged applications


Presentation Transcript


  1. Speech Service Creation. NY / NJ Chapter, December 2006. An Overview of Speech Service Creation Tools. K. W. (Bill) Scholz

  2. Agenda
     • Speech Applications – where we were and where we are
     • Building speech applications today
       • Methodologies and Tools
       • Reusable components & packaged applications
     • Summary of today’s leading VUI creation tools
       • Highlight / compare / contrast industry’s leading tools

  3. What’s it take to build a speech app?
     • Requirements, Use Cases, Project Plan
     • Dialog Design & Test
     • Call flow, Implementation, & Test
     • Prompts, Grammars, & Test
     • Data / Back-end Integration, & Test
     • Unit Test, Integration Test, System Test
     • Pilot, Limited Deployment, Analysis
     • Full Deployment, Analysis

  4. Where We’ve Come From: Building Speech Apps
     • Development toolkits designed for building DTMF applications were extended to support speech
     • Call flows had the sound-and-feel of DTMF apps
     • Grammars were constructed by hand
     • Back-end integration coded by hand, often targeting closed-architecture information stores
       • Screen scraping – ‘row 12, column 37, 9 characters’
       • Proprietary closed databases
     • Separate natural language processors driven by recognizer output required separate ‘NL’ grammars
     • Poor TTS quality generated need for recorded prompts
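The screen-scraping bullet above describes reading a back-end field by its fixed position on a terminal screen. A minimal sketch of that idea, assuming a hypothetical 24 x 80 screen buffer held as a list of strings and 1-based row and column numbering (the slides specify neither); the function name and sample value are illustrative only:

```python
def scrape_field(screen, row, col, length):
    """Return `length` characters starting at 1-based (row, col), stripped."""
    # Pad the line so a short row cannot cause a truncated slice.
    line = screen[row - 1].ljust(col - 1 + length)
    return line[col - 1:col - 1 + length].strip()


# Hypothetical 24 x 80 terminal screen, blank except for one field.
screen = [" " * 80 for _ in range(24)]
balance = "1234.56"
# Place the value at row 12, column 37, in a 9-character field.
screen[11] = screen[11][:36] + balance.ljust(9) + screen[11][45:]

field = scrape_field(screen, row=12, col=37, length=9)
```

The brittleness is visible in the call itself: if the host application moves the field by one column, the integration silently returns the wrong text.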

  5. Where We Are: Building speech apps today
     • Methodologies and Tools
       • Methodology: problem statement, use cases, dialog design, project management
       • Data / Back-end integration
     • Reusable components
       • OpenSpeech Dialog Modules
       • Reusable Dialog Components
     • Packaged applications
     • Testing & Analytics

  6. Current Practice
     • Most applications use state-based dialogs
       • Easiest to design, debug and test for current simple applications
       • Natural fit with the directed dialogs that are easiest for novice users
       • Speech recognizer grammars are simpler to construct and therefore less error prone
     • As developers and users become exposed to more sophisticated dialog approaches, they will become less satisfied with state-based dialogs
       • Goal-directed
       • Conversational
       • Rule-based
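To make "state-based dialog" concrete, here is a minimal sketch of a directed dialog driven by a state table. The state names, prompts, retry limit, and the `"*"` wildcard are all assumptions made for illustration; they do not come from any of the tools discussed in this deck.

```python
# Each state has a prompt and a table mapping recognized input to the next state.
# "*" is a hypothetical wildcard meaning "accept any input".
DIALOG = {
    "main_menu": {
        "prompt": "Say balance or transfer.",
        "next": {"balance": "say_balance", "transfer": "ask_amount"},
    },
    "ask_amount": {
        "prompt": "How much would you like to transfer?",
        "next": {"*": "confirm"},
    },
    "say_balance": {"prompt": "Here is your balance.", "next": {}},
    "confirm": {"prompt": "Transfer requested.", "next": {}},
}


def run_dialog(dialog, start, inputs, max_retries=2):
    """Walk the state table, consuming one simulated recognition result
    per turn, and return the list of states visited."""
    state, visited, retries = start, [start], 0
    results = iter(inputs)
    while dialog[state]["next"]:          # terminal states have no exits
        heard = next(results, None)
        if heard is None:
            break                         # no more input (caller hung up)
        table = dialog[state]["next"]
        target = table.get(heard, table.get("*"))
        if target is None:                # out-of-grammar: re-prompt
            retries += 1
            if retries > max_retries:
                break
            continue
        state, retries = target, 0
        visited.append(state)
    return visited


path = run_dialog(DIALOG, "main_menu", ["weather", "transfer", "fifty dollars"])
```

Because each state accepts only a small set of inputs, the grammar active at any moment stays small, which is the property the slide credits for simpler and less error-prone grammars.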

  7. Tools for Building Speech Applications
     • Dialog design, evaluation, call flow development, back-end integration, prototype, deployment, tuning, life cycle support.
     • Vendors
       • Active:
         • Audium: the ‘Audium Builder’
         • DBscape Vocabase
         • Fluency: ‘Voice Runner’
         • OpenMethods: ‘OpenVXML’
         • TuVox: ‘CVR’ (‘Producer’ + management & analytics)
         • Vicorp: ‘xMP’
         • VoiceObjects: ‘VoiceObjects X6’
       • Inactive:
         • Unisys: the ‘NL Speech Assistant’
         • Unveil: ‘Conversation Manager’
         • Vocalocity: ‘AppCenter’
       • Support:
         • Eclipse – Back-end integration
         • Microsoft: ‘Visio’ for call flow representation
         • Nuance: OSI – Tuning
     • And others……: Avaya Dialog Designer, IBM WebSphere, Intervoice InVision, Microsoft Speech .NET, NetByTel (TuVox), Nortel MPS Developer (was PeriProducer), Nuance OSD, Orange Nextfire OAVS

  8. SCE Tools: what to look for
     • Manipulable element – what the SCE assembles
     • Element detailing – how each is tailored for use
     • Business rule / back-end integration
     • Architectural model – underlying design pattern
     • Life cycle support – pre- and post-deployment management and testing

  9. Visio to Represent Dialog Call flow (Source: Unisys ‘FFA’ design specification)

  10. Audium (Purchased by Cisco)
     • Audium Builder: a GUI that permits users to create and manage multiple applications
     • Visual elements include functions for managing databases, menus, dates and times, or phone transfers, as well as credit card or email processing.
     • Application creation is done by dragging elements to the workspace to construct the call flow
     • As elements are added, their properties can be configured to load pre-recorded audio or TTS prompts, and configured to play naturally to callers.
     • Elements are interconnected using the GUI to assign ‘exit states’ to reach an end goal.
     Source: Joe Oh, Audium (private communication)

  11. Audium (screenshot; callouts: Application treeview, Tools, Object properties)

  12. DBscape Vocabase The VocaBase “Dialog Map” represents the sequence of modules, sub-modules, and steps. Clicking on any element gives access to its detailed configuration.

  13. Fluency ‘Voice Runner’
     Key features of this tool are:
     • Visual component assembly
     • Integrated component assembly analysis & testing
     • One click assembly deployment
     • Library of process and rule components:
       • Address Collection
       • Credit Card Verification

  14. Vicorp xMP

  15. VoiceObjects 6 Desktop
     • Tree structure to represent dialog design
     • Point-and-click authoring
     • Layering includes system layers and user-built layers
     • Single click packages an application for deployment
     • Back-end integration: ‘connectors’ support both server-side scripting and J2EE code execution
     • Uses object-oriented concepts
     Source: http://www.voiceobjects.com/

  16. VoiceObjects Desktop – At a glance (screenshot; callouts: List of all available VoiceObjects, Individual editor for voice object; panels: Components, Resources, Logic, Actions) Source: Tiemo Winterkamp, VoiceObjects (private communication)

  17. VoiceObjects Desktop - Control Center Source: Tiemo Winterkamp, VoiceObjects (private communication)

  18. Microsoft Speech (Visual Studio)

  19. Unisys ‘NLSA’

  20. NLSA Grammar Specification

  21. Vocalocity AppCenter Source: Ken Rehor - 2005

  22. OpenVXML – Open Source SCE

  23. Back-end Integration
     • Java, JSP, C#
     • Scripting languages
       • Perl
       • JSP / ASP
       • PHP
       • …
     • Databases
       • Oracle
       • Microsoft SQL Server
       • MySQL / PostgreSQL
     • Web Services
     • AJAX (Asynchronous JavaScript and XML)
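As an illustration of the database item above, the following sketch shows what a dialog step's back-end lookup might look like. Python's built-in `sqlite3` stands in for the databases listed on the slide, and the `accounts` table, its columns, and the `lookup_balance` helper are hypothetical. The query is parameterized because a recognized account number is still untrusted input.

```python
import sqlite3


def lookup_balance(conn, account_id):
    """Return the balance for an account, or None if the account is unknown."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE account_id = ?", (account_id,)
    ).fetchone()
    return row[0] if row else None


# In-memory database used only as a stand-in for a production back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (?, ?)", ("12345", 250.0))

known = lookup_balance(conn, "12345")
unknown = lookup_balance(conn, "99999")
```

Returning `None` for an unknown account lets the dialog branch to a re-prompt or an agent transfer instead of failing inside the call flow.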

  24. Eclipse

  25. Testing
     • Unit – emulation
     • Callflow – WoZ or live
     • Usability – WoZ or live
     • Post deployment analytics
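The "Unit – emulation" item can be sketched as follows: the dialog logic receives its recognizer as a parameter, so a test can substitute a scripted emulator for live speech. Everything here (`collect_pin`, `ScriptedRecognizer`, the prompt text, the three-attempt limit) is a hypothetical illustration, not the test facility of any tool named in this deck.

```python
def collect_pin(ask, max_attempts=3):
    """Ask for a 4-digit PIN; return it, or None after max_attempts failures.

    `ask` is any callable that takes a prompt string and returns the
    recognized text, or None on no-input / no-match.
    """
    for _ in range(max_attempts):
        heard = ask("Please say your four digit PIN.")
        if heard is not None and heard.isdigit() and len(heard) == 4:
            return heard
    return None


class ScriptedRecognizer:
    """Emulates a recognizer by replaying canned results and logging prompts."""

    def __init__(self, results):
        self._results = list(results)
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        return self._results.pop(0) if self._results else None


# Script: silence, then a too-short utterance, then a valid PIN.
emulator = ScriptedRecognizer([None, "12", "4321"])
pin = collect_pin(emulator)
```

The emulator also records every prompt played, so a test can check the number of re-prompts as well as the final result.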

  26. Modules and packaged applications
     • Modules: components and templates
       • Component: a piece of software that can be combined with other pieces to construct a program
       • Template: a pattern used to replicate objects
     • Application: a software program designed to perform a specific set of functions
     Source: Steve Erlich, Apptera (private communication)

  27. SCE Analysis and Evaluation
     • Manipulable element – what the SCE assembles
       • Dialog state
       • Object module
       • Conversation step
     • Element detailing
       • Properties and values
       • Element attributes
       • Prompt and grammar management
     • Business rule / back-end integration
       • Built-in primitives
       • Integration with Java, Web Services, Databases
     • Architectural model
       • OO? FSM? SOA? MVC? Design patterns?
       • Visible dialog metalanguage?
     • Life cycle: Deployment and post-deployment support
       • Reuse: create, package, and integrate reusable components
       • Test capability; test script generation; WoZ capability
       • Analytics

  28. Audium
     • Application Development assets
       • GUI is implemented using Eclipse, with a Visio-like view
       • Inline grammars can be generated directly by the Studio
       • Centralized prompt management capability; recording scripts generated
       • OSDM integration supported (but RDCs are not)
       • XML dialog meta-language documented and the DTD provided
       • Multiple ‘Form’ elements can be combined to generate mixed-initiative dialog
       • Multi-user collaboration is well supported and demonstrated at customer sites
     • Runtime assets
       • Applications published as XML; interpreted by a Java runtime engine
       • SNMP queries are generated
     • Liabilities
       • Layering is not distinct – common database and external component references
       • No 3rd party application support
       • No automatic test script generation
       • No dedicated form for mixed initiative
       • No runtime cluster or server management
       • No speaker verification or video service generation capability
       • Elements oriented towards programmers, not towards VUI designers

  29. Vicorp
     • Application Development assets
       • Explicit separation of presentation layer from business objects layer
       • Visio-like presentation of application call flow
       • Inline grammars with confidence levels generated from item lists
       • Prompt categories facilitate multiple persona and language management
       • Invokes 3rd party applications by URI with arguments
       • Directed dialog, mixed initiative, and sub dialogs are supported
     • Runtime assets
       • Applications published as EAR files for execution on J2EE application server
       • Service Management Console provided to manage server clusters
     • Liabilities
       • No support for the generation of SSML for TTS
       • Internal XML dialog meta-language not exposed for use
       • No automatic testing of applications; no post-deployment analytics
       • No support for multi-user management or collaboration
       • Speaker verification and video service generation not shown
       • It is not possible to open multiple projects simultaneously and cut and paste between them
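The Vicorp slide mentions inline grammars generated from item lists. The slides do not say what grammar format the tool emits, so the sketch below assumes W3C SRGS XML as the target and leaves out confidence levels; `grammar_from_items` and the `choice` rule name are hypothetical.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def grammar_from_items(items, rule_id="choice", lang="en-US"):
    """Build an SRGS XML grammar whose root rule matches any one of `items`."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "mode": "voice",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for text in items:
        # One <item> alternative per list entry; ElementTree escapes the text.
        ET.SubElement(one_of, "item").text = text
    return ET.tostring(grammar, encoding="unicode")


xml_text = grammar_from_items(["checking", "savings", "credit card"])
```

Generating the grammar from the same list that drives the menu keeps the two in sync, which is presumably the appeal of this feature over hand-written grammars.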

  30. VoiceObjects
     • Application Development assets
       • Layering facilitates runtime prompt and persona remapping
       • Java extensions easily integrated as external resources
       • OSDM integration supported
       • Invokes 3rd party applications by URI with arguments
       • XML dialog meta-language documented, DTD provided
       • Recording script generation by DB query
       • Multi-user collaboration supported: user logons with specific privileges
     • Runtime assets
       • Single runtime engine accesses all applications as data
       • Runtime data collection through ‘InfoStore’ and a mature Analytics package
       • Extensive server cluster management, including SNMP
       • Support for multi-tenancy: separate JVMs launched for each tenant
     • Liabilities
       • Reusable Dialog Components are not supported
       • No explicit prompt management
       • Eclipse integration is incomplete
       • Confidence values not supported
       • No generation of SSML or recording scripts
       • No built-in application testing capability or test script generation capability
       • Natural language apps only supported by reference to external SLMs
       • External resources such as Java jar files are not managed by app dev environment

  31. Conclusion
     • Building speech applications today….. …..a bit like a marriage! Something old, something new, something borrowed, .....
     • Diagram labels: Dialog modules, Packaged apps; VUI built with tools; ASR and TTS subsystems; Supported by Multiple Leading Vendors

  32. Summary
     • Overview of speech application creation process
     • Building speech applications today
       • Methodologies and Tools
       • Reusable components
       • Packaged applications
     • Where the field is going
       • Dialog description languages and tools: MI, Personalization, automatic call flow generation
       • SLMs, ASR & TTS improvements, Rule-Based and Case-Based Reasoning

  33. Thank You. K. W. (Bill) Scholz, Ph.D. Home: +1 610.989.0989 Mobile: +1 610.212.8016 bill.scholz@comcast.net
