MRCPv2 Update and Open Issues for SIP/Enrollment Functions

MRCPv2
Sarvi Shanmugham, Cisco Systems Inc.

Status
• The draft is at version 00.
• Edits from the last meeting have been added.
• Awaiting the addition of Speaker Identification/Speaker Verification (SI/SV) functionality.
• A proposal covering SI, SV, Enrollment, Hotword, and Recording is currently out as a draft for MRCPv1.
• It will soon be integrated into MRCPv2.

Open Issues
• Proxy support
  • Need to add call flows describing how MRCP proxies would work.
  • Need to allow an MRCP server or proxy to issue a SIP re-INVITE and redirect the media as necessary.
  • Starting/stopping media from the client to the server for a RECOGNIZE method.
• Recording support
  • Need support for a separate Recording resource.
  • Does this recording resource relate to the Recognizer's capability to record utterances?
  • Does it relate to the Speaker Verification engine's capability to record/buffer utterances?
  • Does the Recognizer's utterance-recording capability relate to the Speaker Verification need to record utterances or buffer speech?
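Proxy support and RECOGNIZE handling both involve relaying and framing MRCP requests. As a minimal sketch, assuming the request-line layout later standardized for MRCPv2 in RFC 6787 (`MRCP/2.0 message-length method-name request-id`), the Python below frames a RECOGNIZE request. The subtle point is that message-length counts every octet of the message, including the start line that contains it, so the value has to be found by iteration. The version-00 draft discussed here may frame messages differently; `build_request`, the channel identifier, and the grammar URI are hypothetical.

```python
def build_request(method: str, request_id: int,
                  headers: dict[str, str], body: bytes = b"") -> bytes:
    """Frame an MRCPv2-style request (hypothetical helper).

    Assumes the RFC 6787 start line "MRCP/2.0 <message-length> <method> <request-id>",
    where message-length counts every octet, including the start line itself.
    """
    if body:
        # Content-Length covers only the body.
        headers = {**headers, "Content-Length": str(len(body))}
    header_block = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    rest = (header_block + "\r\n").encode("ascii") + body  # blank line ends headers

    # The length field's own digits contribute to the length, so iterate
    # until the value written into the start line equals the real total.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


# Example: a RECOGNIZE request naming its grammar by URI (values are illustrative).
recognize = build_request(
    "RECOGNIZE", 1,
    {"Channel-Identifier": "32AECB23433801@speechrecog",
     "Content-Type": "text/uri-list"},
    b"http://example.com/grammar.grxml",
)
```

The loop terminates because each pass can only lengthen the start line, and it stops once the width of the length field no longer changes the total. A proxy that rewrites headers on a relayed request would need to reframe it the same way.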

Open Issues
• Resource Types or Profiles
  • Need to support separate resource types for things like a DTMF recognizer, an audio player, and a "poor man's" TTS.
  • These may not require entirely different resource state machines or methods; they could be handled by the same methods as the Recognizer and Synthesizer resources and addressed through resource sub-types or profiles. Any suggestions?
• NLSML vs. EMMA
  • Proposal to add EMMA as a SHOULD-level requirement for MRCPv2.
  • We would continue to support NLSML for backward compatibility.
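One way to read the sub-type/profile suggestion: each profile names a base resource whose state machine and methods it reuses, optionally narrowing the accepted methods. The Python below is a minimal sketch of that idea, not anything the draft defines. The profile names, the `Profile` class, and the choice to drop DEFINE-LEXICON from the audio player are hypothetical; the method names are a subset of those in the final MRCPv2 specification (RFC 6787).

```python
from dataclasses import dataclass
from enum import Enum


class BaseResource(Enum):
    RECOGNIZER = "recognizer"
    SYNTHESIZER = "synthesizer"


# A subset of the recognizer and synthesizer method names from RFC 6787.
BASE_METHODS: dict[BaseResource, frozenset[str]] = {
    BaseResource.RECOGNIZER: frozenset({
        "SET-PARAMS", "GET-PARAMS", "DEFINE-GRAMMAR", "RECOGNIZE",
        "GET-RESULT", "START-INPUT-TIMERS", "STOP"}),
    BaseResource.SYNTHESIZER: frozenset({
        "SET-PARAMS", "GET-PARAMS", "SPEAK", "STOP", "PAUSE", "RESUME",
        "BARGE-IN-OCCURRED", "CONTROL", "DEFINE-LEXICON"}),
}


@dataclass(frozen=True)
class Profile:
    """A hypothetical resource profile: reuse a base resource, optionally narrowed."""
    name: str
    base: BaseResource
    excluded: frozenset[str] = frozenset()

    def allowed_methods(self) -> frozenset[str]:
        return BASE_METHODS[self.base] - self.excluded


# Illustrative profiles; names and exclusions are assumptions, not from the draft.
PROFILES = {p.name: p for p in (
    Profile("dtmf-recognizer", BaseResource.RECOGNIZER),
    Profile("audio-player", BaseResource.SYNTHESIZER, frozenset({"DEFINE-LEXICON"})),
    Profile("poor-mans-tts", BaseResource.SYNTHESIZER),
)}


def accepts(profile_name: str, method: str) -> bool:
    """True if a channel allocated with this profile would accept the method."""
    return method in PROFILES[profile_name].allowed_methods()
```

In this sketch the state machine stays with the base resource, so adding a profile is a table entry rather than a new resource definition.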

Open Issues
• Multiple media sessions under a single MRCP session
  • Do we need support for this?
  • If Recognition, SI, SV, and Recording each need a separate media session, why keep them under a single MRCP session? Would that be a reason to use separate MRCP sessions?
• Do we need to support multiple active SPEAK requests, with the ability to switch between them by pausing and resuming each SPEAK request?
• Should the START-OF-SPEECH event be renamed to reflect that it applies to DTMF as well as speech? Should it be START-OF-INPUT?
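The multiple-SPEAK question can be made concrete with a small state model. The sketch below assumes, hypothetically, that a synthesizer channel may hold several SPEAK requests at once with at most one audible, and that switching is expressed as a PAUSE of the current request followed by a RESUME of the target. `MultiSpeakChannel` and its methods are illustrative names, not part of MRCP; a request that has not yet played is treated as paused at its start.

```python
from enum import Enum
from typing import Optional


class SpeakState(Enum):
    SPEAKING = "speaking"
    PAUSED = "paused"      # also used for requests that have not started yet
    COMPLETE = "complete"


class MultiSpeakChannel:
    """Hypothetical synthesizer channel: many SPEAK requests, at most one audible."""

    def __init__(self) -> None:
        self.requests: dict[int, SpeakState] = {}
        self.active: Optional[int] = None

    def speak(self, request_id: int) -> None:
        """Accept a new SPEAK; it plays at once only if nothing else is playing."""
        if request_id in self.requests:
            raise ValueError(f"duplicate request-id {request_id}")
        if self.active is None:
            self.requests[request_id] = SpeakState.SPEAKING
            self.active = request_id
        else:
            self.requests[request_id] = SpeakState.PAUSED

    def switch_to(self, request_id: int) -> list[tuple[str, int]]:
        """Pause the audible request (if any) and resume the target one.

        Returns the PAUSE/RESUME steps the switch implies, in order.
        """
        if self.requests.get(request_id) is not SpeakState.PAUSED:
            raise ValueError(f"request {request_id} is not paused")
        steps: list[tuple[str, int]] = []
        if self.active is not None:
            self.requests[self.active] = SpeakState.PAUSED
            steps.append(("PAUSE", self.active))
        self.requests[request_id] = SpeakState.SPEAKING
        self.active = request_id
        steps.append(("RESUME", request_id))
        return steps

    def complete(self, request_id: int) -> None:
        """Mark a request finished (e.g. on SPEAK-COMPLETE); frees the audio path."""
        self.requests[request_id] = SpeakState.COMPLETE
        if self.active == request_id:
            self.active = None
```

For example, after `speak(1)` and `speak(2)`, calling `switch_to(2)` pauses request 1 and resumes request 2, so the only new protocol surface this model would need is a way to name the target request in RESUME.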

Open Issues
• Content Management
  • If we add support for Recorders, we may need mechanisms to fetch the recorded audio.
  • Can we do this content management using HTTP instead of adding new methods to MRCPv2?
  • Do we need to add support for grammar management at the server level?
  • Do we need the capability to delete grammars within a session?
  • Do we need to support management of speech content?
• Is there a need for intermediate recognition results?
• Cookies?
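If recorded audio is fetched over HTTP rather than through new MRCPv2 methods, the server only has to hand back a URI for the recording. The Python below sketches that split under stated assumptions: the header value is shaped like the Record-URI header later standardized in RFC 6787 (`<uri>;size=...;duration=...`), the duration is assumed to be in milliseconds, and `parse_record_uri` and `fetch_recording` are hypothetical helpers. The fetch itself is a plain HTTP GET through the standard library.

```python
import re
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingRef:
    uri: str
    size: int      # octets
    duration: int  # assumed to be milliseconds


# Assumed header value shape: <uri>;size=<digits>;duration=<digits>
_RECORD_URI = re.compile(
    r"<(?P<uri>[^>]+)>;size=(?P<size>\d+);duration=(?P<duration>\d+)")


def parse_record_uri(value: str) -> RecordingRef:
    """Parse a Record-URI-style header value (hypothetical helper)."""
    m = _RECORD_URI.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognized Record-URI value: {value!r}")
    return RecordingRef(m["uri"], int(m["size"]), int(m["duration"]))


def fetch_recording(ref: RecordingRef, timeout: float = 10.0) -> bytes:
    """Download the recorded audio with a plain HTTP GET."""
    with urllib.request.urlopen(ref.uri, timeout=timeout) as resp:
        data = resp.read()
    if len(data) != ref.size:
        raise ValueError(f"expected {ref.size} octets, got {len(data)}")
    return data
```

With this split, MRCPv2 carries only the reference, and storage, retrieval, and caching of the audio are left to ordinary HTTP infrastructure.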
