The Speech Speech

casey chesnut, brains-N-brawn.com. Madison .NET, April 2007.

Presentation Transcript


  1. The Speech Speech casey chesnut brains-N-brawn.com Madison .NET April 2007

  2. Powerpoint • Page Up • Page Down

  3. brains-N-brawn.com • Pervasive Computing • Tablet PC (MVP 03) • Compact Framework (MVP 04) • Advanced Web Services (MVP 05) • Media Center (MVP 06) • Speech • Location Based Services • Artificial Intelligence • 3D

  4. Outline • Speech Overview • Vista Speech Recognition • SAPI 5.3 / System.Speech • Speech Server 2007

  5. Outline : Speech Overview • Voice User Interface • How does it work? • Synthesis (TTS) • Recognition (SR)

  6. Overview • Speech is just another presentation system • Synthesis = Output to user • Recognition = User input • Voice User Interface (VUI)

  7. VUI Modes • Applications • Multi-modal • Voice-only

  8. VUI Tips • Don't replicate the touch-tone-based menu system • Restrict options on the main (opening) menu to 4 or fewer • Make sure your opening greeting is short • Don't design the app solely for the new user • Focus on task completion above all • What can I say? http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx

  9. Speech Synthesis • Text to Speech • Dynamic • Prompt database

  10. How Synthesis Works • Text parsing • Sentences, numbers, symbols, pauses • Natural language processing • Part of speech, tense • Phonemes are looked up or sounded out • Diphones are appended together • Post process audio to add emphasis • Play speech audio
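The stages on this slide can be sketched as a toy pipeline. This is an illustration only, not how SAPI or any real engine is implemented: the two-word lexicon, the phoneme symbols, and every function name below are made up for the example, and the audio post-processing and playback steps are left out.

```python
import re

# Hypothetical mini-lexicon: word -> phonemes (symbols are illustrative only).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "two": ["T", "UW"],
}
# Hypothetical digit expansion table used during text parsing.
NUMBERS = {"2": "two"}


def normalize(text):
    """Text parsing: lowercase, split into words and digits, expand known digits."""
    tokens = re.findall(r"[a-z]+|\d", text.lower())
    return [NUMBERS.get(token, token) for token in tokens]


def to_phonemes(word):
    """Look the word up; if it is unknown, 'sound it out' letter by letter."""
    return LEXICON.get(word, [ch.upper() for ch in word])


def to_diphones(phonemes):
    """Pair neighbouring phonemes, padding with silence ('_') at both ends."""
    padded = ["_"] + phonemes + ["_"]
    return [a + "-" + b for a, b in zip(padded, padded[1:])]


def synthesize(text):
    """Run the toy pipeline and return the diphone sequence to be played."""
    phonemes = [p for word in normalize(text) for p in to_phonemes(word)]
    return to_diphones(phonemes)
```

A real engine would map each diphone to a recorded audio unit and smooth the joins; here the diphone names stand in for that audio.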

  11. How Synthesis Works • Demo • /xnaSynth app • Article • http://www.brains-N-brawn.com/ttSpeech/ • http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)

  12. Speech Recognition • Speech to Text • Dictation • Command and Control

  13. How Recognition Works • Audio signal is processed • Look for signals which might be speech • Phonemes are found in audio signals • Phonemes are mapped to a dictionary of words • Dictation or grammar-based • Apply natural language processing
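Two of these steps are easy to show in miniature: finding the parts of the signal that might be speech, and mapping phonemes to words under a grammar. The sketch below is a toy under stated assumptions. It uses a plain energy threshold where a real recognizer uses acoustic models, and the frame size, threshold, dictionary, and function names are all hypothetical.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each full frame of the signal."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def speech_frames(samples, frame_size=4, threshold=0.01):
    """Indices of frames loud enough that they might contain speech."""
    energies = frame_energies(samples, frame_size)
    return [i for i, energy in enumerate(energies) if energy > threshold]


# Hypothetical pronunciation dictionary: phoneme sequence -> word.
DICTIONARY = {
    ("P", "L", "EY"): "play",
    ("S", "T", "AA", "P"): "stop",
}


def lookup(phonemes, grammar=("play", "stop")):
    """Map phonemes to a word, accepting it only if the grammar allows it."""
    word = DICTIONARY.get(tuple(phonemes))
    return word if word in grammar else None
```

Restricting the result to a grammar is the command-and-control case; dictation would instead search a much larger dictionary and lean on language modelling to pick between candidates.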

  14. How Recognition Works • Demo • /wavReader app • Article • http://www.brains-N-brawn.com/noReco/ • http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)

  15. Outline : Vista Speech Recognizer • Built-in to Vista’s shell • Microphone bar • Language support • Can be trained to improve accuracy • Command-and-control, also Dictation • Automagic application support • Horrible Office integration • UAC problems

  16. Demo • Say what you see • Show numbers • Correct • Spell it • Mouse grid http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/

  17. High Risk Demo

  18. Hack • http://news.bbc.co.uk/1/hi/technology/6320865.stm • /micBarExtend – tap and talk

  19. Narrator • Vista’s screen reader

  20. Outline : SAPI 5.3 / System.Speech • Desktop applications • SAPI 5.3 • System.Speech

  21. SAPI 5.3 • COM based • Native applications • Managed apps which need more control

  22. System.Speech • Part of .NET 3.0 WPF • Managed wrapper built on SAPI 5.3 • Simple API • Standards support (SSML, SRGS) • Language support • Vista Speech Recognition integration • Does not work in XBAP

  23. System.Speech.Synthesis • SpeechSynthesizer • SSML • PromptBuilder • Voices
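SSML is a W3C XML format, so a prompt can be assembled as markup before it is handed to a synthesizer. The sketch below builds a small SSML 1.0 document in Python. It does not use the .NET classes named on the slide; the function name, its parameters, and the sample text are assumptions for illustration.

```python
from xml.sax.saxutils import escape

# Namespace defined by the W3C SSML 1.0 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(before, emphasized, after, pause_ms=500, lang="en-US"):
    """Return an SSML document: plain text, an emphasized phrase, a pause, more text."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f"{escape(before)} "
        f"<emphasis>{escape(emphasized)}</emphasis>"
        f'<break time="{pause_ms}ms"/>'
        f"{escape(after)}"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Welcome to", "Madison .NET", "Questions & answers follow."))
```

Escaping the text parts keeps characters such as "&" from breaking the markup, which matters when prompts are built dynamically from user data.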

  24. System.Speech.Synthesis • Demo • /speechSamples - /speechSynth

  25. System.Speech.Recognition • SpeechRecognizer / SpeechRecognitionEngine • SRGS • GrammarBuilder • Advanced users • Deep-link functionality • Mixed initiative
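An SRGS grammar is an XML document listing what the recognizer should listen for. The sketch below generates a one-rule SRGS 1.0 grammar in Python and checks an utterance against it by simple string comparison. It is a stand-in for what a recognizer does, not a use of System.Speech; the function names and the sample phrases are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace defined by the W3C SRGS 1.0 recommendation.
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_srgs(rule_id, phrases, lang="en-US"):
    """Return an SRGS XML grammar whose root rule is a list of alternatives."""
    items = "".join(f"<item>{escape(p)}</item>" for p in phrases)
    return (
        f'<grammar version="1.0" xmlns="{SRGS_NS}" xml:lang="{lang}" root="{rule_id}">'
        f'<rule id="{rule_id}"><one-of>{items}</one-of></rule>'
        "</grammar>"
    )


def phrases_in(grammar_xml):
    """Read the alternatives back out of the grammar."""
    root = ET.fromstring(grammar_xml)
    return [item.text for item in root.iter(f"{{{SRGS_NS}}}item")]


def accepts(grammar_xml, utterance):
    """Toy check: does the already-transcribed utterance match an alternative?"""
    wanted = utterance.strip().lower()
    return wanted in (p.lower() for p in phrases_in(grammar_xml))
```

For example, `build_srgs("command", ["page up", "page down"])` describes the two commands used to drive this slide deck by voice.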

  26. System.Speech.Recognition • Demo • /speechSamples - /speechReco

  27. System.Speech • Demo • /micBarExtend • /mceSapiMcpl • Article • http://www.brains-N-brawn.com/speechSamples/ • http://www.brains-N-brawn.com/micBarExtend/ • http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)

  28. What about Mobile Devices • OEMs can add VoiceCommand • VoiceCommand is not accessible to developers • WindowsMobile has the SAPI API, but no engines • PlatformBuilder is supposed to have engines • There are 3rd party engines for purchase

  29. Outline : Speech Server 2007

  30. Speech Server 2007 • Telephony Applications • Outgoing calls • Speaker Independent

  31. Speech Server 2007 • VOIP • Language support • VoiceXML / SALT • Workflow development model • Reports • Still in beta

  32. Speech Server 2007 • Speech Synthesis • Inline • PromptBuilder • SSML • Prompt databases • Speech Recognition • Inline • Dynamic Grammar • SRGS • Conversational Grammar Builder • DTMF
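DTMF is the touch-tone input on this list: each key is sent as the sum of one low-frequency and one high-frequency sine wave. The sketch below uses the standard DTMF frequency grid; the function names, the 8 kHz sample rate, and the tone duration are assumptions for illustration and are not taken from Speech Server.

```python
import math

# Standard DTMF grid: one row (low) and one column (high) frequency per key, in Hz.
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c != -1:
                return ROWS[r], COLS[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration=0.1, rate=8000):
    """Generate the tone for a key as floats in the range [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / rate)
               + math.sin(2 * math.pi * high * n / rate))
        for n in range(int(duration * rate))
    ]
```

A telephony application receives these keys already decoded, so code like this is mainly useful for generating test input.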

  33. VoiceXML • Declarative language • Article • http://www.brains-N-brawn.com/vxml/ • http://www.brains-N-brawn.com/myVoices/ • http://www.brains-N-brawn.com/voiceBio/
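"Declarative" here means the dialog is described as markup and a voice browser interprets it. The sketch below generates a minimal VoiceXML 2.0 menu in Python. The menu id, prompt, choices, and target anchors are invented for the example, and the four-choice limit is only a nod to the VUI tips earlier in the talk, not a VoiceXML rule.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C VoiceXML 2.0 recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu(menu_id, prompt, choices):
    """Return a VoiceXML document with one menu; choices maps phrase -> target URI."""
    if len(choices) > 4:
        # Self-imposed limit borrowed from the VUI tips, not required by VoiceXML.
        raise ValueError("keep the opening menu to 4 options or fewer")
    choice_xml = "".join(
        f"<choice next={quoteattr(target)}>{escape(phrase)}</choice>"
        for phrase, target in choices.items()
    )
    return (
        f'<vxml version="2.0" xmlns="{VXML_NS}">'
        f"<menu id={quoteattr(menu_id)}>"
        f"<prompt>{escape(prompt)}</prompt>"
        f"{choice_xml}"
        "</menu>"
        "</vxml>"
    )


if __name__ == "__main__":
    print(build_menu("main", "Say synthesis or recognition.",
                     {"synthesis": "#tts", "recognition": "#sr"}))
```

The application logic lives in the document itself: each `choice` names the phrase to listen for and where to go next, with no imperative call-handling code.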

  34. SALT • Yet another declarative language • Multimodal support has been dropped • Article • http://www.brains-N-brawn.com/noHands/ • http://www.brains-N-brawn.com/speechMulti/ • http://www.brains-N-brawn.com/tabletWeb/ • http://www.brains-N-brawn.com/mceSalt/

  35. Speech Workflow • Speech Sequence Workflow designer • Speech activities • Statement • QuestionAnswer • Debugging tools

  36. Speech Workflow • Demo • /speechTextAdv • /speakerVerify • /mobileRecord • Article • http://www.brains-N-brawn.com/speechTextAdv/ • http://www.brains-N-brawn.com/speakerVerify/

  37. Where • Accessibility • Telephony • Telematics • Home automation • Mobile Devices / Tablets • Gaming • Warehouses • …

  38. Possible Future • Telematics • Service Pack for Office Support • Exchange Server 2007 • Speech Server 2007 release • Rumors that WindowsMobile will get a public API • Dictation has room to improve • Hope that System.Speech will ultimately work in XBAP

  39. Questions
