
Problem Description and Motivation


connor

Presentation Transcript


  1. Problem Description and Motivation
     • Overall Goal: Produce a technology allowing natural-language vocal interactions between live and virtual entities.
     • Overall Motivation: Enable
       • Reduced staffing requirements, where appropriate.
       • Virtual team members.
       • Virtual trainers/coaches/advisors.

  2. 3 Phases of Automated Speech Processing
     MIC → Speech to Text → Natural Language Processing → Text to Speech → SPKR
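
The three phases can be read as a chain of functions from microphone input to speaker output. The sketch below is only an illustration under loud assumptions: the function names are hypothetical, the "audio" is just UTF-8 bytes, and each phase is a placeholder where a real engine would go.

```python
# Hypothetical stand-ins for the three phases; none of these names come
# from the presentation, and real engines would replace each body.

def speech_to_text(audio: bytes) -> str:
    # Placeholder STT: pretend the audio bytes already hold UTF-8 text.
    return audio.decode("utf-8")


def process_language(text: str) -> str:
    # Placeholder NLP: a canned reply keyed on a single word.
    if "status" in text.lower():
        return "All systems are nominal."
    return "Please repeat the request."


def text_to_speech(text: str) -> bytes:
    # Placeholder TTS: encode the reply instead of synthesizing audio.
    return text.encode("utf-8")


def run_pipeline(mic_input: bytes) -> bytes:
    # MIC -> Speech to Text -> Natural Language Processing -> Text to Speech -> SPKR
    text = speech_to_text(mic_input)
    reply = process_language(text)
    return text_to_speech(reply)


if __name__ == "__main__":
    print(run_pipeline(b"Report status"))
```

Keeping each phase behind its own function means any one of them could be swapped for a commercial or research component without touching the other two.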

  3. Speech-To-Text (STT) Processing
     • Purpose: Convert the spoken word to text.
     • Techniques:
       • Match signal (digitized) to dictionary of sounds and words.
       • Improve accuracy via syntactic analysis (not semantic).
       • Improve accuracy by tracking history of the speaker.
     • Challenges:
       • Differences in speech between persons/genders.
       • Differences in pronunciation given by the same person over time and in different situations.
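
One way to picture the second technique is rescoring: the acoustic matcher proposes several candidate transcriptions, and a model of plausible word order picks among them. The sketch below swaps in a toy bigram table as a simple stand-in for syntactic analysis. All scores, the `UNSEEN_LOGP` penalty, and the function names are invented for illustration and do not describe any particular recognizer.

```python
# Toy bigram log-scores (invented numbers). "<s>" marks the sentence start.
BIGRAM_LOGP = {
    ("<s>", "recognize"): -1.0,
    ("recognize", "speech"): -0.5,
    ("<s>", "wreck"): -3.0,
    ("wreck", "a"): -1.0,
    ("a", "nice"): -1.0,
    ("nice", "beach"): -1.5,
}
UNSEEN_LOGP = -5.0  # assumed penalty for word pairs missing from the table


def word_order_score(sentence: str) -> float:
    # Sum the bigram scores over consecutive word pairs.
    words = ["<s>"] + sentence.split()
    return sum(
        BIGRAM_LOGP.get(pair, UNSEEN_LOGP)
        for pair in zip(words, words[1:])
    )


def pick_transcription(candidates):
    # candidates: list of (sentence, acoustic_log_score) pairs.
    # Combine the acoustic score with the word-order score and keep the best.
    return max(candidates, key=lambda c: c[1] + word_order_score(c[0]))[0]


if __name__ == "__main__":
    hypotheses = [
        ("wreck a nice beach", -1.0),  # slightly better acoustic match
        ("recognize speech", -1.2),
    ]
    print(pick_transcription(hypotheses))
```

In this toy the acoustically weaker hypothesis wins because its word sequence is far more plausible, which is the effect the slide attributes to syntactic analysis.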

  4. Speech-To-Text (STT) Processing (cont)
     • Notes:
       • Quite a bit of research currently in this area:
         • Bell Labs
         • Carnegie Mellon University (SPHINX)
       • Commercial products available and used successfully:
         • ViaVoice
         • Dragon NaturallySpeaking
         • MS Speech API
       • Speaker-dependent systems achieve about 95% accuracy.
       • Speaker-independent systems may have poor accuracy and rely on limited vocabularies.
       • Future systems will probably be multi-modal (voice and gesture, voice and touch-screen).

  5. Text-To-Speech (TTS) Processing
     • Purpose: Convert text to the spoken word.
     • Techniques:
       • Match text to phonetic dictionary of sounds/words.
       • Incorporate emotional content through intonation, as suggested by punctuation and context.
       • Allow direct changes in pitch, volume, and speed to be embedded explicitly in the text using special symbols (XML).
     • Challenges:
       • Lack of models relating intonations to emotion or intent.
       • Technically difficult to reproduce natural sounds.
     • Notes:
       • Commercial product available; not heavily researched by IST.
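
The second and third techniques can be combined in a small sketch: use punctuation to choose an intonation, then embed it in the text as XML. The slide only says "XML"; the example below assumes SSML-style `<speak>` and `<prosody>` elements as one possible markup, and the function name and the punctuation-to-prosody mapping are hypothetical.

```python
from xml.sax.saxutils import escape


def to_prosody_markup(sentence: str) -> str:
    # Escape XML special characters so the text is safe inside markup.
    text = escape(sentence.strip())
    if text.endswith("?"):
        # Assumed mapping: questions get a raised pitch.
        attrs = 'pitch="high"'
    elif text.endswith("!"):
        # Assumed mapping: exclamations are louder and faster.
        attrs = 'volume="loud" rate="fast"'
    else:
        # Plain statements are left without prosody changes.
        return f"<speak>{text}</speak>"
    return f"<speak><prosody {attrs}>{text}</prosody></speak>"


if __name__ == "__main__":
    for line in ["Hold position.", "Are you ready?", "Take cover!"]:
        print(to_prosody_markup(line))
```

Punctuation alone is a weak cue, which matches the challenge noted on the slide: without a model relating intonation to emotion or intent, any such mapping is a guess.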

  6. Natural Language Processing (NLP)
     • Purpose (1): Extract meaning from text.
     • Purpose (2): Compose text conveying a specific meaning.
     • Techniques:
       • Parse sentences, often using Finite State Machine models or tree-like data structures.
       • Store meaning in a knowledge representation database, often a rule-based system or a relational database.
       • Compose sentences using parse trees and semi-random word selection.
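
The last technique, composing sentences from a parse tree with semi-random word selection, can be sketched with a tiny grammar. The grammar, vocabulary, and function names below are invented for illustration; a real system would choose words to convey a stored meaning rather than purely at random.

```python
import random

# Toy grammar: each symbol maps to a list of possible expansions.
# Symbols that are not keys are treated as words (terminals).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["trainer"], ["pilot"], ["operator"]],
    "V": [["briefs"], ["assists"], ["monitors"]],
}


def generate(symbol, rng):
    # Terminals are returned as-is.
    if symbol not in GRAMMAR:
        return [symbol]
    # Pick one expansion at random and expand each part recursively,
    # which walks a parse tree from the root down to the words.
    expansion = rng.choice(GRAMMAR[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part, rng))
    return words


def make_sentence(seed=None):
    # A seed makes the "semi-random" choice repeatable.
    rng = random.Random(seed)
    return " ".join(generate("S", rng)).capitalize() + "."


if __name__ == "__main__":
    for seed in range(3):
        print(make_sentence(seed))
```

Every output has the same Det-N-V-Det-N shape because the grammar has a single sentence rule; only the word choices vary.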

  7. Natural Language Processing Challenges
     • Parsing Sentences: Syntax and semantics are intertwined in natural languages.
     • Storing Meaning: Knowledge representation is still a difficult problem. The number of rules and relationships required to cover non-trivial domains is large.
     • Extracting Meaning from Text:
       • Word meanings change when context changes.
       • Idioms, metaphors, and similes provide challenges.
       • Emotional content colors meaning (e.g., sarcasm or humor).

  8. Natural Language Ambiguity
     • Lexical Ambiguity - one word, many meanings.
       • Stay away from the bank.
     • Structural Ambiguity - one sentence, different grammatical structures.
       • He saw that gasoline can explode.
     (Source: Winograd, "Computer Software for Working with Language")
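
Structural ambiguity can be made concrete by counting parse trees. The sketch below uses a CYK-style chart over a hand-written toy grammar that covers only the example sentence; the category names (`Vs`, `Vp`, `SC`, `VPb`, and so on) and the rules are assumptions made for illustration, not a general English grammar.

```python
from collections import defaultdict

# Toy lexicon: each word lists the categories it may take.
LEXICON = {
    "he": {"NP"},
    "saw": {"Vs", "Vp"},       # Vs: takes a clause; Vp: perception verb
    "that": {"C", "Det"},      # complementizer or determiner
    "gasoline": {"NP", "N"},
    "can": {"Aux", "N"},       # auxiliary verb or container noun
    "explode": {"VPb"},        # bare verb phrase
}

# Binary rules: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "Vs", "CP"),        # saw [that gasoline can explode]
    ("CP", "C", "S"),
    ("VP", "Aux", "VPb"),      # can explode
    ("VP", "Vp", "SC"),        # saw [that gasoline can] explode
    ("SC", "NP", "VPb"),
    ("NP", "Det", "Nbar"),
    ("Nbar", "N", "N"),        # "gasoline can" as a compound noun
]


def count_parses(words, start="S"):
    n = len(words)
    # chart[(i, j)][label] = number of ways label derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for label in LEXICON.get(word, ()):
            chart[(i, i + 1)][label] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    ways = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)].get(start, 0)


if __name__ == "__main__":
    print(count_parses("he saw that gasoline can explode".split()))  # prints 2
```

The two parses correspond to the two readings: he realized that gasoline is explosive, or he watched a particular gasoline can blow up.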

  9. Natural Language Ambiguity
     • Deep Structural Ambiguity - same sentence, same grammatical structure, different meaning.
       • The chickens are ready to eat.
     • Semantic Ambiguity - same phrase can have two meanings.
       • David wants to marry a Norwegian.
     • Pragmatic Ambiguity - confusing use of pronouns.
       • She dropped a plate on the table and broke it.
     (Source: Winograd, "Computer Software for Working with Language")

  10. NLVI Overview: Targets
      • Military Trainers
      • Immersive Environments
      • Command Station
