Natural Language Vocal Interaction Technology for Live and Virtual Entities

Problem Description and Motivation
• Overall Goal: Produce a technology allowing natural language vocal interactions between live and virtual entities.
• Overall Motivation: Enable
  • Appropriately reduced staffing requirements.
  • Virtual team members.
  • Virtual trainers/coaches/advisors.

3 Phases of Automated Speech Processing
MIC → Speech to Text → Natural Language Processing → Text to Speech → SPKR
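The three phases can be pictured as three functions chained together. The sketch below is only an illustration of that data flow: the function names are hypothetical, and each stage is a trivial placeholder (the "audio" is just UTF-8 bytes) rather than a real recognizer, language processor, or synthesizer.

```python
def speech_to_text(audio: bytes) -> str:
    # Placeholder STT stage: a real recognizer would decode a digitized signal.
    return audio.decode("utf-8")


def process_language(text: str) -> str:
    # Placeholder NLP stage: a real system would extract meaning and compose a reply.
    return f"You said: {text.strip().lower()}"


def text_to_speech(text: str) -> bytes:
    # Placeholder TTS stage: a real synthesizer would produce a waveform.
    return text.encode("utf-8")


def run_pipeline(mic_input: bytes) -> bytes:
    # MIC -> Speech to Text -> Natural Language Processing -> Text to Speech -> SPKR
    return text_to_speech(process_language(speech_to_text(mic_input)))


if __name__ == "__main__":
    print(run_pipeline(b"Hello"))
```

Keeping each phase behind its own function makes it possible to swap in a commercial STT or TTS product without touching the NLP stage.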

Speech-To-Text (STT) Processing
• Purpose: Convert the spoken word to text.
• Techniques:
  • Match signal (digitized) to dictionary of sounds and words.
  • Improve accuracy via syntactic analysis (not semantic).
  • Improve accuracy by tracking history of the speaker.
• Challenges:
  • Differences in speech between persons/genders.
  • Differences in pronunciation given by the same person over time and in different situations.
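One way to picture "improve accuracy via syntactic analysis" is to rescore candidate transcriptions by how well their word order fits simple grammatical patterns. The sketch below is a toy under stated assumptions: the part-of-speech table, the allowed tag pairs, the acoustic scores, and the 50/50 weighting are all invented for illustration and do not describe any particular recognizer.

```python
# Hypothetical part-of-speech table and allowed tag sequences (assumptions).
POS = {"the": "DET", "their": "DET", "there": "ADV", "dog": "NOUN", "barks": "VERB"}
ALLOWED_PAIRS = {("DET", "NOUN"), ("NOUN", "VERB"), ("ADV", "VERB")}


def syntax_score(words):
    """Fraction of adjacent tag pairs that match an allowed pattern."""
    tags = [POS.get(word, "UNK") for word in words]
    pairs = list(zip(tags, tags[1:]))
    if not pairs:
        return 0.0
    return sum(pair in ALLOWED_PAIRS for pair in pairs) / len(pairs)


def pick_best(candidates, weight=0.5):
    """Combine the acoustic score with the syntactic score and return the best text."""
    def combined(candidate):
        text, acoustic = candidate
        return (1 - weight) * acoustic + weight * syntax_score(text.split())
    return max(candidates, key=combined)[0]


if __name__ == "__main__":
    # The acoustically stronger candidate loses once syntax is taken into account.
    candidates = [("there dog barks", 0.55), ("their dog barks", 0.50)]
    print(pick_best(candidates))
```

Note that the rescoring uses only word categories and order, not meaning, which matches the "syntactic (not semantic)" distinction on the slide.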

Speech-To-Text (STT) Processing (cont)
• Notes:
  • Quite a bit of research currently in this area:
    • Bell Labs
    • Carnegie Mellon University (SPHINX)
  • Commercial products available and used successfully:
    • VivaVoice
    • Dragon NaturallySpeaking
    • MS Speech API
  • Speaker-dependent systems achieve about 95% accuracy.
  • Speaker-independent systems may have poor accuracy and rely on limited vocabularies.
  • Future systems will probably be multi-modal (voice and gesture, voice and touch-screen).

Text-To-Speech (TTS) Processing
• Purpose: Convert text to the spoken word.
• Techniques:
  • Match text to a phonetic dictionary of sounds/words.
  • Incorporate emotional content through intonations suggested by punctuation and context.
  • Allow direct changes in pitch, volume, and speed to be embedded explicitly in the text using special symbols (XML).
• Challenges:
  • Lack of models relating intonations to emotion or intent.
  • Technically difficult to reproduce natural sounds.
• Notes:
  • Commercial products available; not heavily researched by IST.
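The idea of embedding pitch, volume, and speed in the text as XML can be sketched as follows. The slide does not name a markup language, so the element and attribute names here (speak, prosody, pitch, rate, volume) are an assumption modeled on the W3C Speech Synthesis Markup Language, and the punctuation rules are invented purely for illustration.

```python
import xml.etree.ElementTree as ET


def prosody_for(sentence):
    """Guess intonation settings from the final punctuation mark (toy rules)."""
    if sentence.endswith("!"):
        return {"pitch": "high", "rate": "fast", "volume": "loud"}
    if sentence.endswith("?"):
        return {"pitch": "high", "rate": "medium", "volume": "medium"}
    return {"pitch": "medium", "rate": "medium", "volume": "medium"}


def mark_up(sentence):
    """Wrap a sentence in XML that states its pitch, speed, and volume explicitly."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", prosody_for(sentence))
    prosody.text = sentence
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(mark_up("Watch out!"))
```

A synthesizer that understands such markup could then render the same words differently depending on the attributes, which is where the missing models relating intonation to emotion become the limiting factor.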

Natural Language Processing (NLP)
• Purpose (1): Extract meaning from text.
• Purpose (2): Compose text conveying a specific meaning.
• Techniques:
  • Parse sentences, often using Finite State Machine models or tree-like data structures.
  • Store meaning in a knowledge representation database, often a rule-based or relational database.
  • Produce sentences using parse trees and semi-random word selection.
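Two of the techniques above, finite-state parsing and tree-based sentence production with semi-random word selection, can be sketched in a few lines. Everything in this example is a made-up toy: the vocabulary, the state names, and the grammar rules are assumptions chosen only to keep the illustration small.

```python
import random

# Hypothetical toy vocabulary, grouped by word category.
LEXICON = {
    "DET": ["the", "a"],
    "NOUN": ["trainer", "pilot", "order"],
    "VERB": ["gives", "hears"],
}
WORD_TAG = {word: tag for tag, words in LEXICON.items() for word in words}

# Finite state machine: (current state, word category) -> next state.
TRANSITIONS = {
    ("start", "DET"): "subj_det",
    ("subj_det", "NOUN"): "subj",
    ("subj", "VERB"): "verb",
    ("verb", "DET"): "obj_det",
    ("obj_det", "NOUN"): "obj",
}
ACCEPTING = {"verb", "obj"}


def accepts(sentence):
    """Return True if the FSM reaches an accepting state on this sentence."""
    state = "start"
    for word in sentence.lower().split():
        state = TRANSITIONS.get((state, WORD_TAG.get(word)))
        if state is None:
            return False
    return state in ACCEPTING


# Tree-structured grammar used for generation.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB"]],
}


def generate(symbol="S", rng=random):
    """Expand a grammar symbol top-down, choosing rules and words at random."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    expansion = rng.choice(GRAMMAR[symbol])
    return [word for part in expansion for word in generate(part, rng)]


if __name__ == "__main__":
    sentence = " ".join(generate())
    print(sentence, accepts(sentence))
```

The generator can produce grammatical but odd sentences, which hints at why syntax alone is not enough to convey a specific meaning.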

Natural Language Processing Challenges
• Parsing Sentences: Syntax and semantics are intertwined in natural languages.
• Storing Meaning: Knowledge representation is still a difficult problem. Number of rules and relationships required to cover non-trivial domains is large.
• Extracting Meaning from text:
  • Word meanings change when context changes.
  • Idioms, metaphors, and similes provide challenges.
  • Emotional content colors meaning (e.g. sarcasm or humor).

Natural Language Ambiguity
• Lexical Ambiguity - one word, many meanings
  • Stay away from the bank.
• Structural Ambiguity - one sentence, different grammatical structures
  • He saw that gasoline can explode.
(Source: Winograd, "Computer Software for Working with Language")
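Structural ambiguity can be made concrete by counting how many parse trees a grammar assigns to a sentence. The sketch below uses a chart (CYK-style) parser over a tiny hand-written grammar; the grammar, category names, and lexicon are assumptions invented for this example and are not taken from Winograd. In one reading "that" introduces a clause and "can" is a modal verb; in the other "that" is a determiner and "gasoline can" is a noun compound.

```python
from collections import defaultdict

# Hypothetical lexicon: each word may belong to several categories.
LEXICON = {
    "he": {"NP"},
    "saw": {"V"},
    "that": {"COMP", "DET"},
    "gasoline": {"N", "NP"},
    "can": {"N", "MODAL"},
    "explode": {"VB"},
}

# Hypothetical binary rules: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "SBAR"),    # saw [that gasoline can explode]
    ("SBAR", "COMP", "S"),
    ("VP", "MODAL", "VB"),  # can explode
    ("VP", "V", "SC"),      # saw [that gasoline can] explode
    ("SC", "NP", "VB"),
    ("NP", "DET", "NBAR"),  # that + gasoline can
    ("NBAR", "N", "N"),     # gasoline can (a container)
]


def count_parses(words, goal="S"):
    """Count distinct parse trees for the word list using a CYK-style chart."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))  # chart[(i, j)][symbol] = tree count
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[(i, i + 1)][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    left_count = chart[(i, k)].get(left, 0)
                    right_count = chart[(k, j)].get(right, 0)
                    if left_count and right_count:
                        chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)].get(goal, 0)


if __name__ == "__main__":
    print(count_parses("he saw that gasoline can explode".split()))  # prints 2
```

Under this toy grammar the sentence receives two parse trees, one per reading, while an unambiguous variant such as "he saw gasoline explode" receives one.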

Natural Language Ambiguity (cont)
• Deep Structural Ambiguity - same sentence, same grammatical structure, different meaning
  • The chickens are ready to eat.
• Semantic Ambiguity - same phrase can have two meanings
  • David wants to marry a Norwegian.
• Pragmatic Ambiguity - confusing use of pronouns
  • She dropped a plate on the table and broke it.
(Source: Winograd, "Computer Software for Working with Language")

NLVI Overview: Targets Military Trainers Immersive Command Environments Station
