Natural Language Processing and Speech Enabled Applications

Natural Language Processing and Speech Enabled Applications by Pavlovic Nenad

Presentation Content • What is natural language processing • Speech synthesis • Speech recognition • Natural language understanding • Basic concepts and terms • Types of speech recognition engines • Hardware requirements • How speech recognition/synthesis works • Speech enabled applications • Applications of speech enabled system • Commercial & non-commercial software

Natural language processing • Natural Language Processing (NLP) or Computational Linguistics (CL) “is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty” [1]. • “It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that is aiming at computational models of human cognition” [1].

Natural Language Processing • In other words, NLP is a discipline that aims to build computer systems that are able to analyze, understand and generate human speech. • Therefore, the NLP sub-areas of research are: • Speech Recognition (speech analysis), • Speech Synthesis (speech generation), and • Natural Language Understanding (NLU).

Speech Recognition & Synthesis • Speech recognition is the process of converting spoken language to written text or some similar form. • Speech synthesis is the process of converting the text into spoken language.

Natural Language Understanding • Natural Language Understanding (NLU) is the process of analyzing recognized words and transforming them into data meaningful to a computer. • In other words, NLU is a computer based system that “understands” human language. • NLU is used in combination with speech recognition.

Basic Terms and Concepts • Utterance is any stream of speech between two periods of silence. • Pronunciation is what the speech engine thinks a word should sound like. • Grammars define a domain (of words) within which the recognition engine works. • Vocabulary (dictionary) is a list of words (utterances) that can be recognized by the speech recognition engine. • Training is the process of adapting the recognition system to a speaker.

Basic Terms and Concepts • Accuracy is a measure of the recognizer’s ability to correctly recognize utterances. • Speaker Dependence • A speaker dependent system is designed for only one user (at a time). • A speaker independent system is designed for a variety of speakers.
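The slides do not say how accuracy is measured. One common convention, assumed here, is to count the word-level edits (substitutions, insertions, deletions) needed to turn the recognizer output into the reference transcript, and report one minus that error rate. The function names `word_errors` and `word_accuracy` are illustrative only.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    (Levenshtein distance computed over words, not characters)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word was dropped
                            curr[j - 1] + 1,      # extra word was inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """One minus the word error rate, clipped at 0 (assumed definition)."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / n)
```

For example, if the user says "please say your name" and the recognizer returns "please say you name", one of four words is wrong and `word_accuracy` returns 0.75.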

Types Of Speech Recognition • Speech recognizers are divided into several different classes according to the type of utterance that they can recognize: • Isolated words, • Connected words, • Continuous speech (computer dictation) • Spontaneous speech • Voice Verification • Voice Identification

Hardware Requirements • Natural Language Processing requires a powerful system in order to work accurately and with a minimum response time. • The important hardware parts are: • Sound Card • Microphone • Processor/RAM

How speech synthesis works • There are five major steps in the process of speech synthesis: • Structure analysis: processes the structure of the input text. • Text pre-processing: analyzes the input text for special constructs of the language. • Text-to-phoneme conversion: converts each word to phonemes (e.g. “times” = “t ay m s”). • Prosody analysis: determines the appropriate prosody for the sentence (e.g. pitch, timing, pausing, etc.). • Waveform production: uses the phoneme and prosody information to produce the audio waveform.
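The five steps above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not how a real engine works: the lexicon and digit table are invented (only the "times" entry comes from the slide), prosody is reduced to a falling or rising pitch, and each phoneme is rendered as a plain sine tone.

```python
import math
import re

# Hypothetical toy lexicon; only the "times" entry is taken from the slide.
LEXICON = {
    "two": ["t", "uw"],
    "times": ["t", "ay", "m", "s"],
}
SPECIAL = {"2": "two"}  # assumed expansion table used by pre-processing


def structure_analysis(text):
    """Step 1: split the input text into sentences, keeping end punctuation."""
    return [s.strip() for s in re.findall(r"[^.?!]+[.?!]?", text) if s.strip()]


def preprocess(sentence):
    """Step 2: expand special constructs (here only digits) into plain words."""
    words = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return [SPECIAL.get(w, w) for w in words]


def to_phonemes(words):
    """Step 3: look each word up in the lexicon; unknown words are spelled out."""
    phonemes = []
    for w in words:
        phonemes.extend(LEXICON.get(w, list(w)))
    return phonemes


def prosody(phonemes, sentence):
    """Step 4: attach (duration_ms, pitch_hz); questions rise, statements fall."""
    rising = sentence.endswith("?")
    n = len(phonemes)
    plan = []
    for i, p in enumerate(phonemes):
        frac = i / max(n - 1, 1)
        pitch = 120 + 40 * frac if rising else 160 - 40 * frac
        plan.append((p, 80, pitch))
    return plan


def waveform(plan, sample_rate=8000):
    """Step 5: render each phoneme as a sine tone (a stand-in for real synthesis)."""
    samples = []
    for _, duration_ms, pitch in plan:
        count = sample_rate * duration_ms // 1000
        samples.extend(math.sin(2 * math.pi * pitch * t / sample_rate)
                       for t in range(count))
    return samples


def synthesize(text):
    """Run all five steps and return the audio samples as floats in [-1, 1]."""
    audio = []
    for sentence in structure_analysis(text):
        plan = prosody(to_phonemes(preprocess(sentence)), sentence)
        audio.extend(waveform(plan))
    return audio
```

Keeping each step as a separate function mirrors the slide: any stage (for instance the lexicon lookup) could be swapped for a better one without touching the others.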

How speech recognition works • The basic characteristics of the most commonly used speech recognizers are: • Mono-lingual, • Process a single input at a time, • Can optionally adapt to the voice of the speaker, • Grammars can be dynamically updated, and • Have a small, defined set of properties.

How speech recognition works 1. Grammar design: defines the words that may be spoken by a user and the patterns in which they may be spoken. 2. Signal processing: analyzes the spectrum (frequency) characteristics of the incoming audio. 3. Phoneme recognition: compares the spectrum patterns to the patterns of the phonemes. 4. Word recognition: compares the sequence of likely phonemes against the words and patterns of words specified by the grammar. 5. Result generation: provides information about the words that the recognizer has detected. User profile: holds the knowledge of the environment (how the user pronounces phonemes).
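A minimal sketch of grammar design, word recognition and result generation follows. Signal processing and phoneme recognition are not modelled: the `observed` argument stands in for their output. The grammar, the pronunciations and the similarity threshold are all assumptions made for illustration, and similarity is computed with Python's `difflib.SequenceMatcher` rather than with the acoustic models a real recognizer would use.

```python
from difflib import SequenceMatcher

# Grammar design: the only phrases the user may say (hypothetical).
GRAMMAR = ["transfer", "status", "show status"]

# Assumed pronunciations; a real engine ships a much larger dictionary.
PRONUNCIATIONS = {
    "transfer": ["t", "r", "ae", "n", "s", "f", "er"],
    "status": ["s", "t", "ae", "t", "ah", "s"],
    "show": ["sh", "ow"],
}


def phrase_phonemes(phrase):
    """Concatenate the pronunciations of every word in a grammar phrase."""
    return [p for word in phrase.split() for p in PRONUNCIATIONS[word]]


def recognize(observed, threshold=0.6):
    """Word recognition and result generation.

    Picks the grammar phrase whose phoneme sequence is most similar to the
    observed phonemes, and rejects the input if nothing is similar enough.
    """
    best_phrase, best_score = None, 0.0
    for phrase in GRAMMAR:
        score = SequenceMatcher(None, observed, phrase_phonemes(phrase)).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_score < threshold:
        # Nothing in the grammar is close enough: report a rejection.
        return {"words": None, "confidence": best_score}
    return {"words": best_phrase.split(), "confidence": best_score}
```

The result dictionary plays the role of result generation: it carries the detected words together with a confidence value the application can inspect before acting.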

Speech Enabled Applications -1 • The primary aim of speech enabled applications is to improve interaction between user and machine. • Both speech recognition and synthesis, or either one of them, are used for this purpose. The choice mostly depends on the type of application and its purpose.

Speech Enabled Applications -2 • Speech synthesis is fairly easy to use. After setting up the “type” of voice, the speed of “speaking”, the duration of the pause between sentences, and so on, the speech synthesis engine is ready for use.

Speech enabled applications -3 • Applying speech recognition requires careful analysis of the possible inputs to the system and of the way in which the user provides the input. • The way in which the user provides input to the system, and the way the application responds to the user, is called Natural Language Dialog. • Choosing the Natural Language Dialog is the first decision that the developer must make.

Natural Language Dialog -1 • Three essential types of interaction that are available to software applications are: • Direct dialog, • Mixed initiative dialog, and • Natural dialog.

Natural Language Dialog -2 • Direct Dialog Interaction directs the user to perform a specific task by asking for information at each turn and expecting the specific words or phrases in response. System: “Welcome to ABC bank customer services system. Please say your name.” User: “Nenad Pavlovic” System: “Please say your account number.” User: “1234-123-12332-1233” System: “Would you like to perform a transfer or to see the status on your account?” User: “Transfer.”, etc…
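A direct dialog can be sketched as a fixed script of turns, each with a prompt and the form of reply it expects. The script below is modelled on the bank example; the regular expressions and the name `run_direct_dialog` are assumptions, and the replies are taken to be already-recognized text rather than audio.

```python
import re

# Each turn: (slot name, prompt, pattern the reply must match). Hypothetical.
TURNS = [
    ("name", "Please say your name.", r"[A-Za-z]+(?: [A-Za-z]+)+"),
    ("account", "Please say your account number.", r"\d{4}-\d{3}-\d{5}-\d{4}"),
    ("action",
     "Would you like to perform a transfer or to see the status on your account?",
     r"(?i)transfer|status"),
]


def run_direct_dialog(replies):
    """Walk through TURNS in order, re-asking until a reply has the expected form.

    `replies` is an iterable of user utterances as text. Returns the filled
    slots and the list of prompts the system spoke. Raises StopIteration if
    the replies run out before every slot is filled.
    """
    replies = iter(replies)
    slots, transcript = {}, []
    for slot, prompt, pattern in TURNS:
        while slot not in slots:
            transcript.append(prompt)
            reply = next(replies).strip().rstrip(".")
            if re.fullmatch(pattern, reply):
                slots[slot] = reply  # accepted; move on to the next turn
    return slots, transcript
```

Because every turn accepts only one kind of answer, the recognizer can work with a very small grammar at each step, which is the main appeal of this dialog type.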

Natural Language Dialog - 3 • Mixed initiative dialog is similar to the direct dialog, but it gives the speaker some freedom: the user can take as much or as little control as s/he desires. System: “Welcome to ABC bank customer services system. Please say your name.” User: “My name is Nenad Pavlovic, and my account number is: 1234-123-12332-1233” System: “Would you like to perform a transfer or to see the status on your account?” User: “Show me the status and then go to transfers.”, etc…
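One way to picture mixed initiative is slot filling: every reply is scanned for all the information the system still needs, and the system asks only for what is missing. The extractors below are simplified assumptions (for instance, a name is only found after the words "name is"), and the function names are illustrative.

```python
import re

# Hypothetical slot extractors; each has one capture group holding the value.
EXTRACTORS = {
    "name": re.compile(r"name is ([a-z]+ [a-z]+)", re.IGNORECASE),
    "account": re.compile(r"(\d{4}-\d{3}-\d{5}-\d{4})"),
    "action": re.compile(r"\b(transfer|status)\b", re.IGNORECASE),
}

PROMPTS = {
    "name": "Please say your name.",
    "account": "Please say your account number.",
    "action": "Would you like to perform a transfer or to see the status on your account?",
}


def extract(utterance):
    """Return every slot value that can be found in one utterance."""
    found = {}
    for slot, pattern in EXTRACTORS.items():
        match = pattern.search(utterance)
        if match:
            found[slot] = match.group(1)
    return found


def next_prompt(slots):
    """Prompt for the first slot that is still empty, or None when done."""
    for slot, prompt in PROMPTS.items():
        if slot not in slots:
            return prompt
    return None


def run_mixed_dialog(replies):
    """Ask only for missing slots; a single reply may fill several at once."""
    replies = iter(replies)
    slots, transcript = {}, []
    while True:
        prompt = next_prompt(slots)
        if prompt is None:
            return slots, transcript
        transcript.append(prompt)
        for slot, value in extract(next(replies)).items():
            slots.setdefault(slot, value)  # keep the first value heard
```

With the replies from the example, the account-number prompt is never spoken, because the user volunteered the number together with the name.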

Natural Language Dialog - 4 • Natural dialog allows the user to enjoy a more unstructured interaction with an application (as natural as possible). System: “Welcome to City Directory Dialer, how can I help you?” User: “I’d like to call Mr. George Eleftherakis in Tsimiski building.” System: “George Eleftherakis – Tsimiski building. Is this correct?” User: “Yes” System: “George Eleftherakis is found in directory. Calling…”, etc…

Grammars vs. Statistical NLU • The more freedom the user is given to interact with the application, the more complex the processing of input data becomes. • Depending on the complexity of the possible user inputs and on the interaction dialog used, one of two implementation approaches is chosen: • Grammar-based NLU • Statistical NLU

Grammars vs. Statistical NLU • Grammar-based NLU: relies on defining (creating) the grammar, which means constructing the phrases and stating all possible words that can be used. • Advantages: fast, allows freedom in phrase construction. • Disadvantages: suitable only for a small set of phrases and words; a word or phrase that is not defined will not be recognized.
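The trade-off described above shows up even in a tiny sketch. Here a grammar is a sequence of positions, each listing the allowed alternatives, with an empty string marking an optional word. The rules and the names `expand` and `interpret` are invented for illustration and do not follow any particular grammar format.

```python
from itertools import product

# Hypothetical grammar: intent -> list of positions -> allowed alternatives.
# An empty string means the word at that position may be left out.
GRAMMAR = {
    "status": [["show", "check"], ["me", ""], ["the", "my", ""], ["status", "balance"]],
    "transfer": [["make", "perform", ""], ["a", ""], ["transfer"]],
}


def expand(rule):
    """Enumerate every phrase a rule accepts."""
    return {" ".join(word for word in combo if word) for combo in product(*rule)}


# Lookup table from each accepted phrase to its intent, built once up front.
PHRASES = {
    phrase: intent
    for intent, rule in GRAMMAR.items()
    for phrase in expand(rule)
}


def interpret(utterance):
    """Return the intent of an utterance, or None if it is outside the grammar."""
    words = utterance.lower().strip(" .!?").split()
    return PHRASES.get(" ".join(words))
```

Lookup is a single dictionary access, which reflects the "fast" advantage, while `interpret("What is my balance?")` returns `None` because that phrasing was never defined, which reflects the stated disadvantage.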

Grammars vs. Statistical NLU • Statistical NLU: relies on a statistical model of utterances derived from actual conversation data. • Advantages: handles a huge set of phrases and words. • Disadvantages: slow, difficult to add new phrases.
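The slides do not name a specific statistical model. As one simple stand-in, the sketch below trains a naive Bayes classifier with add-one smoothing on a handful of made-up utterances; the training data, intents and function names are all assumptions, and real systems learn from far larger collections of conversation data.

```python
import math
from collections import Counter, defaultdict

# Tiny invented training set standing in for "actual conversation data".
TRAINING = [
    ("show me the status of my account", "status"),
    ("what is my balance", "status"),
    ("check the account status please", "status"),
    ("i want to transfer money", "transfer"),
    ("make a transfer to savings", "transfer"),
    ("send money to another account", "transfer"),
]


def train(examples):
    """Count how often each word occurs with each intent."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocab


def classify(text, model):
    """Return the most probable intent under naive Bayes with add-one smoothing."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    scores = {}
    for intent, n in intent_counts.items():
        denom = sum(word_counts[intent].values()) + len(vocab)
        score = math.log(n / total)  # prior probability of the intent
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[intent][w] + 1) / denom)
        scores[intent] = score
    return max(scores, key=scores.get)
```

Unlike the grammar approach, an unseen phrasing such as "could you tell me my balance" is still classified (as `status` with this data), but teaching the model a new phrase means collecting examples and retraining rather than adding a rule.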

Uses of speech applications • The speech technology is mostly used in the following areas: • Dictation • Command and Control • Telephony • Wearables • Medical Disabilities • Embedded Applications

Speech Systems • Commercial • IBM’s ViaVoice (Linux, Windows, MacOS) • Dragon NaturallySpeaking (Windows) • Microsoft’s Speech Engine (Windows) • BaBear (Linux, Windows, MacOS) • SpeechWorks (Linux, Sparc & x86 Solaris, Tru64, Unixware, Windows) • Non-commercial • OpenMind Speech (Linux) • XVoice (Linux) • CVoiceControl/kVoiceControl (Linux) • GVoice (Linux)

Conclusion • Developers’ perspective: developing a speech enabled application does not require redesigning or explicitly designing systems to support speech. Speech is treated as an “attached entity” and can be viewed as a separate module. Also, it does not require special linguistic or programming skills. • Business perspective: the use of speech enabled applications can noticeably improve the accuracy and effectiveness of employees who work with large amounts of data, with many people, or both.

Thank you  Pavlovic Nenad pavlovic@city.academic.gr

References • [1] Radev, D. R. (2001), “Natural Language Processing FAQ”, Columbia University, Dept. of Computer Science, NYC.
