
Review of Spoken Language Understanding in Dialog Systems



  1. Review of Spoken Language Understanding in Dialog Systems Rohit Kumar 11-716: Fall 2006 October 3, 2006

  2. Not much emphasis on detailed algorithms • Instead, emphasis on comparison across approaches • And some common principles observable across the approaches

  3. Spoken Language Understanding CMU Ravenclaw/Olympus Architecture Reference: http://www.ravenclaw-olympus.org/

  4. Spoken Language Understanding Show me morning flights from Boston to San Francisco on Tuesday

  5. Spoken Language Understanding
  SHOW:
    FLIGHTS:
      ORIGIN:
        CITY: Boston
      DATE:
        DAY-OF-WEEK: Tuesday
      TIME:
        PART-OF-DAY: morning
      DEST:
        CITY: San Francisco
  Show me morning flights from Boston to San Francisco on Tuesday
  Get some sort of a semantic representation from an utterance
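
To make the target concrete, such a frame could be held in memory as a nested structure. A hypothetical rendering in Python (real systems use their own frame formats):

```python
# Hypothetical rendering of the slide's semantic frame as nested Python
# data; the actual representation is system-specific.
frame = {
    "SHOW": {
        "FLIGHTS": {
            "ORIGIN": {"CITY": "Boston"},
            "DATE":   {"DAY-OF-WEEK": "Tuesday"},
            "TIME":   {"PART-OF-DAY": "morning"},
            "DEST":   {"CITY": "San Francisco"},
        }
    }
}
```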

  6. Spoken Language Understanding Boundaries of Spoken Language Understanding • Potentially, tight coupling between ASR and parser • In some cases, the parser constrains the ASR hypothesis generation • Which is good if your grammar can robustly and correctly cover the language coming into the ASR

  7. Spoken Language Understanding Boundaries of Spoken Language Understanding

  8. Spoken Language Understanding Boundaries of Spoken Language Understanding Parser: robust / non-robust, syntactic / semantic Semantic Labeler (probabilistic): hidden states identify semantic slot labels, sort of like Named Entity identification For this talk, I will typically look at SLU in this view

  9. Spoken Language Understanding Boundaries of Spoken Language Understanding Optional component: Confidence Annotator

  10. In our (minimalist) view of SLU • Loose coupling between ASR and parser • Most likely, the ASR is producing language based on a stochastic language model • Allows for robustness and recovery in the ASR when disfluent speech arrives • Requires the parser to be able to deal with artifacts of spontaneous speech • Ungrammatical, ill-formed, and unseen input • Containing stutters, filled pauses, restarts, repeats, etc.

  11. Typical SDS development at CMU Let's concentrate on SLU development for typical dialog systems around here (e.g. ConQuest) We use a parser (Phoenix: robust parsing using hand-written semantic rules and a CFG-ish parsing algorithm) • Someone writes the first set of rules to parse the kinds of sentences they can think of • [query_type]When are the [info_type]papers by [author_name]Alan Black • [query_type]When is the [info_type]session on [keywords]Spoken Dialog and a few variations • Do some pilot testing → get some real language • Write more rules to incorporate the newly discovered language (and its generalizations) • Repeat 2-3 times • Do some user study → write more rules to incorporate the newly discovered language (and its generalizations) • Repeat 1-2 times before first launch • Then continued improvement efforts for months (maybe years) to come! This procedure typically works! Pretty well, too.
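
To illustrate the flavor of the first step, here is a toy phrase-spotter in Python. It is not Phoenix's grammar format or algorithm (Phoenix compiles rules into Recursive Transition Networks); every pattern and slot name below is made up for illustration:

```python
import re

# Illustrative only: a toy phrase-spotter in the spirit of the bracketed
# rule examples above. Phoenix's real grammars are RTNs over semantic
# tokens, not regular expressions.
RULES = {
    "query_type":  r"\bwhen (are|is)\b",
    "info_type":   r"\b(papers|session)\b",
    "author_name": r"\balan black\b",   # in practice, a list/grammar of names
    "keywords":    r"\bspoken dialog\b",
}

def spot_slots(utterance):
    """Return {slot: matched text} for every rule that fires."""
    slots = {}
    for slot, pattern in RULES.items():
        match = re.search(pattern, utterance.lower())
        if match:
            slots[slot] = match.group(0)
    return slots

print(spot_slots("When are the papers by Alan Black"))
# -> {'query_type': 'when are', 'info_type': 'papers', 'author_name': 'alan black'}
```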

  12. Problems Too much work (hand-written rules) Usually very domain specific (so no re-use / adaptability!) Costly (in time and hence money)!!! Discreteness Parse or No Parse (the latter rarely, due to robustness) No {explicit} measure of accuracy of the parse However, heuristics (like parse fragmentation) exist and are used Nonetheless, let's try to see some SLU approaches in terms of the above problems Let's also add another concern that has been mentioned in the papers: Robustness Spontaneous speech (particularly as output from a speech recognizer) is often ungrammatical, ill-formed, and unseen Contains: stutters, filled pauses, restarts, repeats, etc.

  13. Systems / Approaches • Gemini (SRI, Dowding et al.) • Phoenix (CMU, Ward et al.) • Tina (MIT, Seneff) • Probabilistic approaches • HMM-based semantic labeling, AT&T, Pieraccini et al., and extensions • Hidden Understanding Model, BBN, Miller et al. • Hidden Vector State Model, Cambridge U., He and Young • Clustering approach, Cambridge U., Ye and Young

  14. One way of looking at SLU approaches (Seneff) Syntactically based (e.g. Gemini): a complete syntactic analysis is performed which attempts to account for all words in an utterance Pro: Provides strong linguistic constraints to the ASR; useful structure for further linguistic analysis Con: Can break down in the presence of unknown words, novel linguistic constructs, recognition errors, and other spontaneous speech events (repeats, restarts, etc.) Semantically driven approaches (Phoenix, Pieraccini et al.): spotting keywords and phrases in an utterance (and labeling them) Pro: Provides better coverage; can deal with ill-formed sentences Con: Fewer constraints for the ASR; may not provide enough information for complex linguistic analysis

  15. Gemini • Hard-core NLP approach • Too much hand coding!! Look at their rules • The reference mentions 243 syntactic and 315 semantic rules (just at the first stage)

  16. Gemini Stage 1: Constituent Parser (bottom-up, all paths) • Lexical, syntactic and semantic rules are applied to constituents in the process of building a chart of constituents • Edges of the chart contain syntactic, semantic and logical-form information (derived from the rules applied during constituent parsing) • Interleaved syntactic and semantic processing: both kinds of constraints have strong dependencies when adding fragments to the chart • "Only if a syntactic edge leads to a well-sorted, semantically accepted logical-form fragment is it added to the chart" • Emphasis on blurring the line between syntactic and semantic processing Stage 2: Utterance Parser • A second set of syntactic and semantic rules is applied; these rules span entire utterances • Tries to find an interpretation (chart edge) which spans the whole utterance • If none is found, recognize and correct certain grammatical disfluencies and try again • e.g. "Talk by Alex Acero no Alex Rudnicky on Monday afternoon" • e.g. "Session today today evening" • The best parse is chosen by a parse preference mechanism (another bunch of rules) Stage 3: Build quasi-logical forms from the best selected utterance parse

  17. Gemini • Our issues: • Hand coding: Yes (too much) • Re-use: Not really • Discreteness: Yes • Robustness: Not really

  18. Phoenix • "Try to model the information in an utterance rather than its grammatical structure" • Tries to optimize the correctness of extracted information • A partial interpretation is often sufficient for the system to take the desired action • Even if not, it provides a sufficient basis for a clarification dialog with the user

  19. Phoenix A frame-based parser which tries to parse as much of the input as possible Recursive Transition Networks encode semantic grammars • The grammar specifies word patterns which correspond to semantic tokens • A subset of tokens are considered top-level tokens • Top-level tokens appear as slots in frames • Frames associate tokens filled in slots with functions • Not really an utterance-level grammar • Instead, slot-level grammars exist, allowing individual slots to be filled independent of order • So rules do not need to capture all the orders in which slots can appear (robustness) Patterns may over-generate (without affecting semantics) Multiple versions of the same frame will get formed, and scoring is done based on how many words each frame accounts for
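
A minimal sketch of the scoring step, assuming a parse is represented as a list of (slot, word-span) pairs; the names and data structures are illustrative, not Phoenix's actual ones:

```python
# A minimal sketch of "score by words accounted for": among competing
# frames for the same utterance, prefer the one covering more words.
# (The real scorer also considers fragmentation; this toy does not.)
def words_accounted_for(parse):
    covered = set()
    for _slot, (start, end) in parse:  # half-open word-index spans
        covered.update(range(start, end))
    return len(covered)

# Two hypothetical parses of "show flights from boston to san francisco"
parse_a = [("origin", (3, 4)), ("dest", (5, 7))]   # covers 3 words
parse_b = [("dest", (5, 7))]                       # covers 2 words
best = max([parse_a, parse_b], key=words_accounted_for)  # -> parse_a
```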

  20. Phoenix • Our issues: • Hand coding: Yes • But not as much as Gemini (relatively easier) • Re-use: Not really • Discreteness: Yes • Robustness: Yes

  21. Tina Pretty much Phoenix-like, but not really Attempts a full parse of the sentence with a CFG-ish grammar with a mix of syntactic and semantic nodes Considers parses in order of most probable node; tends to find the most probable complete parse (details in the paper) Probabilities are calculated from data The aim is to bring the best of both syntactically oriented and semantically inclined approaches together Gradually relaxes the constraint that the syntactic analysis must account for all words in the sentence Search in 2 stages • Stage one searches for a complete linguistic analysis • Mix of syntactic and semantic node names in the network • Stage two relaxes constraints to parse fragments
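
A minimal sketch of how such probabilities might be estimated, assuming TINA-style node probabilities are relative frequencies of left-sibling-to-child transitions under a parent node, counted over hand-parsed training sentences; the node names here are invented:

```python
from collections import Counter, defaultdict

# counts[(parent, previous_sibling)] tallies which child node comes next.
counts = defaultdict(Counter)

def observe(parent, children):
    """Record one (parent -> children) production from a training parse."""
    prev = "<start>"
    for child in children:
        counts[(parent, prev)][child] += 1
        prev = child

def p_next(parent, prev, child):
    """Relative-frequency estimate of P(child | parent, previous sibling)."""
    c = counts[(parent, prev)]
    total = sum(c.values())
    return c[child] / total if total else 0.0

observe("REQUEST", ["when-node", "subject", "predicate"])
observe("REQUEST", ["subject", "predicate"])
print(p_next("REQUEST", "<start>", "when-node"))  # 0.5
```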

  22. Tina • A typical Tina parse tree • Mix of Syntactic and Semantic elements in the tree

  23. Tina • Our issues: • Hand coding: Yes • But not as much as Gemini (relatively easier) • Re-use: Maybe (but I would think no) • Discreteness: No • Robustness: Yes

  24. Probabilistic Semantic Labeling • Generative model (HMM): hidden states identify semantic slot labels • A dummy state is used to capture words which do not fill any semantically important slots • Trained on hand-labeled data
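
A minimal sketch of the decoding side: Viterbi search over slot labels, assuming pre-estimated log-probabilities held in plain dictionaries. The smoothing floor for unseen words and the model representation are illustrative assumptions, not the paper's implementation:

```python
UNSEEN = -20.0  # crude log-probability floor for words unseen in training

def viterbi(words, states, log_start, log_trans, log_emit):
    """Return the most likely slot-label sequence for `words`."""
    # best[s] = (log score, label path) of the best sequence ending in s
    best = {s: (log_start[s] + log_emit[s].get(words[0], UNSEEN), [s])
            for s in states}
    for w in words[1:]:
        new_best = {}
        for s in states:
            prev = max(states, key=lambda p: best[p][0] + log_trans[p][s])
            score = (best[prev][0] + log_trans[prev][s]
                     + log_emit[s].get(w, UNSEEN))
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# Usage (hypothetical): states include the dummy label described above,
# e.g. viterbi("flights from boston".split(),
#              ["dummy", "origin", "dest"], log_start, log_trans, log_emit)
```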

  25. Probabilistic Semantic Labeling • Our issues: • Hand coding: Yes • But not rules • Instead requires labeling of training data • Relatively less work • Can be made semi-automatic (2-step) • An expert is not needed for the long term (to write rules) • Labeling is relatively less costly • Can be merged with the transcription task • Re-use: Not really; label every time • Discreteness: No • Robustness: Yes (typically)

  26. Probabilistic Semantic Labeling New problems Data fragmentation "Flights from Boston{origin} to Pittsburgh{destination}" But Boston and Pittsburgh are both cities, and eligible candidates to be either origin or destination So in our training data, each city occurrence is counted for only one of the two classes! Need for hierarchy Bottom line: a better model configuration is needed How much training data? (Pieraccini et al.) 44 cases (labels); training set: 547 sentences (501 unique words) They too reported problems of mis-labeling on unseen data

  27. Hidden Understanding Model • Adds hierarchy to the HMM-based semantic labeling task • They look at NLU as a 3-step process • Parsing • Semantic Interpretation • Discourse Model (Let's just look at the first two)

  28. Hidden Understanding Model Parsing: syntactic in form, semantic as well as syntactic in content Advantages (as mentioned in their paper): 1. Annotation schemes are consistent and well-formed (with the possibility of partial re-use) 2. Decoding: syntactically as well as semantically coherent parses are searched for by the parser 3. Semantic Interpretation: semantic labels are immediately available and identify basic units of meaning (2 and 3 don't sound too good to me; maybe they did 11 years back!) • Each parse is a path in a Recursive Transition Network • Each state in the network is one of the various semantic and syntactic classes • The probability of a parse is the product of all transition probabilities on its path • Training is done on labeled data • A variant of the Earley parsing algorithm can search for the parse in the RTN
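
A minimal sketch of the parse-scoring idea: the score of a path through the RTN is the sum of log transition probabilities (i.e., the log of their product). The states and numbers below are invented; the real model conditions transitions on the current (sub)network:

```python
# Hypothetical transition log-probabilities between RTN states.
log_trans = {
    ("<start>", "wh-question"):   -0.2,
    ("wh-question", "flight-np"): -0.7,
    ("flight-np", "origin"):      -1.1,
    ("origin", "city"):           -0.1,
    ("city", "<end>"):            -0.5,
}

def log_parse_prob(path):
    """Log-probability of a parse path: sum of edge log-probs."""
    return sum(log_trans[(a, b)] for a, b in zip(path, path[1:]))

print(log_parse_prob(["<start>", "wh-question", "flight-np",
                      "origin", "city", "<end>"]))  # ~ -2.6
```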

  29. Hidden Understanding Model Point to be noted: The syntactic and semantic mix works smoothly, and the line between semantic and syntactic parsing is blurred! (We observed the same in Gemini, which can be considered a different class of SLU) However: Syntactic classes have re-use value Semantic classes have task-specific value Semantic Interpretation: an augmented tree (with attached instructions) to fill in slots

  30. Hidden Vector State Model • CFGs have good expressive power but need hand-written rules • They want a stochastic model which is closer to a CFG in terms of the expressive power of hierarchies • Markov model states are associated with stacks
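
A minimal sketch of the vector-state idea, under the assumption that each state is a bounded stack of semantic labels and each word's transition pops some labels and pushes at most one, approximating push-down (CFG-like) behavior with a finite state set; the depth bound and labels are illustrative:

```python
MAX_DEPTH = 4  # bounding the stack depth keeps the state set finite

def transition(stack, pop_n, push=None):
    """Pop pop_n labels off the stack, then optionally push one label."""
    assert 0 <= pop_n <= len(stack)
    new_stack = stack[:len(stack) - pop_n]
    if push is not None:
        assert len(new_stack) < MAX_DEPTH
        new_stack = new_stack + (push,)
    return new_stack

s = ("FLIGHT",)                  # "... flights"
s = transition(s, 0, "ORIGIN")   # "... from"    -> ("FLIGHT", "ORIGIN")
s = transition(s, 0, "CITY")     # "... Boston"  -> ("FLIGHT", "ORIGIN", "CITY")
s = transition(s, 2, "DEST")     # "... to"      -> ("FLIGHT", "DEST")
```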

  31. Hidden Vector State Model • Experiments with robustness and adaptability • Noise corruption of speech • Substantial effect on ASR performance (WER) • Relative influence on the semantic parser (slot/value retrieval rate) was somewhat less • Adaptation across domains • Using only 50 adaptation sentences (from the new domain) • F-measure restored to 89.2% (compared to 83.8% with no adaptation) • Close to the in-domain F-measure (89.5%)

  32. Clustering Approach • Cluster, then label • Clusters could mean anything • Typically discovered in an unsupervised fashion • k-means, y-clustering • Actions can be associated with each cluster
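
A minimal sketch of "cluster, then label", using TF-IDF vectors and k-means (the paper explores its own metrics and clustering procedures); the utterances and the action mapping are invented:

```python
# Group utterances by surface similarity, then attach a system action
# to each discovered cluster by hand.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

utterances = [
    "show me flights from boston",
    "flights to san francisco please",
    "what is the weather in pittsburgh",
    "will it rain in boston tomorrow",
]
vectors = TfidfVectorizer().fit_transform(utterances)
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
# e.g. map cluster 0 -> action "query_flights", cluster 1 -> "query_weather"
```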

  33. Conclusions • We surveyed various SLU approaches and some variations on how people have looked at the problem • Different paradigms for solving the problem were discussed, with their associated problems and advantages • Re-use / Adaptability • Cost • Robustness • Discreteness • Also, we saw that the line between semantic and syntactic parsing is pretty blurred in practice

  34. References • Jurafsky and Martin, Chapter 19, Dialogue and Conversational Agents, pp. 14-19 • Dowding et al., Gemini: A Natural Language System for Spoken Language Understanding • Ward and Issar, Recent Improvements in the CMU Spoken Language Understanding System • Seneff, Robust Parsing for Spoken Language Systems • Seneff, TINA: A Probabilistic Syntactic Parser for Speech Understanding Systems • Pieraccini et al., Stochastic Representation of Conceptual Structure in the ATIS Task • Miller et al., A Fully Statistical Approach to Natural Language Interfaces • He and Young, Robustness Issues in a Data-Driven Spoken Language Understanding System • He and Young, Hidden Vector State Model for Hierarchical Semantic Parsing • Ye and Young, A Clustering Approach to Semantic Decoding
