
Planning, Optimizing, and Characterizing


Presentation Transcript


  1. Three approaches to dialogue management Planning, Optimizing, and Characterizing Presented by Lee Becker October 21, 2009

  2. Introduction “The real art of conversation is not only to say the right thing at the right place but to leave unsaid the wrong thing at the tempting moment.” – Dorothy Neville

  3. Sample Dialogue

  4. The Dialogue Management Problem
     • Giving an appropriate response
     • Understanding what was said and how it fits into the overall conversational context
     • Responding with intention
     • Obeying social norms:
       • Turn-taking
       • Feedback / acknowledgment

  5. Dialogue as Planning “Words are also actions, and actions are a kind of words.” – Ralph Waldo Emerson

  6. System View of Dialogue Management
     (Diagram: a user utterance flows into the Dialogue Manager, which produces a response.)

  7. Planning Agents (BDI Architecture)
     • Maintain the state of the world (beliefs)
     • Predetermined wants (desires) specify how the world should look
     • Select goals (intentions)
     • Build/execute a plan
     • Belief monitoring

  8. Blocks World Example
     • Init(On(A, Table) ^ On(B, Table) ^ On(C, Table) ^ Block(A) ^ Block(B) ^ Block(C) ^ Clear(A) ^ Clear(B) ^ Clear(C))
     • Goal(On(A, B) ^ On(B, C))
     • Action(Move(b, x, y))
       • Precondition: On(b, x) ^ Clear(b) ^ Clear(y) ^ Block(b) ^ (b != x) ^ (b != y) ^ (x != y)
       • Effect: On(b, y) ^ Clear(x) ^ ~On(b, x) ^ ~Clear(y)
     • Action(MoveToTable(b, x))
       • Precondition: On(b, x) ^ Clear(b) ^ Block(b) ^ (b != x)
       • Effect: On(b, Table) ^ Clear(x) ^ ~On(b, x)
     (Diagram: block configurations for the initial state and the goal stack.)
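To make the operator format concrete, here is a minimal runnable Python sketch of the slide's Move action, with a state represented as a set of ground atoms. The encoding is illustrative, not standard planner code, and MoveToTable would be analogous:

    # Minimal STRIPS-style sketch of Move(b, x, y) from the slide.
    def move(state, b, x, y):
        """Apply Move(b, x, y) if its preconditions hold; return new state or None."""
        pre = {("On", b, x), ("Clear", b), ("Clear", y), ("Block", b)}
        if not pre <= state or len({b, x, y}) < 3:   # b, x, y must be distinct
            return None
        add = {("On", b, y), ("Clear", x)}
        delete = {("On", b, x), ("Clear", y)}
        return (state - delete) | add

    init = frozenset({
        ("On", "A", "Table"), ("On", "B", "Table"), ("On", "C", "Table"),
        ("Block", "A"), ("Block", "B"), ("Block", "C"),
        ("Clear", "A"), ("Clear", "B"), ("Clear", "C"),
    })

    # A two-step plan reaching the goal On(A,B) ^ On(B,C).
    s1 = move(init, "B", "Table", "C")
    s2 = move(s1, "A", "Table", "B")
    assert ("On", "A", "B") in s2 and ("On", "B", "C") in s2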

  9. Speech Acts and Planning
     • Planning is intuitive for physical actions
     • How can utterances fit into a plan?
       • "Can you give me the directions to The Med?"
       • "Did you take out the trash?"
       • "I will try my best to be at home for dinner"
       • "I name this ship the 'Queen Elizabeth'"
     • Speech Acts (Austin, Searle)
       • Illocutionary force
       • Performative action

  10. Speech Acts Meet AI
     • Allen, Cohen, and Perrault
     • Speech acts expressed in terms of:
       • Preconditions
       • Effects
     • Related to changes in the agents' mental states
     • Plans are sequences of speech acts

  11. Example Speech Acts
     • REQUEST(speaker, hearer, act)
       • Effect: speaker WANT hearer DO act
     • INFORM(speaker, hearer, proposition)
       • Effect: KNOW(hearer, proposition)
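In the same operator spirit, these two speech acts can be sketched as state transformers over mental-state atoms. The atom encoding, and the assumption that INFORM requires the speaker to already KNOW the proposition, are illustrative choices, not the original Allen/Cohen/Perrault formalism:

    # Speech acts as planning operators over mental-state atoms (illustrative).
    def request(state, speaker, hearer, act):
        """REQUEST: its effect is that the hearer knows the speaker wants the act done."""
        return state | {("WANT", speaker, hearer, act)}

    def inform(state, speaker, hearer, proposition):
        """INFORM: the hearer comes to know the proposition (speaker must know it)."""
        assert ("KNOW", speaker, proposition) in state   # assumed precondition
        return state | {("KNOW", hearer, proposition)}

    state = frozenset({("KNOW", "B", "time_is_2:30")})
    state = request(state, "A", "B", "tell_time")     # A: "Do you know the time?"
    state = inform(state, "B", "A", "time_is_2:30")   # B: "It's half past two."
    assert ("KNOW", "A", "time_is_2:30") in state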

  12. TRAINS
     • A descendant of the Allen, Cohen, and Perrault BDI + speech acts tradition
     • A conversational agent for logistics and planning
     • Users converse with a "Manager" to develop a plan of action in the TRAINS domain.

  13. Sample TRAINS Scenario
     (Map labels: City B, OJ Factory, Orange Source, City I, Empty Car, Engine E3, Empty Car, City G, Banana Source.)

  14. Deliberative Agent

  15. Communicative Agent

  16. Discourse Obligations
     • BDI does not account for what compels one speaker to answer another
     • Two Strangers Example:
       • A: Do you know the time?
       • B: Sure. It's half past two.
     • Answering Person A's question does not help Person B attain his own goals.

  17. Discourse Obligations • Obligations – Like Speech Acts, produce observable effects on the speaker.

  18. Discourse Obligations
     • Inherent tension between obligations and goals
     • Approaches:
       • Perform all obligated actions
       • Perform only actions that will lead to a desired state
       • A blend of the other two approaches

  19. TRAINS Discourse Obligations
     loop
         if system has obligations then
             address obligations
         else if system has performable intentions then
             perform actions
         else
             deliberate on goals
         end if
     end loop
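A runnable paraphrase of this control loop, assuming simple queues for obligations and intentions (the queue contents and handler names are invented placeholders, not TRAINS code):

    # Obligations take priority over intentions, which take priority over deliberation.
    from collections import deque

    obligations = deque(["answer_user_question"])   # pending discourse obligations
    intentions = deque(["state_next_plan_step"])    # performable intentions
    goals = ["complete_delivery_plan"]              # longer-term goals

    def step():
        """One pass of the TRAINS-style deliberation loop."""
        if obligations:
            return ("address_obligation", obligations.popleft())
        if intentions:
            return ("perform_action", intentions.popleft())
        return ("deliberate_on_goal", goals[0])

    for _ in range(3):
        print(step())
    # ('address_obligation', 'answer_user_question')
    # ('perform_action', 'state_next_plan_step')
    # ('deliberate_on_goal', 'complete_delivery_plan')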

  20. TRAINS Discourse Obligations
     • Ensure system cooperation even if the response is in conflict with the user's goals
     • Aids in developing mixed initiative:
       • Goal-driven actions → speaker-led initiative
       • Obligation-driven actions → other-led initiative

  21. Mutual Belief and Grounding
     • Conversational agents do not act in isolation
     • Mental states should account for:
       • Private beliefs
       • Shared beliefs
     • In TRAINS, shared belief is needed for:
       • Modeling the domain plan under construction
       • Common understanding

  22. Mutual Belief and Grounding • Extended Conversation Acts

  23. The TRAINS Approach
     • Attempts to capture the processes underlying dialogue via:
       • Speech acts
       • Discourse obligations
       • Mutual belief and grounding
     • Potentially rigid:
       • Rules and logic are handcrafted

  24. Dialogue as a Markov Decision Process “When one admits that nothing is certain, one must, I think, also admit that some things are much more nearly certain than others.” – Bertrand Russell

  25. Flexible Dialogue
     • Qualities of robust dialogue:
       • Flexible conversation flow
       • Adapted to users' preferences / skill levels
       • Resilient to errors in understanding
     • The dialogue author's dilemma: robustness vs. effort
     • Other issues:
       • Noisy channels: ASR, NLU
       • Evaluation: what is an optimal decision?

  26. Dialogue with Uncertainty
     • Markov Decision Process (MDP)
       • A probabilistic framework
       • Able to model planning and decision-making over time
     • Based on the Markov assumption (written out below):
       • Future states depend only on the current state
       • Future states are independent of other past states
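In the notation of slide 28, the Markov assumption is the standard conditional-independence statement that history before the current state and action adds nothing:

    P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)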

  27. Markov Decision Processes
     • Markov chains with choice!

  28. Markov Decision Processes
     • An agent-based process defined by a 4-tuple:
       • S: a set of states describing the agent's world
       • A: a set of actions the agent may take
       • T: a set of transition probabilities: P_a(s, s') = P(s' | s, a) = P(s_{t+1} = s' | s_t = s, a_t = a)
       • R: a set of rewards, where r(s, a) is the expected immediate reward the agent receives for taking action a in state s

  29. Markov Decision Processes
     • Policy function π(s)
       • A mapping of states to actions
       • The optimal policy π*(s) yields the highest possible cumulative reward
       • An MDP with a fixed policy is a Markov chain
     • Rewards and cumulative reward (see the formula below)
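The slide's reward formulas did not survive transcription; the standard discounted form of the cumulative reward (return), which the optimal policy maximizes in expectation, is:

    R = \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t), \qquad 0 \le \gamma < 1, \qquad \pi^{*} = \arg\max_{\pi} E[R \mid \pi]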

  30. Solving an MDP
     • Approaches: value iteration, policy iteration, Q-learning
     • Ideally:
       • Encode the state space with relevant features and rewards
       • Compute state-transition and reward probabilities directly from a corpus of annotated dialogues
     • In practice:
       • Reduce the state space and do random exploration
       • Simulate a user and produce a corpus
     (A value-iteration sketch follows.)
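A minimal value-iteration sketch over the 4-tuple of slide 28; the toy dialogue-flavored states, transitions, and rewards below are invented for illustration:

    # Value iteration on a toy MDP (all names and numbers invented).
    states = ["need_info", "have_info", "done"]
    actions = ["ask", "confirm"]

    # T[(s, a)] -> list of (next_state, probability); R[(s, a)] -> immediate reward
    T = {
        ("need_info", "ask"):     [("have_info", 0.8), ("need_info", 0.2)],
        ("need_info", "confirm"): [("need_info", 1.0)],
        ("have_info", "ask"):     [("have_info", 1.0)],
        ("have_info", "confirm"): [("done", 0.9), ("need_info", 0.1)],
        ("done", "ask"):          [("done", 1.0)],
        ("done", "confirm"):      [("done", 1.0)],
    }
    R = {(s, a): (0 if s == "done" else -1) for s in states for a in actions}

    gamma, V = 0.9, {s: 0.0 for s in states}
    for _ in range(100):   # synchronous Bellman backups to near-convergence
        V = {s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)])
                    for a in actions)
             for s in states}

    # Greedy policy extraction from the converged values.
    policy = {s: max(actions,
                     key=lambda a: R[(s, a)] + gamma *
                         sum(p * V[s2] for s2, p in T[(s, a)]))
              for s in states}
    print(policy)   # {'need_info': 'ask', 'have_info': 'confirm', 'done': 'ask'}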

  31. Reinforcement Learning for Dialogue Policy Design
     • NJFun system [Singh et al., 2002]

  32. NJFun Sample Dialogue
     S1: Welcome to NJFun. How may I help you?
     U1: I'd like to find um winetasting in Lambertville in the morning. (ASR: I'd like to find out wineries the in the Lambertville in the morning)
     S2: Did you say you are interested in Lambertville?
     U2: Yes.
     S3: Did you say you want to go in the morning?
     U3: Yes.
     S4: I found a winery near Lambertville that is open in the morning. It is the Poor Richard's Winery in Lambertville. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'.
     U4: Good.
     S5: Goodbye and have a nice day.

  33. NJFun – State Spaces
     • Full state space: encode everything
       • State explosion quickly makes optimization intractable
     • State-space estimator:
       • Encode sufficient information to learn good decisions

  34. NJFun – State Space Estimator
     • Limited policy optimization for two types of decisions:
       • Initiative: direct vs. open-ended
         • System initiative: "Please say the name of the town or city that you are interested in."
         • User initiative: "Please give me more information."
       • Confirmation: verify vs. assume
         • "Did you say you are interested in <location>?"

  35. NJFun State Features & Values

  36. State Space Estimator
     • The features yield 62 possible dialogue states
     • 42 are choice states, each with 2 actions per state:
       • Confirm / don't confirm
       • System / user initiative
     • In total, 2^42 unique dialogue policies

  37. Finding an Optimal Policy
     • Gathering training data:
       • A new system built with a randomized dialogue policy
       • Deployed to 54 users, each assigned 6 tasks
       • 311 dialogues in total
     • Reward measure: binary task completion
       • +1: dialogues that queried for the exact set of attributes (activity type, location, time of day, etc.)
       • -1: otherwise
     • Reinforcement learning over the logged dialogues (see the sketch below)
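The paper estimates an MDP from the 311 logged dialogues and solves it with value iteration; as a simpler stand-in, the sketch below Monte Carlo-averages the binary reward at each (state, action) choice point, just to show the shape of learning a policy from a randomly explored corpus. The episode data is invented:

    # Averaging binary task-completion reward over logged choice points (illustrative).
    from collections import defaultdict

    # Each episode: the (state, action) choices made, plus the final +1/-1 reward.
    episodes = [
        ([("greet", "user_initiative"), ("low_conf", "confirm")], +1),
        ([("greet", "user_initiative"), ("low_conf", "no_confirm")], -1),
    ]

    Q, N = defaultdict(float), defaultdict(int)
    for choices, reward in episodes:
        for sa in choices:                       # credit the episode reward to each choice
            N[sa] += 1
            Q[sa] += (reward - Q[sa]) / N[sa]    # running average of observed reward

    best = max(["confirm", "no_confirm"], key=lambda a: Q[("low_conf", a)])
    print(best)   # 'confirm': the higher average reward in this toy log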

  38. Finding an Optimal Policy
     • The learned policy:
       • Initiative: begin with user initiative, backing off to mixed or system initiative
       • Confirmation: in general, confirm at lower ASR confidence values
       • Other features describe more complex interventions

  39. Evaluating the Optimal Policy
     • The system with the learned policy was tested on an additional 21 users
       • 124 test dialogues
     • Did not significantly outperform the baseline on the binary completion measure (p = 0.059)
     • Statistically significant improvement on the weak completion and ASR measures

  40. Limited Observability
     • MDPs assume the world is fully observable
     • However:
       • Not all errors or states are directly observable
       • Undetected errors may propagate
       • Evidence may not indicate an error

  41. Limited Observability

  42. Partially Observable Markov Decision Processes (POMDPs)
     • Intuition:
       • Maintain parallel hypotheses about what was said
       • Backpedal or switch strategies when a hypothesis becomes sufficiently unlikely

  43. POMDP Example (System / User / [ASR hypothesis])
     • Initial state – traditional method: Order { size: <null> }
     • S: How can I help you?
       U: A small pepperoni pizza. [A small pepperoni pizza.]
       Traditional method: Order { size: small }
     • S: Ok, what toppings?
       U: A small pepperoni [A small pepperoni]
       Traditional method: Order { size: small }
     • S: And what type of crust?
       U: Uh just normal [large normal]
       Traditional method: Order { size: large(?) }
     (At each turn the slide also shows the POMDP belief state b as a bar chart over sizes S/M/L, accumulating evidence across turns rather than overwriting a single value.)

  44. A comparison of Markov Models Table courtesy of http://www.cassandra.org/pomdp/pomdp-faq.shtml

  45. POMDPs
     • Extends the MDP model with:
       • O: a set of observations the agent can receive about the world
       • Z: observation probabilities
       • b(s): the belief state, the probability of being in state s
     • The agent is not in a fixed state; instead it maintains a probability distribution over all possible states (see the belief-update sketch below)
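A sketch of the standard belief update, b'(s') ∝ Z(o | s', a) · Σ_s T(s' | s, a) · b(s), with toy numbers standing in for ASR confidence (not the actual SDS-POMDP model):

    # Belief monitoring over a two-state user goal (all probabilities invented).
    states = ["wants_small", "wants_large"]
    b = {"wants_small": 0.5, "wants_large": 0.5}            # initial uniform belief

    # The user's goal does not change, so transitions are the identity.
    T = {s: {s2: 1.0 if s2 == s else 0.0 for s2 in states} for s in states}
    Z = {   # P(ASR result | true goal), a stand-in for ASR confidence scores
        "wants_small": {"heard_small": 0.8, "heard_large": 0.2},
        "wants_large": {"heard_small": 0.3, "heard_large": 0.7},
    }

    def update(b, obs):
        """One belief update: predict through T, weight by Z, renormalize."""
        b2 = {s2: Z[s2][obs] * sum(T[s][s2] * b[s] for s in states) for s2 in states}
        norm = sum(b2.values())
        return {s: p / norm for s, p in b2.items()}

    b = update(b, "heard_small")
    print(b)   # mass shifts toward 'wants_small' (about 0.73 vs 0.27)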

  46. POMDPs
     • Belief monitoring: shifting probability mass to match observations
     • The optimal action depends only on the agent's current belief state

  47. POMDP Influence Diagram
     (Diagram legend: R = reward, A = action, S = state, O = observation.)

  48. POMDPs for Spoken Dialogue Systems
     • SDS-POMDP [Williams and Young, 2007]
     • Claim: POMDPs perform better for SDS because they:
       • Maintain parallel dialogue states
       • Can incorporate ASR confidence scores directly in the belief-state update

  49. SDS-POMDP Architecture

  50. SDS-POMDP Components
