
A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces

Dan Bohus, www.cs.cmu.edu/~dbohus, dbohus@cs.cmu.edu, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15217.


Presentation Transcript


  1. A Scalable Reinforcement Learning Approach to Error Handling in Spoken Language Interfaces Dan Bohus www.cs.cmu.edu/~dbohus dbohus@cs.cmu.edu Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15217

  2. problem: spoken language interfaces lack robustness when faced with understanding errors.

  3. more concretely … (recognition results in brackets)
     S: What city are you leaving from?
     U: Urbana Champaign [OKAY IN THAT SAME PAY]
     S: Sorry I’m not sure I understood what you said. What city are you leaving from?
     U: Urbana Champaign [FOR MINUTE SINCE HEY]
     S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
     U: Chicago [CHICAGO]
     S: traveling from Chicago. Where would you like to go?
     U: Huntsville [SEOUL]
     S: traveling to Seoul. What day did you need to travel?
     U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
     S: traveling in the afternoon. Okay what day would you be departing Chicago
     U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
     S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
     U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
     S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

  4. problem source
     • stems mostly from speech recognition
     • spans most domains and interaction types
     • exacerbated by operating conditions
       • spontaneous speech
       • medium / large vocabularies
       • large, varied, and changing user populations

  5. speech recognition impact
     • typical word-error-rates
       • 10-20% for native speakers (novice users)
       • 40% and above for non-native users
     • significant negative impact on performance [Walker, Sanders]
     [plot: task success vs. word-error-rate]
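The slide quotes word-error-rates without defining them. As a minimal sketch, assuming the usual definition (word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words), the figure can be computed as below. The function name `word_error_rate` and the case-insensitive whitespace tokenization are illustrative assumptions, not part of the presentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this ratio can exceed 1.0 when the recognizer produces many more words than were spoken.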

  6. approaches for increasing robustness
     • fix recognition
     • gracefully handle errors through interaction
       • detect the problems
       • develop a set of recovery strategies
       • know how to choose between them (policy)
     a closer look : RL in spoken dialog systems : current challenges : RL for error handling

  7. outline
     • a closer look at the problem
     • RL in spoken dialog systems
     • current challenges
     • a proposed RL approach for error handling

  8. non- and misunderstandings (labels on the slide: NON-understanding, MIS-understanding)
     S: What city are you leaving from?
     U: Urbana Champaign [OKAY IN THAT SAME PAY]
     S: Sorry I’m not sure I understood what you said. What city are you leaving from?
     U: Urbana Champaign [FOR MINUTE SINCE HEY]
     S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
     U: Chicago [CHICAGO]
     S: traveling from Chicago. Where would you like to go?
     U: Huntsville [SEOUL]
     S: traveling to Seoul. What day did you need to travel?
     U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
     S: traveling in the afternoon. Okay what day would you be departing Chicago
     U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
     S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
     U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
     S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

  9. six not-so-easy pieces (detection, strategies, policy × misunderstandings, non-understandings)
     misunderstandings
     • detection: recognition or semantic confidence scores
     • strategies: explicit confirmation (“Did you say 10am?”); implicit confirmation (“Starting at 10am… until what time?”); accept, reject
     • policy: confidence threshold model on a 0 to 1 scale (reject, explicit, implicit, accept)
     non-understandings
     • detection: typically trivial [some exceptions may apply]
     • strategies: “Sorry, I didn’t catch that …”; “Can you repeat that?”; “Can you rephrase that?”; “You can say something like ‘at 10 a.m.’”; [MoveOn]
     • policy: handcrafted heuristics: first notify, then ask repeat, then give help, then give up
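The two handcrafted policies on this slide can be sketched in a few lines. This is only an illustration: the threshold values (0.3, 0.6, 0.9), the ordering of the regions from low to high confidence, and the function and action names are all assumptions, since the slide gives the region labels and the heuristic sequence but no numbers or code.

```python
# Hypothetical thresholds; the slide only shows a 0-to-1 confidence scale
# split into reject / explicit / implicit / accept regions.
REJECT_BELOW = 0.3
EXPLICIT_BELOW = 0.6
IMPLICIT_BELOW = 0.9


def misunderstanding_action(confidence: float) -> str:
    """Confidence threshold model: map a confidence score to a strategy."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < REJECT_BELOW:
        return "reject"
    if confidence < EXPLICIT_BELOW:
        return "explicit_confirm"   # "Did you say 10am?"
    if confidence < IMPLICIT_BELOW:
        return "implicit_confirm"   # "Starting at 10am... until what time?"
    return "accept"


# Handcrafted heuristic from the slide: notify, ask repeat, give help, give up.
NON_UNDERSTANDING_SEQUENCE = ["notify", "ask_repeat", "give_help", "give_up"]


def non_understanding_action(consecutive_failures: int) -> str:
    """Pick the next recovery strategy after N consecutive non-understandings."""
    if consecutive_failures < 1:
        raise ValueError("expected at least one non-understanding")
    index = min(consecutive_failures, len(NON_UNDERSTANDING_SEQUENCE)) - 1
    return NON_UNDERSTANDING_SEQUENCE[index]
```

Fixed rules like these are what a learned policy would replace: the thresholds and the escalation order become decisions to optimize rather than constants.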

  10. outline
     • a closer look at the problem
     • RL in spoken dialog systems
     • current challenges
     • a proposed RL approach for error handling

  11. spoken dialog system architecture
     [diagram: Speech Recognition → Language Understanding → Dialog Manager (connected to the Domain Back-end) → Language Generation → Speech Synthesis]

  12. reinforcement learning in dialog systems
     • debate over design choices
     • learn choices using reinforcement learning
       • agent interacting with an environment
       • noisy inputs
       • temporal / sequential aspect
       • task success / failure
     [diagram: Speech Recognition → Language Understanding → (noisy semantic input) → Dialog Manager (connected to the Domain Back-end) → (actions / semantic output) → Language Generation → Speech Synthesis]

  13. NJFun
     • “Optimizing Dialog Management with Reinforcement Learning: Experiments with the NJFun System” [Singh, Litman, Kearns, Walker]
     • provides information about “fun things to do in New Jersey”
     • slot-filling dialog
       • type-of-activity
       • location
       • time
     • provide information from a database

  14. NJFun as an MDP
     • define state-space
     • define action-space
     • define reward structure
     • collect data for training & learn policy
     • evaluate learned policy

  15. NJFun as an MDP: state-space
     • internal system state: 14 variables
     • state for RL → vector of 7 variables
       • greet: has the system greeted the user
       • attribute: which attribute the system is currently querying
       • confidence: recognition confidence level (binned)
       • value: whether a value has been obtained for the current attribute
       • tries: how many times the current attribute was asked
       • grammar: whether a non-restrictive or restrictive grammar was used
       • history: was there any trouble on previous attributes
     • 62 different states
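One way to picture the 7-variable state vector is as an immutable record that can index a table of values or counts. The sketch below is an assumption-laden illustration: the class name `NJFunState`, the field types, and the example values are hypothetical, and the slide does not give the value ranges of each variable.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)  # frozen makes instances hashable, so they can key a table
class NJFunState:
    greet: bool        # has the system greeted the user
    attribute: int     # index of the attribute currently being queried
    confidence: int    # binned recognition confidence level
    value: bool        # has a value been obtained for the current attribute
    tries: int         # how many times the current attribute was asked
    grammar: str       # "restrictive" or "non-restrictive" (assumed encoding)
    history: bool      # was there trouble on previous attributes


# Example: a tabular value function keyed by state.
state = NJFunState(
    greet=True, attribute=1, confidence=2, value=True,
    tries=1, grammar="non-restrictive", history=False,
)
state_values = {state: 0.0}
```

Keeping the RL state this small is what makes a tabular approach feasible: with only 62 reachable states, every state can be visited often enough in a few hundred dialogs.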

  16. NJFun as an MDP: actions & rewards
     • type of initiative (3 types)
       • system initiative
       • mixed initiative
       • user initiative
     • confirmation strategy (2 types)
       • explicit confirmation
       • no confirmation
     • resulting MDP has only 2 action choices / state
     • reward: binary task success

  17. NJFun as an MDP: learning a policy
     • training data: 311 complete dialogs
       • collected using exploratory policy
     • learned the policy using value iteration
       • begin with user initiative
       • back-off to mixed or system initiative when re-asking for an attribute
       • specific type of back-off is different for different attributes
       • confirm when confidence is low
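Value iteration itself is short enough to sketch. The example below is not the NJFun model: the toy MDP, its state names, transition probabilities, and discount factor are invented for illustration, and in the real setting the transition model would be estimated from the logged dialogs. It only shows how a "confirm when confidence is low" policy can fall out of the computation.

```python
from typing import Dict, List, Tuple

Outcome = Tuple[float, str, float]          # (probability, next_state, reward)
MDP = Dict[str, Dict[str, List[Outcome]]]   # state -> action -> outcomes


def q_value(outcomes: List[Outcome], values: Dict[str, float], gamma: float) -> float:
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(mdp: MDP, gamma: float = 0.95, tol: float = 1e-9):
    values = {state: 0.0 for state in mdp}
    while True:
        delta = 0.0
        for state, actions in mdp.items():
            if not actions:                 # terminal state: value stays 0
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda name: q_value(actions[name], values, gamma))
        for state, actions in mdp.items() if actions
    }
    return values, policy


# Hypothetical toy model: confirming costs one extra turn (discounted),
# but makes task success more likely. Reward is binary task success.
TOY_MDP: MDP = {
    "low_conf": {
        "explicit_confirm": [(1.0, "confirmed", 0.0)],
        "no_confirm": [(0.5, "success", 1.0), (0.5, "failure", 0.0)],
    },
    "high_conf": {
        "explicit_confirm": [(1.0, "confirmed", 0.0)],
        "no_confirm": [(0.95, "success", 1.0), (0.05, "failure", 0.0)],
    },
    "confirmed": {"proceed": [(0.97, "success", 1.0), (0.03, "failure", 0.0)]},
    "success": {},
    "failure": {},
}

values, policy = value_iteration(TOY_MDP)
```

In this toy model the resulting policy confirms explicitly in `low_conf` and skips confirmation in `high_conf`, because the discount makes the extra confirmation turn worthwhile only when the unconfirmed success probability is low.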

  18. NJFun as an MDP: evaluation
     • evaluated policy on 124 testing dialogs
       • task success rate: 52% → 64%
       • weak task completion: 1.72 → 2.18
       • subjective evaluation: no significant improvements, but move-to-the-mean effect
     • learned policy better than hand-crafted policies
       • comparatively evaluated policies on learned MDP

  19. outline
     • a closer look at the problem
     • RL in spoken dialog systems
     • current challenges
     • a proposed RL approach for error handling

  20. challenge 1: scalability
     • contrast NJFun with RoomLine
       • conference room reservation and scheduling
       • mixed-initiative task-oriented interaction
       • system obtains list of rooms matching initial constraints
       • system negotiates with user to identify room that best matches their needs
       • 37 concepts (slots), 25 questions that can be asked
     • another example: LARRI
     • full-blown MDP is intractable
     • not clear how to do state-abstraction

  21. challenge 2: reusability
     • underlying MDP is system-specific
       • MDP design still requires a lot of human expertise
       • new MDP for each system
       • new training & new evaluation
     • are we really saving time & expertise?
     • maybe we’re asking for too much?

  22. addressing the scalability problem
     • approach 1: user models / simulations
       • costly to obtain real data → simulate
       • simplistic simulators [Eckert, Levin]
       • more complex, task-specific simulators [Scheffler & Young]
       • real-world evaluation becomes paramount
     • approach 2: value function approximation
       • data-driven state abstraction / state aggregation [Denecke]

  23. outline
     • a closer look at the problem
     • RL in spoken dialog systems
     • current challenges
     • a proposed RL approach for error handling

  24. reinforcement learning in dialog systems
     • Focus RL only on the difficult decisions!
     [diagram: Speech Recognition → Language Understanding → (semantic input) → Dialog Manager (connected to the Domain Back-end) → (actions / semantic output) → Language Generation → Speech Synthesis]

  25. task-decoupled approach
     • decouple
       • error handling decisions: use reinforcement learning
       • domain-specific dialog control decisions: use your favorite DM framework
     • advantages
       • reduces the size of the learning problem
       • favors reusability of learned policies
       • lessens system authoring effort

  26. RavenClaw [diagram]
     • Dialogue Task (Specification): RoomLine task tree with Login (Welcome, AskRegistered, AskName, GreetUser), GetQuery (DateTime, Location, Properties: Network, Projector, Whiteboard), GetResults, DiscussResults; concepts: registered, user_name, query, results
     • Domain-Independent Dialogue Engine: Dialogue Stack (e.g. ExplicitConfirm, AskRegistered, Login, RoomLine), Expectation Agenda, Error Handling Decision Process (ErrorIndicators, Strategies)
     • Expectation Agenda entries shown: registered: [No] -> false, [Yes] -> true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network]

  27. decision process architecture
     • Small-size models
     • Parameters can be tied across models
     • Accommodate dynamic task generation
     • Favors reusability of policies
     • Initial policies can be easily handcrafted
     • Independence assumption
     [diagram: RoomLine task tree (Login: Welcome, GreetUser, AskRegistered, AskName) with a Topic-MDP attached to topics and a Concept-MDP attached to the concepts user_name and registered; each MDP proposes an action (Explicit Confirm or No Action) and a Gating Mechanism selects among them, here Explicit Confirmation]
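The architecture above, many small MDPs each proposing an action and a gate selecting one, can be sketched as follows. The slide does not say how the gating mechanism decides, so the rule used here (drop "NoAction" proposals, then take the one with the highest estimated utility) is purely an assumption, as are the names `Proposal` and `gate`.

```python
from typing import List, NamedTuple, Optional


class Proposal(NamedTuple):
    mdp_name: str     # which concept- or topic-level MDP made the proposal
    action: str       # e.g. "ExplicitConfirm" or "NoAction"
    utility: float    # the proposing MDP's own estimate of the action's value


def gate(proposals: List[Proposal]) -> Optional[Proposal]:
    """Assumed gating rule: best-scoring proposal that actually does something."""
    candidates = [p for p in proposals if p.action != "NoAction"]
    if not candidates:
        return None   # no error handling needed; the dialog task proceeds
    return max(candidates, key=lambda p: p.utility)


# Mirrors the slide's example: only the user_name concept wants to act.
proposals = [
    Proposal("concept:user_name", "ExplicitConfirm", 0.7),
    Proposal("concept:registered", "NoAction", 0.0),
    Proposal("topic:Login", "NoAction", 0.0),
]
selected = gate(proposals)
```

Because each MDP sees only its own concept or topic, new MDPs can be instantiated as the task tree grows, which is one way to read the slide's "accommodate dynamic task generation" point.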

  28. reward structure & learning
     • Rewards based on any dialogue performance metric
     • Global, post-gate rewards
       • Atypical, multi-agent reinforcement learning setting
     • Local rewards
       • Multiple, standard RL problems
       • Risk solving local problems, but not the global one
     [diagrams: several MDPs feeding a Gating Mechanism that emits an Action; in one the Reward is assigned after the gate, in the other each MDP receives its own Reward]

  29. conclusion
     • reinforcement learning: a very appealing approach for dialog control
     • in practical systems, scalability is a big issue
     • how to leverage knowledge we have?
     • state-space design
     • solutions that account for or handle sparse data
     • bounds on policies
     • hierarchical models

  30. thank you!

  31. Structure of Individual MDPs
     • Concept MDPs
       • State-space: belief indicators
       • Action-space: concept scoped system actions
       [diagram: states 0, LC, MC, HC; ExplConf, ImplConf and NoAct available in LC, MC and HC; NoAct in 0]
     • Topic MDPs
       • State-space: non-understanding, dialogue-on-track indicators
       • Action-space: non-understanding actions, topic-level actions
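A concept MDP of this size fits in a small lookup table. The sketch below assumes a reading of the diagram in which LC, MC and HC stand for low, medium and high confidence and each offers ExplConf, ImplConf and NoAct, while state 0 offers only NoAct. The slides do not name a learning algorithm; tabular Q-learning is swapped in here purely for illustration, with made-up learning rate and discount.

```python
from typing import Dict

# Assumed reading of the diagram: available actions per belief state.
CONCEPT_MDP_ACTIONS = {
    "0": ["NoAct"],
    "LC": ["ExplConf", "ImplConf", "NoAct"],
    "MC": ["ExplConf", "ImplConf", "NoAct"],
    "HC": ["ExplConf", "ImplConf", "NoAct"],
}

QTable = Dict[str, Dict[str, float]]


def make_q_table() -> QTable:
    """One Q-value per (state, action) pair, initialized to zero."""
    return {s: {a: 0.0 for a in acts} for s, acts in CONCEPT_MDP_ACTIONS.items()}


def q_update(q: QTable, state: str, action: str, reward: float,
             next_state: str, alpha: float = 0.1, gamma: float = 0.95) -> None:
    """Standard tabular Q-learning update (illustrative choice of algorithm)."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


def greedy_action(q: QTable, state: str) -> str:
    return max(q[state], key=q[state].get)


# One hypothetical experience: explicit confirmation in LC led to HC, reward 1.
q = make_q_table()
q_update(q, "LC", "ExplConf", reward=1.0, next_state="HC")
```

With only ten state-action pairs per concept, the same table could plausibly be shared across concepts, which is one way to read the earlier point about tying parameters across models.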
