MITRE Dialog Management Workshop – a review
Dan Bohus
Dialogs on Dialogs reading group
CMU, November 2003
The Workshop
• MITRE Dialog Workshop
  • @ MITRE, Bedford/Boston
  • October 27-28, 2003
• Idea
  • Bring together researchers working on dialog management
  • Give them homework
    • Adapt your dialog manager to a medical diagnosis domain (details in a sec)
  • Discuss, compare, learn
MITRE Dialog Management Workshop workshop: godis : ravenclaw : collagen : themes
The Homework
• Implement a dialog system for the medical diagnosis domain
  • Task left open-ended (diagnosis, tutoring, etc.)
  • No speech, just text in and out
• Backend provided: backend.doc
  • Java version and web-based interface version
  • 3 diseases: malaria, coccidioidomycosis, another one
  • List of symptoms: headache, nausea, muscle pain, etc.
  • Decision tree involving symptoms and tests (fever, blood tests, travel patterns, etc.)
• Small enough to presumably not be a lot of work, but large enough to illustrate functionality and provide a skeleton for the discussions…
Participants
• MITRE (Carl Burke et al.): MiDiKi
• Gothenburg (Staffan Larsson): GoDiS (TRINDIKit)
• USC ICT (David Traum): ICT Dialogue Manager
• NTT/CMU (Matthias Denecke): Ariadne
• CMU (Dan, Alex): RavenClaw
• Ames (Beth-Ann Hockey): NASA Dialogue Manager
• DFKI (Norbert Reithinger): DFKI Dialogue Manager
• MERL (Candy Sidner, Charles Rich): COLLAGEN
• … and others invited but not present
GoDiS
GoDiS
• TRINDIKit: information state update dialogue management toolkit
  • Information state
    • Private: dialog plan, beliefs, agenda (short term goals)
    • Shared: established facts, QUD (questions under discussion), last utterance information
  • Dialog moves
  • Update rules
• GoDiS: dialog management system implemented in TRINDIKit, handling:
  • information oriented dialogue
  • action oriented dialogue
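The information-state idea above can be sketched in a few lines of Python. This is only an illustration: the class names, the field names and the single update rule `integrate_answer` are hypothetical and do not reproduce TRINDIKit's own rule notation or any actual GoDiS rule.

```python
from dataclasses import dataclass, field


@dataclass
class Private:
    plan: list = field(default_factory=list)      # dialog plan: issues still to raise
    beliefs: set = field(default_factory=set)     # what the system believes privately
    agenda: list = field(default_factory=list)    # short-term goals


@dataclass
class Shared:
    facts: set = field(default_factory=set)       # established facts
    qud: list = field(default_factory=list)       # questions under discussion, top is last
    last_utterance: tuple = None                  # (speaker, dialog move, content)


@dataclass
class InformationState:
    private: Private = field(default_factory=Private)
    shared: Shared = field(default_factory=Shared)


def integrate_answer(state):
    """Hypothetical update rule: if the last move answers the topmost QUD,
    store the answer as a shared fact and pop the question."""
    if state.shared.last_utterance is None or not state.shared.qud:
        return False
    speaker, move, content = state.shared.last_utterance
    question = state.shared.qud[-1]
    if move == "answer" and content[0] == question:
        state.shared.facts.add(content)
        state.shared.qud.pop()
        return True
    return False


state = InformationState()
state.shared.qud.append("have_fever")
state.shared.last_utterance = ("user", "answer", ("have_fever", "yes"))
applied = integrate_answer(state)
```

An update rule here is just a precondition check on the state followed by its effects, which is the general shape the slide refers to.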
TRINDIKit / GoDiS architecture
[Figure: modules control, input, interpret, update, select (DME), generate and output around the TIS; resources DEVICES (backend interface, connection to Java backend), LEXICON (lexicon) and DOMAIN (domain knowledge: dialog plans, ontology)]
GoDiS: Task Representation
• Plans; propositional logic
• Dialogue plans for dealing with diagnosis (issues opened at dialogue start)
  • ?x.disease(x): “Which disease is diagnosed?”
  • ?confirmed_by_interview: “Is the diagnosis confirmed by additional information?”
  • ?confirmed_by_tests: “Is the diagnosis confirmed by medical tests?”
• Additional plans
  • ?x.info(x): “What information is there about a given disease?”
  • ?x.treatment(x): “What treatment is there for a given disease?”
GoDiS: Alternate Tasks
• User-driven dialogue (implemented)
  • Do not load issues when resetting; user has to raise all issues
  • User can ask system to
    • Provide a diagnosis
    • Confirm whether the user has a given disease
• Decision trees as dialogue plans
  • Move backend knowledge into dialogue plans
  • Information conversion could be done automatically
• Separate genre: expert system dialogue
  • Add special purpose update rules
  • Dynamic dialogue planning by expert
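The difference between the default setup (diagnosis issues opened at dialogue start) and the user-driven variant (nothing loaded on reset) can be illustrated with a small Python sketch. The issue strings copy the slide notation; the functions `reset` and `raise_issue` are hypothetical names chosen for this illustration, not GoDiS functions.

```python
# Issues are kept as plain strings that mirror the notation on the slides.
DIAGNOSIS_ISSUES = [
    "?x.disease(x)",
    "?confirmed_by_interview",
    "?confirmed_by_tests",
]
ADDITIONAL_ISSUES = ["?x.info(x)", "?x.treatment(x)"]


def reset(user_driven):
    """Return the issues that are open right after a reset."""
    # User-driven variant: nothing is loaded, the user must raise every issue.
    return [] if user_driven else list(DIAGNOSIS_ISSUES)


def raise_issue(open_issues, issue):
    """Add an issue raised by the user, if the system has a plan for it."""
    if issue not in DIAGNOSIS_ISSUES + ADDITIONAL_ISSUES:
        raise ValueError(f"unknown issue: {issue}")
    if issue not in open_issues:
        open_issues.append(issue)
    return open_issues


system_driven = reset(user_driven=False)
user_driven = raise_issue(reset(user_driven=True), "?x.treatment(x)")
```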
GoDiS: Highlights / Lowlights
• Highlights
  • Reuse, you get for free:
    • Grounding
    • Accommodation / plan recognition
    • Multiple simultaneous issues & info sharing
  • High-level abstraction for dialog plans
  • Rapid prototyping
• Lowlights
  • Not used in this type of domain so far, so not entirely straightforward (update rule changes)
  • Dynamic dialog plans (backend decides)
RavenClaw
RavenClaw
• Dialog Task (Specification)
  • Captures all domain-specific dialog (task) logic with a hierarchical description
  • The authoring effort is focused entirely here
• Domain-independent Dialog Engine
  • Manages dialog by executing the dialog task specification
  • Provides domain-independent conversational strategies
RavenClaw Architecture
[Figure, built up over several animation steps: dialog task tree, dialog stack and expectation agenda]
• Concepts: general_feeling, have_fever, headache, diagnostic, chart
• Dialog task tree: Madeleine (root); I:Welcome; E:LoadSymptoms; GeneralFeel (R:HowAreYou?, I:Glad, I:Sorry); Diagnose; Fever (R:AskFever, E:MeasureTemp, I:InformFever); Travel; once LoadSymptoms has run: R:Headache, R: …
• Dialog stack, step by step (top first): Madeleine; Welcome / Madeleine; LoadSymptoms / Madeleine; GeneralFeel / Madeleine; HowAreYou / GeneralFeel / Madeleine
• Exchange
  • System: Hi, this is Madeleine, the automated…
  • System: How are you feeling today?
  • User: Not so good, I think I have a fever
• Expectation agenda while HowAreYou is on top of the stack
  • general_feeling: [good], [bad], [soso]
  • have_fever: [fever], ![yes], ![no]
  • headache: [headache], ![yes], ![no]
  • cough: [cough], ![yes], ![no]
  • …
• Parsed input: [soso](not so good) [fever](I think I have a fever)
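How the expectation agenda turns one user utterance into several concept updates can be sketched in Python. This is a simplified illustration, not RavenClaw code: `AGENDA`, `bind_concepts` and the tuple layout are hypothetical, and reading the "!" prefix as "accepted only while the owning agent is in focus" is an assumption made for this sketch.

```python
# Each agenda entry: (agent, concept, [(grammar slot, focus_only), ...]).
# focus_only stands for the "!" prefix on the slide; its meaning here
# ("only accepted when the agent is in focus") is an assumption.
AGENDA = [
    ("HowAreYou", "general_feeling",
     [("good", False), ("bad", False), ("soso", False)]),
    ("AskFever", "have_fever",
     [("fever", False), ("yes", True), ("no", True)]),
    ("Headache", "headache",
     [("headache", False), ("yes", True), ("no", True)]),
]


def bind_concepts(parsed_slots, agenda, focused_agent):
    """Bind each parsed slot to the first concept on the agenda that expects it."""
    bindings = {}
    for slot, text in parsed_slots:
        for agent, concept, expected in agenda:
            match = [focus_only for name, focus_only in expected if name == slot]
            if not match:
                continue                      # this agent does not expect the slot
            if match[0] and agent != focused_agent:
                continue                      # "!" slot, but the agent is not in focus
            bindings.setdefault(concept, (slot, text))
            break
    return bindings


stack = ["Madeleine", "GeneralFeel", "HowAreYou"]   # top of the stack is last
parsed = [("soso", "not so good"), ("fever", "I think I have a fever")]
bindings = bind_concepts(parsed, AGENDA, focused_agent=stack[-1])
```

With the input from the slide, both `general_feeling` and `have_fever` get a value in one turn, even though only the general-feeling question was asked.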
Illustrated Features
• Dynamic generation of dialog task structure
  • Symptoms loaded from backend, appropriate structures to “talk about them” created on-the-fly
  • New symptoms: no DM changes
• Dynamic dialog control policy
  • The order in which symptoms are addressed is controlled by the backend
• Conversational skills
Dynamic Dialog Control
[Figure: the RavenClaw dialog task tree, dialog stack and expectation agenda, with a Backend Decision Tree panel added; the dialog stack shows Diagnose on top of Madeleine]
• System: Hi, this is Madeleine, the automated…
• System: How are you today?
• User: Not so good, I think I have a headache
• System: Sorry to hear you’re not feeling so good. Tell me more about your symptoms…
• System: Do you have abdominal pain?
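A backend-controlled question order can be sketched as follows. Everything in this Python example is hypothetical: `ToyBackend`, `TOY_TREE` and the placeholder outcomes `diagnosis_A` / `diagnosis_B` are invented for illustration and say nothing about the actual workshop backend or about real medical decision rules.

```python
# Toy decision tree: inner nodes name the symptom to ask about next,
# leaves carry a placeholder outcome. Not the workshop backend.
TOY_TREE = {
    "ask": "headache",
    "yes": {
        "ask": "abdominal_pain",
        "yes": {"result": "diagnosis_A"},
        "no": {"result": "diagnosis_B"},
    },
    "no": {
        "ask": "have_fever",
        "yes": {"result": "diagnosis_B"},
        "no": {"result": "no_diagnosis"},
    },
}


class ToyBackend:
    def __init__(self, tree):
        self.node = tree

    def next_symptom(self):
        """Symptom the backend wants addressed next, or None at a leaf."""
        return self.node.get("ask")

    def report(self, answer):
        """Move down the tree according to the user's answer ("yes" / "no")."""
        self.node = self.node[answer]

    def result(self):
        return self.node.get("result")


def run_dialog(backend, answers):
    """Dialog manager loop: the backend, not the DM, picks the next question."""
    asked = []
    while backend.next_symptom() is not None:
        symptom = backend.next_symptom()
        asked.append(symptom)
        backend.report(answers[symptom])
    return asked, backend.result()


asked, result = run_dialog(
    ToyBackend(TOY_TREE), {"headache": "yes", "abdominal_pain": "no"}
)
```

The dialog manager holds no ordering of its own; changing the tree changes the order of questions without touching `run_dialog`.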
Conversational Skills
• Corresponding agencies added automatically to the dialog task tree
  • Help
  • What Can I Say?
  • Repeat
  • Suspend / Resume
  • Start Over
  • Timeout handling (not illustrated)
• Still need all the language generation prompts and grammar, but some of those are develop-once, too
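Adding the domain-independent skills to an authored task tree can be pictured with a short Python sketch. The `Agency` class and `add_conversational_skills` function are hypothetical stand-ins, not RavenClaw's API; the skill names are taken from the list above (timeout handling is left out, as on the slide).

```python
from dataclasses import dataclass, field


@dataclass
class Agency:
    name: str
    children: list = field(default_factory=list)


# Skill names from the slide; how they are implemented is not modeled here.
SKILLS = ["Help", "WhatCanISay", "Repeat", "Suspend", "Resume", "StartOver"]


def add_conversational_skills(root):
    """Attach one agency per skill under the root, skipping any already present."""
    existing = {child.name for child in root.children}
    for skill in SKILLS:
        if skill not in existing:
            root.children.append(Agency(skill))
    return root


# The author only writes the domain-specific part of the tree...
root = Agency("Madeleine", [Agency("Welcome"), Agency("Diagnose")])
# ...and the engine extends it with the generic skills.
add_conversational_skills(root)
names = [child.name for child in root.children]
```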
RavenClaw Conclusion
• Highlights
  • Set task posed no challenges to the framework
  • Easy to implement
  • Dynamic dialog structure and control
  • Automatic use of domain-independent conversational skills
• Lowlights?
  • Toolkit perspective: how easy would it be for someone else to build it?
  • Asynchronous behaviors? (timing)
  • Couple of bugs / fixes (or is that a highlight?)
Collagen
COLLAGEN: Collaborative Interface Agent
[Figure: the two participants communicate with each other, each interacts with the application and observes the other; Collagen maintains the focus stack and plan tree]
COLLAGEN Systems
• air travel planning
• email reading and responding (with IBM/Lotus)
• GUI design tool operation
• car navigation system operation
• airport landing path planning (with MITRE)
• gas turbine operator training (with USC/ISI)
• personal video recorder operation
• programmable thermostat operation (with Delft U.)
• multi-modal web-based form-filling
Collagen: Theory and Implementation
SharedPlan Discourse Theory (Grosz, Sidner, Kraus, Lochbaum 1974-1998) and its Java Implementation
• Intentional: purposes, contributes (Java implementation: purpose tree)
• Attentional: focus spaces, focus stack (Java implementation: focus stack)
• Linguistic: segments, lexical items
Collagen: Discourse Segments and Purposes
(fixing an air compressor, E = expert, A = apprentice) (Grosz, 1974)
• E: Replace the pump and belt please.
• A: Ok, I found a belt in the back.
• A: Is that where it should be?
• A: [removes belt]
• A: It’s done.
• E: Now remove the pump.
• …
• E: First you have to remove the flywheel.
• …
• E: Now take the pump off the base plate.
• A: Already did.
Segment purposes: replace pump and belt, with sub-purposes replace belt and replace pump
Discourse state representation (Grosz & Sidner, 1986)
• Focus Stack: replace belt (current focus space) above replace pump and belt
• Purpose Tree: replace pump and belt, with children replace belt and replace pump
• Discourse so far
  • E: Replace the pump and belt please.
  • A: Ok, I found a belt in the back.
  • A: Is that where it should be?
  • A: [removes belt]
  • A: It’s done
Discourse interpretation algorithm (Lochbaum, 1998)
[Figure: focus stack and purpose tree]
The current (communication or manipulation) act either:
• starts a new segment/focus space (push)
• ends the current segment/focus space (pop)
• continues (contributes to) the current segment/... (add)
An act contributes to the purpose of a segment if it:
• directly achieves the purpose
• is a step in the plan for the purpose *
• identifies the recipe used to achieve the purpose
• identifies who should perform the purpose or a step in the plan
• identifies a parameter of the purpose or a step in the plan
* does not include recursive plan recognition (see later topic)
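The push / pop / add bookkeeping can be sketched in Python using the air-compressor dialog from the earlier slides. This is a simplified illustration, not Collagen's Java implementation: the `Segment` and `DiscourseState` classes are hypothetical, and the decision of whether an act starts, ends or continues a segment is supplied by hand here, whereas the real algorithm derives it from the "contributes" tests listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    purpose: str
    acts: list = field(default_factory=list)
    children: list = field(default_factory=list)


class DiscourseState:
    def __init__(self):
        self.roots = []         # top-level segments (purpose tree roots)
        self.focus_stack = []   # current focus space is the last element

    def push(self, purpose):
        """Start a new segment nested under the current focus space."""
        segment = Segment(purpose)
        if self.focus_stack:
            self.focus_stack[-1].children.append(segment)
        else:
            self.roots.append(segment)
        self.focus_stack.append(segment)

    def add(self, act):
        """Record an act as contributing to the current segment."""
        self.focus_stack[-1].acts.append(act)

    def pop(self):
        """End the current segment."""
        return self.focus_stack.pop()


def interpret(state, act, starts=None, ends=False):
    # The caller classifies the act; plan recognition is not modeled.
    if starts is not None:
        state.push(starts)
    state.add(act)
    if ends:
        state.pop()


state = DiscourseState()
interpret(state, "E: Replace the pump and belt please.", starts="replace pump and belt")
interpret(state, "A: Ok, I found a belt in the back.", starts="replace belt")
interpret(state, "A: Is that where it should be?")
interpret(state, "A: [removes belt]")
interpret(state, "A: It's done.", ends=True)
interpret(state, "E: Now remove the pump.", starts="replace pump")

focus = [segment.purpose for segment in state.focus_stack]
subpurposes = [segment.purpose for segment in state.roots[0].children]
```

After the last act the focus stack holds "replace pump and belt" and "replace pump", while the purpose tree keeps the finished "replace belt" segment as a child of the root.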
COLLAGEN … my take
• Separation of task from dialog/discourse engine
• Recipes / Domain plans / Task tree
  • Full-blown HTN
    • Hierarchical
    • Preconditions (constraints)
    • Effects
    • Completion / failure
    • Live nodes
• Stack to keep track of focus and discourse structure
• Tree explicitly contains agent and user nodes
• Formalized / descriptive recipe specs (actually Java underneath), with procedure overwrites…
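A hierarchical task node with preconditions, effects and "live" steps can be sketched in Python. The `Step` class and `live_steps` function are hypothetical, and treating a live node as "a not-yet-done primitive step whose preconditions currently hold" is an assumption made for this sketch rather than Collagen's definition. The example steps come from the air-compressor dialog (the flywheel has to come off before the pump).

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    who: str = "user"                               # "agent" or "user"
    preconditions: set = field(default_factory=set)
    effects: set = field(default_factory=set)
    substeps: list = field(default_factory=list)
    done: bool = False


def live_steps(step, world):
    """Primitive steps that are not done and whose preconditions hold in `world`."""
    if step.done or not step.preconditions <= world:
        return []
    if not step.substeps:
        return [step]
    live = []
    for sub in step.substeps:
        live.extend(live_steps(sub, world))
    return live


def execute(step, world):
    """Mark a primitive step as completed and apply its effects."""
    step.done = True
    world.update(step.effects)


remove_flywheel = Step("remove flywheel", effects={"flywheel_removed"})
remove_pump = Step("remove pump", preconditions={"flywheel_removed"},
                   effects={"pump_removed"})
replace_pump = Step("replace pump", substeps=[remove_flywheel, remove_pump])

world = set()
before = [s.name for s in live_steps(replace_pump, world)]
execute(remove_flywheel, world)
after = [s.name for s in live_steps(replace_pump, world)]
```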
Themes
Themes: Task Representation
• Task representation
  • Separation of task representation from dialog engine
  • High-level representations of task
    • Descriptive rather than procedural
    • Procedural will be unavoidable for complex tasks
  • Expressive power
• GoDiS, RavenClaw, Collagen: plan based representations of task
Themes: Task/Domain/Genre
• The notion of dialog genre
  • Tutoring
  • Diagnosis
  • Information Access
• Where to fold it in a dialog manager?
  • GoDiS: update/select rules
  • Ariadne: plugins
  • RavenClaw: collapsed with task
• How clear is that separation: task vs. genre?
Themes: Development time
• Systems took on the order of 3-5 days to develop
  • Significant effort in the backend connection
    • Some sites shortcut it
  • Significant effort in grammar/language generation development
    • Some sites shortcut it
• Everyone that had an implementation: “fixed a couple of bugs, but no major changes required”
Themes: Development tools
• Regression testing (GoDiS)
  • Systems are complex. If you change something in a dialog management framework, can you prove that it did not screw up things that used to work?
  • System-wise, very intractable
  • Component-wise, maybe: i.e. DM with DM inputs/outputs
• System diagnosis / log visualization tools (Collagen)
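Component-wise regression testing of a dialog manager (replaying logged DM inputs and comparing DM outputs) can be sketched as below. This is a generic Python illustration, not the GoDiS tooling: `replay`, `make_toy_dm` and the log format are hypothetical.

```python
def replay(dm_factory, logged_turns):
    """Feed logged DM inputs to a fresh DM and collect every output mismatch."""
    dm = dm_factory()
    mismatches = []
    for index, (dm_input, expected_output) in enumerate(logged_turns):
        actual_output = dm(dm_input)
        if actual_output != expected_output:
            mismatches.append((index, dm_input, expected_output, actual_output))
    return mismatches


def make_toy_dm():
    """Hypothetical stand-in for a dialog manager: DM input in, DM output out."""
    def step(dm_input):
        if dm_input == "[soso]":
            return "inform(sorry)"
        if dm_input == "[good]":
            return "inform(glad)"
        return "request(general_feeling)"
    return step


# A log recorded from a version of the DM that is known to behave well.
LOG = [
    ("", "request(general_feeling)"),
    ("[soso]", "inform(sorry)"),
]
mismatches = replay(make_toy_dm, LOG)
```

An empty `mismatches` list means the changed dialog manager still reproduces the logged behavior on this session; it does not prove anything about inputs that were never logged.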
Themes: Timing
• (Micro)timing
  • unaddressed
• Turn-taking models
  • in general, very rudimentary
• Asynchronous behaviors
  • Could be accomplished, but no one seemed to have it
• Multi-party conversation
  • unaddressed
Themes: the important problems
• Different people have different views of what those are:
  • Plan / Intention recognition
  • Reference resolution
  • Backup in complex systems
  • Tense problems
  • Negations
  • Grounding; error prevention / recovery
Themes: Reasoning
• Dialog Managers vs Backends
  • Where to draw the line?
  • Who does the reasoning?
  • Can we avoid duplicating it?
  • How rich is the interaction between them?
• Dialog systems use language to act in a domain, so they are generally strongly tied
• Basic set of conversational skills can be identified
• Drawing that line is still an “art”; no general agreement or solutions exist
Themes: Science of Dialog?
• How much science do we have?
  • Theory vs. experiment
  • Interesting Collagen / RavenClaw similarities
• Representation or not?
• GUI analogy
  • Do we have the checkboxes and radio-buttons?