Applications of Discourse Structure for Spoken Dialogue Systems

Diane Litman • Department of Computer Science & Learning Research and Development Center, University of Pittsburgh (currently Leverhulme Visiting Professor, University of Edinburgh) • Joint work with Mihai Rotaru, University of Pittsburgh

Spoken Dialogue Systems • Systems that interact with users via speech • Advantages • Naturalness • Efficiency • Eyes and hands free • Domains • Information access [Raux et al., 2005; Rudnicky et al., 1999; Zue et al., 2000; Van den Bosch and Lendvai, 2005] • Tutoring [Graesser et al., 2001; Litman and Silliman, 2004; Pon-Barry et al., 2006] • Assistants, Troubleshooting & QA [Allen et al., 2001, 2006; Acomb et al., 2007; ROLAQUAD]

ITSPOKE • ITSPOKE (Intelligent Tutoring SPOKEn Dialogue System) [Litman and Silliman, 2004] • Speech-enabled version of the Why2-Atlas computer tutor [VanLehn, Jordan, Rosé et al., 2002] • Domain: Qualitative physics • Sample ITSPOKE problem • Suppose a man is in a free-falling elevator and is holding his keys motionless right in front of his face. He then lets go. What will be the position of the keys relative to the man's face as time passes? Explain.

Sample dialogue with ITSPOKE
TUTOR1: To analyze this problem we will first describe the motion of the person and his keys while he is holding them. Then we will look at the motion of the person and his keys after he lets go of them. Let's begin by looking at the motion of the man and his keys while he is holding them. How does his velocity compare to that of his keys?
USER1: it’s the same
TUTOR2: Yes. So while he is holding his keys they both have the same velocity. Now let's see what happens when he releases them. To do so we will first find the individual forces exerted on the man and his keys, then both net forces, both accelerations, and finally we will describe and compare the velocities. So what are the forces exerted on the man after he releases his keys? Please, specify their directions (e.g., vertically up).
USER2: vertically down gravity
TUTOR3: Good. Because there is just one force, gravity's, it is trivial to determine the NET force (i.e., the vector sum of all forces). So what is the direction of the NET force on the person?
..............
Dialogue structure (figure labels): Before release; After release; Forces on the man; Net force on the man

Research problem • What is the utility of discourse structure for spoken dialogue systems?

Discourse structure • Discourse – group of utterances • Monologue • Dialogue • Discourse structure • Grosz & Sidner theory [Grosz and Sidner, 1986] • Linguistic structure • Discourse segments • Intentional structure • Discourse segment purpose/intention • Discourse segment hierarchy • Attentional state

Discourse segment hierarchy • Discourse segments: the sample dialogue with ITSPOKE (TUTOR1 through TUTOR3) shown above • Intention/purpose structure:
Solution walkthrough
  Two time frames: before release, after release
  Before release
    Man’s velocity ? keys’ velocity
  After release
    Recipe: Forces → Net force → Acceleration → Velocity
    Man: Forces/acceleration
      Forces on the man
      Net force on the man
      ..............

Why discourse structure? • Useful for other NLP tasks: • Understand specific lexical and prosodic phenomena [Hirschberg and Nakatani, 1996; Levow, 2004; Passonneau and Litman, 1997] • Anaphoric expressions [Allen et al., 2001] • Natural language generation [Hovy, 1993] • Predictive/generative models of posture shifts [Cassell et al., 2001] • Useful for spoken dialogue systems? • 4 intuitions

Intuition 1 – Conditioning • Figure: dialogue timeline with per-turn correctness (Incorrect, Correct, Correct, Incorrect, Correct, Incorrect, Incorrect, Correct, Correct, Incorrect, Incorrect, Correct, Correct, Correct), ending in the question “Student learned?” • It is more important to be correct at specific “places in the dialogue”. • Phenomena related to performance: • not uniformly important across the dialogue • have more weight at specific places in the dialogue. • Discourse structure can be used to define “places in the dialogue”

Intuition 2 – Discrimination • Figure: two dialogue timelines, one for a student who learned more and one for a student who learned less • Different discourse structure

Intuition 3 – Interaction • Figure: dialogue timeline with a dialogue phenomenon marked per turn, here certainty (Uncertain, Certain, Certain, Neutral, Neutral, Uncertain, Certain, Neutral, Neutral, Certain, Certain, Neutral, Certain, Uncertain) • Certainty is not uniformly distributed across the dialogue. • Dialogue phenomena: • not uniformly distributed across the dialogue • more frequent at specific places in the dialogue. • Discourse structure can be used to define “places in the dialogue”

Intuition 4 – Visual • A graphical representation of the discourse structure • Easier for users to follow the conversation • Preferred / learn more • Figure: The Navigation Map

Outline • System side applications • Discourse transitions – defining “places in the dialogue” • Performance analysis (Intuitions 1, 2: Conditioning, Discrimination) • Characterization of discourse phenomena (Intuition 3: Interaction) • User side applications • The Navigation Map (Intuition 4: Visual) • Users’ perceived utility of the Navigation Map

“Places in the dialogue” • Requirements • Domain independent • Automatic • Approach: Discourse structure transitions • Relationship between current system turn and previous system turn – 6 labels • Ingredients • Discourse segment hierarchy • Transition labeling

Discourse segment hierarchy • Automatically annotate the discourse segment hierarchy • Tutoring information authored in a hierarchical plan structure [VanLehn, Jordan, Rosé et al., 2002] • Figure labels: Problem; Essay; Dialogue with ITSPOKE (Q1, Q2, Q3)

ITSPOKE behavior & Discourse structure annotation • Similar automatic annotation possible in other dialogue managers (e.g. COLLAGEN [Rich and Sidner, 1998], RavenClaw [Bohus and Rudnicky, 2003]) • Figure labels: Essay submission & analysis; Q1, Q2, Q3, Q4, Q5; Remediation subdialogue

Discourse structure transitions • Properties • Domain independent • Automatic • “Places in the dialogue” • Group turns by transition • Figure labels: Essay submission & analysis; Q1, Q2, Q3, Q4, Q5
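The transition labeling can be pictured with a small sketch. It is a simplified, hypothetical reading of the idea, not ITSPOKE's annotation code. It assumes each system question carries a dotted ID like the Q2.1 labels in the figures, that the ID depth mirrors the depth in the discourse segment hierarchy, and that only the first question of a dialogue is a NewTopLevel. The slides name Push, PopUp, PopUpAdv, Advance and NewTopLevel; the sixth label, called SameGoal here, is an assumption.

```python
def parse_position(question_id):
    """Turn a dotted ID such as 'Q2.1' into a hierarchy path such as (2, 1)."""
    return tuple(int(part) for part in question_id.lstrip("Q").split("."))


def label_transition(prev_id, cur_id):
    """Label the current system turn relative to the previous system turn.

    Assumed, simplified rules (not taken from the slides):
      no previous turn        -> NewTopLevel
      same question again     -> SameGoal (label name is an assumption)
      deeper in the hierarchy -> Push
      same depth              -> Advance
      back to an ancestor     -> PopUp
      shallower, new question -> PopUpAdv
    """
    if prev_id is None:
        return "NewTopLevel"
    prev, cur = parse_position(prev_id), parse_position(cur_id)
    if cur == prev:
        return "SameGoal"
    if len(cur) > len(prev):
        return "Push"
    if len(cur) == len(prev):
        return "Advance"
    if prev[:len(cur)] == cur:
        return "PopUp"
    return "PopUpAdv"


def label_dialogue(question_ids):
    """Label every system turn in a dialogue, in order."""
    labels = []
    prev = None
    for question_id in question_ids:
        labels.append(label_transition(prev, question_id))
        prev = question_id
    return labels
```

Calling `label_dialogue(["Q1", "Q2", "Q2.1", "Q2.2", "Q2", "Q3"])` labels a short remediation subdialogue under Q2; turns can then be grouped by the resulting label to define "places in the dialogue".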

Outline • System side applications • Discourse transitions – defining “places in the dialogue” • Performance analysis • Characterization of discourse phenomena • User side applications • The Navigation Map • Users’ perceived utility of the Navigation Map

Performance analysis • Understand where and why a Spoken Dialogue System fails or succeeds • Performance models • Performance metrics – e.g. user satisfaction • Interaction parameters – e.g. number of turns, speech recognition performance • PARADISE framework [Walker et al., 2000] • Interaction parameters → multivariate linear regression → performance metric
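As a rough illustration of a PARADISE-style performance model, the sketch below fits a multivariate linear regression that predicts a performance metric from interaction parameters. It is a minimal ordinary least squares fit in plain Python under stated assumptions: the function names are hypothetical, any normalization of the parameters is left out, and the parameters are assumed not to be collinear.

```python
def fit_linear_model(rows, targets):
    """Fit targets ~ intercept + weights . row by ordinary least squares.

    rows:    one sequence of interaction parameters per dialogue or user
    targets: the performance metric for the same dialogues or users
    Returns [intercept, w1, w2, ...]. Solves the normal equations
    (X^T X) w = X^T y by Gauss-Jordan elimination; a singular system
    (collinear parameters) is not handled in this sketch.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(n):
        row = [sum(x[i] * x[j] for x in design) for j in range(n)]
        row.append(sum(x[i] * y for x, y in zip(design, targets)))
        aug.append(row)
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(weights, row):
    """Predicted performance metric for one set of interaction parameters."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))
```

The fitted weights indicate how strongly each interaction parameter contributes to the predicted metric, which is what makes such models useful for understanding where a system fails or succeeds.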

Performance analysis – tutoring • Tutoring domain • Performance metric = student learning • Interaction parameters • Correctness • Time on task • User affect (e.g. certainty) • # of hints, # of help requests • Models • Correlation with learning – e.g. [Chi et al., 2001] • PARADISE models [Forbes-Riley and Litman, 2006; Feng et al., 2006] • Previous work makes limited use of • Context in which events occur • Dialogue patterns • Figure labels: Correlation; Interaction parameters; Learning; PreTest; PostTest

Intuition 1 – Conditioning • Figure: dialogue timeline with per-turn correctness (Incorrect, Correct, Correct, Incorrect, Correct, Incorrect, Incorrect, Correct, Correct, Incorrect, Incorrect, Correct, Correct, Correct), a Push transition marked, and the outcome “Student learned?” measured as posttest – pretest • Correctness overall versus correctness after discourse transitions • It is more important to be correct at specific “places in the dialogue”. • Correctness overall versus correctness at specific places in the dialogue

Intuition 2 – Discrimination • Figure: two dialogue timelines, one for a student who learned more and one for a student who learned less, with transitions marked (Push, Push, Push, Advance) • Trajectories: 2 consecutive transitions • Different discourse structure

Experimental setup - corpus • Corpus - ITSPOKE • 20 students, 5 problems per student • 100 dialogues, 2334 student turns • Annotations • Correctness (manual) • “Perfect” recognition • “Perfect” understanding • Discourse structure transitions (automatic)

Experimental setup - parameters • Comparisons • I1 - conditioning: correctness overall versus correctness after specific discourse transitions • I2 - discrimination: discourse structure patterns for low learners versus discourse structure patterns for high learners • Correctness parameters • Counts (#) and percentages (%) for each correctness value per student (e.g. C, PC %) • Transition – correctness parameters • Counts (#) and percentages (%) for each transition–correctness value per student (e.g. PopUp–C, Push–UA %) • Relative percentage (%rel) (e.g. PopUp–I %rel) • Transition – transition parameters • Counts (#), percentages (%) and relative percentages (%rel) for each transition–transition value per student (e.g. Push–Push)
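The transition–correctness parameters can be sketched as simple per-student tallies. This is an illustrative sketch under assumptions: the function name and output keys are hypothetical, and the relative percentage (%rel) is read here as the share of turns with a given transition that received a given correctness value, which the slide does not spell out.

```python
from collections import Counter


def transition_correctness_parameters(turns):
    """Compute count, % and %rel for each transition-correctness pair.

    turns: list of (transition, correctness) pairs for one student,
           e.g. ("PopUp", "I") for an incorrect answer after a PopUp.
    pct     = share of all the student's turns (in percent)
    pct_rel = share of the turns with that transition (assumed reading)
    """
    pair_counts = Counter(turns)
    transition_counts = Counter(transition for transition, _ in turns)
    total = len(turns)
    parameters = {}
    for (transition, correctness), count in pair_counts.items():
        parameters[f"{transition}-{correctness}"] = {
            "count": count,
            "pct": 100.0 * count / total,
            "pct_rel": 100.0 * count / transition_counts[transition],
        }
    return parameters
```

Each resulting value (for example the `pct_rel` of `PopUp-I`) is one number per student, so it can be correlated with learning across students.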

Experimental setup • Methodology • Correlations between parameters and learning • Partial Pearson correlation with PostTest controlling for PreTest • Experiment 1 - conditioning • Correctness parameters versus Transition – correctness parameters • Experiment 2 - discrimination • Transition – transition parameters
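The partial Pearson correlation used in this methodology can be written down compactly. The sketch below uses the standard first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), with x a parameter, y the PostTest score and z the PreTest score. Function names are hypothetical, significance testing is left out, and the inputs are assumed to have non-zero variance and not to be perfectly correlated with PreTest.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def partial_correlation(parameter, posttest, pretest):
    """Correlation of parameter with posttest, controlling for pretest."""
    r_xy = pearson(parameter, posttest)
    r_xz = pearson(parameter, pretest)
    r_yz = pearson(posttest, pretest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Controlling for PreTest removes the part of the PostTest score that is explained by what the student already knew, so the remaining correlation reflects the parameter's relationship with learning.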

Results – Experiment 1 (a) • Correctness parameters • No trend/significant correlations • Correctness out of context not very informative for modeling student performance

Results – Experiment 1 (b) • Transition – correctness parameters • PopUp–Correct, PopUp–Incorrect • Interpretation: Capture successful learning events or failed learning opportunities • Generalizes across corpora • ITSPOKE modification: engage in an additional remediation dialogue • Figure labels: Q1, Q2, Q3, Q2.1, Q2.2

Results – Experiment 1 (c) • Other informative transition-correctness parameters • E.g. PopUpAdv-Correct, NewTopLevel-Incorrect, Advance-Correct • Intuition 1 conditioning - verified • Correctness overall < Correctness after discourse transitions

Experiment 2 - discrimination • Figure: two dialogue timelines, one for a student who learned more and one for a student who learned less, with transitions marked (Push, Push, Push, Advance) • Trajectories of length 2 • Transition–transition parameters • Different discourse structure
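Trajectories of length 2 amount to counting pairs of consecutive transitions. The sketch below is one hedged way to derive transition–transition parameters for a single student; the function name and output keys are hypothetical, and the relative percentage is assumed to be taken relative to the first transition of the pair.

```python
from collections import Counter


def transition_bigram_parameters(transitions):
    """Count pairs of consecutive transitions in one student's dialogue.

    transitions: ordered transition labels, e.g. ["Advance", "Push", "Push"].
    pct     = share of all consecutive pairs (in percent)
    pct_rel = share of the pairs that start with the same first
              transition (assumed reading of the relative percentage)
    """
    bigrams = list(zip(transitions, transitions[1:]))
    bigram_counts = Counter(bigrams)
    first_counts = Counter(first for first, _ in bigrams)
    total = len(bigrams)
    return {
        f"{first}-{second}": {
            "count": count,
            "pct": 100.0 * count / total,
            "pct_rel": 100.0 * count / first_counts[first],
        }
        for (first, second), count in bigram_counts.items()
    }
```

A parameter such as `Push-Push` then captures a remediation subdialogue that itself needed a further remediation step, which a single transition label cannot express.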

Results – Experiment 2 • Transition – Transition parameters • Push–Push • Interpretation: system uncovers potential major knowledge gaps • More specific than Push–Incorrect • Transition – Transition parameters: informative • Overlap with transition–correctness parameters but offer additional insights • Intuition 2 discrimination - verified • Figure labels: Q1, Q2, Q3, Q2.1, Q2.2, Q2.1.1, Q2.1.2

Related work • Most work ignores discourse structure (e.g. [Möller, 2005; Walker, 2000]) • DATE dialogue act annotation [Walker, 2001] • Identify certain types of discourse segments • Task model: get date, get time, reserve hotel, etc. • Compute size of each type of discourse segment • Differences • Domain-dependent • Ignores the discourse structure hierarchy • Does not condition metrics on discourse structure information • Does not use structure parameters

Conclusions – performance analysis • Discourse structure useful for performance analysis [EMNLP 2006] • Parameters derived from discourse structure transitions • Transition – correctness (1st intuition - conditioning) • Transition – transition (2nd intuition - discrimination) • Informative parameters have intuitive interpretations • ITSPOKE modifications • Monitor for PopUp-Incorrect (failed learning opportunity) • Provide additional tutoring • User study • Experiments with certainty – similar results • Performance models (>=2 parameters) • Parameters that use certainty improve the quality and generality of performance models [UMUAI 2007]

Intuition 3 – Interaction • Figure: timeline of system turns and user turns, with a dialogue phenomenon marked per user turn as present or absent (1 0 0 1 0 1 1 1 0 0 1 1 0 1) • Dialogue phenomena not uniformly distributed across the dialogue • Dependencies between • Discourse transitions • Dialogue phenomena • User affect - Uncertainty • Speech Recognition Problems • Dependency between transition and phenomena tested with a χ² test
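The dependency test between transitions and a dialogue phenomenon can be sketched as a χ² test of independence on a contingency table. The sketch below computes only the statistic and the degrees of freedom in plain Python. The function name and table layout are assumptions, every expected cell count is assumed to be non-zero, and obtaining a p-value would additionally need the χ² distribution (for instance via `scipy.stats.chi2_contingency`, if SciPy is available).

```python
def chi_square_statistic(table):
    """Chi-square statistic for independence in a contingency table.

    table: one row per transition label, one column per phenomenon value,
           e.g. [uncertain_count, not_uncertain_count] for each transition.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if transition and phenomenon were independent.
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom
```

A large statistic relative to the degrees of freedom suggests that the phenomenon, such as uncertainty, is not spread evenly across transition types.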

Results • Significant dependencies • Transition – Uncertainty [NAACL 2007] • E.g. Increased uncertainty after Push, PopUpAdv • Transition – Speech Recognition Problems (SRP) [Interspeech 2006] • E.g. Increased SRP after Push, PopUp • Intuition 3 interaction - validated • Figure: timeline of system turns and user turns, with transitions linked to SRP and uncertainty

Intuition 4 – Visual • A graphical representation of the discourse structure • Easier for users to follow the conversation • Preferred / learn more • Figure: The Navigation Map (NM)

Issues in complex domains • Illustrated with the sample dialogue with ITSPOKE (TUTOR1 through TUTOR3) shown above • Issues • Increased task complexity • User has limited task knowledge • Longer system turns • Similar issues in other complex domain dialogue systems (troubleshooting, assistants)

Why the NM? • Implications for users • Figure labels: System; User; Information; Audio channel; Visual channel [Mousavi et al., 1995] • What to communicate over the visual channel?

What to communicate? • Current ITSPOKE interface • Dialogue history • Animated talking heads [Graesser et al., 2003] • More important to communicate • Purpose of the current topic • How the topic relates to the overall discussion • Benefits • Digested view • Set up expectations • Facilitates integration • Discourse structure • Discourse segment intention • Discourse segment hierarchy

The Navigation Map (NM) • The Navigation Map (NM) – dynamic graphical representation of: • Discourse segment purpose/intention • Discourse segment hierarchy • Additional features • Information highlight • Limited horizon • Correct answers • Auto-collapse

Figure callouts (shown with the TUTOR1/USER1 exchange of the sample dialogue): Information highlight; Correct answers; Auto-collapse; Limited horizon • Manually annotate a superset of the automatic annotation • Segment the dialogue into discourse segments • Annotate purpose/intention • Annotate hierarchy

User experiment • Intuition: Easier for users to follow the conversation with the NM • If true then: • Users should prefer the version with the NM (perceived utility) • Users should learn better with the NM (objective utility) • User experiment - user’s perception of the NM presence • Hypothesis: Users will rate the NM version better

Experimental design • Within-subjects design • 1 problem with the NM; 1 without the NM (noNM) • Rate tutor after each problem • 16 questions, 1 (Strongly Disagree) – 5 (Strongly Agree) scale • Two conditions (to account for order and problem) • F (First): 1st problem NM; 2nd problem noNM • S (Second): 1st problem noNM; 2nd problem NM • Experimental procedure (figure labels): Read; Pretest; Problem 1; Problem 2; Questionnaire (after each problem); NM Survey; Posttest; Interview; differences between questionnaires due to NM

Experimental design (2) • ITSPOKE dialogue history was disabled • Compare Audio-Only versus Audio+Visual (NM) • Figure: NM and noNM interfaces

Results – subjective metrics (1) • Collected corpus • 28 users: 13 First condition, 15 Second condition • Balanced for gender • Significant difference between pretest and posttest • Questionnaire analysis • Repeated-measures ANOVA with one between-subjects factor • Within-subjects factor: NM Presence (NMPres) • Between-subjects factor: Condition (Cond) • Post-hoc tests

Results – subjective metrics (2) • NM trend/significant effects on system perception during the dialogue • Rating scale: 1 (Strongly Disagree) to 5 (Strongly Agree)

Results – subjective metrics (3) • NM trend/significant effects on overall system perception

Results – subjective metrics (4) • 24 out of 28 preferred NM over noNM • 4 liked noNM (2 per condition) • Divided attention problem • NM changing too fast • NM survey • 75-86% of users agreed (4) or strongly agreed (5) that NM helped them: • Follow the dialogue • Learn • Concentrate • Update essay • Open question interview • NM as a structured note taker • Would use NM for additional instruction after the dialogue
