belief updating in spoken dialog systems

Dan Bohus
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Alex Rudnicky, CMU; Eric Horvitz & Tim Paek, MSR; Antoine Raux

spoken dialog systems
• use natural language processing technology
• engage in a goal-oriented conversation
• research community
  • information access, command-and-control
  • personal assistants, taskable agents, tutoring systems
• industry
  • simpler automated phone systems

the problem … misunderstandings lead to interaction breakdowns

more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT same PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [for minute since HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

understanding errors
• stem from the speech recognition process
• difficult operating conditions
• typical word-error-rates: 20-30%, up to 50% for non-native speakers
• strong negative impact on interactions
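The word-error-rate figures above follow the usual definition: the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal Python sketch follows; the function name is hypothetical, and the example strings are taken from the last user turn of the dialog shown earlier.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Only "destination" is recognized correctly: 3 substitutions over 4 words.
print(word_error_rate("my destination is Birmingham",
                      "flight destination mr WEEKEND"))  # 0.75
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs more words than were spoken.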

pathways to increased robustness
• gracefully handle errors through conversation
• improve recognition
• detect the problems
• develop a set of recovery strategies
• know how to choose between them (policy)

guarding against misunderstandings: confidence scores
S: […] What city are you leaving from?
U: [for minute since HEY] / 0.07
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul…What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon. Okay what day would you be departing chicago
U: [AT THE TENTH OF AUGUST] / 0.58
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: [flight destination mr WEEKEND] / 0.28

guarding against misunderstandings: confirmation actions
arrival = {Seoul / 0.65}
• reject
• explicit confirmation: Did you say Seoul?
• implicit confirmation: traveling to Seoul … What day did you need to travel?
• accept
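One common way to choose among these four confirmation actions is to threshold the confidence score. The sketch below is only an illustration of that idea: the threshold values and names are hypothetical and are not the settings of the system shown in the talk.

```python
from enum import Enum


class Confirmation(Enum):
    REJECT = "reject"
    EXPLICIT = "explicit confirmation"
    IMPLICIT = "implicit confirmation"
    ACCEPT = "accept"


# Hypothetical thresholds, chosen only for illustration.
REJECT_BELOW = 0.2
EXPLICIT_BELOW = 0.5
IMPLICIT_BELOW = 0.9


def choose_confirmation(confidence: float) -> Confirmation:
    """Map a confidence score in [0, 1] to one of the four actions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < REJECT_BELOW:
        return Confirmation.REJECT
    if confidence < EXPLICIT_BELOW:
        return Confirmation.EXPLICIT
    if confidence < IMPLICIT_BELOW:
        return Confirmation.IMPLICIT
    return Confirmation.ACCEPT


# With these illustrative thresholds, arrival = {Seoul / 0.65}
# would be confirmed implicitly.
print(choose_confirmation(0.65).value)
```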

belief updating
[figure: the example dialog, with the system's beliefs (departure = { … }, arrival = { … }) passed through an update function f at each turn]
S: traveling to Seoul…What day did you need to travel?    arrival = {Seoul / 0.65}
U: [THE TRAVELING to berlin P_M] / 0.35    arrival = ?

belief updating: problem statement
S: traveling to Seoul…What day did you need to travel?    arrival = {Seoul / 0.65}
U: [THE TRAVELING to berlin P_M] / 0.35    arrival = ?
• given
  • an initial belief B_initial(C) over concept C
  • a system action SA(C)
  • a user response R
• construct an updated belief
  • B_updated(C) ← f(B_initial(C), SA(C), R)

outline
• related work
• proposed approach
• data
• experiments and results
• effects on global performance
• conclusion and future work
related work : proposed approach : data : experiments and results : global performance : conclusion

detecting misunderstandings and corrections
• confidence annotation
  • word-level [Cox, Chase, Bansal, Ravishankar, etc.]
  • semantic confidence annotation [Walker, San-Segundo, Bohus, etc.]
• correction detection [Litman, Swerts, Hirschberg, Krahmer, Levow]
  • detect when the user corrects the system
S: traveling to Seoul…What day did you need to travel?    arrival = {Seoul / 0.65}
U: [THE TRAVELING to berlin P_M]    Conf=0.35 Corr=0.47    arrival = ?

current solutions for tracking beliefs
• most systems only track single values
  • new values overwrite old values
• use simple heuristic rules
  • explicit confirmation
    S: did you say you wanted to fly to Seoul?
    • yes → trust hypothesis
    • no → delete hypothesis
    • “other” → non-understanding
  • implicit confirmation
    S: traveling to Seoul … what day did you need to travel?
    • rely on new values overwriting old values
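The heuristic rules above can be sketched in a few lines of Python. The class and function names are hypothetical, and treating an "other" response as leaving the belief unchanged is an assumption: the slide only says that such a response counts as a non-understanding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Belief:
    value: Optional[str]   # the single tracked value; None if nothing is believed
    trusted: bool = False  # True once the user has confirmed the value


def update_after_explicit_confirm(belief: Belief, response: str) -> Belief:
    """Explicit confirmation: yes -> trust, no -> delete, other -> non-understanding."""
    if response == "yes":
        return Belief(belief.value, trusted=True)
    if response == "no":
        return Belief(None)
    # Assumption: a non-understanding leaves the belief as it was.
    return belief


def update_after_implicit_confirm(belief: Belief, heard_value: Optional[str]) -> Belief:
    """Implicit confirmation: any newly heard value overwrites the old one."""
    if heard_value is not None:
        return Belief(heard_value)
    return belief


arrival = Belief("Seoul")
print(update_after_explicit_confirm(arrival, "no"))
print(update_after_implicit_confirm(arrival, "Birmingham"))
```

Note that neither rule uses confidence scores or remembers earlier values, which is the limitation the rest of the talk addresses.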


belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• most accurate representation
  • probability distribution over the set of possible values
  • [figure: distribution for departure over city names: YUMA, AZ; ALPINE, TX; ALPENA, MI; ALBANY, NY; ABILENE, TX; ALLIANCE, NE; ABERDEEN, TX; ALLAKAKET, AK; ALLENTOWN, PA; ALEXANDRIA, LA; ALBUQUERQUE, NM]
• however
  • system “hears” only a small number of conflicting values for a concept throughout a session
  • max = 3 conflicting values heard
  • more than 1 value heard in only 7% of cases

belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
• compressed belief representation
  • k hypotheses + other
• dynamically add and drop hypotheses
  • remember m hypotheses, add n new ones (m+n=k)
• B…(C) is a multinomial variable of degree k+1
[figure: departure_city with k=3, m=2, n=1; hypotheses Austin, Houston, Boston + other; the example turns below introduce the new hypothesis Aspen]
S: Did you say you were flying from Austin?
U: [NO ASPEN]
S: flying from Aspen… what is your destination?
U: [NO NO I DIDN’T THAT THAT]
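A minimal Python sketch of the slot bookkeeping behind this compressed representation, using the slide's k=3, m=2, n=1 setting. The function name and the probabilities are hypothetical. Folding the probability mass of dropped hypotheses into "other" is an assumption; the slides do not say how that mass is handled, and the probabilities of the resulting slots would come from the learned update model, not from this function.

```python
from typing import Dict, List, Tuple

OTHER = "other"


def select_hypotheses(
    belief: Dict[str, float], new_values: List[str], m: int, n: int
) -> Tuple[List[str], List[str], float]:
    """Keep the m most probable old hypotheses and add up to n new ones.

    Returns (kept, added, other_mass); the updated belief then ranges over
    kept + added + "other", i.e. k + 1 = m + n + 1 outcomes.
    """
    old = {value: p for value, p in belief.items() if value != OTHER}
    kept = sorted(old, key=old.get, reverse=True)[:m]
    added = [value for value in new_values if value not in kept][:n]
    # Assumption: mass of dropped hypotheses is folded into "other".
    dropped_mass = sum(p for value, p in old.items() if value not in kept)
    return kept, added, belief.get(OTHER, 0.0) + dropped_mass


# Hypothetical probabilities for the departure_city example.
belief = {"Boston": 0.5, "Austin": 0.3, "Houston": 0.1, OTHER: 0.1}
kept, added, other_mass = select_hypotheses(belief, ["Aspen"], m=2, n=1)
print(kept, added)  # ['Boston', 'Austin'] ['Aspen']
```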

system action
B_updated(C) ← f(B_initial(C), SA(C), R)

user response
B_updated(C) ← f(B_initial(C), SA(C), R)

approach
B_updated(C) ← f(B_initial(C), SA(C), R)
• multinomial regression problem
• multinomial generalized linear model
  • sample efficient
• stepwise approach
  • feature selection
  • BIC to control over-fitting
• one separate model for each system action
  • B_updated(C) ← f_SA(C)(B_initial(C), R)
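To make the shape of such a model concrete, here is a minimal sketch of a multinomial logistic model (one instance of a multinomial generalized linear model) with a separate weight table per system action, together with the standard BIC score that a stepwise procedure could use to decide whether an added feature is worth its extra parameters. All weights, feature names and action labels are hypothetical and were not learned from the corpus described in the talk.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """One weight row per outcome; returns a distribution over the outcomes."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion; lower values are preferred."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical weights for k=2 hypotheses + other (3 outcomes).
# Assumed features: [bias, initial confidence, confidence of the new response].
MODELS: Dict[str, List[List[float]]] = {
    "explicit_confirm": [[0.5, 2.0, -1.0], [-0.5, -1.0, 2.0], [0.0, 0.0, 0.0]],
    "implicit_confirm": [[1.0, 1.5, -0.5], [-1.0, -0.5, 1.5], [0.0, 0.0, 0.0]],
}


def update_belief(system_action: str, features: Sequence[float]) -> List[float]:
    """B_updated(C) <- f_SA(C)(B_initial(C), R), with one model per system action."""
    return predict(MODELS[system_action], features)


print(update_belief("implicit_confirm", [1.0, 0.65, 0.35]))
```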

data
• collected with RoomLine
  • a phone-based mixed-initiative spoken dialog system
  • conference room reservation
• explicit and implicit confirmations
• simple heuristic rules for belief updating
  • explicit confirm: yes / no
  • implicit confirm: new values overwrite old ones

corpus
• user study
  • 46 participants (first-time users)
  • 10 scenario-based interactions each
• corpus
  • 449 sessions, 8848 user turns
  • orthographically transcribed
  • manually annotated
    • misunderstandings
    • corrections
    • correct concept values


models
• k=2 + other (m=1, n=1)
• k=3 + other (m=2, n=1)
• k=4 + other (m=3, n=1)
• full model
  • all features
• basic model
  • all features except priors and confusability
• runtime model
  • all features available at runtime

baselines
• initial baseline
  • accuracy of system beliefs before the update
• heuristic baseline
  • accuracy of heuristic update rule used by the system
• correction baseline
  • accuracy if we knew exactly when the user corrects the system

results for k=2 hyps + other
[figure: four bar-chart panels (explicit confirm, implicit confirm, request, other), each comparing initial baseline (i), heuristic baseline (h), basic model (BM), full model (FM) and runtime model (RM); the explicit confirm and implicit confirm panels also show the correction baseline (c)]

a question remains … … does this really matter?

a new user study …
• implemented models in RavenClaw
• 40 participants, first-time, non-native users
  • improvements more likely at high word-error-rates
• 10 scenario-driven interactions each
• between-subjects; 2 gender-balanced groups
  • control: RoomLine using heuristic update rules
  • treatment: RoomLine using runtime models

effect on task success
• logistic ANOVA on task success
  • logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition
  • p=0.009
[figure: probability of task success (0% to 100%) against word error rate (0% to 100%) for treatment and control; annotated values: 78%, 78%, 64%; 30% word error rate, 16% word error rate]
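The fitted model can be evaluated directly. The sketch below assumes that WER is expressed in percentage points and that Condition is 1 for the treatment group and 0 for control; neither coding is stated on the slide, but under these assumptions the model gives the same 78%, 64% and 78% values that annotate the plot.

```python
import math


def p_task_success(wer_percent: float, treatment: bool) -> float:
    """Probability of task success under the logistic model from the slide.

    Assumptions: WER is in percentage points; Condition is 1 for
    treatment and 0 for control.
    """
    condition = 1.0 if treatment else 0.0
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


print(round(100 * p_task_success(30, treatment=True)))   # 78
print(round(100 * p_task_success(30, treatment=False)))  # 64
print(round(100 * p_task_success(16, treatment=False)))  # 78
```

Read this way, the treatment at a 30% word error rate performs about as well as the control would at 16%, which matches the ratio of the coefficients: 0.69 / 0.05 is roughly 14 points of word error rate.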

how about efficiency?
• ANOVA on task duration for successful tasks
  • Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
  • p=0.0003
• significant improvement
  • equivalent to 7.9% absolute reduction in word-error
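The "equivalent reduction in word-error" can be read as the drop in WER that would shorten tasks by as much as the treatment does, i.e. the ratio of the Condition coefficient to the WER coefficient. This is a sketch under that assumption, with a hypothetical function name. With the rounded coefficients shown on the slide the ratio comes out near 8.2 rather than the reported 7.9; the gap is presumably due to coefficient rounding, though the slide does not say.

```python
def equivalent_wer_reduction(wer_coef: float, condition_coef: float) -> float:
    """WER change (percentage points) with the same effect as the treatment.

    Solves wer_coef * delta_wer = condition_coef for -delta_wer, assuming
    WER is in percentage points and Condition is 1 for treatment.
    """
    return -condition_coef / wer_coef


# Coefficients as printed on the slide (rounded).
print(round(equivalent_wer_reduction(0.013, -0.106), 1))  # 8.2
```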

summary
[figure: the example dialog, with beliefs updated by f after each turn (departure = { … }, arrival = {Seoul / 0.65}, then arrival = ?)]
• approach for constructing accurate beliefs
• integrate information across multiple turns
• large gains in task success and efficiency

other advantages
• learns from data
  • tuned to the domain in which it operates
• sample efficient / scalable
  • performs a local one-turn optimization
  • works independently on concepts
• portable
  • decoupled from dialog task specification
  • no strong assumptions about dialog management

future work
• integrate information from n-best list
• integrate other high-level knowledge
  • domain-specific constraints
  • inter-concept dependencies
• unsupervised / implicit learning
• domain-specificity

thank you! questions …

improvements at different WER
[figure: absolute improvement in task success against word-error-rate]

user study
• 10 scenarios, fixed order
• presented graphically (explained during briefing)
• participants compensated per task success

informative features
• priors and confusability
• initial confidence scores
• concept identity
• barge-in
• expectation match
• repeated grammar slots
