What can humans do when faced with ASR errors?

What can humans do when faced with ASR errors? Dan Bohus Dialogs on Dialogs Group, October 2003

Question • We’re trying to build systems that can deal with a noisy recognition channel • Q: How good are humans at that? • More importantly, how do they do it? • What strategies do they use? • How do they decide which one to use when? • What kind of knowledge is used in the process?

WOZ experiments • Modify the WOZ setting so that the wizard does not hear the user, but rather receives the recognition result (text in these cases) • Exploring Human Error Handling Strategies [Gabriel Skantze] • A Study of Human Dialogue Strategies in the Presence of Speech Recognition Errors [Teresa Zollo]

Domain/Task, Experiments • Problem-solving task • Wizard is guiding user through a campus • Wizard has detailed map • User has small fraction of map with their current surroundings • Experiments • 8 users, 8 operators, balanced male/female • 5 scenarios per user → 40 dialogs

WOZ / Experimental Setting • Wizard receives recognition results on a GUI • Not parsed (wizard plays parser also) • Confidence denoted by color intensity • Users know they are talking to a human • Normal wizard more costly • Hard to maintain subjects for longitudinal studies • Conflicting information on change in linguistic patterns when speaking to a machine vs. to a human • Operators are naïve; they are also subjects of the study
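A minimal sketch of how "confidence denoted by color intensity" could be rendered, assuming per-word confidences in [0, 1] and a gray scale where darker means more confident. The function names, the gray scale, and the 0.8 lightness cap are all assumptions for illustration; the slides do not describe the actual GUI.

```python
def confidence_to_gray(confidence: float) -> str:
    """Map a confidence in [0, 1] to a hex gray; higher confidence gives darker text.

    Hypothetical mapping: the study's real color scheme is not given in the slides.
    """
    c = min(1.0, max(0.0, confidence))      # clamp out-of-range scores
    level = round(255 * (1.0 - c) * 0.8)    # cap lightness so weak words stay legible
    return f"#{level:02x}{level:02x}{level:02x}"


def color_hypothesis(words_with_confidence):
    """Pair each recognized word with the color it would be drawn in."""
    return [(word, confidence_to_gray(conf)) for word, conf in words_with_confidence]
```

Calling `color_hypothesis([("library", 1.0), ("fountain", 0.0)])` pairs the confident word with black and the doubtful one with a light gray.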

Results • 43% WER, 7.3% OOV • Manual labeling of operator understanding • Full understanding • Partial understanding • Non-understanding • Misunderstanding • Very few misunderstandings • Operators good at rejecting • Users thought they were almost always understood
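For reference, WER is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level edit distance. The sketch below shows that standard definition; it is not the scoring tool used in the study, and the example sentences in the usage note are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("go to the fountain", "go to a mountain")` counts two substitutions over four reference words. Because insertions are counted, WER can exceed 100%.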

Results (continued) • 3 main operator strategies (approx. equally distributed) for dealing with non- and partial understandings: • 1. Continuation of route description • 2. Signal of non-understanding • 3. Task-related question • PARADISE-like regression indicates strategy 2 is inversely correlated with “how well do you think you did?”

WOZ experiments • Modify the WOZ setting so that the wizard does not hear the user, but rather receives the recognition result • Exploring Human Error Handling Strategies [Gabriel Skantze] • A Study of Human Dialogue Strategies in the Presence of Speech Recognition Errors [Teresa Zollo]

Domain / Experiments • TRIPS-Pacifica: planning the evacuation of the fictitious island Pacifica • Construct a plan to transport all the civilians on Pacifica to Barnacle by 5 am so that they can be evacuated from the island (the plan will be deployed at midnight) • + the road between Calypso and Ocean Beach is impassable • Only 7 dialogs (September ’99)

WOZ / Experimental Setting • Wizard assisted by GUI for quick information access and generating synthesized responses • Sphinx-2 (CMU), TrueTalk (Entropic) • Wizard receives string of words (paper does not mention confidence scores) • User debriefing questionnaire • Wizard annotates interaction transcript with knowledge sources used in decisions, etc.

Results • Small corpus • 7 dialogs • 348 utterances • Manually labeled misunderstandings • Overall WER: 30% • Looked at positive and negative feedback

Negative feedback • Request for full repetition: 33/80 • 24/33 cases users complied and repeated/rephrased • WH-replacement of missing or erroneous word: 12/80 • 8/12 cases users responded with the precise info • Attempt to salvage correct word: 20/80 • Possibly increase user satisfaction? • Similar responses to ask for repeat • Request for verification: 15/80 • 10/15 responded by explicit affirmations
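The counts on this slide can be turned into percentages with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable names and the `percent` helper are illustrative.

```python
# Counts of each negative-feedback type, out of 80 negative-feedback turns.
NEGATIVE_FEEDBACK = {
    "request for full repetition": 33,
    "WH-replacement of missing or erroneous word": 12,
    "attempt to salvage correct word": 20,
    "request for verification": 15,
}
TOTAL_NEGATIVE = 80

# (cases where the user responded as hoped, cases of that feedback type)
USER_COMPLIANCE = {
    "request for full repetition": (24, 33),
    "WH-replacement of missing or erroneous word": (8, 12),
    "request for verification": (10, 15),
}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The four categories account for all 80 negative-feedback turns.
assert sum(NEGATIVE_FEEDBACK.values()) == TOTAL_NEGATIVE

shares = {name: percent(n, TOTAL_NEGATIVE) for name, n in NEGATIVE_FEEDBACK.items()}
compliance = {name: percent(ok, n) for name, (ok, n) in USER_COMPLIANCE.items()}
```

Full repetition requests are the largest category at roughly 41% of negative feedback, and users responded as hoped in about two thirds to three quarters of the cases for each strategy with a reported rate.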

What if we wanted to do these? • Request for full repetition: 33/80 • 24/33 cases users complied and repeated/rephrased • WH-replacement of missing or erroneous word: 12/80 • 8/12 cases users responded with the precise info • Attempt to salvage correct word: 20/80 • Possibly increase user satisfaction? • Similar responses to ask for repeat • Request for verification: 15/80 • 10/15 responded by explicit affirmations

More negative feedback results • Wizards gave negative feedback in 80 (35%) of the 227 utterances recognized incorrectly • Compensation for ASR: • Ignoring words that are not salient in the TRIPS domain • Hypothesizing correct words based on phonetic similarity • Q: So, what does that say? Better parsing?
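The two compensation behaviors above could be approximated in a system roughly as follows. This is only a sketch under strong assumptions: the word list is a hypothetical salient vocabulary built from place names on the domain slide, and string similarity from Python's `difflib` stands in for phonetic similarity, which a real system would compute over phoneme sequences.

```python
import difflib

# Hypothetical salient vocabulary for the TRIPS-Pacifica domain.
DOMAIN_WORDS = [
    "pacifica", "barnacle", "calypso", "ocean", "beach",
    "civilians", "road", "transport",
]


def compensate(hypothesis: str, cutoff: float = 0.7) -> list:
    """Keep only domain-salient words, snapping near misses to the vocabulary.

    Spelling similarity is used here as a crude proxy for phonetic similarity.
    """
    kept = []
    for word in hypothesis.lower().split():
        match = difflib.get_close_matches(word, DOMAIN_WORDS, n=1, cutoff=cutoff)
        if match:
            kept.append(match[0])  # exact hit or closest domain word
        # otherwise the word is treated as non-salient and ignored
    return kept
```

With this vocabulary, `compensate("um transport the civilians to barnacle")` drops the filler and function words, and a misrecognized "calypsa" is mapped to "calypso". The `cutoff` value trades missed corrections against false ones and would need tuning on real data.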

Positive feedback • Using an acknowledgement term (okay, right) • Simple response to question (next relevant contribution) • Conversational/social response, e.g. greetings/thanks • Providing a next unsolicited relevant contribution • Clarifying or correcting • Paraphrasing

Conclusions • Observations consistent with theoretical grounding models (Clark et al.) • Negative feedback only when really needed • Unless ASR is perfect (and sometimes even then), wizards give explicit indications of their understanding

Discussion… • WOZ setting… • Wizard = Parser + Dialog Manager • Seems that humans can extract more info from text than current parsers • Do we need better, more robust parsers? • How about Wizard = Dialog Manager? • Domain choice • Skantze results make sense in chosen domain • How can such results hold across domains?
