Uncertainty, Action, and Interaction
Eric Horvitz
Microsoft Research
May 2002
Toward Mixed-Initiative User Interfaces (User ↔ Automation)
Designs that assume from the ground up that the user may guide, and collaborate with, an automated service to achieve desired results
Principles of Mixed-Initiative Interaction • Endow system with ability to infer the likelihood of a user’s goals, intentions • Attempt to scope precision of action to match goals and uncertainties • Determine the expected value of action given costs and benefits of action • Consider status of a user’s attention in timing of action • Allow for dialog at appropriate times to resolve key ambiguities
Principles of Mixed-Initiative Interaction • Provide efficient means for agent–user collaboration to refine guesses • Allow efficient direct invocation and termination • Seek innovative designs that maximize benefit of service, minimize the cost of poor guesses • Allow for natural assumptions of shared memory of recent interactions • Continue to learn by observing
Automated Scoping and Precision of Service • Key goal: Provide the user with a clear advance toward goals • Automated, flexible scoping of service to a precision that matches task uncertainty and context Prefer automation to do less, but do it correctly
Automated Reasoning about the Uncertainty of a User’s Goals • Automated reasoners must guess about a user’s goals and desire for services • Good guesses can be quite valuable… but guessing wrong can be costly • Even valuable automation can be distracting and steal a user’s scarce attentional resources
Minimizing Cost of Guessing Wrong • Seek design innovation: Advice / assistance valuable when right, with errors incurring minimal cost • Natural gestures for declining service • Avoid grabbing focus • Alternate channel overlay: NASA Vista display manager • Nondistracting, simple guessing: Vellum gridpoint guesses • More graceful interaction with potentially focused user • Better timing of services in sync with availability of attention
Probability, Utility, & Mixed-Initiative Interaction • Perspective for design • Specific functions, layering of componentry • Foundations of intelligence: infrastructure, fabric for UI innovation
Uncertainty and HCI • Meshing learning & reasoning with UI design: probabilities and utility-directed action • Infer likelihoods of key uncertainties, take ideal actions • Sources of evidence: user query, user activity, content at focus, data structures, user location, user profile, vision, speech, sound
Critical Uncertainties • Beliefs & Intentions • What does a user believe? What are the user’s goals? • Attention • What is the user’s workload? What is a user attending to? What will a user attend to? What should a user attend to? • Preferences • What does the user like and dislike—and how much? • Initiative • What is the cost and benefit of interaction, interruption, intervention? • What is the right mix of user / system initiatives?
Lumière Project
[Diagram: user activity (actions + words) and the user’s profile drive inference about the user’s goals and needs]
Joint work with J. Breese, D. Heckerman, K. Rommelse, D. Hovel, et al.
Challenges • Architectures for intelligent user interaction • Reasoning over time • Sensing activity from systems and applications • Integration of probabilistic information retrieval • Models of a user’s competencies over time
Big Picture
[Architecture diagram: events feed event synthesis and learned models; uncertain inference about the user and world drives computation of the ideal UI action; control executes ideal actions, producing new perceptions and interactions that feed back as events]
Inference about a User’s Time-Dependent Goals
[Dynamic Bayesian network: a persistent profile node influences goal nodes Goal(t−n) … Goal(t−1), Goal(t0) linked over time; at each time slice, evidence nodes E(i,t), E(j,t) are observed]
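To make the time-dependent inference concrete, here is a minimal sketch, assuming a small hypothetical goal set and hand-picked transition and emission probabilities (none of these names or numbers come from Lumière): an HMM-style forward filter that updates a posterior over the user’s goal as events arrive.

```python
import numpy as np

# Sketch only: forward filtering over a user's time-dependent goal.
# Goal(t) depends on Goal(t-1) via a transition matrix; observed events
# at each step update the posterior over goals.

GOALS = ["print_document", "format_cells", "no_assistance"]  # hypothetical

# P(Goal_t | Goal_{t-1}): goals tend to persist across time slices
transition = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

# P(event | goal) for a few hypothetical observable events
emission = {
    "menu_surfing":  np.array([0.5, 0.4, 0.1]),
    "select_region": np.array([0.1, 0.7, 0.2]),
    "steady_typing": np.array([0.1, 0.1, 0.8]),
}

def update_goal_belief(belief, event):
    """One forward step: predict with the transition model, then
    condition on the observed event and renormalize."""
    predicted = transition.T @ belief
    posterior = predicted * emission[event]
    return posterior / posterior.sum()

belief = np.array([1/3, 1/3, 1/3])  # uniform prior over goals
for event in ["menu_surfing", "select_region", "select_region"]:
    belief = update_goal_belief(belief, event)
    print(dict(zip(GOALS, belief.round(3))))
```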
Representing and Updating a Persistent “Competency Terrain”
[Surface plots: a user’s competency over skill categories, updated over time as the user’s skills evolve]
Sensing Context and Content • Toward a “peripheral nervous system” for sensing user activity • SDK with event abstraction language • Compiler for defining filters for user activity
Abstraction of Events
[Diagram: event sources 1…n emit atomic events; the Eve event-specification language composes them into modeled events over time]
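As an illustration of event abstraction, here is a sketch of a hypothetical modeled-event filter (the class and function names are my own, not the actual SDK or Eve language): it composes atomic menu events into a higher-level “menu surfing” event, a classic sign the user may need help.

```python
from dataclasses import dataclass
import time

@dataclass
class AtomicEvent:
    source: str      # e.g., "mouse", "keyboard", "app"
    kind: str        # e.g., "menu_open", "command_executed"
    timestamp: float

def detect_menu_surfing(events, window_s=5.0, min_opens=3):
    """Modeled event: 'menu surfing' fires when at least `min_opens`
    menu_open atomic events occur within a sliding window and no
    command was executed in between."""
    opens = [e.timestamp for e in events if e.kind == "menu_open"]
    executed = any(e.kind == "command_executed" for e in events)
    if executed or len(opens) < min_opens:
        return False
    return (opens[-1] - opens[-min_opens]) <= window_s

now = time.time()
stream = [AtomicEvent("app", "menu_open", now + dt) for dt in (0.0, 1.2, 2.5)]
print(detect_menu_surfing(stream))  # True: three menu opens in ~2.5 s
```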
Overall Lumière Architecture
[Diagram: events flow through event synthesis into Bayesian inference; a control system issues actions and queries over time]
Lumière Inference and Action
[Plot: the inferred probability that the user desires assistance, tracked over time]
Initiative • User vs. system initiative • Allowing fluid collaboration via a mix of initiatives • Toward principles of mixed-initiative interaction • Projects: Lookout, DeepListener, Quartet
Reasoning about initiative is a high-payoff opportunity area for HCI, Ubicomp, IUI
Initiative & Interaction: Lookout • Critical decision: Do nothing? Ask? Just do it? • Learning by watching • Cost-benefit analysis of initiative • Minimize disruption: Prefer doing less, but doing it correctly Joint work with Andy Jacobs
Learning and Real-Time Behavior in Lookout
[Pipeline: user actions / context → real-time probabilistic inference → cost-benefit analysis → UI / service]
• Watch user’s behavior • Store cases, timing info • Learn model from data
Preferences and Initiative • Expected utility as fundamental in decisions about services

                               D: User desires      ¬D: User does not
                               action i             desire action i
A:  Computer takes action i    u(A,D)               u(A,¬D)
¬A: No action i                u(¬A,D)              u(¬A,¬D)
Preferences and Initiative
[Plot: expected utility of Action and No Action as lines over p(D|E) ∈ [0,1]; the lines cross at the threshold probability p*]

eu(A)  = p(D|E) u(A,D)  + [1 − p(D|E)] u(A,¬D)
eu(¬A) = p(D|E) u(¬A,D) + [1 − p(D|E)] u(¬A,¬D)

In general: eu(A_i) = Σ_j u(A_i, D_j) p(D_j|E)
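A worked sketch of the decision rule, with hypothetical utilities on a [0, 1] scale (the numbers are my own, not from the talk): compute eu(A) and eu(¬A) as above and solve for the threshold p* where the two lines cross.

```python
# Act only when eu(A) >= eu(~A), i.e., when p(D|E) exceeds the
# crossing point p* of the two expected-utility lines.

# Hypothetical utilities
u_act_desired     = 1.0   # u(A,D): service provided and wanted
u_act_undesired   = 0.2   # u(A,~D): unwanted interruption
u_noact_desired   = 0.3   # u(~A,D): missed opportunity to help
u_noact_undesired = 1.0   # u(~A,~D): correctly stayed quiet

def eu_action(p):
    return p * u_act_desired + (1 - p) * u_act_undesired

def eu_no_action(p):
    return p * u_noact_desired + (1 - p) * u_noact_undesired

# Solve eu_action(p*) = eu_no_action(p*) for the threshold p*
p_star = (u_noact_undesired - u_act_undesired) / (
    (u_act_desired - u_act_undesired)
    - (u_noact_desired - u_noact_undesired)
)
print(f"p* = {p_star:.3f}")   # act when p(D|E) > p*

p = 0.6  # inferred probability the user desires the service
print("take action" if eu_action(p) >= eu_no_action(p) else "do nothing")
```

Lookout’s “prefer doing less, but doing it correctly” corresponds to penalizing unwanted action heavily, which pushes p* upward.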
Initiative and Context
Utility of outcomes as a function of context, u(A,D,k)
[Plots: the expected-utility lines for Action and No Action over p(D|E) shift as context changes, e.g., when the user is rushed or as the amount of available screen real estate increases, moving the threshold p*]
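The same computation with context folded in as u(A,D,k): a sketch with invented contexts and numbers (not the talk’s values) showing how p* moves when the user is rushed or when screen real estate is plentiful.

```python
def p_star(u_ad, u_a_nd, u_na_d, u_na_nd):
    """Threshold probability where eu(Action) = eu(No Action)."""
    return (u_na_nd - u_a_nd) / ((u_ad - u_a_nd) - (u_na_d - u_na_nd))

def utilities(context):
    # Hypothetical adjustment: when the user is rushed, an unwanted
    # action is more costly; with ample screen real estate, less costly.
    u_a_nd = {"rushed": 0.0, "normal": 0.2, "big_screen": 0.4}[context]
    return 1.0, u_a_nd, 0.3, 1.0  # u(A,D), u(A,~D), u(~A,D), u(~A,~D)

for k in ("rushed", "normal", "big_screen"):
    print(k, round(p_star(*utilities(k)), 3))
# p* rises when the user is rushed and falls when action is cheap, so
# the system demands more (or less) confidence before acting.
```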
Engaging in Dialog about Initiative
Expected value of engaging the user in dialogue
[Plot: adding an Ask line partitions p(D|E) into three regions: No Action at low probability, dialogue (Ask) in a middle band, and Action at high probability]
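A sketch of the three-way decision, assuming a hypothetical fixed cost for asking and that a question fully resolves the ambiguity (a simplification of value-of-information reasoning; the numbers are illustrative):

```python
ASK_COST = 0.15  # hypothetical cost of interrupting with a question

def eu(action, p):
    if action == "act":
        return p * 1.0 + (1 - p) * 0.2
    if action == "no_action":
        return p * 0.3 + (1 - p) * 1.0
    # "ask": the user resolves the ambiguity, so the system then acts
    # correctly either way, minus the cost of the question.
    return 1.0 - ASK_COST

def best_action(p):
    return max(("act", "no_action", "ask"), key=lambda a: eu(a, p))

for p in (0.1, 0.5, 0.9):
    print(p, best_action(p))   # no_action, ask, act
```

Because asking avoids the worst outcomes in both states of the world, it dominates exactly in the middle band of p(D|E), matching the three regions in the plot.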
Varying Precision of Service
Consider contributions across a spectrum of precision (e.g., week view → day view → specific appointment)
• Assume user will refine partial results • Under uncertainty, trade off reduced precision for higher accuracy
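One simple way to realize this trade-off, sketched with assumed confidence thresholds (the scheme and numbers are illustrative, not Lookout’s): fall back from the most precise guess to coarser but safer scopes as confidence drops.

```python
def choose_scope(p_appt, p_day, p_week, thresholds=(0.8, 0.6, 0.4)):
    """Return the most precise scope whose probability of being the
    right contribution clears its (assumed) confidence threshold."""
    t_appt, t_day, t_week = thresholds
    if p_appt >= t_appt:
        return "open specific appointment"
    if p_day >= t_day:
        return "open day view"
    if p_week >= t_week:
        return "open week view"
    return "do nothing"

print(choose_scope(p_appt=0.3, p_day=0.7, p_week=0.95))  # open day view
```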
Timing of Initiative • Timing is critical: consider patterns of attention • Record length of message and dwell time before calendar invoked • Perform regression
[Scatter plot with fit: observed dwell before action (sec, 0–10) vs. length of original message (bytes, 0–2500)]
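A sketch of the regression step with fabricated sample data (the talk’s actual measurements are not reproduced here): fit dwell time as a linear function of message length, then use the fit to pace when the service appears.

```python
import numpy as np

msg_len_bytes = np.array([200, 500, 800, 1200, 1800, 2400])
dwell_sec     = np.array([1.5, 2.0, 3.1, 4.2, 6.0, 8.1])

slope, intercept = np.polyfit(msg_len_bytes, dwell_sec, deg=1)

def expected_dwell(length_bytes):
    """Predicted seconds the user will dwell before acting; the
    assistant can wait roughly this long before offering service."""
    return slope * length_bytes + intercept

print(f"dwell ≈ {slope:.4f} * bytes + {intercept:.2f}")
print(f"expected dwell for a 1500-byte message: {expected_dwell(1500):.1f} s")
```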
Conversational Architectures Project • DeepListener • Bayesian Receptionist • Quartet
Question
Why do people find it more difficult and frustrating to converse with a spoken dialog system than with a person? Interpreting spoken language abounds with uncertainty. Several answers: • Poor recognition of words • Meaning too difficult to capture • Lack of precise user models • Different social and personality dynamics
Intuitions • Despite uncertainty in human–human conversation, people manage to understand each other quite well. • People consider the sources of their uncertainties and pursue actions to resolve confusions: • Recognition • Language • Context, topic, meaning • Frank troubleshooting • Goal: Models and inference methods that seek mutual understanding under uncertainty, given inescapably unreliable components.
Grounding • People resolve uncertainties through a process of grounding: the “process by which participants establish and maintain the mutual belief that their utterances have been understood well enough for current purposes” (Clark & Schaefer, 1987)
DeepListener • Utility-directed clarification dialog • Formal model of “understood well enough” • Development environment • Assessment tools • Focus: Spoken command and control systems
Stakes, Likelihoods, and Clarification Actions • Consider stakes of the real-world action being considered: “Should I format your hard drive?” “Should I try to schedule that?” “Should I demolish the Kingdome now?” • Consider uncertainties • Consider expected utility of alternative “repair” actions • Weigh costs and benefits of real-world action vs. alternative dialog repair actions
Approach • Infer likelihoods of alternative spoken intentions • Likelihoods of different spoken intentions given acoustics • Optionally condition on goals inferred by a user model external to the speech system • Compute clarification or real-world actions with highest expected utility • Fuse multiple attempts with a Bayesian model that considers confidences • Consider history of utterances within a session: no reason to start over at each turn… leverage what was heard before (see the sketch below)
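A sketch of fusing recognition results across turns (my own toy model, not DeepListener’s): maintain a posterior over spoken intentions and multiply in each turn’s ASR-derived likelihoods, so ambiguous individual turns combine into a confident interpretation.

```python
import numpy as np

INTENTIONS = ["yes", "no", "repeat"]

def likelihood_from_asr(candidates):
    """Turn ASR candidate confidences into P(acoustics | intention).
    `candidates` maps recognized word -> confidence; unrecognized
    intentions get a small floor so no hypothesis is ruled out."""
    return np.array([candidates.get(i, 0.05) for i in INTENTIONS])

def fuse_turn(posterior, candidates):
    updated = posterior * likelihood_from_asr(candidates)
    return updated / updated.sum()

posterior = np.array([1/3, 1/3, 1/3])  # prior over spoken intentions
# Two noisy turns, individually ambiguous but jointly favoring "yes"
for asr_result in ({"yes": 0.5, "no": 0.4}, {"yes": 0.6, "repeat": 0.3}):
    posterior = fuse_turn(posterior, asr_result)
    print(dict(zip(INTENTIONS, posterior.round(3))))
```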
External User Model and Decision Model
[Influence diagram, single time slice (t−1): context and the speaker’s goal (optionally informed by an external user model) determine the user’s spoken intention; evidence includes content at focus, user actions, an ASR reliability indicator, and ASR candidate confidences 1…n; a decision node selects the dialog or domain-level action, scored by a utility node]
Dynamic Model for Reasoning Over Multiple Turns
[Two-slice dynamic influence diagram: the variables above are replicated at times t−1 and t; the speaker’s goal and spoken intention at time t depend on their values at t−1, so evidence accumulates across turns]
Dialog Actions under Consideration Example: DeepListener for handling confirmation, negation • Perform real-world action (e.g., implode the Kingdome now) • Ask for repetition to clarify • Note hesitation or reflection and try again • Note potential overhearing of noise and inquire • Note inattention of user and try to acquire user’s attention • Don’t perform action and just go away • Note problem with conversational interaction and attempt to troubleshoot
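These alternatives can be ranked by expected utility against the posterior over what the user said. A sketch with illustrative utilities (not DeepListener’s values) over a reduced action set:

```python
# u[action][intention]: hypothetical utilities on a [0, 1] scale
UTILITIES = {
    "do_real_world_action": {"yes": 1.0, "no": 0.0, "unclear": 0.1},
    "ask_to_repeat":        {"yes": 0.6, "no": 0.6, "unclear": 0.8},
    "go_away":              {"yes": 0.2, "no": 0.9, "unclear": 0.5},
}

def best_dialog_action(posterior):
    """posterior: dict intention -> probability. Returns the action
    maximizing sum_j u(action, intention_j) * p(intention_j)."""
    def eu(action):
        return sum(UTILITIES[action][i] * p for i, p in posterior.items())
    return max(UTILITIES, key=eu)

print(best_dialog_action({"yes": 0.4, "no": 0.2, "unclear": 0.4}))
# -> "ask_to_repeat": too risky to act, too likely desired to go away
```

With higher stakes (e.g., imploding the Kingdome), the utilities of erroneous real-world action drop further, widening the region where clarification dominates.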
DeepListener: SDK and Real-Time Clarification Dialog System • Dynamic Bayesian network modeling and inference • MS command and control speech system • Backchannel animations: MS Agent