Uncertainty, Action, and Interaction
Eric Horvitz
Microsoft Research
May 2002
Toward Mixed-Initiative User Interfaces (User ↔ Automation)
Designs that assume from the ground up that the user may guide, and collaborate with, an automated service to achieve desired results
Principles of Mixed-Initiative Interaction • Endow system with ability to infer the likelihood of a user’s goals, intentions • Attempt to scope precision of action to match goals and uncertainties • Determine the expected value of action given costs and benefits of action • Consider status of a user’s attention in timing of action • Allow for dialog at appropriate times to resolve key ambiguities
Principles of Mixed-Initiative Interaction • Provide efficient means for agent–user collaboration to refine guesses • Allow efficient direct invocation and termination • Seek innovative designs that maximize benefit of service, minimize the cost of poor guesses • Allow for natural assumptions of shared memory of recent interactions • Continue to learn by observing
Automated Scoping and Precision of Service • Key goal: Provide the user with a clear advance toward goals • Automated, flexible scoping of service to a precision that matches task uncertainty and context Prefer automation to do less, but do it correctly
Automated Reasoning about the Uncertainty of a User’s Goals • Automated reasoners must guess about a user’s goals and desire for services • Good guesses can be quite valuable… but guessing wrong can be costly • Even valuable automation can be distracting and steal a user’s scarce attentional resources
Minimizing Cost of Guessing Wrong • Seek design innovation: Advice / assistance valuable when right, with errors incurring minimal cost • Natural gestures for declining service • Avoid grabbing focus • Alternate channel overlay: NASA Vista display manager • Nondistracting, simple guessing: Vellum gridpoint guesses • More graceful interaction with potentially focused user • Better timing of services in sync with availability of attention
Probability, Utility, & Mixed-Initiative Interaction • Perspective for design • Specific functions, layering of componentry • Foundations of intelligence: infrastructure, fabric for UI innovation
Uncertainty and HCI • Meshing learning & reasoning with UI design: probabilities and utility-directed action • Infer likelihoods of key uncertainties, take ideal actions • Sources of evidence: user query, user activity, content at focus, data structures, user location, user profile, vision, speech, sound
Critical Uncertainties • Beliefs & Intentions • What does a user believe? What are the user’s goals? • Attention • What is the user’s workload? What is a user attending to? What will a user attend to? What should a user attend to? • Preferences • What does the user like and dislike—and how much? • Initiative • What is the cost and benefit of interaction, interruption, intervention? • What is the right mix of user / system initiatives?
Lumière Project
[Diagram: user activity (actions + words) and the user’s profile drive inference about the user’s goals and needs]
Joint work with J. Breese, D. Heckerman, K. Rommelse, D. Hovel, et al.
Challenges • Architectures for intelligent user interaction • Reasoning over time • Sensing activity from systems and applications • Integration of probabilistic information retrieval • Models of a user’s competencies over time
Big Picture
[Architecture diagram: events feed event synthesis and learned models; uncertain inference about the user and world drives computation of the ideal UI action; control executes ideal actions, producing new perceptions and interactions that feed back as events]
Inference about a User’s Time-Dependent Goals
[Dynamic Bayesian network: a persistent profile node influences goal nodes Goal(t−n) … Goal(t−1), Goal(t0) linked over time; at each time slice, evidence nodes E(i,t), E(j,t) are observed]
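To make the time-dependent inference concrete, here is a minimal sketch, assuming a small hypothetical goal set and hand-picked transition and emission probabilities (none of these names or numbers come from Lumière): an HMM-style forward filter that updates a posterior over the user’s goal as events arrive.

```python
import numpy as np

# Sketch only: forward filtering over a user's time-dependent goal.
# Goal(t) depends on Goal(t-1) via a transition matrix; observed events
# at each step update the posterior over goals.

GOALS = ["print_document", "format_cells", "no_assistance"]  # hypothetical

# P(Goal_t | Goal_{t-1}): goals tend to persist across time slices
transition = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

# P(event | goal) for a few hypothetical observable events
emission = {
    "menu_surfing":  np.array([0.5, 0.4, 0.1]),
    "select_region": np.array([0.1, 0.7, 0.2]),
    "steady_typing": np.array([0.1, 0.1, 0.8]),
}

def update_goal_belief(belief, event):
    """One forward step: predict with the transition model, then
    condition on the observed event and renormalize."""
    predicted = transition.T @ belief
    posterior = predicted * emission[event]
    return posterior / posterior.sum()

belief = np.array([1/3, 1/3, 1/3])  # uniform prior over goals
for event in ["menu_surfing", "select_region", "select_region"]:
    belief = update_goal_belief(belief, event)
    print(dict(zip(GOALS, belief.round(3))))
```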
Representing and Updating a Persistent “Competency Terrain”
[Surface plots: a user’s competency over skill categories, updated over time as the user’s skills evolve]
Sensing Context and Content • Toward a “peripheral nervous system” for sensing user activity • SDK with event abstraction language • Compiler for defining filters for user activity
Abstraction of Events
[Diagram: event sources 1…n emit atomic events; the Eve event-specification language composes them into modeled events over time]
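As an illustration of event abstraction, here is a sketch of a hypothetical modeled-event filter (the class and function names are my own, not the actual SDK or Eve language): it composes atomic menu events into a higher-level “menu surfing” event, a classic sign the user may need help.

```python
from dataclasses import dataclass
import time

@dataclass
class AtomicEvent:
    source: str      # e.g., "mouse", "keyboard", "app"
    kind: str        # e.g., "menu_open", "command_executed"
    timestamp: float

def detect_menu_surfing(events, window_s=5.0, min_opens=3):
    """Modeled event: 'menu surfing' fires when at least `min_opens`
    menu_open atomic events occur within a sliding window and no
    command was executed in between."""
    opens = [e.timestamp for e in events if e.kind == "menu_open"]
    executed = any(e.kind == "command_executed" for e in events)
    if executed or len(opens) < min_opens:
        return False
    return (opens[-1] - opens[-min_opens]) <= window_s

now = time.time()
stream = [AtomicEvent("app", "menu_open", now + dt) for dt in (0.0, 1.2, 2.5)]
print(detect_menu_surfing(stream))  # True: three menu opens in ~2.5 s
```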
Overall Lumière Architecture
[Diagram: events flow through event synthesis into Bayesian inference; a control system issues actions and queries over time]
Lumière Inference and Action
[Plot: the inferred probability that the user desires assistance, tracked over time]
Initiative • User vs. system initiative • Allowing fluid collaboration via a mix of initiatives • Toward principles of mixed-initiative interaction • Projects: Lookout, DeepListener, Quartet
Reasoning about initiative is a high-payoff opportunity area for HCI, Ubicomp, IUI
Initiative & Interaction: Lookout • Critical decision: Do nothing? Ask? Just do it? • Learning by watching • Cost-benefit analysis of initiative • Minimize disruption: Prefer doing less, but doing it correctly Joint work with Andy Jacobs
Learning and Real-Time Behavior in Lookout
[Pipeline: user actions / context → real-time probabilistic inference → cost-benefit analysis → UI / service]
• Watch user’s behavior • Store cases, timing info • Learn model from data
Preferences and Initiative • Expected utility as fundamental in decisions about services

                               D: User desires      ¬D: User does not
                               action i             desire action i
A:  Computer takes action i    u(A,D)               u(A,¬D)
¬A: No action i                u(¬A,D)              u(¬A,¬D)
Preferences and Initiative
[Plot: expected utility of Action and No Action as lines over p(D|E) ∈ [0,1]; the lines cross at the threshold probability p*]

eu(A)  = p(D|E) u(A,D)  + [1 − p(D|E)] u(A,¬D)
eu(¬A) = p(D|E) u(¬A,D) + [1 − p(D|E)] u(¬A,¬D)

In general: eu(A_i) = Σ_j u(A_i, D_j) p(D_j|E)
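A worked sketch of the decision rule, with hypothetical utilities on a [0, 1] scale (the numbers are my own, not from the talk): compute eu(A) and eu(¬A) as above and solve for the threshold p* where the two lines cross.

```python
# Act only when eu(A) >= eu(~A), i.e., when p(D|E) exceeds the
# crossing point p* of the two expected-utility lines.

# Hypothetical utilities
u_act_desired     = 1.0   # u(A,D): service provided and wanted
u_act_undesired   = 0.2   # u(A,~D): unwanted interruption
u_noact_desired   = 0.3   # u(~A,D): missed opportunity to help
u_noact_undesired = 1.0   # u(~A,~D): correctly stayed quiet

def eu_action(p):
    return p * u_act_desired + (1 - p) * u_act_undesired

def eu_no_action(p):
    return p * u_noact_desired + (1 - p) * u_noact_undesired

# Solve eu_action(p*) = eu_no_action(p*) for the threshold p*
p_star = (u_noact_undesired - u_act_undesired) / (
    (u_act_desired - u_act_undesired)
    - (u_noact_desired - u_noact_undesired)
)
print(f"p* = {p_star:.3f}")   # act when p(D|E) > p*

p = 0.6  # inferred probability the user desires the service
print("take action" if eu_action(p) >= eu_no_action(p) else "do nothing")
```

Lookout’s “prefer doing less, but doing it correctly” corresponds to penalizing unwanted action heavily, which pushes p* upward.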
Initiative and Context
Utility of outcomes as a function of context, u(A,D,k)
[Plots: the expected-utility lines for Action and No Action over p(D|E) shift as context changes, e.g., when the user is rushed or as the amount of available screen real estate increases, moving the threshold p*]
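The same computation with context folded in as u(A,D,k): a sketch with invented contexts and numbers (not the talk’s values) showing how p* moves when the user is rushed or when screen real estate is plentiful.

```python
def p_star(u_ad, u_a_nd, u_na_d, u_na_nd):
    """Threshold probability where eu(Action) = eu(No Action)."""
    return (u_na_nd - u_a_nd) / ((u_ad - u_a_nd) - (u_na_d - u_na_nd))

def utilities(context):
    # Hypothetical adjustment: when the user is rushed, an unwanted
    # action is more costly; with ample screen real estate, less costly.
    u_a_nd = {"rushed": 0.0, "normal": 0.2, "big_screen": 0.4}[context]
    return 1.0, u_a_nd, 0.3, 1.0  # u(A,D), u(A,~D), u(~A,D), u(~A,~D)

for k in ("rushed", "normal", "big_screen"):
    print(k, round(p_star(*utilities(k)), 3))
# p* rises when the user is rushed and falls when action is cheap, so
# the system demands more (or less) confidence before acting.
```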
Engaging in Dialog about Initiative
Expected value of engaging the user in dialogue
[Plot: adding an Ask line partitions p(D|E) into three regions: No Action at low probability, dialogue (Ask) in a middle band, and Action at high probability]
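A sketch of the three-way decision, assuming a hypothetical fixed cost for asking and that a question fully resolves the ambiguity (a simplification of value-of-information reasoning; the numbers are illustrative):

```python
ASK_COST = 0.15  # hypothetical cost of interrupting with a question

def eu(action, p):
    if action == "act":
        return p * 1.0 + (1 - p) * 0.2
    if action == "no_action":
        return p * 0.3 + (1 - p) * 1.0
    # "ask": the user resolves the ambiguity, so the system then acts
    # correctly either way, minus the cost of the question.
    return 1.0 - ASK_COST

def best_action(p):
    return max(("act", "no_action", "ask"), key=lambda a: eu(a, p))

for p in (0.1, 0.5, 0.9):
    print(p, best_action(p))   # no_action, ask, act
```

Because asking avoids the worst outcomes in both states of the world, it dominates exactly in the middle band of p(D|E), matching the three regions in the plot.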
Varying Precision of Service
Consider contributions across a spectrum of precision (e.g., week view → day view → specific appointment)
• Assume user will refine partial results • Under uncertainty, trade off reduced precision for higher accuracy
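One simple way to realize this trade-off, sketched with assumed confidence thresholds (the scheme and numbers are illustrative, not Lookout’s): fall back from the most precise guess to coarser but safer scopes as confidence drops.

```python
def choose_scope(p_appt, p_day, p_week, thresholds=(0.8, 0.6, 0.4)):
    """Return the most precise scope whose probability of being the
    right contribution clears its (assumed) confidence threshold."""
    t_appt, t_day, t_week = thresholds
    if p_appt >= t_appt:
        return "open specific appointment"
    if p_day >= t_day:
        return "open day view"
    if p_week >= t_week:
        return "open week view"
    return "do nothing"

print(choose_scope(p_appt=0.3, p_day=0.7, p_week=0.95))  # open day view
```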
Timing of Initiative • Timing is critical: consider patterns of attention • Record length of message and dwell time before calendar invoked • Perform regression
[Scatter plot with fit: observed dwell before action (sec, 0–10) vs. length of original message (bytes, 0–2500)]
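A sketch of the regression step with fabricated sample data (the talk’s actual measurements are not reproduced here): fit dwell time as a linear function of message length, then use the fit to pace when the service appears.

```python
import numpy as np

msg_len_bytes = np.array([200, 500, 800, 1200, 1800, 2400])
dwell_sec     = np.array([1.5, 2.0, 3.1, 4.2, 6.0, 8.1])

slope, intercept = np.polyfit(msg_len_bytes, dwell_sec, deg=1)

def expected_dwell(length_bytes):
    """Predicted seconds the user will dwell before acting; the
    assistant can wait roughly this long before offering service."""
    return slope * length_bytes + intercept

print(f"dwell ≈ {slope:.4f} * bytes + {intercept:.2f}")
print(f"expected dwell for a 1500-byte message: {expected_dwell(1500):.1f} s")
```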
Conversational Architectures Project • DeepListener • Bayesian Receptionist • Quartet
Question
Why do people find it more difficult and frustrating to converse with a spoken dialog system than with a person? Interpreting spoken language abounds with uncertainty. Several answers: • Poor recognition of words • Meaning too difficult to capture • Lack of precise user models • Different social and personality dynamics
Intuitions • Despite uncertainty in human–human conversation, people manage to understand each other quite well. • People consider the sources of their uncertainties and pursue actions to resolve confusions: • Recognition • Language • Context, topic, meaning • Frank troubleshooting • Goal: Models and inference methods that seek mutual understanding under uncertainty, given inescapably unreliable components.
Grounding • People resolve uncertainties through a process of grounding: the “process by which participants establish and maintain the mutual belief that their utterances have been understood well enough for current purposes” (Clark & Schaefer, 1987)
DeepListener • Utility-directed clarification dialog • Formal model of “understood well enough” • Development environment • Assessment tools • Focus: Spoken command and control systems
Stakes, Likelihoods, and Clarification Actions • Consider stakes of the real-world action being considered: “Should I format your hard drive?” “Should I try to schedule that?” “Should I demolish the Kingdome now?” • Consider uncertainties • Consider expected utility of alternative “repair” actions • Weigh costs and benefits of real-world action vs. alternative dialog repair actions
Approach • Infer likelihoods of alternative spoken intentions • Likelihoods of different spoken intentions given acoustics • Optionally condition on goals inferred by a user model external to the speech system • Compute clarification or real-world actions with highest expected utility • Fuse multiple attempts with a Bayesian model that considers confidences • Consider history of utterances within a session: no reason to start over at each turn… leverage what was heard before (see the sketch below)
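A sketch of fusing recognition results across turns (my own toy model, not DeepListener’s): maintain a posterior over spoken intentions and multiply in each turn’s ASR-derived likelihoods, so ambiguous individual turns combine into a confident interpretation.

```python
import numpy as np

INTENTIONS = ["yes", "no", "repeat"]

def likelihood_from_asr(candidates):
    """Turn ASR candidate confidences into P(acoustics | intention).
    `candidates` maps recognized word -> confidence; unrecognized
    intentions get a small floor so no hypothesis is ruled out."""
    return np.array([candidates.get(i, 0.05) for i in INTENTIONS])

def fuse_turn(posterior, candidates):
    updated = posterior * likelihood_from_asr(candidates)
    return updated / updated.sum()

posterior = np.array([1/3, 1/3, 1/3])  # prior over spoken intentions
# Two noisy turns, individually ambiguous but jointly favoring "yes"
for asr_result in ({"yes": 0.5, "no": 0.4}, {"yes": 0.6, "repeat": 0.3}):
    posterior = fuse_turn(posterior, asr_result)
    print(dict(zip(INTENTIONS, posterior.round(3))))
```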
External User Model and Decision Model
[Influence diagram, single time slice (t−1): context and the speaker’s goal (optionally informed by an external user model) determine the user’s spoken intention; evidence includes content at focus, user actions, an ASR reliability indicator, and ASR candidate confidences 1…n; a decision node selects the dialog or domain-level action, scored by a utility node]
Dynamic Model for Reasoning Over Multiple Turns
[Two-slice dynamic influence diagram: the variables above are replicated at times t−1 and t; the speaker’s goal and spoken intention at time t depend on their values at t−1, so evidence accumulates across turns]
Dialog Actions under Consideration Example: DeepListener for handling confirmation, negation • Perform real-world action (e.g., implode the Kingdome now) • Ask for repetition to clarify • Note hesitation or reflection and try again • Note potential overhearing of noise and inquire • Note inattention of user and try to acquire user’s attention • Don’t perform action and just go away • Note problem with conversational interaction and attempt to troubleshoot
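These alternatives can be ranked by expected utility against the posterior over what the user said. A sketch with illustrative utilities (not DeepListener’s values) over a reduced action set:

```python
# u[action][intention]: hypothetical utilities on a [0, 1] scale
UTILITIES = {
    "do_real_world_action": {"yes": 1.0, "no": 0.0, "unclear": 0.1},
    "ask_to_repeat":        {"yes": 0.6, "no": 0.6, "unclear": 0.8},
    "go_away":              {"yes": 0.2, "no": 0.9, "unclear": 0.5},
}

def best_dialog_action(posterior):
    """posterior: dict intention -> probability. Returns the action
    maximizing sum_j u(action, intention_j) * p(intention_j)."""
    def eu(action):
        return sum(UTILITIES[action][i] * p for i, p in posterior.items())
    return max(UTILITIES, key=eu)

print(best_dialog_action({"yes": 0.4, "no": 0.2, "unclear": 0.4}))
# -> "ask_to_repeat": too risky to act, too likely desired to go away
```

With higher stakes (e.g., imploding the Kingdome), the utilities of erroneous real-world action drop further, widening the region where clarification dominates.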
DeepListener: SDK and Real-Time Clarification Dialog System • Dynamic Bayesian network modeling and inference • MS command and control speech system • Backchannel animations: MS Agent