Leveraging Human Capabilities in Perceptual Interfaces


Presentation Transcript


    1. Leveraging Human Capabilities in Perceptual Interfaces George G. Robertson Microsoft Research

    2. Outline and Goal. What are perceptual interfaces? Perceptive vs. perceptual. Multimodal interfaces. Challenge: do our interfaces work, and how do we find out? Challenge: broaden our scope and leverage other natural human capabilities. I will begin with a review of what we are really talking about, describe the difference between perceptive and perceptual UI, and describe multimodal UI and how it fits with perceptual UI. Challenge #1: do our perceptual interfaces really work, and how do we find out? Challenge #2: we have been focusing mostly on perceptive UI; I will argue that we should consider other human capabilities, particularly other perceptual and cognitive skills.

    3. Perceptive to Perceptual. Perceptive UI: aware of the user; input to the computer uses human motor skills. Multimodal UI: uses communication skills, since we use multiple modalities to communicate. Perceptual UI: uses many human abilities: perception, cognition, motor, and communication. Perceptive interfaces make the computer more aware of the user; they are vision-based (or use other sensors) and make use of what the user is doing with the hands (motor skills). Much of last year's PUI focused on this. Multimodal interfaces: human face-to-face communication is multimodal (speech and gesture), and multimodal UI attempts to use those communication skills, mostly for multimodal input, though there are many other uses of the term. Some of last year's PUI focused on this. Perceptual UI takes advantage of other natural human abilities, broadens the focus of PUI, and gives more consideration to the computer-to-human part of the equation.

    4. What are Modalities? In human communication, we take sensory input (particularly hearing and seeing) and map it onto human communication channels; a modality is one of those mappings. From the computer's point of view: multimodal output is audio and video; multimodal input is speaking and gesturing.

    5. What are Multimodal Interfaces? Attempts to use human communication skills by providing the user with multiple modalities, which may or may not be simultaneous. Fusion vs. temporal constraints. Multiple styles of interaction. Unfortunately, the term has been used in the literature in many different ways. The basic idea: take advantage of natural human communication skills by providing multiple channels (or modalities), which may or may not be simultaneous. Most published literature is about multimodal input. Some analysis has been done to understand when the modalities are fused (used as one), used in sequences, or used independently. Some people use the term to describe systems that support multiple styles of interaction.

    6. Examples. Bolt, SIGGRAPH '80: "Put That There", with speech and gestures used simultaneously. I'm going to quickly review some of the literature. The earliest published system is Richard Bolt's "Put That There" at MIT, in which speech and gestures were used simultaneously.

    7. Put That There. Here is the Media Room for the Put That There system in 1980. It had a large rear-projected wall-sized display as well as two smaller monitors. The user moved objects by a combination of spoken commands and pointing.

    8. Examples (continued). Buxton and Myers, CHI '86: two-handed input. Cohen et al., CHI '89: direct manipulation and natural language. Hauptmann, CHI '89: speech and gestures. The term multimodal interfaces has been used in a wide variety of work. At CHI '86, Buxton and Myers introduced two-handed input. At CHI '89, Cohen et al. discussed the combination of direct manipulation and natural language, and Hauptmann discussed the combination of speech and gestures.

    9. Examples (continued). Bolt, UIST '92: two-handed gestures and gaze. Blattner & Dannenberg, 1992 book: Hanne on text and gestures (interaction styles), Pausch on selection by multimodal input, Rudnicky on speech, gesture, and keyboard. Bier et al., SIGGRAPH '93: Toolglass, two-handed input. In 1992, Bolt described a system at UIST '92 that combined two-handed gestures and gaze, and Blattner & Dannenberg published a book on multimedia that included three papers on multimodal interfaces. In 1993, Bier et al. published the Toolglass work, which used two-handed input.

    10. Examples (continued). Balboa & Coutaz, Intelligent UI '93: taxonomy and evaluation of multimodal UI. Walker, CHI '94: facial expression (multimodal output). Nigay & Coutaz, CHI '95: architecture for fused multimodal input. Also in 1993, Balboa & Coutaz published a taxonomy of multimodal UI along with some evaluation. In 1994, Walker published on multimodal output, generating facial expressions. In 1995, Nigay & Coutaz published a general architecture for fused multimodal input. Also, Cohen & Oviatt have published on multimodal integration, and Oviatt has published an analysis of the myths of multimodal UI.

    11. Why Multimodal Interfaces? Current interfaces fall far short of human capabilities. Higher bandwidth is possible. Different modalities excel at different tasks. Errors and disfluencies are reduced. Multimodal interfaces are more engaging. Why so much interest? It is clear that our current interfaces fall far short of what the human can do. Much higher bandwidth is possible with these systems. Different modalities excel at different tasks, and this may change over time and across users. Errors and disfluencies can be reduced dramatically. Multimodal interfaces are more natural, so they are more engaging.

    12. Leverage Human Capabilities. Leverage the senses and the perceptual system: users perceive multiple things at once. So how do multimodal systems work? Multimodal output leverages the human senses and perceptual system, because we can perceive multiple things at once. Multimodal input leverages human motor capabilities and communication skills, because we can do multiple things at once.

    13. Senses and Perception. Use more of the user's senses: not just vision, but sound, tactile feedback, and perhaps taste and smell in the future. Users perceive multiple things at once, e.g., vision and sound. For multimodal output, we currently use mostly the visual modality. We use audio some, but not nearly enough; we don't use the tactile channel much at all; and we don't use taste or smell. So there is room to do much more than we are currently doing.

    14. Motor & Effector Capabilities. Currently: pointing or typing. Much more is possible: gesture input, two-handed input, speech and natural language, body position, orientation, and gaze. Users do multiple things at once, e.g., speak and use hand gestures. For multimodal input, most interfaces still use pointing and typing. Some are using pointing and speech. Few pay any attention to what we do with our bodies: position, pose, orientation, and gaze. So there is room for more here also.

    15. Simultaneous Modalities? Single modality at a time: adapt to display characteristics, or let the user determine the input mode; redundant, but only one at a time. Multiple simultaneous modalities: two-handed input, speech and hand gestures, graphics and sound. Some talk about multimodal systems but intend that only one modality be used at a time. In some cases the different modalities are used to adapt to hardware configurations; in other cases the user is simply given a choice. Other systems use multiple modalities together, either simultaneously or nearly so: two-handed input combines the actions of the two hands, speech with hand gestures is the most obvious combination, and graphics with sound gives multimodal output.

    16. Taxonomy (Balboa, 1993). The taxonomy by Balboa & Coutaz at IUI '93 shows a chart with temporal constraints along the x axis (independent vs. sequential vs. concurrent) and degree of fusion along the y axis, the degree to which the modalities are fused into one action.
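To make the two axes concrete, here is a minimal illustrative sketch in Python; the labels paraphrase the axes described above rather than reproduce the paper's cells, and the classification at the end is just an example:

```python
from enum import Enum

class TemporalConstraint(Enum):   # x axis of the chart
    INDEPENDENT = 1
    SEQUENTIAL = 2
    CONCURRENT = 3

class Fusion(Enum):               # y axis: degree to which modalities merge
    INDEPENDENT_ACTIONS = 1       # each modality drives its own action
    FUSED = 2                     # modalities combine into a single action

# Speech plus pointing in "Put That There": concurrent and fused.
put_that_there = (TemporalConstraint.CONCURRENT, Fusion.FUSED)
```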

    17. Modality = Style of Interaction. Many styles exist: command interfaces, natural language, direct manipulation (WIMP and non-WIMP), conversational (with an interface agent), and collaborative. Mixed styles produce multimodal UI, e.g., direct manipulation plus a conversational agent. Finally, a small camp uses the term in an entirely different way. Here we are talking about styles of interaction: typed commands, natural language, direct manipulation, conversation. Some describe mixing these styles in a single interface as multimodal UI.

    18. Multimodal versus Multimedia. Multimedia is about media channels: text, graphics, animation, and video are all visual media. Multimodal is about sensory modalities: visual, auditory, tactile, and so on. Multimedia is a subset of multimodal output. We've talked a lot about multimodal interfaces; how does multimedia fit in? Multimedia is technology driven: it is about the particular media channels that are available. For example, text, graphics, animation, and video are all visual media; they are different media but the same modality. Multimodal is human driven: it is about human sensory modalities, and all of the visual media use the visual modality. In this sense, multimedia systems are a subset of multimodal output.

    19. How Do The Pieces Fit? Here is how I see these various pieces fitting together. A lot of work on multimodal input has been done. Recent work on adding awareness, or what I've called perceptive UI, is partly multimodal input plus some additional work. A lot of work has been done on multimedia, which is logically a subset of multimodal output; not much has been done on multimodal output outside of multimedia. Perceptual UI is really about both input and output, and includes all of what we have been discussing.

    20. Challenge. Do our interfaces actually work? How do we find out? We are assuming that perceptive, multimodal, and perceptual interfaces are actually better because they come closer to human-human communication. Is that assumption correct? Often our papers describe techniques that intuitively solve some problem; often we get excited because an interface looks cool. Rarely do our papers provide any proof that the intuition is correct or that the cool effect is useful and usable.

    21. Why Test For Usability? Commercial efforts require proof: a cost-benefit analysis before investment. Intuitions are great for design, but intuition is not always right! Example: the Peripheral Lens. Large-scale commercial efforts require some proof. It is clear that these techniques are hard to get right and that some of them are hard to implement; before a large commercial effort will invest what it takes for this to succeed, some kind of proof is needed. Intuition is wonderful for design insights, but it isn't always right. Last year, I spent some time implementing an idea that seemed really right, called the Peripheral Lens.

    22. Peripheral Vision. Does peripheral vision make navigation easier? Can we simulate peripheral vision? The intuition: peripheral vision helps with real-world navigation, e.g., it makes locomotion through real hallways work, and it may be part of what makes VR more immersive than desktop graphics. We ought to be able to simulate peripheral vision in desktop graphics and make navigation easier.

    23. A Virtual Hallway. Here is a virtual hallway connected to other hallways. The letters were part of an experiment using a visual search task. Note where the M is; there is one letter to its right.

    24. Peripheral Lenses. The Peripheral Lens uses three cameras; the two side cameras are adjusted so they look to the side. Rendering time is about 2x slower (not 3x, because of lower fill), and there is about 3x more information in the periphery.

    25. Peripheral Lens. This shows the spatial relationship of the three cameras. The UIST '97 paper gives details about computing these angles.
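The exact camera geometry is in the UIST '97 paper; as a rough sketch only, under the simplifying assumption that the three view frusta should abut edge to edge, the side cameras can be yawed outward by half the center camera's horizontal field of view plus half their own. The function name and the default 60-degree fields of view below are illustrative assumptions, not values from the paper:

```python
def side_camera_yaws(center_fov_deg=60.0, side_fov_deg=60.0):
    """Yaw angles (degrees) for the left and right peripheral cameras so that
    their view frusta abut the edges of the center camera's frustum."""
    offset = center_fov_deg / 2.0 + side_fov_deg / 2.0
    return -offset, +offset  # left lens, right lens

left_yaw, right_yaw = side_camera_yaws()
# With 60-degree lenses on each side, the three cameras together cover roughly
# three times the horizontal field, matching the "about 3x more information
# in the periphery" observation above.
```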

    26. Peripheral Lens Intuitions. Locomotion should be easier, especially around corners, and wayfinding should be easier because you can see farther sooner. Turning corners is particularly hard in a virtual environment: you keep hitting the corner, and peripheral vision should help avoid that problem. Peripheral Lenses make it possible to see around a corner earlier, so you see distant objects sooner, which should help with wayfinding.

    27. Peripheral Lens Findings. Users were about the same speed with or without Peripheral Lenses. The lenses were harder to use for people with no 3D graphics experience. We ran an additional study on corner-turning behavior and found that corner turning was not any faster with the lenses.

    28. The Lesson. Do not rely solely on intuition; test for usability! It is a mistake to rely only on our intuition. We must make evaluation a standard part of what we do. The ideal approach: usability testing is part of the research and design cycle, and papers that report on a new interface should provide some evaluation as part of the work.

    29. Challenge. Are we fully using human capabilities? Perceptive UI is aware of the body; multimodal UI is aware that we use multiple modalities, sometimes simultaneously. Perceptual UI should go beyond both of these. The second challenge: we focus mainly on perceptive interfaces and multimodal input, but other human capabilities can be brought to bear to make interfaces more effective, particularly the human input. Most interfaces have focused on the visual channel, but other perceptual channels can be used in parallel, and other human abilities can be more fully used. Also, it is likely that the interface we present for our visualizations itself presents a distraction. Can our UI take advantage of other human capabilities?

    30. Research Strategy. An approach to fixing these problems: a proposed research strategy for UI work in general, in three parts. First, identify and leverage natural human capabilities. Second, look for technology discontinuities and exploit them, e.g., 3D graphics is about to be ubiquitous. Third, pick a task domain that will make a major difference: information access is the driving application of this decade, probably the next decade as well, and over half of current GNP comes from information tasks. The key is to look at the intersection: if you are only task driven, or only technology driven, you are likely to miss the mark.

    31. Engaging Human Abilities. Understand complexity; new classes of tasks; less effort. The key to all this is to identify and leverage natural human capabilities, which will make it possible to understand added complexity and new classes of tasks with less effort. What follows are examples from each of these areas.

    32. Examples: Communication. Language, gesture, awareness, emotion, multimodal. First, look at how we communicate with each other. Natural language is flexible: paraphrasing leads to robust communication, and dialog resolves ambiguity.

    33. Examples: Communication. Language, gesture, awareness, emotion, multimodal. When we communicate face to face, what we do with our hands, body, and face conveys an enormous amount of information; consider the effects of eye contact, frowns, smiles, and looking away. Current interfaces pay no attention to gesture.

    34. Camera-Based Conversational Interfaces. Leverage face-to-face communication skills: body posture, hand gestures, and facial expressions. Eye contact: an embodied agent can follow you as you move. Facial expression can be used to determine mood or emotion (happy, sad, angry, puzzled), and an interface that responds to mood will be perceived as more trustworthy and friendly. Cameras can also drive improvements to speech recognition, through lip reading and steering phased-array microphones.

    35. Examples: Communication. Language, gesture, awareness, emotion, multimodal. The computer has little awareness of what we are doing; restrooms have more awareness. Are we in the room? Are we sitting at the computer? Are we facing the computer? Is there anyone else there? Are we busy doing something else (talking on the phone)? What are we looking at? An interface that is aware will be perceived as more engaging.

    36. Camera-Based Awareness. What is the user doing? Is the user in the room? At the computer? Facing the computer? Is anyone else there? Is the user talking on the phone? What is the user looking at? The system can be more responsive if it is aware of user actions, for example suspending activity while the user is on the phone or talking to someone who just came into the office. A responsive system that is aware of the user will be perceived as more engaging and engaged.

    37. Examples: Communication. Language, gesture, awareness, emotion, multimodal. Nass and Reeves (Stanford) studied the social (or emotional) response to computers: the user perceives emotion and personality in the computer regardless of what the designer does. Current interfaces are perceived as cold, uncaring, and non-communicative. A simple change, careful choice of dialogue text, can convey a different personality. We can build interfaces that detect our emotional state and adapt to respond to that state.

    38. Examples: Communication. Language, gesture, awareness, emotion, multimodal. Natural communication uses multiple modalities, speech and gesture, as discussed at length already (Bolt's 1980 "Put That There" combined speech and gesture). The user has a choice of modality, errors and disfluencies are reduced, higher bandwidth is possible, different modalities excel at different tasks, and multimodal interfaces are more engaging.

    39. Examples: Motor Skills. Bimanual skills, muscle memory, multimodal map manipulation (two hands plus speech). Bimanual skills are very natural: the non-dominant hand does gross positioning while the dominant hand does fine manipulation. In Ken Hinckley's multimodal map manipulation, two hands are used for zooming and panning and speech is used for long-distance jumps; the modalities are complementary.

    40. Camera-Based Navigation. How do our bodies move when we navigate? Observe how a Nintendo player leans into a turn. Use forward/backward motion for relative speed control and side-to-side motion to control turns; the user's hands are free for other purposes.
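The talk does not spell out a control law, so here is only a minimal sketch of the mapping it describes, assuming the vision system reports a head or torso offset from a neutral pose in meters; the dead zone and gain values are made-up tuning parameters:

```python
def lean_to_motion(dx, dz, dead_zone=0.03, speed_gain=4.0, turn_gain=1.5):
    """Map a tracked body offset from a neutral pose to navigation controls:
    lean forward/back (dz) -> speed, lean side to side (dx) -> turn.
    Returns (forward_speed_m_per_s, turn_rate_rad_per_s)."""
    def shaped(v):
        if abs(v) < dead_zone:                      # ignore small postural noise
            return 0.0
        mag = abs(v) - dead_zone
        return mag * mag if v > 0 else -mag * mag   # gentle response near center
    return speed_gain * shaped(dz), turn_gain * shaped(dx)
```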

    41. Examples: Perception. Spatial relationships, pattern recognition, object constancy, parallax, other senses. The Cone Tree, part of the Information Visualizer developed at Xerox PARC, was designed to visualize hierarchical information structures. 3D perception helps the user understand spatial relationships based on relative size and other depth cues (e.g., occlusion); this is pre-attentive, with no cognitive load. Patterns become apparent, especially if a search result is shown in the context of the structure. Object constancy, the ability to perceive a moving object as one object, makes complex changes possible without cognitive load. Example (next slide): without animation, selection yields something unobvious; with animation, the user understands without thinking about it.

    42. Cone Tree. Object constancy, the ability to perceive a moving object as one object, makes complex changes possible without cognitive load. Without animation, selection yields something unobvious; with animation, the user understands without thinking about it.
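As an illustration of how such an animation can preserve object constancy, here is a hedged sketch rather than the Information Visualizer's actual code: `apply_rotation` is a hypothetical callback into a renderer, and the sketch simply rotates a selected node to the front of its cone over about one second of eased frames.

```python
import math

def rotation_to_front(node_angle_rad):
    """Signed rotation (radians) that brings a node at node_angle_rad around
    its cone to the front (angle 0), taking the short way around."""
    delta = -node_angle_rad
    return (delta + math.pi) % (2 * math.pi) - math.pi

def animate_rotation(apply_rotation, node_angle_rad, duration_s=1.0, fps=30):
    """Spread the rotation over about one second of eased frames so the user
    can track each node as it moves (object constancy)."""
    total = rotation_to_front(node_angle_rad)
    frames = max(1, int(duration_s * fps))
    prev = 0.0
    for i in range(1, frames + 1):
        t = i / frames
        eased = 0.5 - 0.5 * math.cos(math.pi * t)   # slow-in / slow-out
        apply_rotation(total * (eased - prev))      # incremental step this frame
        prev = eased
```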

    43. Examples: Perception. Spatial relationships, pattern recognition, object constancy, parallax, other senses. Parallax is a key 3D depth cue, but there are sensor issues; camera-based head-motion parallax may be the answer. Motion parallax is one of the most effective 3D depth cues, more effective than stereopsis (Colin Ware). Head-motion parallax is one key way to get motion parallax, and VR gets some of its power from this. But many users are not willing or able to wear sensors. Camera-based head-motion parallax may be the answer and could make desktop 3D graphics more usable.

    44. Camera-Based Head-Motion Parallax. Motion parallax is one of the strongest 3D depth cues, more effective than stereopsis (Colin Ware). Head motion is one good way to get motion parallax, and VR gets some of its power from this. But most desktop graphics users are not willing to wear sensors; camera-based tracking can solve the problem and extend the usefulness of desktop graphics. Design questions for horizontal head motion: should it map to motion in the plane of the body, to the look-at point, to rotation about the center of the object, or through a non-linear mapping? Vertical motion is not as natural, has less range, and has increased noise. Forward/backward motion can drive zoom. Awareness: the system shouldn't track when the user turns away.
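As a sketch of just one of the design options listed above (rotation about the look-at point, with a simple linear mapping), camera-based head tracking might drive the view as below; the 20-degree maximum angle and the function name are assumptions for illustration, not details from the talk:

```python
import math

def head_parallax_eye(head_x_px, frame_width_px, look_at, eye_distance,
                      max_angle_deg=20.0):
    """Map the user's horizontal head position in the camera image to an eye
    position that orbits the look-at point, producing head-motion parallax."""
    # Normalize the detected head center to [-1, 1] about the image center.
    offset = (head_x_px - frame_width_px / 2.0) / (frame_width_px / 2.0)
    angle = math.radians(max_angle_deg) * offset
    x0, y0, z0 = look_at
    return (x0 + eye_distance * math.sin(angle),   # eye orbits horizontally
            y0,
            z0 + eye_distance * math.cos(angle))   # feed to a look-at camera
```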

    45. Examples: Perception. Spatial relationships, pattern recognition, object constancy, parallax, other senses: auditory, tactile, kinesthetic, vestibular, taste, olfactory. Most work to date has focused on the visual channel. Auditory channel: reinforcement of what happens in the visual channel, objects become more real (take on weight and substance), attention (alerts). Tactile channel: much work on force-feedback devices (some in game applications), Fred Brooks's molecular docking with atomic forces, and some work on passive haptics. Kinesthetic (muscle movement and body position): a tool belt in VR. Vestibular (balance): reinforcement for the sense of locomotion (location-based entertainment). Taste: an open question. Olfactory channel: maybe soon.

    46. Examples: Perception. Olfactory? Maybe soon? Olfactory displays are an active area of work, but there has been little published progress in the last two to three years.

    47. Examples: Cognition. Spatial memory, cognitive chunking, attention, curiosity, time constants. A 3D layout can be designed so that the user places objects, on the assumption that spatial memory works in a virtual environment and users will remember where they put objects; examples include Maya Design's Workscape and Xerox PARC's Web Forager. At MSR, we have been studying this with a visualization we call the Data Mountain, and we now have good evidence that spatial memory does work in 3D virtual environments.

    48. Data Mountain. Favorites management that exploits spatial memory, 3D perception, and pattern recognition. Advantages: spatial organization rather than a page at a time, and 3D advantages with a 2D interaction technique. It is document management for IE Favorites and window management: pages of interest are placed on a mountainside (a tilted plane in the initial prototype), and the act of placing a page makes it easier to remember where it is. In a usability test, a storage and retrieval task over 100 pages was about 26% faster. By exploiting 3D perception and spatial memory, the user can organize documents spatially, can get more information into the same space with no additional cognitive load, can see multiple pages at a time, and can see patterns of related documents.
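A minimal sketch of the "3D advantages with a 2D interaction technique" idea, assuming a flat tilted plane and made-up tilt and size parameters rather than the prototype's actual geometry: a 2D mouse position is mapped to a 3D point on the slope, so dragging a page toward the top of the screen pushes it farther away and makes it render smaller.

```python
import math

def mouse_to_mountain(mouse_x, mouse_y, screen_w, screen_h,
                      tilt_deg=65.0, plane_width=2.0, plane_depth=3.0):
    """Map a 2D mouse position to a 3D point on a tilted plane (the mountain
    side), so ordinary 2D dragging places pages in 3D space.
    Returns (x, y, z): x across the plane, y up, z away from the viewer."""
    u = mouse_x / screen_w - 0.5          # -0.5 .. 0.5 across the screen
    v = 1.0 - mouse_y / screen_h          # 0 at the bottom edge, 1 at the top
    d = v * plane_depth                   # distance up the slope
    tilt = math.radians(tilt_deg)
    # Pages placed near the top land farther away and so render smaller,
    # which is the relative-size depth cue users report relying on.
    return (u * plane_width, d * math.sin(tilt), d * math.cos(tilt))
```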

    49. Sample User Reaction. Here is a sample user layout of 100 pages. The markings are landmarks with no intrinsic meaning. It may look random, but in fact it makes a lot of sense to this user. Typical comments: "The strongest cue is the relative size"; "I know where that is."

    50.

    51. Data Mountain Usability. Spatial memory works in virtual environments! 26% faster than IE4 Favorites, and 2x faster with Implicit Query. We have run a series of studies: reported at UIST next month, submitted to CHI, with future submissions planned. Basic findings: spatial memory does work in virtual environments, and it works over extended time periods. The Data Mountain is significantly faster than IE4 Favorites (26%), and we have additional techniques that get us up to 2x faster.

    52. Implicit Query Visualization. Highlight related pages; slightly slower for storage, over 2x faster for retrieval. With Implicit Query, when the user selects a page the system finds related pages and highlights them, based on similar contents (word-frequency analysis). No action is required by the user, and the highlight is designed to avoid distraction; notice the entertainment-related pages to the left of the selected page. We tested two versions of Implicit Query with the Data Mountain: IQ1 was based on simple word frequency (a vector space model), and IQ2 was based on a proximity analysis of previous users' layouts (i.e., two pages are similar if they are spatially close together in several previous users' layouts). With Implicit Query, users took longer to store pages, created more categories, were more consistent in their categories, and were more than 2x faster on retrieval. Since typical use patterns suggest that each page is used about five times, overall performance will be about 2x faster.
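As an illustration of the general IQ1 idea (word-frequency similarity in a vector space model), here is a hedged sketch; the tokenization, the 0.3 threshold, and the function names are assumptions for the example, not details of the actual system:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Similarity of two pages by word frequency (a bag-of-words vector
    space model), in the range [0, 1]."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pages_to_highlight(selected_text, pages, threshold=0.3):
    """Titles of stored pages similar enough to the selection to highlight."""
    return [title for title, text in pages.items()
            if cosine_similarity(selected_text, text) >= threshold]
```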

    53. Examples: Cognition. Spatial memory, cognitive chunking, attention, curiosity, time constants. Chunking: the conscious perception of subtasks depends on the input device. Consider map manipulation with a mouse: it takes three or four steps to pan and zoom, whereas with a two-handed technique it is one movement.

    54. Examples: Cognition. Spatial memory, cognitive chunking, attention, curiosity, time constants. Motion attracts attention (an evolved survival trait): prey uses it to spot predators, and predators use it to spot prey. The implication is that animation can be used to focus attention, but it will distract if used inappropriately (i.e., spinning or blinking web objects). Peripheral vision is particularly tuned for motion detection, which may be one reason that VR appears immersive; we are exploring ways to enhance that experience on the desktop. Focus-in-context displays: much of the Information Visualizer work was trying to develop these kinds of displays, where the focus is seamlessly integrated with the context to avoid a shift of attention. An example follows.

    55. Focus in Context. Example: the Cone Tree. Looking at a large hierarchy, a 2D layout causes you to scroll and lose context; you can fit it all on the screen, but then you lose the details. Wrapped in 3D, you get the details and always see the context. This is an example of focus seamlessly integrated with context.

    56. Examples: Cognition. Spatial memory, cognitive chunking, attention, curiosity, time constants. Discoverability is a key problem in current interfaces: the market drives the addition of new functionality, and discoverability gets worse. Fear keeps us from natural exploration: will my action be reversible? Will I destroy my work? Universal undo, suggested by Raj Reddy, would remove the fear and could also allow us to remove Save commands, but it is hard to implement.

    57. Examples: Cognition. Spatial memory, cognitive chunking, attention, curiosity, time constants. From Allen Newell's levels of cognition: 0.1 s is perceptual fusion, so animations must run at 10 fps or faster; 1.0 s is immediate response, so unless a response is planned, a person cannot respond faster than this. That makes one second ideal for short animations: slow enough to get some significant animation, fast enough that the user doesn't feel like he is waiting. Example: the Cone Tree uses a one-second animation to show a complex rotation.
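As a small worked example of how these time constants turn into implementation numbers (the constant and function names below are mine, introduced only for illustration):

```python
PERCEPTUAL_FUSION_S = 0.1     # frames closer together than this fuse into motion
IMMEDIATE_RESPONSE_S = 1.0    # an unplanned response cannot come faster than this

MIN_ANIMATION_FPS = 1.0 / PERCEPTUAL_FUSION_S   # hence "10 fps or faster"

def animation_frame_count(duration_s=IMMEDIATE_RESPONSE_S, fps=MIN_ANIMATION_FPS):
    """Frames needed for an animation lasting one 'immediate response' unit
    while still fusing perceptually; a 1 s Cone Tree rotation needs >= 10."""
    return max(1, round(duration_s * fps))

assert animation_frame_count() == 10
```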

    58. Summary: Recommendations. Broaden the scope! Identify and engage human abilities; go beyond the perceptive and the multimodal. Test for usability! The case for broadening our scope: a lot of work has been done on multimodal input, some work on perceptive interfaces, and a lot of work on multimedia, but only a little on broader multimodal output, and many human abilities are not leveraged at all. We need to pull all of these together to build perceptual interfaces. Fully engaging human abilities will simplify the UI and let the user focus on the real task. We also need to change the way we do our research and report on it: testing for usability should be something we routinely do, and every new technique should be tested in some way before being reported in the literature.
