CMPUT 301: Lecture 31 Out of the Glass Box

  1. CMPUT 301: Lecture 31 Out of the Glass Box Martin Jagersand, Department of Computing Science, University of Alberta

  2. Overview • Idea: • why only use the sense of vision in user interfaces? • increase the bandwidth of the interaction by using multiple sensory channels, instead of overloading the visual channel

  3. Overview • Multi-sensory systems: • use more than one sensory channel in interaction • e.g., sound, video, gestures, physical actions etc.

  4. Overview • Usable senses: • sight, sound, touch, taste, smell • haptics, proprioception, and acceleration • each is important on its own • together, they provide a fuller interaction with the natural world

  5. Overview • Usable senses: • computers rarely offer such a rich interaction • we can use sight, sound, and sometimes touch • flight simulators and some games use acceleration to create a multimodal, immersive experience • we cannot (yet) use taste or smell

  6. Overview • Multi-modal systems: • use more than one sense in the interaction • e.g., sight and sound: a word processor that speaks the words as well as rendering them on the screen
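
A minimal sketch of the "speaking word processor" idea, assuming the third-party pyttsx3 text-to-speech package is installed (the sample text and helper name are illustrative): each word is echoed to the display (visual channel) and spoken aloud (auditory channel).

```python
import pyttsx3  # offline text-to-speech; assumed installed (pip install pyttsx3)

engine = pyttsx3.init()

def present_word(word: str) -> None:
    """Present one word on both channels: render it visually and speak it aloud."""
    print(word, end=" ", flush=True)  # visual channel: echo to the display
    engine.say(word)                  # auditory channel: queue the spoken word
    engine.runAndWait()               # block until the utterance has been spoken

for w in "the quick brown fox".split():
    present_word(w)
```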

  7. Overview • Multi-media systems: • use a number of different media to communicate information • e.g., a computer-based teaching system with video, animation, text, and still images

  8. Speech • Human speech: • natural mastery of language • instinctive, taken for granted • difficult to appreciate the complexities • potentially a useful way to extend human-computer interaction

  9. Speech • Structure: • phonemes (English) • 40 (24 consonant and 16 vowel sounds) • basic atomic units of speech • sound slightly different depending on context …

  10. Speech • Structure: • allophones: • 120 to 130 • all the sounds in the language • count depends on accents

  11. Speech • Structure: • morphemes • basic atomic units of language • part or whole words • formed into sentences using the rules of grammar

  12. Speech • Prosody: • variations in emphasis, stress, pauses, and pitch to impart more meaning to sentences • Co-articulation: • the effect of context on the sound • transforms phonemes into allophones

  13. Speech Recognition • Problems: • different people speak differently (e.g., accent, stress, volume, etc.) • background noises • “ummm …” and “errr …” • speech may conflict with complex cognition

  14. Speech Recognition • Issues: • recognizing words is not enough • need to extract meaning • understanding a sentence requires context, such as information about the subject and the speaker

  15. Speech Recognition • Phonetic typewriter: • developed for Finnish (a phonetic language) • trained on one speaker, tries to generalize to others • uses a neural network that clusters similar sounds together, each cluster mapping to a character • poor performance on speakers it has not been trained on • requires a large dictionary of minor variations

  16. Speech Recognition • Currently: • single-user, limited-vocabulary systems can work satisfactorily • no general-user, general-vocabulary system is commercially successful yet • Current commercial examples: • simple telephone-based UIs, such as train schedule information systems
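
A sketch of the single-user, limited-vocabulary case, assuming the third-party SpeechRecognition package and a microphone are available; restricting accepted utterances to a small command set is what keeps accuracy tolerable.

```python
import speech_recognition as sr  # assumed installed (pip install SpeechRecognition)

COMMANDS = {"open", "close", "save", "print", "quit"}  # small, fixed vocabulary

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # compensate for background noise
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio).lower()  # web-based recognizer
    if text in COMMANDS:
        print("command:", text)
    else:
        print("not in vocabulary:", text)  # reject out-of-vocabulary utterances
except sr.UnknownValueError:
    print("speech was unintelligible")     # e.g., "ummm ...", background noise
```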

  17. Speech Recognition • Potential: • for users with physical disabilities • for lightweight, mobile devices • for when user’s hands are already occupied with a manual task (auto mechanic, surgeon)

  18. Speech Synthesis • What: • computer-generated speech • natural and familiar way of receiving information

  19. Speech Synthesis • Problems: • humans find it difficult to adjust to monotonic, non-prosodic speech • the computer needs to understand natural language and the domain • speech is transient (hard to review or browse) • produces noise in the workplace or requires headphones (intrusive)

  20. Speech Synthesis • Potential: • screen readers • read a textual display to a visually impaired person • warning signals • spoken information especially for aircraft pilots whose visual and haptic channels are busy

  21. Speech Synthesis • Virtual newscaster (Ananova)

  22. Uninterpreted Speech • What: • fixed, recorded speech • e.g., played back in airport announcements • e.g., attached as voice annotation to files

  23. Uninterpreted Speech • Digital processing: • change playback speed without changing pitch • to quickly scan phone messages • to manually transcribe voice to text • to figure out the lyrics and chords of a song • spatialization and environmental effects
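
A sketch of "change playback speed without changing pitch" using the third-party librosa and soundfile packages (the filenames are placeholders): time-stretching by a factor of 1.5 shortens a recorded message without raising the speaker's pitch.

```python
import librosa          # assumed installed (pip install librosa)
import soundfile as sf  # assumed installed (pip install soundfile)

# "message.wav" is a placeholder for a recorded voice message.
y, sample_rate = librosa.load("message.wav", sr=None)  # keep the original sample rate
faster = librosa.effects.time_stretch(y, rate=1.5)     # 1.5x speed, pitch unchanged
sf.write("message_fast.wav", faster, sample_rate)
```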

  24. Non-Speech Sound • What: • boings, bangs, squeaks, clicks, etc. • commonly used in user interfaces to provide warnings and alarms

  25. Non-Speech Sound • Why: • fewer typing mistakes with key clicks • video games are harder without sound

  26. Non-Speech Sound? • D’oh!

  27. Non-Speech Sound • Dual mode displays: • information presented along two different sensory channels • e.g., sight and sound • allows for redundant presentation • user uses whichever they find easiest • allows for resolution of ambiguity in one mode through information in the other
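
A toy illustration of redundant dual-mode presentation using only the standard library: the same warning is shown on the display and signalled with the terminal bell, so the user can rely on whichever channel suits them.

```python
import sys

def alert(message: str) -> None:
    """Present a warning redundantly: visually on the display and audibly via the bell."""
    print(f"*** {message} ***")  # visual channel
    sys.stdout.write("\a")       # auditory channel: terminal bell (if supported)
    sys.stdout.flush()

alert("Battery low")
```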

  28. Non-Speech Sound • Dual mode displays: • humans can react faster to auditory than visual stimuli • sound is especially good for transient information that would otherwise clutter a visual display • non-speech sound is largely language- and culture-independent (unlike speech)

  29. Non-Speech Sound • Auditory icons: • use natural sounds to represent different types of objects and actions in the user interface • e.g., breaking glass sound when deleting a file • direction and volume of sounds can indicate position and importance/size • SonicFinder • not all actions have an intuitive sound
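
A hypothetical helper for the "volume indicates size" point: the size of the file being deleted is mapped to the playback volume of the breaking-glass sound (the range and log scaling are made up for this sketch).

```python
import math

def deletion_volume(file_size_bytes: int) -> float:
    """Map file size to a playback volume in [0.2, 1.0]: bigger file, louder crash."""
    size_mb = max(file_size_bytes, 1) / 1_000_000
    # log scale keeps a 1 KB note and a 1 GB archive in a usable loudness range
    scaled = min(math.log10(size_mb + 1) / 3.0, 1.0)
    return 0.2 + 0.8 * scaled

print(deletion_volume(4_000))          # small file -> quiet clink
print(deletion_volume(2_000_000_000))  # huge file -> near full volume
```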

  30. Non-Speech Sound • Earcons: • synthetic sounds used to convey information • structured combinations of motives (short sequences of notes) to provide rich information
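
A sketch of a simple earcon built with only the Python standard library: a short rising three-note motive (the notes and the "file created" meaning are arbitrary choices) written to a WAV file that a user interface could play back.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100

def tone(freq_hz: float, dur_s: float, amp: float = 0.4) -> list[float]:
    """Generate one sine-wave note as samples in [-1, 1]."""
    n = int(SAMPLE_RATE * dur_s)
    return [amp * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

# A rising three-note motive; a falling variant could signal the opposite event.
motive = tone(440, 0.12) + tone(554, 0.12) + tone(659, 0.20)

with wave.open("earcon.wav", "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 16-bit samples
    out.setframerate(SAMPLE_RATE)
    out.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in motive))
```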

  31. Non-Speech Sound • Earcons:

  32. Handwriting Recognition • Handwriting: • text and graphic input • complex strokes and spaces • natural

  33. Handwriting Recognition • Problems: • variation in handwriting between users • variation from day to day and over years for a single user • variation of letters depending on nearby letters

  34. Handwriting Recognition • Currently: • limited success with systems trained on a few users, with separated letters • generic, multi-user, cursive text recognition systems are not accurate enough to be commercially successful • Current applications: e.g., pre-sorting of mail (but a human has to assist with failures)
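
Not the Newton's algorithm, but a small sketch of the "separated letters" setting using scikit-learn's bundled digits dataset (assumed installed): a support-vector classifier trained on pre-segmented characters, the case where recognition already works reasonably well.

```python
from sklearn.datasets import load_digits  # assumed installed (pip install scikit-learn)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()  # 8x8 images of isolated handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(gamma=0.001)  # simple per-character classifier
clf.fit(X_train, y_train)
print("accuracy on separated characters:", clf.score(X_test, y_test))
```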

  35. Handwriting Recognition • Newton: • printing or cursive writing recognition • dictionary of words • contextual recognition • fine-tune spacing and letter shapes • fine-tune recognition speed • learns handwriting over time

  36. Handwriting Recognition • Newton:

  37. End • What did I learn today? • What questions do I still have?
