
Presentation Transcript


  1. Cosc 6326/Psych6750X Audition and Auditory Displays

  2. Use of auditory displays

  3. Sound in information display • speech provides a high bandwidth communication channel • audition is a long distance sense without field of view restrictions • Sound is useful for information display (Cohen & Wenzel 1995) • when origin of message is a sound (voice, music)

  4. when message is simple and short (e.g. event markers) • when message will not be referred to later (e.g. time) • when message deals with events in time • warnings or prompts (hearing is always on, no field of view issues) • continuously changing information (e.g. countdown) • when other systems (e.g. vision) are overloaded

  5. when verbal response is required (compatibility) • when illumination or disability prevents vision (e.g. alarm clock, limited field of view, blindness) • when the user moves from place to place (sound as a ubiquitous I/O channel)

  6. Sonification • In ‘visualization’ situations, ‘sonification’ of data can assist in the exploration of complex datasets • In these applications ‘realism’ is typically not a major issue • Sound can help interpret complex or multidimensional data; can provide an independent display dimension

  7. In addition to information display, in immersive displays sound contributes to: • realism, situational awareness and presence • ambience and emotive context • cueing visual attention • natural communication • space perception

  8. Realism and ambience • High quality sound improves the perceived ‘quality’ of visual displays • Sounds in the environment provide vital information that contributes to situational awareness • Persistence of sounds of objects out of the field of view may help maintain object permanence

  9. Sound is believed to be vital for conveying emotion and ambience in movies • Ambient sounds can be realistic or abstract (e.g. music to set mood) • Absence of appropriate sound degrades realism

  10. If background sounds are not well matched to the visuals the participant may feel detached and ‘presence’ may be degraded • Relation between presence and realism is not straightforward (later lecture) • Sound is an omni-directional sense and may help the user feel immersed in the VE • Auditory collision cues may help navigating a VE (especially with HMDs)

  11. Audition

  12. Sound • Sound is “mechanical vibrations and waves of an elastic medium, particularly in the frequency range of human hearing (16 Hz to 20 kHz)” • Normally, the medium is air. Sound is an air pressure wave. • Sound is usually used to describe the physical stimulus.

  13. Audition refers to perception. • An auditory event is usually elicited by a sound event. • A sinusoidal pressure wave is known as a pure tone.

  14. Sinusoid • x(t) = A cos(2π f0 t + φ) • A is amplitude, f0 is frequency, φ is phase • T0 = 1/f0 is the period • φ is related to the time shift of the peak of x(t)
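A minimal Python sketch of the sinusoid above; the sample rate, amplitude, frequency and phase are arbitrary illustrative values, not taken from the slides:

```python
import numpy as np

# Pure tone x(t) = A cos(2*pi*f0*t + phi), sampled at fs Hz.
fs = 44100                      # sample rate in Hz (assumed)
A, f0, phi = 0.5, 440.0, 0.0    # amplitude, frequency (T0 = 1/f0), phase
duration = 1.0                  # seconds

t = np.arange(int(fs * duration)) / fs      # time axis
x = A * np.cos(2 * np.pi * f0 * t + phi)    # the pure tone
```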

  15. Dimensions of sound • Harmonic content: pitch, melody, harmony, waveshape, timbre, vibrato • Timing: duration, tempo, rhythm • Loudness, envelope • Spatial: azimuth, elevation, distance • Ambience: resonance, reverberation, spaciousness • Representation: literal, auditory icons, abstract

  16. Perceptual and physical dimensions are analogous but distinct • pitch and frequency (directly related for pure tones) • loudness and intensity • timbre and complexity

  17. Matlin and Foley, Sensation and Perception

  18. Kandel et al, Principles of Neural Science

  19. Physiology and psychophysics • Cochlea performs mechanical spectral analysis of the sound signal • A pure tone induces a traveling wave in the basilar membrane. • maximum mechanical displacement along the membrane is a function of frequency (place coding) • Displacement of the basilar membrane changes with compression and rarefaction (frequency coding)

  20. Matlin and Foley, Sensation and Perception Kandel et al, Principles of Neural Science

  21. Perception of pitch • Along the basilar membrane, hair cell response is tuned to frequency • each neuron in the auditory nerve responds to acoustic energy near its preferred frequency • preferred frequency is place coded along the cochlea. Frequency coding believed to have a role at lower frequencies • Higher auditory centers maintain frequency selectivity and are ‘tonotopically mapped’

  22. Pitch is related to frequency for pure tones. • For periodic or quasi-periodic sounds the pitch typically corresponds to inverse of period • Some have no perceptible pitch (e.g. clicks, noise) • Sounds can have same pitch but different spectral content, temporal envelope … timbre
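As a rough illustration of “pitch corresponds to the inverse of the period”, here is a sketch of an autocorrelation-based period estimate; the function name and search limits are hypothetical and chosen only for illustration:

```python
import numpy as np

def estimate_pitch_autocorr(x, fs, fmin=50.0, fmax=1000.0):
    """Rough pitch estimate: locate the autocorrelation peak within the
    lag range implied by fmin/fmax and return 1/period."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)
    peak_lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])
    return fs / peak_lag

# A 50 ms frame of a 220 Hz tone (period ~4.5 ms) should give ~220 Hz.
fs = 44100
t = np.arange(int(0.05 * fs)) / fs
print(estimate_pitch_autocorr(np.cos(2 * np.pi * 220 * t), fs))
```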

  23. Perception of loudness • Intensity is measured on a logarithmic scale in decibels • Range from threshold to pain is about 120 dB-SPL • Loudness is related to intensity but also depends on many other factors (attention, frequency, harmonics, …)
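The decibel scale and the roughly 120 dB range can be made concrete with a small sketch; the reference pressure is the standard 20 µPa, and the example pressures are illustrative:

```python
import numpy as np

P_REF = 20e-6  # standard reference pressure, 20 micropascals

def spl_db(p_rms_pa):
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * np.log10(p_rms_pa / P_REF)

print(spl_db(20e-6))  # 0 dB SPL: nominal threshold of hearing
print(spl_db(20.0))   # 120 dB SPL: roughly the threshold of pain,
                      # i.e. a factor of one million in pressure
```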

  24. Spatial hearing • Auditory events can be perceived in all directions from observer • Auditory events can be localized internally or externally at various distances • Audition also supports motion perception • change in direction • Doppler shift

  25. Ability to localize depends on the sound source and environment • a tone in a reverberant room is difficult to locate in time and space • a click in an anechoic chamber, on the other hand, is precisely located and time limited

  26. Auditory Scene Analysis • Process of separating out the different sources present in the environment • Detection and segregation of distinct sources • Grouping of sounds in spatial and temporal proximity into single streams

  27. Cocktail party effect • In environments with many sound sources it is easier to process auditory streams if they are separated spatially • Spatial sound techniques can help in sound discrimination, detection and speech comprehension in busy immersive environments

  28. Spatial Auditory Cues • Two basic types of head-centric direction cues • binaural cues • spectral cues

  29. Binaural Directional Cues • When a source is located eccentrically it is closer to one ear than the other • sound arrives later and weaker at one ear • head ‘shadow’ also weakens the sound arriving at the opposite ear • Binaural cues are robust but ambiguous

  30. http://headwize.com/tech/aureal1_tech.htm

  31. Interaural time differences (ITD) • ITD increases with directional deviation from the median plane. It is about 600 µs for a source located directly to one side. • Humans are sensitive to as little as 10 µs ITD. Sensitivity decreases with ITD. • For a given ITD, phase difference is a linear function of frequency • For pure tones, phase-based ITD is ambiguous

  32. At low to moderate frequencies the phase difference can be detected. At high frequencies the ITD in the signal envelope can be used. • ITD cues appear to be integrated over a window of 100-200 ms (binaural sluggishness, Kollmeier & Gilkey, 1990)
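One common way to model how ITD grows with azimuth is Woodworth’s spherical-head approximation; a sketch follows, where the formula, head radius and speed of sound are standard assumptions rather than values from the slides:

```python
import numpy as np

HEAD_RADIUS = 0.0875     # metres, typical adult head (assumption)
SPEED_OF_SOUND = 343.0   # m/s in air at room temperature

def itd_woodworth(azimuth_deg):
    """ITD in seconds for a distant source at the given azimuth from the
    median plane (0 deg = straight ahead, 90 deg = directly to one side)."""
    theta = np.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + np.sin(theta))

print(itd_woodworth(90) * 1e6)  # ~650 us, consistent with the ~600 us
                                # figure quoted above for a lateral source
```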

  33. Interaural intensity differences (IID) • With lateral sources head shadow reduces intensity at the opposite ear • Effect of head shadow is most pronounced for high frequencies. • IID cues are most effective above about 2000 Hz • IIDs of less than 1 dB are detectable. At 4000 Hz a source located at 90° gives about a 30 dB IID (Matlin and Foley, 1993)

  34. Ambiguity and Lateralization Goldstein, Sensation and Perception

  35. Ambiguity and Lateralization • These binaural cues are ambiguous. The same ITD/IID can arise from sources anywhere along a ‘cone of confusion’ • Spectral cues and changes in ITD/IID with observer/object motion can help disambiguate • When directional cues are used in headphone systems, sounds are lateralised left versus right but seem to emanate from inside the head (not localised)

  36. Also, for near sources (less than 1 m) there is significant IID, due to the difference in distance to each ear, even at lower frequencies (Shinn-Cunningham et al 2000) • Intersection of these ‘near field’ IID curves with the cones of confusion constrains possible source locations to toroids of confusion

  37. Spectral Cues • The pinnae (outer ears) and head shadow each ear and create frequency-dependent attenuation of sounds that depends on the direction of the source • Because the pinnae are relatively small, spectral cues are effective predominantly at higher frequencies (i.e. above 6000 Hz)

  38. Direction estimation requires separation of the spectrum of the sound source from the spectral shaping by the pinnae • The shape of the pinnae shows large individual differences, which are reflected in differences in spectral cues

  39. Distance Cues • anechoic • intensity decreases with distance • attenuation is higher at high frequencies • confounded with the spectrum and intensity of the source • Near field IID http://headwize.com/tech/aureal1_tech.htm
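The anechoic intensity cue can be sketched with the inverse-square law; the 1 m reference distance is an assumed convention, and frequency-dependent air absorption is ignored here:

```python
import numpy as np

def free_field_drop_db(distance_m, ref_distance_m=1.0):
    """Level drop relative to the reference distance under the
    inverse-square law: about 6 dB per doubling of distance."""
    return 20.0 * np.log10(distance_m / ref_distance_m)

for d in (1, 2, 4, 8):
    print(f"{d} m: {free_field_drop_db(d):.1f} dB below the 1 m level")
```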

  40. http://headwize.com/tech/aureal1_tech.htm

  41. reverberation • ratio of direct to reverberant energy indicates distance with respect to the environment • reverberation pattern indicates ‘spaciousness’ of the environment • reverberation is more realistic but can degrade localisation, speech recognition …

  42. Visual-Auditory Interactions • Auditory cues associated with visual targets can cue visual attention • Latency for audition is less than for vision • A sound associated with a visual target • can speed visual search • can reduce response times • can facilitate saccadic eye movements • can cue attention outside the field of view

  43. Ventriloquism and visual capture • When a visual and auditory source are grouped, the sound is usually perceived in the direction of the visual target

  44. Auditory/Aural Displays

  45. Headphone displays • Precise independent control of inputs to each ear. • Individual display. • Closed ear type can exclude external sounds. Reduces interference from external sources; simplifies AR systems. • Entail an encumbrance. • Diotic, dichotic (stereo) and spatialised displays • Head fixed frame of reference. Display needs to be head tracked to register with virtual world.

  46. Speaker systems • Simpler, less encumbrance, multi-user • Cannot ‘occlude’ real world sounds but can sometimes mask them • Complication with echoes and cross-coupling between channels • Interference from/with visual displays • World frame of reference. • A subwoofer allows for deep bass and could augment headphones

  47. Spatialised audio • simple ITD, IID cues in a display lateralize a sound. Sound is not ‘externalized’ • spatialised audio: generate most of the spatial cues of a real-world environment using signal processing • with appropriate modeling of sound sources and user tracking it can provide a compelling illusion of spatial sound in a VE
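A crude sketch of the “simple ITD, IID” lateralization mentioned above, assuming a NumPy mono signal; it only pushes the image left or right inside the head and is not a full spatializer:

```python
import numpy as np

def lateralize(mono, fs, itd_s=0.0, iid_db=0.0):
    """Delay and attenuate the far-ear channel by the given ITD and IID.
    Positive values push the image toward the left ear."""
    delay = int(round(abs(itd_s) * fs))            # ITD in whole samples
    near = np.concatenate([mono, np.zeros(delay)])
    far = np.concatenate([np.zeros(delay), mono])  # delayed far ear
    far = far * 10.0 ** (-abs(iid_db) / 20.0)      # IID as a level drop
    left, right = (near, far) if itd_s >= 0 else (far, near)
    return np.stack([left, right], axis=1)         # (samples, 2) stereo

# Example: 500 us ITD and 6 dB IID pushing the image toward the left ear.
fs = 44100
t = np.arange(fs) / fs
stereo = lateralize(np.cos(2 * np.pi * 440 * t), fs, itd_s=500e-6, iid_db=6.0)
```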

  48. Binaural recording http://www.engr.sjsu.edu/~knapp/HCIROD3D/3D_sys1/binaural.htm

  49. Head related transfer function (HRTF) • describes how sound at a given location is transformed (by the pinnae etc.) as it travels to the ear, as a function of frequency • function of source direction, distance and frequency (4D) • equivalent to the Fourier transform of the response to an impulse source at the desired position
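In practice, spatialised rendering applies the HRTF by convolving the dry signal with the head-related impulse responses (HRIRs) for the desired direction. A sketch, assuming HRIR arrays obtained from some measured or generic HRTF dataset:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a dry mono signal through left/right HRIRs for one source
    direction, giving a two-channel signal for headphone playback."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# The HRTF itself is the (frequency-domain) Fourier transform of the HRIR:
# hrtf_left = np.fft.rfft(hrir_left)
```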
