Audio in VEs

Audio in VEs Ruth Aylett

Use of audio in VEs • Important but still under-utilised channel for HCI including virtual environments. • Speech recognition for hands-free input • All computers now have sound output • At least a beep • Usually CD-quality stereo sound • Conventional stereo places a sound in any spot between the left and right loudspeakers. • In true 3D sound, the source can be placed in any location: right or left, up or down, near or far.

Potential uses • Associate sounds with particular events • Associate sounds with static objects • Associate sound with the motion of an object • Use localised sound to attract attention to an object • Use ambient sounds to add to the feeling of immersion • Use sounds to add to the feeling of realism • Use speech to communicate with devices or avatars • Use sound as a warning or alarm signal

Overall impact • High Quality audio provides: • Increased realism • Reinforces visuals • Strong immersive sense • World exists beyond part that is seen • Strong positional cues • Extra information about the environment • The shape of the world • What does ‘High Quality’ mean?

VR sound environment • VR equipment creates a difficult sound environment. • CAVE • Stand in a glass box • Pretend you can’t hear the echoes • Also hard to place speakers • Semi-immersive VR theatre • Sit in a big cylinder • Reflects sound in a very strange way.

Workbench • Not so bad but still... • Big flat screen 1 metre in front of you • Sound coming from surround speakers • Creates echoes inappropriate for scene • Much VR audio is based on very high quality headphones • Use head tracking to get position and orientation • Play to user • Problem solved! - well, no

Stereo sound [Figure labels: L speaker, R speaker, monaural source, phantom source] • In the entertainment industry, stereo was the first successful commercial product involving spatial sound. • To place a sound on the left, send its signal to the left loudspeaker; to place it on the right, send its signal to the right loudspeaker.

Stereo techniques • If same signal sent to both speakers*, phantom source seems to originate from point midway between them. • Crossfading signal from one speaker to the other gives impression of source moving continuously between the two positions. • Simple crossfading cannot create impression of source outside of line segment between speakers. • Can also shift the location of the phantom source by exploiting the precedence effect (delay). *and if the speakers are wired "in phase" and if the listener is more or less midway between the speakers and if the room is not too acoustically irregular
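The crossfading idea on this slide can be sketched in a few lines. This is a minimal illustration, not a method taken from the slides: the equal-power (sine/cosine) pan law and the names `pan_gains` and `pan_mono` are assumptions chosen for the example.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position in [0.0, 1.0].

    0.0 is the left speaker, 1.0 is the right speaker, 0.5 is midway.
    The equal-power law used here is an assumption; it keeps
    left**2 + right**2 equal to 1 at every position.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan_mono(samples, position):
    """Turn mono samples into (left, right) sample pairs at one pan position."""
    left_gain, right_gain = pan_gains(position)
    return [(s * left_gain, s * right_gain) for s in samples]
```

Sweeping `position` from 0.0 to 1.0 over successive audio blocks moves the phantom source along the line between the speakers. As the slide notes, no choice of gains can move it outside that segment.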

A world of sound • You are surrounded by sound all the time • Silence is unheard of! • The environment affects (shapes?) the sound you hear • Size, shape, materials

Rendering sound: auralisation • To generate correct echoes must model sound behaviour in the space • Rooms are complex • Filled with different materials • Reflective, Absorbent, Frequency filtering • Just like rendering light

Simulation of Room Characteristics [Figure: direct sound, early reflections and reverberant field over time]
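To make the distinction between direct sound and early reflections concrete, the sketch below computes the arrival time of a single first-order reflection by mirroring the source in a wall (the image-source idea). The slides do not name a method, so this technique is a swapped-in assumption, as are the wall position, the frequency-independent absorption coefficient and the function name `first_order_reflection`.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, approximate value for air


def first_order_reflection(source, listener, wall_x, absorption=0.3):
    """Return (direct_delay_s, reflected_delay_s, reflection_factor).

    The wall is the plane x = wall_x. Source and listener are (x, y, z)
    tuples in metres and are assumed to lie on the same side of the wall.
    `absorption` is a hypothetical single-number coefficient in [0, 1].
    """
    # Mirror the source in the wall to get the image source.
    image = (2.0 * wall_x - source[0], source[1], source[2])
    direct_path = math.dist(source, listener)
    reflected_path = math.dist(image, listener)
    # Fraction of the amplitude that survives the bounce.
    reflection_factor = 1.0 - absorption
    return (direct_path / SPEED_OF_SOUND,
            reflected_path / SPEED_OF_SOUND,
            reflection_factor)
```

A full room model would repeat this for every surface and for higher-order bounces, with frequency-dependent materials, which is where the cost discussed later in the deck comes from.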

Putting sound into a VE • What sound? • ‘Ambient’ sounds • ‘Surround’ sound • Often use recorded sounds • Positional sounds • Designed to give a strong sense of something happening in a particular place • Also often provided by using recorded sounds

Positional sound • Using sound to create the sense of active things in the environment • Enhances presence • Enhances immersion • Need to deal with many components • Reflections (echoes) • Diffraction effects

City models • VisClim: scene in Linköping’s Storatorget • Surrounding environment • Vehicles? • Several roads nearby • People? • Many people in the square • Weather noise effects • Rainfall • Snowfall (no sound but damping effect)

VisClim

Air Traffic control • Simulation • No ‘ambient’ sound required • No aircraft noises • No realism wanted? • Positional warnings? • Designed to draw the user’s attention to the location of a problem • Which may be out of the field of view

What it looked like

Creating positional sound • Amplitude • (or more) • Synchronisation • Audio delays • Frequency • Head-Related Transfer Function (HRTF)

Amplitude • Generate audio from position sources • Calculate amplitude from distance • Include damping factors • Air conditions • Snow • Directional effect of the ears
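A minimal sketch of calculating amplitude from distance with a damping factor follows. The inverse-distance spreading law, the exponential damping term and all parameter names (`ref_distance`, `damping_per_metre`) are assumptions for illustration; the slides do not specify a formula.

```python
import math


def distance_gain(source, listener, ref_distance=1.0, damping_per_metre=0.0):
    """Return a linear gain in (0, 1] for a point source.

    source, listener: (x, y, z) positions in metres.
    ref_distance: distance at which the gain is 1.0 (hypothetical default).
    damping_per_metre: extra attenuation, e.g. a larger value for a
        snow-covered scene (hypothetical parameter, not a measured one).
    """
    distance = math.dist(source, listener)
    # Clamp so the gain never exceeds 1.0 close to the source.
    clamped = max(distance, ref_distance)
    spreading = ref_distance / clamped
    damping = math.exp(-damping_per_metre * (clamped - ref_distance))
    return spreading * damping
```

For example, `distance_gain((0, 0, 0), (2, 0, 0))` halves the amplitude relative to the 1 m reference, and any positive `damping_per_metre` reduces it further. The directional effect of the ears is not modelled here.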

Synchronisation • Ears are very precise instruments • Very good at hearing when something happens after something else • Sound travels slowly (c 340 m/sec in air): different distance to each ear • Use this to help define direction • Difference in amplitude gives only very approximate direction information

Speed effect • 30 centimetres ≈ 0.0009 seconds (0.3 m ÷ 340 m/s) • Humans can hear ≤ 700 µs
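The arithmetic behind the speed effect can be checked with a short sketch. The travel-time function follows directly from distance divided by the speed of sound. The interaural delay uses a simplified far-field model (path difference = ear spacing × sin(azimuth)) that ignores the head bending the sound; that model and the 0.2 m ear spacing are assumptions, not values from the slides.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the approximate figure quoted for air


def travel_time(distance_m, speed=SPEED_OF_SOUND):
    """Seconds taken by sound to cover distance_m metres."""
    return distance_m / speed


def interaural_delay(azimuth_deg, ear_spacing_m=0.2, speed=SPEED_OF_SOUND):
    """Approximate arrival-time difference between the two ears, in seconds.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    ear_spacing_m: assumed straight-line distance between the ears.
    """
    path_difference = ear_spacing_m * math.sin(math.radians(azimuth_deg))
    return path_difference / speed
```

With these assumptions, `travel_time(0.30)` comes to a little under 0.0009 s, and the delay for a source directly to one side is a little under 0.0006 s, so it falls inside the 700 µs range mentioned on the slide.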

What is 3D sound? • Able to position sounds all around a listener. • Sounds created by loudspeakers/headphones: perceived as coming from arbitrary points in space. • Conventional stereo systems generally cannot position sounds to side, rear, above, below • Some commercial products claim 3D capability - e.g. stereo multimedia systems marketed as having “3D technology”. But usually untrue.

3D positional sound • Humans have stereo ears • Two sound pulse impacts • One difference in amplitude • One difference in time of arrival • How is it that a human can resolve sound in 3D? • Should only be possible in 2D?

Frequency • Frequency responses of the ears change in different directions • Role of pinnae • You hear a different frequency filtering in each ear • Use that data to work out 3D position information

Head-Related Transfer Function • Unconscious use of time delay, amplitude difference, and tonal information at each ear to determine the location of the sound. • Known as sound localisation cues. • Sound localisation by human listeners has been studied extensively. • Transformation of sound from a point in space to the ear canal can be measured accurately • Head-Related Transfer Functions (HRTFs). • Measurements are usually made by inserting miniature microphones into ear canals of a human subject or a manikin.

HRTFs • HRTFs are 3D • Depend on ear shape (Pinnae) and resonant qualities of the head! • Allows positional sound to be 3D • Computationally difficult • Originally done in special hardware (Convolvotron) • Can now be done in real-time using DSP

HRTFs • First series of HRTF measurement experiments in 1994 by Bill Gardner and Keith Martin, Machine Listening Group at MIT Media Lab. • Data from these experiments made available for free on the web. • Picture shows Gardner and Martin with dummy used for experiment - called a KEMAR dummy. • A measurement signal is played by a loudspeaker and recorded by the microphones in the dummy head.

HRTFs • Recorded signals processed by computer, derives two HRTFs (left and right ears) corresponding to sound source location. • HRTF typically consists of several hundred numbers • describes time delay, amplitude, and tonal transformation for particular sound source location to left and right ears of the subject. • Measurement procedure repeated for many locations of sound source relative to head • database of hundreds of HRTFs describing sound transformation characteristics of a particular head.

HRTFs • Mimic process of natural hearing • reproducing sound localisation cues at the ears of listener. • Use pair of measured HRTFs as specification for a pair of digital audio filters. • Sound signal processed by digital filters and listened to over headphones • Reproduces sound localisation cues for each ear • listener should perceive sound at the location specified by the HRTFs. • This process is called binaural synthesis (binaural signals are defined as the signals at the ears of a listener).
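The filtering step of binaural synthesis can be sketched as two convolutions, one per ear, using the time-domain form of each HRTF (the head-related impulse response). The impulse responses below are made-up four- and two-sample placeholders, not measured data, and `convolve` and `binaural_synthesis` are hypothetical names; measured responses contain several hundred values per ear, and a real-time system would typically use an FFT-based or DSP implementation rather than this direct loop.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse-response pair.

    Returns (left_ear_signal, right_ear_signal) for headphone playback.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder responses for a source on the listener's right:
# the left ear hears it later (leading zeros) and quieter.
hrir_left = [0.0, 0.0, 0.6, 0.2]
hrir_right = [1.0, 0.3]
left, right = binaural_synthesis([1.0, 0.5], hrir_left, hrir_right)
```

To place a moving source, the filter pair would be swapped (or interpolated) as the head tracker reports a new direction, picking the entry from the HRTF database that matches the source position relative to the head.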

HRTFs • You should be able to describe the process involved in generating true 3D audio using HRTFs.

The problem • Rendering audio is really, really hard • Much bigger problem than lighting • Material properties are more complex • Can’t fake it as easily • Properties are always a problem • Good methods exist but problem too computationally hard for these to be in general use at present

What is possible now • Constraint is real time audio rendering • Must adapt to dynamic user who moves unpredictably • Simple (reflectionless) stereo positional sound • Using amplitude • Using synchronization • Using HRTF frequency filtering • Useful for audio cues and simple environmental sounds

What about surround sound? • Principal format for digital discrete surround is the "5.1 channel" system. • The 5.1 name stands for five channels (in front: left, right and centre, and behind: left surround and right surround) of full-bandwidth audio (20 Hz to 20 kHz) • sixth channel at times contains additional bass information to maximise the impact of scenes such as explosions, etc. • This channel has a narrow frequency response (3 Hz to 120 Hz), thus sometimes referred to as the ".1" channel.

What about surround sound? • Surround sound systems NOT true 3D audio systems - just collection of more speakers. • Various commercial surround sound formats - for home entertainment, Dolby is big name. • Dolby Surround Digital. • Lots of other proprietary approaches - e.g. the BattleChair (pictured).

Dolby Headphone • Dolby Headphone: based on a proprietary algorithm, presumably similar to HRTFs, originally developed by Australian company Lake Technology • attempts to produce convincing surround-sound effects through ordinary stereo headphones. • Technology originally developed for VR or tele-conferencing applications but now marketed for consumer applications. • A more genuine 3D audio system developed by UK company Sensaura.

Voice interaction • Voice input for control • Continuous? Discrete? • Voice output for information • Positional - alerts • Non-positional - ‘voice over’ • Character-based - social channel

Voice output • Voice synthesis • Computer strings together set of phonemes (basic language sound units) • Problems with articulation: sounds robotic • Unit selection voices • Uses large database created from real voices • Plus sophisticated algorithm for putting bits together • Good results but need very large memory (1 gigabyte) to hold database • Takes lots of time and expertise to create ‘voice’
