3-D Sound and Spatial Audio

3-D Sound and Spatial Audio MUS_TECH 348

Wightman & Kistler (1989)
Headphone simulation of free-field listening
I. Stimulus synthesis
II. Psychophysical validation

I. Stimulus synthesis
Goal is to be able to capture free-field listening acoustics with headphones.
• 200-14,000 Hz
• Greater than 20 dB S/N (only 20 dB?)
• 8 loudspeakers on movable arch creating 144 directions
• With & without bite bar
Measure loudspeaker-delivered HRTFs and compare to headphone-delivered HRTFs
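The measurement step can be illustrated with a minimal sketch that estimates a transfer function by spectral division and keeps only the 200-14,000 Hz band named above. This is not the procedure from the paper: the function name, the sampling rate passed in, and the small regularizing constant `eps` are assumptions made for illustration.

```python
import numpy as np

def estimate_transfer_function(x, y, fs, f_lo=200.0, f_hi=14000.0, eps=1e-12):
    """Estimate H(f) = Y(f) / X(f) from a test signal x and a recording y.

    x  : signal sent to the loudspeaker or headphone (1-D array)
    y  : signal recorded by the probe microphone (1-D array)
    fs : sampling rate in Hz (hypothetical value supplied by the caller)
    Only bins between f_lo and f_hi are returned.
    """
    n = len(x)
    X = np.fft.rfft(x)
    Y = np.fft.rfft(y, n)            # truncate or zero-pad y to the length of x
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    H = Y / (X + eps)                # eps guards against division by zero
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band], H[band]
```

Running the same function once on a loudspeaker recording and once on a headphone recording gives two responses that can be compared bin by bin, for example as a difference in dB magnitude.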

HRTF measurement system

Variability in HRTF measurements
• Assembly replaced 10 times with bite bar (left ear, right ear)
• Assembly left in place with no bite bar (left ear, right ear)
• Headphone replacement with assembly in place

HRTF intersubject variability

II. Psychophysical validation
Goal is to compare localization performance in free-field and headphone listening
Stimuli:
• 8 noise bursts of 250 msec each
• 200-14,000 Hz
• random spectral changes by critical band
Presentation:
• 6 loudspeakers at a time mounted on arch
• headphones
• 72 positions
Task:
• absolute judgment of azimuth and elevation
• no measure of distance or quality
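A stimulus of this kind could be generated roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: the band edges are the commonly tabulated Zwicker critical-band edges clipped to 200-14,000 Hz, and the sampling rate and the 20 dB rove range (`rove_db`) are hypothetical parameters.

```python
import numpy as np

# Approximate critical-band edges in Hz, clipped to 200-14,000 Hz (assumption).
BAND_EDGES = np.array([200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                       1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400,
                       7700, 9500, 12000, 14000], dtype=float)

def scrambled_noise_burst(fs=44100, dur=0.250, rove_db=20.0, rng=None):
    """Return one band-limited noise burst with a random level in each band."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(fs * dur))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Flat magnitude, random phase: noise with a white spectrum.
    spectrum = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, freqs.size))
    gains = np.zeros(freqs.size)      # bins outside 200-14,000 Hz stay at zero
    for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:]):
        level_db = rng.uniform(-rove_db / 2.0, rove_db / 2.0)
        gains[(freqs >= lo) & (freqs < hi)] = 10.0 ** (level_db / 20.0)
    burst = np.fft.irfft(spectrum * gains, n)
    return burst / np.max(np.abs(burst))   # normalize peak amplitude to 1
```

Calling the function eight times gives a train of bursts whose spectra differ from one another, so listeners cannot rely on a fixed source spectrum.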

Types of Errors
• Angle error (mean of difference angles)
• Judgment centroid (average direction)
• Dispersion of judgments
• Front-back reversals are removed! (and examined separately)
Results
• Substantial individual differences
  • Less obvious in global measures
  • Most evident in elevation judgments
• Performance varies with region
  • Best localization: side (contradicts other studies)
  • Worst localization: top rear
• Free-field and headphone judgments very similar
• More front-back reversals with headphones
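The error measures listed above can be sketched for one target direction as follows. Several choices here are assumptions rather than the paper's definitions: azimuth 0 is taken as straight ahead and 180 as directly behind, a judgment counts as a front-back reversal when mirroring it about the interaural axis brings it closer to the target, and dispersion is a simple stand-in (one minus the mean resultant length of the judgment vectors).

```python
import numpy as np

def to_unit_vector(az_deg, el_deg):
    """Convert azimuth/elevation in degrees to unit vectors (x front, y side, z up)."""
    az = np.radians(np.asarray(az_deg, dtype=float))
    el = np.radians(np.asarray(el_deg, dtype=float))
    return np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)

def angle_between_deg(u, v):
    """Angle in degrees between unit vectors u and v (broadcast over rows)."""
    dot = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(dot))

def summarize_judgments(target_az, target_el, judged_az, judged_el):
    """Summarize repeated judgments of a single target direction."""
    judged_az = np.asarray(judged_az, dtype=float)
    judged_el = np.asarray(judged_el, dtype=float)
    target = to_unit_vector(target_az, target_el)
    judged = to_unit_vector(judged_az, judged_el)
    # Mirror each judgment front-to-back: azimuth a becomes 180 - a.
    mirrored = to_unit_vector(180.0 - judged_az, judged_el)
    is_reversal = (angle_between_deg(target, mirrored)
                   < angle_between_deg(target, judged))
    kept = judged[~is_reversal]       # reversals are removed before averaging
    if kept.shape[0] == 0:
        raise ValueError("every judgment was classified as a reversal")
    resultant = kept.sum(axis=0)
    length = np.linalg.norm(resultant)
    centroid = resultant / length
    return {
        "n_reversals": int(is_reversal.sum()),
        "mean_angle_error_deg": float(angle_between_deg(target, kept).mean()),
        "centroid_az_deg": float(np.degrees(np.arctan2(centroid[1], centroid[0]))),
        "centroid_el_deg": float(np.degrees(np.arcsin(np.clip(centroid[2], -1.0, 1.0)))),
        "dispersion": float(1.0 - length / kept.shape[0]),
    }
```

For a target at azimuth 30 with judgments at azimuths 30, 150, 20 and 40 (all at elevation 0), the judgment at 150 is classified as a reversal and the remaining three are averaged.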

Headphone simulation data in parentheses

SDE has most errors. SDO has fewest errors, especially for elevation.

Elevation Dependency Function
Interaural intensity difference compared to 0-degrees elevation
Subject SDE’s poor elevation judgments could be explained by the lack of a coherent pattern
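One way to compute such a function from measured responses is sketched below. The array layout (one row per elevation, one column per frequency bin) and the right-over-left sign convention are assumptions made for illustration.

```python
import numpy as np

def elevation_dependency(h_left, h_right, elevations_deg, eps=1e-12):
    """Interaural intensity difference in dB, relative to 0 degrees elevation.

    h_left, h_right : arrays of shape (n_elevations, n_freqs), complex or magnitude
    elevations_deg  : elevation of each row; must contain 0
    """
    iid_db = 20.0 * np.log10((np.abs(h_right) + eps) / (np.abs(h_left) + eps))
    ref_row = np.flatnonzero(np.asarray(elevations_deg) == 0)[0]
    return iid_db - iid_db[ref_row]   # the 0-degree row becomes all zeros
```

A listener whose rows change smoothly and consistently with elevation has a usable cue; rows with no coherent pattern would be consistent with the poor elevation judgments noted for subject SDE.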

Begault: Challenges to the Successful Implementation of 3-D Sound
• Focus is on deployable systems, especially audio systems
• Individual HRTFs can be quite different
• Challenges:
  • Eliminate front-back reversals & improve externalization
  • Reduce HRTF data load
  • Resolve conflicts in data specifications

Begault: Challenges to the Successful Implementation of 3-D Sound
Mismatch of Specification and Performance
Success depends on:
• HRTFs
  • some work better than others
  • different sets create timbral percepts
• Input sounds
  • broadband sounds localize better
• Specification
  • Have reasonable expectations
What kinds of HRTFs to use for systems?
• General HRTFs designed for average listeners
• HRTFs of good localizer

Reality vs Ideal
From Begault and Wenzel, 1993

Begault: Challenges to the Successful Implementation of 3-D Sound
• Localization error
  • For dummy-head recordings, 30% of locations suffer reversals
  • 4:1 front-back vs back-front
  • Many sounds not externalized
• Low-frequency Response Errors
  • Measurement equipment can’t get it right
• Data-reduction for HRTFs
  • Reduce the number of coefficients
  • Alternative Strategies like pole-zero modeling
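The coefficient-reduction idea can be illustrated by truncating an impulse response and checking how far its magnitude spectrum drifts from the original. This is a minimal sketch: the fade length, the FFT size and the RMS log-spectral error measure are assumptions, and pole-zero modeling is not shown.

```python
import numpy as np

def truncate_hrir(hrir, n_taps):
    """Keep the first n_taps coefficients, fading out the last quarter."""
    short = np.array(hrir[:n_taps], dtype=float)
    fade_len = max(1, n_taps // 4)
    # Raised-cosine fade from 1 down to 0 to avoid an abrupt cut.
    fade = 0.5 * (1.0 + np.cos(np.linspace(0.0, np.pi, fade_len)))
    short[-fade_len:] *= fade
    return short

def log_spectral_error_db(h_ref, h_test, n_fft=512, eps=1e-12):
    """RMS difference in dB between two magnitude responses."""
    ref = 20.0 * np.log10(np.abs(np.fft.rfft(h_ref, n_fft)) + eps)
    test = 20.0 * np.log10(np.abs(np.fft.rfft(h_test, n_fft)) + eps)
    return float(np.sqrt(np.mean((ref - test) ** 2)))
```

Sweeping `n_taps` and plotting the error gives a rough picture of how many coefficients a given response needs, although spectral error alone says nothing about how the shortened filter localizes.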

Martens: Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis
• Focus is on systems supporting directional hearing, with special consideration of HRTF design
• Position of sound source and position of auditory event do not always coincide, but that is not necessarily an issue of accuracy
• Sound localization might better be called space perception

Martens: Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis
Binaural Synthesis
• Good localizer HRTFs not supported by evidence
• Given the variety of approaches to binaural synthesis, better to use the term Directional Transfer Functions (DTFs) when they are created analytically

Target   Exact                  Analytic
One      Individualized HRTFs   Customized DTFs
Many     Averaged HRTFs         Generalized DTFs

Performance evaluation (in addition to azimuth and elevation):
• Externalization
• Range
• Coherence
• Naturalness

Martens: Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis
Binaural Synthesis Evaluation
What features are needed to make binaural synthesis “ear adequate”?
Binaural cues can be based on analysis and selected resynthesis
• Principal Components Analysis (PCA)
• Selective Reconstruction (for example, leaving out phase information) [Pole-zero design]
Elevation judgments needed only three out of four cues:
• ipsilateral magnitude
• interaural magnitude
• ipsilateral phase
• interaural phase
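The analysis and selective resynthesis idea can be sketched with a PCA of log-magnitude responses, computed here with a singular value decomposition. The data layout (one row per direction, one column per frequency bin) and the function name are assumptions; phase is simply left out, which is one form of selective reconstruction.

```python
import numpy as np

def pca_reconstruct(log_mag, n_components):
    """Approximate log-magnitude responses from a few principal components.

    log_mag : array of shape (n_directions, n_freqs), in dB
    Returns the approximation, the basis functions, the per-direction
    weights, and the fraction of variance explained.
    """
    mean = log_mag.mean(axis=0)
    centered = log_mag - mean
    # Rows of vt are the principal components (spectral basis functions).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    basis = vt[:k]
    weights = u[:, :k] * s[:k]
    approx = mean + weights @ basis
    explained = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return approx, basis, weights, explained
```

Each direction is then described by `n_components` weights instead of a full spectrum, and listening tests can ask how few components still give adequate directional judgments.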

Martens: Perceptual evaluation of filters controlling source direction: Customized and generalized HRTFs for binaural synthesis
Customizing HRTFs
Calibration methods:
• Anthropometric (anatomy)
• Acoustic (HRTFs)
• Psychophysical (perception)
Source Range
• Ipsilateral gain and contralateral attenuation are important
