
Digital Audio Signal Processing Lecture-2: Microphone Array Processing






Presentation Transcript


  1. Digital Audio Signal Processing Lecture-2: Microphone Array Processing Marc Moonen & Simon Doclo Dept. E.E./ESAT-STADIUS, KU Leuven marc.moonen@esat.kuleuven.be homes.esat.kuleuven.be/~moonen/

  2. Overview • Introduction & beamforming basics • Data model & definitions • Fixed beamforming • Filter-and-sum beamformer design • Matched filtering • White noise gain maximization • Ex: Delay-and-sum beamforming • Superdirective beamforming • Directivity maximization • Directional microphones (delay-and-subtract) • Adaptive beamforming • LCMV beamforming • Frost beamforming • Generalized sidelobe canceler

  3. Introduction • A microphone is characterized by a `directivity pattern’ which specifies the gain (& phase shift) that the microphone applies to a signal coming from a certain direction (`angle-of-arrival’) • The directivity pattern is a function of frequency (ω) • In a 3D scenario the `angle-of-arrival’ is an azimuth + elevation angle • Will consider only 2D scenarios for simplicity, with one angle-of-arrival (θ), hence the directivity pattern is H(ω,θ) • The directivity pattern is fixed and defined by the physical microphone design [Figure: |H(ω,θ)| for one frequency]

  4. Introduction • By weighting or filtering (= frequency-dependent weighting) and then summing the signals from different microphones, a (software-controlled) `virtual’ directivity pattern (= weighted sum of the individual patterns) can be produced • This assumes all microphones receive the same signals (i.e. are all in the same position). However... [Figure: filter-and-sum structure with N-tap FIR filters]

  5. Introduction • However, an additional aspect is that in a microphone array the different microphones are in different positions/locations, hence also receive different signals • Example: uniform linear array = microphones placed on a line + uniform inter-microphone distance (d) + ideal microphone characteristics (see p.8) • For a far-field source signal (plane wavefronts), each microphone receives the same signal, up to an angle-dependent delay (τ = d·cosθ/c between adjacent microphones; fs = sampling rate, c = propagation speed) • Beamforming = `spatial filtering’ based on the microphone characteristics (directivity patterns) AND the microphone array configuration (`spatial sampling’).
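
A small sketch of these angle-dependent delays for a uniform linear array (c and fs values are assumptions; θ = 0° is end-fire and θ = 90° is broadside, as defined on slide 9):

```python
import numpy as np

def ula_delays(M, d, theta_deg, c=340.0, fs=16000.0):
    """Per-microphone delays (seconds and samples) for a far-field source at
    angle theta, uniform linear array with spacing d, microphone 1 = reference."""
    theta = np.deg2rad(theta_deg)
    tau = np.arange(M) * d * np.cos(theta) / c   # delay of mic m w.r.t. mic 1
    return tau, tau * fs                         # in seconds, in samples

tau_s, tau_samples = ula_delays(M=5, d=0.03, theta_deg=60)
print(tau_samples)   # fractional sample delays -> motivates fractional-delay FIR filters (p.20)
```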

  6. Introduction • Background/history: ideas borrowed from antenna array design/processing for RADAR & (later) wireless communications • Microphone array processing is considerably more difficult than antenna array processing: • narrowband radio signals versus broadband audio signals • far-field (plane wavefronts) versus near-field (spherical wavefronts) • pure-delay environment versus multi-path environment • Classification: • fixed beamforming: data-independent, fixed filters Fm, e.g. delay-and-sum, filter-and-sum • adaptive beamforming: data-dependent, adaptive filters Fm, e.g. LCMV beamformer, Generalized Sidelobe Canceler • Applications: voice-controlled systems (e.g. Xbox Kinect), speech communication systems, hearing aids, …

  7. Beamforming basics • Data model: source signal in far-field (see p.12 for near-field) • Microphone signals are filtered versions of the source signal S(ω) at angle θ: Ym(ω) = Hm(ω,θ)·S(ω) • Stack all microphone signals in a vector: y(ω) = d(ω,θ)·S(ω), where d is the `steering vector’ • Output signal after `filter-and-sum’ is Z(ω) = f^H(ω)·d(ω,θ)·S(ω) (superscript H instead of T for convenience) (**)
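
A minimal numeric sketch of this data model for ideal omni-directional microphones on a uniform linear array (parameter values are illustrative assumptions):

```python
import numpy as np

def steering_vector(omega, theta, M, d, c=340.0):
    """Far-field steering vector d(omega, theta) for a ULA with ideal
    omni-directional microphones (microphone 1 = reference)."""
    tau = np.arange(M) * d * np.cos(theta) / c      # angle-dependent delays
    return np.exp(-1j * omega * tau)                # shape (M,)

# Filter-and-sum response at one frequency: Z = f^H d * S
M, d = 5, 0.03
omega = 2 * np.pi * 1000.0                          # 1 kHz
dvec = steering_vector(omega, np.deg2rad(60), M, d)
f = np.ones(M) / M                                  # simple averaging weights
H = np.vdot(f, dvec)                                # array response f^H d at (omega, theta)
print(abs(H))
```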

  8. Beamforming basics • Data model: source signal in far-field • If all microphones have the same directivity pattern H0(ω,θ), the steering vector can be factored as d(ω,θ) = H0(ω,θ)·[1 e^{-jωτ2} … e^{-jωτM}]^T, with microphone 1 used as the reference (= arbitrary choice) • Will often consider arrays with ideal omni-directional microphones: H0(ω,θ) = 1 • Example: uniform linear array, see p.5

  9. Beamforming basics • Definitions (1): • In a linear array (p.5): θ = 90° = broadside direction, θ = 0° = end-fire direction • Array directivity pattern (compare to p.3) = `transfer function’ for a source at angle θ (-π < θ < π): H(ω,θ) = f^H(ω)·d(ω,θ) • Steering direction = angle θ with maximum amplification (for 1 frequency) • Beamwidth (BW) = region around the maximum with (e.g.) amplification > -3 dB (for 1 frequency)

  10. Beamforming basics • Data model: source signal + noise • Microphone signals are corrupted by additive noise: y(ω) = d(ω,θ)·S(ω) + v(ω) • Define the noise correlation matrix as Rvv(ω) = E{v(ω)·v^H(ω)} • Will assume the noise field is homogeneous, i.e. all diagonal elements of the noise correlation matrix are equal: E{|vm(ω)|²} = σv²(ω), m = 1..M • Then the noise coherence matrix is Γ(ω) = Rvv(ω)/σv²(ω)

  11. Beamforming basics • Definitions (2): • Array Gain = improvement in SNR for a source at angle θ (-π < θ < π), i.e. ratio of |signal transfer function|² to |noise transfer function|² • White Noise Gain (WNG) = array gain for spatially uncorrelated (= white) noise (e.g. sensor noise); PS: often used as a measure for robustness • Directivity (DI) = array gain for diffuse noise (= coming from all directions) • DI and WNG evaluated at θmax are often used as a performance criterion
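
A restatement of these definitions in the filter-and-sum notation of p.7 and p.10; the spherically isotropic (sinc) coherence model for diffuse noise is the standard assumption, with d_mn the distance between microphones m and n:

```latex
% Array gain, WNG and DI (standard definitions, notation of p.7 and p.10)
\begin{align*}
  G(\omega,\theta) &= \frac{|\mathbf{f}^H(\omega)\,\mathbf{d}(\omega,\theta)|^2}
                          {\mathbf{f}^H(\omega)\,\boldsymbol{\Gamma}(\omega)\,\mathbf{f}(\omega)}
  &&\text{array gain (SNR improvement)} \\
  \mathrm{WNG}(\omega,\theta) &= \frac{|\mathbf{f}^H\mathbf{d}|^2}{\mathbf{f}^H\mathbf{f}}
  &&\text{white noise: } \boldsymbol{\Gamma} = \mathbf{I} \\
  \mathrm{DI}(\omega,\theta) &= \frac{|\mathbf{f}^H\mathbf{d}|^2}
                                    {\mathbf{f}^H\,\boldsymbol{\Gamma}_{\mathrm{diff}}(\omega)\,\mathbf{f}},
  \qquad
  [\boldsymbol{\Gamma}_{\mathrm{diff}}(\omega)]_{mn}
     = \frac{\sin(\omega\, d_{mn}/c)}{\omega\, d_{mn}/c}
  &&\text{diffuse noise}
\end{align*}
```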

  12. PS: Near-field beamforming • Far-field assumptions are not valid for sources close to the microphone array: • spherical wavefronts instead of planar wavefronts • include attenuation of the signals • 2 coordinates θ, r (= position q) instead of 1 coordinate θ (in the 2D case) • Different steering vector (e.g. with Hm(ω,θ) = 1, m = 1..M), including a distance-dependent attenuation with exponent e = 1 (3D) … 2 (2D), with q the position of the source, pref the position of the reference microphone, and pm the position of the m-th microphone

  13. PS: Multipath propagation • In a multipath scenario, acoustic waves are reflected against walls, objects, etc. • Every reflection may be treated as a separate source (near-field or far-field) • A more practical data model is ym(ω) = Hm(ω,q)·S(ω), with q the position of the source and Hm(ω,q) the complete transfer function from the source position to the m-th microphone (incl. microphone characteristic, position, and multipath propagation) • The `beamforming’ aspect vanishes here, see also Lecture-3 (`multi-channel noise reduction’)

  14. Overview • Introduction & beamforming basics • Data model & definitions • Fixed beamforming • Filter-and-sum beamformer design • Matched filtering • White noise gain maximization • Ex: Delay-and-sum beamforming • Superdirective beamforming • Directivity maximization • Directional microphones (delay-and-subtract) • Adaptive beamforming • LCMV beamforming • Frost beamforming • Generalized sidelobe canceler

  15. Filter-and-sum beamformer design • Basic procedure, based on p.9: the array directivity pattern is to be matched to a given (desired) pattern over a frequency/angle range of interest • Non-linear optimization for FIR filter design (= match magnitude only, ignore the phase response) • Quadratic optimization for FIR filter design (= co-design the phase response)

  16. Filter-and-sum beamformer design • Quadratic optimization for FIR filter design (continued) • With the FIR filter structure the array response is linear in the stacked filter coefficients (steering vector ⊗ FIR frequency vector = Kronecker product), so the quadratic (least-squares) criterion has a closed-form optimal solution
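
A minimal least-squares sketch of this design on a discrete frequency/angle grid; the grid, the desired pattern D(ω,θ) and the omni-directional ULA steering model are illustrative assumptions, not the slide's actual design:

```python
import numpy as np

# Least-squares ("quadratic") filter-and-sum design on a frequency/angle grid.
# The array response f^H g(omega,theta) is linear in the stacked filter vector f
# (length M*N), with g = d(omega,theta) kron e(omega): steering vector x FIR frequency vector.
M, N, d, c, fs = 5, 20, 0.03, 340.0, 8000.0
freqs = np.linspace(300.0, 3400.0, 32)                 # design grid (assumed)
angles = np.deg2rad(np.linspace(0.0, 180.0, 37))

G, D = [], []
for f_hz in freqs:
    omega = 2.0 * np.pi * f_hz
    e = np.exp(-1j * omega / fs * np.arange(N))        # FIR frequency vector
    for th in angles:
        dvec = np.exp(-1j * omega * np.arange(M) * d * np.cos(th) / c)
        G.append(np.conj(np.kron(dvec, e)))            # row g^H; |G f - D| = |f^H g - D| for real D
        D.append(1.0 if abs(np.degrees(th) - 90.0) < 10.0 else 0.0)   # desired pattern (assumed)

G, D = np.array(G), np.array(D)
f_opt, *_ = np.linalg.lstsq(G, D, rcond=None)          # closed-form LS solution (complex here;
                                                       # a real-coefficient design stacks Re/Im parts)
```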

  17. Filter-and-sum beamformer design • Design example: M = 8 microphones, logarithmic array, N = 50 filter taps, fs = 8 kHz

  18. Matched filtering : WNG maximization • Basic: procedure based on page 11 • Maximize White Noise Gain (WNG) for given steering angle ψ • A priori knowledge/assumptions: • angle-of-arrival ψ of desired signal + corresponding steering vector • noise scenario = white

  19. Matched filtering : WNG maximization • Maximization of the WNG is equivalent to minimization of the noise output power (under white input noise), subject to a unit response for the steering angle: min_f f^H·f subject to f^H·d(ω,ψ) = 1 (**) • Optimal solution (`matched filter’) is f(ω) = d(ω,ψ) / (d^H(ω,ψ)·d(ω,ψ)) • [FIR approximation]

  20. Matched filtering example: Delay-and-sum • Basic: microphone signals are delayed and then summed together • Fractional delays are implemented with truncated interpolation filters (= FIR) • Consider an array with ideal omni-directional microphones; then the array can be steered to angle ψ by choosing the delays to compensate the angle-dependent arrival delays (p.5) • Hence (for ideal omni-directional microphones) this is the matched-filter solution
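
The fractional delays mentioned above can be approximated with a windowed-sinc interpolation FIR; a minimal sketch (filter length, Hamming window and normalization are assumptions, not the slide's exact design):

```python
import numpy as np

def frac_delay_fir(delay_samples, n_taps=21):
    """Truncated/windowed-sinc interpolation FIR approximating a fractional delay.
    The filter delays by (n_taps-1)/2 + delay_samples samples (the integer bulk
    delay of (n_taps-1)/2 samples is inherent to the symmetric truncation)."""
    n = np.arange(n_taps)
    center = (n_taps - 1) / 2.0
    h = np.sinc(n - center - delay_samples)   # ideal delay = shifted sinc
    h *= np.hamming(n_taps)                   # taper the truncation
    return h / np.sum(h)                      # normalize DC gain to 1

# Delay-and-sum: delay each microphone so the look direction adds coherently, then average;
# the per-microphone delays follow from the ULA geometry (fs * m * d * cos(psi) / c, see p.5).
```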

  21. Matched filtering example: Delay-and-sum (ideal omni-directional microphones) • Array directivity pattern H(ω,θ): destructive interference away from the steering angle, constructive interference at the steering angle • White noise gain: WNG = M (independent of ω) • For an ideal omni-directional microphone array, the delay-and-sum beamformer provides a WNG equal to M for all frequencies (in the direction of the steering angle ψ).

  22. Matched filtering example: Delay-and-sum (ideal omni-directional microphones) • Array directivity pattern H(ω,θ) for a uniform linear array: H(ω,θ) has a sinc-like shape and is frequency-dependent • Example: M = 5 microphones, d = 3 cm inter-microphone distance, ψ = 60° steering angle, fs = 16 kHz sampling frequency [Figure: |H(ω,θ)| from end-fire (θ = 0°) to θ = 60°; wavelength = 4 cm]
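
A sketch that evaluates |H(ω,θ)| = |f^H(ω)·d(ω,θ)| for the delay-and-sum weights with the parameters listed above (the evaluation frequencies are assumptions); it also checks the WNG = M result from the previous slide:

```python
import numpy as np

M, d, c = 5, 0.03, 340.0
psi = np.deg2rad(60.0)                          # steering angle
thetas = np.deg2rad(np.linspace(0, 180, 181))

def sv(omega, theta):
    """Steering vector of the ideal omni-directional ULA."""
    return np.exp(-1j * omega * np.arange(M) * d * np.cos(theta) / c)

for f_hz in (500.0, 2000.0, 8000.0):            # a few frequencies up to fs/2 = 8 kHz
    omega = 2 * np.pi * f_hz
    f = sv(omega, psi) / M                      # delay-and-sum = matched filter for white noise
    pattern = np.array([abs(np.vdot(f, sv(omega, th))) for th in thetas])
    wng = abs(np.vdot(f, sv(omega, psi)))**2 / np.real(np.vdot(f, f))
    print(f_hz, pattern.max(), wng)             # response 1 at theta = psi, WNG = M for every omega
```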

  23. Matched filtering example: Delay-and-sum (ideal omni-directional microphones) • For too large an inter-microphone distance d (relative to the wavelength) an ambiguity, called spatial aliasing, occurs. This is analogous to time-domain aliasing, where now the spatial sampling (= d) is too coarse. • Aliasing does not occur (for any ψ) if d ≤ λmin/2 = c/(2·fmax) • Example with aliasing: M = 5, ψ = 60°, fs = 16 kHz, d = 8 cm
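
A quick numeric check of the half-wavelength rule above (c = 340 m/s assumed):

```python
c = 340.0                        # speed of sound (assumed)
fs = 16000.0
d_max = c / (2.0 * (fs / 2.0))   # ~2.1 cm: largest alias-free spacing up to fs/2
f_alias = c / (2.0 * 0.08)       # with d = 8 cm, spatial aliasing appears above ~2.1 kHz
print(d_max, f_alias)
```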

  24. Matched filtering example: Delay-and-sum (ideal omni-directional microphones) • Beamwidth for a uniform linear array (e.g. at the 1/√2 = -3 dB level): large dependence on the number of microphones, the distance d (compare p.22 & 23) and the frequency (e.g. BW infinitely large at DC) • Array topologies: • uniformly spaced arrays • nested (logarithmic) arrays (small d for high frequencies, large d for low frequencies) • 2D (planar) / 3D arrays

  25. Super-directive beamforming : DI maximization • Basic: procedure based on page 11 • Maximize Directivity (DI) for given steering angle ψ • A priori knowledge/assumptions: • angle-of-arrival ψ of desired signal + corresponding steering vector • noise scenario = diffuse

  26. Super-directive beamforming : DI maximization • Maximization of the DI is equivalent to minimization of the noise output power (under diffuse input noise), subject to a unit response for the steering angle: min_f f^H·Γdiff·f subject to f^H·d(ω,ψ) = 1 (**) • Optimal solution is f(ω) = Γdiff^{-1}(ω)·d(ω,ψ) / (d^H(ω,ψ)·Γdiff^{-1}(ω)·d(ω,ψ)) • [FIR approximation]
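
A minimal numeric sketch of this solution for a ULA in diffuse noise; the sinc coherence model, the regularization term mu·I (added because of the ill-conditioning discussed on the next slide) and all parameter values are assumptions:

```python
import numpy as np

def superdirective_weights(f_hz, psi, M=5, d=0.03, c=340.0, mu=1e-4):
    """Superdirective (max-DI) weights f = Gamma^{-1} d / (d^H Gamma^{-1} d) for a ULA.
    mu*I is an assumed regularization for robustness against sensor noise."""
    omega = 2 * np.pi * f_hz
    mics = np.arange(M) * d
    dist = np.abs(mics[:, None] - mics[None, :])
    # diffuse (spherically isotropic) noise coherence: sin(w d/c) / (w d/c)
    Gamma = np.sinc(2.0 * f_hz * dist / c) + mu * np.eye(M)   # np.sinc is sin(pi x)/(pi x)
    dvec = np.exp(-1j * omega * mics * np.cos(psi) / c)
    w = np.linalg.solve(Gamma, dvec)
    return w / np.vdot(dvec, w)                                # normalize so that f^H d = 1

w = superdirective_weights(500.0, psi=0.0)   # end-fire steering, as on the next slide
```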

  27. Super-directive beamforming : DI maximization (ideal omni-directional microphones) • Directivity patterns for end-fire steering (ψ = 0°), M = 5, d = 3 cm, fs = 16 kHz: the superdirective beamformer has the highest DI (up to 25 = M², versus DI = 5 and WNG = 5 for delay-and-sum), but a very poor WNG at low frequencies, where the diffuse noise coherence matrix becomes ill-conditioned, hence problems with robustness (e.g. sensor noise)! • Maximum directivity = M·M is obtained for end-fire steering and for frequency → 0 (no proof) • PS: diffuse noise = white noise for high frequencies

  28. Differential microphones : Delay-and-subtract • First-order differential microphone = directional microphone: 2 closely spaced microphones, where one microphone output is delayed (in hardware) and the outputs are then subtracted from each other • Array directivity pattern: first-order high-pass frequency dependence × P(θ), with P(θ) a frequency-independent (!) directional response (valid for ω·d/c << π and ω·τ << π) • For 0 ≤ α1 ≤ 1: P(θ) = α1 + (1-α1)·cosθ is a scaled cosine, shifted up by α1, such that θmax = 0° (= end-fire) and P(θmax) = 1

  29. Differential microphones : Delay-and-subtract • Types: dipole, cardioid, hypercardioid, supercardioid (HJ84) • Dipole: α1 = 0 (τ = 0), zero at 90° (= broadside), DI = 4.8 dB • Cardioid: α1 = 0.5, zero at 180°, DI = 4.8 dB [Figure: polar patterns, end-fire at 0°, broadside at 90°]

  30. Differential microphones : Delay-and-subtract • Hypercardioid: α1 = 0.25, zero at 109°, highest DI = 6.0 dB • Supercardioid: α1 such that the zero is at 125°, DI = 5.7 dB, highest front-to-back ratio [Figure: polar patterns, end-fire at 0°]
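
A sketch evaluating the frequency-independent directional response P(θ) = α1 + (1-α1)·cosθ for the types above; the supercardioid α1 is derived from its 125° null, since the slide does not give the value:

```python
import numpy as np

theta = np.deg2rad(np.arange(0, 361))

def first_order_pattern(alpha1):
    """Directional response of a first-order differential (delay-and-subtract) microphone."""
    return alpha1 + (1.0 - alpha1) * np.cos(theta)

patterns = {
    "dipole":        first_order_pattern(0.0),    # null at 90 deg (broadside)
    "cardioid":      first_order_pattern(0.5),    # null at 180 deg
    "hypercardioid": first_order_pattern(0.25),   # null at 109 deg, highest DI
}
# Supercardioid: alpha1 follows from the 125-degree null, P(125deg) = 0
a_super = -np.cos(np.deg2rad(125)) / (1.0 - np.cos(np.deg2rad(125)))
patterns["supercardioid"] = first_order_pattern(a_super)
```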

  31. Overview • Introduction & beamforming basics • Data model & definitions • Fixed beamforming • Filter-and-sum beamformer design • Matched filtering • White noise gain maximization • Ex: Delay-and-sum beamforming • Superdirective beamforming • Directivity maximization • Directional microphones (delay-and-subtract) • Adaptive beamforming • LCMV beamforming • Frost beamforming • Generalized sidelobe canceler

  32. LCMV-beamforming • Adaptive filter-and-sum structure • Aim is to minimize the noise output power, while maintaining a chosen response in a given look direction (and/or other linear constraints, see below) …compare to (**) p.19 & 26… • I.e. similar to the operation of a superdirective array (in diffuse noise), or delay-and-sum (in white noise), but now the noise field is unknown! • Implemented as an adaptive FIR filter (cfr. DSP-II)

  33. LCMV-beamforming • LCMV = Linearly Constrained Minimum Variance • f is designed to minimize the power (variance) of the output z[k]: min_f f^H·Ryy·f • To avoid desired-signal cancellation, add (J) linear constraints C^H·f = b • Ex: fix the array response in look-direction ψ for sample frequencies ωi, i = 1..J (**) • With (**) (for sufficiently large J) constrained output-power minimization approximately corresponds to constrained noise-power minimization (why?) • Solution is (obtained using Lagrange multipliers, etc.): f = Ryy^{-1}·C·(C^H·Ryy^{-1}·C)^{-1}·b
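
A minimal sketch of this closed-form solution, assuming the constraints are written as C^H·f = b (matrix/variable names are illustrative):

```python
import numpy as np

def lcmv_weights(Ryy, C, b):
    """LCMV: minimize f^H Ryy f subject to C^H f = b.
    Closed form: f = Ryy^{-1} C (C^H Ryy^{-1} C)^{-1} b."""
    RinvC = np.linalg.solve(Ryy, C)                          # Ryy^{-1} C
    return RinvC @ np.linalg.solve(C.conj().T @ RinvC, b)

# With a single constraint (C = steering vector d, b = 1) this reduces to the
# MVDR/superdirective form f = R^{-1} d / (d^H R^{-1} d) of p.26.
```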

  34. Frost Beamforming • Frost beamformer = adaptive version of the LCMV beamformer • If Ryy is known, a gradient-descent procedure for LCMV is: f[k+1] = P·(f[k] - μ·Ryy·f[k]) + fq, with P = I - C·(C^H·C)^{-1}·C^H and fq = C·(C^H·C)^{-1}·b. In each iteration the filters f are updated in the direction of the constrained gradient; P and fq are such that f[k+1] satisfies the constraints (verify!); μ is a step-size parameter (to be tuned) • If Ryy is unknown, an instantaneous (stochastic) approximation Ryy ≈ y[k]·y^H[k] may be substituted, leading to a constrained LMS algorithm
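
A minimal sketch of the constrained-LMS (stochastic) version for real-valued signals; the step size and variable names are assumptions:

```python
import numpy as np

def frost_init(C, b):
    """Projection matrix P and quiescent vector fq for Frost's constrained LMS."""
    CtC_inv = np.linalg.inv(C.T @ C)
    P = np.eye(C.shape[0]) - C @ CtC_inv @ C.T   # projects onto the null space of C^T
    fq = C @ CtC_inv @ b                         # satisfies C^T fq = b
    return P, fq

def frost_update(f, y, P, fq, mu=0.01):
    """One constrained-LMS iteration. y = stacked microphone tap vector,
    z = f^T y = beamformer output (real-valued signals assumed)."""
    z = f @ y
    return P @ (f - mu * z * y) + fq             # f[k+1] satisfies the constraints exactly
```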

  35. Generalized Sidelobe Canceler (GSC) • GSC = alternative adaptive-filter formulation of the LCMV problem: the constrained optimisation is reformulated as constraint pre-processing, followed by an unconstrained optimisation, leading to a simpler adaptation scheme • LCMV problem is min_f f^H·Ryy·f subject to C^H·f = b • Define a `blocking matrix’ Ca, with columns spanning the null space of C (i.e. C^H·Ca = 0, cfr. p.37) • Parametrize all f’s that satisfy the constraints as f = fq - Ca·fa, with fq = C·(C^H·C)^{-1}·b (verify!), i.e. the filter f can be decomposed in a fixed part fq and a variable part Ca·fa • Unconstrained optimization of fa (MN-J coefficients)

  36. Generalized Sidelobe Canceler • GSC (continued) • Hence the unconstrained optimization of fa can be implemented as an adaptive filter (adaptive linear combiner), with filter inputs (= ‘left-hand sides’) equal to the blocking-matrix outputs Ca^H·y[k] and desired filter output (= ‘right-hand side’) equal to the fixed-beamformer output fq^H·y[k] • LMS algorithm
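
A minimal real-valued sketch of this adaptation, assuming the decomposition f = fq - Ca·fa from the previous slide (step size and names are assumptions):

```python
import numpy as np

def gsc_step(fa, y, fq, Ca, mu=0.01):
    """One GSC/LMS iteration (real-valued sketch).
    speech reference: d = fq^T y ; noise references: x = Ca^T y ;
    output/error:     z = d - fa^T x ; unconstrained LMS update on fa."""
    d = fq @ y
    x = Ca.T @ y
    z = d - fa @ x
    fa = fa + mu * z * x        # overall filter is f = fq - Ca @ fa
    return fa, z
```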

  37. Generalized Sidelobe Canceler • The GSC then consists of three parts: • Fixed beamformer (cfr. fq), satisfying the constraints (but not yet minimum variance), creating the `speech reference’ • Blocking matrix (cfr. Ca), placing spatial nulls in the direction of the speech source (at the sample frequencies) (cfr. C^H·Ca = 0), creating the `noise references’ • Multi-channel adaptive filter (linear combiner), your favourite one, e.g. LMS

  38. Generalized Sidelobe Canceler • A popular GSC realization is as follows [Figure: microphone inputs y1…yM → fixed beamformer + blocking matrix → multi-channel adaptive filter → postprocessing] • Note that some reorganization has been done: the blocking matrix now generates (typically) M-1 (instead of MN-J) noise references, and the multichannel adaptive filter performs FIR filtering on each noise reference (instead of merely scaling in the linear combiner). The philosophy is the same, the mathematics are different (details on the next slide).

  39. Generalized Sidelobe Canceler • Math details (for the steering delays Δ = 0): select a `sparse’ blocking matrix whose output signals directly form the input to the multi-channel adaptive filter, and use this matrix as the blocking matrix from now on

  40. Generalized Sidelobe Canceler • Blocking matrix Ca: creating (M-1) independent noise references by placing spatial nulls in the look direction • Different possibilities (à la p.38), e.g. the Griffiths-Jim or Walsh blocking matrices (broadside steering) • Problems of the GSC: • impossible to reduce noise from the look direction • reverberation effects cause signal leakage into the noise references • the adaptive filter should only be updated when no speech is present!
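
A sketch of the (broadside-steered) Griffiths-Jim blocking matrix mentioned above: pairwise differences of adjacent (pre-steered) microphone signals cancel the look-direction signal and leave M-1 noise references.

```python
import numpy as np

def griffiths_jim_blocking(M):
    """(M-1) x M Griffiths-Jim blocking matrix: rows [1,-1,0,...], [0,1,-1,...], ...
    A look-direction (pre-steered) signal is identical on all channels,
    so every row output is zero -> noise references only."""
    B = np.zeros((M - 1, M))
    for m in range(M - 1):
        B[m, m], B[m, m + 1] = 1.0, -1.0
    return B

print(griffiths_jim_blocking(4) @ np.ones(4))   # look-direction signal is blocked (all zeros)
```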
