Digital Audio Signal Processing Lecture-3: Noise Reduction

Digital Audio Signal ProcessingLecture-3: Noise Reduction Marc Moonen/Alexander Bertrand Dept. E.E./ESAT-STADIUS, KU Leuven marc.moonen@esat.kuleuven.be homes.esat.kuleuven.be/~moonen/

Overview • Spectral subtraction for single-micr. noise reduction • Single-microphone noise reduction problem • Spectral subtraction basics (=spectral filtering) • Features: gain functions, implementation, musical noise,… • Iterative Wiener filter based on speech modeling • Multi-channel Wiener filter for multi-micr. noise red. • Multi-microphone noise reduction problem • Multi-channel Wiener filter (=spectral+spatial filtering) • Kalman filter based on speech & noise modeling • Kalman filters • Kalman filters for noise reduction

desired signal desired signal estimate y[k] ? noise signal(s) desired signal contribution noise contribution Single-Microphone Noise Reduction Problem • Microphone signalis • Goal: Estimate s[k] based on y[k] • Applications: Speech enhancement in conferencing, handsfree telephony, hearing aids, … Digital audio restoration • Will consider speech applications: s[k] = speech signal

Spectral Subtraction Methods: Basics • Signalchopped into `frames’ (e.g. 10..20msec), for each frame a frequency domain representation is (i-th frame) • However, speech signal is an on/off signal, hence some frames have speech +noise, i.e. some frames have noise only, i.e. • A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties,…)

Spectral Subtraction Methods: Basics • Definition: () = average amplitude of noise spectrum • Assumption: noise characteristics change slowly, hence estimate () by (long-time) averaging over (M) noise-only frames • Estimate clean speech spectrum Si() (for each frame), using corrupted speech spectrum Yi() (for each frame, i.e. short-time estimate) + estimated (): based on `gain function’

Spectral Subtraction: Gain Functions Magnitude Subtraction Spectral Subtraction Wiener Estimation Maximum Likelihood Non-linear Estimation Ephraim-Malah = most frequently used in practice see next slide

modifiedBessel functions Spectral Subtraction: Gain Functions skip formulas • Example 1: Ephraim-Malah Suppression Rule (EMSR) with: • This corresponds to a MMSE(*)estimation of the speech spectral amplitude |Si()| based on observation Yi() ( estimate equal toE{ |Si()||Yi() } )assuming Gaussian a priori distributions for Si() and Ni()[Ephraim & Malah 1984]. • Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985]. (*) minimum mean squared error

Spectral Subtraction: Gain Functions • Example 2: Magnitude Subtraction • Signal model: • Estimation of clean speech spectrum: • PS:half-wave rectification

<- cross-correlation in i-th frame <- auto-correlation in i-th frame Spectral Subtraction: Gain Functions • Example 3: Wiener Estimation • Linear MMSE estimation: find linear filter Gi() to minimize MSE • Solution: Assume speech s[k] and noise n[k] are uncorrelated, then... • PS: half-wave rectification

Spectral Subtraction: Implementation • Short-time Fourier Transform(=uniform DFT-modulated analysis filter bank) = estimate for Y(n ) at time i (i-th frame) N=number of frequency bins (channels) n=0..N-1 M=downsampling factor K=frame lengthh[k] = length-Kanalysis window (=prototype filter) • frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2M..3M) • subband processing: • synthesis bank: matched to analysis bank (see DSP-CIS) Y[n,i] y[k] Short-time analysis Gain functions Short-time synthesis

magnitude subtraction Spectral Subtraction: Musical Noise • Audio demo:car noise • Artifact: musical noise What? Short-time estimates of |Yi()| fluctuate randomly in noise-only frames, resulting in random gains Gi() • statistical analysis shows that broadband noise is transformed into signal composed of short-lived tones with randomly distributed frequencies (=musical noise)

probability that speech is present, given observation Spectral Subtraction: Musical Noise • Solutions? • Magnitude averaging: replace Yi() in calculation of Gi() by a local average over frames • EMSR (p7) • augment Gi() with soft-decision VAD: Gi()  P(H1 | Yi()). Gi() … average instantaneous

Spectral Subtraction: Iterative Wiener Filter • Basic: Wiener filtering based spectral subtraction (p.9), with (improved) spectra estimation based on parametric models • Procedure: • Estimate parameters of a speech model from noisy signal y[k] • Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9) • Re-estimate parameters of speech model from the speech signal estimate • Iterate 2 & 3

Spectral Subtraction: Iterative Wiener Filter pulse train … … all-pole filter voiced pitch period u[k] speech signal x white noise generator unvoiced frequency domain: time domain: = linear prediction parameters

Spectral Subtraction: Iterative Wiener Filter For each frame (vector) y[m](i=iteration nr.) • Estimate and • Construct Wiener Filter (p.9) with: • estimated during noise-only periods 3. Filter speech frame y[m] repeat until some error criterion is satisfied

Multi-Microphone Noise Reduction Problem speech source ? (some) speech estimate microphone signals noise source(s) speech part noise part

Multi-Microphone Noise Reduction Problem Will estimate speech part in microphone 1 (*) (**) ? (*) Estimating s[k] is more difficult, would include dereverberation (topic 6), etc.. (**) This is similar to single-microphone model (p.3), where additional microphones (m=2..M) help to get a better estimate

Multi-Microphone Noise Reduction Problem • Data model: See Lecture-2 on multi-path propagation, with q left out for conciseness. Hm(ω) is complete transfer function from speech source position to m-the microphone

Multi-Channel Wiener Filter (MWF) • Data model: • Will use linear filters to obtain speech estimate (as in Lecture-2) • Wiener filter (=linear MMSE approach) Note that (unlike in DSP-CIS) `desired response’ signal S1(w) is unknown here (!), hence solution will be `unusual’…

Multi-Channel Wiener Filter (MWF) • Wiener filter solution is (see DSP-CIS) • All quantities can be computed ! • Special case of this is single-channel Wiener filter formula on p.9 compute during speech+noise periods compute during noise-only periods

Multi-Channel Wiener Filter (MWF) • MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in Lecture-3 on single-channel noise reduction) : if then…

Multi-Channel Wiener Filter (MWF) • …then it can be shown that • represents a spatial filtering (*) Compare to superdirective& delay-and-sum beamforming (Lecture-2) • Delay-and-sum beamf. maximizes array gain in white noise field • Superdirectivebeamf. maximizes array gain in diffuse noise field • MWF maximizes array gain in unknown (!) noise field. MWF is operated without invoking any prior knowledge (steering vector/noise field) !(the secret is in the voice activity detection… (explain)) (*)Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency)

Multi-Channel Wiener Filter (MWF) • …then it can be shown that (continued) • represents an additional `spectral post-filter’ i.e. single-channel Wiener estimate (p.9), applied to output signal of spatial filter (prove it!)

filter coefficients Multi-Channel Wiener Filter: Implementation • Implementation with short-time Fourier transform: see p.10 • Implementation with time-domainlinear filtering:

compute during speech+noise periods compute during noise-only periods Multi-Channel Wiener Filter: Implementation Solution is… • Implementation with time-domainlinear filtering:

Multi-Channel Wiener Filter: Implementation • Implementation with time-domainlinear filtering: • Block algorithm: • For each block • -Apply voice activity detection • -Update correlation matrices • -Recompute filter coefficients • -Apply filters • Cheaper stochastic gradient algorithms • Frequency domain algorithms • Details omitted (see literature)

`speech distortion’ `residual noise’ Speech Distortion Weighted MWF skip this slide SDW-MWF is MWF with additional tuning parameter: • Design criterion for can be re-written as i.e. speech distortion+residual noise is minimized

Speech Distortion Weighted MWF skip this slide • Design criterion may now be modified to trade-off noise reduction against speech distortion: • Then optimal solution is… i.e. (rather) straightforward modification • By increasing `mu’, more noise is reduced, at the expense of more speech distortion (which is acceptable to a certain level) . means all emphasis is on noise reduction, speech distortion is ignored ( and then ! )

process noise measurement noise Kalman Filter Given:state space model of a discrete-time MIMO system with v[k] and w[k]: mutually uncorrelated, zero mean, white noises Then: given A, B, C, D and input/output-observations u[k],y[k], k=1,2,... Kalman filter produces MMSE estimates of internal states x[k], k=1,2,... (=`Wiener filter for dynamic systems’)

Kalman Filter Definition:= MMSE-estimate of x[k] using all available data up until time l • `FILTERING’ = estimate • `PREDICTION’ = estimate • `SMOOTHING’ = estimate

Kalman Filter: filtering and 1-step prediction Given together with error covariance matrix P[k|k-1] Then obtain and using u[k], y[k] : Step 1: Measurement Update Step 2:Time Update Kalman Gain=

Kalman Filter: smoothing Estimate states x[1], x[2],…, x[N] based on datau[k],y[k], k = 1, 2, …N How? 1. forward run: apply previous equations for k = 1, 2, … N Result: estimates 2. backward run: apply following equations for k = N, N -1, …1 Result: (better) estimates

Kalman filter for Speech Enhancement • Assume AR model of speech and noise • Equivalent state-space model is… y=microphone signal u[k], w[k] = zero mean, unit variance,white noise

Kalman filter for Speech Enhancement with:

Kalman filter for Speech Enhancement • PS: This was single-microphone case. How can this be extended to multi-microphone case ? Same A, x, v C=?

Kalman filter for Speech Enhancement Iterative algorithm iterations y[k] split signal in frames estimate parameters Kalman Smoother or Kalman Filter reconstruct signal • Disadvantages iterative approach: • complexity • delay

Kalman filter for Speech Enhancement iteration index time index (no iterations) Sequential algorithm D State Estimator: Kalman Filter Parameters Estimator (Kalman Filter) D

CONCLUSIONS • Single-channel noise reduction • Basic system is spectral subtraction • Only spectral filtering, not easily extended to multi-channel case for additional spatial filtering • Hence can only exploit differences in spectra between noise and speech signal: • noise reduction at expense of speech distortion • achievable noise reduction may be limited • Multi-channel noise reduction • Basic system is MWF, possibly extended with speech distortion regularization • Provides spectral + spatial filtering (links with beamforming!) • Kalman filtering • Signal model based approach

Digital Audio Signal Processing Lecture-3: Noise Reduction