Multi-microphone noise reduction and dereverberation techniques for speech applications
Simon Doclo
Dept. of Electrical Engineering, KU Leuven, Belgium
8 July 2003
Overview
• Introduction
• Basic principles
• Robust broadband beamforming
• Multi-microphone optimal filtering
• Acoustic transfer function estimation and dereverberation
• Conclusion and further research
Overview
• Introduction
  - Motivation and applications
  - Problem statement
  - Contributions
• Basic principles
• Robust broadband beamforming
• Multi-microphone optimal filtering
• Acoustic transfer function estimation and dereverberation
• Conclusion and further research
Motivation
• Speech communication applications: hands-free mobile telephony, voice-controlled systems, hearing aids
• Speech acquisition in an adverse acoustic environment:
  - Background noise: fan, radio, other speakers; generally unknown
  - Reverberation: reflections of the signal against walls and objects
• Poor signal quality, affecting speech intelligibility and speech recognition
Objectives
• Signal enhancement techniques:
  - Noise reduction: reduce the amount of background noise without distorting the speech signal
  - Dereverberation: reduce the effect of signal reflections
  - Combined noise reduction and dereverberation
• Acoustic source localisation (e.g. for a video camera or spotlight)
Applications
• Hands-free mobile telephony:
  - Most important application from an economic point of view
  - Hands-free car kit mandatory in many countries
  - Most current systems: 1 directional microphone
• Video-conferencing:
  - Microphone array for source localisation: point the camera towards the active speaker
  - Signal enhancement by steering the microphone array
Applications
• Voice-controlled systems:
  - Domotic systems, consumer electronics (HiFi, PC software)
  - Added value only when the speech recognition system performs reliably under all circumstances
  - Signal enhancement as pre-processing step
• Hearing aids and cochlear implants:
  - Most hearing-impaired people suffer from perceptual hearing loss: amplification, and reduction of noise with respect to the useful speech signal
  - Multiple microphones + DSP in the hearing aid
  - Current systems: simple beamforming
  - Robustness is important due to the small inter-microphone distance
Algorithmic requirements
• ‘Blind’ techniques: unknown noise sources and acoustic environment
• Adaptive: time-variant signals and acoustic environment
• Robustness:
  - Microphone characteristics (gain, phase, position)
  - Other deviations from the assumed signal model (look direction error, VAD)
• Integration of different enhancement techniques
• Computational complexity
Problem statement
• Problems of existing techniques:
  - Single-microphone techniques: very limited performance → multi-microphone techniques exploit spatial information; multiple microphones are required for source localisation
  - A-priori assumptions about the position of the signal sources and the microphone array: large sensitivity to deviations → improve robustness (and performance)
  - Assumption of spatio-temporally white noise → extension to coloured noise
• Objective: development of multi-microphone noise reduction and dereverberation techniques with better performance and robustness for coloured noise scenarios
State-of-the-art and contributions
• Single-microphone techniques (only spectral information):
  - Spectral subtraction [Boll 79, Ephraim 85, Xie 96]: signal-independent transformation, residual noise problem
  - Subspace-based [Dendrinos 91, Ephraim 95, Jensen 95]: signal-dependent transformation, signal + noise subspace
• Multi-microphone techniques (spatial information):
  - Fixed beamforming [Dolph 46, Cox 86, Ward 95, Elko 00]: fixed directivity pattern
  - Adaptive beamforming [Frost 72, Griffiths 82, Gannot 01]: adapts to different acoustic environments → performance; ‘Generalised Sidelobe Canceller’ (GSC)
  - Inverse, matched filtering [Miyoshi 88, Flanagan 93, Affes 97]
  - Issues: a-priori assumptions, robustness
• Contributions:
  1. Robust broadband beamforming
  2. Multi-microphone optimal filtering
  3. Blind transfer function estimation and dereverberation
Overview
• Introduction
• Basic principles
  - Signal model
  - Signal characteristics and acoustic environment
• Robust broadband beamforming
• Multi-microphone optimal filtering
• Acoustic transfer function estimation and dereverberation
• Conclusion and further research
Signal model
• Signal model for the microphone signals in the time domain: filtered version of the clean speech signal + additive coloured noise
• Components: acoustic impulse response, speech signal, additive noise
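The equation on this slide did not survive extraction. A plausible reconstruction, using the symbols h_n[k] and s[k] that appear on later slides, is given below. The noise symbol v_n[k], the speech-component symbol x_n[k] and the convolution sign are assumptions.

```latex
y_n[k] = h_n[k] \otimes s[k] + v_n[k] = x_n[k] + v_n[k], \qquad n = 0, \dots, N-1
```

Here y_n[k] is the n-th microphone signal, h_n[k] the acoustic impulse response between the speaker and microphone n, s[k] the clean speech signal, and v_n[k] the additive (coloured) noise.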
Signal model
• Multi-microphone signal enhancement: the microphone signals are filtered with filters w_n[k] and summed
  - f[k] = total transfer function for the speech component
  - z_v[k] = residual noise component
• Techniques differ in the calculation of the filters:
  - Noise reduction: minimise the residual noise z_v[k] and limit speech distortion
  - Dereverberation: f[k] = δ[k], by estimating the acoustic impulse responses h_n[k]
  - Combined noise reduction and dereverberation
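The filter-and-sum operation can be illustrated with a small NumPy sketch. The sizes, the random impulse responses and the random filters are hypothetical and serve only to show how the output splits into a speech part, f[k] convolved with s[k], and a residual noise part z_v[k].

```python
import numpy as np

def filter_and_sum(signals, filters):
    """Filter each signal with its FIR filter w_n[k] and sum the results."""
    return sum(np.convolve(x, w) for x, w in zip(signals, filters))

def total_speech_response(impulse_responses, filters):
    """f[k] = sum_n (h_n * w_n)[k]: total transfer function for the speech."""
    return sum(np.convolve(h, w) for h, w in zip(impulse_responses, filters))

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
N, K, L, T = 3, 8, 4, 200
s = rng.standard_normal(T)                                    # speech stand-in
h = [rng.standard_normal(K) for _ in range(N)]                # h_n[k]
v = [0.1 * rng.standard_normal(T + K - 1) for _ in range(N)]  # v_n[k]
w = [rng.standard_normal(L) for _ in range(N)]                # w_n[k]

y = [np.convolve(h_n, s) + v_n for h_n, v_n in zip(h, v)]     # microphone signals
z = filter_and_sum(y, w)                                      # enhanced output
f = total_speech_response(h, w)                               # f[k]
z_v = filter_and_sum(v, w)                                    # residual noise z_v[k]
```

By linearity, `z` equals `np.convolve(f, s) + z_v`, which is the decomposition the slide refers to.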
Signal characteristics
[Figure: speech waveform, amplitude versus time (sec)]
• Speech:
  - Broadband (300-8000 Hz)
  - Non-stationary
  - On/off characteristic → speech detection algorithm (VAD)
  - Linear low-rank model: linear combination of basis functions (R = 12…20)
• Noise:
  - Unknown signals (no reference available)
  - Slowly time-varying (fan) or non-stationary (radio, speech)
  - Localised or diffuse noise
Acoustic environment
[Figure: impulse response "PSK row 9", amplitude versus time (sec)]
• Reverberation time T60: global characterisation
• Acoustic impulse responses:
  - Acoustic filtering between 2 points in a room
  - FIR filter (K = 1000…2000 taps)
  - Non-minimum-phase system → no stable causal inverse
• Microphone array:
  - Assumption: point sensors with ideal characteristics
  - Deviations: gain, phase, position
  - Distance between speaker and microphone array: far-field or near-field
Overview
• Introduction
• Basic principles
• Robust broadband beamforming
  - Novel design procedures for broadband beamformers
  - Robust beamforming for gain and phase errors
• Multi-microphone optimal filtering
• Acoustic transfer function estimation and dereverberation
• Conclusion and further research
Fixed beamforming
• Speech and noise sources with overlapping spectrum at different positions → exploit spatial diversity by using multiple microphones
• Signal-independent FIR filter-and-sum structure: arbitrary spatial directivity pattern for arbitrary microphone array configuration
• Suppress noise and reverberation from certain directions
  - Advantages: low complexity, robustness at low signal-to-noise ratio (SNR)
  - Disadvantage: requires a-priori knowledge of the microphone array characteristics
• Technique originally developed for radar applications:
  - Narrowband: delay compensation → broadband
  - Far-field: planar waves → near-field: spherical waves
  - Known sensor characteristics → deviations
Filter-and-sum configuration
• Objective: calculate the filters w_n[k] such that the beamformer performs the desired (fixed) spatial and spectral filtering
• Far-field: planar waves, equal attenuation
• The spatial directivity pattern has to approximate the desired spatial directivity pattern: 2D filter design in angle and frequency
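As a minimal sketch of what a spatial directivity pattern is, the code below evaluates the far-field response of a filter-and-sum beamformer on a linear array. The speed of sound, the sign convention of the delays and the function name `directivity` are assumptions. The array parameters (N = 5, d = 4 cm, L = 20, fs = 8 kHz) are borrowed from the simulation slide. The example filter is a plain delay-and-sum beamformer steered to broadside.

```python
import numpy as np

C = 340.0  # assumed speed of sound in m/s

def directivity(w, positions, fs, freqs, angles_deg):
    """Far-field pattern H(f, theta) of a filter-and-sum beamformer.

    w: (N, L) real FIR coefficients, positions: (N,) microphone positions (m)
    on a line. Angles are measured from the array axis (0 deg = endfire).
    """
    w = np.asarray(w, float)
    n_taps = w.shape[1]
    omega = 2 * np.pi * np.asarray(freqs, float) / fs         # rad/sample
    theta = np.deg2rad(np.asarray(angles_deg, float))
    tau = np.outer(np.cos(theta), positions) * fs / C          # (A, N) delays in samples
    W = np.exp(-1j * np.outer(omega, np.arange(n_taps))) @ w.T  # (F, N) filter responses
    steer = np.exp(-1j * omega[:, None, None] * tau[None])     # (F, A, N) plane-wave phases
    return np.einsum("fn,fan->fa", W, steer)

N, d, L, fs = 5, 0.04, 20, 8000
positions = (np.arange(N) - (N - 1) / 2) * d
w_das = np.zeros((N, L))
w_das[:, 0] = 1.0 / N                     # delay-and-sum steered to broadside (90 deg)
freqs = np.linspace(300, 4000, 38)
angles = np.arange(0, 181, 5)
H = directivity(w_das, positions, fs, freqs, angles)
```

`np.abs(H)` has unit gain at 90° for every frequency and never exceeds 1 elsewhere; plotting `20 * np.log10(np.abs(H))` over the frequency-angle grid gives the kind of picture shown on the simulation slides.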
Design procedures
• Design the filter w such that the spatial directivity pattern optimally fits the desired pattern → minimisation of a cost function
• Fit in amplitude and phase
• Broadband problem: no design for separate frequencies → design over the complete frequency-angle region
• No approximation of the integrals by a finite Riemann sum
• Microphone configuration not included in the optimisation
• Cost functions:
  - Least-squares → quadratic function
  - Non-linear cost function → iterative optimisation = complex!
• The double integrals only need to be calculated once
[Kajala 99]
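A least-squares broadband design can be sketched as follows. Note the simplification: the cost is evaluated on a finite frequency-angle grid, whereas the slide states that the actual procedure avoids such a Riemann-sum approximation and evaluates the double integrals exactly. The helper names, the grid, the speed of sound and the desired group delay of (L-1)/2 samples are assumptions. The array and band edges come from the simulation slide.

```python
import numpy as np

C = 340.0  # assumed speed of sound in m/s

def steering_matrix(positions, L, fs, freqs, angles_deg):
    """Rows g(f, theta)^T such that H(f, theta) = g^T w, with w stacking the N filters."""
    f, ang = np.meshgrid(np.asarray(freqs, float), np.asarray(angles_deg, float),
                         indexing="ij")
    f, ang = f.ravel(), ang.ravel()
    omega = 2 * np.pi * f / fs
    tau = np.outer(np.cos(np.deg2rad(ang)), positions) * fs / C  # (P, N) delays in samples
    delay = tau[:, :, None] + np.arange(L)                        # (P, N, L)
    G = np.exp(-1j * omega[:, None, None] * delay)
    return G.reshape(len(f), -1), omega, ang

def ls_design(G, D):
    """Real-valued w minimising sum |G w - D|^2 over the grid points."""
    A = np.vstack([G.real, G.imag])
    b = np.concatenate([D.real, D.imag])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

N, d, L, fs = 5, 0.04, 20, 8000
positions = (np.arange(N) - (N - 1) / 2) * d
freqs = np.linspace(300, 3900, 25)
angles = np.concatenate([np.arange(0, 31, 5), np.arange(40, 81, 5), np.arange(90, 181, 5)])
G, omega, ang = steering_matrix(positions, L, fs, freqs, angles)
passband = (ang >= 40) & (ang <= 80)
D = np.where(passband, np.exp(-1j * omega * (L - 1) / 2), 0.0)  # desired pattern
w_ls = ls_design(G, D)

def cost(w):
    """Gridded least-squares cost."""
    return float(np.sum(np.abs(G @ w - D) ** 2))
```

Because the cost is quadratic in w, the solution follows from one linear solve, which is the sense in which the least-squares cost function is non-iterative.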
Design procedures
• 2 non-iterative cost functions, based on eigenfilters:
  - Eigenfilters: 1D and 2D FIR filter design [Vaidyanathan 87, Pei 01]
  - Extension to the design of broadband beamformers
• Novel cost functions:
  - Conventional eigenfilter technique → (G)EVD; reference point required
  - Eigenfilter based on the TLS criterion → GEVD
• Conclusion: the TLS eigenfilter is the preferred non-iterative design procedure
Simulations
[Figure: three directivity patterns (dB) versus frequency (Hz) and angle (deg), including delay-and-sum]
• Parameters:
  - N = 5, d = 4 cm
  - L = 20, fs = 8 kHz
  - Passband: 40°-80°
  - Stopband: 0°-30° and 90°-180°
Near-field configuration
• Near-field: spherical waves + attenuation → take into account the distance r between speaker and microphones
• One specific distance: very similar to the far-field design (different calculation of the double integrals); deviation for other distances
• Several distances: finite number (R) of distances; trivial extension for most cost functions, for the TLS eigenfilter = sum of generalised Rayleigh quotients; trade-off in performance between the different distances
• Ultimate goal: design for all distances
Simulations
[Figure: directivity patterns (dB) versus angle (deg) and frequency (Hz) for the far-field design and the mixed near-field far-field design]
• Parameters:
  - N = 5, d = 4 cm
  - L = 20, fs = 8 kHz
  - Passband: 70°-110°
  - Stopband: 0°-60° and 120°-180°
Robust broadband beamforming
• Small deviations from the assumed microphone characteristics (gain, phase, position) → large deviations from the desired directivity pattern, especially for small-size microphone arrays
• In practice the microphone characteristics are never exactly known
• Options: a measurement or calibration procedure, or incorporating specific (random) deviations in the design
• Consider all feasible microphone characteristics and optimise:
  - Average performance, using the probability as weight → requires knowledge of the probability density functions
  - Worst-case performance → minimax optimisation problem; finite grid of microphone characteristics → high complexity
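The two robust criteria can be written compactly as follows. This is a sketch in assumed notation, none of which is taken from the slides: J(w, a) is any of the design cost functions evaluated for microphone characteristics a (gains and phases), 𝒜 is the set of feasible characteristics, and f(a) is their joint probability density.

```latex
J_{\mathrm{mean}}(\mathbf{w}) = \int_{\mathcal{A}} J(\mathbf{w}, \mathbf{a})\, f(\mathbf{a})\, d\mathbf{a},
\qquad
J_{\mathrm{mm}}(\mathbf{w}) = \max_{\mathbf{a} \in \mathcal{A}} J(\mathbf{w}, \mathbf{a})
```

Minimising J_mean gives the average-performance design. Minimising J_mm gives the worst-case design, which in practice is evaluated on a finite grid of characteristics.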
Simulations
• Non-linear design procedure
• N = 3, positions: [-0.01 0 0.015] m, L = 20, fs = 8 kHz
• Passband: 0°-60°, 300-4000 Hz (endfire); stopband: 80°-180°, 300-4000 Hz
• Robust design, average performance: uniform pdf for gain (0.85-1.15) and phase (-5° to 10°)
• Deviation: gain [0.9 1.1 1.05] and phase [5° -2° 5°]
Simulations
[Figure: four directivity patterns (dB) versus angle (deg) and frequency (Hz)]
Overview
• Introduction
• Basic principles
• Robust broadband beamforming
• Multi-microphone optimal filtering
  - GSVD-based optimal filtering technique
  - Reduction of computational complexity
  - Simulations
• Acoustic transfer function estimation and dereverberation
• Conclusion and further research
Multi-microphone optimal filtering
• Objective: optimal estimate of the speech components in the microphone signals
• Minimise the MSE → multi-channel Wiener filter
• Assumptions:
  - Speech and noise are independent
  - 2nd-order statistics of the noise are stationary → estimated during noise periods (VAD)
• Properties: multi-microphone, signal-dependent, no a-priori assumptions, robustness
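A minimal sketch of a multi-channel Wiener filter under the two assumptions listed on the slide: with independent speech and noise, the speech correlation matrix is the difference between the correlation matrix measured in speech-and-noise frames and the one measured in noise-only frames. The function names, the synthetic rank-1 speech model and the ideal VAD are illustrative assumptions, not the talk's implementation.

```python
import numpy as np

def correlation(frames):
    """Sample correlation matrix of row-stacked data vectors."""
    return frames.T @ frames / len(frames)

def mwf(Y, vad):
    """Multi-channel Wiener filter W = Ryy^{-1} (Ryy - Rvv).

    Y: (T, M) data vectors, vad: (T,) bool, True where speech is present.
    Rvv is estimated from the noise-only frames.
    """
    Ryy = correlation(Y[vad])
    Rvv = correlation(Y[~vad])
    return np.linalg.solve(Ryy, Ryy - Rvv)

# Synthetic example: hypothetical rank-1 speech in coloured noise, ideal VAD.
rng = np.random.default_rng(1)
T, M = 20000, 4
vad = np.arange(T) >= T // 2
a = np.array([1.0, 0.8, -0.5, 0.3])
X = np.outer(rng.standard_normal(T) * vad, a)                        # speech components
V = 0.5 * rng.standard_normal((T, M)) @ rng.standard_normal((M, M))  # coloured noise
Y = X + V
W = mwf(Y, vad)
X_hat = Y @ W                                    # estimate of the speech components
mse_in = np.mean((Y[vad] - X[vad]) ** 2)
mse_out = np.mean((X_hat[vad] - X[vad]) ** 2)
```

Comparing `mse_out` with `mse_in` shows the reduction in estimation error. No array geometry or speaker position enters the computation, only the VAD.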
Multi-microphone optimal filtering
• Implementation procedure:
  - Based on the Generalised Eigenvalue Decomposition (GEVD) → coloured noise!
  - Takes into account the low-rank model of speech
  - Trade-off between noise reduction and speech distortion
  - QRD [Rombouts 2002], subband [Spriet 2001] → lower complexity
• Result: signal-dependent FIR filterbank
• The speech detection mechanism is the only a-priori assumption: required for the estimation of the correlation matrices
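The role of the GEVD can be sketched as follows. If Q jointly diagonalises the two correlation matrices, with Qᵀ R_vv Q = I and Qᵀ R_yy Q = diag(λ), the Wiener filter R_yy⁻¹(R_yy − R_vv) becomes Q diag(1 − 1/λ_i) Q⁻¹, and a low-rank speech model amounts to keeping only the largest λ_i. The code solves the generalised eigenvalue problem by Cholesky whitening in plain NumPy. The function names and the test matrices are assumptions.

```python
import numpy as np

def gevd(Ryy, Rvv):
    """Solve Ryy q = lam Rvv q, scaled so that Q^T Rvv Q = I and Q^T Ryy Q = diag(lam)."""
    Lc = np.linalg.cholesky(Rvv)               # Rvv = Lc Lc^T
    Li = np.linalg.inv(Lc)
    lam, U = np.linalg.eigh(Li @ Ryy @ Li.T)   # whitened problem, lam in ascending order
    return lam, Li.T @ U

def gevd_filter(Ryy, Rvv, rank=None):
    """W = Q diag(max(0, 1 - 1/lam_i)) Q^{-1}; optionally keep the `rank` >= 1 largest lam_i."""
    lam, Q = gevd(Ryy, Rvv)
    gain = np.maximum(0.0, 1.0 - 1.0 / lam)
    if rank is not None:
        gain[:-rank] = 0.0                     # low-rank speech model
    return Q @ np.diag(gain) @ np.linalg.inv(Q)

# Hypothetical exact statistics: rank-2 speech in coloured noise.
rng = np.random.default_rng(5)
M = 5
B = rng.standard_normal((M, 2))
A = rng.standard_normal((M, M))
Rxx = B @ B.T
Rvv = A @ A.T + np.eye(M)
Ryy = Rxx + Rvv
W_gevd = gevd_filter(Ryy, Rvv, rank=2)
W_direct = np.linalg.solve(Ryy, Rxx)           # Ryy^{-1} Rxx for comparison
```

With exact statistics the two filters coincide. With estimated matrices, clipping the gains at zero and truncating the rank keeps the implied speech correlation matrix positive semi-definite.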
General class of estimators
• Multi-channel Wiener filter: always a combination of noise reduction and (linear) speech distortion
• Estimation error = speech distortion + residual noise
• General class [Ephraim 95]: trade-off between noise reduction and speech distortion, with trade-off parameter
  - = 1: MMSE (equal importance)
  - < 1: less speech distortion, less noise reduction
  - > 1: more speech distortion, more noise reduction
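The equations on this slide were lost. One common way to write such a class of estimators is sketched here; the symbol μ for the trade-off parameter and the exact form of the cost are assumptions, chosen so that μ = 1 gives the multi-channel Wiener filter and the limits behave as the bullets state.

```latex
\mathbf{e} = \mathbf{x} - \mathbf{W}^{T}\mathbf{y}
  = \underbrace{(\mathbf{I} - \mathbf{W}^{T})\,\mathbf{x}}_{\text{speech distortion}}
  \; - \; \underbrace{\mathbf{W}^{T}\mathbf{v}}_{\text{residual noise}}

\mathbf{W}_{\mu} = \arg\min_{\mathbf{W}}\;
  E\{\|(\mathbf{I} - \mathbf{W}^{T})\,\mathbf{x}\|^{2}\} + \mu\, E\{\|\mathbf{W}^{T}\mathbf{v}\|^{2}\}
  = (\mathbf{R}_{xx} + \mu\,\mathbf{R}_{vv})^{-1}\,\mathbf{R}_{xx}
```

For μ < 1 the filter moves towards the identity (less speech distortion, less noise reduction). For μ > 1 it attenuates more strongly (more noise reduction, more speech distortion).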
Frequency-domain analysis
• Decomposition into a spectral filtering term (PSD) and a spatial filtering term (coherence)
• Desired beamforming behaviour for simple scenarios
Complexity reduction
• Recursive version: at each time step, calculation of GSVD + filter → high computational complexity
• Complexity reduction using:
  - Recursive techniques for recomputing the GSVD [Moonen 90]
  - Sub-sampling (stationary acoustic environments)
• Real-time implementation possible (N = 4, L = 20, M = 80, fs = 16 kHz, P = 4000, Q = 20000)
Complexity reduction
• Incorporation in the ‘Generalised Sidelobe Canceller’ (GSC) structure: adaptive beamforming
  - Creation of speech reference and noise reference signals
  - Standard multi-channel adaptive filter (LMS, APA)
• Increased noise reduction performance
• Complexity reduction by using shorter filters
[Figure: block diagram with optimal filter, speech reference with delay, noise reference(s), adaptive filter and subtraction]
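The adaptive stage of such a structure can be sketched with a normalised LMS (NLMS) noise canceller: an adaptive filter on the noise reference(s) predicts the noise that leaks into the speech reference, and the prediction is subtracted. NLMS stands in here for the LMS/APA filters named on the slide. The delay in the speech-reference path and the VAD-controlled adaptation are omitted, and the signals and leakage path are hypothetical.

```python
import numpy as np

def nlms_anc(speech_ref, noise_refs, L=16, mu=0.5, eps=1e-8):
    """Subtract from speech_ref the part that is predictable from noise_refs.

    speech_ref: (T,), noise_refs: (T, R). Returns the error signal (enhanced output).
    """
    T, R = noise_refs.shape
    w = np.zeros(R * L)
    out = np.zeros(T)
    for k in range(T):
        lo = max(0, k - L + 1)
        u = np.zeros((L, R))
        u[: k - lo + 1] = noise_refs[lo : k + 1][::-1]   # most recent sample first
        u = u.ravel()
        e = speech_ref[k] - w @ u                        # a-priori error
        w += mu * e * u / (u @ u + eps)                  # normalised LMS update
        out[k] = e
    return out

# Hypothetical signals: weak sinusoidal "speech" plus filtered white noise.
rng = np.random.default_rng(2)
T = 5000
n = rng.standard_normal(T)                               # noise source = noise reference
g = np.array([0.9, -0.4, 0.2, 0.1])                      # assumed leakage path
s = 0.1 * np.sin(2 * np.pi * 0.01 * np.arange(T))
speech_ref = s + np.convolve(n, g)[:T]
out = nlms_anc(speech_ref, n[:, None], L=8, mu=0.2)
noise_before = np.mean((speech_ref[-1000:] - s[-1000:]) ** 2)
noise_after = np.mean((out[-1000:] - s[-1000:]) ** 2)
```

In a real system the filter would normally be adapted only during noise-only periods, so that the speech component is not cancelled along with the noise.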
Simulations
• N = 4, SNR = 0 dB, 3 noise sources (white, speech, music), fs = 16 kHz
• Performance: improvement of signal-to-noise ratio (SNR)
[Figure: unbiased SNR (dB) versus reverberation time (msec) for: delay-and-sum beamformer; GSC (L_ANC = 400, noise ref = Griffiths-Jim); recursive GSVD (L = 20, no ANC); recursive GSVD (L = 20, L_ANC = 400, all noise refs)]
Simulations
• N = 4, SNR = 0 dB, 3 noise sources, fs = 16 kHz, T60 = 300 msec
• ‘Power Transfer Functions’ (PTF) for the speech and noise components
[Figure: spectrum (dB) versus frequency (Hz) of the speech and noise PTFs for: recursive GSVD (L = 20, no ANC); recursive GSVD (L = 20, L_ANC = 400, all noise refs)]
Conclusions
• GSVD-based optimal filtering technique:
  - Multi-microphone extension of single-microphone subspace-based enhancement techniques
  - Signal-dependent low-rank model of speech
  - No a-priori assumptions about the position of the speaker and the microphones
• SNR improvement higher than GSC for all reverberation times and all considered acoustic scenarios
• More robust to deviations from the signal model:
  - Microphone characteristics
  - Position of the speaker
  - VAD (the only a-priori information): no effect on SNR improvement, limited effect on speech distortion
Advantages - Disadvantages
Overview
• Introduction
• Basic principles
• Robust broadband beamforming
• Multi-microphone optimal filtering
• Acoustic transfer function estimation and dereverberation
  - Time-domain technique
  - Frequency-domain technique
  - Combined noise reduction and dereverberation
• Conclusion and further research
Objective
• Blind estimation of acoustic impulse responses (time-domain, frequency-domain)
• Applications: source localisation, dereverberation, combined noise reduction and dereverberation
Time-domain techniques
• Signal model for N = 2 and no background noise
[Figure: S(z) passes through H0(z) and H1(z) to give Y0(z) and Y1(z); filtering Y0(z) with ±α(-H1(z)) and Y1(z) with ±α H0(z) and summing gives 0, so the null-space vector E(z) contains the impulse responses up to a factor ±α]
• Subspace-based technique: the impulse responses can be computed from the null-space of the speech correlation matrix
  - Eigenvector corresponding to the smallest eigenvalue
  - Coloured noise: GEVD
• Problems occurring in the time-domain technique:
  - Sensitivity to underestimation of the impulse response length
  - Low-rank model in combination with background noise
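A noise-free sketch of the subspace idea for N = 2: since y0 ⊗ h1 − y1 ⊗ h0 = 0, the stacked vector [h1; −h0] lies in the null-space of the correlation matrix of the stacked data vectors, and is recovered as the eigenvector of the smallest eigenvalue. The function name, the random channels and the assumption that the length K is known exactly are illustrative; as the slide notes, underestimating K is a practical problem.

```python
import numpy as np

def blind_two_channel(y0, y1, K):
    """Estimate h0 and h1 (length K) up to a common scale factor.

    Uses the eigenvector of the smallest eigenvalue of the correlation
    matrix of the stacked data vectors [y0[k..k-K+1], y1[k..k-K+1]].
    """
    T = len(y0)
    rows = [np.concatenate([y0[k - K + 1 : k + 1][::-1], y1[k - K + 1 : k + 1][::-1]])
            for k in range(K - 1, T)]
    Y = np.array(rows)
    R = Y.T @ Y / len(Y)
    _, vecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    e = vecs[:, 0]                     # null-space vector, e = alpha * [h1; -h0]
    return -e[K:], e[:K]

# Hypothetical random channels and white excitation, no background noise.
rng = np.random.default_rng(3)
K = 5
h0, h1 = rng.standard_normal(K), rng.standard_normal(K)
s = rng.standard_normal(2000)
y0, y1 = np.convolve(h0, s), np.convolve(h1, s)
h0_est, h1_est = blind_two_channel(y0, y1, K)
est = np.concatenate([h0_est, h1_est])
true = np.concatenate([h0, h1])
cosine = abs(est @ true) / (np.linalg.norm(est) * np.linalg.norm(true))
```

`cosine` close to 1 means the estimate matches the true responses up to the unknown factor ±α. With coloured background noise the EVD would be replaced by a GEVD with the noise correlation matrix, as the slide indicates.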
Stochastic gradient algorithm
• Batch estimation techniques form the basis for deriving an adaptive stochastic gradient algorithm
• Usage: estimation of partial impulse responses → time-delay estimation for acoustic source localisation
• For source localisation, the adaptive GEVD algorithm is more robust than the adaptive EVD algorithm (and prewhitening) in reverberant environments with a large amount of noise
Frequency-domain techniques
• Problems of the time-domain technique → frequency domain
• Signal model: rank-1 model
• Estimation of the acoustic transfer function vector H(ω) from the GEVD of the correlation matrices
  - Corresponds to the largest generalised eigenvalue
  - No stochastic gradient algorithm available (yet)
• Unknown scaling factor in each frequency bin:
  - Can be determined only if the norm is known
  - Algorithm only useful when the position of the source is fixed (e.g. desktop, car)
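For a single frequency bin, the rank-1 model R_yy = P_s H Hᴴ + R_vv implies that the principal generalised eigenvector q of (R_yy, R_vv) satisfies H ∝ R_vv q. The sketch below uses exact, synthetic correlation matrices and Cholesky whitening in NumPy. The function name, the speech power and the matrices are assumptions, and the returned vector is normalised to unit norm because the scale is not identifiable.

```python
import numpy as np

def estimate_atf(Ryy, Rvv):
    """Estimate H, up to a complex scale, from the principal generalised eigenvector."""
    Lc = np.linalg.cholesky(Rvv)                    # Rvv = Lc Lc^H
    Li = np.linalg.inv(Lc)
    _, U = np.linalg.eigh(Li @ Ryy @ Li.conj().T)   # whitened Hermitian problem
    h = Lc @ U[:, -1]                               # largest eigenvalue, whitening undone
    return h / np.linalg.norm(h)                    # unit norm: scale stays unknown

# Hypothetical single frequency bin with exact correlation matrices.
rng = np.random.default_rng(4)
N = 4
H = rng.standard_normal(N) + 1j * rng.standard_normal(N)
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Rvv = B @ B.conj().T + np.eye(N)                    # coloured noise
Ryy = 2.0 * np.outer(H, H.conj()) + Rvv             # rank-1 speech + noise
h_est = estimate_atf(Ryy, Rvv)
alignment = abs(np.vdot(h_est, H)) / np.linalg.norm(H)
```

`alignment` equal to 1 means `h_est` is parallel to H. The remaining complex factor per bin is the unknown scaling the slide mentions.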
Combined noise reduction and dereverberation
• Filtering operation in the frequency domain
• Dereverberation: normalised matched filter
• Combined noise reduction and dereverberation: Z(ω) is the optimal (MMSE) estimate of S(ω)
• Optimal estimate of s[k] → integration of the multi-channel Wiener filter with the normalised matched filter
• Trade-off between both objectives
• Implementation: overlap-save
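The filter equations on this slide were lost. A sketch of the usual form of a normalised matched filter, in assumed notation, is:

```latex
\mathbf{W}_{d}(\omega) = \frac{\mathbf{H}(\omega)}{\|\mathbf{H}(\omega)\|^{2}},
\qquad
Z(\omega) = \mathbf{W}_{d}^{H}(\omega)\,\mathbf{Y}(\omega)
          = S(\omega) + \mathbf{W}_{d}^{H}(\omega)\,\mathbf{V}(\omega)
```

Since W_dᴴ H = 1, the speech passes undistorted and only a residual noise term remains. One way to read the combined scheme, stated here as an assumption, is to apply the same matched filter to the multi-channel Wiener estimate of the speech components instead of to Y(ω) directly.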
Simulations
• N = 4, d = 2 cm, fs = 16 kHz, SNR = 0 dB, T60 = 400 msec
• FFT size L = 1024, overlap R = 16
• Performance criteria:
  - Signal-to-noise ratio (SNR)
  - Dereverberation index (DI)
Conclusion
• Low signal quality due to background noise and reverberation → signal enhancement to improve speech intelligibility and ASR performance
• Single-microphone techniques: spectral information
• Standard beamforming: a-priori assumptions → robust broadband beamforming
• Multi-microphone optimal filtering: multi-microphone, signal-dependent, no a-priori assumptions
• Blind transfer function estimation and dereverberation
Contributions
• Robust broadband beamforming:
  - Novel cost functions for broadband far-field design (non-linear, eigenfilter-based)
  - Extension to near-field and mixed near-field far-field
  - 2 procedures for robust design against gain and phase deviations
• GSVD-based optimal filtering technique for multi-microphone noise reduction:
  - Extension of single-microphone subspace-based techniques to multiple microphones
  - Integration in GSC structure
  - Better performance and robustness than beamforming
• Acoustic transfer function estimation and dereverberation:
  - Stochastic gradient algorithm for estimation of time-delay and acoustic source localisation (coloured noise)
  - Combined noise reduction and dereverberation in the frequency domain
Further research
• Combination of multi-channel Wiener filter and fixed beamforming:
  - Low SNR: VAD fails → poor performance of the Wiener filter
  - Combined technique: more robust when the VAD fails, better performance than fixed beamformers in other scenarios
• Acoustic transfer function estimation and dereverberation:
  - Time-domain: underlying reason for the high sensitivity
  - Frequency-domain: unknown scaling factor → BSS?
  - Other blind identification techniques (LP, NL Kalman filtering)
• Further complexity reduction of the multi-channel optimal filtering technique:
  - Stochastic gradient algorithms
  - Subband/frequency-domain
Relevant publications
• S. Doclo and M. Moonen, “GSVD-based optimal filtering for single and multimicrophone speech enhancement,” IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230-2244, Sep. 2002.
• S. Doclo and M. Moonen, “Multi-Microphone Noise Reduction Using Recursive GSVD-Based Optimal Filtering with ANC Postprocessing Stage,” accepted for publication in IEEE Trans. Speech and Audio Processing, 2003.
• S. Doclo and M. Moonen, “Robust adaptive time delay estimation for speaker localisation in noisy and reverberant acoustic environments,” EURASIP Journal on Applied Signal Processing, Sep. 2003.
• S. Doclo and M. Moonen, “Combined frequency-domain dereverberation and noise reduction technique for multi-microphone speech enhancement,” in Proc. Int. Workshop on Acoustic Echo and Noise Control (IWAENC), Darmstadt, Germany, Sep. 2001, pp. 31-34.
• S. Doclo and M. Moonen, “Design of far-field and near-field broadband beamformers using eigenfilters,” accepted for publication in Signal Processing, 2003.
• S. Doclo and M. Moonen, “Design of robust broadband beamformers for gain and phase errors in the microphone array characteristics,” IEEE Trans. Signal Processing, Oct. 2003.
Available at http://www.esat.kuleuven.ac.be/~doclo/publications.html