Wavelet-Based Speech Enhancement

Sharif University of Technology

Presentation Outline
• Motivation and Goals
• Wavelet Transform - Overview
• Basic Denoising in Wavelet Domain
• Literature Survey
• Implementation and Results
• Conclusions and Future Work

Motivation and Goals: Key Applications
• Improving the perceptual quality of speech
  • Reducing listener fatigue
  • Hearing aids
• Improving the performance of
  • Speech coders
  • Voice recognition systems

Motivation and Goals: Goals of SE in Wavelet Domain
• Variable window size for different frequency components
  • Long time intervals → precise low-frequency information
  • Short time intervals → precise high-frequency information
• Easy to implement
  • Fast WT computational complexity: O(n)
  • FFT computational complexity: O(n log2 n)
  • Denoising by simple thresholding
• Real-time implementation


Wavelet Transform - Overview: History
• Fourier (1807)
• Haar (1910)
• Math World

Wavelet Transform - Overview
• What kind of basis functions could be useful?
  • Impulse function (Haar): best time resolution
  • Sinusoids (Fourier): best frequency resolution
  • We want the best of both resolutions
• Heisenberg (1930)
  • Uncertainty Principle
  • There is a lower bound for the time-bandwidth product $\Delta t \cdot \Delta f$ (an intuitive proof is given in [Mac91])

Wavelet Transform - Overview
• Gabor (1945)
  • Short Time Fourier Transform (STFT)
  • Disadvantage: fixed window size

Wavelet Transform - Overview
• Constructing wavelets
  • Daubechies (1988)
  • Compactly supported wavelets
• Computation of WT coefficients
  • Mallat (1989)
  • A fast algorithm using filter banks

Wavelet Transform - Overview: Multiresolution Signal Representation
• The coarse version (Approximation) is more useful than the Detail for:
  • Browsing image databases on the web
  • Signal transmission for communication
  • Denoising
• Wavelet Tree Decomposition
  • Wavelet Transform (WT)
  • Undecimated WT (UWT)
• We may lose what is in the Detail
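
The wavelet tree decomposition named on this slide can be sketched in a few lines. This is only an illustration: the Haar filter pair and the helper names `haar_step` and `haar_wavedec` are assumptions made here for brevity, not choices stated in the presentation.

```python
import math

def haar_step(signal):
    """One analysis step: split an even-length signal into approximation and detail."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_wavedec(signal, levels):
    """Wavelet tree: only the approximation branch is split again at each level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details  # details[0] is the finest level (D1)

# Example: S = A2 + D2 + D1 for an 8-sample signal.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a2, (d1, d2) = haar_wavedec(signal, 2)
```

Because the Haar pair is orthonormal, the energy of `a2`, `d2` and `d1` together equals the energy of `signal`, which is a quick sanity check for any such decomposition.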

Wavelet Transform - Overview: Full Tree Decomposition
• Wavelet Packet Transform (WPT)
• Undecimated WPT (UWPT)
• S = A1 + D1, or S = A1 + AD2 + DD2, or …
• Which decomposition path is the best choice? The answer leads us to the Best Basis.

Wavelet Transform - Overview: Best Basis Selection Criteria
Cut if:
• Entropy: Coifman, Meyer, Wickerhauser (1992)
• Rate-Distortion: Vetterli (1995)
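
As a rough sketch of the entropy criterion, the code below prunes a Haar wavelet packet tree bottom-up: a node is kept unsplit ("cut") when its own cost is no larger than the combined cost of its best children. The Haar filters, the additive cost $-\sum c^2 \ln c^2$ on a unit-energy signal, and all function names are assumptions for illustration; the exact cut condition on the slide is not preserved in this transcript.

```python
import math

def haar_step(x):
    """Split an even-length list into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def normalize(x):
    """Scale to unit energy so squared coefficients behave like probabilities."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def entropy_cost(coeffs):
    """Additive entropy cost: -sum(c^2 * ln(c^2)), taking 0 * ln(0) as 0."""
    return -sum(e * math.log(e) for e in (c * c for c in coeffs) if e > 0.0)

def best_basis(coeffs, max_level, name=""):
    """Return (cost, leaves); leaves is a list of (node_name, coefficients)."""
    own_cost = entropy_cost(coeffs)
    if max_level == 0 or len(coeffs) < 2 or len(coeffs) % 2:
        return own_cost, [(name or "S", coeffs)]
    approx, detail = haar_step(coeffs)
    # Child names prepend a letter, so splitting D gives AD and DD.
    cost_a, leaves_a = best_basis(approx, max_level - 1, "A" + name)
    cost_d, leaves_d = best_basis(detail, max_level - 1, "D" + name)
    if cost_a + cost_d < own_cost:
        return cost_a + cost_d, leaves_a + leaves_d  # splitting is cheaper
    return own_cost, [(name or "S", coeffs)]         # cut: keep this node whole

unit = normalize([1.0] * 8)
cost, leaves = best_basis(unit, 3)
```

For a constant signal all the energy ends up in a single approximation coefficient, so the search keeps splitting the approximation branch and leaves every all-zero detail node unsplit.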


Basic Denoising in Wavelet Domain: Principle
• Only a few coefficients in the lower bands are needed to approximate the main features of the clean signal. Hence, by setting the smaller coefficients to zero, we can eliminate noise nearly optimally while preserving the important information of the clean signal.

Basic Denoising in Wavelet Domain: Notation
• Clean signal
• Noise signal
• Noisy signal
Each signal is written in both the time domain and the wavelet domain.

Basic Denoising in Wavelet Domain: Algorithm
• Frame the input noisy signal
• Forward WT of a frame
• Threshold the (detail) wavelet coefficients
• Inverse WT
• Keep the center part of the frame
• Repeat for all frames
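
The framing loop above could look roughly like the following. It is a minimal sketch under several assumptions that the slide does not specify: Haar filters, hard thresholding, 50% frame overlap with the middle half of each frame kept, and zero padding at the signal ends. All function names and default parameters are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_forward(x, levels):
    """Multi-level Haar DWT; returns (approximation, [D1, D2, ...])."""
    approx, details = list(x), []
    for _ in range(levels):
        a = [(approx[i] + approx[i + 1]) * S for i in range(0, len(approx), 2)]
        d = [(approx[i] - approx[i + 1]) * S for i in range(0, len(approx), 2)]
        approx = a
        details.append(d)
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest level."""
    x = list(approx)
    for d in reversed(details):
        out = []
        for a_k, d_k in zip(x, d):
            out.append((a_k + d_k) * S)
            out.append((a_k - d_k) * S)
        x = out
    return x

def denoise_frames(noisy, frame_len=8, levels=2, threshold=1.0):
    """frame_len must be divisible by 4 and by 2**levels."""
    hop, edge = frame_len // 2, frame_len // 4
    # Zero-pad so every input sample falls in the kept center of some frame.
    padded = [0.0] * edge + list(noisy) + [0.0] * frame_len
    out, start = [], 0
    while len(out) < len(noisy):
        frame = padded[start:start + frame_len]            # 1. framing
        approx, details = haar_forward(frame, levels)      # 2. forward WT
        details = [[c if abs(c) > threshold else 0.0 for c in d]
                   for d in details]                       # 3. threshold details
        rec = haar_inverse(approx, details)                # 4. inverse WT
        out.extend(rec[edge:edge + hop])                   # 5. keep center part
        start += hop                                       # 6. next frame
    return out[:len(noisy)]
```

With `threshold=0.0` no coefficient is removed and the function returns the input unchanged, which is a convenient check that the framing and overlap bookkeeping are consistent.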

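Basic Denoising in Wavelet Domain: Threshold Value
• VisuShrink [DonJ94b]: $T = \hat{\sigma}\sqrt{2\ln N}$
  • $T$: threshold
  • $\hat{\sigma}$: estimated noise level ($\hat{\sigma}^2$ is the estimate of the noise variance)
  • $N$: frame length
• For Gaussian white noise: $\hat{\sigma} = \mathrm{MAD}/0.6745$
• Another definition (wden.m):
• MAD: Median Absolute Deviation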
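
A small numeric sketch of the VisuShrink rule: the universal threshold $\hat{\sigma}\sqrt{2\ln N}$ combined with the MAD-based noise estimate. It assumes the usual convention that the MAD is taken as the median of the absolute detail coefficients at the finest level; the function names are illustrative only.

```python
import math
import statistics

def mad_sigma(detail_coeffs):
    """Robust noise estimate for Gaussian white noise: median(|d|) / 0.6745."""
    return statistics.median([abs(c) for c in detail_coeffs]) / 0.6745

def universal_threshold(sigma, frame_len):
    """VisuShrink (universal) threshold: sigma * sqrt(2 * ln(N))."""
    return sigma * math.sqrt(2.0 * math.log(frame_len))

# Example: a 256-sample frame whose finest detail band suggests sigma = 1.
sigma = mad_sigma([0.6745, -0.6745, 0.6745])
threshold = universal_threshold(sigma, 256)
```

The threshold grows only with the square root of the logarithm of the frame length, so doubling the frame length changes it very little.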

Basic Denoising in Wavelet Domain: Threshold Value
• Threshold in the WPT case
• For the correlated noise situation: use a level-dependent threshold (SureShrink [DonJ94b])

Basic Denoising in Wavelet Domain: How to Threshold
• Hard thresholding
• Soft thresholding
• Comparison:
  • Hard: discontinuity
  • Soft: alteration of values
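
The two rules can be written in their standard textbook form as below. The function names are illustrative, and `t` stands for whatever threshold value has been chosen.

```python
def hard_threshold(c, t):
    """Keep coefficients above the threshold unchanged, zero the rest."""
    return c if abs(c) > t else 0.0

def soft_threshold(c, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    if abs(c) <= t:
        return 0.0
    return c - t if c > 0 else c + t

coeffs = [-1.5, -0.5, 0.5, 1.5]
hard = [hard_threshold(c, 1.0) for c in coeffs]
soft = [soft_threshold(c, 1.0) for c in coeffs]
```

The comparison on the slide is visible here: the hard rule jumps from 0 to ±t at the threshold (a discontinuity), while the soft rule is continuous but alters every surviving coefficient by t.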


Literature Survey: [SeoB97], Novelty
• Title: Speech enhancement with reduction of noise components in the wavelet domain
• Novelty:
  • Semisoft thresholding [GaoB95]
  • Classification of unvoiced regions in WD
  • Different thresholding for unvoiced regions

Literature Survey: [SeoB97], Thresholding
• Semisoft thresholding [GaoB95]
  • Less sensitivity to small perturbations in the data
  • Smaller bias
• Thresholding functions shown: Hard, Soft, Semisoft
• Like [DonJ94b]
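
Semisoft (also called firm) thresholding uses two thresholds and interpolates between the hard and soft rules. The sketch below follows the commonly cited form of the rule; the parameter names `t1` and `t2` are chosen here, and how [SeoB97] set the two thresholds is not stated on the slide.

```python
def semisoft_threshold(c, t1, t2):
    """Two-threshold shrinkage with 0 < t1 < t2.

    |c| <= t1       -> 0
    t1 < |c| <= t2  -> sign(c) * t2 * (|c| - t1) / (t2 - t1)
    |c| > t2        -> c (unchanged)
    """
    mag = abs(c)
    if mag <= t1:
        return 0.0
    if mag > t2:
        return c
    sign = 1.0 if c > 0 else -1.0
    return sign * t2 * (mag - t1) / (t2 - t1)

values = [semisoft_threshold(c, 1.0, 2.0) for c in [0.5, 1.5, -1.5, 2.0, 3.0]]
```

The rule is continuous at both thresholds (unlike hard thresholding) and leaves large coefficients untouched (unlike soft thresholding), which matches the two advantages listed on the slide.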

Literature Survey: [SeoB97], Unvoiced Regions
• Separation of unvoiced regions
  • Use DWT for finding
  • Calculate the average energy of each subband
  • The current speech segment is unvoiced if:

Literature Survey: [SeoB97], Implementations
• If unvoiced, then threshold only the highest frequency band
• Implementation results
  • Additive white Gaussian noise
  • SNR: -10 dB to 10 dB
  • Test sentence: “Should we chase those cowboys?”

Literature Survey: [SooKY97], Novelty
• Title: Wavelet for speech denoising
• Novelty:
  • Evaluation of different wavelets and different orders (db1-10, coif1-5, sym2-8, bior1.3-6.8)
  • Spectral Subtraction in WD
  • Wiener Filtering in WD (uses two methods for estimating the a priori SNR)
    • Maximum Likelihood approach
    • Decision Directed approach

Literature Survey: [SooKY97], Thresholding 1
Use the DWT to find L levels of decomposition
1. Spectral Subtraction (SS) in WD
• if … then denoised value ← … (use a similar scheme for …)
• else denoised value ← …
• The expected value of the noise magnitude could be estimated from silence frames
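
Since the formulas on this slide did not survive, the following is only a generic sketch of magnitude subtraction applied to wavelet coefficients, not the exact rule of [SooKY97]. It assumes that the expected noise magnitude is the mean absolute coefficient over silence frames, that the sign of the coefficient is kept, and that the "else" branch outputs zero. All names are hypothetical.

```python
def estimate_noise_magnitude(silence_coeffs):
    """Assumed estimator: mean absolute value of coefficients from silence frames."""
    return sum(abs(c) for c in silence_coeffs) / len(silence_coeffs)

def magnitude_subtract(c, noise_mag):
    """Subtract the expected noise magnitude from |c|, keeping the sign of c."""
    mag = abs(c)
    if mag > noise_mag:
        return (mag - noise_mag) * (1.0 if c > 0 else -1.0)
    return 0.0  # assumed floor when the coefficient is below the noise level

noise_mag = estimate_noise_magnitude([0.5, -0.5, 0.25, -0.75])
denoised = [magnitude_subtract(c, noise_mag) for c in [2.0, -2.0, 0.25]]
```

Written this way, magnitude subtraction has the same shape as soft thresholding with the expected noise magnitude playing the role of the threshold.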

Literature Survey: [SooKY97], Thresholding 2
2. Wiener Filtering in WD
• The filter gain depends on the a priori SNR
• Estimating the a priori SNR:
  a. Maximum Likelihood
  b. Decision Directed (weighting factor in [0, 1], typically 0.9)
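
The two a priori SNR estimators can be illustrated with the standard Wiener gain $\xi/(1+\xi)$. This sketch makes several assumptions not confirmed by the slide: the single-frame form of the maximum likelihood estimate, a decision-directed recursion that follows one coefficient position across successive frames, and a known noise variance. The function names and the symbol `xi` are choices made here.

```python
def wiener_gain(xi):
    """Wiener gain for a priori SNR xi."""
    return xi / (1.0 + xi)

def ml_a_priori_snr(y, noise_var):
    """Single-frame maximum likelihood estimate: a posteriori SNR minus 1, floored at 0."""
    return max(y * y / noise_var - 1.0, 0.0)

def decision_directed_snr(y, prev_clean, noise_var, alpha=0.9):
    """Blend the previous clean estimate with the current ML estimate."""
    return (alpha * prev_clean * prev_clean / noise_var
            + (1.0 - alpha) * ml_a_priori_snr(y, noise_var))

def wiener_track(noisy_values, noise_var, alpha=0.9):
    """Denoise one coefficient position followed across successive frames."""
    prev_clean, out = 0.0, []
    for y in noisy_values:
        xi = decision_directed_snr(y, prev_clean, noise_var, alpha)
        prev_clean = wiener_gain(xi) * y
        out.append(prev_clean)
    return out

track = wiener_track([2.0, 2.0, 2.0], noise_var=1.0)
```

With `alpha` close to 1 the estimate leans on the previous frame, which smooths the gain over time; with `alpha = 0` the recursion reduces to the maximum likelihood estimate.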

Literature Survey: [SooKY97], Implementations
• Implementation results
  • White Gaussian noise
  • Both male and female voices
  • 10 levels of decomposition

Literature Survey: [SooKY97], Conclusions
• The methods are not particularly sensitive to the wavelet type, with the exception of Bior3.1
• Wiener-filtered speech has better SNR values than magnitude subtraction
• For Wiener filtering, the decision directed approach gives better SNR values than the maximum likelihood approach

Literature Survey: [KimYK01], Novelty
• Title: Speech enhancement using adaptive wavelet shrinkage
• Novelty:
  • Adaptive threshold value
    • The threshold value depends on the variance of the estimated clean signal (BayesShrink)
  • Classification of unvoiced regions using entropy
    • Applies a smaller threshold for unvoiced regions and calls the method “Adaptive BayesShrink”

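Literature Survey: [KimYK01], Threshold Value
• BayesShrink: the adaptive threshold value for minimizing the Bayesian risk is $T = \sigma_n^2 / \sigma_x$
• Thus, the estimated threshold value is $\hat{T} = \hat{\sigma}_n^2 / \hat{\sigma}_x$, where $\hat{\sigma}_x = \sqrt{\max(\hat{\sigma}_y^2 - \hat{\sigma}_n^2,\ 0)}$ [ChaYV00a]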
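
A minimal sketch of the BayesShrink threshold in its usual form, the noise variance divided by the estimated standard deviation of the clean signal. It assumes zero-mean coefficients, a noise level supplied by the caller (for example from a MAD estimate), and the common convention of thresholding everything when the estimated signal variance is zero. The function name is illustrative.

```python
import math

def bayes_shrink_threshold(coeffs, sigma_noise):
    """Threshold = sigma_noise^2 / sigma_x, with sigma_x^2 = max(var_y - sigma_noise^2, 0)."""
    var_y = sum(c * c for c in coeffs) / len(coeffs)  # noisy variance, zero mean assumed
    var_x = max(var_y - sigma_noise ** 2, 0.0)
    if var_x == 0.0:
        # Noise dominates: choose a threshold that removes every coefficient.
        return max(abs(c) for c in coeffs)
    return sigma_noise ** 2 / math.sqrt(var_x)

strong = bayes_shrink_threshold([2.0, -2.0, 2.0, -2.0], sigma_noise=1.0)
weak = bayes_shrink_threshold([0.5, -0.5], sigma_noise=1.0)
```

The threshold adapts per band: where the signal is strong relative to the noise the threshold is small, and where the band looks like pure noise the whole band is suppressed.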

Literature Survey: [KimYK01], Unvoiced Regions
• The current region is unvoiced if
• An unvoiced region has smaller energy, so apply a smaller threshold:
• The parameters are selected by simulation
• There was no comment about the type of entropy; it could be as:

Literature Survey: [KimYK01], Implementations
• Implementation results:
  • Additive white Gaussian noise
  • SNR: 0 dB, 10 dB and 20 dB

Literature Survey: [ChaKYK02], Novelty
• Title: Speech enhancement for non-stationary noise environment by adaptive wavelet packet
• Novelty:
  • Node-dependent thresholding for adaptation in colored or non-stationary noise
  • Noise estimation based on spectral entropy, not MAD
  • Modified hard thresholding to alleviate time-frequency discontinuities

Literature Survey: [ChaKYK02], Threshold Value
• Create the WPT and find the best basis tree’s leaf nodes
• Node-dependent thresholding
• Noise estimation could be like: … or the following proposed method

Literature Survey: [ChaKYK02], Noise Estimation
• Estimate the spectral pdf of the wavelet packet coefficients through a B-bin histogram
• Calculate the normalized spectral entropy for each node in the adapted wavelet packet tree
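
One plausible reading of these two steps is sketched below: build a B-bin histogram of the coefficient magnitudes of a node, treat the relative bin counts as a pdf, and divide its Shannon entropy by $\ln B$ so the result lies in [0, 1]. Equal-width bins from 0 to the largest magnitude and the function name are assumptions; the paper's exact binning is not given on the slide.

```python
import math

def normalized_histogram_entropy(coeffs, bins):
    """Entropy of a B-bin magnitude histogram, normalized to [0, 1]; requires bins >= 2."""
    mags = [abs(c) for c in coeffs]
    top = max(mags)
    if top == 0.0:
        return 0.0  # all-zero node: no spread at all
    counts = [0] * bins
    for m in mags:
        idx = min(int(m / top * bins), bins - 1)  # the largest value goes in the last bin
        counts[idx] += 1
    n = len(mags)
    entropy = -sum((k / n) * math.log(k / n) for k in counts if k > 0)
    return entropy / math.log(bins)

spread = normalized_histogram_entropy([0.1, 0.3, 0.6, 0.9], bins=4)
peaked = normalized_histogram_entropy([1.0, 1.0, 1.0, 1.0], bins=4)
```

Magnitudes spread evenly over the bins give an entropy near 1, while magnitudes concentrated in one bin give an entropy near 0.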

Literature Survey: [ChaKYK02], Noise Estimation (cont.)
• Estimate the spectral magnitude intensity by histogram
• Define an auxiliary threshold
• Estimate the standard deviation of the noise
• Figure labels: number of coefficients with magnitude equal to or greater than the bin’s amplitude; node_length; bins of coefficient magnitudes

Literature Survey: [ChaKYK02], Noise Estimation (cont.)
Greater disorder of wavelet coefficients (less voiced, more unvoiced)
→ More uniform spectral pdf
→ Bigger values for entropy (0 → 1)
→ Bigger value for alpha
→ Smaller # of bins bigger than alpha
→ Smaller estimate of the standard deviation of the noise

Literature Survey: [ChaKYK02], Thresholding
• Modified Hard Thresholding

Literature Survey: [ChaKYK02], Implementations
• Implementation results:
  • Pink noise, SNR: -5 dB to 15 dB
• Subjective tests were in favor of the level-dependent thresholding, but not every time. Anyway, the proposed method has better spectral performance (spectrogram).

Literature Survey: [ChaKYK02], Implementations (cont.)
• SNR (dB) test for various noisy speech: “We like bleu cheese but Victor prefers swiss cheese.” (SNR = 10 dB)

Literature Survey …
• To be continued…
Thank You.

References (1 of 2)

References (2 of 2)

Wavelet-Based Speech Enhancement: Course Project Presentation 1
Thank You
Find out more at:
1. http://ce.sharif.edu/~m_amiri/
2. http://www.aictct.com/dml/
