Wavelet-Based Speech Enhancement. Course Project Presentation 1. Mahdi Amiri April 2003 Sharif University of Technology. Presentation Outline. Motivation and Goals Wavelet Transform - Overview Basic Denoising in Wavelet Domain Literature Survey Implementation and Results
Wavelet-Based Speech Enhancement Course Project Presentation 1 Mahdi Amiri April 2003 Sharif University of Technology
Presentation Outline
• Motivation and Goals
• Wavelet Transform - Overview
• Basic Denoising in Wavelet Domain
• Literature Survey
• Implementation and Results
• Conclusions and Future Works
Motivation and Goals: Key Applications
• Improving perceptual quality of speech
  • Reducing listener fatigue
  • Hearing aids
• Improving performance of
  • Speech coders
  • Voice recognition systems
Motivation and Goals: Goals of SE in Wavelet Domain
• Variable window size for different frequency components
  • Long time intervals for precise low-frequency information
  • Short time intervals for precise high-frequency information
• Easy to implement
  • Fast WT computational complexity: $O(n)$
  • FFT computational complexity: $O(n \log_2 n)$
  • Denoising by simple thresholding
• Real-time implementation
Wavelet Transform - Overview
Wavelet Transform - Overview: History • Fourier (1807) • Haar (1910) • Math World
Wavelet Transform - Overview
• What kind of basis function could be useful?
  • Impulse function (Haar): best time resolution
  • Sinusoids (Fourier): best frequency resolution
  • We want the best of both resolutions
• Heisenberg (1930)
  • Uncertainty Principle
  • There is a lower bound on the product of time resolution and frequency resolution (an intuitive proof is given in [Mac91])
Wavelet Transform - Overview
• Gabor (1945)
  • Short Time Fourier Transform (STFT)
  • Disadvantage: fixed window size
Wavelet Transform - Overview
• Constructing wavelets
  • Daubechies (1988)
  • Compactly supported wavelets
• Computation of WT coefficients
  • Mallat (1989)
  • A fast algorithm using filter banks
Wavelet Transform - Overview: Multiresolution Signal Representation
• The coarse version (approximation) is often more useful than the detail
  • Browsing image databases on the web
  • Signal transmission for communication
  • Denoising
• Wavelet tree decomposition
  • Wavelet Transform (WT)
  • Undecimated WT (UWT)
  • We may lose what is in the detail
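As a rough illustration of the approximation/detail split, the sketch below runs one level of a two-channel filter bank in plain Python. The Haar wavelet and the helper names `haar_analysis` and `haar_synthesis` are assumptions chosen for brevity; the slides do not prescribe a particular wavelet at this point.

```python
import math

def haar_analysis(signal):
    """One level of the Haar filter bank: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Low-pass branch (scaled pairwise sums) and high-pass branch (differences),
    # each downsampled by two.
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis: interleaves the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a1, d1 = haar_analysis(x)        # coarse version and detail
rebuilt = haar_synthesis(a1, d1)  # recovers x up to rounding
```

Applying `haar_analysis` again to `a1` gives the next level of the wavelet tree; the cost per level halves, which is where the linear overall complexity comes from.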
Wavelet Transform - Overview: Full Tree Decomposition
• Wavelet Packet Transform (WPT)
• Undecimated WPT (UWPT)
• S = A1 + D1 or S = A1 + AD2 + DD2 or …
• Which decomposition path could be the best choice? The answer leads us to the best basis.
Wavelet Transform - Overview: Best Basis Selection Criteria
• Cut a branch of the tree according to one of the following criteria:
  • Entropy: Coifman, Meyer, Wickerhauser (1992)
  • Rate-distortion: Vetterli (1995)
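The exact "cut if" condition did not survive in the slide text. A minimal sketch of an entropy-based decision follows, assuming the additive entropy cost $-\sum c^2 \ln c^2$ and the usual rule of keeping a parent node whenever its cost does not exceed the summed cost of its two children. The Haar split and the function names are hypothetical, used only to make the example self-contained.

```python
import math

def haar_split(coeffs):
    """Split a node into its low-pass and high-pass children (Haar, assumed)."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (coeffs[i] + coeffs[i + 1]) for i in range(0, len(coeffs), 2)]
    high = [s * (coeffs[i] - coeffs[i + 1]) for i in range(0, len(coeffs), 2)]
    return low, high

def entropy_cost(coeffs):
    """Additive entropy cost; zero coefficients contribute nothing."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def keep_parent(parent):
    """True if the children should be cut, i.e. the parent is the cheaper basis."""
    low, high = haar_split(parent)
    return entropy_cost(parent) <= entropy_cost(low) + entropy_cost(high)

spike = [1.0, 0.0, 0.0, 0.0]  # already concentrated: splitting spreads energy
flat = [0.5, 0.5, 0.5, 0.5]   # splitting concentrates energy in the low band
```

Here `keep_parent(spike)` favours the unsplit node, while `keep_parent(flat)` favours splitting, which matches the intuition that the best basis concentrates energy in few coefficients.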
Basic Denoising in Wavelet Domain
Basic Denoising in Wavelet Domain: Principle
• A few coefficients in the lower bands are enough to approximate the main features of the clean signal. Hence, by setting the smaller coefficients to zero, we can eliminate noise nearly optimally while preserving the important information in the clean signal.
Basic Denoising in Wavelet Domain: Notation • Clean signal • Noise signal • Noisy signal (each in the time domain and in the wavelet domain)
Basic Denoising in Wavelet Domain: Algorithm
• Frame the input noisy signal
• Forward WT of a frame
• Threshold the (detail) wavelet coefficients
• Inverse WT
• Keep the center part of the frame
• Repeat for all of the frames
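The steps above can be sketched as follows. This is only an illustration under stated assumptions: a single-level Haar transform, soft thresholding, zero padding at the signal edges, and frame and hop sizes (`frame_len`, `keep`) picked arbitrarily. None of these choices comes from the slides.

```python
import math

def haar_forward(x):
    s = 1.0 / math.sqrt(2.0)
    a = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    d = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return a, d

def haar_inverse(a, d):
    s = 1.0 / math.sqrt(2.0)
    out = []
    for ai, di in zip(a, d):
        out.extend([s * (ai + di), s * (ai - di)])
    return out

def soft(c, t):
    """Shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def denoise_frames(noisy, frame_len=8, keep=4, threshold=0.5):
    """Frame, transform, threshold details, invert, keep the frame centre."""
    margin = (frame_len - keep) // 2
    # Zero padding so that every frame is full length (an assumption).
    padded = [0.0] * margin + list(noisy) + [0.0] * frame_len
    out = []
    for start in range(0, len(noisy), keep):
        frame = padded[start:start + frame_len]
        a, d = haar_forward(frame)
        d = [soft(c, threshold) for c in d]      # threshold detail only
        rebuilt = haar_inverse(a, d)
        out.extend(rebuilt[margin:margin + keep])  # centre part of the frame
    return out[:len(noisy)]
```

With `threshold=0` the function returns the input unchanged, which is a quick check that the framing and the centre selection line up.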
Basic Denoising in Wavelet Domain: Threshold Value
• VisuShrink [DonJ94b] threshold: $T = \hat{\sigma}\sqrt{2 \ln N}$, where $\hat{\sigma}^2$ is the estimate of the noise variance and $N$ is the frame length
• For Gaussian white noise: $\hat{\sigma} = \mathrm{MAD}/0.6745$
• Another definition (wden.m):
• MAD: median absolute deviation
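A small sketch of the VisuShrink threshold, assuming the commonly used universal form $T = \hat{\sigma}\sqrt{2 \ln N}$ with the robust estimate $\hat{\sigma} = \mathrm{median}(|d|)/0.6745$ taken over detail coefficients. The function names are illustrative, and the alternative definition attributed to `wden.m` is not reproduced here.

```python
import math
import statistics

def mad_sigma(detail_coeffs):
    """Robust noise standard deviation: median(|d|) / 0.6745.

    Assumes the detail coefficients are dominated by zero-mean Gaussian noise.
    """
    return statistics.median(abs(c) for c in detail_coeffs) / 0.6745

def universal_threshold(detail_coeffs, frame_length):
    """Universal (VisuShrink-style) threshold for a frame of the given length."""
    return mad_sigma(detail_coeffs) * math.sqrt(2.0 * math.log(frame_length))

d = [0.6745, -0.6745, 0.6745, -2.0, 0.1]
sigma = mad_sigma(d)                 # median of |d| is 0.6745
t = universal_threshold(d, 1024)
```

The median makes the estimate insensitive to a few large speech coefficients, which is why it is usually preferred over the sample standard deviation here.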
Basic Denoising in Wavelet Domain: Threshold Value
• Threshold in the WPT case
• For the correlated noise situation: use a level-dependent threshold (SureShrink [DonJ94b])
Basic Denoising in Wavelet Domain: How to Threshold
• Hard thresholding
• Soft thresholding
• Comparison: hard thresholding introduces a discontinuity; soft thresholding alters the values of the retained coefficients
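The two rules can be written in a few lines. This sketch uses the standard textbook definitions; the function names are illustrative.

```python
import math

def hard_threshold(c, t):
    """Keep the coefficient unchanged if it exceeds t in magnitude, else zero it."""
    return c if abs(c) > t else 0.0

def soft_threshold(c, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)
```

With `t = 1`, a coefficient of 1.5 stays 1.5 under the hard rule but becomes 0.5 under the soft rule: the hard rule jumps at the threshold, while the soft rule is continuous but biases every retained coefficient by `t`.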
Literature Survey
Literature Survey [SeoB97], Novelty
• Title: Speech enhancement with reduction of noise components in the wavelet domain
• Novelty:
  • Semisoft thresholding [GaoB95]
  • Classification of unvoiced regions in the wavelet domain (WD)
  • Different thresholding for unvoiced regions
Literature Survey [SeoB97], Thresholding
• Semisoft thresholding [GaoB95]
  • Less sensitivity to small perturbations in the data
  • Smaller bias
• (Figure: hard, soft, and semisoft thresholding functions; hard and soft as in [DonJ94b])
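A sketch of semisoft thresholding, assuming the usual two-threshold form with a lower threshold `t1` and an upper threshold `t2`: coefficients below `t1` are zeroed, coefficients above `t2` are kept, and values in between are interpolated linearly. The parameter names are illustrative, and the way [SeoB97] selects the two thresholds is not covered.

```python
import math

def semisoft_threshold(c, t1, t2):
    """Semisoft shrinkage with 0 <= t1 < t2."""
    if not 0.0 <= t1 < t2:
        raise ValueError("require 0 <= t1 < t2")
    mag = abs(c)
    if mag <= t1:
        return 0.0                      # behaves like hard/soft below t1
    if mag > t2:
        return c                        # behaves like hard above t2 (no bias)
    # Linear ramp from 0 at t1 up to t2 at t2 keeps the function continuous.
    return math.copysign(t2 * (mag - t1) / (t2 - t1), c)
```

The ramp removes the jump of the hard rule, and leaving large coefficients untouched avoids the constant bias of the soft rule, which is consistent with the two advantages listed on the slide.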
Literature Survey [SeoB97], Unvoiced Regions • Separation of unvoiced region • Use DWT for finding • Calculate average energy of each subband • Current speech segment is unvoiced if:
Literature Survey [SeoB97], Implementations
• If the segment is unvoiced, threshold only the highest frequency band
• Implementation results
  • Additive white Gaussian noise
  • SNR: -10 dB to 10 dB
  • Test sentence: “Should we chase those cowboys?”
Literature Survey [SooKY97], Novelty
• Title: Wavelet for speech denoising
• Novelty:
  • Evaluation of different wavelets and different orders (db1-10, coif1-5, sym2-8, bior1.3-6.8)
  • Spectral subtraction in WD
  • Wiener filtering in WD, using two methods for estimating the a priori SNR:
    • Maximum likelihood approach
    • Decision-directed approach
Literature Survey [SooKY97], Thresholding 1
• Use the DWT and find L levels of decomposition
• 1. Spectral Subtraction (SS) in WD: if … then denoised value … else denoised value …
• The expected value of the noise magnitude could be estimated from silence frames
Literature Survey [SooKY97], Thresholding 2
• 2. Wiener Filtering in WD, with a gain that depends on the a priori SNR
• Estimating the a priori SNR:
  • a. Maximum likelihood
  • b. Decision directed, with a weighting parameter in [0, 1], typically 0.9
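The slide formulas are not recoverable from the text, so the following is a sketch under assumptions rather than the method of [SooKY97]: it uses the textbook Wiener gain $\xi/(1+\xi)$, a maximum likelihood estimate $\xi = \max(y^2/\sigma_n^2 - 1, 0)$, and a decision-directed estimate that mixes the previous frame's clean estimate with the current maximum likelihood value. The symbols, function names, and the use of the previous frame are all assumptions.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi >= 0."""
    return xi / (1.0 + xi)

def snr_ml(y, noise_var):
    """Maximum likelihood a priori SNR from one noisy coefficient."""
    return max(y * y / noise_var - 1.0, 0.0)

def snr_dd(y, prev_clean, noise_var, alpha=0.9):
    """Decision-directed a priori SNR; alpha in [0, 1] weights the past estimate."""
    return alpha * prev_clean ** 2 / noise_var + (1.0 - alpha) * snr_ml(y, noise_var)

def wiener_denoise_frames(frames, noise_var, alpha=0.9):
    """Apply the decision-directed Wiener gain coefficient by coefficient.

    frames: list of equal-length lists of wavelet coefficients, one per frame.
    """
    prev = [0.0] * len(frames[0])
    out = []
    for frame in frames:
        est = [wiener_gain(snr_dd(y, p, noise_var, alpha)) * y
               for y, p in zip(frame, prev)]
        out.append(est)
        prev = est  # the clean estimate feeds the next frame's SNR estimate
    return out
```

A larger `alpha` smooths the SNR track across frames, which tends to reduce musical noise at the cost of slower adaptation.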
Literature Survey [SooKY97], Implementations • Implementation results • White Gaussian noise • Both male and female voices • 10 levels of decomposition
Literature Survey [SooKY97], Conclusions
• The methods are not particularly sensitive to the wavelet type, with the exception of Bior3.1
• Wiener-filtered speech has better SNR values than magnitude subtraction
• For Wiener filtering, the decision-directed approach gives better SNR values than the maximum likelihood approach
Literature Survey [KimYK01], Novelty
• Title: Speech enhancement using adaptive wavelet shrinkage
• Novelty:
  • Adaptive threshold value
  • The threshold value depends on the variance of the estimated clean signal (BayesShrink)
  • Classification of unvoiced regions using entropy
  • A smaller threshold is applied to unvoiced regions; the authors call the method “Adaptive BayesShrink”
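Literature Survey [KimYK01], Threshold Value
• BayesShrink: the adaptive threshold value for minimizing the Bayesian risk is $T = \sigma_n^2 / \sigma_x$
• Thus, the estimated threshold value is found as $\hat{T} = \hat{\sigma}_n^2 / \hat{\sigma}_x$, where $\hat{\sigma}_x = \sqrt{\max(\hat{\sigma}_y^2 - \hat{\sigma}_n^2, 0)}$ [ChaYV00a]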
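A minimal sketch of a BayesShrink-style threshold for one subband, assuming the commonly cited form $T = \sigma_n^2/\sigma_x$ with $\sigma_x^2$ estimated as the noisy subband variance minus the noise variance. The fallback used when that difference is not positive (threshold set to the largest magnitude, so the whole subband is removed) and the function name are assumptions; the adaptive modification for unvoiced regions is not included.

```python
import math

def bayes_shrink_threshold(coeffs, noise_sigma):
    """BayesShrink-style threshold for one subband of noisy coefficients."""
    var_y = sum(c * c for c in coeffs) / len(coeffs)  # zero-mean assumption
    var_x = max(var_y - noise_sigma ** 2, 0.0)        # clean-signal variance
    if var_x == 0.0:
        # Subband looks like pure noise: pick a threshold that removes it all.
        return max(abs(c) for c in coeffs)
    return noise_sigma ** 2 / math.sqrt(var_x)
```

The threshold shrinks as the estimated signal variance grows, so strong speech subbands are barely touched while noise-dominated subbands are thresholded heavily.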
Literature Survey [KimYK01], Unvoiced Regions
• Current region is unvoiced if …
• An unvoiced region has smaller energy, so a smaller threshold is applied; the parameters are selected by simulation
• The paper does not comment on the type of entropy used; it could be as:
Literature Survey [KimYK01], Implementations • Implementation results: • Additive white Gaussian noise • SNR: 0 dB, 10 dB and 20 dB
Literature Survey [ChaKYK02], Novelty
• Title: Speech enhancement for non-stationary noise environment by adaptive wavelet packet
• Novelty:
  • Node-dependent thresholding for adaptation to colored or non-stationary noise
  • Noise estimation based on spectral entropy rather than MAD
  • Modified hard thresholding to alleviate time-frequency discontinuities
Literature Survey [ChaKYK02], Threshold Value
• Create the WPT and find the leaf nodes of the best basis tree
• Node-dependent thresholding
• Noise estimation could be like: … or the following proposed method
Literature Survey [ChaKYK02], Noise Estimation
• Estimate the spectral pdf of the wavelet packet coefficients through a B-bin histogram
• Calculate the normalized spectral entropy for each node in the adapted wavelet packet tree
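The two steps above can be sketched as follows, assuming the histogram is taken over coefficient magnitudes with equal-width bins and that "normalized" means dividing the Shannon entropy by $\ln B$ so the result lies between 0 and 1. Both assumptions, as well as the function names, are illustrative and not taken from [ChaKYK02].

```python
import math

def histogram_pdf(coeffs, bins):
    """Empirical pdf of coefficient magnitudes over `bins` equal-width bins."""
    mags = [abs(c) for c in coeffs]
    top = max(mags)
    counts = [0] * bins
    for m in mags:
        # The largest magnitude falls into the last bin.
        idx = min(int(m / top * bins), bins - 1) if top > 0.0 else 0
        counts[idx] += 1
    return [k / len(mags) for k in counts]

def normalized_entropy(pdf):
    """Shannon entropy divided by ln(B); empty bins contribute nothing."""
    return -sum(p * math.log(p) for p in pdf if p > 0.0) / math.log(len(pdf))

spread = histogram_pdf([0.1, 0.35, 0.6, 1.0], 4)      # one coefficient per bin
peaked = histogram_pdf([0.01, 0.02, 0.03, 1.0], 4)    # mostly in the first bin
```

A node whose coefficients are spread evenly across the bins scores close to 1, while a node dominated by a few large coefficients scores lower.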
Literature Survey [ChaKYK02], Noise Estimation (cont.)
• Estimate the spectral magnitude intensity by histogram
• Define an auxiliary threshold
• Estimate the standard deviation of the noise
• (Formula annotations: number of coefficients with magnitude equal to or greater than the bin's amplitude; node length; bins of coefficient magnitudes)
Literature Survey [ChaKYK02], Noise Estimation (cont.)
Greater disorder of wavelet coefficients (less voiced, more unvoiced) → more uniform spectral pdf → bigger values for entropy (range 0 to 1) → bigger value for alpha → smaller number of bins bigger than alpha → smaller estimate for the standard deviation of the noise
Literature Survey [ChaKYK02], Thresholding • Modified Hard Thresholding
Literature Survey [ChaKYK02], Implementations
• Implementation results:
  • Pink noise, SNR: -5 dB to 15 dB
• Subjective tests favored the level-dependent thresholding, but not every time. In any case, the proposed method has better spectral performance (spectrogram).
Literature Survey [ChaKYK02], Implementations (cont.) • SNR (dB) test for various noisy speech: “We like bleu cheese but Victor prefers swiss cheese.” (SNR = 10 dB)
Literature Survey … • To be continued… Thank You.
References (1 of 2)
References (2 of 2)
Wavelet-Based Speech Enhancement Course Project Presentation 1 Thank You FIND OUT MORE AT... 1. http://ce.sharif.edu/~m_amiri/ 2. http://www.aictct.com/dml/