Speech Enhancement Using Spectral Subtraction

Presentation by Sevakula Rahul Kumar (5464), under the guidance of Dr. Kishore Kumar

GOALS
• Study speech enhancement using spectral subtraction.
• Simulate the algorithm's removal of noise, using the magnitude averaging and half-wave rectification modifications to reduce the spectral error.
• Study how to implement the algorithm in practice on hardware.

Assumptions:
• The background noise is added acoustically or digitally to the speech.
• The background noise environment remains locally stationary, to the degree that the expected value of its spectral magnitude just prior to speech activity equals its expected value during speech activity.

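Theory
• Additive noise model: $x(k) = s(k) + n(k)$, where $s(k)$ is the speech signal and $n(k)$ is the noise.
Taking the Fourier transform gives
$$X(e^{j\omega}) = S(e^{j\omega}) + N(e^{j\omega})$$
where $x(k) \leftrightarrow X(e^{j\omega})$.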

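Spectral Subtraction Estimator:
• The spectral subtraction filter $H(e^{j\omega})$ is calculated by replacing the noise spectrum $N(e^{j\omega})$ with spectra which can be readily measured. The magnitude $|N(e^{j\omega})|$ of $N(e^{j\omega})$ is replaced by its average value $\mu(e^{j\omega})$ taken during non-speech activity, and the phase $\theta_N(e^{j\omega})$ of $N(e^{j\omega})$ is replaced by the phase $\theta_x(e^{j\omega})$ of $X(e^{j\omega})$.
• These substitutions result in the spectral subtraction estimator
$$\hat{S}(e^{j\omega}) = \left[\,|X(e^{j\omega})| - \mu(e^{j\omega})\,\right] e^{j\theta_x(e^{j\omega})} = H(e^{j\omega})\,X(e^{j\omega}), \qquad H(e^{j\omega}) = 1 - \frac{\mu(e^{j\omega})}{|X(e^{j\omega})|}$$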
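The estimator above can be sketched for a single analysis frame as follows. This is a minimal illustration, assuming the usual form of the estimator, $\hat{S} = (|X| - \mu)\,e^{j\theta_x}$; the function names `estimate_noise_mag` and `spectral_subtract` and the use of NumPy's real FFT are assumptions made for the example, not part of the presentation.

```python
import numpy as np

def estimate_noise_mag(noise_frames):
    """mu: magnitude spectrum averaged over frames known to contain no speech.

    noise_frames has shape (num_frames, frame_len).
    """
    return np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)), axis=0)

def spectral_subtract(frame, noise_mag):
    """Return S_hat = (|X| - mu) * exp(j * theta_x) for one frame.

    No rectification is applied here, so the bracketed magnitude can go
    negative in bins where |X| < mu.
    """
    X = np.fft.rfft(frame)
    return (np.abs(X) - noise_mag) * np.exp(1j * np.angle(X))
```

Because the phase of the noisy input is reused, the same result can be obtained by multiplying `X` by the gain `1 - noise_mag / np.abs(X)`.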

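Spectral Error: The spectral error is the difference between the estimate $\hat{S}(e^{j\omega})$ and the true speech spectrum $S(e^{j\omega})$:
$$\epsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega})\,e^{j\theta_x(e^{j\omega})}$$
Modifications to reduce the auditory effects of the spectral error include:
a) Magnitude averaging: Local averaging of spectral magnitudes can be used to reduce the error. $|X(e^{j\omega})|$ is replaced with
$$\overline{|X(e^{j\omega})|} = \frac{1}{M}\sum_{i=0}^{M-1} |X_i(e^{j\omega})|$$
where $M$ is the number of frames over which averaging is done and $X_i(e^{j\omega})$ is the transform of the $i$-th windowed frame. The spectral error then becomes approximately
$$\epsilon(e^{j\omega}) \cong \overline{|N(e^{j\omega})|} - \mu(e^{j\omega})$$
where
$$\overline{|N(e^{j\omega})|} = \frac{1}{M}\sum_{i=0}^{M-1} |N_i(e^{j\omega})|$$
The sample mean of $|N(e^{j\omega})|$ will converge to $\mu(e^{j\omega})$ as a longer average is taken.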
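A minimal sketch of magnitude averaging is shown below. It assumes the average is taken over a window of M frames centred on the current frame and truncated at the start and end of the signal; the presentation does not state how the window is positioned, and the name `average_magnitudes` is hypothetical.

```python
import numpy as np

def average_magnitudes(mags, M=3):
    """Replace each frame's magnitude spectrum by a local average.

    mags has shape (num_frames, num_bins). The window of M frames is
    centred on frame i (an assumption) and truncated at the edges.
    """
    mags = np.asarray(mags, dtype=float)
    num_frames = mags.shape[0]
    half = M // 2
    out = np.empty_like(mags)
    for i in range(num_frames):
        lo = max(0, i - half)
        hi = min(num_frames, i + half + 1)
        out[i] = mags[lo:hi].mean(axis=0)
    return out
```

A larger M reduces the variance of the noise magnitude estimate, at the cost of smearing short speech events across neighbouring frames.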

b) Half-wave rectification: Wherever the signal spectrum magnitude $|X(e^{j\omega})|$ is less than the average noise spectrum magnitude $\mu(e^{j\omega})$, the output is set to zero. This modification can be implemented simply by half-wave rectifying $H(e^{j\omega})$. The estimator then becomes
$$\hat{S}(e^{j\omega}) = H_R(e^{j\omega})\,X(e^{j\omega}), \qquad H_R(e^{j\omega}) = \frac{H(e^{j\omega}) + |H(e^{j\omega})|}{2}$$
The advantage of half-wave rectification is that the noise floor is reduced by $\mu(e^{j\omega})$. The disadvantage shows up when the sum of the noise plus speech at a frequency $\omega$ is less than $\mu(e^{j\omega})$: the speech information at that frequency is then removed along with the noise.
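Half-wave rectification of the gain can be sketched as below, assuming the gain has the form $H = 1 - \mu/|X|$. The function name and the small `eps` guard against division by zero are assumptions added for the example.

```python
import numpy as np

def half_wave_rectified_gain(mag, noise_mag, eps=1e-12):
    """Return H_R = (H + |H|) / 2 with H = 1 - mu / |X|.

    Bins where |X| < mu get a gain of exactly zero; all other bins keep
    the unrectified gain H.
    """
    H = 1.0 - noise_mag / np.maximum(mag, eps)
    return (H + np.abs(H)) / 2.0
```

Multiplying the noisy magnitude by this gain gives the same result as `np.maximum(mag - noise_mag, 0.0)`.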

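c) Residual noise reduction: In the absence of speech activity, the difference $N_R = N - \mu e^{j\theta_N}$ is called the noise residual. The residual noise reduction scheme is implemented as
$$|\hat{S}_i(e^{j\omega})| = \begin{cases} |\hat{S}_i(e^{j\omega})|, & |\hat{S}_i(e^{j\omega})| \ge \max|N_R(e^{j\omega})| \\ \min\{\,|\hat{S}_j(e^{j\omega})| : j = i-1,\, i,\, i+1\,\}, & |\hat{S}_i(e^{j\omega})| < \max|N_R(e^{j\omega})| \end{cases}$$
where $\hat{S}_i(e^{j\omega}) = H_R(e^{j\omega})\,X_i(e^{j\omega})$ and $\max|N_R(e^{j\omega})|$ is the maximum value of the noise residual measured during non-speech activity.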
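The residual noise reduction rule can be sketched as follows, assuming the common formulation in which a bin whose magnitude falls below the maximum noise residual is replaced by the minimum over the previous, current and next frames. The first and last frames are handled by repeating the edge frame, which is an assumption; `reduce_residual_noise` is a hypothetical name.

```python
import numpy as np

def reduce_residual_noise(S_mag, max_residual):
    """Apply the three-frame minimum rule to rectified magnitudes.

    S_mag has shape (num_frames, num_bins); max_residual has shape
    (num_bins,) and holds the largest residual seen during non-speech.
    Only magnitudes are changed; the phase of each frame is kept as is.
    """
    S_mag = np.asarray(S_mag, dtype=float)
    # Repeat the first and last frame so every frame has two neighbours.
    padded = np.pad(S_mag, ((1, 1), (0, 0)), mode="edge")
    neighbour_min = np.minimum(np.minimum(padded[:-2], padded[1:-1]), padded[2:])
    return np.where(S_mag >= max_residual, S_mag, neighbour_min)
```

Because the rule looks one frame ahead, a real-time implementation of this sketch would need to delay its output by one frame.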

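d) Additional signal attenuation during non-speech activity: The energy content of $\hat{S}(e^{j\omega})$ relative to $\mu(e^{j\omega})$ provides an accurate indicator of the presence of speech activity within a given analysis frame. Empirically, it is determined that the average (before versus after) power ratio is down at least 12 dB in frames without speech. This implies a measure for detecting the absence of speech given by
$$T = 20\log_{10}\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\frac{\hat{S}(e^{j\omega})}{\mu(e^{j\omega})}\right| d\omega\right]$$
If $T$ is less than $-12$ dB, the frame is classified as having no speech activity. During the absence of speech activity there are at least three options prior to resynthesis: do nothing, attenuate the output by a fixed factor, or set the output to zero. Thus, the output spectral estimate including output attenuation during non-speech activity is given by
$$\hat{S}(e^{j\omega}) = \begin{cases} \hat{S}(e^{j\omega}), & T \ge -12\ \text{dB} \\ c\,X(e^{j\omega}), & T < -12\ \text{dB} \end{cases}$$
where $c$ is the fixed attenuation factor.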
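A sketch of the non-speech attenuation step is given below. It assumes the detection measure is the mean of $|\hat{S}|/\mu$ over the frequency bins, expressed in dB, which approximates the integral over $\omega$ on a discrete frequency grid. The default attenuation of -30 dB is an illustrative value chosen for the example, not one stated in the presentation, and the function names are hypothetical.

```python
import numpy as np

def speech_absence_measure(S_mag, noise_mag, eps=1e-12):
    """T in dB: mean over bins of |S_hat| / mu (discrete stand-in for the integral)."""
    ratio = S_mag / np.maximum(noise_mag, eps)
    return 20.0 * np.log10(np.maximum(np.mean(ratio), eps))

def attenuate_if_no_speech(S_hat, X, noise_mag, threshold_db=-12.0, atten_db=-30.0):
    """Keep S_hat when speech is detected, otherwise output c * X.

    atten_db is an assumed, illustrative attenuation; c = 10 ** (atten_db / 20).
    """
    T = speech_absence_measure(np.abs(S_hat), noise_mag)
    if T >= threshold_db:
        return S_hat
    c = 10.0 ** (atten_db / 20.0)
    return c * X
```

Leaving a small amount of attenuated input in non-speech frames, rather than outputting zeros, avoids abrupt changes in the background level between speech and non-speech frames.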

Algorithm Implementation
• Input-Output Data Buffering:
• Voice Activity Detection: To obtain the noise characteristics, it is essential to find the pauses in speech activity. The method used here makes decisions based on a compound parameter built from the energy, the zero crossing rate and the normalized linear prediction error. For each speech frame, the energy in the frame, E, the linear prediction error normalized with respect to the energy of the signal, LPE, and the zero crossing rate, ZCR, are calculated. In general, frames that contain speech have more energy than those that do not. However, energy alone fails to distinguish between frames at low SNR, where the noise energy is comparable to the signal energy. The zero crossing rate is typically much higher for noise than for speech. Using all three parameters, a compound parameter, D, is calculated, and its value is then used to determine whether a frame has speech activity or not. The threshold values for the input signal have to be obtained empirically. The frames are thus classified as speech and non-speech frames.
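The three per-frame measurements used by the voice activity detector can be sketched as below. The way they are combined into the compound parameter D is not reproduced here, so this example stops at the individual features. The LPC order of 10, the autocorrelation method and the name `frame_features` are assumptions made for illustration.

```python
import numpy as np

def frame_features(frame, lpc_order=10):
    """Return (energy, zero crossing rate, normalized LP error) for one frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    energy = float(np.sum(frame ** 2))

    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frame)
    zcr = float(np.count_nonzero(signs[1:] != signs[:-1])) / (n - 1)

    # Autocorrelation at lags 0..p, then solve the normal equations R a = r.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(lpc_order + 1)])
    if r[0] <= 0.0:
        return energy, zcr, 1.0  # silent frame: nothing to predict
    idx = np.abs(np.subtract.outer(np.arange(lpc_order), np.arange(lpc_order)))
    R = r[idx]  # Toeplitz autocorrelation matrix
    a, *_ = np.linalg.lstsq(R, r[1:], rcond=None)
    lpe = float((r[0] - np.dot(a, r[1:])) / r[0])  # error energy / signal energy
    return energy, zcr, lpe
```

For a voiced, strongly periodic frame the normalized prediction error is close to 0, while for white noise it stays close to 1, which is why it complements the energy at low SNR.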

• Spectral Error Reduction: This stage applies magnitude averaging, bias estimation, bias removal with half-wave rectification, residual noise reduction, and additional noise suppression during non-speech activity.
• Synthesis:
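The buffering, bias removal and synthesis steps can be tied together in a short end-to-end sketch. The presentation does not specify the window or overlap, so the periodic Hann window, 50% overlap and overlap-add resynthesis used here are assumptions, as is the function name `enhance`. Only bias removal with half-wave rectification is applied, to keep the example short.

```python
import numpy as np

def enhance(x, noise_mag, frame_len=256):
    """Spectral subtraction with overlap-add resynthesis (illustrative only).

    noise_mag has frame_len // 2 + 1 bins and should be estimated from
    non-speech frames that were windowed the same way.
    """
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1.
    window = np.hanning(frame_len + 1)[:-1]
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        X = np.fft.rfft(frame)
        # Bias removal with half-wave rectification, reusing the noisy phase.
        mag = np.maximum(np.abs(X) - noise_mag, 0.0)
        S_hat = mag * np.exp(1j * np.angle(X))
        y[start:start + frame_len] += np.fft.irfft(S_hat, n=frame_len)
    return y
```

With `noise_mag` set to zero the interior of the signal is reconstructed exactly, which is a convenient check that the buffering and synthesis stages are transparent before any noise is subtracted.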

Conclusions
It can be concluded that this method improves the intelligibility of noisy signals even at low SNR. However, the presence of musical noise is intolerable to the human auditory system. This necessitates the development of a better algorithm that is able to mask this musical noise.

Suggestions for future work
• Speech enhancement using better algorithms such as the Signal Subspace Approach, the Energy Constrained Signal Subspace Approach and the Signal/Noise KLT Approach.
• A problem common to the speech enhancement approaches developed in this project, and to speech enhancement in general, is the non-stationary behavior of the energy of the residual noise, i.e., the non-uniformity of the residual noise from frame to frame.

Questions?
