
# Speech Processing


##### Presentation Transcript

1. Speech Processing Short-Time Fourier Transform Analysis and Synthesis

2. Short-Time Fourier Transform Analysis and Synthesis Minimum-Phase Synthesis
   - Speech and audio signals are time-varying and can be considered stochastic signals that carry information.
   - This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (e.g., time-varying formants and harmonics).
   - The discrete-time short-time Fourier transform (STFT) consists of a separate FT of the signal in the neighborhood of each time instant.
   - When the FT in the STFT analysis is replaced by the discrete FT (DFT), the resulting STFT is discrete in both time and frequency. We therefore distinguish:
     - the discrete STFT, and
     - the discrete-time STFT, which is continuous in frequency.
   - In linear prediction and homomorphic processing, an underlying source/filter model is assumed. This leads to model-based analysis/synthesis. Note also that both of these analysis methods implicitly used short-time analysis (to be presented).
   - In short-time analysis systems, no such model restrictions apply.

   Veton Këpuska

3. Short-Time Analysis (STFT)
   - Two views of the STFT are explored:
     - the Fourier-transform view, and
     - the filter-bank view.

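4. Fourier-Transform View
   - Recall (from Chapter 3) the discrete-time STFT:
     $$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$
   - w[n] is a finite-length, symmetric sequence (i.e., a window) of length $N_w$.
   - w[n] is nonzero only for $0 \le n \le N_w - 1$.
   - w[n] is called the analysis window or analysis filter.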

5. Fourier-Transform View
   - x[n]: the time-domain signal.
   - $f_n[m] = x[m]\,w[n-m]$: the short-time section of x[m] at time n, that is, the signal in frame n.
   - X(n,ω): the Fourier transform of $f_n[m]$, the short-time windowed signal data.
   - Computing the DFT:
     $$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km}$$

6. Fourier-Transform View
   - Thus X(n,k) is the STFT evaluated at $\omega_k = \frac{2\pi}{N}k$:
     $$X(n,k) = X(n,\omega)\big|_{\omega = \frac{2\pi}{N}k}, \quad k = 0, 1, \dots, N-1$$
   - Frequency sampling interval: $2\pi/N$.
   - Frequency sampling factor: N.
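
The two definitions above can be checked numerically. The following is a minimal sketch in Python/NumPy. The function name `stft`, the random test signal, and the parameter values are illustrative assumptions, not part of the slides. It evaluates X(n,k) directly from the definition, keeping the absolute time index m in the exponent, and takes frames every L samples.

```python
import numpy as np

def stft(x, w, N, L=1):
    """Discrete STFT X(n, k) = sum_m x[m] w[n - m] exp(-j 2 pi k m / N).

    Assumes w[n] is nonzero only for 0 <= n <= Nw - 1 and x[m] = 0 outside
    0 <= m < len(x). Frames are taken at n = 0, L, 2L, ... for every n whose
    short-time section f_n[m] = x[m] w[n - m] can be nonzero.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    times = np.arange(0, len(x) + Nw - 1, L)
    k = np.arange(N)
    X = np.zeros((len(times), N), dtype=complex)
    for i, n in enumerate(times):
        # Support of w[n - m]: n - Nw + 1 <= m <= n, clipped to the signal.
        m = np.arange(max(0, n - Nw + 1), min(len(x), n + 1))
        f_n = x[m] * w[n - m]                  # short-time section f_n[m]
        # Sample the Fourier transform of f_n[m] at omega_k = 2 pi k / N.
        X[i] = np.exp(-2j * np.pi * np.outer(k, m) / N) @ f_n
    return times, X

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
times, X = stft(x, np.hamming(16), N=32, L=4)
print(X.shape)  # (14, 32)
```

The direct O(N·N_w) sum is used instead of an FFT so that the phase convention matches the equations exactly. An FFT of the re-indexed section gives the same magnitudes but a different linear phase.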

7. Fourier-Transform View

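8. Example 7.1
   - Let x[n] be a periodic impulse train with period P:
     $$x[n] = \sum_{l=-\infty}^{\infty} \delta[n - lP]$$
   - Also let w[n] be a triangle of length P.

   [Figure: impulse train x[n] with impulses at ..., −P, 0, P, 2P, 3P, ...; triangular window w[n] spanning P points]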

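9. Example 7.1
   $$X(n,\omega) = \sum_{m} \sum_{l} \delta[m - lP]\, w[n-m]\, e^{-j\omega m} = \sum_{l} w[n - lP]\, e^{-j\omega lP}$$
   - The inner sum is nonzero only for m = lP.
   - Each term is a window located at lP, with linear phase $-\omega lP$.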

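10. Example 7.1
    - Since the translated windows w[n − lP] do not overlap, only one term contributes for each n. Therefore |X(n,ω)| is constant in ω and ∠X(n,ω) is linear in ω.
    - Computing the DFT with N = P gives
      $$X(n,k) = \sum_{l} w[n-lP]\, e^{-j\frac{2\pi}{P}k\,lP} = \sum_{l} w[n-lP],$$
      that is, the DFT of translated, non-overlapping windows with zero phase shift (the linear phase vanishes at the sampled frequencies).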
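
Example 7.1 can be reproduced numerically with a minimal Python/NumPy sketch. The period P = 8 and the triangle `np.bartlett(P + 2)[1:-1]` (chosen so that all P samples are nonzero on 0 ≤ n ≤ P − 1) are assumptions for illustration. `stft_at` is a hypothetical helper that evaluates X(n,k) from the definition.

```python
import numpy as np

P = 8                                   # pitch period in samples (assumed)
x = np.zeros(6 * P)
x[::P] = 1.0                            # x[n] = sum_l delta[n - lP], l = 0..5

# Assumed triangle of length P, nonzero on 0 <= n <= P - 1.
w = np.bartlett(P + 2)[1:-1]

def stft_at(n, N):
    """X(n, k), k = 0..N-1, computed directly from the STFT definition."""
    m = np.arange(max(0, n - P + 1), min(len(x), n + 1))
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, m) / N) @ (x[m] * w[n - m])

# n = 19: the window covers m = 12..19, which holds exactly one impulse (m = 16).
# With N = P, every DFT bin equals the real value w[19 - 16].
print(np.allclose(stft_at(19, N=P), w[19 - 16]))  # True
```

With N ≠ P, the same frame gives $w[n-lP]\,e^{-j\frac{2\pi}{N}k\,lP}$, which shows the linear phase that the N = P sampling removes.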

11. Spectrogram $|X(n,\omega)|^2$
    - If the analysis window is no longer than a pitch period ⇒ wideband spectrogram ⇒ vertical striations.
    - Otherwise ⇒ narrowband spectrogram ⇒ horizontal striations.
    - How often should the analysis window be applied to the signal?
      - X(n,k) is decimated by a temporal decimation factor L: $X(nL,k) = \mathrm{DFT}\{f_{nL}[m]\}$.
      - The sections $f_{nL}[m]$ are a subset of $f_n[m]$.
    - How to choose the sampling rates in time (L) and frequency (N, the FFT length) is addressed in a forthcoming section.
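
The wideband/narrowband distinction can be illustrated with a minimal Python/NumPy sketch on a synthetic 100 Hz impulse train. The sampling rate, the window lengths (40 and 400 samples), the hop L = 10, and N = 1000 are all assumptions chosen for illustration. `spectrogram` is a hypothetical helper that computes $|X(nL,k)|^2$ only for frames whose window lies fully inside the signal.

```python
import numpy as np

def spectrogram(x, w, N, L):
    """|X(n, k)|^2 at n = Nw-1, Nw-1+L, ... (frames whose window lies inside x).

    The section f_n[m] = x[m] w[n - m], m = n-Nw+1..n, is re-indexed to start
    at 0 before the FFT. This only multiplies X(n, k) by a linear phase, so
    the magnitude is unchanged.
    """
    Nw = len(w)
    times = np.arange(Nw - 1, len(x), L)        # temporal decimation by L
    S = np.empty((len(times), N // 2 + 1))
    for i, n in enumerate(times):
        section = x[n - Nw + 1:n + 1] * w[::-1]
        S[i] = np.abs(np.fft.rfft(section, N)) ** 2
    return times, S

fs, P = 10000, 100                          # 100 Hz "pitch" (assumed test signal)
x = np.zeros(fs // 2)
x[::P] = 1.0                                # periodic impulse train

# Window shorter than a pitch period (wideband) vs. four periods (narrowband).
t_wb, S_wb = spectrogram(x, np.hamming(40), N=1000, L=10)
t_nb, S_nb = spectrogram(x, np.hamming(400), N=1000, L=10)

# Summed spectral power per frame: zero whenever the short window holds no pulse.
frame_power = S_wb.sum(axis=1)
harm = S_nb[:, 10:200:10].mean()            # bins at 100, 200, ..., 1900 Hz (10 Hz/bin)
mid = S_nb[:, 5:200:10].mean()              # bins halfway between harmonics
print(frame_power.min(), frame_power.max(), harm / mid)
```

The short window makes the frame power rise and fall with each pitch pulse (vertical striations). The long window concentrates power at the harmonics (horizontal striations).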

12. Analysis window

    [Figure: x[m] with analysis windows w[pL − m] for p = 1, 2, 3, spaced L samples apart]

13. Spectrogram $|X(n,\omega)|^2$

14. Fourier-Transform View
    - As a function of ω, X(n,ω) is periodic with period 2π (as is the Fourier transform). For real x[n] it is Hermitian symmetric: $X(n,\omega) = X^*(n,-\omega)$.
    - For real sequences:
      - Re{X(n,ω)} and |X(n,ω)| are symmetric;
      - Im{X(n,ω)} and arg{X(n,ω)} are antisymmetric.
    - A time shift results in a linear phase shift (as with the Fourier transform):
      $$x[n - n_0] \;\longleftrightarrow\; e^{-j\omega n_0}\, X(n - n_0, \omega)$$
    - Thus a shift by $n_0$ in the original time sequence introduces a linear phase. It also introduces a shift in time, corresponding to a shift of each short-time section by $n_0$.
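
The time-shift property can be verified numerically with a minimal Python/NumPy sketch. The helper `stft_point`, the random signal, the Hann window, and the shift $n_0 = 5$ are illustrative assumptions.

```python
import numpy as np

def stft_point(x, w, n, omega):
    """X(n, omega) = sum_m x[m] w[n - m] e^{-j omega m}, with x[m] = 0 outside the array."""
    m = np.arange(len(x))
    idx = n - m
    valid = (idx >= 0) & (idx < len(w))
    return np.sum(x[valid] * w[idx[valid]] * np.exp(-1j * omega * m[valid]))

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
w = np.hanning(16)
n0 = 5
y = np.concatenate([np.zeros(n0), x])       # y[m] = x[m - n0]

n, omega = 30, 0.7
lhs = stft_point(y, w, n, omega)
rhs = np.exp(-1j * omega * n0) * stft_point(x, w, n - n0, omega)
print(np.isclose(lhs, rhs))                 # True
```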

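15. Filtering View
    - In this interpretation, w[n] is considered to be a filter whose impulse response is w[n]; thus w[n] is referred to as the analysis filter.
    - Fix the value of ω at $\omega = \omega_0$:
      $$X(n,\omega_0) = \sum_{m=-\infty}^{\infty} x[m]\, e^{-j\omega_0 m}\, w[n-m]$$
    - This equation represents the convolution of the sequence $x[n]e^{-j\omega_0 n}$ with the sequence w[n]. Thus:
      $$X(n,\omega_0) = \left(x[n]\, e^{-j\omega_0 n}\right) * w[n]$$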

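16. Filtering View
    - The product $x[n]e^{-j\omega_0 n}$ modulates x[n]. Its spectrum is $X(\omega + \omega_0)$, so the band of x[n] around $\omega_0$ is shifted down to the origin, where w[n] acts as a lowpass filter.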

17. Filtering View
    - Alternate view: the discrete STFT can also be interpreted from the filtering viewpoint:
      $$X(n,k) = e^{-j\omega_k n}\left(x[n] * w[n]\, e^{j\omega_k n}\right), \qquad \omega_k = \frac{2\pi}{N}k$$
    - This equation leads to the interpretation of the discrete STFT as the output of the filter bank shown in the next slide.
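
The Fourier-transform view and the two filtering views (demodulate then lowpass, or bandpass then demodulate) should give identical X(n,k). A minimal Python/NumPy sketch can check this. The signal, window, and N are arbitrary assumptions, and the three function names are hypothetical. Each view is computed for every n in the full convolution range.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(50)
w = np.hamming(12)
N = 16
n_all = np.arange(len(x) + len(w) - 1)      # full convolution length

def stft_direct(k):
    """X(n, k) from the Fourier-transform view, for every n in n_all."""
    wk = 2 * np.pi * k / N
    out = np.zeros(len(n_all), dtype=complex)
    for n in n_all:
        for m in range(len(x)):
            if 0 <= n - m < len(w):
                out[n] += x[m] * w[n - m] * np.exp(-1j * wk * m)
    return out

def stft_lowpass(k):
    """Demodulate x[n] by e^{-j w_k n}, then filter with w[n] (lowpass view)."""
    wk = 2 * np.pi * k / N
    return np.convolve(x * np.exp(-1j * wk * np.arange(len(x))), w)

def stft_bandpass(k):
    """Filter x[n] with h_k[n] = w[n] e^{j w_k n}, then demodulate (bandpass view)."""
    wk = 2 * np.pi * k / N
    hk = w * np.exp(1j * wk * np.arange(len(w)))
    return np.exp(-1j * wk * n_all) * np.convolve(x, hk)

print(all(np.allclose(stft_bandpass(k), stft_direct(k)) for k in range(N)))  # True
```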

18. Filtering View

19. Filtering View
    - General properties:
      - If x[n] has length N and w[n] has length M, then X(n,ω) has length N + M − 1 along n.
      - The bandwidth of $X(n,\omega_0)$ (as a sequence in n) is less than or equal to that of w[n].
      - The sequence $X(n,\omega_0)$ has its spectrum centered at the origin.

20. Example 7.2
    - Consider a Gaussian window w[n].
    - The discrete STFT with DFT length N can therefore be considered as a bank of filters with impulse responses
      $$h_k[n] = w[n]\, e^{j\frac{2\pi}{N}kn}$$
    - For $x[n] = \delta[n]$: $x[n] * h_k[n] = h_k[n]$.
    - If N = 50, corresponding to bandpass filters spaced by 200 Hz at a sampling rate of 10000 samples/s, then:
      $$h_k[n] = w[n]\, e^{j\frac{2\pi}{50}kn}$$
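
A minimal Python/NumPy sketch confirms that, with N = 50 and a 10000 samples/s rate, filter $h_k[n]$ peaks at $k \cdot 200$ Hz. The slide's exact Gaussian is not reproduced here, so the window below (centered, length 101, σ = 10 samples) is a hypothetical stand-in, as is the helper `bandpass_response`.

```python
import numpy as np

fs, N = 10000, 50                           # from Example 7.2
Nw, sigma = 101, 10.0                       # assumed window length and width
n = np.arange(Nw)
w = np.exp(-0.5 * ((n - (Nw - 1) / 2) / sigma) ** 2)   # assumed Gaussian shape

def bandpass_response(k, nfft=10000):
    """Magnitude response of h_k[n] = w[n] e^{j 2 pi k n / N} on a 1 Hz grid."""
    hk = w * np.exp(2j * np.pi * k * n / N)
    H = np.fft.fft(hk, nfft)
    freqs = np.fft.fftfreq(nfft, d=1 / fs)
    return freqs, np.abs(H)

for k in (0, 5, 10, 15):
    freqs, H = bandpass_response(k)
    print(k, round(float(freqs[np.argmax(H)]), 1))   # peak at k * fs / N Hz
```

Because the Gaussian is positive and symmetric, |W(ω)| peaks at ω = 0, so each shifted copy peaks exactly at its center frequency.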

21. Example 7.2
    - For k = 0, 5, 10, and 15 (center frequencies 0, 1000, 2000, and 3000 Hz), the following is obtained:

22. Example 7.2

23. Example 7.3
    - Consider the filter bank of Example 7.2, designed with a Gaussian window w[n].
    - Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters $h_k[n]$ for k = 0, 5, 10, and 15, as presented in the previous slide and depicted in Figure 7.6.

24. Example 7.3
    - After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure, but are centered at the origin.

25. Time-Frequency Resolution Tradeoffs
    - As discussed in Chapter 3, the basic issue in analysis window selection is the compromise between a long window, for showing signal detail in frequency, and a short window, for representing fine temporal structure:
      $$X(n,\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(\theta)\, e^{j\theta n}\, X(\omega + \theta)\, d\theta$$
    - Since both X(ω) and W(ω) are periodic with period 2π, this linear convolution is essentially circular.
    - From the equation above:
      - W(ω) smears (smooths) X(ω).
      - For good frequency resolution we want W(ω) as narrow as possible, ideally $W(\omega) = \delta(\omega)$.
      - However, $W(\omega) = \delta(\omega)$ results in an infinitely long w[n], i.e., poor time resolution.
      - These are conflicting goals.
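
The tradeoff can be seen in a minimal Python/NumPy sketch with two tones 100 Hz apart. The tone frequencies, sampling rate, Hamming window, and the "half of the band maximum" peak threshold are assumptions for illustration. `count_peaks` is a hypothetical helper.

```python
import numpy as np

fs = 10000                                   # sampling rate (assumed)
f1, f2 = 1000.0, 1100.0                      # two tones 100 Hz apart (assumed)

def count_peaks(Nw, nfft=10000, band=(900.0, 1200.0)):
    """Count spectral peaks of a Hamming-windowed two-tone frame inside `band`."""
    n = np.arange(Nw)
    x = np.cos(2 * np.pi * f1 * n / fs) + np.cos(2 * np.pi * f2 * n / fs)
    mag = np.abs(np.fft.rfft(x * np.hamming(Nw), nfft))
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    S = mag[(freqs >= band[0]) & (freqs <= band[1])]
    # Keep local maxima that reach at least half the largest value in the band
    # (this ignores the Hamming sidelobes).
    is_peak = (S[1:-1] > S[:-2]) & (S[1:-1] > S[2:]) & (S[1:-1] > 0.5 * S.max())
    return int(np.count_nonzero(is_peak))

print(count_peaks(32), count_peaks(1000))    # short window: 1 peak; long window: 2 peaks
```

A 32-sample Hamming window has a main lobe about 1250 Hz wide at this rate, so the two tones merge. A 1000-sample window has a main lobe of about 40 Hz and separates them, at the cost of a 0.1 s frame.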

26. Example 7.4
    - Figure 7.8 depicts the time-frequency resolution tradeoff:

27. Time-Frequency Resolution Tradeoffs
    - As the previous example shows, the smoothing interpretation of the STFT does not hold for non-stationary sequences.
    - For steady signals, long analysis windows are appropriate and yield good frequency resolution, as depicted in the next figure.

28. Time-Frequency Resolution Tradeoffs
    - However, for short and transient signals (plosives, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events.
    - The price is that shorter windows yield poor frequency resolution.

29. Short-Time Synthesis
    - How can the original sequence be recovered from its discrete-time STFT?
    - The inversion is represented mathematically by a synthesis equation, which expresses a sequence in terms of its discrete-time STFT.
    - Recall that for $f_n[m] = x[m]\,w[n-m]$:
      $$f_n[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega m}\, d\omega$$
    - Thus, if $w[0] \neq 0$, recovery is complete.

30. Short-Time Synthesis
    - For each n, taking the inverse Fourier transform of the corresponding function of frequency yields the sequence $f_n[m]$.
    - Evaluating $f_n[m]$ at m = n gives $f_n[n] = x[n]\,w[0]$.
    - For $w[0] \neq 0$, x[n] is obtained as $f_n[n]/w[0]$.
    - Taking the inverse Fourier transform of X(n,ω) for a specific n and then dividing by w[0] is expressed by the synthesis equation for the discrete-time STFT:
      $$x[n] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$
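
The synthesis equation can be illustrated with a minimal Python/NumPy sketch. It samples X(n,ω) at N ≥ N_w frequencies instead of integrating. Because $f_n[m]$ has at most $N_w$ nonzero samples, those samples determine it exactly. The helper `synthesize`, the signal, and the window lengths are illustrative assumptions.

```python
import numpy as np

def synthesize(x, w, N):
    """Recover x[n] from N frequency samples of X(n, omega) for each time n.

    f_n[m] = x[m] w[n - m] has at most Nw = len(w) nonzero samples, so when
    N >= Nw the inverse DFT evaluated at m = n returns f_n[n] = x[n] w[0]
    exactly. Dividing by w[0] then gives x[n].
    """
    Nw = len(w)
    if N < Nw or w[0] == 0:
        raise ValueError("requires N >= len(w) and w[0] != 0")
    k = np.arange(N)
    y = np.empty(len(x))
    for n in range(len(x)):
        m = np.arange(max(0, n - Nw + 1), n + 1)            # support of f_n[m]
        Xn = np.exp(-2j * np.pi * np.outer(k, m) / N) @ (x[m] * w[n - m])
        f_nn = np.mean(Xn * np.exp(2j * np.pi * k * n / N))  # inverse DFT at m = n
        y[n] = f_nn.real / w[0]
    return y

rng = np.random.default_rng(3)
x = rng.standard_normal(64)
y = synthesize(x, np.hamming(21), N=32)    # Hamming: w[0] = 0.08, nonzero
print(np.allclose(y, x))                   # True
```

A symmetric Hann window (`np.hanning`) has w[0] = 0, so this direct division is impossible and the function raises `ValueError`.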

31. Short-Time Synthesis
    - In contrast to the discrete-time STFT X(n,ω), the discrete STFT X(n,k) is not always invertible.
    - Example 1: consider the case when w[n] is bandlimited with bandwidth B.

32. Short-Time Synthesis
    - Note that if some frequency components of x[n] do not pass through any of the filter passbands of the discrete STFT, then:
      - the discrete STFT is not a unique representation of x[n], and
      - x[n] cannot be recovered from it.
    - Example 2: consider X(n,k) decimated in time by a factor L, i.e., the STFT is computed every L samples.
      - w[n] is nonzero over its length $N_w$.
      - If $L > N_w$, there are gaps in time where x[n] is not represented.
      - Thus, in such cases, x[n] again cannot be recovered.

33. $L > N_w$

    [Figure: x[m] with windows w[pL − m] of length $N_w$ spaced L samples apart, leaving gaps between them]

34. Short-Time Synthesis
    - Conclusion: constraints must be adopted to ensure uniqueness and invertibility:
      - Adequate frequency sampling: $B \ge 2\pi/N$ (B: window bandwidth), i.e., the filter spacing $2\pi/N$ must not exceed the bandwidth of each filter.
      - Proper temporal decimation: $L \le N_w$.
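
The temporal-decimation constraint can be checked with a minimal Python/NumPy sketch. It lists the samples that no window w[pL − m] covers. The helper `uncovered_samples` and the numbers in the usage line are illustrative assumptions, and the sketch does not check the frequency-sampling condition.

```python
import numpy as np

def uncovered_samples(num_samples, Nw, L):
    """Indices m in [0, num_samples) not covered by any window w[pL - m].

    With w[n] nonzero on 0..Nw-1, frame p covers pL - Nw + 1 <= m <= pL.
    """
    covered = np.zeros(num_samples, dtype=bool)
    for p in range((num_samples + Nw - 2) // L + 1):
        lo, hi = p * L - Nw + 1, p * L
        covered[max(lo, 0):min(hi, num_samples - 1) + 1] = True
    return np.flatnonzero(~covered)

print(uncovered_samples(40, Nw=10, L=10))   # L <= Nw: no gaps
print(uncovered_samples(40, Nw=10, L=12))   # L > Nw: two-sample gaps every 12 samples
```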

35. Filter Bank Summation (FBS) Method
    - FBS is a traditional short-time synthesis method.
    - FBS is best described in terms of the filtering interpretation of the discrete STFT:
      - the discrete STFT is considered to be the set of outputs of a bank of filters;
      - the output of each filter is modulated with a complex exponential;
      - the modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b) in slide 18).

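36. Filter Bank Summation (FBS) Method
    - Recall the synthesis equation given earlier:
      $$x[n] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$
    - The FBS method carries out a discrete version of this equation using the discrete STFT X(n,k):
      $$y[n] = \frac{1}{N\, w[0]}\sum_{k=0}^{N-1} X(n,k)\, e^{j\frac{2\pi}{N}kn}$$
    - Goal: derive conditions that ensure y[n] = x[n].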

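37. Filter Bank Summation (FBS) Method

    [Figure: analysis followed by synthesis, mapping x[n] to y[n]]

    - From Figure 7.5, $X(n,k)\,e^{j\frac{2\pi}{N}kn} = x[n] * h_k[n]$ with $h_k[n] = w[n]\,e^{j\frac{2\pi}{N}kn}$. Thus:
      $$y[n] = \frac{1}{N\, w[0]}\sum_{k=0}^{N-1} x[n] * w[n]\, e^{j\frac{2\pi}{N}kn}$$
    - Interchanging the summation and convolution operations, this equation reduces to:
      $$y[n] = x[n] * \left[\frac{w[n]}{N\, w[0]}\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}kn}\right]$$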

38. Filter Bank Summation (FBS) Method
    - Furthermore:
      $$\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}kn} = N\sum_{r=-\infty}^{\infty} \delta[n - rN]$$

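39. Filter Bank Summation (FBS) Method
    - Thus:
      $$y[n] = x[n] * \left[\frac{w[n]}{w[0]}\sum_{r=-\infty}^{\infty} \delta[n - rN]\right]$$
      that is, y[n] is the convolution of x[n] with the product of the analysis window and a periodic impulse sequence.
    - Note: $\frac{w[n]}{w[0]}\sum_{r}\delta[n-rN]$ reduces to $\delta[n]$ if:
      - the window length satisfies $N_w \le N$, or
      - for $N_w > N$, $w[rN] = 0$ for $r \neq 0$, that is:
        $$w[rN] = 0, \quad r = \pm 1, \pm 2, \dots$$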

40. Filter Bank Summation (FBS) Method

41. Filter Bank Summation (FBS) Method
    - This constraint is known as the FBS constraint. It must be fulfilled in order to ensure exact signal synthesis with the FBS method.
    - This constraint is commonly expressed in the frequency domain:
      $$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$
    - This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.
    - In conclusion, a filter bank with N filters, based on an analysis filter of length less than or equal to N, is always an all-pass system.
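
The FBS constraint can be checked in both domains with a minimal Python/NumPy sketch. `fbs_sum` evaluates $\sum_k W(\omega - 2\pi k/N)$ on a grid, and `fbs_synthesis` applies the FBS equation with no decimation. Both names, N = 8, and the three test windows are assumptions. The one-sided sinc is a hypothetical choice whose zeros fall at multiples of N.

```python
import numpy as np

def fbs_sum(w, N, num_freqs=256):
    """sum_k W(omega - 2 pi k / N) on a grid of omega in [0, 2 pi)."""
    n = np.arange(len(w))
    omega = 2 * np.pi * np.arange(num_freqs) / num_freqs
    total = np.zeros(num_freqs, dtype=complex)
    for k in range(N):
        theta = omega - 2 * np.pi * k / N
        total += np.exp(-1j * np.outer(theta, n)) @ w   # W(theta), DTFT of w
    return total

def fbs_synthesis(x, w, N):
    """y[n] = 1/(N w[0]) sum_k X(n, k) e^{j 2 pi k n / N}  (no decimation)."""
    k = np.arange(N)
    y = np.empty(len(x))
    for n in range(len(x)):
        m = np.arange(max(0, n - len(w) + 1), n + 1)
        Xn = np.exp(-2j * np.pi * np.outer(k, m) / N) @ (x[m] * w[n - m])
        y[n] = (Xn @ np.exp(2j * np.pi * k * n / N)).real / (N * w[0])
    return y

N = 8
w_short = np.hamming(N)                      # Nw = N: constraint always holds
w_long = np.hamming(2 * N + 1)               # Nw > N with w[N], w[2N] != 0
w_sinc = np.sinc(np.arange(2 * N + 1) / N)   # Nw > N, but w[N] = w[2N] = 0 (to rounding)

for name, w in [("short", w_short), ("long", w_long), ("sinc", w_sinc)]:
    print(name, np.allclose(fbs_sum(w, N), N * w[0]))   # True, False, True
```

With `w_long`, `fbs_synthesis` adds the echoes $x[n-N]\,w[N]/w[0]$ and $x[n-2N]\,w[2N]/w[0]$. This is the time-domain face of the same violated constraint.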

42. Generalized FBS Method
    - Note:
      - the "smoothing" function f[n,m] is referred to as the time-varying synthesis filter;
      - it can be shown that any f[n,m] that fulfills the condition below makes the synthesis equation above valid (Exercise 7.6);
      - the basic FBS method is obtained by setting the synthesis filter to be a non-smoothing filter: $f[n,m] = \delta[m]$.

43. Generalized FBS Method
    - Consider the discrete STFT with decimation factor L. The signal synthesized by generalized FBS is given by:
    - Furthermore, consider a time-invariant smoothing filter $f[n,m] = f[m]$, that is, $f[n, n-rL] = f[n-rL]$.

44. Generalized FBS Method
    - Thus:
    - This equation holds when the analysis and synthesis filters, together with the temporal decimation and frequency sampling factors, satisfy the following constraint:
    - For $f[m] = \delta[m]$ and L = 1, this method reduces to the basic FBS method.

45. Generalized FBS Method
    - We are interested in the case L > 1 and in using f[n] as an interpolator ⇒ interpolation FBS methods:
      - Helical interpolation (Portnoff)
      - Weighted overlap-add method (Crochiere)

46. Overlap-Add (OLA) Method
    - The FBS method was motivated by the filtering view of the STFT, while the OLA method was motivated by the Fourier-transform view.
    - In the OLA method:
      - the inverse DFT is taken for each fixed time in the discrete STFT;
      - an overlap-and-add operation is performed between the short-time sections.
    - This works provided that the analysis window is designed such that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence.
    - The basic idea is that the redundancy within overlapping segments, together with the averaging of the redundant samples, removes the effect of windowing.

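47. Overlap-Add (OLA) Method
    - Recall the short-time synthesis relation:
      $$f_n[m] = x[m]\,w[n-m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega m}\, d\omega$$
    - If x[n] is averaged over many short-time segments and normalized by W(0), then
      $$x[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty} \frac{1}{2\pi}\int_{-\pi}^{\pi} X(p,\omega)\, e^{j\omega n}\, d\omega$$
      where $W(0) = \sum_{n=-\infty}^{\infty} w[n]$.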

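48. Overlap-Add (OLA) Method
    - The discretized version of OLA is given by:
      $$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty} \left[\frac{1}{N}\sum_{k=0}^{N-1} X(p,k)\, e^{j\frac{2\pi}{N}kn}\right]$$
    - Note that, with the IDFT taken over one period covering the section's support, the bracketed term equals $f_p[n] = x[n]\,w[p-n]$ provided that $N \ge N_w$. The expression for y[n] thus becomes:
      $$y[n] = x[n]\,\frac{1}{W(0)}\sum_{p=-\infty}^{\infty} w[p-n]$$
    - Provided that $\sum_{p} w[p-n] = W(0)$, then y[n] = x[n].
    - This condition always holds, because the sum of the values of a sequence equals its Fourier transform evaluated at ω = 0 (the DC value W(0) is, by definition, the sum of the signal values).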
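
A minimal Python/NumPy sketch of OLA with L = 1 shows that the normalization by W(0) works for any window. This includes a Hann window with w[0] = 0, where the direct division by w[0] fails. The helper `ola_synthesis_l1` and all parameter values are illustrative assumptions. Each section is DFT'd after being re-indexed to start at 0, which changes X(p,k) only by a linear phase.

```python
import numpy as np

def ola_synthesis_l1(x, w, N):
    """OLA with L = 1: y[n] = (1/W(0)) * sum_p IDFT{X(p,k)}, placed at time n."""
    Nw = len(w)
    if N < Nw:
        raise ValueError("N must be >= len(w)")
    xp = np.concatenate([np.zeros(Nw - 1), x, np.zeros(Nw - 1)])   # xp[i] = x[i - Nw + 1]
    acc = np.zeros(len(xp))
    for p in range(len(x) + Nw - 1):               # every frame that overlaps x
        section = xp[p:p + Nw] * w[::-1]           # f_p[m], m = p-Nw+1..p
        # DFT of the re-indexed section (X(p,k) up to a linear phase);
        # with N >= Nw the IDFT returns the section exactly.
        acc[p:p + Nw] += np.fft.ifft(np.fft.fft(section, N))[:Nw].real
    return acc[Nw - 1:Nw - 1 + len(x)] / np.sum(w)  # divide by W(0) = sum_n w[n]

rng = np.random.default_rng(6)
x = rng.standard_normal(100)
w = np.hanning(25)                                  # w[0] = 0
print(np.allclose(ola_synthesis_l1(x, w, N=32), x))   # True
```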

49. Overlap-Add (OLA) Method
    - For decimation in time by a factor of L, it can be shown (Exercise 7.4) that, when the shifted windows sum to a constant,
      $$\sum_{p=-\infty}^{\infty} w[pL - n] = \frac{W(0)}{L}$$
    - Then x[n] can be synthesized using:
      $$y[n] = \frac{L}{W(0)}\sum_{p=-\infty}^{\infty} \left[\frac{1}{N}\sum_{k=0}^{N-1} X(pL,k)\, e^{j\frac{2\pi}{N}kn}\right]$$
    - This equation expresses the general constraint imposed by the OLA method: the analysis windows, obtained by sliding w[n] in L-point increments, must sum to a constant, as shown in the next figure.
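
The OLA constraint for L > 1 can be checked with a minimal Python/NumPy sketch. It compares a periodic Hann window (denominator $N_w$) with NumPy's symmetric `np.hanning` (denominator $N_w - 1$) at hop $L = N_w/2$. The helpers `overlap_sum` and `ola_synthesis` and all parameter values are illustrative assumptions.

```python
import numpy as np

def overlap_sum(w, L, num_samples):
    """sum_p w[pL - n] for n = 0..num_samples-1 (w nonzero on 0..Nw-1)."""
    Nw = len(w)
    total = np.zeros(num_samples)
    for n in range(num_samples):
        for p in range(-(-n // L), (n + Nw - 1) // L + 1):   # 0 <= pL - n <= Nw - 1
            total[n] += w[p * L - n]
    return total

def ola_synthesis(x, w, N, L):
    """OLA with hop L: overlap-add the inverse DFTs of X(pL, k), scale by L / W(0)."""
    Nw = len(w)
    if N < Nw:
        raise ValueError("N must be >= len(w)")
    xp = np.concatenate([np.zeros(Nw - 1), x, np.zeros(Nw - 1)])   # xp[i] = x[i - Nw + 1]
    acc = np.zeros(len(xp))
    for t in range(0, len(x) + Nw - 1, L):         # frame times t = pL
        section = xp[t:t + Nw] * w[::-1]           # f_t[m] for m = t-Nw+1..t
        # DFT of the re-indexed section (X(pL,k) up to a linear phase);
        # with N >= Nw its IDFT returns the section exactly.
        acc[t:t + Nw] += np.fft.ifft(np.fft.fft(section, N))[:Nw].real
    return acc[Nw - 1:Nw - 1 + len(x)] * L / np.sum(w)

Nw, L = 32, 16
w_periodic = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(Nw) / Nw)   # periodic Hann
w_symmetric = np.hanning(Nw)                                         # symmetric Hann
s_per = overlap_sum(w_periodic, L, 200)
s_sym = overlap_sum(w_symmetric, L, 200)
print(s_per.max() - s_per.min())    # ~0: constant, equal to W(0)/L = 1
print(s_sym.max() - s_sym.min())    # > 0: windows do not sum to a constant
```

With `w_periodic`, `ola_synthesis(x, w_periodic, 32, 16)` reproduces x. Any window/hop pair whose `overlap_sum` is constant works the same way.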

50. Overlap-Add (OLA) Method