
Time-Scale Modification of Speech Signal



Presentation Transcript


  1. Time-Scale Modification of Speech Signal SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)

  2. Overview
     • Introduction
     • Overview of the methods
     • Basic Idea
     • SOLAFS Method
     • Matlab Code
     • The results
     • Conclusion

  3. Introduction
     • Many applications need to modify the time scale of speech, music, or other acoustic material without modifying the pitch.
     • The goal is to speed up or slow down the speech.
     • No Donald Duck or Minnie Mouse effects.

  4. Introduction
     • Time-scale modification (TSM) refers to changing the reproduction rate of a signal.
     • Two primary operations are involved:
       - time-scale expansion (slow down)
       - time-scale compression (speed up)

  5. Introduction
     • [Figure: original, expanded, and compressed waveforms]

  6. Overview of methods
     • Time-scale modification uses three basic families of methods:
       - frequency-domain processing methods
       - analysis/synthesis methods
       - time-domain processing methods
     • SOLAFS is a time-domain processing method.

  7. Basic Idea
     • SOLAFS is an improvement on the earlier SOLA (Synchronized Overlap-Add) method.
     • SOLA consists of:
       - shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest cross-correlation;
       - once that point is found, overlapping the frames and averaging them together.
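The SOLA idea above can be sketched in a few lines of MATLAB. This is a minimal illustration and not the presentation's code: the test signal, the overlap length `Wov`, the search limit `kmax`, and the nominal start `start0` are all assumed values chosen so that the best shift is easy to check by hand.

```matlab
% Hypothetical test signal: a 100 Hz sine sampled at 8 kHz (period = 80 samples).
fs = 8000;
x  = sin(2*pi*100*(0:1999)/fs);

Wov    = 200;        % overlap length in samples (assumed value)
kmax   = 100;        % largest shift to try (assumed value)
start0 = 421;        % nominal start of the new segment (assumed value)

prevTail = x(1:Wov); % stands in for the end of the preceding segment

% Slide the new segment over the tail and score each shift with a
% normalized cross-correlation.
score = zeros(1, kmax+1);
for k = 0:kmax
    seg = x(start0+k : start0+k+Wov-1);
    score(k+1) = (prevTail * seg') / (norm(prevTail)*norm(seg) + eps);
end
[~, idx] = max(score);
kbest = idx - 1;     % shift with the highest cross-correlation

% Overlap the aligned segments and average them together.
seg     = x(start0+kbest : start0+kbest+Wov-1);
blended = 0.5*(prevTail + seg);
```

Because the sine has a period of 80 samples and the nominal start is 20 samples past a period boundary, the search should settle on a shift of 60 samples, at which point the two segments line up and the averaged result matches the original tail.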

  8. SOLAFS
     There are four parameters:
     • Window length (W): the smallest unit of the input signal that the method manipulates.
     • Analysis shift (Sa): the inter-frame interval between successive search ranges for analysis windows along the input signal.
     • Synthesis shift (Ss): the inter-frame interval between successive analysis windows along the output signal.
     • Shift search interval (Kmax): the duration of the interval over which an analysis window may be shifted to align it with the region of the output signal it will overlap.

  9. SOLAFS
     • [Figure: the four parameters used in SOLAFS]

  10. Analysis
      The analysis windows are chosen as follows:
      [equation not preserved in transcript]
      where
      m = a window index, i.e. it refers to the mth window
      n = a sample index in an input buffer for the input signal; the buffer is W samples long
      km = the number of samples of shift for the mth window
      xm[n] = the nth sample in the mth analysis window
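A plausible form of the window equation, assuming the usual SOLAFS formulation in which the mth window starts at input sample mSa shifted by km (the symbols follow the definitions on the slide; the slide's own equation is not available here):

```latex
x_m[n] = x[\, m S_a + k_m + n \,], \qquad 0 \le n < W
```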

  11. Analysis
      • The analysis windows are then used to form the output signal y[i] recursively according to the following:
        [equation not preserved in transcript]
        where
        Wov = W - Ss is the number of points in the overlap region
        b[n] = an overlap-add weighting function, referred to as a fading factor (for example an averaging function or a linear fade function)
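A likely form of the recursion, written under the assumption that b[n] weights the existing output and fades from near 1 to near 0 across the overlap, so that the new window is faded in; this is a sketch consistent with the definitions above and may differ in detail from the slide's own equation:

```latex
y[m S_s + n] =
\begin{cases}
b[n]\, y[m S_s + n] + (1 - b[n])\, x_m[n], & 0 \le n < W_{ov} \\
x_m[n], & W_{ov} \le n < W
\end{cases}
```

The first W_ov samples of each window are cross-faded with what is already in the output, and the remaining W - W_ov = S_s samples are copied directly.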

  12. Analysis
      • Calculation of km
        km is the optimal shift, determined by the normalized cross-correlation between x and y in the overlap region:
        [equation not preserved in transcript]
        where Kmax is the maximum allowable shift from the initial starting position of the analysis window.
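One way to write the shift selection, assuming a standard normalized cross-correlation over the W_ov overlap samples (the notation R^m_xy is introduced here for illustration):

```latex
k_m = \arg\max_{0 \le k \le K_{\max}} R^{m}_{xy}[k],
\qquad
R^{m}_{xy}[k] =
\frac{\sum_{n=0}^{W_{ov}-1} y[m S_s + n]\; x[m S_a + k + n]}
     {\sqrt{\sum_{n=0}^{W_{ov}-1} y^{2}[m S_s + n] \;\; \sum_{n=0}^{W_{ov}-1} x^{2}[m S_a + k + n]}}
```

The energy of the y segment does not depend on k, so an implementation can normalize by the x segment alone without changing which shift wins.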

  13. km can often be predicted without computing the similarity.
      • The mth shift, km, is determined by:
        [equation not preserved in transcript: one expression applies if a condition holds, another otherwise]
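A hedged reconstruction of the prediction rule, derived here rather than copied from the slide. Assume window m starts at input sample mSa + km. The previous window started at (m-1)Sa + k_{m-1}, and the output has advanced by Ss since then, so its natural continuation starts at (m-1)Sa + k_{m-1} + Ss. If that position lies inside the search range of window m, it aligns perfectly with the output and no search is needed:

```latex
k_m =
\begin{cases}
k_{m-1} + (S_s - S_a), & \text{if } 0 \le k_{m-1} + (S_s - S_a) \le K_{\max} \\[4pt]
\arg\max_{0 \le k \le K_{\max}} R^{m}_{xy}[k], & \text{otherwise}
\end{cases}
```

Here R^m_xy[k] stands for the normalized cross-correlation between the shifted input and the output over the overlap region; the name is an assumption made for this sketch.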

  14. Implement in MATLAB
      There are 7 steps, as follows:
      1. As an initialization step, take W samples from the input signal (stored in an input signal buffer) and place them in an output sample buffer for the output signal.
      2. Find the start of the analysis window, mSa, beginning with the first window.

  15. Implement in MATLAB
      3. Find the similarity between the first Wov samples at the start of the analysis window and the last Wov samples of the output signal by computing the cross-correlation between the two sets of samples.

  16. Implement in MATLAB
      4. Shift the start of the analysis window by one or two samples and repeat step 3.
      5. Repeat steps 3 and 4 until the analysis window has been shifted by the maximum allowed amount, Kmax.

  17. Implement in MATLAB
      6. Take the shift of the analysis window that gives the maximum cross-correlation, overlap-add the last Wov samples of the output signal with the first Wov samples of the shifted analysis window, and transfer the W - Wov further samples into the output buffer.

  18. Implement in MATLAB
      7. Repeat steps 2 - 6 for the next analysis window until the input signal reaches its end.

  19. Parameter choices
      • The smallest useful synthesis shift is Ss = Wov.
      • The smallest useful window length is W = 2Wov.
      • Kmax = 2W
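As a worked example of these choices, the snippet below derives every parameter from one overlap length. The speed factor `F`, the relation `Sa = F*Ss`, and the input length `xpts` are assumptions made for illustration; the relation is inferred from the fact that the input position must advance F times faster than the output position.

```matlab
Wov  = 200;            % overlap length in samples (example value)
W    = 2*Wov;          % smallest useful window length
Ss   = W - Wov;        % synthesis shift; equals Wov at this minimum
Kmax = 2*W;            % shift search interval

F    = 2;              % assumed speed factor: F > 1 compresses, F < 1 expands
Sa   = F*Ss;           % analysis shift implied by the speed factor

xpts = 16000;          % hypothetical input length in samples
ypts = round(xpts/F);  % expected output length
```

With Wov = 200 this gives W = 400, Ss = 200, and Kmax = 800; at F = 2 the analysis shift is 400 samples and a 16000-sample input yields about 8000 output samples.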

  20. MATLAB
      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      %% Project Spring 2005
      %% Rachan Fugcharoen ECE5525
      %
      % Do SOLAFS timescale mod'n
      %
      % Y is X scaled to run F x faster. X is added-in in windows
      % W pts long, overlapping by Wov points with the previous output.
      % The similarity is calculated over the last Wsim points of output.
      % Maximum similarity skew is Kmax pts.
      % Each xcorr calculation is decimated by xdecim (8)
      % The skew axis sampling is decimated by kdecim (2)
      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      % Read the wave file (audioread is the current replacement for wavread)
      [d,fs] = audioread('we.wav');
      W = 400;        % window length
      Wov = W/2;      % length of the overlap region, in points
      Kmax = 2*W;     % maximum shift (skew) searched
      Wsim = Wov;     % number of output points used for the similarity
      xdecim = 8;     % decimation of each xcorr
      kdecim = 2;     % decimation of the skew axis sampling
      X = d';         % row vector (assumes a mono file)

  21. MATLAB
      % Factor to run x faster or slower
      F = 4;
      Ss = W-Wov;
      xpts = size(X,2);
      ypts = round(xpts / F);
      Y = zeros(1, ypts);
      % Cross-fade win is Wov pts long - it grows
      xfwin = (1:Wov)/(Wov+1);
      % Index to add to ypos to get the overlap region
      ovix = (1-Wov):0;
      % Index for non-overlapping bit
      newix = 1:(W-Wov);
      % Index for similarity chunks
      % decimate the cross-correlation
      simix = (1:xdecim:Wsim) - Wsim;
      % prepad X for extraction
      padX = [zeros(1, Wsim), X, zeros(1,Kmax+W-Wov)];
      % Startup - just copy first bit
      Y(1:Wsim) = X(1:Wsim);

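  22. MATLAB
      xabs = 0;
      lastxpos = 0;
      km = 0;
      for ypos = Wsim:Ss:(ypts-W)
        % Ideal X position (rounded so it is a valid index offset)
        xpos = round(F * ypos);
        % Overlap prediction - assume all of overlap from last copy.
        % Continuing the previous window means xpos + km equals
        % lastxpos + (previous km) + Ss.
        kmpred = km + Ss - (xpos - lastxpos);
        lastxpos = xpos;
        if (kmpred >= 0 && kmpred <= Kmax)
          km = kmpred;  % no need to search
        else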

  23. MATLAB
          % Calculate the skew, km
          % .. by first figuring the cross-correlation
          ysim = Y(ypos + simix);
          % Clear the Rxy array
          rxy = zeros(1, Kmax+1);
          rxx = zeros(1, Kmax+1);
          Kmin = 0;

  24. MATLAB
          for k = Kmin:kdecim:Kmax
            xsim = padX(Wsim + xpos + k + simix);
            rxx(k+1) = norm(xsim);
            rxy(k+1) = (ysim * xsim');
          end
          % Zero the pts where rxx was zero
          Rxy = (rxx ~= 0).*rxy./(rxx+(rxx==0));
          % Local max gives skew
          km = min(find(Rxy == max(Rxy))-1);
        end
        xabs = xpos+km;

  25. MATLAB
        % Cross-fade some points
        Y(ypos+ovix) = ((1-xfwin).*Y(ypos+ovix)) + (xfwin.*padX(Wsim+xabs+ovix));
        % Add in remaining points
        Y(ypos+newix) = padX(Wsim+xabs+newix);
      end
      % Plot the result
      subplot(211); plot(X); grid; original = axis;
      subplot(212); plot(Y); grid; change = axis;

  26. MATLAB
      % Use the same axis limits on both plots so the lengths can be compared
      if F > 1
        subplot(211); title('Original wave file'); axis(original)
        subplot(212); title(['Modified wave file (Speed=',num2str(F),'X)']); axis(original)
      else
        subplot(211); title('Original wave file'); axis(change)
        subplot(212); title(['Modified wave file (Speed=',num2str(F),'X)']); axis(change)
      end
      % Play the wave file and save the wave file
      sound(Y,fs);
      audiowrite('we_new_2X.wav', Y, fs, 'BitsPerSample', 8);

  27. Results Speed 2X

  28. Results Speed 4X

  29. Results Speed 0.75X

  30. Results Speed 0.5X

  31. Conclusion
      • The results are acceptable given a proper choice of parameters.
      • The SOLAFS algorithm provides time-scale-modified speech over a wide range of compression and expansion.
      • It requires significantly less computation than many other methods.

  32. Conclusion
      • As the MATLAB code shows, the method needs a lot of buffering to hold the samples, which causes difficulties in real-time applications.
      • Real-time applications must process everything as fast as possible. If the data is stored in compressed form or the storage units are slow, processing becomes difficult.

  33. References
      • D. J. Hejna. Real-time time-scale modification of speech via the synchronized overlap-add algorithm. Master's thesis, M.I.T., 1990.
      • Don Hejna and Bruce R. Musicus. The SOLAFS Time-Scale Modification Algorithm. Research, 1991.
