Time-Scale Modification of Speech Signal

Time-Scale Modification of Speech Signal: SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)

Overview • Introduction • Overview of the methods • Basic Idea • SOLAFS Method • MATLAB Code • The results • Conclusion

Introduction • Many applications need to modify the time scale of speech, music, or other acoustic material. • The goal is to speed up or slow down the speech without modifying its pitch. • That is, no "Donald Duck" or "Minnie Mouse" effects.

Introduction • Time-scale modification (TSM) refers to changing the reproduction rate of a signal. • Two primary operations are involved: - time-scale expansion (slowing down) - time-scale compression (speeding up)

Introduction • Figure panels: original, expansion, compression

Overview of methods • Time-scale modification methods fall into three basic classes: - frequency-domain processing methods - analysis/synthesis methods - time-domain processing methods • SOLAFS is a time-domain processing method.

Basic Idea • SOLAFS is an improvement on the earlier SOLA (Synchronized Overlap-Add) method. • SOLA consists of: - shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest cross-correlation - once that point is found, overlapping the two frames and averaging them together.
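The SOLA join described above can be sketched in a few lines of MATLAB. This is a minimal illustration on a toy sinusoid, not the author's code: the names `tail`, `seg`, `maxLag`, and `lag` are hypothetical, and a plain 50/50 average is assumed for the overlap.

```matlab
% Minimal sketch of one SOLA join (hypothetical names, toy data).
tail   = sin(2*pi*(0:49)/25);          % end of the preceding segment
seg    = sin(2*pi*((0:149) - 7)/25);   % new segment: same tone, delayed by 7 samples
Wov    = numel(tail);                  % overlap length
maxLag = 20;                           % largest shift to try

best = -Inf;
lag  = 0;
for k = 0:maxLag
    cand = seg(k + (1:Wov));           % candidate overlap region at shift k
    % normalized cross-correlation between the tail and the candidate
    r = (tail * cand') / (norm(tail) * norm(cand) + eps);
    if r > best
        best = r;
        lag  = k;
    end
end

% Overlap the aligned frames and average them together.
joined = 0.5 * (tail + seg(lag + (1:Wov)));
```

Because the new segment is the same tone delayed by 7 samples, the search should settle on a shift of 7, and the averaged overlap then matches the original tail.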

SOLAFS • There are four parameters: • Window length (W) - the smallest unit of the input signal that the method manipulates • Analysis shift (Sa) - the inter-frame interval between successive search ranges for analysis windows along the input signal • Synthesis shift (Ss) - the inter-frame interval between successive analysis windows along the output signal • Shift search interval (kmax) - the duration of the interval over which an analysis window may be shifted in order to align it with the region of the output signal it will overlap.
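To make the roles of the four parameters concrete, the sketch below tabulates, for the first few windows, where the search range lies in the input and where the window lands in the output. The numeric values are illustrative assumptions only and do not come from the slides.

```matlab
% Illustrative values only; none of these numbers come from the slides.
W = 400; Sa = 300; Ss = 200; kmax = 100;

m           = (0:3)';          % window indices
searchStart = m * Sa;          % earliest start of analysis window m in the input
searchEnd   = m * Sa + kmax;   % latest start, after the maximum allowed shift
outStart    = m * Ss;          % where window m is placed in the output

% Columns: m, searchStart, searchEnd, outStart
disp([m, searchStart, searchEnd, outStart]);
```

With Sa larger than Ss, as here, the input is consumed faster than the output is produced, which corresponds to time-scale compression.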

SOLAFS • Figure: the four parameters used in SOLAFS

Analysis • The analysis windows are chosen as follows [equation image not preserved], where: - m = a window index, i.e. it refers to the mth window - n = a sample index in an input buffer for the input signal; the buffer is W samples long - km = the number of samples of shift for the mth window - xm[n] = the nth sample in the mth analysis window
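One reconstruction consistent with the symbol definitions on this slide is given below. It is offered as an assumption rather than the slide's exact equation; x[·] denotes the input signal, a symbol introduced here for illustration.

```latex
x_m[n] = x[\, m S_a + k_m + n \,], \qquad n = 0, 1, \ldots, W - 1
```

In words, the mth analysis window is W consecutive input samples starting k_m samples after the nominal position m S_a.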

Analysis • The analysis windows are then used to form the output signal y[i] recursively according to the following [equation image not preserved], where: - Wov = W - Ss is the number of points in the overlap region - b[n] = an overlap-add weighting function, referred to as a fading factor, for example an averaging function, a linear fade function, and so forth.
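A plausible form of the recursion, consistent with the definitions of Wov and b[n] on this slide, is sketched below. It assumes that b[n] weights the output already written and that 1 - b[n] weights the incoming window; the opposite convention is equally possible. In the deck's MATLAB listing the existing output is weighted by `1 - xfwin`, which would play the role of b[n] under this assumption.

```latex
y[m S_s + n] =
\begin{cases}
b[n]\, y[m S_s + n] + \bigl(1 - b[n]\bigr)\, x_m[n], & 0 \le n < W_{ov} \\[4pt]
x_m[n], & W_{ov} \le n < W
\end{cases}
```

The first case cross-fades the overlap region; the second copies the remaining W - Wov = Ss samples unchanged.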

Analysis • Calculation of km: km is the optimal shift, determined by maximizing the normalized cross-correlation between x and y in the overlap region [equation image not preserved], where kmax is the maximum allowable shift from the initial starting position of the analysis window.
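A reconstruction of the shift selection, assuming the usual definition of a normalized cross-correlation over the Wov overlap samples (with x[·] the input and y[·] the output), is:

```latex
k_m = \arg\max_{0 \le k \le k_{max}} R^{m}_{xy}[k],
\qquad
R^{m}_{xy}[k] =
\frac{\displaystyle\sum_{n=0}^{W_{ov}-1} y[m S_s + n]\; x[m S_a + k + n]}
     {\sqrt{\displaystyle\sum_{n=0}^{W_{ov}-1} y^{2}[m S_s + n]\;
            \sum_{n=0}^{W_{ov}-1} x^{2}[m S_a + k + n]}}
```

The energy of the y segment does not depend on k, so dropping it from the denominator leaves the maximizing shift unchanged. The deck's MATLAB listing appears to rely on this, dividing only by the norm of the x segment.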

km can often be predicted without computing the similarity. • The mth shift, km, is determined by a two-case rule ("if ... otherwise") [equation image not preserved].
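The two-case rule can plausibly be reconstructed as follows. This is an assumption based on the geometry of the method: if the previous window's natural continuation in the input falls inside the current search range, it matches the overlap region exactly and no search is needed. Here R^m_xy[k] stands for the normalized cross-correlation at shift k.

```latex
k_m =
\begin{cases}
k_{m-1} + (S_s - S_a), & \text{if } 0 \le k_{m-1} + (S_s - S_a) \le k_{max} \\[4pt]
\displaystyle\arg\max_{0 \le k \le k_{max}} R^{m}_{xy}[k], & \text{otherwise}
\end{cases}
```

The MATLAB listing in this deck computes its own variant of the prediction (`kmpred`), so the expression above should be read as the textbook form rather than a description of that code.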

Implement in MATLAB • There are seven steps: 1. As an initialization step, take W samples from the input signal (held in an input signal buffer) and place them in the output sample buffer. 2. Find the start of the first analysis window, mSa.

Implement in MATLAB 3. Next, find the maximum similarity between the first Wov samples at the start of the analysis window and the last Wov samples of the output signal by computing the cross-correlation between the two.

Implement in MATLAB 4. Shift the start of the analysis window by one or two samples and repeat step 3. 5. Repeat steps 3 and 4 until the analysis window has been shifted by the maximum allowed amount, kmax.

Implement in MATLAB 6. At the shift where the cross-correlation is maximal, overlap-add the last Wov samples of the output signal with the first Wov samples of the shifted analysis window, then transfer the remaining W - Wov samples into the output buffer.

Implement in MATLAB 7. Repeat steps 2 - 6 with the next analysis window until the input signal reaches its end.

Parameter choices • The smallest useful synthesis shift is Ss = Wov • The smallest useful window length is W = 2Wov • Kmax = 2W
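As a rough illustration, the rules of thumb above can be turned into concrete sample counts. The sketch assumes a sampling rate `fs` of 8000 Hz and a speed factor `F` of 2, neither of which is stated on this slide, and it takes the analysis shift to be `Sa = F * Ss`, mirroring the way the deck's MATLAB listing advances the input position (`xpos = F * ypos`).

```matlab
% Hypothetical parameter setup following the slide's rules of thumb.
fs   = 8000;            % assumed sampling rate in Hz (not given in the slides)
F    = 2;               % assumed speed factor: F > 1 compresses, F < 1 expands
W    = 400;             % window length in samples
Wov  = W / 2;           % overlap length, so that W = 2*Wov
Ss   = W - Wov;         % synthesis shift, equal to Wov here
Sa   = round(F * Ss);   % analysis shift implied by the speed factor
Kmax = 2 * W;           % shift search interval

fprintf('W = %d samples (%.1f ms at %d Hz)\n', W, 1000 * W / fs, fs);
fprintf('Wov = %d, Ss = %d, Sa = %d, Kmax = %d\n', Wov, Ss, Sa, Kmax);
```

With these values the window spans 50 ms, the output advances by 200 samples per window, and the input advances by 400.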

MATLAB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Project Spring 2005
%% Rachan Fugcharoen ECE5525
%
% Do SOLAFS timescale mod'n
% Y is X scaled to run F x faster. X is added-in in windows
% W pts long, overlapping by Wov points with the previous output.
% The similarity is calculated over the last Wsim points of output.
% Maximum similarity skew is Kmax pts.
% Each xcorr calculation is decimated by xdecim (8)
% The skew axis sampling is decimated by kdecim (2)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Read the wave file
[d,fs,bit]=wavread('we.wav');
W=400;        % window length
Wov=W/2;      % number of overlapping points
Kmax=2*W;     % maximum shift
Wsim=Wov;     % number of output points used for the similarity
xdecim=8;     % decimation of each xcorr
kdecim=2;     % decimation of the skew axis sampling
X=d';         % work on a row vector

MATLAB
% Factor to run x faster or slower
F=4;
Ss =W-Wov;
size(X);
xpts = size(X,2);
ypts = round(xpts / F);
Y = zeros(1, ypts);
% Cross-fade win is Wov pts long - it grows
xfwin = (1:Wov)/(Wov+1);
% Index to add to ypos to get the overlap region
ovix = (1-Wov):0;
% Index for non-overlapping bit
newix = 1:(W-Wov);
% Index for similarity chunks
% decimate the cross-correlation
simix = (1:xdecim:Wsim) - Wsim;
% prepad X for extraction
padX = [zeros(1, Wsim), X, zeros(1,Kmax+W-Wov)];
% Startup - just copy first bit
Y(1:Wsim) = X(1:Wsim);

MATLAB
xabs = 0;
lastxpos = 0;
km = 0;
for ypos = Wsim:Ss:(ypts-W);
  % Ideal X position
  xpos = F * ypos;
  % Overlap prediction - assume all of overlap from last copy
  kmpred = km + (xpos - lastxpos);
  lastxpos = xpos;
  if (kmpred <= Kmax)
    km = kmpred;   % no need to search
  else

MATLAB
    % Calculate the skew, km
    % .. by first figuring the cross-correlation
    ysim = Y(ypos + simix);
    % Clear the Rxy array
    rxy = zeros(1, Kmax+1);
    rxx = zeros(1, Kmax+1);
    Kmin = 0;

MATLAB
    for k = Kmin:kdecim:Kmax
      xsim = padX(Wsim + xpos + k + simix);
      rxx(k+1) = norm(xsim);
      rxy(k+1) = (ysim * xsim');
    end
    % Zero the pts where rxx was zero
    Rxy = (rxx ~= 0).*rxy./(rxx+(rxx==0));
    % Local max gives skew
    km = min(find(Rxy == max(Rxy))-1);
  end
  xabs = xpos+km;

MATLAB
  % Cross-fade some points
  Y(ypos+ovix) = ((1-xfwin).*Y(ypos+ovix)) + (xfwin.*padX(Wsim+xabs+ovix));
  % Add in remaining points
  Y(ypos+newix) = padX(Wsim+xabs+newix);
end
% Plot the result
subplot(211); plot(X);grid;original=axis;
subplot(212); plot(Y);grid;change=axis;

MATLAB
if F > 1
  subplot(211); title('Original wave file');axis(original)
  subplot(212); title(['Modified wave file (Speed=',num2str(F),'X)']);axis(original)
else
  subplot(211); title('Original wave file'); axis(change)
  subplot(212); title(['Modified wave file (Speed =',num2str(F),'X)']);axis(change)
end
% Play the wave file and save the wave file
sound(Y,fs);
wavwrite(Y,fs,8,'we_new_2X.wav');

Results Speed 2X

Results Speed 4X

Results Speed 0.75X

Results Speed 0.5X

Conclusion • The results are acceptable with a proper choice of parameters. • The SOLAFS algorithm provides time-scale modified speech over a wide range of compression and expansion factors. • It requires significantly less computation than many other methods.

Conclusion • As the MATLAB code shows, the method needs a large buffer to hold the samples, which makes real-time use difficult. • Real-time applications must process data as fast as it arrives; if the data is stored in compressed form or the storage units are slow, this becomes hard to achieve.

References • D. J. Hejna. Real-Time Time-Scale Modification of Speech via the Synchronized Overlap-Add Algorithm. Master's thesis, M.I.T., 1990. • Don Hejna and Bruce R. Musicus. The SOLAFS Time-Scale Modification Algorithm. Research, 1991.
