
Chapter 9 DTW and VQ Algorithm


  1. Chapter 9 DTW and VQ Algorithm • 9.1 Basic idea of DTW • 9.2 DTW algorithm • 9.3 Basic idea of VQ • 9.4 LBG algorithm • 9.5 Improvement of VQ

  2. 9.1 Basic idea of DTW (1) • The frames of T and Ri do not correspond one-to-one. A non-linear correspondence is sought such that the total distance is minimal. This is a natural idea. • Suppose T(i) is frame i of the test utterance (1<=i<=NT) and Rk(j) is frame j of reference utterance k in the vocabulary, or R(j) for short (1<=j<=Nk). T(i) and R(j) are both vectors. d(i,j) is the distance between T(i) and R(j).

  3. Basic idea of DTW (2) • A set of (i,j) pairs is sought such that the total distance D along these points (the path) is minimal: • D = min Σ(i,j)∈path d(i,j) • Suppose the points on the path are (ni,mi); the path has some constraints: (n1,m1) = (1,1) and (nN,mN) = (NT,Nk) (N = NT); to limit the computation, it is assumed that Nk/2 <= NT <= 2Nk for every k.

  4. Basic idea of DTW (3) • So the average slope of the path will lie in 0.5~2.0. To meet that, if the current point is (ni,mi), the next point will be (ni+1,mi+2), (ni+1,mi+1), or (ni+1,mi); the last is allowed only if mi-1 != mi (no two horizontal steps in a row). • Sometimes the initial point is allowed to float to get a better match.
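The search window jl..jh implied by these slope limits can be sketched as follows. `window` is a hypothetical helper name, and the exact bounds depend on how the "pause at most every other step" rule is enforced at the boundaries; this is one common derivation, not the slides' definitive formula:

```python
def window(i, NT, Nk):
    """Allowed reference-frame range [jl, jh] at test frame i (1-based),
    a sketch of the 0.5-2.0 slope parallelogram: the path can rise at
    most 2 per step, and can pause (stay at the same j) at most every
    other step."""
    jl = max(1 + (i - 1) // 2, Nk - 2 * (NT - i))   # lower bound from start/end
    jh = min(2 * i - 1, Nk - (NT - i) // 2)         # upper bound from start/end
    return jl, jh
```

At i = 1 and i = NT the window collapses to the fixed endpoints (1,1) and (NT,Nk), and it is widest in the middle of the utterance.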

  5. DTW algorithm (1) • DTW stands for Dynamic Time Warping. • It uses the dynamic programming method to implement the idea described in 9.1. • The algorithm can be described as follows: (1) i=1, j=1, D[1,1] = d(T(1),R(1)); (2) if ++i <= NT, calculate jl and jh according to the constraint conditions, and calculate d(T(i),R(j)) for j=jl to j=jh; (3) for all (i,jl) to (i,jh) do

  6. DTW algorithm (2) D[i,j] = d(T(i),R(j)) + D[i-1,j'], where j' is j-2, j-1 or j, determined by D[i-1,j'] = min { D[i-1,j], D[i-1,j-1], D[i-1,j-2] }. Store D[i,j] and j'(i). (4) When i > NT, stop. D[NT,j'] = mink D[NT,k], k = Nk, Nk-1 or Nk-2. (5) Starting from (NT,j'), backtrace the points on the path through j'(i) and recover the path.

  7. DTW algorithm (3) • D[NT,j'] is the distance between T and a reference Rk. Using the same procedure to get the distances between T and all the Ri's and applying the minimal-distance principle, we can easily determine the best-matched word for the input word. • This algorithm was often used before HMMs came into use. Its disadvantage is the large computing time. To overcome this, the VQ algorithm was devised.
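Steps (1)–(5) above can be sketched in Python. This is a minimal illustration, not the original implementation: the jl..jh window and the "no two horizontal steps in a row" rule are omitted for clarity, the endpoint is fixed at (NT,Nk), and Euclidean frame distance is assumed:

```python
import numpy as np

def dtw(T, R):
    """DTW between test utterance T (NT x dim) and reference R (Nk x dim),
    following the recursion D[i,j] = d(i,j) + min(D[i-1,j], D[i-1,j-1],
    D[i-1,j-2]) with backpointers.  Returns (total distance, path)."""
    T, R = np.asarray(T, dtype=float), np.asarray(R, dtype=float)
    NT, Nk = len(T), len(R)
    d = np.linalg.norm(T[:, None] - R[None, :], axis=-1)  # frame distances
    D = np.full((NT, Nk), np.inf)
    back = np.zeros((NT, Nk), dtype=int)  # predecessor j' for backtracing
    D[0, 0] = d[0, 0]
    for i in range(1, NT):
        for j in range(Nk):
            # candidate predecessors (i-1, j'), j' in {j, j-1, j-2}
            cands = [(D[i - 1, j - s], j - s) for s in (0, 1, 2) if j - s >= 0]
            best, jp = min(cands)
            if np.isfinite(best):
                D[i, j] = d[i, j] + best
                back[i, j] = jp
    # step (5): backtrace from (NT, Nk)
    path, j = [], Nk - 1
    for i in range(NT - 1, -1, -1):
        path.append((i + 1, j + 1))  # 1-based frame indices, as in the slides
        j = back[i, j]
    return D[NT - 1, Nk - 1], path[::-1]
```

Running this for a test utterance against every reference Rk and taking the minimum D implements the recognition step described on this slide.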

  8. 9.3 Basic idea of VQ algorithm (1) • VQ stands for Vector Quantization, in contrast to scalar quantization. • The basic idea is to partition the whole feature space into a certain number (2^n) of regions and to use the center of each region to represent any vector falling into that region. The distance calculation then becomes a table lookup if the distance table is pre-calculated before recognition, which saves computing time.

  9. Basic idea of VQ algorithm (2) • To do this, a clustering algorithm such as the k-means algorithm is used to iteratively update the cluster centers and the membership of the vectors until convergence. • The cluster centers form the codebook. The membership of a vector gives its code label according to the minimal-distance principle; every vector is replaced by a code label.

  10. Basic idea of VQ algorithm (3) • The distance between two vectors is represented by the distance between their two centers. It can be pre-calculated as soon as the codebook is obtained, so during recognition, once the codes are obtained, the distance calculation is just a table-lookup operation.
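A minimal sketch of this table-lookup idea, assuming Euclidean distance; `quantize` and `distance_table` are illustrative helper names, not from the slides:

```python
import numpy as np

def quantize(vectors, codebook):
    """Minimal-distance principle: index of the nearest codeword."""
    d = np.linalg.norm(np.asarray(vectors)[:, None] - codebook[None, :], axis=-1)
    return d.argmin(axis=1)

def distance_table(codebook):
    """Pre-computed distance between every pair of codewords."""
    return np.linalg.norm(codebook[:, None] - codebook[None, :], axis=-1)

codebook = np.array([[0.0, 0.0], [3.0, 4.0]])
table = distance_table(codebook)
labels = quantize(np.array([[0.2, 0.1], [2.9, 4.2]]), codebook)
# during recognition, d(x, y) is approximated by table[label_x, label_y]
```

Inside DTW, the inner-loop frame distance d(i,j) then costs one array lookup instead of one vector-distance computation.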

  11. Basic idea of VQ algorithm (4) • This speeds up the DTW process. • VQ finds application not only in speech recognition but also in speech synthesis and speech coding. • VQ also has many applications in the multimedia field for data compression; of course, in this case there is some restoration error.

  12. The LBG Algorithm (1) • (1) S = { x } is the set of all vector samples • (2) Set the maximal number of iterations L • (3) Set the threshold t • (4) Set m initial centers y1(0), y2(0), …, ym(0) • (5) Set the initial distortion D(0) = ∞ • (6) Set the iteration number k = 1 • (7) Partition all samples into S1, S2, …, Sm according to the minimum-distance principle.

  13. The LBG Algorithm (2) • (8) Calculate the total distortion: D(k) = Σi=1..m Σx∈Si(k) d(x, yi(k-1)) • (9) Calculate the relative improvement of the distortion: δ(k) = |D(k)-D(k-1)|/D(k) • (10) Calculate the new centers (codewords): yi(k) = (Σx∈Si(k) x)/Ni(k), i=1~m • (11) If δ(k) < t goto (13), else goto (12)

  14. The LBG Algorithm (3) • (12) If (++k < L) goto (7), else goto (13) • (13) Output the codewords y1(k), y2(k), …, ym(k) and D(k) • (14) End • This partition mode is called a 'Voronoi' partition. It implies that at every iteration the total intra-class distance is reduced.
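Steps (1)–(14) can be sketched as follows. This is an illustrative reimplementation with random initialization (one of the options discussed next); the function and parameter names are my own:

```python
import numpy as np

def lbg(samples, m, L=50, t=1e-4, rng=None):
    """LBG iteration: alternate Voronoi partition and centroid update
    until the relative distortion improvement delta(k) falls below the
    threshold t, or L iterations pass.  Euclidean distortion assumed."""
    rng = np.random.default_rng(rng)
    X = np.asarray(samples, dtype=float)
    centers = X[rng.choice(len(X), size=m, replace=False)]  # (4) initial centers
    D_prev = np.inf                                         # (5) D(0) = infinity
    for k in range(1, L + 1):                               # (6)/(12) iteration count
        dist = np.linalg.norm(X[:, None] - centers[None, :], axis=-1)
        labels = dist.argmin(axis=1)                        # (7) Voronoi partition
        D = dist[np.arange(len(X)), labels].sum()           # (8) total distortion
        delta = abs(D - D_prev) / D if D > 0 else 0.0       # (9) relative improvement
        for i in range(m):                                  # (10) new centers
            if np.any(labels == i):                         # keep old center if cell empty
                centers[i] = X[labels == i].mean(axis=0)
        if delta < t:                                       # (11) convergence test
            break
        D_prev = D
    return centers, D                                       # (13) codewords and D(k)
```

Because each partition and each centroid update can only lower the total distortion, D(k) is non-increasing and the loop terminates.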

  15. The LBG Algorithm (4) • Approaches for setting the initial codebook • (1) Random initial codebook • The initial centers are taken arbitrarily, which might not be very good. We can require that each new center have a distance larger than a threshold to all other centers; in this way the m initial centers are set.

  16. The LBG Algorithm (5) • (2) Splitting approach • First get the center y0 of all samples, then get y1 with maximal distance to y0, and y2 with maximal distance to y1. Then get the subsets S1 and S2 for y1 and y2. • By applying the same procedure to S1 and S2, we can split them into 4 sets. The loop continues until m = 2^B initial centers are obtained after B iterations. • This is an often-used approach.
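The splitting approach can be sketched like this (an illustration: `split_init` is a hypothetical name, Euclidean distance is assumed, and degenerate empty cells are not handled):

```python
import numpy as np

def split_init(samples, B):
    """Splitting initialization: repeatedly split each cell around two
    mutually distant samples until 2**B initial centers exist.
    y1 is the sample farthest from the cell mean y0; y2 is the sample
    farthest from y1; the cell is divided by nearest of {y1, y2}."""
    cells = [np.asarray(samples, dtype=float)]
    for _ in range(B):
        new_cells = []
        for S in cells:
            y0 = S.mean(axis=0)
            y1 = S[np.linalg.norm(S - y0, axis=1).argmax()]
            y2 = S[np.linalg.norm(S - y1, axis=1).argmax()]
            near1 = np.linalg.norm(S - y1, axis=1) <= np.linalg.norm(S - y2, axis=1)
            new_cells += [S[near1], S[~near1]]
        cells = new_cells
    return np.array([S.mean(axis=0) for S in cells])
```

The returned cell means then serve as the m = 2^B initial centers for the LBG iterations of the previous slides.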

  17. The improvement of VQ (1) • (1) VQ system with tree search (codebook structure) When doing recognition with VQ, a search is needed for every vector to get its code or label. In general every center must be searched, which takes time. If we can build a tree over the codebook and keep the codewords of all levels, the search becomes easy. The cost is roughly double the storage.

  18. The improvement of VQ (2) • It can be realized like this: first a codebook of capacity 2 is generated, y0 and y1, with their corresponding subsets. Then for each of these subsets the next level of centers is created: y00, y01, y10, and y11. This is the second level.

  19. The improvement of VQ (3) • By repeating this for k steps, a k-level tree is created. It will have 2^k codewords. • The advantages are: the search amount drops to 2k (vs 2^k) distance calculations and k (vs 2^k - 1) comparisons. The training amount is also reduced (every splitting only concerns a two-codeword codebook). • The disadvantages are: the average distortion is worse than with a full-search codebook, and the storage is doubled. • Besides a binary tree, a different number of codewords per level could be used.
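The k-comparison tree search can be sketched as follows, assuming the tree is stored as a dict from binary-string node ids (like the slides' y00, y01, …) to codeword vectors; this data layout is my own choice for illustration:

```python
import numpy as np

def tree_search(x, tree):
    """Binary tree-structured VQ search: at each of the k levels, compare
    x with the two child codewords and descend to the nearer one.
    Costs 2k distance calculations and k comparisons instead of the
    2**k distances of a full search.  Returns (leaf label, codeword)."""
    node = ""  # root
    while node + "0" in tree:  # descend while children exist
        d0 = np.linalg.norm(x - tree[node + "0"])
        d1 = np.linalg.norm(x - tree[node + "1"])
        node += "0" if d0 <= d1 else "1"
    return node, tree[node]
```

As the slide notes, the leaf reached this way is not guaranteed to be the globally nearest codeword, which is why the average distortion is slightly worse than full search.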

  20. The improvement of VQ (4) • (2) Tree codebook formed from a full-search codebook First a full-search codebook is created by the LBG algorithm. Then the m codewords are divided into m/2 pairs by minimal distance, and the center of each pair is found. The next level up is created in the same way; after k steps the tree is formed. This is better than the previous approach, but it may sometimes make mistakes.

  21. The improvement of VQ (5) • (3) Multi-level VQ systems • (4) Split VQ (suitable for LSP vectors) • (5) Fast search in a full-search system
