Dynamic Time Warping and Minimum Distance Paths for Speech Recognition

kale

Presentation Transcript


  1. Dynamic Time Warping and Minimum Distance Paths for Speech Recognition
     • Isolated word recognition:
       • Task: build an isolated ‘word’ recogniser, e.g. voice dialling on mobile phones
       • Method:
         • Record, parameterise and store a vocabulary of reference words
         • Record the test word to be recognised and parameterise it
         • Measure the distance between the test word and each reference word
         • Choose the reference word ‘closest’ to the test word
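
The method above amounts to nearest-template classification. A minimal Python sketch, assuming each word is stored as a sequence of per-frame values and that a `distance` function is supplied separately. The names `recognise`, `Word` and `mean_gap` are hypothetical, and `mean_gap` is only a placeholder distance, not a real speech measure:

```python
from typing import Callable, Dict, Sequence

# Assumption: one number per frame, as in the slides' toy examples; a real
# recogniser would store a feature vector per frame.
Word = Sequence[float]

def recognise(test: Word,
              references: Dict[str, Word],
              distance: Callable[[Word, Word], float]) -> str:
    """Return the label of the stored reference word closest to the test word."""
    # Measure the distance from the test word to every reference word and
    # keep the label with the smallest distance.
    return min(references, key=lambda label: distance(test, references[label]))

# Hypothetical placeholder distance: gap between the mean frame values.
def mean_gap(a: Word, b: Word) -> float:
    return abs(sum(a) / len(a) - sum(b) / len(b))

references = {"yes": [1.0, 2.0, 3.0], "no": [7.0, 8.0, 9.0]}
print(recognise([2.0, 2.0, 3.0], references, mean_gap))  # yes
```

Keeping `distance` as a parameter means the same loop works unchanged once a proper frame-alignment distance is plugged in.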

  2. Words are parameterised on a frame-by-frame basis
     • Choose a frame length over which speech remains reasonably stationary
     • Overlap frames, e.g. 40 ms frames with a 10 ms frame shift
     • We want to compare frames of the test and reference words, i.e. calculate distances between them
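
Cutting a signal into overlapping frames can be sketched as follows, using the slide's 40 ms frames and 10 ms shift. The 8000 Hz sample rate, the choice to drop a partial final frame, and the name `split_into_frames` are assumptions for illustration; parameterising each frame (turning it into features) is not shown:

```python
from typing import List, Sequence

def split_into_frames(samples: Sequence[float], sample_rate: int,
                      frame_ms: float = 40.0, shift_ms: float = 10.0) -> List[List[float]]:
    """Cut a signal into overlapping fixed-length frames (partial last frame dropped)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)       # samples between frame starts
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must each cover at least one sample")
    # Each frame starts `shift` samples after the previous one, so consecutive
    # frames overlap by frame_len - shift samples.
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, shift)]

frames = split_into_frames([0.0] * 8000, sample_rate=8000)  # one second of audio
print(len(frames), len(frames[0]))  # 97 320
```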

  3. Calculating Distances
     • Easy: sum the differences between corresponding frames
     • Problem: the number of frames won’t always correspond
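
The "easy" distance can be sketched directly, assuming one number per frame and absolute differences (so positive and negative differences cannot cancel). The function name is hypothetical; it refuses inputs of different lengths, which is exactly the problem the slide raises:

```python
from typing import Sequence

def naive_distance(test: Sequence[float], reference: Sequence[float]) -> float:
    """Sum |difference| between frame i of test and frame i of reference."""
    if len(test) != len(reference):
        # The problem noted on the slide: frame counts rarely match.
        raise ValueError(f"frame counts differ: {len(test)} vs {len(reference)}")
    return sum(abs(t - r) for t, r in zip(test, reference))

print(naive_distance([5, 3, 9], [4, 7, 4]))  # 10
```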

  4. Solution 1: Linear Time Warping
     • Stretch the shorter sound
     • Problem? Some sounds stretch more than others
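
One way to stretch the shorter sound is to map each frame of the longer sequence to the nearest frame at the same relative position in the shorter one. This nearest-frame rule (rounding halves up) is an assumption; interpolating between frames would be another choice. `stretch` and `linear_warp_distance` are hypothetical names:

```python
from typing import List, Sequence

def stretch(seq: Sequence[float], n: int) -> List[float]:
    """Linearly resample seq to n frames by nearest-frame lookup (round half up)."""
    m = len(seq)
    if n == 1:
        return [seq[0]]
    # Output frame i sits at input position i*(m-1)/(n-1); integer arithmetic
    # computes floor(position + 1/2) without floating-point rounding issues.
    return [seq[(2 * i * (m - 1) + (n - 1)) // (2 * (n - 1))] for i in range(n)]

def linear_warp_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Stretch the shorter sequence, then sum frame-by-frame absolute differences."""
    if len(a) < len(b):
        a, b = b, a                      # make a the longer sequence
    b_stretched = stretch(b, len(a))
    return sum(abs(x - y) for x, y in zip(a, b_stretched))

print(stretch([4, 7, 4], 5))                               # [4, 7, 7, 4, 4]
print(linear_warp_distance([5, 3, 9, 7, 3], [4, 7, 4]))    # 11
```

Every frame is stretched by the same factor, which is why this approach struggles when some sounds in a word are lengthened more than others.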

  5. Solution 2: Dynamic Time Warping (DTW)
     • Example: Test = 5 3 9 7 3; Reference = 4 7 4
     • Using a dynamic alignment, make the most similar frames correspond
     • Find the distance between the two utterances using these corresponding frames
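
A minimal DTW sketch for the slide's example, assuming one number per frame, absolute difference as the frame distance, and moves that go up, right, or diagonally by one cell (no skipped frames). `dtw_distance` is a hypothetical name:

```python
from typing import Sequence

def dtw_distance(test: Sequence[float], reference: Sequence[float]) -> float:
    """Cumulative DTW distance using moves up, right, or diagonally."""
    n, m = len(test), len(reference)
    INF = float("inf")
    # D[r][c] = cheapest total cost of aligning test[:r+1] with reference[:c+1]
    D = [[INF] * m for _ in range(n)]
    for r in range(n):
        for c in range(m):
            cost = abs(test[r] - reference[c])   # local frame distance
            if r == 0 and c == 0:
                D[r][c] = cost
                continue
            best_prev = min(
                D[r - 1][c] if r > 0 else INF,
                D[r][c - 1] if c > 0 else INF,
                D[r - 1][c - 1] if r > 0 and c > 0 else INF,
            )
            D[r][c] = cost + best_prev
    return D[n - 1][m - 1]

print(dtw_distance([5, 3, 9, 7, 3], [4, 7, 4]))  # 5
```

The alignment lets a single reference frame absorb several test frames (and vice versa), so similar frames can line up even though the two words have different lengths.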

  6. Digression: Dynamic Programming
     • The shortest route from Dublin to Limerick goes through: Kildare, Monasterevin, Portlaoise, Mountrath, Roscrea, Nenagh
     • Now consider the shortest route from Dublin to Nenagh
     • What towns does the route go through?
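
The point of the question is the principle of optimality: any leading part of a shortest route is itself a shortest route, so the best way to Nenagh follows the Dublin to Limerick route as far as Nenagh. A small dynamic-programming sketch on a made-up network illustrates this; the node names and distances below are hypothetical, not real towns or road lengths:

```python
from typing import Dict, List, Tuple

# Hypothetical network (invented for illustration). Every edge points "forward",
# so processing nodes in ORDER visits each node after all of its predecessors.
ORDER = ["A", "B", "C", "D", "E"]
EDGES: Dict[str, List[Tuple[str, float]]] = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 6)],
    "C": [("D", 2), ("E", 7)],
    "D": [("E", 1)],
    "E": [],
}

def shortest_routes(source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Best distance to every node plus the predecessor used to reach it."""
    dist = {node: float("inf") for node in ORDER}
    prev: Dict[str, str] = {}
    dist[source] = 0.0
    for node in ORDER:
        for nxt, length in EDGES[node]:
            if dist[node] + length < dist[nxt]:   # found a shorter way into nxt
                dist[nxt] = dist[node] + length
                prev[nxt] = node
    return dist, prev

def route(prev: Dict[str, str], target: str) -> List[str]:
    """Follow predecessor links back from target to the source."""
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1]

dist, prev = shortest_routes("A")
print(route(prev, "E"), dist["E"])  # ['A', 'B', 'C', 'D', 'E'] 6.0
print(route(prev, "D"))             # ['A', 'B', 'C', 'D']
```

The route to D is a prefix of the route to E; DTW exploits the same property when it builds each grid cell's best cost from its neighbours.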

  7. Intercity Example

  8. Place the distance between frame r of Test and frame c of Reference in cell(r,c) of the distance matrix. Compute the minimum cumulative distance to each point and place it in the mindist matrix:
     mindist(r,c) = dist(r,c) + min{mindist(r,c-1), mindist(r-1,c-1), mindist(r-1,c)}
     e.g. with dist(5,3) = 1: mindist(5,3) = min{1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)}
     We can also find the path through the grid that minimizes the total cost of the path.
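
A sketch that fills both matrices and then recovers the minimum-cost path by stepping back from the final cell to whichever predecessor has the smallest mindist. It assumes absolute difference as the frame distance and uses 0-based indices, so the slide's mindist(5,3) is `mindist[4][2]`. `dtw_with_path` is a hypothetical name:

```python
from typing import List, Sequence, Tuple

def dtw_with_path(test: Sequence[float], reference: Sequence[float]):
    n, m = len(test), len(reference)
    INF = float("inf")
    dist = [[abs(t - r) for r in reference] for t in test]   # cell(r, c)
    mindist = [[INF] * m for _ in range(n)]
    mindist[0][0] = dist[0][0]
    for r in range(n):
        for c in range(m):
            if r == 0 and c == 0:
                continue
            mindist[r][c] = dist[r][c] + min(
                mindist[r][c - 1] if c > 0 else INF,
                mindist[r - 1][c - 1] if r > 0 and c > 0 else INF,
                mindist[r - 1][c] if r > 0 else INF,
            )
    # Trace back from the end cell, always stepping to the cheapest predecessor
    # (that is the predecessor the recurrence used).
    r, c = n - 1, m - 1
    path: List[Tuple[int, int]] = [(r, c)]
    while (r, c) != (0, 0):
        candidates = [(pr, pc) for pr, pc in ((r, c - 1), (r - 1, c - 1), (r - 1, c))
                      if pr >= 0 and pc >= 0]
        r, c = min(candidates, key=lambda p: mindist[p[0]][p[1]])
        path.append((r, c))
    return mindist, path[::-1]

mindist, path = dtw_with_path([5, 3, 9, 7, 3], [4, 7, 4])
print(mindist[4][2])  # 5
print(path)           # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
```

With the slide 5 sequences, mindist(5,2) = 8, mindist(4,2) = 4 and mindist(4,3) = 7 (1-based), so mindist(5,3) = 1 + 4 = 5.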

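  9. Examples so far are uni-dimensional; speech is multi-dimensional
     • e.g. two dimensions, using the points (4,3) and (5,2)
     • Distance equation for 2 dimensions: $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, so for the two points $d = \sqrt{(4 - 5)^2 + (3 - 2)^2} = \sqrt{2}$
     • Distance equation for multi-dimensional (K-dimensional) vectors: $d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{k=1}^{K} (a_k - b_k)^2}$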
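
When each frame is a feature vector rather than a single number, the frame distance used in the DTW grid can be the Euclidean distance. A minimal sketch (the name `euclidean` is hypothetical; Python 3.8+ also provides `math.dist` for the same computation):

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((4, 3), (5, 2)))   # 1.4142135623730951 (sqrt 2)
```

In a DTW routine, replacing a scalar `abs(t - r)` with `euclidean(t, r)` is enough to handle multi-dimensional frames.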

  10. Constraints
      • Global
        • Endpoint detection
        • Path should be close to diagonal
      • Local
        • Must always travel upwards or eastwards
        • No jumps
        • Slope weighting
        • Consecutive moves upwards/eastwards
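
The global constraint "path should be close to diagonal" is commonly implemented as a band around the diagonal (often called a Sakoe-Chiba band). A sketch under stated assumptions: the band is centred on the straight line from the first cell to the last, `width` is measured in reference frames, and `dtw_banded` is a hypothetical name. The local moves still only go up, right or diagonally, with no jumps:

```python
from typing import Sequence

def dtw_banded(test: Sequence[float], reference: Sequence[float], width: float) -> float:
    """Basic DTW restricted to cells within `width` frames of the scaled diagonal."""
    n, m = len(test), len(reference)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    for r in range(n):
        # Reference index the straight diagonal would reach at test frame r.
        centre = r * (m - 1) / (n - 1) if n > 1 else 0.0
        for c in range(m):
            if abs(c - centre) > width:     # global constraint: stay near the diagonal
                continue
            cost = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                D[r][c] = cost              # path must start at the first cell
                continue
            D[r][c] = cost + min(
                D[r - 1][c] if r > 0 else INF,
                D[r][c - 1] if c > 0 else INF,
                D[r - 1][c - 1] if r > 0 and c > 0 else INF,
            )
    return D[n - 1][m - 1]                  # and must end at the last cell

print(dtw_banded([5, 3, 9, 7, 3], [4, 7, 4], width=0.5))  # 5
print(dtw_banded([5, 3, 9, 7, 3], [4, 7, 4], width=0))    # inf
```

A band that is too narrow can leave no legal path at all (the `inf` result); wider bands cost more computation but admit more warping.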

  11. Global Constraints

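  12. Local Constraints
      • Slope weights on the three moves into mindist(r,c): weight 1 from mindist(r,c-1), weight 2 from mindist(r-1,c-1), weight 1 from mindist(r-1,c)
      • mindist(r,c) = min{mindist(r,c-1) + dist(r,c), mindist(r-1,c-1) + 2·dist(r,c), mindist(r-1,c) + dist(r,c)}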
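
A sketch of slope weighting, assuming each weight multiplies the local distance dist(r,c) of the move that enters the cell (the usual symmetric form: 1 horizontally, 2 diagonally, 1 vertically). The starting cell is left unweighted here, which is also an assumption. `weighted_dtw` is a hypothetical name:

```python
from typing import Sequence, Tuple

def weighted_dtw(test: Sequence[float], reference: Sequence[float],
                 weights: Tuple[float, float, float] = (1, 2, 1)) -> float:
    """DTW where each move's local distance is multiplied by a slope weight.

    weights = (from mindist(r,c-1), from mindist(r-1,c-1), from mindist(r-1,c)).
    """
    w_h, w_d, w_v = weights
    INF = float("inf")
    n, m = len(test), len(reference)
    D = [[INF] * m for _ in range(n)]
    for r in range(n):
        for c in range(m):
            d = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                D[r][c] = d            # starting cell: no incoming move to weight
                continue
            D[r][c] = min(
                D[r][c - 1] + w_h * d if c > 0 else INF,
                D[r - 1][c - 1] + w_d * d if r > 0 and c > 0 else INF,
                D[r - 1][c] + w_v * d if r > 0 else INF,
            )
    return D[n - 1][m - 1]

print(weighted_dtw([5, 3, 9, 7, 3], [4, 7, 4]))                     # 8
print(weighted_dtw([5, 3, 9, 7, 3], [4, 7, 4], weights=(1, 1, 1)))  # 5
```

Without the weight of 2, a diagonal step advances both sequences for one local distance while an equivalent right-then-up detour pays two, so unweighted DTW is biased towards diagonal moves; the weighting removes that bias. With weights (1, 1, 1) the function reduces to the basic recurrence.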

  13. Points to Note
      • DTW is really only suitable for small vocabularies and/or speaker-dependent recognition
      • Should normalise for reference length
      • Can use multiple utterances and cluster them
      • Poor performance if the recording environment changes
      • High computation cost
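
Two of these points can be sketched together: dividing the DTW score by the reference length so templates of different lengths are compared on a similar footing, and storing several utterances per word, scoring each word by its best-matching template. Division by `len(reference)` is one simple normalisation, and clustering of the utterances is not shown; `dtw`, `normalised_dtw` and `recognise` are hypothetical names:

```python
from typing import Dict, List, Sequence

def dtw(test: Sequence[float], reference: Sequence[float]) -> float:
    """Basic DTW: mindist(r,c) = dist(r,c) + min of the three predecessors."""
    INF = float("inf")
    n, m = len(test), len(reference)
    D = [[INF] * (m + 1) for _ in range(n + 1)]   # padded with a border of INF
    D[0][0] = 0.0
    for r in range(1, n + 1):
        for c in range(1, m + 1):
            cost = abs(test[r - 1] - reference[c - 1])
            D[r][c] = cost + min(D[r][c - 1], D[r - 1][c - 1], D[r - 1][c])
    return D[n][m]

def normalised_dtw(test: Sequence[float], reference: Sequence[float]) -> float:
    # Divide by the reference length so longer templates are not penalised
    # merely for having more frames.
    return dtw(test, reference) / len(reference)

def recognise(test: Sequence[float], templates: Dict[str, List[Sequence[float]]]) -> str:
    # Each word may have several stored utterances; score it by its best one.
    return min(templates,
               key=lambda word: min(normalised_dtw(test, t) for t in templates[word]))

templates = {"one": [[4, 7, 4], [4, 6, 7, 4]], "two": [[0, 0, 0]]}
print(recognise([5, 3, 9, 7, 3], templates))  # one
```

Every test word is compared with every template of every word, which is where the high computation cost comes from as the vocabulary grows.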

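  14. Evaluation
      • The performance of different designs can only be compared by evaluation
      • Use a test set
      • For single word recognition we can simply quote % accuracy:
        accuracy = (number of test words correctly recognised / total number of test words) × 100%
      • In error analysis, it can be helpful to use a confusion matrix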
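
Percentage accuracy over a test set can be computed directly from the true and recognised labels. A minimal sketch; the function name and the example labels are hypothetical:

```python
from typing import Sequence

def accuracy_percent(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of test words whose recognised label matches the true label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)

print(accuracy_percent(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))  # 75.0
```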

  15. Confusion Matrix
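
A confusion matrix can be tallied from the same label lists: row i is the word actually spoken and column j counts how often it was recognised as word j, so the diagonal holds the correct results. A sketch with hypothetical names and example labels:

```python
from collections import Counter
from typing import List, Sequence

def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Row = true word, column = recognised word, entry = number of occurrences."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

labels = ["yes", "no"]
matrix = confusion_matrix(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"], labels)
for label, row in zip(labels, matrix):
    print(label, row)
# yes [1, 1]
# no [0, 2]
```

Off-diagonal entries show which words are mistaken for which, e.g. here one "yes" was recognised as "no".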
