
HMM Toolkit (HTK)


Presentation Transcript


  1. Presentation by Daniel Whiteley, AME department. HMM Toolkit (HTK)

  2. What is HTK? The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research although it has been used for numerous other applications including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.

  3. What is HTK? HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing and results analysis. The software supports HMMs using both continuous density mixture Gaussians and discrete distributions and can be used to build complex HMM systems.

  4. Basic HTK command format • The commands in HTK follow a basic command-line format: HCommand [options] files • Options are indicated by a dash followed by the option letter; options common to all HTK tools use capital letters. • In HTK it is not necessary to use file extensions; instead, each file's header determines its format. A sample invocation is sketched below.
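
  As a sketch of this format (the file names config and data01.user are illustrative, not from the slides), the HTK tool HList can be asked to print a parameter file's header; -C is one of the capital-letter options shared by all tools, and -h is assumed here to be HList's header-printing option:

      HList -C config -h data01.user

  The dash-letter pattern is the same for every tool in the rest of this deck.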

  5. Configuration files • You can also set up the configuration of HTK modules using config files. A config file is passed with the -C option, or applied globally with the command setenv HCONFIG myconfig, where myconfig is your own config file. • All possible configuration variables can be found in chapter 18 of the HTK manual. However, for most of our purposes we only need to create a config file with these lines:
      SOURCEKIND = USER      %The user-defined file format (not sound)
      TARGETKIND = ANON_D    %Keep the file in the same format
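
  A brief usage sketch, assuming the two lines above are saved in a file named config and using HCopy only as a convenient example tool (the data file names are placeholders):

      HCopy -C config data01.user data01.out      (per-invocation, via the -C option)
      setenv HCONFIG config                       (global, via the HCONFIG environment variable)

  The same -C option works with every HTK command shown on the following slides.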

  6. Using HTK • Parts of HMM modeling: • Data Preparation • Model Training • Pattern Recognition • Model Analysis (a command-level sketch of the whole workflow follows below)
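
  Putting the tools from the following slides together, one possible end-to-end sequence looks like this. File names such as config, train.scp, trainlist and results are placeholders carried over from the slides, and the exact options are explained slide by slide; this is a sketch of the workflow, not a prescribed recipe:

      HQuant   -C config -n 1 64 -S train.scp vqcook     (data preparation: build a VQ codebook)
      HCopy    -C quantize rawdata qvdata                (data preparation: quantize the raw data)
      HCompV   -m -S trainlist hmm                       (model training: flat-start initialization)
      HRest    -w 1.0 -v 0.0001 -S trainlist hmm         (model training: Baum-Welch re-estimation)
      HVite    -a -i results -o SWT -H hmmlist -I transcripts.mlf -S testfiles   (pattern recognition)
      HResults -I transcripts.mlf -H hmmlist results     (model analysis)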

  7. Data Preparation • One small problem: HTK was tailored for speech recognition, so most of the data preparation tools are for audio. • Because of this, we need to jerry-rig our data into the HTK parameterized data file format. • HTK parameter files consist of a sequence of samples preceded by a header. The samples are simply data vectors whose components are 2-byte integers or 4-byte floating point numbers. • For us, these vectors will be a sequence of joint angles received from a motion capture session.

  8. HTK file format • The file begins with a 12-byte header containing the following information: • nSamples (4-byte int): number of samples • samplePeriod (4-byte int): sample period, in units of 100 ns • sampleSize (2-byte int): number of bytes per vector • parameterKind (2-byte int): defines the type of data. • For our purposes this parameter will be either 0x2400, the user-defined parameter kind, or 0x2800, the discrete case. Writing such a file is sketched below.
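
  As an illustration only, here is a minimal C sketch of writing this header followed by the sample vectors. It assumes 4-byte float components, uses the 0x2400 user-defined kind quoted above, and writes big-endian values (which I believe is HTK's default byte order for parameter files unless NATURALWRITEORDER is changed); the file name, frame values and sample period are made up for the example:

      #include <stdio.h>
      #include <stdint.h>
      #include <string.h>

      /* Write 32-bit and 16-bit integers, and 32-bit floats, in big-endian order. */
      static void put_be32(FILE *f, uint32_t v) {
          unsigned char b[4] = { (unsigned char)(v >> 24), (unsigned char)(v >> 16),
                                 (unsigned char)(v >> 8),  (unsigned char)v };
          fwrite(b, 1, 4, f);
      }
      static void put_be16(FILE *f, uint16_t v) {
          unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
          fwrite(b, 1, 2, f);
      }
      static void put_bef32(FILE *f, float x) {
          uint32_t v; memcpy(&v, &x, 4); put_be32(f, v);
      }

      int main(void) {
          /* Three frames of four joint angles each - placeholder values. */
          float frames[3][4] = {
              { 0.1f, 0.2f, 0.3f, 0.4f },
              { 0.2f, 0.3f, 0.4f, 0.5f },
              { 0.3f, 0.4f, 0.5f, 0.6f }
          };
          const uint32_t nSamples      = 3;
          const uint32_t samplePeriod  = 100000;                    /* 10 ms, in 100 ns units */
          const uint16_t sampleSize    = (uint16_t)(4 * sizeof(float)); /* bytes per vector */
          const uint16_t parameterKind = 0x2400;                    /* user-defined kind (slide 8) */

          FILE *f = fopen("motion.user", "wb");
          if (!f) return 1;
          put_be32(f, nSamples);        /* 12-byte header, in the order listed on the slide */
          put_be32(f, samplePeriod);
          put_be16(f, sampleSize);
          put_be16(f, parameterKind);
          for (int i = 0; i < 3; i++)   /* then the sample vectors themselves */
              for (int j = 0; j < 4; j++)
                  put_bef32(f, frames[i][j]);
          fclose(f);
          return 0;
      }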

  9. HMM model creation • In order to model the motion capture sequence, we need to create a prototype of the HMM. In this prototype, the values of B and π are arbitrary. The same is true for the transition matrix A, except that any transition probability you set to zero will remain zero. • Models are created using a markup-style definition language similar to HTML. • In addition, models in HTK have beginning and ending states which are non-emitting. These states are not defined in the script.

  10. HMM Model Example
      ~h "prototype"
      <BeginHMM>
        <VectorSize> 4 <USER>
        <NumStates> 5
        <State> 2 <NumMixes> 3
          <Mixture> 1 0.3
            <Mean> 4
              0.0 0.0 0.0 0.0
            <Variance> 4
              1.0 1.0 1.0 1.0
          <Mixture> 2 0.4
          ...
        <State> 3
        ...
        <TransP> 5
          0.0 0.4 0.3 0.3 0.0
          0.0 0.2 0.5 0.3 0.0
          0.0 0.2 0.2 0.4 0.2
          0.0 0.1 0.2 0.3 0.4
          0.0 0.0 0.0 0.0 0.0
      Annotations from the slide: ~h "prototype" is the name of the file; <VectorSize> 4 is the sample size; <NumStates> 5 is the number of states; <NumMixes> 3 is the number of Gaussian distributions; <Mixture> 1 0.3 gives the distribution's ID and weight; <Mean> is the mean observation vector; <Variance> holds the covariance matrix diagonal; <TransP> is the transition matrix A, whose last row shows that all the transition probabilities for the ending state are always zero.

  11. Vector Quantization • In order to reduce computation, we can make the HMM discrete. • In order to use a discrete HMM, we must first quantize the data into a set of standard vectors. • Warning: quantizing the data inherently introduces error. • Before quantizing the data, we must first have a standard set of vectors, or a "vector codebook". This is made with HQuant.

  12. HQuant • HQuant takes the training data, uses a K-means algorithm to evenly partition the data, and finds the centroids of these partitions to create our quantization vectors (QVs). • A sample command:
      HQuant -C config -n 1 64 -S train.scp vqcook
      Here -C config uses the configuration variables found in config, -n 1 64 sets the number of QVs for a given data stream, -S train.scp lets you use a script to list all of your training files, and vqcook is the file our codebook will be written to.
    • To reduce quantization time, a codebook using a binary tree search algorithm can be made using the -t option, as sketched below.
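
  For example, the same command with the tree-structured option added might look like this (the -t option is the one named on the slide; the file names are the same placeholders as above):

      HQuant -C config -t -n 1 64 -S train.scp vqcook

  The resulting codebook can then be searched with a binary tree rather than exhaustively, typically trading a little quantization accuracy for speed.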

  13. Converting to Discrete • The conversion of data files is done using the HCopy command. In order to quantize our data, we do this:
      HCopy -C quantize rawdata qvdata
      where rawdata is our original data, qvdata is our quantized data, and quantize is a config file containing these lines:
      SOURCEKIND = USER        %We start with our original data
      TARGETKIND = DISCRETE    %Convert it into discrete data
      SAVEASVQ   = T           %We throw away the continuous data
      VQTABLE    = vqcook      %We use our previously made codebook to quantize the data

  14. Discrete HMM • Discrete HMMs are very similar to their continuous counterparts, save for a few changes. • Discrete probabilities are stored in logarithmic form, where P(v) = exp(-d(v)/2371.8).
      ~o <Discrete> <StreamInfo> 1 1
      ~h "dhmm"
      <BeginHMM>
        <NumStates> 5
        <State> 2 <NumMixes> 10
          <DProb> 5461*10
        ...
      <EndHMM>
      Here <NumMixes> 10 is the number of discrete symbols, and the *10 in 5461*10 is the duplicate function, repeating the value once per symbol.
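
  As a quick sanity check of the formula above, using only the numbers already on the slide: a stored value of d(v) = 5461 gives P(v) = exp(-5461/2371.8) ≈ exp(-2.30) ≈ 0.10, so each of the 10 discrete symbols starts out roughly equally likely in this prototype.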

  15. Model Training (token HMM) • The initialization of our prototype can be done using HInit:
      HInit [options] hmm data1 data2 data3 ...
    • HInit is used mainly for left-right HMMs. More ergodic HMMs can instead be initialized with a flat start, i.e. by setting all means and variances to their global counterparts using HCompV:
      HCompV -m -S trainlist hmm
      where hmm is the HMM being trained. A concrete HInit invocation is sketched below.
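
  For instance, a left-right prototype could be initialized from the training files listed in a script; the file names config, trainlist and hmm are the same placeholders used elsewhere in this deck, not names prescribed by HTK:

      HInit -C config -S trainlist hmm

  -C and -S are the universal options from slide 4: the config file from slide 5 and a script listing the training data files.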

  16. Retraining • The model is then retrained using the Baum-Welch algorithm found in HRest:
      HRest -w 1.0 -v 0.0001 -S trainlist hmm
    • The -w and -v options set floors for the mixture probabilities and variances respectively. The float given to -w represents a multiplier of 10^-5. • This can be iterated as many times as needed to achieve the desired results.

  17. Dictionary Creation • In order to create a recognition program or script, we must first create a dictionary. • A dictionary in HTK gives each word and its pronunciation. For our purposes, the pronunciation will just consist of the token HMM that we trained.
      RUNNING             run
      WALKING             walk
      JUMPING [SKIPPING]  jump
      The first column is the word, the last column lists the tokens used to form the word, and the optional bracketed entry is the displayed output (if not specified, the word itself is displayed).

  18. Label Files • Label files contain a transcription of what is going on in the data sequence.
      000000  100000  walk
      100001  200000  run
      200001  300000  jump
      The first field is the start of the frame in samples, the second is the end of the frame in samples, and the third is the token found in that time frame.

  19. Master Label Files (MLFs) • During training and recognition, we may have many test files and their accompanying label files. The label files can be condensed into one file called a master label file, or MLF.
      #!MLF!#
      "*/a.lab"
      000000 100000 walk
      100001 200000 run
      200001 300000 jump
      .
      "*/b.lab"
      run
      .
      "*/jump*.lab"
      jump
      .
      The first entry is the same as an original label file; if the entire file is one token, it can be labeled with just the token (as in b.lab); and the wildcard operator can be used to label multiple files at once (as in jump*.lab).

  20. Pattern Recognition • The recognition of a motion sequence is done using HVite. • To receive a transcription of the recognition data in MLF format, we use:
      HVite -a -i results -o SWT -H hmmlist \
            -I transcripts.mlf -S testfiles
      Here -a creates the word network from the given transcriptions, -i results names the output transcription file in MLF format, -o SWT throws away unnecessary data in the label files, -H hmmlist is a text file containing a list of the HMMs used, -I transcripts.mlf is the MLF that has the test files' transcriptions, and -S testfiles lists the motion capture data to be recognized.

  21. Model Analysis • The analysis of the recognition results is done by HResults:
      HResults -I transcripts.mlf -H hmmlist results
      Here -I transcripts.mlf is the MLF containing the reference labels, hmmlist is the list of HMMs used, and results is the MLF containing the result labels.
    • Note: the reference labels and the result labels must have different file extensions.
