An overview of GMTK parallel tools, scripts, and resources for cluster computing. Covers scripts that determine the parallelisation settings automatically, run a sanity check before launching jobs, and can be safely restarted after a crash, along with a model file viewer and pre-generated data and scripts for various tasks.
GMTK parallel tools
Arthur Kantor, 9/5/06
Overview
• Before diving in, consider reading:
  • Bowon’s Sun Grid Engine cheat sheet
    http://www.ifp.uiuc.edu/~bowonlee/research/cluster/linux_cluster.htm
  • GMTK documentation
    http://ssli.ee.washington.edu/~bilmes/gmtk/
• Parallel scripts for emtrain and viterbi
• Other useful scripts from JHU WS06
• JHU WS06 parallel scripts
  • More finicky but do more (for now)
Parallel scripts for emtrain and viterbi
ifp-32.ifp.uiuc.edu/cworkspace/ifp-32-1/hasegawa/programs/gmtk/parallelImproved
• distribute.pl
  • Reads a list of tasks from file and runs them in parallel on the cluster
  • example
• emtrainParallel.pl (viterbiParallel.pl)
  • Runs a single iteration of gmtkEmtrainNew (gmtkViterbiNew)
  • Determines all the parallelisation settings automatically
  • Can be safely restarted after a crash
  • Does a sanity check of your settings before launching jobs in parallel
  • Example
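The idea behind a task distributor that can be restarted after a crash can be sketched in a few lines. This is only an illustration in Python, not the code of distribute.pl or emtrainParallel.pl: it runs tasks with a local worker pool instead of submitting them to the cluster, and the task file layout (one shell command per line), the `.done` marker files, and every function name below are assumptions made for the sketch.

```python
import hashlib
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_tasks(task_file):
    """Return one shell command per non-empty, non-comment line (assumed layout)."""
    lines = Path(task_file).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


def marker_for(command, done_dir):
    """Hypothetical convention: one empty marker file per finished command."""
    digest = hashlib.sha1(command.encode("utf-8")).hexdigest()
    return Path(done_dir) / (digest + ".done")


def run_task(command, done_dir):
    """Run one command unless its marker exists; return (command, status)."""
    marker = marker_for(command, done_dir)
    if marker.exists():
        return command, "skipped"      # finished in an earlier run
    result = subprocess.run(command, shell=True)
    if result.returncode == 0:
        marker.touch()                 # record success only after the task ends
        return command, "ok"
    return command, "failed"


def distribute(task_file, done_dir, workers=4):
    """Run all tasks in parallel; calling it again redoes only unfinished tasks."""
    Path(done_dir).mkdir(parents=True, exist_ok=True)
    tasks = read_tasks(task_file)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: run_task(cmd, done_dir), tasks))
```

Because a marker is written only after a command exits successfully, rerunning `distribute` on the same task file after an interruption skips completed tasks and retries the rest. On a real cluster the `subprocess.run` call would be replaced by a job submission to the scheduler.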
Other useful scripts from JHU WS06
ifp-32.ifp.uiuc.edu/cworkspace/ifp-32-1/hasegawa/programs/gmtk/scripts
• View model files: lg
• Creating initial Gaussian mixtures: genGMParms.pl
• Creating decision trees (DTs)
  • genDictionaryDT.pl creates a DT that determines the phone, given the word, pronunciation variant and phoneCounter
  • Other scripts create word transition DTs, word Counter to word DTs, etc…
  • All DTs have already been generated for Svitchboard
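The mapping that a dictionary DT encodes can be pictured as a lookup table keyed by (word, pronunciation variant, phoneCounter). The Python sketch below shows only that mapping; it does not produce GMTK's decision tree file format, and the dictionary entries and function names are hypothetical.

```python
# Hypothetical pronunciation dictionary: word -> list of variants,
# where each variant is a sequence of phones.
PRONUNCIATIONS = {
    "yes": [["y", "eh", "s"]],
    "no": [["n", "ow"]],
    "the": [["dh", "ah"], ["dh", "iy"]],
}


def build_phone_table(pronunciations):
    """Flatten the dictionary into {(word, variant, phone_counter): phone}."""
    table = {}
    for word, variants in pronunciations.items():
        for variant, phones in enumerate(variants):
            for phone_counter, phone in enumerate(phones):
                table[(word, variant, phone_counter)] = phone
    return table


def phone_for(table, word, variant, phone_counter):
    """Look up the phone at a given position of a given pronunciation variant."""
    return table[(word, variant, phone_counter)]
```

For example, `phone_for(build_phone_table(PRONUNCIATIONS), "the", 1, 1)` picks the second phone of the second variant of "the" in this toy dictionary.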
Data
ifp-32.ifp.uiuc.edu:/cworkspace/ifp-32-1/hasegawa/jhu06/export/ws06afsr/data/SVB
• Svitchboard data for tasks with vocab sizes of 10, 25, 50, 100, 250, 500
• NN outputs for all of Svitchboard, to be used in tandem models, are available
• Broken up into 5-fold cross validation chunks
• Filelists already generated
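For readers unfamiliar with the 5-fold setup, the sketch below shows how a file list can be split into five chunks and how each chunk in turn serves as the held-out test set. It is a generic Python illustration: the actual Svitchboard filelists are already generated and may be partitioned differently, and the utterance names and function names here are made up.

```python
def make_folds(items, k=5):
    """Deal the items round-robin into k folds."""
    return [items[i::k] for i in range(k)]


def train_test_split(folds, held_out):
    """Return (train, test) lists, using fold number `held_out` for testing."""
    test = list(folds[held_out])
    train = [item
             for index, fold in enumerate(folds) if index != held_out
             for item in fold]
    return train, test


# Hypothetical utterance identifiers standing in for entries of a filelist.
utterances = ["utt%02d" % i for i in range(10)]
folds = make_folds(utterances, k=5)
for held_out in range(len(folds)):
    train, test = train_test_split(folds, held_out)
    print(held_out, len(train), len(test))
```

Every utterance lands in exactly one test set across the five rounds, so each model is evaluated on data it was not trained on.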
JHU WS06 parallel scripts
ifp-32.ifp.uiuc.edu/cworkspace/ifp-32-1/hasegawa/programs/gmtk/parallel
• Example config files