Graduate Student Survival Guide: Using the Cluster, gnuplot and LaTeX
Janardhan Rao Doppa
School of EECS, Oregon State University
doppa@eecs.oregonstate.edu
http://web.engr.oregonstate.edu/~doppa

EECS Cluster: what?
• A computing resource to run your jobs
• Offload your computing
  • Experiments or simulations for research
  • Handy when you have to run a large number of experiments
  • You don't want to use your DELL (read as "delicate") laptop
• Web
  • http://engr.oregonstate.edu/computing/cluster/

EECS Cluster: how?
• Connection: connect to one of the "submit" hosts
  • submit32 or submit64
• Availability: check the availability of slots in each queue
  • i386, em64t, amd64-low, eecs1
• Compile: compile your code on the remote machine
• Script: prepare the "submit script"
  • Command to run your program, which queue to use, where to store output and errors
• Submit: submit the job using the "submit script"
• Monitor: monitor the status
  • Automatic email, or check the status manually

EECS Cluster: how?
• Connection: connect to one of the "submit" hosts
  • ssh <user>@submit32.eecs.oregonstate.edu (or submit64.eecs.oregonstate.edu)
• Availability: check the availability of slots in each queue
  • qstat command: learn the usage with "qstat --help"
  • "qstat -f -q <queue>", where <queue> = i386, em64t, amd64-low or eecs1

    em64t@exec-em64t-01.hpc.engr.o BIP   2/2       2.02     lx24-amd64
    1402020 0.50500 run09_26.s matthchr     r     10/28/2010 20:05:08     1
    1402032 0.50500 closfc     mathewm      r     10/28/2010 21:03:08     1

  • The "2/2" column reads #occupied / #total slots

EECS Cluster: how?
• Script: prepare the "submit script"

    #!/bin/csh
    # Job name
    #$ -N job_name
    # Current working directory
    #$ -cwd
    # Resource request for the faster bees
    #$ -soft -l mem_total=3.00G
    # Specify the hardware platform to run the job on.
    # Options are: amd64, em64t, i386, volumejob (use em64t if you don't care)
    #$ -q i386
    # Output/error file (merged)
    #$ -o output_file.out
    #$ -j y
    # Command sequence
    ./source_file

EECS Cluster: how?
• Submit: submit the job using the "submit script"
  • Change the permissions of the script: "chmod u+x script.csh"
  • "qsub script.csh"
• Monitor: monitor the status
  • "qstat -u <user>"
• Cautions:
  • You need enough disk space (for logs and outputs) and main memory (RAM) to run the program
  • Don't monopolize the cluster. Think of others too!
  • Budget your experimental design around the available resources (slots), hard deadlines (time), etc.

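When many experiments differ only in one parameter, the submit script can be generated instead of edited by hand. The following is a minimal sketch under stated assumptions: the parameter name alpha, its values, the job_alpha_* file names, and the idea that ./source_file accepts the parameter as an argument are all hypothetical. Only the "#$" directives, chmod and qsub come from the slides. The loop submits a job only when qsub is on the PATH, so it is harmless to try on a laptop.

```shell
#!/bin/sh
# Sketch: write one submit script per parameter value, then submit each one.
# "alpha", its values, and the file names are placeholders, not from the slides.
for alpha in 0.1 0.2 0.3; do
    script="job_alpha_${alpha}.csh"
    # Unquoted heredoc: ${alpha} is expanded, \$ stays a literal dollar sign.
    cat > "$script" <<EOF
#!/bin/csh
#\$ -N job_alpha_${alpha}
#\$ -cwd
#\$ -q em64t
#\$ -o output_alpha_${alpha}.out
#\$ -j y
./source_file ${alpha}
EOF
    chmod u+x "$script"
    # Submit only when qsub exists (i.e. on a submit host).
    if command -v qsub >/dev/null 2>&1; then
        qsub "$script"
    else
        echo "qsub not found; wrote $script only"
    fi
done
```

Afterwards, "qstat -u <user>" shows all of the generated jobs at once. Keep the list of parameter values short enough that the batch does not fill every slot in the queue.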
gnuplot: what?
• A command-line program to generate 2D and 3D plots
• Better than Excel: no more frustrating clicks!
  • Specify style, fonts and legends as commands
  • Reuse the code for modifications or similar plots
  • Generates very good PS or EPS figures, which are highly compatible with LaTeX
• "gnu" is not the same as "GNU"!
• Web
  • http://www.gnuplot.info/
  • Available for both Linux and Windows

gnuplot: how?
• Data file: create the data file to be used for the plot
  • Space-separated, column-wise data
• Code file: create the gnuplot code file
  • Specify the plot title, axis names and ranges, legends, line thickness, colors, etc.
  • Specify the output format (PNG, PS or EPS), along with the file name
• Run: run your code on the gnuplot command line
  • Copy and paste your code on the command line and press ENTER

gnuplot: how?
• Data file: create the data file to be used for the plot
  • Space-separated, column-wise data

    0.1 100 73.13 70.14
    0.2 100 70.14 73.13
    0.3 100 70.14 73.13
    0.4 100 74.62 73.13
    0.5 100 74.62 73.13
    0.6 84  64.17 70.89

gnuplot: how?
• Code file: create the gnuplot code file

    # EPS output with enhanced text, color, and an 18 pt Helvetica font
    set terminal postscript eps enhanced color "Helvetica" 18
    # Legend position in graph coordinates
    set key at graph 0.75, 0.9
    set size 0.9, 0.9
    set title "Bayes-EM vs Ripper on NFL data \n (Novelty missingness model)"
    set ylabel "Accuracy (%)"
    set xlabel "Percentage of missing values"
    set xrange [0.1:0.6]
    set yrange [50:100]
    set output 'EM_comparison_novelty.eps'
    # Columns 2-4 of the data file are plotted against column 1
    plot \
      'EM_comparison_novelty.txt' using 1:2 title 'Bayes-EM' with linespoints lt 2 lw 1 pt 7, \
      'EM_comparison_novelty.txt' using 1:3 title 'RIPPER-conservative' with linespoints lt 3 lw 3 pt 7, \
      'EM_comparison_novelty.txt' using 1:4 title 'RIPPER-aggressive' with linespoints lt 4 lw 3 pt 7

gnuplot: how?
• Run: run your code on the gnuplot command line
  • Copy and paste your code on the command line and press ENTER

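As an alternative to pasting, gnuplot can also read a code file given as an argument and run it in batch mode. The sketch below assumes hypothetical file names (demo.txt, demo.gp, demo.eps) and uses only the first three data rows and two of the curves from the slides. It calls gnuplot only if it is installed.

```shell
#!/bin/sh
# Sketch: run a gnuplot code file in batch mode instead of pasting it.
# demo.txt, demo.gp and demo.eps are hypothetical file names.

# Data file: space-separated columns (first three rows of the sample data).
cat > demo.txt <<'EOF'
0.1 100 73.13 70.14
0.2 100 70.14 73.13
0.3 100 70.14 73.13
EOF

# Code file: quoted heredoc keeps the gnuplot line continuation (\) intact.
cat > demo.gp <<'EOF'
set terminal postscript eps enhanced color "Helvetica" 18
set output 'demo.eps'
set xlabel "Percentage of missing values"
set ylabel "Accuracy (%)"
plot 'demo.txt' using 1:2 title 'Bayes-EM' with linespoints lw 2 pt 7, \
     'demo.txt' using 1:3 title 'RIPPER-conservative' with linespoints lw 2 pt 7
EOF

# Batch run: gnuplot executes the commands in demo.gp and exits.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot demo.gp
else
    echo "gnuplot not found; wrote demo.gp and demo.txt only"
fi
```

Keeping the code in a file makes it easy to regenerate a figure after the data changes, which fits the "reuse the code" point made earlier.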
gnuplot: resources
• Short and quick reference guide
  • http://sparky.rice.edu/gnuplot.html
• Web resources
  • http://www.gnuplot.info/
  • Demos, tutorials, sample codes and scripts
• Many useful sample plots are available at: http://www.cse.iitb.ac.in/silmaril/br/lib/exe/fetch.php?id=students&cache=cache&media=students:gnuplot.tgz
  • Thanks to Bhaskaran Raman and Kameshwari Chebrolu.

LaTeX: what?
• A manuscript preparation system
• Better than Word: no more equation editors!
  • Math formulas and equations are easier to write
  • Bibliography and cross-referencing are much easier
• Almost all conference and journal papers are written using LaTeX
  • De facto standard in academia, so get used to it!
• Web
  • http://en.wikibooks.org/wiki/LaTeX
• Windows editors: TeXnicCenter and WinEdt
• Linux editors: LyX and Kile

LaTeX: basic files
• LaTeX code
  • .tex: LaTeX input code file
  • .sty: style file
• Bibliography
  • .bib: bibliography file
  • .bst: bibliography style file
• Output
  • .dvi: device-independent file
  • .ps: PostScript file

LaTeX: writing the code file
• Start with an existing template
• Basic commands
  • \section, \subsection, \subsubsection
  • Text mode vs. math mode ($ ... $)
  • Math symbols: \alpha, \beta, \gamma
• \begin{environment} and \end{environment}
  • \begin{itemize} and \end{itemize}
  • \begin{equation} and \end{equation}
  • \begin{figure} and \end{figure}
  • \begin{table} and \end{table}

LaTeX: bibliography file
• Sample bibliography entries

    @inproceedings{CRF-ICML:01,
      author    = {John Lafferty and Andrew McCallum and Fernando Pereira},
      title     = {Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},
      booktitle = {ICML'01: Proceedings of the 18th International Conference on Machine Learning},
      year      = {2001},
    }

    @article{TRITRAINING-TKDE:05,
      author  = {Zhi-Hua Zhou and Ming Li},
      title   = {Tri-Training: Exploiting Unlabeled Data Using Three Classifiers},
      journal = {IEEE Transactions on Knowledge and Data Engineering},
      volume  = {17},
      number  = {11},
      year    = {2005},
    }

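LaTeX: compiling
• Compile LaTeX code with "latex" or "pdflatex"
• Compile BibTeX code with "bibtex"
• Sequence:
  • latex <code>
  • bibtex <code> (bibtex reads <code>.aux, which names the .bib file)
  • latex <code>, run twice so that citations and cross-references resolve
• Collaborative writing
  • Use a CVS or SVN repository. It is much easier!
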
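The compile sequence can be tried end to end on a toy paper. This is a sketch under assumptions: the file names paper.tex and refs.bib, the "plain" bibliography style, and the sentence in the document body are illustrative choices, while the bibliography entry is the sample entry from the slides. The build commands run only if pdflatex and bibtex are installed.

```shell
#!/bin/sh
# Sketch: build a small paper that cites one BibTeX entry.
# paper.tex and refs.bib are hypothetical file names.
cat > refs.bib <<'EOF'
@inproceedings{CRF-ICML:01,
  author    = {John Lafferty and Andrew McCallum and Fernando Pereira},
  title     = {Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data},
  booktitle = {ICML'01: Proceedings of the 18th International Conference on Machine Learning},
  year      = {2001}
}
EOF

cat > paper.tex <<'EOF'
\documentclass{article}
\begin{document}
\section{Introduction}
Conditional random fields~\cite{CRF-ICML:01} label sequences.
Inline math: $\alpha + \beta = \gamma$.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}
EOF

if command -v pdflatex >/dev/null 2>&1 && command -v bibtex >/dev/null 2>&1; then
    pdflatex -interaction=nonstopmode paper   # pass 1: records citations in paper.aux
    bibtex paper                              # builds paper.bbl from refs.bib
    pdflatex -interaction=nonstopmode paper   # pass 2: pulls in the bibliography
    pdflatex -interaction=nonstopmode paper   # pass 3: resolves the citation labels
else
    echo "pdflatex/bibtex not found; wrote paper.tex and refs.bib only"
fi
```

Replacing pdflatex with latex gives a .dvi file instead of a PDF; the order of the steps stays the same.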
LaTeX: resources
• LaTeX cheat sheet
  • http://www.ctan.org/tex-archive/info/latexcheat/latexcheat/latexsheet.pdf
• LaTeX wiki book
  • http://en.wikibooks.org/wiki/LaTeX/
• Learn tips and tricks
  • From expert users
  • From online forums
• Grow your bag of tricks. It will save you time at deadlines!

LaTeX in PowerPoint
• TeXPoint: a LaTeX add-on for PowerPoint and Word
  • http://texpoint.necula.org/
  • http://web.engr.oregonstate.edu/~mehtane/latex/index.html
• TeXclip: LaTeX to image
  • http://maru.bonyari.jp/texclip/texclip.php
• Beamer: slides using LaTeX
  • http://bitbucket.org/rivanvx/beamer/wiki/Home

MS students: advice
• It is hard to fund all the MS students
  • Bad economy, low grant money, etc.
  • Short-time investment: faculty will choose their bets carefully!
• Look for alternative funding sources
  • BSG, Media Services, Library, science laboratories (e.g., chemistry, biology), etc.
• Bottom line: grad school is costly, but a very good long-term investment!

MS students: advice
• Immediate reward vs. long-term average reward
  • Worst case: you finish graduate school on your own money
• Concentrate on your education and develop skills
  • Go for a summer internship: money and experience
  • Specialize in something: good job market!
  • You can pay off your loans in less than 6 months!
• Don'ts
  • Don't finish classes quickly and graduate with an ME. Bad idea!
  • Don't worry about money while in school. You won't be productive.