
AMH001 (thesis - 04/16/03)

REMOTE++: A Tool for Automatic Remote Distribution of Programs on Windows Computers. Ashley Hopkins, Department of Computer Science and Engineering, University of South Florida, Tampa, Florida 33620, amhopki2@csee.usf.edu. AMH001 (thesis.ppt - 04/16/03).

Presentation Transcript


  1. REMOTE++: A Tool for Automatic Remote Distribution of Programs on Windows Computers
     Ashley Hopkins
     Department of Computer Science and Engineering
     University of South Florida
     Tampa, Florida 33620
     amhopki2@csee.usf.edu

  2. Acknowledgements
     • I wish to thank Dr. Kenneth Christensen for his encouragement, his enthusiasm, and his support in writing this thesis.
     • I also wish to thank my committee member Zornitza Genova Prodanoff for taking the time to read this thesis and provide valuable feedback.

  3. Topics
     • Introduction – remote distribution
     • Description of remote distribution methods
     • Design of REMOTE++
     • Evaluation of REMOTE++
     • Summary and future work

  4. Introduction
     • Two key issues are addressed by remote distribution:
       – Simulation programs require significant time to execute, and many require multiple runs to complete an experiment
       – Many computer resources are underutilized

  5. Introduction continued
     • Parallelization of programs reduces overall execution time
     • Two types of parallelization:
       – Space-based parallelization: addresses programs that can easily be broken down into parts
       – Time-based parallelization: addresses programs that require multiple executions; many simulations fit this category
     • REMOTE++ implements time-based parallelization

  6. Introduction continued
     • Remote distribution network
       [Figure: a master PC connected through a network to several remote machines]

  7. Introduction continued
     • Remote distribution of programs:
       – Enables execution of independent programs in parallel
       – Harnesses the idle CPU cycles of remote machines
       – Reduces the overall execution time of experiments

  8. Introduction continued
     • Requirements of distribution tools:
       1) Distribution must be automatic (no manual interaction)
       2) The tool must be simple, for easy maintenance and modification
       3) Output files must be available on the master PC
       4) A single process must be distributed to each remote machine
       5) Once a job completes, the next job must be sent
       6) Each job must be executed only once
       7) The failure of a job to complete must be detected
       8) The failure of a remote host must be detected
       9) Error messages must be displayed at the master PC
       10) A log file should be kept

  9. Remote Distribution Methods
     • Methods for remote distribution:
       – Remote shell (rsh) and remote execute (rexec) commands
       – Cluster systems: Beowulf
       – Grid computing: SETI@home
       – Unix-based remote distribution tools: Condor
       – The original REMOTE tool developed by Dr. Christensen; REMOTE++ is built upon this tool

  10. Remote Distribution Methods continued
      • Drawbacks of current tools:
        – Primarily designed for Unix platforms
        – Many are large or complex
        – Many require extensive installation and maintenance

  11. Remote Distribution Methods continued
      • The key challenge is to develop a Windows-based remote distribution tool that is easy to use, maintain, and modify. The tool:
        – Must reduce overall execution time, which means the overhead of distributing processes must be overcome
        – Must be able to execute many different programs, with no modification to the programs and with various input and output methods allowed

  12. Description of REMOTE++
      • REMOTE++ is built upon REMOTE:
        – The sockets interface is replaced by rcp/rsh commands
        – Programs may read from and write to standard input/output
        – Invalid jobs are detected
        – Invalid hosts are detected
      • REMOTE++ also has drawbacks:
        – Each remote host is required to run an rsh/rcp daemon
        – The status feature of REMOTE is not available
        – Remote shell commands raise security concerns

  13. Description of REMOTE++ continued
      • Set-up of REMOTE++:
        1) Each client must have a remote shell/remote copy daemon.
        2) REMOTE++ must be loaded on the master machine.
        3) A joblist.txt file must contain a list of the jobs to be executed.
        4) A hostlist.txt file must contain the hostnames of all remote machines.
        5) A status.txt file must be created as a log file containing the success or failure of each job and each remote host.

  14. Description of REMOTE++ continued
      • Sample joblist.txt file:
          file mm1.exe input1.txt output1.txt
          std hello.exe input2.txt output2.txt
          file mm1.exe input3.txt output3.txt
      • Sample hostlist.txt file:
          giga2.csee.usf.edu
          giga3.csee.usf.edu
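The two list files can be read with a very small parser. The following is a minimal Python sketch, assuming whitespace-separated fields in the order mode, executable, input file, output file, exactly as in the sample; the names `Job`, `parse_joblist`, and `parse_hostlist`, and the choice to skip blank lines and reject malformed ones, are hypothetical and not taken from the REMOTE++ source.

```python
from dataclasses import dataclass
from typing import List

# Mode keywords as they appear in the sample joblist.txt:
# "file" is the classic method, "std" the new standard I/O method.
VALID_MODES = {"file", "std"}


@dataclass
class Job:
    mode: str         # "file" or "std"
    executable: str   # e.g. "mm1.exe"
    input_file: str   # e.g. "input1.txt"
    output_file: str  # e.g. "output1.txt"


def parse_joblist(text: str) -> List[Job]:
    """Parse joblist.txt content: one 'mode exe input output' job per line."""
    jobs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 4 or fields[0] not in VALID_MODES:
            raise ValueError(f"joblist line {lineno} is malformed: {line!r}")
        jobs.append(Job(*fields))
    return jobs


def parse_hostlist(text: str) -> List[str]:
    """Parse hostlist.txt content: one hostname per non-blank line."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Reading the files themselves is then a matter of passing `open("joblist.txt").read()` and `open("hostlist.txt").read()` to these functions.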

  15. Description of REMOTE++ continued
      • Sample status.txt file:
          Mode is classic.
          Executable file mm1.exe found
          Input file input1.txt found
          Output file output1.txt found
          Mode is new.
          Executable file hello.exe found
          Input file input2.txt found
          Output file output2.txt found
          Mode is classic.
          Input file input3.txt was not found
          Output file output3.txt found
          giga2.csee.usf.edu is a valid host
          giga3.csee.usf.edu is a valid host
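A job-validation step that writes lines in the style of this sample could look like the Python sketch below. It is an illustration only: `MODE_NAMES`, `validate_job`, and `job_is_valid` are hypothetical names, and the file-existence check is passed in as a parameter (defaulting to a local check on the master PC) so it can be swapped out. The sample log also reports on output files; that check is omitted here because the slides do not say what it verifies.

```python
import os
from typing import Callable, List

# Hypothetical mapping from joblist.txt mode keywords to the mode names
# printed in the sample status.txt ("file" -> classic, "std" -> new).
MODE_NAMES = {"file": "classic", "std": "new"}


def validate_job(mode: str, executable: str, input_file: str,
                 exists: Callable[[str], bool] = os.path.isfile) -> List[str]:
    """Return status.txt-style lines for one job.

    `exists` defaults to a file check on the master PC; it is a parameter
    so the check can be replaced, for example in tests.
    """
    lines = [f"Mode is {MODE_NAMES[mode]}."]
    for label, name in (("Executable", executable), ("Input", input_file)):
        if exists(name):
            lines.append(f"{label} file {name} found")
        else:
            lines.append(f"{label} file {name} was not found")
    return lines


def job_is_valid(status_lines: List[str]) -> bool:
    """Treat a job as invalid if any file it needs was not found."""
    return not any(line.endswith("was not found") for line in status_lines)
```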

  16. Description of REMOTE++ continued
      • Operation of REMOTE++:
        1) The existence of each job in joblist.txt is validated.
        2) Threads are used to assign a job to each host in the host list.
        3) The executable is remote copied (rcp) to the remote host; an rcp failure makes the host unavailable, and its job is reassigned.
        4) The job is executed using a remote shell (rsh) command.
        5) When the job finishes, the host is assigned another job, until all jobs in joblist.txt are complete.
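The scheduling rules in steps 2 to 5 (one thread per host, one job per host at a time, each job run once, and reassignment after an rcp failure) can be sketched in Python as follows. This is not the REMOTE++ implementation: `distribute` is a hypothetical function, jobs are assumed to be unique identifiers such as a joblist line, and the rcp and rsh steps are replaced by caller-supplied `copy` and `run` callables that return True on success.

```python
import threading
from collections import deque
from typing import Callable, Dict, List


def distribute(jobs: List[str], hosts: List[str],
               copy: Callable[[str, str], bool],
               run: Callable[[str, str], bool]) -> Dict[str, str]:
    """Run each job once, keeping at most one job per host at a time.

    copy(host, job) stands in for the rcp step and run(host, job) for the
    rsh step (both hypothetical). Returns a map from each job to the host
    that ran it, "FAILED" if it ran but failed, or "UNASSIGNED" if no
    usable host remained.
    """
    pending = deque(jobs)  # jobs not yet started
    in_flight = 0          # jobs currently held by some host thread
    results: Dict[str, str] = {}
    cond = threading.Condition()

    def worker(host: str) -> None:
        nonlocal in_flight
        while True:
            with cond:
                # Nothing to take, but a host whose rcp fails may still hand
                # a job back, so wait instead of exiting.
                while not pending and in_flight > 0:
                    cond.wait()
                if not pending:
                    return  # every job is accounted for
                job = pending.popleft()
                in_flight += 1
            copied = copy(host, job)        # rcp the executable
            ok = copied and run(host, job)  # rsh only if the copy worked
            with cond:
                in_flight -= 1
                if copied:
                    results[job] = host if ok else "FAILED"
                else:
                    pending.append(job)     # reassign to another host
                cond.notify_all()
            if not copied:
                return                      # this host is now unavailable

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for job in pending:  # left over: every host failed its rcp step
        results[job] = "UNASSIGNED"
    return results
```

The `in_flight` counter matters: without it, a host could see an empty job pool and exit just before another host's failed rcp returns a job, leaving that job unexecuted.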

  17. Description of REMOTE++ continued
      • Sample execution of REMOTE++

  18. Description of REMOTE++ continued
      • Two input/output methods are supported by REMOTE++:
        1) File or “classic” method
           – Used with programs that read from and write to files
           – Implemented in the original REMOTE tool
           – Requires transfer of input and output files
        2) Std or “new” method
           – Used with programs that use standard input/output
           – New in the REMOTE++ tool
           – Input and output are redirected from/to files
           – No transfer of files is required
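The difference between the two methods shows up in the commands issued for a job once its executable has been copied. The Python sketch below builds those commands; `Step`, `plan_job`, and `run_steps` are hypothetical names, and the classic-mode assumption that the program receives its input and output file names as command-line arguments is not stated in the slides. Only the standard `host:path` form of rcp and the `rsh host command` form are used.

```python
import contextlib
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]             # command run on the master PC
    stdin_path: Optional[str]   # local file fed to the command's stdin, if any
    stdout_path: Optional[str]  # local file receiving the command's stdout, if any


def plan_job(mode: str, host: str, exe: str, inp: str, out: str) -> List[Step]:
    """Commands for one job, run after the executable has been copied."""
    if mode == "std":
        # New method: rsh's stdin/stdout are redirected from/to files on the
        # master PC, so no input or output file is transferred.
        return [Step(["rsh", host, exe], inp, out)]
    if mode == "file":
        # Classic method: copy the input over, run, then copy the output back.
        # Assumption: the program takes the file names as arguments.
        return [
            Step(["rcp", inp, f"{host}:{inp}"], None, None),
            Step(["rsh", host, exe, inp, out], None, None),
            Step(["rcp", f"{host}:{out}", out], None, None),
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def run_steps(steps: List[Step]) -> bool:
    """Execute the steps in order; stop at the first nonzero exit status."""
    for step in steps:
        with contextlib.ExitStack() as stack:
            fin = stack.enter_context(open(step.stdin_path, "rb")) if step.stdin_path else None
            fout = stack.enter_context(open(step.stdout_path, "wb")) if step.stdout_path else None
            if subprocess.run(step.argv, stdin=fin, stdout=fout).returncode != 0:
                return False
    return True
```

Note that rsh's exit status generally reports whether the remote shell session succeeded, not necessarily the remote program's own exit code, so detecting a failed job may need an additional check such as the presence of the output file.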

  19. Description of REMOTE++ continued
      • The remote shell/remote copy daemon:
        1) Vendor version (tested with Denicomp’s rshd)
           – Dependable
           – Cost-prohibitive
           – Not open source
        2) Free version (by Silviu Marghescu)
           – Free
           – Open source
           – Does not support the standard input/output method
           – Not as reliable

  20. Evaluation of REMOTE++
      • Queuing systems can be modeled using simulation
      • Queue simulations must be executed numerous times with varying input to gather statistical information
      • A queue simulation was used to evaluate the REMOTE++ tool

  21. Evaluation of REMOTE++ continued
      • A queue is a sequence of customers waiting to receive service
      • The following features determine the behavior of a queue:
        – The distribution of time between arriving customers
        – The distribution of time to service a customer
        – The number of servers available to service the customers
        – The capacity of the queue
        – The population size of customers
        – The queuing discipline, which determines the order of service

  22. Evaluation of REMOTE++ continued
      [Figure: arrivals enter the queue, are served by a single server, and leave as departures]
      • An M/M/1 queue has the following features:
        – Markovian (exponentially distributed) inter-arrival times of customers
        – Markovian (exponentially distributed) service times
        – A single server
        – An unlimited queue capacity
        – An infinite customer population
        – A FIFO queuing discipline

  23. Evaluation of REMOTE++ continued
      • REMOTE++ was evaluated with an M/M/1 queue simulation
      • The performance of an M/M/1 queue is measured by its utilization:
        – Utilization (ρ) is the fraction of time the server is busy
        – Utilization is the ratio of the arrival rate to the service rate
      • The mean length (L) of the queue depends on the utilization
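For reference, the standard steady-state M/M/1 results tie these quantities together, with λ the arrival rate and μ the service rate. The slides do not say whether "length" counts the customer in service (L, the mean number in the system) or only those waiting (L_q), so both are shown:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
L = \frac{\rho}{1 - \rho}, \qquad
L_q = \frac{\rho^{2}}{1 - \rho}, \qquad 0 \le \rho < 1
```

For example, L = 1 at ρ = 0.5 but L = 0.995 / 0.005 = 199 at ρ = 0.995. Because L grows without bound as ρ approaches 1, a simulation needs far longer to estimate it accurately at high utilization.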

  24. Evaluation of REMOTE++ continued
      • Goal of the evaluation:
        – Determine the relationship between the utilization and the simulation run time needed for the mean queue length to come within a given percentage of the theoretical length
      • At the same time:
        – Evaluate the reduction in execution time when executing the simulation with REMOTE++ on five machines

  25. Evaluation of REMOTE++ continued
      • M/M/1 queue simulation time was evaluated for:
        – Utilization from 1% to 99.5%
        – Mean length within 10% of the theoretical length
        – Statistical mean of 10 executions at each interval
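The kind of experiment described here can be sketched as an event-driven M/M/1 simulation that stops once its running estimate is within 10% of theory, averaged over 10 runs. This Python version is an illustration, not the thesis's simulator: it assumes "length" means the time-average number in the system, L = ρ/(1 − ρ), checks convergence every 1000 arrivals, and reports simulated time rather than wall-clock time. The names `mm1_time_to_converge` and `mean_convergence_time` and all default parameters are hypothetical.

```python
import random
from typing import Tuple


def mm1_time_to_converge(rho: float, tol: float = 0.10, mu: float = 1.0,
                         check_every: int = 1000, seed: int = 1,
                         max_arrivals: int = 10_000_000) -> Tuple[float, int, float]:
    """Simulate an M/M/1 queue until the time-average number in the system
    is within `tol` (relative) of the theoretical L = rho / (1 - rho).

    Returns (simulated time, arrivals so far, estimated L).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("utilization must be strictly between 0 and 1")
    rng = random.Random(seed)
    lam = rho * mu                       # arrival rate, since rho = lambda / mu
    target = rho / (1.0 - rho)           # theoretical mean number in system
    t, n, area, arrivals = 0.0, 0, 0.0, 0
    next_arrival = rng.expovariate(lam)  # exponential inter-arrival times
    next_departure = float("inf")        # nobody in service yet
    while arrivals < max_arrivals:
        if next_arrival <= next_departure:
            area += n * (next_arrival - t)  # accumulate the integral of n(t)
            t = next_arrival
            n += 1
            arrivals += 1
            next_arrival = t + rng.expovariate(lam)
            if n == 1:                      # server was idle: start service
                next_departure = t + rng.expovariate(mu)
            if arrivals % check_every == 0 and abs(area / t - target) <= tol * target:
                return t, arrivals, area / t
        else:
            area += n * (next_departure - t)
            t = next_departure
            n -= 1
            # FIFO single server: the next waiting customer starts service.
            next_departure = t + rng.expovariate(mu) if n > 0 else float("inf")
    raise RuntimeError("no convergence within max_arrivals")


def mean_convergence_time(rho: float, runs: int = 10) -> float:
    """Average simulated time to convergence over `runs` independent seeds."""
    return sum(mm1_time_to_converge(rho, seed=s)[0] for s in range(runs)) / runs
```

A stopping rule like this one is noisy, since a single run may happen to pass the check early, which is one reason to average several executions at each utilization. Calling `mean_convergence_time(rho)` for a range of `rho` values is one way to reproduce the shape of this experiment.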

  26. Evaluation of REMOTE++ continued
      • As the target utilization approaches 100%, the simulation time of the M/M/1 queue grows increasingly long.

  27. Evaluation of REMOTE++ continued
      • Simulation time grows slightly faster than sixth-order polynomial growth

  28. Evaluation of REMOTE++ continued
      • The M/M/1 queue execution:
        – A five-times speed-up was projected on five machines
        – About a two-and-a-half-times speed-up was achieved on five machines:
          – There were seven seconds of overhead per job
          – At low utilization, jobs executed in only several seconds
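A simple model, which is an assumption and not taken from the slides, shows how a fixed per-job overhead caps the speed-up. Take N independent jobs of run time t, a distribution overhead o per job, and P machines, and assume N is a multiple of P and the serial run has no distribution overhead:

```latex
S = \frac{T_{\text{serial}}}{T_{\text{REMOTE++}}}
  \approx \frac{N\,t}{\frac{N}{P}\,(t + o)}
  = \frac{P\,t}{t + o}

P = 5,\; o = 7\ \text{s},\; S = 2.5:\quad
2.5 = \frac{5t}{t + 7}
\;\Rightarrow\; 2.5t + 17.5 = 5t
\;\Rightarrow\; t = 7\ \text{s}
```

Under this model, a 2.5-times speed-up on five machines corresponds to jobs of about 7 s, in line with the several-second jobs at low utilization, while longer jobs approach the projected five-times speed-up (for example, t = 63 s gives S = 315/70 = 4.5).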

  29. Summary and future work
      • Remote distribution can be used to reduce execution time:
        – Existing systems are Unix-based and complex
        – A simple Windows-based tool is needed
      • REMOTE++ improves upon REMOTE:
        – The complex sockets interface is replaced by a simple rsh/rcp script
        – It enables a wider variety of programs to be executed
        – It is able to recover from invalid jobs and hosts

  30. Summary and future work
      • Improve the free remote shell daemon to support the std or “new” input/output method
      • Reduce the overhead of distribution to further reduce execution time
      • Support more, and mixed, input/output methods
      • Implement security in REMOTE++ (it currently relies on the rsh daemon for security)
      • Implement a status feature similar to that of the original REMOTE tool

  31. Questions?
      Ashley Hopkins
      Department of Computer Science and Engineering
      University of South Florida
      Tampa, Florida 33620
      amhopki2@csee.usf.edu
      • REMOTE++ soon available at:
        – http://www.csee.usf.edu/~amhopki2/research
        – http://www.csee.usf.edu/~christen/tools/toolpage.html
      Thank You
