This case study details a collaborative research journey undertaken by two Ph.D. students at Berkeley, focusing on dynamic load balancing through process lifetime distributions. From critiquing the limitations of the ELZ88 model of active process migration to proposing refined algorithms based on empirical distributions, the work emphasizes choosing a system model with the right level of simplification. By analyzing data from local time-sharing servers and developing simulation tools, the researchers aim to deepen the understanding of load-sharing mechanisms and ultimately improve system resource management.
CS Research – A Case Study: Exploiting Process Lifetime Distributions for Dynamic Load Balancing
The Players • Two Ph.D. students at Berkeley • Mor Harchol-Balter • Functional analysis, formal, mathematical • Berkeley 1990-1996 • First paper 1994 • Allen B. Downey • Systems research • MS MIT 1990, Berkeley 1991-1997 • First paper 1993 • Not part of either player's Ph.D. thesis!
How it started • ELZ88 (Eager, Lazowska, and Zahorjan, 1988) • "The Limited Performance Benefits of Migrating Active Processes for Load Sharing" • "There are likely no conditions under which migration could yield major performance improvements beyond those offered by non-migratory load sharing…"
First Steps • Find the hole in ELZ's argument • It's based on an "unusual" distribution • Is this the observed distribution in 1994? • Collect and analyze • Something smells…
Interim Status • The ELZ model is limited • The ELZ metric is mean residence time, not mean slowdown (see the definitions below) • The ELZ process lifetime distribution is artificial • Lots of zero-length processes
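For reference, the two metrics weight jobs differently; the definitions below follow standard queueing usage (per-process service demand S and waiting time W), not necessarily ELZ's exact notation:

```latex
% Residence (response) time vs. slowdown for one process,
% with service demand S and waiting time W.
T = W + S
\qquad
\text{slowdown} = \frac{T}{S} = \frac{W + S}{S}
```

Mean residence time averages T over processes, while mean slowdown averages T/S, so a policy that delays many short jobs can look acceptable under the first metric and poor under the second.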
Tech Report • Not enough data for an article • It’s a work-in-progress • A note on “The Limited Performance Benefits of Migrating Active Processes for Load Sharing” • Document their results, no conclusions except that ELZ88 looks wrong
Next Steps • Collect data • Identify the empirical distributions • Define system model • Propose Algorithm • Simulate, Analyze • Repeat until done
Collect Data • So many machines, what to choose… • Start with local resources • 7 local time-sharing servers • e.g., Mangal, Pita, Inferno, Sands • Use available tools • lastcomm – prints information about previously executed commands (a rough parsing sketch follows)
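A minimal sketch of the data-collection step, assuming the common BSD-style lastcomm output in which the per-process CPU time appears as "<seconds> secs"; the column layout varies by OS, so the regex here is illustrative only:

```python
#!/usr/bin/env python3
"""Rough sketch: pull per-process CPU times out of `lastcomm` output."""
import re
import subprocess

SECS_RE = re.compile(r"([\d.]+)\s+secs")

def collect_cpu_times():
    """Return a list of CPU times (seconds) for previously executed commands."""
    out = subprocess.run(["lastcomm"], capture_output=True, text=True, check=True)
    times = []
    for line in out.stdout.splitlines():
        m = SECS_RE.search(line)
        if m:
            times.append(float(m.group(1)))
    return times

if __name__ == "__main__":
    cpu = collect_cpu_times()
    print(f"{len(cpu)} records, max CPU time {max(cpu, default=0.0):.2f}s")
```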
Identify Empirical Distributions • Graph the data • Matlab, ploticus, gnuplot • Eyeball against known distributions • "Looks to me like a lognormal" • Use statistical tests to determine fit • Iteratively weighted least-squares fit • Document it… (a tail-fitting sketch follows)
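A sketch of the eyeball-then-fit loop, assuming `lifetimes` holds per-process CPU times (e.g., from the sketch above). Plotting the complementary CDF on log-log axes and fitting a straight line is one simple check for a Pareto-like tail (Pr[L > t] ≈ c·t^-k); it stands in for, but is not, the authors' exact fitting procedure:

```python
#!/usr/bin/env python3
"""Sketch: log-log CCDF plot plus a least-squares tail fit."""
import numpy as np
import matplotlib.pyplot as plt

def tail_fit(lifetimes, t_min=1.0):
    """Plot the empirical CCDF of lifetimes >= t_min and fit its log-log slope."""
    data = np.sort(np.asarray([t for t in lifetimes if t >= t_min]))
    # Empirical complementary CDF: fraction of processes living at least t.
    ccdf = 1.0 - np.arange(len(data)) / len(data)
    # Least-squares line in log-log space; the slope estimates -k.
    slope, intercept = np.polyfit(np.log(data), np.log(ccdf), 1)
    plt.loglog(data, ccdf, ".", label="empirical CCDF")
    plt.loglog(data, np.exp(intercept) * data**slope, label=f"fit: slope={slope:.2f}")
    plt.xlabel("CPU lifetime t (s)")
    plt.ylabel("Pr[L > t]")
    plt.legend()
    plt.show()
    return slope
```

A straight line on these axes suggests a power-law tail, whereas a lognormal would curve downward.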
Define System Model • What is the definition of the system that you will analyze? • Real-world systems are too complicated • Choose a model that has just the right amount of simplification • Too simple → obvious or incorrect results • Too complicated → much harder analysis
Propose Algorithm • The paper presents the one that worked • Start with something • Throw it away • Complicate: add parameters • Simplify: remove parameters • Refine until • It works • You can explain it (an age-based policy sketch follows)
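One possible concrete starting point, in the spirit of the observation that a process's age predicts its remaining CPU demand under a heavy-tailed lifetime distribution. The criterion below (only migrate processes whose age exceeds the migration cost, preferring the oldest) is a simplified illustration, not the algorithm the paper settled on:

```python
"""Sketch of an age-based migration policy."""
from dataclasses import dataclass

@dataclass
class Process:
    pid: int
    age: float  # CPU seconds consumed so far

def pick_migration_candidate(processes, migration_cost, min_age=1.0):
    """Return the process worth migrating from an overloaded host, or None.

    Heuristic: under a Pr[L > t] ~ 1/t lifetime tail, a process of age a
    is expected to run roughly another a seconds, so only processes with
    age > migration_cost are worth moving; prefer the oldest such process.
    """
    eligible = [p for p in processes
                if p.age >= min_age and p.age > migration_cost]
    return max(eligible, key=lambda p: p.age, default=None)

if __name__ == "__main__":
    # Hypothetical example values.
    procs = [Process(1, 0.2), Process(2, 3.5), Process(3, 12.0)]
    print(pick_migration_candidate(procs, migration_cost=2.0))
```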
Simulate • Build a simulator of your model • Use Java, C++, Python, whatever • NS-2 • Demonstrate empirically that the algorithm • Works • Is better than your competition (a minimal simulator skeleton follows)
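A toy skeleton for the simulation step, assuming Poisson arrivals, random placement, FCFS service per host, and no migration; the placement line is where a load-sharing or migration policy would plug in. This is not the simulator used in the paper:

```python
"""Toy sketch of a trace-driven load-sharing simulator."""
import random

def simulate(lifetimes, n_hosts=7, seed=0):
    """Return per-job slowdowns, where slowdown = (finish - arrival) / service."""
    rng = random.Random(seed)
    host_free_at = [0.0] * n_hosts   # time at which each host next becomes idle
    t = 0.0
    slowdowns = []
    for service in lifetimes:
        if service <= 0:             # lastcomm logs many zero-length processes
            continue
        t += rng.expovariate(1.0)    # next arrival
        h = rng.randrange(n_hosts)   # <-- placement / load-sharing policy hook
        start = max(t, host_free_at[h])
        finish = start + service
        host_free_at[h] = finish
        slowdowns.append((finish - t) / service)
    return slowdowns
```

Feeding in the measured lifetimes, rather than synthetic ones, is what makes the simulation trace-driven.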
Analyze • Collect data points • Statistics • Mean, std-dev, etc. • Sensitivity to parameters (a summary and sweep sketch follows)
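A small sketch of the analysis step, reusing the toy simulate() above; the summary statistics and the swept parameter (number of hosts) are illustrative choices, not the paper's actual experiments:

```python
"""Sketch: summarize simulation output and probe parameter sensitivity."""
import statistics

def summarize(slowdowns):
    """Basic summary statistics over a list of per-job slowdowns."""
    return {
        "n": len(slowdowns),
        "mean": statistics.fmean(slowdowns),
        "stdev": statistics.stdev(slowdowns),
    }

def sweep_hosts(lifetimes, host_counts=(2, 4, 7, 14)):
    """Sensitivity check: how does mean slowdown change with the number of hosts?"""
    for n in host_counts:
        stats = summarize(simulate(lifetimes, n_hosts=n))
        print(f"hosts={n:2d}  mean slowdown={stats['mean']:.2f}  "
              f"stdev={stats['stdev']:.2f}")
```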
Compare to Competition • Choose the competition so that you look good • Change their algorithm to work in your model • Compare the options (using restricted parameters)
Write Paper • Now document your work