This presentation discusses the simulation of large-scale distributed systems running Hadoop, focusing on the comparison between MRPerf and Mumak. By modifying Mumak, we aim to improve the flexibility and accuracy of running time predictions under various parameter settings. The study evaluates the running time based on cluster size and other physical parameters while addressing the limitations of both simulators. In conclusion, we identify the strengths of MRPerf and integrate them into Mumak to improve running time predictions, thereby helping users select optimal configurations.
Hadoop System Simulation with Mumak • Fei Dong, Tianyu Feng, Hong Zhang • Dec 8, 2010
Agenda • Objective • Comparison between MRPerf and Mumak • Modifications to Mumak • Results and discussion • Conclusion
Objective • A large-scale distributed system has an enormous number of parameters. • The running time of a user program depends non-linearly on these parameters. • Predict the running time under various settings to help the user choose the “optimal” setting. • We start by varying the most basic parameter: cluster size.
MRPerf and Mumak • MRPerf • Built upon a network simulator • Calculates the task running time and network delay from physical parameters • Implements the Hadoop system in Tcl • Flexible in simulation
MRPerf and Mumak • [Figure: running time versus map slots per node and reduce slots per node; 4 nodes, double-rack data center, chunk size = 64 MB; simulated by MRPerf]
MRPerf and Mumak • [Figure: 4 nodes, chunk size = 64 MB; simulated by Mumak]
MRPerf and Mumak • Mumak • Inherits the JobTracker class from Hadoop and defines only the simulation interface • Uses trace files to build the cluster topology and job story, then feeds them into the simulator • Can only reproduce previously finished experiments • Designed to verify and debug the Hadoop system design • Simulates only the Map/Reduce tasks; the sort and shuffle phases are not simulated
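The trace-replay idea can be illustrated with a minimal sketch. This is not Mumak's code: the function `simulate_makespan`, its parameters, and the trace values are hypothetical, and the scheduler is a plain greedy "earliest free slot" policy rather than Hadoop's JobTracker logic. It only shows why replaying the same recorded task durations on a larger cluster can give a shorter predicted running time.

```python
import heapq

def simulate_makespan(task_durations, num_nodes, slots_per_node):
    """Replay recorded task durations onto a pool of slots.

    Returns the time at which the last task finishes. Hypothetical
    helper: a greedy policy, not the real JobTracker scheduler.
    """
    total_slots = num_nodes * slots_per_node
    if total_slots <= 0:
        raise ValueError("cluster must have at least one slot")
    # Min-heap holding the time at which each slot becomes free.
    free_at = [0.0] * total_slots
    heapq.heapify(free_at)
    finish = 0.0
    for duration in task_durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

# Hypothetical trace: eight map tasks of 10 seconds each.
trace = [10.0] * 8
two_nodes = simulate_makespan(trace, num_nodes=2, slots_per_node=2)
six_nodes = simulate_makespan(trace, num_nodes=6, slots_per_node=2)
print(two_nodes, six_nodes)
```

With 4 slots the eight tasks run in two waves; with 12 slots they fit in one. Sort and shuffle time is left out here, matching the limitation noted on the slide.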
MRPerf and Mumak • The approach taken by MRPerf is better • Takes in parameters to estimate running time • Can make predictions • MRPerf simulates its own implementation of Hadoop • The design of Mumak is better • Inherits source code from Hadoop • Easy to understand and to extend • We decided to take the good parts of MRPerf and implement them in the framework of Mumak • Modify the Rumen log to change the parameters • Modify the Mumak source code to add a network simulator
Implementation • Simulate a different cluster size • Edit the Rumen log to change the data replication factor and locality • Modify the topology by adding or deleting nodes, for example, going from 2 slave nodes to 6 slave nodes • The job tracker then assigns the tasks to the different nodes
Implementation • Simulate network delay • We defined a simple network simulator interface • We modified the Mumak source code to add the network delay • In practice, the network delay can be ignored
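The topology change can be sketched as an edit to a nested JSON tree. The layout below (a root node whose children are racks, whose children are hosts) is an assumption about the topology file, and the rack name `rack0`, the host names `slave1` to `slave6`, and the helper `resize_rack` are hypothetical. The slides do not show the actual file or the script that was used.

```python
import copy
import json

def resize_rack(topology, rack_name, host_names):
    """Return a copy of the topology whose named rack holds exactly host_names.

    Assumes a tree of {"name": ..., "children": [...]} nodes with
    hosts as leaves (children set to None). Hypothetical helper.
    """
    resized = copy.deepcopy(topology)  # leave the original trace data untouched
    for rack in resized["children"]:
        if rack["name"] == rack_name:
            rack["children"] = [{"name": h, "children": None} for h in host_names]
            return resized
    raise KeyError(f"rack not found: {rack_name}")

# Hypothetical starting point: one rack with 2 slave nodes.
topology = {
    "name": "<root>",
    "children": [
        {
            "name": "rack0",
            "children": [
                {"name": "slave1", "children": None},
                {"name": "slave2", "children": None},
            ],
        }
    ],
}

# Grow the rack from 2 slave nodes to 6 slave nodes.
six = resize_rack(topology, "rack0", [f"slave{i}" for i in range(1, 7)])
print(json.dumps(six, indent=2))
```

Working on a copy keeps the original topology available, so the 2-node and 6-node runs can be compared against the same job trace.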
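A simple network simulator interface could look like the sketch below. The slides do not give the actual interface, so every name here (`NetworkSimulator`, `FixedBandwidthNetwork`, `transfer_delay`, `task_time_with_network`) is hypothetical, and the delay model (fixed latency plus size divided by bandwidth, with node-local transfers costing nothing) is an assumed simplification.

```python
from abc import ABC, abstractmethod

class NetworkSimulator(ABC):
    """Hypothetical interface: how long does moving num_bytes take?"""

    @abstractmethod
    def transfer_delay(self, src, dst, num_bytes):
        """Return the transfer time in seconds from src to dst."""

class FixedBandwidthNetwork(NetworkSimulator):
    """Assumed model: constant latency plus bytes / bandwidth."""

    def __init__(self, latency_s, bandwidth_bytes_per_s):
        if bandwidth_bytes_per_s <= 0:
            raise ValueError("bandwidth must be positive")
        self.latency_s = latency_s
        self.bandwidth_bytes_per_s = bandwidth_bytes_per_s

    def transfer_delay(self, src, dst, num_bytes):
        if src == dst:
            return 0.0  # data-local task: nothing crosses the network
        return self.latency_s + num_bytes / self.bandwidth_bytes_per_s

def task_time_with_network(base_time_s, network, src, dst, num_bytes):
    """Add the simulated transfer delay to a task's recorded running time."""
    return base_time_s + network.transfer_delay(src, dst, num_bytes)

# Hypothetical numbers: 1 ms latency, 1 Gbit/s link, one 64 MiB chunk.
net = FixedBandwidthNetwork(latency_s=0.001, bandwidth_bytes_per_s=125_000_000)
chunk_bytes = 64 * 1024 * 1024
remote = task_time_with_network(10.0, net, "slave1", "slave2", chunk_bytes)
local = task_time_with_network(10.0, net, "slave1", "slave1", chunk_bytes)
print(remote, local)
```

Comparing `remote` with `local` for a given trace is one way to check how much the network term contributes relative to the recorded task time.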
Results and Discussion • Limitations and future work • Sort phase time is not included • Only a single-rack topology was used • Predictions are not always consistent for the same job with the same configuration
Conclusion • Our objective is to predict the running time with different parameters • We took the methods of MRPerf and implemented them in Mumak • More flexible and accurate prediction requires further modifications to Mumak • Make the simulation independent of the trace file • Resolve the inconsistent predictions