Hadoop System simulation with Mumak

Fei Dong, Tianyu Feng, Hong Zhang
Dec 8, 2010

Agenda
• Objective
• Comparison between MRPerf and Mumak
• Modifications to Mumak
• Results and discussion
• Conclusion

Objective
• A large-scale distributed system has an enormous number of parameters.
• The running time of a user program depends non-linearly on these parameters.
• Predict the running time under various settings to help the user choose the “optimal” setting.
• We start by varying the most basic parameter: cluster size.

MRPerf and Mumak
• MRPerf
  • Built upon a network simulator
  • Calculates the task running time and network delay from physical parameters
  • Implements the Hadoop system in Tcl
  • Flexible in simulation

MRPerf and Mumak
[Figure: running time vs. map slots per node and reduce slots per node; 4 nodes, double-rack data center (chunk size = 64M), by MRPerf]

MRPerf and Mumak
[Figure: 4 nodes (chunk size = 64M), by Mumak]

MRPerf and Mumak
• Mumak
  • Inherits the JobTracker class from Hadoop and only defines the simulation interface
  • Uses trace files to build the cluster topology and job story, then feeds them into the simulator
  • Can only reproduce previously finished experiments
  • Designed to verify/debug the Hadoop system design
  • Only simulates the Map/Reduce tasks; no sort phase or shuffle phase
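The trace-driven idea above can be illustrated with a minimal sketch. This is not Mumak's code: it assumes task durations have already been extracted from a trace, treats every slot as identical, and starts each task on whichever slot frees up first. The function name `replay_makespan` and the sample durations are hypothetical.

```python
import heapq


def replay_makespan(task_durations, num_nodes, slots_per_node):
    """Replay recorded task durations on a cluster and return the finish time.

    Assumption: tasks are independent and each one starts on the slot
    that becomes free earliest (a greedy list schedule).
    """
    num_slots = num_nodes * slots_per_node
    if num_slots <= 0:
        raise ValueError("cluster needs at least one slot")
    # Min-heap holding the time at which each slot becomes free.
    free_at = [0.0] * num_slots
    heapq.heapify(free_at)
    finish = 0.0
    for duration in task_durations:
        start = heapq.heappop(free_at)   # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical map-task durations in seconds, as if read from a trace.
durations = [10.0, 12.0, 9.0, 11.0, 10.0, 13.0, 8.0, 10.0]
for nodes in (2, 4, 6):
    print(nodes, replay_makespan(durations, nodes, slots_per_node=2))
```

Because the durations come from a finished run, a replay like this can only vary how tasks are packed onto slots; it cannot predict how the durations themselves would change under new settings.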

MRPerf and Mumak
• The approach taken by MRPerf is better
  • Takes in parameters to estimate running time
  • Can make predictions
  • But MRPerf simulates its own implementation of Hadoop
• The design of Mumak is better
  • Inherits source code from Hadoop
  • Easy to understand and to extend
• We decided to take the good parts of MRPerf and implement them in the framework of Mumak
  • Modify the Rumen log to change the parameters
  • Modify the Mumak source code to add a network simulator

Implementation
• Simulate a different cluster size
  • Edit the Rumen log to change the data replication factor and locality
  • Modify the topology by adding or deleting nodes, for example, from 2 slave nodes to 6 slave nodes
  • The job tracker then assigns the tasks to the different nodes
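As a rough illustration of the topology change, the sketch below grows or shrinks one rack in a nested name/children structure. The layout is an assumption modeled loosely on a Rumen-style topology file and should be checked against the real trace; the function `resize_rack`, the rack name, and the host names are all hypothetical.

```python
import copy
import json


def resize_rack(topology, rack_name, num_hosts, host_prefix="node"):
    """Return a copy of the topology in which one rack has num_hosts hosts.

    Assumption: the topology is a tree of {"name": ..., "children": [...]}
    dicts with racks directly under the root and hosts under each rack.
    """
    resized = copy.deepcopy(topology)
    for rack in resized.get("children") or []:
        if rack["name"] == rack_name:
            # Keep the first num_hosts existing hosts, then pad with new ones.
            hosts = (rack.get("children") or [])[:num_hosts]
            existing = {host["name"] for host in hosts}
            index = 0
            while len(hosts) < num_hosts:
                name = f"{host_prefix}{index}"
                if name not in existing:
                    hosts.append({"name": name, "children": None})
                    existing.add(name)
                index += 1
            rack["children"] = hosts
            return resized
    raise KeyError(f"rack {rack_name!r} not found")


# Illustrative two-slave topology (names are made up).
topology = {
    "name": "<root>",
    "children": [
        {
            "name": "default-rack",
            "children": [
                {"name": "node0", "children": None},
                {"name": "node1", "children": None},
            ],
        }
    ],
}

bigger = resize_rack(topology, "default-rack", 6)
print(json.dumps(bigger, indent=2))
```

Working on a deep copy keeps the original trace data intact, so several cluster sizes can be generated from the same starting topology.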

Implementation
• Simulate network delay
  • We defined a simple network simulator interface
  • Modified the source code of Mumak to add in the network delay
  • In practice, the network delay can be ignored
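The slides do not show the network simulator interface, so the following is only a sketch of what a simple one might look like, assuming a fixed latency plus a bandwidth-limited transfer. The class `SimpleNetwork`, the helper `task_runtime_with_network`, and the default latency and bandwidth values are hypothetical, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class SimpleNetwork:
    """Hypothetical delay model: fixed latency plus size over bandwidth."""

    latency_s: float = 0.001                        # assumed per-transfer latency
    bandwidth_bytes_per_s: float = 125_000_000.0    # assumed 1 Gbit/s link

    def transfer_delay(self, num_bytes, same_node=False):
        """Seconds needed to move num_bytes; zero when the data is node-local."""
        if same_node:
            return 0.0
        return self.latency_s + num_bytes / self.bandwidth_bytes_per_s


def task_runtime_with_network(compute_s, input_bytes, network, data_local):
    """Add the input transfer delay to a task's recorded compute time."""
    return compute_s + network.transfer_delay(input_bytes, same_node=data_local)


net = SimpleNetwork()
chunk = 64 * 1024 * 1024  # one 64M chunk, matching the chunk size on the slides
print(task_runtime_with_network(30.0, chunk, net, data_local=False))
print(task_runtime_with_network(30.0, chunk, net, data_local=True))
```

Under these assumed numbers, fetching one 64M chunk adds roughly half a second to a task. That is small next to a task lasting tens of seconds, which may be one way to read the remark that the delay can be ignored, though the slides do not give measured values.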

Results and Discussion

Results and Discussion
• Limitations and future work
  • Sort phase time is not included
  • Only a single-rack topology was used
  • Predictions are not always consistent for the same job with the same configuration

Conclusion
• Our objective is to predict the running time under different parameters
• We took the methods of MRPerf and implemented them on Mumak
• For more flexible and accurate prediction, more modifications to Mumak are needed
  • Independence from the trace file
  • Fixing the instability of the predictions

Questions?
