

  1. A Dynamic MapReduce Scheduler for Heterogeneous Workloads • Chao Tian, Haojie Zhou, Yongqiang He, Li Zha • Presenter: 董耀文 (first-year M.S. student, Computer Science)

  2. Outline • Background • Question? • So! • Related work • MapReduce procedure analysis • MR-Predict • Scheduling policies • Evaluation • Conclusion

  3. Background • As the Internet keeps growing, enormous amounts of data need to be processed by many Internet service providers. • The MapReduce framework has become a leading solution: it is designed for large commodity clusters, consisting of thousands of nodes built from commodity hardware.

  4. Background • The performance of a parallel system like MapReduce is closely tied to its task scheduler. • The current scheduler in Hadoop uses a single queue and schedules jobs FCFS. • Yahoo's Capacity Scheduler and Facebook's Fair Scheduler use multiple queues to allocate different resources in the cluster.

  5. Background • In practice, different kinds of jobs often run simultaneously in a data center. These jobs place different workloads on the cluster, including I/O-bound and CPU-bound workloads.

  6. Background • Hadoop's scheduler is not aware of these workload characteristics; it prefers to simultaneously run map tasks from the job at the top of the queue. • Because tasks from the same job always share the same characteristics, this can reduce the throughput of the whole system and seriously hurt the productivity of the data center.

  7. Question • How can the hardware utilization rate be improved when different kinds of workloads run on a cluster under the MapReduce framework?

  8. So! • They design a new triple-queue scheduler that consists of a workload-prediction mechanism, MR-Predict, and three queues (a CPU-bound queue, an I/O-bound queue, and a wait queue). • They classify MapReduce workloads into three types, and MR-Predict automatically predicts the class of each newly arriving job based on this classification. • Jobs from the CPU-bound queue and the I/O-bound queue are assigned separately, so the two workload types run in parallel (see the sketch below). • Their experiments show that this approach can increase system throughput by up to 30%.
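A minimal sketch of the triple-queue dispatch described above. All names here (TripleQueueScheduler, take_task, predict) are hypothetical; the paper's actual implementation lives inside Hadoop's job tracker, not in standalone code like this.

```python
# Sketch of the triple-queue policy, assuming hypothetical names
# (Job.take_task, predict); not the paper's actual code.
from collections import deque

CPU_BOUND, IO_BOUND, UNKNOWN = "cpu", "io", "unknown"

class TripleQueueScheduler:
    def __init__(self):
        self.wait_q = deque()  # new jobs whose type is not yet known
        self.cpu_q = deque()   # jobs MR-Predict classified as CPU-bound
        self.io_q = deque()    # jobs MR-Predict classified as I/O-bound

    def submit(self, job):
        # Every new job enters the wait queue until MR-Predict has
        # observed enough of its map tasks to classify it.
        self.wait_q.append(job)

    def reclassify(self, predict):
        # Drain the wait queue into the typed queues as MR-Predict decides.
        still_waiting = deque()
        for job in self.wait_q:
            kind = predict(job)
            if kind == CPU_BOUND:
                self.cpu_q.append(job)
            elif kind == IO_BOUND:
                self.io_q.append(job)
            else:
                still_waiting.append(job)
        self.wait_q = still_waiting

    def next_tasks(self, node):
        # Hand a node one I/O-bound and one CPU-bound task at a time,
        # so disk and CPU are driven in parallel instead of serially.
        picks = [q[0].take_task(node) for q in (self.io_q, self.cpu_q) if q]
        if not picks and self.wait_q:
            picks.append(self.wait_q[0].take_task(node))
        return picks
```

The key design point the slide describes is the pairing in next_tasks: drawing from both typed queues at once keeps one resource from idling while the other saturates.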

  9. Related work • Scheduling algorithms in parallel systems [11, …] • Applications have different workloads: large computation and I/O requirements [10]. • How I/O-bound jobs affect system performance [6]. • A gang-scheduling algorithm that runs CPU-bound and I/O-bound jobs in parallel to increase hardware utilization [7].

  10. Related work • The scheduling problem in MapReduce has attracted much attention [2,10]. • Yahoo and Facebook designed Hadoop schedulers: the Capacity Scheduler [4] and the Fair Scheduler [5].

  11. MapReduce procedure analysis • Map-shuffle phase • Read input data • Compute the map task • Store output to local disk • Shuffle map-task result data out • Shuffle reduce input data in

  12. MapReduce procedure analysis • Reduce-compute phase • Tasks run the application logic (a rough tally of the map-shuffle phase's per-task I/O is sketched below).
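To make the phase breakdown concrete, here is an illustrative tally of the data a single map task can move during the map-shuffle phase, which is what the I/O-bound test in the evaluation slides adds up. Which terms apply depends on the job (the evaluation slides sum different components per benchmark), and all names here are assumptions, not the paper's notation.

```python
# Illustrative tally of the bytes one map task moves in the map-shuffle
# phase, mirroring the steps listed above. Names are assumptions.
def map_shuffle_io_mb(input_mb=0.0, local_out_mb=0.0,
                      shuffle_out_mb=0.0, shuffle_in_mb=0.0):
    # read input split + spill map output locally
    # + ship map output to reducers + pull reduce input in (SID)
    return input_mb + local_out_mb + shuffle_out_mb + shuffle_in_mb

# Example, TeraSort-style: a 64 MB split read plus 64 MB written out.
print(map_shuffle_io_mb(input_mb=64, local_out_mb=64))  # 128.0
```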

  13. MR-Predict

  14. Scheduling policies

  15. Scheduling policies

  16. Evaluation • Environment • 6 nodes connected by Gigabit Ethernet • DELL 1950 • CPU: 2 × quad-core, 2.0 GHz • Memory: 4 GB • Disk: 2 SATA disks • Input data: 15 GB • Map slots & reduce slots: 8 • DIOR (disk I/O rate): 31.2 MB/s (measured in Hadoop without a reduce phase)

  17. Evaluation • Resource utilization • TeraSort: a total-order sort (sequential I/O) benchmark • 8 × (64 MB + 64 MB) / 8 s = 128 MB/s ≥ 31.2 MB/s → I/O-bound

  18. Evaluation • Resource utilization • Grep-Count: uses [.]* as the regular expression • 8 × (64 MB + 1 MB + 1 MB + SID) / 92 s ≈ 5.7 MB/s < 31.2 MB/s → CPU-bound

  19. Evaluation • Resource utilization • WordCount: splits the input text into words, shuffles every word in the map phase, and counts its occurrences in the reduce phase • 8 × (64 MB + 64 MB + 64 MB + SID) / 35 s ≈ 43.9 MB/s ≥ 31.2 MB/s → I/O-bound (the same test is written out once in the sketch below)
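All three slides above instantiate the same rule: with 8 map slots per node, a job counts as I/O-bound when slots × (MB moved per map task) / (map-task duration) reaches the measured DIOR of 31.2 MB/s. A minimal sketch reproducing that arithmetic; SID is treated as negligible here, an assumption, since the slides do not give its value.

```python
# The I/O-bound test the three evaluation slides apply, written out once.
# Assumption: SID (shuffle-in data per task) is treated as ~0 MB because
# the slides do not give its value.
DIOR_MBPS = 31.2   # measured disk I/O rate, MB/s
MAP_SLOTS = 8      # map slots per node

def io_bound(per_task_mb, task_seconds):
    # Aggregate map-phase I/O rate across all slots vs. the disk's rate.
    return MAP_SLOTS * per_task_mb / task_seconds >= DIOR_MBPS

print(io_bound(64 + 64, 8))        # TeraSort:   128.0 MB/s -> True  (I/O-bound)
print(io_bound(64 + 1 + 1, 92))    # Grep-Count: ~5.7 MB/s  -> False (CPU-bound)
print(io_bound(64 + 64 + 64, 35))  # WordCount:  ~43.9 MB/s -> True  (I/O-bound)
```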

  20. Evaluation • Triple-queue scheduler experiments • Each job runs five times, for a total of 15 jobs

  21. Conclusion • The scheduler correctly distributes jobs into the right queues in most situations. • The triple-queue scheduler can • increase map-task throughput by up to 30% • reduce makespan by up to 20%

  22. Thank you for listening.
