Satisfying Strong Application Requirements in Data-Intensive Clouds

Ph.D. Final Exam, Brian Cho. Motivating scenario: using the data-intensive cloud. Researchers contract with a defense agency to investigate ongoing suspicious activity, e.g., a botnet attack or a worm.


Presentation Transcript


  1. Satisfying Strong Application Requirements in Data-Intensive Clouds • Ph.D. Final Exam • Brian Cho

  2. Motivating scenario: Using the data-intensive cloud • Researchers contract with a defense agency to investigate ongoing suspicious activity • e.g., botnet attack, worm, etc. • Other applications: processing click logs, news items, etc. • Transfer large logs (TBs to PBs) from possible victim sites • Run computations on the logs to find vulnerabilities and the source of the attack • Store data

  3. Can today’s data-intensive cloud meet these demands? • The researchers require: • Control over the time and $ cost of transfer, to stay within the contracted budget and time • Prioritization of this time-sensitive job over other jobs in its cluster • Consistent updates and reads at the data store • Current limitation: systems are built to optimize key metrics at large scale, but not to meet these strong user requirements

  4. Strong user requirements • Many real-world requirements are too important to relax • Time • $$$ • Priority • Data consistency • It is essential to treat these strong requirements as problem constraints • … not just as side effects of resource limitations in the cloud

  5. Thesis statement • It is feasible to satisfy strong application requirements for data-intensive cloud computing environments, in spite of resource limitations, while simultaneously optimizing run-time metrics. • Strong application requirements: real-time deadlines, dollar budgets, data consistency, etc. • Resource limitations: finite compute nodes, limited bandwidth, high latency, frequent failures, etc. • Run-time metrics: throughput, latency, $ cost, etc.

  6. Contributions: Practical solutions for • Bulk data transfer (Pandora) • Computation (Natjam) • Key-value storage (Vivace)

  7. Pandora-A: Bulk Data Transfer via Internet and Shipping Networks • Minimize $ cost subject to a time deadline • Transfer options • Internet links with proportional costs but limited bandwidth • Shipping links with fixed costs and shipping times that depend on the method (e.g., ground, air) • Solution • Transform into a time-expanded network • Solve min-cost flow on the network • Trace-driven experiments • Pandora-A solutions outperform direct Internet transfer or direct shipping alone
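The Pandora-A recipe (build a time-expanded network, then solve min-cost flow on it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pandora implementation: time is discretized into steps, there is one source and one sink, and shipping is linearized into a per-GB price, whereas the slide describes shipping links with fixed costs (a per-package fixed charge that a plain min-cost flow cannot express directly). The names `MinCostFlow`, `pandora_a`, and the link dictionaries are hypothetical.

```python
from collections import defaultdict

INF = float("inf")

class MinCostFlow:
    """Successive shortest paths with Bellman-Ford; adequate for small graphs."""

    def __init__(self):
        self.adj = defaultdict(list)      # node -> indices of outgoing residual edges
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index; its residual twin sits at index ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t, demand):
        """Cheapest way to send `demand` units from s to t, or None if infeasible."""
        flow = total = 0
        while flow < demand:
            dist, prev = {s: 0}, {}
            changed = True
            while changed:                # Bellman-Ford relaxation to a fixed point
                changed = False
                for u in list(dist):
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist.get(v, INF):
                            dist[v], prev[v] = dist[u] + self.cost[e], e
                            changed = True
            if t not in dist:
                return None               # the remaining data cannot arrive in time
            push, v = demand - flow, t    # bottleneck capacity along the path
            while v != s:
                push = min(push, self.cap[prev[v]])
                v = self.to[prev[v] ^ 1]
            v = t
            while v != s:
                e = prev[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total += push * dist[t]
        return total

def pandora_a(demand_gb, deadline, internet, shipping, src, dst):
    """Minimum $ cost to move demand_gb from src to dst within `deadline` steps.

    internet: {(u, v): (gb_per_step, dollars_per_gb)}
    shipping: {(u, v): (steps_in_transit, dollars_per_gb)}  # linearized cost
    """
    sites = {site for link in (*internet, *shipping) for site in link}
    net = MinCostFlow()
    for t in range(deadline):
        for site in sites:                # holdover edge: data may wait at a site
            net.add_edge((site, t), (site, t + 1), INF, 0)
        for (u, v), (bw, price) in internet.items():
            net.add_edge((u, t), (v, t + 1), bw, price)
        for (u, v), (transit, price) in shipping.items():
            if t + transit <= deadline:   # a package must arrive by the deadline
                net.add_edge((u, t), (v, t + transit), INF, price)
    return net.solve((src, 0), (dst, deadline), demand_gb)

links = {("src", "dst"): (10, 1)}         # 10 GB per step at $1/GB
ship = {("src", "dst"): (2, 3)}           # 2 steps in transit at $3/GB
print(pandora_a(100, 4, links, ship, "src", "dst"))  # 220
```

In this toy instance the Internet link carries 40 GB over the 4 steps at $1/GB and the remaining 60 GB is shipped at $3/GB, so the cheapest plan mixes both options, which is the effect the slide reports for Pandora-A.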

  8. Pandora-B: Bulk Data Transfer via Internet and Shipping Networks • Minimize transfer time subject to a $ budget • Bounded binary search on Pandora-A solutions • Bounds created by transforming time-expanded networks (Figure: dollar cost ($) vs. transfer time T (hrs), with budget B and lower/upper bounds LB and UB.)
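Pandora-B's bounded binary search can be sketched on top of any Pandora-A solver. The sketch assumes that the minimum cost never increases as the deadline grows (a later deadline still admits every earlier plan) and that the lower and upper bounds on transfer time are already known; how Pandora derives those bounds by transforming time-expanded networks is not shown. `pandora_b` and the toy cost function are hypothetical.

```python
from typing import Callable, Optional

def pandora_b(min_cost: Callable[[int], Optional[float]],
              budget: float, lo: int, hi: int) -> Optional[int]:
    """Smallest deadline T in [lo, hi] whose min-cost plan fits the budget.

    min_cost(T) returns the Pandora-A cost for deadline T, or None if the
    transfer cannot finish by T. Assumed non-increasing in T."""
    def fits(t: int) -> bool:
        cost = min_cost(t)
        return cost is not None and cost <= budget

    if not fits(hi):
        return None               # even the upper bound is unaffordable
    while lo < hi:                # invariant: fits(hi) is True
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Toy stand-in for a Pandora-A solver: infeasible below 2 steps, cheaper with time.
toy_cost = lambda t: None if t < 2 else max(100, 400 - 40 * t)
print(pandora_b(toy_cost, budget=200, lo=1, hi=20))  # 5
```

Each probe costs one min-cost flow solve, so the search needs only about log2(UB - LB) Pandora-A runs; tighter bounds directly reduce that count.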

  9. Vivace: Consistent data for congested geo-distributed systems • Strongly consistent key-value store • Low latency across geo-distributed data centers • Under congestion • New algorithms • Prioritize a small amount of critical information • To avoid delay due to congestion • Evaluated using a practical prioritization infrastructure
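The slides do not spell out Vivace's algorithms, so the sketch below only illustrates the underlying principle: on a congested link, a small critical message queued behind bulk data waits for the bulk data to drain, while strict prioritization lets it through almost immediately. The message names, sizes, the `critical` flag, and the 100 Mbps link are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Message:
    name: str
    size_bytes: int
    critical: bool  # hypothetical flag: small, consistency-critical metadata

def finish_times(queue, link_bps, prioritize):
    """Seconds until each queued message has crossed one congested link.

    prioritize=True sends critical messages before bulk ones (strict priority,
    FIFO within a class); prioritize=False is plain FIFO."""
    # sorted() is stable and False < True, so `not m.critical` puts critical first.
    order = sorted(queue, key=lambda m: not m.critical) if prioritize else queue
    clock, done = 0.0, {}
    for m in order:
        clock += m.size_bytes * 8 / link_bps
        done[m.name] = clock
    return done

queue = [Message("value-1", 10_000_000, False),
         Message("value-2", 10_000_000, False),
         Message("meta", 100, True)]
fifo = finish_times(queue, 100e6, prioritize=False)
prio = finish_times(queue, 100e6, prioritize=True)
print(f"{fifo['meta']:.6f} s vs {prio['meta']:.6f} s")  # 1.600008 s vs 0.000008 s
```

Because the critical message is tiny, moving it to the front barely delays the bulk values (the last one finishes at the same total time), which is why prioritizing only a small amount of information is cheap.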

  10. Natjam: Prioritizing production jobs in MapReduce/Hadoop • Mixed workloads • Production jobs • Time-sensitive • Directly affect revenue • Research jobs • e.g., long-term analysis • Example: ad provider (Figure: production pipeline Ad click-through logs → Count clicks → Update ads, where slow counts → old ads shown → don’t get paid $$$; research question “Is there a better way to place ads?” → Run machine learning analysis over lots of historical logs, which needs a large cluster; hence, prioritize production jobs.)

  11. Contributions • Natjam prioritizes production jobs • While giving research jobs spare capacity • Suspend/Resume tasks in research jobs • Production jobs can gain resources immediately • Research jobs can use many resources at a time, without wasting work • Develop eviction policies that choose which tasks to suspend

  12. Natjam Outline • Motivation • Contributions • Background: MapReduce/Hadoop • State-of-the-art • Solution: Suspend/Resume • Design • Evaluation

  13. Background: MapReduce/Hadoop • Distributed computation on a large cluster • Each job consists of Map and Reduce tasks • Job stages • Map tasks run computations in parallel • Shuffle combines intermediate Map outputs • Reduce tasks run computations in parallel (Figure: Map (M) tasks feeding Reduce (R) tasks.)

  14. Background: MapReduce/Hadoop • Distributed computation on a large cluster • Each job consists of Map and Reduce tasks • Job stages • Map tasks run computations in parallel • Shuffle combines intermediate Map outputs • Reduce tasks run computations in parallel • Map input/Reduce output stored in a distributed file system (e.g., HDFS) • Scheduling: which task to run on empty resources (slots) (Figure: Map (M) and Reduce (R) tasks of Jobs 1, 2, and 3 sharing the cluster’s slots.)
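The three job stages can be illustrated with a sequential, in-memory stand-in for a Hadoop job that counts ad clicks, as in the ad-provider example. This is only a sketch: the `user,ad_id` record format and all function names are hypothetical, and real Hadoop runs the Map and Reduce tasks in parallel on separate slots or containers, reading input from and writing output to HDFS.

```python
from collections import defaultdict
from itertools import chain

def map_task(log_lines):
    # Map: emit (ad_id, 1) for each click record of the form "user,ad_id".
    return [(line.split(",")[1], 1) for line in log_lines]

def shuffle(map_outputs):
    # Shuffle: group the intermediate pairs from all Map tasks by key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_outputs):
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    # Reduce: combine all values for one key.
    return key, sum(values)

def run_job(input_splits):
    map_outputs = [map_task(split) for split in input_splits]  # parallel in Hadoop
    groups = shuffle(map_outputs)
    return dict(reduce_task(k, v) for k, v in sorted(groups.items()))

print(run_job([["u1,adA", "u2,adB"], ["u3,adA"]]))  # {'adA': 2, 'adB': 1}
```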

  15. State-of-the-art: Separate clusters • Submit production jobs to a production cluster • Submit research jobs to a research cluster

  16. State-of-the-art: Separate clusters • Submit production jobs to a production cluster • Submit research jobs to a research cluster • Trace of job submissions to the Yahoo production cluster • Periods of under-utilization, where research jobs could potentially fill in (Figure: number of Reduce slots in use (0 to 10,000) vs. time (hours:mins) over one hour, against the Reduce slot capacity, showing under-utilization. Plot used with permission from Yahoo.)

  17. State-of-the-art: Single-cluster Hadoop scheduling • Ideally, • Enough capacity for production jobs • Run research tasks on all idle production slots • But, • Killing tasks (e.g., Fair Scheduler) can lead to wasted work (Figure: the same Reduce-slot trace, highlighting wasted work. Plot used with permission from Yahoo.)

  18. State-of-the-art: Single-cluster Hadoop scheduling • Ideally, • Enough capacity for production jobs • Run research tasks on all idle production slots • But, • Killing tasks (e.g., Fair Scheduler) can lead to wasted work • No preemption (e.g., Capacity Scheduler) can lead to production jobs waiting for resources (Figure: the same Reduce-slot trace, highlighting periods where production jobs aren’t assigned resources. Plot used with permission from Yahoo.)

  19. Approach: Suspend/Resume • Suspend/Resume tasks within and across research jobs • Production jobs can gain resources immediately • Research jobs can use many resources at a time, without wasting work • Focus on Reduce tasks • Reduce tasks take longer, so there is more work to lose (median Map 19 seconds vs. Reduce 231 seconds [Facebook]) (Figure: the same Reduce-slot trace. Plot used with permission from Yahoo.)

  20. Goals: Prioritize production jobs • Requirement: Production jobs should have the same completion time as if they were executed in an exclusive production cluster • Possibly with a small overhead • Optimization: Research jobs should have the shortest completion time possible • Constraint: Finite cluster resources

  21. Challenges • Avoid Suspend overhead • It would make production jobs wait for resources • Avoid Resume overhead • It would delay research jobs from making progress • Optimize task evictions • Job completion time is the metric that users care about • Develop eviction policies that have the least impact on job completion times

  22. Natjam Design • Motivation • Contributions • Background: MapReduce/Hadoop • State-of-the-art • Solution: Suspend/Resume • Design (Scheduler: Hadoop → Natjam; Architecture: Hadoop → Natjam; Suspend/Resume tasks; Eviction policies: Task, Job) • Evaluation

  23. Background: Capacity Scheduler • Limitation: research jobs cannot scale down • Hadoop capacity shared using queues • Guaranteed capacity (G) • Maximum capacity (M)

  24. Background: Capacity Scheduler • Limitation: research jobs cannot scale down • Hadoop capacity shared using queues • Guaranteed capacity (G) • Maximum capacity (M) • Example • Production (P) queue: G 80%/M 80% • Research (R) queue: G 20%/M 40% • Production job submitted first: P takes 80%, and R can grow only to its 40% maximum, so capacity can sit idle (under-utilization) • Research job submitted first: R grows to 40% and the rest of the cluster sits idle (under-utilization); when P arrives, it cannot grow beyond 60%, because R cannot scale down

  25. Natjam Scheduler • Does not require Maximum capacity • Scales down research jobs

  26. Natjam Scheduler • Does not require Maximum capacity • Scales down research jobs • P/R Guaranteed: 80%/20% • P/R Guaranteed: 100%/0% (Figure: with 80%/20%, R takes 100% until P arrives, then P takes 80%; with 100%/0%, R takes 100% until P arrives, then P takes 100%. Production jobs are prioritized.)
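The difference between slides 24 and 26 can be reproduced with a toy allocator that works in percentages of the cluster, with one job per queue. This is a simplified sketch, not Natjam's scheduler code: `admit` and its parameters are hypothetical, and real schedulers allocate containers, not percentages.

```python
def admit(alloc, queue, demand, guaranteed, maximum=None, preempt=False):
    """Return a new allocation (percent of cluster per queue) after `queue`
    asks for `demand` percent.

    preempt=False (Capacity Scheduler style): take idle capacity only, capped
    by the queue's Maximum capacity; running tasks are never reclaimed.
    preempt=True (Natjam style): no Maximum cap; if idle capacity falls short of
    the queue's Guaranteed share, suspend tasks of queues running above their
    own Guaranteed share."""
    alloc = dict(alloc)
    limit = demand if maximum is None else min(demand, maximum[queue])
    take = min(limit, 100 - sum(alloc.values()))
    if preempt:
        want = min(demand, guaranteed[queue])
        for other in alloc:
            if other == queue or take >= want:
                continue
            reclaim = min(max(alloc[other] - guaranteed[other], 0), want - take)
            alloc[other] -= reclaim       # Natjam suspends (not kills) these tasks
            take += reclaim
    alloc[queue] = alloc.get(queue, 0) + take
    return alloc

G, M = {"P": 80, "R": 20}, {"P": 80, "R": 40}
r_first = admit({}, "R", 100, G, M)               # Capacity Scheduler: R capped at 40%
print(admit(r_first, "P", 100, G, M))             # {'R': 40, 'P': 60}
r_alone = admit({}, "R", 100, G, preempt=True)    # Natjam: R fills the idle cluster
print(admit(r_alone, "P", 100, G, preempt=True))  # {'R': 20, 'P': 80}
```

Dropping the Maximum cap is what removes the under-utilization, and it is only safe because the research queue can be scaled back down when production work arrives.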

  27. Background: Hadoop YARN architecture • Resource Manager • Application Master per application • Tasks are launched on containers of memory • Formerly, slots in Hadoop (Figure: Application Masters 1 and 2 ask the Resource Manager’s Capacity Scheduler for containers; Node Managers A and B host the tasks of App1 and App2 and an empty container.)

  28. Suspend/Resume architecture • Preemptor • Decides when resources should be reclaimed from queues • Chooses victim job • Releaser • Chooses task to evict • Local Suspender • Saves state • Promptly exits • Messaging overheads (Figure: the Preemptor in the Resource Manager’s Capacity Scheduler (preempt()) tells an Application Master the number of containers to release; each Application Master has a Releaser (release(), resume()); Node Managers A and B run Local Suspenders that suspend tasks and keep their saved state, freeing the container.)

  29. Suspending and Resuming Tasks • When suspending, we must save enough state to resume the task later • By reusing existing intermediate data, we save only a small amount of state • Simple • Low overhead

  30. Suspending and Resuming Tasks • Existing intermediate data used • Reduce inputs, stored at the local host • Reduce outputs, stored on HDFS • Suspend state saved • Key counter • Reduce input path • Hostname • List of suspended task attempt IDs (Figure: suspended Task Attempt 1 (tmp/task_att_1) frees its container and saves its key counter, with its output so far in outdir/ on HDFS; resumed Task Attempt 2 (tmp/task_att_2) uses the key counter to skip inputs that were already processed.)
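The key-counter idea can be sketched as follows: a suspended Reduce attempt records how many key groups it has finished, and the resumed attempt skips that many groups, relying on the output already written by the earlier attempt. This is an illustrative sketch, not Natjam's code: `run_reduce`, `SuspendState`, the `suspend_at` trigger, the default path and hostname, and the list standing in for the HDFS output directory are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class SuspendState:
    key_counter: int                  # Reduce input key groups already processed
    input_path: str                   # local directory holding the Reduce inputs
    hostname: str                     # host where those inputs live
    suspended_attempts: List[str] = field(default_factory=list)

def run_reduce(attempt_id: str,
               sorted_input: List[Tuple[str, list]],
               output: list,
               reduce_fn: Callable,
               state: Optional[SuspendState] = None,
               suspend_at: Optional[int] = None,
               input_path: str = "tmp/input",
               hostname: str = "node-a") -> Optional[SuspendState]:
    """Run (or resume) a Reduce task attempt over key-sorted input groups.

    `output` stands in for the task's output directory on HDFS, which keeps
    what earlier attempts wrote. Returns a SuspendState if asked to suspend at
    key index `suspend_at`, or None when all keys are done."""
    start = state.key_counter if state else 0
    for i, (key, values) in enumerate(sorted_input):
        if i < start:
            continue                  # already reduced by a suspended attempt
        if i == suspend_at:
            earlier = state.suspended_attempts if state else []
            return SuspendState(i, input_path, hostname, earlier + [attempt_id])
        output.append(reduce_fn(key, values))
    return None

data = [("a", [1, 1]), ("b", [2]), ("c", [3, 4])]
out = []
total = lambda k, vs: (k, sum(vs))
saved = run_reduce("att_1", data, out, total, suspend_at=2)  # suspended after 2 keys
run_reduce("att_2", data, out, total, state=saved)           # resumes at key "c"
print(out)  # [('a', 2), ('b', 2), ('c', 7)]
```

The saved state stays a few fields long no matter how much data the task has processed, because the inputs and partial outputs already exist on disk.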

  31. Two-level Eviction Policies • Job-level eviction • Chooses victim job • Task-level eviction • Chooses task to evict (Figure: the Suspend/Resume architecture, with the Preemptor in the Resource Manager choosing the victim job and the Releasers in the Application Masters choosing which tasks to release.)

  32. Task eviction policies • Based on time remaining • The last task to finish decides the job completion time • A task that finishes earlier releases its container earlier • The Application Master keeps track of time remaining • Shortest Remaining Time (SRT) • Pro: Shortens the tail • Con: Holds on to containers that would be released soon • Longest Remaining Time (LRT) • Con: May lengthen the tail • Pro: Releases containers as soon as possible
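A task-level policy reduces to ranking the victim job's running tasks by the Application Master's remaining-time estimate. The sketch assumes that the policy name refers to the tasks selected for suspension (SRT suspends the shortest-remaining tasks, LRT the longest); `RunningTask`, `tasks_to_suspend`, and the sample times are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RunningTask:
    task_id: str
    remaining_s: float   # Application Master's estimate of time remaining

def tasks_to_suspend(tasks: List[RunningTask], n: int, policy: str = "SRT") -> List[str]:
    """Choose n tasks of the victim job to suspend.

    SRT picks the tasks with the shortest remaining time, LRT the longest."""
    if policy not in ("SRT", "LRT"):
        raise ValueError(f"unknown policy {policy!r}")
    ranked = sorted(tasks, key=lambda t: t.remaining_s, reverse=(policy == "LRT"))
    return [t.task_id for t in ranked[:n]]

running = [RunningTask("r1", 30.0), RunningTask("r2", 5.0), RunningTask("r3", 90.0)]
print(tasks_to_suspend(running, 2, "SRT"))  # ['r2', 'r1']
print(tasks_to_suspend(running, 1, "LRT"))  # ['r3']
```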

  33. Job eviction policies • Based on the amount of resources (e.g., memory) held by each job • The Resource Manager holds resource information • Least Resources (LR) • Pro: Large jobs benefit • Con: Starvation, even with small production jobs • Most Resources (MR) • Pro: Small jobs benefit • Con: Large jobs may be delayed for a long time • Probabilistically-weighted on Resources (PR) • Pro: Avoids biasing tasks: the chance of eviction for a task is the same across all jobs, assuming a random task eviction policy • Con: Many jobs may be delayed
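The three job-level policies can be sketched as a choice over the resources each research job holds. For PR, if job j holds r_j of R containers (one task each) and a task is then picked uniformly within the job, every task is evicted with probability (r_j / R) · (1 / r_j) = 1 / R, which is the "same chance across all jobs" property on the slide. `victim_job` and the job names are hypothetical.

```python
import random
from typing import Dict

def victim_job(held: Dict[str, int], policy: str, rng=random) -> str:
    """Pick the research job to evict from.

    held maps job id -> resources it holds (e.g., containers or GB of memory)."""
    if policy == "LR":
        return min(held, key=held.get)   # Least Resources
    if policy == "MR":
        return max(held, key=held.get)   # Most Resources
    if policy == "PR":                   # probability proportional to resources
        jobs = list(held)
        return rng.choices(jobs, weights=[held[j] for j in jobs], k=1)[0]
    raise ValueError(f"unknown policy {policy!r}")

held = {"job-a": 2, "job-b": 10, "job-c": 5}
print(victim_job(held, "LR"), victim_job(held, "MR"))  # job-a job-b
print(victim_job(held, "PR", random.Random(42)))       # job-b with probability 10/17
```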

  34. Evaluation • Microbenchmarks • Trace-driven experiments • Natjam was implemented based on Hadoop 0.23 (YARN) • 7-node cluster in CCT

  35. Microbenchmarks: Setup • Average completion times on an empty cluster • Research job: ~200 s • Production job: ~70 s • Job sizes: XL (100% of cluster), L (75%), M (50%), S (25%) • Task workloads within a job chosen uniformly from the range (1/2 of the largest task, largest task]
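The setup's workload rule can be written as a small generator. The sketch assumes that a job's size is the fraction of cluster containers its tasks occupy (one task per container); the container count, the largest-task value and its units, and the function names are hypothetical.

```python
import random

# Job sizes from the setup slide, as a fraction of the cluster.
JOB_SIZES = {"XL": 1.00, "L": 0.75, "M": 0.50, "S": 0.25}

def num_tasks(size: str, cluster_containers: int) -> int:
    """Tasks for a job of the given size, assuming one task per container."""
    return max(1, round(JOB_SIZES[size] * cluster_containers))

def task_workloads(n: int, largest: float, rng=random) -> list:
    """n task workloads drawn uniformly from (largest / 2, largest]."""
    # rng.random() lies in [0, 1), so largest - r * largest / 2 lies in
    # (largest / 2, largest], up to floating-point rounding.
    return [largest - rng.random() * largest / 2 for _ in range(n)]

rng = random.Random(7)
work = task_workloads(num_tasks("L", 40), largest=60.0, rng=rng)
print(len(work), min(work) > 30.0, max(work) <= 60.0)  # 30 True True
```

Subtracting from the largest task, rather than calling `random.uniform`, keeps the interval open at 1/2 of the largest task and closed at the largest task, matching the slide's (1/2 of largest task, largest task].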

  36. Microbenchmark: Comparing Natjam to other techniques (Figure: job completion times in seconds for a Research-XL job submitted at t=0s and a Production-S job submitted at t=50s, under each technique; annotations: 7% more than ideal, 40% less than Soft cap, 50% more than ideal, 90% more than ideal, 20% more than ideal, 2% more than ideal, 15% less than Killing.)

  37. Microbenchmark: Suspend overhead • 1.25 s increase due to messaging delays • Task assignments happen in parallel: 4.7 s increase in job completion time is • Assign Application Master • Assign Map tasks • Assign Reduce tasks (Figure annotation: 1.25 s (50%) increase.)

  38. Microbenchmark: Task eviction policies (Figure: job completion times in seconds for a Research-XL job submitted at t=0s and a Production-S job submitted at t=50s; annotation: 17% less than Random.) • Theorem 1: When production tasks are the same length, SRT results in the shortest job completion time.

  39. Microbenchmark: Job eviction policies (Figure: job completion times in seconds for Research-L and Research-S jobs submitted at t=0s and a Production-S job submitted at t=50s; annotation: Most Resources + SRT = good fit.) • Theorem 2: When tasks within each job are the same length, evicting from the minimum number of jobs results in the shortest average job completion time.

  40. Trace-driven evaluation • Yahoo trace: scaled production cluster workload + scaled research cluster workload • Job completion times

  41. Trace-driven evaluation: Research jobs only (Figure annotation: 115 seconds.)

  42. Trace-driven evaluation: CDF of differences (negative is good)

  43. Related Work • Single cluster job scheduling has focused on: • Locality of Map tasks [Quincy, Delay Scheduling] • Speculative execution [LATE Scheduler] • Average fairness between queues [Capacity Scheduler, Fair Scheduler] • Recent work: Elastic queues [Amoeba] • We solve the requirement of prioritizing production jobs

  44. Natjam summary • Natjam prioritizes production jobs • Suspend/Resume tasks in research jobs • Eviction policies that choose which tasks to suspend • Evaluation • Microbenchmarks • Trace-driven experiments

  45. Conclusion • Thesis: It is feasible to satisfy strong application requirements for data-intensive cloud computing environments, in spite of resource limitations, while simultaneously optimizing run-time metrics. • Contributions: Solutions that reinforce this statement in diverse data-intensive cloud settings.
