Lecture 29: Distributed Systems

Presentation Transcript


  1. Lecture 29: Distributed Systems CS 105 May 8, 2019

  2. Conventional HPC System • Compute nodes • High-end processors • Lots of RAM • Network • Specialized • Very high performance • Storage server • RAID disk array

  3. Conventional HPC Programming Model • Programs described at a very low level • detailed control of processing and scheduling • Rely on a small number of software packages • written by specialists • limits the problems and solution methods that can be used

  4. Typical HPC Operation • Characteristics: • long-lived processes • make use of spatial locality • hold all data in memory • high-bandwidth communication • Strengths • High utilization of resources • Effective for many scientific applications • Weaknesses • Requires careful tuning of applications to resources • Intolerant of any variability

  5. HPC Fault Tolerance • Checkpoint • Periodically store state of all processes • Significant I/O traffic • Restore after failure • Reset state to last checkpoint • All intervening computation wasted • Performance scaling • Very sensitive to the number of failing components
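
  A rough back-of-the-envelope sketch of that sensitivity (the model and the numbers below are illustrative assumptions, not from the lecture): with N nodes the whole system fails roughly N times as often as a single node, and each failure throws away about half a checkpoint interval plus the restore time.

      object CheckpointCost {
        // Approximate fraction of machine time lost to failures and recovery,
        // assuming independent node failures and periodic global checkpoints.
        def wastedFraction(nodes: Int, mtbfNodeHours: Double,
                           checkpointIntervalHours: Double, restoreHours: Double): Double = {
          val systemMtbfHours = mtbfNodeHours / nodes                       // failures scale with node count
          val lostPerFailure  = checkpointIntervalHours / 2 + restoreHours  // redone work + recovery time
          lostPerFailure / systemMtbfHours
        }

        def main(args: Array[String]): Unit = {
          // e.g. 1000 nodes, 5-year MTBF per node, hourly checkpoints, 6-minute restore
          println(wastedFraction(1000, 5 * 365 * 24, 1.0, 0.1))   // ≈ 0.014, growing linearly with N
        }
      }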

  6. Datacenters

  7. Ideal Cluster Programming Model • Applications written in terms of high-level operations on the data • Runtime system controls scheduling, load balancing

  8. MapReduce

  9. MapReduce Programming Model • Map computation across many data objects • Aggregate results in many different ways • System deals with resource allocation and availability

  10. Example: Word Count • In parallel, each worker computes word counts from individual files • Collect results, wait until all finished • Merge intermediate output • Compute word count on merged intermediates

  11. Parallel Map • Process pieces of the dataset to generate (key, value) pairs in parallel • (figure: Map Task 1 maps "Welcome everyone" to (Welcome, 1), (everyone, 1); Map Task 2 maps "Hello everyone" to (Hello, 1), (everyone, 1))

  12. Reduce • Merge all intermediate values per key • (figure: the intermediate pairs (Welcome, 1), (everyone, 1), (Hello, 1), (everyone, 1) are merged into (everyone, 2), (Welcome, 1), (Hello, 1))

  13. Partition • Merge all intermediate values in parallel: partition keys, assign each key to one reduce task
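
  To make slides 10-13 concrete, here is a minimal single-machine sketch of the same pipeline in plain Scala collections; it illustrates the model, not the MapReduce API itself, and names like numReduceTasks are invented for the example.

      object WordCountPipelineSketch {
        // Map phase: each "map task" turns its chunk of input into (key, value) pairs
        def mapTask(chunk: String): Seq[(String, Int)] =
          chunk.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

        // Partition: assign every key to exactly one reduce task (here by hashing)
        def partition(key: String, numReduceTasks: Int): Int =
          math.abs(key.hashCode) % numReduceTasks

        // Reduce phase: merge all intermediate values that share a key
        def reduceTask(pairs: Seq[(String, Int)]): Map[String, Int] =
          pairs.groupBy(_._1).map { case (word, ones) => (word, ones.map(_._2).sum) }

        def main(args: Array[String]): Unit = {
          val chunks = Seq("Welcome everyone", "Hello everyone")      // two "map tasks"
          val numReduceTasks = 2
          val intermediate = chunks.flatMap(mapTask)                  // map (conceptually in parallel)
          val byReducer = intermediate.groupBy { case (k, _) => partition(k, numReduceTasks) }
          val counts = byReducer.values.map(reduceTask).reduce(_ ++ _)
          println(counts)   // counts: everyone -> 2, Welcome -> 1, Hello -> 1
        }
      }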

  14. MapReduce API

  15. WordCount with MapReduce

  16. WordCount with MapReduce
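
  The code for slides 14-16 is not in the transcript. As a rough sketch of the shape of the interface, assuming the two-function model from the MapReduce paper (the trait below is illustrative, not the actual Google or Hadoop API), word count only has to supply a map and a reduce:

      // Illustrative shape of the programming model: the user writes map and reduce,
      // the runtime handles partitioning, scheduling, and fault tolerance.
      trait MapReduceJob[K1, V1, K2, V2] {
        def map(key: K1, value: V1): Seq[(K2, V2)]       // emit intermediate (key, value) pairs
        def reduce(key: K2, values: Seq[V2]): Seq[V2]    // merge all values for one key
      }

      // Word count against that shape: map emits (word, 1); reduce sums the ones
      class WordCount extends MapReduceJob[String, String, String, Int] {
        def map(filename: String, contents: String): Seq[(String, Int)] =
          contents.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq
        def reduce(word: String, counts: Seq[Int]): Seq[Int] =
          Seq(counts.sum)
      }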

  17. MapReduce Execution

  18. Fault Tolerance in MapReduce • Map worker writes intermediate output to local disk, separated by partition; once completed, it tells the master node • Reduce worker is told the locations of the map task outputs, pulls its partition's data from each mapper, and executes the reduce function across that data • Note: • "All-to-all" shuffle between mappers and reducers • Written to disk ("materialized") before each stage

  19. Fault Tolerance in MapReduce • Master node monitors state of system • If the master fails, the job aborts • Map worker failure • In-progress and completed tasks are marked as idle • Reduce workers are notified when a map task is re-executed on another map worker • Reduce worker failure • In-progress tasks are reset and re-executed • Completed tasks had already been written to the global file system

  20. Stragglers • A straggler is a task that takes a long time to execute • Bugs, flaky hardware, poor partitioning • For slow map tasks, execute in parallel on a second "map" worker as a backup, racing to complete the task • When done with most tasks, reschedule any remaining executing tasks • Keep track of redundant executions • Significantly reduces overall run time
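
  The idea of racing a backup copy of a slow task can be sketched with ordinary Scala futures; this is an illustration of the idea only, not how Hadoop or Spark actually implement speculative execution.

      import scala.concurrent.{Await, Future}
      import scala.concurrent.duration._
      import scala.concurrent.ExecutionContext.Implicits.global

      object BackupTaskSketch {
        // runTask stands in for one map task; the flag simulates a straggler
        def runTask(slow: Boolean): Int = {
          Thread.sleep(if (slow) 5000 else 100)   // bug, flaky hardware, bad partitioning, ...
          42                                      // the task's (deterministic) result
        }

        def main(args: Array[String]): Unit = {
          // Launch the original attempt and a backup copy; keep whichever finishes first.
          // Because the task is deterministic, redundant executions return the same answer.
          val attempts = Seq(Future(runTask(slow = true)), Future(runTask(slow = false)))
          val result   = Await.result(Future.firstCompletedOf(attempts), 10.seconds)
          println(s"result = $result")
        }
      }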

  21. Modern Data Processing

  22. Apache Spark • Goal 1: Extend the MapReduce model to better support two common classes of analytics apps • Iterative algorithms (machine learning, graphs) • Interactive data mining • Goal 2: Enhance programmability • Integrate into Scala programming language • Allow interactive use from Scala interpreter • Also support for Java, Python...

  23. Data Flow Models • Most current cluster programming models are based on acyclic data flow from stable storage to stable storage • Example: MapReduce • These models are inefficient for applications that repeatedly reuse a working set of data • Iterative algorithms (machine learning, graphs) • Interactive data mining (R, Excel, Python)

  24. Resilient Distributed Datasets (RDDs) • Resilient distributed datasets (RDDs) are immutable, partitioned collections of objects spread across a cluster, stored in RAM or on disk • Created through parallel transformations (map, filter, groupBy, join, ...) on data in stable storage • Allow apps to cache working sets in memory for efficient reuse • Retain the attractive properties of MapReduce • Fault tolerance, data locality, scalability • Actions on RDDs support many applications • Count, reduce, collect, save...
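
  A small hedged sketch of what this looks like in code (the log path and the ERROR filter are invented for the example, not from the slides):

      import org.apache.spark.{SparkConf, SparkContext}

      object RddSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("rdd-sketch").setMaster("local[*]"))

          // Build RDDs with parallel transformations on data in stable storage
          val lines  = sc.textFile("hdfs://example/logs")       // hypothetical path
          val errors = lines.filter(_.contains("ERROR"))

          // cache() keeps the working set in memory so later actions can reuse it
          errors.cache()

          // Actions compute results and return them to the driver (or save them)
          val total     = errors.count()
          val byService = errors.map(line => (line.split("\t")(0), 1)).reduceByKey(_ + _)
          byService.collect().foreach(println)

          sc.stop()
        }
      }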

  25. Spark Operations • Transformations: define a new RDD • map, flatMap, filter, sample, groupByKey, sortByKey, union, join, etc. • Actions: return a result to the driver program • collect, reduce, count, lookupKey, save

  26. Example: WordCount
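
  The slide's code is not in the transcript; a typical Spark word count looks roughly like this (the input and output paths are assumptions):

      import org.apache.spark.{SparkConf, SparkContext}

      object SparkWordCount {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("wordcount").setMaster("local[*]"))

          val counts = sc.textFile("input.txt")          // hypothetical input file
            .flatMap(_.split("\\s+"))                    // split lines into words
            .filter(_.nonEmpty)
            .map(word => (word, 1))                      // emit (word, 1) pairs
            .reduceByKey(_ + _)                          // merge counts per word

          counts.saveAsTextFile("wordcount-output")      // hypothetical output directory

          sc.stop()
        }
      }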

  27. Example: Logistic Regression • Goal: find the best line separating two sets of points

      val rdd = spark.textFile(...).map(readPoint)
      val data = rdd.cache()
      var w = Vector.random(D)
      for (i <- 1 to ITERATIONS) {
        val gradient = data.map(p =>
          (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
        ).reduce(_ + _)
        w -= gradient
      }
      println("Final w: " + w)

  (figure: a random initial line vs. the best-fit line separating the two point sets)

  28. Example: Logistic Regression

  29. Spark Scheduler • Creates DAG of stages • Pipelines functions within a stage • Cache-aware work reuse & locality • Partitioning-aware to avoid shuffles
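
  A rough illustration of how stages fall out of the RDD operations (the data and the comments on the output are assumptions, not from the slides):

      import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

      object StageSketch {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(new SparkConf().setAppName("stage-sketch").setMaster("local[*]"))

          val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4)))

          // map and filter are narrow dependencies, so the scheduler pipelines them
          // into a single stage that runs once over each partition
          val scaled = pairs.map { case (k, v) => (k, v * 10) }.filter(_._2 > 10)

          // groupByKey needs a shuffle, so it marks a stage boundary
          val grouped = scaled.groupByKey()

          // pre-partitioning by key lets later key-based operations reuse that
          // partitioning and skip another shuffle (the partitioning-aware point)
          val prePartitioned = pairs.partitionBy(new HashPartitioner(4)).cache()
          val counts = prePartitioned.reduceByKey(_ + _)      // no additional shuffle

          // toDebugString prints the lineage, indented at shuffle boundaries,
          // which is a quick way to see where the stage breaks are
          println(grouped.toDebugString)
          println(counts.toDebugString)

          sc.stop()
        }
      }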

  30. RDD Fault Tolerance • An RDD maintains lineage information that can be used to reconstruct lost partitions

      val rdd = spark.textFile(...).map(readPoint).filter(...)

  (figure: lineage chain File → Mapped RDD → Filtered RDD)
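
  A toy sketch of the idea (illustrative only, not Spark's internals): every derived RDD remembers its parent and the function that produced it, so a lost partition can be rebuilt by re-running that function over the parent's corresponding partition.

      object LineageSketch {
        // Each node in the lineage knows how to (re)compute any of its partitions
        sealed trait ToyRDD[A] { def compute(partition: Int): Seq[A] }

        // Leaf: data read from stable storage (always recoverable)
        case class Source[A](partitions: Vector[Seq[A]]) extends ToyRDD[A] {
          def compute(partition: Int): Seq[A] = partitions(partition)
        }

        // Derived: a parent plus the transformation applied to it
        case class Derived[A, B](parent: ToyRDD[A], f: Seq[A] => Seq[B]) extends ToyRDD[B] {
          // if a cached copy of this partition is lost, recompute it from the parent
          def compute(partition: Int): Seq[B] = f(parent.compute(partition))
        }

        def main(args: Array[String]): Unit = {
          // lineage mirroring the slide's textFile().map().filter(): File -> Mapped -> Filtered
          val file     = Source(Vector(Seq("1 2", "3 4"), Seq("5 6")))
          val mapped   = Derived[String, Int](file, lines => lines.flatMap(_.split(" ").map(_.toInt)))
          val filtered = Derived[Int, Int](mapped, nums => nums.filter(_ % 2 == 0))

          println(filtered.compute(0))   // rebuilds partition 0 from the original file chunk: List(2, 4)
        }
      }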

  31. Distributed Systems Summary • Machines fail: if you have lots of machines, machines will fail frequently • Goals: Reliability, Consistency, Scalability, Transparency • Abstractions are good, as long as they don't cost you too much

  32. So what's the take away…
