
Spark Debugger

Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica. UC Berkeley. Motivation: debugging distributed programs is hard; debuggers for general distributed systems incur high overhead; the Spark model enables debugging with almost zero overhead.


Presentation Transcript


  1. Spark Debugger. Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica, UC Berkeley

  2. Motivation: Debugging distributed programs is hard. Debuggers for general distributed systems incur high overhead. The Spark model enables debugging with almost zero overhead.

  3. Spark Programming Model: Example: find how many Wikipedia articles match a search term. [Diagram: HDFS file -> map(_.split('\t')(3)) -> articles -> filter(_.contains("Berkeley")) -> matches -> count() -> 10,000. The articles and matches datasets are Resilient Distributed Datasets (RDDs), produced by deterministic transformations.]
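The chain on this slide can be mimicked on an ordinary Scala collection to show what each step does. This is a local sketch, not Spark code: a `Seq` stands in for each RDD, and the sample records (four tab-separated fields, with the article text in the fourth, at index 3) are hypothetical. In Spark the input would come from an HDFS file and `count()` would run as a distributed action.

```scala
// Local sketch: plain Scala collections stand in for Spark RDDs.
// Hypothetical sample records: four tab-separated fields,
// with the article text in the fourth field (index 3).
val lines: Seq[String] = Seq(
  "1\tUC Berkeley\ten\tUC Berkeley is a public university",
  "2\tStanford\ten\tStanford is a private university",
  "3\tAMPLab\ten\tThe AMPLab is a lab at Berkeley"
)

// map: extract the article text from each record
val articles: Seq[String] = lines.map(_.split('\t')(3))

// filter: keep the articles that mention the search term
val matches: Seq[String] = articles.filter(_.contains("Berkeley"))

// count: Spark's count() returns a Long, so mirror that here
val numMatches: Long = matches.size.toLong
println(s"$numMatches matching articles")
```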

  4. Debugging a Spark Program: Debug the individual transformations instead of the whole system, by rerunning tasks and recomputing RDDs. Debugging a distributed program then becomes as easy as debugging a single-threaded one. The same approach also applies to MapReduce and Dryad.
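Rerunning a task or recomputing an RDD is possible because each dataset is fully described by its parent and the deterministic function applied to it (its lineage). The sketch below models that idea with hypothetical `Source`, `Mapped`, and `Filtered` types of our own; it is not Spark's RDD class, and it recomputes whole datasets rather than individual partitions.

```scala
// Hypothetical lineage model, not Spark's actual RDD implementation.
// Each dataset is either a source or a transformation of a parent.
sealed trait Dataset[T] {
  def compute(): Seq[T]
}

final case class Source[T](data: Seq[T]) extends Dataset[T] {
  def compute(): Seq[T] = data
}

final case class Mapped[A, B](parent: Dataset[A], f: A => B) extends Dataset[B] {
  // Rebuild the parent from its own lineage, then apply f
  def compute(): Seq[B] = parent.compute().map(f)
}

final case class Filtered[T](parent: Dataset[T], p: T => Boolean) extends Dataset[T] {
  def compute(): Seq[T] = parent.compute().filter(p)
}

val source    = Source(Seq("spark debugger", "event log", "spark rdd"))
val upper     = Mapped(source, (s: String) => s.toUpperCase)
val sparkOnly = Filtered(upper, (s: String) => s.startsWith("SPARK"))

// Re-run only the filter step on its recomputed input
val filterInput = sparkOnly.parent.compute()
val result      = filterInput.filter(sparkOnly.p)
println(result)
```

Since `sparkOnly.parent.compute()` rebuilds the filter's input from lineage, the predicate can be stepped through in an ordinary single-threaded debugger without rerunning the rest of the job.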

  5. Approach: As a Spark program runs, workers report key events (performance stats, exceptions, and RDD checksums) back to the master, which records them in an event log. [Diagram: three workers reporting events to the master, which writes the event log.]
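The event log can be pictured as an append-only sequence of typed events that the master accumulates as workers report them. The event classes and fields below are hypothetical, chosen only to mirror the three categories on the slide (performance stats, exceptions, and RDD checksums); they are not the classes used in the event-log branch.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event types mirroring the slide's three categories
sealed trait Event { def workerId: Int }
final case class PerfStats(workerId: Int, taskId: Int, runTimeMs: Long) extends Event
final case class TaskException(workerId: Int, taskId: Int, message: String) extends Event
final case class RddChecksum(workerId: Int, rddId: Int, partition: Int, checksum: Long) extends Event

// The master appends every reported event in arrival order
final class Master {
  private val log = ArrayBuffer.empty[Event]
  def report(event: Event): Unit = synchronized { log += event }
  def eventLog: Seq[Event] = synchronized { log.toList }
}

val master = new Master
master.report(PerfStats(workerId = 1, taskId = 7, runTimeMs = 120))
master.report(RddChecksum(workerId = 2, rddId = 3, partition = 0, checksum = 0x5f3aL))
master.report(TaskException(workerId = 3, taskId = 9, message = "NumberFormatException"))

// Query the log afterwards, e.g. for every task that threw
val failures = master.eventLog.collect { case e: TaskException => e }
```

Keeping the log as typed events makes it straightforward to filter afterwards, as the last line does for exceptions.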

  6. Approach (continued): Later, the user can re-execute the program from the event log to debug it in a controlled environment. [Diagram: the debugger, the master, three workers, and the event log.]
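Replay is possible because a task's output depends only on its input partition and its transformation. The sketch below assumes a hypothetical log record that names the transformation a task ran and, for brevity, stores its input partition inline (a lineage-based system could recompute that input instead). The debugger looks the transformation up and re-executes just that task locally, in one thread.

```scala
// Hypothetical logged task record: which transformation ran on which input.
// For brevity the input partition is stored inline; a lineage-based
// system could recompute it from the source data instead.
final case class LoggedTask(taskId: Int, transformation: String, input: Seq[String])

// Hypothetical registry of transformations the debugger can re-run
val transformations: Map[String, Seq[String] => Seq[String]] = Map(
  "extractText"    -> ((part: Seq[String]) => part.map(_.split('\t')(3))),
  "filterBerkeley" -> ((part: Seq[String]) => part.filter(_.contains("Berkeley")))
)

// Re-execute one logged task locally, in a single thread
def replay(task: LoggedTask): Seq[String] =
  transformations(task.transformation)(task.input)

val log = Seq(
  LoggedTask(1, "extractText", Seq("1\tA\ten\tBerkeley news", "2\tB\ten\tOther news")),
  LoggedTask(2, "filterBerkeley", Seq("Berkeley news", "Other news"))
)

// Pick the task of interest from the log and rerun just that task
val rerun: Option[Seq[String]] = log.find(_.taskId == 2).map(replay)
println(rerun)
```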

  7. Detecting Nondeterministic Transformations: Re-running a nondeterministic transformation may yield different results. We can use RDD checksums to detect nondeterminism and alert the user.
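A minimal sketch of checksum-based detection: checksum a partition's output on one run, re-run the same transformation on the same input, and compare. The helper names and the choice of CRC32 over each record are assumptions for illustration; the slide does not specify which checksum the debugger uses, and in the real system the first checksum would come from the event log rather than from a second in-process run.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

// Checksum over a partition's records, in order (assumed scheme: CRC32)
def checksum(partition: Seq[String]): Long = {
  val crc = new CRC32()
  partition.foreach { record =>
    crc.update(record.getBytes(StandardCharsets.UTF_8))
    // Separator so that Seq("ab", "c") and Seq("a", "bc") checksum differently
    crc.update('\n'.toInt)
  }
  crc.getValue()
}

// Run the transformation twice on the same input and compare checksums
def isDeterministic(input: Seq[String], f: Seq[String] => Seq[String]): Boolean =
  checksum(f(input)) == checksum(f(input))

val input = Seq("spark", "debugger")

val deterministic: Seq[String] => Seq[String] = part => part.map(_.toUpperCase)

// Nondeterministic: hidden mutable state changes the output on every run
var calls = 0
val nondeterministic: Seq[String] => Seq[String] = part => {
  calls += 1
  part :+ s"run-$calls"
}

val deterministicOk         = isDeterministic(input, deterministic)
val nondeterministicFlagged = !isDeterministic(input, nondeterministic)
println(s"deterministic: $deterministicOk, nondeterminism detected: $nondeterministicFlagged")
```

A mismatch proves the transformation is nondeterministic, but matching checksums only show that these two runs agreed: a transformation may behave nondeterministically on some runs and not others.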

  8. Demo: Example app, PageRank on a Wikipedia dataset.

  9. Performance: Event logging introduces minimal overhead.

  10. Future Plans: culprit determination, garbage collection (GC) monitoring, and memory monitoring.

  11. Ankur Dave, ankurd@eecs.berkeley.edu, http://ankurdave.com. The Spark debugger is in development at https://github.com/mesos/spark (branch event-log). Try Spark at http://spark-project.org!
