Spark Debugger

Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica
UC Berkeley

Motivation
• Debugging distributed programs is hard
• Debuggers for general distributed systems incur high overhead
• The Spark model enables debugging with almost zero overhead

Spark Programming Model
Example: Find how many Wikipedia articles match a search term
HDFS file → map(_.split('\t')(3)) → articles → filter(_.contains("Berkeley")) → matches → count() → 10,000
The datasets (e.g. articles, matches) are Resilient Distributed Datasets (RDDs); map and filter are deterministic transformations.
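As a minimal sketch of the pipeline on this slide, the following uses plain Scala collections rather than Spark itself, so it runs without a cluster. The sample rows, the object name `ArticleSearch`, and the assumption that column 3 (zero-based) holds the article text are illustrative and not taken from the talk.

```scala
object ArticleSearch {
  // Hypothetical tab-separated rows; column 3 (zero-based) is assumed to hold the article text.
  val lines: Seq[String] = Seq(
    "1\tBerkeley\t2010\tUC Berkeley is a public university",
    "2\tParis\t2010\tParis is the capital of France",
    "3\tMesos\t2011\tMesos was developed at Berkeley"
  )

  // The same deterministic transformations as on the slide, applied to a local Seq.
  def countMatches(input: Seq[String], term: String): Int = {
    val articles = input.map(_.split('\t')(3))      // extract the text column
    val matches = articles.filter(_.contains(term)) // keep rows mentioning the term
    matches.size                                    // local analogue of count()
  }

  def main(args: Array[String]): Unit =
    println(countMatches(lines, "Berkeley"))
}
```

Because map and filter have no side effects here, calling `countMatches` again on the same input always gives the same answer, which is the property the debugger relies on.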

Debugging a Spark Program
Debug the individual transformations instead of the whole system:
• Rerun tasks
• Recompute RDDs
Debugging a distributed program is now as easy as debugging a single-threaded one.
Also applies to MapReduce and Dryad.
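The idea of rerunning a task or recomputing an RDD can be sketched with a toy lineage model. This is not Spark's implementation: `Dataset`, `Source`, `Mapped`, `Filtered`, `rerunTask`, and `recompute` are hypothetical names, and a "task" is assumed to mean computing one partition of a dataset from its parent.

```scala
object LineageSketch {
  // A toy stand-in for an RDD: each partition is computed on demand from its parent.
  sealed trait Dataset[T] {
    def numPartitions: Int
    def compute(partition: Int): Seq[T]
  }

  // Input data, already split into partitions.
  final case class Source[T](partitions: Vector[Seq[T]]) extends Dataset[T] {
    def numPartitions: Int = partitions.size
    def compute(partition: Int): Seq[T] = partitions(partition)
  }

  final case class Mapped[A, B](parent: Dataset[A], f: A => B) extends Dataset[B] {
    def numPartitions: Int = parent.numPartitions
    def compute(partition: Int): Seq[B] = parent.compute(partition).map(f)
  }

  final case class Filtered[T](parent: Dataset[T], p: T => Boolean) extends Dataset[T] {
    def numPartitions: Int = parent.numPartitions
    def compute(partition: Int): Seq[T] = parent.compute(partition).filter(p)
  }

  // "Rerun a task": recompute exactly one partition, in the calling thread.
  def rerunTask[T](ds: Dataset[T], partition: Int): Seq[T] = ds.compute(partition)

  // "Recompute an RDD": recompute every partition in order.
  def recompute[T](ds: Dataset[T]): Seq[T] =
    (0 until ds.numPartitions).flatMap(rerunTask(ds, _))
}
```

Since `rerunTask` runs in a single thread, a breakpoint set inside `f` or `p` behaves as it would in an ordinary single-threaded program.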

Approach
As a Spark program runs, workers report key events back to the master, which logs them to an event log.
Events reported:
• Performance stats
• Exceptions
• RDD checksums
[Diagram: three workers, master, event log]
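One way to picture this reporting path is the sketch below. The event types, the `Master` class, and `runTask` are all hypothetical and are not the debugger's actual classes. The log is kept in memory for brevity, whereas a real event log would presumably be written to durable storage.

```scala
import scala.collection.mutable.ArrayBuffer

object EventLogSketch {
  // Hypothetical event types mirroring the three kinds of reports on the slide.
  sealed trait Event
  final case class TaskStats(taskId: Int, runtimeMs: Long) extends Event
  final case class TaskException(taskId: Int, message: String) extends Event
  final case class RddChecksum(rddId: Int, partition: Int, checksum: Int) extends Event

  // The master appends every reported event to the log, in arrival order.
  final class Master {
    private val log = ArrayBuffer.empty[Event]
    def report(event: Event): Unit = synchronized { log += event; () }
    def eventLog: Vector[Event] = synchronized { log.toVector }
  }

  // A worker runs a task body and reports either its runtime or the exception it threw.
  def runTask(master: Master, taskId: Int)(body: => Unit): Unit = {
    val start = System.nanoTime()
    try {
      body
      master.report(TaskStats(taskId, (System.nanoTime() - start) / 1000000L))
    } catch {
      case e: Exception => master.report(TaskException(taskId, e.getMessage))
    }
  }
}
```

Only small summaries (timings, messages, checksums) are reported in this sketch, never the data itself.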

Approach
Later, the user can re-execute from the event log to debug in a controlled environment.
[Diagram: debugger, master, three workers, event log]

Detecting Nondeterministic Transformations
Re-running a nondeterministic transformation may yield different results.
We can use RDD checksums to detect nondeterminism and alert the user.
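The checksum comparison might look like the following sketch. The talk does not say which checksum the debugger uses. `MurmurHash3.unorderedHash` from the Scala standard library is used here as a stand-in, and it is assumed that element order within a partition should not count as a difference. `checksum` and `isNondeterministic` are hypothetical names.

```scala
import scala.util.hashing.MurmurHash3

object ChecksumSketch {
  // Order-insensitive checksum of a partition's contents (a stand-in for an RDD checksum).
  def checksum[T](partition: Iterable[T]): Int = MurmurHash3.unorderedHash(partition)

  // Compare the checksum recorded during the original run with the recomputed partition.
  // A mismatch means the transformation did not reproduce its original output.
  def isNondeterministic[T](recordedChecksum: Int, recomputed: Iterable[T]): Boolean =
    checksum(recomputed) != recordedChecksum
}
```

A matching checksum does not prove determinism, since different contents can hash to the same value, so this kind of check is best treated as a warning signal for the user.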

Demo
Example app: PageRank on a Wikipedia dataset

Performance
Event logging introduces minimal overhead.

Future Plans
• Culprit determination
• GC monitoring
• Memory monitoring

Ankur Dave
ankurd@eecs.berkeley.edu
http://ankurdave.com
The Spark debugger is in development at https://github.com/mesos/spark, branch event-log.
Try Spark at http://spark-project.org!
