Enhancing Spark Debugging with Arthur Interactive Replay

Arthur: The Spark Debugger
Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica
UC Berkeley

Motivation
Debugging large parallel jobs is hard.
Current approaches to debugging:
• Repeatedly modify and rerun the program
• Run isolated code in the Spark shell

Introducing Arthur
Interactive replay debugger for Spark programs
• Reconstruct and query intermediate datasets
• Visualize the program's data flow
• Rerun any task in a single-process debugger
• Trace records across transformations
• Aggregate exceptions at the master

Spark Programming Model
Example: Find how many Wikipedia articles match a search term
HDFS file → map(_.split('\t')(3)) → articles → filter(_.contains("Berkeley")) → matches → count() → 10,000
The datasets (articles, matches) are Resilient Distributed Datasets (RDDs); map and filter are deterministic transformations.
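The chain on this slide can be tried without a cluster. The following is a minimal sketch that uses plain Scala collections as a stand-in for RDDs; the sample rows, the object name ArticleSearch, and the helper countMatches are hypothetical and only illustrate the map, filter, count sequence. In a real Spark program the rows would come from a file in HDFS rather than an in-memory Seq.

```scala
object ArticleSearch {
  // Hypothetical sample rows: tab-separated, with the article text in field 3.
  val lines: Seq[String] = Seq(
    "1\tBerkeley\t2012\tBerkeley is a city in California",
    "2\tSpark\t2012\tSpark is a cluster computing framework",
    "3\tAMPLab\t2012\tThe AMPLab is at UC Berkeley"
  )

  // Same shape as the slide: map to the text field, filter by term, count.
  def countMatches(rows: Seq[String], term: String): Long = {
    val articles = rows.map(_.split('\t')(3))        // keep only field 3
    val matches = articles.filter(_.contains(term))  // keep matching articles
    matches.size.toLong                              // stands in for count()
  }
}
```

Because each step depends only on its input, re-running the chain on the same rows gives the same intermediate datasets, which is the property a replay debugger relies on.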

Approach
[Diagram: the master sends tasks to the workers; the workers return results, checksums, and events; the master writes lineage, checksums, and events to the log.]

Approach
[Diagram: the master reads lineage from the log and takes user input; it sends tasks to the workers, which return results and checksums.]

Detecting Nondeterministic Transformations
Re-running a nondeterministic transformation may yield different results.
Arthur checksums RDD contents and alerts the user if the checksums differ.
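The idea of comparing checksums between the original run and a replay can be sketched as follows. This is not Arthur's implementation: the object name ChecksumCheck, both function names, the choice of CRC32, and the decision to sort records so that ordering does not affect the result are all assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

object ChecksumCheck {
  // Checksum of a dataset's records. Sorting first makes the result
  // independent of record order (an assumption of this sketch).
  def checksum(records: Seq[String]): Long = {
    val crc = new CRC32()
    records.sorted.foreach { r =>
      // The newline keeps record boundaries part of the checksummed bytes.
      crc.update((r + "\n").getBytes(StandardCharsets.UTF_8))
    }
    crc.getValue
  }

  // True when the replayed output does not match the logged checksum,
  // which suggests the transformation was nondeterministic.
  def differsFromLog(loggedChecksum: Long, replayed: Seq[String]): Boolean =
    checksum(replayed) != loggedChecksum
}
```

A debugger could log checksum(output) during the original run and call differsFromLog on the replayed output to decide whether to warn the user.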

Demo
Example dataset: 1 GB partial Wikipedia dump
• Reconstruct and query intermediate datasets
• Visualize the program's data flow
• Rerun any task in a single-process debugger

Record Tracing
Example: query a database of users and groups
HDFS file A → map(_.split('\t')) → users
HDFS file B → map(_.split('\t')) → groups
users, groups → join() → groupCounts
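The users and groups example can be approximated with plain Scala collections. This is a minimal sketch: the column layouts, the sample rows, the object name GroupCounts, and the hand-written join helper (a stand-in for Spark's join()) are hypothetical, and the slide does not say how groupCounts is derived from the joined data, so counting users per group name is an assumption.

```scala
object GroupCounts {
  // Hypothetical layouts: users are "userName\tgroupId",
  // groups are "groupId\tgroupName".
  val userLines: Seq[String] = Seq("alice\t1", "bob\t1", "carol\t2")
  val groupLines: Seq[String] = Seq("1\tadmins", "2\tguests")

  // Inner join of two keyed sequences, standing in for Spark's join().
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for ((k, a) <- left; (k2, b) <- right if k == k2) yield (k, (a, b))

  // Parse both inputs, join on group id, and count users per group name.
  def groupCounts(users: Seq[String], groups: Seq[String]): Map[String, Int] = {
    val usersByGroup = users.map(_.split('\t')).map(f => (f(1), f(0)))
    val groupsById = groups.map(_.split('\t')).map(f => (f(0), f(1)))
    join(usersByGroup, groupsById)
      .groupBy { case (_, (_, groupName)) => groupName }
      .map { case (groupName, rows) => (groupName, rows.size) }
  }
}
```

Running groupCounts on a single user line, for example Seq("carol\t2"), shows which output entry that one record contributes to. That is a crude way to follow a record forward through the transformations, not a description of how Arthur traces records.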

Performance
Event logging introduces minimal overhead.

Future Plans
• More analyses like backward tracing and culprit detection
• Profiling tools for GC and memory
• Real bugs

Arthur is in development at https://github.com/mesos/spark, branch arthur
Documentation: https://github.com/mesos/spark/wiki/Spark-Debugger
Ankur Dave
ankurd@eecs.berkeley.edu
http://ankurdave.com
