Reliable Distributed Messaging System with Flexible Routing and Fault Tolerance
This presentation by Peter Bui and Aaron Dingler describes the Spice Messaging Queue (SMQ), a distributed messaging system designed for reliable communication, event-based computation, and flexible routing. SMQ provides a simple text-based RPC protocol similar to Chirp, a server that receives and processes messages, client utilities and a library for issuing RPCs, and a flexible naming scheme. It handles send failures and processing failures, and restarts gracefully after a crash. The slides walk through an image-conversion example and close with future work: missing RPCs, more testing, and additional features such as authentication and optimization.
Presentation Transcript
Spice Messaging Queue
CSE 60771 Distributed Systems
Peter Bui and Aaron Dingler
Problem (Continued)
• Reliable communication
• Event-based computation
• Flexible routing
Solution
• Spice Messaging Queue
  • Simple text-based RPC (similar to Chirp)
  • Server for receiving and processing messages
  • Client utilities and library for access to RPC
  • Flexible naming scheme
  • Fault-tolerance
    • Send failures
    • Processing failures
    • Restart gracefully
SMQ = Mailboxes + Active Storage
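One way to picture the handling of send failures listed above is a retry loop with backoff. The following is a minimal sketch, not SMQ's implementation: `send_with_retry`, the `send` callable, the `FlakySender` test double, and the delay parameters are all hypothetical names chosen for illustration.

```python
import time


def send_with_retry(send, message, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call send(message) until it succeeds; return the number of attempts.

    `send` is a hypothetical callable that raises ConnectionError on failure.
    """
    attempts = 0
    delay = base_delay
    while True:
        attempts += 1
        try:
            send(message)
            return attempts
        except ConnectionError:
            # Wait before retrying, doubling the delay up to max_delay.
            sleep(delay)
            delay = min(delay * 2, max_delay)


class FlakySender:
    """Test double: fails a fixed number of times, then delivers."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def __call__(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("peer unavailable")
        self.delivered.append(message)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A production relay would presumably also persist the message before retrying so that it survives a restart.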
Example (Overview)
• Data Generator: Client Application
• Data Processor: Binding
• Data Sink: Queue
Example (Create Queues)
• Create PNG sink queue
    $ smq_create png-sink
• Create converter queue
    $ smq_create student00.cse.nd.edu:convert-2-png
• Check status
    $ smq_status
    SMQ_QUEUE      NAME                  SMQ_PORT  VERSION  MESSAGES
    png-sink       cclws14.cse.nd.edu    9319      0.0.3    0
    relay          cclws14.cse.nd.edu    9319      0.0.3    0
    relay          student00.cse.nd.edu  9319      0.0.3    0
    convert-2-png  student00.cse.nd.edu  9319      0.0.3    0
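The two `smq_create` calls above use two forms of queue name: a bare name (`png-sink`) and a host-qualified name (`student00.cse.nd.edu:convert-2-png`). A minimal sketch of how such names might be split follows, assuming the syntax is `[host:]queue`. `QueueAddress` and `parse_queue_name` are hypothetical helpers, not part of SMQ.

```python
from typing import NamedTuple, Optional


class QueueAddress(NamedTuple):
    host: Optional[str]  # None means no host was specified
    queue: str


def parse_queue_name(name: str) -> QueueAddress:
    """Split an assumed '[host:]queue' name into host and queue parts."""
    host, sep, queue = name.rpartition(":")
    if not queue:
        raise ValueError(f"missing queue name in {name!r}")
    if sep and not host:
        raise ValueError(f"missing host before ':' in {name!r}")
    return QueueAddress(host or None, queue)
```

With no host given, a caller could consult the catalog server to pick one, which would match the "specify host, or select any host" behavior the slides describe.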
Example (Bind Computation)
• Bind script
    $ smq_bind convert-2-png png-to-jpg.rb
• List bindings
    $ smq_bindings convert-2-png
    png-to-jpg.rb
• Unbind script
    $ smq_unbind convert-2-png png-to-jpg.rb
• List bindings
    $ smq_bindings convert-2-png
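The bind, list, and unbind steps above can be modeled as a mailbox that runs bound handlers on each incoming message, which is one reading of "Mailboxes + Active Storage". This is an in-memory sketch under that assumption. The `Mailbox` class is hypothetical, and Python callables stand in for the scripts that SMQ binds.

```python
class Mailbox:
    """In-memory queue whose bound handlers run on each stored message."""

    def __init__(self, name):
        self.name = name
        self.messages = []  # stored (header, body) pairs
        self.bindings = {}  # binding name -> callable(header, body)

    def bind(self, binding_name, handler):
        self.bindings[binding_name] = handler

    def unbind(self, binding_name):
        self.bindings.pop(binding_name, None)

    def list_bindings(self):
        return sorted(self.bindings)

    def put(self, header, body):
        """Store the message, then pass it to every bound handler in name order."""
        self.messages.append((header, body))
        return [self.bindings[name](header, body) for name in self.list_bindings()]
```

For example, binding a handler that upper-cases the body and then calling `put` returns that handler's result, while a queue with no bindings simply stores the message.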
Example (Put)
• Generate message header for each image (body)
    $ cat mid.meta
    source data-generator
    target png-sink
    subject convert-2-png
    outfile $fileid.png
• Put message (header/body)
    $ smq_put convert-2-png mid.meta mid.tiff
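The header shown in `mid.meta` appears to hold one field per line, with the key separated from the value by a space. A minimal sketch of reading and writing that layout follows. The format is inferred from the single example on the slide, and `parse_header` and `format_header` are hypothetical helpers.

```python
def parse_header(text):
    """Parse 'key value' lines into a dict, skipping blank lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so values may contain spaces.
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


def format_header(fields):
    """Serialize a dict back into one 'key value' line per field."""
    return "".join(f"{key} {value}\n" for key, value in fields.items())
```

Parsing the slide's header yields the keys `source`, `target`, `subject`, and `outfile`, and formatting that dict reproduces the original text.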
Example (List/Get)
• List messages in queue
    $ smq_list convert-2-png
    1272298206.530224
    1272297988.913414
    1272298197.138105
    1272298150.139420
    1272257106.876008
• Get message (header/body)
    $ smq_get convert-2-png 1272298197.138105
    $ ls
    1272298197.138105.meta
    1272298197.138105.body
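The message identifiers listed above look like Unix timestamps with microsecond precision, although the slides do not say so. Under that assumption, the sketch below orders identifiers chronologically, converts one to a UTC time, and derives the `.meta` and `.body` file names seen in the `ls` output. All three function names are hypothetical.

```python
from datetime import datetime, timezone
from decimal import Decimal


def sort_ids(message_ids):
    """Order identifiers numerically, oldest first."""
    return sorted(message_ids, key=Decimal)


def id_to_datetime(message_id):
    """Interpret 'seconds.microseconds' as a UTC timestamp (an assumption)."""
    seconds, _, micros = message_id.partition(".")
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.replace(microsecond=int(micros.ljust(6, "0")[:6]))


def message_files(message_id):
    """Return the header and body file names for a fetched message."""
    return (f"{message_id}.meta", f"{message_id}.body")
```

Splitting the identifier as a string and sorting by `Decimal` avoids the rounding that a `float` conversion could introduce in the microsecond digits.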
Conclusions
• The data must flow!
  • 1000 messages in approximately 450 seconds
  • 1.09 MB/s throughput
• Reliable communication
  • Persistent communication channel
  • Relay will retry send until success
• Flexible naming scheme
  • Specify host, or select any host
  • Use catalog server to record queue information
  • Relay is a binding
• Fault tolerance
  • Use logs to record transactions
  • Replay on crash
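The "use logs to record transactions, replay on crash" idea can be sketched as an append-only log that is written before state changes and re-read at startup. This is an illustration of the general technique, not SMQ's log format: the `LoggedQueue` class, the JSON record layout, and the `put` and `processed` operation names are assumptions.

```python
import json


class LoggedQueue:
    """Queue that records each transaction in an append-only log."""

    def __init__(self, log=None):
        self.log = []      # JSON lines; a real system would append to a file
        self.pending = {}  # message id -> body, not yet processed
        # Replay any existing log to rebuild state after a restart.
        for line in log or []:
            self._apply(json.loads(line))
            self.log.append(line)

    def _apply(self, record):
        if record["op"] == "put":
            self.pending[record["id"]] = record["body"]
        elif record["op"] == "processed":
            self.pending.pop(record["id"], None)

    def _commit(self, record):
        # Log first, then change state, so every applied step is in the log.
        self.log.append(json.dumps(record))
        self._apply(record)

    def put(self, message_id, body):
        self._commit({"op": "put", "id": message_id, "body": body})

    def mark_processed(self, message_id):
        self._commit({"op": "processed", "id": message_id})
```

Constructing a new `LoggedQueue` from a saved log restores exactly the messages that were put but not yet marked processed.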
Future Work
• Missing RPCs
  • Delete queue
  • Remove message
  • Re-bind script
• More testing
  • Run out of disk space?
  • More complex pipelines
  • Understand semantics of edge cases (e.g. failure to bind)
• Additional features
  • Authentication
  • Optimization (e.g. hardlink vs. RPC)
  • Scriptable client