“Tuple Space” Scalability: Use a DHT!

Antony Rowstron, Microsoft Research Cambridge

Linda-like languages: Looking back to the early days
• Originally proposed for parallel processing
• Shared memory versus message passing
• Simple: in, out, rd, (inp, rdp)
• Complex compile-time analysis
• Closed systems
• Translate “shared memory” to “message passing”
• Challenge: performance better than message passing
• Limited success

Linda: a paradigm for open systems “The second wave”
• Exploit temporal and spatial separation
• Many different extensions proposed
  • New primitives
  • Multiple tuple spaces
  • Access-control
• Open systems
• New run-time systems required
• Scale:
  • Networks of Workstations through to the Internet

Linda runtimes: An overview
[Figure: one process calls out(<10, “hello”>), another calls in(<?int, “hello”>); the tuple <10, “hello”> passes through the Linda runtime]
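The matching behaviour pictured on this slide can be sketched in a few lines. This is a minimal single-process illustration, not a real Linda runtime: the class name `ToyTupleSpace` is hypothetical, a Python type such as `int` stands in for a formal like `?int`, and only the non-blocking `inp` and `rdp` are modelled (the blocking `in` and `rd` would wait instead of returning `None`).

```python
from typing import Any, List, Optional, Tuple


class ToyTupleSpace:
    """Hypothetical in-memory tuple space with type-based template matching."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []

    def out(self, tup: Tuple[Any, ...]) -> None:
        """Deposit a tuple in the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template: Tuple[Any, ...], tup: Tuple[Any, ...]) -> bool:
        if len(template) != len(tup):
            return False
        for want, have in zip(template, tup):
            if isinstance(want, type):
                # Formal field (e.g. ?int): only the type has to agree.
                if type(have) is not want:
                    return False
            elif type(want) is not type(have) or want != have:
                # Actual field: type and value both have to agree.
                return False
        return True

    def _find(self, template: Tuple[Any, ...]) -> Optional[int]:
        for index, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return index
        return None

    def rdp(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        """Non-destructive, non-blocking read."""
        index = self._find(template)
        return None if index is None else self._tuples[index]

    def inp(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        """Destructive, non-blocking read: the matched tuple is removed."""
        index = self._find(template)
        return None if index is None else self._tuples.pop(index)


space = ToyTupleSpace()
space.out((10, "hello"))                 # out(<10, "hello">)
matched = space.inp((int, "hello"))      # in(<?int, "hello">)
```

After the `inp` call the space is empty again, which is the difference between `in` and `rd` that matters later in the DNS example.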

Linda runtimes I
[Figure: in(<?int, “hello”>) and out(<10, “hello”>) issued against the runtime; the tuple <10, “hello”> is shown three times]

Linda runtimes II
[Figure: the template <?int, “hello”> and the tuple <10, “hello”> are each hashed as H(<int, string>), so both map to the same key <int, string>]

The main challenge: Hashing
[Figure: the tuple <10, “hello”> is hashed as H(<int, string>)]

The challenge: The hashing issue
• Distributing the load needs a good hash function
  • Uniform distribution
• But, Linda:
  • Tuples and templates
  • Open systems: hashing must resort to field types only
• Small set of input symbols for hash function
  • <?int>, <?bool>, <?float>, <?string>, … etc.
  • 1-element templates map to ~10 unique keys
  • 2-element templates map to ~100 unique keys
• Outcome: Difficult to implement scalable runtimes
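The key-count arithmetic on this slide can be checked with a short sketch. It assumes a runtime that hashes only the type signature of a tuple or template; the list of ten type names, and the helper names `signature_key`, `distinct_keys` and `nodes_used`, are hypothetical and chosen only to illustrate the "~10" and "~100" figures.

```python
import hashlib
from itertools import product

# Hypothetical set of ten field types; the slide only says "int, bool, float, string ... etc".
TYPES = ["int", "bool", "float", "string", "char",
         "long", "double", "byte", "short", "tuple"]


def signature_key(type_names):
    """Hash a type signature such as <int,string>, the only input that
    tuples and templates are guaranteed to share."""
    signature = "<" + ",".join(type_names) + ">"
    return hashlib.sha1(signature.encode("utf-8")).hexdigest()


def distinct_keys(arity):
    """Number of different hash keys that templates of this arity can produce."""
    return len({signature_key(combo) for combo in product(TYPES, repeat=arity)})


def nodes_used(arity, num_nodes):
    """Number of nodes that can ever receive a tuple of this arity when
    keys are spread over num_nodes by a simple modulo (an assumption)."""
    keys = {signature_key(combo) for combo in product(TYPES, repeat=arity)}
    return len({int(key, 16) % num_nodes for key in keys})


one_element = distinct_keys(1)    # 10 signatures
two_element = distinct_keys(2)    # 10 * 10 = 100 signatures
```

However many nodes are added, `nodes_used(2, num_nodes)` can never exceed 100 under these assumptions, which is the scalability ceiling the slide describes.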

Get rid of the hash function
• Move the hash function into the application
• E.g. Distributed Hash Table
• Simple API:
  • Put(key, value)
  • Get(key)
• Looks very familiar (in, out)
• Outcome: Possible to implement scalable runtimes

DHTs: Peeking under the covers
• Large id space
• NodeIds picked randomly from space
• Keys picked randomly from space
• Key is managed by its root node:
  • Live node with id closest to the key
[Figure: id space showing a nodeId, a key, and the root node for the key]
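The root-node rule can be sketched as follows. The 128-bit id width, the wrap-around distance, and the tie-break on the smaller id are assumptions made for the illustration; the slide only states that the root is the live node whose id is closest to the key. The names `key_id`, `distance` and `root_node` are hypothetical.

```python
import hashlib

ID_BITS = 128                 # assumed width of the id space
ID_SPACE = 2 ** ID_BITS


def key_id(name: str) -> int:
    """Derive a pseudo-random key id from an application-level name."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


def distance(a: int, b: int) -> int:
    """Distance between two ids, assuming the id space wraps around."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


def root_node(key: int, live_nodes):
    """Live node whose id is closest to the key (ties go to the smaller id)."""
    return min(live_nodes, key=lambda node: (distance(node, key), node))


nodes = [10, 200, ID_SPACE - 5]
first_root = root_node(2, nodes)                 # ID_SPACE - 5: distance 7 beats distance 8
survivors = [n for n in nodes if n != first_root]
next_root = root_node(2, survivors)              # 10 takes over once the old root has left
```

Because the root is defined over live nodes only, responsibility for a key moves to the next-closest node when the current root fails, as the last two lines show.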

Node routing state
• Topology-aware routing table
• NodeIds and keys in some base 2^b (e.g., 4)
• Prefix constraints on nodeIds for each slot
• Pick closest node satisfying slot constraints
[Figure: routing state of nodeId 203231, including its leaf set]

Routing
• Prefix matching: each hop resolves an extra key digit
[Figure: route(m, 323310) drawn on the id space; labelled ids 323310, 323211, 203231, 322021, 313221; legend: key, nodeId]
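Prefix matching can be illustrated with the ids shown in the figure. The sketch below is a simplification under stated assumptions: every node is given a global view of all nodeIds, whereas a real DHT node holds only a routing table and a leaf set, and the hop order it produces is one plausible reading of the figure rather than something the slide states. The function names are hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two ids have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def route(source: str, key: str, node_ids):
    """Forward towards key, each hop choosing a node that shares a longer
    prefix with the key than the current node does."""
    path = [source]
    current = source
    while True:
        have = shared_prefix_len(current, key)
        better = [n for n in node_ids if shared_prefix_len(n, key) > have]
        if not better:
            return path       # no node improves on the prefix: deliver here
        # Take the smallest improvement, so each hop resolves about one digit.
        current = min(better, key=lambda n: (shared_prefix_len(n, key), n))
        path.append(current)


NODE_IDS = ["203231", "313221", "322021", "323211"]   # ids taken from the figure
path = route("203231", "323310", NODE_IDS)
```

With these ids the path is 203231, 313221, 322021, 323211, and the shared prefix with the key 323310 grows from 0 to 1, 2 and 3 digits along the way.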

Example: DNS service
• Linda:
  • Add DNS entry:
    • Out(“msrc401.europe.microsoft.com”, 157.58.16.56)
  • Lookup DNS entry:
    • Rd(“msrc401.europe.microsoft.com”, ?IP address)
• DHTs:
  • Add DNS entry:
    • Put(SHA1(“msrc401.europe.microsoft.com”), 157.58.16.56)
  • Lookup DNS entry:
    • IP Address = Get(SHA1(“msrc401.europe.microsoft.com”))

Example: DNS service
• Linda:
  • Add DNS entry:
    • Out(“msrc401.europe.microsoft.com”, 157.58.16.56)
  • Lookup DNS entry:
    • In(“msrc401.europe.microsoft.com”, ?IP address)
• DHTs:
  • Add DNS entry:
    • Put(SHA1(“msrc401.europe.microsoft.com”), 157.58.16.56)
  • Lookup DNS entry:
    • IP Address = Get(SHA1(“msrc401.europe.microsoft.com”))
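The DHT half of the DNS example can be sketched with an in-process stand-in. `ToyDHT` is hypothetical: it places keys on a fixed list of dictionaries by modulo, which replaces the routing a real DHT would perform, and it stores IP addresses as strings. Only the Put/Get shape and the use of SHA-1 on the host name come from the slide.

```python
import hashlib
from typing import Dict, List, Optional


def sha1(name: str) -> bytes:
    """Application-side hash: the full host name is turned into the key."""
    return hashlib.sha1(name.encode("utf-8")).digest()


class ToyDHT:
    """Hypothetical stand-in for a DHT: a fixed set of in-process 'nodes'."""

    def __init__(self, num_nodes: int = 8) -> None:
        self._nodes: List[Dict[bytes, str]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: bytes) -> Dict[bytes, str]:
        # Modulo placement is an assumption that stands in for real routing.
        return self._nodes[int.from_bytes(key, "big") % len(self._nodes)]

    def put(self, key: bytes, value: str) -> None:
        self._node_for(key)[key] = value

    def get(self, key: bytes) -> Optional[str]:
        return self._node_for(key).get(key)


dht = ToyDHT()
# Add DNS entry
dht.put(sha1("msrc401.europe.microsoft.com"), "157.58.16.56")
# Lookup DNS entry: the caller must know the exact name to rebuild the key
ip_address = dht.get(sha1("msrc401.europe.microsoft.com"))
missing = dht.get(sha1("msrc402.europe.microsoft.com"))
```

Since the key is a hash of the whole name, every distinct host name yields a distinct key, in contrast to the handful of keys available when only the type signature is hashed. `Get` also leaves the entry in place, so it corresponds to `Rd` rather than `In`.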

The Drawback: Nothing comes free!
• Range/complex queries
• But in, out, rd, (inp and rdp) do not really do enumeration
E.g. Find me the host names associated with IP addresses 192.10.10.1 to 192.10.10.254
Vanilla Linda:
    for (int i = 1; i < 255; i++) {
        IPAddress addr = new IPAddress(192.10.10.i);
        Tuple t = rdp(?string, addr);
    }
Extensions:
    Tuple[] tuples = fetch(?string, 192.10.10.1 -> 192.10.10.254);
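The cost of the "Vanilla Linda" loop can be made concrete with a small sketch. The store is a plain dictionary standing in for a tuple space, the two host names are invented for the illustration, and `rdp` here takes just the address because the host-name field is always a formal (`?string`) in this query.

```python
import ipaddress

# Hypothetical contents: address -> host name (names invented for the example).
STORE = {
    "192.10.10.7": "hosta.example",
    "192.10.10.200": "hostb.example",
}


def rdp(addr: str):
    """Stand-in for rdp(?string, addr): one exact-match probe per call."""
    name = STORE.get(addr)
    return None if name is None else (name, addr)


def hosts_in_range(first: str, last: str):
    """Enumerate every address in the range, probing once per address."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    found = []
    probes = 0
    for value in range(start, end + 1):
        probes += 1
        tup = rdp(str(ipaddress.IPv4Address(value)))
        if tup is not None:
            found.append(tup)
    return found, probes


found, probes = hosts_in_range("192.10.10.1", "192.10.10.254")
```

Finding the two matching tuples takes 254 probes, one per address in the range. The same per-key cost applies to a DHT `Get`, which is why range and complex queries are listed as the drawback.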

Questions?
• Question: “Should you be using a DHT?”
• Sub-questions:
  • “Do we need an implicit hash function?”
  • “Do we need complex querying/matching?”
  • “Do we need great scalability?”
