170 likes | 292 Vues
This paper by Antony Rowstron explores the scalability of Linda-like languages through the use of Distributed Hash Tables (DHTs). It contrasts traditional shared memory approaches with message passing and discusses the challenges and opportunities in transitioning to open systems. The paper addresses performance issues related to tuple processing and highlights the need for effective runtime systems. By utilizing DHTs, the paper proposes a simplified API for managing tuples and routing in large-scale networks, paving the way for efficient communication in internet-scale applications.
E N D
“Tuple Space” Scalability: Use a DHT! Antony Rowstron Microsoft Research Cambridge
Linda-like languages: Looking back to the early days • Originally proposed for parallel processing • Shared memory versus message passing • Simple: in, out, rd, (inp, rdp) • Complex compile-time analysis • Closed systems • Translate “shared memory” to “message passing” • Challenge: performance better than message passing • Limited success
Linda: a paradigm for open systems“The second wave” • Exploit temporal and spatial separation • Many different extensions proposed • New primitives • Multiple tuple spaces • Access-control • Open systems • New run-time systems required • Scale: • Networks of Workstations through to the Internet
Linda runtimes: An overview out(<10, “hello”>) in(<?int, “hello”>) <10, “hello”> <10, “hello”> Linda runtime
Linda runtimes I in(<?int, “hello”>) out(<10, “hello”>) <10, “hello”> <10, “hello”> <10, “hello”>
Linda runtimes II <?int, “hello”> H( <int,string> ) <10, “hello”> H( <int,string> ) <int, string>
The main challenge: Hashing <10, “hello”> H( <int,string> )
The challenge: The hashing issue • Distributing the load needs a good function • Uniform distribution • But, Linda: • Tuples and templates • Open systems: resorts to types only • Small set of input symbols for hash function • <?int>,<?bool>,<?float>,<?string>… etc • 1-element templates map to ~ 10 unique keys • 2-element templates map to ~ 100 unique keys • Outcome: Difficult to implement scalable runtimes
Get rid of the hash function • Move the hash function into the application • E.g. Distributed Hash Table • Simple API: • Put(key, value) • Get(key) • Looks very familiar (in,out) • Outcome: Possible to implement scalable runtimes
key nodeId DHTs: Peeking under the covers • Large id space • NodeIds picked randomly from space • Keys picked randomly from space • Key is managed by its rootnode: • Live node with id closest to the key id space root node for key
Node routing state 203231 nodeId leaf set • Topology aware routing table • NodeIds and keys in some base 2b (e.g., 4) • Prefix constraints on nodeIds for each slot • Pick closest node satisfying slot constraints
key nodeId Routing • Prefix matching: each hop resolves an extra key digit 323310 323211 route(m, 323310) 203231 322021 313221
Example: DNS service • Linda: • Add DNS entry: • Out(“msrc401.europe.microsoft.com”,157.58.16.56) • Lookup DNS entry: • Rd(“msrc401.europe.microsoft.com”, ?IP address) • DHTs • Add DNS entry: • Put(SHA1(msrc401.europe.microsoft.com”), 157.58.16.56) • Lookup DNS entry: • IP Address = Get(SHA1(msrc401.europe.microsoft.com”))
Example: DNS service • Linda: • Add DNS entry: • Out(“msrc401.europe.microsoft.com”,157.58.16.56) • Lookup DNS entry: • Rd(“msrc401.europe.microsoft.com”, ?IP address) • DHTs • Add DNS entry: • Put(SHA1(msrc401.europe.microsoft.com”), 157.58.16.56) • Lookup DNS entry: • IP Address = Get(SHA1(msrc401.europe.microsoft.com”))
Example: DNS service • Linda: • Add DNS entry: • Out(“msrc401.europe.microsoft.com”,157.58.16.56) • Lookup DNS: • In(“msrc401.europe.microsoft.com”, ?IP address) • DHTs • Add DNS entry: • Put(SHA1(msrc401.europe.microsoft.com”), 157.58.16.56) • Lookup DNS entry: • IP Address = Get(SHA1(msrc401.europe.microsoft.com”))
The Drawback: Nothing comes free! • Range/complex queries • But in, out, rd, (inp and rdp) does not really do enumeration E.g. Find me the host names associated with IPAddresses 92.10.10.1 to 192.10.10.254 Vanilla Linda: For (inti = 1; i < 255; i++) { IPAddressaddr = new IPAddress(192.10.10.i); Tuple t = rdp(?string,addr) } Extensions: Tuple[] tuples = fetch(?string, 192.10.10.1 -> 192.10.10.254);
Questions? • Question: “Should you be using a DHT?” • Sub-questions: • “Do we need an implicit hash function?” • “Do we need complex querying/matching?” • “Do we need great scalability?”