Distributed Coordination-Based Systems I

Distributed Coordination-Based Systems I CSE5306 Lecture Quiz due at 5 PM on 4 August 2014

Distributed Coordination-Based Systems We have studied how data types change when we distribute inherently centralized systems. Now let’s consider how to coordinate the various activities of inherently distributed systems. [Figure: a four-ship of Chinese Su-27 Flankers (Google Images)]

Intro to Coordination Models Any multithreaded process has specialized computing parts and coordination parts. Now let us model the latter, the glue that binds the threads together. • Mailbox coordination has temporal (T) decoupling (D) and referential (R) coupling (C), i.e., TD&RC: conventional snail mail. • Generative communication (TD&RD): the Internet is happy to publish everything I know, whether or not anyone ever Googles it. • Meeting-oriented coordination (TC&RD): company executives negotiate assembly line workers’ salaries with (nameless) union representatives in their annual contract event. (Gentlemen’s Quarterly publishes men’s fashion news for any subscriber interested in men’s fashions.) • Direct coordination (TC&RC): conventional phone calls.
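
The four models above are just the four combinations of temporal and referential coupling. A minimal sketch of that taxonomy as a lookup table follows; the names COORDINATION_MODELS and classify are illustrative assumptions, not part of any library.

```python
# Hypothetical lookup table for the coordination taxonomy.
# Key: (temporally_coupled, referentially_coupled).
COORDINATION_MODELS = {
    (True, True): "direct",                      # TC&RC
    (False, True): "mailbox",                    # TD&RC
    (True, False): "meeting-oriented",           # TC&RD
    (False, False): "generative communication",  # TD&RD
}


def classify(temporally_coupled: bool, referentially_coupled: bool) -> str:
    """Return the coordination model for a given pair of couplings."""
    return COORDINATION_MODELS[(temporally_coupled, referentially_coupled)]
```

For example, classify(False, True) returns "mailbox", matching the snail-mail case.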

Architectures • What if data items sent to receivers are not clearly identified…? • Before I post my idea on the Internet, I fill it with currently “hot” keywords to attract lots of Google searches. • Likewise, the subscription that a subscribing process passes to its middleware contains a description of the data items it wants, full of interesting (attribute, value) and (attribute, range) pairs. • Subscribers can match their interests with published free data (TC&RD), or they can read large, expensive data items that a publisher advertises (i.e., storage with notification). • A subscriber may ask to be notified of an event (TD&RD); e.g., anyone in the bank after midnight! But it is hard for an event publisher to stay decoupled from increasingly diverse subscribers while efficiently scaling upward.
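
Matching a subscription of (attribute, value) and (attribute, range) pairs against a published data item can be sketched as below. This is only an illustration under stated assumptions: data items are plain dictionaries, ranges are inclusive (low, high) tuples, and the function name matches is hypothetical.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]
# A constraint is either an exact value or an inclusive (low, high) range.
Constraint = Union[str, Number, Tuple[Number, Number]]


def matches(subscription: Dict[str, Constraint], item: Dict[str, object]) -> bool:
    """Return True if the item satisfies every constraint in the subscription."""
    for attribute, constraint in subscription.items():
        if attribute not in item:
            return False  # the item does not describe this attribute at all
        value = item[attribute]
        if isinstance(constraint, tuple):
            low, high = constraint
            if not (low <= value <= high):
                return False  # (attribute, range) pair not satisfied
        elif value != constraint:
            return False  # (attribute, value) pair not satisfied
    return True
```

Attributes that the subscription does not mention are ignored, so a subscriber only lists what it cares about.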

R U O K ? Match the following coordination taxonomy models with the examples of their use below. • Direct __ • Mailbox __ • Generative communication __ • Meeting oriented __ • My wife sticks her grocery list to the refrigerator door for me. • I add my grocery items to her list, wondering which of us will find time to shop. • Returning home with a big homework assignment, I ask her, “Would you please shop for us this week?” • “Husband,” she replies, “let’s talk about that on Saturday!”

R U O K ? 5. Why is it hard for an event publisher to stay decoupled from increasingly diverse subscribers, while scaling upward? • Potential subscribers have very specific interests. • The implementation of matching subscriptions to data items becomes increasingly inefficient. • Actually it is not hard for a music recording artist’s agent to scale event publications upward, as the artist becomes well known. • All of the above. • None of the above.

Traditional Architectures • A centralized client-server architecture is currently the best implementation for matching data items with descriptions: • Pub/sub applications (TC&RD); e.g., IBM’s WebSphere, Sun Microsystems’ JMS. • Generative communication models (TD&RD); e.g., Sun Microsystems’ Jini, JavaSpaces.

Jini and JavaSpaces • Jini is a temporally and referentially decoupled distributed coordination-based system. • JavaSpace (Jini’s shared dataspace) consists of tuples (XML-tagged data records) representing multiple typed references to Java objects. • A process marshals (organizes data into a form suitable for transfer) a tuple instance into JavaSpace with a “write” operation, and a “take” operation removes it. • To read all tuple instances of interest from JavaSpace, another process marshals a template with the same types and various null types (“wild cards”). • Semantically similar matches cannot be efficiently implemented in a distributed way yet. Centralized implementations also make it easier to block until a suitable data item is published and then remove it (i.e., synchronization of unrelated processes).
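
The write, read, and take operations with null wild cards can be illustrated with a tiny in-memory tuple space. This is a sketch in Python, not the JavaSpaces API: the class TupleSpace is hypothetical, None stands in for a null field, read and take return only the first match, and blocking, typing, and marshaling are left out.

```python
from typing import List, Optional, Tuple


class TupleSpace:
    """Toy shared dataspace; None in a template acts as a wild card."""

    def __init__(self) -> None:
        self._tuples: List[Tuple] = []

    def write(self, item: Tuple) -> None:
        """Store a tuple instance in the space."""
        self._tuples.append(item)

    @staticmethod
    def _matches(template: Tuple, item: Tuple) -> bool:
        # Same length, and every non-wild-card field must be equal.
        return len(template) == len(item) and all(
            t is None or t == v for t, v in zip(template, item)
        )

    def read(self, template: Tuple) -> Optional[Tuple]:
        """Return the first matching tuple without removing it."""
        for item in self._tuples:
            if self._matches(template, item):
                return item
        return None

    def take(self, template: Tuple) -> Optional[Tuple]:
        """Return the first matching tuple and remove it from the space."""
        for index, item in enumerate(self._tuples):
            if self._matches(template, item):
                return self._tuples.pop(index)
        return None
```

A real implementation would let read and take block until a matching tuple is written, which is the synchronization the slide refers to.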

TIB/Rendezvous • TIB/Rendezvous multicasts data (e.g., news.comp.os.books) to all LAN subscribers who previously expressed interest in it (e.g., news.comp.*.books). • The publisher only multicasts to LANs with rendezvous daemons, which filter out all data that each subscriber did not request. • Simple matching rules must be used when multicasting in wide area networks.
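
The subject filtering done by a rendezvous daemon can be sketched as a comparison of dot-separated subject elements. The function below is an illustration, not the TIB/Rendezvous API; it assumes "*" matches exactly one element and handles no other wild cards.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a subscription pattern.

    Assumption: "*" stands for exactly one element, so the pattern and the
    subject must have the same number of elements.
    """
    pattern_parts = pattern.split(".")
    subject_parts = subject.split(".")
    if len(pattern_parts) != len(subject_parts):
        return False
    return all(p == "*" or p == s for p, s in zip(pattern_parts, subject_parts))
```

With the slide's example, subject_matches("news.comp.*.books", "news.comp.os.books") is true, while a message on news.comp.os.films would be filtered out.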

Peer-to-Peer Architectures • A centralized server can’t scale beyond a few hundred clients, and multicasting is confined to LANs with limited range data item descriptions, but peers are easily coordinated. • Distributed hash tables combine a great many key values to identify one publication (or many on related topics). • Only gossip-based pub/sub schemes can handle more elaborate matching schemes.

Gossip-Based Pub/Sub System • Many gossiping peers can partition the publications space represented by a range of subscribers’ attribute values. • For example, I subscribe to the technical data sheets (s1=26) of embedded (s2=12) computers (s3=20), whose price/speed ratios lie between $16.6 and $21 per million instructions per second (16.5 < s4 < 21 in the figure above). • I quickly find that only 7 (horizontal bars above) of the world’s 200 million current publications pertain to the topics s1s2s3 = 26, 12, 20. • In gossiping with my 15 neighbors, I discover that 7 of them (“nodes” above) already subscribe to those 7 data sheets. In fact, neighbors 3, 4, 7 and 10 say their data sheets match my range of attribute values (bidirectional ring above)! • After they send me their data sheets, the four of us agree to catalog our overlapping interests, so that each of us can send to the others any items that may also interest them. • And all of us make similar agreements with others outside our circle, which result in publications being quickly disseminated to thousands of subscribers with a wide range of interests.
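
One step in the gossip exchange above is finding the neighbors whose subscription ranges overlap mine. A minimal sketch follows, assuming each neighbor advertises one inclusive (low, high) range for attribute s4; the function names and any neighbor ranges passed in are illustrative, not taken from the figure.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def ranges_overlap(a: Range, b: Range) -> bool:
    """Two inclusive ranges overlap when each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_neighbors(mine: Range, neighbors: Dict[int, Range]) -> List[int]:
    """Return the ids of neighbors whose advertised range overlaps mine."""
    return sorted(
        node for node, their_range in neighbors.items()
        if ranges_overlap(mine, their_range)
    )
```

The neighbors returned here are the candidates for the mutual "catalog our overlapping interests" agreement described in the slide.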

R U O K ? 6. Which of the following accurately characterize Jini and its JavaSpace? • Jini is a temporally and referentially decoupled distributed coordination-based system. • JavaSpace consists of tuples representing multiple typed references to Java objects. • A process marshals a tuple instance into JavaSpace with a write operation. • To read all tuple instances of interest from JavaSpace, another process marshals a template with the same types and various null types. • All of the above.

R U O K ? 7. Which of the following accurately characterize TIB/Rendezvous? • It multicasts data to all LAN subscribers who previously expressed interest in it. • The publisher only multicasts to LANs with rendezvous daemons, which filter out all data that each subscriber did not request. • Simple matching rules must be used when multicasting in wide area networks. • All of the above. • None of the above.

R U O K ? 8. Which of the following accurately characterize Peer-to-Peer Architectures? • Peers are easily coordinated. • Distributed hash tables can combine a great many key values to uniquely identify one publication. • Only gossip-based pub/sub schemes can handle more elaborate matching schemes. • All of the above. • None of the above.

R U O K ? 9. Which of the following accurately characterize gossip-based pub/sub systems? • Gossiping peers can partition a publications space that is represented by a range of subscribers’ attribute values. • Gossip with all nearest neighbors prompts each to catalog others’ overlapping subscription interests and to share mutually interesting publications in the future. • Such sharing spreads across social networks like a virus. • All of the above. • None of the above.

Discussion • Distributed hash tables (DHT) work even faster than gossip (p.44ff). • The Chord system identifies a subject matter expert for every hash-coded set of publication attributes. • If the expert doesn’t have exactly the publication that you seek, you can be sure that one of her near neighbors (up or down the ring) will. • Or each of your publication’s attributes can be searched by a different expert, who passes her subset of interest along to other experts. • Attribute-based naming systems are hard for decentralized systems to search. (How does Google do it so well?) Example query: technical data sheet & embedded computers & price/speed ratio $16.6 to $21 / MIPS
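
Hashing a set of attributes to a key and finding the responsible node on a Chord-style ring can be sketched as follows. Several things here are assumptions for illustration: the 16-bit identifier space, the use of SHA-1 over a canonical attribute string, and a successor function that sees all node ids at once (real Chord resolves the successor hop by hop with finger tables).

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List

M_BITS = 16  # assumed size of the identifier space, for illustration only


def publication_key(attributes: Dict[str, str]) -> int:
    """Hash a set of (attribute, value) pairs to one key on the ring."""
    # Sort the pairs so the same attributes always give the same key.
    canonical = "&".join(f"{name}={value}" for name, value in sorted(attributes.items()))
    digest = hashlib.sha1(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M_BITS)


def successor(node_ids: List[int], key: int) -> int:
    """Return the first node id at or after the key, wrapping around the ring."""
    ring = sorted(node_ids)
    index = bisect_left(ring, key)
    return ring[index % len(ring)]
```

The node returned by successor plays the role of the slide's "subject matter expert" for that key.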

R U O K ? 10. How can the old Chord system (p.44ff) be used to distribute publications? • Hash all of your favorite publication’s attributes into one uniquely identifiable publication number. • Request a subscription from the subject matter expert on the Chord ring, who is just downstream from your hashed number. • If the expert doesn’t have exactly the publication that you seek, you can be sure that one of her near neighbors will. • All of the above. • None of the above.

Mobility and Coordination • How do you ensure that a mobile subscriber does not receive a publication more than once? • Have the subscriber delete duplicate publications. • Ensure that routers don’t deliver duplicate packets.
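
Having the subscriber delete duplicates can be sketched as a filter on publication identifiers. This assumes every publication carries a unique id, which the slide does not state; the class name DeduplicatingSubscriber is hypothetical.

```python
from typing import Hashable, List, Set


class DeduplicatingSubscriber:
    """Drops any publication whose id has already been delivered."""

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()
        self.delivered: List[object] = []

    def receive(self, publication_id: Hashable, payload: object) -> bool:
        """Deliver the payload once; return False for a duplicate."""
        if publication_id in self._seen:
            return False  # already delivered, e.g. via a previous access point
        self._seen.add(publication_id)
        self.delivered.append(payload)
        return True
```

The set of seen ids grows without bound in this sketch; a practical version would expire old ids.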

Lime • Take your middleware’s dataspace with you when you leave home, and welcome others within range to publish and subscribe there. • Useful only on single-hop wireless links, not AT&T’s entire global network. Useful only for members of one group, who share a communication protocol. • As in JavaSpace, another can “read” or “take” whatever publications you “write.” But an attribute can specify one person that you enable to do so. • Subscribers can specify automatic “reactions” to tuples matching their templates in your dataspace. A reaction can change the dataspace or even transfer publications between dataspaces and transform them in flight.
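
A subscriber's automatic "reaction" to tuples matching a template can be sketched as a callback registered on the dataspace. This is not the Lime API: the class ReactiveSpace and its method names are hypothetical, None serves as the wild card, and mobility, access control, and transfers between dataspaces are omitted.

```python
from typing import Callable, List, Tuple


class ReactiveSpace:
    """Toy dataspace that fires registered reactions when a tuple is written."""

    def __init__(self) -> None:
        self.tuples: List[Tuple] = []
        self._reactions: List[Tuple[Tuple, Callable[[Tuple], None]]] = []

    def react_to(self, template: Tuple, callback: Callable[[Tuple], None]) -> None:
        """Register a callback for every future tuple matching the template."""
        self._reactions.append((template, callback))

    def write(self, item: Tuple) -> None:
        """Store the tuple, then run every reaction whose template matches it."""
        self.tuples.append(item)
        for template, callback in self._reactions:
            same_shape = len(template) == len(item)
            if same_shape and all(t is None or t == v for t, v in zip(template, item)):
                callback(item)
```

A callback is free to write further tuples or copy the item elsewhere, which is how a reaction could change a dataspace.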

Communication • Java-based pub/sub systems typically communicate via remote method invocations. • When those systems are widely distributed, how do we ensure that their publications reach only relevant subscribers? • Gossip-based self-organization can automatically cluster peers, who disseminate publications among their interest groups. • Content-based routers can read publications and send them to all appropriate subscribers.

Content-Based Routing • Every publication carries a succinct description (attributes, values) of its content. Only the publication’s abstract is sent to subscribers, who request the publication if interested. • Simple alternative: routers know the interests of all subscribers downstream from a branch of a broadcast tree, and each publication is sent only to subscribers interested in its single keyword; e.g., TIB/Rendezvous. • Complex alternative: all publications are sent to all servers, which keep lists of their own clients’ interests. • Compromise alternative: all servers broadcast their clients’ interests to all routers, and routing filter R2 handles 3 and 4’s distributions (figure above). R1 forwards the union [0, 5] to R2. • Unfortunately, comparing subscribers’ interests with publication contents can be compute-intensive.
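
In the compromise alternative, a router aggregates downstream interests and forwards a publication only on links whose aggregate covers it. A minimal sketch follows, assuming interests are inclusive numeric intervals on one attribute; the function names and link labels are hypothetical, and the sample intervals in the usage note are made up (the slide gives only the union [0, 5]).

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping inclusive intervals into the smallest covering list."""
    merged: List[Interval] = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def links_to_forward(filters: Dict[str, List[Interval]], value: float) -> List[str]:
    """Return the outgoing links whose aggregated interests contain the value."""
    return sorted(
        link for link, intervals in filters.items()
        if any(low <= value <= high for low, high in intervals)
    )
```

For instance, if two downstream subscribers wanted [0, 3] and [2, 5], merge_intervals would yield the single interval (0, 5) that a router could forward upstream in place of both.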

R U O K ? 11. How can you be sure that a mobile subscriber does not receive a publication more than once? • Have the subscriber delete all duplicate publications. • Ensure that routers don’t deliver duplicate packets. • He very likely will become an immobile subscriber (i.e., get run over by a truck), shortly after he starts reading the first copy of the publication. • All of the above. • None of the above.

R U O K ? 12. What are some notable limitations of the Lime transient shared dataspace? • Useful only on single-hop wireless links. • Can be used by only the members of one group. • Used by those who share a communication protocol. • All of the above. • None of the above.

R U O K ? 13. When widely distributed Java-based pub/sub systems communicate via remote method invocations, how can we ensure that their publications reach only relevant subscribers? • Gossip-based self-organization can automatically cluster peers, who disseminate publications among their interest groups. • Content-based routers can read publications and send them to all appropriate subscribers. • All of the above. • None of the above.

R U O K ? 14. What are some alternatives for content-based routing implementation? • Knowing the interests of all subscribers downstream from a branch in a broadcast tree, routers send each publication only to subscribers interested in its single keyword. • Routers send all publications to all servers, which keep lists of their own clients’ interests. • All servers broadcast their clients’ interests to all routers, which filter incoming publications. • All of the above. • None of the above.

Supporting Composite Subscriptions • What if a subscriber’s interests are more complex; e.g., data items about IBM stocks and data on the subscriber’s revenues? • Routers can be designed like rule databases; i.e., interests are described as rules for selecting published data. • Supporting such subscription composition is related to naming in coordination-based systems.

Naming • In simple coordination-based systems, every publication is named by (attribute, value) pairs. • JavaSpace matches only templates whose values are equal. • Other commercially available pub/sub systems support primitive value range testing. • When a subscriber doesn’t care about a value, she specifies a null. • A data item tagged with only one (attribute, value) pair is called an “event.” Composite events call for… • Composite descriptions. • Means for matching published primitive events according to subscribers’ interests.

Describing Composite Events • Consider the increasingly complex events (above left) in a secure, air conditioned computer room. • S1 is a primitive discrete event; S2 is a composition of two simple events; but S3 must be described by a finite-state machine (above center). • More complex FSMs (right) can be decomposed into smaller publisher (above) and subscriber (below) FSMs.
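
A composite event that needs a finite-state machine can be sketched as a small transition table. The events and states below are invented for illustration, since the slide's figure is not available: the detector publishes a composite event when the temperature goes high while the door is open.

```python
from typing import Dict, List, Tuple


class CompositeEventDetector:
    """Toy FSM: publishes when 'temperature_high' occurs while the door is open."""

    # (current state, primitive event) -> next state; all names are hypothetical.
    TRANSITIONS: Dict[Tuple[str, str], str] = {
        ("idle", "door_opened"): "door_open",
        ("door_open", "door_closed"): "idle",
        ("door_open", "temperature_high"): "alarm",
    }

    def __init__(self) -> None:
        self.state = "idle"
        self.published: List[str] = []

    def on_event(self, event: str) -> None:
        """Feed one primitive event into the machine."""
        next_state = self.TRANSITIONS.get((self.state, event))
        if next_state is None:
            return  # the event is irrelevant in the current state
        if next_state == "alarm":
            # Publish the composite event, then start over.
            self.published.append("door_open_and_overheating")
            next_state = "idle"
        self.state = next_state
```

Only the sequence door_opened followed by temperature_high publishes anything; either primitive event alone leaves the subscriber undisturbed.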

Matching Events and Subscriptions • Distributed event detectors are multiple FSMs efficiently shared by many subscriptions. • Event detector distribution is like name resolution distribution in the Domain Name System (DNS). Some FSMs trigger other FSMs directly (minimizing network traffic), and some of them publish events for interested subscribers. • More expressive subscription languages tend to be more expensive or less efficient.

R U O K ? 15. What if a content-based router subscriber’s interests are complex; e.g., data items about IBM stocks, data on the subscriber’s revenues, etc.? • Routers can be designed like rule databases; i.e., interests are described as rules for selecting published data. • Support such subscription compositions as one would handle naming in coordination-based systems. • All of the above. • None of the above.

R U O K ? 16. Which of the following is an “event,” in the context of simple coordination-based systems? • Any data item tagged with only one (attribute, value) pair. • A computer room’s air conditioning or security alarm. • An announcement that U2 will appear at UTA’s Activities Center on 1 April 2014. • All of the above. • None of the above.

R U O K ? 17. What is a finite state machine (FSM), in the context of simple coordination-based systems? • A collection of environmental states, which are connected by event-triggered transitions. • Selected transitions trigger publications that warn interested subscribers of events. • Each state may represent/aggregate another entire FSM. • All of the above. • None of the above.

R U O K ? 18. Where could you buy a rather expensive but efficient and exquisitely reliable event detector? • Diebold Electronic Security Solutions. • WalMart. • Toys’R’Us. • All of the above. • None of the above.
