
Distributed Coordination-Based Systems II

Distributed Coordination-Based Systems II. CSE5306 Lecture Quiz due 31 March 2014. Synchronization. Synchronization always is a problem in generative communications.


Presentation Transcript


  1. Distributed Coordination-Based Systems II CSE5306 Lecture Quiz due 31 March 2014

  2. Synchronization • Synchronization is always a problem in generative communications. • It is easy to synchronize one publishing server with many clients: simply block each client’s subscriber until a tuple matching its template becomes available. • Replicating and distributing publications across many servers makes their synchronization much more difficult.
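The single-server case on slide 2 can be sketched in a few lines. This is a minimal illustration, not JavaSpaces itself: the class name `TupleSpace`, the use of `None` as a wildcard field, and the `timeout` parameter are all assumptions made for the example.

```python
import threading


class TupleSpace:
    """Toy single-server tuple space (hypothetical); None in a template is a wildcard."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        # A template matches when lengths agree and every non-wildcard field is equal.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def write(self, tup):
        """Publish a tuple and wake every blocked reader so it can re-check."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def read(self, template, timeout=None):
        """Block until a matching tuple exists; return it without removing it.

        Returns None if the optional timeout expires first.
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            return self._find(template) if found else None
```

A subscriber calling `read(("quote", "IBM", None))` simply sleeps on the condition variable until some publisher calls `write` with a matching tuple. Nothing comparable exists once the tuples are spread over several servers, which is the difficulty the last bullet points at.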

  3. R U O K ? • How can you synchronize just one publishing server with each client? • Program each of them to consult radio station WWV daily. • Provide each of them a GPS receiver. • Block the client’s subscriber until a tuple matching its template is available. • All of the above. • None of the above.

  4. Consistency and Replication • Replication enables us to scale up coordination-based systems. • This is especially true of those engaged in generative communications. • Publications’ tuples can be distributed statically or (in some cases) automatically and dynamically.

  5. R U O K ? 2. Why is replication an especially good way to scale coordination-based systems that are engaged in generative communications? • Their subscribers are a potentially large pool of anonymous commodity service providers. • Actually it is a good way to scale all distributed systems. • All of the above. • None of the above.

  6. Static Approaches • First let’s insert tuples into a JavaSpace. • Then let’s distribute that JavaSpace across several machines.

  7. General Considerations • Efficient distributed JavaSpaces solve two problems: • Associative addressing without massive searches. • Distributing tuples now and locating them later. • JavaSpaces solves these problems by: • Partitioning tuples into subspaces by their types. • Organizing subspaces as (one tuple) hash tables. • Taking the hash modulo the number of machines for scalability. • Customize storage according to each network’s strength: • Good broadcaster (above left): replicate all tuples on all servers. • Local reads and broadcast writes. • Good listener (above center): each tuple stored on only one server. • Local writes and broadcast reads. • If a tuple can’t be found, repeat the request at increasing time intervals. • Locally replicating a requested tuple saves search time later. • Combination (above right): partial replication. • Publisher broadcasts (writes) tuple to all servers on her row. • Subscriber sends template to (reads) all servers on her column.
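The placement rules on slide 7 can be sketched as follows. The function names, the choice of CRC32 as the hash, hashing on a single key field, and the row-major numbering of servers in the grid are all assumptions for illustration; the slides do not specify them.

```python
import zlib


def home_server(tuple_type: str, key: str, num_servers: int) -> int:
    """Pick one server for a tuple: hash of (type, key) modulo the machine count."""
    digest = zlib.crc32(f"{tuple_type}:{key}".encode("utf-8"))
    return digest % num_servers


def grid_write_targets(server: int, cols: int) -> list[int]:
    """Partial replication: a publisher writes to every server in its own row."""
    row = server // cols
    return [row * cols + c for c in range(cols)]


def grid_read_targets(server: int, rows: int, cols: int) -> list[int]:
    """A subscriber sends its template to every server in its own column."""
    col = server % cols
    return [r * cols + col for r in range(rows)]
```

In a 3 x 4 grid, a publisher at server 5 writes to servers 4 through 7 and a subscriber at server 10 queries servers 2, 6 and 10. Any row and any column share exactly one server (here server 6), so a read always reaches a replica without either side broadcasting to the whole network.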

  8. R U O K ? 3. What coordination-based system problems do efficient distributed JavaSpaces solve? • Associative addressing without massive searches. • Distributing tuples now and locating them later. • Message storage when a client is not available. • All of the above. • Both a and b above.

  9. Dynamic Replication • Scalability is limited in static replication: • Broadcasting takes time. • Many tuple spaces on many servers in one network is only partial replication. • Simple commercial dataspaces subject to a single network administration policy lack robustness. • Scalability is unlimited in dynamic replication: • Globule’s fine-grained Web documents truly are replicated (pp.63-5). • GSpace differentiates replication mechanisms among the various data types in its dataspace.

  10. R U O K ? 4. Why is scalability limited in static replication? • Broadcasting takes time. • Many tuple spaces on many servers in one network is only partial replication. • Simple commercial dataspaces, which are subject to a single network administration policy, lack robustness. • All of the above. • Both a and b above.

  11. GSpace Overview • The GSpace kernel has a centralized version of JavaSpaces in its “Dataspace slice.” • Policies have tuples that match the application’s read, write and take subscription templates. • Independent policies apply to matching publications in the Dataspace slice. • For example, a master/slave policy permits reads but asks a distant master node’s permission for writes. • Policies can be edited at runtime.

  12. R U O K ? 5. Which of the following accurately describes the GSpace kernel? • It has a centralized version of JavaSpaces in its “Dataspace slice.” • Policies have tuples that match the application’s read, write and take subscription templates. • Independent policies apply to matching publications in the Dataspace slice. • All of the above. • Both a and b above.

  13. Adaptive Replication • Policies are automatically generated based on: • Available network bandwidth, latency and memory. • The weighted importance of each of these. • The policies result in a central coordinator: • Moving tuples to different nodes. • Choosing ways to keep replicas current. • Transition policies decide how policies should change and when; e.g., • Banish a group of tuples into the outer darkness. • Stop replicating a group of tuples. • Lazily copy tuples to subscribers after the first call.
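One way to read "the weighted importance of each of these" is as a weighted sum of per-metric costs, with the cheapest policy winning. The sketch below assumes exactly that; the function name, the policy names and all numbers are invented for illustration and are not taken from GSpace.

```python
def cheapest_policy(measurements, weights):
    """Return the policy name with the lowest weighted cost.

    measurements: policy name -> {metric name: measured cost}
    weights:      metric name -> importance (larger means it matters more)
    """
    def cost(metrics):
        return sum(weights[m] * metrics[m] for m in weights)

    return min(measurements, key=lambda name: cost(measurements[name]))


# Illustrative, made-up costs on an arbitrary 0-10 scale.
measurements = {
    "full_replication": {"bandwidth": 9, "latency": 1, "memory": 8},
    "single_copy": {"bandwidth": 2, "latency": 7, "memory": 1},
}
latency_first = {"bandwidth": 0.1, "latency": 0.8, "memory": 0.1}
bandwidth_first = {"bandwidth": 0.6, "latency": 0.1, "memory": 0.3}
```

With `latency_first` the weighted costs are 2.5 against 5.9, so full replication wins; with `bandwidth_first` they are 7.9 against 2.2, so the single copy wins. Changing the weights at runtime is enough to trigger a transition from one policy to the other.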

  14. R U O K ? 6. What network parameters determine how GSpace policies are automatically generated? • Bandwidth. • Latency. • Unused message storage capacity. • All of the above. • Both a and b above.

  15. Fault Tolerance • Fault tolerance design problems in coordination-based communication systems: • Reliable communications. • Reliable storage (in generative communication systems).

  16. Reliable Pub-Sub Communication • Live subscribers—no data storage required. • A live multicast system… • Under the publish/subscribe application and • Over an unreliable network transport layer. • Design challenges: • Implementing a reliable multicast channel. • Fault tolerance must be handled.

  17. Fault Tolerance in TIB/Rendezvous • Reliability measures: • All nodes keep publication 60 seconds after sending. • Certified message delivery retransmits all messages not ACKed. • Deactivates running processes that can only handle events. • Strives to keep at least one process active (sends heartbeat) in each group. • Inactive nodes accept publications in anticipation of going active later. • Needy receiver asks prior node to send lost packet. • Pragmatic General Multicast (PGM) scalability: • Node forwards only one retransmission request (NAK), after receiving many. • Node retransmits only to requesting nodes.
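The two PGM scalability rules on slide 17 amount to a small piece of per-node state. The sketch below is a toy model of an intermediate node in the multicast tree, not the PGM wire protocol; the class and method names are hypothetical.

```python
class PgmNode:
    """Toy intermediate multicast node illustrating NAK suppression (hypothetical)."""

    def __init__(self):
        self.pending = {}    # sequence number -> set of children that sent a NAK
        self.forwarded = []  # NAKs this node actually sent upstream

    def on_nak(self, seq, child):
        """Record the request; forward upstream only the first NAK per packet."""
        first_request = seq not in self.pending
        self.pending.setdefault(seq, set()).add(child)
        if first_request:
            self.forwarded.append(seq)
        return first_request

    def on_retransmission(self, seq):
        """Return only the children that asked for seq, then forget the request."""
        return sorted(self.pending.pop(seq, set()))
```

If children B and C both report packet 7 missing, the node sends a single NAK upstream and, when the retransmission arrives, relays it to B and C only. The sender therefore sees at most one request per lost packet from each subtree, however many receivers lost it.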

  18. R U O K ? 7. How do reliable pub/sub (live) communicators “handle fault tolerance”? • All nodes keep publication 60 seconds after sending. • Certified message delivery retransmits all messages not ACKed. • Needy receiver asks prior node to resend lost packet. • All of the above. • None of the above.

  19. Fault Tolerance in Shared Dataspaces with Generative Communications • Each GSpace node regularly writes timestamps to persistent storage, which reveals its ups and downs. • Let Tstart(j) be the time the node started for the j-th time and Tend(j) the time it then crashed. • Uptime = “time to failure” = crash time – start time = Tend(j) – Tstart(j). • Downtime = “time to repair” = next restart time – crash time = Tstart(j+1) – Tend(j). • Mean time to failure, MTTF = (1/n) Σj (Tend(j) – Tstart(j)). • Mean time to repair, MTTR = (1/n) Σj (Tstart(j+1) – Tend(j)). • One node’s availability, a = MTTF / (MTTF + MTTR). • Availability of a data item replicated among m nodes = 1 – Πk (1 – ak), for k = 1 … m. • Replication policy automatically optimizes node availabilities, network bandwidth and CPU loading.
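The availability formulas on slide 19 can be checked numerically. The sketch below assumes that MTTF and MTTR are averages over n observed up/down cycles, that uptime runs from a start to the following crash, and that downtime runs from a crash to the next start. The function names and the sample timestamps are invented for the example.

```python
from math import prod


def node_availability(starts, ends):
    """Availability a = MTTF / (MTTF + MTTR) from logged timestamps.

    starts[j] is the j-th start time and ends[j] the crash that followed it;
    starts must hold one more entry than ends (the restart after the last crash).
    """
    n = len(ends)
    mttf = sum(ends[j] - starts[j] for j in range(n)) / n      # mean uptime
    mttr = sum(starts[j + 1] - ends[j] for j in range(n)) / n  # mean downtime
    return mttf / (mttf + mttr)


def item_availability(node_avails):
    """A replicated item is unavailable only when every replica is down."""
    return 1 - prod(1 - a for a in node_avails)
```

A node that starts at times 0, 10 and 20 and crashes at 2 and 12 is up 2 units and down 8 units per cycle, giving a = 0.2. Replicating an item on that node and on a second node with a = 0.3 gives 1 – 0.8 × 0.7 = 0.44, still poor but better than either node alone.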

  20. R U O K ? 8. What is the availability of a data item replicated on two servers, which are up only 20% and 30% of the time respectively? • 28% • 56% • 44%. Taking MTTF1 = 0.20 and MTTR1 = 0.80 as fractions of total time, a1 = MTTF1 / (MTTF1 + MTTR1) = 0.20; likewise a2 = 0.30. Data availability = 1 – (1 – a1) × (1 – a2) = 1 – 0.80 × 0.70 = 0.44 = 44%. • 94% • None of the above.

  21. Security • Senders and receivers must authenticate each other. • But that violates referential decoupling – I shouldn’t know (explicitly) who you are (bottom p.590). • Ah, a trusted broker (i.e., our mutual friend’s recommendation) can handle your data processing and subscriptions. • And we need not trust all brokers.

  22. R U O K ? 9. Give a practical example of a trusted broker enabling senders and receivers to authenticate each other without violating referential decoupling. • “Manuel Reyes is a great gardener, and his prices are very reasonable.” • “Trust me, here is the phone number of someone who will remodel your house very nicely for only $3K.” • “Lowe’s will install your new garage door for the lowest price.” • All of the above. • None of the above.

  23. Confidentiality • Information confidentiality • End-to-end encryption (router sees addresses only). • Encrypt only the secret fields (real estate address). • Subscription confidentiality • Keywords are hashes of many encrypted tuples. • Subscriber does same, and router matches them. • Publication confidentiality • Publishers sell texts to students and keys to teachers. • Out-of-band communication authenticates teacher.
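Subscription confidentiality, as slide 23 describes it, lets a router match digests without ever seeing the keywords. The sketch below is one possible reading: it assumes the publisher and subscriber share a secret out of band and uses HMAC-SHA256 as a stand-in for whatever keyed hash a real system would use. The function names and sample keywords are hypothetical.

```python
import hashlib
import hmac


def blind(keyword: str, secret: bytes) -> str:
    """Keyed digest of a keyword; the router cannot invert it without the secret."""
    return hmac.new(secret, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def route(publication_tags: set[str], subscriptions: dict[str, set[str]]) -> list[str]:
    """Router logic: deliver to each subscriber whose blinded tags all appear."""
    return sorted(
        name for name, tags in subscriptions.items() if tags <= publication_tags
    )
```

A publisher tags a listing with `blind("real-estate", secret)` and `blind("Arlington", secret)`. A subscriber who registered `blind("arlington", secret)` receives it, while one who registered `blind("dallas", secret)` does not. The router compares only hexadecimal strings and never learns which keywords they stand for.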

  24. R U O K ? 10. Give a practical example of confidential publishing that paradoxically allows content-based routing. • A U.K. Ministry of Defense courier drops off a classified BAE document at Boeing’s document control department. • A well-dressed lady buys Vogue magazine from a newsstand, without first reading its Table of Contents. • A U.S. government lawyer subpoenas Intel’s tax records. • All of the above. • None of the above.

  25. Decoupling Pubs from Subs • How do we protect secrets from middleware? • Publisher… • Encrypts all secret fields in publication. • Registers itself with its trusted broker. • An accounting service (AS)… • Accepts certificate from broker. • Generates the publisher’s public key. • Encrypts the entire publication. • Signs it “AS.” • Keeps private key for itself. • Middleware passes publication to subscriber’s broker. • Subscriber’s trusted broker asks AS to… • Decrypt the entire publication. • Encrypt it again using the subscriber’s public key. • Deliver it to the subscriber. • Brokers never see the secret content. • Publishers and subscribers never share key information. • Scaling requires that multiple (foreign) accounting services re-encrypt.

  26. R U O K ? 11. How can one accounting service (AS) help two brokers (PB and SB) keep secrets from the middleware of a publisher (P) and a subscriber (S)? • AS accepts PB’s certificate to authenticate P. • AS generates P’s public key and encrypts the document with it. It signs and sends the document. • AS verifies its own signature and accepts SB’s certificate to authenticate S. • AS receives and decrypts the document. It generates S’s public key and encrypts the document with it. It hands the document to S. • All of the above.

  27. Secure Shared Dataspaces • Trusted dataspaces (i.e., processes may see the content) are easy to share. • Global dataspaces allow the subscriber to match templates with published tuples, only if they can decrypt them. • The latter requires publishers to share encryption keys with authorized subscribers.

  28. R U O K ? 12. How might I subscribe to confidential documents in a global shared dataspace? • The publisher hashes all of the document’s keywords, including a secret code word (e.g., “Overlord” was the code word for the Normandy invasion in WWII), into a secret key, with which he encrypts the document. • I hash all of the keywords in my template, including the same secret code word, into a secret key, with which I decrypt the document. • All of the above. • It can’t be done.

  29. Summary • Coordination-based systems tend to be independently-administered, anonymous (referentially uncoupled) commodity services, and that leads to scalability and fault tolerance. • Furthermore, generative communicators are not necessarily available 24/7 (temporally uncoupled). • TIB/Rendezvous is a publisher that addresses its subscribers by subject instead of by name. • Content-based pub/sub systems formulate predicates over the many attributes of their published data, and their routers sort messages according to subscribers’ reading interests. • Generative communicators share dataspaces of XML-like tuples; i.e., record-like typed data structures. If a subscribing reader’s template matches a tuple, the tuple is sent; otherwise the reader blocks.
