
Why does Distributed Computing Matter?


lorne




Presentation Transcript


  1. Why does Distributed Computing Matter?

  2. Why does Distributed Computing Matter?
     • Clay Shirky’s 4 rungs of group interaction
       • Here Comes Everybody (2008)
     • Links in a fully connected mesh grow quadratically
       • nodes * (nodes - 1) / 2
     • Internet technology enables ridiculously easy group forming
       • e.g. email Reply All button
     • Describes an evolution from
       • simple/selfish sharing
         • example: Delicious
       • conversation
         • example: high dynamic range photography on Flickr
       • collaboration
         • example: anime fans writing software to subtitle Japanese anime for Western consumption
       • collective action
         • example: flash mobs in Belarus, and now Tunisia and Egypt
     • Each rung requires greater co-ordination amongst members
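The link-count formula on this slide, nodes * (nodes - 1) / 2, can be checked with a short sketch. The function name `mesh_links` is only illustrative. Doubling the number of nodes roughly quadruples the number of links, which is quadratic rather than exponential growth.

```python
def mesh_links(nodes: int) -> int:
    """Number of pairwise links in a fully connected mesh of `nodes` members."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # Every node links to every other node; dividing by 2 counts each link once.
    return nodes * (nodes - 1) // 2


for n in (2, 5, 10, 20, 100):
    print(n, "nodes ->", mesh_links(n), "links")
```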

  3. A Distributed System
     [Diagram: Machine A, Machine B and Machine C, each with its own OS (e.g. Windows, Mac OS X, Linux), connected by a network; a middleware services layer spans the machines, with distributed applications on top]
     A distributed system is a collection of independent computers that appears to its users as a single coherent system.
     • Heterogeneous computers: different vendors and OSs should be able to interoperate
     • Should mask the heterogeneity from users (applications)
     • Should be easy to expand and scale
     • Should be permanently available (even though parts of it are not)
     • Communication is based on messages

  4. Layering
     • The distributed applications and middleware we will consider occupy the Application Layer of the OSI model
     • Some interact more or less closely with the underlying layers
     • Typically these systems have various layers within them as well

  5. Layering
     • For example, a Web Service:
       • Serializes programming objects to XML, perhaps using the W3C XML Schema specification
       • Wraps this data in a SOAP envelope
       • Puts the SOAP envelope into an HTTP POST message payload
       • Then HTTP uses what it uses: TCP/IP, DNS
     • A Peer to Peer application may:
       • Define custom messages for file sharing
       • Use UPnP to negotiate NATs
       • Tunnel custom messages through another protocol (e.g. HTTP)
     • In the end, systems will do whatever they need to get their job done and don’t care much for nicely defined layers.
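The Web Service layering described on this slide can be sketched in a few lines. This is a minimal illustration, not a real SOAP stack: the operation name `GetQuote`, the host `example.org` and the path `/quote` are made up, and the request is only built as bytes, never sent. The namespace URI is the SOAP 1.1 envelope namespace.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)


def to_xml(tag: str, fields: dict) -> ET.Element:
    """Layer 1: serialize a flat in-memory object (a dict here) to XML."""
    element = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(element, name).text = str(value)
    return element


def wrap_in_soap(payload: ET.Element) -> bytes:
    """Layer 2: wrap the XML payload in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def build_http_post(host: str, path: str, body: bytes) -> bytes:
    """Layer 3: put the envelope into the payload of an HTTP POST message."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        'SOAPAction: ""\r\n'
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Hypothetical request: every name below is illustrative only.
soap_body = wrap_in_soap(to_xml("GetQuote", {"symbol": "ACME"}))
request = build_http_post("example.org", "/quote", soap_body)
print(request.decode("utf-8"))
```

Handing `request` to a TCP socket would be the last step on the slide, where HTTP relies on TCP/IP and DNS.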

  6. Messages
     • These define the protocol of the distributed application or middleware.
     • Provide insulation from heterogeneous OSs
     • Are capable of traversing the network
       • This involves some form of serialization
       • i.e. the message can be converted into a stream of bytes and sent over the wire
       • Often an in-memory representation of an object is serialized at the sending side, and de-serialized at the receiving side to recreate the in-memory representation.
       • Strings, XML, Java serialization, bencoding…
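As an illustration of serialization, here is a minimal sketch of bencoding, one of the formats listed on this slide. It covers the four bencode types (integers, byte strings, lists, dictionaries) and skips strict validation such as rejecting leading zeros. The sample message and its field names are invented for the example.

```python
def bencode(value) -> bytes:
    """Serialize an in-memory value to a stream of bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are written in sorted order.
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def _decode_at(data: bytes, pos: int):
    """Decode one value starting at `pos`; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode_at(data, pos)
            result[key], pos = _decode_at(data, pos)
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        return data[start:end], end
    raise ValueError(f"unexpected byte at offset {pos}")


def bdecode(data: bytes):
    """De-serialize bytes back into an in-memory value."""
    value, pos = _decode_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after message")
    return value


message = {"type": "query", "ttl": 7, "files": ["a.mp3", "b.mp3"]}
wire = bencode(message)   # what would be sent over the network
print(wire)
print(bdecode(wire))      # strings come back as bytes
```

The receiving side gets byte strings rather than Python `str` objects, a small reminder that the in-memory representation on each side need not be identical.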

  7. Messages
     • All messages are one-way
       • Protocols may build on top of this to create two-way message exchange patterns
       • E.g. HTTP is request/response based.
       • RPC emulates a local computing paradigm with the concept of a remote procedure with return values
     • Messages are typically independent of the transport protocol they use (TCP, UDP)
       • But some rely on or presume a certain transport protocol
       • E.g. BitTorrent and HTTP keep the TCP connection open to reduce overhead; this would not work over UDP, which is connectionless.
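A request/response pattern can be layered on one-way messages by tagging each message with a correlation identifier. The sketch below models the two directions as separate in-memory channels; the class and function names are hypothetical and nothing is sent over a real network.

```python
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    correlation_id: int
    payload: str


class OneWayChannel:
    """Carries messages in a single direction only."""

    def __init__(self):
        self._queue = deque()

    def send(self, message: Message) -> None:
        self._queue.append(message)

    def receive(self):
        return self._queue.popleft() if self._queue else None


_next_id = itertools.count(1)


def send_request(to_server: OneWayChannel, payload: str) -> int:
    """Client side: send a one-way message and remember its id."""
    correlation_id = next(_next_id)
    to_server.send(Message(correlation_id, payload))
    return correlation_id


def serve_once(to_server: OneWayChannel, to_client: OneWayChannel) -> None:
    """Server side: answer with a second one-way message carrying the same id."""
    request = to_server.receive()
    if request is not None:
        to_client.send(Message(request.correlation_id, request.payload.upper()))


def read_reply(to_client: OneWayChannel, correlation_id: int):
    """Client side: accept the reply only if it matches the request."""
    reply = to_client.receive()
    if reply is not None and reply.correlation_id == correlation_id:
        return reply.payload
    return None


to_server, to_client = OneWayChannel(), OneWayChannel()
request_id = send_request(to_server, "hello")
serve_once(to_server, to_client)
answer = read_reply(to_client, request_id)
print(answer)
```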

  8. Distributed vs Local Systems
     • Distributed systems are inherently different from non-distributed systems.
       • Latency - network speed
       • Memory access - not shared
       • Partial Failure
         • remote failure does not mean local failure
         • no global coordination (like an OS)
       • Guaranteed Concurrency
         • combined with latency, events are not necessarily received in the same order as they are generated
       • Indeterminacy
         • Your system is not in control of the whole system
         • With partial failure, a system may just disappear with no indication of status.
         • Was it the remote machine, the remote user or a network link?
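The point about concurrency combined with latency can be made concrete with a tiny simulation. The event names and the latency figures below are invented; the only assumption is that each message arrives at its send time plus its own network delay.

```python
def arrival_order(events):
    """Return event names in the order a receiver would see them.

    `events` is a list of (name, sent_at_ms, latency_ms) tuples.
    """
    return [name for name, sent_at, latency
            in sorted(events, key=lambda e: e[1] + e[2])]


# Generated in the order A, B, C, but A travels over a slow link.
events = [
    ("A", 0, 300),    # arrives at 300 ms
    ("B", 100, 50),   # arrives at 150 ms
    ("C", 200, 150),  # arrives at 350 ms
]

generated = [name for name, _, _ in events]
received = arrival_order(events)
print("generated:", generated)
print("received: ", received)
```

Here B overtakes A, so the receiver cannot assume that arrival order matches generation order.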

  9. Some Terminology
     • Node (computer/device): generic term for any computer on a distributed network
     • Server: a provider of data
     • Client: a consumer of data
     • Peer: both a provider and a consumer of data
     • Resource: any hardware or software resource shared on a distributed network, e.g. a file storage system, RAM, CPU, a file, a service or a communication channel
     • Service: “A network-enabled entity that provides some capability”

  10. Taxonomy for Distributed Systems
      The taxonomy is based on the following factors and their relation to centralization:
      1. Resource Discovery: what is the mechanism for discovering resources on a distributed system?
         • Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI, etc.
      2. Resource Availability: scalability
         • Do resources scale with the network?
         • Does access to them scale with the network?
      See example...

  11. MP3.com, Napster and Gnutella
      [Diagram: three scenarios compared side by side: the MP3.com scenario (users and MP3.com), the Napster scenario (users and Napster.com) and the Gnutella scenario]

  12. Taxonomy for Distributed Systems
      The taxonomy is based on the following factors and their relation to centralization:
      1. Resource Discovery: what is the mechanism for discovering resources on a distributed system?
         • Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI, etc.
      2. Resource Availability: scalability
         • Do resources scale with the network?
         • Does access to them scale with the network?
      3. Resource Communication: two types
         • Brokered communication (centralized): communication is passed through a central server, so resources do not have direct references to each other.
         • Point to point (decentralized, peer to peer): a direct connection between the sender and the receiver.

  13. Centralization of Point-to-Point Connections
      • True Peer to Peer, e.g. Gnutella
        • Equal peers: communication is supposed to be even, i.e. each provider is also a consumer of information and each node has an equal number of connections
        • This is not always the case, as we will learn in lecture 4
      • Web Server
        • Many-to-one relationship between users and the web server, so this can be considered centralized communication

  14. Taxonomy for Distributed Systems
      [Diagram: a spectrum running from Centralized through Hybrid to Decentralized]
      • Centralized systems: typically client/server based systems
      • Hybrid: combinations of the two extremes, e.g. brokered architecture
      • Decentralized systems: Peer to Peer (P2P)
      The taxonomy is based on the following factors:
      1. Resource Discovery: what is the mechanism for discovering resources on a distributed system?
         • Examples: DNS, JXTA Rendezvous, Jini LUS, UDDI, etc.
      2. Resource Availability: scalability
         • Do resources scale with the network?
      3. Resource Communication: two types
         • Brokered communication (centralized): communication is passed through a central server, so resources do not have direct references to each other.
         • Point to point (decentralized, peer to peer): a direct connection (although the connection may be multi-hop) between the sender and the receiver.

  15. A Web Server: Centralized
      [Chart: Resource Discovery, Resource Availability and Resource Communication for a web server, each placed on a scale from Centralized to Decentralized]
      • Clients (i.e. users) use their web browser to navigate web pages on one or more web sites.
      • A web site is tied to a particular domain
      • Discovery: centralized, via DNS
      • Availability: the server is either available or not
      • Communication: centralized on the particular web server

  16. The Web as a whole
      • Discovery - ad hoc
        • Often highly centralized, e.g. Google, but is also highly decentralized - the Web of links, e.g. the blogosphere and out of bounds
      • Availability - depends on the granularity of the request
        • There is a level of replication on the Web, and caching can be used to duplicate availability
      • Communication - centralized
        • Communication happens via a centralized entity, e.g. Facebook, MySpace, Flickr, blogs, etc.

  17. Napster: Brokered
      [Chart: Resource Discovery, Resource Availability and Resource Communication for Napster, each placed on a scale from Centralized to Decentralized; users connect to Napster.com]
      • Clients search through the Napster web site (well, they used to…)
      • Discovery: centralized through the web site
      • Availability: once discovered via the web site, availability is decentralized.
      • Communication: decentralized between peers (MP3 sharers)
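The brokered pattern on this slide, centralized discovery followed by decentralized transfer, can be sketched with two small classes. This is a toy in-memory model, not Napster's actual protocol; `CentralIndex`, `Peer` and the `directory` lookup table are all hypothetical names.

```python
class CentralIndex:
    """Stands in for the central server: it knows who has what, never the files."""

    def __init__(self):
        self._owners = {}  # filename -> list of peer names

    def register(self, peer_name, filenames):
        for filename in filenames:
            self._owners.setdefault(filename, []).append(peer_name)

    def search(self, filename):
        return list(self._owners.get(filename, []))


class Peer:
    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})  # filename -> content

    def publish(self, index):
        """Tell the central index which files this peer offers."""
        index.register(self.name, self.files)

    def serve(self, filename):
        """Hand a file directly to another peer."""
        return self.files[filename]

    def download(self, filename, index, directory):
        owners = index.search(filename)                 # discovery: centralized
        if not owners:
            return False
        source = directory[owners[0]]
        self.files[filename] = source.serve(filename)   # transfer: peer to peer
        return True


index = CentralIndex()
alice = Peer("alice", {"song.mp3": b"demo bytes"})
bob = Peer("bob")
directory = {peer.name: peer for peer in (alice, bob)}

alice.publish(index)
found = bob.download("song.mp3", index, directory)
print(found, bob.files)
```

If the index disappears, discovery stops for everyone even though the files are still on the peers, which is why discovery is the centralized part of this design.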

  18. Gnutella: Decentralized
      [Chart: Resource Discovery, Resource Availability and Resource Communication for Gnutella, each placed on a scale from Centralized to Decentralized]
      • Discovery: decentralized through Gnutella messages (ping/pong mechanisms)
      • Availability: often an alternate path to a resource
      • Communication: point to point, decentralized between peers
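Decentralized discovery of the ping/pong kind can be approximated by flooding a message to neighbours with a time-to-live (TTL) that shrinks at every hop. The sketch below is a simplification under stated assumptions: the topology is a made-up dictionary, duplicate suppression uses a shared `seen` set rather than per-message identifiers, and the pong replies are represented simply by the list of peers reached.

```python
from collections import deque


def flood_ping(neighbours, origin, ttl):
    """Return the peers reached by a ping flooded from `origin` with the given TTL."""
    seen = {origin}
    reached = []
    frontier = deque([(origin, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward the ping
        for peer in neighbours[node]:
            if peer not in seen:
                seen.add(peer)
                reached.append(peer)  # this peer would answer with a pong
                frontier.append((peer, remaining - 1))
    return reached


# Hypothetical overlay network: each peer knows only its direct neighbours.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(flood_ping(overlay, "A", ttl=1))
print(flood_ping(overlay, "A", ttl=2))
print(flood_ping(overlay, "A", ttl=3))
```

D is reachable through both B and C, which illustrates the slide's point that there is often an alternate path to a resource.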

  19. SETI@HOME (Client/Server)
      • Launched in 1999
      • Scientific experiment: uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI)
      • Distributes a screen saver-based application to users
      • Applies signal analysis algorithms to different data sets to process radio-telescope data.
      • Has more than 3 million users, who have contributed over a million years of CPU time to date
      [Diagram: radio-telescope data feeds the SETI@home main server, which exchanges data with the screen saver clients; labels: Client/Server, P2P]
      1. Install screen saver
      2. SETI client (screen saver) starts
      3. SETI client gets data from server and runs
      4. Client sends results back to server

  20. SETI@home
      [Chart: Resource Discovery, Resource Availability and Resource Communication for SETI@home, each placed on a scale from Centralized to Decentralized]
      Search for Extraterrestrial Intelligence@home: a volunteer computing system, generalized to the BOINC API

  21. Web Services
      [Chart: Resource Discovery, Resource Availability and Resource Communication for Web Services, each placed on a scale from Centralized to Decentralized]
      • Discovery: centralized through a registry
      • Availability: once discovered via the registry, availability is decentralized.
      • Communication: decentralized between provider and consumer
      Note: this is for the current Web Services technology stack; in principle you can host web services in a number of ways

  22. Concluding Remarks
      • Taxonomy criteria
        • Resource Discovery
        • Resource Availability
        • Resource Communication
      • Taxonomy categories
        • Centralized
        • Hybrid
        • Decentralized
