
Models of Distributed Computing

COMP 150-IDS: Internet Scale Distributed Systems (Fall 2012). Models of Distributed Computing. Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah.



Presentation Transcript


  1. COMP 150-IDS: Internet Scale Distributed Systems (Fall 2012) Models of Distributed Computing Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

  2. Architecting a universal Web • Identification: URIs • Interaction: HTTP • Data formats: HTML, JPEG, GIF, etc.

  3. Goals • Introduce basics of distributed system design • Explore some traditional models of distributed computing • Prepare for discussion of REST: the Web’s model

  4. Communicating systems

  5. Communicating systems [Figure: two machines, each with CPU, memory, and storage] We have multiple programs, running asynchronously, sending messages. Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

  6. Communicating Sequential Processes: We’ve got pretty clean higher-level abstractions for use on a single machine. [Figure: two machines, each with CPU, memory, and storage] We have multiple programs, running asynchronously, sending messages. Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

  7. Communicating systems: How can we get a clean model of two communicating machines? [Figure: two machines, each with CPU, memory, and storage] We have multiple programs, running asynchronously, sending messages. Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

  8. Large scale systems: How can we get a clean model of a worldwide network of communicating machines? What are the clean abstractions on this scale? [Figure: many machines connected through the Internet]

  9. WARNING!! • This is a very big topic… • …many important approaches have been studied and used… • …there is lots of operational experience, and also formalisms… This presentation does not attempt to be either comprehensive or balanced…the goal is to introduce some key concepts

  10. Traditional Models of Distributed Computing-Message Passing

  11. Message passing: Programs send messages to and from each other’s memories. [Figure: two machines, each with CPU, memory, and storage]

  12. Half duplex: one way at a time. [Figure: two machines, each with CPU, memory, and storage] Programs send messages to and from each other’s memories.

  13. Full duplex: both ways at the same time. [Figure: two machines, each with CPU, memory, and storage] Programs send messages to and from each other’s memories.

  14. Message passing • Data abstraction: • Low level: bytes (octets) • Sometimes: agreed metaformat (XML, C struct, etc.) • Synchronization • Wait for message • Timeout
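To make the data-abstraction and synchronization bullets concrete, here is a minimal Python sketch. It assumes a hypothetical fixed-layout message (a 4-byte sequence number and an 8-byte float, big-endian) as the agreed metaformat. `socket.socketpair()` stands in for two machines. The receiver waits for a message but gives up after a timeout.

```python
import socket
import struct

# Hypothetical agreed metaformat, like a C struct {uint32 seq; double value;},
# packed big-endian ("!") so both ends agree on the exact byte layout.
MSG_FORMAT = "!Id"
MSG_SIZE = struct.calcsize(MSG_FORMAT)  # 12 bytes


def send_msg(sock, seq, value):
    """Sender side: the wire itself only carries bytes (octets)."""
    sock.sendall(struct.pack(MSG_FORMAT, seq, value))


def recv_msg(sock, timeout=1.0):
    """Receiver side: wait for one whole message, or return None on timeout."""
    sock.settimeout(timeout)
    buf = b""
    try:
        while len(buf) < MSG_SIZE:
            chunk = sock.recv(MSG_SIZE - len(buf))
            if not chunk:  # peer closed the connection
                return None
            buf += chunk
    except socket.timeout:
        return None  # sketch only: any partial message is discarded
    return struct.unpack(MSG_FORMAT, buf)


a, b = socket.socketpair()
send_msg(a, 1, 3.5)
print(recv_msg(b))               # (1, 3.5)
print(recv_msg(b, timeout=0.1))  # None: nothing else was sent
a.close()
b.close()
```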

  15. Interaction Patterns

  16. Between pairs of machines • Message passing: no constraints • Common pattern: request/response [Figure: request and response messages flowing between two machines]

  17. Traditional Models of Distributed Computing-Client Server

  18. Client / server • Request / response is a traffic pattern • Client / server describes the roles of the nodes • Server provides service for client [Figure: the client sends a request for service to the server, which returns a response]
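A rough sketch of the client and server roles with request/response traffic over TCP follows, using Python's standard `socketserver` module. The "service" (upper-casing a line of text) and the names `UpperCaseHandler` and `call_service` are hypothetical. Binding to port 0 lets the OS pick a free local port.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Server role: read one request line, send back one response line."""

    def handle(self):
        request = self.rfile.readline().decode("utf-8").strip()
        response = request.upper()  # hypothetical "service"
        self.wfile.write((response + "\n").encode("utf-8"))


def call_service(host, port, text):
    """Client role: send a request, then block until the response arrives."""
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall((text + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()


# The server runs in a background thread so this one script can play both roles.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperCaseHandler)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

print(call_service(host, port, "hello server"))  # HELLO SERVER
server.shutdown()
server.server_close()
```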

  19. Client / server • Probably the most common dist. sys. architecture • Simple – well understood • Doesn’t explain: • How to exploit more than 2 machines • How to make programming easier • How to prove correctness: though the simple model helps • Most client/server systems are request/response

  20. Traditional Models of Distributed Computing-N-Tier

  21. N-tier – also called Multilevel Client/Server • Layered • Each tier provides services for next higher level • Reasons: • Information hiding • Management • Scalability [Figure: three machines in a chain; each tier passes requests to the tier below and responses back up]
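The layering can be sketched in-process in Python. In this sketch, plain method calls stand in for the requests and responses that would travel between machines. Each tier talks only to the tier directly below it, and only the data tier knows the storage layout. The class names, flight numbers, and seat data are all hypothetical.

```python
class DataTier:
    """Back end: owns the data layout, which higher tiers never see."""

    def __init__(self):
        # Hypothetical storage layout: flight -> list of free seat labels.
        self._rows = {"UA100": ["1A", "1B"], "UA200": []}

    def free_seats(self, flight):
        return list(self._rows.get(flight, []))

    def take_seat(self, flight, seat):
        self._rows[flight].remove(seat)


class LogicTier:
    """Middle tier: business rules, built only on the data tier's services."""

    def __init__(self, data):
        self._data = data

    def book(self, flight):
        seats = self._data.free_seats(flight)
        if not seats:
            return None
        self._data.take_seat(flight, seats[0])
        return seats[0]


class PresentationTier:
    """Top tier: formats results for the user, uses only the logic tier."""

    def __init__(self, logic):
        self._logic = logic

    def request_booking(self, flight):
        seat = self._logic.book(flight)
        return f"Booked {flight} seat {seat}" if seat else f"{flight} is full"


app = PresentationTier(LogicTier(DataTier()))
print(app.request_booking("UA100"))  # Booked UA100 seat 1A
print(app.request_booking("UA200"))  # UA200 is full
```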

  22. Typical N-tier system: airline reservation [Figure: three tiers: a browser or phone app (e.g. an iPhone or Android reservation application), flight reservation logic, and reservation records] Many commercial applications work this way

  23. The Web itself is a 2- or 3-tier system [Figure: browser (e.g. Firefox) → proxy cache (optional!, e.g. Squid) → Web server (e.g. Apache)] Many commercial applications work this way

  24. Web reservation system [Figure: browser or phone app, optional proxy cache (e.g. Squid), Web-based reservation application, flight reservation logic, and reservation records; the front-end links are labeled HTTP, the back-end links "RPC? ODBC? Proprietary?"] Many commercial applications work this way

  25. Web publishing system [Figure: browser or phone app, content distribution network (e.g. Akamai), Web site (e.g. cnn.com), and content management system (database or CMS)] Many commercial applications work this way

  26. Advantages of n-tier system • Separation of concerns – each layer has its own role • Parallelism and performance? • If done right: multiple mid-tier servers work in parallel • Back-end systems centralize mainly the data that requires sharing & synchronization • Mid tier can provide shared, scalable caching • Information hiding • Mid-tier apps shielded from data layout • Security • Credit card numbers etc. not stored at mid-tier
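The "shared, scalable caching" point can be illustrated with a very small Python sketch. A hypothetical mid-tier keeps an unbounded, never-expiring dictionary cache in front of a back end, and the back end counts how often it is really queried. A production cache would also need expiry or invalidation.

```python
class BackEnd:
    """Central data store; counts how often it is actually queried."""

    def __init__(self, records):
        self._records = records
        self.queries = 0

    def lookup(self, key):
        self.queries += 1
        return self._records[key]


class MidTier:
    """Mid-tier server with a simple cache (no size limit, no expiry)."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def get(self, key):
        if key not in self._cache:  # miss: ask the back end once
            self._cache[key] = self._backend.lookup(key)
        return self._cache[key]


# Hypothetical records.
backend = BackEnd({"UA100": "SFO->BOS", "UA200": "BOS->LHR"})
mid = MidTier(backend)
for _ in range(3):
    mid.get("UA100")
print(backend.queries)  # 1: two of the three requests were served from the cache
```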

  27. Other patterns • Spanning tree • Broadcast (send to many nodes at once) • Flood • Various P2P • Etc.
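Flooding is the simplest of these patterns to show in code. The Python sketch below is a minimal version under simplifying assumptions: an in-memory adjacency list replaces a real network, and each node forwards the message to all of its neighbours the first time it receives it, dropping later duplicates so the flood terminates even when the graph has cycles. The four-node graph is hypothetical.

```python
from collections import deque


def flood(graph, source):
    """Flood a message from `source` to every reachable node.

    Returns (delivery order, total number of messages sent).
    """
    seen = {source}
    order = [source]
    messages_sent = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            messages_sent += 1          # every forward costs one message
            if neighbour not in seen:   # first copy: deliver, then re-forward
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order, messages_sent


# Hypothetical network: a triangle A-B-C with D hanging off C.
network = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
print(flood(network, "A"))  # (['A', 'B', 'C', 'D'], 8)
```

Since every node forwards exactly once to each neighbour, the message count equals the sum of the node degrees (8 here). This redundancy is the price of not needing a spanning tree or routing table.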

  28. Traditional Models of Distributed Computing-Remote Procedure Call

  29. Remote Procedure Call • The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis • Key idea: an ordinary function call executes remotely • The trick: the language runtime or helper code must automatically generate code to send parameters and results • For languages like C: proxies and stubs are generated • Not needed in dynamic languages like Ruby, JavaScript, etc. • RPC is often (erroneously IMO) used to describe any request / response system
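To see why dynamic languages can skip generated proxies, consider Python's standard `xmlrpc` modules. `ServerProxy` builds the client-side proxy on the fly, so `proxy.sqrt(4)` reads like a local call but is actually sent as an HTTP request. The sketch runs the server in a background thread on a local port chosen by the OS, and reuses the `sqrt` example from the next slide. It also shows one of the language details RPC has to get right: a remote exception arrives as `xmlrpc.client.Fault`, not as the original `ValueError`.

```python
import math
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Server: expose an ordinary function; the library plays the role of the stub.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(math.sqrt, "sqrt")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: no generated code; method names are resolved at call time.
proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/")
x = proxy.sqrt(4)  # looks local, but is an HTTP request/response
print(x)           # 2.0

# The remote ValueError is reported to the client as a Fault.
try:
    proxy.sqrt(-1)
except xmlrpc.client.Fault as fault:
    print("remote error:", fault.faultString)

server.shutdown()
server.server_close()
```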

  30. RPC: Call remote functions automatically • Interface definition: float sqrt(float n); • Proxies and stubs generated automatically • RPC provides transparent remote invocation [Figure: on the client, x = sqrt(4) calls the generated proxy float sqrt(float n) { send n; read s; return s; }, which sends the request "invoke sqrt(4)"; on the server, the generated stub void doMsg(Msg m) { s = sqrt(m.s); send s; } calls the real float sqrt(float n) { …compute sqrt… return result; } and returns the response "result=2 (no exception thrown)"]
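The proxy/stub pair in the figure can be written by hand to show what an RPC generator would produce. This is only a sketch, and several choices in it are assumptions rather than the slide's design: JSON lines are the wire format, `socket.socketpair()` replaces the network, and the names `SqrtProxy` and `stub_loop` are hypothetical. The stub also marshals a `ValueError` back to the caller instead of crashing, which is one way to handle the "(no exception thrown)" case.

```python
import json
import math
import socket
import threading


def sqrt_impl(n):
    """The real function, living on the 'server' machine."""
    return math.sqrt(n)


def stub_loop(conn):
    """Server-side stub: unmarshal each request, call sqrt_impl, send the reply."""
    with conn, conn.makefile("r", encoding="utf-8") as requests:
        for line in requests:
            msg = json.loads(line)
            try:
                reply = {"result": sqrt_impl(msg["n"])}
            except ValueError as exc:  # marshal the exception back to the caller
                reply = {"error": str(exc)}
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))


class SqrtProxy:
    """Client-side proxy: same signature as sqrt, but forwards the call."""

    def __init__(self, conn):
        self._conn = conn
        self._replies = conn.makefile("r", encoding="utf-8")

    def sqrt(self, n):
        self._conn.sendall((json.dumps({"op": "sqrt", "n": n}) + "\n").encode("utf-8"))
        reply = json.loads(self._replies.readline())
        if "error" in reply:
            raise ValueError(reply["error"])  # re-raise locally
        return reply["result"]

    def close(self):
        self._replies.close()
        self._conn.close()  # the stub then sees end-of-file and exits


client_end, server_end = socket.socketpair()
threading.Thread(target=stub_loop, args=(server_end,), daemon=True).start()
remote = SqrtProxy(client_end)
print(remote.sqrt(4))  # 2.0
remote.close()
```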

  31. RPC: Pros and Cons • Pros: • Transparency is very appealing • Simple programming model • Useful as organizing principle even when not fully automated • Cons • Getting language details right is tricky (e.g. exceptions) • No client/server overlap: doesn’t work well for long-running operations • May not optimize large transfers well • Not all APIs make sense to remote: e.g. answer = search(tree) • Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)

  32. Traditional Models of Distributed Computing-Distributed Object Systems

  33. How do you build an RPC for this?
      class Point {
        int x, y;
        int getx() { return x; }
        int gety() { return y; }
      }
      class Rectangle {
        …members and constructors not shown…
        Point getUpperLeft() {…}
        Point getLowerRight() {…}
      }
      Rectangle myRect = new Rectangle();
      …assume position set here…
      int a = area(myRect);  // REMOTE THIS CALL! (pass object to remote method)
      int area(Rectangle r) {
        // call methods on the remoted object
        int width = r.getLowerRight().getx() - r.getUpperLeft().getx();
        int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
        return width * height;
      }
      Distributed object systems make this work!

  34. Distributed object systems • In the 1990s, seemed like a great idea • Advantages of OO encapsulation & inheritance + RPC • Examples • CORBA (Industry standard) • DCOM (Microsoft) • Still quite widely used within enterprises • Complicated • Marshalling object references • Distributed object lifetime management • Brokering: which object provides the service today • Remote “new”: creating objects on remote systems • All the pros & cons of RPC, plus the above • Generally not appropriate at Internet scale
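The "marshalling object references" and "lifetime management" bullets can be seen in a toy Python version of the Point/Rectangle example. In this sketch the object table stands in for the remote side. Objects are exported under opaque reference IDs, each method call is a JSON message, and any result that is an object comes back as a new reference instead of a copy. `ObjectTable`, `RemoteRef`, and the `"$ref"` encoding are all hypothetical. A real system such as CORBA or DCOM also needs networking, brokering, and garbage collection.

```python
import itertools
import json


class ObjectTable:
    """'Remote' side: exported objects are named by opaque reference IDs."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def export(self, obj):
        ref = next(self._ids)
        self._objects[ref] = obj
        return {"$ref": ref}

    def handle(self, raw):
        """Dispatch one JSON request {"ref", "method", "args"}."""
        req = json.loads(raw)
        result = getattr(self._objects[req["ref"]], req["method"])(*req["args"])
        if isinstance(result, (int, float, str, bool, type(None))):
            return json.dumps({"value": result})          # plain values are copied
        return json.dumps({"value": self.export(result)})  # objects go by reference


class RemoteRef:
    """Local proxy holding only a reference ID, never the object itself."""

    def __init__(self, table, ref):
        self._table, self._ref = table, ref  # the table stands in for the network

    def __getattr__(self, method):
        def call(*args):
            raw = json.dumps({"ref": self._ref, "method": method, "args": list(args)})
            value = json.loads(self._table.handle(raw))["value"]
            if isinstance(value, dict) and "$ref" in value:
                return RemoteRef(self._table, value["$ref"])
            return value
        return call


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getx(self):
        return self.x

    def gety(self):
        return self.y


class Rectangle:
    def __init__(self, upper_left, lower_right):
        self._ul, self._lr = upper_left, lower_right

    def getUpperLeft(self):
        return self._ul

    def getLowerRight(self):
        return self._lr


def area(r):
    # Every call on r (and on the Points it returns) is a round trip.
    width = r.getLowerRight().getx() - r.getUpperLeft().getx()
    height = r.getLowerRight().gety() - r.getUpperLeft().gety()
    return width * height


table = ObjectTable()
ref = table.export(Rectangle(Point(0, 0), Point(4, 3)))["$ref"]
print(area(RemoteRef(table, ref)))  # 12
```

One `area` call leaves five entries in the table: the rectangle plus four exported Point references that nothing ever frees. That is the distributed lifetime problem in miniature.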

  35. Traditional Models of Distributed Computing-Some Other Options

  36. Special Purpose Models • Remote File System • Network provides transparent access to remote files • Examples: NFS, CIFS • Remote Database • Examples: ODBC, JDBC • Remote Device • Remote printing, disk drive etc. • Virtual terminal • One computer simulates an interactive terminal to another

  37. Some other interesting models • Broadcast / multicast • Send messages to everyone (broadcast) / named group (multicast) • Publish / subscribe (pub/sub) • Subscribe to named events or based on query filter • Call me whenever Pepsi’s stock price changes • Implements a distributed associative memory • Reliable queuing • Examples: IBM MQSeries, Java Message Service (JMS) • Model: queued messages, preserved across hardware crashes • Widely used for bank machine transactions and long-running (multi-day) eCommerce transactions • Depends on disk-based transaction systems at each node to keep queues • Tuple spaces • Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun), and TSpaces (IBM) • Network-scale shared variable space, with synchronization • Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
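Publish/subscribe with a query filter is shown below as a rough in-process Python sketch. It assumes a hypothetical `Broker` class, a made-up topic name `"stock.PEP"`, and invented price events. The publisher never learns who the subscribers are. The broker matches each event against every subscriber's filter and calls back only the matching subscribers.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for a pub/sub service."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> [(filter, callback)]

    def subscribe(self, topic, callback, where=lambda event: True):
        self._subs[topic].append((where, callback))

    def publish(self, topic, event):
        """Deliver `event` to matching subscribers; return how many got it."""
        delivered = 0
        for where, callback in self._subs[topic]:
            if where(event):
                callback(event)
                delivered += 1
        return delivered


broker = Broker()
alerts = []
# "Call me whenever Pepsi's stock price changes" (hypothetical topic and data)
broker.subscribe("stock.PEP", alerts.append, where=lambda e: e["change"] != 0)
broker.publish("stock.PEP", {"price": 170.2, "change": 0.0})
broker.publish("stock.PEP", {"price": 172.0, "change": 1.8})
print(alerts)  # [{'price': 172.0, 'change': 1.8}]
```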

  38. Stateful and Stateless Protocols

  39. Stateful and Stateless Protocols • Stateful: server knows which step (state) has been reached • Stateless: • Client remembers the state, sends to server each time • Server processes each request independently • Can vary with level • Many systems, like the Web, run stateless protocols (e.g. HTTP) over stateful streams: at the packet level, TCP streams are stateful • HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
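The contrast can be sketched in Python with a hypothetical "read the next lines of a resource" protocol. The stateful server remembers a cursor for each session. In the stateless version, the client sends its offset with every request, and the server keeps nothing between calls. The resource contents and the session ID `"s1"` are invented for illustration.

```python
LINES = [f"line {i}" for i in range(1, 6)]  # hypothetical resource


class StatefulServer:
    """Remembers each client's position between requests."""

    def __init__(self):
        self._cursor = {}  # session id -> index of the next line to send

    def next_lines(self, session, count):
        start = self._cursor.get(session, 0)
        self._cursor[session] = start + count
        return LINES[start:start + count]


def stateless_read(offset, count):
    """Every request carries all the context; the server keeps none."""
    return LINES[offset:offset + count]


stateful = StatefulServer()
print(stateful.next_lines("s1", 2))  # ['line 1', 'line 2']
print(stateful.next_lines("s1", 2))  # ['line 3', 'line 4']

offset = 0                           # stateless: the client holds the state
print(stateless_read(offset, 2))     # ['line 1', 'line 2']
offset += 2
print(stateless_read(offset, 2))     # ['line 3', 'line 4']
```

If the stateful server restarts, `_cursor` is lost and the client's session breaks. Any replica can answer a `stateless_read` request, which is the load-balancing and restart advantage discussed on the next slide.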

  40. Advantages of stateless protocols • Protocol usually simpler • Server processes each request independently • Load balancing and restart easier • Typically easier to scale and make fault-tolerant • Visibility: individual requests more self-describing

  41. Advantages of stateful protocols • Individual messages carry less data • Server does not have to re-establish context each time • There’s usually some changing state at the server at some level, except for completely static publishing systems

  42. Text vs. Binary Protocols

  43. Protocols can be text or binary on the wire • Text: messages are encoded characters • Binary: any bit patterns • Pros and cons quite similar to those for text vs. binary file formats • When sending between compatible machines, binary can be much faster because no conversion is needed • Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video
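For a feel of the tradeoff, the Python sketch below encodes one hypothetical sensor reading (id 42, temperature 21.5, timestamp 1349049600) both ways. The text form is a readable CRLF-terminated line in the style of HTTP/SMTP protocol elements. The binary form is a fixed 10-byte layout (uint16, float32, uint32) in network byte order. The message syntax is invented for illustration.

```python
import struct

# Hypothetical reading.
sensor_id, temp, ts = 42, 21.5, 1349049600

# Text: human-readable, CRLF-terminated, like HTTP/SMTP protocol lines.
text_msg = f"SENSOR {sensor_id} TEMP {temp} TIME {ts}\r\n".encode("ascii")

# Binary: fixed layout, network (big-endian) byte order, no padding.
binary_msg = struct.pack("!HfI", sensor_id, temp, ts)

print(len(text_msg))    # 37
print(len(binary_msg))  # 10

# Decoding: text must be parsed; binary just needs the agreed layout.
fields = text_msg.decode("ascii").split()
decoded_text = (int(fields[1]), float(fields[3]), int(fields[5]))
decoded_binary = struct.unpack("!HfI", binary_msg)
print(decoded_text == decoded_binary)  # True
```

The binary form is smaller and needs no parsing. The text form can be read in a packet trace or typed by hand, and adding a field does not silently shift every later byte.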

  44. Summary

  45. Summary • The machine-level model is complex: multiple CPUs, memories • A number of abstractions are widely used for limited-scale distribution • RPC is among the most interesting and successful • Statefulness / statelessness is a key design tradeoff • We’ll see next time why a new model was needed for the Web
