
CS514: Intermediate Course in Operating Systems


Presentation Transcript


  1. CS514: Intermediate Course in Operating Systems Professor Ken Birman Ben Atkin: TA Lecture 13: Oct. 5

  2. Consistency • How can we relate models of consistency to cost and availability? • Is it possible to reconcile transactional replication with virtual synchrony replication?

  3. Consistency • Various models • Multiple copies of some object but behavior mimics a single non-faulty object • ACID: 1-copy SR plus durability • FLP style of consensus • Dynamic uniformity versus static model

  4. Basic “design points” • Does the model guarantee anything relative to “last words” of a process that fails? • Yes for transactions: ACID • No, in virtual synchrony • Can do better using “flush” primitive • And can mimic transactional replication if we require that primary partition is also a quorum of some statically specified set of processes
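
To make the "flush" idea concrete, here is a minimal sketch combining a cheap non-uniform multicast with a flush barrier before an externally visible action. The Group.send and Group.flush names are invented stand-ins for an Isis-style interface, not a real library.

```python
# Hypothetical virtual-synchrony API: Group.send and Group.flush are
# invented stand-ins for an Isis-style interface, not a real library.

class Group:
    def __init__(self):
        self.unstable = []

    def send(self, msg):
        # Non-uniform multicast: returns at once; a crash right now could
        # lose the sender's "last words" without violating the model.
        self.unstable.append(msg)

    def flush(self):
        # Block until all of our prior sends are stable (delivered at all
        # current members).  Toy version: pretend stability is immediate.
        self.unstable.clear()

def update_then_act(group, update, actuator):
    group.send(update)   # cheap and asynchronous
    group.flush()        # pay for stability only when it matters...
    actuator(update)     # ...just before an externally visible action

update_then_act(Group(), "sector 17.8.09 -> TWA857", print)
```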

  5. Are actions asynchronous? • No in case of transactions • We can do things locally • But at commit time, we need to synchronize • And most transactional replication schemes are heavily synchronous • Yes for virtual synchrony • But only with cbcast or fbcast
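
The asynchrony of cbcast rests on a delivery rule that fits in a few lines. Below is a single-process sketch of causal delivery using vector timestamps; the class and method names are invented for illustration, and there is no real network here.

```python
# Sketch of the causal delivery rule behind cbcast, via vector timestamps.

class CausalReceiver:
    def __init__(self, n):
        self.vt = [0] * n   # vt[k] = number of messages delivered from sender k
        self.pending = []   # messages that arrived out of causal order

    def _deliverable(self, sender, msg_vt):
        # Deliver m from `sender` iff it is the next message from that sender
        # and everything m causally depends on has already been delivered.
        if msg_vt[sender] != self.vt[sender] + 1:
            return False
        return all(msg_vt[k] <= self.vt[k]
                   for k in range(len(self.vt)) if k != sender)

    def receive(self, sender, msg_vt, payload):
        self.pending.append((sender, msg_vt, payload))
        delivered, progress = [], True
        while progress:     # drain everything that has become deliverable
            progress = False
            for m in list(self.pending):
                s, vt, p = m
                if self._deliverable(s, vt):
                    self.pending.remove(m)
                    self.vt[s] += 1
                    delivered.append(p)
                    progress = True
        return delivered

r = CausalReceiver(n=2)
print(r.receive(0, [2, 0], "second"))   # [] -- depends on sender 0's first
print(r.receive(0, [1, 0], "first"))    # ['first', 'second']
```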

  6. Mixing models • Virtual synchrony is like “weak transactional serializability” • In fact, the connection can be made precise • We use a model called linearizability, due to Herlihy and Wing • Much recent work on database replication mixes models…

  7. Real systems have varied needs • Must match choice of properties to needs of the application • Find that multiple models are hard to avoid • We want the stronger models for database applications • But where data won’t persist, the cheaper models suffice…

  8. Digression • Need to strengthen our intuition • Can we find examples of real systems that might need group communication or data replication? • Ideally, systems that can’t be built in any other way • Use this to think about properties required for application correctness

  9. Joint Battlespace Infosphere [diagram: robust infrastructure for the Joint Battlespace Infosphere]

  10. Distributed Trading System [diagram: pricing DBs, historical data, market data feeds, trader clients, current pricing, analytics, and a long-haul WAN spooler linking Tokyo, London, Zurich, ...] • Availability for historical data • Load balancing and consistent message delivery for price distribution • Parallel execution for analytics

  11. Distributed Service Node: one phone number per person [diagram: telephone trunk lines into a dumb switch; x86/UNIX and RISC/UNIX nodes on an Ethernet running Isis; telephony data/digitized voice paths; calls, changes, adds, deletes] • Replicated files for digitized voice store • Redundancy for database availability • Load balancing for call handling & routing

  12. Shop Floor Process Control Example [diagram: recipe management, data collection, and WorkStream servers (VAX, HP) on an Ethernet, with station controllers, operator clients (HP, PC), and factory equipment]

  13. The List Goes On • Air traffic control system • Medical decision support in a hospital • Providing real-time data in support of banking or major risk-management strategies in finance • Real-time system for balancing power production and consumption in the power grid • Telephone system for providing services in a setting with mobile users and complex requirements

  14. Challenge faced by developers • We have multiple notions of consistency now: • Transactional, with persistent data • Process groups with dynamic uniformity • Process groups without dynamic uniformity • Primary partition notion of progress • Non-primary partitions with merge • How can we make the right choices for a given situation?

  15. One size fits all? • One possibility is that we’ll simply need multiple options • User would somehow specify their requirements • Given this information, the system would configure protocols appropriately • Alternative is to standardize on one scheme • Likely to be a stronger, more costly option
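
As a caricature of the "multiple options" approach, the system could let each application declare its requirements and then pick the cheapest protocol that satisfies them. The requirement flags and protocol labels below are invented purely for illustration.

```python
# Toy "pay only for what you use" chooser; all names here are invented.

def choose_protocol(persistent_data: bool, external_actions: bool,
                    can_partition: bool) -> str:
    if persistent_data:
        return "transactional replication (1-copy SR + durability)"
    if external_actions or can_partition:
        return "dynamically uniform multicast (safe, but slower)"
    return "non-uniform virtually synchronous multicast (cheap, fast)"

print(choose_protocol(False, False, False))   # the cheap protocol suffices
print(choose_protocol(False, True, False))    # ATC-style: pay for uniformity
```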

  16. “Understanding” CATOCS • Paper by Cheriton, Skeen in 1993 • They argue that end-to-end approach dictates • Simplicity in the GCS • Properties enforced near end-points • Paper is full of mistakes but the point is well taken • People don’t want to pay for properties they don’t actually require!

  17. French air traffic control • They wanted to use replication and group communication in a system for high-availability controller consoles • Issues they faced • How strong is the consistency need? • Where should we use groups? • Where should we use transactions?

  18. Air traffic control • Much use of computer technologies • Flight management system (controls airplane) • Flaps, engine controls (critical subsystems) • Navigational systems • TCAS (collision avoidance system) • Air traffic control system on ground • In-flight, approach, international “hand-off” • Airport ground system (runways, gates, etc)


  20. ATC system components [diagram: onboard systems, radar, controllers, X.500 directory, air traffic database (flight plans, etc.)]

  21. Possible uses of groups • To replicate data in console clusters • For administration of console clusters • For administration of the “whole system” • For radar communication from radar to the consoles • To inform consoles when flight plan database is updated • To replicate the database itself

  22. ATC system components [diagram repeated: onboard systems, radar, controllers, X.500 directory, air traffic database (flight plans, etc.)]

  23. French air traffic control • Some conclusions • They use transactions for the flight plan database • In fact they would love to find ways to replicate this “geographically” • But the topic remains research • They use one process group for each set of 3-5 control consoles • They use unreliable hardware multicast to distribute radar inputs • Different groups treated in different ways

  24. French air traffic control • Different consistency in different uses • In some cases, forced changes to the application itself • E.g. different consoles may not have identical radar images • Choices always favored • Simplicity • Avoiding technology performance and scaling limits

  25. Air traffic control example • Controller interacts with service: “where can I safely route flight TWA 857?” • Service responds: “sector 17.8.09 is available” ... what forms of consistency are needed in order to make this a safe action to perform?
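
A sketch of what a safe answer requires: every replica must apply reservation requests in the same total order, and the reply must not reach the controller until the update is stable. The in-process simulation below merely pretends to do the multicast; safe_reserve stands in for a dynamically uniform, totally ordered protocol.

```python
# Toy simulation: three replicas of a sector table, updated "in total order".

class SectorTable:
    def __init__(self):
        self.owner = {}   # sector -> flight

    def reserve(self, sector, flight):
        # Applied in the same order at every replica, so answers agree.
        if sector in self.owner:
            return False   # already taken
        self.owner[sector] = flight
        return True

replicas = [SectorTable() for _ in range(3)]

def safe_reserve(sector, flight):
    # Pretend total order + uniform delivery: apply at every replica,
    # then (and only then) report back to the controller's console.
    results = [r.reserve(sector, flight) for r in replicas]
    assert len(set(results)) == 1   # all replicas gave the same answer
    return results[0]

print(safe_reserve("17.8.09", "TWA857"))   # True: sector granted
print(safe_reserve("17.8.09", "UA12"))     # False: consistent refusal
```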

  26. Observations that can help • Real systems are client-server structured • Early work on process group computing tended to forget this! • Isis system said “RPC can be harmful” but then took the next step and said “so we won’t think in client-server terms”. This was a mistake! • Beware systems that provide a single API system-wide

  27. A multi-tier API • Separate concerns: • Client system wants a simple interface, RPC to servers and a reliable stream connection back, wants to think of the whole system as a single server • Server wants to implement a WAN abstraction out of multiple component servers • Server itself wants replication and load-balancing for fault-tolerance • Need security and management API throughout
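
A sketch of this separation of concerns: the client tier sees one simple RPC-style object, while the server tier hides replication and load-balancing behind that facade. Every class and method name here is invented.

```python
# Illustrative two-tier structure; not any real system's API.

class ReplicatedServerGroup:
    """Server tier: replication and load-balancing live here."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.next = 0

    def query(self, request):
        r = self.replicas[self.next % len(self.replicas)]   # round-robin
        self.next += 1
        return r(request)

class ClientStub:
    """Client tier: looks like a single server; knows nothing of groups."""
    def __init__(self, group):
        self._group = group

    def where_can_i_route(self, flight):
        return self._group.query(("route?", flight))

group = ReplicatedServerGroup([lambda req: f"sector 17.8.09 for {req[1]}"] * 3)
client = ClientStub(group)
print(client.where_can_i_route("TWA857"))
```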

  28. Sample but typical issue • It is very appealing to say • This server poses a problem • So I’ll roll it out… • … and replace it with a high availability group server • Often, in practice, the existing code on the client side precludes such upgrades!

  29. Separate concerns • Consistency goals for client are different from goals within lower levels of many systems • At client level, main issue is dynamic uniformity: does a disconnected client continue to act on basis of information provided before the partitioning? Do we care? • In ATC example, the answer is yes, so we need dynamic uniformity guarantees

  30. WAN architecture • Mental model for this level is a network whose component nodes are servers • Each server initiates updates to data it “owns” and distributes this data to the other servers • May also have globally owned data but this is an uncommon case! • For global data we need dynamic uniformity, but for locally owned data a weaker solution suffices

  31. Consistency approach in partitionable network • Free to update your local data • When partition ends, state merges by propagation of local updates to remote sites, which had safe but stale views of other sites’ local data. (Treated formally by Malki, Dolev, Strong; Keidar, others) • Global updates may be done using dynamically uniform protocols, but will obviously be delayed during partitioning events
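
A toy version of the merge rule for locally owned data: because each item has exactly one owning server, adopting the higher-versioned copy of each item at merge time can never discard a concurrent update. The (owner -> (version, value)) layout is invented for illustration.

```python
# Merge two sites' views after a partition heals; each item has one owner.

def merge(local: dict, remote: dict) -> dict:
    merged = dict(local)
    for owner, (version, value) in remote.items():
        if owner not in merged or version > merged[owner][0]:
            merged[owner] = (version, value)   # the remote copy is fresher
    return merged

site_a = {"A": (3, "a-data-v3"), "B": (1, "b-data-v1")}   # A kept updating A's data
site_b = {"A": (1, "a-data-v1"), "B": (4, "b-data-v4")}   # B kept updating B's data

print(merge(site_a, site_b))
# {'A': (3, 'a-data-v3'), 'B': (4, 'b-data-v4')} -- both sides converge
```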

  32. Within a server • At this level, see a server replicated on multiple nodes for • Fault-tolerance (availability or recoverability) • Load-balancing • Replication to improve response time • Goal is primary component progress and no need for dynamic uniformity

  33. Worst case for a replicated server? • If application wants recoverability, server replication may be costly and counterproductive • Many real database systems actually sacrifice transactional guarantees to fudge this case: • Primary/backup approach with log sent from primary to backup periodically • Failure can cause some transactions to “vanish” until primary recovers and lost log records are discovered
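
The "window of vulnerability" is easy to see in a toy model: the primary acknowledges commits immediately but ships its log only periodically, so a crash loses the unshipped tail. No real database is exactly this naive; the sketch just isolates the tradeoff being accepted.

```python
# Toy primary/backup with periodic log shipping; purely illustrative.

class Primary:
    def __init__(self):
        self.log, self.shipped_upto = [], 0

    def commit(self, txn):
        self.log.append(txn)   # acknowledged to the client right away
        return "committed"

    def ship_log(self, backup):
        backup.log.extend(self.log[self.shipped_upto:])
        self.shipped_upto = len(self.log)

class Backup:
    def __init__(self):
        self.log = []

p, b = Primary(), Backup()
p.commit("t1"); p.ship_log(b)    # t1 is now safe on the backup
p.commit("t2"); p.commit("t3")   # acknowledged, but not yet shipped
# ... primary crashes here: the backup takes over with log == ['t1'];
# t2 and t3 have "vanished" until the primary's disk is recovered.
print(b.log)                     # ['t1']
```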

  34. Observations? • Complex systems may exhibit multiple needs, superimposed • Needs are easiest to understand when approached in terms of architectural structure • Literature of distributed consistency is often confusing because different goals are blurred in papers

  35. Example of a blurry goal • Essential point of the famous FLP result: can’t guarantee liveness in a system that also provides an external consistency property such as dynamic uniformity or database atomicity • Can evade in settings with accurate failure detectors... but real systems can make mistakes • But often, we didn’t actually want this form of consistency!

  36. Example of a blurry goal (cont) • Moreover, FLP result may require a very “clever” adversary strategy. • Friedman and Vaysburd have a proof that an adversary that cannot predict the future is arbitrarily unlikely to prevent consensus! • On the other hand, it is easy to force a system to wait if it wants external consistency. Think about 2PC and 3PC. This is a more serious issue.

  37. Much of the theory is misunderstood! • Theory tends to make sweeping claims: “The impossibility of group membership in asynchronous systems” • These claims are, strictly speaking, correct • But they may not be relevant in specific practical settings because, often, the practical situation needs much weaker guarantees!

  38. When do we need FLP style consistency? • Few real systems need such strong forms of consistency • Yet relatively little is understood about the full spectrum of weaker consistency options • Interplay between consistency of the fault-tolerance solution and other properties like security or real-time further clouds the picture

  39. Why should this bother us? • Main problem is that we can’t just pick a single level of consistency that will make all users happy • Database consistency model is extremely slow: even database vendors don’t respect the model (primary/backup “window of vulnerability” is accepted because 2PC is too costly) • Dynamic uniformity costs a factor of 100-1000 compared to non-uniform protocols

  40. ... but non-uniform protocols are too weak! • Usually, non-uniform protocols are adequate • They capture “all the things that a system can detect about itself” • They make sense when partitioning can’t occur, as on a cluster of computers working as a server • But they don’t solve our ATC example and are too weak for a wide-area database replicated over many servers

  41. Optimal Transactions • Best known dynamic uniformity solution is actually not the static scheme we’ve examined • This optimal approach • Was first developed by Lamport in his Paxos paper, but the paper was very hard to follow • Later, Keidar, Chockler and Dolev showed that a version of 3-phase commit gives optimal progress; the scheme was very similar to Paxos • But performance is still “poor”
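
To give some feel for why even the optimal scheme stays costly, here is a heavily simplified single-process sketch of the two quorum phases at the heart of Paxos. Real Paxos must also cope with competing proposers, retries, and message loss; the point here is only the quorum logic that makes outcomes dynamically uniform.

```python
# Single-process sketch of the Paxos synod phases; not a usable protocol.

class Acceptor:
    def __init__(self):
        self.promised = -1     # highest ballot promised so far
        self.accepted = None   # (ballot, value) or None

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    # Phase 1: collect promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None   # no quorum; must retry with a higher ballot
    # If some acceptor already accepted a value, adopt the one with the
    # highest ballot -- this rule is what preserves safety.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2: ask a majority to accept the (possibly adopted) value.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="commit txn 42"))
```

Note the cost even in this best case: two quorum round-trips before anything can be externalized, which is why performance is still "poor".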

  42. Long-term prospects? • Systems in which you pay for what you use • API’s specialized to particular models of computation in which API can make a choice even if same choice wouldn’t work for other API’s • Tremendous performance variation depending on the nature of the tradeoffs accepted • Big difference depending on whether we care about actions by that partitioned-off controller

  43. Theory side • Beginning to see a solid and well founded theory of consistency that deals with non-uniform case • For example, Lynch has an IOA model of vsync. • Remains hard to express the guarantees of such a model in the usual temporal logic style • Cristian and Fetzer did some nice work on this • Challenge is that we want to say things “about” the execution, but we often don’t know “yet” if the process we are talking about will turn out to be a member of the system or partitioned away!

  44. Other directions? • In subsequent lectures we will look at probabilistic guarantees • These are fairly basic to guarantees of real-time behavior • They can be integrated into more traditional replication and process group methods, but not easily • Self-stabilization is another option; we won’t consider it in depth here (Dijkstra)

  45. Self-stabilization • Idea is that the system, when pushed out of a consistent state, settles back into one after the environment stops “pushing” • Example: after a failure, if no more failures occur, some system guarantee is restored • Concern is that this may not bound the degree of deviation from correct behavior while the system is being perturbed.
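
Dijkstra's K-state token ring is the classic example: start it in an arbitrary, even multi-token, state and, once perturbations stop, it converges to exactly one circulating token (privilege). A compact simulation, using a lowest-index scheduler as one of many legal daemons:

```python
# Dijkstra's K-state self-stabilizing ring; needs K >= number of machines.

def holds_token(states, i):
    if i == 0:
        return states[0] == states[-1]    # machine 0 is the special one
    return states[i] != states[i - 1]

def step(states, K):
    for i in range(len(states)):
        if holds_token(states, i):        # central daemon: one move per step
            if i == 0:
                states[0] = (states[0] + 1) % K
            else:
                states[i] = states[i - 1]
            return

states = [3, 1, 4, 1, 5]                  # arbitrary illegal state; K = 7 >= 5
for _ in range(30):
    step(states, K=7)
print(sum(holds_token(states, i) for i in range(5)))   # 1: one token remains
```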

  46. Curious problems? • Is reliability/consistency fundamentally “unstable”? • Many systems tend to thrash if reliability technology is scaled up enough (except if goals are probabilistic) • Example: reliable 1-n communication is harder and harder to scale as n gets larger. (But probabilistically reliable 1-n communication may do better as we scale) • Underlying theory completely unknown: research topic ... can we develop a “90% reliable” protocol? Is there a window of stable behavior?
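
For intuition about the probabilistic alternative, here is a toy gossip (epidemic) dissemination sketch: each round, every informed process pushes the message to a few random peers, and the fraction reached climbs toward 1 with high probability in roughly O(log n) rounds rather than degrading as n grows. The fanout and round count are arbitrary choices.

```python
# Toy push-gossip dissemination; parameters are illustrative only.

import random

def gossip(n, fanout=2, rounds=None, seed=0):
    random.seed(seed)
    rounds = rounds or 2 * n.bit_length()   # roughly O(log n) rounds
    informed = {0}                          # the original sender
    for _ in range(rounds):
        for p in list(informed):
            for _ in range(fanout):
                informed.add(random.randrange(n))
    return len(informed) / n

for n in (100, 1000, 10000):
    print(n, gossip(n))   # fraction of the n processes reached
```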

  47. Consistency is ubiquitous • We often talk about “the” behavior of “a” system, not the “joint” behavior of its components • Implication is that our specifications implicitly assume that there is a mapping from “the system” to the execution model • Confusion over consistency thus has fundamental implications. One of the hardest problems we face in distributed computing today!

  48. Moving on… • But enough about replication • Now start to think about higher level system issues • Next week: briefly look at what is known about how and why systems fail • Then look at a variety of structuring options and trends…
