
Scaling the Single Universe


Presentation Transcript


  1. Scaling the Single Universe Jacky Mallett, Distributed Systems Architect jacky@ccpgames.com

  2. Designing Single Universe games • Known limits on scaling distributed systems. • Why large systems are different from small ones. • Is there a heart of darkness lurking at the core? • Why there can’t be a single, optimal solution.

  3. Why is a single Universe such a Challenge?

  4. EVE Online • 360,000 subscribers • 63,170 concurrent users (Jan 23rd 2011) • single cluster (everyone in the same virtual world) • Up to 1800 users in most popular solar system • 1000+ users in fleet fights • Over 200 individual servers running solar system simulation and support • 24 client access servers • One database server

  5. EVE Tranquility Cluster • Players connect to Proxies • Dedicated game mechanic nodes, e.g. markets, fleets, etc. • SOLs (solar systems) host the real-time game simulation

  6. “Message” Based Architecture • Players connect to Proxies • SOLs (solar systems) host the real-time game simulation

  7. Mesh vs. Embarrassingly Parallel • Scaling is relatively simple, as long as it’s a case of adding servers that have little or no need to communicate with each other…

  8. Communication • Distributed Systems run on discrete units of computation, triggered by messages received on servers. • [Diagram: multiple Svc.getCharInfo() calls in flight between two servers, answered with Svc.getCharInfoResp()]
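
The slide's Svc.getCharInfo() example suggests a simple request/response pattern. Below is a minimal sketch of that message-driven style, assuming a hypothetical Server class and message tuples rather than CCP's actual code:

    # A minimal sketch of the message-driven pattern described on this slide.
    # Method names mirror the slide's Svc.getCharInfo() example; the Server
    # class and message format are illustrative, not CCP's actual code.
    import queue

    class Server:
        def __init__(self, name):
            self.name = name
            self.inbox = queue.Queue()   # each message triggers a discrete unit of computation

        def send(self, target, method, payload):
            target.inbox.put((self, method, payload))

        def process_one(self):
            sender, method, payload = self.inbox.get()
            if method == "Svc.getCharInfo":
                # Handle the call and reply to whoever sent it.
                self.send(sender, "Svc.getCharInfoResp", {"charID": payload["charID"]})

    proxy, sol = Server("Proxy"), Server("SOL")
    proxy.send(sol, "Svc.getCharInfo", {"charID": 42})
    sol.process_one()            # SOL processes the request; the response lands in the proxy's inbox
    print(proxy.inbox.get()[1])  # -> Svc.getCharInfoResp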

  9. Incoming Server load from player updates

  10. Network Communication Limits Information Capacity is the instantaneous limit on the total amount of Shannon Information a distributed system can handle • Shannon Information = unique, single message • E.g. Single remote procedure call

  11. Where is the bottleneck? • SOLs: Load ≈ O(Proxies * Broadcast msg/s) • Proxies: Load ≈ O(SOLs * Broadcast msg/s) • It depends on the SOL : Proxy ratio. The number of broadcasts/s dominates.
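
As a rough illustration of those load relationships, the sketch below plugs in the node counts quoted earlier in the talk (24 proxies, roughly 200 SOL servers) and an invented broadcast rate:

    # Back-of-envelope version of the load formulas above. The broadcast rate
    # is an invented figure; the node counts echo the cluster sizes quoted earlier.
    def sol_load(num_proxies, broadcasts_per_s):
        # Each broadcast a SOL sends fans out to every proxy.
        return num_proxies * broadcasts_per_s

    def proxy_load(num_sols, broadcasts_per_s):
        # Each proxy receives the broadcasts of every SOL.
        return num_sols * broadcasts_per_s

    print(sol_load(num_proxies=24, broadcasts_per_s=50))    # 1,200 msg/s per SOL
    print(proxy_load(num_sols=200, broadcasts_per_s=50))    # 10,000 msg/s per proxy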

  12. Broadcast Load

  13. Network Communication Limits Where L is the individual node link capacity with other nodes (group size) and N is the total number of nodes in the system: • Strictly Hierarchical Topology scales as O(Lserver) • Full Mesh scales as O(L√N) (Gupta & Kumar; Scaglione) • Traffic worst case scales as O(N²), as all the nodes try to talk to all the other nodes simultaneously.
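
A small sketch of these scaling regimes, using the L = 40 link capacity that appears on the plots that follow (the N values are arbitrary):

    # Sketch of the three scaling regimes listed above, for per-node link capacity L.
    # L = 40 matches the value used on the following plots; the N values are arbitrary.
    from math import sqrt

    def hierarchical_capacity(L):
        return L                      # O(L): bounded by one server's links

    def mesh_capacity(L, N):
        return L * sqrt(N)            # O(L * sqrt(N)) aggregate capacity

    def worst_case_traffic(N):
        return N * (N - 1)            # O(N^2): everyone talks to everyone else

    for N in (100, 1_000, 10_000):
        print(N, hierarchical_capacity(40), round(mesh_capacity(40, N)), worst_case_traffic(N))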

  14. Model of Information Limits [Chart: maximum load with increasing players. The mesh curve O(L√N) with L = 40 sits above the single-server line L = 40; the region between them is accessible to mesh architectures but not to hierarchical ones, and everything above the mesh curve has insufficient Information Capacity.]

  15. Close up on Left Hand Corner [Chart: the same plot at small player counts, where the mesh (O(L√N), L = 40) and single-server (L = 40) curves still overlap – the one place where everything is possible, the Laboratory.]

  16. Effect of increasing connectivity limit [Chart: the N(N-1) traffic curve plotted against L√N capacity curves for L = 100, L = 50 and L = 10.]

  17. Conceptual Tools • Define Group Size as the maximum number of direct communicators supported by available hardware. • Group Size Limit: Link Capacity ≤ No. of Clients (Players) • Group Size Limit: L√N ≤ N(N-1)
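
The group size limit can be estimated numerically. The sketch below finds the largest N for which the all-to-all traffic N(N-1) still fits under the mesh capacity L√N, for a few illustrative values of L:

    # Sketch: the group size limit above is reached when offered traffic N(N-1)
    # catches up with the mesh capacity L*sqrt(N). Find the largest group that fits.
    # The values of L are illustrative.
    from math import sqrt

    def max_group_size(L):
        N = 2
        while N * (N - 1) <= L * sqrt(N):   # capacity still covers all-to-all traffic
            N += 1
        return N - 1

    for L in (10, 50, 100):
        print(L, max_group_size(L))          # -> 5, 14, 22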

  18. Hierarchical vs Mesh Topologies

  19. Latency influences scaling • Message latency – time to send and receive • Computation latency – how long it takes to process • This also means that several different problems can have the same symptoms. • Round Trip Time affects Message Frequency
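
As a rough illustration of how round trip time caps message frequency, the sketch below assumes a synchronous request/response loop; the RTT and processing figures are invented, not EVE measurements:

    # Sketch: with synchronous request/response, round trip time caps message frequency.
    def max_sync_msg_rate(rtt_s, compute_s):
        # One reply must come back (rtt) and be processed (compute) before the next request.
        return 1.0 / (rtt_s + compute_s)

    print(max_sync_msg_rate(rtt_s=0.050, compute_s=0.010))   # ~16.7 msg/s at 50 ms RTT
    print(max_sync_msg_rate(rtt_s=0.250, compute_s=0.010))   # ~3.8 msg/s at 250 ms RTT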

  20. Using Hierarchy to Load Balance • Character information was sourced by SOLs for their local players. • Character information is static, easy to index per character, and has a long-latency communication profile, so it was moved to its own Character Nodes.

  21. Advantages of Mesh Topology • We can divert load with long latency profiles to other servers, e.g. Character Nodes, Markets, Fleet Management Nodes.

  22. Using Mesh to Load Balance • Topology change – moved Jita from a hierarchical position to a mesh of SOLs, thereby increasing Information Capacity.

  23. Fischer Consensus Problem • Consensus – agreement on shared values – is a critical problem when it occurs in distributed systems. • “It is impossible to guarantee that a set of asynchronously connected processes can agree on even a single bit value.” • “Impossibility of distributed consensus with one faulty process”, Fischer, Lynch, Paterson, 1985 http://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf • “Consensus: The Big Misunderstanding”, Guerraoui, Schiper http://www.cs.cornell.edu/Courses/cs734/2000fa/cached papers/gs97.pdf

  24. Origin of the Consensus Problem • Fischer’s consensus result showed that it is impossible to distinguish between a delayed message and a lost one. • Lost messages are equivalent to faulty processes. • “Guarantee” – most runs will achieve consensus, but there is no possible protocol that can be guaranteed to converge. • The larger the distributed system – and the more messages/hops involved – the higher the probability of a failure or a delay.
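
A minimal sketch of that ambiguity, using an in-process queue and a timeout to stand in for the network; the delays are invented:

    # Sketch: a receiver that times out cannot tell whether a message was lost
    # or is merely delayed. Here the message is only delayed, but the timeout
    # fires first, so the process must guess.
    import queue, threading, time

    inbox = queue.Queue()

    def slow_sender(delay_s):
        time.sleep(delay_s)              # the network (or a queue) delays the message
        inbox.put("vote: commit")

    threading.Thread(target=slow_sender, args=(0.2,), daemon=True).start()
    try:
        msg = inbox.get(timeout=0.1)     # timeout fires before the message arrives...
    except queue.Empty:
        msg = None                       # ...lost, or just late? The receiver cannot know.
    print(msg)                           # prints None, although the message was only delayed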

  25. Introduced by Message Handling [Diagram: messages from A and B interleaved in a queue with messages from unrelated tasks] • The reality of today’s computing environment is multiple layers of message queuing • As performance degrades, message delays increase, and the probability of a consensus problem increases • Very hard to debug in real-time systems, since debugging changes timing.

  26. Symptoms of Consensus Problems • Sleep(1) - Reduces probability of delayed message • Code complexity • Locally analysing every possible ‘delayed/lost message’ instance, and writing code to handle it • Regrettably this merely introduces more occurrences of the problem • In the limit the developer goes quietly crazy Note: Even with theoretically reliable processes, guaranteeing consensus is non-trivial and requires multiple message rounds • Which reduces available Information capacity

  27. Solutions • Consensus will usually occur, but can’t ever be guaranteed • The solutions are: • Error recovery : “The end user will recover” • Synchronization • But, we know that synchronizing at a single point will cause potential congestion.

  28. Jumping typically involves several nodes. 30s to resolve all cluster calls should be plenty…

  29. Consensus Problems in Eve • Number one cause of stuck petitions • Technical Support “Stuck Queue” • Game Master support for the “end user will recover” solution. • Software design can certainly minimize them, but beyond a certain point they have to be worked around. • Solutions to consensus all involve different kinds of performance trade-offs.

  30. “Solutions”: Consensus, who needs it? • It is possible to build systems that work with no consensus guarantees at all • E.g. Google, Facebook. • Sensor network solutions have been proposed which exploit the oversupply of information relative to the problem space • Raise the probability of agreement to as close to 1 as possible, despite the presence of failure • “Probabilistic consensus algorithms” • Generally speaking, these solutions can be made to work, but they typically rely on multiple rounds of message passing, which are expensive for large systems

  31. Eve Solution: Cache Early, Cache Often • All nodes cache data in EVE • SOLs cache to protect the Database from load • Proxies cache to protect the SOLs • Clients cache to protect the Proxies • Cached data is not synchronised, so at any given instant different nodes may have different views of some data • Programmers can control how out of date the view is allowed to be
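
A minimal sketch of a TTL-style cached view, illustrating how a programmer-chosen bound limits how stale the data may get; the names are hypothetical and not CCP's caching layer:

    # Minimal TTL-cache sketch of the "cache early, cache often" idea above.
    import time

    class CachedView:
        def __init__(self, fetch, max_age_s):
            self.fetch = fetch               # call to the next tier (DB, SOL, proxy)
            self.max_age_s = max_age_s       # programmer-chosen bound on staleness
            self.value, self.fetched_at = None, 0.0

        def get(self):
            if time.time() - self.fetched_at > self.max_age_s:
                self.value = self.fetch()    # refresh only when the view is too old
                self.fetched_at = time.time()
            return self.value                # possibly stale, by at most max_age_s

    market_orders = CachedView(fetch=lambda: {"Tritanium": 5.2}, max_age_s=300)
    print(market_orders.get())               # different nodes may see different snapshots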

  32. Solutions: Synchronisation • Shared clocks – literal synchronization • Atomic clocks provide highly accurate synchronized time (Stratum 0) • The reliability of timing information provides extra local information that can be used to resolve consensus failure. • Practically, network-time-based synchronization has its own problems and works best with long-latency systems. • Lamport Timestamps • A simple algorithm that provides a partial ordering of events in a distributed system • Vector clocks • Both are ways of providing a logical ordering of events/messages in a distributed system “Time, clocks, and the ordering of events in a distributed system”, Leslie Lamport, 1978 http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html
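
A minimal Lamport clock sketch, following the rules in the 1978 paper cited above:

    # Minimal Lamport clock: tick on local events and sends, merge-then-tick on receives.
    class LamportClock:
        def __init__(self):
            self.time = 0

        def local_event(self):
            self.time += 1
            return self.time

        def send(self):
            self.time += 1
            return self.time                 # timestamp attached to the outgoing message

        def receive(self, msg_time):
            self.time = max(self.time, msg_time) + 1
            return self.time

    a, b = LamportClock(), LamportClock()
    t = a.send()                             # a's clock: 1
    print(b.receive(t))                      # b's clock: 2 -- b's event is ordered after a's send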

  33. Solutions: Design • Consensus requires synchronisation • Avoid the requirement for consensus wherever possible • Critical – will players notice or care about inconsistent state? • If consensus is required, a single server is needed somewhere to handle it • Where possible, design consensus to be a long latency process • Minimises the probability of occurrence.

  34. Pod Kills in Last Hour

  35. EVE Fleet Fights First, players converge on a node, and sometimes they try to do this all at once. Load is distributed as much as possible over the other cluster machines.

  36. Latency Analysis • Typical per player load on the cluster is < 0.5 msg/s • 30s Jump Timer is a game mechanic that provides long latency • During a fleet fight this goes up to > 1 msg/s • Dedicated nodes used for known hot spots • Load balancing over evolution of game • CPU load from fleet fight calculations has high computational latency • Highly optimized by CCP’s lag fighting Gridlock team. • Load on proxies is engineered with spare capacity for bursts • Traffic measured on proxies is well distributed and stable • Non-real time traffic is hosted on separate nodes
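
A back-of-envelope check on those rates, using the peak concurrency quoted at the start of the talk; the second figure is only an upper-bound illustration, since the whole cluster never runs at fleet-fight rates at once:

    # Rough cluster-wide message rates from the per-player figures above.
    peak_players = 63_170                    # Jan 23rd 2011 peak concurrency
    print(peak_players * 0.5)                # ~31,585 msg/s at typical load (< 0.5 msg/s per player)
    print(peak_players * 1.0)                # ~63,170 msg/s if every player hit fleet-fight rates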

  37. EVE Fleet Fights For large numbers of players though, we’re always fighting the real time limits.

  38. Design Approach • Calculate load as calls per second/player and roughly estimate the number of players • How much CPU is each call going to require to process? • How much Network bandwidth? • Is the design “embarrassingly parallel” or can it be made so? • Does it have high latency? • Some form of hierarchical approach should work • If not - Is the problem solvable at all? • Mesh approach should work • Don’t try to solve the Fischer consensus problem! • Can I change the problem and solve it at a higher level? • Fortunately Games give us enormous freedom to do this.
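
A sketch of the first step in that checklist, turning a per-player call rate into rough CPU and bandwidth requirements; all inputs are hypothetical design figures:

    # Rough capacity estimate for a proposed game feature. All inputs are hypothetical.
    def estimate_feature_load(players, calls_per_player_s, cpu_ms_per_call, bytes_per_call):
        calls_s = players * calls_per_player_s
        return {
            "calls_per_s": calls_s,
            "cpu_cores": calls_s * cpu_ms_per_call / 1000.0,        # cores kept fully busy
            "bandwidth_mbit_s": calls_s * bytes_per_call * 8 / 1e6,
        }

    print(estimate_feature_load(players=50_000, calls_per_player_s=0.5,
                                cpu_ms_per_call=2.0, bytes_per_call=512))
    # -> 25,000 calls/s, ~50 cores, ~102 Mbit/s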

  39. Takeaway. • Very large scale distributed computing is a design problem. • Have a Model for the requirements of your system • Multiple issues have the same symptom • Limits apply at all levels of abstraction • The only systems that scale arbitrarily are the ones that don’t communicate with each other. • Know the limits applicable for your application. • Don’t try to guarantee consensus.

  40. Any Questions?
