Parallelization and Performance of Interactive Multiplayer Game Servers

Presentation Transcript

  1. ECE 1747 Parallel Programming, Paper Presentation • Student: Qingan Andy Zhang • Paper: Parallelization and Performance of Interactive Multiplayer Game Servers. Ahmed Abdelkhalek and Angelos Bilas. In IPDPS 2004.

  2. Online Game Intro • Definition • Attraction • Requirement • Category • History • Example (figure: a famous first-person shooter game)

  3. Online Game Intro • Figure: an early MUD game • Figure: a famous RTS game, StarCraft

  4. Massively Multiplayer Online Game • Played over the Internet • Supports hundreds or thousands of players playing at the same time • Simulates the real world • Client-server model • Centralized server • Example: World of Warcraft (WoW) (figure: WoW screenshot)

  5. Quake Intro • First-person shooter game • Developed in the 1990s by id Software • Supports multiplayer gaming • Rendered in 3D • Very similar to Counter-Strike

  6. Quake Client-Server Structure (diagram: clients connected to the server over the Internet) • Server: 1. Interprets client actions 2. Maintains consistency 3. Handles communication among clients • Clients: 1. Accept user input through keyboard, mouse, or joystick 2. Display output (graphics-related processing) from a detailed model of the 3D world

  7. Sequential Server Structure • The world is simulated with time divided into frames • Server frame (diagram): start frame, select, (1) update world, (2) receive and process requests (Rx), (3) form and send replies (Tx), end frame • Client side of the interaction (diagram): send, receive, update state, render • Maintain consistent state • Reply ASAP • Maintain a frame rate of ~30 fps, i.e. a frame duration of ~30 ms
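
The three numbered stages of a server frame can be sketched as a plain loop body. This is a minimal illustration, not the Quake server code: the `world` dictionary, the `(client_id, move)` request tuples, and all function names are hypothetical, and the `select` call that waits for network input is omitted.

```python
import time

FRAME_BUDGET_S = 0.030  # ~30 ms per frame, the figure quoted on the slide


def update_world(world):
    """Stage 1: advance the world simulation (hypothetical placeholder)."""
    world["frame"] += 1


def receive_and_process(world, requests):
    """Stage 2: apply each pending client request to the world state."""
    for client_id, move in requests:
        world["positions"][client_id] = world["positions"].get(client_id, 0) + move


def form_replies(world, client_ids):
    """Stage 3: build one reply per client from the updated state."""
    return {cid: (world["frame"], world["positions"].get(cid, 0)) for cid in client_ids}


def run_frame(world, requests, client_ids):
    """Run one sequential frame and report whether it met the frame budget."""
    start = time.monotonic()
    update_world(world)
    receive_and_process(world, requests)
    replies = form_replies(world, client_ids)
    elapsed = time.monotonic() - start
    return replies, elapsed <= FRAME_BUDGET_S


world = {"frame": 0, "positions": {}}
replies, on_time = run_frame(world, [(1, 5), (2, -3), (1, 2)], [1, 2])
print(replies, on_time)
```

Because all three stages run on one thread, the per-frame cost grows with the number of clients, which is the scaling limit the following slides discuss.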

  8. Sequential Server Performance • Server machine: Linux, Intel P3 1.4 GHz, 100 Mbit Ethernet • Saturates at ~128 players • Scalability is limited by server processing • Total bandwidth at server and clients is not an issue • How can we improve on this?

  9. Quake Sequential Server Summary • Quake is fine-grained (instantaneous control of player actions) • High degree of interaction among players • Server network bandwidth and memory requirements are not issues • Server CPU processing is the bottleneck for scaling to a large number of players

  10. Objectives • Scale to a large number of interactive players with cost-effective servers • Parallelize the server application • Analyze the performance of the parallel server application

  11. How to Parallelize? • Analyze the dependencies • What is a correct order? • Task parallelism • Requests / Replies / Update • How many clients per thread?

  12. Methodology • Parallelize the server code using a shared-memory model (Pthreads) • Workload distribution: static task assignment • Synchronization: region-based locking in the request-processing phase so that user requests are processed correctly; parent-node locking is also used • Separate server execution into phases with global synchronization; deal with each phase separately • Thread multiplexing
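
Static task assignment means each client is bound to one server thread up front and stays there. The slides do not say which mapping is used, so the round-robin policy below is an assumption, and `assign_clients` is a hypothetical helper name.

```python
def assign_clients(client_ids, num_threads):
    """Statically map each client to one thread (round-robin, an assumed policy)."""
    buckets = [[] for _ in range(num_threads)]
    for i, cid in enumerate(sorted(client_ids)):
        buckets[i % num_threads].append(cid)
    return buckets


# Ten clients over four threads: the mapping is fixed before any frame runs.
buckets = assign_clients(range(10), 4)
print(buckets)
```

A fixed mapping is cheap, but it cannot react when some clients generate more work than others, which is one possible source of the load imbalance reported later in the talk.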

  13. Methodology • Start from the sequential server • Select • (1) Update World: small CPU time (5%), leave as sequential • (2) Receive & Process Requests (Rx) and (3) Form & Send Replies (Tx): target these only!

  14. Parallel Server Architecture • Select • (1) Update World • Multiple server threads run (2) Receive & Process Requests (Rx) and (3) Form & Send Replies (Tx)

  15. Parallel Server Architecture • Conservative approach: global barriers between phases • Select • (1) Update World • (2) Receive & Process Requests (Rx) • (3) Form & Send Replies (Tx)
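
The barrier-separated phases can be sketched with Python's `threading.Barrier` standing in for a Pthreads barrier. The thread count, frame count, and the trivial per-phase work are illustrative assumptions; the real request phase would also take region locks.

```python
import threading

NUM_THREADS = 4
NUM_FRAMES = 3

barrier = threading.Barrier(NUM_THREADS)
world = {"frame": 0}
processed = [0] * NUM_THREADS              # one slot per thread, never shared
replies = [[] for _ in range(NUM_THREADS)]


def server_thread(tid):
    for _ in range(NUM_FRAMES):
        # Phase 1: the world update stays sequential, so thread 0 does it alone.
        if tid == 0:
            world["frame"] += 1
        barrier.wait()
        # Phase 2: request processing (read-write phase in the real server).
        processed[tid] += 1
        barrier.wait()
        # Phase 3: reply formation only reads the world state.
        replies[tid].append(world["frame"])
        barrier.wait()


threads = [threading.Thread(target=server_thread, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(replies)
```

Every thread sees the same frame number in phase 3 because no thread can start the next world update until all threads have passed the final barrier. The cost is that fast threads idle at each barrier until the slowest one arrives.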

  16. Parallel Server Architecture • Intra-phase synchronization? • Select and (1) Update World: not an issue (sequential phase) • (2) Receive & Process Requests (Rx): big issue (read-write phase), locks needed! • (3) Form & Send Replies (Tx): not an issue (read-only phase)

  17. Shared Data Synchronization • Move execution: • Bounding box • Short-range player figure motion, e.g. moving around • Long-range player interaction/action, e.g. throwing a grenade or shooting • Reads and writes world state (regions and objects) • Atomic access to world state: find all related objects, then simulate • Key data structure: areanode tree • Figure: top view of the game world showing long-range and short-range bounding boxes and objects such as a tree, a wall, a building, or your opponent

  18. Analysis of Quake Data • Structure of the game world: irregular, recursive data structure • Objects in the world have associated actions and properties • Map 3D data structure: binary space partition (BSP) • The server has a secondary data structure: the areanode tree

  19. Quake Data Structure

  20. Region (leaf) Locking • Tree leaves represent distinct regions in the world • Figure: areanode tree and top view of the world with overlapping regions (the squares are bounding-box regions; objects may exist in these regions) • Threads whose bounding boxes overlap need to lock the corresponding leaves: true contention • But does leaf locking ensure atomic access to ALL objects within a region? Answer: no

  21. Object (parent node) Locking • Objects are linked to tree nodes depending on position • If an object crosses a leaf boundary, it is linked to a parent node • Otherwise it is linked to a leaf node • So when an object crosses a boundary, parent nodes must be locked as well • Thus, threads working in non-overlapping regions may contend for access to parent nodes: false sharing • Figure: top view of the world with object lists, region leaves, Player A's and Player B's bounding boxes in non-overlapping regions, and objects that cross a boundary
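
The linking rule and the resulting false sharing can be illustrated with a small 2D tree. This is a simplified sketch, not Quake's areanode code: the class, the function names, the 8x8 world, and the depth of 2 are all assumptions, and `nodes_to_lock` conservatively collects every node on the path to each overlapped leaf.

```python
class AreaNode:
    """One node of a simplified 2D areanode tree with its own object list."""

    def __init__(self, lo, hi, depth, max_depth, path="root"):
        self.lo, self.hi, self.path = lo, hi, path
        self.objects = []          # objects linked to this node
        self.children = None       # leaves keep None
        if depth < max_depth:
            self.axis = depth % 2  # alternate splits: x at even depth, y at odd
            self.split = (lo[self.axis] + hi[self.axis]) / 2
            left_hi = list(hi)
            left_hi[self.axis] = self.split
            right_lo = list(lo)
            right_lo[self.axis] = self.split
            self.children = (
                AreaNode(lo, tuple(left_hi), depth + 1, max_depth, path + "/0"),
                AreaNode(tuple(right_lo), hi, depth + 1, max_depth, path + "/1"),
            )


def link_node(node, box_lo, box_hi):
    """Return the node an object with this bounding box would be linked to."""
    while node.children is not None:
        if box_hi[node.axis] < node.split:
            node = node.children[0]
        elif box_lo[node.axis] > node.split:
            node = node.children[1]
        else:
            break  # the box crosses the split plane: link to this parent node
    return node


def nodes_to_lock(node, box_lo, box_hi):
    """Collect the paths of all nodes whose object lists the box may touch."""
    paths = [node.path]
    if node.children is not None:
        if box_lo[node.axis] <= node.split:
            paths += nodes_to_lock(node.children[0], box_lo, box_hi)
        if box_hi[node.axis] >= node.split:
            paths += nodes_to_lock(node.children[1], box_lo, box_hi)
    return paths


root = AreaNode((0, 0), (8, 8), depth=0, max_depth=2)
leaf_obj = link_node(root, (1, 1), (2, 2))       # fits inside one leaf
crossing_obj = link_node(root, (3, 1), (5, 2))   # straddles the x = 4 split
locks_a = nodes_to_lock(root, (1, 1), (2, 2))    # Player A, lower-left leaf
locks_b = nodes_to_lock(root, (6, 6), (7, 7))    # Player B, upper-right leaf
shared = set(locks_a) & set(locks_b)
print(leaf_obj.path, crossing_obj.path, shared)
```

Players A and B touch disjoint leaves, yet both lock sets contain the root, because an object that crosses a boundary lives in a parent's list. That overlap is the false sharing the slide describes.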

  22. Lock Optimization: Reduce Contention • Game-specific knowledge is needed • Lock only the necessary objects • Estimate an object's movement range instead of locking conservatively • Minimize the area of the bounding box • Use directional bounding-box locking instead of locking the whole map • Cut down the dimensions of the bounding box
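
One way to read "directional bounding box" is to cover only the path a move actually sweeps, rather than the full range the player could reach in any direction. The sketch below assumes that reading; the function names and the `velocity`, `dt`, and `radius` parameters are hypothetical and not taken from the paper.

```python
def conservative_box(pos, max_range):
    """Box covering everything the player could reach in any direction."""
    x, y = pos
    return (x - max_range, y - max_range), (x + max_range, y + max_range)


def directional_box(pos, velocity, dt, radius):
    """Box covering only the swept path from pos to pos + velocity * dt."""
    x, y = pos
    nx, ny = x + velocity[0] * dt, y + velocity[1] * dt
    lo = (min(x, nx) - radius, min(y, ny) - radius)
    hi = (max(x, nx) + radius, max(y, ny) + radius)
    return lo, hi


def area(box):
    (x0, y0), (x1, y1) = box
    return (x1 - x0) * (y1 - y0)


wide = conservative_box((10, 10), max_range=5)
narrow = directional_box((10, 10), velocity=(4, 0), dt=1, radius=1)
print(area(wide), area(narrow))
```

A smaller box overlaps fewer tree regions, so fewer nodes need to be locked and fewer threads contend for them.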

  23. Result • What should be measured? • Benchmarking [ISPASS01] • Server: Intel P3 1.4 GHz quad SMP with Hyperthreading, 2 GB RAM, Linux • Clients: multiple automatic players per client • Network: 100 Mbit LAN • Measured: server execution time breakdown, response rate, and response time

  24. Result • Parallel server performance with un-optimized locking • At saturation: 1. Response rate decreases 2. Replies arrive late at clients • Max 160 players with 8 threads!

  25. Result • Parallel server performance with optimized locking • At saturation: 1. Response rate decreases 2. Replies arrive late at clients • Max 176 players with 8 threads!
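
The player counts on the result slides can be compared against the sequential server's saturation point. The arithmetic below uses only the figures quoted in the slides (~128, 160, and 176 players) and treats the approximate baseline as exact, which is an assumption.

```python
baseline = 128  # sequential server saturation point (approximate on the slide)
results = {"un-optimized locking": 160, "optimized locking": 176}

# Relative gain in supported players over the sequential server, in percent.
gains = {name: (players - baseline) / baseline * 100 for name, players in results.items()}
print(gains)
```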

  26. Execution Time Breakdown • Receive, reply, and request processing scale (good!) • Bottlenecks: wait time ~20-40%, lock time ~10-20%

  27. Bottleneck Source • Lock time occurs during request processing (Rx) • Wait time occurs at the global barriers (Rx and Tx phases) • Bottlenecks: wait time ~20-40%, due to imbalances; lock time ~10-20%, due to contention

  28. Conclusion • The number of players is limited by server CPU processing • Server load increases super-linearly with the number of players • Scaling this application is challenging; the difficulty comes from the irregularity of the application • The parallel server supports about 25% more players on a 4-way Intel P3 SMP with Hyperthreading • Remaining bottlenecks • Wait time (up to 40%), due to imbalances • Lock synchronization time (up to 20%), due to contention

  29. Future Work • Self-learning ability of the server • Dynamic task assignment (game-specific)

  30. Future Work • Why do the replies consume 40% of the time? • Further parallelize this part • Game map factor? • The experimental map is designed for 16-32 players • Can a bigger map give better performance? • Fraction of time spent on locking • Global state buffer vs. areanode tree: which is the major contributor?

  31. Discussion about Imbalance: Dealing with Wait Time • Does a task queue help? • Improve the static task assignment • How many threads should we create? • Can we remove barriers? • What is a correct event order? • Is it equivalent to the effect of network latency? • Does pipelining help?

  32. Discussion about Contention: Dealing with Lock Time • The influence of the depth of the areanode tree • A leaf may represent a large region • What is a proper range for a leaf? • Change the data structure • Push all objects into leaves to reduce false sharing • More game knowledge • Lazy locking: lock only when the data is actually used • More accurate movement prediction

  33. Related Work • Much work is being done in parallel computer architecture • Much work is being done on the client side, such as improving 3D graphics realism and rendering speed

  34. Acknowledgement • Prof. Cristiana Amza for valuable material • Ms. Jin Chen for helpful discussion

  35. THE END Thanks for your attention! Any questions?