

  1. Memory Management Strategies Master Class Game Connection 2012

  2. About myself • Studied computer science at VUT, Austria • Working in the games industry since 2004 • PC, XBox360, PS2, PS3, Wii, DS • Specialization in low-level programming (threading, debugging, optimization) • Teaching • Founder & CTO @ Molecular Matters • Middleware for the games industry

  3. Master class • Participation • Exchange of experiences • Discussion • There is no perfect way of doing things • There are many „rights“ & „wrongs“ • Let us talk about past experiences, mistakes, improvements • Share ideas! • Ask questions!

  4. Agenda • C++ new/delete/placement syntax • Virtual memory • Allocators • Allocation strategies • Debugging facilities • Fill patterns • Bounds checking • Memory tracking

  5. Agenda (cont'd) • Custom memory system • Relocatable allocations • Run-time defragmentation • Debugging memory-related bugs • Stack overflow • Memory overwrites

  6. C++ new/delete/placement syntax

  7. What's wrong with that? • void* operator new(size_t size, unsigned int align)
    {
        // align memory by some means
        return _aligned_malloc(size, align);
    }
    NonPOD* nonPod = new (32) NonPOD;
    NonPOD* nonPodAr = new (32) NonPOD[10];
    • Addresses of nonPod and array?

  8. C++ new • How do we allocate memory? • Using the new operator (keyword new) • T* instance = new T; • What happens behind the scenes? • Calls operator new to allocate storage for a T • Calls the constructor for non-POD types

  9. C++ delete • How do we free memory? • Using the delete operator (keyword delete) • delete instance; • What happens behind the scenes? • Calls the destructor for non-POD types • Calls operator delete to free storage

  10. C++ new, placement syntax • Keyword new supports placement syntax • Canonical form called placement new • Calls operator new(size_t, void*) • Returns the given pointer, does not allocate memory • Constructs an instance in-place • T* instance = new (memory) T; • Destructor needs to be called manually • instance->~T();
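A minimal, self-contained sketch of the in-place construction described above; the buffer here comes from malloc purely for illustration, any suitably sized and aligned storage would do:

    #include <new>        // declares the canonical placement operator new
    #include <cstdlib>    // std::malloc / std::free

    struct T
    {
        T(void)  {}   // constructor runs in-place inside the given memory
        ~T(void) {}   // has to be invoked manually for placement-new objects
    };

    int main(void)
    {
        void* memory = std::malloc(sizeof(T));   // storage obtained elsewhere

        T* instance = new (memory) T;            // construct in-place, no allocation happens

        instance->~T();                          // manual destructor call
        std::free(memory);                       // storage is released separately
        return 0;
    }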

  11. C++ new, placement syntax (cont'd) • Placement syntax supports N parameters • The compiler maps keyword new to the corresponding overload of operator new • T* instance = new (10, 20, 30) T;calls void* operator new(size_t, int, int, int); • First argument must always be of type size_t • sizeof(T) is inserted by the compiler

  12. C++ new, placement syntax (cont'd) • Very powerful! • Custom overloads for operator new • Each overload must offer a corresponding operator delete • Can store arbitrary arguments for each call to new • An operator is just a function • Can be called directly if desired • Needs manual constructor call using placement new • Can use templates
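A sketch of such a custom overload pair, following the alignment example from slide 7; _aligned_malloc/_aligned_free are MSVC-specific, and the direct operator calls with manual construction/destruction shown here are one possible way of using it, not the only one:

    #include <cstddef>
    #include <new>
    #include <malloc.h>   // _aligned_malloc / _aligned_free (MSVC)

    // custom placement overload: keyword new passes the extra argument through
    void* operator new(size_t size, unsigned int alignment)
    {
        return _aligned_malloc(size, alignment);
    }

    // matching overload; called automatically only if the constructor throws
    void operator delete(void* ptr, unsigned int alignment)
    {
        (void)alignment;
        _aligned_free(ptr);
    }

    struct Widget { float data[4]; };

    int main(void)
    {
        // maps to operator new(sizeof(Widget), 32u); sizeof is inserted by the compiler
        Widget* w = new (32u) Widget;

        // keyword delete has no placement form, so clean up manually:
        w->~Widget();
        operator delete(w, 32u);
        return 0;
    }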

  13. C++ delete, placement syntax • Keyword delete does not support placement syntax • delete (instance, 10, 20); • Treated as a statement using the comma operator • Overloads of operator delete used when an exception is thrown upon a call to new • Overloads can also be called directly • Needs manual destructor call

  14. C++ new[] • Creates an array of instances • Similar to keyword new, calls operator new[] • Calls the constructor for each non-POD instance • Supports placement syntax • Custom overloads of operator new[] possible • First sizeof() argument is compiler-specific • POD vs. non-POD!

  15. C++ new[] (cont'd) • For non-PODs, constructors are called • delete[] needs to call destructors • How many destructors to call? • Compiler needs to store the number of instances • Most compilers add an extra 4 bytes to the allocation size • sizeof(T)*N + 4 (non-POD) • sizeof(T)*N (POD)
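A small experiment that makes the array cookie visible by overloading the global operator new[]; the exact overhead is compiler- and platform-specific (4 bytes in the figures above, often 8 on 64-bit targets):

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    struct NonPOD
    {
        ~NonPOD() {}      // non-trivial destructor forces the compiler to store a cookie
        char data[16];
    };

    void* operator new[](size_t size)
    {
        // 'size' already includes the compiler's cookie for non-POD arrays
        std::printf("operator new[] asked for %zu bytes (sizeof(NonPOD) * 10 = %zu)\n",
                    size, sizeof(NonPOD) * 10);
        return std::malloc(size);
    }

    void operator delete[](void* ptr) noexcept
    {
        std::free(ptr);
    }

    int main(void)
    {
        NonPOD* arr = new NonPOD[10];   // requested size is sizeof(NonPOD)*10 plus the cookie
        delete[] arr;
        return 0;
    }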

  16. C++ new[] (cont'd) • Important! • Address returned by operator new[] != address of the first instance in the array • Source of confusion • Compiler-specific behaviour makes it almost impossible to call overloads of operator delete[] directly • Do we need to go back 4 bytes or not? • Makes support for custom alignment harder

  17. C++ delete[] • Deletes an array of instances • Similar to keyword delete, calls operator delete[] • Calls the destructors for each non-POD instance in reverse order • Again, POD vs. non-POD • Number of instances to destruct is stored by the compiler for non-POD types

  18. C++ new vs. delete mismatch • Allocating with new, deleting with delete[] • operator delete[] expects the number of instances • May crash • Allocating with new[], deleting with delete • More subtle bugs, only one destructor will be called • Visual Studio heap implementation is smart enough to detect both mismatches

  19. Summary • new != operator new • delete != operator delete • new[]/delete[] are compiler-specific • Never mix new/delete[] and new[]/delete • new offers powerful placement syntax

  20. Virtual memory

  21. Virtual memory • Each process = virtual address space • Not to be confused with paging to hard disk • Virtual memory != physical memory • Address translation done by MMU • OS allocates/reserves memory in pages • Page sizes: 4KB, 64KB, 1MB, ...

  22. Virtual memory (cont'd) • Virtual addresses are mapped to physical memory addresses • Contiguous virtual addresses != contiguous physical memory • A single page is the smallest amount of memory that can be allocated • Access restrictions on a per-page level • Read, write, execute, ...

  23. Virtual memory (cont'd) • Simplest address translation: • Virtual address = page directory + offset • Page directory = physical memory page + additional info • Page directory entries set by OS • In practice: Multi-level address translation • See „What every programmer should know about memory“ by Ulrich Drepper • http://lwn.net/Articles/253361/
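A toy illustration of the simplest split, assuming 4 KB pages (12 offset bits); real translation walks multi-level page tables, as described in the article linked above:

    #include <cstdint>
    #include <cstdio>

    int main(void)
    {
        const uintptr_t pageSize       = 4096;      // assumed page size
        const uintptr_t virtualAddress = 0x12345;   // arbitrary example address

        const uintptr_t pageIndex  = virtualAddress / pageSize;   // 0x12  -> selects the page
        const uintptr_t pageOffset = virtualAddress % pageSize;   // 0x345 -> offset within it

        std::printf("page index: 0x%llx, offset: 0x%llx\n",
                    (unsigned long long)pageIndex, (unsigned long long)pageOffset);
        return 0;
    }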

  24. Virtual memory (cont'd) • Address translation is expensive • Several accesses to memory • „Page walk“ • Result of address translation is cached • Translation Look-aside Buffer (TLB) • Multiple levels, like D$ or I$ • TLB = Global resource per processor

  25. Virtual memory (cont'd) • Allows allocating contiguous memory even if the physical memory is not contiguous • Available on many architectures (PC, Mac, Linux, almost all consoles) • Used by the CPU only • GPU, sound hardware, etc. need contiguous physical memory • E.g. XPhysicalAlloc

  26. Virtual memory (cont'd) • Growing allocators can account for worst-case scenarios more easily when using VM • Different address ranges for different purposes • Heap, stack, code, write-combined, ... • Helps with debugging!

  27. Summary • Virtual memory is nice to have, but not a necessity • Can help tremendously with debugging • Virtual memory is made available to the CPU, not the GPU or other hardware • Virtual memory address range >> RAM

  28. Allocators

  29. Why different allocators? • No silver bullet, many allocation qualities • Size • Fragmentation • Wasted space • Performance • Thread-safety • Cache-locality • Fixed size vs. growing

  30. Common allocators • Linear • Stack, double-ended stack • Pool • Micro • One-frame, two-frame temporary • Double-buffered I/O • General-purpose

  31. Linear allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • - Must free all allocations at once
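A minimal linear allocator sketch along those lines; the class and method names are illustrative, and alignment is assumed to be a power of two:

    #include <cstddef>
    #include <cstdint>

    class LinearAllocator
    {
    public:
        LinearAllocator(void* start, size_t size)
            : m_start(static_cast<char*>(start))
            , m_end(m_start + size)
            , m_current(m_start)
        {
        }

        void* Allocate(size_t size, size_t alignment)
        {
            // round the current pointer up to the requested (power-of-two) alignment
            uintptr_t current = reinterpret_cast<uintptr_t>(m_current);
            uintptr_t aligned = (current + (alignment - 1)) & ~(alignment - 1);

            char* newCurrent = reinterpret_cast<char*>(aligned) + size;
            if (newCurrent > m_end)
                return nullptr;                      // buffer exhausted

            m_current = newCurrent;                  // bump the pointer
            return reinterpret_cast<void*>(aligned);
        }

        // individual allocations cannot be freed - everything is released at once
        void Reset(void) { m_current = m_start; }

    private:
        char* m_start;
        char* m_end;
        char* m_current;
    };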

  32. Stack allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • +/- Must free allocations in reverse-order

  33. Double-ended stack allocator • Similar to stack allocator • Can allocate from bottom or top • Bottom for resident allocations • Top for temporary allocations • Mostly used for level loading

  34. Pool allocator • - Supports one allocation size only • + Very fast, simple pointer exchange • + Fragments, but can always allocate • + No wasted space • + Lock-free implementation possible • - Holes between allocations • + Memory can be allocated/freed in any order

  35. Pool allocator (cont'd) • In-place free list • No extra memory for book-keeping • Re-use memory of freed allocations • Point to next free entry
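A sketch of the in-place free list, assuming one fixed block size of at least sizeof(void*); again, all names are illustrative:

    #include <cstddef>

    class PoolAllocator
    {
    public:
        PoolAllocator(void* memory, size_t blockSize, size_t blockCount)
            : m_freeList(nullptr)
        {
            // build the free list inside the blocks themselves - no extra book-keeping memory
            char* block = static_cast<char*>(memory);
            for (size_t i = 0; i < blockCount; ++i)
            {
                FreeNode* node = reinterpret_cast<FreeNode*>(block + i * blockSize);
                node->next = m_freeList;
                m_freeList = node;
            }
        }

        void* Allocate(void)
        {
            if (m_freeList == nullptr)
                return nullptr;               // pool exhausted

            FreeNode* node = m_freeList;      // simple pointer exchange: pop the head
            m_freeList = node->next;
            return node;
        }

        void Free(void* ptr)
        {
            // reuse the freed block's memory to point to the next free entry
            FreeNode* node = static_cast<FreeNode*>(ptr);
            node->next = m_freeList;
            m_freeList = node;
        }

    private:
        struct FreeNode { FreeNode* next; };
        FreeNode* m_freeList;
    };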

  36. Micro allocator • Similar to pool allocator, but different pools for different sizes • + Very fast, lookup & simple pointer exchange • + Fragments, but can always allocate • - Some wasted space depending on size • + Can use pool-local critical sections / lock-free • - Holes between allocations • + Memory can be allocated/freed in any order

  37. One-frame temporary allocator • Similar to linear allocator • Used for scratchpad allocations during a frame • Another alternative is to use stack memory • Fixed-size • alloca()
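A possible per-frame usage pattern, building on the LinearAllocator sketched under the linear allocator slide; the function and sizes are made up for illustration:

    // one-frame scratch memory: everything allocated during the frame
    // is thrown away with a single Reset() at the end of the frame
    void Frame(LinearAllocator& oneFrameAllocator)
    {
        void* scratch = oneFrameAllocator.Allocate(1024, 16);   // temporary working memory
        // ... use scratch during this frame only ...
        (void)scratch;

        oneFrameAllocator.Reset();   // end of frame
    }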

  38. Two-frame temporary allocator • Similar to one-frame temporary allocator • Ping-pong between two one-frame allocators • Results from frame N persist until frame N+1 • Useful for operations with 1 frame latency • Raycasts
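A sketch of the ping-pong scheme on top of two LinearAllocator instances (as sketched earlier); names are illustrative:

    #include <cstddef>

    class TwoFrameAllocator
    {
    public:
        TwoFrameAllocator(LinearAllocator& a, LinearAllocator& b)
            : m_current(&a)
            , m_previous(&b)
        {
        }

        void* Allocate(size_t size, size_t alignment)
        {
            return m_current->Allocate(size, alignment);
        }

        // called once per frame: results from frame N stay valid through frame N+1,
        // the memory of frame N-1 is wiped and reused
        void SwapBuffers(void)
        {
            LinearAllocator* temp = m_previous;
            m_previous = m_current;
            m_current = temp;
            m_current->Reset();
        }

    private:
        LinearAllocator* m_current;
        LinearAllocator* m_previous;
    };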

  39. Double-buffered I/O allocator • Two ping-pong buffers • Read into buffer A, consume from buffer B • Initiate reads while consuming • Useful for async. sequential reads from disk • Interface offers Consume() only • Async. reads done transparently & interleaved

  40. General-purpose • Must cope with small & large allocations • Used for 3rd party libraries • Properties • - Slow • - Fragmentation • - Wasted memory, allocation overhead • - Must use heavy-weight synchronization

  41. General-purpose (cont'd) • Common implementations • „High Performance Heap Allocator“ in GPG7 • Doug Lea's „dlmalloc“ • Emery Berger's „Hoard“

  42. Growing allocators • With virtual memory • Reserve the worst case up front • Back it with physical memory when growing • Less hassle during development • Can grow without relocating allocations • Without virtual memory • Resize the allocator for e.g. each level • Needs adjustment during development
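A sketch of reserve-up-front/commit-on-grow using the Windows virtual memory API (other platforms offer similar mechanisms, e.g. mmap); error handling is omitted and the class name is illustrative:

    #include <windows.h>

    class GrowingBuffer
    {
    public:
        explicit GrowingBuffer(size_t worstCaseSize)
            : m_base(nullptr)
            , m_reservedSize(worstCaseSize)
            , m_committedSize(0)
        {
            // reserve the worst-case address range - no physical memory is used yet
            m_base = static_cast<char*>(::VirtualAlloc(nullptr, worstCaseSize,
                                                       MEM_RESERVE, PAGE_NOACCESS));
        }

        void Grow(size_t additionalSize)
        {
            if (m_committedSize + additionalSize > m_reservedSize)
                return;   // worst case exceeded

            // back the next part of the range with physical memory; existing allocations
            // keep their addresses because the reserved range never moves
            ::VirtualAlloc(m_base + m_committedSize, additionalSize,
                           MEM_COMMIT, PAGE_READWRITE);
            m_committedSize += additionalSize;
        }

        ~GrowingBuffer(void)
        {
            ::VirtualFree(m_base, 0, MEM_RELEASE);
        }

    private:
        char* m_base;
        size_t m_reservedSize;
        size_t m_committedSize;
    };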

  43. Allocators • Separate how from where • How = allocator • Where = heap or stack • Offers more possibilities • Allows using the stack with different allocators
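A short illustration of the how/where split, reusing the LinearAllocator sketch from above: the same allocator code can manage stack memory or heap memory:

    #include <cstdlib>

    void Example(void)
    {
        // "where" = stack, "how" = linear allocation
        char stackBuffer[4096];
        LinearAllocator fromStack(stackBuffer, sizeof(stackBuffer));

        // "where" = heap, same "how"
        void* heapMemory = std::malloc(1024 * 1024);
        LinearAllocator fromHeap(heapMemory, 1024 * 1024);

        // ... both behave identically from the caller's point of view ...

        std::free(heapMemory);
    }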

  44. Summary • No allocator fits all purposes • Each allocator has different pros/cons • Ideally, for each allocation think about • Size • Frequency • Lifetime • Threading

  45. Allocation strategies

  46. Why do we need a strategy? • Using a general-purpose allocator everywhere leads to • Fragmented memory • Wasted memory • Somewhat unclear memory ownership • Excessive clean-up before shipping • We can do better!

  47. Decision criteria • Lifetime • Application lifetime • Level lifetime • Temporary

  48. Decision criteria (cont'd) • Purpose • Temporary while loading a level • Temporary during a frame • Purely visual (e.g. bullet holes) • LRU scheme • Streaming I/O • Gameplay critical

  49. Decision criteria (cont'd) • Frequency • Once • Each level load • Each frame • N times per frame • Should be avoided in the first place

  50. Where would you put those? • Application-wide singleton allocations • Render queue/command buffer allocations • Level assets • Particles • Bullets, collision points, … • 3rd party allocations • Strings
