
Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System


Presentation Transcript


1. Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System
Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm

2. Locality
• What do they mean by locality?
• locality of reference?
• temporal locality?
• spatial locality?

3. Temporal Locality
• Recently accessed data and instructions are likely to be accessed in the near future

4. Spatial Locality
• Data and instructions close to recently accessed data and instructions are likely to be accessed in the near future
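Both kinds of locality show up in even the simplest code. A toy illustration (not from the paper): summing an array reuses one accumulator on every iteration (temporal locality) and walks memory in address order (spatial locality).

```cpp
#include <cassert>
#include <cstddef>

// Temporal locality: `total` is touched on every iteration.
// Spatial locality: a[i] and a[i+1] usually share a cache line,
// so sequential accesses mostly hit in cache.
long sum_array(const long* a, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```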

5. Locality of Reference
• If we have good locality of reference, is that a good thing for multiprocessors?

6. Locality in Multiprocessors
• Good performance depends on data being local to a CPU
• Each CPU uses data from its own cache
  • cache hit rate is high
  • each CPU has good locality of reference
• Once data is brought into cache, it stays there
  • cache contents not invalidated by other CPUs
  • different CPUs have different locality of reference

7–12. Example: Shared Counter
[diagrams: two CPUs, each with its own cache, share a counter in memory. The counter starts at 0; one CPU reads it into its cache and increments it to 1. Both CPUs can then cache and read the value 1 ("Read: OK"), but the next increment to 2 invalidates the other CPU's cached copy ("Invalidate").]

  13. Performance

14. Problems
• Counter bounces between CPU caches
  • cache miss rate is high
• Why not give each CPU its own piece of the counter to increment?
  • take advantage of commutativity of addition
  • counter updates can be local
  • reads require all counters
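The split counter proposed above can be sketched as follows (a minimal illustration, not Tornado's code; the CPU count and names are assumed). Each CPU increments only its own slot; a read sums all slots. Note that adjacent slots still sit in the same cache line, which is exactly the false-sharing problem the later slides demonstrate.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;  // illustrative CPU count

// One counter slot per CPU: updates are local, reads sum everything.
std::atomic<long> counts[kNumCpus];

void inc(std::size_t cpu) {
    counts[cpu].fetch_add(1, std::memory_order_relaxed);  // local update
}

long read_all() {
    long total = 0;
    for (auto& c : counts)            // a read needs all the counters
        total += c.load(std::memory_order_relaxed);
    return total;
}
```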

15–18. Array-based Counter
[diagrams: memory holds one counter slot per CPU, both starting at 0; each CPU increments only its own slot (1, 1); reading the counter ("Read Counter") adds all the slots: 1 + 1 = 2]

19. Performance
Performs no better than the shared counter!

20. Problem: False Sharing
• Caches operate at the granularity of cache lines
  • if two pieces of the counter fall in the same cache line, that line cannot be cached (for writing) on more than one CPU at a time

21–26. False Sharing
[diagrams: the two counter slots (0,0) share one cache line, which both CPUs cache; when one CPU increments its slot (1,0), the other CPU's copy of the whole line is invalidated; after the line is shared again, the other CPU's increment (1,1) invalidates it in turn, so the line still bounces]

27. Solution?
• Spread the counter components out in memory: pad the array
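Padding can be sketched by aligning each slot to its own cache line (again an illustration with assumed names; 64 bytes is a common line size, not a universal one):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;   // illustrative CPU count

// Each slot occupies its own 64-byte cache line, so two CPUs
// never write-share a line: no false sharing.
struct alignas(64) PaddedSlot {
    std::atomic<long> value{0};
};

PaddedSlot slots[kNumCpus];

void inc(std::size_t cpu) {
    slots[cpu].value.fetch_add(1, std::memory_order_relaxed);
}

long read_all() {
    long total = 0;
    for (auto& s : slots)
        total += s.value.load(std::memory_order_relaxed);
    return total;
}
```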

28–29. Padded Array
[diagrams: the counter slots now sit in separate cache lines; each CPU increments its own slot to 1, and the updates are independent of each other]

30. Performance
Works better.

31. Locality in OS
• Serious performance impact
• Difficult to retrofit
• Tornado
  • ground-up design
  • object-oriented approach (natural locality)

32. Tornado
• Object-oriented approach
• Clustered objects
• Protected procedure call
• Semi-automatic garbage collection
  • simplifies locking protocols

33. Object Oriented Structure
• Each resource is represented by an object
• Requests to virtual resources handled independently
  • no shared data structure access
  • no shared locks

34–36. Why Object Oriented?
[diagrams: a process table with entries for Process 1, Process 2, …; with coarse-grain locking, a single lock covers the whole table, so requests for Process 1 and Process 2 must queue on the same lock even though they touch different entries]
37. Object Oriented Approach
class ProcessTableEntry { data, lock, code }

38. Object Oriented Approach
[diagram: fine-grain, instance locking — each process-table entry (Process 1, Process 2, …) carries its own lock, so requests for different processes proceed in parallel]
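The per-entry locking idea can be sketched like this (class and member names are illustrative, not Tornado's actual classes): the lock lives inside each entry, so operations on different processes never contend.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Fine-grain, instance locking: one lock per process-table entry
// instead of one lock over the whole table.
class ProcessTableEntry {
public:
    void setState(const std::string& s) {
        std::lock_guard<std::mutex> g(lock_);  // locks only this entry
        state_ = s;
    }
    std::string state() {
        std::lock_guard<std::mutex> g(lock_);
        return state_;
    }
private:
    std::mutex lock_;     // per-instance lock
    std::string state_;   // the entry's data
};
```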

39. Clustered Objects
• Problem: how to improve locality for widely shared objects?
• A single logical object can be composed of multiple local representatives
  • the reps coordinate with each other to manage the object's state
  • they share the object's reference

  40. Clustered Objects

  41. Clustered Object References

42. Clustered Objects: Implementation
• A translation table per processor
  • located at the same virtual address on every processor
  • each entry holds a pointer to the local rep
• A clustered object reference is just a pointer into the table
• Reps are created on demand when first accessed
  • via a global miss-handling object
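A rough sketch of the lazy per-processor rep table (assumed structure and names, heavily simplified: the real system uses per-processor tables at a fixed virtual address and a global miss-handling object, and references are raw pointers into the table rather than indices):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rep {
    long local_count = 0;  // per-rep (per-processor) state
};

class ClusteredObject {
public:
    explicit ClusteredObject(std::size_t ncpus) : table_(ncpus, nullptr) {}
    ~ClusteredObject() { for (Rep* r : table_) delete r; }

    // One slot per processor; a null slot is a "miss", and the
    // miss handler creates the local rep on first access.
    Rep* repFor(std::size_t cpu) {
        if (!table_[cpu])
            table_[cpu] = new Rep();
        return table_[cpu];
    }

private:
    std::vector<Rep*> table_;  // stand-in for the per-processor table
};
```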

43. Clustered Objects
• Degree of clustering
• Management of state
  • partitioning
  • distribution
  • replication (how to maintain consistency?)
• Coordination between reps?
  • shared memory
  • remote PPCs

44–49. Counter: Clustered Object
[diagrams: both CPUs use the same clustered-object reference for the counter; behind it, the object has one rep per CPU; each CPU's increments go to its own rep, so updates are independent of each other; reading the counter ("Read Counter") adds all the reps, e.g. 1 + 1 = 2]

50. Synchronization
• Two distinct locking issues
• Locking
  • mutually exclusive access to objects
• Existence guarantees
  • making sure an object is not freed while still in use
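An existence guarantee means a thread holding a reference can never see the object deallocated under it. Tornado gets this from its semi-automatic garbage collection; the simplest stand-in (purely illustrative, not the paper's mechanism) is a reference count that defers the free until the last reference is dropped:

```cpp
#include <atomic>
#include <cassert>

class Counted {
public:
    void acquire() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when this call dropped the last reference and
    // therefore destroyed the object.
    bool release() {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;   // no one else can still hold a reference
            return true;
        }
        return false;
    }

private:
    std::atomic<long> refs_{1};  // creator holds the first reference
};
```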
