
A Dynamic Elimination-Combining Stack Algorithm

A Dynamic Elimination-Combining Stack Algorithm. Gal Bar-Nissan, Danny Hendler and Adi Suissa, Department of Computer Science, BGU, January 2011. Presented by: Ilya Mirsky, 28.03.2011.


Presentation Transcript


  1. A Dynamic Elimination-Combining Stack Algorithm Gal Bar-Nissan, Danny Hendler and Adi Suissa, Department of Computer Science, BGU, January 2011. Presented by: Ilya Mirsky, 28.03.2011

  2. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  3. Concurrent programming terms • Locks (coarse- and fine-grained) • Non-blocking algorithms: wait-freedom, lock-freedom, obstruction-freedom • Linearizability • Memory contention • Latency

  4. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  5. Motivation • Concurrent stacks are widely used in parallel applications and operating systems. • A simple implementation using a coarse-grained locking mechanism creates a “hot spot” at the central stack object and poses a sequential bottleneck. • There is a need for a scalable concurrent stack that performs well under low, medium and high workloads, regardless of the ratio between operation types (push/pop).
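The coarse-grained baseline mentioned in the Motivation slide can be sketched roughly as follows. This is an illustrative sketch only, not code from the presentation: the class name `CoarseLockedStack` and its interface are assumptions. Every operation takes the same mutex, which is what turns the stack into a sequential bottleneck.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical baseline (not from the slides): one mutex guards the whole
// stack, so all pushes and pops from all threads serialize on that lock.
template <typename T>
class CoarseLockedStack {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push_back(std::move(value));
    }

    // Moves the top element into `out` and returns true,
    // or returns false if the stack is empty.
    bool pop(T& out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (items_.empty()) return false;
        out = std::move(items_.back());
        items_.pop_back();
        return true;
    }

private:
    std::mutex lock_;       // the single lock every thread contends on
    std::vector<T> items_;  // back of the vector is the top of the stack
};
```

With this design, adding threads adds contention on `lock_` rather than throughput, which is the behaviour the slide describes as a hot spot.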

  6. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  7. Introduction • Two key synchronization paradigms for constructing scalable concurrent data structures are software combining and elimination. • The most scalable concurrent stack algorithm previously known is the lock-free elimination-backoff stack (Hendler, Shavit, Yerushalmi). • The HSY stack is highly efficient under low contention, as well as under high contention when the workload is symmetric. • Unfortunately, when workloads are asymmetric, the performance of HSY deteriorates to that of a sequential stack. • Flat-combining (by Hendler et al.) significantly outperforms HSY under low and medium contention, but it does not scale and even deteriorates under high contention.

  8. Introduction - DECS • DECS employs both combining and elimination mechanisms. • It scales well for all workload types and outperforms other stack implementations. • It maintains the simplicity and low overhead of the HSY stack. • It uses a contention-reduction layer, the elimination-combining layer, as a backoff scheme for a central stack. • A non-blocking implementation is also presented: NB-DECS, a lock-free variant of DECS in which threads that have waited too long may cancel their “combining contract” and retry their operation on the central stack.

  9. Introduction - DECS

  10. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer]

  11. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer]

  12. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer; three waiting threads: “zzz…”]

  13. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer; three waiting threads: “zzz…”; another thread: “Wake up!”]

  14. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer; one waiting thread: “zzz…”]

  15. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer; one waiting thread: “zzz…”]

  16. Introduction - DECS [Figure labels: Central Stack; Elimination-combining layer; one waiting thread: “zzz…”]

  17. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  18. DECS- The Algorithm • The data structures • MultiOp { int id; int op; int length; int cStatus; Cell cell; MultiOp next; MultiOp last; } • Cell { Data data; Cell next; } • Elimination-combining layer: Collision Array (entries shown: 1, 6, 4) and Locations Array • CentralStack: a chain of Cell nodes (four shown)
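The field lists on this slide could be written out in C++ roughly as below. This is a sketch under stated assumptions, not the authors' code: the `Data` payload type, the constant values, the field comments, and the two `initMultiOp` signatures are hypothetical, and having `last` start out pointing at the multi-op itself is a guess based on the later diagrams.

```cpp
#include <cassert>

// Hypothetical encodings; the slides only show the names PUSH, POP,
// INIT and FINISHED, not their values.
constexpr int PUSH = 0, POP = 1;
constexpr int INIT = 0, FINISHED = 1;

struct Data { int value; };  // hypothetical payload type

struct Cell {
    Data data;   // the stacked item
    Cell* next;  // next cell in the chain
};

struct MultiOp {
    int id;         // id of the owning thread (T. 6 has id = 6 in the slides)
    int op;         // PUSH or POP
    int length;     // number of operations represented (1 initially)
    int cStatus;    // INIT until the operation has been served
    Cell* cell;     // cell carried by the operation, if any
    MultiOp* next;  // next multi-op in a combined list
    MultiOp* last;  // last multi-op in a combined list
};

// Fills `m` as a single POP issued by thread `id` (assumed signature).
void initMultiOp(MultiOp& m, int id) {
    m = MultiOp{id, POP, 1, INIT, nullptr, nullptr, &m};
}

// Fills `m` as a single PUSH of `data`, stored in `cell` (assumed signature).
void initMultiOp(MultiOp& m, int id, Cell& cell, Data data) {
    cell = Cell{data, nullptr};
    m = MultiOp{id, PUSH, 1, INIT, &cell, nullptr, &m};
}
```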

  19. DECS- The Algorithm [Figure: threads issuing push(data1), push(data2) and pop() at the Central Stack; thought bubbles: “I wish there was someone in a similar situation…”]

  20. DECS- The Algorithm multiOptInfo = initMultiOp(); multiOptInfo = initMultiOp(data);

  21. DECS- The Algorithm • Passive collider: “I’ll wait, maybe someone will arrive…” • Active collider: “Yay, I can collide with thread 6!” • Collision Array: … 4, 6, EMPTY • Locations Array: T. 6 → MultiOp { id = 6, op = PUSH, length = 1, cStatus = INIT, cell, next = NULL, last }; T. 2 → MultiOp { id = 2, op = POP, length = 1, cStatus = INIT, cell, next = NULL, last } • Other labels: data1, EMPTY
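The passive/active roles in this diagram can be illustrated with a small sketch of an HSY-style announcement protocol. It is not taken from the slides, whose code did not survive extraction: `Layer`, `tryCollide`, the array sizes and the `EMPTY` value are all assumptions. A thread publishes its multi-op in the locations array, swaps its id into a collision-array slot, and becomes the active collider only if it finds another thread's id there and manages to claim that thread's entry.

```cpp
#include <atomic>
#include <cassert>

constexpr int EMPTY = -1;    // hypothetical marker for a free collision slot
constexpr int kSlots = 4;    // hypothetical collision-array size
constexpr int kThreads = 8;  // hypothetical upper bound on thread ids

struct MultiOp { int id; int op; };  // trimmed to the fields used here

struct Layer {
    std::atomic<int> collision[kSlots];        // thread ids, or EMPTY
    std::atomic<MultiOp*> location[kThreads];  // published multi-ops, by id
    Layer() {
        for (auto& c : collision) c.store(EMPTY);
        for (auto& l : location) l.store(nullptr);
    }
};

// Returns the partner's multi-op if the caller became the active collider
// and claimed it. Returns nullptr otherwise; the caller would then wait as
// a passive collider or retry (not shown).
MultiOp* tryCollide(Layer& layer, MultiOp* mine, int slot) {
    layer.location[mine->id].store(mine);                // publish our operation
    int him = layer.collision[slot].exchange(mine->id);  // leave our id in the slot
    if (him == EMPTY) return nullptr;                    // nobody there: passive

    MultiOp* theirs = layer.location[him].load();
    if (theirs == nullptr || theirs->id != him) return nullptr;

    // Withdraw our own announcement first, so that nobody collides with us
    // while we act on the partner. Failure means we were already claimed.
    MultiOp* expected = mine;
    if (!layer.location[mine->id].compare_exchange_strong(expected, nullptr))
        return nullptr;

    // Claim the partner; only one active collider can win this CAS.
    if (layer.location[him].compare_exchange_strong(theirs, nullptr))
        return theirs;
    return nullptr;
}
```

In the scenario on the slide, thread 6 would call `tryCollide` first and find the slot empty, and thread 2 would then find id 6 in the slot and claim thread 6's multi-op.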

  22. DECS- The Algorithm • Central Stack Functions
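The code on this slide was an image and is not part of the transcript. As a rough illustration of what central-stack functions for multi-operations might look like, the sketch below applies a whole chain of pushes, or several pops, with a single successful compare-and-swap on the top pointer. The names `CentralStack`, `pushChain` and `popMany` are hypothetical, and the sketch assumes cells are never freed or re-pushed while the stack is in use, so memory reclamation and the ABA problem are deliberately left out.

```cpp
#include <atomic>
#include <cassert>

struct Cell { int data; Cell* next; };

// Hypothetical CAS-based central stack (Treiber-style), not the slide's code.
class CentralStack {
public:
    // Pushes the already-linked chain first -> ... -> last with one
    // successful CAS; `first` becomes the new top.
    void pushChain(Cell* first, Cell* last) {
        Cell* oldTop = top_.load();
        do {
            last->next = oldTop;  // hook the chain onto the current top
        } while (!top_.compare_exchange_weak(oldTop, first));
    }

    // Detaches up to `n` cells with one successful CAS. Returns the first
    // detached cell (nullptr if the stack was empty) and stores the number
    // of detached cells in `taken`. Only the first `taken` cells of the
    // returned chain belong to the caller.
    Cell* popMany(int n, int& taken) {
        Cell* oldTop = top_.load();
        Cell* newTop;
        do {
            newTop = oldTop;
            taken = 0;
            while (newTop != nullptr && taken < n) {
                newTop = newTop->next;
                ++taken;
            }
        } while (!top_.compare_exchange_weak(oldTop, newTop));
        return oldTop;
    }

private:
    std::atomic<Cell*> top_{nullptr};
};
```

The point of the multi-cell variants is that a delegate thread serving `length` combined operations still touches the shared top pointer only once.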

  23. DECS- The Algorithm

  24. DECS- The Algorithm

  25. DECS- The Algorithm • T. 6: “zzz…” • T. 2: “I see that T. 6 got PUSH, and I got POP - we can eliminate!” • Collision Array • Locations Array: T. 6 → MultiOp { id = 6, op = PUSH, length = 1, cStatus = INIT, cell, next = NULL, last }; T. 2 → MultiOp { id = 2, op = POP, length = 1, cStatus = INIT, cell, next = NULL, last } • Other labels: data1, EMPTY

  26. DECS- The Algorithm • Elimination-Combining Layer Functions
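The functions on this slide were shown as an image and are missing from the transcript. The sketch below only illustrates the bookkeeping that the surrounding diagrams suggest; it is not the authors' code. `eliminatePair` and `combine` are hypothetical names, the elimination handles only the single-operation case drawn on the slides, and plain `int` fields are used for brevity even though a concurrent implementation would need atomic accesses to `cStatus`.

```cpp
#include <cassert>

constexpr int PUSH = 0, POP = 1;       // hypothetical encodings
constexpr int INIT = 0, FINISHED = 1;  // hypothetical encodings

struct Cell { int data; Cell* next; };

struct MultiOp {
    int id, op, length, cStatus;
    Cell* cell;
    MultiOp* next;
    MultiOp* last;
};

// Opposite operations: the popper receives the pusher's cell and both
// multi-ops end with length = 0 and cStatus = FINISHED, as drawn on the
// slides. Assumes both multi-ops have length 1.
void eliminatePair(MultiOp& pushOp, MultiOp& popOp) {
    popOp.cell = pushOp.cell;
    pushOp.length = 0;
    popOp.length = 0;
    pushOp.cStatus = FINISHED;
    popOp.cStatus = FINISHED;
}

// Identical operations: the waiter's list is appended to the delegate's
// list, so the delegate now represents the operations of both threads.
void combine(MultiOp& delegate, MultiOp& waiter) {
    delegate.last->next = &waiter;
    delegate.last = waiter.last;
    delegate.length += waiter.length;
}
```

After `combine`, the delegate could apply all `length` operations to the central stack in one step while the waiter sleeps until its `cStatus` changes.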

  27. DECS- The Algorithm • T. 6 (“zzz…”): MultiOp { id = 6, op = PUSH, length = 1, cStatus = INIT, cell, next = NULL, last }, also shown as { id = 6, op = PUSH, length = 0, cStatus = FINISHED, cell, next = NULL, last } • T. 2 (“Working…”): MultiOp { id = 2, op = POP, length = 1, cStatus = INIT, cell, next = NULL, last }, also shown as { id = 2, op = POP, length = 0, cStatus = FINISHED, cell, next = NULL, last } • Other labels: data1, EMPTY

  28. DECS- The Algorithm • T. 6 (“zzz…”): MultiOp { id = 6, op = PUSH, length = 1, cStatus = INIT, cell, next = NULL, last }, also shown as { id = 6, op = PUSH, length = 0, cStatus = FINISHED, cell, next = NULL, last } • T. 2 (“Working…”, then “Done!”): MultiOp { id = 2, op = POP, length = 1, cStatus = INIT, cell, next = NULL, last }, also shown as { id = 2, op = POP, length = 0, cStatus = FINISHED, cell, next = NULL, last } • Other labels: data1

  29. DECS- The Algorithm

  30. DECS- The Algorithm • T. 6: “zzz…” • T. 2: “Wake up man, I’ve done your job!” • T. 6: “Thank you T. 2, let’s go have a beer; I’m buying!”

  31. DECS- The Algorithm

  32. DECS- The Algorithm

  33. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  34. DECS Performance Evaluation • Hardware • 128-way UltraSparc T2 Plus (T5140) server: a 2-chip system in which each chip contains 8 cores and each core multiplexes 8 hardware threads. • Running Solaris 10. • The cores in each CPU share the same L2 cache. • C++ code compiled with GCC with the -O3 flag. • Compared against: • Treiber stack • The HSY elimination-backoff stack • Flat-combining stack

  35. DECS Performance Evaluation • Course of experiments • Threads repeatedly apply operations to the stack for a fixed duration of 1 second, and the resulting throughput is measured while the level of concurrency varies from 1 to 128 threads. • Throughput is measured on both symmetric and asymmetric workloads. • Stacks are pre-populated with enough cells so that pop operations never operate on an empty stack. • Each data point is the average of 3 runs.

  36. DECS Performance Evaluation • Symmetric workload [Plot; X-axis: number of threads]

  37. DECS Performance Evaluation • Moderately-asymmetric workload [Plot; X-axis: number of threads]

  38. DECS Performance Evaluation • Fully-asymmetric workload [Plot; X-axis: number of threads]

  39. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  40. NB-DECS • DECS is blocking. • For some applications a non-blocking implementation may be preferable because it is more robust to thread failures. • NB-DECS is a lock-free variant of DECS that allows threads that have delegated their operations to another thread, and have waited too long, to cancel their “combining contracts” and retry their operations.
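The slide does not show how a combining contract is cancelled. One way such a race between a waiting thread and its delegate could be arbitrated is with a single compare-and-swap on the status word, so that exactly one side wins. The sketch below is an assumption for illustration only, not the NB-DECS mechanism itself: `Contract`, `tryCancel`, `tryFinish` and the `CANCELLED` state are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical states; only INIT and FINISHED appear in the slides.
constexpr int INIT = 0, FINISHED = 1, CANCELLED = 2;

struct Contract {
    std::atomic<int> cStatus{INIT};
};

// Waiter side: after waiting too long, try to withdraw the operation.
// Succeeds only if the delegate has not completed it yet.
bool tryCancel(Contract& c) {
    int expected = INIT;
    return c.cStatus.compare_exchange_strong(expected, CANCELLED);
}

// Delegate side: try to mark the waiter's operation as completed.
// Fails if the waiter has already cancelled the contract.
bool tryFinish(Contract& c) {
    int expected = INIT;
    return c.cStatus.compare_exchange_strong(expected, FINISHED);
}
```

Because both transitions start from `INIT`, a waiter whose `tryCancel` succeeds knows its operation was not applied and can safely retry it on the central stack.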

  41. Outline • Concurrent programming terms • Motivation • Introduction • DECS: The Algorithm • DECS Performance evaluation • NB-DECS • Summary

  42. Summary • DECS comprises a combining-elimination layer and therefore benefits from collisions between operations of opposite, as well as identical, semantics. • Empirical evaluation showed that DECS outperforms the best known stack algorithms for all workloads. • NB-DECS • The idea of a combining-elimination layer could be used to efficiently implement other concurrent data structures.
