
Pipelined Broadcast on Ethernet Switched Clusters



Presentation Transcript


  1. Pipelined Broadcast on Ethernet Switched Clusters • Pitch Patarasuk, Ahmad Faraj, Xin Yuan • Department of Computer Science, Florida State University, Tallahassee, FL 32306

  2. Broadcast communication (MPI_Bcast) • Before: one node (the root) among n0, n1, n2, n3 holds the message A B C D • After: n0, n1, n2, and n3 each hold A B C D • Let T(msize) = time to send a message of size msize • Lower bound: Broadcast(msize) >= T(msize)

  3. Ethernet Switched Cluster • [figure: cluster machines connected through four switches]

  4. Problem statement • How to efficiently realize the broadcast operation for large messages on Ethernet switched clusters • Pipelined broadcast can achieve near-optimal results (about T(msize) time to broadcast a message of size msize) • This requires finding a contention-free broadcast tree • It also requires finding a good segment size

  5. Traditional broadcast algorithms • Linear tree (nodes 0 to 7): Time = (P-1) x T(msize) • Flat tree (nodes 0 to 7): Time = (P-1) x T(msize)

  6. Binary tree and k-ary tree • [figure: a binary tree and a k-ary tree over nodes 0 to 7] • Time (binary tree) = 2 x (log2(P+1) - 1) x T(msize)

  7. Binomial tree • [figure: binomial tree rooted at node 0 over nodes 0 to 7] • Time = log2(P) x T(msize)

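  8. Scatter/Allgather • Before: the message A B C D is held by one of n0, n1, n2, n3 • After scatter: the pieces A, B, C, D are spread across the nodes • After allgather: n0, n1, n2, and n3 each hold A B C D • Time = 2 x T(msize)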

  9. Time Complexity for large messages
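
The completion-time formulas quoted on slides 5 to 8 can be tabulated with a short script. This is only a sketch of those formulas, not the table from the original slide, which did not survive extraction; the function name `traditional_broadcast_times` and the normalisation `t_msg = 1.0` (times expressed as multiples of T(msize)) are assumptions made for illustration.

```python
import math


def traditional_broadcast_times(P, t_msg=1.0):
    """Large-message completion times from slides 5 to 8.

    P is the number of processes; t_msg stands for T(msize).
    The function name and the default t_msg are illustrative choices.
    """
    return {
        "linear tree": (P - 1) * t_msg,
        "flat tree": (P - 1) * t_msg,
        "binary tree": 2 * (math.log2(P + 1) - 1) * t_msg,
        "binomial tree": math.log2(P) * t_msg,
        "scatter/allgather": 2 * t_msg,
    }


if __name__ == "__main__":
    # Print each algorithm's cost as a multiple of T(msize) for 8 processes.
    for name, t in traditional_broadcast_times(8).items():
        print(f"{name:>18}: {t:.2f} x T(msize)")
```

Only scatter/allgather stays constant as P grows; every tree-based entry grows with the number of processes.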

  10. Pipelined Broadcast Algorithm • Linear pipeline 0 1 2 3

  11. Performance of pipelined broadcast • Assume no network contention • A message of size msize is broken into X segments of size msize/X • H: tree height, D: number of children per node • Time of one pipeline stage: D x T(msize/X) • Total time: T = (X + H - 1) x (D x T(msize/X)) • Linear tree: H = P, D = 1, T ≈ T(msize) for X much larger than H • Binary tree: H = log2(P), D = 2, T ≈ 2 x T(msize) • k-ary tree: H = log_k(P), D = k; in general not as efficient as the binary tree
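
The total-time formula on this slide can be evaluated directly. The sketch below assumes that sending one segment costs exactly T(msize)/X, that is, it ignores any per-message startup cost; that simplification and the function name `pipelined_time` are assumptions for illustration, not something the slides state.

```python
import math


def pipelined_time(t_msg, X, H, D):
    """Total time (X + H - 1) * D * T(msize / X).

    Assumption: T(msize / X) == T(msize) / X (no per-segment startup cost).
    t_msg stands for T(msize); X is the number of segments,
    H the tree height, and D the number of children per node.
    """
    t_segment = t_msg / X
    return (X + H - 1) * D * t_segment


if __name__ == "__main__":
    P, X = 8, 1000
    # H and D as given on the slide for each tree shape.
    print("linear:", pipelined_time(1.0, X, H=P, D=1))
    print("binary:", pipelined_time(1.0, X, H=math.log2(P), D=2))
```

With many segments the factor (X + H - 1) / X approaches 1, which is why the linear pipeline approaches T(msize) and the binary pipeline approaches 2 x T(msize).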

  12. Time Complexity for large messages

  13. Pipelined broadcast • How to find a contention-free broadcast tree? • How to select the best segment size?

  14. Example of network contention • [figure: two connected switches, one with n0, n1, n2, n3 and the other with n4, n5, n6, n7; binary tree over nodes 0 to 7] • There is link contention caused by communications (1→4), (2→5), (2→6), and (3→7)

  15. Linear tree • [figure: two connected switches, one with n0, n1, n4, n5 and the other with n2, n3, n6, n7] • The linear tree 0→1→2→3→…→7 has contention caused by (1→2) and (5→6)
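
The two contention examples (slides 14 and 15) can be checked mechanically by listing which tree edges cross the same switch-to-switch link in the same direction. The sketch below is an illustration only: the helper names `switch_path` and `shared_links`, the switch labels `s0` and `s1`, and the rule "two flows on one directed link count as contention" are assumptions, and machine-to-switch links are ignored.

```python
from collections import defaultdict, deque


def switch_path(adj, src, dst):
    """Directed switch-to-switch links on the path src -> dst (BFS)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    links, node = [], dst
    while parent[node] is not None:
        links.append((parent[node], node))
        node = parent[node]
    return links[::-1]


def shared_links(adj, home, edges):
    """Map each directed inter-switch link to the tree edges using it,
    keeping only links used by more than one edge."""
    usage = defaultdict(list)
    for a, b in edges:
        for link in switch_path(adj, home[a], home[b]):
            usage[link].append((a, b))
    return {link: flows for link, flows in usage.items() if len(flows) > 1}


if __name__ == "__main__":
    adj = {"s0": ["s1"], "s1": ["s0"]}  # two switches joined by one link
    # Slide 14: n0..n3 on one switch, n4..n7 on the other, binary tree.
    home14 = {i: "s0" if i < 4 else "s1" for i in range(8)}
    binary = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7)]
    print(shared_links(adj, home14, binary))
    # Slide 15: n0, n1, n4, n5 on one switch, the rest on the other, chain.
    home15 = {i: "s0" if i in (0, 1, 4, 5) else "s1" for i in range(8)}
    chain = [(i, i + 1) for i in range(7)]
    print(shared_links(adj, home15, chain))
```

Edges that stay inside one switch produce an empty path, so only inter-switch traffic is reported.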

  16. Algorithm for constructing a contention-free linear tree • Step 1: Traverse all switches using depth-first search (DFS), and number the switches S0, S1, S2, ... in the order in which the DFS first reaches them • Step 2: The linear tree consists of all machines on switch S0, followed by all machines on S1, then S2, and so on

  17. Example of a contention-free linear tree • [figure: switch S0 with n0, n1, n4, n5; switch S1 with n2, n3, n6, n7; switch S2 with n8, n9, n10, n11; switch S3 with n12, n13, n14, n15] • Linear tree: n0→n1→n4→n5→n2→n3→n6→n7→n8→n9→…→n15
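
The two-step construction from slide 16 can be sketched as a depth-first traversal over the switch graph. The machine groups below follow the example on this slide, but the way the switches are wired together (`adj`) is a hypothetical choice, since the figure was lost; the function name `contention_free_chain` is likewise illustrative.

```python
def contention_free_chain(adj, machines, root):
    """Order machines switch by switch, visiting switches in DFS order.

    adj:      switch -> list of neighbouring switches (a tree)
    machines: switch -> list of machines attached to that switch
    root:     switch where the DFS starts (becomes S0)
    """
    order, seen = [], set()

    def visit(switch):
        # Step 1: record switches in the order the DFS first reaches them.
        seen.add(switch)
        order.append(switch)
        for neighbour in adj[switch]:
            if neighbour not in seen:
                visit(neighbour)

    visit(root)
    # Step 2: all machines of S0, then all machines of S1, and so on.
    return [m for switch in order for m in machines[switch]]


if __name__ == "__main__":
    # Hypothetical wiring: S0 - S1, with S2 and S3 both attached to S1.
    adj = {"S0": ["S1"], "S1": ["S0", "S2", "S3"], "S2": ["S1"], "S3": ["S1"]}
    machines = {
        "S0": [0, 1, 4, 5],
        "S1": [2, 3, 6, 7],
        "S2": [8, 9, 10, 11],
        "S3": [12, 13, 14, 15],
    }
    print(contention_free_chain(adj, machines, "S0"))
```

Because the chain finishes every machine on one switch before moving on, each inter-switch link is crossed at most once in each direction, which is the property the slide relies on.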

  18. Algorithm for constructing a contention-free binary tree • Start with a contention-free linear tree • Recursively divide the tree into 2 sub-trees • Make sure that there cannot be any contention • Choose the sub-trees so that the height of the whole tree is minimal • [figure: nodes 0 to 15]

  19. Binary tree height • The performance of binary pipelined broadcast depends on the height of the binary tree • Even though a contention-free binary tree may not be a complete binary tree, its height is not much greater than that of a complete binary tree

  20. Average tree heights for 20 randomly generated topologies

  21. Evaluation • Contention-free pipelined algorithms: routines are generated from topology information; the generated routines are based on MPICH point-to-point primitives; trees evaluated: linear tree, binary tree, 3-ary tree • Targets for comparison: MPICH (binomial tree, scatter/allgather); LAM (flat tree, binomial tree); topology-unaware pipelined linear and binary algorithms

  22. Evaluation

  23. Performance of different pipelined trees (topology 1)

  24. Comparing pipelined broadcast with other schemes

  25. Topology unaware and contention-free pipelined broadcast

  26. Segment size for pipelined broadcast
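
One way to see why a good segment size exists is to add a per-segment startup cost to the total-time formula from slide 11 and search over the number of segments. This is a sketch under an assumed linear cost model T(m) = alpha + beta x m; the slides do not state this model, and the values of `alpha` and `beta` below are made-up illustrative numbers, not measurements from the presentation.

```python
def pipelined_time(msize, X, H, D, alpha, beta):
    """(X + H - 1) * D * T(msize / X) with the ASSUMED model
    T(m) = alpha + beta * m (startup cost plus per-byte cost)."""
    t_segment = alpha + beta * (msize / X)
    return (X + H - 1) * D * t_segment


def best_segment_count(msize, H, D, alpha, beta, max_segments=4096):
    """Brute-force search for the segment count X with the lowest time."""
    return min(
        range(1, max_segments + 1),
        key=lambda X: pipelined_time(msize, X, H, D, alpha, beta),
    )


if __name__ == "__main__":
    msize = 1_000_000            # bytes (illustrative)
    alpha, beta = 1e-4, 1e-8     # seconds, seconds per byte (illustrative)
    X = best_segment_count(msize, H=8, D=1, alpha=alpha, beta=beta)
    print("segments:", X, "segment size:", msize / X)
```

Too few segments waste time filling the pipeline, while too many pay the startup cost over and over, so the modelled time has a broad minimum in between.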

  27. Conclusions • Pipelined broadcast is faster than the current broadcast algorithms for medium and large messages • The linear pipeline has a completion time roughly equal to T(msize) • The binary pipeline is best for medium messages • A contention-free broadcast tree is necessary for pipelined algorithms • A good segment size for pipelined broadcast is not difficult to find

  28. Questions?
