
Parallel Prefix and Data Parallel Operations



  1. Parallel Prefix and Data Parallel Operations
  Motivation: basic parallel operations that occur repeatedly. Let ⊗ be an associative operation: $(a_1 \otimes a_2) \otimes a_3 = a_1 \otimes (a_2 \otimes a_3)$. How can $a_1 \otimes a_2 \otimes \cdots \otimes a_n$ be computed in parallel in O(log n) time?
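Associativity alone is enough for O(log n) time. Neighbouring values can be combined pairwise in rounds, which gives a balanced tree of height ⌈log₂ n⌉. The sketch below simulates this sequentially. `tree_reduce` is a hypothetical name, and `op` is any Python function standing in for ⊗. Operands keep their left-to-right order, so commutativity is never needed.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine values[0] ⊗ ... ⊗ values[n-1] in pairwise rounds.

    Returns the result and the number of rounds (the parallel time).
    """
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # The pairs of one round are independent, so a parallel machine would
        # evaluate them simultaneously; the order inside each pair is preserved.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd element passes through unchanged
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    # String concatenation is associative but not commutative.
    print(tree_reduce(list("abcdefgh"), lambda x, y: x + y))  # ('abcdefgh', 3)
```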

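  2. Approach 1
  [i:j] denotes the combination a_i ⊗ a_{i+1} ⊗ ... ⊗ a_j. The table shows the contents of x after the step that combines elements at distance d = 2^i:

              x[0]   x[1]   x[2]   x[3]   x[4]   x[5]   x[6]   x[7]
      start   a0     a1     a2     a3     a4     a5     a6     a7
      d=1     [0:0]  [0:1]  [1:2]  [2:3]  [3:4]  [4:5]  [5:6]  [6:7]
      d=2     [0:0]  [0:1]  [0:2]  [0:3]  [1:4]  [2:5]  [3:6]  [4:7]
      d=4     [0:0]  [0:1]  [0:2]  [0:3]  [0:4]  [0:5]  [0:6]  [0:7]

  Assume that n = 2^k and initially x[j] = a_j.
      for i = 0 to k-1
          for j = 0 to n-1-2^i do in parallel
              x[j + 2^i] = x[j] ⊗ x[j + 2^i]
  Every read in a step sees the values from before that step, so after k = log n steps x[j] = [0:j].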
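Approach 1 can be simulated in a few lines. The sketch below is a minimal version, assuming a Python function `op` for ⊗. The copy `old` models the synchronous step. Without it, an in-place update from left to right would read values already changed in the same step and count them twice. `prefix_doubling` is a hypothetical name. The loop does not require n to be a power of two.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_doubling(a: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix of a under the associative op, by recursive doubling."""
    x = list(a)
    n = len(x)
    d = 1  # d = 2**i in the slide's loop
    while d < n:
        old = list(x)  # synchronous step: all reads see the previous values
        for j in range(n - d):  # "for j = 0 to n-1-2^i do in parallel"
            x[j + d] = op(old[j], old[j + d])
        d *= 2
    return x
```

This uses O(n log n) operations in total. That is more work than a sequential scan, but it needs only ⌈log₂ n⌉ parallel steps.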

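  3. How to do it on a tree architecture?
  [Figure: a tree node with its subtree sum St, its children's sums Sl and Sr, and the signal R it receives from its parent.]
  For each node:
  • Upward pass: when the node has received signals from both its left and right child, it computes St <- Sl + Sr and sends St to its parent.
  • Downward pass: when the node receives a signal R from its parent, it sends R to its left child and R + Sl to its right child. The root starts with R = 0.
  • When a leaf receives a signal R, it sets X <- X + R.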
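Below is a sequential simulation of the two passes. It makes three assumptions that the slide does not: integer addition stands in for ⊗, n is a power of two, and the tree uses an implicit heap layout in which node v has children 2v and 2v+1 and the leaves are nodes n..2n-1. In the downward pass the left child receives R and the right child receives R + Sl, so each leaf ends up with the sum of all leaves to its left. `tree_prefix_sums` is a hypothetical name.

```python
from typing import List


def tree_prefix_sums(a: List[int]) -> List[int]:
    """Inclusive prefix sums via an upward and a downward pass on a binary tree."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    s = [0] * (2 * n)  # St: sum of the leaves below each node
    s[n:] = a
    # Upward pass: St <- Sl + Sr. Nodes on the same level are independent.
    for v in range(n - 1, 0, -1):
        s[v] = s[2 * v] + s[2 * v + 1]
    r = [0] * (2 * n)  # R: sum of all leaves to the left of the node's subtree
    # Downward pass: the root holds R = 0; left child gets R, right gets R + Sl.
    for v in range(1, n):
        r[2 * v] = r[v]
        r[2 * v + 1] = r[v] + s[2 * v]
    # Each leaf adds the signal it received: X <- X + R.
    return [a[i] + r[n + i] for i in range(n)]
```

Each pass touches one tree level per parallel step, so the total time is about 2 log₂ n.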

  4. How to do it on a hypercube?
  A complete binary tree can be embedded into a hypercube. A simpler solution: each node computes its prefix and a total sum.
      for i = 0 to k-1
          for j = 0 to n-1 do in parallel
              x[j] = x[j] + sum[j^(i)]    if the i-th bit of j is 1
              sum[j] = sum[j] + sum[j^(i)]
  Here j^(i) has the same binary representation as j except in the i-th bit, which is complemented. So j^(i) is the neighbour of j along dimension i. Both updates in step i read the values of sum from before that step.

  5. Prefix on Hypercube
  Contents of X and SUM at each node of a 3-cube (n = 8) after the steps with d = 1, 2, 4, where [i:j] is the combination of a_i through a_j:

      node   after d=1       after d=2       after d=4
             X      SUM      X      SUM      X      SUM
      a0     [0:0]  [0:1]    [0:0]  [0:3]    [0:0]  [0:7]
      a1     [0:1]  [0:1]    [0:1]  [0:3]    [0:1]  [0:7]
      a2     [2:2]  [2:3]    [0:2]  [0:3]    [0:2]  [0:7]
      a3     [2:3]  [2:3]    [0:3]  [0:3]    [0:3]  [0:7]
      a4     [4:4]  [4:5]    [4:4]  [4:7]    [0:4]  [0:7]
      a5     [4:5]  [4:5]    [4:5]  [4:7]    [0:5]  [0:7]
      a6     [6:6]  [6:7]    [4:6]  [4:7]    [0:6]  [0:7]
      a7     [6:7]  [6:7]    [4:7]  [4:7]    [0:7]  [0:7]
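The hypercube algorithm can be checked against the table with a small simulation. The slide writes "+". This sketch instead takes an arbitrary associative `op` and puts the partner's SUM first whenever the partner's subcube lies to the left. That ordering is an extension, not part of the slide, and it makes the result correct even for a non-commutative ⊗. `hypercube_prefix` is a hypothetical name, and the input length is assumed to be 2^k.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def hypercube_prefix(a: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix on a simulated k-dimensional hypercube (len(a) == 2**k)."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    k = n.bit_length() - 1
    x = list(a)      # X: prefix held by node j
    total = list(a)  # SUM: combination over the subcube j belongs to so far
    for i in range(k):
        old = list(total)  # each node reads its neighbour's SUM from before step i
        for j in range(n):  # "do in parallel"
            partner = j ^ (1 << i)  # j^(i): j with bit i complemented
            if (j >> i) & 1:
                # The partner's subcube lies to the left of j's, so it goes first.
                x[j] = op(old[partner], x[j])
                total[j] = op(old[partner], old[j])
            else:
                total[j] = op(old[j], old[partner])
    return x
```

Each step uses only the links of one dimension, so the k steps match the k dimensions of the cube.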

  6. Applications of Data Parallel Operations
  Any associative operation. Examples:
  • min, max, add
  • adding two binary numbers
  • finite state automata
  • radix sort
  • segmented prefix sum
  • routing
  • packing
  • unpacking
  • broadcast (copy-scan)
  • solving recurrence equations
  • straight-line computation (parallel arithmetic evaluation)

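  7. Adding two n-bit numbers as parallel prefix
  • a = a_{n-1} ... a_0
  • b = b_{n-1} ... b_0
  • s = a + b
  • note that $s_i = a_i \oplus b_i \oplus c_{i-1}$, where $c_{i-1}$ is the carry into bit i and $c_{-1} = 0$
  • to compute $c_i$, define g and p as: $g_i = a_i \wedge b_i$, $p_i = a_i \vee b_i$ (choosing $p_i = a_i \oplus b_i$ gives the same carries)
  • define ⊙ as: $(g, p) \odot (g', p') = (g \vee (p \wedge g'),\ p \wedge p')$
  Then the carry bit $c_i$ can be computed as $(G_i, P_i) = (g_i, p_i) \odot (g_{i-1}, p_{i-1}) \odot \cdots \odot (g_0, p_0)$, with $G_i = c_i$.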
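The carry computation is just a prefix over (g, p) pairs with the ⊙ operator. The sketch below is a minimal version that uses recursive doubling for the prefix, and checks itself against ordinary addition. It assumes p_i = a_i OR b_i and non-negative inputs that fit in n bits. `combine` and `add_via_prefix` are hypothetical names. The higher-indexed block is always the left operand of ⊙, matching the order of the product above.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (g, p) bits


def combine(hi: Pair, lo: Pair) -> Pair:
    """(g, p) ⊙ (g', p') = (g or (p and g'), p and p'); hi covers the higher bits."""
    g, p = hi
    g2, p2 = lo
    return g | (p & g2), p & p2


def add_via_prefix(a: int, b: int, n: int) -> int:
    """Add two n-bit numbers, computing every carry with a prefix over (g_i, p_i)."""
    if n <= 0:
        raise ValueError("n must be positive")
    ab = [(a >> i) & 1 for i in range(n)]
    bb = [(b >> i) & 1 for i in range(n)]
    gp: List[Pair] = [(x & y, x | y) for x, y in zip(ab, bb)]  # g_i, p_i
    d = 1
    while d < n:  # recursive doubling: bit i absorbs the block of lower bits
        old = list(gp)
        for i in range(d, n):  # all i in parallel
            gp[i] = combine(old[i], old[i - d])
        d *= 2
    carries = [G for G, _ in gp]  # G_i = c_i, the carry out of bit i
    s = 0
    for i in range(n):
        c_in = carries[i - 1] if i > 0 else 0  # c_{-1} = 0
        s |= (ab[i] ^ bb[i] ^ c_in) << i
    return s | (carries[n - 1] << n)  # the final carry becomes bit n
```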

  8. [Figure: hardware circuit of a recursive look-ahead adder, with input bit pairs a15 b15, a14 b14, ..., a0 b0]

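  9. Parsing a regular language
  [Figure: an example input string over {b, c} processed by combining transition functions pairwise; e.g. the function for symbol b is q0->q2, q1->q0, q2->qr.]
  Transitions: δ(q0,b) = q2, δ(q0,c) = q1, δ(q1,b) = q0, δ(q1,c) = qr, δ(q2,b) = qr, δ(q2,c) = q0. qr: reject state.
  Each input symbol is treated as the function it induces on the states. Function composition is associative, so the state after every prefix of the input can be computed by parallel prefix.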
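Here is a small sketch of the idea. Each symbol becomes a tuple that maps every state to its successor, and a recursive-doubling prefix composes these tuples. Two things are assumed because the slide does not state them: q0 is the start state, and qr loops to itself on every symbol. `then` and `states_after_prefixes` are hypothetical names.

```python
from typing import Dict, List, Tuple

STATES = ("q0", "q1", "q2", "qr")
# Transition table from the slide; qr (reject) is assumed to loop to itself.
DELTA: Dict[Tuple[str, str], str] = {
    ("q0", "b"): "q2", ("q0", "c"): "q1",
    ("q1", "b"): "q0", ("q1", "c"): "qr",
    ("q2", "b"): "qr", ("q2", "c"): "q0",
    ("qr", "b"): "qr", ("qr", "c"): "qr",
}

Func = Tuple[str, ...]  # Func[k] is the image of STATES[k]


def symbol_function(sym: str) -> Func:
    return tuple(DELTA[(q, sym)] for q in STATES)


def then(f: Func, g: Func) -> Func:
    """Apply f first, then g; composition is associative."""
    return tuple(g[STATES.index(f[k])] for k in range(len(STATES)))


def states_after_prefixes(word: str, start: str = "q0") -> List[str]:
    fs = [symbol_function(s) for s in word]
    d = 1
    while d < len(fs):  # recursive-doubling prefix over function composition
        old = list(fs)
        for j in range(d, len(fs)):  # all j in parallel
            fs[j] = then(old[j - d], old[j])
        d *= 2
    return [f[STATES.index(start)] for f in fs]
```

Because every function has only |Q| entries, each composition takes constant time for a fixed automaton.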

  10. Segmented Prefix operation
  (segment start = 1 marks the first element of a segment)
      segment start  1  0  1  0  0  0  1  0
      before         1  2  3  4  5  6  7  8
      after          1  3  3  7 12 18  7 15

  11. Segmented Prefix computation
  Let ⊗ be any associative operation. For the segmented operation of ⊗, write |x for a value x that begins a segment, and define ⊗' as follows (row = left operand, column = right operand):

      ⊗'   |  b           |b
      a    |  a ⊗ b       |b
      |a   |  |(a ⊗ b)    |b

  Then ⊗' is associative, and the segmented operation can be computed in O(log n) time.
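The ⊗' table maps directly onto pairs (starts_segment, value). The sketch below builds ⊗' from an arbitrary ⊗ and runs the same recursive doubling as an ordinary prefix. The test reproduces the segmented-prefix example above. `seg_op` and `segmented_scan` are hypothetical names.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Seg = Tuple[bool, T]  # (starts_segment, value); True plays the role of "|"


def seg_op(op: Callable[[T, T], T]) -> Callable[[Seg, Seg], Seg]:
    """Build ⊗' from ⊗: a right operand |b wins, otherwise combine and keep the flag."""
    def op2(left: Seg, right: Seg) -> Seg:
        lf, a = left
        rf, b = right
        if rf:
            return right  # a ⊗' |b = |b and |a ⊗' |b = |b
        return lf, op(a, b)  # a ⊗' b = a ⊗ b and |a ⊗' b = |(a ⊗ b)
    return op2


def segmented_scan(values: List[T], starts: List[int], op: Callable[[T, T], T]) -> List[T]:
    x = [(bool(f), v) for f, v in zip(starts, values)]
    op2 = seg_op(op)
    d = 1
    while d < len(x):  # same recursive doubling as for an ordinary prefix
        old = list(x)
        for j in range(d, len(x)):  # all j in parallel
            x[j] = op2(old[j - d], old[j])
        d *= 2
    return [v for _, v in x]
```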

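  12. Enumerating
      data         = [5 6 3 1 8 3 7 5 9 2]
      active procs = [1 0 1 1 0 0 1 0 1 0]
      enumerated   = [0 x 1 2 x x 3 x 4 x]
  Each active processor receives the number of active processors to its left, i.e. an exclusive prefix sum of the active flags.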

  13. Packing
      data         = [5 6 3 1 8 3 7 5 9 2]
      active procs = [1 0 1 1 0 0 1 0 1 0]
      enumerated   = [0 x 1 2 x x 3 x 4 x]
      packed data  = [5 3 1 7 9 x x x x x]
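Enumerating and packing reduce to one prefix sum and one conflict-free parallel write. In this minimal sketch, `itertools.accumulate` stands in for the parallel prefix, and `None` plays the role of the "x" entries. `enumerate_active` and `pack` are hypothetical names.

```python
from itertools import accumulate
from typing import List, Optional, TypeVar

T = TypeVar("T")


def enumerate_active(active: List[int]) -> List[Optional[int]]:
    """Exclusive prefix sum of the active flags, reported only at active positions."""
    inclusive = list(accumulate(active))  # a parallel prefix in the data-parallel model
    return [inc - 1 if f else None for f, inc in zip(active, inclusive)]


def pack(data: List[T], active: List[int]) -> List[Optional[T]]:
    """Move each active element to the slot given by its enumeration number."""
    out: List[Optional[T]] = [None] * len(data)
    for v, e in zip(data, enumerate_active(active)):
        if e is not None:  # one write per active processor; targets are distinct
            out[e] = v
    return out
```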

  14. Packing and Unpacking on Hypercube
  Packing
  • adjust bit 0
  • adjust bit 1
  • adjust bit 2
  • ...
  • adjust bit k-1
  Unpacking
  • adjust bit k-1
  • adjust bit k-2
  • ...
  • adjust bit 1
  • adjust bit 0
  How about in the order of adjust bit 0, 1, ..., k-1 for packing?
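The slide does not define "adjust bit i". This sketch assumes one reasonable reading: every element whose current address differs from its destination in bit i moves across the dimension-i link. The hypothetical `route_by_bits` simulates this and raises an error if two elements ever occupy the same node after a step. The tests check packing with bit 0 first and unpacking with bit k-1 first on every 3-cube activity pattern.

```python
from typing import List, Tuple


def route_by_bits(packets: List[Tuple[int, int]], order: List[int]) -> List[int]:
    """Route (address, destination) packets, fixing one address bit per step."""
    addr = [a for a, _ in packets]
    for i in order:
        for p, (_, dest) in enumerate(packets):  # all packets move in parallel
            if ((addr[p] ^ dest) >> i) & 1:
                addr[p] ^= 1 << i  # cross the dimension-i link
        if len(set(addr)) != len(addr):
            raise RuntimeError(f"collision after adjusting bit {i}")
    return addr


def hypercube_pack(active: List[int], k: int) -> List[int]:
    """Each active node j sends its element to its rank among active nodes."""
    sources = [j for j, f in enumerate(active) if f]
    packets = [(s, rank) for rank, s in enumerate(sources)]
    return route_by_bits(packets, list(range(k)))  # bit 0 first


def hypercube_unpack(active: List[int], k: int) -> List[int]:
    """Element number r goes back to the r-th active node."""
    dests = [j for j, f in enumerate(active) if f]
    packets = [(rank, d) for rank, d in enumerate(dests)]
    return route_by_bits(packets, list(range(k - 1, -1, -1)))  # bit k-1 first
```

Using the unpacking order for packing can collide under this reading. For example, with k = 2, elements at nodes 1 and 3 that pack into slots 0 and 1 both land on node 1 if bit 1 is adjusted first.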

  15. Unpacking
      address       =  0 1 2 3 4 5 6 7 8 9
      data          = [6 2 3 5 9 x x x x x]
      active procs  = [1 0 1 1 0 0 1 0 1 0]
      enumerated    = [0 x 1 2 x x 3 x 4 x]
      destination   = [0 2 3 6 8 x x x x x]
      unpacked data = [6 x 2 3 x x 5 x 9 x]
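Unpacking can also be written as a parallel read instead of a send. Active processor j computes its enumeration number e and fetches packed[e], which is the same mapping as the destination row read backwards. This is a minimal sketch with the hypothetical name `unpack`; `None` marks the "x" entries.

```python
from itertools import accumulate
from typing import List, Optional, TypeVar

T = TypeVar("T")


def unpack(packed: List[T], active: List[int]) -> List[Optional[T]]:
    """Active processor j fetches packed[e], where e is its enumeration number."""
    inclusive = list(accumulate(active))  # prefix sum of the active flags
    return [packed[inc - 1] if f else None for f, inc in zip(active, inclusive)]
```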

  16. Copy Scan (broadcast)
      address       =  0  1  2  3  4  5  6  7  8  9
      data          = [ 6  2  3  5  9  4  1  7  8 10]
      segmented bit = [ 1  0  1  1  0  0  1  0  1  0]
      result        = [ 6  6  3  5  5  5  1  1  8  8]
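A copy scan is a segmented prefix with the associative operation a ⊗ b = a ("keep the left value"). It is associative because both groupings of a ⊗ b ⊗ c return a. The sketch below inlines the segmented operator so it stands alone. `copy_scan` is a hypothetical name.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def copy_scan(data: List[T], seg_start: List[int]) -> List[T]:
    """Broadcast the first value of each segment to the rest of that segment."""
    x: List[Tuple[bool, T]] = [(bool(f), v) for f, v in zip(seg_start, data)]
    d = 1
    while d < len(x):
        old = list(x)
        for j in range(d, len(x)):  # all j in parallel
            lf, a = old[j - d]
            rf, b = old[j]
            # A segment start on the right wins; otherwise a ⊗ b = a carries
            # the left value forward together with its flag.
            x[j] = (rf, b) if rf else (lf, a)
        d *= 2
    return [v for _, v in x]
```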

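  17. Radix Sort
      for j = 0 to k-1                  // x has k bits; least significant bit first
          for all i in [0 .. n-1] do in parallel {
              if j-th bit of x[i] is 0 {
                  y[i] = enumerate      // rank among elements whose j-th bit is 0
                  c = count             // number of elements whose j-th bit is 0
              }
              if j-th bit of x[i] is 1
                  y[i] = enumerate + c  // rank among elements whose j-th bit is 1, plus c
              x[y[i]] = x[i]
          }
  Radix sort, another version:
      for j = 0 to k-1
          for all i in [0 .. n-1] do in parallel {
              pack left x[i] if j-th bit of x[i] is 0
              pack right x[i] if j-th bit of x[i] is 1
          }
  Each pass is stable, so the passes must run from the least significant bit upward. Processing bit k-1 first would make bit 0 the primary sort key.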
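Here is a sequential simulation of the first version. `accumulate` plays the role of the parallel prefix that computes `enumerate`, and `c` is the number of elements whose j-th bit is 0. The inputs are assumed to be non-negative k-bit integers. `split_by_bit` and `radix_sort` are hypothetical names.

```python
from itertools import accumulate
from typing import List


def split_by_bit(x: List[int], j: int) -> List[int]:
    """One stable pass: elements with bit j = 0 first, then those with bit j = 1."""
    bits = [(v >> j) & 1 for v in x]
    enum0 = list(accumulate(1 - b for b in bits))  # inclusive prefix over zeros
    enum1 = list(accumulate(bits))                  # inclusive prefix over ones
    c = enum0[-1] if x else 0  # count of elements whose bit j is 0
    y = [e1 - 1 + c if b else e0 - 1 for b, e0, e1 in zip(bits, enum0, enum1)]
    out = [0] * len(x)
    for i, v in enumerate(x):  # x[y[i]] = x[i], a conflict-free parallel write
        out[y[i]] = v
    return out


def radix_sort(x: List[int], k: int) -> List[int]:
    """Sort non-negative k-bit integers, least significant bit first."""
    for j in range(k):
        x = split_by_bit(x, j)
    return x
```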

  18. Quick Sort
  1. Pick a pivot p.
  2. Broadcast p.
  3. For all PE i, compare A[i] with p:
     if A[i] < p, pack left A[i] in the segment;
     if A[i] >= p, pack right A[i] in the segment.
  4. Mark the segment boundary.
  5. Quick sort each segment recursively.
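The sketch below simulates the segmented quicksort round by round, with every segment split in the same round. It deviates from the slide in one respect: it uses a three-way split (< p, == p, > p) instead of < p and >= p. With the first element as pivot and a two-way split, a segment whose pivot is its minimum would never shrink. The pivot choice and the name `segmented_quicksort` are assumptions.

```python
from typing import List


def segmented_quicksort(a: List[int]) -> List[int]:
    """Data-parallel quicksort: each round splits every unfinished segment at once."""
    segments: List[List[int]] = [list(a)] if a else []
    # A segment is finished once all of its elements are equal.
    while any(len(set(seg)) > 1 for seg in segments):
        nxt: List[List[int]] = []
        for seg in segments:  # all segments are handled in the same parallel round
            p = seg[0]  # pivot: first element, broadcast within its segment
            less = [v for v in seg if v < p]    # "pack left"
            equal = [v for v in seg if v == p]  # three-way split (assumption)
            more = [v for v in seg if v > p]    # "pack right"
            # The non-empty parts become the new segments (marking boundaries).
            nxt.extend(part for part in (less, equal, more) if part)
        segments = nxt
    return [v for seg in segments for v in seg]
```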

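  19. Solving Linear Recurrence Equations
  $f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}$ can be written in matrix form as
  $\begin{pmatrix} f_n \\ f_{n-1} \end{pmatrix} = \begin{pmatrix} a_{n-1} & a_{n-2} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_{n-1} \\ f_{n-2} \end{pmatrix}$
  Matrix multiplication is associative, so all $f_n$ follow from prefix products of these 2×2 matrices.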
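Write $M_n$ for the 2×2 matrix above. Then $(f_n, f_{n-1})^T = M_n M_{n-1} \cdots M_2 (f_1, f_0)^T$, so a prefix of matrix products gives every $f_n$. The sketch below assumes the coefficients are supplied as a list `a` indexed as in the recurrence, with `f0` and `f1` given. Later matrices multiply on the left, so the doubling step puts the later block first. `solve_recurrence` and `matmul` are hypothetical names.

```python
from typing import List

Mat = List[List[int]]


def matmul(x: Mat, y: Mat) -> Mat:
    return [[sum(x[i][t] * y[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def solve_recurrence(a: List[int], f0: int, f1: int, n_max: int) -> List[int]:
    """f_0..f_{n_max} for f_n = a[n-1] f_{n-1} + a[n-2] f_{n-2}, via prefix products."""
    mats = [[[a[n - 1], a[n - 2]], [1, 0]] for n in range(2, n_max + 1)]  # M_2..M_n
    # Prefix products P_n = M_n M_{n-1} ... M_2: later factors go on the left.
    d = 1
    while d < len(mats):
        old = list(mats)
        for j in range(d, len(mats)):  # all j in parallel
            mats[j] = matmul(old[j], old[j - d])
        d *= 2
    f = [f0, f1]
    for p in mats:  # first row of P_n applied to (f_1, f_0) gives f_n
        f.append(p[0][0] * f1 + p[0][1] * f0)
    return f[: n_max + 1]
```

With all coefficients equal to 1, this reproduces the Fibonacci numbers.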

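  20. Pointer Jumping and Tree Computation
  [Figure: pointer jumping on a linked list holding 1, 2, ..., 7, with the partial sums after each round.]
  How to compute a prefix on a linked list? Repeat ⌈log₂ n⌉ times, for all i in parallel:
      if NEXT[i] != NIL then
          X[i] <- X[i] + X[NEXT[i]]
          NEXT[i] <- NEXT[NEXT[i]]
  This gives the sums toward the end of the list (28 27 25 22 18 13 7 for the list 1, 2, ..., 7). How can the order 1 3 6 10 15 21 28 be obtained instead?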
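Here is a minimal simulation of pointer jumping. The list is stored as parallel arrays `x` and `nxt`, with `None` standing for NIL, so the list order need not match the storage order. Both arrays are copied before each round because all processors read the old values. `pointer_jump_sums` is a hypothetical name.

```python
from typing import List, Optional


def pointer_jump_sums(x: List[int], nxt: List[Optional[int]]) -> List[int]:
    """For every node, the sum of its value and all values after it in the list."""
    x, nxt = list(x), list(nxt)
    while any(p is not None for p in nxt):  # about log2(n) rounds
        old_x, old_next = list(x), list(nxt)  # synchronous step
        for i in range(len(x)):  # all i in parallel
            if old_next[i] is not None:
                x[i] = old_x[i] + old_x[old_next[i]]
                nxt[i] = old_next[old_next[i]]
    return x
```

One way to get the order 1 3 6 10 ... from these sums is prefix_i = total - suffix_i + x_i, where total is the head's result. Reversing the NEXT pointers before jumping also works.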

  21. Application: Tree computation, pre-order numbering
  [Figure: a tree with weights labelled "Each node 1" and "Leaf node 1".]
  The same approach can be applied to in-order and post-order numbering, number of children, depth, etc., and also to bi-components.
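The slide does not spell out its method. One common way to obtain pre-order numbers with prefix sums is the Euler-tour technique, named here as an assumption. It walks every tree edge down and back up, gives weight 1 to each downward edge, and takes a prefix sum along the tour. The number of a node is the prefix value at the edge that enters it. For brevity the tour is built sequentially here; a parallel version would link tour edges and rank them by pointer jumping. `euler_tour` and `preorder_numbers` are hypothetical names.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

Edge = Tuple[int, int, bool]  # (from, to, is_downward)


def euler_tour(children: Dict[int, List[int]], root: int) -> List[Edge]:
    """Edges of the Euler tour in traversal order (built sequentially here)."""
    tour: List[Edge] = []

    def visit(u: int) -> None:
        for v in children.get(u, []):
            tour.append((u, v, True))   # go down into v
            visit(v)
            tour.append((v, u, False))  # come back up to u

    visit(root)
    return tour


def preorder_numbers(children: Dict[int, List[int]], root: int) -> Dict[int, int]:
    tour = euler_tour(children, root)
    weights = [1 if down else 0 for _, _, down in tour]  # 1 per node entered
    ranks = list(accumulate(weights))  # prefix sum along the tour
    numbers = {root: 0}
    for (_, v, down), r in zip(tour, ranks):
        if down:
            numbers[v] = r
    return numbers
```

Using weights +1 on downward edges and -1 on upward edges instead gives each node's depth as the prefix value at its entering edge.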

  22. Recurrence Equation Example: LU decomposition on a triangular matrix
