
CS137: Electronic Design Automation

CS137: Electronic Design Automation. Day 6: April 17, 2002. Parallel Prefix. Today: Parallel Prefix, Sample Applications. Key Result: can compute the cascaded result sequence on any associative operator in O(log(N)) time with O(N) hardware.

gerardh

Presentation Transcript


  1. CS137: Electronic Design Automation Day 6: April 17, 2002 Parallel Prefix

  2. Today • Parallel Prefix • Sample Applications

  3. Key Result • Can compute cascaded result sequence on any associative operator • In O(log(N)) time • With O(N) hardware

  4. Familiar Instance • Carry-Lookahead Adder is a special case of this general result

  5. CLA • Observation: • Each bit of adder will do one of three things: • S - Squash the carry: 0,0 • G - Generate a carry: 1,1 • P - Propagate a carry: 0,1 or 1,0

  6. Further • Each contiguous sequence of bits will do these same things: • Squash • Generate • Propagate

  7. Combining • And can be computed from the base elements • ? S → S • ? G → G • S P → S • G P → G • P P → P
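
The combining table above can be sketched as a two-argument function. This is a minimal illustration, not taken from the slides: the names `bit_status` and `combine` are hypothetical, and the argument order assumes the table lists the less significant span first and the more significant span second.

```python
from itertools import product

def bit_status(a: int, b: int) -> str:
    """Classify one adder bit: G (generate), S (squash) or P (propagate)."""
    if a == 1 and b == 1:
        return "G"
    if a == 0 and b == 0:
        return "S"
    return "P"

def combine(low: str, high: str) -> str:
    """Status of a span formed by `low` (less significant) followed by `high`.

    A squash or generate in the high part decides the result on its own;
    a propagate passes along whatever the low part does.
    """
    return low if high == "P" else high

# Exhaustive check that combine is associative over {S, G, P}.
is_associative = all(
    combine(combine(x, y), z) == combine(x, combine(y, z))
    for x, y, z in product("SGP", repeat=3)
)
```

Associativity is the property the rest of the deck relies on, so the sketch checks it over all 27 triples rather than taking it on faith.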

  8. Apply Recursively • PG(i,i) = f(A[i],B[i]) • PG(i,j) = PG(i,k) PG(k+1,j) • PG(0,1) = PG(0) PG(1) • PG(0,3) = PG(0,1) PG(2,3) • PG(0,7) = PG(0,3) PG(4,7) • … • PG(0,N-1) = PG(0,N/2-1) PG(N/2,N-1) • Cout(N) = Cin(0) PG(0,N-1)

  9. All Carries • Further, once we have the full tree, can compute all prefixes in another log steps • E.g. • PG(0,13) = PG(0,7) PG(8,11) PG(12,13)

  10. Complete Sum • After 2log(N) time: • Up tree to compute PG’s • Down tree to compute PG(0,m)’s • Compute results in O(1) time • C(m) = Cin PG(0,m) • S(m)=F(A,B,C(m-1))
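
The up-tree and down-tree passes can be sketched in software as a recursion that pairs neighbours, scans the pairs, then fills in the remaining positions. This is a sequential model under stated assumptions, not the hardware from the lecture: `all_prefixes`, `cla_add` and `to_bits` are hypothetical names, bits are stored least significant first, and F is taken to be the usual three-input XOR.

```python
def bit_status(a, b):
    """Classify one adder bit: G (generate), S (squash) or P (propagate)."""
    if a and b:
        return "G"
    if not a and not b:
        return "S"
    return "P"

def combine(low, high):
    """Combine the status of a low span with the adjacent higher span."""
    return low if high == "P" else high

def all_prefixes(items, op):
    """Return [x0, x0 op x1, x0 op x1 op x2, ...] with a tree-shaped recursion."""
    n = len(items)
    if n <= 1:
        return list(items)
    # Up the tree: adjacent pairs are independent, so hardware does them at once.
    pairs = [op(items[i], items[i + 1]) for i in range(0, n - 1, 2)]
    pair_prefix = all_prefixes(pairs, op)
    # Down the tree: odd positions are pair prefixes, even ones need one more op.
    out = [items[0]]
    for i in range(1, n):
        if i % 2 == 1:
            out.append(pair_prefix[i // 2])
        else:
            out.append(op(pair_prefix[i // 2 - 1], items[i]))
    return out

def cla_add(a_bits, b_bits, cin=0):
    """Add two equal-length, non-empty bit lists; returns (sum_bits, carry_out)."""
    status = [bit_status(a, b) for a, b in zip(a_bits, b_bits)]
    pg = all_prefixes(status, combine)            # pg[m] plays the role of PG(0,m)
    # C(m) = Cin PG(0,m): generate forces 1, squash forces 0, propagate passes Cin.
    carries = [1 if s == "G" else 0 if s == "S" else cin for s in pg]
    carry_in = [cin] + carries[:-1]               # C(m-1) feeds bit m
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry_in)]
    return sums, carries[-1]

def to_bits(x, n):
    """Little-endian list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]

result = cla_add(to_bits(13, 4), to_bits(11, 4))  # ([0, 0, 0, 1], 1), i.e. 24
```

Each recursion level halves the problem, so the depth is logarithmic in the number of bits, which mirrors the 2log(N) figure on the slide.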

  11. Resulting CLA

  12. Associative • Works because the operator is associative • Can go ahead and compute PG(N/2,N-1) before we know PG(0,N/2-1) • Then combine in unit time.

  13. Consequence • Allows us to perform many seemingly sequential operations in parallel

  14. Prefix Sum • Common Operation: • Want B[x] such that B[x]=A[0]+A[1]+…+A[x] • B[0]=A[0] • For i=1 to x • B[i]=B[i-1]+A[i]

  15. Prefix Sum • Compute in tree fashion • A[I]+A[I+1] • A[I]+A[I+1]+A[I+2]+A[I+3] • … • Combine partial sums back down tree • S(0:7)+S(8:9)+S(10)=S(0:10)
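
The decomposition S(0:7)+S(8:9)+S(10)=S(0:10) can be reproduced with a small table of aligned block sums. The following is a sketch only; `build_tree`, `prefix_blocks` and `prefix_sum` are hypothetical helper names, and S(lo:hi) is read as an inclusive range.

```python
def build_tree(a):
    """levels[k][j] holds the sum of the aligned block a[j*2**k : (j+1)*2**k]."""
    levels = [list(a)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Pair up neighbours; an unpaired last element is carried up unchanged.
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels

def prefix_blocks(x, n_levels):
    """Aligned blocks (level, index) that tile a[0..x], largest block first."""
    blocks = []
    start, count = 0, x + 1
    for k in reversed(range(n_levels)):
        size = 1 << k
        if count >= size:
            blocks.append((k, start // size))
            start += size
            count -= size
    return blocks

def prefix_sum(levels, x):
    """Sum of a[0..x], built from at most one block per tree level."""
    return sum(levels[k][j] for k, j in prefix_blocks(x, len(levels)))

a = list(range(1, 17))
levels = build_tree(a)
blocks = prefix_blocks(10, len(levels))  # [(3, 0), (1, 4), (0, 10)]
total = prefix_sum(levels, 10)           # 66
```

The three blocks returned for x = 10 are exactly S(0:7), S(8:9) and S(10), which is why any prefix needs only a logarithmic number of partial sums.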

  16. Other simple operators • Prefix-OR • Prefix-AND • Prefix-MAX • Prefix-MIN
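
Each of these operators is associative, so each can be dropped into the same prefix structure. As a quick illustration, Python's `itertools.accumulate` computes the same output sequences; it runs sequentially, so it stands in only for the results, not for the parallel network. The input lists are made up for the example.

```python
from itertools import accumulate
from operator import and_, or_

bits = [0, 0, 1, 0, 1, 1]
vals = [3, 1, 4, 1, 5, 9]

prefix_or = list(accumulate(bits, or_))            # [0, 0, 1, 1, 1, 1]
prefix_and = list(accumulate([1, 1, 0, 1], and_))  # [1, 1, 0, 0]
prefix_max = list(accumulate(vals, max))           # [3, 3, 4, 4, 5, 9]
prefix_min = list(accumulate(vals, min))           # [3, 1, 1, 1, 1, 1]
```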

  17. Find-First One • Useful for arbitration • Finds first (highest-priority) requestor • Also magnitude finding in numbers • How: • Prefix-OR • Locally compute X[I-1]^X[I] • Flags the first one
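
A small sketch of the two steps, assuming position 0 has the highest priority and treating the value to the left of position 0 as 0. `find_first_one` is a hypothetical name, and `accumulate` is a sequential stand-in for the parallel prefix-OR.

```python
from itertools import accumulate
from operator import or_

def find_first_one(x):
    """Return a one-hot list marking the first 1 in x (all zeros if there is none)."""
    p = list(accumulate(x, or_))  # prefix-OR looks like 0...0 1...1
    # XOR each prefix-OR output with its left neighbour; only the 0->1 step survives.
    return [p[i] ^ (p[i - 1] if i > 0 else 0) for i in range(len(p))]

grant = find_first_one([0, 0, 1, 0, 1])  # [0, 0, 1, 0, 0]
```

The XOR step is purely local, so the whole operation costs one prefix-OR plus constant time.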

  18. Arbitration • Often want to find first M requestors • E.g. Assign unique memory ports to first M processors requesting • Prefix-sum across all potential requestors • Counts requestors, giving unique number to each • Know if one of first M • Perhaps which resource assigned
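
One way to picture the port assignment, as a sketch under assumptions: requests are a list of 0/1 flags, ports are numbered from 0, and `arbitrate` is a hypothetical name. The prefix sum is computed with the sequential `accumulate` here; in hardware it would be the parallel-prefix network.

```python
from itertools import accumulate

def arbitrate(requests, m):
    """Give each of the first m requesters a unique port number, others None."""
    counts = list(accumulate(requests))  # inclusive prefix sum of request flags
    grants = []
    for req, c in zip(requests, counts):
        rank = c - 1                     # 0-based position among requesters
        grants.append(rank if req and rank < m else None)
    return grants

ports = arbitrate([0, 1, 1, 0, 1, 1], 2)  # [None, 0, 1, None, None, None]
```

Because each requester sees its own count, it learns both whether it won and which resource it was assigned without any further communication.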

  19. Others • Parsing • FSM-state-trace • Recurrence relationships • Rank finding (sorting) • Partitioning • Sorting • Sequential Instruction evaluation (Ultrascalar) • Saturating Accumulation (kp)

  20. FSM-trace • Build composite FSM • I.e. view FSM as F(state) → state • Compute new transition functions • FSM[i,j](state) • Given input state at step i, compute output state of FSM after step j • FSMs accept regular languages so works for regular expression parsers
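
A sketch of the idea with a made-up three-state machine that detects the substring "ab" (state 2 is the accepting, sticky state). Each input symbol becomes a transition function stored as a tuple indexed by state, and function composition is the associative operator. `DELTA`, `OTHER`, `compose` and `state_trace` are hypothetical names, and `accumulate` again models the prefix sequentially.

```python
from itertools import accumulate

# Transition function per symbol: DELTA[ch][state] is the next state.
DELTA = {"a": (1, 1, 2), "b": (0, 2, 2)}
OTHER = (0, 0, 2)  # any other character resets progress unless already accepted

def compose(f, g):
    """Apply f first, then g; both map a state index to a state."""
    return tuple(g[s] for s in f)

def state_trace(text, start=0):
    """State after each input character, obtained from the prefix compositions."""
    steps = [DELTA.get(ch, OTHER) for ch in text]
    prefixes = accumulate(steps, compose)  # the k-th entry plays the role of FSM[0,k]
    return [f[start] for f in prefixes]

trace = state_trace("cab")  # [0, 1, 2]
```

The composed functions are computed without knowing the start state, which is what lets the spans be processed independently and combined later.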

  21. Rank Finding • Looking for I’th ordered element • Do a prefix-sum on high-bit only • Know m=number of things > 01111111… • High-low search on result • I.e. if number > I, recurse on half with leading zero • If number < I, search for (I-m)’th element in half with high-bit true • Find median in log²(N) time
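
The bit-by-bit search can be sketched as follows. This is one interpretation, not the slide's exact formulation: it assumes unsigned values of a known bit width, a 0-based rank counted from the smallest element, and a hypothetical name `kth_smallest`. The counts that a prefix-sum would deliver in hardware are obtained here with plain list lengths.

```python
def kth_smallest(values, k, width):
    """Return the element of rank k (0-based, ascending) among width-bit values."""
    candidates = list(values)
    for bit in reversed(range(width)):
        zeros = [v for v in candidates if not (v >> bit) & 1]
        ones = [v for v in candidates if (v >> bit) & 1]
        if k < len(zeros):
            # Enough elements have a 0 here: the answer is among them.
            candidates = zeros
        else:
            # Skip past all elements with a 0 in this bit and search the rest.
            k -= len(zeros)
            candidates = ones
    return candidates[0]

data = [9, 3, 14, 7, 3, 0, 12]
median = kth_smallest(data, len(data) // 2, 4)  # 7
```

There is one counting step per bit, so with a logarithmic-depth count the total time is the bit width times log(N).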

  22. Partitioning • Use something to order • (Like we’re thinking about) • Parallel prefix on area of units • If not all same area • Know where the midpoint is

  23. Channel Width • Prefix sum on delta wires at each node • To compute net channel widths at all points along wire
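
A minimal sketch of the delta idea, assuming each net is given as a hypothetical `(start, end)` pair that occupies positions `start` through `end - 1` along the channel. A net adds +1 where it enters and -1 where it leaves, and the running sum gives the width at every position.

```python
from itertools import accumulate

def channel_widths(nets, length):
    """Number of wires crossing each of `length` positions along the channel."""
    delta = [0] * (length + 1)
    for start, end in nets:
        delta[start] += 1   # net enters the channel here
        delta[end] -= 1     # net has left by this position
    return list(accumulate(delta[:length]))

widths = channel_widths([(0, 3), (1, 4), (2, 3)], 5)  # [1, 2, 3, 1, 0]
```

Taking the maximum of the result gives the channel width needed to route all the nets.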

  24. Variations • Segmented • Cyclic Segmented
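
For the segmented variation, a common construction (shown here as a sketch, not as the lecture's own definition) pairs each value with a flag marking the start of a segment and defines an operator on those pairs that is itself associative. `seg_op` and `segmented_sum` are hypothetical names.

```python
from itertools import accumulate

def seg_op(left, right):
    """Combine (flag, value) pairs; a set flag on the right restarts the sum."""
    lf, lv = left
    rf, rv = right
    return (lf | rf, rv if rf else lv + rv)

def segmented_sum(values, flags):
    """Prefix sums that restart wherever flags[i] == 1."""
    pairs = list(zip(flags, values))
    return [v for _, v in accumulate(pairs, seg_op)]

out = segmented_sum([1, 2, 3, 4, 5], [1, 0, 1, 0, 0])  # [1, 3, 3, 7, 12]
```

Since `seg_op` is associative, the same logarithmic-depth prefix structure applies unchanged; only the operator is swapped.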

  25. Big Ideas • Any associative operation can be made parallel
