Loop Restructuring
• Loop unswitching
• Loop peeling
• Loop fusion
• Loop alignment for fusion
• Loop reversal
• Loop fission
• Loop alignment
• Loop index set splitting
• Loop interchange
• Scalar expansion
Unswitching

• Loop unswitching hoists loop-invariant conditionals out of a loop
• Reduces the frequency of executing branches
• But: leads to code expansion

Before:

    DO I = 1, N
      DO J = 2, N
        IF (T(I) > 0) THEN
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ELSE
          A(I,J) = 0.0
        ENDIF
      ENDDO
    ENDDO

After:

    DO I = 1, N
      IF (T(I) > 0) THEN
        DO J = 2, N
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ENDDO
      ELSE
        DO J = 2, N
          A(I,J) = 0.0
        ENDDO
      ENDIF
    ENDDO
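A minimal Python sketch of the unswitching example, assuming lists padded at index 0 so that the 1-based subscripts carry over unchanged. The function names and test data are illustrative and do not come from the slides. It checks that both versions compute the same A.

```python
def original(a, t, b, n):
    for i in range(1, n + 1):
        for j in range(2, n + 1):
            if t[i] > 0:                  # tested on every inner iteration
                a[i][j] = a[i][j - 1] * t[i] + b[i]
            else:
                a[i][j] = 0.0
    return a


def unswitched(a, t, b, n):
    for i in range(1, n + 1):
        if t[i] > 0:                      # tested once per outer iteration
            for j in range(2, n + 1):
                a[i][j] = a[i][j - 1] * t[i] + b[i]
        else:
            for j in range(2, n + 1):
                a[i][j] = 0.0
    return a


def make_inputs(n):
    # Hypothetical data: T alternates in sign so both branches are exercised.
    t = [0.0] + [float((-1) ** i * i) for i in range(1, n + 1)]
    b = [0.0] + [float(i) for i in range(1, n + 1)]
    a = [[float(i + j) for j in range(n + 1)] for i in range(n + 1)]
    return a, t, b


n = 5
a1, t, b = make_inputs(n)
a2 = [row[:] for row in a1]               # independent copy for the second run
same = original(a1, t, b, n) == unswitched(a2, t, b, n)
```

The transformation is safe here because T(I) is not modified inside the J loop, so the test has the same outcome on every inner iteration.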
Peeling

• Loop peeling moves the first (or last) iteration of a loop into separate code
• Enables loop fusion by changing the bounds of one loop to match the bounds of another
• But: leads to code expansion

Before:

    J = 0
    K = M
    DO I = 0, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO

After:

    J = 0
    K = M
    A(K) = B(J) - B(K)
    K = J
    J = J + 1
    DO I = 1, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO
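The peeling example can be checked with a short Python sketch. It assumes N >= 0, since otherwise the original loop runs zero times while the peeled copy still runs once. The array contents and the values of N and M are made up for illustration.

```python
def original(b, n, m):
    a = [0] * len(b)
    j, k = 0, m
    for _ in range(0, n + 1):             # DO I = 0, N
        a[k] = b[j] - b[k]
        k = j
        j += 1
    return a


def peeled(b, n, m):
    a = [0] * len(b)
    j, k = 0, m
    a[k] = b[j] - b[k]                    # peeled first iteration (I = 0)
    k = j
    j += 1
    for _ in range(1, n + 1):             # DO I = 1, N
        a[k] = b[j] - b[k]
        k = j
        j += 1
    return a


b = [3, 1, 4, 1, 5, 9, 2, 6]              # hypothetical data, indices 0..7
n, m = 5, 7
result_original = original(b, n, m)
result_peeled = peeled(b, n, m)
```

After peeling, K always equals J - 1 inside the loop, which is what lets a compiler replace both with expressions in I.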
Fusion

• Combines two consecutive loops with the same induction variable (IV) and loop bounds into one
• Fused loop must preserve all dependence relations of the original loops
• Enables more effective scalar optimizations in the fused loop
• But: may reduce temporal locality

Original code has dependences S1 → S6 and S3 → S6:

    S1  B(1) = T(1)*X(1)
    S2  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S4  ENDDO
    S5  DO I = 2, N
    S6    A(I) = B(I) - B(I-1)
    S7  ENDDO

Fused loop has dependences S1 → S6, S3(=)S6 and S3(<)S6:

    S1  B(1) = T(1)*X(1)
    Sx  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S6    A(I) = B(I) - B(I-1)
    Sy  ENDDO
Example

Which of the three fused loops is legal?

Original:

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    S4  DO I = 1, N
    S5    C(I) = A(I)/2
    S6  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

a) Second and third loops fused:

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    Sx  DO I = 1, N
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO

b) First and second loops fused:

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    Sy  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

c) All three loops fused:

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO
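One way to answer the question is to run all four versions and compare results. In the Python sketch below only (b) matches the original: in (a) and (c), S8 reads C(I+1) one iteration before S5 writes it, so the flow dependence S5 → S8 is reversed into an anti-dependence. The input data and the initial value of C are assumptions chosen so the difference is visible.

```python
def run(variant, n=6):
    # 1-based arrays; index n + 1 exists because S8 reads C(N+1).
    b = [float(i) for i in range(n + 2)]
    a = [0.0] * (n + 2)
    c = [1.0] * (n + 2)                   # assumed initial contents of C
    d = [0.0] * (n + 2)
    idx = range(1, n + 1)
    if variant == "original":
        for i in idx:
            a[i] = b[i] + 1
        for i in idx:
            c[i] = a[i] / 2
        for i in idx:
            d[i] = 1 / c[i + 1]
    elif variant == "a":                  # loops 2 and 3 fused
        for i in idx:
            a[i] = b[i] + 1
        for i in idx:
            c[i] = a[i] / 2
            d[i] = 1 / c[i + 1]           # C(I+1) not yet updated
    elif variant == "b":                  # loops 1 and 2 fused
        for i in idx:
            a[i] = b[i] + 1
            c[i] = a[i] / 2
        for i in idx:
            d[i] = 1 / c[i + 1]
    elif variant == "c":                  # all three fused
        for i in idx:
            a[i] = b[i] + 1
            c[i] = a[i] / 2
            d[i] = 1 / c[i + 1]           # C(I+1) not yet updated
    else:
        raise ValueError(variant)
    return a, c, d


reference = run("original")
legal = {v: run(v) == reference for v in ("a", "b", "c")}
```

A matching result on one data set does not prove legality in general, but a mismatch is enough to show that a fusion is illegal.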
Alignment for Fusion

• Alignment for fusion changes the iteration bounds of one loop to enable fusion when dependences would otherwise prevent it

Original, with dependence S2 → S5:

    S1  DO I = 1, N
    S2    B(I) = T(I)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

First loop aligned, with dependence S2 → S5:

    S1  DO I = 0, N-1
    S2    B(I+1) = T(I+1)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

Fused after peeling, with loop dependences S2(=)S5 and S2(<)S5:

    Sx  B(1) = T(1)/C
    S1  DO I = 1, N-1
    S2    B(I+1) = T(I+1)/C
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO
    Sy  A(N) = B(N+1) - B(N-1)
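The sketch below, in Python, compares the original pair of loops with a naive fusion and with the aligned fusion from this slide. It assumes N >= 1 and that B(0) and B(N+1) hold some initial value (zero here), because the slide reads them without writing them. All names and data are illustrative.

```python
def original(t, c, n):
    b = [0.0] * (n + 2)
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] / c
    for i in range(1, n + 1):
        a[i] = b[i + 1] - b[i - 1]
    return a, b


def naive_fusion(t, c, n):
    b = [0.0] * (n + 2)
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] / c
        a[i] = b[i + 1] - b[i - 1]        # B(I+1) has not been written yet
    return a, b


def aligned_fusion(t, c, n):
    b = [0.0] * (n + 2)
    a = [0.0] * (n + 2)
    b[1] = t[1] / c                       # peeled iteration I = 0 of the shifted loop
    for i in range(1, n):                 # DO I = 1, N-1
        b[i + 1] = t[i + 1] / c
        a[i] = b[i + 1] - b[i - 1]
    a[n] = b[n + 1] - b[n - 1]            # peeled iteration I = N of the second loop
    return a, b


n = 6
t = [0.0] + [2.0 * i for i in range(1, n + 1)]
c = 2.0
expected = original(t, c, n)
aligned_ok = aligned_fusion(t, c, n) == expected
naive_ok = naive_fusion(t, c, n) == expected
```

Shifting the first loop by one iteration turns the backward reference B(I+1) into a same-iteration reference, which is why the fused body is legal.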
Reversal

• Reverses the direction of the iteration
• Only legal for loops that have no carried dependences
• Enables loop fusion by ensuring dependences are preserved between loop statements

Original, with dependence S2 → S5:

    S1  DO I = 1, N
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1)
    S6  ENDDO

Both loops reversed, with dependence S2 → S5:

    S1  DO I = N, 1, -1
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = N, 1, -1
    S5    A(I) = B(I+1)
    S6  ENDDO

Fused, with dependence S2(<)S5:

    S1  DO I = N, 1, -1
    S2    B(I) = T(I)*X(I)
    S5    A(I) = B(I+1)
    S6  ENDDO
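A small Python sketch of why reversal helps here: fusing the forward loops reads B(I+1) before it is written, while fusing the reversed loops reads it one iteration after it is written. B(N+1) is assumed to start at zero, and the input data is made up.

```python
def original(t, x, n):
    a = [0] * (n + 2)
    b = [0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] * x[i]
    for i in range(1, n + 1):
        a[i] = b[i + 1]
    return a, b


def forward_fused(t, x, n):
    a = [0] * (n + 2)
    b = [0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] * x[i]
        a[i] = b[i + 1]                   # reads a value not yet computed
    return a, b


def reversed_fused(t, x, n):
    a = [0] * (n + 2)
    b = [0] * (n + 2)
    for i in range(n, 0, -1):             # DO I = N, 1, -1
        b[i] = t[i] * x[i]
        a[i] = b[i + 1]                   # written in the previous iteration
    return a, b


n = 5
t = [0, 1, 2, 3, 4, 5]
x = [0, 2, 2, 2, 2, 2]
expected = original(t, x, n)
reversed_ok = reversed_fused(t, x, n) == expected
forward_ok = forward_fused(t, x, n) == expected
```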
Fission (1)

• Loop fission (or loop distribution) splits a single loop into multiple loops
• Enables vectorization
• Enables parallelization of separate loops if the original loop is sequential
• Loop fission must preserve all dependence relations of the original loop

Original, with dependence S3(=,<)S4:

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

After fission of the inner loop, with dependence S3(=,<)S4:

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    Sx    ENDDO
    Sy    DO J = 1, 10
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

Vectorized and parallelized, with dependence S3(=,<)S4:

    S1  PARALLEL DO I = 1, 10
    S3    A(I,1:10) = B(I,1:10) + C(I,1:10)
    S4    D(I,1:10) = A(I,0:9) * 2.0
    S6  ENDDO
Fission (2)

• Compute the acyclic condensation of the dependence graph to find a legal order of the loops

Original, with dependences S3(<)S2, S4(<)S3, S3(=)S4 and S4(=)S5:

    S1  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    S5    D(I) = sqrt(C(I))
    S6  ENDDO

Dependence graph (edge labels are dependence distances): S3 → S2 (1), S4 → S3 (1), S3 → S4 (0), S4 → S5 (0)

Acyclic condensation: S3 and S4 form a cycle and collapse into one node {S3, S4}, with edges {S3, S4} → S2 and {S3, S4} → S5

After fission:

    S1  DO I = 1, 10
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    Sx  ENDDO
    Sy  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    Sz  ENDDO
    Su  DO I = 1, 10
    S5    D(I) = sqrt(C(I))
    Sv  ENDDO
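The loop order can be derived mechanically. The Python sketch below groups mutually reachable statements into components and then orders the components topologically. It is a simple quadratic illustration for small graphs, not a production algorithm (Tarjan's algorithm would be the usual choice), and the function names are hypothetical.

```python
def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for succ in graph[node]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def condensation_order(nodes, edges):
    graph = {n: [] for n in nodes}
    for src, dst in edges:
        graph[src].append(dst)
    reach = {n: reachable(graph, n) for n in nodes}

    # Statements that can reach each other sit on a dependence cycle
    # and must stay together in one loop.
    component_of, components = {}, []
    for n in nodes:
        if n in component_of:
            continue
        comp = [m for m in nodes if m == n or (m in reach[n] and n in reach[m])]
        for m in comp:
            component_of[m] = len(components)
        components.append(comp)

    # Topological sort of the condensed (acyclic) graph.
    preds = {c: set() for c in range(len(components))}
    for src, dst in edges:
        if component_of[src] != component_of[dst]:
            preds[component_of[dst]].add(component_of[src])
    order, remaining = [], list(range(len(components)))
    while remaining:
        ready = next(c for c in remaining if not (preds[c] - set(order)))
        order.append(ready)
        remaining.remove(ready)
    return [components[c] for c in order]


statements = ["S2", "S3", "S4", "S5"]
dependences = [("S3", "S2"), ("S4", "S3"), ("S3", "S4"), ("S4", "S5")]
loop_order = condensation_order(statements, dependences)
```

For the slide's graph this yields the loop containing S3 and S4 first, followed by separate loops for S2 and S5, which matches the fissioned code.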
Alignment

• Aligns statements in a loop body by expanding the iteration set
• Attempts to transform loop-carried dependences into loop-independent dependences
• Enables loop parallelization

Before, with dependence S2(<)S3:

    S1  DO I = 2, N
    S2    A(I) = B(I) + C(I)
    S3    D(I) = A(I-1) * 2.0
    S4  ENDDO

After, with dependence S2(=)S3:

    S1  DO i = 1, N
    S2    IF (i>1) A(i) = B(i) + C(i)
    S3    IF (i<N) D(i+1) = A(i) * 2.0
    S4  ENDDO

[Figure: dependence diagrams before and after alignment]
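Because the aligned loop has no carried dependence, its iterations can run in any order. The Python sketch below checks this by running the aligned body forwards and backwards and comparing against the original loop. The data is made up, and A(1) keeps its initial value as in the slide.

```python
def original(a_init, b, c, n):
    a = a_init[:]
    d = [0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2               # value from the previous iteration
    return a, d


def aligned(a_init, b, c, n, order):
    a = a_init[:]
    d = [0] * (n + 1)
    for i in order:                       # any permutation of 1..N
        if i > 1:
            a[i] = b[i] + c[i]
        if i < n:
            d[i + 1] = a[i] * 2           # value from the same iteration
    return a, d


n = 6
a_init = [0, 7, 0, 0, 0, 0, 0]            # A(1) = 7 is only ever read
b = [0, 1, 2, 3, 4, 5, 6]
c = [0, 10, 20, 30, 40, 50, 60]
expected = original(a_init, b, c, n)
forward_ok = aligned(a_init, b, c, n, range(1, n + 1)) == expected
backward_ok = aligned(a_init, b, c, n, range(n, 0, -1)) == expected
```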
Index Set Splitting

• Divides the index set into two portions
• Removes conditionals to enable other transformations
• General case handles affine conditions in multi-dimensional loops by detecting a hyperplane through the iteration space polytope
• But: code expansion

Before:

    S1  DO I = 1, 100
    S2    A(I) = B(I) + C(I)
    S3    IF (I > 10) THEN
    S4      D(I) = A(I) + A(I-10)
    S5    ENDIF
    S6  ENDDO

After:

    S1  DO I = 1, 10
    S2    A(I) = B(I) + C(I)
    Sx  ENDDO
    Sy  DO I = 11, 100
    S2    A(I) = B(I) + C(I)
    S4    D(I) = A(I) + A(I-10)
    Su  ENDDO

[Figure: two-dimensional iteration space with axes I and J, split by the hyperplane 3*J > I into Loop1 and Loop2]
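A Python sketch of the one-dimensional case, using the slide's bounds (1 to 100, split at 10) and made-up input data. It confirms that the two conditional-free loops reproduce the original result.

```python
N = 100
SPLIT = 10


def original(b, c):
    a = [0] * (N + 1)
    d = [0] * (N + 1)
    for i in range(1, N + 1):
        a[i] = b[i] + c[i]
        if i > SPLIT:                     # false for the first 10 iterations only
            d[i] = a[i] + a[i - SPLIT]
    return a, d


def split(b, c):
    a = [0] * (N + 1)
    d = [0] * (N + 1)
    for i in range(1, SPLIT + 1):         # DO I = 1, 10: condition always false
        a[i] = b[i] + c[i]
    for i in range(SPLIT + 1, N + 1):     # DO I = 11, 100: condition always true
        a[i] = b[i] + c[i]
        d[i] = a[i] + a[i - SPLIT]
    return a, d


b = list(range(N + 1))
c = [3 * i for i in range(N + 1)]
same = original(b, c) == split(b, c)
```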
Loop Interchange (1)

• Changes the nesting order of nested loops
• Loop interchange must preserve all dependence relations of the original loop
• Enables vectorization of an outer loop
• Can be used to improve spatial locality

Original, with dependence S3(=,<)S3:

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

Interchanged, with dependence S3(<,=)S3:

    S2  DO J = 1, M
    S1    DO I = 1, N
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

Vectorized, with dependence S3(<,=)S3:

    S2  DO J = 1, M
    S3    A(1:N,J) = A(1:N,J-1) + B(1:N,J)
    S5  ENDDO
Loop Interchange (2)

• Compute the direction matrix and find which columns can be permuted without violating dependence relations in the original loop nest

Loop nest, with dependences S4(<,<,=)S4 and S4(<,=,>)S4:

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      DO K = 1, L
    S4        A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    S5      ENDDO
    S6    ENDDO
    S7  ENDDO

Direction matrix (columns I, J, K):

    <  <  =
    <  =  >

Invalid permutation (columns J, K, I), because the second row starts with = > :

    <  =  <
    =  >  <

Valid permutation (columns J, I, K):

    <  <  =
    =  <  >
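The permutation test can be sketched in Python as follows: a column order is legal if, in every row, the first entry that is not "=" is "<". The sketch handles only the three direction symbols used on this slide and ignores unknown directions such as "*". The function name is hypothetical.

```python
from itertools import permutations


def is_legal(direction_matrix, column_order):
    """Check that every permuted direction vector is still lexicographically positive."""
    for row in direction_matrix:
        for col in column_order:
            if row[col] == "<":
                break                     # dependence carried by this loop: row is fine
            if row[col] == ">":
                return False              # dependence would run backwards
    return True


# Rows are the two dependences of S4 on itself; columns are loops I, J, K.
matrix = [
    ("<", "<", "="),
    ("<", "=", ">"),
]
loops = "IJK"
legal_orders = [
    "".join(loops[c] for c in order)
    for order in permutations(range(3))
    if is_legal(matrix, order)
]
```

For this nest the check accepts exactly the orders in which loop I stays outside loop K, since I carries the dependence whose K direction is ">".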
Scalar Expansion

• Breaks anti-dependence relations by expanding or promoting a scalar into an array
• Scalar anti-dependence relations prevent certain loop transformations such as loop fission and loop interchange

Before, with dependences S2(=)S3 and S2⁻¹(<)S3 (the second is the loop-carried anti-dependence on T):

    S1  DO I = 1, N
    S2    T = A(I) + B(I)
    S3    C(I) = T + 1/T
    S4  ENDDO

After, with dependence S2(=)S3:

    Sx  IF (N > 0) THEN
    Sy    ALLOC Tx(1:N)
    S1    DO I = 1, N
    S2      Tx(I) = A(I) + B(I)
    S3      C(I) = Tx(I) + 1/Tx(I)
    S4    ENDDO
    Sz    T = Tx(N)
    Su  ENDIF
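A Python sketch of scalar expansion followed by fission, which the scalar version would not allow. It assumes the inputs keep T nonzero, and it copies the last element back into T as the slide does, so code after the loop still sees the final value. Names and data are illustrative.

```python
def original(a, b, n):
    c = [0.0] * (n + 1)
    t = None                              # T is undefined if the loop never runs
    for i in range(1, n + 1):
        t = a[i] + b[i]                   # every iteration reuses the same scalar
        c[i] = t + 1 / t
    return c, t


def expanded_then_split(a, b, n):
    c = [0.0] * (n + 1)
    t = None
    if n > 0:
        tx = [0.0] * (n + 1)              # ALLOC Tx(1:N), index 0 unused
        for i in range(1, n + 1):
            tx[i] = a[i] + b[i]           # each iteration owns its element
        for i in range(1, n + 1):         # legal only after expansion
            c[i] = tx[i] + 1 / tx[i]
        t = tx[n]                         # restore the final value of T
    return c, t


n = 5
a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
b = [0.0, 0.5, 0.5, 0.5, 0.5, 0.5]
same = original(a, b, n) == expanded_then_split(a, b, n)
same_when_empty = original(a, b, 0) == expanded_then_split(a, b, 0)
```

The cost is O(N) extra storage for Tx, which is the usual trade-off of this transformation.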
Example

Original, with dependences S2(=)S4, S4(=,<)S4, S4(=)S6 and S2⁻¹(<)S6:

    S1  DO I = 1, 10
    S2    T = A(I,1)
    S3    DO J = 2, 10
    S4      T = T + A(I,J)
    S5    ENDDO
    S6    B(I) = T
    S7  ENDDO

After scalar expansion, with dependences S2(=)S4, S4(=,<)S4 and S4(=)S6:

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    S6    B(I) = Tx(I)
    S7  ENDDO

After loop fission, with dependences S2 → S4, S4(=,<)S4 and S4 → S6:

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    Sx  ENDDO
    S1  DO I = 1, 10
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    Sy  ENDDO
    Sz  DO I = 1, 10
    S6    B(I) = Tx(I)
    S7  ENDDO

After loop interchange and vectorization, with dependences S2 → S4, S4(<,=)S4 and S4 → S6:

    S2  Tx(1:10) = A(1:10,1)
    S3  DO J = 2, 10
    S4    Tx(1:10) = Tx(1:10) + A(1:10,J)
    S5  ENDDO
    S6  B(1:10) = Tx(1:10)
Other Loop Restructuring Transformations

• Loop skewing: denormalize iteration vectors to change the shape of the iteration space (skew) to allow loop interchange
• Strip mining: decompose a single loop into two nested loops (where the inner loop computes a strip of the data)
• Loop tiling: the iteration space of a loop nest is divided into tiles
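Strip mining is the simplest of these to show in code. The Python sketch below splits a single loop into an outer loop over strips and an inner loop within each strip. The strip size, loop body and function names are assumptions for illustration.

```python
def original(x, n):
    y = [0] * (n + 1)
    for i in range(1, n + 1):
        y[i] = 2 * x[i]
    return y


def strip_mined(x, n, strip):
    y = [0] * (n + 1)
    for start in range(1, n + 1, strip):  # outer loop: one trip per strip
        stop = min(start + strip - 1, n)  # last strip may be shorter
        for i in range(start, stop + 1):  # inner loop: one strip of the data
            y[i] = 2 * x[i]
    return y


n = 10
x = list(range(n + 1))
same = original(x, n) == strip_mined(x, n, strip=4)
```

The iterations still run in the same order, so strip mining is always legal. Tiling goes one step further by interchanging the strip loops of a loop nest, which needs the same legality check as loop interchange.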