1 / 41

Symbolic Program Transformation for Numerical Codes

Symbolic Program Transformation for Numerical Codes. Vijay Menon Cornell University. Background. Compiler/Runtime Support for Numerical Programs (under Keshav Pingali) Sparse matrix code generation Memory hierarchy optimizations Compiler/runtime techniques for MATLAB

Gabriel
Télécharger la présentation

Symbolic Program Transformation for Numerical Codes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Symbolic Program Transformation for Numerical Codes Vijay Menon Cornell University

  2. Background • Compiler/Runtime Support for Numerical Programs (under Keshav Pingali) • Sparse matrix code generation • Memory hierarchy optimizations • Compiler/runtime techniques for MATLAB • MAJIC (with University of Illinois) • MultiMATLAB

  3. Motivation • Using matrix properties to generate efficient code • e.g., Associativity, Commutativity of matrix operations • Problem: loop notation • Matrix operations hidden within loop nests

  4. Example: LU Factorization with Partial Pivoting • Also called: Gaussian Elimination • Key algorithm for solving systems of linear equations: • To solve A x = b for x: • => Factor A into L U • => Solve L y = b for y • => Solve U x = y for x

  5. LU Factorization do j = 1,N p(j) = j; do i = j+1,N if (A(i,j)>A(p(j),j)) p(j) = i; do k = 1,N tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp; do i = j+1,N A(i,j) = A(i,j)/A(j,j); do k = j+1,N do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k); Select pivot row: Swap pivot row with current: Scale column (to store L): Update(to compute partial U): • Problem: Poor cache behavior x x x x x x 0 x x x x x 0 0 5 x x x 0 0 3 x x x 0 0 7x x x 0 0 2 x x x

  6. Automatic Blocking do jB = 1,N,B do j = jB,jB+B-1 p(j) = j; do i = j+1,N if (A(i,j)>A(p(j),j)) p(j) = i; do k = 1,N tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp; do i = j+1,N A(i,j) = A(i,j)/A(j,j); do k = j+1,jB+B-1 do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k); do j = jB,jB+B-1 do k = jB+B,N do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k); • Compiler transformations: • Stripmine • Index-Set Split • Loop Distribute • Tile/Map to BLAS • Problems: • Establishing legality • Mapping to BLAS

  7. LU Performance

  8. Key Points • Conventional techniques (dependence analysis) are insufficient to establish legality • Detection of matrix operations buys extra performance

  9. Overview of this Talk • Fractal Symbolic Analysis • Framework for Symbolically determining Legality of Program Transformations • Matrixization • Generalization of Vectorization to Matrix Operations

  10. Fractal Symbolic Analysis • Symbolic test to establish legality of program transformations for i = 1 : n S1(i); for i = 1 : n S2(i); for i = 1 : n S1(i); S2(i);

  11. Dependence Analysis • Independent operations may be reordered • Legality: No data dependence from S2(m) to S1(l) where l > m for i = 1 : n S1(i); for i = 1 : n S2(i); for i = 1 : n S1(i); S2(i);

  12. Symbolic Comparison • Dependence Analysis is conservative: • Symbolic execution shows equality: • aout = 2*(ain + bin) • bout = 2*bin • But, intractable for recurrent loops s1: a = 2*a s2: b = 2*b s3: a = a+b s3: a = a+b s1: a = 2*a s2: b = 2*b ?

  13. Distillation of LU • Given p(j)  j, prove: • Dependence analysis: too conservative • Symbolic comparison: intractable for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for j = 1:n for i = j+1:n a(i) = a(i)/a(j) for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for i = j+1:n a(i) = a(i)/a(j) B1(j): B1(j): B2(j): B2(j):

  14. Idea • Simplify • Prove equality of complex programs via equality of simpler programs • Repeat if necessary • Sufficient, but not necessary • equality of simpler programs -> equality of original programs

  15. Commutativity S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n) i,j  1..n | i < j  f(i) > f(j). Si; Sj = Sj; Si • Permutation  Sequence of Adjacent Transpositions

  16. Commutative Legality • Loop Distribution • Legality:  1  m < l  N. S2(m); S1(l) = S1(l); S2(m) for i = 1 : n S1(i); S2(i); for i = 1 : n S1(i); for i = 1 : n S2(i);

  17. Commutative Legality • Similar conditions: • Statement reordering • Loop interchange • Loop reversal • Loop tiling • Compiler establishes legality of transformation by testing commute condition

  18. Back to Example • Given p(j)  j, prove: • Simplified comparison => for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for j = 1:n for i = j+1:n a(i) = a(i)/a(j) for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for i = j+1:n a(i) = a(i)/a(j) B1(j): B1(j): B2(j): B2(j):

  19. Simplified Comparison • Given p(j)  j  l > m prove: • Further simplification? for i = m+1:n a(i) = a(i)/a(m) tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp for i = m+1:n a(i) = a(i)/a(m) B2(m): B1(l): B1(l): B2(m):

  20. Too Simple! • Given p(j)  j  l > m  i > m prove: • Too conservative - no longer legal! • Hypothesis: • Repeated application -> dependence analysis tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp a(i) = a(i)/a(m) a(i) = a(i)/a(m) tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp

  21. Symbolic Comparison • Can we symbolically prove?: • Effect can be summarized: • non-recurrent loops • affine indices/loop bounds tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp for i = m+1:n a(i) = a(i)/a(m) for i = m+1:n a(i) = a(i)/a(m) tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp B2(m): B1(l): B1(l): B2(m):

  22. Guarded Expressions • Compare symbolic values of live output variable in terms of input variables: • aout (k)= • Each • guard : affine expression defining part of aout • expr : symbolic expression on input data guard1 -> expr1 guard2 -> expr2 …… guardn -> exprn

  23. Guarded Expressions • For both program blocks: • aout(k) = • Omega Library (integer programming tool) to manipulate/compare affine guards k  m => ain(k) k = l => ain(p(l))/ain(m) k = p(l) => ain(l)/ain(m) else => ain(k)/ain(m)

  24. Summary of Fractal Symbolic Analysis • Powerful legality test • Explores tradeoffs between • tractability (dependence analysis) • accuracy (symbolic comparison) • Similar application for LU w/ pivoting • 2 recursive simplification steps • 6 guarded regions/expressions • Prototype implemented in OCAML

  25. Overview of this Talk • Fractal Symbolic Analysis • Framework for Symbolically determining Legality of Program Transformations • Matrixization • Generalization of Vectorization to Matrix Operations

  26. Matrixization • Detect Matrix Operations • Map to • hand-optimized libraries (BLAS) • special-purpose hardware • Eliminate loop overhead • MATLAB: type/bounds checks • Exploit Matrix Semantics • Ring Properties of Matrix (+,*)

  27. BLAS Performance

  28. MATLAB Performance

  29. Galerkin Example • In Galerkin, 98% of time spent in: • for i = 1:N • for j = 1:N • phi(k) += A(i,j)*x(i)*y(i); • end • end • A - Matrix • x, y - Vector

  30. Vectorized Code • In Optimized Galerkin: phi(k) += x*A*y’; • Fragment Speedup: 260 • Program Speedup: 110

  31. Conventional Approaches • Syntactic Pattern Replacement (KAPF,VAST,FALCON) • Can encode high-level properties • Limited use on loops • Vector Code Generation • Can detect array operations in loops • Cannot detect/exploit matrix products

  32. Our Approach • Map code to: Abstract Matrix Form (AMF) • Convert to Symmetric AMF formulation • Optimize AMF expressions • factorization • invariant removal • Detect Matrix Products • Map AMF to MATLAB/BLAS

  33. Expansion • Array expressions: • Forall loops: a(1:m,1:n) -> i j ai,j for i = 1:n x(i) = x(i) + y(i); -> ixi = i(xi+yi) end

  34. Reduction • Sum Reduction Loops for i = 1:n k = k + x(i); -> k = k + î i xi end

  35. Product • Matrix Product C = A*B -> ijCi,j= P(ijAi,h,ijBh,j) • Product-Reduction Equivalence: • e1 and e2 are scalar • e1 is constant w.r.t. ik+1…in • e2 is constant w.r.t. i1…ik-1 îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

  36. Other AMF Properties • Distributive Properties • e1·î e2 = î((ie1) ·e2) • when e1 is constant w.r.t. i • Interchange Properties • if(e1, e2,…, en) = f(ie1, ie2,…, ien) • i  e = i e • where î= • i j e = j i e • î  e = î e

  37. Back to Galerkin • Original: • k = k + îi j (ai,j * xi * yj) • Convert to Symmetric Form: • k = k + î(i j ai,j * i j xi * i j yj) • Optimize: • k = k + îi xi * (i j ai,j * i j yj) • Map to Matrix Operations: • k = k + Pî (i xi , P(i j ai,j,j yj)) • => k = k + x’*A*y;

  38. Other Vectorization Examples • Taken from USENET: for i = 1:n for j = 1:n C(i,j) = A(i,j)*x(i); C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n) for i = 1:n x(i) = y(:,i)*A*y(:,i)’; x(1:n) = sum(y.*(y*A’),2);

  39. Summary/Status of Matrixization • General technique to detect matrix/vector operations • Implemented in MAJIC • MATLAB interpreter/compiler • Future work: • Extend to more ops: recurrences • JIT Matrixization

  40. Related Work • Commutativity Analysis • Rinard & Diniz • Parallelization of OO programs • High-Level Pattern Replacement • DeRose, Marsolf, … • Exploit high-level matrix properties • Structure • Symmetry • Orthogonality

  41. Conclusion • High-level symbolic techniques: • Fractal Symbolic Analysis • Matrixization • New techniques to analyze & exploit underlying computation and achieve substantial performance gains

More Related