Symbolic Program Transformation for Numerical Codes

Symbolic Program Transformation for Numerical Codes Vijay Menon Cornell University

Background • Compiler/Runtime Support for Numerical Programs (under Keshav Pingali) • Sparse matrix code generation • Memory hierarchy optimizations • Compiler/runtime techniques for MATLAB • MAJIC (with University of Illinois) • MultiMATLAB

Motivation • Using matrix properties to generate efficient code • e.g., Associativity, Commutativity of matrix operations • Problem: loop notation • Matrix operations hidden within loop nests

Example: LU Factorization with Partial Pivoting • Also called: Gaussian Elimination • Key algorithm for solving systems of linear equations: • To solve A x = b for x: • => Factor A into L U • => Solve L y = b for y • => Solve U x = y for x

LU Factorization do j = 1,N p(j) = j; do i = j+1,N if (A(i,j)>A(p(j),j)) p(j) = i; do k = 1,N tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp; do i = j+1,N A(i,j) = A(i,j)/A(j,j); do k = j+1,N do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k); Select pivot row: Swap pivot row with current: Scale column (to store L): Update(to compute partial U): • Problem: Poor cache behavior x x x x x x 0 x x x x x 0 0 5 x x x 0 0 3 x x x 0 0 7x x x 0 0 2 x x x

Automatic Blocking do jB = 1,N,B do j = jB,jB+B-1 p(j) = j; do i = j+1,N if (A(i,j)>A(p(j),j)) p(j) = i; do k = 1,N tmp = A(j,k); A(j,k) = A(p(j),k); A(p(j),k) = tmp; do i = j+1,N A(i,j) = A(i,j)/A(j,j); do k = j+1,jB+B-1 do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k); do j = jB,jB+B-1 do k = jB+B,N do i = j+1,N A(i,k) = A(i,k) - A(i,j)*A(j,k); • Compiler transformations: • Stripmine • Index-Set Split • Loop Distribute • Tile/Map to BLAS • Problems: • Establishing legality • Mapping to BLAS

LU Performance

Key Points • Conventional techniques (dependence analysis) are insufficient to establish legality • Detection of matrix operations buys extra performance

Overview of this Talk • Fractal Symbolic Analysis • Framework for Symbolically determining Legality of Program Transformations • Matrixization • Generalization of Vectorization to Matrix Operations

Fractal Symbolic Analysis • Symbolic test to establish legality of program transformations for i = 1 : n S1(i); for i = 1 : n S2(i); for i = 1 : n S1(i); S2(i);

Dependence Analysis • Independent operations may be reordered • Legality: No data dependence from S2(m) to S1(l) where l > m for i = 1 : n S1(i); for i = 1 : n S2(i); for i = 1 : n S1(i); S2(i);

Symbolic Comparison • Dependence Analysis is conservative: • Symbolic execution shows equality: • aout = 2*(ain + bin) • bout = 2*bin • But, intractable for recurrent loops s1: a = 2*a s2: b = 2*b s3: a = a+b s3: a = a+b s1: a = 2*a s2: b = 2*b ?

Distillation of LU • Given p(j)  j, prove: • Dependence analysis: too conservative • Symbolic comparison: intractable for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for j = 1:n for i = j+1:n a(i) = a(i)/a(j) for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for i = j+1:n a(i) = a(i)/a(j) B1(j): B1(j): B2(j): B2(j):

Idea • Simplify • Prove equality of complex programs via equality of simpler programs • Repeat if necessary • Sufficient, but not necessary • equality of simpler programs -> equality of original programs

Commutativity S1, S2, S3, …,Sn = Sf(1), Sf(2), Sf(3), …,Sf(n) i,j  1..n | i < j  f(i) > f(j). Si; Sj = Sj; Si • Permutation  Sequence of Adjacent Transpositions

Commutative Legality • Loop Distribution • Legality:  1  m < l  N. S2(m); S1(l) = S1(l); S2(m) for i = 1 : n S1(i); S2(i); for i = 1 : n S1(i); for i = 1 : n S2(i);

Commutative Legality • Similar conditions: • Statement reordering • Loop interchange • Loop reversal • Loop tiling • Compiler establishes legality of transformation by testing commute condition

Back to Example • Given p(j)  j, prove: • Simplified comparison => for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for j = 1:n for i = j+1:n a(i) = a(i)/a(j) for j = 1:n tmp = a(j) a(j) = a(p(j)) a(p(j)) = tmp for i = j+1:n a(i) = a(i)/a(j) B1(j): B1(j): B2(j): B2(j):

Simplified Comparison • Given p(j)  j  l > m prove: • Further simplification? for i = m+1:n a(i) = a(i)/a(m) tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp for i = m+1:n a(i) = a(i)/a(m) B2(m): B1(l): B1(l): B2(m):

Too Simple! • Given p(j)  j  l > m  i > m prove: • Too conservative - no longer legal! • Hypothesis: • Repeated application -> dependence analysis tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp a(i) = a(i)/a(m) a(i) = a(i)/a(m) tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp

Symbolic Comparison • Can we symbolically prove?: • Effect can be summarized: • non-recurrent loops • affine indices/loop bounds tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp for i = m+1:n a(i) = a(i)/a(m) for i = m+1:n a(i) = a(i)/a(m) tmp = a(l) a(l) = a(p(l)) a(p(l)) = tmp B2(m): B1(l): B1(l): B2(m):

Guarded Expressions • Compare symbolic values of live output variable in terms of input variables: • aout (k)= • Each • guard : affine expression defining part of aout • expr : symbolic expression on input data guard1 -> expr1 guard2 -> expr2 …… guardn -> exprn

Guarded Expressions • For both program blocks: • aout(k) = • Omega Library (integer programming tool) to manipulate/compare affine guards k  m => ain(k) k = l => ain(p(l))/ain(m) k = p(l) => ain(l)/ain(m) else => ain(k)/ain(m)

Summary of Fractal Symbolic Analysis • Powerful legality test • Explores tradeoffs between • tractability (dependence analysis) • accuracy (symbolic comparison) • Similar application for LU w/ pivoting • 2 recursive simplification steps • 6 guarded regions/expressions • Prototype implemented in OCAML

Overview of this Talk • Fractal Symbolic Analysis • Framework for Symbolically determining Legality of Program Transformations • Matrixization • Generalization of Vectorization to Matrix Operations

Matrixization • Detect Matrix Operations • Map to • hand-optimized libraries (BLAS) • special-purpose hardware • Eliminate loop overhead • MATLAB: type/bounds checks • Exploit Matrix Semantics • Ring Properties of Matrix (+,*)

BLAS Performance

MATLAB Performance

Galerkin Example • In Galerkin, 98% of time spent in: • for i = 1:N • for j = 1:N • phi(k) += A(i,j)*x(i)*y(i); • end • end • A - Matrix • x, y - Vector

Vectorized Code • In Optimized Galerkin: phi(k) += x*A*y’; • Fragment Speedup: 260 • Program Speedup: 110

Conventional Approaches • Syntactic Pattern Replacement (KAPF,VAST,FALCON) • Can encode high-level properties • Limited use on loops • Vector Code Generation • Can detect array operations in loops • Cannot detect/exploit matrix products

Our Approach • Map code to: Abstract Matrix Form (AMF) • Convert to Symmetric AMF formulation • Optimize AMF expressions • factorization • invariant removal • Detect Matrix Products • Map AMF to MATLAB/BLAS

Expansion • Array expressions: • Forall loops: a(1:m,1:n) -> i j ai,j for i = 1:n x(i) = x(i) + y(i); -> ixi = i(xi+yi) end

Reduction • Sum Reduction Loops for i = 1:n k = k + x(i); -> k = k + î i xi end

Product • Matrix Product C = A*B -> ijCi,j= P(ijAi,h,ijBh,j) • Product-Reduction Equivalence: • e1 and e2 are scalar • e1 is constant w.r.t. ik+1…in • e2 is constant w.r.t. i1…ik-1 îk (i1…ine1 * i1…ine2) = Pîk(i1…ike1, ik…ine2)

Other AMF Properties • Distributive Properties • e1·î e2 = î((ie1) ·e2) • when e1 is constant w.r.t. i • Interchange Properties • if(e1, e2,…, en) = f(ie1, ie2,…, ien) • i  e = i e • where î= • i j e = j i e • î  e = î e

Back to Galerkin • Original: • k = k + îi j (ai,j * xi * yj) • Convert to Symmetric Form: • k = k + î(i j ai,j * i j xi * i j yj) • Optimize: • k = k + îi xi * (i j ai,j * i j yj) • Map to Matrix Operations: • k = k + Pî (i xi , P(i j ai,j,j yj)) • => k = k + x’*A*y;

Other Vectorization Examples • Taken from USENET: for i = 1:n for j = 1:n C(i,j) = A(i,j)*x(i); C(1:n,1:n) = A(1:n,1:n) .* repmat(x(1,n),1,n) for i = 1:n x(i) = y(:,i)*A*y(:,i)’; x(1:n) = sum(y.*(y*A’),2);

Summary/Status of Matrixization • General technique to detect matrix/vector operations • Implemented in MAJIC • MATLAB interpreter/compiler • Future work: • Extend to more ops: recurrences • JIT Matrixization

Related Work • Commutativity Analysis • Rinard & Diniz • Parallelization of OO programs • High-Level Pattern Replacement • DeRose, Marsolf, … • Exploit high-level matrix properties • Structure • Symmetry • Orthogonality

Conclusion • High-level symbolic techniques: • Fractal Symbolic Analysis • Matrixization • New techniques to analyze & exploit underlying computation and achieve substantial performance gains

Symbolic Program Transformation for Numerical Codes

Symbolic Program Transformation for Numerical Codes

Presentation Transcript

Symbolic Program Transformation for Numerical Codes

Compilation by Program Transformation

Symbolic

Symbolic Tools for Program Proving

Security-Oriented Program Transformation

Symbolic Execution and Program Testing

Symbolic Codes

Program Transformation for Asynchronous Query Submission

Alpha numerical program input

Symbolic

A Hybrid Symbolic-Numerical Method for Determining Model Structure

symbolic

Cache Locality for Non-numerical Codes

Codes in reflectometry Numerical Schemes and limitations

A PERSONAL TRANSFORMATION PROGRAM

SIYAKHA TRANSFORMATION PROGRAM

Program Analysis and Transformation

Body Transformation Program

Program Transformation

SIYAKHA TRANSFORMATION PROGRAM

Program Analysis and Transformation

DIGITAL TRANSFORMATION TRAINING PROGRAM