This paper by Ernie Chan discusses the evolution of linear algebra libraries towards enhanced performance through parallelism. It covers the inversion of triangular matrices and introduces the use of high-level abstractions in the Formal Linear Algebra Methods Environment (FLAME). The paper emphasizes the static generation of directed acyclic graphs (DAGs) to optimize algorithms while maintaining semantic correctness. Key algorithms such as triangular inversion (Trinv) and efficient LAPACK-style implementations are analyzed, highlighting their practicality in modern computing environments.
Transforming Linear Algebra Libraries: From Abstraction to Parallelism Ernie Chan HIPS 2010
Motivation Statically
Outline
• Inversion of a Triangular Matrix
• Requisite Semantic Information
• Static Generation of a Directed Acyclic Graph
• Performance
• Conclusion
Inversion of a Triangular Matrix
• Formal Linear Algebra Methods Environment (FLAME)
  • High-level abstractions for expressing linear algebra algorithms
• Triangular Inversion (Trinv)
  R := U^{-1}
Inversion of a Triangular Matrix
• LAPACK-style Implementation

      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      ENDDO
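The update order of this loop can be checked on a small case by shrinking the block size to 1, so that each BLAS call collapses to a scalar or vector operation. The following is a minimal sketch under that assumption; `trinv_upper_unblocked` and the `ELEM` macro are hypothetical names introduced here, not LAPACK or FLAME routines, and singular input is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element access, mirroring the A( I, J ) / LDA convention. */
#define ELEM(a, lda, i, j) ((a)[(size_t)(i) + (size_t)(j) * (size_t)(lda)])

/*
 * In-place inversion of an n x n upper triangular, non-unit-diagonal matrix.
 * Same update order as the blocked loop, with block size 1 and zero-based
 * indices. Hypothetical helper for illustration only.
 */
void trinv_upper_unblocked(int n, double *a, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        double alpha = ELEM(a, lda, j, j);
        assert(alpha != 0.0);            /* singular input is not handled */

        /* A12 := -inv(A11) * A12   (first DTRSM) */
        for (int k = j + 1; k < n; ++k)
            ELEM(a, lda, j, k) = -ELEM(a, lda, j, k) / alpha;

        /* A02 := A02 + A01 * A12   (DGEMM) */
        for (int k = j + 1; k < n; ++k)
            for (int i = 0; i < j; ++i)
                ELEM(a, lda, i, k) += ELEM(a, lda, i, j) * ELEM(a, lda, j, k);

        /* A01 := A01 * inv(A11)    (second DTRSM) */
        for (int i = 0; i < j; ++i)
            ELEM(a, lda, i, j) /= alpha;

        /* A11 := inv(A11)          (DTRTI2) */
        ELEM(a, lda, j, j) = 1.0 / alpha;
    }
}
```

Calling `trinv_upper_unblocked(3, a, 3)` on a column-major 3x3 upper triangular array overwrites it with its inverse. The strictly lower part is never read or written.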
Inversion of a Triangular Matrix
• FLASH
  • Matrix of matrices
Inversion of a Triangular Matrix

    FLA_Part_2x2( A,    &ATL, &ATR,
                        &ABL, &ABR,     0, 0, FLA_TL );

    while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
    {
      FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                          /* ******** */        /* **************** */
                                                  &A10, /**/ &A11, &A12,
                             ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                             1, 1, FLA_BR );
      /*-------------------------------------------------------*/
      FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_MINUS_ONE, A11, A12 );
      FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
                  FLA_ONE, A01, A12, FLA_ONE, A02 );
      FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_ONE, A11, A01 );
      FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
      /*-------------------------------------------------------*/
      FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                       A10, A11, /**/ A12,
                              /* ********** */      /* ************* */
                                &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                                FLA_TL );
    }
Inversion of a Triangular Matrix
• Extensible Markup Language (XML)

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trinv" type="blk" variant="3">
      <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
      <Declaration>
        <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
      </Declaration>
      <Loop>
        <Guard>A</Guard>
        <Update>
          <Statement name="FLA_Trsm">
            <Option type="side">FLA_LEFT</Option>
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter>FLA_MINUS_ONE</Parameter>
            <Parameter partition="11">A</Parameter>
            <Parameter partition="12">A</Parameter>
          </Statement>
          <Statement name="FLA_Gemm">
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="01">A</Parameter>
            <Parameter partition="12">A</Parameter>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="02">A</Parameter>
          </Statement>
          <Statement name="FLA_Trsm">
            <Option type="side">FLA_RIGHT</Option>
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="11">A</Parameter>
            <Parameter partition="01">A</Parameter>
          </Statement>
          <Statement name="FLA_Trinv">
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter partition="11">A</Parameter>
          </Statement>
        </Update>
      </Loop>
    </Function>
Requisite Semantic Information
• Partitioning Scheme
• Problem Size*
• Updates

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trinv" type="blk" variant="3">
      <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
      <Declaration>
        <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
      </Declaration>
      <Loop>
        <Guard>A</Guard>
        <!-- while m( ATL ) < m( A ) -->
        <Update>
          <Statement name="FLA_Trsm">
            <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
          </Statement>
          <Statement name="FLA_Gemm">
            <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
          </Statement>
          <Statement name="FLA_Trsm">
            <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
          </Statement>
          <Statement name="FLA_Trinv">
            <!-- 'Upper', 'Non-unit', A11 -->
          </Statement>
        </Update>
      </Loop>
    </Function>
Requisite Semantic Information
• Input and Output Parameters

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trsm">
      <Declaration>
        <Operand type="scalar" inout="in">alpha</Operand>
        <Operand type="matrix" inout="in">A</Operand>
        <Operand type="matrix" inout="both">B</Operand>
      </Declaration>
    </Function>
    <Function name="FLA_Gemm">
      <Declaration>
        <Operand type="scalar" inout="in">alpha</Operand>
        <Operand type="matrix" inout="in">A</Operand>
        <Operand type="matrix" inout="in">B</Operand>
        <Operand type="scalar" inout="in">beta</Operand>
        <Operand type="matrix" inout="both">C</Operand>
      </Declaration>
    </Function>
    <Function name="FLA_Trinv">
      <Declaration>
        <Operand type="matrix" inout="both">A</Operand>
      </Declaration>
    </Function>
Static Generation of a DAG
• Code Generation
  • Convert XML representation to FLASH code generation intermediary
    • Annotated with input and output information
  • Create directed acyclic graph (DAG) by statically unrolling the loop
    • Operations on submatrix blocks (tasks) are vertices
    • Data dependencies between tasks are edges
Static Generation of a DAG
• Data Dependencies
  • Flow (read-after-write)
      S1: A = B + C;
      S2: D = A + E;
  • Anti (write-after-read)
      S3: F = A + G;
      S4: A = H + I;
  • Output (write-after-write)
      S5: A = J + K;
      S6: A = L + M;
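The three dependency kinds can be told apart mechanically from the read and write sets of two statements taken in program order. The sketch below is one way to express that test; the types and function names (`stmt_t`, `make_add`, `classify_dependence`) are hypothetical and exist only for this illustration. It assumes single-letter variable names and an ASCII-like character set.

```c
#include <assert.h>

enum { DEP_NONE = 0, DEP_FLOW = 1, DEP_ANTI = 2, DEP_OUTPUT = 4 };

/* Access sets of one statement: bit v is set if variable 'A' + v is used. */
typedef struct {
    unsigned reads;
    unsigned writes;
} stmt_t;

/* Map a variable name 'A'..'Z' to its bit (assumes contiguous letters). */
static unsigned var_bit(char name)
{
    assert(name >= 'A' && name <= 'Z');
    return 1u << (name - 'A');
}

/* Access sets for the statement `dst = src1 + src2;`. */
stmt_t make_add(char dst, char src1, char src2)
{
    stmt_t s;
    s.writes = var_bit(dst);
    s.reads  = var_bit(src1) | var_bit(src2);
    return s;
}

/*
 * Classify the dependence of `second` on `first`, where `first` comes
 * earlier in program order. Returns a bitwise OR of DEP_* flags.
 */
unsigned classify_dependence(stmt_t first, stmt_t second)
{
    unsigned kind = DEP_NONE;
    if (first.writes & second.reads)  kind |= DEP_FLOW;    /* read-after-write  */
    if (first.reads  & second.writes) kind |= DEP_ANTI;    /* write-after-read  */
    if (first.writes & second.writes) kind |= DEP_OUTPUT;  /* write-after-write */
    return kind;
}
```

For example, `classify_dependence(make_add('A','B','C'), make_add('D','A','E'))` corresponds to the pair S1, S2. An operand declared `inout="both"` would simply appear in both the read set and the write set.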
Static Generation of a DAG
• Problem Size
  • Problem size cannot be determined a priori
  • Fix the block size or loop unrolling factor
  • Balance between instruction footprint and data granularity of tasks
• Example
  • Trinv on 3x3 matrix of blocks
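To make the footprint side of this trade-off concrete, one can count the tasks produced when Trinv is unrolled over an n x n matrix of blocks. The count below assumes one task per block-level Trsm, Gemm, or Trinv call, as in the 3x3 example; the symbol T(n) and this decomposition are assumptions made for illustration, not a formula stated in the talk.

```latex
% Iteration j (0-based) of the unrolled loop issues:
%   (n-1-j)   left  Trsm tasks  (blocks of A12)
%   j(n-1-j)  Gemm  tasks       (blocks of A02)
%   j         right Trsm tasks  (blocks of A01)
%   1         Trinv task        (A11)
\begin{align*}
T(n) &= \sum_{j=0}^{n-1} \bigl[(n-1-j) + j\,(n-1-j) + j + 1\bigr] \\
     &= \frac{n(n-1)}{2} + \frac{n(n-1)(n-2)}{6} + \frac{n(n-1)}{2} + n \\
     &= n^{2} + \frac{n(n-1)(n-2)}{6},
\qquad T(3) = 9 + 1 = 10 .
\end{align*}
```

Under these assumptions the number of unrolled tasks grows cubically in the number of blocks per dimension, which is why a larger unrolling factor enlarges the generated code even as each task stays the same size.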
Static Generation of a DAG
• Trinv
  • Iteration 1: Trsm0, Trsm1, Trinv2
  • Iteration 2: Trsm3, Gemm4, Trsm5, Trinv6
  • Iteration 3: Trsm7, Trsm8, Trinv9
Static Generation of a DAG
• Tasks in the DAG: Trsm0, Trsm1, Trinv2, Trsm3, Gemm4, Trsm5, Trinv6, Trsm7, Trsm8, Trinv9
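The ten tasks listed above can be reproduced by unrolling the Trinv loop over a matrix of blocks and comparing the blocks each task reads and updates. The following is a minimal sketch of that idea, not the generator used in the talk: the types and functions (`dag_t`, `build_trinv_dag`, `dag_edge_count`) are hypothetical, one task is assumed per block-level call, and an edge is recorded for every conflicting pair, so edges implied by transitivity are kept rather than pruned.

```c
#include <assert.h>
#include <string.h>

#define MAX_TASKS 64

typedef enum { OP_TRSM, OP_GEMM, OP_TRINV } op_t;

typedef struct {
    op_t op;
    int  reads[2];   /* input-only blocks, encoded as row * nb + col */
    int  nreads;
    int  update;     /* block that is both read and written */
} task_t;

typedef struct {
    int           nb;                          /* blocks per dimension */
    int           ntasks;
    task_t        tasks[MAX_TASKS];
    unsigned char edge[MAX_TASKS][MAX_TASKS];  /* edge[s][t]: s precedes t */
} dag_t;

static void add_task(dag_t *g, op_t op, int r0, int r1, int update)
{
    assert(g->ntasks < MAX_TASKS);
    task_t *t = &g->tasks[g->ntasks++];
    t->op = op;
    t->nreads = 0;
    if (r0 >= 0) t->reads[t->nreads++] = r0;
    if (r1 >= 0) t->reads[t->nreads++] = r1;
    t->update = update;
}

/* Nonzero if task t reads `block`, either as an input or as its update. */
static int touches(const task_t *t, int block)
{
    if (t->update == block) return 1;
    for (int i = 0; i < t->nreads; ++i)
        if (t->reads[i] == block) return 1;
    return 0;
}

/* Unroll Trinv over an nb x nb matrix of blocks and record dependencies. */
void build_trinv_dag(dag_t *g, int nb)
{
    g->nb = nb;
    g->ntasks = 0;
    memset(g->edge, 0, sizeof g->edge);

#define BLK(i, j) ((i) * nb + (j))
    for (int j = 0; j < nb; ++j) {
        for (int k = j + 1; k < nb; ++k)        /* A12 := -inv(A11) A12 */
            add_task(g, OP_TRSM, BLK(j, j), -1, BLK(j, k));
        for (int i = 0; i < j; ++i)             /* A02 := A02 + A01 A12 */
            for (int k = j + 1; k < nb; ++k)
                add_task(g, OP_GEMM, BLK(i, j), BLK(j, k), BLK(i, k));
        for (int i = 0; i < j; ++i)             /* A01 := A01 inv(A11)  */
            add_task(g, OP_TRSM, BLK(j, j), -1, BLK(i, j));
        add_task(g, OP_TRINV, -1, -1, BLK(j, j)); /* A11 := inv(A11)    */
    }
#undef BLK

    /* Flow/output: t reads what s wrote. Anti/output: t writes what s read. */
    for (int s = 0; s < g->ntasks; ++s)
        for (int t = s + 1; t < g->ntasks; ++t)
            if (touches(&g->tasks[t], g->tasks[s].update) ||
                touches(&g->tasks[s], g->tasks[t].update))
                g->edge[s][t] = 1;
}

int dag_edge_count(const dag_t *g)
{
    int count = 0;
    for (int s = 0; s < g->ntasks; ++s)
        for (int t = s + 1; t < g->ntasks; ++t)
            count += g->edge[s][t];
    return count;
}
```

With `nb = 3`, task indices follow the numbering on the slides (for instance, task 4 is the single Gemm). Because every edge points from a lower to a higher task index, the graph is acyclic by construction.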
Performance
• LabVIEW
  • Graphical, data flow programming language (G)
  • Anti-dependencies cannot exist in G
    • Copies are made when wire is split
Performance
• Target Architecture
  • 16-core AMD processor
    • 4 socket quad-core Opteron
    • 1.9 GHz
    • 4 GB of RAM per socket
  • LabVIEW 8.6
    • Windows XP
  • Basic Linear Algebra Subprograms (BLAS)
    • MKL 7.2
Performance
• Results
  • Parallelism
    • Exploit parallelism inherent within DAG
  • Hierarchical matrix storage
    • Spatial locality
  • Overhead
    • Copy matrix from flat row-major storage to hierarchical matrix and back
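The copy overhead mentioned above can be pictured with a simple blocked layout. The sketch below assumes a square n x n matrix, a block size b that divides n, and blocks stored contiguously one after another; this contiguous layout and the function names are assumptions for illustration, and FLASH's actual matrix-of-matrices objects are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Offset of element (i, j) in the blocked layout: b x b blocks stored in
 * row-major block order, elements stored row-major inside each block.
 */
size_t blocked_index(size_t n, size_t b, size_t i, size_t j)
{
    assert(b > 0 && n % b == 0 && i < n && j < n);
    size_t nb    = n / b;                     /* blocks per dimension   */
    size_t block = (i / b) * nb + (j / b);    /* which block            */
    return block * b * b + (i % b) * b + (j % b);
}

/* Copy a flat row-major n x n matrix into the blocked layout. */
void flat_to_blocked(size_t n, size_t b, const double *flat, double *blocked)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            blocked[blocked_index(n, b, i, j)] = flat[i * n + j];
}

/* Copy the blocked layout back into flat row-major storage. */
void blocked_to_flat(size_t n, size_t b, const double *blocked, double *flat)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            flat[i * n + j] = blocked[blocked_index(n, b, i, j)];
}
```

Each block occupies b*b consecutive elements, which is the source of the spatial locality; the two copies touch every element once each, which is the overhead paid before and after the computation.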
Conclusion
• Instantiate linear algebra algorithm using a code generation intermediary
• Statically produce a directed acyclic graph by fixing block size or loop unrolling factor
XML → FLASH → DAG
Acknowledgments
• Jim Nagle, Robert van de Geijn
• We thank the other members of the FLAME team for their support
• Funding
  • National Instruments
  • NSF Grants
    • CCF-0540926
    • CCF-0702714
Conclusion
• More Information
  http://www.cs.utexas.edu/~flame
• Questions?
  echan@cs.utexas.edu