Optimization of Sparse Matrix Kernels for Data Mining Eun-Jin Im and Katherine Yelick, U.C. Berkeley
Outline • SPARSITY: performance optimization of sparse matrix-vector operations • Sparse matrices in data mining applications • Performance improvements by SPARSITY for data mining matrices
The Need for Optimized Sparse Matrix Codes • Sparse matrices are represented with indirect data structures (see the CSR sketch below). • Sparse matrix routines are therefore slower than their dense matrix counterparts. • Performance depends on the distribution of nonzero elements in the sparse matrix.
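For concreteness, here is a minimal sketch of the kernel in question: sparse matrix-vector multiply y = Ax in compressed sparse row (CSR) format. The names are illustrative, not taken from the SPARSITY source; the point is the indirect access x[colind[j]], which defeats the compiler optimizations available for dense code.

```c
/* Minimal CSR sparse matrix-vector multiply, y = A*x.
 * rowptr has n+1 entries; row i's nonzeros occupy
 * val[rowptr[i] .. rowptr[i+1]-1] with columns in colind. */
void spmv_csr(int n, const int *rowptr, const int *colind,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += val[j] * x[colind[j]];   /* indirect access to x */
        y[i] = sum;
    }
}
```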
The Solution: the SPARSITY System • A system that generates optimized C code for sparse matrix-vector operations • http://www.cs.berkeley.edu/~ejim/sparsity • Related work: ATLAS and PHiPAC for dense matrix routines, and FFTW
SPARSITY optimizations (1): Register Blocking • Identify small dense blocks of nonzeros. • Use an optimized multiplication code for the particular block size (a 2x2 sketch follows). • Improves register reuse and lowers indexing overhead. • Challenge: choosing the block size. [Figure: a 2x2 register-blocked matrix]
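A sketch of what a 2x2 register-blocked kernel can look like, assuming a block-CSR layout in which the four values of each block are stored contiguously. The names and storage order are assumptions for illustration, not SPARSITY's actual generated code; note that one column index now serves four matrix values.

```c
/* 2x2 register-blocked SpMV: the matrix is stored as dense 2x2
 * blocks, so each block contributes four multiply-adds while its
 * operands stay in registers. */
void spmv_bcsr_2x2(int nb_rows,          /* number of 2-row block rows */
                   const int *browptr,   /* block row pointers         */
                   const int *bcolind,   /* block column indices       */
                   const double *bval,   /* 4 values per block         */
                   const double *x, double *y)
{
    for (int i = 0; i < nb_rows; i++) {
        double y0 = 0.0, y1 = 0.0;       /* two destination entries in registers */
        for (int j = browptr[i]; j < browptr[i + 1]; j++) {
            const double *b = &bval[4 * j];
            double x0 = x[2 * bcolind[j]];
            double x1 = x[2 * bcolind[j] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * i]     = y0;
        y[2 * i + 1] = y1;
    }
}
```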
SPARSITY optimizations (2): Cache Blocking • Keep part of the source vector in cache while streaming through the matrix (a sketch follows). • Improves cache reuse of the source vector. • Challenge: choosing the block size. [Figure: sparse matrix A times source vector x yields destination vector y]
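One plausible realization of the idea, sketched under the assumption that the matrix is split into column stripes, each stored as its own small CSR structure, so the slice of x a stripe touches stays resident in cache. This is an illustration of the technique, not SPARSITY's implementation.

```c
#include <stddef.h>

/* One column stripe of the matrix, stored in CSR form.
 * colind is relative to the stripe's first column. */
typedef struct {
    int n_rows;
    int *rowptr, *colind;
    double *val;
} csr_stripe;

/* Cache-blocked SpMV: process one column stripe at a time so that
 * only a cache-sized slice of x is live during each pass. */
void spmv_cache_blocked(int n_stripes, const csr_stripe *stripes,
                        const int *stripe_start,  /* first column of each stripe */
                        const double *x, double *y, int n_rows)
{
    for (int i = 0; i < n_rows; i++) y[i] = 0.0;
    for (int s = 0; s < n_stripes; s++) {
        const csr_stripe *a = &stripes[s];
        const double *xs = &x[stripe_start[s]];   /* cache-resident slice of x */
        for (int i = 0; i < a->n_rows; i++)
            for (int j = a->rowptr[i]; j < a->rowptr[i + 1]; j++)
                y[i] += a->val[j] * xs[a->colind[j]];
    }
}
```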
SPARSITY optimizations (3): Multiple Vectors • Multiplying by several vectors at once gives better potential for reuse. • A code generator produces loops unrolled across the vectors (sketch below). • Allows reuse of matrix elements: each a_ij is loaded once and applied to x_j of every vector. • Challenge: choosing the number of vectors for loop unrolling.
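A sketch of the multiple-vector kernel with the cross-vector loop unrolled by a hypothetical factor of 4. SPARSITY's generator emits code of this shape for whatever factor its search selects; the names and the fixed factor here are illustrative.

```c
#define K 4   /* unrolling factor across vectors (a tuning parameter) */

/* SpMV with K right-hand vectors, Y = A*X, for a CSR matrix.
 * Each matrix element val[j] is loaded once and reused K times. */
void spmv_multivec(int n, const int *rowptr, const int *colind,
                   const double *val,
                   const double X[][K], double Y[][K])
{
    for (int i = 0; i < n; i++) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int j = rowptr[i]; j < rowptr[i + 1]; j++) {
            double a = val[j];       /* one load, four uses */
            int    c = colind[j];
            s0 += a * X[c][0];
            s1 += a * X[c][1];
            s2 += a * X[c][2];
            s3 += a * X[c][3];
        }
        Y[i][0] = s0; Y[i][1] = s1; Y[i][2] = s2; Y[i][3] = s3;
    }
}
```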
SPARSITY: Automatic Performance Tuning • Parameterized code generation • Search, combined with performance modeling, selects: • register block size • cache block size • number of vectors for loop unrolling
Sparse Matrices from Data Mining Applications
Data Mining Algorithms • For text retrieval (term-by-document matrix): • Latent Semantic Indexing [Berry et al.]: computation of a singular value decomposition; the blocked SVD uses multiple vectors. • Concept Decomposition [Dhillon and Modha]: matrix approximation by solving a least-squares problem; also uses multiple vectors.
Data Mining Algorithms • For image retrieval (pixel-by-image matrix): • Eigenface approximation [Li], used for face recognition. • Each image has a multi-resolution hierarchy and is compressed with a wavelet transform.
Platforms Used in Performance Measurements
Performance on Web Document Data
Performance on NSF Abstract Data
Performance on Face Image Data
Speedup
Performance Summary • Performance is better when the matrix is denser (face images). • Cache blocking is effective for matrices with a large number of columns (web documents). • Optimizing the multiplication with multiple vectors is effective.
Cache Block Size for the Web Document Matrix • The width of a cache block is limited by the size of the cache. • For multiple vectors, the loop-unrolling factor is 10, except on the Alpha 21164, where it is 3.
Conclusion • Most matrices used in data mining are sparse. • Sparse matrix operations are memory-inefficient and need optimization. • The best optimization depends on the nonzero structure of the matrix. • The SPARSITY system effectively speeds up these operations.
For Contribution • Contact ejim@cs.berkeley.edu to donate your matrix! Thank you.